Adaptive Neural Networks for Efficient Inference

Tolga Bolukbasi, Joseph Wang, O. Dekel, Venkatesh Saligrama

Published 2017 in International Conference on Machine Learning

ABSTRACT

We present an approach to adaptively utilize deep neural networks in order to reduce the evaluation time on new examples without loss of accuracy. Rather than attempting to redesign or approximate existing networks, we propose two schemes that adaptively utilize networks. We first pose an adaptive network evaluation scheme, where we learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example. We show that computational time can be dramatically reduced by exploiting the fact that many examples can be correctly classified using relatively efficient networks and that complex, computationally costly networks are only necessary for a small fraction of examples. We pose a global objective for learning an adaptive early exit or network selection policy and solve it by reducing the policy learning problem to a layer-by-layer weighted binary classification problem. Empirically, these approaches yield dramatic reductions in computational cost, with up to a 2.8x speedup on state-of-the-art networks from the ImageNet image recognition challenge with minimal (<1%) loss of top5 accuracy.
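The early-exit idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper learns its exit policy by reduction to layer-by-layer weighted binary classification, whereas this sketch substitutes a fixed confidence threshold. The names `cascade_predict`, `cheap_stage`, `costly_stage`, and `average_cost`, the stage costs, and the threshold value are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# A stage maps one input to (predicted_label, confidence in [0, 1]).
Stage = Callable[[float], Tuple[int, float]]


def cascade_predict(
    x: float,
    stages: Sequence[Stage],
    costs: Sequence[float],
    threshold: float,
) -> Tuple[int, float]:
    """Run stages in order and stop at the first confident one.

    Returns the predicted label and the total cost spent on this input.
    The last stage always produces the final answer.
    """
    spent = 0.0
    label = -1
    for i, (stage, cost) in enumerate(zip(stages, costs)):
        spent += cost
        label, confidence = stage(x)
        is_last = i == len(stages) - 1
        if is_last or confidence >= threshold:
            break
    return label, spent


# Hypothetical toy stages: the cheap one is only confident far from zero.
def cheap_stage(x: float) -> Tuple[int, float]:
    return (1 if x > 0 else 0), min(1.0, abs(x))


# Stand-in for an expensive, always-confident network.
def costly_stage(x: float) -> Tuple[int, float]:
    return (1 if x > 0 else 0), 1.0


def average_cost(inputs: Sequence[float], threshold: float = 0.8) -> float:
    """Mean per-example cost of the two-stage cascade (assumed costs 1 and 10)."""
    stages = [cheap_stage, costly_stage]
    costs = [1.0, 10.0]
    total = sum(cascade_predict(x, stages, costs, threshold)[1] for x in inputs)
    return total / len(inputs)
```

With these assumed costs, an input that reaches both stages costs 11.0 units, while an input that exits at the cheap stage costs 1.0. The average cost therefore falls as more inputs exit early, which is the effect the abstract describes.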

PUBLICATION RECORD

Publication year
2017
Venue
International Conference on Machine Learning
Publication date
2017-02-25
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1702.07811
Source metadata
Semantic Scholar

REFERENCES

The cascading neural network: building the Internet of Smart Things
2017cited by this paper
Changing Model Behavior at Test-Time Using Reinforcement Learning
2017cited by this paper
Pruning Random Forests for Prediction on a Budget
2016cited by this paper
Learning Structured Sparsity in Deep Neural Networks
2016cited by this paper
LCNN: Lookup-Based Convolutional Neural Network
2016cited by this paper
Spatially Adaptive Computation Time for Residual Networks
2016cited by this paper
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
2016cited by this paper
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
2016cited by this paper
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size
2016cited by this paper
Binarized Neural Networks
2016cited by this paper
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2015cited by this paper
PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions
2015cited by this paper
Efficient Learning by Directed Acyclic Graph For Resource Constrained Prediction
2015influential reference
Distilling the Knowledge in a Neural Network
2015cited by this paper
Quantized Convolutional Neural Networks for Mobile Devices
2015cited by this paper
Deep Residual Learning for Image Recognition
2015influential reference
Conditional Computation in Neural Networks for faster models
2015cited by this paper
Compressing Neural Networks with the Hashing Trick
2015cited by this paper
Sparse Convolutional Neural Networks
2015influential reference
Compressing Deep Convolutional Networks using Vector Quantization
2014cited by this paper
Going deeper with convolutions
2014cited by this paper
ImageNet Large Scale Visual Recognition Challenge
2014cited by this paper
Model Selection by Linear Programming
2014cited by this paper
Sequence to Sequence Learning with Neural Networks
2014cited by this paper
Feature-Cost Sensitive Learning with Submodular Trees of Classifiers
2014cited by this paper
Supervised Sequential Classification Under Budget Constraints
2013cited by this paper
Local Supervised Learning through Space Partitioning
2012cited by this paper
Cost-Sensitive Tree of Classifiers
2012cited by this paper
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
2012cited by this paper
ImageNet classification with deep convolutional neural networks
2012cited by this paper
Model compression
2006cited by this paper

CITED BY

Channel-Adaptive Edge AI: Maximizing Inference Throughput by Adapting Computational Complexity to Channel States
2026cites this paper
Bio-inspired adaptive neurons for dynamic weighting in Artificial Neural Networks
2026cites this paper
Time-sensitive data analytics: A survey of anytime techniques, applications and challenges
2026cites this paper
When Models Know When They Do Not Know: Calibration, Cascading, and Cleaning
2026cites this paper
PEER: Towards reliable and efficient inference via Patience-Based Early Exiting with Rejection
2026cites this paper
AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth
2026cites this paper
Benchmarking the speed–accuracy tradeoff in object recognition by humans and neural networks
2025cites this paper
Frugal AI: Introduction, Concepts, Development and Open Questions
2025cites this paper
Can LLMs Improve Sanctions Screening in the Financial System? Evidence from a Fuzzy Matching Assessment
2025cites this paper
DPNet: Dynamic Pooling Network for Accurate and Efficient Size-Aware Tiny Object Detection
2025cites this paper
Downsized and Compromised?: Assessing the Faithfulness of Model Compression
2025cites this paper
How Do LLMs Use Their Depth?
2025cites this paper
Early-Exit DNN Inference on HMPSoCs
2025cites this paper
Adaptive Spiking with Plasticity for Energy Aware Neuromorphic Systems
2025cites this paper
Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks
2025cites this paper
Assessing bias and computational efficiency in vision transformers using early exits
2025cites this paper
Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
2025cites this paper
Neural network task specialization via domain constraining
2025cites this paper
Adaptive Low Light Enhancement via Joint Global-Local Illumination Adjustment
2025cites this paper
RA-MOSAIC: Resource Adaptive Edge AI Optimization over Spatially Multiplexed Video Streams
2025cites this paper
Mamba base PKD for efficient knowledge compression
2025cites this paper
Singular Value Decomposition-based lightweight LSTM for time series forecasting
2025cites this paper
UniPCGC: Towards Practical Point Cloud Geometry Compression via an Efficient Unified Approach
2025cites this paper
Toward Unified Expertise: Learning a Single Vision Model From Diverse Perception
2025cites this paper
Computational fairness in adaptive neural networks
2025cites this paper
IHPE: A Lightweight Hand Pose Estimation Network for Surgical Motion Capture in Medical Education
2025cites this paper
SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
2025cites this paper
Dynamic Slimmable Networks for Efficient Speech Separation
2025cites this paper
Early Exit Based on Deep Learning Model for Polyp Colonoscopy Image Classification
2025cites this paper
DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation
2025cites this paper
Dataset Pruning Using Early Exit Networks
2025cites this paper
AEBNAS: Strengthening Exit Branches in Early-Exit Networks through Hardware-Aware Neural Architecture Search
2025cites this paper
A Time-Domain Audio Separation Network with Frequency-Domain Attention and Adaptive Convolution for Electroencephalogram Denoising
2025cites this paper
Enhancing computational efficiency in digital twins: a survey of techniques and challenges for fast inference
2025cites this paper
RCENet: Recursive Concatenation and Enhancement Network for Real-Time Super-Resolution
2025cites this paper
ECC-SNN: Cost-Effective Edge-Cloud Collaboration for Spiking Neural Networks
2025cites this paper
Continuous Thought Machines
2025cites this paper
Position-Aware Depth Decay Decoding (D3): Boosting Large Language Model Inference Efficiency
2025cites this paper
Dynamic Neural Network Structure: A Review for its Theories and Applications
2025cites this paper
Online deep learning’s role in conquering the challenges of streaming data: a survey
2025cites this paper
Gatekeeper: Improving Model Cascades Through Confidence Tuning
2025cites this paper
FreeNet: An efficient frequency-domain early exiting network for dynamic inference
2025cites this paper
Reliable Multimodal Learning Via Multi-Level Adaptive DeConfusion
2025cites this paper
Personalized Top-k Set Queries Over Predicted Scores
2025cites this paper
Environment-Aware Dynamic Pruning for Pipelined Edge Inference
2025cites this paper
Dynamic Cross-Modal Feature Interaction Network for Hyperspectral and LiDAR Data Classification
2025cites this paper
Context-aware Dynamic Pruning for Speech Foundation Models
2025cites this paper
BAPEN: Towards Versatile Audio Phase Retrieval
2025cites this paper
Fine-Grained Image Captioning via Dynamic Query and Deformable Cross-Attention
2025cites this paper
Void in Language Models
2025cites this paper
ESCAN: Efficient GPU sharing for cascade neural network inference
2025cites this paper
Cost-Aware Routing for Efficient Text-To-Image Generation
2025cites this paper
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
2025cites this paper
Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
2025cites this paper
ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs
2025cites this paper
Adaptive LiDAR Scanning: Harnessing Temporal Cues for Efficient 3D Object Detection via Multi-Modal Fusion
2025cites this paper
Exploring facial attribute inference with ResNet: a study on head pose estimation and gender prediction
2024cites this paper
Early-Exit Meets Model-Distributed Inference at Edge Networks
2024cites this paper
A stable and efficient dynamic ensemble method for pothole detection
2024cites this paper
Inference latency prediction for CNNs on heterogeneous mobile devices and ML frameworks
2024cites this paper
Characterizing Disparity Between Edge Models and High-Accuracy Base Models for Vision Tasks
2024cites this paper
EdgeBoost: Confidence Boosting for Resource Constrained Inference via Selective Offloading
2024cites this paper
Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
2024cites this paper
Model Adaptation for Time Constrained Embodied Control
2024cites this paper
Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE
2024cites this paper
Agreement-Based Cascading for Efficient Inference
2024cites this paper
Fast yet Safe: Early-Exiting with Risk Control
2024cites this paper
DNCs Require More Planning Steps
2024cites this paper
Search, Examine and Early-Termination: Fake News Detection with Annotation-Free Evidences
2024cites this paper
The Entanglement of Communication and Computing in Enabling Edge Intelligence
2024influential citation
ACF: An Adaptive Compression Framework for Multimodal Network in Embedded Devices
2024cites this paper
Learning Iterative Reasoning through Energy Diffusion
2024cites this paper
Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models
2024cites this paper
Multi-Resolution Model Compression for Deep Neural Networks: A Variational Bayesian Approach
2024cites this paper
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
2024cites this paper
Hierarchical Skip Decoding for Efficient Autoregressive Text Generation
2024cites this paper
Conditional computation in neural networks: Principles and research trends
2024cites this paper
Lightweight Inference for Forward-Forward Algorithm
2024cites this paper
DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices
2024cites this paper
When in Doubt, Think Slow: Iterative Reasoning with Latent Imagination
2024cites this paper
Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning
2024cites this paper
Not all Layers of LLMs are Necessary during Inference
2024cites this paper
Integrating Dynamic Routing with Reinforcement Learning and Multimodal Techniques for Visual Question Answering
2024cites this paper
Two grids are better than one: Hybrid indoor scene reconstruction framework with adaptive priors
2024cites this paper
Dynamic multi-headed self-attention and multiscale enhancement vision transformer for object detection
2024cites this paper
LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition
2024cites this paper
MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing
2024cites this paper
Job assignment in machine learning inference systems with accuracy constraints
2024cites this paper
AdaDet: An Adaptive Object Detection System Based on Early-Exit Neural Networks
2024cites this paper
Temporal Decisions: Leveraging Temporal Correlation for Efficient Decisions in Early Exit Neural Networks
2024cites this paper
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling
2024cites this paper
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
2024cites this paper
Efficient out-of-distribution detection via layer-adaptive scoring and early stopping
2024cites this paper
Priority-Aware Model-Distributed Inference at Edge Networks
2024cites this paper
Early-Exit Deep Neural Network - A Comprehensive Survey
2024influential citation
SOI: Scaling Down Computational Complexity by Estimating Partial States of the Model
2024cites this paper
Harnessing Temporal Information for Efficient Edge AI
2024cites this paper
JIGSAW: Edge-based Streaming Perception over Spatially Overlapped Multi-Camera Deployments
2024cites this paper
CAS: Fusing DNN Optimization & Adaptive Sensing for Energy-Efficient Multi-Modal Inference
2024cites this paper
Dynamic Diffusion Transformer
2024cites this paper