Adaptive Computation Time for Recurrent Neural Networks

Published 2016 in arXiv.org

ABSTRACT

This paper introduces Adaptive Computation Time (ACT), an algorithm that allows recurrent neural networks to learn how many computational steps to take between receiving an input and emitting an output. ACT requires minimal changes to the network architecture, is deterministic and differentiable, and does not add any noise to the parameter gradients. Experimental results are provided for four synthetic problems: determining the parity of binary vectors, applying binary logic operations, adding integers, and sorting real numbers. Overall, performance is dramatically improved by the use of ACT, which successfully adapts the number of computational steps to the requirements of the problem. We also present character-level language modelling results on the Hutter Prize Wikipedia dataset. In this case ACT does not yield large gains in performance; however, it does provide intriguing insight into the structure of the data, with more computation allocated to harder-to-predict transitions, such as spaces between words and ends of sentences. This suggests that ACT or other adaptive computation methods could provide a generic method for inferring segment boundaries in sequence data.
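
The abstract does not spell out the halting rule, so the following is only a minimal sketch of one common reading of ACT-style halting, not the paper's reference code. It assumes that each extra computation step emits a halting value in (0, 1), that computation stops once the running sum of those values reaches 1 - epsilon, and that the last step is given the leftover probability mass so the weights sum to 1. The function name `act_combine`, its arguments, and the default `epsilon` are hypothetical and chosen for illustration.

```python
import numpy as np

def act_combine(halting_probs, outputs, epsilon=0.01):
    """Combine per-step outputs using ACT-style halting weights (illustrative).

    halting_probs: halting values in (0, 1), one per candidate step
                   (assumed here to come from a sigmoid halting unit).
    outputs:       array of shape (num_steps, d), one output per step.
    Returns (weighted_output, n_steps, remainder).
    """
    halting_probs = np.asarray(halting_probs, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    cumulative = np.cumsum(halting_probs)
    # Halt at the first step whose running sum reaches 1 - epsilon;
    # if that never happens, stop at the last available step.
    reached = np.nonzero(cumulative >= 1.0 - epsilon)[0]
    n_steps = int(reached[0]) + 1 if reached.size else len(halting_probs)
    weights = halting_probs[:n_steps].copy()
    # The final step receives the remainder so the weights sum to 1.
    remainder = 1.0 - weights[:-1].sum()
    weights[-1] = remainder
    # The result is a deterministic weighted mean of the step outputs.
    weighted_output = weights @ outputs[:n_steps]
    return weighted_output, n_steps, remainder
```

For example, with halting values `[0.1, 0.3, 0.7, 0.9]` the running sum first reaches the threshold at the third step, so three steps are used with weights 0.1, 0.3, and 0.6. Because the output is a weighted mean rather than a sampled choice, it stays differentiable with respect to the halting values, which is consistent with the abstract's description of ACT as deterministic and noise-free.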

PUBLICATION RECORD

Publication year: 2016
Venue: arXiv.org
Publication date: 2016-03-29
Fields of study: Computer Science
Identifiers: arXiv 1603.08983
Source metadata: Semantic Scholar

REFERENCES

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models (2016, cited by this paper)
Order Matters: Sequence to sequence for sets (2015, cited by this paper)
Conditional Computation in Neural Networks for faster models (2015, cited by this paper)
Neural Programmer-Interpreters (2015, cited by this paper)
DRAW: A Recurrent Neural Network For Image Generation (2015, cited by this paper)
End-To-End Memory Networks (2015, cited by this paper)
Training Very Deep Networks (2015, cited by this paper)
Learning to Transduce with Unbounded Memory (2015, cited by this paper)
Grid Long Short-Term Memory (2015, cited by this paper)
Pointer Networks (2015, cited by this paper)
Deep Sequential Neural Network (2014, cited by this paper)
Distributed Representations of Sentences and Documents (2014, cited by this paper)
Adam: A Method for Stochastic Optimization (2014, influential reference)
Dropout: a simple way to prevent neural networks from overfitting (2014, cited by this paper)
Sequence to Sequence Learning with Neural Networks (2014, cited by this paper)
Neural Machine Translation by Jointly Learning to Align and Translate (2014, cited by this paper)
Neural Turing Machines (2014, cited by this paper)
Distributed Representations of Words and Phrases and their Compositionality (2013, cited by this paper)
Generating Sequences With Recurrent Neural Networks (2013, cited by this paper)
Speech recognition with deep recurrent neural networks (2013, cited by this paper)
ImageNet classification with deep convolutional neural networks (2012, cited by this paper)
First Experiments with PowerPlay (2012, cited by this paper)
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition (2012, cited by this paper)
Self-Delimiting Neural Networks (2012, cited by this paper)
I and J (2012, cited by this paper)
Multi-column deep neural networks for image classification (2012, cited by this paper)
Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent (2011, cited by this paper)
Et al (2008, cited by this paper)
Universal artificial intelligence (2004, cited by this paper)
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies (2001, cited by this paper)
Long Short-Term Memory (1997, influential reference)
The Effects of Adding Noise During Backpropagation Training on a Generalization Performance (1996, cited by this paper)
Guessing can Outperform Many Long Time Lag Algorithms (1996, cited by this paper)
Emergence of simple-cell receptive field properties by learning a sparse code for natural images (1996, cited by this paper)
Gradient-based learning algorithms for recurrent networks and their computational complexity (1995, cited by this paper)
Modular Elliptic Curves and Fermat′s Last Theorem(抜粋) (フェルマ-予想がついに解けた!?) (1995, cited by this paper)
Paradigms and processes in reading comprehension (1982, cited by this paper)

CITED BY

SD-E2: Semantic Exploration for Reasoning Under Token Budgets (2026, cites this paper)
On the Role of Iterative Computation in Reinforcement Learning (2026, cites this paper)
DynaMoE: Dynamic Token-Level Expert Activation with Layer-Wise Adaptive Capacity for Mixture-of-Experts Neural Networks (2026, cites this paper)
Resonant Sparse Geometry Networks (2026, cites this paper)
Are Your Reasoning Models Reasoning or Guessing? A Mechanistic Analysis of Hierarchical Reasoning Models (2026, cites this paper)
Scale-Consistent State-Space Dynamics via Fractal of Stationary Transformations (2026, cites this paper)
Learning to Forget Attention: Memory Consolidation for Adaptive Compute Reduction (2026, cites this paper)
AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth (2026, cites this paper)
SCIoT: Design and Evaluation of a Split Computing Framework for Collaborative Inference in the IoT (2026, cites this paper)
ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning (2026, cites this paper)
CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute (2026, cites this paper)
Synthetic Intuition: A System-1/System-2 Architecture for Fast and Slow Thinking in Large Language Models (2026, cites this paper)
Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers (2026, cites this paper)
No More, No Less: Least-Privilege Language Models (2026, cites this paper)
LoopViT: Scaling Visual ARC with Looped Transformers (2026, cites this paper)
Generalizing GNNs with Tokenized Mixture of Experts (2026, cites this paper)
PonderLM-3: Adaptive Token-Wise Pondering with Differentiable Masking (2026, cites this paper)
Entropic-Time Inference: Self-Organizing Large Language Model Decoding Beyond Attention (2026, cites this paper)
When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching (2026, cites this paper)
Pretraining with Token-Level Adaptive Latent Chain-of-Thought (2026, cites this paper)
Speed is Confidence (2026, cites this paper)
T3C: Test-Time Tensor Compression with Consistency Guarantees (2026, cites this paper)
Elastic Spectral State Space Models for Budgeted Inference (2026, cites this paper)
PRISM: Festina Lente Proactivity -- Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents (2026, cites this paper)
Agentic Test-Time Scaling for WebAgents (2026, cites this paper)
Excitation: Momentum For Experts (2026, cites this paper)
Efficient Sparse Selective-Update RNNs for Long-Range Sequence Modeling (2026, influential citation)
Artificial Agency Program: Curiosity, compression, and communication in agents (2026, influential citation)
Understanding Dynamic Compute Allocation in Recurrent Transformers (2026, influential citation)
Learning When Not to Attend Globally (2025, cites this paper)
Early-Exit Graph Neural Networks (2025, cites this paper)
Context-Driven Dynamic Pruning for Large Speech Foundation Models (2025, cites this paper)
Universal Reasoning Model (2025, cites this paper)
Neural Networks as Universal Finite-State Machines: A Constructive Deterministic Finite Automaton Theory (2025, cites this paper)
Mobius: Mixture-Of-Experts Transformer Model in Epigenetics of ME/CFS and Long COVID (2025, cites this paper)
Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models (2025, cites this paper)
Zero-Overhead Introspection for Adaptive Test-Time Compute (2025, cites this paper)
Subjective Depth and Timescale Transformers: Learning Where and When to Compute (2025, cites this paper)
Learning Unmasking Policies for Diffusion Language Models (2025, cites this paper)
H-Model: Dynamic Neural Architectures for Adaptive Processing (2025, cites this paper)
MID-L: Matrix-Interpolated Dropout Layer with Layer-wise Neuron Selection (2025, cites this paper)
CellARC: Measuring Intelligence with Cellular Automata (2025, cites this paper)
Your Latent Reasoning is Secretly Policy Improvement Operator (2025, cites this paper)
Visual Artificial Intelligence: Unlocking Efficiency with Psychovisual Models (2025, cites this paper)
Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models (2025, cites this paper)
Market-based Architectures in RL and Beyond (2025, cites this paper)
Dynamic Neural Network Structure: A Review for its Theories and Applications (2025, cites this paper)
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence (2025, cites this paper)
Accelerating Training Speed of Tiny Recursive Models with Curriculum Guided Adaptive Recursion (2025, cites this paper)
Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! (2025, cites this paper)
MIND over Body: Adaptive Thinking using Dynamic Computation (2025, influential citation)
Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation (2025, cites this paper)
Continuous Thought Machines (2025, influential citation)
Exact Learning of Arithmetic with Differentiable Agents (2025, cites this paper)
Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production (2025, cites this paper)
Do Language Models Use Their Depth Efficiently? (2025, cites this paper)
Understanding Complexity in VideoQA via Visual Program Generation (2025, cites this paper)
Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier (2025, cites this paper)
MeSH: Memory-as-State-Highways for Recursive Transformers (2025, cites this paper)
Void in Language Models (2025, cites this paper)
AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking (2025, cites this paper)
On Network-Aware Semantic Communication and Edge-Cloud Collaborative Intelligence Systems (2025, cites this paper)
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations (2025, cites this paper)
Learning What to Remember: Adaptive Probabilistic Memory Retention for Memory-Efficient Language Models (2025, cites this paper)
Generalizable Reasoning through Compositional Energy Minimization (2025, cites this paper)
Latent Thought Models with Variational Bayes Inference-Time Computation (2025, cites this paper)
ERDE: Entropy-Regularized Distillation for Early-exit (2025, cites this paper)
Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization (2025, cites this paper)
Hierarchical Reasoning Models: Perspectives and Misconceptions (2025, cites this paper)
Entropy After <\Think> for reasoning model early exiting (2025, cites this paper)
Optimal Stopping vs Best-of-N for Inference Time Optimization (2025, cites this paper)
Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts (2025, cites this paper)
Exploring Shared-Weight Mechanisms in Transformer and Conformer Architectures for Automatic Speech Recognition (2025, cites this paper)
Neural Field Turing Machine: A Differentiable Spatial Computer (2025, cites this paper)
ADMP-GNN: Adaptive Depth Message Passing GNN (2025, cites this paper)
Analog optical computer for AI inference and combinatorial optimization (2025, cites this paper)
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling (2025, cites this paper)
STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers (2025, influential citation)
The data scientist as a mainstay of the tumor board: global implications and opportunities for the global south (2025, cites this paper)
UniPCGC: Towards Practical Point Cloud Geometry Compression via an Efficient Unified Approach (2025, cites this paper)
Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space (2025, cites this paper)
Short-Term Wind Power Prediction Based on Wind2vec-BERT Model (2025, cites this paper)
Large Language Models Are Human-Like Internally (2025, cites this paper)
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models (2025, cites this paper)
A graph convolutional neural network that dynamically adapts its architecture to input data at inference time to predict the short-term states of traffic (2025, cites this paper)
Learning Model Successors (2025, cites this paper)
Online deep learning’s role in conquering the challenges of streaming data: a survey (2025, cites this paper)
Learning to Stop Overthinking at Test Time (2025, influential citation)
Implicit Language Models are RNNs: Balancing Parallelization and Expressivity (2025, cites this paper)
DeltaProduct: Increasing the Expressivity of DeltaNet Through Products of Householders (2025, cites this paper)
Playing Hex and Counter Wargames using Reinforcement Learning and Recurrent Neural Networks (2025, cites this paper)
Zero Token-Driven Deep Thinking in LLMs: Unlocking the Full Potential of Existing Parameters via Cyclic Refinement (2025, cites this paper)
Int2Int: a framework for mathematics with transformers (2025, cites this paper)
Dynamic Mixture-of-Experts for Visual Autoregressive Model (2025, cites this paper)
ExpertSim: Fast Particle Detector Simulation Using Mixture-of-Generative-Experts (2025, cites this paper)
Deep Learning Anytime Prediction via Enforcing Runtime Monotonicity for Early-Exit Activity Recognition (2025, cites this paper)
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training (2025, cites this paper)
Temporal Zoom Networks: Distance Regression and Continuous Depth for Efficient Action Localization (2025, cites this paper)
Context-aware Dynamic Pruning for Speech Foundation Models (2025, cites this paper)
Visual Programmability: A Guide for Code-as-Thought in Chart Understanding (2025, cites this paper)