When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching

Published 2026 in Unknown venue

ABSTRACT

Transformers allocate uniform computation to every position, regardless of difficulty. State Space Models (SSMs) offer an efficient alternative but struggle with precise information retrieval over long horizons. Inspired by dual-process theories of cognition (Kahneman, 2011), we propose AMOR (Adaptive Metacognitive Output Router), a hybrid architecture that dynamically engages sparse attention only when an SSM backbone is "uncertain", as measured by prediction entropy. Compared to standard transformers, AMOR gains efficiency by projecting keys and values from SSM hidden states (Ghost KV), reusing the SSM's O(n) computation rather than requiring O(n^2) attention at every layer. On small-scale synthetic retrieval tasks, AMOR outperforms both SSM-only and transformer-only baselines, achieving perfect retrieval accuracy while engaging attention on only 22% of positions. We validate that prediction entropy reliably signals retrieval need, with a gap of 1.09 nats (nearly half the entropy range) between retrieval and local positions. Additionally, our approach provides interpretable adaptive computation, where routing decisions can be understood in information-theoretic terms.
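The entropy-based routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy logits, and the fixed threshold of 1.0 nats are all assumptions made for illustration, and the paper's actual gate may be learned or parameterized differently. The sketch only shows how per-position prediction entropy (in nats) could be turned into a binary decision about whether to engage attention.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_nats(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_gate(logits_per_position, threshold):
    # Hypothetical gate: engage attention where the backbone's
    # prediction entropy exceeds a fixed threshold (an assumption).
    entropies = [entropy_nats(softmax(l)) for l in logits_per_position]
    gate = [h > threshold for h in entropies]
    return entropies, gate

# Toy logits standing in for an SSM backbone's per-position predictions.
logits = [
    [8.0, 0.0, 0.0, 0.0],  # confident prediction, low entropy
    [0.0, 0.0, 0.0, 0.0],  # uniform prediction, entropy = ln(4)
    [5.0, 5.0, 0.0, 0.0],  # two-way tie, moderate entropy
]
entropies, gate = entropy_gate(logits, threshold=1.0)
attention_fraction = sum(gate) / len(gate)
```

In this toy setup only the uniform position crosses the threshold, so attention would be engaged on one of three positions. The fraction of gated positions plays the role of the 22% figure reported in the abstract, though the numbers here are purely illustrative.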

PUBLICATION RECORD

Publication year: 2026
Venue: Unknown venue
Publication date: 2026-01-22
Fields of study: Computer Science
Identifiers: arXiv 2602.13215
External record: Semantic Scholar
Source metadata: Semantic Scholar

REFERENCES

Context-Selective State Space Models: Feedback is All You Need (2025)
Hierarchical Reasoning Model (2025)
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality (2024)
Zamba: A Compact 7B SSM Hybrid Model (2024)
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (2024)
Hymba: A Hybrid-head Architecture for Small Language Models (2024)
Jamba: A Hybrid Transformer-Mamba Language Model (2024)
Retentive Network: A Successor to Transformer for Large Language Models (2023)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (2023)
RWKV: Reinventing RNNs for the Transformer Era (2023)
A survey of uncertainty in deep neural networks (2021)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021)
PonderNet: Learning to Ponder (2021)
Efficiently Modeling Long Sequences with Structured State Spaces (2021)
Linformer: Self-Attention with Linear Complexity (2020)
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference (2020)
Longformer: The Long-Document Transformer (2020)
Long Range Arena: A Benchmark for Efficient Transformers (2020)
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (2020)
Universal Transformers (2018)
Attention is All you Need (2017)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017)
The Consciousness Prior (2017)
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles (2016)
Adaptive Computation Time for Recurrent Neural Networks (2016)
Thinking fast and slow (2014)
Neural Machine Translation by Jointly Learning to Align and Translate (2014)
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation (2013)
