Latent Thought Models with Variational Bayes Inference-Time Computation

Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu

Published in 2025 at the International Conference on Machine Learning (ICML)

ABSTRACT

We propose a novel class of language models, Latent Thought Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors (inference-time computation), and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional Large Language Models (LLMs), such as the number of iterations in inference-time computation and the number of latent thought vectors. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed around these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling tasks. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model size, and achieve competitive performance in conditional and unconditional text generation.
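To make the dual-rate optimization concrete, here is a minimal PyTorch sketch of the training loop as we read it from the abstract: a fast inner loop fits the local variational parameters (posterior mean and log-variance of the latent thought vectors) for each sequence, and a slow outer step updates the global decoder parameters. Everything below is an illustrative assumption rather than the authors' implementation: the toy cross-attention decoder (ToyDecoder), the standard-normal prior, and all hyperparameters (n_latents, the learning rates, the 16 inner steps) are placeholders chosen only to show the structure of the algorithm.

# Minimal sketch of the dual-rate variational Bayes loop (all names and
# hyperparameters are illustrative assumptions, not the paper's settings).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d, n_latents, seq_len = 100, 32, 4, 16  # toy sizes (assumed)

class ToyDecoder(nn.Module):
    """Toy autoregressive decoder: token states cross-attend to latent thoughts."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, tokens, z):
        # tokens: (B, T) token ids; z: (B, n_latents, d) latent thought vectors.
        h = self.embed(tokens)
        h, _ = self.attn(h, z, z)  # latent thoughts guide token generation
        return self.out(h)         # (B, T, vocab) next-token logits

decoder = ToyDecoder()
slow_opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)  # slow: global params

def elbo(logits, targets, mu, log_var):
    # Reconstruction term: autoregressive log-likelihood of the observed tokens.
    rec = -F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1),
                           reduction="sum")
    # KL of the diagonal-Gaussian posterior against a standard-normal prior (assumed).
    kl = 0.5 * torch.sum(mu**2 + log_var.exp() - 1.0 - log_var)
    return rec - kl

for step in range(100):
    x = torch.randint(0, vocab, (8, seq_len))  # random stand-in for real text
    inputs, targets = x[:, :-1], x[:, 1:]

    # Fast loop: local variational parameters, one set per sequence. Running
    # more inner iterations is the "inference-time computation" scaling axis.
    mu = torch.zeros(8, n_latents, d, requires_grad=True)
    log_var = torch.zeros(8, n_latents, d, requires_grad=True)
    fast_opt = torch.optim.Adam([mu, log_var], lr=1e-1)  # fast learning rate
    for _ in range(16):
        z = mu + (0.5 * log_var).exp() * torch.randn_like(mu)  # reparameterize
        loss = -elbo(decoder(inputs, z), targets, mu, log_var)
        fast_opt.zero_grad()
        loss.backward()
        fast_opt.step()

    # Slow step: one gradient update of the shared decoder given the fitted
    # per-sequence posteriors.
    z = mu.detach() + (0.5 * log_var.detach()).exp() * torch.randn_like(mu)
    loss = -elbo(decoder(inputs, z), targets, mu.detach(), log_var.detach())
    slow_opt.zero_grad()
    loss.backward()
    slow_opt.step()

Under this reading, the two extra scaling dimensions named in the abstract map onto n_latents (the number of latent thought vectors) and the inner-loop iteration count, both of which can be increased without adding decoder parameters.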

PUBLICATION RECORD

  • Publication year: 2025
  • Venue: International Conference on Machine Learning
  • Publication date: 2025-02-03
  • Fields of study: Mathematics, Computer Science
  • Source metadata: Semantic Scholar
