What they do when in doubt: a study of inductive biases in seq2seq learners

Published 2020 in International Conference on Learning Representations

ABSTRACT

Sequence-to-sequence (seq2seq) learners are widely used, but we still have only limited knowledge about what inductive biases shape the way they generalize. We address that by investigating how popular seq2seq learners generalize in tasks that have high ambiguity in the training data. We use SCAN and three new tasks to study learners' preferences for memorization, arithmetic, hierarchical, and compositional reasoning. Further, we connect to Solomonoff's theory of induction and propose to use description length as a principled and sensitive measure of inductive biases. In our experimental study, we find that LSTM-based learners can learn to perform counting, addition, and multiplication by a constant from a single training example. Furthermore, Transformer and LSTM-based learners show a bias toward the hierarchical induction over the linear one, while CNN-based learners prefer the opposite. On the SCAN dataset, we find that CNN-based, and, to a lesser degree, Transformer- and LSTM-based learners have a preference for compositional generalization over memorization. Finally, across all our experiments, description length proved to be a sensitive measure of inductive biases.
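The abstract's idea of using description length as a measure of inductive bias can be illustrated with a prequential (online) code: data is sent in blocks, and each block is encoded by a model fit only on the blocks before it. Data that agrees with the learner's bias is predicted well early and so gets a shorter code. The sketch below is only an illustration under loud assumptions: the learner is a toy add-one-smoothed categorical model rather than a seq2seq network, and the function names, block size, and toy sequences are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def code_length_bits(counts, symbols, alphabet_size):
    """Bits needed to encode `symbols` under an add-one-smoothed
    categorical model whose counts were fit on earlier data."""
    total = sum(counts.values())
    bits = 0.0
    for s in symbols:
        p = (counts[s] + 1) / (total + alphabet_size)
        bits += -math.log2(p)
    return bits


def prequential_description_length(symbols, block_size, alphabet_size):
    """Online code length of `symbols` in bits.

    The first block is sent with a uniform code; every later block is
    encoded by a model fit on all symbols that precede it.
    (Toy stand-in learner; an assumption, not the paper's models.)
    """
    first = symbols[:block_size]
    total_bits = len(first) * math.log2(alphabet_size)
    for start in range(block_size, len(symbols), block_size):
        counts = Counter(symbols[:start])  # "train" on the prefix
        block = symbols[start:start + block_size]
        total_bits += code_length_bits(counts, block, alphabet_size)
    return total_bits


# Two hypothetical datasets over the alphabet {a, b}.
constant = list("aaaaaaaa")     # matches the toy learner's frequency bias
alternating = list("abababab")  # looks like coin flips to this learner

dl_constant = prequential_description_length(constant, 2, 2)
dl_alternating = prequential_description_length(alternating, 2, 2)
print(f"constant:    {dl_constant:.3f} bits")
print(f"alternating: {dl_alternating:.3f} bits")
```

Under this toy learner the constant sequence receives the shorter code, which is the sense in which a shorter description length signals that a candidate generalization fits the learner's bias. With a neural learner, the same loop would retrain the network on each prefix and sum the cross-entropy of the next block.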

PUBLICATION RECORD

Publication year
2020
Venue
International Conference on Learning Representations
Publication date
2020-06-26
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 2006.14953
Source metadata
Semantic Scholar

REFERENCES

On Aspects of the Theory of Syntax
2021, cited by this paper
Towards a Human-like Open-Domain Chatbot
2020, cited by this paper
Language Models are Few-Shot Learners
2020, cited by this paper
A Formal Hierarchy of RNN Architectures
2020, cited by this paper
Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks
2020, cited by this paper
Permutation Equivariant Models for Compositional Generalization in Language
2020, cited by this paper
Identity Crisis: Memorization and Generalization under Extreme Overparameterization
2019, cited by this paper
Compositionality Decomposed: How do Neural Networks Generalise?
2019, cited by this paper
Human few-shot learning of compositional instructions
2019, influential reference
Learning the Dyck Language with Attention-based Seq2Seq Models
2019, cited by this paper
LSTM Networks Can Perform Dynamic Counting
2019, cited by this paper
Word-order Biases in Deep-agent Emergent Communication
2019, cited by this paper
Language Models are Unsupervised Multitask Learners
2019, cited by this paper
CNNs found to jump around more skillfully than RNNs: Compositional Generalization in Seq2seq Convolutional Networks
2019, influential reference
Joint Source-Target Self Attention with Locality Constraints
2019, cited by this paper
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
2019, cited by this paper
Learning Inductive Biases with Simple Neural Networks
2018, cited by this paper
Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation
2018, cited by this paper
The Description Length of Deep Learning models
2018, cited by this paper
On the Practical Computational Power of Finite Precision RNNs for Language Recognition
2018, cited by this paper
Hierarchical Neural Story Generation
2018, cited by this paper
Jump to better conclusions: SCAN both left and right
2018, cited by this paper
Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks
2018, cited by this paper
Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study
2017, cited by this paper
The Marginal Value of Adaptive Gradient Methods in Machine Learning
2017, cited by this paper
Convolutional Sequence to Sequence Learning
2017, cited by this paper
A Closer Look at Memorization in Deep Networks
2017, cited by this paper
Attention is All you Need
2017, cited by this paper
Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
2017, cited by this paper
Language Modeling with Gated Convolutional Networks
2016, cited by this paper
Deep Learning
2016, cited by this paper
Optimization Methods for Large-Scale Machine Learning
2016, cited by this paper
Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner
2016, cited by this paper
Sequence to Sequence Learning with Neural Networks
2014, influential reference
Neural Machine Translation by Jointly Learning to Align and Translate
2014, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
The learnability of abstract syntactic principles.
2011, cited by this paper
A tutorial introduction to the minimum description length principle
2004, influential reference
Information Theory, Inference, and Learning Algorithms
2004, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Rethinking innateness: A connectionist perspective on development.
1997, cited by this paper
Rethinking Innateness: A Connectionist Perspective on Development
1996, cited by this paper
On the computational power of neural nets
1992, cited by this paper
Handwritten Digit Recognition with a Back-Propagation Network
1989, cited by this paper
Modeling By Shortest Data Description*
1978, cited by this paper
A Formal Theory of Inductive Inference. Part II
1964, influential reference

CITED BY

(R)NNs too expressive?
2026, cites this paper
Success and failure of compositional generalisation in distributional models of language
2025, cites this paper
Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
2025, cites this paper
A Survey of Inductive Reasoning for Large Language Models
2025, cites this paper
Can Input Attributions Interpret the Inductive Reasoning Process Elicited in In-Context Learning?
2024, cites this paper
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
2024, cites this paper
No Such Thing as a General Learner: Language models and their dual optimization
2024, cites this paper
Self-attention Networks Localize When QK-eigenspectrum Concentrates
2024, cites this paper
Emergent Word Order Universals from Cognitively-Motivated Language Models
2024, cites this paper
Can Input Attributions Explain Inductive Reasoning in In-Context Learning?
2024, cites this paper
On the Empirical Complexity of Reasoning and Planning in LLMs
2024, cites this paper
Why Linguistics Will Thrive in the 21st Century: A Reply to Piantadosi (2023)
2023, cites this paper
Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation
2023, cites this paper
Injecting structural hints: Using language models to study inductive biases in language learning
2023, cites this paper
Empirical Analysis of the Inductive Bias of Recurrent Neural Networks by Discrete Fourier Transform of Output Sequences
2023, cites this paper
Language acquisition: do children and language models follow similar learning stages?
2023, cites this paper
Large sequence models for sequential decision-making: a survey
2023, cites this paper
Tree-shape Uncertainty for Analyzing the Inherent Branching Bias of Unsupervised Parsing Models
2023, cites this paper
Inductive Bias Is in the Eye of the Beholder
2023, cites this paper
Exploiting Representation Bias for Data Distillation in Abstractive Text Summarization
2023, cites this paper
Discrete and continuous representations and processing in deep learning: Looking forward
2022, cites this paper
What do Large Language Models Learn beyond Language?
2022, cites this paper
Exploring Length Generalization in Large Language Models
2022, cites this paper
How BPE Affects Memorization in Transformers
2021, cites this paper
Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
2021, cites this paper
On the proper role of linguistically-oriented deep net analysis in linguistic theorizing
2021, cites this paper
The King is Naked: on the Notion of Robustness for Natural Language Processing
2021, cites this paper
How Do Neural Sequence Models Generalize? Local and Global Cues for Out-of-Distribution Prediction
2021, cites this paper
How Can Self-Attention Networks Recognize Dyck-n Languages?
2020, cites this paper