Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

Published 2015 in Neural Information Processing Systems

ABSTRACT

Despite the recent achievements in machine learning, we are still very far from achieving real artificial intelligence. In this paper, we discuss the limitations of standard deep learning approaches and show that some of these limitations can be overcome by learning how to grow the complexity of a model in a structured way. Specifically, we study the simplest sequence prediction problems that are beyond the scope of what is learnable with standard recurrent networks: algorithmically generated sequences that can only be learned by models with the capacity to count and to memorize sequences. We show that some basic algorithms can be learned from sequential data using a recurrent network associated with a trainable memory.
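
As a rough illustration of what an "algorithmically generated sequence" that requires counting might look like, the sketch below generates a stream of strings of the form a^n b^n and predicts the next symbol with an explicit counter. The choice of a^n b^n as the example task, and the names make_anbn, sample_stream, and predict_with_counter, are assumptions made for illustration; they are not taken from this record, and the hand-written counter stands in for whatever a trained model would have to learn.

```python
import random


def make_anbn(n):
    """Return the string a^n b^n, e.g. n=3 gives 'aaabbb'."""
    return "a" * n + "b" * n


def sample_stream(num_sequences, max_n, seed=0):
    """Concatenate randomly sized a^n b^n strings into one training stream."""
    rng = random.Random(seed)
    return "".join(make_anbn(rng.randint(1, max_n)) for _ in range(num_sequences))


def predict_with_counter(prefix):
    """Predict the next symbol of an a^n b^n stream using an explicit counter.

    Returns 'a' or 'b' when the next symbol is fully determined by the prefix,
    and None when it is not (while reading a's, n is still unknown).
    """
    count = 0        # number of a's not yet matched by a b
    seen_b = False   # whether the current string has reached its b's
    for ch in prefix:
        if ch == "a":
            if seen_b:  # an 'a' after b's starts a new string
                count, seen_b = 0, False
            count += 1
        else:
            seen_b = True
            count -= 1
    if not seen_b:
        # Before any 'a' the stream must start with 'a'; otherwise undetermined.
        return "a" if count == 0 else None
    # Inside the b's: keep emitting 'b' until the counter returns to zero.
    return "b" if count > 0 else "a"


if __name__ == "__main__":
    stream = sample_stream(num_sequences=5, max_n=4, seed=0)
    print(stream)
    print(predict_with_counter("aab"))
```

The point of the sketch is that the deterministic part of the prediction depends on an unbounded count, which a fixed-size hidden state without some form of memory cannot track for arbitrarily large n.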

PUBLICATION RECORD

Publication year
2015
Venue
Neural Information Processing Systems
Publication date
2015-03-03
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1503.01007
Source metadata
Semantic Scholar

REFERENCES

Perceptrons
2021, cited by this paper
Gated Feedback Recurrent Neural Networks
2015, cited by this paper
RECURRENT NEURAL NETWORKS
2014, cited by this paper
Memory Networks
2014, influential reference
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014, cited by this paper
Neural Turing Machines
2014, cited by this paper
Learning to Execute
2014, cited by this paper
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
Statistical Language Models Based on Neural Networks
2012, cited by this paper
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
2012, cited by this paper
High-Performance Neural Networks for Visual Object Classification
2011, cited by this paper
Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
2011, cited by this paper
Large-Scale Machine Learning with Stochastic Gradient Descent
2010, cited by this paper
Scaling learning algorithms towards AI
2007, cited by this paper
Pattern Recognition and Machine Learning
2006, cited by this paper
LSTM recurrent networks learn simple context-free and context-sensitive languages
2001, cited by this paper
Random Forests
2001, cited by this paper
Fractal encoding of context‐free grammars in connectionist networks
2000, cited by this paper
Context-free and context-sensitive dynamics in recurrent neural networks
2000, influential reference
Toward a connectionist model of recursion in human linguistic performance
1999, influential reference
A Recurrent Neural Network that Learns to Count
1999, cited by this paper
Gradient-based learning applied to document recognition
1998, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Designing a Counter: Another Case Study of Dynamics and Activation Landscapes in Recurrent Networks
1997, cited by this paper
A Recurrent Network that performs a Context-Sensitive Prediction Task
1996, influential reference
Mechanisms for Sentence Processing
1996, cited by this paper
Learning to count without a counter: A case study of dynamics and activation landscapes in recurrent networks
1995, influential reference
Gradient-based learning algorithms for recurrent networks and their computational complexity
1995, cited by this paper
Discrete recurrent neural networks for grammatical inference
1994, influential reference
Learning long-term dependencies with gradient descent is difficult
1994, influential reference
Learning and development in neural networks: the importance of starting small.
1993, cited by this paper
A Connectionist Symbol Manipulator that Discovers the Structure of Context-Free Languages
1992, influential reference
Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory
1992, influential reference
Using Prior Knowledge in a NNPDA to Learn Context-Free Languages
1992, cited by this paper
The Induction of Dynamical Recognizers
1991, influential reference
Finding Structure in Time
1990, cited by this paper
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
1989, cited by this paper
Generalization of backpropagation with application to a recurrent gas market model
1988, cited by this paper
Learning internal representations by error propagation
1986, cited by this paper
Context-free parsing in Connectionist Networks
1985, cited by this paper

CITED BY

Artificial Intelligence in Drug Discovery: Integrative Advances From Data to Therapeutic Innovation
2026, cites this paper
Parallelizable Neural Turing Machines
2026, cites this paper
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
2025, cites this paper
Cross-Domain Fake Review Detection via Orthogonal Counterfactual Representations
2025, cites this paper
Bearing Syntactic Fruit with Stack-Augmented Neural Networks
2025, cites this paper
From part to whole: AI-driven progress in fragment-based drug discovery.
2025, cites this paper
Emergent Stack Representations in Modeling Counter Languages Using Transformers
2025, cites this paper
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
2025, cites this paper
Position as Probability: Self-Supervised Transformers that Think Past Their Training for Length Extrapolation
2025, cites this paper
Exact Learning of Arithmetic with Differentiable Agents
2025, cites this paper
A Systematic Study of Compositional Syntactic Transformer Language Models
2025, cites this paper
StackTrans: From Large Language Model to Large Pushdown Automata Model
2025, influential citation
Common Benchmarks Undervalue the Generalization Power of Programmatic Policies
2025, cites this paper
Memory-Based Augmentation Network for Video Captioning
2024, cites this paper
Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer
2024, cites this paper
A Transformer with Stack Attention
2024, influential citation
Memory Mosaics
2024, cites this paper
Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length
2024, cites this paper
Perfect detection of computer-generated text faces fundamental challenges
2024, cites this paper
Neuro-mimetic Task-free Unsupervised Online Learning with Continual Self-Organizing Maps
2024, cites this paper
On the Markov Property of Neural Algorithmic Reasoning: Analyses and Methods
2024, cites this paper
Exploring Learnability in Memory-Augmented Recurrent Neural Networks: Precision, Stability, and Empirical Insights
2024, influential citation
Beyond Attention: Breaking the Limits of Transformer Context Length with Recurrent Memory
2024, cites this paper
Targeted Syntactic Evaluation on the Chomsky Hierarchy
2024, cites this paper
Learning Universal Predictors
2024, cites this paper
MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models
2024, cites this paper
Learning Program Behavioral Models from Synthesized Input-Output Pairs
2024, cites this paper
Thinking Tokens for Language Modeling
2024, cites this paper
Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models
2024, cites this paper
On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages
2024, cites this paper
DKVMN&MRI: A new deep knowledge tracing model based on DKVMN incorporating multi-relational information
2024, cites this paper
Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
2024, cites this paper
Understanding Transformer Reasoning Capabilities via Graph Algorithms
2024, cites this paper
BiRFIA: Selective Binary Rewriting for Function Interception on ARM
2023, influential citation
Deep Learning in Requirement Engineering: A Statistical Justification
2023, cites this paper
CLeAR: Continual Learning on Algorithmic Reasoning for Human-like Intelligence
2023, cites this paper
On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks
2023, cites this paper
Formal and Empirical Studies of Counting Behaviour in ReLU RNNs
2023, cites this paper
Benchmarking Neural Network Generalization for Grammar Induction
2023, cites this paper
DeepArc: Modularizing Neural Networks for the Model Maintenance
2023, cites this paper
Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
2023, cites this paper
Multi-Property De Novo Drug Design Using Multi-Objective Deep Reinforcement Learning
2023, cites this paper
On the Tensor Representation and Algebraic Homomorphism of the Neural State Turing Machine
2023, cites this paper
Augmenting Recurrent Graph Neural Networks with a Cache
2023, cites this paper
Language models in molecular discovery
2023, cites this paper
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions
2023, cites this paper
Style Locality for Controllable Generation with kNN Language Models
2023, cites this paper
Deep Learning for Natural Language Processing: A Survey
2023, cites this paper
Recursive Algorithmic Reasoning
2023, cites this paper
Neural Priority Queues for Graph Neural Networks
2023, cites this paper
Birth of a Transformer: A Memory Viewpoint
2023, cites this paper
Differentiable Tree Operations Promote Compositional Generalization
2023, cites this paper
A Framework for Inference Inspired by Human Memory Mechanisms
2023, cites this paper
ReBADD-SE: Multi-objective molecular optimisation using SELFIES fragment and off-policy self-critical sequence training
2023, cites this paper
Deep Cognitive Networks: Enhance Deep Learning by Modeling Human Cognitive Mechanism
2023, cites this paper
Neural Attention Memory
2023, cites this paper
VClinic: A Portable and Efficient Framework for Fine-Grained Value Profilers
2023, cites this paper
Memory-Based Meta-Learning on Non-Stationary Distributions
2023, cites this paper
Question and Answering System For Investment Promotion Based on NLP
2023, cites this paper
A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text
2023, cites this paper
Length Generalization in Arithmetic Transformers
2023, cites this paper
Scaling Transformer to 1M tokens and beyond with RMT
2023, cites this paper
DeepLig: A De-Novo Computational Drug Design Approach to Generate Multi-Targeted Drugs
2023, cites this paper
Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks
2023, cites this paper
Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
2023, influential citation
GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis
2022, cites this paper
De Novo Drug Design Using Self-Attention Based Variational Autoencoder
2022, cites this paper
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
2022, cites this paper
A Call for Clarity in Beam Search: How It Works and When It Stops
2022, cites this paper
Multi-channel word embeddings for sentiment analysis
2022, cites this paper
Leveraging Relaxed Equilibrium by Lazy Transition for Sequence Modeling
2022, cites this paper
An explainable multi-parameter optimization approach for de novo drug design against proteins from central nervous system
2022, influential citation
Network structure
2022, cites this paper
Fine-tuning Image Transformers using Learnable Memory
2022, cites this paper
Neural Networks and the Chomsky Hierarchy
2022, influential citation
Cache-Memory Gated Graph Neural Networks
2022, cites this paper
Scalable Adaptive Computation for Iterative Generation
2022, cites this paper
Benchmarking Learning Efficiency in Deep Reservoir Computing
2022, cites this paper
Deep reinforcement learning for inverse inorganic materials design
2022, cites this paper
Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models
2022, cites this paper
The Surprising Computational Power of Nondeterministic Stack RNNs
2022, cites this paper
Predicting the need for a reduced drug dose, at first prescription / Prédire la nécessité de réduire le dosage des médicaments, avant la première prescription Hétérogénéité de réponses aux médicaments et effets indésirables
2022, cites this paper
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
2022, cites this paper
Predicting chemical structure using reinforcement learning with a stack-augmented conditional variational autoencoder
2022, cites this paper
Recurrent Memory Transformer
2022, cites this paper
UnweaveNet: Unweaving Activity Stories
2021, cites this paper
Turing Completeness of Bounded-Precision Recurrent Neural Networks
2021, cites this paper
Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks
2021, influential citation
A molecular generative model with genetic algorithm and tree search for cancer samples
2021, cites this paper
State-Space Constraints Can Improve the Generalisation of the Differentiable Neural Computer to Input Sequences With Unseen Length
2021, cites this paper
Minimum Description Length Recurrent Neural Networks
2021, cites this paper
Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations
2021, cites this paper
∞-former: Infinite Memory Transformer
2021, cites this paper
Trends in Deep Learning for Property-driven Drug Design.
2021, cites this paper
Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
2021, influential citation
Recent omics-based computational methods for COVID-19 drug discovery and repurposing
2021, cites this paper
A large-scale benchmark for few-shot program induction and synthesis
2021, cites this paper
pix2rule: End-to-end Neuro-symbolic Rule Learning
2021, cites this paper
Personalized Federated Learning through Local Memorization
2021, cites this paper
ABC: Attention with Bounded-memory Control
2021, cites this paper