Improving the Neural GPU Architecture for Algorithm Learning
Kārlis Freivalds, Renars Liepins
Published 2017 in arXiv.org
ABSTRACT
Algorithm learning is a core problem in artificial intelligence with significant implications for the level of automation that machines can achieve. Deep learning methods have recently emerged for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, which is capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a new technique, hard nonlinearities with saturation costs, that has general applicability. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end.
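The "hard nonlinearities with saturation costs" idea can be sketched briefly: smooth gate nonlinearities (sigmoid, tanh) are replaced by clipped, piecewise-linear versions, and a penalty is added to the training loss for pre-activations that fall into the clipped region, keeping units in the range where gradients still flow. The snippet below is a minimal NumPy illustration of that general idea only; the function names, slopes, the margin, and the weighting constant `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hard_tanh(x):
    """Hard (piecewise-linear) tanh: clips pre-activations to [-1, 1]."""
    return np.clip(x, -1.0, 1.0)

def hard_sigmoid(x):
    """Hard sigmoid: a clipped linear approximation of the logistic function."""
    return np.clip(0.5 * x + 0.5, 0.0, 1.0)

def saturation_cost(x, margin=1.0):
    """Mean penalty on pre-activations beyond the clipping margin.

    Added to the task loss with a small weight so gradients keep pushing
    units back into the linear range of the hard nonlinearity.
    """
    return np.mean(np.maximum(np.abs(x) - margin, 0.0))

# Toy usage: pre-activations of one gate in a recurrent / active-memory cell.
pre_activations = np.array([-3.0, -0.4, 0.2, 1.7, 0.9])
gate = hard_sigmoid(pre_activations)
penalty = saturation_cost(pre_activations)

# total_loss = task_loss + alpha * penalty   (alpha: small illustrative constant)
print(gate, penalty)
```

The diagonal-gates technique mentioned in the abstract concerns how gate connections are arranged in the Neural GPU's 2D state grid; its exact formulation is given in the paper and is not sketched here.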
PUBLICATION RECORD
- Publication year
2017
- Venue
arXiv.org
- Publication date
2017-02-28
- Fields of study
Computer Science, Engineering
- Source metadata
Semantic Scholar