Continuous Learning in a Hierarchical Multiscale Neural Network
Thomas Wolf, Julien Chaumond, Clement Delangue
Published 2018 in Annual Meeting of the Association for Computational Linguistics
ABSTRACT
We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network, while longer time-scale dependencies are encoded in the dynamics of that lower-level network by having a meta-learner update its weights in an online meta-learning fashion. We use elastic weight consolidation as a higher-level mechanism to prevent catastrophic forgetting in our continuous learning framework.
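To make the two-level mechanism concrete, here is a minimal PyTorch sketch, not the paper's implementation: a lower-level RNN cell carries the short time-scale hidden state, and a hypothetical coordinate-wise meta-learner (CoordinateWiseMeta) maps each gradient of the cell to an additive weight update applied online, chunk by chunk. All names, sizes, and the gradient-to-update mapping are illustrative assumptions; training the meta-learner itself (e.g., by backpropagating through its updates) is omitted for brevity.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, CHUNK = 50, 16, 32, 20

class CoordinateWiseMeta(nn.Module):
    """Hypothetical meta-learner: one tiny MLP applied to every scalar
    gradient of the lower-level cell (learning-to-learn style)."""
    def __init__(self, width: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, width), nn.Tanh(), nn.Linear(width, 1))

    def forward(self, grad: torch.Tensor) -> torch.Tensor:
        return self.mlp(grad.reshape(-1, 1)).reshape(grad.shape)

embed = nn.Embedding(VOCAB, EMB)   # lower-level language model pieces
cell = nn.RNNCell(EMB, HID)        # fast weights, short time scale
readout = nn.Linear(HID, VOCAB)
meta = CoordinateWiseMeta()        # slow mechanism, longer time scale

def chunk_loss(tokens: torch.Tensor) -> torch.Tensor:
    """Average next-token prediction loss over one chunk."""
    h = torch.zeros(1, HID)
    total = torch.zeros(())
    for t in range(tokens.numel() - 1):
        h = cell(embed(tokens[t].view(1)), h)
        total = total + nn.functional.cross_entropy(readout(h), tokens[t + 1].view(1))
    return total / (tokens.numel() - 1)

stream = torch.randint(0, VOCAB, (5, CHUNK))  # toy stand-in for a text stream
for chunk in stream:                          # online: a single pass, chunk by chunk
    loss = chunk_loss(chunk)
    grads = torch.autograd.grad(loss, tuple(cell.parameters()))
    with torch.no_grad():                     # meta-learner rewrites the RNN weights
        for p, g in zip(cell.parameters(), grads):
            p.add_(meta(g))
    print(f"chunk loss: {loss.item():.3f}")
```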
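Elastic weight consolidation (Kirkpatrick et al., 2017) penalizes drift of each parameter θ_i away from its earlier value θ*_i, weighted by a diagonal Fisher estimate F_i, giving L = L_task + (λ/2) Σ_i F_i (θ_i − θ*_i)². The sketch below implements that generic penalty; how the paper wires it into the hierarchical model as the higher-level control on forgetting is our assumption and is not reproduced here.

```python
import torch
import torch.nn as nn

class EWCPenalty:
    """After finishing a task, snapshot the parameters and a diagonal
    Fisher estimate; the penalty then pulls later training back toward
    the snapshot, proportionally to each parameter's importance."""

    def __init__(self, model: nn.Module, data_loader, lam: float = 100.0):
        self.lam = lam
        self.star = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        n_batches = 0
        for x, y in data_loader:  # assumes model(x) returns logits for targets y
            model.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    self.fisher[n] += p.grad.detach() ** 2  # squared grads ~ diagonal Fisher
            n_batches += 1
        for n in self.fisher:
            self.fisher[n] /= max(n_batches, 1)

    def __call__(self, model: nn.Module) -> torch.Tensor:
        terms = [(self.fisher[n] * (p - self.star[n]) ** 2).sum()
                 for n, p in model.named_parameters()]
        return 0.5 * self.lam * torch.stack(terms).sum()

# Usage on the next task:  loss = task_loss + ewc(model)
```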
PUBLICATION RECORD
- Publication year: 2018
- Venue: Annual Meeting of the Association for Computational Linguistics
- Publication date: 2018-05-01
- Fields of study: Computer Science
- Source metadata: Semantic Scholar