A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Quoc V. Le, N. Jaitly, Geoffrey E. Hinton

Published 2015 in arXiv.org

ABSTRACT

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix, or a scaled version of it, to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
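The initialization the abstract describes can be sketched in a few lines. Below is a minimal NumPy illustration, not the authors' implementation: the recurrent weight matrix starts as the (optionally scaled) identity, biases start at zero, and the hidden update uses rectified linear units. The small-Gaussian scale for the input weights and all variable names are assumptions for illustration.

```python
import numpy as np

def init_irnn(input_size, hidden_size, scale=1.0, seed=0):
    """Initialize an identity-RNN: recurrent weights = scaled identity, zero bias.

    The input-to-hidden weights use a small Gaussian here (an assumed choice
    for this sketch, not a value taken from the paper).
    """
    rng = np.random.default_rng(seed)
    W_xh = rng.normal(0.0, 0.001, (hidden_size, input_size))
    W_hh = scale * np.eye(hidden_size)  # identity recurrent matrix
    b_h = np.zeros(hidden_size)
    return W_xh, W_hh, b_h

def irnn_step(h, x, W_xh, W_hh, b_h):
    """One recurrent step with ReLU units: h_t = max(0, W_hh h + W_xh x + b)."""
    return np.maximum(0.0, W_hh @ h + W_xh @ x + b_h)

# At initialization, with zero input, a nonnegative hidden state is copied
# through time unchanged, so the recurrence neither shrinks nor amplifies it.
W_xh, W_hh, b_h = init_irnn(input_size=4, hidden_size=8)
h = np.abs(np.random.default_rng(1).normal(size=8))  # nonnegative state
h_next = irnn_step(h, np.zeros(4), W_xh, W_hh, b_h)
```

Because `W_hh` is the identity and ReLU is the identity on nonnegative values, the Jacobian of the recurrence starts out close to the identity, which is what keeps gradients from vanishing or exploding early in training.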

PUBLICATION RECORD

  • Publication year

    2015

  • Venue

    arXiv.org

  • Publication date

    2015-04-03

  • Fields of study

    Computer Science


  • Source metadata

    Semantic Scholar



REFERENCES

  • 39 references

CITED BY

  • 752 citing papers