An Autoencoder Approach to Learning Bilingual Word Representations

A. Chandar, Stanislas Lauly, H. Larochelle, Mitesh M. Khapra, Balaraman Ravindran, V. Raykar, Amrita Saha

Published 2014 in Neural Information Processing Systems

ABSTRACT

Cross-language learning allows one to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). In experiments on 3 language pairs, we show that our approach achieves state-of-the-art performance, outperforming a method exploiting word alignments and a strong machine translation baseline.
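The core idea described in the abstract — encode a sentence's bag-of-words in one language and learn to reconstruct the bag-of-words of the aligned sentence in both languages — can be sketched as a small autoencoder. The following is a minimal illustrative sketch, not the authors' implementation: the vocabulary sizes, hidden size, learning rate, sigmoid activations, squared-error loss, and random "aligned" sentence pairs are all hypothetical choices made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (illustrative only).
V_EN, V_DE, H = 20, 25, 8      # English/German vocab sizes, hidden size
LR = 0.05                      # learning rate (assumed, not from the paper)

# One encoder and one decoder per language, through a shared hidden space.
P = {
    "enc_en": rng.normal(0, 0.1, (H, V_EN)),
    "enc_de": rng.normal(0, 0.1, (H, V_DE)),
    "dec_en": rng.normal(0, 0.1, (V_EN, H)),
    "dec_de": rng.normal(0, 0.1, (V_DE, H)),
}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(x_en, x_de):
    """Squared-error reconstruction of BOTH bags-of-words from EACH
    language's encoding: the within- and cross-language terms."""
    g = {k: np.zeros_like(v) for k, v in P.items()}
    total = 0.0
    for enc_key, x_src in (("enc_en", x_en), ("enc_de", x_de)):
        h = sigmoid(P[enc_key] @ x_src)          # shared representation
        dh = np.zeros(H)
        for dec_key, x_tgt in (("dec_en", x_en), ("dec_de", x_de)):
            r = sigmoid(P[dec_key] @ h)          # reconstructed bag-of-words
            total += np.sum((r - x_tgt) ** 2)
            e = 2.0 * (r - x_tgt) * r * (1.0 - r)  # grad wrt decoder pre-activation
            g[dec_key] += np.outer(e, h)
            dh += P[dec_key].T @ e
        g[enc_key] += np.outer(dh * h * (1.0 - h), x_src)
    return total, g

# Fake "aligned sentence pairs" as binary bag-of-words vectors.
pairs = [(rng.integers(0, 2, V_EN).astype(float),
          rng.integers(0, 2, V_DE).astype(float)) for _ in range(10)]

def epoch():
    total = 0.0
    for x_en, x_de in pairs:
        l, g = loss_and_grads(x_en, x_de)
        total += l
        for k in P:
            P[k] -= LR * g[k]                    # plain SGD update
    return total

before = epoch()
for _ in range(50):
    after = epoch()
print(before > after)   # reconstruction error shrinks over training
```

Note how no word-level alignment is used anywhere: supervision comes only from the sentence-level pairing, which is exactly the property the abstract emphasizes. After training, `sigmoid(P["enc_en"] @ x)` and `sigmoid(P["enc_de"] @ y)` for an aligned pair live in the same hidden space, which is what makes cross-language transfer of a classifier possible.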

PUBLICATION RECORD

  • Publication year: 2014
  • Venue: Neural Information Processing Systems
  • Publication date: 2014-02-06
  • Fields of study: Mathematics, Computer Science

  • Source metadata: Semantic Scholar

