Learning Multilingual Word Representations using a Bag-of-Words Autoencoder

Stanislas Lauly,A. Boulanger,H. Larochelle

Published 2014 in arXiv.org

ABSTRACT

Recent work on learning multilingual word representations usually relies on the use of word-level alignements (e.g. infered with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignements. The autoencoder is trained to reconstruct the bag-of-word representation of given sentence from an encoded representation extracted from its translation. We evaluate our approach on a multilingual document classification task, where labeled data is available only for one language (e.g. English) while classification must be performed in a different language (e.g. French). In our experiments, we observe that our method compares favorably with a previously proposed method that exploits word-level alignments to learn word representations.

PUBLICATION RECORD

Publication year
2014
Venue
arXiv.org
Publication date
2014-01-08
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1401.1803
External record
Open on Semantic Scholar
Source metadata
Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

No claims are published for this paper.

CONCEPTS

No concepts are published for this paper.

REFERENCES

Learning Semantic Representations for the Phrase Translation Model
2013cited by this paper
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013influential reference
Exploiting Similarities among Languages for Machine Translation
2013cited by this paper
A Neural Autoregressive Topic Model
2012cited by this paper
Inducing Crosslingual Distributed Representations of Words
2012influential reference
Large-Scale Learning of Embeddings with Reconstruction Sampling
2011cited by this paper
Natural Language Processing (Almost) from Scratch
2011cited by this paper
Word Representations: A Simple and General Method for Semi-Supervised Learning
2010cited by this paper
Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization
2009influential reference
Visualizing Data using t-SNE
2008cited by this paper
A Scalable Hierarchical Distributed Language Model
2008cited by this paper
Hierarchical Probabilistic Neural Network Language Model
2005cited by this paper

CITED BY

A novel multi-layer feature fusion-based BERT-CNN for sentence representation learning and classification
2023cites this paper
Learning Sense Embeddings from Definitions in Dictionaries
2022cites this paper
Learning Sense Embeddings from Dictionary Definition
2022cites this paper
Dependency Based Bilingual word Embeddings without word alignment
2020cites this paper
Hierarchical Mapping for Crosslingual Word Embedding Alignment
2020cites this paper
Research on the application of Naive Bayes and Support Vector Machine algorithm on exercises Classification
2020cites this paper
Sharing is Caring: Exploring machine learning-enabled methods for regional medical imaging exchange using procedure metadata.
2020cites this paper
Cross-lingual embedding for cross-lingual question retrieval in low-resource community question answering
2020cites this paper
Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection
2020cites this paper
En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects
2019cites this paper
Unsupervised Evaluation of Human Translation Quality
2019cites this paper
Morphological segmentation for extracting Spanish-Nahuatl bilingual lexicon
2019cites this paper
Compositional representations of language structures in multilingual joint-vector space
2019cites this paper
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
2018cites this paper
Cross-lingual document similarity estimation and dictionary generation with comparable corpora
2018cites this paper
A Multi-task Approach to Learning Multilingual Representations
2018cites this paper
Person-Job Fit
2018cites this paper
Computational models for multilingual negation scope detection
2018cites this paper
Mining and Learning from Multilingual Text Collections using Topic Models and Word Embeddings. (Explorer et Apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds )
2017cites this paper
Low-resource bilingual lexicon extraction using graph based word embeddings
2017cites this paper
Cross-language plagiarism detection over continuous-space representations of language
2017influential citation
Continuous space models for CLIR
2017cites this paper
Cross-view Embeddings for Information Retrieval
2017cites this paper
A survey of cross-lingual embedding models
2017cites this paper
A Survey of Cross-lingual Word Embedding Models
2017cites this paper
Feature extraction using Latent Dirichlet Allocation and Neural Networks: A case study on movie synopses
2016cites this paper
Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences
2016cites this paper
Lightweight Random Indexing for Polylingual Text Classification
2016cites this paper
Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language
2016cites this paper
Bilingual lexicon extraction for a distant language pair using a small parallel corpus
2015cites this paper
Semi-automatic Filtering of Translation Errors in Triangle Corpus
2015cites this paper
Judgment Language Matters: Towards Judgment Language Informed Vector Space Modeling
2015cites this paper
Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling
2015cites this paper
Learning to Represent Words in Context with Multilingual Supervision
2015cites this paper
Judgment Language Matters: Multilingual Vector Space Models for Judgment Language Aware Lexical Semantics
2015cites this paper
Distributed representations for compositional semantics
2014influential citation
Learning Bilingual Word Representations by Marginalizing Alignments
2014cites this paper
A Deep Architecture for Semantic Parsing
2014cites this paper
Multilingual Models for Compositional Distributed Semantics
2014cites this paper
Multilingual Distributed Representations without Word Alignment
2013cites this paper