An Autoencoder Approach to Learning Bilingual Word Representations

A. Chandar,Stanislas Lauly,H. Larochelle,Mitesh M. Khapra,Balaraman Ravindran,V. Raykar,Amrita Saha

Published 2014 in Neural Information Processing Systems

ABSTRACT

Cross-language learning allows one to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). In experiments on 3 language pairs, we show that our approach achieves state-of-the-art performance, outperforming a method exploiting word alignments and a strong machine translation baseline.
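
The reconstruction objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the tied encoder/decoder weights, the tanh and sigmoid nonlinearities, and the toy vocabulary sizes are all assumptions made for illustration. It only shows how one aligned sentence pair, given as two binary bag-of-words vectors, yields four reconstruction terms (within and between languages) without any word-level alignment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(target, pred, eps=1e-9):
    # Sum of per-word reconstruction errors for one bag-of-words vector.
    return float(-np.sum(target * np.log(pred + eps)
                         + (1.0 - target) * np.log(1.0 - pred + eps)))

class BilingualBowAutoencoder:
    """Hypothetical toy model: one shared hidden space for two vocabularies."""

    def __init__(self, vocab_x, vocab_y, hidden):
        # Each row is the vector representation of one word (assumed tied
        # between encoder and decoder to keep the sketch small).
        self.W_x = rng.normal(0.0, 0.1, (vocab_x, hidden))
        self.W_y = rng.normal(0.0, 0.1, (vocab_y, hidden))
        self.b = np.zeros(hidden)
        self.c_x = np.zeros(vocab_x)
        self.c_y = np.zeros(vocab_y)

    def encode(self, bow, lang):
        W = self.W_x if lang == "x" else self.W_y
        # Summing the vectors of the words present in the sentence.
        return np.tanh(bow @ W + self.b)

    def decode(self, h, lang):
        W, c = (self.W_x, self.c_x) if lang == "x" else (self.W_y, self.c_y)
        return sigmoid(h @ W.T + c)

    def loss(self, bow_x, bow_y):
        # Encode each side, then reconstruct both sides from it:
        # x->x, x->y, y->x, y->y.
        total = 0.0
        for src, src_lang in ((bow_x, "x"), (bow_y, "y")):
            h = self.encode(src, src_lang)
            total += binary_cross_entropy(bow_x, self.decode(h, "x"))
            total += binary_cross_entropy(bow_y, self.decode(h, "y"))
        return total

# Toy aligned sentence pair: 6-word "English" and 5-word "German" vocabularies.
model = BilingualBowAutoencoder(vocab_x=6, vocab_y=5, hidden=3)
bow_en = np.array([1, 0, 1, 0, 0, 1], dtype=float)
bow_de = np.array([0, 1, 1, 0, 1], dtype=float)
loss = model.loss(bow_en, bow_de)
```

Minimizing such a loss over many aligned sentence pairs would push the rows of `W_x` and `W_y` toward a shared space, which is the property the abstract relies on for cross-language classification.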

PUBLICATION RECORD

Publication year
2014
Venue
Neural Information Processing Systems
Publication date
2014-02-06
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1402.1454
Source metadata
Semantic Scholar

REFERENCES

Synthesis Lectures on Human Language Technologies
2016cited by this paper
Improving Vector Space Word Representations Using Multilingual Correlation
2014cited by this paper
Multilingual Models for Compositional Distributed Semantics
2014cited by this paper
Learning Continuous Phrase Representations for Translation Modeling
2014cited by this paper
Learning Semantic Representations for the Phrase Translation Model
2013cited by this paper
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013influential reference
Synthesis Lectures on Human Language Technologies
2013cited by this paper
Exploiting Similarities among Languages for Machine Translation
2013cited by this paper
Parsing with Compositional Vector Grammars
2013cited by this paper
Multilingual Distributed Representations without Word Alignment
2013cited by this paper
Inducing Crosslingual Distributed Representations of Words
2012influential reference
A Neural Autoregressive Topic Model
2012cited by this paper
Sentiment Analysis and Opinion Mining
2012cited by this paper
Natural Language Processing (Almost) from Scratch
2011influential reference
Learning Discriminative Projections for Text Similarity Measures
2011cited by this paper
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
2011cited by this paper
Large-Scale Learning of Embeddings with Reconstruction Sampling
2011cited by this paper
Translingual Document Representations from Discriminative Projections
2010cited by this paper
Word Representations: A Simple and General Method for Semi-Supervised Learning
2010cited by this paper
Cross-lingual Annotation Projection for Semantic Roles
2009cited by this paper
Book Review: Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
2009cited by this paper
Co-Training for Cross-Lingual Sentiment Classification
2009cited by this paper
Visualizing Data using t-SNE
2008cited by this paper
A Scalable Hierarchical Distributed Language Model
2008cited by this paper
Learning Multilingual Subjective Language via Cross-Lingual Projections
2007cited by this paper
Hierarchical Probabilistic Neural Network Language Model
2005cited by this paper
Europarl: A Parallel Corpus for Statistical Machine Translation
2005cited by this paper
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
2003cited by this paper
Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora
2001cited by this paper
Automatic Cross-Language Retrieval Using Latent Semantic Indexing
1997cited by this paper

CITED BY

Cross-lingual embedding methods and applications: A systematic review for low-resourced scenarios
2025cites this paper
Deep Autoencoder Neural Networks: A Comprehensive Review and New Perspectives
2025cites this paper
Debiasing Multilingual LLMs in Cross-lingual Latent Space
2025cites this paper
Need of UAVs and Physical Layer Security in Next-Generation Non-Terrestrial Wireless Networks: Potential Challenges and Open Issues
2025cites this paper
A study on hybrid-architecture deep learning model for predicting pressure distribution in 2D airfoils
2025cites this paper
Neural Methods for Data-to-text Generation
2024cites this paper
Research on cross-lingual multi-label patent classification based on pre-trained model
2024cites this paper
Scenario-Adaptive Key Establishment Scheme for LoRa-Enabled IoV Communications
2024cites this paper
Rank Reduction Autoencoders
2024cites this paper
Sentiments analysis for intelligent customer service dialogue using hybrid word embedding and stacking ensemble
2024cites this paper
Unsupervised semantic analysis and zero-shot learning of newsgroup topics
2024cites this paper
Cher at KSAA-CAD 2024: Compressing Words and Definitions into the Same Space for Arabic Reverse Dictionary
2024cites this paper
Evaluating Unsupervised Dimensionality Reduction Methods for Pretrained Sentence Embeddings
2024cites this paper
Research on knowledge extraction in knowledge graph construction
2023cites this paper
Sarcasm Detection followed by Sentiment Analysis for Bengali Language: Neural Network & Supervised Approach
2023cites this paper
Sparse Generative Embeddings of Handwritten Digits
2023cites this paper
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
2023cites this paper
Answer Prediction for Questions from Tamil and Hindi Passages
2023cites this paper
Super-resolution of large field of view infrared image based on residual convolutional auto-encoders
2023cites this paper
Unsupervised Parallel Sentences of Machine Translation for Asian Language Pairs
2022cites this paper
Diversified feature representation via deep auto-encoder ensemble through multiple activation functions
2022cites this paper
Joint embedding of biological networks for cross-species functional alignment
2022influential citation
Vehicle-Key: A Secret Key Establishment Scheme for LoRa-enabled IoV Communications
2022cites this paper
Stock market network based on bi-dimensional histogram and autoencoder
2022cites this paper
Word Embedding Methods for Word Representation in Deep Learning for Natural Language Processing
2022cites this paper
Constructing a health indicator for bearing degradation assessment via an unsupervised and enhanced stacked autoencoder
2022cites this paper
MULTILINGUAL DOCUMENT EMBEDDING WITH SEQUENTIAL NEURAL NETWORK MODELS
2022influential citation
Innovations in Neural Data-to-text Generation
2022cites this paper
Impact of Sentence Representation Matching in Neural Machine Translation
2022cites this paper
A Large Scale Document-Term Matching Method Based on Information Retrieval
2022cites this paper
Deep Learning in Sentiment Analysis: Recent Architectures
2022cites this paper
Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision
2022cites this paper
A Survey of Cross-Lingual Sentiment Analysis Based on Pre-Trained Models
2022cites this paper
Graph Learning Based Autoencoder for Hyperspectral Band Selection
2022cites this paper
A review on matrix completion for recommender systems
2022cites this paper
Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages
2022cites this paper
A Game for Crowdsourcing Adversarial Examples for False Information Detection
2022cites this paper
Latent representation learning based autoencoder for unsupervised feature selection in hyperspectral imagery
2021cites this paper
Clustering Monolingual Vocabularies to Improve Cross-Lingual Generalization
2021cites this paper
Generation of Cross-Lingual Word Vectors for Low-Resourced Languages Using Deep Learning and Topological Metrics in a Data-Efficient Way
2021cites this paper
Can Jellyfish Dream? Conceptual Representations in Unsupervised Generative Learning
2021cites this paper
Biologically Feasible Generative Neural Network Architecture with Effective Concept Learning Capacity
2021cites this paper
A comparative study of neural machine translation models for Turkish language
2021cites this paper
Leveraging long short-term memory (LSTM)-based neural networks for modeling structure–property relationships of metamaterials from electromagnetic responses
2021cites this paper
Generating Topic-Preserving Synthetic News
2021cites this paper
English–Welsh Cross-Lingual Embeddings
2021cites this paper
Personalized tag recommendation via denoising auto-encoder
2021cites this paper
ProsoBeast Prosody Annotation Tool
2021cites this paper
Measuring associational thinking through word embeddings
2021cites this paper
Jointly learning bilingual word embeddings and alignments
2021influential citation
A Joint Training Framework for Open-World Knowledge Graph Embeddings
2021cites this paper
Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis
2021cites this paper
Bilingual Textual Similarity in Scientific Documents
2021influential citation
IELTS translation education corpus construction based on bilingual non-parallel data model
2021cites this paper
Comparing MultiLingual and Multiple MonoLingual Models for Intent Classification and Slot Filling
2021cites this paper
Reading the city through its neighbourhoods: Deep text embeddings of Yelp reviews as a basis for determining similarity and change
2020cites this paper
Variational Inference for Text Generation: Improving the Posterior
2020cites this paper
Auto-Key
2020cites this paper
Cross-lingual sentiment classification in low-resource Bengali language
2020cites this paper
‘MetaNETs’ - Accelerated discovery and design of photonic metamaterials using deep learning
2020cites this paper
CorrNet: Fine-Grained Emotion Recognition for Video Watching Using Wearable Physiological Sensors
2020cites this paper
Embedding for Anomaly Detection on Health Insurance Claims
2020cites this paper
Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection
2020cites this paper
Cross-lingual embedding for cross-lingual question retrieval in low-resource community question answering
2020cites this paper
Topic-Preserving Synthetic News Generation: An Adversarial Deep Reinforcement Learning Approach
2020cites this paper
Transformer based Multilingual document Embedding model
2020cites this paper
BUCC2020: Bilingual Dictionary Induction using Cross-lingual Embedding
2020cites this paper
Attend, Translate and Summarize: An Efficient Method for Neural Cross-Lingual Summarization
2020cites this paper
Deep Cooperative Reconstruction with Security Constraints in multi-view environments
2020cites this paper
Time-Aware User Embeddings as a Service
2020cites this paper
Cross-lingual text similarity exploiting neural machine translation models
2020cites this paper
LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items
2020cites this paper
Extracting degradation trends for roller bearings by using a moving-average stacked auto-encoder and a novel exponential function
2020cites this paper
Research on Traffic Acoustic Event Detection Algorithm Based on Sparse Autoencoder
2020cites this paper
Autoencoders for strategic decision support
2020cites this paper
Multi-objective variational autoencoder: an application for smart infrastructure maintenance
2020cites this paper
Coarse Alignment of Topic and Sentiment: A Unified Model for Cross-Lingual Sentiment Classification
2020cites this paper
Wasserstein GAN based on Autoencoder with back-translation for cross-lingual embedding mappings
2020cites this paper
Latent Modelling of Urban Data: Enriching Computational Analysis in Urban Studies by Applying Novel Methods
2020cites this paper
Artificial Intelligence and Language
2020cites this paper
Diagnosis and Analysis of Celiac Disease and Environmental Enteropathy on Biopsy Images using Deep Learning Approaches
2020cites this paper
Canonical Correlation Analysis With L2,1-Norm for Multiview Data Representation
2020cites this paper
CrowDEA: Multi-view Idea Prioritization with Crowds
2020cites this paper
A Comprehensive Survey of Multilingual Neural Machine Translation
2020cites this paper
Exploiting Comparable Corpora to Enhance Bilingual Lexicon Induction from Monolingual Corpora
2020cites this paper
Constructing a health indicator for roller bearings by using a stacked auto-encoder with an exponential function to eliminate concussion
2020cites this paper
Explorations in Word Embeddings : graph-based word embedding learning and cross-lingual contextual word embedding learning. (Explorations de plongements lexicaux : apprentissage de plongements à base de graphes et apprentissage de plongements contextuels multilingues)
2019cites this paper
MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation
2019cites this paper
A Deep Neural Network Framework for English Hindi Question Answering
2019cites this paper
ADEPOS: A Novel Approximate Computing Framework for Anomaly Detection Systems and its Implementation in 65-nm CMOS
2019cites this paper
Expert2Vec: Distributed Expert Representation Learning in Question Answering Community
2019cites this paper
Enhancing Phrase-Based Statistical Machine Translation by Learning Phrase Representations Using Long Short-Term Memory Network
2019cites this paper
A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings
2019cites this paper
Hierarchical Prototype Learning for Zero-Shot Recognition
2019cites this paper
Multiview Deep Learning
2019cites this paper
ProLFA: Representative Prototype Selection for Local Feature Aggregation
2019cites this paper
CrossLang: the system of cross-lingual plagiarism detection
2019cites this paper
Multi-Objective Autoencoder for Fault Detection and Diagnosis in Higher-Order Data
2019cites this paper
Fast and Accurate Bilingual Lexicon Induction via Matching Optimization
2019cites this paper