Multilingual Distributed Representations without Word Alignment

Published 2013 in International Conference on Learning Representations

ABSTRACT

Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic representations can successfully be applied to a number of monolingual applications such as sentiment analysis. At the same time, there has been some initial success in work on learning shared word-level representations across languages. We combine these two approaches by proposing a method for learning distributed representations in a multilingual setup. Our model learns to assign similar embeddings to aligned sentences and dissimilar ones to sentence which are not aligned while not requiring word alignments. We show that our representations are semantically informative and apply them to a cross-lingual document classification task where we outperform the previous state of the art. Further, by employing parallel corpora of multiple language pairs we find that our model learns representations that capture semantic relationships across languages for which no parallel data was used.

PUBLICATION RECORD

Publication year
2013
Venue
International Conference on Learning Representations
Publication date
2013-12-20
Fields of study
Linguistics, Computer Science
Identifiers
arXiv 1312.6173
External record
Open on Semantic Scholar
Source metadata
Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

No claims are published for this paper.

CONCEPTS

No concepts are published for this paper.

REFERENCES

Multiscale recurrent neural network based language model
2015cited by this paper
Learning Multilingual Word Representations using a Bag-of-Words Autoencoder
2014cited by this paper
Recurrent Convolutional Neural Networks for Discourse Compositionality
2013cited by this paper
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013cited by this paper
Exploiting Similarities among Languages for Machine Translation
2013cited by this paper
The Role of Syntax in Vector Space Models of Compositional Semantics
2013cited by this paper
Efficient Estimation of Word Representations in Vector Space
2013cited by this paper
Multilingual Deep Learning
2013cited by this paper
Inducing Crosslingual Distributed Representations of Words
2012influential reference
Semantic Compositionality through Recursive Matrix-Vector Spaces
2012influential reference
Multimodal learning with deep Boltzmann machines
2012cited by this paper
Experimental Support for a Categorical Compositional Distributional Model of Meaning
2011cited by this paper
Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
2011cited by this paper
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2011cited by this paper
Mathematical Foundations for a Compositional Distributional Model of Meaning
2010cited by this paper
Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space
2010cited by this paper
cdec: A Decoder, Alignment, and Learning Framework for Finite- State and Context-Free Translation Models
2010cited by this paper
From Frequency to Meaning: Vector Space Models of Semantics
2010cited by this paper
Recurrent neural network based language model
2010cited by this paper
Learning Bilingual Lexicons from Monolingual Corpora
2008cited by this paper
A Structured Vector Space Model for Word Meaning in Context
2008cited by this paper
A unified architecture for natural language processing: deep neural networks with multitask learning
2008cited by this paper
Vector-based Models of Semantic Composition
2008cited by this paper
Combining Symbolic and Distributional Models of Meaning
2007cited by this paper
Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora
2007cited by this paper
Distance Metric Learning for Large Margin Nearest Neighbor Classification
2005cited by this paper
Grounded spoken language acquisition: experiments in word learning
2003cited by this paper
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
2002cited by this paper
Précis of How Children Learn the Meanings of Words
2001cited by this paper
A Synopsis of Linguistic Theory, 1930-1955
1957cited by this paper

CITED BY

Joint Multimodal Contrastive Learning for Robust Spoken Term Detection and Keyword Spotting
2025cites this paper
Research on cross-lingual multi-label patent classification based on pre-trained model
2024cites this paper
Towards Invisible Backdoor Attacks in the Frequency Domain against Deep Neural Networks
2023cites this paper
Stealthy Low-frequency Backdoor Attack against Deep Neural Networks
2023cites this paper
A Survey on Attacks and Their Countermeasures in Deep Learning: Applications in Deep Neural Networks, Federated, Transfer, and Deep Reinforcement Learning
2023cites this paper
Mitigating Adversarial Attacks using Pruning
2023cites this paper
Deep Learning in Sentiment Analysis: Recent Architectures
2022cites this paper
Word Embedding Methods for Word Representation in Deep Learning for Natural Language Processing
2022cites this paper
Computational linguistics and Natural Language Processing
2022cites this paper
Self-Supervised Speech Representation Learning: A Review
2022cites this paper
A multitarget backdooring attack on deep neural networks with random location trigger
2021cites this paper
Measuring associational thinking through word embeddings
2021cites this paper
Discriminative Multimodal Learning via Conditional Priors in Generative Models
2021influential citation
Semantic Extraction for Sentence Representation via Reinforcement Learning
2021cites this paper
Towards Learning Language Agnostic Features for NLP in Low-resource Languages
2021cites this paper
Intelligent Machine Translation of Chinese English News in the Context of Communication Observations and a Priori Modeling
2021cites this paper
A Chinese-Thai Cross-language Word Embedding Method Based on Unequal Corpus of Small Dictionaries
2021cites this paper
Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part I
2020cites this paper
Multimodal One-Shot Learning of Speech and Images
2020cites this paper
Approximate Inference in Variational Autoencoders
2020cites this paper
Direct multimodal few-shot learning of speech and images
2020cites this paper
CCA-Flow: Deep Multi-view Subspace Learning with Inverse Autoregressive Flow
2020cites this paper
The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability
2020cites this paper
A Feature-Based On-Line Detector to Remove Adversarial-Backdoors by Iterative Demarcation
2020cites this paper
Cross-lingual Word Embeddings beyond Zero-shot Machine Translation
2020cites this paper
Exploiting Comparable Corpora to Enhance Bilingual Lexicon Induction from Monolingual Corpora
2020cites this paper
A Unified Framework for Detecting Audio Adversarial Examples
2020cites this paper
Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images
2020cites this paper
Detecting the Most Insightful Parts of Documents Using a Regularized Attention-Based Model
2020cites this paper
Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part III
2020cites this paper
Recurrent Neural Audiovisual Word Embeddings for Synchronized Speech and Real-Time Mri
2020cites this paper
A multi-view approach for Mandarin non-native mispronunciation verification
2020influential citation
Universal Vector Neural Machine Translation With Effective Attention
2020cites this paper
Unsupervised Feature Learning for Speech Using Correspondence and Siamese Networks
2020cites this paper
Learning Robust Representations via Multi-View Information Bottleneck
2020cites this paper
AI-Chatbot Using Deep Learning to Assist the Elderly
2019cites this paper
Everything old is new again: A multi-view learning approach to learning using privileged information and distillation
2019cites this paper
Assessing kinetic meaning of music and dance via deep cross-modal retrieval
2019cites this paper
Best Practices for Learning Domain-Specific Cross-Lingual Embeddings
2019cites this paper
Char-RNN for Word Stress Detection in East Slavic Languages
2019cites this paper
Compositional representations of language structures in multilingual joint-vector space
2019influential citation
Corpus-based approaches to Igbo diacritic restoration
2019cites this paper
BadNets: Evaluating Backdooring Attacks on Deep Neural Networks
2019cites this paper
Robust Cross-lingual Embeddings from Parallel Sentences
2019cites this paper
Cross-Lingual and Low-Resource Sentiment Analysis
2019cites this paper
Discrete Argument Representation Learning for Interactive Argument Pair Identification
2019cites this paper
An Instruction Embedding Model for Binary Code Analysis
2019cites this paper
On the Contributions of Visual and Textual Supervision in Low-resource Semantic Speech Retrieval
2019cites this paper
On the use of prior and external knowledge in neural sequence models
2019cites this paper
Image search using multilingual texts: a cross-modal learning approach between image and text Maxime Portaz Qwant Research
2019cites this paper
Exploring Crosslingual Word Embeddings for Semantic Classification in Text and Dialogue
2019influential citation
Harnessing sense-level information for semantically augmented knowledge extraction
2018cites this paper
An efficient framework for learning sentence representations
2018cites this paper
Acoustic Feature Learning Using Cross-Domain Articulatory Measurements
2018cites this paper
An Adversarial Joint Learning Model for Low-Resource Language Semantic Textual Similarity
2018cites this paper
Lowering the Cost of Improved Cross-Lingual Sentiment Analysis by
2018cites this paper
Finely Tuned, 2 Billion Token Based Word Embeddings for Portuguese
2018cites this paper
A neural generative autoencoder for bilingual word embeddings
2018cites this paper
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
2018cites this paper
Embedding Learning Through Multilingual Concept Induction
2018influential citation
Absolute Orientation for Word Embedding Alignment
2018cites this paper
Characterizing Departures from Linearity in Word Translation
2018cites this paper
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
2018cites this paper
Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps
2018cites this paper
Attention-based neural network for short-text question answering
2018cites this paper
Person-Job Fit
2018cites this paper
NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings
2018cites this paper
Cross-lingual Lexical Sememe Prediction
2018cites this paper
Multimodal One-shot Learning of Speech and Images
2018influential citation
A Stronger Baseline for Multilingual Word Embeddings
2018influential citation
Correlation Networks for Speaker Normalization in Automatic Speech Recognition
2018cites this paper
Transferred Embeddings for Igbo Similarity, Analogy, and Diacritic Restoration Tasks
2018cites this paper
Using Communities of Words Derived from Multilingual Word Vectors for Cross-Language Information Retrieval in Indian Languages
2018influential citation
Computational models for multilingual negation scope detection
2018cites this paper
A Universal Semantic Space
2018cites this paper
Time Series Forecasting with Recurrent Neural Networks in Presence of Missing Data
2018cites this paper
Statistical Machine Translation of English Text to API Code Usages :
2018cites this paper
32 15 v 3 [ cs . C L ] 1 4 D ec 2 01 4 Sequence to Sequence Learning with Neural Networks
2018cites this paper
Closed form word embedding alignment
2018cites this paper
Unsupervised Parallel Sentence Extraction from Comparable Corpora
2018cites this paper
Multilingual Embeddings Jointly Induced from Contexts and Concepts: Simple, Strong and Scalable
2018cites this paper
Learning Knowledge Representation Across Knowledge Graphs
2017cites this paper
A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
2017cites this paper
A Survey of Cross-lingual Word Embedding Models
2017influential citation
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
2017cites this paper
Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction
2017cites this paper
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
2017cites this paper
Adversarial Training for Unsupervised Bilingual Lexicon Induction
2017cites this paper
Cross-lingual sentiment transfer with limited resources
2017cites this paper
Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
2017influential citation
Common Representation Learning Using Step-Based Correlation Multi-modal CNN
2017cites this paper
The representational geometry of word meanings acquired by neural machine translation models
2017influential citation
Toponym Resolution with Deep Neural Networks
2017cites this paper
Offline bilingual word vectors, orthogonal transformations and the inverted softmax
2017cites this paper
Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context
2017cites this paper
Classics of Deep Learning Approach for Human Behaviour Ontology: A Survey
2017cites this paper
Natural language processing for resource-poor languages
2017influential citation
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text
2017cites this paper
Medical Incident Report Classification using Context-based Word Embeddings
2017cites this paper
Mining and Learning from Multilingual Text Collections using Topic Models and Word Embeddings. (Explorer et Apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds )
2017cites this paper