A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

Published 2019 in arXiv.org

ABSTRACT

This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created with complementary techniques and from different textual sources, knowledge bases and languages. Existing word vectors are projected into a common semantic space using linear transformations and then combined by averaging. The resulting meta-embeddings keep the dimensionality of the original embeddings without losing information, and the method also deals with the out-of-vocabulary (OOV) problem. An extensive empirical evaluation on various intrinsic and extrinsic multilingual tasks demonstrates the effectiveness of our technique with respect to previous work: it obtains competitive results for Semantic Textual Similarity and state-of-the-art performance for word similarity and POS tagging (English and Spanish). The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities; that is, pre-trained source embeddings from a resource-rich language can be leveraged to improve the word representations for under-resourced languages.
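The pipeline described in the abstract (project every source embedding into one shared space with a linear map, then average) can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: all sources share the same dimensionality d; vectors are length-normalized first; the first source acts as the hub whose space the others are mapped into; the linear map is plain orthogonal Procrustes fitted on the vocabulary each source shares with the hub, used here as a simple stand-in for the mapping method the authors actually use; and a word missing from some sources is averaged over the sources that do contain it. The names `normalize`, `procrustes` and `build_meta_embeddings` are hypothetical.

```python
import numpy as np


def normalize(matrix):
    """Scale every row to unit Euclidean length (all-zero rows are left as-is)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return matrix / norms


def procrustes(source, target):
    """Orthogonal W minimizing ||source @ W - target||_F.

    Closed-form solution: if source.T @ target = U S V^T, then W = U V^T.
    Assumption: this stands in for the paper's linear mapping step.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def build_meta_embeddings(sources, hub=0):
    """Hypothetical helper: map all sources into the space of sources[hub], then average.

    sources: list of dicts {word: 1-D np.ndarray}, all with the same dimension d
    (assumed), each sharing at least some words with the hub. The result covers
    the union of all vocabularies; a word that is out of vocabulary for some
    sources is averaged over the sources that have it.
    """
    hub_vocab = sources[hub]
    mapped = []
    for i, emb in enumerate(sources):
        words = list(emb)
        matrix = normalize(np.stack([emb[w] for w in words]))
        if i != hub:
            # Fit the map on words shared with the hub (used as a seed dictionary).
            index = {w: k for k, w in enumerate(words)}
            shared = [w for w in words if w in hub_vocab]
            x = matrix[[index[w] for w in shared]]
            z = normalize(np.stack([hub_vocab[w] for w in shared]))
            # Apply the map to every word, including those the hub lacks.
            matrix = matrix @ procrustes(x, z)
        mapped.append(dict(zip(words, matrix)))

    # Averaging (rather than concatenating) keeps the output dimension at d.
    vocabulary = set().union(*mapped)
    return {
        w: np.mean([m[w] for m in mapped if w in m], axis=0)
        for w in vocabulary
    }
```

For instance, calling `build_meta_embeddings` on two hypothetical dicts of 300-dimensional vectors would return a 300-dimensional vector for every word in either vocabulary, which mirrors the abstract's claims about preserved dimensionality and OOV coverage. Mapping everything into one hub space is a simplification chosen for brevity and may differ from the paper's exact setup.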

PUBLICATION RECORD

Publication year: 2019
Venue: arXiv.org
Publication date: 2019-10-29
Fields of study: Linguistics, Computer Science
Identifiers: arXiv 2001.06381
Source metadata: Semantic Scholar

REFERENCES

Understand in 5 Minutes!? A Quick Skim of Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Japanese title: 5分で分かる!? 有名論文ナナメ読み)
2020 · influential reference
A Survey of the Usages of Deep Learning for Natural Language Processing
2020 · cited by this paper
Predicting and Integrating Expected Answer Types into a Simple Recurrent Neural Network Model for Answer Sentence Selection
2019 · cited by this paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019 · cited by this paper
Text Classification Algorithms: A Survey
2019 · cited by this paper
Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation
2019 · cited by this paper
Language Models are Unsupervised Multitask Learners
2019 · cited by this paper
A Compare-Aggregate Model with Latent Clustering for Answer Selection
2019 · cited by this paper
XLNet: Generalized Autoregressive Pretraining for Language Understanding
2019 · cited by this paper
Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
2019 · cited by this paper
Pooled Contextualized Embeddings for Named Entity Recognition
2019 · influential reference
Choosing Transfer Languages for Cross-Lingual Learning
2019 · cited by this paper
CoQA: A Conversational Question Answering Challenge
2018 · cited by this paper
Near or Far, Wide Range Zero-Shot Cross-Lingual Dependency Parsing
2018 · cited by this paper
Zero-shot Neural Transfer for Cross-lingual Entity Linking
2018 · cited by this paper
Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
2018 · cited by this paper
Deep Contextualized Word Representations
2018 · cited by this paper
Neural Cross-Lingual Named Entity Recognition with Minimal Resources
2018 · cited by this paper
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations
2018 · influential reference
Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings
2018 · influential reference
Bilingual Embeddings with Random Walks over Multilingual Wordnets
2018 · influential reference
Improving Cross-Lingual Word Embeddings by Meeting in the Middle
2018 · cited by this paper
Contextual String Embeddings for Sequence Labeling
2018 · cited by this paper
Learning Word Meta-Embeddings by Autoencoding
2018 · influential reference
Rapid Adaptation of Neural Machine Translation to New Languages
2018 · cited by this paper
Neural Factor Graph Models for Cross-lingual Morphological Tagging
2018 · cited by this paper
A Study of Word Embeddings and Meta-Embedding Generation Methods (Spanish title: Estudio de word embeddings y métodos de generación de meta embeddings)
2018 · cited by this paper
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
2018 · influential reference
Building Named Entity Recognition Taggers via Parallel Corpora
2018 · cited by this paper
A Survey of Word Embeddings Evaluation Methods
2018 · influential reference
Word Translation Without Parallel Data
2017 · influential reference
Convolutional Sequence to Sequence Learning
2017 · cited by this paper
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
2017 · cited by this paper
SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity
2017 · cited by this paper
A simple neural network module for relational reasoning
2017 · cited by this paper
Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling
2017 · cited by this paper
Advances in Pre-Training Distributed Word Representations
2017 · cited by this paper
Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
2017 · cited by this paper
Gated Self-Matching Networks for Reading Comprehension and Question Answering
2017 · cited by this paper
HCTI at SemEval-2017 Task 1: Use convolutional neural network to evaluate Semantic Textual Similarity
2017 · influential reference
Simple and Effective Dimensionality Reduction for Word Embeddings
2017 · influential reference
Attention is All you Need
2017 · cited by this paper
Learning bilingual word embeddings with (almost) no bilingual data
2017 · cited by this paper
Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
2017 · cited by this paper
Think Globally, Embed Locally - Locally Linear Meta-embedding of Words
2017 · influential reference
ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge
2017 · influential reference
SupWSD: A Flexible Toolkit for Supervised Word Sense Disambiguation
2017 · cited by this paper
A Survey of Cross-lingual Word Embedding Models
2017 · cited by this paper
Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation
2017 · cited by this paper
On the Role of Seed Lexicons in Learning Bilingual Word Embeddings
2016 · influential reference
Learning Word Meta-Embeddings
2016 · influential reference
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
2016 · cited by this paper
Bilingual Word Embeddings from Parallel and Non-parallel Corpora for Cross-Language Text Classification
2016 · cited by this paper
Single or Multiple? Combining Word Representations Independently Learned from Text and WordNet
2016 · influential reference
Word Embedding Evaluation and Combination
2016 · cited by this paper
Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations
2016 · cited by this paper
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
2016 · cited by this paper
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
2016 · influential reference
Cross-lingual Models of Word Embeddings: An Empirical Comparison
2016 · cited by this paper
Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn’t.
2016 · cited by this paper
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
2016 · influential reference
Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings
2016 · cited by this paper
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
2016 · cited by this paper
Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations
2016 · cited by this paper
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
2016 · cited by this paper
Dependency Based Embeddings for Sentence Classification Tasks
2016 · cited by this paper
Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
2016 · cited by this paper
Random Walks and Neural Network Language Models on Knowledge Bases
2015 · influential reference
Deep Multilingual Correlation for Improved Word Embeddings
2015 · cited by this paper
Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations
2015 · cited by this paper
Joint Word Representation Learning Using a Corpus and a Semantic Lexicon
2015 · influential reference
Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation
2015 · cited by this paper
Deep Neural Language Models for Machine Translation
2015 · cited by this paper
Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning
2015 · cited by this paper
End-to-end learning of semantic role labeling using recurrent neural networks
2015 · cited by this paper
The proof and measurement of association between two things.
2015 · cited by this paper
SensEmbed: Learning Sense Embeddings for Word and Relational Similarity
2015 · cited by this paper
From Paraphrase Database to Compositional Paraphrase Model and Back
2015 · cited by this paper
Not All Neural Embeddings are Born Equal
2014 · cited by this paper
WordRep: A Benchmark for Research on Learning Word Representations
2014 · cited by this paper
GloVe: Global Vectors for Word Representation
2014 · cited by this paper
An Unsupervised Model for Instance Level Subcategorization Acquisition
2014 · cited by this paper
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
2014 · cited by this paper
Improving Vector Space Word Representations Using Multilingual Correlation
2014 · cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014 · cited by this paper
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014 · cited by this paper
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
2014 · cited by this paper
Linguistic Regularities in Continuous Space Word Representations
2013 · cited by this paper
PPDB: The Paraphrase Database
2013 · cited by this paper
Efficient Estimation of Word Representations in Vector Space
2013 · cited by this paper
Cross-lingual Transfer of Semantic Role Labeling Models
2013 · cited by this paper
Multimodal Distributional Semantics
2013 · cited by this paper
Exploiting Similarities among Languages for Machine Translation
2013 · cited by this paper
Better Word Representations with Recursive Neural Networks for Morphology
2013 · cited by this paper
Large-scale learning of word relatedness with constraints
2012 · influential reference
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
2012 · cited by this paper
BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network
2012 · cited by this paper
Roget's thesaurus and semantic similarity
2012 · cited by this paper
SemEval-2012 Task 2: Measuring Degrees of Relational Similarity
2012 · cited by this paper
How we BLESSed distributional semantic evaluation
2011 · cited by this paper

CITED BY

Bridging Natural Language Processing and Psycholinguistics: computationally grounded semantic similarity datasets for Basque and Spanish
2023 · influential citation
Bridging Natural Language Processing and Psycholinguistics: computationally grounded semantic similarity and relatedness datasets for Basque and Spanish
2023 · influential citation
Advances in monolingual and crosslingual automatic disability annotation in Spanish
2023 · cites this paper
A Survey on Word Meta-Embedding Learning
2022 · cites this paper
Automatic Taxonomy Classification by Pretrained Language Model
2021 · cites this paper
Benchmarking Meta-embeddings: What Works and What Does Not
2021 · influential citation
Learning Efficient Task-Specific Meta-Embeddings with Word Prisms
2020 · cites this paper