A Methodology for Bilingual Lexicon Extraction from Comparable Corpora

Published 2015 in HyTra@ACL

ABSTRACT

Dictionary extraction from parallel corpora is well established. However, for many language pairs parallel corpora are scarce, which is why the current work discusses methods for dictionary extraction from comparable corpora. The aim is to push the boundaries of current approaches, which typically exploit correlations between co-occurrence patterns across languages, in several ways: 1) eliminating the need for an initial lexicon by using a bootstrapping approach that requires only a few seed translations; 2) implementing a new approach that first aligns comparable documents across languages and then computes cross-lingual alignments between words and multiword units; 3) improving the quality of computed word translations with an interlingua approach which, by relying on several pivot languages, allows an effective multi-dimensional cross-check; 4) investigating how, by looking at foreign citations, translations can be derived even from a single monolingual text corpus.
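The generic idea behind points 1) and the "co-occurrence pattern" approaches can be illustrated at toy scale. The sketch below is not the authors' implementation. It assumes whitespace tokenization, raw co-occurrence counts in a window of one word on each side (real systems typically weight counts with an association measure), cosine similarity, an acceptance threshold of 0.5, and a one-to-one constraint per round. The function names, the toy English/German corpora and the seed lexicon are all hypothetical. Each round translates a source word's context vector through the current lexicon, picks the most similar unassigned target word, and adds confident pairs, so later rounds can use translations learned earlier.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Count how often each word co-occurs with neighbours within +/- `window` tokens."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()  # assumption: whitespace tokenization
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return vecs


def translate_vector(vec, lexicon):
    """Map a source context vector into the target language; untranslatable dimensions are dropped."""
    out = Counter()
    for word, count in vec.items():
        if word in lexicon:
            out[lexicon[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bootstrap(src_sents, tgt_sents, seed, window=1, iterations=5, threshold=0.5):
    """Grow a seed lexicon by repeatedly matching translated context vectors."""
    src_vecs = context_vectors(src_sents, window)
    tgt_vecs = context_vectors(tgt_sents, window)
    lexicon = dict(seed)
    for _ in range(iterations):
        used = set(lexicon.values())
        candidates = []
        for s_word, s_vec in src_vecs.items():
            if s_word in lexicon:
                continue
            mapped = translate_vector(s_vec, lexicon)
            best, best_sim = None, 0.0
            for t_word, t_vec in tgt_vecs.items():
                if t_word in used:
                    continue
                sim = cosine(mapped, t_vec)
                if sim > best_sim:
                    best, best_sim = t_word, sim
            if best is not None and best_sim >= threshold:
                candidates.append((best_sim, s_word, best))
        # Accept the most confident pairs first, at most one source word per target word.
        new_pairs = {}
        for _sim, s_word, t_word in sorted(candidates, reverse=True):
            if t_word not in used:
                new_pairs[s_word] = t_word
                used.add(t_word)
        if not new_pairs:
            break  # nothing new passed the threshold; stop early
        lexicon.update(new_pairs)
    return lexicon


# Hypothetical toy "comparable" corpora and seed lexicon.
EN_SENTS = ["the cat drinks milk", "a dog eats meat", "cat purrs softly"]
DE_SENTS = ["die katze trinkt milch", "ein hund frisst fleisch", "katze schnurrt leise"]
SEED = {"the": "die", "drinks": "trinkt", "milk": "milch",
        "a": "ein", "eats": "frisst", "meat": "fleisch"}

if __name__ == "__main__":
    learned = bootstrap(EN_SENTS, DE_SENTS, SEED)
    print({s: t for s, t in learned.items() if s not in SEED})
```

With `iterations=1` only "cat" and "dog" are found. "purrs" has no translatable context word until "cat" is in the lexicon, and "softly" depends in turn on "purrs", so they are picked up in the second and third rounds. This chaining is, in miniature, what lets a bootstrapping approach start from only a few seed translations.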

PUBLICATION RECORD

Publication year: 2015
Venue: HyTra@ACL
Publication date: 2015-07-01
Fields of study: Computer Science
Identifiers: DOI 10.18653/v1/W15-4108
Source metadata: Semantic Scholar

REFERENCES

The CogALex-IV Shared Task on the Lexical Access Problem (2014)
Extracting Multiword Translations from Aligned Comparable Documents (2014)
Multiply-constrained semantic search in the Remote Associates Test (2013)
Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora (2013)
Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge (2012)
Identifying Word Translations from Comparable Documents Without a Seed Lexicon (2012)
Rare Word Translation Extraction from Aligned Comparable Documents (2011)
Likey: Unsupervised Language-Independent Keyphrase Extraction (2010)
The Noisier the Better: Identifying Multilingual Word Translations Using a Single Monolingual Corpus (2010, influential reference)
A Linguistically Grounded Graph Model for Bilingual Lexicon Extraction (2010)
JeuxDeMots and PtiClic: games for vocabulary assessment and lexical acquisition (2009)
Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge (2009)
Word sense disambiguation: A survey (2009)
Automatic Dictionary Expansion Using Non-parallel Corpora (2008)
The Computation of Associative Responses to Multiword Stimuli (2008)
Learning Bilingual Lexicons from Monolingual Corpora (2008)
SemEval-2007 Task 07: Coarse-Grained English All-Words Task (2007)
Compiling French-Japanese Terminologies from the Web (2006)
Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation (2006)
A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora (2004)
Word sense discovery based on sense descriptor dissimilarity (2003)
Discovering word senses from text (2002)
Inducing Translation Lexicons via Diverse Similarity Measures and Bridge Languages (2002)
Languages (2000)
A statistical word-level translation model for comparable corpora (2000)
Ethnologue: Languages of the World (1999, influential reference)
Automatic Identification of Word Translations from Unrelated English and German Corpora (1999)
Identifying Word Translations in Non-Parallel Texts (1995)
Compiling Bilingual Lexicon Entries From a Non-Parallel English-Chinese Corpus (1995)
Retrieving Collocations from Text: Xtract (1993)
The Mathematics of Statistical Machine Translation: Parameter Estimation (1993)
A Statistical Approach to Machine Translation (1990)
Ethnologue: Languages of the World (1988)
Languages of the World (1977)
Distributional Structure (1954)

CITED BY

Exploring cross-lingual word embeddings for the inference of bilingual dictionaries (2019)