Domain Adaptation for Statistical Machine Translation

Published 2018 in arXiv.org

ABSTRACT

Statistical machine translation (SMT) systems perform poorly when they are applied to new target domains. Our goal is to explore domain adaptation approaches and techniques for improving the translation quality of domain-specific SMT systems. However, translating texts from a specific domain (e.g., medicine) poses several challenges. The first is ambiguity: words or phrases can have different meanings in different contexts. The second is language style: texts from different genres typically differ in syntax, sentence length, and structural organization. The third is the out-of-vocabulary (OOV) word problem: in-domain training data are often scarce and have low terminology coverage. In this thesis, we explore state-of-the-art domain adaptation approaches and propose effective solutions to these problems.
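The abstract does not specify which adaptation methods the thesis uses. Purely as an illustration, the sketch below shows two ideas common in the SMT domain-adaptation literature: measuring the OOV rate of in-domain text against a training vocabulary, and ranking general-domain sentences by cross-entropy difference (Moore and Lewis, 2010) to select pseudo in-domain training data. It is a minimal sketch, assuming whitespace tokenization and add-one smoothed unigram language models in place of the n-gram models normally used; the function names and toy sentences are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; whitespace tokenization is assumed.

def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens that never occur in the training data."""
    vocab = {tok for s in train_sentences for tok in s.split()}
    test_toks = [tok for s in test_sentences for tok in s.split()]
    if not test_toks:
        return 0.0
    return sum(tok not in vocab for tok in test_toks) / len(test_toks)

def train_unigram(sentences):
    """Add-one smoothed unigram LM (a stand-in for a real n-gram LM)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    denom = total + len(counts) + 1  # +1 reserves probability mass for unseen words
    return lambda tok: math.log((counts[tok] + 1) / denom)

def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a sentence under an LM."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)

def select_pseudo_in_domain(in_domain, general, k):
    """Rank general-domain sentences by H_in(s) - H_gen(s); lower looks more in-domain."""
    lm_in = train_unigram(in_domain)
    # Simplification: the general LM is trained on the whole pool rather than
    # on a random sample of comparable size to the in-domain data.
    lm_gen = train_unigram(general)
    ranked = sorted(general, key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s))
    return ranked[:k]

# Toy data (invented for illustration).
IN_DOMAIN = ["the patient received a dose of insulin", "the dose was reduced"]
GENERAL = [
    "the market fell sharply today",
    "the patient received insulin",
    "stocks rose again",
    "a dose of medicine",
]

print(oov_rate(IN_DOMAIN, ["the patient needs insulin now"]))  # 0.4
print(select_pseudo_in_domain(IN_DOMAIN, GENERAL, 2))
# ['the patient received insulin', 'a dose of medicine']
```

Sentences that share vocabulary with the medical examples rank first. In practice one would score with proper n-gram language models (for example, built with KenLM or IRSTLM) and tune the cut-off k on held-out data.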

PUBLICATION RECORD

Publication year: 2018
Venue: arXiv.org
Publication date: 2018-04-05
Fields of study: Linguistics, Computer Science
Identifiers: arXiv 1804.01760
Source metadata: Semantic Scholar

CITED BY

Improving Language Model Integration for Neural Machine Translation (2023)
Multi-Domain Adaptation in Neural Machine Translation Through Multidimensional Tagging (2021)
Joint Training for Neural Machine Translation (2019)
What Level of Quality can Neural Machine Translation Attain on Literary Text? (2018)
Parallel fragments: Measuring their impact on translation performance (2017)
Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System (2017)
Building a Neural Machine Translation System Using Only Synthetic Parallel Data (2017)
Semi-Supervised Learning for Neural Machine Translation (2016; influential citation)