Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation

Published 2009 in Conference of the European Chapter of the Association for Computational Linguistics

ABSTRACT

We introduce a word segmentation approach to languages where word boundaries are not orthographically marked, with application to Phrase-Based Statistical Machine Translation (PB-SMT). Instead of using manually segmented monolingual domain-specific corpora to train segmenters, we make use of bilingual corpora and statistical word alignment techniques. First of all, our approach is adapted for the specific translation task at hand by taking the corresponding source (target) language into account. Secondly, this approach does not rely on manually segmented training data so that it can be automatically adapted for different domains. We evaluate the performance of our segmentation approach on PB-SMT tasks from two domains and demonstrate that our approach scores consistently among the best results across different data conditions.

PUBLICATION RECORD

Publication year
2009
Venue
Conference of the European Chapter of the Association for Computational Linguistics
Publication date
2009-03-30
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.3115/1609067.1609128
Source metadata
Semantic Scholar

REFERENCES

Improved Statistical Machine Translation by Multiple Chinese Word Segmentation
2008, influential reference
Generalizing Word Lattice Translation
2008, influential reference
Optimizing Chinese Word Segmentation for Machine Translation Performance
2008, cited by this paper
Bootstrapping Word Alignment via Word Packing
2007, influential reference
Moses: Open Source Toolkit for Statistical Machine Translation
2007, cited by this paper
MTTK: An Alignment Toolkit for Statistical Machine Translation
2006, cited by this paper
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
2005, cited by this paper
HMM Word and Phrase Alignment for Statistical Machine Translation
2005, cited by this paper
A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005
2005, cited by this paper
Integrated Chinese Word Segmentation in Statistical Machine Translation
2005, cited by this paper
Do We Need Chinese Word Segmentation for Statistical Machine Translation?
2004, influential reference
HHMM-based Chinese Lexical Analyzer ICTCLAS
2003, influential reference
A Systematic Comparison of Various Statistical Alignment Models
2003, cited by this paper
Minimum Error Rate Training in Statistical Machine Translation
2003, cited by this paper
Statistical Phrase-Based Translation
2003, cited by this paper
Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics
2002, influential reference
SRILM - an extensible language modeling toolkit
2002, cited by this paper
Bleu: a Method for Automatic Evaluation of Machine Translation
2002, cited by this paper
Models of translation equivalence among words
2000, cited by this paper
HMM-Based Word Alignment in Statistical Translation
1996, cited by this paper
A Stochastic Finite-State Word-Segmentation Algorithm for Chinese
1996, cited by this paper
A Stochastic Finite-State Word-Segmentation Algorithm for Chinese
1994, cited by this paper
The Mathematics of Statistical Machine Translation: Parameter Estimation
1993, cited by this paper
Computer Intensive Methods for Testing Hypotheses: An Introduction
1990, cited by this paper
Computer-Intensive Methods for Testing Hypotheses: An Introduction
1989, cited by this paper

CITED BY

Evaluation of Speech Translation Subtitles Generated by ASR with Unnecessary Word Detection
2024, cites this paper
A Word Segmentation Method of Ancient Chinese Based on Word Alignment
2019, cites this paper
Chinese-Japanese Machine Translation Exploiting Chinese Characters
2018, cites this paper
Sentence‐Chain Based Seq2seq Model for Corpus Expansion
2017, cites this paper
An Improved Statistical Machine Translation Method for United Chinese-Japanese Word Segmentation
2016, cites this paper
A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation
2016, cites this paper
Word Re-Segmentation in Chinese-Vietnamese Machine Translation
2016, cites this paper
Semi-supervised Chinese Word Segmentation based on Bilingual Information
2015, cites this paper
Integrated Parallel Data Extraction from Comparable Corpora for Statistical Machine Translation
2015, cites this paper
Discriminative Word Alignment Over Multiple Word Segmentations
2014, cites this paper
Toward Better Chinese Word Segmentation for SMT via Bilingual Constraints
2014, influential citation
Character-Cluster-Based Segmentation using Monolingual and Bilingual Information for Statistical Machine Translation
2014, influential citation
Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation
2014, cites this paper
Improvement of Statistical Machine Translation using Character-Based Segmentation with Monolingual and Bilingual Information
2014, influential citation
The application of source language information in Chinese-English statistical machine translation
2013, cites this paper
Vietnamese to Chinese Machine Translation via Chinese Character as Pivot
2013, cites this paper
Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
2013, cites this paper
Chinese-Japanese Machine Translation Exploiting Chinese Characters
2013, cites this paper
Exploring Multiple Chinese Word Segmentation Results Based on Linear Model
2013, cites this paper
An Empirical Study on Word Segmentation for Chinese Machine Translation
2013, cites this paper
An Improved Patent Machine Translation System Using Adaptive Enhancement for NTCIR-10 PatentMT Task
2013, cites this paper
Exploiting Shared Chinese Characters in Chinese Word Segmentation Optimization for Chinese-Japanese Machine Translation
2012, cites this paper
50th Annual Meeting of the Association for Computational Linguistics Proceedings of the Conference Volume 2: Short Papers
2012, cites this paper
Enhancing Statistical Machine Translation with Character Alignment
2012, cites this paper
EBMT system of Kyoto University in OLYMPICS task at IWSLT 2012
2012, cites this paper
A Hybrid Statistical and Morphological Arabic Language Diacritizing System
2012, cites this paper
Word Alignment Combination over Multiple Word Segmentation
2011, cites this paper
Character-Level System Combination: An Empirical Study for English-to-Chinese Spoken Language Translation
2011, cites this paper
Word Segmentation for Dialect Translation
2011, cites this paper
Improved Language Modeling for English-Persian Statistical Machine Translation
2010, cites this paper
Performance evaluation of various training data in English-Persian Statistical Machine Translation
2010, cites this paper
Integration of Multiple Bilingually-Learned Segmentation Schemes into Statistical Machine Translation
2010, cites this paper
Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm
2010, cites this paper
Low-resource machine translation using MaTrEx
2009, cites this paper
Integrated Language Technology as part of Next Generation Localisation
2009, cites this paper
Adapting Chinese Word Segmentation for Translation by Using a Bilingual Dictionary
2009, cites this paper
An analysis of the effect of training data variation in English-Persian Statistical Machine Translation
2009, cites this paper
Constrained word alignment models for statistical machine translation
2009, cites this paper
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing (WSSANLP-2014)
year unknown, cites this paper