A Systematic Comparison of Various Statistical Alignment Models

Published 2003 in Computational Linguistics

ABSTRACT

We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to perform a symmetrization of directed statistical alignment models. As evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.
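The heuristic baseline and the symmetrization step mentioned in the abstract can be illustrated with a small sketch. It assumes sentence-level co-occurrence counts, the standard Dice coefficient 2·C(e,f) / (C(e) + C(f)), and a simple best-link-per-word rule. The function names (`dice_scores`, `directed_links`) and the toy corpus are hypothetical and not taken from the paper. Intersection and union are shown only as two simple ways of combining directed alignments, and the paper's own heuristics and combination methods may differ in detail.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient for every co-occurring (source, target) word pair.

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    A word is counted once per sentence in which it occurs.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        src_types, tgt_types = set(src), set(tgt)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def directed_links(src, tgt, scores, anchor="target"):
    """Link every word on the anchor side to its best-scoring partner.

    Returns a set of (source_index, target_index) pairs.
    Assumes both sentences are non-empty.
    """
    links = set()
    if anchor == "target":
        for j, t in enumerate(tgt):
            i = max(range(len(src)), key=lambda i: scores.get((src[i], t), 0.0))
            links.add((i, j))
    else:
        for i, s in enumerate(src):
            j = max(range(len(tgt)), key=lambda j: scores.get((s, tgt[j]), 0.0))
            links.add((i, j))
    return links


# Toy German-English corpus (illustrative only).
bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
scores = dice_scores(bitext)

src, tgt = bitext[0]
forward = directed_links(src, tgt, scores, anchor="target")
backward = directed_links(src, tgt, scores, anchor="source")

# Two simple symmetrizations of the directed alignments.
intersection = forward & backward  # favors precision
union = forward | backward         # favors recall
```

On this toy corpus both directions agree, so the intersection and the union coincide. On real data the two directed alignments usually differ, which is why the choice of combination method matters.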

PUBLICATION RECORD

Publication year: 2003
Venue: Computational Linguistics
Publication date: 2003-03-01
Fields of study: Computer Science
Identifiers: DOI 10.1162/089120103321337421

REFERENCES

Book Review: Cross-Language Information Retrieval by Jian-Yun Nie
2010, cited by this paper
The Alignment Template Approach to Statistical Machine Translation
2004, cited by this paper
A Statistical MT Tutorial Workbook
2003, cited by this paper
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora
2001, cited by this paper
A Syntax-based Statistical Translation Model
2001, cited by this paper
An Efficient A* Search Algorithm for Statistical Machine Translation
2001, cited by this paper
Fast Decoding and Optimal Decoding for Machine Translation
2001, cited by this paper
A Comparison of Alignment Models for Statistical Machine Translation
2000, influential reference
An Unsupervised Method for Multilingual Word Sense Tagging Using Parallel Corpora
2000, cited by this paper
Models of translation equivalence among words
2000, cited by this paper
Minimally Supervised Morphological Analysis by Multimodal Alignment
2000, cited by this paper
Algorithms for statistical translation of spoken language
2000, influential reference
Chinese-Korean Word Alignment Based on Linguistic Comparison
2000, cited by this paper
Robust Bilingual Word Alignment for Machine Aided Translation
1999, cited by this paper
Improved Alignment Models for Statistical Machine Translation
1999, influential reference
Decoding complexity in word-replacement translation models
1999, cited by this paper
Automatic Acquisition of Hierarchical Transduction Models for Machine Translation
1998, cited by this paper
Improving Statistical Natural Language Translation with Categories and Rules
1998, cited by this paper
Fast decoding for statistical machine translation
1998, cited by this paper
Manual Annotation of Translational Equivalence: The Blinker Project
1998, cited by this paper
An iterative, DP-based search algorithm for statistical machine translation
1998, cited by this paper
A DP based Search Algorithm for Statistical Machine Translation
1998, cited by this paper
A Class-based Approach to Word Alignment
1997, influential reference
Automated dictionary extraction for “knowledge-free” example-based translation
1997, cited by this paper
HMM-Based Word Alignment in Statistical Translation
1996, cited by this paper
Translating Collocations for Bilingual Lexicons: A Statistical Approach
1996, cited by this paper
A Polynomial-Time Algorithm for Statistical Machine Translation
1996, cited by this paper
Verbmobil: Towards a DRT-based translation of spontaneous negotiation dialogues
1995, cited by this paper
The Candide System for Machine Translation
1994, cited by this paper
The Mathematics of Statistical Machine Translation: Parameter Estimation
1993, cited by this paper
Using cognates to align sentences in bilingual corpora
1993, influential reference
Improved clustering techniques for class-based statistical language modelling
1993, cited by this paper
But Dictionaries Are Data Too
1993, influential reference
Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper
1977, cited by this paper
An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process
1972, cited by this paper
Measures of the Amount of Ecologic Association Between Species
1945, cited by this paper

CITED BY

Controlled beam search for neural machine translation using subword units leveraging phrase-based statistical machine translation outputs
2026, cites this paper
Context-Based Multilingual Translation Technology: on the Example of the Paratranslator Platform
2025, cites this paper
EnerGIZAr: Leveraging GIZA++ for Effective Tokenizer Initialization
2025, cites this paper
Cross-Lingual Semantic Integration: Enhancing Document Retrieval through Vector Space Models (VSM) and Machine Translation
2025, cites this paper
Transformer-based model for moroccan Arabizi-to-Arabic transliteration using a semi-automatic annotated dataset
2025, cites this paper
POS-Aware Neural Approaches for Word Alignment in Dravidian Languages
2025, cites this paper
High-Dimensional Interlingual Representations of Large Language Models
2025, cites this paper
The Devil Is in the Word Alignment Details: On Translation-Based Cross-Lingual Transfer for Token Classification Tasks
2025, cites this paper
Understanding Decoder Read-Then-Generate Behavior in Transformer Neural Machine Translation
2025, cites this paper
Translation and Fusion Improves Cross-lingual Information Extraction
2025, cites this paper
Quantifying the Overlap: Attribution Maps and Linguistic Heuristics in Encoder-Decoder Machine Translation Models
2025, cites this paper
A New NMT Model for Translating Clinical Texts from English to Spanish
2025, cites this paper
Research on the Application of Machine Learning in Translation Systems
2025, cites this paper
Widening the bottleneck of lexical choice for non-autoregressive translation
2025, cites this paper
The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR
2025, cites this paper
PhraseBT: A phrase-level back-translation data augmentation method for neural machine translation
2025, cites this paper
TransAlign: Machine Translation Encoders are Strong Word Aligners, Too
2025, cites this paper
An integrated framework for multi-feature fusion and intelligent recognition of design elements: Challenges and solutions
2025, cites this paper
The application of GIS technology in accident prevention and management in intelligent traffic management systems
2025, cites this paper
Lost in Alignment: A Survey on Cross-Lingual Alignment Methods for Contextualized Representation
2025, cites this paper
From Hand-Crafted Rules to Zero-Shot Learning: A Practical History of Information Extraction
2025, cites this paper
Alignment of Historical Manuscript Transcriptions and Translations
2025, cites this paper
Enhancing distant low-resource neural machine translation with semantic pivot
2025, cites this paper
Türkçeden Türk İşaret Diline makine çeviri sistemi
2025, cites this paper
Exploring prompting for dialectical machine translation: a focus on north Jordanian Arabic
2025, cites this paper
Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models
2024, cites this paper
Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation
2024, cites this paper
BinaryAlign: Word Alignment as Binary Sequence Labeling
2024, cites this paper
Findings of the WMT 2024 Shared Task on Non-Repetitive Translation
2024, cites this paper
SYSTRAN @ WMT24 Non-Repetitive Translation Task
2024, cites this paper
A Bidirectional Statistical Machine Translation System for Exploring the Performance of the Low Resource Language Pair English-Nepali
2024, cites this paper
Algorithm for Aligning Paragraphs and Sentences in Aligner Tool
2024, cites this paper
A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs
2024, cites this paper
jp-evalb: Robust Alignment-based PARSEVAL Measures
2024, cites this paper
Implicit Discourse Relation Classification For Nigerian Pidgin
2024, cites this paper
On Cross-Language Entity Label Projection and Recognition
2024, influential citation
Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin
2024, cites this paper
Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin
2024, cites this paper
Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning
2024, cites this paper
Word Sense Disambiguation applied to Assamese-Hindi Bilingual Statistical Machine Translation
2024, cites this paper
Large-Scale Bitext Corpora Provide New Evidence for Cognitive Representations of Spatial Terms
2024, cites this paper
Enhancing Cross-lingual Sentence Embedding for Low-resource Languages with Word Alignment
2024, cites this paper
Transliteration Characteristics in Romanized Assamese Language Social Media Text and Machine Transliteration
2024, cites this paper
Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases
2024, cites this paper
An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation
2024, cites this paper
Teaching Large Language Models an Unseen Language on the Fly
2024, influential citation
Emotion Classification in Low and Moderate Resource Languages
2024, cites this paper
Cross-lingual Contextualized Phrase Retrieval
2024, influential citation
A Lifelong Multilingual Multi-granularity Semantic Alignment Approach via Maximum Co-occurrence Probability
2024, cites this paper
AssameseBackTranslit: Back Transliteration of Romanized Assamese Social Media Text
2024, cites this paper
Insights into Natural Language Database Query Errors: from Attention Misalignment to User Handling Strategies
2024, cites this paper
Evaluating Code-Switching Translation with Large Language Models
2024, cites this paper
OTTAWA: Optimal TransporT Adaptive Word Aligner for Hallucination and Omission Translation Errors Detection
2024, cites this paper
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
2024, cites this paper
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
2024, cites this paper
Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach
2024, cites this paper
Distilling Monolingual and Crosslingual Word-in-Context Representations
2024, cites this paper
Creating and Evaluating a Multilingual Corpus of UN General Assembly Debates
2024, cites this paper
Word Alignment as Preference for Machine Translation
2024, cites this paper
Neural Representation Learning in Linguistic Structured Prediction
2024, cites this paper
Can Pretrained English Language Models Benefit Non-English NLP Systems in Low-Resource Scenarios?
2024, cites this paper
Sign Language Machine Translation
2024, cites this paper
Preference Grammars and Decoding Algorithms for Probabilistic Synchronous Context Free Grammar Based Translation
2024, cites this paper
Prediction of Translation Techniques for the Translation Process
2024, cites this paper
Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval
2024, influential citation
Multilingual Sentence Transformer as A Multilingual Word Aligner
2023, cites this paper
Noisy Parallel Data Alignment
2023, cites this paper
The Recent Advances in Automatic Term Extraction: A survey
2023, cites this paper
Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models
2023, influential citation
Curriculum-Style Fine-Grained Adaption for Unsupervised Cross-Lingual Dependency Transfer
2023, cites this paper
BLADE: The University of Maryland at the TREC 2023 NeuCLIR Track
2023, cites this paper
Contextual Label Projection for Cross-Lingual Structured Prediction
2023, cites this paper
Machine translation and its evaluation: a study
2023, cites this paper
A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension
2023, influential citation
PRHLT’s Submission to WLAC 2023
2023, cites this paper
Word Sense Disambiguation for Ancient Greek: Sourcing a training corpus through translation alignment
2023, cites this paper
Classical Philology in the Time of AI: Exploring the Potential of Parallel Corpora in Ancient Language
2023, cites this paper
The semantic map of when and its typological parallels
2023, cites this paper
Grounded Intuition of GPT-Vision's Abilities with Scientific Images
2023, cites this paper
Machine Translation for Historical Research: A Case Study of Aramaic-Ancient Hebrew Translations
2023, cites this paper
GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment
2023, cites this paper
A Use Case: Reformulating Query Rewriting as a Statistical Machine Translation Problem
2023, cites this paper
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
2023, cites this paper
Improving BERTScore via Word Similarity Matrix Iterative Alignment Method
2023, cites this paper
Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study
2023, cites this paper
Experiments in training transformer sequence-to-sequence DRS parsers
2023, cites this paper
Contextual Label Projection for Cross-Lingual Structure Extraction
2023, cites this paper
Bilingual Terminology Alignment Using Contextualized Embeddings
2023, cites this paper
Lexical Based Reordering Models for English to Telugu Machine Translation
2023, cites this paper
Adopting Neural Translation Model in Data Generation for Inverse Text Normalization
2023, cites this paper
SpeechAlign: A Framework for Speech Translation Alignment Evaluation
2023, influential citation
An empirical analysis on statistical and neural machine translation system for English to Mizo language
2023, cites this paper
Impact of Visual Context on Noisy Multimodal NMT: An Empirical Study for English to Indian Languages
2023, cites this paper
X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs
2023, cites this paper
Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization
2023, cites this paper
Supervised Feature-based Classification Approach to Bilingual Lexicon Induction from Specialised Comparable Corpora
2023, cites this paper
Few-shot Named Entity Recognition: Definition, Taxonomy and Research Directions
2023, cites this paper
adaptNMT: an open-source, language-agnostic development environment for neural machine translation
2023, cites this paper
Audience-specific Explanations for Machine Translation
2023, cites this paper
Character alignment methods for dialect-to-standard normalization
2023, cites this paper