A Proposal for a Coherence Corpus in Machine Translation

Karin Sim Smith, Wilker Aziz, Lucia Specia

Published 2015 in DiscoMT@EMNLP

ABSTRACT

Coherence in Machine Translation (MT) has received little attention to date. One of the main issues we face in work in this area is the lack of labelled data. While coherent (human authored) texts are abundant and incoherent texts could be taken from MT output, the latter also contains other errors which are not specifically related to coherence. This makes it difficult to identify and quantify issues of coherence in those texts. We introduce an initiative to create a corpus consisting of data artificially manipulated to contain errors of coherence common in MT output. Such a corpus could then be used as a benchmark for coherence models in MT, and potentially as training data for coherence models in supervised settings.

PUBLICATION RECORD

Publication year
2015
Venue
DiscoMT@EMNLP
Publication date
2015-09-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.18653/v1/W15-2507
Source metadata
Semantic Scholar

REFERENCES

Computational Linguistics: 16th International Conference of the Pacific Association for Computational Linguistics, PACLING 2019, Hanoi, Vietnam, October 11–13, 2019, Revised Selected Papers
2019, cited by this paper
The role of artificially generated negative data for quality estimation of machine translation
2015, cited by this paper
Findings of the 2015 Workshop on Statistical Machine Translation
2015, cited by this paper
Improving the Translation of Discourse Markers for Chinese into English
2015, cited by this paper
Translating Negation: A Manual Error Analysis
2015, cited by this paper
Using Discourse Structure Improves Machine Translation Evaluation
2014, cited by this paper
Generating artificial errors for grammatical error correction
2014, cited by this paper
Findings of the 2014 Workshop on Statistical Machine Translation
2014, cited by this paper
A Model of Coherence Based on Distributed Sentence Representation
2014, cited by this paper
Lexical Chaining for Measuring Discourse Coherence Quality in Test-taker Essays
2014, cited by this paper
Applying the semantics of negation to SMT through n-best list re-ranking
2014, cited by this paper
A Topic-Based Coherence Model for Statistical Machine Translation
2013, cited by this paper
Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction
2013, influential reference
Findings of the 2013 Workshop on Statistical Machine Translation
2013, cited by this paper
Graph-based Local Coherence Modeling
2013, cited by this paper
Efficient Estimation of Word Representations in Vector Space
2013, cited by this paper
Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
2013, influential reference
Machine Translation with Many Manually Labeled Discourse Connectives
2013, cited by this paper
Lexical Chain Based Cohesion Models for Document-Level Statistical Machine Translation
2013, cited by this paper
A Coherence Model Based on Syntactic Patterns
2012, cited by this paper
Improving Pronoun Translation for Statistical Machine Translation
2012, cited by this paper
Enriching Parallel Corpora for Statistical Machine Translation with Semantic Negation Rephrasing
2012, cited by this paper
Discourse in Statistical Machine Translation
2012, cited by this paper
Topic Models for Dynamic Translation Model Adaptation
2012, cited by this paper
Collection of a Large Database of French-English SMT Output Corrections
2012, influential reference
The Trouble with SMT Consistency
2012, cited by this paper
Using Sense-labeled Discourse Connectives for Statistical Machine Translation
2012, cited by this paper
Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level
2012, cited by this paper
Extending the Entity Grid with Entity-Specific Features
2011, cited by this paper
Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation
2011, cited by this paper
Utilization of Anaphora in Machine Translation
2011, cited by this paper
Automatically Evaluating Text Coherence Using Discourse Relations
2011, cited by this paper
Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache
2010, cited by this paper
Using Entity-Based Features to Model Coherence in Student Essays
2010, cited by this paper
Aiding Pronoun Translation with Co-Reference Resolution
2010, cited by this paper
LEXCONN: A French Lexicon of Discourse Connectives
2010, cited by this paper
Modelling pronominal anaphora in statistical machine translation
2010, cited by this paper
Using Syntax to Disambiguate Explicit Discourse Connectives in Text
2009, cited by this paper
Reading Tea Leaves: How Humans Interpret Topic Models
2009, cited by this paper
Correcting ESL Errors Using Phrasal SMT Techniques
2006, cited by this paper
Shared Task: Statistical Machine Translation between European Languages
2005, cited by this paper
Modeling Local Coherence: An Entity-Based Approach
2005, cited by this paper
DiMLex: A Lexicon of Discourse Markers for Text Generation and Understanding
1998, influential reference
Attention, Intentions, and the Structure of Discourse
1986, cited by this paper
Modeling Lexical Cohesion for Document-Level Machine Translation (Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence)
year unknown, cited by this paper

CITED BY

How are neural machine-translated Chinese-to-English short stories constructed and cohered? An exploratory study based on theme-rheme structure
2022, cites this paper
Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks
2021, cites this paper
Can Your Context-Aware MT System Pass the DiP Benchmark Tests?: Evaluation Benchmarks for Discourse Phenomena in Machine Translation
2020, cites this paper
DiP Benchmark Tests: Evaluation Benchmarks for Discourse Phenomena in MT
2020, cites this paper
Coherence in Machine Translation Output
2019, cites this paper
Coherence in machine translation
2018, cites this paper
Entity-based coherence in statistical machine translation: a modelling and evaluation perspective
2018, cites this paper
On Integrating Discourse in Machine Translation
2017, cites this paper
Using a Graph-based Coherence Model in Document-Level Machine Translation
2017, cites this paper
The Trouble with Machine Translation Coherence
2016, cites this paper
An overview on text coherence methods
2016, cites this paper