Evaluating Natural Language Generation via Unbalanced Optimal Transport

Yimeng Chen, Yanyan Lan, Ruibin Xiong, Liang Pang, Zhiming Ma, Xueqi Cheng

Published 2020 in International Joint Conference on Artificial Intelligence

ABSTRACT

Embedding-based evaluation measures have shown promising improvements in correlation with human judgments in natural language generation. These measures rely on various intrinsic metrics, including generalized precision, recall, F-score and the earth mover's distance. However, the relations between these metrics are unclear, making it difficult to determine which measure to use in real applications. In this paper, we provide an in-depth study of the relations between these metrics. Inspired by optimal transport theory, we prove that these metrics correspond to the optimal transport problem with different hard marginal constraints. However, these hard marginal constraints may cause incomplete and noisy matching in the evaluation process. We therefore propose a family of new evaluation metrics, namely Lazy Earth Mover's Distances, based on the more general unbalanced optimal transport problem. Experimental results on WMT18 and WMT19 show that our proposed metrics produce evaluation results more consistent with human judgments than existing intrinsic metrics.
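
The correspondence between these metrics and marginal constraints can be illustrated with a small sketch. It is not the paper's implementation: the function names, the toy cost matrix, and the uniform token weights are all assumptions made for illustration. It relies only on a standard fact about transport problems: if the plan must be non-negative and only the row totals are fixed, the cheapest plan sends each row's mass to its lowest-cost column, which is a greedy match of the kind used in embedding-based precision and recall.

```python
# Illustrative sketch only: names and toy values are assumptions,
# not taken from the paper.

def one_sided_transport_cost(cost, row_mass):
    """Optimal transport cost when only the row marginal is fixed.

    Each row i must ship exactly row_mass[i], but column totals are
    unconstrained, so the cheapest plan sends all of row i's mass to
    its lowest-cost column (a greedy match).
    """
    return sum(m * min(row) for m, row in zip(row_mass, cost))


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Toy cost matrix: rows = candidate tokens, columns = reference tokens,
# entries = 1 - cosine similarity (hypothetical values).
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.6],
]
cand_mass = [0.5, 0.5]            # uniform weights over candidate tokens
ref_mass = [1 / 3, 1 / 3, 1 / 3]  # uniform weights over reference tokens

# Fixing only the candidate marginal gives a precision-like cost;
# fixing only the reference marginal gives a recall-like cost.
precision_cost = one_sided_transport_cost(cost, cand_mass)
recall_cost = one_sided_transport_cost(transpose(cost), ref_mass)
```

In this toy example the third reference token has no close candidate token (its best cost is 0.6), yet the hard reference-side constraint still forces its full mass to be matched. This is one plausible reading of the "noisy matching" that the abstract attributes to hard marginal constraints, and which the unbalanced formulation is meant to relax.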

PUBLICATION RECORD

Publication year: 2020
Venue: International Joint Conference on Artificial Intelligence
Publication date: 2020-07-01
Fields of study: Mathematics, Computer Science
DOI: 10.24963/ijcai.2020/516
Source metadata: Semantic Scholar

REFERENCES

Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges (2019)
BERTScore: Evaluating Text Generation with BERT (2019, influential reference)
WMDO: Fluency-based Word Mover’s Distance for Machine Translation Evaluation (2019)
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance (2019, influential reference)
YiSi - a Unified Semantic MT Quality Evaluation and Estimation Metric for Languages with Different Levels of Available Resources (2019)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019)
Computational Optimal Transport (2018)
Deep Contextualized Word Representations (2018)
Results of the WMT18 Metrics Shared Task: Both characters and embeddings achieve good performance (2018)
Why We Need New Evaluation Metrics for NLG (2017)
Deep Reinforcement Learning for Dialogue Generation (2016)
Re-evaluating Automatic Metrics for Image Captioning (2016)
Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers (2016, influential reference)
From Word Embeddings To Document Distances (2015, influential reference)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)
Neural Machine Translation by Jointly Learning to Align and Translate (2014)
Efficient Estimation of Word Representations in Vector Space (2013)
A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics (2012)
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments (2005)
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (2005)
ROUGE: A Package for Automatic Evaluation of Summaries (2004)
Bleu: a Method for Automatic Evaluation of Machine Translation (2002)
A metric for distributions with applications to image databases (1998)
Annals of Mathematical Statistics (1962)
