Embedding-based evaluation measures have shown promising improvements on the correlation with human judgments in natural language generation. In these measures, various intrinsic metrics are used in the computation, including generalized precision, recall, F-score and the earth mover's distance. However, the relations between these metrics are unclear, making it difficult to determine which measure to use in real applications. In this paper, we provide an in-depth study on the relations between these metrics. Inspired by the optimal transportation theory, we prove that these metrics correspond to the optimal transport problem with different hard marginal constraints. However, these hard marginal constraints may cause the problem of incomplete and noisy matching in the evaluation process. Therefore we propose a family of new evaluation metrics, namely Lazy Earth Mover's Distances, based on the more general unbalanced optimal transport problem. Experimental results on WMT18 and WMT19 show that our proposed metrics have the ability to produce more consistent evaluation results with human judgements, as compared with existing intrinsic metrics.
Evaluating Natural Language Generation via Unbalanced Optimal Transport
Yimeng Chen,Yanyan Lan,Ruibin Xiong,Liang Pang,Zhiming Ma,Xueqi Cheng
Published 2020 in International Joint Conference on Artificial Intelligence
ABSTRACT
PUBLICATION RECORD
- Publication year
2020
- Venue
International Joint Conference on Artificial Intelligence
- Publication date
2020-07-01
- Fields of study
Mathematics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-24 of 24 references · Page 1 of 1
CITED BY
- No citing papers are available for this paper.
Showing 0-0 of 0 citing papers · Page 1 of 1