Fine-Grained Evaluation of English-Russian MT in 2025: Linguistic Challenges Mirroring Human Translator Training

Shushen Manakhimova, M. Kunilovskaya, Ekaterina Lapshinova-Koltunski, Eleftherios Avramidis

Published 2025 in Proceedings of the Tenth Conference on Machine Translation

ABSTRACT

We analyze how English–Russian machine translation (MT) systems submitted to WMT25 perform on linguistically challenging translation tasks similar to the problems used in university-level professional translator training. We assessed the ten top-performing systems using a fine-grained test suite of 465 manually devised test items, which cover 55 lexical, grammatical, and discourse phenomena in 13 categories. By applying pass/fail rules with human adjudication and computing micro and macro aggregates, we observe three performance tiers. Our ranking broadly aligns with the official WMT25 ranking but reveals notable shifts. Our findings show that in 2025, even top-performing MT systems still struggle with translation problems that require deep understanding and rephrasing, much as human novices do. The best systems exhibit creativity and can handle such challenges very well, often producing natural translations rather than word-for-word renditions. However, persistent structural and lexical problems remain: literal word-order carry-overs, misused verb forms, and rigid phrase translations were common, mirroring errors typically seen in beginner translator assignments.
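The abstract mentions micro and macro aggregates over pass/fail outcomes but does not spell out the formulas. A minimal sketch of the usual reading follows: micro accuracy pools all test items, while macro accuracy averages the per-category pass rates so that small categories weigh as much as large ones. The function name, the category labels, and the toy data are hypothetical and are not taken from the paper; the authors' actual aggregation may differ.

```python
from collections import defaultdict


def micro_macro_accuracy(results):
    """Compute (micro, macro) pass rates for one MT system.

    `results` is an iterable of (category, passed) pairs, one per test item.
    This is an illustrative assumption about the data layout, not the
    paper's format.
    """
    per_category = defaultdict(lambda: [0, 0])  # category -> [passes, total]
    for category, passed in results:
        per_category[category][0] += int(passed)
        per_category[category][1] += 1
    if not per_category:
        raise ValueError("no test items given")

    total_pass = sum(p for p, _ in per_category.values())
    total_items = sum(n for _, n in per_category.values())
    micro = total_pass / total_items  # every item counts equally
    # every category counts equally, regardless of its size
    macro = sum(p / n for p, n in per_category.values()) / len(per_category)
    return micro, macro


# Hypothetical toy data: 4 items in one category, 2 in another.
toy = [
    ("idioms", True), ("idioms", False), ("idioms", False), ("idioms", False),
    ("verb_tense", True), ("verb_tense", True),
]
micro, macro = micro_macro_accuracy(toy)
print(micro, macro)
```

On this toy input the two aggregates diverge because the larger category has the lower pass rate, which illustrates why a ranking can shift depending on which aggregate is used.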

PUBLICATION RECORD

Identifiers
DOI 10.18653/v1/2025.wmt-1.61
