Bringing Ladin to FLORES+

Samuel Frontull, Thomas Ströhle, Carlo Zoli, Werner Pescosta, Ulrike Frenademez, Matteo Ruggeri, Daria Valentin, Karin Comploj, Gabriel Perathoner, Silvia Liotto, Paolo Anvidalfarei

Published 2025 in Proceedings of the Tenth Conference on Machine Translation

ABSTRACT

Recent advances in neural machine translation (NMT) have opened new possibilities for developing translation systems for smaller, so-called low-resource languages as well. The rise of large language models (LLMs) has further revolutionized machine translation by enabling more flexible and context-aware generation. However, many challenges remain for low-resource languages, and the availability of high-quality, validated test data is essential to support meaningful development, evaluation, and comparison of translation systems. In this work, we present an extension of the FLORES+ dataset for two Ladin variants, Val Badia and Gherdëina, as a submission to the Open Language Data Initiative Shared Task 2025. To complement existing resources, we additionally release two parallel datasets for Gherdëina–Val Badia and Gherdëina–Italian. We validate these datasets by evaluating state-of-the-art LLMs and NMT systems on the new test data, both with and without leveraging the newly released parallel data for fine-tuning and prompting. The results highlight the considerable potential for improving translation quality in Ladin, while also underscoring the need for further research and resource development, for which this contribution provides a basis.

PUBLICATION RECORD

Identifiers
DOI 10.18653/v1/2025.wmt-1.81

CITED BY

Findings of the WMT 2025 Shared Task of the Open Language Data Initiative
2025, cites this paper