We present the first systematic study of machine translation for Chakma, an endangered and extremely low-resource Indo-Aryan language, with the goal of supporting language access and preservation. We introduce a new Chakma-Bangla parallel and monolingual dataset, along with a trilingual Chakma-Bangla-English benchmark for evaluation. To address script mismatch and data scarcity, we propose a character-level transliteration framework that exploits the close orthographic and phonological relationship between Chakma and Bangla, preserving semantic content while enabling effective transfer from Bangla and multilingual pretrained models. We benchmark from-scratch MT, fine-tuned pretrained models, and large language models via in-context learning. Results show that transliteration is essential and that fine-tuning and in-context learning substantially outperform from-scratch baselines, with strong asymmetry across translation directions.
ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration
Aunabil Chakma,Aditya Chakma,Masum Hasan,Soham Khisa,Chumui Tripura,Rifat Shahriyar
Published 2024 in Unknown venue
ABSTRACT
PUBLICATION RECORD
- Publication year
2024
- Venue
Unknown venue
- Publication date
2024-10-14
- Fields of study
Linguistics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-25 of 25 references · Page 1 of 1
CITED BY
- No citing papers are available for this paper.
Showing 0-0 of 0 citing papers · Page 1 of 1