ChakmaNMT: Machine Translation for a Low-Resource and Endangered Language via Transliteration

Aunabil Chakma, Aditya Chakma, Masum Hasan, Soham Khisa, Chumui Tripura, Rifat Shahriyar

Published 2024 in Unknown venue

ABSTRACT

We present the first systematic study of machine translation for Chakma, an endangered and extremely low-resource Indo-Aryan language, with the goal of supporting language access and preservation. We introduce a new Chakma-Bangla parallel and monolingual dataset, along with a trilingual Chakma-Bangla-English benchmark for evaluation. To address script mismatch and data scarcity, we propose a character-level transliteration framework that exploits the close orthographic and phonological relationship between Chakma and Bangla, preserving semantic content while enabling effective transfer from Bangla and multilingual pretrained models. We benchmark from-scratch MT, fine-tuned pretrained models, and large language models via in-context learning. Results show that transliteration is essential and that fine-tuning and in-context learning substantially outperform from-scratch baselines, with strong asymmetry across translation directions.
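The character-level transliteration idea described in the abstract can be illustrated with a minimal sketch. The three-entry mapping below is hypothetical and chosen only for illustration; it is not the paper's actual table, and it ignores vowel signs, conjuncts, and the difference in inherent vowels between the two scripts. The names `CHAKMA_TO_BANGLA`, `BANGLA_TO_CHAKMA`, and `transliterate` are assumptions introduced here.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping from Chakma to Bangla letters.
# Characters are resolved by their official Unicode names so the pairs
# stay readable; a real table would cover the full script.
_PAIRS = [
    ("CHAKMA LETTER KAA", "BENGALI LETTER KA"),
    ("CHAKMA LETTER GAA", "BENGALI LETTER GA"),
    ("CHAKMA LETTER MAA", "BENGALI LETTER MA"),
]

CHAKMA_TO_BANGLA = {
    unicodedata.lookup(src): unicodedata.lookup(dst) for src, dst in _PAIRS
}
# Reverse table, valid here because the illustrative mapping is one-to-one.
BANGLA_TO_CHAKMA = {dst: src for src, dst in CHAKMA_TO_BANGLA.items()}


def transliterate(text, table=CHAKMA_TO_BANGLA):
    """Replace each character found in `table`; pass all others through."""
    return "".join(table.get(ch, ch) for ch in text)
```

In a pipeline of this kind, Chakma text would presumably be transliterated into Bangla script before being passed to a Bangla or multilingual pretrained model, and model output would be mapped back with the reverse table when translating into Chakma.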

PUBLICATION RECORD

  • Publication year

    2024

  • Venue

    Unknown venue

  • Publication date

    2024-10-14

  • Fields of study

    Linguistics, Computer Science

  • Source metadata

    Semantic Scholar
