Low-Resource Language Models: Leveraging Transfer and Zero-Shot Learning for Underrepresented Languages
Rishabh Agrawal, Shashikant Reddy Lnu
Published 2025 in 2025 International Conference on Electronics and Computing, Communication Networking Automation Technologies (ICEC2NT)

ABSTRACT
The development of Large Language Models (LLMs) for underrepresented languages is hindered by the lack of large-scale, high-quality datasets. This paper addresses that limitation by leveraging transfer learning and zero-shot learning to build robust models for low-resource languages. Starting from pre-trained LLMs such as GPT-4, we fine-tuned models on datasets containing as few as 5,000 samples. The resulting models achieved notable improvements: BLEU scores of up to 49.8 in translation, task-specific F1 scores of up to 0.74, and perplexity reduced to 17.5 with larger datasets of 20,000 samples. Zero-shot evaluations further demonstrated strong generalization, with F1 scores of 0.68 in Zulu sentiment analysis, 0.67 in Pashto named entity recognition, and 0.62 in Amharic machine translation. These findings highlight an effective, scalable approach to language modeling in data-constrained environments. Our work underscores the value of repurposing multilingual LLMs to create inclusive NLP technologies that bridge the language divide and support underrepresented linguistic communities.
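The abstract does not publish the authors' toolchain, so the following is only a rough sketch of the transfer-learning recipe it describes: fine-tuning a pre-trained multilingual model on roughly 5,000 parallel sentences from a low-resource language. The sketch assumes the Hugging Face Transformers and Datasets libraries with the open multilingual model google/mt5-small as a stand-in for the closed models named above; the corpus file name and all hyperparameters are illustrative assumptions, not the paper's settings.

import math

from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/mt5-small"  # assumed stand-in, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical JSONL corpus of ~5,000 {"src": ..., "tgt": ...} pairs,
# e.g. Amharic-English sentences; the file name is illustrative.
corpus = load_dataset("json", data_files="amharic_english_5k.jsonl")["train"]
splits = corpus.train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # Tokenize both sides; the target token ids become the training labels.
    enc = tokenizer(batch["src"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(
        text_target=batch["tgt"], truncation=True, max_length=128
    )["input_ids"]
    return enc

tokenized = splits.map(preprocess, batched=True,
                       remove_columns=corpus.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="lowres-mt",
    per_device_train_batch_size=8,
    learning_rate=3e-5,   # small LR: adapt the pre-trained model, don't overwrite it
    num_train_epochs=5,   # a few epochs are typically enough on ~5k examples
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Perplexity figures like the 17.5 reported above are the exponential of
# the mean cross-entropy loss on a held-out split.
print("perplexity:", math.exp(trainer.evaluate()["eval_loss"]))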
PUBLICATION RECORD
- Publication year: 2025
- Publication date: 2025-09-03
- Venue: 2025 International Conference on Electronics and Computing, Communication Networking Automation Technologies (ICEC2NT)
- Source metadata: Semantic Scholar