Low-Resource Language Models: Leveraging Transfer and Zero-Shot Learning for Underrepresented Languages
Rishabh Agrawal, Shashikant Reddy Lnu
Published 2025 in 2025 International Conference on Electronics and Computing, Communication Networking Automation Technologies (ICEC2NT)

ABSTRACT
The development of Large Language Models (LLMs) for underrepresented languages is hindered by the lack of large-scale, high-quality datasets. This paper addresses that limitation by leveraging transfer learning and zero-shot learning to build robust models for low-resource languages. Starting from pre-trained LLMs such as GPT-4, we fine-tuned models on datasets containing as few as 5,000 samples. The resulting models achieved notable improvements: BLEU scores of up to 49.8 in translation, task-specific F1 scores of up to 0.74, and perplexity reduced to 17.5 with larger datasets of 20,000 samples. Zero-shot evaluations further demonstrated strong generalization, with F1 scores of 0.68 in Zulu sentiment analysis, 0.67 in Pashto named entity recognition, and 0.62 in Amharic machine translation. These findings highlight an effective, scalable approach to language modeling in data-constrained environments. Our work underscores the value of repurposing multilingual LLMs to create inclusive NLP technologies that bridge the language divide and support underrepresented linguistic communities.
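The abstract does not publish the authors' toolchain, so the following is only a rough sketch of the transfer-learning recipe it describes: fine-tuning a pre-trained multilingual model on roughly 5,000 parallel sentences from a low-resource language. The sketch assumes the Hugging Face Transformers and Datasets libraries with the open multilingual model google/mt5-small as a stand-in for the closed models named above; the corpus file name and all hyperparameters are illustrative assumptions, not the paper's settings.

import math

from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/mt5-small"  # assumed stand-in, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical JSONL corpus of ~5,000 {"src": ..., "tgt": ...} pairs,
# e.g. Amharic-English sentences; the file name is illustrative.
corpus = load_dataset("json", data_files="amharic_english_5k.jsonl")["train"]
splits = corpus.train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # Tokenize both sides; the target token ids become the training labels.
    enc = tokenizer(batch["src"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(
        text_target=batch["tgt"], truncation=True, max_length=128
    )["input_ids"]
    return enc

tokenized = splits.map(preprocess, batched=True,
                       remove_columns=corpus.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="lowres-mt",
    per_device_train_batch_size=8,
    learning_rate=3e-5,   # small LR: adapt the pre-trained model, don't overwrite it
    num_train_epochs=5,   # a few epochs are typically enough on ~5k examples
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Perplexity figures like the 17.5 reported above are the exponential of
# the mean cross-entropy loss on a held-out split.
print("perplexity:", math.exp(trainer.evaluate()["eval_loss"]))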
PUBLICATION RECORD
- Publication year: 2025
- Publication date: 2025-09-03
- Venue: 2025 International Conference on Electronics and Computing, Communication Networking Automation Technologies (ICEC2NT)
- Source metadata: Semantic Scholar