Lightweight Swin-Transformer and DistilGPT-2 for Image-to-LaTeX in Constrained Environments
Ni Nyoman Wulandari, Syukron Abu Ishaq Alfarozi, I. Ardiyanto
Published 2025 in 2025 International Conference on Advanced Technologies in Energy and Informatic (ICATEI)
ABSTRACT
The conversion of mathematical formula images into LaTeX is crucial for document digitization, but modern Transformer architectures demand substantial computational resources, limiting their use in constrained environments. While these models are powerful, their performance-efficiency trade-offs in low-resource settings remain systematically unexplored. This study addresses that gap by proposing an efficient methodology that pairs a pretrained, frozen Swin Transformer encoder with a lightweight DistilGPT-2 decoder. Our approach achieves a BLEU score of 0.1628 with an inference latency of 47.48 ms while reducing the parameter count by 23.6%. Notably, this frozen-encoder strategy improves performance by 4.8% over full fine-tuning under identical constraints in our experiments, suggesting that preserving pretrained visual features can be more effective for this task. This work provides a practical reference for resource-constrained mathematical OCR, enabling more accessible development and deployment for researchers with limited computational power.
Index Terms—Deep learning, Mathematical expression recognition, Transformer architectures, Resource-constrained computing, Image-to-LaTeX conversion.
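For readers who want a concrete picture of the encoder-decoder pairing the abstract describes, the following is a minimal sketch using the Hugging Face transformers library. The specific checkpoints (microsoft/swin-base-patch4-window7-224, distilgpt2), the input image path, and the generation settings are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a frozen Swin encoder paired with a DistilGPT-2 decoder
# for image-to-LaTeX, assuming a Hugging Face `transformers` setup.
# Checkpoint names and hyperparameters below are assumptions for illustration.
import torch
from PIL import Image
from transformers import (
    AutoImageProcessor,
    AutoTokenizer,
    VisionEncoderDecoderModel,
)

ENCODER_CKPT = "microsoft/swin-base-patch4-window7-224"  # assumed Swin variant
DECODER_CKPT = "distilgpt2"

# Pair the vision encoder with the language-model decoder; cross-attention
# layers are added to the decoder automatically, and a projection bridges
# any hidden-size mismatch between encoder and decoder.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    ENCODER_CKPT, DECODER_CKPT
)

# Freeze the pretrained visual encoder, in line with the paper's
# frozen-encoder strategy; only decoder-side weights would be trained.
for param in model.encoder.parameters():
    param.requires_grad = False

tokenizer = AutoTokenizer.from_pretrained(DECODER_CKPT)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

image_processor = AutoImageProcessor.from_pretrained(ENCODER_CKPT)

# Inference: formula image in, LaTeX string out.
image = Image.open("formula.png").convert("RGB")  # hypothetical input file
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    output_ids = model.generate(pixel_values, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

In this setup only the decoder and its cross-attention layers receive gradients, which is consistent with the abstract's claim that preserving pretrained visual features outperforms full fine-tuning under the same budget.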
PUBLICATION RECORD
- Publication year: 2025
- Venue: 2025 International Conference on Advanced Technologies in Energy and Informatic (ICATEI)
- Publication date: 2025-10-22
- Source metadata: Semantic Scholar