Lightweight Swin-Transformer and DistilGPT-2 for Image-to-LaTeX in Constrained Environments
Ni Nyoman Wulandari, Syukron Abu Ishaq Alfarozi, I. Ardiyanto
Published 2025 in 2025 International Conference on Advanced Technologies in Energy and Informatic (ICATEI)
ABSTRACT
The conversion of mathematical formula images into LaTeX is crucial for document digitization, but modern Transformer architectures demand substantial computational resources, limiting their use in constrained environments. While these models are powerful, their performance-efficiency trade-offs in low-resource settings remain systematically unexplored. This study addresses that gap by proposing an efficient methodology that pairs a pretrained, frozen Swin Transformer encoder with a lightweight DistilGPT-2 decoder. Our approach achieves a BLEU score of 0.1628 with an inference latency of 47.48 ms while reducing the parameter count by 23.6%. Notably, this frozen-encoder strategy improves performance by 4.8% over full fine-tuning under identical constraints in our experiments, suggesting that preserving pretrained visual features can be more effective for this task. This work provides a practical reference for resource-constrained mathematical OCR, enabling more accessible development and deployment for researchers with limited computational power.
Index Terms—Deep learning, Mathematical expression recognition, Transformer architectures, Resource-constrained computing, Image-to-LaTeX conversion.
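For readers who want a concrete picture of the encoder-decoder pairing the abstract describes, the following is a minimal sketch using the Hugging Face transformers library. The specific checkpoints (microsoft/swin-base-patch4-window7-224, distilgpt2), the input image path, and the generation settings are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a frozen Swin encoder paired with a DistilGPT-2 decoder
# for image-to-LaTeX, assuming a Hugging Face `transformers` setup.
# Checkpoint names and hyperparameters below are assumptions for illustration.
import torch
from PIL import Image
from transformers import (
    AutoImageProcessor,
    AutoTokenizer,
    VisionEncoderDecoderModel,
)

ENCODER_CKPT = "microsoft/swin-base-patch4-window7-224"  # assumed Swin variant
DECODER_CKPT = "distilgpt2"

# Pair the vision encoder with the language-model decoder; cross-attention
# layers are added to the decoder automatically, and a projection bridges
# any hidden-size mismatch between encoder and decoder.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    ENCODER_CKPT, DECODER_CKPT
)

# Freeze the pretrained visual encoder, in line with the paper's
# frozen-encoder strategy; only decoder-side weights would be trained.
for param in model.encoder.parameters():
    param.requires_grad = False

tokenizer = AutoTokenizer.from_pretrained(DECODER_CKPT)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

image_processor = AutoImageProcessor.from_pretrained(ENCODER_CKPT)

# Inference: formula image in, LaTeX string out.
image = Image.open("formula.png").convert("RGB")  # hypothetical input file
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    output_ids = model.generate(pixel_values, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

In this setup only the decoder and its cross-attention layers receive gradients, which is consistent with the abstract's claim that preserving pretrained visual features outperforms full fine-tuning under the same budget.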
PUBLICATION RECORD
- Publication year: 2025
- Venue: 2025 International Conference on Advanced Technologies in Energy and Informatic (ICATEI)
- Publication date: 2025-10-22
- Source metadata: Semantic Scholar