LVM-OCR: A Transformer-Based Architecture for Context-Aware Document Understanding

Bay Nguyen Van, V. Hoang, Kiet Tran Trung, Nghia Dinh, H. H. Thien

Published in 2025 at the 2025 1st International Conference on Emerging Trends in Information Systems and Informatics (ICETISI)

ABSTRACT

Digital transformation demands reliable document processing. We introduce LVM-OCR, an approach to Optical Character Recognition (OCR) built on Large Vision Models. The framework combines transformer-based architectures with text-awareness mechanisms, so that recognition draws on visual context rather than on pattern matching alone and can adapt to complex or ambiguous input. In our experiments, LVM-OCR improves accuracy by 15.3% over conventional methods, with particularly strong results on degraded historical documents, multilingual texts, and complex layouts that are difficult for traditional systems. We argue that this is more than an incremental improvement in what OCR technology can achieve.

PUBLICATION RECORD

Publication date
2025-12-01
Identifiers
DOI 10.1109/ICETISI67983.2025.11406044
