LVM-OCR: A Transformer-Based Architecture for Context-Aware Document Understanding
Bay Nguyen Van, V. Hoang, Kiet Tran Trung, Nghia Dinh, H. H. Thien
Published 2025 in 2025 1st International Conference on Emerging Trends in Information Systems and Informatics (ICETISI)
ABSTRACT
As digital transformation raises the demand for reliable document processing, we introduce LVM-OCR, an approach to Optical Character Recognition (OCR) built on Large Vision Models (LVMs). The framework combines transformer-based architectures with text-awareness mechanisms, so that recognition draws on visual context rather than on pattern matching alone. This allows the system to use surrounding context, adapt to complex inputs, and resolve ambiguous characters. In our experiments, LVM-OCR improves accuracy by 15.3% over conventional methods, with particularly strong results on degraded historical documents, multilingual texts, and complex layouts that are difficult for traditional systems.
PUBLICATION RECORD
- Publication year: 2025
- Venue: 2025 1st International Conference on Emerging Trends in Information Systems and Informatics (ICETISI)
- Publication date: 2025-12-01
- Fields of study: Not labeled
- Source metadata: Semantic Scholar