LVM-OCR: A Transformer-Based Architecture for Context-Aware Document Understanding

Bay Nguyen Van, V. Hoang, Kiet Tran Trung, Nghia Dinh, H. H. Thien

Published 2025 in 2025 1st International Conference on Emerging Trends in Information Systems and Informatics (ICETISI)

ABSTRACT

As digital transformation increases the demand for seamless document processing, we introduce LVM-OCR, an approach to Optical Character Recognition (OCR) built on Large Vision Models. The framework moves beyond traditional pattern matching toward visual comprehension of text. By combining transformer-based architectures with text-awareness mechanisms, the system does not just recognize characters: it understands context, adapts to complexity, and learns from ambiguity. In extensive experiments, LVM-OCR achieves a 15.3% improvement in accuracy over conventional methods, with particularly strong results on degraded historical documents, multilingual texts, and complex layouts that confound traditional systems. We regard this as more than an incremental improvement: it rethinks what OCR technology can achieve.

PUBLICATION RECORD

  • Publication year

    2025

  • Venue

    2025 1st International Conference on Emerging Trends in Information Systems and Informatics (ICETISI)

  • Publication date

    2025-12-01

  • Fields of study

    Not labeled

  • Source metadata

    Semantic Scholar