Transformer-Based Bidirectional Attention Network for Segmentation-Free Word-Level Text Recognition with Overlapping Characters

A. Pandey, Arun Kumar Shukla

Published 2025 in 2025 International Conference on Electronics and Computing, Communication Networking Automation Technologies (ICEC2NT)

ABSTRACT

Word-level text recognition remains a challenging task in document and scene analysis, particularly for overlapping characters, where conventional segmentation-based methods fail. This paper proposes a segmentation-free approach to text recognition based on a transformer architecture with bidirectional attention. The model places an encoder-decoder transformer on top of a convolutional visual feature extraction module, enabling global contextual understanding across character sequences. Rather than relying on temporal convolutional networks or recurrent units, the method incorporates bidirectional attention mechanisms that improve sequence alignment and character prediction without explicit segmentation. We evaluate the model on benchmark datasets of synthetic and real-world overlapping words and observe marked improvements over previous CNN-CTC and TCN methods in accuracy and robustness. The architecture advances the state of the art in end-to-end text recognition under complex visual distortions and highlights the promise of transformer-based solutions for modeling fine-grained character interactions.
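The "bidirectional attention" the abstract refers to can be read as standard scaled dot-product attention applied without a causal mask, so every character position attends to visual features on both its left and its right. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the feature shapes and names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(q, k, v):
    # Scaled dot-product attention with NO causal mask:
    # every query position may attend to every key position,
    # which is what makes the attention "bidirectional".
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # (T_q, T_k) alignment scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights

# Toy example: 5 "visual feature" vectors, standing in for the
# column features a CNN backbone might extract from a word image.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))          # (sequence length 5, model dim 8)
out, attn = bidirectional_attention(feats, feats, feats)
```

Because no mask restricts the score matrix, a prediction for an overlapped character can draw on context from the whole word at once, which is the property the paper contrasts with left-to-right recurrent or CTC-style decoding.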

PUBLICATION RECORD

  • Publication year

    2025

  • Venue

    2025 International Conference on Electronics and Computing, Communication Networking Automation Technologies (ICEC2NT)

  • Publication date

    2025-09-03

  • Fields of study

    Not labeled


  • Source metadata

    Semantic Scholar



CITED BY

  • No citing papers are available for this paper.
