Scene Text Detection using Hyperbolic Tangent Binarization and CLIP
Published 2024 in International Conference on Robotics, Intelligent Control and Artificial Intelligence
ABSTRACT
Deep learning-based scene text detection has advanced significantly and has promising application prospects. HTBNet++ (the binarization of hyperbolic tangent network plus) is designed to address the problem of missed small text and dense text in scene images. Its backbone network is built on CLIP, which introduces pretrained text and image features to achieve better feature extraction. Text and image prompts play a key role in feature fusion, so that the features of text-image pairs can be better applied to text detection. In addition, an auxiliary segmentation loss is designed to guide backpropagation during training. Experimental results on the Total-Text and TD500 datasets demonstrate that the proposed method significantly enhances text detection accuracy and robustness.
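The abstract names a hyperbolic tangent binarization step but does not give its formula. The following is only a minimal sketch of what such a step could look like, assuming it maps the gap between a predicted text probability and a learned threshold to the range (0, 1) through a steep tanh. The function names, the steepness `k`, and the exact formula are illustrative assumptions, not the paper's definition.

```python
import math

def tanh_binarize(prob, thresh, k=50.0):
    """Soft step on (prob - thresh) using tanh.

    ASSUMPTION: this formula and the default k are illustrative only;
    the paper's actual binarization function is not given in the abstract.
    Returns a value in [0, 1]: near 1 when prob > thresh, near 0 when
    prob < thresh, and exactly 0.5 when the two are equal.
    """
    return 0.5 * (1.0 + math.tanh(k * (prob - thresh)))

def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply tanh_binarize element-wise to two equally shaped 2D lists."""
    return [
        [tanh_binarize(p, t, k) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob_map, thresh_map)
    ]

if __name__ == "__main__":
    # Hypothetical 2x2 probability map and per-pixel threshold map.
    prob_map = [[0.9, 0.1], [0.4, 0.3]]
    thresh_map = [[0.3, 0.3], [0.3, 0.3]]
    for row in binarize_map(prob_map, thresh_map):
        print([round(v, 3) for v in row])
```

Because the function is smooth everywhere, it can stand in for a hard threshold during training while still passing gradients to both the probability and the threshold. Note that 0.5 * (1 + tanh(x)) is identical to the logistic sigmoid of 2x, so this particular form differs from a sigmoid-based step only in its effective steepness.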
PUBLICATION RECORD
- Publication year: 2024
- Venue: International Conference on Robotics, Intelligent Control and Artificial Intelligence
- Publication date: 2024-12-06
- Fields of study: Not labeled
- Source metadata: Semantic Scholar