Automated Multilingual Content Delivery for the Visually Impaired via AI-Driven Document Parsing

M. Selvaganapathy, N. Nishavithri, P. Prabakaran, R. Nithya, Kanimozhi Rajasekaran, A. B. Joice

Published 2025 in 2025 Fourth International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN)

ABSTRACT

In multilingual societies, accessing printed information remains a serious challenge for visually impaired individuals and for non-native English speakers. This work introduces an extensible framework that combines advanced Optical Character Recognition (OCR) with neural machine translation, designed specifically for English-to-Tamil conversion. To improve recognition reliability, the methodology adds preprocessing steps, including adaptive thresholding and noise reduction, that improve OCR performance across diverse document types. Architecturally, the system builds on recent advances in character recognition, using the open-source Tesseract engine together with contemporary sequence-to-sequence translation models, and its modular design allows extension to additional languages and features. Experimental validation uses a diverse set of datasets and standard metrics such as BLEU for translation fidelity. The system is compared against recent commercial and academic benchmarks, and results are reported with comprehensive statistical analysis. The entire pipeline is implemented for accessible deployment, with emphasis on usability, performance, and extensibility. The findings indicate highly competitive OCR accuracy and reliable translation, making the system particularly suitable for resource-constrained environments and social-inclusion efforts.
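The abstract names adaptive thresholding and noise reduction as the preprocessing applied before Tesseract, but it does not say which algorithms are used. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a median filter for noise reduction and a local-mean-minus-offset rule for adaptive thresholding, works on a grayscale page stored as a list of rows of ints in [0, 255], and uses hypothetical function names (`median_denoise`, `adaptive_threshold`, `preprocess`) and illustrative window radii and offset.

```python
# Minimal sketch (not the authors' code): median-filter noise reduction
# followed by local-mean adaptive thresholding. A grayscale image is a
# list of rows of ints in [0, 255]. Function names, window radii and the
# offset are hypothetical choices for illustration only.
from typing import List

Image = List[List[int]]


def _window(img: Image, r: int, c: int, radius: int) -> List[int]:
    """Pixels in the (2*radius+1) x (2*radius+1) window around (r, c).

    Out-of-range coordinates are clamped to the nearest edge pixel
    (edge replication), so every window has the same size.
    """
    h, w = len(img), len(img[0])
    vals = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            vals.append(img[rr][cc])
    return vals


def median_denoise(img: Image, radius: int = 1) -> Image:
    """Replace each pixel with its window median (suppresses salt-and-pepper noise)."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            vals = sorted(_window(img, r, c, radius))
            row.append(vals[len(vals) // 2])  # window size is odd, so this is the true median
        out.append(row)
    return out


def adaptive_threshold(img: Image, radius: int = 7, offset: float = 10) -> Image:
    """Binarize against a per-pixel threshold: local mean minus `offset`.

    Pixels brighter than the threshold become 255 (background); the rest
    become 0 (ink). Because the threshold follows the local mean, uneven
    lighting across the page does not swamp dark text the way a single
    global threshold can.
    """
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            vals = _window(img, r, c, radius)
            threshold = sum(vals) / len(vals) - offset
            row.append(255 if img[r][c] > threshold else 0)
        out.append(row)
    return out


def preprocess(img: Image, median_radius: int = 1,
               thresh_radius: int = 7, offset: float = 10) -> Image:
    """Denoise first, then binarize, yielding a black-and-white page for OCR."""
    return adaptive_threshold(median_denoise(img, median_radius), thresh_radius, offset)


if __name__ == "__main__":
    # Uniform gray page with one "salt" noise pixel in the middle.
    noisy = [[150] * 5 for _ in range(5)]
    noisy[2][2] = 0
    print(adaptive_threshold(noisy)[2][2])  # 0: noise survives thresholding alone
    print(preprocess(noisy)[2][2])          # 255: the median filter removes it first
```

The ordering matters: thresholding alone turns an isolated noise pixel into a spurious "ink" dot, while denoising first removes it. In a production pipeline, OpenCV's `cv2.medianBlur` and `cv2.adaptiveThreshold` (with `cv2.ADAPTIVE_THRESH_MEAN_C`) perform comparable operations much faster, and the binarized page would then be passed to Tesseract, for example through pytesseract's `image_to_string`.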

PUBLICATION RECORD

Publication year
2025
Venue
2025 Fourth International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN)
Publication date
2025-11-20
Fields of study
Not labeled
Identifiers
DOI 10.1109/ICSTSN67075.2025.11398048
Source metadata
Semantic Scholar

REFERENCES

Image Text-to-Speech Converter with Desired Language Translation (2023; influential reference)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020; cited by this paper)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019; cited by this paper)
Attention is All you Need (2017; cited by this paper)
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (2016; cited by this paper)
Attention-Based Models for Speech Recognition (2015; cited by this paper)
Deep Residual Learning for Image Recognition (2015; cited by this paper)
A Secured Data Transmission Method using Enhanced Proactive Secret Sharing Scheme to Prevent Blackhole Attack in Manets- A Review (2015; cited by this paper)
Neural Machine Translation by Jointly Learning to Align and Translate (2014; cited by this paper)
Deep convolutional neural networks for LVCSR (2013; cited by this paper)
Efficient Estimation of Word Representations in Vector Space (2013; cited by this paper)
The Kaldi Speech Recognition Toolkit (2011; influential reference)
google,我,萨娜 (2006; cited by this paper)
Novel neural modulators (2003; cited by this paper)
