Vision-to-Voice: Enhanced CNN-LSTM-Based Image Captioning with Assistive Text-to-Speech

Arunya Paul, Tejaswini Kar, S. Pahadsingh, Alokita Paul, Shruti

Published 2025 in 2025 IEEE 2nd International Conference on Green Industrial Electronics and Sustainable Technologies (GIEST)

ABSTRACT

Image captioning is a multidisciplinary task that combines computer vision and natural language processing to generate descriptive text for images automatically. This paper presents an approach that uses Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for enhanced image captioning, trained on the Flickr8k dataset, which provides diverse real-world images, each with multiple human-annotated captions. Our work differentiates itself by optimizing feature extraction and sequence generation to improve contextual accuracy and fluency. A CNN extracts essential visual features, which an LSTM-based decoder then processes to generate coherent and meaningful captions while retaining contextual information. Additionally, we introduce an assistive text-to-voice feature that reads the generated captions aloud, making the system more accessible to visually impaired users. Experimental results demonstrate improved caption quality compared to existing approaches. This framework has broad applications, from assistive technologies to multimedia content enrichment, and advances semantic understanding and human-computer interaction.
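
The pipeline described in the abstract (image features in, caption out, caption read aloud) can be illustrated with a minimal sketch. The abstract does not state the decoding strategy or the speech engine, so everything here is an assumption for illustration: greedy decoding is used as one common choice, `toy_step` is a hypothetical stand-in for a trained LSTM decoder, `speak` is a hypothetical callback where a text-to-speech engine would be plugged in, and the `<start>`/`<end>` markers are a conventional but assumed token scheme.

```python
from typing import Callable, Dict, List, Sequence

START, END = "<start>", "<end>"  # assumed sequence markers, not from the paper


def greedy_caption(
    features: Sequence[float],
    step: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Build a caption by repeatedly taking the highest-scoring next word."""
    tokens = [START]
    for _ in range(max_len):
        scores = step(features, tokens)      # decoder scores for the next word
        word = max(scores, key=scores.get)   # greedy choice
        if word == END:
            break
        tokens.append(word)
    return " ".join(tokens[1:])              # drop the start marker


def toy_step(features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    """Hypothetical decoder stub: replays a fixed caption and ignores features."""
    script = ["a", "dog", "runs", END]
    target = script[min(len(tokens) - 1, len(script) - 1)]
    return {w: (1.0 if w == target else 0.0) for w in script}


def describe(
    features: Sequence[float],
    step: Callable[[Sequence[float], List[str]], Dict[str, float]],
    speak: Callable[[str], None] = print,
) -> str:
    """Generate a caption and hand it to a speech callback (print by default)."""
    caption = greedy_caption(features, step)
    speak(caption)
    return caption


if __name__ == "__main__":
    # The feature vector would come from a CNN encoder; a dummy list is used here.
    describe([0.0, 0.0, 0.0], toy_step)
```

In a real system, `step` would wrap the trained decoder conditioned on CNN features, and `speak` would wrap whichever text-to-speech engine is available. Keeping both as plain callables lets the decoding loop be exercised without a model or audio device.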

PUBLICATION RECORD

  • Publication year

    2025

  • Venue

    2025 IEEE 2nd International Conference on Green Industrial Electronics and Sustainable Technologies (GIEST)

  • Publication date

    2025-10-11

  • Fields of study

    Not labeled

  • Source metadata

    Semantic Scholar
