Coherent Visual Description of Textual Instructions

Shashank Mujumdar, Nitin Gupta, Abhinav Jain, S. Mehta

Published 2017 in IEEE International Symposium on Multimedia

ABSTRACT

Text is the easiest means of recording information, but it is not always the best means of understanding a concept. Psychological theories argue that information presented visually provides a better means of understanding a concept. While techniques exist for generating text from a given image, the inverse problem, automatically fetching coherent images to represent a given set of instructions (a sequence of text), is a hard one. In this paper, we present a novel multistage framework to convert textual instructions into coherent visual descriptions (text instructions annotated with images). The key components of the proposed approach are: (i) a novel framework that combines text and image analysis to generate visual descriptions; and (ii) a method that ensures coherency across visual descriptions using a combination of deep learning and a graph-based approach. The effectiveness of the proposed approach is shown through a user study on a dataset of instructions and corresponding images collected from the WikiHow website.
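
The abstract does not spell out how the graph-based coherency step works. As a rough illustration only, and not the authors' actual method, one way to picture it is as a shortest path through a layered graph: each instruction step contributes a layer of candidate images, each candidate is represented by a feature vector (for example, from a convolutional network), and edges between consecutive layers are weighted by feature distance. The function names, the cosine distance, and the toy vectors below are all assumptions made for this sketch.

```python
import heapq
import math


def cosine_distance(u, v):
    """Cosine distance in [0, 2]; clamped at 0 to guard against rounding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def select_coherent_images(candidates):
    """Pick one candidate per step so consecutive picks are visually similar.

    candidates: list (one entry per instruction step) of lists of feature
    vectors. Returns the chosen candidate index for each step.
    Hypothetical helper: runs Dijkstra over a layered graph whose nodes
    are (step, candidate_index) pairs.
    """
    if not candidates or any(not layer for layer in candidates):
        raise ValueError("every step needs at least one candidate image")

    last_step = len(candidates) - 1
    dist = {}
    prev = {}
    heap = []
    # Every candidate of the first step is a zero-cost starting node.
    for j in range(len(candidates[0])):
        dist[(0, j)] = 0.0
        heapq.heappush(heap, (0.0, 0, j))

    end_node = None
    while heap:
        d, step, j = heapq.heappop(heap)
        if d > dist.get((step, j), math.inf):
            continue  # stale heap entry
        if step == last_step:
            end_node = (step, j)  # first settled node in the last layer is optimal
            break
        for k, feat in enumerate(candidates[step + 1]):
            nd = d + cosine_distance(candidates[step][j], feat)
            if nd < dist.get((step + 1, k), math.inf):
                dist[(step + 1, k)] = nd
                prev[(step + 1, k)] = (step, j)
                heapq.heappush(heap, (nd, step + 1, k))

    # Walk the predecessor links back to the first step.
    path = []
    node = end_node
    while node is not None:
        path.append(node[1])
        node = prev.get(node)
    path.reverse()
    return path


# Toy example: three steps with 1, 2, and 2 candidate images respectively.
toy_candidates = [
    [[1.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.1]],
    [[0.0, 1.0], [1.0, 0.0]],
]
chosen = select_coherent_images(toy_candidates)
```

Because cosine distance is non-negative, Dijkstra's algorithm applies directly; a real system would presumably also weigh how well each image matches its own instruction text, which this sketch leaves out.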

PUBLICATION RECORD

Publication year: 2017
Venue: IEEE International Symposium on Multimedia
Publication date: 2017-12-01
Fields of study: Computer Science
Identifiers: DOI 10.1109/ISM.2017.26
Source metadata: Semantic Scholar

REFERENCES

Feature Representation in Convolutional Neural Networks (2015)
Ranking and retrieval of image sequences from multiple paragraph queries (2015)
StorVi (Story Visualization): A Text-to-Image Conversion (2014)
Vishit: A Visualizer for Hindi Text (2014)
Deep visual-semantic alignments for generating image descriptions (2014)
A Fast and Accurate Dependency Parser using Neural Networks (2014)
Scene layout in text-to-scene conversion (2014)
ImageNet Large Scale Visual Recognition Challenge (2014)
Chat with illustration: a chat system with visual aids (2012)
Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task (2011)
Video CooKing: Towards the Synthesis of Multimedia Cooking Recipes (2011)
Enriching textbooks with images (2011)
Spatial Relations in Text-to-Scene Conversion (2010)
Multimedia Supplementation to a Cooking Recipe Text for Facilitating Its Understanding to Inexperienced Users (2010)
Automatic Conversion of Natural Language to 3D Animation (2006)
Generating A 3D Simulation Of A Car Accident From A Written Description In Natural Language: The CarSim System (2001)
WordsEye: an automatic text-to-scene conversion system (2001, influential reference)
Put: language-based interactive manipulation of objects (1996)
NAtural Language driven Image Generation (1984)
A note on two problems in connexion with graphs (1959)
