Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, S. Ullman
Published 2015 in Conference on Empirical Methods in Natural Language Processing
ABSTRACT
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene that depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic, and discourse ambiguities, coupled with videos that visualize the different interpretations of each sentence. We address this task by extending a vision model that determines whether a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing us to disambiguate sentences in a unified fashion across the different ambiguity types.
PUBLICATION RECORD
- Publication year: 2015
- Venue: Conference on Empirical Methods in Natural Language Processing
- Publication date: 2015-09-01
- Fields of study: Linguistics, Computer Science
- Source metadata: Semantic Scholar