Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision, but they are usually explored separately despite their intrinsically complementary relationship. In this paper, we propose an end-to-end unified model, the Invertible Question Answering Network (iQAN), which introduces question generation as a dual task of question answering to improve VQA performance. With our proposed invertible bilinear fusion module and parameter sharing scheme, iQAN can accomplish VQA and its dual task VQG simultaneously. By being jointly trained on the two tasks with our proposed dual regularizers (termed Dual Training), our model gains a better understanding of the interactions among images, questions and answers. After training, iQAN can take either a question or an answer as input and output the counterpart. Evaluated on the CLEVR and VQA2 datasets, iQAN improves the top-1 accuracy of the prior-art MUTAN VQA method by 1.33% and 0.88% (absolute increase), respectively. We also show that our proposed dual training framework can consistently improve the performance of many popular VQA architectures.
Visual Question Generation as Dual Task of Visual Question Answering
Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang
Published 2017 in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PUBLICATION RECORD
- Publication year
2017
- Venue
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- Publication date
2017-09-21
- Fields of study
Computer Science
- Source metadata
Semantic Scholar
CONCEPTS
- clevr and vqa2 datasets
The benchmark datasets used to evaluate the model on synthetic reasoning and natural-image visual question answering.
Aliases: CLEVR, VQA2
- dual training
A joint optimization setup that trains question answering and question generation together as paired tasks.
- invertible bilinear fusion module
A bilinear fusion component designed to support invertible mapping inside the network.
- iqan
The Invertible Question Answering Network used to connect question answering and question generation in one model.
Aliases: Invertible Question Answering Network
- mutan
A bilinear-fusion-based visual question answering architecture used as the comparison baseline.
Aliases: MUTAN VQA method
- parameter sharing scheme
A design that reuses parameters between the answering and generation paths in the model.
- visual question generation
A task that generates a question from an image and an answer.
Aliases: VQG
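The concepts above (a shared bilinear fusion that can be run in both directions, trained jointly on answering and generation) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tensor shapes, the function names, the use of a transposed contraction for the question direction, and the squared-error losses are hypothetical and are not taken from the paper, whose actual modules and dual regularizers are not described on this page.

```python
# Toy sketch only. Names, shapes, and losses are illustrative assumptions,
# not the architecture or objective of iQAN.

def contract_with_image(T, v):
    """Contract a 3-way tensor T[i][j][k] with image feature v[k] into a matrix M[i][j]."""
    return [
        [sum(T[i][j][k] * v[k] for k in range(len(v))) for j in range(len(T[i]))]
        for i in range(len(T))
    ]

def matvec(M, x):
    """Multiply matrix M by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def vqa_forward(T, v, q):
    """Answer direction: fuse image v and question q into an answer vector."""
    return matvec(contract_with_image(T, v), q)

def vqg_forward(T, v, a):
    """Question direction: reuse the same tensor T (transposed) with answer a."""
    return matvec(transpose(contract_with_image(T, v)), a)

def squared_error(pred, target):
    """Sum of squared differences between two vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def dual_loss(T, v, q, a):
    """Joint objective: answering loss plus generation loss on shared parameters T."""
    return squared_error(vqa_forward(T, v, q), a) + squared_error(vqg_forward(T, v, a), q)

# Hypothetical 2x2x2 parameter tensor and toy feature vectors.
T = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
v = [2, 1]   # image feature
q = [1, 2]   # question embedding
a = [2, 2]   # answer embedding
print(vqa_forward(T, v, q))   # predicted answer vector
print(dual_loss(T, v, q, a))  # combined loss over both directions
```

Because both directions read the same tensor T, a gradient step on the combined loss would update parameters used by answering and generation alike, which is the intuition behind training the two tasks as a pair.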