Global Entity Relationship Enhancement Network for Multimodal Sarcasm Detection
Xiaobao Wang, Meng Ge, Lingshan Li, Di Jin, Kai He, Erik Cambria
Published 2026 in IEEE Transactions on Affective Computing

ABSTRACT
Sarcasm is a distinct mode of communication in which the intended meaning runs contrary to the literal one. With the rapid proliferation of social networks, multimodal sarcasm has become widespread, and detecting sarcasm conveyed through multimodal data has drawn growing attention. Existing research often interprets images only superficially, without thoroughly exploring the contextual cues they contain, particularly the scene an image depicts. In this paper, we delve deeper into the information embedded in images. Specifically, we first extract entity relationships from images to capture the contextual information they convey, and we additionally apply image text recognition to extract textual content from the images. After comprehensively analyzing the image information, we model the consistency between image content and text with the help of external knowledge. Finally, we employ a graph neural network to process the constructed cross-modal graph and predict sarcasm. Extensive experiments validate the state-of-the-art performance of our model on publicly available multimodal Twitter datasets.
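The abstract outlines a four-stage pipeline: entity-relationship extraction from images, image text recognition, knowledge-assisted image-text consistency modeling, and prediction with a graph neural network over a cross-modal graph. The Python sketch below illustrates only the final stage, a small graph-convolutional classifier over a cross-modal graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the GCN layers, node features, toy graph, and dimensions are all hypothetical stand-ins.

# Minimal sketch of the final pipeline stage described in the abstract,
# NOT the authors' released code. All module names, dimensions, and the
# toy graph below are hypothetical stand-ins; the paper's actual
# entity-relation extractor, image text recognition component, and
# external-knowledge consistency module are not specified here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: normalized adjacency times node features."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Symmetrically normalize the adjacency (with self-loops) so that
        # aggregation averages over neighbors instead of summing.
        adj = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        adj_norm = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        return F.relu(self.linear(adj_norm @ x))

class CrossModalSarcasmClassifier(nn.Module):
    """Two GCN layers over a cross-modal graph, mean-pooled into a
    binary logit pair (non-sarcastic vs. sarcastic)."""
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(feat_dim, hidden_dim)
        self.gcn2 = SimpleGCNLayer(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, 2)

    def forward(self, node_feats, adj):
        h = self.gcn1(node_feats, adj)
        h = self.gcn2(h, adj)
        return self.classifier(h.mean(dim=0))  # graph-level prediction

# Toy cross-modal graph: nodes 0-2 stand for caption tokens, node 3 for an
# image entity, node 4 for text recognized inside the image; edges mark
# cross-modal links that the consistency-modeling stage would supply.
feats = torch.randn(5, 32)  # placeholder node embeddings
adj = torch.zeros(5, 5)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]:
    adj[i, j] = adj[j, i] = 1.0

model = CrossModalSarcasmClassifier(feat_dim=32)
logits = model(feats, adj)
print(logits)  # two unnormalized scores: [non-sarcastic, sarcastic]

In a full system, the node features would come from the extracted image entities, the recognized in-image text, and the caption tokens, with edge weights derived from the external-knowledge consistency modeling; here both are random placeholders.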