SSGR-AR: Semantic-Enhanced Scene Graph Reasoning for Robust Video Action Recognition
Published 2025 in 2025 IEEE International Conference on Knowledge Graph (ICKG)
ABSTRACT
Due to the inherent complexity of video data, video action recognition faces significant challenges in modeling spatial-temporal dynamics and handling diverse scene contexts. Although scene graph-based methods can effectively model interactions between entities, most existing approaches overlook the rich semantic information embedded within scene graphs. Additionally, integrating large language models (LLMs) for semantic enhancement often suffers from hallucination, potentially introducing incorrect reasoning that misleads action recognition. To address these limitations, we propose SSGR-AR, a novel framework that structurally represents videos through scene graphs and constrains LLM reasoning using structured semantic paths derived from scene graph knowledge, ensuring controllable and reliable semantic enrichment. Moreover, we formulate entity alignment as a link prediction task and leverage a graph transformer to model the dynamic evolution of actions, thereby enhancing the model's capacity for long-term temporal reasoning. Experimental results on three widely used benchmark datasets show that our method outperforms state-of-the-art methods in both action recognition accuracy and generalization robustness.
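The abstract's idea of casting cross-frame entity alignment as link prediction can be sketched minimally as follows. This is an illustrative assumption, not the authors' implementation: entity names, embeddings, the cosine score, and the threshold are all hypothetical stand-ins for the paper's learned graph-transformer scoring.

```python
# Hedged sketch: entity alignment as link prediction. Entities detected in
# consecutive frames are represented as embedding vectors; a link ("same
# entity") is predicted when a similarity score clears a threshold. The
# similarity function and greedy matching are illustrative assumptions.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_links(frame_t, frame_t1, threshold=0.8):
    """Greedily link each entity in frame t to its best-scoring candidate
    in frame t+1, keeping the link only if the score clears the threshold."""
    links = []
    for name_t, emb_t in frame_t.items():
        best_name, best_emb = max(
            frame_t1.items(), key=lambda kv: cosine(emb_t, kv[1])
        )
        if cosine(emb_t, best_emb) >= threshold:
            links.append((name_t, best_name))
    return links

# Toy example: embeddings of the same real-world entity are near-identical
# across frames, so both entities are linked despite renamed detections.
frame_t = {"person": [1.0, 0.1, 0.0], "cup": [0.0, 1.0, 0.2]}
frame_t1 = {"person_a": [0.9, 0.1, 0.0], "mug": [0.1, 1.0, 0.1]}
print(predict_links(frame_t, frame_t1))
```

In the paper's setting, the cosine score would be replaced by a learned link-prediction head over graph-transformer node representations, which also lets the model track how entity interactions evolve over longer time spans.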
PUBLICATION RECORD
- Publication year
2025
- Venue
2025 IEEE International Conference on Knowledge Graph (ICKG)
- Publication date
2025-11-13
- Source metadata
Semantic Scholar