Understanding videos, constructing plots: learning a visually grounded storyline model from annotated videos

A. Gupta, Praveen Srinivasan, Jianbo Shi, L. Davis

Published in the 2009 IEEE Conference on Computer Vision and Pattern Recognition

ABSTRACT

Analyzing videos of human activities involves not only recognizing actions (typically based on their appearances), but also determining the story/plot of the video. The storyline of a video describes causal relationships between actions. Beyond recognition of individual actions, discovering causal relationships helps to better understand the semantic meaning of the activities. We present an approach to learn a visually grounded storyline model of videos directly from weakly labeled data. The storyline model is represented as an AND-OR graph, a structure that can compactly encode storyline variation across videos. The edges in the AND-OR graph correspond to causal relationships which are represented in terms of spatio-temporal constraints. We formulate an Integer Programming framework for action recognition and storyline extraction using the storyline model and visual groundings learned from training data.
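The abstract's point that an AND-OR graph can compactly encode storyline variation can be illustrated with a minimal Python sketch. It is not the authors' model: it omits the spatio-temporal constraints on edges and the learned visual groundings, assumes the children of an AND node occur in a fixed order, and uses hypothetical action labels (`pitch`, `hit`, and so on) and hypothetical helper names (`storylines`, `count_storylines`). An OR node chooses exactly one child and an AND node requires all of its children, so the number of storylines is a sum at OR nodes and a product at AND nodes.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union


@dataclass
class Action:
    """Terminal node: a single (hypothetical) action label."""
    name: str


@dataclass
class And:
    """AND node: every child occurs; this sketch assumes children are ordered."""
    children: List["Node"]


@dataclass
class Or:
    """OR node: exactly one child is chosen for a given video."""
    children: List["Node"]


Node = Union[Action, And, Or]


def storylines(node: Node) -> List[List[str]]:
    """Enumerate every storyline (action sequence) the graph encodes."""
    if isinstance(node, Action):
        return [[node.name]]
    if isinstance(node, Or):
        # An OR node contributes the union of its children's storylines.
        return [s for child in node.children for s in storylines(child)]
    # An AND node concatenates one storyline from each child, in every combination.
    result = []
    for parts in product(*(storylines(c) for c in node.children)):
        result.append([a for part in parts for a in part])
    return result


def count_storylines(node: Node) -> int:
    """Count storylines without enumerating them: sum at OR, product at AND."""
    if isinstance(node, Action):
        return 1
    if isinstance(node, Or):
        return sum(count_storylines(c) for c in node.children)
    total = 1
    for c in node.children:
        total *= count_storylines(c)
    return total


# Hypothetical graph: a pitch, then either a miss, or a hit followed by a catch or a run.
graph = And([
    Action("pitch"),
    Or([
        Action("miss"),
        And([Action("hit"), Or([Action("catch"), Action("run")])]),
    ]),
])

print(count_storylines(graph))  # 3
for s in storylines(graph):
    print(" -> ".join(s))
```

The counting function shows why the representation is compact: nested OR choices multiply, so a small graph can stand for many distinct storylines without listing each one explicitly.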

PUBLICATION RECORD

Publication year: 2009
Venue: 2009 IEEE Conference on Computer Vision and Pattern Recognition
Publication date: 2009-06-20
Fields of study: Computer Science
Identifiers: DOI 10.1109/CVPR.2009.5206492
Source metadata: Semantic Scholar

CLAIMS

No claims are published for this paper.

CONCEPTS

No concepts are published for this paper.

REFERENCES

Semantic event representation and recognition using syntactic attribute graph grammar (2009)
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words (2008)
Event Modeling and Recognition Using Markov Logic Networks (2008)
Max Margin AND/OR Graph learning for parsing the human body (2008)
Learning realistic human actions from movies (2008)
Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers (2008)
Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion (2008)
Objects in Action: An Approach for Combining Action Understanding and Object Perception (2007)
Situated Models of Meaning for Sports Video Retrieval (2007)
Time as a guide to cause (2006)
"Hello! My name is... Buffy" - Automatic Naming of Characters in TV Video (2006)
Composite Templates for Cloth Modeling and Sketching (2006)
On Space-Time Interest Points (2005)
Histograms of oriented gradients for human detection (2005)
Protocols from perceptual observations (2005)
Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks (2004)
Recognition of group activities using dynamic probabilistic networks (2003)
Matching Words and Pictures (2003)
Extracting actors, actions and events from sports video - a fundamental approach to story tracking (2000)
Maximum Entropy Markov Models for Information Extraction and Segmentation (2000)
Parametric Hidden Markov Models for Gesture Recognition (1999)
The Bayesian Structural EM Algorithm (1998)
Action recognition using probabilistic parsing (1998)
The Mathematics of Statistical Machine Translation: Parameter Estimation (1993; influential reference)
Heuristics: intelligent search strategies for computer problem solving (1984)
Causality: Models, Reasoning, and Inference (year unknown)

CITED BY

DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization (2025)
From Videos to Indexed Knowledge Graphs - Framework to Marry Methods for Multimodal Content Analysis and Understanding (2025)
Video Captioning Method Based on Semantic Topic Association (2025)
Enhancing Auto-Generated Baseball Highlights via Win Probability and Bias Injection Method (2024; influential citation)
Multi-scale features with temporal information guidance for video captioning (2024)
Video emotional description with fact reinforcement and emotion awaking (2024)
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos (2023)
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations (2023)
Deep sequential collaborative cognition of vision and language based model for video description (2023)
Contact Part Detection From 3D Human Motion Data Using Manually Labeled Contact Data and Deep Learning (2023)
Multi-sentence video captioning using spatial saliency of video frames and content-oriented beam search algorithm (2023)
Embedded Cognition in Virtual Environments: An Ecological Approach to AI Study (2023)
Visual and language semantic hybrid enhancement and complementary for video description (2022)
Weakly-Supervised Generation and Grounding of Visual Descriptions with Conditional Generative Models (2022)
Video captioning based on vision transformer and reinforcement learning (2022)
Event prediction with rough-fuzzy sets (2022)
Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction (2021)
Vision-Based Human Activity Recognition (2021)
Modeling Long-Term Interactions to Enhance Action Recognition (2021)
A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics (2021)
Rough video conceptualization for real-time event precognition with motion entropy (2021)
Hybrid Reasoning Network for Video-based Commonsense Captioning (2021)
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics (2021)
Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing (2021)
An Automatic Detection of Fundamental Postures in Vietnamese Traditional Dances (2020)
Group Activity Recognition by Using Effective Multiple Modality Relation Representation With Temporal-Spatial Attention (2020)
A Generalized Earley Parser for Human Activity Parsing and Prediction (2020)
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning (2020)
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning (2020)
Artificial Bee Colony Algorithm Optimization for Video Summarization on VSUMM Dataset (2020)
Understanding Human Context in 3D Scenes by Learning Spatial Affordances with Virtual Skeleton Models (2019)
Human Activity Recognition Using Accelerometer Data with Multi Class SVM (2019)
Domain Specific and Idiom Adaptive Video Summarization (2019)
Automatic Motion Based Person Activity Recognition in Video Surveillances (2019)
Training Algorithms for Multiple Object Tracking (2019)
Visual to Text: Survey of Image and Video Captioning (2019)
A Survey on Human Group Activity Recognition by Analysing Person Action from Video Sequences Using Machine Learning Techniques (2019)
Visual Understanding through Natural Language (2019)
Visual Semantic Information Pursuit: A Survey (2019)
Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction (2018)
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos (2018)
Data-Driven Visual Forecasting (2018)
Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration (2018)
Unsupervised Activity Learning and Parsing Learned Action 1: Selected Visual Atoms: Selected Language Atoms (2018)
Learning to manipulate novel objects for assistive robots (2017)
TPC: Temporal Preservation Convolutional Networks for Precise Temporal Action Localization (2017)
The presentation of hidden Markov model for forecasting, discovering and extracting the human activities (2017)
HMM-based Activity Recognition with a Ceiling RGB-D Camera (2017)
The Role of Synchronic Causal Conditions in Visual Knowledge Learning (2017)
A framework of mining semantic-based probabilistic event relations for complex activity recognition (2017)
A novel algorithm to predict and detect suspicious behaviors of people at public areas for surveillance cameras (2017)
Learning Latent Super-Events to Detect Multiple Activities in Videos (2017)
Learning discriminative context models for concurrent collective activity recognition (2017)
Learning social affordance grammar from videos: Transferring human interactions to human-robot interactions (2017)
Predicting Human Activities Using Stochastic Grammar (2017)
A Review on Machine Learning Algorithms on Human Action Recognition (2017)
Pose-invariant action recognition for automated behaviour analysis (2017)
Attention-Based Two-Phase Model for Video Action Detection (2017)
Towards a Knowledge-Based Approach for Generating Video Descriptions (2017)
Unsupervised Video Understanding by Reconciliation of Posture Similarities (2017)
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos (2017)
Semantic Pooling for Complex Event Analysis in Untrimmed Videos (2017)
Five Challenges for Intelligent Cinematography and Editing (2017)
A Survey of Content-Aware Video Analysis for Sports (2017)
Automatic group activity annotation for mobile videos (2017)
Deep embodiment: grounding semantics in perceptual modalities (2017)
Privacy-Respecting Smart Video Surveillance Based on Usage Control Enforcement (2016)
Cognitive Architecture for Adaptive Social Robotics (2016; influential citation)
Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research (2016)
Scene Person Structure Inference Machine Structure Inference Machine Structure Inference Machine Walking ? Waiting ? Waiting Waiting Waiting Walking Waiting Waiting Walking Waiting (2016)
Learning Visual Storylines with Skipping Recurrent Neural Networks (2016)
2D and 3D tracking and modelling (2016)
Learning and Inferring Perceptual Causality from Video (2016)
Learning Interactive Affordance for Human-Robot Interaction (2016)
Automatic Analysis of Cricket And Soccer Broadcast Videos (2016)
Automatic Localization and Annotation of Spatio-Temporal Actions in Weakly Labelled Videos (2016)
Exploring deep learning based solutions in fine grained activity recognition in the wild (2016)
Affordance-map: learning hidden human context in 3D scenes through virtual human models (2016)
Multi-modal human aggression detection (2016)
Sum Product Networks for Activity Recognition (2016)
Detección de objetos en entornos dinámicos para videovigilancia [Object detection in dynamic environments for video surveillance] (2016)
Label-Based Automatic Alignment of Video with Narrative Sentences (2016)
What are the Limits to Time Series Based Recognition of Semantic Concepts? (2016)
Structured Representation Using Latent Variable Models (2016)
Abstraction hierarchy and self annotation update for fine grained activity recognition (2016)
Automatic Human Activity Segmentation and Labeling in RGBD Videos (2016)
A new method for violence detection in surveillance scenes (2016)
Learning from large-scale visual data for robots (2016)
Exploring semantic concepts for complex event analysis in unconstrained video clips (2016)
Learning Social Affordance for Human-Robot Interaction (2016)
Story Understanding through Semantic Analysis and Automatic Alignment of Text and Video (2016)
Main objects interaction activity recognition in real images (2016)
Unsupervised Semantic Action Discovery from Video Collections (2016)
Computational Methods for Integrating Vision and Language (2016)
A survey on using domain and contextual knowledge for human activity recognition in video streams (2016)
p-Laplacian Regularized Sparse Coding for Human Activity Recognition (2016)
A Principled Framework for General Adaptive Social Robotics (2016)
Reliable Workspace Monitoring in Safe Human-Robot Environment (2016)
Connecting Images and Natural Language (2016; doctoral dissertation, Department of Computer Science, Stanford University)
Recognizing Car Fluents from Video (2016)