Detecting Events and Key Actors in Multi-person Videos

Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander N. Gorban, K. Murphy, Li Fei-Fei

Published 2015 in Computer Vision and Pattern Recognition

ABSTRACT

Multi-person event recognition is a challenging task, often with many people active in the scene but only a small subset contributing to an actual event. In this paper, we propose a model which learns to detect events in such videos while automatically "attending" to the people responsible for the event. Our model does not use explicit annotations regarding who or where those people are during training and testing. In particular, we track people in videos and use a recurrent neural network (RNN) to represent the track features. We learn time-varying attention weights to combine these features at each time-instant. The attended features are then processed using another RNN for event detection/classification. Since most video datasets with multiple people are restricted to a small number of videos, we also collected a new basketball dataset comprising 257 basketball games with 14K event annotations corresponding to 11 event classes. Our model outperforms state-of-the-art methods for both event classification and detection on this new dataset. Additionally, we show that the attention mechanism is able to consistently localize the relevant players.
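The attention step described in the abstract (combining per-person track features with weights at a single time instant) can be illustrated with a minimal sketch. This is not the paper's formulation: the function names `softmax` and `attend`, the hand-picked scores, and the toy 2-D features are all assumptions made for illustration. In a learned model the scores would be produced by a trained network and would change over time.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(player_features, scores):
    """Pool per-player feature vectors into one vector using attention weights.

    player_features: list of equal-length feature vectors, one per tracked player.
    scores: one unnormalized relevance score per player (hypothetical here;
            a trained model would compute these rather than take them as input).
    """
    weights = softmax(scores)
    dim = len(player_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, player_features))
        for d in range(dim)
    ]
    return weights, pooled


# Toy example for a single time step: three players with 2-D features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, 0.0]  # assumed scores; player 0 is treated as most relevant
weights, pooled = attend(features, scores)
```

The weights sum to one, so the pooled vector is a convex combination of the player features; a player with a higher score contributes more to the representation that a downstream event classifier would see.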

PUBLICATION RECORD

Publication year
2015
Venue
Computer Vision and Pattern Recognition
Publication date
2015-11-09
Fields of study
Computer Science
Identifiers
DOI: 10.1109/CVPR.2016.332; arXiv: 1511.02917
Source metadata
Semantic Scholar

