Contextual Action Recognition with R*CNN

Georgia Gkioxari, Ross B. Girshick, Jitendra Malik

Published 2015 in IEEE International Conference on Computer Vision

ABSTRACT

There are multiple cues in an image which reveal what action a person is performing. For example, a jogger has a pose that is characteristic of jogging, but the scene (e.g. road, trail) and the presence of other joggers can be an additional source of information. In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system. We adapt RCNN to use more than one region for classification while still maintaining the ability to localize the action. We call our system R*CNN. The action-specific models and the feature maps are trained jointly, allowing for action-specific representations to emerge. R*CNN achieves 90.2% mean AP on the PASCAL VOC Action dataset, outperforming all other approaches in the field by a significant margin. Last, we show that R*CNN is not limited to action recognition. In particular, R*CNN can also be used to tackle fine-grained tasks such as attribute classification. We validate this claim by reporting state-of-the-art performance on the Berkeley Attributes of People dataset.
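
The abstract does not spell out how several regions are combined into one decision. As a minimal sketch, assuming each action's score is the score of the person box (the primary region) plus the best score among candidate context boxes (the secondary regions), followed by a softmax over actions, the idea might look like the code below. All function names, feature vectors, and weights are hypothetical toy values chosen for illustration, not the authors' implementation.

```python
import math


def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def action_scores(primary_feat, secondary_feats, w_primary, w_secondary):
    """Score each action as primary-region score + best secondary-region score.

    Assumption: one weight vector per action for the primary region and one
    for the secondary (context) regions; the max picks the most informative
    context box for that action.
    """
    scores = {}
    for action in w_primary:
        primary = dot(w_primary[action], primary_feat)
        context = max(dot(w_secondary[action], f) for f in secondary_feats)
        scores[action] = primary + context
    return scores


def softmax(scores):
    """Turn raw per-action scores into probabilities (numerically stable)."""
    m = max(scores.values())
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Toy 2-D features: one person box and two candidate context boxes.
primary_feat = [1.0, 0.0]
secondary_feats = [[0.0, 1.0], [1.0, 1.0]]

# Hypothetical per-action weights.
w_primary = {"jogging": [2.0, 0.0], "reading": [0.5, 0.0]}
w_secondary = {"jogging": [0.0, 1.0], "reading": [-1.0, 0.0]}

scores = action_scores(primary_feat, secondary_feats, w_primary, w_secondary)
probs = softmax(scores)
```

Because the context term is a maximum over candidate boxes, no annotation of which box carries the context is needed in this sketch; the person box still determines where the action is localized.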

PUBLICATION RECORD

Publication year
2015
Venue
IEEE International Conference on Computer Vision
Publication date
2015-05-05
Fields of study
Computer Science
Identifiers
DOI 10.1109/ICCV.2015.129 arXiv 1505.01197
Source metadata
Semantic Scholar

REFERENCES

Fast R-CNN
2015, cited by this paper
From captions to visual concepts and back
2014, cited by this paper
Finding action tubes
2014, cited by this paper
2D Human Pose Estimation: New Benchmark and State of the Art Analysis
2014, cited by this paper
Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks
2014, cited by this paper
Actions and Attributes from Wholes and Parts
2014, influential reference
Weakly supervised object recognition with convolutional neural networks
2014, cited by this paper
On learning to localize objects with minimal supervision
2014, cited by this paper
Regularized Max Pooling for Image Categorization
2014, cited by this paper
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
2014, cited by this paper
Action Recognition From Weak Alignment of Body Parts
2014, cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014, influential reference
Fine-Grained Activity Recognition with Holistic and Pose Based Features
2014, influential reference
Two-Stream Convolutional Networks for Action Recognition in Videos
2014, cited by this paper
Selective Search for Object Recognition
2013, influential reference
PANDA: Pose Aligned Networks for Deep Attribute Modeling
2013, cited by this paper
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
2013, influential reference
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
Weakly Supervised Learning of Interactions between Humans and Objects
2012, cited by this paper
Combining randomization and discrimination for fine-grained image categorization
2011, cited by this paper
Action recognition from a distributed representation of pose and appearance
2011, cited by this paper
Describing people: A poselet-based approach to attribute classification
2011, cited by this paper
Human action recognition by learning bases of action attributes and parts
2011, cited by this paper
The Pascal Visual Object Classes (VOC) Challenge
2010, influential reference
Object Detection with Discriminatively Trained Part Based Models
2010, influential reference
The role of context in object recognition
2007, cited by this paper
Multiple Instance Boosting for Object Detection
2005, cited by this paper
Histograms of oriented gradients for human detection
2005, cited by this paper
Distinctive Image Features from Scale-Invariant Keypoints
2004, cited by this paper
A Framework for Multiple-Instance Learning
1997, influential reference
Backpropagation Applied to Handwritten Zip Code Recognition
1989, cited by this paper
Scene perception: detecting and judging objects undergoing relational violations.
1982, cited by this paper
Action Recognition with Improved Trajectories
year unknown, cited by this paper

CITED BY

IVEX-WA and IVEX-MetaStack Ensemble Models: A Transfer Learning Approach for Still-Image Human Action Recognition With XAI Visualization
2026, cites this paper
Lightweight Multi-Scale Framework for Human Pose and Action Classification.
2026, influential citation
Understanding Multimodal Complementarity for Single-Frame Action Anticipation
2026, cites this paper
A Dual Mode Detection Method for Unexploded Ordnance Based on YOLOv5 for Low Altitude Unmanned Aerial Vehicle
2025, cites this paper
Enhancing Pharmacy Warehouse Management With Faster R-CNN for Accurate and Reliable Pharmaceutical Product Identification and Counting
2025, cites this paper
Dynamic context learning using multiple visual scanpaths for action classification in still images
2025, cites this paper
Sample Amplification and Model Adversarial Training Method for Optimizing Target Detection and Recognition Performance
2025, cites this paper
Integrated Intelligent System for Automatic Classification and Counting in a Miniature Packaging Conveyor Using Machine Learning and Deep Learning
2025, cites this paper
Multi-classifier information fusion for human activity recognition in healthcare facilities
2025, cites this paper
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
2025, cites this paper
PMSA-YOLO: lightweight vehicle detection with parallel multi-scale aggregation and attention mechanism
2025, cites this paper
MPAR-RCNN: a multi-task network for multiple person detection with attribute recognition
2025, cites this paper
Leveraging Deep Pre-trained Networks for Advanced Skin Lesion Classification for Human Monkeypox Detection
2025, cites this paper
Weather Resilient Object Detection: Focus On Foggy Weather Conditions
2025, cites this paper
Ensemble of Fast R-CNN with Bi-LSTM for Object Detection
2025, cites this paper
A Survey of Human-Object Interaction Detection With Deep Learning
2025, cites this paper
Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?
2025, cites this paper
CNNs, RNNs and Transformers in human action recognition: a survey and a hybrid model
2024, cites this paper
A Survey on Backbones for Deep Video Action Recognition
2024, cites this paper
Out-of-distribution Detection in Dependent Data for Cyber-physical Systems with Conformal Guarantees
2024, cites this paper
OASNet: Object Affordance State Recognition Network With Joint Visual Features and Relational Semantic Embeddings
2024, cites this paper
Automated Medical Image Captioning with Soft Attention-Based LSTM Model Utilizing YOLOv4 Algorithm
2024, cites this paper
An Empirical Study of Mamba-based Pedestrian Attribute Recognition
2024, cites this paper
Two-stage dual-channel driving distraction behavior recognition algorithm based on key point detection
2024, influential citation
AJENet: Adaptive Joints Enhancement Network for Abnormal Behavior Detection in Office Scenario
2024, influential citation
CKTN: Commonsense knowledge transfer network for human activity understanding
2024, cites this paper
Research on construction machinery identification based on improved YOLOv5
2023, cites this paper
Person Attributes Recognition Based on Hierarchical Classification
2023, cites this paper
Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning
2023, cites this paper
DP-Net: Learning Discriminative Parts for Image Recognition
2023, cites this paper
Patch excitation network for boxless action recognition in still images
2023, cites this paper
Still image action recognition based on interactions between joints and objects
2023, cites this paper
Lightning Talk: Trinity - Assured Neuro-symbolic Model Inspired by Hierarchical Predictive Coding
2023, cites this paper
Swin-Fusion: Swin-Transformer with Feature Fusion for Human Action Recognition
2023, cites this paper
Application of Thermal Imaging Based on Improved Faster R-CNN Algorithm in Distribution Cabinet Maintenance
2023, cites this paper
Using Semantic Information for Defining and Detecting OOD Inputs
2023, cites this paper
Scene text understanding: recapitulating the past decade
2023, influential citation
Pedestrian Attribute Recognition via CLIP-Based Prompt Vision-Language Fusion
2023, influential citation
CODiT: Conformal Out-of-Distribution Detection in Time-Series Data for Cyber-Physical Systems
2023, cites this paper
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
2023, cites this paper
Surgical action detection based on path aggregation adaptive spatial network
2023, cites this paper
Body Part Information Additional in Multi-decoder Transformer-Based Network for Human Object Interaction Detection
2023, cites this paper
Relation with Free Objects for Action Recognition
2023, influential citation
Human Action Recognition in Still Images Using ConViT
2023, cites this paper
Open-Category Human-Object Interaction Pre-training via Language Modeling Framework
2023, cites this paper
Zero-Shot Human-Object Interaction (HOI) Classification by Bridging Generative and Contrastive Image-Language Models
2023, cites this paper
LR-Net: A Block-based Convolutional Neural Network for Low-Resolution Image Classification
2022, cites this paper
Study of Object Detection with Faster RCNN
2022, cites this paper
An ensemble approach for still image-based human action recognition
2022, cites this paper
A Block-based Convolutional Neural Network for Low-Resolution Image Classification
2022, cites this paper
Deep Learning for Diabetic Retinopathy Analysis: A Review, Research Challenges, and Future Directions
2022, cites this paper
Multimodal Fusion with Cross-Modal Attention for Action Recognition in Still Images
2022, cites this paper
IoT-based Human Activity Recognition Models based on CNN, LSTM and GRU
2022, cites this paper
Attention Transfer in Self-Regulated Networks for Recognizing Human Actions from Still Images
2022, cites this paper
Multi-person Identification and Localization Algorithm Using RGB-D Image Segmentation
2022, influential citation
Indoor fire detection utilizing computer vision-based strategies
2022, cites this paper
Spatial-Temporal Pyramid Graph Reasoning for Action Recognition
2022, cites this paper
Human Monkeypox Classification from Skin Lesion Images with Deep Pre-trained Network using Mobile Application
2022, cites this paper
Human Activity Recognition using CNN and Pretrained Machine Learning Models
2022, cites this paper
Detection of Degraded Acacia tree species using deep neural networks on uav drone imagery
2022, influential citation
MFANet: Multi-scale feature fusion network with attention mechanism
2022, cites this paper
Development and evaluation of a vision-based transfer learning approach for indoor fire and smoke detection
2022, cites this paper
Multi-Modal Knowledge Graph Construction and Application: A Survey
2022, cites this paper
Runtime Monitoring of Deep Neural Networks Using Top-Down Context Models Inspired by Predictive Processing and Dual Process Theory
2022, cites this paper
Visual Knowledge Learning
2022, cites this paper
Deep-Learning-Based Cyber-Physical System Framework for Real-Time Industrial Operations
2022, cites this paper
Literature review: efficient deep neural networks techniques for medical image analysis
2022, influential citation
Top-down and bottom-up attentional multiple instance learning for still image action recognition
2022, cites this paper
Learning Human-Object Interaction via Interactive Semantic Reasoning
2021, cites this paper
Fight Detection from Still Images in the Wild
2021, cites this paper
Pose-guided model for driving behavior recognition using keypoint action learning
2021, cites this paper
Exploiting Egocentric Cues for Action Recognition for Ambient Assisted Living Applications
2021, cites this paper
Action recognition in still images using a multi-attention guided network with weakly supervised saliency detection
2021, cites this paper
Pose-guided action recognition in static images using lie-group
2021, influential citation
Temporal modelling of first-person actions using hand-centric verb and object streams
2021, cites this paper
Application of image processing and convolutional neural networks for flood image classification and semantic segmentation
2021, cites this paper
Robust Visual Relationship Detection towards Sparse Images in Internet-of-Things
2021, cites this paper
Optimized MobileNet + SSD: a real-time pedestrian detection on a low-end edge device
2021, influential citation
Detecting OODs as datapoints with High Uncertainty
2021, cites this paper
Is Object Detection Necessary for Human-Object Interaction Recognition?
2021, cites this paper
A CV-Based Automatic Method of Acquiring and Processing Operation Data on Construction Site
2021, cites this paper
An Improved Deep Relation Network for Action Recognition in Still Images
2021, influential citation
Egocentric Vision-based Action Recognition: A survey
2021, cites this paper
Static Image Action Recognition with Hallucinated Fine-Grained Motion Information
2021, cites this paper
An Effective Method for Detecting and Classifying Diabetic Retinopathy Lesions Based on Deep Learning
2021, cites this paper
Study on Temperature Variance for SimCLR based Activity Recognition
2021, cites this paper
Activity and Relationship Modeling Driven Weakly Supervised Object Detection
2021, cites this paper
Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior
2021, cites this paper
A review of action recognition based on Convolutional Neural Network
2021, cites this paper
Optimised ARG based Group Activity Recognition for Video Understanding
2021, cites this paper
Intelligent plant cultivation robots based on key marker algorithm and improved A* algorithm
2021, cites this paper
Vision-Based Human Activity Recognition
2021, cites this paper
Transfer learning with fine tuning for human action recognition from still images
2021, cites this paper
Unified Graph Structured Models for Video Understanding
2021, cites this paper
Surgical Action and Instrument Detection Based on Multiscale Information Fusion
2021, cites this paper
The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges and methods
2021, cites this paper
Are all outliers alike? On Understanding the Diversity of Outliers for Detecting OODs
2021, cites this paper
Scene Graph Inference via Multi-Scale Context Modeling
2021, cites this paper
CPS-based manufacturing workcell for the production of hybrid medical devices
2021, cites this paper
Multi-stream pose convolutional neural networks for human interaction recognition in images
2021, cites this paper