Temporal Pyramid Pooling-Based Convolutional Neural Network for Action Recognition

Peng Wang, Yuanzhouhan Cao, Chunhua Shen, Lingqiao Liu, Heng Tao Shen

Published 2015 in IEEE Transactions on Circuits and Systems for Video Technology (Print)

ABSTRACT

Encouraged by the success of convolutional neural networks (CNNs) in image classification, much recent effort has been spent on applying CNNs to video-based action recognition. One challenge is that a video contains a varying number of frames, which is incompatible with the fixed-size input that standard CNNs expect. Existing methods handle this issue either by sampling a fixed number of frames or by sidestepping it with a 3D convolutional layer that performs convolution in the spatio-temporal domain. In this paper, we propose a novel network structure that accepts an arbitrary number of frames as input. The key to our solution is a module consisting of an encoding layer and a temporal pyramid pooling layer. The encoding layer maps the activations from the previous layers to a feature vector suitable for pooling, whereas the temporal pyramid pooling layer converts multiple frame-level activations into a fixed-length video-level representation. In addition, we adopt a feature concatenation layer that combines appearance and motion information. Compared with the frame sampling strategy, our method avoids the risk of missing important frames. Compared with the 3D convolutional method, which requires a huge video data set for network training, our model can be learned on a small target data set because an off-the-shelf image-level CNN can be used to initialize its parameters. Experiments on three challenging data sets, Hollywood2, HMDB51, and UCF101, demonstrate the effectiveness of the proposed network.
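To make the fixed-length idea concrete, here is a minimal sketch of temporal pyramid pooling over frame-level features. The abstract does not specify the pyramid levels, the pooling operator, or where the appearance and motion features are concatenated, so the sketch assumes element-wise max pooling over 1, 2, and 4 equal temporal segments and concatenates the two streams after pooling. The encoding layer is omitted (frame features are assumed to be already encoded), and the names `temporal_pyramid_pool` and `video_descriptor` are hypothetical, not the authors' code.

```python
from typing import List, Sequence

Vector = List[float]


def temporal_pyramid_pool(frames: Sequence[Vector], levels: Sequence[int] = (1, 2, 4)) -> Vector:
    """Pool T frame-level feature vectors into one fixed-length vector.

    Assumption: at each pyramid level with n segments, the frames are split
    into n contiguous, roughly equal temporal segments, each segment is
    max-pooled element-wise, and all pooled vectors are concatenated.
    Output length is dim * sum(levels), independent of T.
    """
    if not frames:
        raise ValueError("need at least one frame")
    t = len(frames)
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frame features must have the same dimension")
    pooled: Vector = []
    for n in levels:
        for s in range(n):
            # Segment [start, end); forcing end > start means a clip shorter
            # than n frames still gives every segment at least one frame.
            start = (s * t) // n
            end = max(((s + 1) * t) // n, start + 1)
            segment = frames[start:end]
            pooled.extend(max(f[d] for f in segment) for d in range(dim))
    return pooled


def video_descriptor(appearance: Sequence[Vector], motion: Sequence[Vector],
                     levels: Sequence[int] = (1, 2, 4)) -> Vector:
    """Hypothetical fusion: pool each stream separately, then concatenate."""
    return temporal_pyramid_pool(appearance, levels) + temporal_pyramid_pool(motion, levels)


if __name__ == "__main__":
    short_clip = [[0.1, 0.5], [0.9, 0.2], [0.3, 0.7]]               # 3 frames, 2-D features
    long_clip = [[float(i % 5), float(i % 3)] for i in range(40)]    # 40 frames
    print(len(temporal_pyramid_pool(short_clip)))  # 14 = 2 * (1 + 2 + 4)
    print(len(temporal_pyramid_pool(long_clip)))   # 14
    print(len(video_descriptor(short_clip, long_clip)))  # 28
```

Because the output length depends only on the feature dimension and the pyramid levels, a 3-frame clip and a 40-frame clip yield descriptors of the same size, which is what allows a fixed-size classifier to sit on top without sampling frames.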

PUBLICATION RECORD

Publication year
2015
Venue
IEEE Transactions on Circuits and Systems for Video Technology (Print)
Publication date
2015-03-03
Fields of study
Computer Science
Identifiers
DOI 10.1109/TCSVT.2016.2576761 arXiv 1503.01224
Source metadata
Semantic Scholar

REFERENCES

Unsupervised Learning of Video Representations using LSTMs
2015 · cited by this paper
Motion Part Regularization: Improving action recognition via trajectory group selection
2015 · cited by this paper
Action recognition with trajectory-pooled deep-convolutional descriptors
2015 · cited by this paper
Beyond short snippets: Deep networks for video classification
2015 · cited by this paper
Initialization Strategies of Spatio-Temporal Convolutional Neural Networks
2015 · cited by this paper
Action Recognition with Stacked Fisher Vectors
2014 · influential reference
Multi-view Super Vector for Action Recognition
2014 · influential reference
Pooled motion features for first-person videos
2014 · cited by this paper
Learning Spatiotemporal Features with 3D Convolutional Networks
2014 · cited by this paper
Long-term recurrent convolutional networks for visual recognition and description
2014 · cited by this paper
Submodular Attribute Selection for Action Recognition in Video
2014 · cited by this paper
Return of the Devil in the Details: Delving Deep into Convolutional Nets
2014 · cited by this paper
Two-Stream Convolutional Networks for Action Recognition in Videos
2014 · influential reference
Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice
2014 · influential reference
C3D: Generic Features for Video Analysis
2014 · cited by this paper
Large-Scale Video Classification with Convolutional Neural Networks
2014 · cited by this paper
MatConvNet: Convolutional Neural Networks for MATLAB
2014 · cited by this paper
Learning latent spatio-temporal compositional model for human action recognition
2013 · cited by this paper
Action Recognition with Actons
2013 · influential reference
Space-Time Robust Representation for Action Recognition
2013 · cited by this paper
A Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-Dimensional Visual Data
2013 · cited by this paper
Better Exploiting Motion for Better Action Recognition
2013 · influential reference
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
2012 · cited by this paper
Aggregating Local Image Descriptors into Compact Codes
2012 · cited by this paper
Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition
2012 · influential reference
ImageNet classification with deep convolutional neural networks
2012 · cited by this paper
Action recognition by dense trajectories
2011 · influential reference
HMDB: A large video database for human motion recognition
2011 · influential reference
Locality-constrained Linear Coding for image classification
2010 · cited by this paper
Improving the Fisher Kernel for Large-Scale Image Classification
2010 · influential reference
Actions in context
2009 · cited by this paper
ImageNet: A large-scale hierarchical image database
2009 · cited by this paper
Activity recognition using the velocity histories of tracked keypoints
2009 · cited by this paper
Hierarchical spatio-temporal context modeling for action recognition
2009 · cited by this paper
Learning realistic human actions from movies
2008 · cited by this paper
LIBLINEAR: A Library for Large Linear Classification
2008 · cited by this paper
Human Detection Using Oriented Histograms of Flow and Appearance
2006 · cited by this paper
3D Convolutional Neural Networks for Human Action Recognition (IEEE Transactions on Pattern Analysis and Machine Intelligence)
year unknown · cited by this paper
Action Recognition with Improved Trajectories (International Conference on Computer Vision, 2013)
2013 · influential reference

CITED BY

A CNN-RNN Siamese framework with multi-level aggregation for video-based person re-identification
2026 · cites this paper
Recognition of Daily Activities through Multi-Modal Deep Learning: A Video, Pose, and Object-Aware Approach for Ambient Assisted Living
2026 · cites this paper
GPPT: Graph pyramid pooling transformer for visual scene
2025 · cites this paper
Knot-TPP: A Unified Deep Learning Model for Process Incidence and Tool Wear Monitoring in Stacked Drilling
2025 · cites this paper
HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
2025 · cites this paper
Msst-eegnet: multi-scale spatio-temporal feature extraction using inception and temporal pyramid pooling for motor imagery classification
2025 · cites this paper
Multi-Head Attention-Based Framework with Residual Network for Human Action Recognition
2025 · influential citation
3DPyranet Features Fusion for Spatio-temporal Feature Learning
2025 · cites this paper
Graph dictionary learning for the study of human motion
2024 · cites this paper
Enhancing Carbon Sequestration: Innovative Models for Wettability Dynamics in CO2-Brine-Mineral Systems
2024 · cites this paper
Accelerated Information Processing Based on Deep Photonic Time-Delay Reservoir Computing
2024 · cites this paper
Rethinking HTG Evaluation: Bridging Generation and Recognition
2024 · cites this paper
Temporal Attention-Pyramid Pooling for Temporal Action Detection
2023 · influential citation
Multi-Channel Weight-Sharing Autoencoder Based on Cascade Multi-Head Attention for Multimodal Emotion Recognition
2023 · cites this paper
Scene context-aware graph convolutional network for skeleton-based action recognition
2023 · cites this paper
Time Scale Network: An Efficient Shallow Neural Network for Time Series Data in Biomedical Applications
2023 · cites this paper
Whole image average pooling-based convolution neural network approach for brain tumour classification
2023 · cites this paper
Design of Human Posture Recognition System Integrating Computer Vision and Deep Learning Algorithm
2023 · cites this paper
A Novel Automatic Content Generation and Optimization Framework
2023 · cites this paper
Gesture Recognition Techniques
2023 · cites this paper
Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study
2023 · cites this paper
Attentional Composition Networks for Long-Tailed Human Action Recognition
2023 · cites this paper
Neural network-based motion vector estimation algorithm for dynamic image sequences
2023 · cites this paper
ESDAR-net: towards high-accuracy and real-time driver action recognition for embedded systems
2023 · cites this paper
Conservative Treatment and Rehabilitation Training for Rectus Femoris Tear in Basketball Training Based on Computer Vision
2022 · cites this paper
First-Person Hand Action Recognition Using Multimodal Data
2022 · cites this paper
Sports Action Recognition Based on Particle Swarm Optimization Neural Networks
2022 · cites this paper
Action recognition based on RGB and skeleton data sets: A survey
2022 · cites this paper
Intelligent Analysis Strategy of Pragmatic Failure in Cross-Cultural Communication Based on Convolution Neural Network
2022 · cites this paper
Analysis of Network Information Retrieval Method Based on Metadata Ontology
2022 · cites this paper
Visualizing the Passage of Time with Video Temporal Pyramids
2022 · cites this paper
Remote sensing data processing and analysis for the identification of geological entities
2022 · cites this paper
Identification of Diagenetic Facies Logging of Tight Oil Reservoirs Based on Deep Learning—A Case Study in the Permian Lucaogou Formation of the Jimsar Sag, Junggar Basin
2022 · cites this paper
CG-Recognizer: A biosignal-based continuous gesture recognition system
2022 · cites this paper
C2F-TCN: A Framework for Semi- and Fully-Supervised Temporal Action Segmentation
2022 · cites this paper
Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network
2022 · cites this paper
Literature on Hand GESTURE Recognition using Graph based methods
2022 · cites this paper
An Identity-Preserved Framework for Human Motion Transfer
2022 · cites this paper
Construction of Sports Training Management Information System Using AI Action Recognition
2022 · cites this paper
Unsupervised contrastive learning for few-shot TOC prediction and application
2022 · cites this paper
Content and Style Aware Generation of Text-Line Images for Handwriting Recognition
2021 · cites this paper
Driver Yawning Detection Based on Subtle Facial Action Recognition
2021 · cites this paper
D3D: Dual 3-D Convolutional Network for Real-Time Action Recognition
2021 · cites this paper
An On-Chip Binary-Weight Convolution CMOS Image Sensor for Neural Networks
2021 · cites this paper
Temporal Pyramid Pooling for Decoding Motor-Imagery EEG Signals
2021 · cites this paper
Attention-Based Temporal Encoding Network with Background-Independent Motion Mask for Action Recognition
2021 · cites this paper
RGB-D Data-Based Action Recognition: A Review
2021 · cites this paper
Coarse to Fine Multi-Resolution Temporal Convolutional Network
2021 · cites this paper
Linear dynamical systems approach for human action recognition with dual-stream deep features
2021 · cites this paper
A Temporal Pyramid Pooling-Based Convolutional Neural Network for Remaining Useful Life Prediction
2021 · cites this paper
Convolutional Neural Network on Tanned and Synthetic Leather Textures
2021 · cites this paper
Automatic prediction of shear wave velocity using convolutional neural networks for different reservoirs in Ordos Basin
2021 · cites this paper
A high precision intrusion detection system for network security communication based on multi-scale convolutional neural network
2021 · cites this paper
Motion-Focused Contrastive Learning of Video Representations
2021 · cites this paper
Multi-stream mixed graph convolutional networks for skeleton-based action recognition
2021 · cites this paper
Survey on Machine Learning in Speech Emotion Recognition and Vision Systems Using a Recurrent Neural Network (RNN)
2021 · cites this paper
A hybridization of deep learning techniques to predict and control traffic disturbances
2020 · cites this paper
Image-to-video person re-identification with cross-modal embeddings
2020 · cites this paper
Discriminative Multi-View Subspace Feature Learning for Action Recognition
2020 · influential citation
Multi-scale temporal feature-based dense convolutional network for action recognition
2020 · cites this paper
Revisiting Hard Example for Action Recognition
2020 · cites this paper
A survey on video-based Human Action Recognition: recent updates, datasets, challenges, and applications
2020 · cites this paper
Monitoring and analysis of athletes’ local body movement status based on BP neural network
2020 · cites this paper
Review of tool condition monitoring in machining and opportunities for deep learning
2020 · cites this paper
Deep learning and case-based reasoning for predictive and adaptive traffic emergency management
2020 · cites this paper
Video action recognition with visual privacy protection based on compressed sensing
2020 · cites this paper
A Trajectory-based Attention Model for Sequential Impurity Detection
2020 · cites this paper
MS2L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition
2020 · influential citation
STA-GCN: two-stream graph convolutional network with spatial–temporal attention for hand gesture recognition
2020 · influential citation
Human action recognition based on action relevance weighted encoding
2020 · cites this paper
VirPreNet: A weighted ensemble convolutional neural network for the virulence prediction of influenza A virus using all 8 segments
2020 · cites this paper
Multi-agent deep neural networks coupled with LQF-MWM algorithm for traffic control and emergency vehicles guidance
2020 · cites this paper
Action recognition on continuous video
2020 · cites this paper
Recurrent bag-of-features for visual information analysis
2020 · cites this paper
Electrocardiogram classification of lead convolutional neural network based on fuzzy algorithm
2020 · cites this paper
Aligned Dynamic-Preserving Embedding for Zero-Shot Action Recognition
2020 · cites this paper
A Brief Survey of Deep Learning Techniques for Person Re-identification
2020 · cites this paper
Global and Local Knowledge-Aware Attention Network for Action Recognition
2020 · cites this paper
Shanidar Cave - An Interesting Archaeological Site in the Kurdistan Region, Iraq
2019 · cites this paper
D3-LND: A two-stream framework with discriminant deep descriptor, linear CMDT and nonlinear KCMDT descriptors for action recognition
2019 · cites this paper
Video-based person re-identification via spatio-temporal attentional and two-stream fusion convolutional networks
2019 · cites this paper
Video spatiotemporal mapping for human action recognition by convolutional neural network
2019 · cites this paper
DKD–DAD: a novel framework with discriminative kinematic descriptor and deep attention-pooled descriptor for action recognition
2019 · cites this paper
Fusion of CNN- and COSFIRE-Based Features with Application to Gender Recognition from Face Images
2019 · cites this paper
Multi-Stream Deep Neural Networks for RGB-D Egocentric Action Recognition
2019 · cites this paper
Ontology-Based Global and Collective Motion Patterns for Event Classification in Basketball Videos
2019 · cites this paper
An improved neural network for TOC, S1 and S2 estimation based on conventional well logs
2019 · cites this paper
A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition
2019 · cites this paper
3D Behavior Recognition Based on Multi-Modal Deep Space-Time Learning
2019 · cites this paper
Spatial-temporal pyramid based Convolutional Neural Network for action recognition
2019 · cites this paper
Interpretation of intelligence in CNN-pooling processes: a methodological survey
2019 · cites this paper
Graph Based Skeleton Modeling for Human Activity Analysis
2019 · influential citation
Towards energy-efficient convolutional neural network inference
2019 · cites this paper
A Novel Approach for Robust Multi Human Action Detection and Recognition based on 3-Dimentional Convolutional Neural Networks
2019 · influential citation
Skeleton Based Temporal Action Detection with YOLO
2019 · cites this paper
Recognizing Micro Actions in Videos: Learning Motion Details via Segment-Level Temporal Pyramid
2019 · cites this paper
Video Representation via Fusion of Static and Motion Features Applied to Human Activity Recognition
2019 · cites this paper
Survey on Deep Neural Networks in Speech and Vision Systems
2019 · cites this paper
A Fourier Domain Training Framework for Convolutional Neural Networks Based on the Fourier Domain Pyramid Pooling Method and Fourier Domain Exponential Linear Unit
2019 · cites this paper