Part-Based Feature Aggregation Method for Dynamic Scene Recognition

Published 2019 in International Conference on Digital Image Computing: Techniques and Applications

ABSTRACT

Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed for aggregating local features from multiple video frames. A pre-trained Fast R-CNN model is used to extract local convolutional layer features from the regions of interest (ROIs) of training images. These features are then clustered to locate representative parts. A set cover problem is formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN model. Local convolutional layer features and fully connected layer features are extracted using the fine-tuned Fast R-CNN model and aggregated separately over the frames of a video segment to form two feature representations. These two representations are then concatenated into a global feature representation. Experimental results show that the proposed method outperforms several state-of-the-art features on two dynamic scene datasets.
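The abstract names two concrete computational steps: selecting discriminative parts through a set cover problem, and aggregating two kinds of frame-level features over a segment before concatenating them. The exact formulation is not given on this page, so the following is only a minimal Python sketch under loudly stated assumptions: each candidate part is assumed to "cover" the training samples on which it fires, the set cover is approximated with the standard greedy heuristic, and mean pooling stands in for the unspecified aggregation. The function names `greedy_set_cover`, `mean_pool`, and `segment_descriptor` are hypothetical.

```python
# Hypothetical sketch, not the authors' code. Assumes (1) a candidate part
# "covers" the training samples on which it fires, (2) set cover is solved
# with the greedy heuristic, and (3) mean pooling is the aggregation step.
from typing import Dict, List, Sequence, Set


def greedy_set_cover(universe: Set[int], candidates: Dict[str, Set[int]]) -> List[str]:
    """Greedily pick parts until every sample in `universe` is covered."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered and candidates:
        # Part covering the most still-uncovered samples; iterating over the
        # sorted names makes max() break ties alphabetically.
        best = max(sorted(candidates), key=lambda p: len(candidates[p] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # the remaining samples are covered by no candidate part
        chosen.append(best)
        uncovered -= gain
    return chosen


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors over one video segment."""
    if not frames:
        raise ValueError("segment contains no frames")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def segment_descriptor(conv_frames: Sequence[Sequence[float]],
                       fc_frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool conv-layer and fc-layer features separately, then concatenate."""
    return mean_pool(conv_frames) + mean_pool(fc_frames)


if __name__ == "__main__":
    parts = {"part_a": {0, 1, 2}, "part_b": {2, 3}, "part_c": {3, 4}, "part_d": {4}}
    print(greedy_set_cover({0, 1, 2, 3, 4}, parts))  # ['part_a', 'part_c']
    print(segment_descriptor([[1.0, 2.0], [3.0, 4.0]], [[0.5], [1.5]]))  # [2.0, 3.0, 1.0]
```

The greedy heuristic is a common stand-in because exact set cover is NP-hard, while the greedy choice stays within a logarithmic factor of the smallest cover. Mean pooling could be swapped for other encodings, such as VLAD or Fisher vectors (both among the works cited below), without changing the final concatenation step.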

PUBLICATION RECORD

Publication year: 2019
Venue: International Conference on Digital Image Computing: Techniques and Applications
Publication date: 2019-12-01
Fields of study: Computer Science
DOI: 10.1109/DICTA47822.2019.8946036
Source metadata: Semantic Scholar

REFERENCES

Long-Short-Term Features for Dynamic Scene Classification
2019 · influential reference
A Super Descriptor Tensor Decomposition for Dynamic Scene Recognition
2019 · cited by this paper
Human Brain Tissue Segmentation in fMRI using Deep Long-Term Recurrent Convolutional Network
2018 · cited by this paper
D3: Recognizing dynamic scenes with deep dual descriptor based on key frames and key segments
2017 · influential reference
A Closer Look at Spatiotemporal Convolutions for Action Recognition
2017 · influential reference
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
2017 · influential reference
Temporal Residual Networks for Dynamic Scene Recognition
2017 · cited by this paper
Dynamic Scene Recognition with Complementary Spatiotemporal Features
2016 · cited by this paper
Dynamic Scene Classification Using Redundant Spatial Scenelets
2016 · cited by this paper
SPLeaP: Soft Pooling of Learned Parts for Image Classification
2016 · influential reference
Learning Dictionary of Discriminative Part Detectors for Image Categorization and Cosegmentation
2016 · cited by this paper
Dynamic texture and scene classification by transferring deep image features
2015 · cited by this paper
Deep Residual Learning for Image Recognition
2015 · influential reference
Fast R-CNN
2015 · cited by this paper
Mid-level deep pattern mining
2014 · cited by this paper
CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
2014 · cited by this paper
Learning Deep Features for Scene Recognition using Places Database
2014 · influential reference
Bags of Spacetime Energies for Dynamic Scene Recognition
2014 · cited by this paper
Learning Spatiotemporal Features with 3D Convolutional Networks
2014 · influential reference
Two-Stream Convolutional Networks for Action Recognition in Videos
2014 · cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014 · cited by this paper
Learning Discriminative and Shareable Features for Scene Classification
2014 · cited by this paper
Large-Scale Video Classification with Convolutional Neural Networks
2014 · influential reference
Blocks That Shout: Distinctive Parts for Scene Classification
2013 · cited by this paper
Representing Videos Using Mid-level Discriminative Patches
2013 · cited by this paper
Selective Search for Object Recognition
2013 · cited by this paper
Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis
2013 · cited by this paper
Aggregating Local Image Descriptors into Compact Codes
2012 · cited by this paper
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
2012 · cited by this paper
Dynamic scene understanding: The role of orientation features in space and time in scene classification
2012 · cited by this paper
Spacetime Texture Representation and Recognition Based on a Spatiotemporal Orientation Analysis
2012 · cited by this paper
Unsupervised Discovery of Mid-Level Discriminative Patches
2012 · cited by this paper
Dynamic texture recognition based on distributions of spacetime oriented structure
2010 · cited by this paper
Object Detection with Discriminatively Trained Part Based Models
2010 · cited by this paper
Moving vistas: Exploiting motion for describing scenes
2010 · influential reference
Improving the Fisher Kernel for Large-Scale Image Classification
2010 · cited by this paper
ImageNet: A large-scale hierarchical image database
2009 · influential reference
Poselets: Body part detectors trained using 3D human pose annotations
2009 · cited by this paper
Recognizing indoor scenes
2009 · cited by this paper
LIBLINEAR: A Library for Large Linear Classification
2008 · cited by this paper
Action Recognition with Improved Trajectories (author manuscript, published in International Conference on Computer Vision, 2013)
year unknown · cited by this paper

CITED BY

Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection
2025 · cites this paper
Feature Fusion of Deep and Spatio-Temporal Features for Dynamic Scene Understanding
2025 · influential citation
SAGN: Semantic-Aware Graph Network for Remote Sensing Scene Classification
2023 · cites this paper
A Novel Multi-Modal Network-Based Dynamic Scene Understanding
2022 · cites this paper
Cam-Net: Compressed Attentive Multi-Granularity Network For Dynamic Scene Classification
2020 · cites this paper