RGB-D Action Recognition Using Multimodal Correlative Representation Learning Model
Tianshan Liu, Jun Kong, Min Jiang
Published 2019 in IEEE Sensors Journal
ABSTRACT
The advent of low-cost depth sensors opens up new possibilities for RGB-D-based human action recognition. However, most current RGB-D-based methods simply fuse multimodal features in a holistic manner and ignore the latent connections among the modalities. In this paper, we propose a multimodal correlative representation learning (MCRL) model for human action recognition from RGB-D videos. Specifically, we propose a spatio-temporal pyramid Fourier HOG (STPF-HOG) feature that captures local dynamic patterns around each human joint, integrating both spatial arrangement and temporal structure. The proposed MCRL model exploits multimodal data (skeleton, depth, and RGB) and learns the structures shared among the modalities: the original low-level features are compressed and projected into a latent subspace, and discriminative shared features are then learned in a supervised fashion. We formulate both subspace learning and shared-feature mining in a modified multi-task learning framework and solve the formulation with an iterative optimization algorithm. For computationally efficient action recognition, we present a robust collaborative representation classifier built on a weight regularization matrix. Experimental results on three action datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.
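To make the pipeline concrete, here is one plausible shape for the joint subspace-learning and shared-feature-mining objective the abstract describes. This is a generic sketch of this family of multi-task formulations, not the paper's exact objective (the abstract does not give it); the symbols $X_m$, $P_m$, $H$, $W$, $Y$, and $\Omega$ are assumed notation.

```latex
% Generic multimodal shared-subspace objective (assumed notation):
% X_m in R^{d_m x n}: low-level features of modality m (skeleton, depth, RGB)
% P_m in R^{d_m x k}: per-modality projection into the latent subspace
% H   in R^{k x n} : shared latent representation of the n training samples
% W   in R^{k x c} : shared classifier;  Y in R^{c x n}: label matrix
\min_{\{P_m\},\, H,\, W}\;
  \sum_{m=1}^{3} \bigl\lVert P_m^{\top} X_m - H \bigr\rVert_F^2
  \;+\; \beta \,\bigl\lVert W^{\top} H - Y \bigr\rVert_F^2
  \;+\; \lambda \,\Omega\bigl(W, \{P_m\}\bigr)
```

Alternating updates of $P_m$, $H$, and $W$, each a ridge-type least-squares subproblem with a closed-form solution, give the kind of iterative optimization algorithm the abstract refers to.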
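The classifier also admits a compact sketch. Standard collaborative representation codes a test sample over the whole training dictionary with an $\ell_2$ penalty and assigns the class with the smallest reconstruction residual; introducing a weight regularization matrix $W$ turns the penalty into a weighted one while keeping the closed form $\hat{\alpha} = (X^{\top}X + \lambda W^{\top}W)^{-1} X^{\top} y$. The minimal NumPy sketch below uses a distance-based diagonal weighting, which is an assumption on our part; the abstract does not specify how $W$ is constructed.

```python
import numpy as np

def weighted_crc(X, y_test, labels, lam=1e-3):
    """Collaborative representation classification with a weight
    regularization matrix (a common robust-CRC variant; the paper's
    exact weighting scheme is not given in the abstract).

    X      : (d, n) training features, one column per sample
    y_test : (d,) test feature vector
    labels : (n,) NumPy array of class labels for the columns of X
    lam    : regularization strength
    """
    # Assumed Tikhonov-style weighting: training samples far from the
    # test sample are penalized more heavily.
    dists = np.linalg.norm(X - y_test[:, None], axis=0)  # (n,)
    W = np.diag(dists)

    # Closed-form coding: alpha = (X^T X + lam * W^T W)^{-1} X^T y
    alpha = np.linalg.solve(X.T @ X + lam * (W.T @ W), X.T @ y_test)

    # Assign the class whose training columns best reconstruct y_test.
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y_test - X[:, labels == c] @ alpha[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

Because the coding step is a single linear solve rather than an iterative sparse-coding loop, this family of classifiers is computationally cheap at test time, which matches the efficiency claim in the abstract.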
PUBLICATION RECORD
- Published: 2019-03-01, IEEE Sensors Journal
- Fields of study: Computer Science
- Source metadata: Semantic Scholar