RGB-D Action Recognition Using Multimodal Correlative Representation Learning Model
Tianshan Liu, Jun Kong, Min Jiang
Published 2019 in IEEE Sensors Journal
ABSTRACT
The advent of low-cost depth sensors opens up new possibilities for RGB-D-based human action recognition. However, most current RGB-D-based methods simply fuse multimodal features in a holistic manner and ignore the latent connections among the modalities. In this paper, we propose a multimodal correlative representation learning (MCRL) model for human action recognition from RGB-D videos. Specifically, we propose a spatio-temporal pyramid Fourier HOG (STPF-HOG) feature that captures local dynamic patterns around each human joint, integrating both spatial arrangement and temporal structure. The proposed MCRL model exploits multimodal data (skeleton, depth, and RGB) and learns the structures shared among the modalities: the original low-level features are compressed and projected into a latent subspace, and discriminative shared features are then learned in a supervised fashion. We formulate both subspace learning and shared-feature mining in a modified multi-task learning framework and solve the formulation with an iterative optimization algorithm. For computationally efficient action recognition, we present a robust collaborative representation classifier built on a weight regularization matrix. Experimental results on three action datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.
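To make the pipeline concrete, here is one plausible shape for the joint subspace-learning and shared-feature-mining objective the abstract describes. This is a generic sketch of this family of multi-task formulations, not the paper's exact objective (the abstract does not give it); the symbols $X_m$, $P_m$, $H$, $W$, $Y$, and $\Omega$ are assumed notation.

```latex
% Generic multimodal shared-subspace objective (assumed notation):
% X_m in R^{d_m x n}: low-level features of modality m (skeleton, depth, RGB)
% P_m in R^{d_m x k}: per-modality projection into the latent subspace
% H   in R^{k x n} : shared latent representation of the n training samples
% W   in R^{k x c} : shared classifier;  Y in R^{c x n}: label matrix
\min_{\{P_m\},\, H,\, W}\;
  \sum_{m=1}^{3} \bigl\lVert P_m^{\top} X_m - H \bigr\rVert_F^2
  \;+\; \beta \,\bigl\lVert W^{\top} H - Y \bigr\rVert_F^2
  \;+\; \lambda \,\Omega\bigl(W, \{P_m\}\bigr)
```

Alternating updates of $P_m$, $H$, and $W$, each a ridge-type least-squares subproblem with a closed-form solution, give the kind of iterative optimization algorithm the abstract refers to.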
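The classifier also admits a compact sketch. Standard collaborative representation codes a test sample over the whole training dictionary with an $\ell_2$ penalty and assigns the class with the smallest reconstruction residual; introducing a weight regularization matrix $W$ turns the penalty into a weighted one while keeping the closed form $\hat{\alpha} = (X^{\top}X + \lambda W^{\top}W)^{-1} X^{\top} y$. The minimal NumPy sketch below uses a distance-based diagonal weighting, which is an assumption on our part; the abstract does not specify how $W$ is constructed.

```python
import numpy as np

def weighted_crc(X, y_test, labels, lam=1e-3):
    """Collaborative representation classification with a weight
    regularization matrix (a common robust-CRC variant; the paper's
    exact weighting scheme is not given in the abstract).

    X      : (d, n) training features, one column per sample
    y_test : (d,) test feature vector
    labels : (n,) NumPy array of class labels for the columns of X
    lam    : regularization strength
    """
    # Assumed Tikhonov-style weighting: training samples far from the
    # test sample are penalized more heavily.
    dists = np.linalg.norm(X - y_test[:, None], axis=0)  # (n,)
    W = np.diag(dists)

    # Closed-form coding: alpha = (X^T X + lam * W^T W)^{-1} X^T y
    alpha = np.linalg.solve(X.T @ X + lam * (W.T @ W), X.T @ y_test)

    # Assign the class whose training columns best reconstruct y_test.
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y_test - X[:, labels == c] @ alpha[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

Because the coding step is a single linear solve rather than an iterative sparse-coding loop, this family of classifiers is computationally cheap at test time, which matches the efficiency claim in the abstract.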
PUBLICATION RECORD
- Published: 2019-03-01, IEEE Sensors Journal
- Fields of study: Computer Science
- Source metadata: Semantic Scholar