Structural-RNN: Deep Learning on Spatio-Temporal Graphs

Ashesh Jain, Amir Zamir, S. Savarese, Ashutosh Saxena

Published 2015 in Computer Vision and Pattern Recognition

ABSTRACT

Deep Recurrent Neural Network architectures, though remarkably capable at modeling sequences, lack an intuitive high-level spatio-temporal structure, even though many problems in computer vision inherently have an underlying high-level structure and can benefit from it. Spatio-temporal graphs are a popular tool for imposing such high-level intuitions in the formulation of real-world problems. In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs with the sequence-learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled, as it can transform any spatio-temporal graph by following a well-defined set of steps. Evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, show improvement over the state of the art by a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks.

PUBLICATION RECORD

Publication year: 2015
Venue: Computer Vision and Pattern Recognition
Publication date: 2015-11-17
Fields of study: Computer Science
Identifiers: DOI 10.1109/CVPR.2016.573; arXiv 1511.05298
Source metadata: Semantic Scholar

REFERENCES

Beyond Geometric Path Planning: Learning Context-Driven Trajectory Preferences via Sub-optimal Feedback
2016, cited by this paper
Fast R-CNN
2015, cited by this paper
Objects2action: Classifying and Localizing Actions without Any Video Example
2015, cited by this paper
Fully Connected Deep Structured Networks
2015, influential reference
Visualizing and Understanding Recurrent Networks
2015, cited by this paper
Conditional Random Fields as Recurrent Neural Networks
2015, influential reference
Scene labeling with LSTM recurrent neural networks
2015, cited by this paper
Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation
2015, cited by this paper
Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation
2015, influential reference
Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models
2015, influential reference
Mind's eye: A recurrent visual representation for image caption generation
2015, cited by this paper
Hierarchical recurrent neural network for skeleton based action recognition
2015, cited by this paper
Semantic Image Segmentation via Deep Parsing Network
2015, influential reference
Unsupervised Learning of Video Representations using LSTMs
2015, cited by this paper
Recurrent Network Models for Human Dynamics
2015, influential reference
Sequence to Sequence Learning with Neural Networks
2014, cited by this paper
Recognizing object affordances in terms of spatio-temporal object-object relationships
2014, cited by this paper
Overtaking vehicle detection using a spatio-temporal CRF
2014, influential reference
Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments
2014, cited by this paper
Recurrent Neural Networks
2014, cited by this paper
Long-term recurrent convolutional networks for visual recognition and description
2014, cited by this paper
Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks
2014, cited by this paper
Learning Spatiotemporal Features with 3D Convolutional Networks
2014, cited by this paper
Towards End-To-End Speech Recognition with Recurrent Neural Networks
2014, cited by this paper
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
2014, cited by this paper
Learning Deep Structured Models
2014, cited by this paper
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
2014, cited by this paper
Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation
2013, cited by this paper
ACTIVE: Activity Concept Transitions in Video Event Classification
2013, cited by this paper
PANDA: Pose Aligned Networks for Deep Attribute Modeling
2013, cited by this paper
Generating Sequences With Recurrent Neural Networks
2013, cited by this paper
Anticipating Human Activities Using Object Affordances for Reactive Robotic Response
2013, influential reference
Learning human activities and object affordances from RGB-D videos
2012, influential reference
Learning spatiotemporal graphs of human activities
2011, cited by this paper
Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
2011, cited by this paper
Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models
2011, cited by this paper
Track to the future: Spatio-temporal video segmentation with long-range motion cues
2011, cited by this paper
Parsing Natural Scenes and Natural Language with Recursive Neural Networks
2011, cited by this paper
An Introduction to Conditional Random Fields
2010, cited by this paper
Dynamical binary latent variable models for 3D human pose tracking
2010, cited by this paper
Factored conditional restricted Boltzmann Machines for modeling motion style
2009, cited by this paper
Cutting-plane training of structural SVMs
2009, cited by this paper
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition
2009, cited by this paper
Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine Learning)
2009, cited by this paper
FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs
2009, cited by this paper
MoSIFT: Recognizing Human Actions in Surveillance Videos
2009, cited by this paper
Curriculum learning
2009, cited by this paper
Topologically-constrained latent variable models
2008, cited by this paper
Key Object Driven Multi-category Object Recognition, Localization and Tracking Using Spatio-temporal Context
2008, influential reference
Learning realistic human actions from movies
2008, cited by this paper
The Recurrent Temporal Restricted Boltzmann Machine
2008, cited by this paper
A discriminatively trained, multiscale, deformable part model
2008, cited by this paper
Gaussian Process Dynamical Model (IEEE Transactions on Pattern Analysis and Machine Intelligence)
2007, cited by this paper
Conditional random fields for activity recognition
2007, cited by this paper
A Spatio-Temporal Probabilistic Model for Multi-Sensor Multi-Class Object Recognition
2007, influential reference
Markov logic networks
2006, cited by this paper
Modeling Human Motion Using Binary Latent Variables
2006, cited by this paper
Conditional Random Fields for Object Recognition
2004, cited by this paper
Discriminative Probabilistic Models for Relational Data
2002, cited by this paper
Factor graphs and the sum-product algorithm
2001, cited by this paper
Global training of document processing systems using graph transformer networks
1997, cited by this paper
Globally Trained Handwritten Word Recognizer Using Spatial Representation, Convolutional Neural Networks, and Hidden Markov Models
1993, cited by this paper
Action Recognition with Improved Trajectories (International Conference on Computer Vision, 2013)
year unknown, cited by this paper

CITED BY

Optimization of QoS in 5G optical networks for futuristic high-speed, low-latency applications
2026, cites this paper
HT-GNN: Hyper-Temporal Graph Neural Network for Customer Lifetime Value Prediction in Baidu Ads
2026, cites this paper
HuMo3D: Intention-Driven Dual-Branch Multimodal Human Motion Prediction in 3D Scenes
2026, cites this paper
Dynamic spatiotemporal graph attention networks for cross-regional multi-disease forecasting and intervention optimization
2026, cites this paper
WeTRaC: Scalable EV charging demand forecasting for heavy-duty fleets
2026, cites this paper
Motion Sickness in Near-Eye Display Systems such as AR/VR
2025, cites this paper
Robot Behavior Generation for Social Human-Robot Interaction
2025, cites this paper
Evaluating Generative Vehicle Trajectory Models for Traffic Intersection Dynamics
2025, cites this paper
AI-Driven Predictive Analytics for Cryptocurrency Price Volatility and Market Manipulation Detection
2025, cites this paper
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
2025, cites this paper
LiFedST: A linearized federated split-attention transformer for spatio-temporal forecasting
2025, cites this paper
Reverberation: Learning the Latencies Before Forecasting Trajectories
2025, cites this paper
How Does a Virtual Agent Decide Where to Look? Symbolic Cognitive Reasoning for Embodied Head Rotation
2025, cites this paper
Highly Condensed All-MLP Architecture for Long-Term Human Motion Prediction
2025, cites this paper
Deep Learning for Regular Raster Spatio-Temporal Prediction: An Overview
2025, cites this paper
Multi-Resolution Haar Network: Enhancing human motion prediction via Haar transform
2025, cites this paper
A Comprehensive Review of Deep Learning in Computer Vision for Monitoring Apple Tree Growth and Fruit Production
2025, cites this paper
CacheFlow: Fast Human Motion Prediction by Cached Normalizing Flow
2025, cites this paper
Autonomous Short-Term Target Recognition of Small Celestial Bodies Based on a Transformer Light-Curve Classifier
2025, cites this paper
GGMotion: Group Graph Dynamics-Kinematics Network for Human Motion Prediction
2025, cites this paper
Model Predictive Path Integral with Integrated Human Pose Prediction Network for Robotic Arm Motion Planning
2025, cites this paper
From data fusion to dynamic reasoning: A survey on spatio-temporal knowledge graph construction and embedding methods
2025, cites this paper
STGAT: Modeling Dynamic Spatial-Temporal Graphs with Attention and Dilated Convolutions
2025, cites this paper
Human-Robot Interaction with Skeleton-Based Action Recognition and Motion Prediction
2025, cites this paper
VahakAI Driver Monitoring System
2025, cites this paper
AI-driven forecasting of rural development in India: a deep learning case study on the gram panchayat development plan
2025, cites this paper
MetaSTH-Sleep: Towards Effective Few-Shot Sleep Stage Classification for Health Management with Spatial-Temporal Hypergraph Enhanced Meta-Learning
2025, cites this paper
Progressively deeper attention networks for 3D human motion prediction
2025, cites this paper
RelMap: Reliable Spatiotemporal Sensor Data Visualization via Imputative Spatial Interpolation
2025, cites this paper
ALIEN: Implicit Neural Representations for Human Motion Prediction under Arbitrary Latency
2025, cites this paper
Geometry-Aware Deep Learning for 3D Skeleton-Based Motion Prediction
2025, influential citation
Evaluation of Body Parts Representations in Motion Reconstruction
2025, cites this paper
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
2025, cites this paper
Human Pose Estimation and Event Recognition via Feature Extraction and Neuro-Fuzzy Classifier
2025, cites this paper
Attentive Radiate Graph for Pedestrian Trajectory Prediction in Disconnected Manifolds
2025, cites this paper
LSTM and CNN Hybrid Model for Enhanced Fingerprint Recognition
2025, cites this paper
Enhanced spatio-temporal motion prediction using transformer-augmented graph convolutional networks
2025, cites this paper
MAGE: A Multi-stage Avatar Generator with Sparse Observations
2025, cites this paper
HUMOF: Human Motion Forecasting in Interactive Social Scenes
2025, cites this paper
Spatial-Similarity Dynamic Graph Bidirectional Double-Cell Network for Traffic Flow Prediction
2025, cites this paper
Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic
2025, cites this paper
A GCNN-Based Method for Functional Zone Recognition by Integrating Building Spatial Morphology and Courtyard-Level Context
2025, cites this paper
Learning behavior aware features across spaces for improved 3D human motion prediction
2025, cites this paper
Root Cause Analysis of Hydrogen Bond Separation in Spatio-Temporal Molecular Dynamics using Causal Models
2025, cites this paper
Language-guided Recursive Spatiotemporal Graph Modeling for Video Summarization
2025, cites this paper
Dynamic spatio-temporal graph convolutional networks with multi-scale dilated fusion attention for human motion prediction
2025, cites this paper
Smart Contract Vulnerability Detection Based on Symbolic Execution and Graph Neural Networks
2025, cites this paper
THADT: Temporal Hybrid Attention Diffusion Transformer for Human Pose Prediction
2025, cites this paper
A Hybrid LSTM–STGNN Framework for Reliable Earthquake Magnitude Prediction
2025, cites this paper
Irregular-patch graph attention network for underwater object detection
2025, cites this paper
Optimizing human motion prediction through decoupled motion spatio-temporal trends
2025, cites this paper
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling
2025, cites this paper
Graph Neural Network (GNN) and its Application: A State-of-the-Art Survey
2025, cites this paper
MetaSTH-Sleep: Towards Effective Few-Shot Sleep Stage Classification with Spatial-Temporal Hypergraph Enhanced Meta-Learning
2025, cites this paper
FGO MythBusters: Explaining how Kalman Filter variants achieve the same performance as FGO in navigation applications
2025, cites this paper
SPATS: a practical system for comparative analysis of spatio-temporal graph neural networks
2025, influential citation
A Survey of Graph-Based Resource Management in Wireless Networks—Part II: Learning Approaches
2025, cites this paper
Learning Human-Object Interactions in Videos by State Space Models
2025, cites this paper
LuKAN: A Kolmogorov-Arnold Network Framework for 3D Human Motion Prediction
2025, cites this paper
SpikeSTAG: Spatial-Temporal Forecasting via GNN-SNN Collaboration
2025, cites this paper
Toward Physically Stable Motion Generation: A New Paradigm of Human Pose Representation
2025, cites this paper
LAL: Enhancing 3D Human Motion Prediction with Latency-aware Auxiliary Learning
2025, cites this paper
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction
2025, cites this paper
A masked autoencoder network for spatiotemporal predictive learning
2025, cites this paper
Prediction of non-uniform reactions in PEMFC based on the multi-physics quantity fusion graph auto-encoder network
2025, cites this paper
GraphMinNet: Learning Dependencies in Graphs with Light Complexity Minimal Architecture
2025, influential citation
Machine Learning-based State of Charge Estimation: A Comparison between CatBoost model and C-BLSTM-AE model
2025, cites this paper
Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions
2025, cites this paper
Skeleton-Aware Representation of Spatio-Temporal Kinematics for 3D Human Motion Prediction
2025, cites this paper
YOLO-PDC: algorithm for aluminum surface defect detection based on multiscale enhanced model of YOLOv7
2025, cites this paper
Intention Reasoning for User Action Sequences via Fusion of Object Task and Object Action Affordances Based on Dempster–Shafer Theory
2025, cites this paper
Refining Long-Term Predictions: Two-Stage Spatial-Temporal Feature Learning for 3D Human Motion Prediction
2025, cites this paper
Temporal Continual Learning with Prior Compensation for Human Motion Prediction
2025, cites this paper
A Semi-Conv-Transformer Model for Inflow Prediction of Newly Expanding Subway Lines
2025, cites this paper
A keyframe weighted dual-channel attention GCN model for human skeleton motion prediction
2025, cites this paper
Multi-stage human motion prediction algorithm based on spatiotemporal graph convolution
2025, cites this paper
UniMotion: Bridging 2D and 3D Representations for Human Motion Prediction
2025, influential citation
A Dual-Path Attention Fourier Convolutional Network for Human Motion Prediction
2025, cites this paper
Motion In-Betweening via Recursive Keyframe Prediction
2025, cites this paper
Dynamic Node Graph Neural Network for Multimodal Music Recommendation
2025, cites this paper
Multimodal Generative AI with Autoregressive LLMs for Human Motion Understanding and Generation: A Way Forward
2025, cites this paper
Gaussian Samples are not what you need
2024, cites this paper
Dyn-GWN: Application of Graph Wave Networks on the Largest Traffic Dataset
2024, cites this paper
Dual Self-attention Fusion Message Neural Network for Virtual Screening in Drug Discovery by Molecular Property Prediction
2024, cites this paper
EgoCast: Forecasting Egocentric Human Pose in the Wild
2024, cites this paper
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection
2024, cites this paper
A Survey on Graph Neural Networks and its Applications in Various Domains
2024, cites this paper
HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments
2024, cites this paper
Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation
2024, cites this paper
Short-term Freeway Traffic Speed Multistep Prediction using an iTransformer Model
2024, cites this paper
Joint-Aware Transformer: An Inter-Joint Correlation Encoding Transformer for Short-Term 3D Human Motion Prediction
2024, cites this paper
Future Motion Dynamic Modeling via Hybrid Supervision for Multi-Person Motion Prediction Uncertainty Reduction
2024, cites this paper
Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations
2024, cites this paper
MotionMap: Representing Multimodality in Human Pose Forecasting
2024, cites this paper
Prompting Future Driven Diffusion Model for Hand Motion Prediction
2024, cites this paper
MDMP: Multi-Modal Diffusion for Supervised Motion Predictions with Uncertainty
2024, cites this paper
Interpreting Temporal Graph Neural Networks with Koopman Theory
2024, cites this paper
STGformer: Efficient Spatiotemporal Graph Transformer for Traffic Forecasting
2024, cites this paper
BadHMP: Backdoor Attack Against Human Motion Prediction
2024, cites this paper
Robust Traffic Forecasting against Spatial Shift over Years
2024, cites this paper