Stacked Hourglass Networks for Human Pose Estimation

Published 2016 in European Conference on Computer Vision

ABSTRACT

This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks, outcompeting all recent methods.
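
The pooling-then-upsampling structure described in the abstract can be illustrated with a small structural sketch. This is not the authors' implementation: it assumes 2x2 max pooling, nearest-neighbour upsampling, and additive skip connections, and it leaves out every learned layer. The names `max_pool2`, `upsample2`, `hourglass`, and `stacked_hourglass` are hypothetical and chosen only for this example.

```python
from typing import List

Grid = List[List[float]]


def max_pool2(x: Grid) -> Grid:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample2(x: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids of equal size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def hourglass(x: Grid, depth: int) -> Grid:
    """One bottom-up, top-down pass; the output has the input's size."""
    if depth == 0:
        return x
    skip = x  # assumption: a real network applies learned layers here
    low = hourglass(max_pool2(x), depth - 1)  # process at half resolution
    return add(skip, upsample2(low))  # merge coarse and fine features


def stacked_hourglass(x: Grid, n_stacks: int, depth: int) -> List[Grid]:
    """Chain several hourglasses and keep every intermediate output."""
    outputs: List[Grid] = []
    for _ in range(n_stacks):
        x = hourglass(x, depth)
        outputs.append(x)  # intermediate supervision would add a loss here
    return outputs
```

In this sketch each stack returns a map at the input resolution, so a loss could be attached to every entry of the returned list rather than only the last one, which is the role intermediate supervision plays in the abstract. The input height and width must be divisible by 2 raised to `depth`.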

PUBLICATION RECORD

Publication year: 2016
Venue: European Conference on Computer Vision
Publication date: 2016-03-22
Fields of study: Computer Science
Identifiers: DOI 10.1007/978-3-319-46484-8_29; arXiv 1603.06937
Source metadata: Semantic Scholar

REFERENCES

Deep learning for human part discovery in images (2016, cited by this paper)
Convolutional Pose Machines (2016, cited by this paper)
Semi-supervised Learning with Ladder Networks (2015, cited by this paper)
Holistically-Nested Edge Detection (2015, cited by this paper)
Deep Residual Learning for Image Recognition (2015, cited by this paper)
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation (2015, cited by this paper)
Deep multi-scale video prediction beyond mean square error (2015, cited by this paper)
Learning Deconvolution Network for Semantic Segmentation (2015, cited by this paper)
Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis (2015, cited by this paper)
Flowing ConvNets for Human Pose Estimation in Videos (2015, cited by this paper)
Human Pose Estimation with Iterative Error Feedback (2015, influential reference)
Bottom-Up and Top-Down Reasoning with Hierarchical Rectified Gaussians (2015, cited by this paper)
Deep Reflectance Maps (2015, cited by this paper)
Stacked What-Where Auto-encoders (2015, cited by this paper)
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation (2015, cited by this paper)
Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation (2015, cited by this paper)
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015, cited by this paper)
Fully convolutional networks for semantic segmentation (2014, influential reference)
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation (2014, influential reference)
DeepEdge: A multi-scale bifurcated deep network for top-down contour detection (2014, cited by this paper)
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network (2014, cited by this paper)
Recurrent Convolutional Neural Networks for Scene Labeling (2014, cited by this paper)
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations (2014, cited by this paper)
Parsing occluded people by flexible compositions (2014, cited by this paper)
Hypercolumns for object segmentation and fine-grained localization (2014, cited by this paper)
Pose Machines: Articulated Pose Estimation via Inference Machines (2014, cited by this paper)
MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation (2014, cited by this paper)
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture (2014, cited by this paper)
2D Human Pose Estimation: New Benchmark and State of the Art Analysis (2014, cited by this paper)
Going deeper with convolutions (2014, cited by this paper)
Efficient object localization using Convolutional Networks (2014, cited by this paper)
DeepPose: Human Pose Estimation via Deep Neural Networks (2013, influential reference)
MODEC: Multimodal Decomposable Models for Human Pose Estimation (2013, influential reference)
Strong Appearance and Expressive Spatial Models for Human Pose Estimation (2013, cited by this paper)
Learning Hierarchical Features for Scene Labeling (2013, cited by this paper)
Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation (2013, cited by this paper)
Indoor Semantic Segmentation using depth information (2013, cited by this paper)
Articulated Human Detection with Flexible Mixtures of Parts (2013, cited by this paper)
ImageNet classification with deep convolutional neural networks (2012, cited by this paper)
Learning effective human pose estimation from inaccurate annotation (2011, cited by this paper)
Real-time human pose recognition in parts from single depth images (2011, cited by this paper)
Torch7: A Matlab-like Environment for Machine Learning (2011, cited by this paper)
Deconvolutional networks (2010, cited by this paper)
Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation (2010, cited by this paper)
Poselets: Body part detectors trained using 3D human pose annotations (2009, cited by this paper)
A discriminatively trained, multiscale, deformable part model (2008, cited by this paper)
Progressive search space reduction for human pose estimation (2008, cited by this paper)
Gradient-based learning applied to document recognition (1998, cited by this paper)

CITED BY

HPPRNet: Human parsing enabled pose refinement network for human activity recognition (2026, cites this paper)
YOLD: You Only Look Denseness for Tiny Object Detection in Aerial Images (2026, cites this paper)
A Cross Self-Attention Feature Fusion Module for 2D Multiple Human Pose Estimation (2026, cites this paper)
Sit‐to‐Stand Power From 2D Pose Estimation as an Indicator of Muscle Strength in Older Adults (2026, cites this paper)
Double-Chain Graph Convolution Transformer for 3D Human Pose Estimation (2026, cites this paper)
ECCFNet: enhanced context-aware cross-resolution fusion network for human pose estimation (2026, influential citation)
Automated Extraction of 3-D Windows From MVS Point Clouds by Comprehensive Fusion of Multitype Features (2026, cites this paper)
Shadow vanishing point detection via combined human/shadow adaptive modulation (2026, cites this paper)
Camera-Space Hand Mesh Reconstruction from a Monocular Image via Pseudo Stereo Perception (2026, cites this paper)
Improving generative adversarial network generalization for facial expression synthesis (2026, cites this paper)
Exploiting Class-agnostic Visual Prior for Few-shot Keypoint Detection (2026, cites this paper)
Real-Time Rat Pose Estimation System via Miniature Stereo Vision for Robot-Rat Interaction (2026, cites this paper)
GS-UNet: ConvNeXt-based keypoint-driven visual servoing with cross-hierarchical attention gating for high-precision robotic assembly (2026, cites this paper)
Differentiable Neural Architecture Search for medical image segmentation: A systematic review and field audit (2026, cites this paper)
MSPNet: A Multiscale Pyramid Network for Semantic Segmentation of Urban-Scale Photogrammetric Point Clouds (2026, cites this paper)
Graph-Based and Multi-Stage Constraints for Hand–Object Reconstruction (2026, cites this paper)
HEViTPose: towards high-accuracy and efficient 2D human pose estimation with cascaded group spatial reduction attention (2026, influential citation)
Learning Structural Relations for Robust Chest X-Ray Landmark Detection (2026, cites this paper)
Local-Global Feature Fusion for Enhancing 3D Human Pose Estimation (2026, cites this paper)
Automatic localization of cross sections corresponding to standard transesophageal echocardiographic views on computed tomography volume (2026, cites this paper)
A Study on Real-time Object Detection using Deep Learning (2026, cites this paper)
From Frames to Sequences: Temporally Consistent Human-Centric Dense Prediction (2026, cites this paper)
Measurement of Echocardiographic Parameters in Various Cardiac Phases: A Comparative Study of Deep Learning Models and Keypoint Representations (2026, cites this paper)
DSVTformer: Dual-stream Spatial-View-Temporal Transformer for multi-view 3D human pose estimation (2026, cites this paper)
Pose Under Covers: In-Bed Human Pose Estimation Using Multisensor Image Fusion and Graph Convolutional Network (2026, cites this paper)
GHGPSE-Net: a method towards spaceborne automated extraction of greenhouse-gas point sources using point-object-detection deep neural network (2026, cites this paper)
3D landmark detection on human point clouds: A benchmark and a dual cascade point transformer framework (2026, cites this paper)
RePose: A Real-Time 3D Human Pose Estimation and Biomechanical Analysis Framework for Rehabilitation (2026, cites this paper)
PECC: Position Encoding Coordinate Classification System Design for Human Pose Estimation (2026, cites this paper)
Balancing Speed and Accuracy: A Mouth-Eye 3D Keypoints Alignment Framework for Large Poses (2026, cites this paper)
Co-HSC: Complementary image-mesh fusion for dense human-scene contact estimation (2026, cites this paper)
Accurate urban solar potential estimation empowered by multimodal 3-D building reconstruction: a case study in Landshut, Germany (2026, cites this paper)
Aligning Computer Vision with Expert Assessment: An Adaptive Hybrid Framework for Real-Time Fatigue Assessment in Smart Manufacturing (2026, cites this paper)
K-nearest neighbor-enhanced Residual Learning Framework for image restoration (2026, cites this paper)
Multi-Modal Multi-Stage Multi-Task Learning for Occlusion-Aware Facial Landmark Localisation (2026, cites this paper)
MSTPFormer: Mamba-driven spatiotemporal bidirectional dual-stream parallel transformer for 3D human pose estimation (2026, cites this paper)
Deep learning for object detection: state of the art, challenges, and future directions (2026, cites this paper)
HyT-Pose: Accurate Pose Estimation by Iteratively Fusing Global and Local Context (2026, cites this paper)
Advancing depth-based semi-supervised three-dimensional hand pose estimation with consistency training (2026, cites this paper)
OccFace: Unified Occlusion-Aware Facial Landmark Detection with Per-Point Visibility (2026, cites this paper)
Development of a Novel Deep Learning-Based Gaze Estimation Method for Detecting Strabismus (2026, cites this paper)
HandFS: Wavelet-guided Frequency-Spatial Domain Feature Decoupling Network for 3D Hand Pose Estimation Under Occlusion (2026, cites this paper)
SPRITETOMESH: Automatic Mesh Generation for 2D Skeletal Animation Using Learned Segmentation and Contour-Aware Vertex Placement (2026, cites this paper)
Transformers Outperform ConvNets for Root Segmentation: A Systematic Comparison Across Nine Datasets (2026, cites this paper)
Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement (2026, cites this paper)
ROM-Pose: restoring occluded mask image for 2D human pose estimation (2025, cites this paper)
Partition Map-Based Fast Block Partitioning for VVC Inter Coding (2025, cites this paper)
3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer (2025, cites this paper)
InLite-HRNet: efficient band convolution network for human pose estimation (2025, cites this paper)
Action Recognition via Multi-View Perception Feature Tracking for Human-Robot Interaction (2025, cites this paper)
BR-Pose: enhancing human pose estimation through Bi-level routing attention and multi-level weight fusion (2025, cites this paper)
Healthy and Unhealthy Oil Palm Tree Detection Using Deep Learning Method (2025, cites this paper)
Low-resolution human pose estimation and action recognition via pose-driven super-resolution reconstruction (2025, influential citation)
AcuSim: A Synthetic Dataset for Cervicocranial Acupuncture Points Localisation (2025, cites this paper)
SKAD: A Unified Framework Guided by Structural Knowledge for Anomaly Detection of Dampers in Transmission Lines (2025, cites this paper)
A Hierarchical Progressive Perception System for Autonomous Luggage Trolley Collection (2025, cites this paper)
Sparse and transferable three-dimensional dynamic vascular reconstruction for instantaneous diagnosis (2025, cites this paper)
Robust keypoint-based method for peduncle pose estimation in unstructured environments (2025, cites this paper)
Generation driven understanding of localized 3D scenes with 3D diffusion model (2025, cites this paper)
Uncertainty Quantification and Quality Control for Heatmap-Based Landmark Detection Models (2025, cites this paper)
VisualCent: Visual Human Analysis using Dynamic Centroid Representation (2025, cites this paper)
Crowd Detection Using Very-Fine-Resolution Satellite Imagery (2025, cites this paper)
Monocular 3D hand pose estimation based on high-resolution network (2025, cites this paper)
YOLO-RS: Remote Sensing Enhanced Crop Detection Methods (2025, cites this paper)
Self-supervised keypoint detection based on affine transformation (2025, cites this paper)
PFLO: a high-throughput pose estimation model for field maize based on YOLO architecture (2025, cites this paper)
AI Method for LAMOST Fiber Detection Based on Front Illumination (2025, cites this paper)
Fetal Cerebellum Landmark Detection Based on 3D MRI: Method and Benchmark (2025, cites this paper)
Adaptive Detection of Fast-moving Celestial Objects Using a Mixture-of-experts and Physical-inspired Neural Network (2025, cites this paper)
A Survey on Deep Learning-Based Lane Detection Algorithms for Camera and LiDAR (2025, cites this paper)
Multi-Person Pose Estimation with Feature Enhancement and Decoupling Based on Contrastive Learning (2025, cites this paper)
Calibrating the Principal Point of Vehicle-Mounted Fisheye Cameras Using Point-Oriented Representation (2025, cites this paper)
ListPose: Lightweight and Implicit Spatial-Temporal Modeling with TokenPose for Video-Based Pose Estimation (2025, cites this paper)
A Survey of the State of the Art in Monocular 3D Human Pose Estimation: Methods, Benchmarks, and Challenges (2025, cites this paper)
A review of transformer-based human pose estimation: Delving into the relation modeling (2025, cites this paper)
Anti-drift pose tracker (ADPT), a transformer-based network for robust animal pose estimation cross-species (2025, cites this paper)
Spatial–Temporal–Geometric Graph Convolutional Network for 3-D Human Pose Estimation From Multiview Video (2025, cites this paper)
YOLO-MousePose: A Novel Framework and Dataset for Mouse Pose Estimation From a Top–Down View (2025, cites this paper)
Artificial neural networks for finger vein recognition: A survey (2025, cites this paper)
OA-WinSeg: Occlusion-Aware Window Segmentation With Conditional Adversarial Training Guided by Structural Prior Information (2025, cites this paper)
Perception-Enhanced Network for Accurate Human Pose Estimation (2025, cites this paper)
DCFormer: Divide-and-Conquer in 3D Human Pose Estimation Tasks (2025, cites this paper)
Artificial Intelligence in Fitness: Pose Estimation and Movement Correction (2025, cites this paper)
Diffusion-Refinement Pose Estimation With Hybrid Representation (2025, influential citation)
WiPE: Privacy-Friendly WiFi-Based Human Pose Estimation on Consumer Platform (2025, cites this paper)
Enhanced Nighttime Vehicle Detection for On-Board Processing (2025, influential citation)
Occluded human pose estimation based on part-aware discrete diffusion priors (2025, cites this paper)
Uncertainty-aware Long-tailed Weights Model the Utility of Pseudo-labels for Semi-supervised Learning (2025, cites this paper)
Advancing Active Speaker Detection for Egocentric Videos (2025, cites this paper)
MSANet: Mixed Spectral and Attention Network for Robust 3D Human Pose Estimation (2025, cites this paper)
Deep learning for recognition and detection of plant diseases and pests (2025, cites this paper)
Transformer-based weakly supervised 3D human pose estimation (2025, cites this paper)
Cricket Shot Analysis using Conditional Directed Spatio-Temporal Graph networks (2025, cites this paper)
VertexNet: a table structure recognition model based on key points for complex scenes (2025, cites this paper)
Single-Person 3D Human Pose Estimation Based on Deep Learning: A Review (2025, cites this paper)
Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation (2025, cites this paper)
Automatic Physical Examination Segmentation within Objective Structured Clinical Examination Videos (2025, cites this paper)
InfraEyeNet: Infrared eye landmark detection network with modified bottleneck module (2025, cites this paper)
HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network (2025, influential citation)
Lightweight CA-YOLOv7-Based Badminton Stroke Recognition: A Real-Time and Accurate Behavior Analysis Method (2025, cites this paper)