Entropy-driven Unsupervised Keypoint Representation Learning in Videos

A. Younes, Simone Schaub-Meyer, Georgia Chalvatzaki

Published 2022 in International Conference on Machine Learning

ABSTRACT

Extracting informative representations from videos is fundamental for effectively learning various downstream tasks. We present a novel approach for unsupervised learning of meaningful representations from videos, leveraging the concept of image spatial entropy (ISE), which quantifies the per-pixel information in an image. We argue that the local entropy of pixel neighborhoods and their temporal evolution create valuable intrinsic supervisory signals for learning prominent features. Building on this idea, we abstract visual features into a concise representation of keypoints that act as dynamic information transmitters, and design a deep learning model that learns, purely unsupervised, spatially and temporally consistent representations directly from video frames. Two original information-theoretic losses, computed from local entropy, guide our model to discover consistent keypoint representations: a loss that maximizes the spatial information covered by the keypoints and a loss that optimizes the keypoints' information transportation over time. We compare our keypoint representation to strong baselines for various downstream tasks, e.g., learning object dynamics. Our empirical results show superior performance for our information-driven keypoints, which resolve challenges like attending to static and dynamic objects or objects abruptly entering and leaving the scene.
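
To make the notion of local entropy of pixel neighborhoods concrete, the following is a minimal sketch in Python. It is not the paper's ISE formulation or its losses: it assumes a plain histogram-based Shannon entropy over a square window clipped at the image border, and the function name `local_entropy`, the `radius` parameter, and the toy frame are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def local_entropy(image, radius=1):
    """Shannon entropy (in bits) of the intensity histogram inside each
    pixel's (2*radius+1) x (2*radius+1) neighborhood.

    Assumption: windows are clipped at the image border rather than padded.
    `image` is a list of rows of integer gray levels.
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the intensities in the window, clipped to the image.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            counts = Counter(window)
            total = len(window)
            # H = sum_k p_k * log2(1 / p_k) over the gray levels in the window.
            result[y][x] = sum(
                (c / total) * math.log2(total / c) for c in counts.values()
            )
    return result


# Toy 4x4 frame: uniform regions on both sides of a vertical edge.
frame = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
entropy_map = local_entropy(frame, radius=1)
```

In this toy frame the entropy is zero wherever a window sees a single gray level and positive for windows that straddle the edge, which is the sense in which local entropy highlights informative regions. Applying the same function to consecutive frames and comparing the resulting maps would give a crude view of the temporal evolution the abstract refers to.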

PUBLICATION RECORD

Publication year
2022
Venue
International Conference on Machine Learning
Publication date
2022-09-30
Fields of study
Computer Science
Identifiers
arXiv 2209.15404
Source metadata
Semantic Scholar

CITED BY

LEARNet: A Learning Entropy-Aware Representation Network for Educational Video Understanding
2025, cites this paper