An end-to-end generative framework for video segmentation and recognition

Published 2015 in IEEE Workshop/Winter Conference on Applications of Computer Vision

ABSTRACT

We describe an end-to-end generative approach for the segmentation and recognition of human activities. In this approach, a visual representation based on reduced Fisher Vectors is combined with a structured temporal model for recognition. We show that the statistical properties of Fisher Vectors make them an especially suitable front-end for generative models such as Gaussian mixtures. The system is evaluated for both the recognition of complex activities and their parsing into action units. Using a variety of video datasets ranging from human cooking activities to animal behaviors, our experiments demonstrate that the resulting architecture outperforms state-of-the-art approaches for larger datasets, i.e., when a sufficient amount of data is available for training structured generative models.

PUBLICATION RECORD

Publication year
2015
Venue
IEEE Workshop/Winter Conference on Applications of Computer Vision
Publication date
2015-09-07
Fields of study
Computer Science
Identifiers
DOI 10.1109/WACV.2016.7477701 arXiv 1509.01947
Source metadata
Semantic Scholar

REFERENCES

The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities
2014 cited by this paper
From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity
2014 influential reference
Recognition of Complex Events: Exploiting Temporal Dynamics between Underlying Concepts
2014 cited by this paper
Temporal Sequence Modeling for Video Event Detection
2014 cited by this paper
Parsing Videos of Actions with Segmental Grammars
2014 cited by this paper
Multiple Granularity Analysis for Fine-Grained Action Detection
2014 cited by this paper
Action Recognition with Stacked Fisher Vectors
2014 cited by this paper
Image Classification with the Fisher Vector: Theory and Practice
2013 cited by this paper
Combining embedded accelerometers with computer vision for recognizing food preparation activities
2013 cited by this paper
Dense Trajectories and Motion Boundary Descriptors for Action Recognition
2013 cited by this paper
Large-scale web video event classification by use of Fisher Vectors
2013 cited by this paper
Daily Living Activities Recognition via Efficient High and Low Level Cues Combination and Fisher Kernel Representation
2013 cited by this paper
Action and Event Recognition with Fisher Vectors on a Compact Feature Set
2013 cited by this paper
Modeling Actions through State Changes
2013 cited by this paper
A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition
2012 cited by this paper
Aggregating Local Image Descriptors into Compact Codes
2012 influential reference
Social behavior recognition in continuous video
2012 influential reference
A database for fine grained activity detection of cooking activities
2012 influential reference
LIBSVM: A library for support vector machines
2011 cited by this paper
Unsupervised learning of event AND-OR grammar and semantics from video
2011 cited by this paper
Modeling human activities as speech
2011 cited by this paper
Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification
2010 influential reference
Improving the Fisher Kernel for Large-Scale Image Classification
2010 influential reference
Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations
2010 cited by this paper
Activity recognition using the velocity histories of tracked keypoints
2009 influential reference
Temporal segmentation and activity classification from first-person sensing
2009 cited by this paper
Semantic Representation and Recognition of Continued and Recursive Human Activities
2009 cited by this paper
Fisher Kernels on Visual Vocabularies for Image Categorization
2007 cited by this paper
Conditional models for contextual human motion recognition
2006 cited by this paper
The HTK book version 3.4
2006 cited by this paper
Discovering a Language for Human Activity
2005 cited by this paper
View-Invariant Representation and Recognition of Actions
2002 cited by this paper
Exploiting Generative Models in Discriminative Classifiers
1998 cited by this paper
The HTK book
1995 cited by this paper
A test for normality of observations and regression residuals
1987 cited by this paper
On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown
1967 cited by this paper
Action Recognition with Improved Trajectories
2013 cited by this paper

CITED BY

Hierarchical Action Learning for Weakly-Supervised Action Segmentation
2026 cites this paper
Improving action segmentation via explicit similarity measurement
2026 cites this paper
Knowledge Graph Completion for Action Prediction on Situational Graphs - A Case Study on Household Tasks
2025 cites this paper
Pose-Aware Weakly-Supervised Action Segmentation
2025 cites this paper
2s-TAS: Two-Stream Transformer for Multi-modal Human Action Segmentation
2025 cites this paper
F3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
2025 cites this paper
Unsupervised temporal action segmentation with sample discrimination training and alignment-based boundary refinement
2025 cites this paper
Multi-modal temporal action segmentation for manufacturing scenarios
2025 cites this paper
Understanding Multi-Task Activities from Single-Task Videos
2025 cites this paper
Looking into the Unknown: Exploring Action Discovery for Segmentation of Known and Unknown Actions
2025 cites this paper
Cluster Assumption-Guided Timestamp-Supervised Temporal Action Segmentation
2025 cites this paper
Graph-Cut-Based Semantic Optimization for Temporal Action Segmentation
2025 cites this paper
Enhancing Temporal Action Segmentation with Large Language Models
2025 influential citation
Cluster-Refined Optimal Transport for Unsupervised Action Segmentation
2025 cites this paper
Not all samples are equal: Boosting action segmentation via selective incremental learning
2025 cites this paper
Skeleton Motion Words for Unsupervised Skeleton-Based Temporal Action Segmentation
2025 cites this paper
Towards Generalizing Temporal Action Segmentation to Unseen Views
2025 cites this paper
Fine-grained Action Segmentation Network Based on Boundary Perception
2025 cites this paper
Improving action segmentation via explicit similarity measurement
2025 cites this paper
Computational segmentation of Wayang Kulit video recordings using a Cross-Attention Temporal Model
2024 cites this paper
MGRFormer: A Multimodal Transformer Approach for Surgical Gesture Recognition
2024 cites this paper
Learning Human Action Representations from Temporal Context in Lifestyle Vlogs
2024 cites this paper
Faster Diffusion Action Segmentation
2024 cites this paper
Hierarchical Vector Quantization for Unsupervised Action Segmentation
2024 cites this paper
Error Detection in Egocentric Procedural Task Videos
2024 cites this paper
HierGAT: hierarchical spatial-temporal network with graph and transformer for video HOI detection
2024 cites this paper
Boundary-sensitive denoised temporal reasoning network for video action segmentation
2024 cites this paper
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
2024 cites this paper
Weakly-Supervised Action Learning in Procedural Task Videos via Process Knowledge Decomposition
2024 cites this paper
Step Differences in Instructional Video
2024 cites this paper
Human Activity Recognition Based On Video Summarization And Deep Convolutional Neural Network
2024 cites this paper
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
2024 cites this paper
Pose-aware video action segmentation
2024 cites this paper
Human Motion Capture Data Segmentation Based on ST-GCN
2024 cites this paper
CookingINWild: Unleashing the Challenges of Indian Cuisine Cooking Videos for Action Recognition
2024 cites this paper
FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation
2024 cites this paper
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
2024 cites this paper
Random Walks for Temporal Action Segmentation with Timestamp Supervision
2024 cites this paper
Egocentric Human Activities Recognition With Multimodal Interaction Sensing
2024 cites this paper
ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy
2024 cites this paper
MSLID-TCN: multi-stage linear-index dilated temporal convolutional network for temporal action segmentation
2024 cites this paper
TSRN: two-stage refinement network for temporal action segmentation
2023 cites this paper
Task parse tree: Learning task policy from videos with task-irrelevant components
2023 cites this paper
Leveraging triplet loss for unsupervised action segmentation
2023 cites this paper
Permutation-Aware Activity Segmentation via Unsupervised Frame-to-Segment Alignment
2023 cites this paper
OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation
2023 cites this paper
Video-Based Fatigue Estimation for Human-Robot Task Allocation Optimisation
2023 cites this paper
Learning to Ground Instructional Articles in Videos through Narrations
2023 cites this paper
SMC-NCA: Semantic-Guided Multi-Level Contrast for Semi-Supervised Temporal Action Segmentation
2023 cites this paper
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
2023 cites this paper
Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation
2023 cites this paper
HOI-aware Adaptive Network for Weakly-supervised Action Segmentation
2023 cites this paper
Manual assembly actions segmentation system using temporal-spatial-contact features
2023 cites this paper
Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction
2023 cites this paper
How Much Temporal Long-Term Context is Needed for Action Segmentation?
2023 cites this paper
BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation
2023 cites this paper
Temporal Segment Transformer for Action Segmentation
2023 cites this paper
Spatial-temporal graph transformer network for skeleton-based temporal action segmentation
2023 cites this paper
Improved Sliding Window Smoothing for Video Temporal Action Segmentation and Recognition
2023 cites this paper
SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Action Segmentation
2023 cites this paper
Is there progress in activity progress prediction?
2023 influential citation
STGA-Net: Spatial-Temporal Graph Attention Network for Skeleton-Based Temporal Action Segmentation
2023 cites this paper
U-Transformer-based multi-levels refinement for weakly supervised action segmentation
2023 cites this paper
A New Dataset and Approach for Timestamp Supervised Action Segmentation Using Human Object Interaction
2023 cites this paper
Heterogeneous Graph Convolutional Network for Visual Reinforcement Learning of Action Detection
2023 cites this paper
StrokeRehab: A Benchmark Dataset for Sub-second Action Identification
2022 cites this paper
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
2022 cites this paper
Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation
2022 cites this paper
Weakly-Supervised Temporal Action Alignment Driven by Unbalanced Spectral Fused Gromov-Wasserstein Distance
2022 cites this paper
Self-supervised temporal event segmentation inspired by cognitive theories
2022 cites this paper
A 3DCNN-LSTM Multi-Class Temporal Segmentation for Hand Gesture Recognition
2022 cites this paper
Efficient Two-Stream Network for Online Video Action Segmentation
2022 cites this paper
Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency
2022 cites this paper
Robust Action Segmentation from Timestamp Supervision
2022 cites this paper
Learning Temporal Video Procedure Segmentation from an Automatically Collected Large Dataset
2022 cites this paper
A Circular Window-based Cascade Transformer for Online Action Detection
2022 cites this paper
Semi-Weakly-Supervised Learning of Complex Actions from Instructional Task Videos
2022 cites this paper
Dilated Transformer with Feature Aggregation Module for Action Segmentation
2022 cites this paper
Efficient U-Transformer with Boundary-Aware Loss for Action Segmentation
2022 cites this paper
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
2022 cites this paper
Cross-enhancement transformer for action segmentation
2022 cites this paper
Timestamp-Supervised Action Segmentation from the Perspective of Clustering
2022 cites this paper
AutoENP: An Auto Rating Pipeline for Expressing Needs via Pointing Protocol
2022 cites this paper
Bottom-up improved multistage temporal convolutional network for action segmentation
2022 cites this paper
Facial Tic Detection in Untrimmed Videos of Tourette Syndrome Patients
2022 cites this paper
Do We Really Need Temporal Convolutions in Action Segmentation?
2022 cites this paper
A-ACT: Action Anticipation through Cycle Transformations
2022 cites this paper
Timestamp-Supervised Action Segmentation in the Perspective of Clustering
2022 cites this paper
Temporal Action Segmentation: An Analysis of Modern Techniques
2022 cites this paper
Local–Global Transformer Neural Network for temporal action segmentation
2022 cites this paper
Distill and Collect for Semi-Supervised Temporal Action Segmentation
2022 cites this paper
The BRIO-TA Dataset: Understanding Anomalous Assembly Process in Manufacturing
2022 cites this paper
A temporal and channel-combined attention block for action segmentation
2022 cites this paper
Dataset Augmentation Strategies for Visual Activity Recognition in Deep Neural Networks
2022 cites this paper
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
2022 cites this paper
Weakly supervised coarse-to-fine learning for human action segmentation in HCI videos
2022 cites this paper
Automated freezing of gait assessment with marker-based motion capture and multi-stage spatial-temporal graph convolutional neural networks
2021 cites this paper
Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
2021 cites this paper
Pyramid Dilated Attention Network for Action Segmentation
2021 cites this paper
CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation
2021 cites this paper