Deep Feature Flow for Video Recognition

Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei

Published 2016 in Computer Vision and Pattern Recognition

ABSTRACT

Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup as flow computation is relatively fast. The end-to-end training of the whole architecture significantly boosts the recognition accuracy. Deep feature flow is flexible and general. It is validated on two recent large-scale video datasets. It makes a large step towards practical video recognition. Code will be released.
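
The key-frame scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the fixed key-frame interval, the bilinear warping, and the callables `feature_net`, `flow_net`, and `task_net` are all assumptions introduced here for illustration, standing in for the expensive convolutional sub-network, the flow estimator, and the recognition head.

```python
import numpy as np


def warp_features(key_feat, flow):
    """Bilinearly warp key-frame features to the current frame (assumed scheme).

    key_feat: (C, H, W) feature map computed on the key frame.
    flow:     (2, H, W) field; flow[0] is dx and flow[1] is dy, so the value at
              current-frame location (y, x) is sampled from the key-frame
              features at (y + dy, x + dx).
    """
    _, h, w = key_feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions in the key frame, clamped to the valid range.
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # horizontal interpolation weight
    wy = y - y0  # vertical interpolation weight
    top = key_feat[:, y0, x0] * (1 - wx) + key_feat[:, y0, x1] * wx
    bottom = key_feat[:, y1, x0] * (1 - wx) + key_feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def run_video(frames, feature_net, flow_net, task_net, key_interval=10):
    """Run the expensive feature_net only on key frames (hypothetical loop).

    Every key_interval-th frame is treated as a key frame; this fixed schedule
    is an assumption of the sketch. Other frames reuse the key-frame features,
    warped by the flow estimated between the key frame and the current frame.
    """
    outputs = []
    key_frame = key_feat = None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            key_frame = frame
            key_feat = feature_net(frame)  # expensive path
            feat = key_feat
        else:
            flow = flow_net(key_frame, frame)  # cheap path
            feat = warp_features(key_feat, flow)
        outputs.append(task_net(feat))
    return outputs
```

With a zero flow field, `warp_features` returns the key-frame features unchanged. With `key_interval=10`, a 25-frame clip calls `feature_net` three times (frames 0, 10, and 20), which is where the speedup in this sketch comes from.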

PUBLICATION RECORD

Publication year: 2016
Venue: Computer Vision and Pattern Recognition
Publication date: 2016-11-23
Fields of study: Computer Science
Identifiers: DOI 10.1109/CVPR.2017.441; arXiv 1611.07715
Source metadata: Semantic Scholar

REFERENCES

Vehicle Detection in Images of Suburban Highways Based on the Single Shot MultiBox Detector Method
2017, cited by this paper
STFCN: Spatio-Temporal FCN for Semantic Video Segmentation
2016, cited by this paper
Joint Optical Flow and Temporally Consistent Semantic Segmentation
2016, cited by this paper
T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos
2016, cited by this paper
Exploiting Semantic Information and Deep Matching for Optical Flow
2016, cited by this paper
Feature Space Optimization for Semantic Video Segmentation
2016, cited by this paper
R-FCN: Object Detection via Region-based Fully Convolutional Networks
2016, influential reference
Optical Flow Estimation Using a Spatial Pyramid Network
2016, cited by this paper
The Cityscapes Dataset for Semantic Urban Scene Understanding
2016, influential reference
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
2016, cited by this paper
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
2016, cited by this paper
Multi-class Multi-object Tracking Using Changing Point Detection
2016, cited by this paper
Clockwork Convnets for Video Semantic Segmentation
2016, cited by this paper
Optical Flow with Semantic Segmentation and Localized Layers
2016, cited by this paper
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
2016, influential reference
CRAFT Objects from Images
2016, influential reference
Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video
2015, influential reference
Conditional Random Fields as Recurrent Neural Networks
2015, cited by this paper
BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2015, cited by this paper
SSD: Single Shot MultiBox Detector
2015, cited by this paper
Flowing ConvNets for Human Pose Estimation in Videos
2015, cited by this paper
Spatio-temporal video autoencoder with differentiable memory
2015, cited by this paper
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015, influential reference
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
2015, influential reference
EpicFlow: Edge-preserving interpolation of correspondences for optical flow
2015, cited by this paper
FlowNet: Learning Optical Flow with Convolutional Networks
2015, influential reference
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015, cited by this paper
Rethinking the Inception Architecture for Computer Vision
2015, cited by this paper
Deep Residual Learning for Image Recognition
2015, cited by this paper
Accelerating Very Deep Convolutional Networks for Classification and Detection
2015, cited by this paper
Fast R-CNN
2015, influential reference
DeepID-Net: Deformable deep convolutional neural networks for object detection
2014, influential reference
ImageNet Large Scale Visual Recognition Challenge
2014, cited by this paper
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
2014, cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014, influential reference
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
2014, influential reference
DL-SFA: Deeply-Learned Slow Feature Analysis for Action Recognition
2014, cited by this paper
Fully convolutional networks for semantic segmentation
2014, cited by this paper
Going deeper with convolutions
2014, influential reference
DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection
2014, cited by this paper
A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
2013, cited by this paper
Visualizing and Understanding Convolutional Networks
2013, cited by this paper
DeepFlow: Large Displacement Optical Flow with Deep Matching
2013, cited by this paper
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
2013, cited by this paper
Slow Feature Analysis for Human Action Recognition
2012, cited by this paper
Indoor Segmentation and Support Inference from RGBD Images
2012, cited by this paper
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
Deep Learning of Invariant Features via Simulated Fixations in Video
2012, cited by this paper
Discriminative image warping with attribute flow
2011, cited by this paper
SIFT Flow: Dense Correspondence across Different Scenes
2008, influential reference
High Accuracy Optical Flow Estimation Based on a Theory for Warping
2004, cited by this paper
Slow Feature Analysis: Unsupervised Learning of Invariances
2002, influential reference
Determining Optical Flow
1981, cited by this paper
Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation
year unknown, cited by this paper
A Survey on Variational Optic Flow Methods for Small Displacements
year unknown, cited by this paper

CITED BY

Spatio-Temporal Attention for Consistent Video Semantic Segmentation in Automated Driving
2026, cites this paper
RobustMMD: towards robust multimodal 3D detection via efficient multilevel adapter and spatial denoiser
2026, cites this paper
A comprehensive overview of deep learning models for object detection from videos/images
2026, influential citation
TransUTD: Underwater cross-domain collaborative spatial-temporal transformer detector
2026, cites this paper
Time2General: Learning Spatiotemporal Invariant Representations for Domain-Generalization Video Semantic Segmentation
2026, cites this paper
Efficient On-Board Processing of Oblique UAV Video for Rapid Flood Extent Mapping
2026, cites this paper
Reconstruction of time-resolved three-dimensional flow fields based on a multi-domain fusion transformer
2026, cites this paper
Automated Video Object Detection of Motile Cells Under Microscopy
2025, influential citation
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
2025, cites this paper
HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison
2025, influential citation
OV-VOD: Open-Vocabulary Video Object Detection
2025, cites this paper
High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
2025, influential citation
IntSTR: An integrated spatio-temporal relation transformer for video object detection
2025, cites this paper
Temporally Consistent Unsupervised Segmentation for Mobile Robot Perception
2025, cites this paper
Multi-Frame Joint Detection Approach for Foreign Object Detection in Large-Volume Parenterals
2025, cites this paper
Video Semantic Segmentation Based on Spatiotemporal Dual Branch Attention
2025, influential citation
Multimodal Spatio-temporal Graph Learning for Alignment-free RGBT Video Object Detection
2025, cites this paper
Optimized RT-DETR for accurate and efficient video object detection via decoupled feature aggregation
2025, cites this paper
Joint Lesion Detection and Classification of Breast Ultrasound Video via a Clinical Knowledge-Aware Framework
2025, cites this paper
Difference Decomposition Networks for Infrared Small Target Detection
2025, cites this paper
Research on Video Object Detection Algorithm Based on Feature Aggregation
2025, cites this paper
Feature decomposition and difference network for RGB-T video object detection
2025, cites this paper
Generalizing Reuse Patterns for Efficient DNN on Microcontrollers
2025, cites this paper
CF6D: A Visual Perception Network for Collision-Free Grasping and Manipulation in Complex Industrial Environments
2025, cites this paper
DATA: Domain-And-Time Alignment for High-Quality Feature Fusion in Collaborative Perception
2025, cites this paper
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects
2025, cites this paper
Static-dynamic class-level perception consistency in video semantic segmentation
2025, cites this paper
Efficient Bayer-Domain Video Computer Vision with Fast Motion Estimation and Learned Perception Residual
2025, cites this paper
Exploiting Temporal State Space Sharing for Video Semantic Segmentation
2025, cites this paper
Unveiling Semantic Structure - Event-Driven Framework for Video Analysis
2025, cites this paper
Ma-Yolo: Video Object Detection Via Motion-Assisted Yolo
2025, cites this paper
Energy-efficient online knowledge distillation for mobile video inference
2025, cites this paper
A Dual-Channel and Frequency-Aware Approach for Lightweight Video Instance Segmentation
2025, cites this paper
KFNet: KAN And optical flow gating alignment based gridded pixel interaction network for change detection
2025, cites this paper
Edge–Cloud Collaborative Real-Time Video Object Detection for Industrial Surveillance Systems
2025, cites this paper
Light-aware luminance adaptive enhancement network for RGBT video object detection
2025, cites this paper
Infrastructure-Side Point Cloud Object Detection via Multi-Frame Aggregation and Multi-Scale Fusion
2025, cites this paper
Enhanced Neuromorphic Semantic Segmentation Latency through Stream Event
2025, cites this paper
DSFMamba: Dual-state fusion mamba for long-short term motion-adaptive streaming perception in autonomous driving
2025, cites this paper
Spatiotemporal Video Segmentation via Mamba-Driven Dual-Scale Modeling
2025, cites this paper
EC-HDLNet: Extended coati-based hybrid deep dilated convolutional learning network for brain tumor classification
2025, cites this paper
Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment
2025, cites this paper
Object Recognition for Millimeter Wave SAR Images Based on Dual-Branch Multiscale Fusion Network
2024, cites this paper
Semi-Supervised Thyroid Nodule Detection in Ultrasound Videos
2024, cites this paper
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
2024, cites this paper
CRMEFNet: A coupled refinement, multiscale exploration and fusion network for medical image segmentation
2024, cites this paper
Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency
2024, cites this paper
Deep-TransVOD: Improved Video Object Detection with Low-Rank Tensor Decomposition
2024, cites this paper
AFSDet: Video Small Object Detection Based on Adaptive Focused Slicing
2024, cites this paper
Video semantic segmentation based on spatio-temporally polarized attention
2024, influential citation
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
2024, cites this paper
Towards Open-Vocabulary Video Semantic Segmentation
2024, cites this paper
Real-Time AIoT for AAV Antenna Interference Detection via Edge–Cloud Collaboration
2024, cites this paper
Ultrasound Video Segmentation of Pubic Symphysis and Fetal Head for Angle of Progression Measurement
2024, cites this paper
SMITE: Segment Me In TimE
2024, cites this paper
A Spatiotemporal Perspective on Dynamical Computation in Neural Information Processing Systems
2024, cites this paper
Inter-Frame Multiscale Probabilistic Cross-Attention for Surveillance Object Detection
2024, cites this paper
Frame-level Pain Intensity Assessment via Multilevel Hash-based Features and Transformer
2024, cites this paper
Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring
2024, cites this paper
SSTNet: Sliced Spatio-Temporal Network With Cross-Slice ConvLSTM for Moving Infrared Dim-Small Target Detection
2024, cites this paper
Survey on fast dense video segmentation techniques
2024, cites this paper
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
2024, influential citation
Flying Bird Object Detection Algorithm in Surveillance Video
2024, cites this paper
Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception
2024, cites this paper
Efficient Semantic Segmentation for Compressed Video
2024, cites this paper
Self-supervised spatial-temporal feature enhancement for one-shot video object detection
2024, cites this paper
Joint Spatial and Temporal Feature Enhancement Network for Disturbed Object Detection
2024, influential citation
A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot
2024, cites this paper
DynaPP: A Dynamic Resolution Model with Patch Packing for Fast Online Video Detection
2024, influential citation
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
2024, cites this paper
FF-LPD: A Real-Time Frame-by-Frame License Plate Detector With Knowledge Distillation and Feature Propagation
2024, influential citation
Towards real-time video analysis of flooded areas: redundancy-based accelerator for object detection models
2024, cites this paper
Semantic segmentation algorithm for video from UAV based on adaptive keyframe scheduling via similarity measurement
2024, cites this paper
A Semi-supervised Four-Chamber Echocardiographic Video Segmentation Algorithm Based on Multilevel Edge Perception and Calibration Fusion
2024, cites this paper
Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024
2024, cites this paper
Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving
2024, cites this paper
Weakly Supervised Fixated Object Detection in Traffic Videos Based on Driver’s Selective Attention Mechanism
2024, cites this paper
SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning
2024, cites this paper
Zero-Shot Scene Change Detection
2024, cites this paper
XS-VID: An Extremely Small Video Object Detection Dataset
2024, cites this paper
Practical Video Object Detection via Feature Selection and Aggregation
2024, cites this paper
A Flying Bird Object Detection Method for Surveillance Video
2024, cites this paper
Gaseous Object Detection
2024, cites this paper
Patchwise Temporal–Spatial Feature Aggregation Network for Object Detection in Satellite Video
2024, cites this paper
SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving
2024, cites this paper
Dual Correlation Network for Efficient Video Semantic Segmentation
2024, influential citation
Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation
2024, cites this paper
MVSparse: Distributed Cooperative Multi-camera Multi-target Tracking on the Edge
2024, cites this paper
Mixture of Scale Experts for Alignment-free RGBT Video Object Detection and A Unified Benchmark
2024, cites this paper
Streamlined Video Object Detection with YOLOX YOLOV5 YOLOV7 and YOLOV8
2024, cites this paper
SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
2024, cites this paper
Image lens flare removal algorithm using semantic information integration
2024, cites this paper
Early Anticipation of Driving Maneuvers
2024, cites this paper
Uni-AdaFocus: Spatial-Temporal Dynamic Computation for Video Recognition
2024, cites this paper
Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
2024, cites this paper
MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion
2024, cites this paper
Antiocclusion Infrared Aerial Target Recognition With Vision-Inspired Dual-Stream Graph Network
2024, cites this paper
Toward Reliable License Plate Detection in Varied Contexts: Overcoming the Issue of Undersized Plate Annotations
2024, cites this paper
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
2024, cites this paper
Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting
2024, cites this paper