Object Detection in Videos with Tubelet Proposal Networks

Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Published 2017 in Computer Vision and Pattern Recognition

ABSTRACT

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset. Unlike in static images, temporal information in videos is vital for object detection. To fully utilize this temporal information, state-of-the-art methods [15, 14] are based on spatiotemporal tubelets, which are essentially sequences of associated bounding boxes across time. However, existing methods have major limitations in the quality and efficiency of the tubelets they generate. Motion-based methods [14] can obtain dense tubelets efficiently, but these tubelets are generally only a few frames long, which is not optimal for incorporating long-term temporal information. Appearance-based methods [15], which usually involve generic object tracking, can generate long tubelets but are usually computationally expensive. In this work, we propose a framework for object detection in videos that consists of a novel tubelet proposal network, which efficiently generates spatiotemporal proposals, and a Long Short-Term Memory (LSTM) network, which incorporates temporal information from the tubelet proposals to achieve high object detection accuracy in videos. Experiments on the large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed framework for object detection in videos.
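The abstract defines a tubelet as a sequence of associated bounding boxes across time but does not spell out how a tubelet proposal is parameterized. As a minimal sketch, assuming a proposal is formed by regressing one box per frame relative to a single static proposal box with the common R-CNN-style (dx, dy, dw, dh) encoding, the data structure and decoding step could look like the following. `Tubelet`, `apply_deltas`, and `propose_tubelet` are hypothetical names, the offsets are hand-picked rather than predicted by a network, and the LSTM stage is omitted; this is an illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]
# Per-frame regression offsets (dx, dy, dw, dh), R-CNN-style parameterization (assumed).
Delta = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """A spatiotemporal tubelet: one bounding box per consecutive frame."""
    start_frame: int
    boxes: List[Box]

    def __len__(self) -> int:
        return len(self.boxes)


def apply_deltas(anchor: Box, delta: Delta) -> Box:
    """Decode (dx, dy, dw, dh) relative to an anchor box into absolute coordinates."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = delta
    # Shift the center proportionally to the anchor size; scale width/height exponentially.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def propose_tubelet(static_box: Box, per_frame_deltas: Sequence[Delta],
                    start_frame: int = 0) -> Tubelet:
    """Hypothetical helper: every frame's box is decoded relative to the SAME static box."""
    return Tubelet(start_frame, [apply_deltas(static_box, d) for d in per_frame_deltas])


if __name__ == "__main__":
    # Hand-picked offsets stand in for network predictions in this illustration.
    tubelet = propose_tubelet((10, 10, 50, 30), [(0, 0, 0, 0), (0.25, 0, 0, 0)])
    print(len(tubelet), tubelet.boxes)
```

Running it prints `2 [(10.0, 10.0, 50.0, 30.0), (20.0, 10.0, 60.0, 30.0)]`: the first frame keeps the static box and the second is shifted right by a quarter of its width. Decoding every frame against the same static box, rather than chaining frame to frame, keeps an error in one frame from propagating to later ones; whether this matches the paper's exact design is an assumption here.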

PUBLICATION RECORD

Publication year: 2017
Venue: Computer Vision and Pattern Recognition
Publication date: 2017-02-21
Fields of study: Computer Science
Identifiers: DOI 10.1109/CVPR.2017.101; arXiv 1702.06355
Source metadata: Semantic Scholar

REFERENCES

Person Search with Natural Language Description
2017 · cited by this paper
ViP-CNN: A Visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection
2017 · cited by this paper
Spatio-Temporal Closed-Loop Object Detection
2017 · cited by this paper
Slicing Convolutional Neural Network for Crowd Video Understanding
2016 · cited by this paper
Seq-NMS for Video Object Detection
2016 · influential reference
Joint Detection and Identification Feature Learning for Person Search
2016 · cited by this paper
Object Detection from Video Tubelets with Convolutional Neural Networks
2016 · influential reference
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
2016 · cited by this paper
T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos
2016 · cited by this paper
Learning to Track at 100 FPS with Deep Regression Networks
2016 · cited by this paper
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015 · cited by this paper
Unsupervised Object Discovery and Tracking in Video Collections
2015 · cited by this paper
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
2015 · cited by this paper
Fast R-CNN
2015 · influential reference
Visual Tracking with Fully Convolutional Networks
2015 · cited by this paper
You Only Look Once: Unified, Real-Time Object Detection
2015 · cited by this paper
Deeply learned attributes for crowded scene understanding
2015 · cited by this paper
Deep Residual Learning for Image Recognition
2015 · influential reference
Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning
2014 · cited by this paper
Fully convolutional networks for semantic segmentation
2014 · cited by this paper
Edge Boxes: Locating Object Proposals from Edges
2014 · cited by this paper
Sequence to Sequence Learning with Neural Networks
2014 · cited by this paper
Efficient Image and Video Co-localization with Frank-Wolfe Algorithm
2014 · cited by this paper
Fully Convolutional Neural Networks for Crowd Segmentation
2014 · cited by this paper
DeepID-Net: Deformable deep convolutional neural networks for object detection
2014 · cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014 · cited by this paper
Crowd Tracking with Dynamic Evolution of Group Structures
2014 · cited by this paper
Going deeper with convolutions
2014 · cited by this paper
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
2013 · cited by this paper
Selective Search for Object Recognition
2013 · cited by this paper
Learning object class detectors from weakly annotated video
2012 · cited by this paper
ImageNet classification with deep convolutional neural networks
2012 · cited by this paper
Localizing Objects While Learning Their Appearance
2010 · cited by this paper
ImageNet: A large-scale hierarchical image database
2009 · cited by this paper
Long Short-Term Memory
1997 · cited by this paper
High-speed Tracking with Kernelized Correlation Filters (IEEE Transactions on Pattern Analysis and Machine Intelligence)
Year unknown · cited by this paper

CITED BY

Infrastructure-Side Point Cloud Object Detection via Multi-Frame Aggregation and Multi-Scale Fusion
2025 · cites this paper
Cutting‐Edge Deep Learning Methods for Image‐Based Object Detection in Autonomous Driving: In‐Depth Survey
2025 · cites this paper
Spatiotemporal Learning with Context-Aware Video Tubelets for Ultrasound Video Analysis
2025 · cites this paper
Learning semantic-unified cross-modal representations for open-vocabulary video scene graph generation
2025 · cites this paper
Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
2025 · cites this paper
Learning key lines for multi-object tracking
2024 · cites this paper
Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection
2024 · cites this paper
The influence of CNN architecture, image size and quality to object detection model on histological specimens
2024 · cites this paper
Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory
2024 · cites this paper
Farmland pest recognition based on Cascade RCNN Combined with Swin-Transformer
2024 · cites this paper
Envision – An Object Detection System using Jetson Nano
2024 · cites this paper
Video Object Detection From Compressed Formats for Modern Lightweight Consumer Electronics
2024 · cites this paper
Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency
2024 · cites this paper
Towards Scenario Retrieval of Real Driving Data with Large Vision-Language Models
2024 · cites this paper
Two Stage Video Classification Approach Using Convolution Neural Network
2024 · cites this paper
Vehicle Detection System Using Machine Learning
2024 · cites this paper
An Efficient Data Transmission Framework for Connected Vehicles
2024 · cites this paper
DynaPP: A Dynamic Resolution Model with Patch Packing for Fast Online Video Detection
2024 · cites this paper
Towards automatic farrowing monitoring—A Noisy Student approach for improving detection performance of newborn piglets
2024 · cites this paper
Computer Vision on the Edge: Individual Cattle Identification in Real-time with ReadMyCow System
2024 · cites this paper
Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring
2024 · cites this paper
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
2023 · cites this paper
Object-Level Feature Memory and Aggregation for Live-Stream Video Object Detection
2023 · cites this paper
Context Enhanced Transformer for Single Image Object Detection
2023 · cites this paper
Deep Learning-Based Standard Sign Language Discrimination
2023 · cites this paper
Object Detection in Drone Video with Temporal Fusion Network
2023 · cites this paper
Have We Ever Encountered This Before? Retrieving Out-of-Distribution Road Obstacles from Driving Scenes
2023 · cites this paper
A novel deep convolutional encoder–decoder network: application to moving object detection in videos
2023 · cites this paper
Identity-Consistent Aggregation for Video Object Detection
2023 · cites this paper
Global Memory and Local Continuity for Video Object Detection
2023 · cites this paper
Multiple Object Tracking With Appearance Feature Prediction and Similarity Fusion
2023 · cites this paper
A Comprehensive Report on Machine Learning-based Early Detection of Alzheimer's Disease using Multi-modal Neuroimaging Data
2022 · cites this paper
Spatiotemporal tubelet feature aggregation and object linking for small object detection in videos
2022 · cites this paper
Micro- and Macroscopic Road Traffic Analysis using Drone Image Data
2022 · cites this paper
Object Permanence in Object Detection Leveraging Temporal Priors at Inference Time
2022 · cites this paper
Multi-object tracking with robust object regression and association
2022 · cites this paper
A comprehensive review of object detection with deep learning
2022 · cites this paper
Detection-Identification Balancing Margin Loss for One-Stage Multi-Object Tracking
2022 · cites this paper
Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection
2022 · cites this paper
INT: Towards Infinite-frames 3D Detection with An Efficient Framework
2022 · cites this paper
Multi-view aggregation for real-time accurate object detection of a moving camera
2022 · cites this paper
Survey: Exploiting Data Redundancy for Optimization of Deep Learning
2022 · influential citation
VRDFormer: End-to-End Video Visual Relation Detection with Transformers
2022 · cites this paper
Video Sparse Transformer with Attention-guided Memory for Video Object Detection
2022 · cites this paper
Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection
2022 · influential citation
Temporal feature enhancement network with external memory for live-stream video object detection
2022 · cites this paper
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection
2022 · cites this paper
Object Permanence Emerges in a Random Walk along Memory
2022 · cites this paper
A vehicle detection and tracking method for traffic video based on faster R-CNN
2022 · cites this paper
Video representation learning through prediction for online object detection
2022 · cites this paper
DeTracker: A Joint Detection and Tracking Framework
2022 · cites this paper
Visual Tracking With Object Center Displacement and CenterNet
2022 · cites this paper
Short-term anchor linking and long-term self-guided attention for video object detection
2021 · cites this paper
Detail texture detection based on Yolov4-tiny combined with attention mechanism and bicubic interpolation
2021 · cites this paper
Unsupervised object detection with scene-adaptive concept learning
2021 · cites this paper
Learning to Track Object Position through Occlusion
2021 · cites this paper
Spatio‐temporal feature fusion based correlative binary relevance for visual object detection
2021 · cites this paper
Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review
2021 · influential citation
A Comprehensive Survey of Scene Graphs: Generation and Application
2021 · cites this paper
Application of image processing and convolutional neural networks for flood image classification and semantic segmentation
2021 · cites this paper
Pedestrian Tracking through Coordinated Mining of Multiple Moving Cameras
2021 · cites this paper
You Don’t Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking
2021 · cites this paper
MoDist: Motion Distillation for Self-supervised Video Representation Learning
2021 · cites this paper
3D Object Detection With Multi-Frame RGB-Lidar Feature Alignment
2021 · cites this paper
Detection-by-tracking of traffic signs in videos
2021 · influential citation
Deep Temporal Model-Based Identity-Aware Hand Detection for Space Human–Robot Interaction
2021 · cites this paper
Video Visual Relation Detection via Iterative Inference
2021 · cites this paper
3D-FCT: Simultaneous 3D Object Detection and Tracking Using Feature Correlation
2021 · cites this paper
Attention-Enabled Object Detection to Improve One-Stage Tracker
2021 · cites this paper
A feature temporal attention based interleaved network for fast video object detection
2021 · cites this paper
A novel memory mechanism for video object detection from indoor mobile robots
2021 · influential citation
Tracking Basketball Shots - Preliminary Results
2021 · cites this paper
A fast and effective video vehicle detection method leveraging feature fusion and proposal temporal link
2021 · influential citation
Real-Time Object Detection by Feature Map Forecast for Live Streaming Video
2021 · cites this paper
Few-Shot Video Object Detection
2021 · cites this paper
CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking
2021 · cites this paper
MaCLR: Motion-Aware Contrastive Learning of Representations for Videos
2021 · cites this paper
A spatio-temporal exposure correction neural network for autonomous vehicle
2021 · cites this paper
Deep Learning Methods for Object Detection in Autonomous Vehicles
2021 · cites this paper
Scene Graphs: A Survey of Generations and Applications
2021 · cites this paper
Learning to Track with Object Permanence
2021 · cites this paper
Multiple Object Detection Using Single Shot Multi-Box with MobileNet in Real-Time
2021 · cites this paper
Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety
2021 · cites this paper
Accurate Pig Detection for Video Monitoring Environment
2021 · cites this paper
Scene Graphs: A Review of Generations and Applications
2021 · cites this paper
A Center-Based Light and Simple Method for Multi-Object Tracking
2021 · cites this paper
Feature Flow: In-network Feature Flow Estimation for Video Object Detection
2020 · cites this paper
Unsupervised Feature Propagation for Fast Video Object Detection Using Generative Adversarial Networks
2020 · cites this paper
Plug & Play Convolutional Regression Tracker for Video Object Detection
2020 · cites this paper
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos
2020 · cites this paper
RetinaTrack: Online Single Stage Joint Detection and Tracking
2020 · cites this paper
Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos
2020 · cites this paper
Tracking Objects as Points
2020 · cites this paper
Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians
2020 · cites this paper
A Simple Baseline for Multi-Object Tracking
2020 · cites this paper
Intelligent Vision with TensorFlow using Neural Network Algorithms
2020 · cites this paper
Spatial-Temporal Feature Aggregation Network For Video Object Detection
2020 · influential citation
TAO: A Large-Scale Benchmark for Tracking Any Object
2020 · cites this paper
Detection of Human Behaviour by Object Recognition using Deep Learning: A Review
2020 · cites this paper
Single Shot Video Object Detector
2020 · influential citation