Do We Need More Training Data?

Xiangxin Zhu, Carl Vondrick, Charless C. Fowlkes, Deva Ramanan

Published 2015 in International Journal of Computer Vision

ABSTRACT

Datasets for training object recognition systems are steadily increasing in size. This paper investigates the question of whether existing detectors will continue to improve as data grows, or saturate in performance due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defined on oriented gradient features. We investigate the performance of mixtures of templates as the number of mixture components and the amount of training data grows. Surprisingly, even with proper treatment of regularization and “outliers”, the performance of classic mixture models appears to saturate quickly (∼10 templates and ∼100 positive training examples per template). This is not a limitation of the feature space as compositional mixtures that share template parameters via parts and that can synthesize new templates not encountered during training yield significantly better performance. Based on our analysis, we conjecture that the greatest gains in detection performance will continue to derive from improved representations and learning algorithms that can make efficient use of large datasets.

PUBLICATION RECORD

Publication year
2015
Venue
International Journal of Computer Vision
Publication date
2015-03-01
Fields of study
Computer Science
Identifiers
DOI 10.1007/s11263-015-0812-2 arXiv 1503.01508
Source metadata
Semantic Scholar

REFERENCES

Face detection, pose estimation, and landmark localization in the wild
2012influential reference
Diagnosing Error in Object Detectors
2012cited by this paper
Distance Transforms of Sampled Functions
2012cited by this paper
ImageNet classification with deep convolutional neural networks
2012cited by this paper
How Important Are "Deformable Parts" in the Deformable Parts Model?
2012cited by this paper
Ensemble of exemplar-SVMs for object detection and beyond
2011influential reference
LIBSVM: A library for support vector machines
2011influential reference
Nonparametric Scene Parsing via Label Transfer
2011cited by this paper
Unbiased look at dataset bias
2011cited by this paper
Finding the weakest link in person detectors
2011cited by this paper
Superparsing
2010cited by this paper
Object Detection with Discriminatively Trained Part Based Models
2010influential reference
What Does Classifying More Than 10,000 Image Categories Tell Us?
2010cited by this paper
The Pascal Visual Object Classes (VOC) Challenge
2010cited by this paper
The Unreasonable Effectiveness of Data
2009cited by this paper
Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration
2009cited by this paper
Poselets: Body part detectors trained using 3D human pose annotations
2009cited by this paper
Multiple kernels for object detection
2009cited by this paper
Multi-PIE
2008cited by this paper
In defense of Nearest-Neighbor based image classification
2008cited by this paper
IM2GPS: estimating geographic information from a single image
2008cited by this paper
Nearest-Neighbor Methods in Learning and Vision
2008cited by this paper
Robust Truncated Hinge Loss Support Vector Machines
2007cited by this paper
Image Classification using Random Forests and Ferns
2007cited by this paper
Scene Completion Using Millions of Photographs
2007cited by this paper
SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition
2006cited by this paper
Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing)
2006cited by this paper
Histograms of oriented gradients for human detection
2005influential reference
Fast pose estimation with parameter-sensitive hashing
2003cited by this paper
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
1999cited by this paper
Some PAC-Bayesian Theorems
1998cited by this paper
Shape indexing using approximate nearest-neighbour search in high-dimensional spaces
1997cited by this paper
80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition
year unknowncited by this paper

CITED BY

Soil class mapping with severely limited sample data: an explicit rule of soil-landscape relationships in complex terrain
2026cites this paper
Evaluating large language models for inverse semiconductor design
2026cites this paper
Modelling Peatland Productivity by Water Table Depth or Near-Surface Water Contents via the DIMONA Online Platform
2025cites this paper
Cloud-based machine learning for scalable classification of software requirements: Insights from the PROMISE dataset
2025cites this paper
Large language models for identifying social determinants of health
2025cites this paper
Exploring soil temperature extremes: unraveling dynamics with local and spatial machine learning models
2025cites this paper
UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation
2025cites this paper
DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction
2025cites this paper
Pomerania Fish: A dataset for fishes across Pomerania freshwater waterbodies in-situ environments
2025cites this paper
The third wheel or the game changer? How AI could team up with neurologists in Parkinson's care.
2025cites this paper
A fuzzy decision-making network model for offshore wind turbine selection based on simulated annealing algorithm
2025cites this paper
Novel Machine Learning Unlocks High Lipid Productivity and Resolves Trade-offs in Algal Biofuel Production
2025cites this paper
Improving personalized federated learning to optimize site-specific performance in computer-aided detection/diagnosis
2025cites this paper
Dinic’s assembly parts matching optimization based on CACGAN gyroscope performance prediction
2025cites this paper
Finding the Sweet Spot: An Empirical Study on Dataset Size, Performance, and Efficiency in Relation Extraction
2025cites this paper
Advancing the Prediction and Evaluation of Blast-Induced Ground Vibration Using Deep Ensemble Learning with Uncertainty Assessment
2025cites this paper
Error in the Loop: How Human Mistakes Can Improve Algorithmic Learning
2025cites this paper
Acoustic Detection of Forest Wood-Boring Insects Under Co-Infestations
2025cites this paper
Are Classical Clone Detectors Good Enough for the AI Era?
2025cites this paper
Automated detection of sea cucumbers in turbid subtidal marine habitats: An explainable approach
2025cites this paper
Flash Flood Risk Classification Using GIS-Based Fractional Order k-Means Clustering Method
2025cites this paper
Dataset meta-level and statistical features affect machine learning performance
2024cites this paper
Data Deletion for Linear Regression with Noisy SGD
2024cites this paper
Generative Active Learning with Variational Autoencoder for Radiology Data Generation in Veterinary Medicine
2024cites this paper
Toward Robust Canine Cardiac Diagnosis: Deep Prototype Alignment Network-Based Few-Shot Segmentation in Veterinary Medicine
2024cites this paper
Raman spectroscopic deep learning with signal aggregated representations for enhanced cell phenotype and signature identification
2024cites this paper
Label-free live cell recognition and tracking for biological discoveries and translational applications
2024cites this paper
Investigation of distributed learning for automated lesion detection in head MR images
2024cites this paper
Automating synaptic plasticity analysis: A deep learning approach to segmenting hippocampal field potential signal
2024cites this paper
Aerial Wildlife Image Repository for animal monitoring with drones in the age of artificial intelligence
2024cites this paper
A Multi-view Molecular Pre-training with Generative Contrastive Learning
2024cites this paper
Designing deep neural networks for driver intention recognition
2024cites this paper
Exploring the use of Synthetic Training Data for the Classification of Electronic Components in Artificial Intelligence Systems
2024cites this paper
SATDAUG - A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt
2024cites this paper
Deep learning-based software bug classification
2024cites this paper
CBNN: 3-Party Secure Framework for Customized Binary Neural Networks Inference
2024cites this paper
AI-based Catalytic Performance Prediction for CO2 Electrochemical Reduction using Ionic Liquids
2024cites this paper
Hyperparameter tuning for deep learning semantic image segmentation of micro computed tomography scanned fiber-reinforced composites
2024cites this paper
Tourism and Hospitality Forecasting With Big Data: A Systematic Review of the Literature
2024cites this paper
How Many Data Does Machine Learning in Human–Computer Interaction Need?: Re-Estimating the Dataset Size for Convolutional Neural Network-Based Models of Visual Perception
2023cites this paper
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing
2023cites this paper
Study on the real-time object detection approach for end-of-life battery-powered electronics in the waste of electrical and electronic equipment recycling process.
2023cites this paper
RoIA: Region of Interest Attention Network for Surface Defect Detection
2023cites this paper
Oriented object detection in optical remote sensing images using deep learning: a survey
2023cites this paper
Development of a Methane Emission Prediction Tool (POMEP178) for Palm Oil Mill Effluent Using Gaussian Process Regression
2023cites this paper
Automated Detection of Corneal Edema With Deep Learning-Assisted Second Harmonic Generation Microscopy
2023cites this paper
Advancements and Challenges in Machine Learning: A Comprehensive Review of Models, Libraries, Applications, and Algorithms
2023cites this paper
Artificial intelligence-aided method to detect uterine fibroids in ultrasound images: a retrospective study
2023cites this paper
Multitarget Intelligent Recognition of Petrographic Thin Section Images Based on Faster RCNN
2023cites this paper
On the training sample size and classification performance: An experimental evaluation in seismic facies classification
2023cites this paper
A power regulation strategy for heat pipe cooled reactors based on deep learning and hybrid data-driven optimization algorithm
2023cites this paper
Digital twin-driven vibration amplitude simulation for condition monitoring of axial blowers in blast furnace ironmaking
2023cites this paper
IIITH MM2 Speech-Text: A preliminary data for automatic spoken data validation with matched and mismatched speech-text content
2023cites this paper
Defectors: A Large, Diverse Python Dataset for Defect Prediction
2023cites this paper
The Magic Number: Impact of Sample Size for Dementia Screening Using Transfer Learning and Data Augmentation of Clock Drawing Test Images
2023cites this paper
A Reverse Auction-Based Incentive Mechanism for Cost-Effective Data Collection in Mobile Crowdsensing
2023cites this paper
A Wrapped Approach Using Unlabeled Data for Diabetic Retinopathy Diagnosis
2023cites this paper
Automated combustion model classification for char particle distributions using 3-D morphology analysis and pore-resolving CFD simulations
2023cites this paper
Classification of Alzheimer’s Disease using Random Oversampling and Albumentations on Convolutional Neural Network
2023cites this paper
Out-of-Distribution Data Generation for Fault Detection and Diagnosis in Industrial Systems
2023cites this paper
Advancing deep learning-based detection of floating litter using a novel open dataset
2023cites this paper
DNA-Storage in Future Communication Networks
2023influential citation
Quantitative Evaluation of a Multi-Modal Camera Setup for Fusing Event Data with RGB Images
2023cites this paper
Quantifying Privacy Risks of Prompts in Visual Prompt Learning
2023cites this paper
Generative Adversarial Network Models for Augmenting Digit and Character Datasets Embedded in Standard Markings on Ship Bodies
2023cites this paper
VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph
2023cites this paper
A Framework to Estimate the Key Point Within an Object Based on a Deep Learning Object Detection
2023cites this paper
Traffic congestion-aware graph-based vehicle rerouting framework from aerial imagery
2023cites this paper
Emotional Analysis Based On LSTM-CNN Hybrid Neural Network Model
2023cites this paper
Oriented Object Detection in Optical Remote Sensing Images: A Survey
2023cites this paper
Application of A Dual-Stage Deep Learning Framework to Detect Left Atrial Enlargement for Pet Heart Failure
2023cites this paper
Estimation of cyanobacteria pigments in the main rivers of South Korea using spatial attention convolutional neural network with hyperspectral imagery
2022cites this paper
Synthetic Training Image Dataset for Vision-Based 3D Pose Estimation of Construction Workers
2022cites this paper
ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction
2022cites this paper
Self-adaptive temperature and humidity compensation based on improved deep BP neural network for NO2 detection in complex environment
2022cites this paper
Automated metadata annotation: What is and is not possible with machine learning
2022cites this paper
Prioritizing inspection of sewer pipes based on self-cleansing criteria
2022cites this paper
Fully Complex Deep Learning Classifiers for Signal Modulation Recognition in Non-Cooperative Environment
2022cites this paper
A real-time fingerprint-based indoor positioning using deep learning and preceding states
2022cites this paper
Paying attention for adjacent areas: Learning discriminative features for large-scale 3D scene segmentation
2022cites this paper
Low-Resource Adaptation for Personalized Co-Speech Gesture Generation
2022cites this paper
Brain age estimation using multi-feature-based networks
2022cites this paper
Language-Based Syllogistic Reasoning Using Deep Neural Networks
2022cites this paper
Performance of predictive supervised classification models of trace elements in magnetite for mineral exploration
2022cites this paper
Classification of myocardial fibrosis in DE-MRI based on semi-supervised semantic segmentation and dual attention mechanism
2022cites this paper
Deep Learning Model for Prediction of Entanglement Molecular Weight of Polymers
2022cites this paper
Managing Sustainability Tensions in Artificial Intelligence: Insights from Paradox Theory
2022cites this paper
Long-tailed Recognition by Learning from Latent Categories
2022cites this paper
Deep learning for behaviour classification in a preclinical brain injury model
2022cites this paper
Automated Defect Detection in Non-planar Objects Using Deep Learning Algorithms
2022cites this paper
DE-MRI myocardial fibrosis segmentation and classification model based on multi-scale self-supervision and transformer
2022cites this paper
Collate: Collaborative Neural Network Learning for Latency-Critical Edge Systems
2022cites this paper
MP-BADNet+: Secure and effective backdoor attack det
2022cites this paper
Synchronization-Enhanced Deep Learning Early Flood Risk Predictions: The Core of Data-Driven City Digital Twins for Climate Resilience Planning
2022cites this paper
Audio-Based Wildfire Detection on Embedded Systems
2022cites this paper
General intelligence requires rethinking exploration
2022cites this paper
DeepGuard: Backdoor Attack Detection and Identification Schemes in Privacy-Preserving Deep Neural Networks
2022cites this paper
A Comparative Study of Engraved-Digit Data Augmentation by Generative Adversarial Networks
2022influential citation
Learning Physically Meaningful Representations of Energy Systems with Variational Autoencoders
2022cites this paper
Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via Simulation-based Synthetic Data Augmentation and Multitask Learning
2022cites this paper