ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, Sean Ma, Zhiheng Huang, A. Karpathy, A. Khosla, Michael S. Bernstein, A. Berg, Li Fei-Fei

Published 2014 in International Journal of Computer Vision

ABSTRACT

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

PUBLICATION RECORD

Publication year: 2014
Venue: International Journal of Computer Vision
Publication date: 2014-09-01
Fields of study: Computer Science
Identifiers: DOI 10.1007/s11263-015-0816-y; arXiv 1409.0575
Source metadata: Semantic Scholar
