Return of the Devil in the Details: Delving Deep into Convolutional Nets

Ken Chatfield, K. Simonyan, A. Vedaldi, Andrew Zisserman

Published 2014 in British Machine Vision Conference

ABSTRACT

The latest generation of Convolutional Neural Networks (CNN) has achieved impressive results in challenging benchmarks on image recognition and object detection, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This paper conducts a rigorous evaluation of these new techniques, exploring different deep architectures and comparing them on a common ground, identifying and disclosing important implementation details. We identify several useful properties of CNN-based representations, including the fact that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance. We also identify aspects of deep and shallow methods that can be successfully shared. In particular, we show that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost. Source code and models to reproduce the experiments in the paper are made publicly available.

PUBLICATION RECORD

Publication year: 2014
Venue: British Machine Vision Conference
Publication date: 2014-05-14
Fields of study: Computer Science
Identifiers: DOI 10.5244/C.28.6; arXiv 1405.3531
Source metadata: Semantic Scholar

REFERENCES

Weakly supervised object recognition with convolutional neural networks
2014influential reference
Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks
2014cited by this paper
HCP: A Flexible CNN Framework for Multi-Label Image Classification
2014cited by this paper
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
2014cited by this paper
Transformation Pursuit for Image Classification
2014cited by this paper
CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
2014cited by this paper
Visualizing and Understanding Convolutional Networks
2013influential reference
All About VLAD
2013cited by this paper
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
2013cited by this paper
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
2013cited by this paper
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
2013cited by this paper
ImageNet classification with deep convolutional neural networks
2012cited by this paper
Efficient Additive Kernels via Explicit Feature Maps
2012cited by this paper
Aggregating Local Image Descriptors into Compact Codes
2012cited by this paper
Modeling the spatial layout of images beyond spatial pyramids
2012cited by this paper
Towards good practice in large-scale learning for image classification
2012cited by this paper
Three things everyone should know to improve object retrieval
2012cited by this paper
The devil is in the details: an evaluation of recent feature encoding methods
2011cited by this paper
Efficient additive kernels via explicit feature maps
2010cited by this paper
The Pascal Visual Object Classes (VOC) Challenge
2010influential reference
Improving the Fisher Kernel for Large-Scale Image Classification
2010influential reference
ImageNet: A large-scale hierarchical image database
2009cited by this paper
Caltech-256 Object Category Dataset
2007cited by this paper
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
2006cited by this paper
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories
2004cited by this paper
Video Google: a text retrieval approach to object matching in videos
2003cited by this paper
Visual categorization with bags of keypoints
2002cited by this paper
Backpropagation Applied to Handwritten Zip Code Recognition
1989cited by this paper
On the Burstiness of Visual Elements (IEEE Conference on Computer Vision and Pattern Recognition)
year unknowncited by this paper

CITED BY

Early detection of pressure injuries via infrared thermography and ConvNeXt
2026cites this paper
Face Gender Recognition Optimization Using VGG-16 With Integration of Spatial Attention Block and Channel Attention Block
2026cites this paper
FPAD: Fuzzy-Prototype-Guided Adversarial Attack and Defense for Deep Cross-Modal Hashing
2026cites this paper
Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning
2026cites this paper
Deep learning-based object identification for grasping force control of a robotic soft end effector
2026cites this paper
CFLip: Generalizing Lipreading to Unseen Speakers by Learning Common Features
2026cites this paper
Advancing Active Speaker Detection for Egocentric Videos
2025cites this paper
Multimodal Dual-Graph Collaborative Network With Serial Attentive Aggregation Mechanism for Micro-Video Multi-Label Classification
2025cites this paper
Computational Efficient General Convolutional Layer Selection for Transfer Learning
2025cites this paper
Federated Multi-Modal Knowledge Graph Representation Learning with Optimal Transport Alignment
2025cites this paper
Brain-Computer Interfaces and AI Segmentation in Neurosurgery: A Systematic Review of Integrated Precision Approaches
2025cites this paper
Advanced Diagnostics of Aircraft Structures Using Automated Non-Invasive Imaging Techniques: A Comprehensive Review
2025cites this paper
Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment
2025cites this paper
A two-stage correction method for UAV movement-induced errors in non-target computer vision-based displacement measurement
2025cites this paper
A Classification and Detection of Cotton Leaf Disease Using Lightweight CNN Architecture
2025cites this paper
Semantic-Cohesive Knowledge Distillation for Deep Cross-modal Hashing
2025influential citation
Unsupervised Federated Learning for Face Recognition in Decentralized Environments
2025cites this paper
DeepPartitioning: Deep Learning of Graph Partitioning for Neuron Segmentation From Electron Microscopy Volume via Graph Neural Network
2025cites this paper
SCSFish2025: a large dataset from South China sea for coral reef fish identification
2025cites this paper
Fusion of Global and Local Features with Multi-Inverted Indices for Image Retrieval
2025cites this paper
A Bayesian Neural Network Guided Fractional Order Level-Set Model for Segmenting Intensity Inhomogeneous Images
2025cites this paper
Enhanced Brain Tumor Segmentation Using Transfer Learning-Based Residual U-Net Architecture
2025cites this paper
Coarse-to-Fine Cross-Modality Generation for Enhancing Vehicle Re-Identification with High-Fidelity Synthetic Data
2025cites this paper
Advantage of Different Data Dimensions and Dual-Beam for Radar Low Resolution Doppler Classification
2025cites this paper
Improving trajectory continuity in drone-based crowd monitoring using a set of minimal-cost techniques and deep discriminative correlation filters
2025cites this paper
AITtrack: Attention-Based Image-Text Alignment for Visual Tracking
2025cites this paper
Attack as Defense: Proactive Adversarial Multi-Modal Learning to Evade Retrieval
2025cites this paper
Collaboratively Semantic Alignment and Metric Learning for Cross-Modal Hashing
2025cites this paper
Deep Learning in Palmprint Recognition-A Comprehensive Survey
2025cites this paper
Acute Lymphoblastic Leukemia Diagnosis Employing YOLOv11, YOLOv8, ResNet50, and Inception-ResNet-v2 Deep Learning Models
2025cites this paper
Evolutionary Neural Architecture Search for Remote Sensing Image Classification
2025cites this paper
Efficient quantification of Parkinson’s disease severity using augmented time-series data
2025cites this paper
When deep learning deciphers silent video: a survey on automatic deep lip reading
2025cites this paper
An Encoder-Agnostic Weakly Supervised Method For Describing Textures
2025cites this paper
Synaptic plasticity-based regularizer for artificial neural networks
2025cites this paper
Automatic 3D inspection method for AR-assisted assembly based on virtual-to-real registration
2025cites this paper
An Insight on the Timely Diagnosis of Diabetic Retinopathy Using Traditional and AI-Driven Approaches
2025cites this paper
STAR: A Unified Spatiotemporal Fusion Framework for Satellite Video Object Tracking
2025cites this paper
Comparative Analysis of Conventional and Focused Data Augmentation Methods in Rib Fracture Detection in CT Images
2025cites this paper
MC-Mamba: Cross-modal target speaker extraction model based on multiple consistency
2025cites this paper
Monocular Cone Sleeve Measurement Based on Kernelized Correlation Filter Tracking and Pinhole Imaging Model
2025cites this paper
Finding the Sweet Spot: A Study of Data Augmentation Intensity for Small-Scale Image Classification
2025cites this paper
Improving Periocular Recognition Accuracy: Opposite Side Learning Suppression and Vertical Image Inversion
2025cites this paper
Class-Wise Combination of Mixture-Based Data Augmentation for Class Imbalance Learning of Focal Liver Lesions in Abdominal CT Images
2025cites this paper
Reassessing deep learning (and meta-learning) computer vision as an efficient method to determine taphonomic agency in bone surface modifications
2025cites this paper
Prospect certainty for data-driven models
2025cites this paper
Visual Question Answering: A Survey of Methods, Datasets, Evaluation, and Challenges
2025cites this paper
Handling Out-of-Distribution Data: A Survey
2025cites this paper
An Overview of AI-Guided Thyroid Ultrasound Image Segmentation and Classification for Nodule Assessment
2025cites this paper
Object-Specific Multiview Classification Through View-Compatible Feature Fusion
2025cites this paper
Self-supervised Bidirectional Synchronization Estimation for Multimodal Deepfake Detection with Short-term Dependency
2025cites this paper
Label-consistent kernel transform learning-based sparse hashing for cross-modal retrieval
2025cites this paper
Wav2Lip Bridges Communication Gap: Automating Lip Sync and Language Translation for Indian Languages
2025cites this paper
Robust discriminative correlation-based full-field motion estimation of large-scale structures using a single video camera
2025cites this paper
A Simple Finetuning Strategy Based on Bias-Variance Ratios of Layer-Wise Gradients
2024cites this paper
One size does not fit all in evaluating model selection scores for image classification
2024cites this paper
Graph-Based COVID-19 Detection Using Conditional Generative Adversarial Network
2024cites this paper
MoroccoLens: An ML-Based Mobile Application for Monument Recognition
2024cites this paper
PieBridge: Fast and Parameter-Efficient On-Device Training via Proxy Networks
2024cites this paper
Scalp Disorder Imaging: How Deep Learning and Explainable Artificial Intelligence are Revolutionizing Diagnosis and Treatment
2024cites this paper
Personalization of industrial human–robot communication through domain adaptation based on user feedback
2024cites this paper
Robust tracking for visual complex environments
2024cites this paper
Secure and Efficient Face Recognition via Supervised Federated Learning
2024cites this paper
Joint Semantic Preserving Sparse Hashing for Cross-Modal Retrieval
2024cites this paper
Enabling deformation slack in tracking with temporally even correlation filters
2024cites this paper
Dual variational network for unsupervised cross-modal hashing
2024influential citation
CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning
2024cites this paper
Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga
2024cites this paper
Examining noncommunicable diseases using satellite imagery: a systematic literature review
2024cites this paper
A retinal detachment based strabismus detection through FEDCNN
2024cites this paper
Detection and Classification of Cotton Plant Disease Using Deep Learning Network
2024cites this paper
CNN Training Latency Prediction Using Hardware Metrics on Cloud GPUs
2024cites this paper
A Deep Learning-Based System for Driver Fatigue Detection
2024cites this paper
Multi-scale frequency domain learning for texture classification
2024cites this paper
Tropical cyclone tracking from geostationary infrared satellite images using deep learning techniques
2024influential citation
Artificial Intelligence in Head and Neck Cancer: Innovations, Applications, and Future Directions
2024cites this paper
Low Doppler Resolution Radar-Based Target Classification for Clutter Suppression
2024cites this paper
Tiny Machine Learning: Progress and Futures [Feature]
2024cites this paper
Image CAPTCHAs: When Deep Learning Breaks the Mold
2024cites this paper
A two-stage algorithm for heterogeneous face recognition using Deep Stacked PCA Descriptor (DSPD) and Coupled Discriminant Neighbourhood Embedding (CDNE)
2024cites this paper
Visual search and real-image similarity: An empirical assessment through the lens of deep learning
2024cites this paper
DySarl: Dynamic Structure-Aware Representation Learning for Multimodal Knowledge Graph Reasoning
2024cites this paper
Learning a Context-Aware Environmental Residual Correlation Filter via Deep Convolution Features for Visual Object Tracking
2024cites this paper
Automated detection, labelling and radiological grading of clinical spinal MRIs
2024cites this paper
Multi-Modal Siamese Network for Few-Shot Knowledge Graph Completion
2024cites this paper
Improved deep learning image compression model: performance optimization based on convolutional modules and local attention mechanism
2024cites this paper
Deep learning-based predictive models of land subsidence and collapsed pipes in Razavi Khorasan Province, Iran
2024cites this paper
Deep supervised fused similarity hashing for cross-modal retrieval
2024cites this paper
SalDA: DeepConvNet Greets Attention for Visual Saliency Prediction
2024cites this paper
Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling
2024cites this paper
Multimodal Progressive Modulation Network for Micro-Video Multi-Label Classification
2024cites this paper
HNet: A deep learning based hybrid network for speaker dependent visual speech recognition
2024influential citation
CGGNet: Compiler-Guided Generation Network for Smart Contract Data Augmentation
2024cites this paper
Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval
2024cites this paper
Semantic-alignment transformer and adversary hashing for cross-modal retrieval
2024cites this paper
Arabic Lip Reading With Limited Data Using Deep Learning
2024cites this paper
Multi-Modal Learning-Based Blind Video Quality Assessment Metric for Synthesized Views
2024cites this paper
From Radiologist Report to Image Label: Assessing Latent Dirichlet Allocation in Training Neural Networks for Orthopedic Radiograph Classification
2024cites this paper
Evaluating CNN Models for Gait Recognition: A Study on the CASIA-B Dataset
2024cites this paper
OFACD: An end-to-end change detection network for small UAVs remote sensing with viewpoint differences
2024cites this paper