Image Classification with the Fisher Vector: Theory and Practice

Jorge Sánchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek

Published 2013 in International Journal of Computer Vision

ABSTRACT

A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular bag-of-visual-words representation. In this work, we propose to use the Fisher kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
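
The encoding described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a diagonal-covariance Gaussian mixture model that has already been fitted, uses the commonly cited closed-form gradients with respect to the means and standard deviations, and applies power and L2 normalization, which the abstract does not mention and which are taken here as an assumption from the widely used "improved Fisher vector" recipe. The function name `fisher_vector` and its parameters are hypothetical.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Encode local descriptors X (T, D) as a Fisher vector of size 2*K*D.

    weights: (K,) mixture weights; means, sigmas: (K, D) per-component means
    and standard deviations of a diagonal GMM (assumed to be fitted already).
    """
    T, D = X.shape
    # Deviation of every descriptor from every component, scaled by sigma: (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Log of w_k * N(x_t | mu_k, diag(sigma_k^2)) for each descriptor/component pair
    log_prob = (np.log(weights)[None, :]
                - 0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)[None, :]
                - 0.5 * D * np.log(2.0 * np.pi))

    # Soft-assignment posteriors gamma_t(k), computed stably in the log domain
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients w.r.t. means and standard deviations, averaged over the T descriptors
    g_mu = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1.0)
               / (T * np.sqrt(2.0 * weights)[:, None]))

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Assumed post-processing: signed square root, then L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

# Usage with synthetic data (all values are illustrative only)
rng = np.random.default_rng(0)
K, D, T = 4, 8, 200
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
descriptors = rng.normal(size=(T, D))
signature = fisher_vector(descriptors, weights, means, sigmas)
```

Because the signature is L2-normalized at the end, the 1/T averaging factor does not change the result; it is kept only so the intermediate gradients stay on a comparable scale across images with different numbers of patches.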

PUBLICATION RECORD

Publication year
2013
Venue
International Journal of Computer Vision
Publication date
2013-06-12
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.1007/s11263-013-0636-x
Source metadata
Semantic Scholar

CITED BY

Boosting VLAD with weighted fusion of local descriptors for image retrieval
2018 (cites this paper)
Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification
2018 (cites this paper)
Video2vec Embeddings Recognize Events When Examples Are Scarce
2017 (cites this paper)
Integrated Global-Local Metric Learning for Person Re-identification
2017 (cites this paper)