Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to “α-pooling”, allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions made by these approaches. We identify parts of training images having the highest influence on the prediction of a given test image. This allows for justifying decisions to users and also for analyzing the influence of semantic parts. For example, we can show that the higher capacity VGG16 model focuses much more on the bird's head than, e.g., the lower-capacity VGG-M model when recognizing fine-grained bird categories. Both contributions allow us to analyze the difference when moving between average and bilinear pooling. In addition, experiments show that our generalized approach can outperform both across a variety of standard datasets.
Generalized Orderless Pooling Performs Implicit Salient Matching
Marcel Simon,Yang Gao,Trevor Darrell,Joachim Denzler,E. Rodner
Published 2017 in IEEE International Conference on Computer Vision
ABSTRACT
PUBLICATION RECORD
- Publication year
2017
- Venue
IEEE International Conference on Computer Vision
- Publication date
2017-05-01
- Fields of study
Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-36 of 36 references · Page 1 of 1
CITED BY
Showing 1-42 of 42 citing papers · Page 1 of 1