Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition

Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, Y. Qiao

Published 2016 in IEEE Transactions on Image Processing

ABSTRACT

Traditional feature encoding schemes (e.g., Fisher vector) with local descriptors (e.g., SIFT) and recent convolutional neural networks (CNNs) are two classes of successful methods for image recognition. In this paper, we propose a hybrid representation, which leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schemes for image recognition, with a focus on scene recognition. To this end, we make three main contributions. First, we propose a patch-level, end-to-end architecture to model the appearance of local patches, called PatchNet. PatchNet is essentially a customized network trained in a weakly supervised manner, which uses image-level supervision to guide patch-level feature extraction. Second, we present a hybrid visual representation, called VSAD, which uses the robust feature representations of PatchNet to describe local patches and exploits the semantic probabilities of PatchNet to aggregate these local patches into a global representation. Third, based on the proposed VSAD representation, we propose a new state-of-the-art scene recognition approach, which achieves excellent performance on two standard benchmarks: MIT Indoor67 (86.2%) and SUN397 (73.0%).
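
The aggregation idea in the abstract (describe each patch with a feature vector, then pool patches into one global vector using per-patch semantic probabilities) can be illustrated with a minimal sketch. This is not the paper's actual formulation: the abstract does not specify the codebook, the residual form, or the normalization, so the per-class `centers`, the probability-weighted residual sum, and the final L2 normalization below are all assumptions, and every name (`aggregate_patches`, `descriptors`, `probabilities`, `centers`) is hypothetical.

```python
import math


def aggregate_patches(descriptors, probabilities, centers):
    """Pool N patch descriptors into one vector of length K * D.

    descriptors:   N lists of length D (hypothetical patch features)
    probabilities: N lists of length K (hypothetical per-patch class probabilities)
    centers:       K lists of length D (assumed per-class reference descriptors,
                   e.g. estimated beforehand on training patches)
    """
    num_classes = len(centers)
    dim = len(centers[0])
    blocks = [[0.0] * dim for _ in range(num_classes)]
    for feat, probs in zip(descriptors, probabilities):
        for k in range(num_classes):
            for d in range(dim):
                # Residual to the class center, weighted by this patch's
                # probability for class k (a soft assignment).
                blocks[k][d] += probs[k] * (feat[d] - centers[k][d])
    # Concatenate the K blocks and L2-normalize (normalization is an assumption).
    flat = [value for block in blocks for value in block]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


# Toy input: two patches, two semantic classes, two-dimensional descriptors.
descriptors = [[1.0, 0.0], [0.0, 1.0]]
probabilities = [[1.0, 0.0], [0.0, 1.0]]
centers = [[0.0, 0.0], [0.0, 0.0]]
image_vector = aggregate_patches(descriptors, probabilities, centers)
```

In this sketch the semantic probabilities play the role that soft codeword assignments play in classical encodings such as VLAD or the Fisher vector, which is one way to read the "hybrid" representation described above.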

PUBLICATION RECORD

Publication year
2016
Venue
IEEE Transactions on Image Processing
Publication date
2016-09-01
Fields of study
Medicine, Computer Science
Identifiers
DOI 10.1109/TIP.2017.2666739 arXiv 1609.00153 PMID 28207394
Source metadata
Semantic Scholar, PubMed

REFERENCES

Back Propagation
2019influential reference
Webly-Supervised Fine-Grained Visual Categorization via Deep Domain Adaptation
2018cited by this paper
Algorithm-Dependent Generalization Bounds for Multi-Task Learning
2017cited by this paper
Friend or Foe: Fine-Grained Categorization With Weak Supervision
2017cited by this paper
Large-Cone Nonnegative Matrix Factorization
2017cited by this paper
Locally Supervised Deep Hybrid Model for Scene Recognition.
2017cited by this paper
Deep Multimodal Distance Metric Learning Using Click Constraints for Image Ranking
2017cited by this paper
Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images
2016cited by this paper
Locally Supervised Deep Hybrid Model for Scene Recognition
2016cited by this paper
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
2016cited by this paper
Codebook enhancement of vlad representation for visual recognition
2016cited by this paper
MoFAP: A Multi-level Representation for Action Recognition
2016cited by this paper
Hybrid CNN and Dictionary-Based Models for Scene Recognition and Domain Adaptation
2016cited by this paper
Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs
2016cited by this paper
Actionness Estimation Using Hybrid Fully Convolutional Networks
2016cited by this paper
Deep Filter Banks for Texture Recognition, Description, and Segmentation
2015cited by this paper
Learning Contextual Dependence With Convolutional Hierarchical Recurrent Neural Networks
2015influential reference
Local Color Contrastive Descriptor for Image Classification
2015cited by this paper
Recognize complex events from static images by fusing deep channels
2015cited by this paper
Multi-scale Recognition with DAG-CNNs
2015influential reference
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
2015cited by this paper
Object-Scene Convolutional Neural Networks for event recognition in images
2015cited by this paper
Deep Residual Learning for Image Recognition
2015cited by this paper
Learning Deep Convolutional Neural Networks for Places2 Scene Recognition
2015cited by this paper
Scene classification with semantic Fisher vectors
2015cited by this paper
Action recognition with trajectory-pooled deep-convolutional descriptors
2015cited by this paper
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015cited by this paper
Deep Spatial Pyramid: The Devil is Once Again in the Details
2015influential reference
Places205-VGGNet Models for Scene Recognition
2015cited by this paper
Multi-scale pyramid pooling for deep convolutional representation
2015cited by this paper
Multi-scale Orderless Pooling of Deep Convolutional Activation Features
2014cited by this paper
Latent Hierarchical Model of Temporal Structure for Complex Activity Classification
2014cited by this paper
Learning Deep Features for Scene Recognition using Places Database
2014cited by this paper
Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters
2014cited by this paper
ImageNet Large Scale Visual Recognition Challenge
2014cited by this paper
Learning Object-to-Class Kernels for Scene Classification
2014cited by this paper
CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
2014cited by this paper
Convolutional Network Features for Scene Recognition
2014cited by this paper
Orientational Pyramid Matching for Recognizing Indoor Scenes
2014influential reference
SUN Database: Exploring a Large Collection of Scene Categories
2014cited by this paper
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014cited by this paper
Learning Discriminative and Shareable Features for Scene Classification
2014cited by this paper
Click Prediction for Web Image Reranking Using Multimodal Sparse Coding
2014cited by this paper
Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice
2014cited by this paper
Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics
2014cited by this paper
Going deeper with convolutions
2014influential reference
Image Classification with the Fisher Vector: Theory and Practice
2013cited by this paper
Blocks That Shout: Distinctive Parts for Scene Classification
2013cited by this paper
BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification
2013cited by this paper
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
2013influential reference
Pairwise constraints based multiview features fusion for scene classification
2013cited by this paper
Mid-level Visual Element Discovery as Discriminative Mode Seeking
2013influential reference
Reconfigurable models for scene recognition
2012cited by this paper
ImageNet classification with deep convolutional neural networks
2012cited by this paper
A Comparative Study of Encoding, Pooling and Normalization Methods for Action Recognition
2012cited by this paper
Unsupervised Discovery of Mid-Level Discriminative Patches
2012cited by this paper
Aggregating Local Image Descriptors into Compact Codes
2012cited by this paper
CENTRIST: A Visual Descriptor for Scene Categorization
2011cited by this paper
Scene recognition and weakly supervised object localization with deformable part-based models
2011cited by this paper
Speeded Up Robust Features
2011cited by this paper
Vlfeat: an open and portable library of computer vision algorithms
2010cited by this paper
Learning mid-level features for recognition
2010cited by this paper
Biologically Inspired Feature Manifold for Scene Classification
2010cited by this paper
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
2010cited by this paper
SUN database: Large-scale scene recognition from abbey to zoo
2010cited by this paper
Improving the Fisher Kernel for Large-Scale Image Classification
2010influential reference
Locality-constrained Linear Coding for image classification
2010cited by this paper
Image Classification Using Super-Vector Coding of Local Image Descriptors
2010cited by this paper
Recognizing indoor scenes
2009cited by this paper
Linear spatial pyramid matching using sparse coding for image classification
2009cited by this paper
ImageNet: A large-scale hierarchical image database
2009cited by this paper
Speeded-Up Robust Features (SURF)
2008cited by this paper
Kernel codebooks for scene categorization
2008cited by this paper
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
2006cited by this paper
K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation
2006cited by this paper
Histograms of oriented gradients for human detection
2005influential reference
K-SVD: An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation
2005cited by this paper
Distinctive Image Features from Scale-Invariant Keypoints
2004cited by this paper
Video Google: a text retrieval approach to object matching in videos
2003cited by this paper
Visual categorization with bags of keypoints
2002cited by this paper
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
2001cited by this paper
Gradient-based learning applied to document recognition
1998cited by this paper

CITED BY

Context-Aware Dynamic Integration for Scene Recognition
2025cites this paper
From Global to Hybrid: A Review of Supervised Deep Learning for 2-D Image Feature Representation
2025cites this paper
Scene Classification on Fine Arts with Style Transfer
2024cites this paper
Feature selection through adaptive sparse learning for scene recognition
2024cites this paper
Attention-Based Deep Neural Network Combined Local and Global Features for Indoor Scene Recognition
2024cites this paper
SC-ViT: Semantic Contrast Vision Transformer for Scene Recognition
2024cites this paper
NEM: Nested Ensemble Model for scene recognition
2024cites this paper
A single-stream adaptive scene layout modeling method for scene recognition
2024cites this paper
Insights into Image Understanding: Segmentation Methods for Object Recognition and Scene Classification
2024cites this paper
Multi-Source Ensemble Model for Scene Recognition
2024cites this paper
Designing Deep Networks for Scene Recognition
2023cites this paper
Attention-Based Knowledge Distillation in Scene Recognition: The Impact of a DCT-Driven Loss
2023cites this paper
EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition
2023cites this paper
Attention-based Knowledge Distillation in Multi-attention Tasks: The Impact of a DCT-driven Loss
2022cites this paper
Joint global metric learning and local manifold preservation for scene recognition
2022cites this paper
Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition
2022cites this paper
Indoor localization system using deep learning based scene recognition
2022influential citation
Scene recognition using multiple representation network
2022cites this paper
Recent advances in scene image representation and classification
2022cites this paper
Federated learning: a deep learning model based on resnet18 dual path for lung nodule detection
2022cites this paper
Semantic embedding: scene image classification using scene-specific objects
2022cites this paper
RETRACTED: Learning robust features for indoor scene recognition
2022cites this paper
Spatial-Channel Transformer for Scene Recognition
2022cites this paper
Perception Framework through Real-Time Semantic Segmentation and Scene Recognition on a Wearable System for the Visually Impaired
2021cites this paper
Deep Learning for Scene Classification: A Survey
2021influential citation
Place perception from the fusion of different image representation
2021cites this paper
Cross-Modal Pyramid Translation for RGB-D Scene Recognition
2021cites this paper
An embarrassingly simple comparison of machine learning algorithms for indoor scene classification
2021cites this paper
Scale attentive network for scene recognition
2021influential citation
Attention Pyramid Module for Scene Recognition
2021cites this paper
Learning Scene Attribute for Scene Recognition
2020influential citation
Is Whole Object Information Helpful for Scene Recognition?
2020cites this paper
A novel technique for automated concealed face detection in surveillance videos
2020cites this paper
Hierarchical Coding of Convolutional Features for Scene Recognition
2020cites this paper
Urban Scene Recognition via Deep Network Integration
2020cites this paper
Urban Intelligence and Applications: Second International Conference, ICUIA 2020, Taiyuan, China, August 14–16, 2020, Revised Selected Papers
2020cites this paper
An Efficient RGB-D Scene Recognition Method Based on Multi-Information Fusion
2020cites this paper
Scene recognition: A comprehensive survey
2020cites this paper
DASGIL: Domain Adaptation for Semantic and Geometric-Aware Image-Based Localization
2020cites this paper
What am I allowed to do here?: Online Learning of Context-Specific Norms by Pepper
2020cites this paper
Scene Recognition with Comprehensive Regions Graph Modeling
2019cites this paper
Fusing Object Semantics and Deep Appearance Features for Scene Recognition
2019cites this paper
Deep Learning for Automated Medical Image Analysis
2019influential citation
Foreground Fisher Vector: Encoding Class-Relevant Foreground to Improve Image Classification
2019cites this paper
A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter
2019cites this paper
MAPNet: Multi-modal attentive pooling network for RGB-D indoor scene classification
2019cites this paper
Indoor Image Representation by High-Level Semantic Features
2019cites this paper
Automatic Scene Recognition Based on Constructed Knowledge Space Learning
2019cites this paper
A Dynamic Scene Recognition Method for Event-Based Social Network
2019cites this paper
A Survey of Deep Learning Solutions for Multimedia Visual Content Analysis
2019cites this paper
Weakly Semantic Guided Action Recognition
2019cites this paper
FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition
2019cites this paper
HA-CCN: Hierarchical Attention-Based Crowd Counting Network
2019cites this paper
Multi-stream Convolutional Networks for Indoor Scene Recognition
2019cites this paper
Research on Indoor Positioning Method Based on Improved HS-AlexNet Model
2019cites this paper
Semantic-Aware Scene Recognition
2019influential citation
WS-AM: Weakly Supervised Attention Map for Scene Recognition
2019cites this paper
Centroid-Based Scene Classification (CBSC): Using Deep Features and Clustering for RGB-D Indoor Scene Classification
2019cites this paper
Adaptive Attention Annotation Model: Optimizing the Prediction Path through Dependency Fusion
2019cites this paper
On Modeling Context from Objects with a Long Short-Term Memory for Indoor Scene Recognition
2019cites this paper
Centroid Based Concept Learning for RGB-D Indoor Scene Classification
2019cites this paper
Fusing Scene Context to Improve Object Recognition
2018cites this paper
Temporal Hallucinating for Action Recognition with Few Still Images
2018cites this paper
Learning Effective RGB-D Representations for Scene Recognition
2018cites this paper
Scene Image Classification Using Reduced Virtual Feature Representation in Sparse Framework
2018cites this paper
From Volcano to Toyshop: Adaptive Discriminative Region Discovery for Scene Recognition
2018influential citation
Sequential Video VLAD: Training the Aggregation Locally and Temporally
2018cites this paper
Scene Semantic Recognition Based on Probability Topic Model
2018cites this paper
DeepLung: Deep 3D Dual Path Nets for Automated Pulmonary Nodule Detection and Classification
2018influential citation
Scene recognition with objectness
2018cites this paper
Structured Triplet Learning with POS-Tag Guided Attention for Visual Question Answering
2018cites this paper
Scene Recognition via Bi-enhanced Knowledge Space Learning
2018cites this paper
Extraction of Visual Features from Video Sequences for Better Visual Analysis
2018cites this paper
Using Scene Context to Improve Action Recognition
2018cites this paper
Hierarchy of Alternating Specialists for Scene Recognition
2018cites this paper
Locally Supervised Deep Hybrid Model for Scene Recognition.
2017cites this paper
Weakly Supervised PatchNets: Learning Aggregated Patch Descriptors for Scene Recognition
2017cites this paper
Crowded scene understanding algorithm based on Two-Stream Residual Network
2017influential citation
Good Practice on Deep Scene Classification: from Local Supervision to Knowledge Guided Disambiguation
2017cites this paper
A Robust Indoor Scene Recognition Method Based on Sparse Representation
2017cites this paper
High-Order Local Pooling and Encoding Gaussians Over a Dictionary of Gaussians
2017cites this paper
Locally Supervised Deep Hybrid Model for Scene Recognition
2016cites this paper
Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs
2016influential citation
Expert Systems With Applications
year unknowncites this paper