An Informed Framework for Training Classifiers from Social Media

Published 2016 in Entropy

ABSTRACT

Extracting information from social media has become a major focus of companies and researchers in recent years. Beyond the study of its social aspects, it has also proven feasible to exploit the collaborative strength of crowds to help solve classical machine learning problems such as object recognition. In this work, we focus on the generally underappreciated problem of building effective datasets for training classifiers by automatically assembling data from social media. We detail some of the challenges of this approach and outline a framework that uses expanded search queries to retrieve higher-quality data. In particular, we concentrate on collaboratively tagged media on the social platform Flickr and use image classification as the task on which to evaluate our approach. Finally, we describe a novel entropy-based method that incorporates an information-theoretic principle to guide the framework. Experimental validation on well-known public datasets shows that the approach is viable and that it improves on the state of the art in terms of both simplicity and performance.
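The abstract does not state how the entropy criterion is computed. The following is a minimal sketch under one stated assumption: each candidate expanded query is scored by the Shannon entropy of the tags attached to the images it retrieves, and lower entropy is taken to mean a more consistent result set. The function names `tag_entropy` and `rank_expansions` and the toy tag lists are hypothetical illustrations. They are not the authors' method or data.

```python
import math
from collections import Counter


def tag_entropy(tags):
    """Shannon entropy (in bits) of the empirical distribution of a tag list."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) is non-negative for 0 < p <= 1, so a single tag yields exactly 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def rank_expansions(results):
    """Hypothetical ranking: order candidate queries from lowest to highest tag entropy."""
    return sorted(results, key=lambda query: tag_entropy(results[query]))


# Hypothetical toy data: tags on the images returned for each expanded query.
results = {
    "jaguar": ["car", "cat", "car", "football", "cat", "logo"],
    "jaguar animal": ["cat", "cat", "wildlife", "cat"],
    "jaguar cat": ["cat", "cat", "cat", "cat"],
}

for query in rank_expansions(results):
    print(f"{query!r}: {tag_entropy(results[query]):.3f} bits")
```

With this toy data, the ambiguous query "jaguar" has the highest entropy because its results mix cars, cats and logos. By contrast, "jaguar cat" returns a single tag and scores zero. Ranking by entropy alone is only one plausible design. A practical system would also need to account for how many images each expansion returns.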

PUBLICATION RECORD

Publication year: 2016
Venue: Entropy
Publication date: 2016-04-09
Fields of study: Computer Science
Identifiers: DOI 10.3390/e18040130
External record: Semantic Scholar
Source metadata: Semantic Scholar

REFERENCES

Improving Tag-Clouds as Visual Information Retrieval Interfaces
2024 · cited by this paper
Semantically-driven automatic creation of training sets for object recognition
2015 · cited by this paper
MatConvNet: Convolutional Neural Networks for MATLAB
2014 · cited by this paper
ImageNet Large Scale Visual Recognition Challenge
2014 · cited by this paper
Learning Everything about Anything: Webly-Supervised Visual Concept Learning
2014 · cited by this paper
On Entropy-Based Data Mining
2014 · cited by this paper
Interactive Knowledge Discovery and Data Mining in Biomedical Informatics
2014 · cited by this paper
Computational approaches for mining user's opinions on the Web 2.0
2014 · cited by this paper
NEIL: Extracting Visual Knowledge from Web Data
2013 · cited by this paper
ImageNet classification with deep convolutional neural networks
2012 · cited by this paper
Amazon Mechanical Turk: A Research Tool for Organizations and Information Systems Scholars
2012 · cited by this paper
The Pascal Visual Object Classes (VOC) Challenge
2010 · cited by this paper
Vlfeat: an open and portable library of computer vision algorithms
2010 · cited by this paper
Learning image similarity from Flickr groups using Stochastic Intersection Kernel MAchines
2009 · cited by this paper
Image tag clarity: in search of visual-representative tags for social images
2009 · cited by this paper
ImageNet: A large-scale hierarchical image database
2009 · cited by this paper
Resolving tag ambiguity
2008 · cited by this paper
Some Objects Are More Equal Than Others: Measuring and Predicting Importance
2008 · cited by this paper
LIBLINEAR: A Library for Large Linear Classification
2008 · cited by this paper
Personalized recommendation in social tagging systems using hierarchical clustering
2008 · cited by this paper
Why we tag: motivations for annotation in mobile and online media
2007 · cited by this paper
Caltech-256 Object Category Dataset
2007 · cited by this paper
OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning
2007 · cited by this paper
Generic object recognition with boosting
2006 · cited by this paper
To search or to label?: predicting the performance of search-based automatic image classifiers
2006 · cited by this paper
Learning object categories from Google's image search
2005 · cited by this paper
A Bayesian network framework for relational shape matching
2003 · cited by this paper
Combining multiple evidence from different types of thesaurus for query expansion
1999 · cited by this paper
WordNet: A Lexical Database for English
1995 · cited by this paper
Basic English: a general introduction with rules and grammar
Year unknown · cited by this paper

CITED BY

Tea leaves identification based on gray-level Co-occurrence matrix and K-nearest neighbors algorithm
2019 · cites this paper