ABSTRACT
One promising approach to dealing with datapoints that are outside of the initial training distribution (OOD) is to create new classes that capture similarities in the datapoints previously rejected as uncategorizable. Systems that generate labels can be deployed against an arbitrary amount of data, discovering classification schemes that through training create a higher quality representation of data. We introduce the Dataset Reconstruction Accuracy, a new and important measure of the effectiveness of a model's ability to create labels. We introduce benchmarks against this Dataset Reconstruction metric. We apply a new heuristic, class learnability, for deciding whether a class is worthy of addition to the training dataset. We show that our class discovery system can be successfully applied to vision and language, and we demonstrate the value of semi-supervised learning in automatically discovering novel classes.
PUBLICATION RECORD
- Publication year
2020
- Venue
arXiv.org
- Publication date
2020-02-10
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar
CONCEPTS
- class discovery system
An automated system that generates labels and identifies novel classification schemes from unlabeled or rejected data.
Extraction: 뀨 (7c402c1b98). Review: imjlk (vdp8mqzes2), Anonymous (12632b8b5f), 박진우 (dztg5apj7m).
- class learnability
A heuristic introduced in this paper to decide whether a newly discovered class is suitable for addition to the training dataset.
- dataset reconstruction accuracy
A metric introduced in this paper to measure how effectively a model creates labels that reconstruct the dataset.
Aliases: DRA
- out-of-distribution data
Datapoints that fall outside the initial training distribution and cannot be categorized by an existing model.
Aliases: OOD data, OOD datapoints
- semi-supervised learning
A learning paradigm that uses both labeled and unlabeled data, applied here to discover novel classes automatically.
REFERENCES
45 references
CITED BY
2 citing papers