Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation

Published 2016 in International Conference on Machine Learning

ABSTRACT

We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can depend heavily on the structure of the tree, and although past work showed how to learn the tree structure, it assumed that the feature vectors remained fixed. We provide a novel algorithm that simultaneously learns a representation of the input data and the hierarchical predictor. Our approach optimizes an objective function that favors balanced and easily separable multi-way node partitions. We theoretically analyze this objective, showing that it gives rise to a boosting-style property and a bound on classification error. We next show how to extend the algorithm to conditional density estimation. We empirically validate the two variants of the algorithm on text classification and language modeling, respectively, and show that they compare favorably to common baselines in terms of accuracy and running time.
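The abstract names two mechanisms: a per-node objective that rewards balanced, easily separable multi-way partitions, and a tree-structured factorization for conditional density estimation. The sketch below illustrates both under stated assumptions. The objective's form, J = (2/M) Σ_j Σ_i π_i |p_j − p_{j|i}| over M children and classes i with prior π_i, is an assumption modeled on the binary criterion of "Logarithmic Time Online Multiclass prediction" (listed in the references), to which it reduces when M = 2; it is not a transcription of this paper's definition. The density part is the standard hierarchical-softmax product of per-node branch probabilities. All names (`partition_objective`, `label_log_prob`, `EXAMPLE_SCORES`, `EXAMPLE_PATHS`) are hypothetical.

```python
import math
from collections import defaultdict


def partition_objective(labels, assignments, num_children):
    """Assumed multi-way balance/purity score for one node split.

    labels[t]      -- class of the t-th example reaching the node
    assignments[t] -- child index in [0, num_children) the node sends it to
    Returns (2 / M) * sum_j sum_i pi_i * |p_j - p_{j|i}|, which is 0 whenever
    the split is independent of the class (e.g. everything goes to one child).
    """
    n = len(labels)
    class_counts = defaultdict(int)   # examples per class i
    child_counts = defaultdict(int)   # examples per child j
    joint_counts = defaultdict(int)   # examples per (class i, child j)
    for y, j in zip(labels, assignments):
        class_counts[y] += 1
        child_counts[j] += 1
        joint_counts[(y, j)] += 1

    total = 0.0
    for y, n_y in class_counts.items():
        pi_y = n_y / n                                # empirical class prior pi_i
        for j in range(num_children):
            p_j = child_counts[j] / n                 # P(child = j)
            p_j_given_y = joint_counts[(y, j)] / n_y  # P(child = j | class = i)
            total += pi_y * abs(p_j - p_j_given_y)
    return 2.0 * total / num_children


def log_softmax(scores):
    """Numerically stable log-softmax over one node's child scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def label_log_prob(path, node_scores):
    """log p(y | x) as the sum of per-node log branch probabilities.

    path        -- [(node_id, child_index), ...] from the root to leaf y
    node_scores -- {node_id: [score per child]} computed from input x
    """
    return sum(log_softmax(node_scores[node])[child] for node, child in path)


# Toy depth-2 binary tree with four leaf labels (hypothetical scores).
EXAMPLE_SCORES = {"root": [0.3, -0.2], "left": [1.0, 0.0], "right": [-0.5, 0.5]}
EXAMPLE_PATHS = {
    "a": [("root", 0), ("left", 0)],
    "b": [("root", 0), ("left", 1)],
    "c": [("root", 1), ("right", 0)],
    "d": [("root", 1), ("right", 1)],
}

if __name__ == "__main__":
    print(partition_objective([0, 0, 1, 1], [0, 0, 1, 1], 2))  # pure, balanced split -> 1.0
    print(partition_objective([0, 0, 1, 1], [0, 1, 0, 1], 2))  # class-independent split -> 0.0
    probs = {y: math.exp(label_log_prob(p, EXAMPLE_SCORES)) for y, p in EXAMPLE_PATHS.items()}
    print(sum(probs.values()))  # leaf probabilities sum to 1 (up to rounding)
```

Counting-based estimates with hard child indices are used here only for readability. Training the tree jointly with a learned representation, as the abstract describes, would presumably need soft or differentiable assignments instead; that adaptation is not shown.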

PUBLICATION RECORD

Publication year: 2016
Venue: International Conference on Machine Learning
Publication date: 2016-10-14
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1610.04658
Source metadata: Semantic Scholar

REFERENCES

Points of Significance: Classification and regression trees (2017), cited by this paper
Logarithmic Time One-Against-Some (2016), cited by this paper
Efficient softmax approximation for GPUs (2016), cited by this paper
On the boosting ability of top-down decision tree learning algorithm for multiclass classification (2016), influential reference
Bag of Tricks for Efficient Text Classification (2016), influential reference
YFCC100M (2015), cited by this paper
Density Estimation Trees (2015), cited by this paper
When and why are log-linear models self-normalizing? (2015), cited by this paper
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (2015), cited by this paper
Deep Neural Decision Forests (2015), cited by this paper
An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family (2015), cited by this paper
Sparse Local Embeddings for Extreme Multi-label Classification (2015), cited by this paper
Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content (2015), cited by this paper
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks (2015), cited by this paper
Dependency Recurrent Neural Language Models for Sentence Completion (2015), cited by this paper
Logarithmic Time Online Multiclass prediction (2014), cited by this paper
FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning (2014), cited by this paper
Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets (2014), cited by this paper
#TagSpace: Semantic Embeddings from Hashtags (2014), cited by this paper
Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting (2014), cited by this paper
Dependency Language Models for Sentence Completion (2013), cited by this paper
Distributed Representations of Words and Phrases and their Compositionality (2013), influential reference
Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages (2013), cited by this paper
Label Partitioning For Sublinear Ranking (2013), cited by this paper
Least Squares Revisited: Scalable Approaches for Multi-class Prediction (2013), cited by this paper
Sparse Output Coding for Large-Scale Visual Recognition (2013), cited by this paper
Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics (2012), cited by this paper
Computational Approaches to Sentence Completion (2012), cited by this paper
A fast and simple algorithm for training neural probabilistic language models (2012), influential reference
Online Learning and Online Convex Optimization (2012), influential reference
Classification and regression trees (2011), cited by this paper
Empirical Evaluation and Combination of Advanced Language Modeling Techniques (2011), cited by this paper
WSABIE: Scaling Up to Large Vocabulary Image Annotation (2011), cited by this paper
Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition (2011), cited by this paper
Torch7: A Matlab-like Environment for Machine Learning (2011), cited by this paper
Parsing Natural Scenes and Natural Language with Recursive Neural Networks (2011), cited by this paper
On Strongly Midconvex Functions (2011), influential reference
Recurrent neural network based language model (2010), cited by this paper
Word Representations: A Simple and General Method for Semi-Supervised Learning (2010), cited by this paper
Label Embedding Trees for Large Multi-Class Tasks (2010), cited by this paper
A Multi-class SVM Classifier Utilizing Binary Decision Tree (2009), cited by this paper
Error-Correcting Tournaments (2009), cited by this paper
Conditional Probability Tree Estimation Analysis and Algorithms (2009), influential reference
Multi-Label Prediction via Compressed Sensing (2009), cited by this paper
A Scalable Hierarchical Distributed Language Model (2008), influential reference
Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model (2008), cited by this paper
A unified architecture for natural language processing: deep neural networks with multitask learning (2008), cited by this paper
Three new graphical models for statistical language modelling (2007), influential reference
Pattern Recognition and Machine Learning (2006), cited by this paper
Hierarchical Probabilistic Neural Network Language Model (2005), influential reference
Training Neural Network Language Models on Very Large Corpora (2005), cited by this paper
Training Connectionist Models for the Structured Language Model (2003), cited by this paper
Quick Training of Probabilistic Neural Nets by Importance Sampling (2003), cited by this paper
A Neural Probabilistic Language Model (2003), cited by this paper
Connectionist language modeling for large vocabulary continuous speech recognition (2002), cited by this paper
Random Forests (2001), cited by this paper
Comparison of part-of-speech and automatically derived category-based language models for speech recognition (1998), cited by this paper
An Empirical Study of Smoothing Techniques for Language Modeling (1996), cited by this paper
Improved clustering techniques for class-based statistical language modelling (1993), cited by this paper
Adaptive Algorithms and Stochastic Approximations (1990), cited by this paper
Estimation of probabilities from sparse data for the language model component of a speech recognizer (1987), cited by this paper
Interpolated estimation of Markov source parameters from sparse data (1980), cited by this paper

CITED BY

AGNOMIN - Architecture Agnostic Multi-Label Function Name Prediction (2025), cites this paper
SDHC: Joint Semantic-Data Guided Hierarchical Classification for Fine-Grained HRRP Target Recognition (2024), cites this paper
Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-label Classification (2024), cites this paper
Dual-Encoders for Extreme Multi-label Classification (2023), cites this paper
Extreme Multi-Label Classification for Ad Targeting using Factorization Machines (2023), cites this paper
Efficacy of Dual-Encoders for Extreme Multi-Label Classification (2023), cites this paper
Enhancing Group Fairness in Online Settings Using Oblique Decision Forests (2023), cites this paper
Label Embedding via Low-Coherence Matrices (2023), cites this paper
Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining (2023), cites this paper
A Survey on Extreme Multi-label Learning (2022), cites this paper
Label Disentanglement in Partition-based Extreme Multilabel Classification (2021), cites this paper
Softmax Tree: An Accurate, Fast Classifier When the Number of Classes Is Large (2021), cites this paper
sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification (2021), cites this paper
DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents (2021), cites this paper
Generalized Zero-Shot Extreme Multi-label Learning (2021), cites this paper
Explainable k-Means Clustering: Theory and Practice (2020), cites this paper
The Tree Ensemble Layer: Differentiability meets Conditional Computation (2020), cites this paper
ExKMC: Expanding Explainable k-Means Clustering (2020), cites this paper
Probabilistic Label Trees for Extreme Multi-label Classification (2020), influential citation
Multilabel reductions: what is my loss optimising? (2019), cites this paper
Survey on Multi-Output Learning (2019), cites this paper
A Survey on Multi-output Learning (2019), cites this paper
A Deep Reinforced Sequence-to-Set Model for Multi-Label Classification (2019), cites this paper
Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space (2019), cites this paper
On the computational complexity of the probabilistic label tree algorithms (2019), cites this paper
A Unified Framework for Quantile Elicitation with Applications (2019), cites this paper
LdSM: Logarithm-depth Streaming Multi-label Decision Trees (2019), cites this paper
Extreme Multiclass Classification Criteria (2019), cites this paper
Efficient Loss-Based Decoding On Graphs For Extreme Classification (2018), cites this paper
Learning Representations of Text through Language and Discourse Modeling: From Characters to Sentences (2018), cites this paper
Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling (2018), cites this paper
AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks (2018), cites this paper
A no-regret generalization of hierarchical softmax to extreme multi-label classification (2018), influential citation
Structured Multi-Label Biomedical Text Tagging via Attentive Neural Tree Decoding (2018), cites this paper
Unbiased scalable softmax optimization (2018), cites this paper
Adversarial Extreme Multi-label Classification (2018), cites this paper
Candidates v.s. Noises Estimation for Large Multi-Class Classification Problem (2017), cites this paper
Jointly Training Task-Specific Encoders and Downstream Models on Heterogeneous Multiplex Graphs (year unknown), cites this paper