Tunable Sensitivity to Large Errors in Neural Network Training

Published 2016 in AAAI Conference on Artificial Intelligence

ABSTRACT

When humans learn a new concept, they might ignore examples that they cannot make sense of at first, and only later focus on such examples, when they are more useful for learning. We propose incorporating this idea of tunable sensitivity to hard examples in neural network learning, using a new generalization of the cross-entropy gradient step, which can be used in place of the gradient in any gradient-based training method. The generalized gradient is parameterized by a value that controls the sensitivity of the training process to harder training examples. We tested our method on several benchmark datasets. We propose, and corroborate in our experiments, that the optimal level of sensitivity to hard examples is positively correlated with the depth of the network. Moreover, the test prediction error obtained by our method is generally lower than that of the vanilla cross-entropy gradient learner. We therefore conclude that tunable sensitivity can be helpful for neural network learning.
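
The abstract does not state the form of the generalized gradient, so the sketch below is only an illustration of the idea, not the authors' formulation. It assumes one simple way to parameterize sensitivity: the standard softmax cross-entropy gradient with respect to the logits (probabilities minus one-hot labels) is rescaled per example by the true-class error raised to the power k - 1. The function name `tunable_logit_gradient`, the parameter `k`, and the clipping constant `eps` are hypothetical names chosen for this example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def tunable_logit_gradient(logits, labels, k=1.0, eps=1e-12):
    """Illustrative sensitivity-scaled gradient with respect to the logits.

    ASSUMPTION: this scaling rule is a stand-in for the paper's generalized
    gradient, which the abstract does not specify.

    k = 1 gives the vanilla cross-entropy gradient (probs - one_hot).
    k > 1 shrinks the update for examples with a small error more than for
          examples with a large error, so hard examples dominate more.
    k < 1 does the opposite, so hard examples dominate less.
    """
    probs = softmax(logits)
    one_hot = np.eye(logits.shape[1])[labels]
    standard = probs - one_hot

    # Per-example error on the true class, in [0, 1].
    error = 1.0 - probs[np.arange(len(labels)), labels]

    # Clip before exponentiating so that k < 1 cannot divide by zero.
    scale = np.clip(error, eps, 1.0) ** (k - 1.0)
    return standard * scale[:, None]


if __name__ == "__main__":
    # Row 0 is an easy example (confidently correct), row 1 is a hard one.
    logits = np.array([[4.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
    labels = np.array([0, 0])
    for k in (0.5, 1.0, 2.0):
        grad = tunable_logit_gradient(logits, labels, k=k)
        norms = np.linalg.norm(grad, axis=1)
        print(f"k={k}: hard/easy gradient norm ratio = {norms[1] / norms[0]:.2f}")
```

Because the rescaling is applied to the gradient rather than to a loss value, a function of this kind could be dropped into any gradient-based optimizer in place of the usual logit gradient, which is the property the abstract highlights.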

PUBLICATION RECORD

Publication year: 2016
Venue: AAAI Conference on Artificial Intelligence
Publication date: 2016-11-23
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1609/aaai.v31i1.10807; arXiv 1611.07743
Source metadata: Semantic Scholar

REFERENCES

Building Machines that Learn and Think Like People (2018)
Convolutional RNN: An enhanced model for extracting features from sequential data (2016)
Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native Language (2016)
Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks (2015)
Task Loss Estimation for Sequence Prediction (2015)
Sequence to Sequence Learning with Neural Networks (2014)
Adam: A Method for Stochastic Optimization (2014)
Learning to Execute (2014)
Going deeper with convolutions (2014)
Journal of Advanced Computational Intelligence and Intelligent Informatics (2014)
On the importance of initialization and momentum in deep learning (2013)
Cross-entropy vs. squared error training: a theoretical and experimental comparison (2013)
ImageNet classification with deep convolutional neural networks (2012)
On the difficulty of training recurrent neural networks (2012)
Reading Digits in Natural Images with Unsupervised Feature Learning (2011)
Improving classification accuracy by identifying and removing instances that should be misclassified (2011)
Self-Paced Learning for Latent Variable Models (2010)
Data Cleaning for Classification Using Misclassification Analysis (2010)
Understanding the difficulty of training deep feedforward neural networks (2010)
Curriculum learning (2009)
Learning Multiple Layers of Features from Tiny Images (2009)
New developments of the Z-EDM algorithm (2006)
Gradient-based learning applied to document recognition (1998)
Introduction to multi-layer feed-forward neural networks (1997)
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition (1989)
Connectionist Learning Procedures (1989)
Accelerated Learning in Layered Neural Networks (1988)
Learning representations by back-propagating errors (1986)
Some methods of speeding up the convergence of iteration methods (1964)

CITED BY

A Walkthrough for the Principle of Logit Separation (2019)
Analysis of loss functions for fast single-class classification (2019)
Uncertainty in emotion recognition (2019)
Scaling Speech Enhancement in Unseen Environments with Noise Embeddings (2018)
Deep learning for multisensorial and multimodal interaction (2018)
Calibrated Prediction Intervals for Neural Network Regressors (2018)
Weakly Supervised One-Shot Detection with Attention Siamese Networks (2018)
Weakly Supervised One-Shot Detection with Attention Similarity Networks (2018)
Fast Single-Class Classification and the Principle of Logit Separation (2017)
Fast Single-Class Classification and the Principle of Logit Separation (year unknown)