Learning Sparse Deep Feedforward Networks via Tree Skeleton Expansion

Published 2018 in arXiv.org

ABSTRACT

Despite the popularity of deep learning, structure learning for deep models remains a relatively under-explored area. In contrast, structure learning has been studied extensively for probabilistic graphical models (PGMs). In particular, an efficient algorithm has been developed for learning a class of tree-structured PGMs called hierarchical latent tree models (HLTMs), where there is a layer of observed variables at the bottom and multiple layers of latent variables on top. In this paper, we propose a simple method for learning the structures of feedforward neural networks (FNNs) based on HLTMs. The idea is to expand the connections in the tree skeletons from HLTMs and to use the resulting structures for FNNs. An important characteristic of FNN structures learned this way is that they are sparse. We present extensive empirical results to show that, compared with manually tuned standard FNNs, sparse FNNs learned by our method achieve better or comparable classification performance with far fewer parameters. They are also more interpretable.
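
To make the idea of a tree skeleton and its expansion concrete, here is a minimal NumPy sketch of one sparse layer. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the skeleton is learned or which connections are added, so the `parent` assignment, the `score` matrix used to pick extra connections, the `extra_per_unit` parameter, the ReLU activation, and all function names are hypothetical.

```python
import numpy as np


def skeleton_mask(parent, n_upper):
    """Build a (n_upper, n_lower) 0/1 mask from a tree skeleton.

    parent[c] is the index of the single upper-layer unit that lower-layer
    unit c is attached to in the tree (hypothetical input format).
    """
    parent = np.asarray(parent)
    mask = np.zeros((n_upper, parent.size), dtype=np.float32)
    mask[parent, np.arange(parent.size)] = 1.0
    return mask


def expand_mask(mask, score, extra_per_unit):
    """Add extra connections on top of the tree skeleton.

    Assumed rule (not from the source): each upper unit gains the
    `extra_per_unit` highest-scoring lower units it is not yet connected to.
    """
    expanded = mask.copy()
    for u in range(mask.shape[0]):
        candidates = np.flatnonzero(mask[u] == 0)
        if candidates.size == 0:
            continue
        ranked = candidates[np.argsort(-score[u, candidates], kind="stable")]
        expanded[u, ranked[:extra_per_unit]] = 1.0
    return expanded


def sparse_forward(x, weights, mask, bias):
    """One masked layer: weights outside the mask never contribute."""
    return np.maximum(0.0, x @ (weights * mask).T + bias)


# Toy example: 6 observed units, 2 latent units, each observed unit has one parent.
parent = [0, 0, 0, 1, 1, 1]
mask = skeleton_mask(parent, n_upper=2)

rng = np.random.default_rng(0)
score = rng.random((2, 6))          # stand-in for any pairwise relevance score
expanded = expand_mask(mask, score, extra_per_unit=1)

weights = rng.normal(size=(2, 6))
bias = np.zeros(2)
x = rng.normal(size=(4, 6))         # batch of 4 inputs
h = sparse_forward(x, weights, expanded, bias)

print(int(expanded.sum()), "of", expanded.size, "weights are active")
```

The mask makes the sparsity explicit: the number of trainable weights in the layer is the number of ones in the mask rather than the full size of the weight matrix, which is how a structure of this kind can use fewer parameters than a dense layer of the same width.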

PUBLICATION RECORD

Publication year
2018
Venue
arXiv.org
Publication date
2018-03-16
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1803.06120
Source metadata
Semantic Scholar

REFERENCES

Artificial Intelligence in Medicine: 23rd International Conference, AIME 2025, Pavia, Italy, June 23–26, 2025, Proceedings, Part I
2025, influential reference
Structure Learning for Deep Neural Networks Based on Multiobjective Optimization
2018, cited by this paper
Large-Scale Evolution of Image Classifiers
2017, cited by this paper
Self-Normalizing Neural Networks
2017, cited by this paper
Learning Structured Sparsity in Deep Neural Networks
2016, cited by this paper
Pruning Filters for Efficient ConvNets
2016, cited by this paper
DeepTox: Toxicity Prediction using Deep Learning
2016, cited by this paper
Neural Architecture Search with Reinforcement Learning
2016, cited by this paper
Latent tree models for hierarchical topic detection
2016, cited by this paper
Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis
2016, influential reference
Hierarchical Attention Networks for Document Classification
2016, influential reference
Designing Neural Network Architectures using Reinforcement Learning
2016, cited by this paper
Deep Learning
2016, cited by this paper
Data-free Parameter Pruning for Deep Neural Networks
2015, cited by this paper
Learning both Weights and Connections for Efficient Neural Network
2015, influential reference
Progressive EM for Latent Tree Models and Hierarchical Topic Detection
2015, cited by this paper
Character-level Convolutional Networks for Text Classification
2015, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Hierarchical Latent Tree Analysis for Topic Detection
2014, cited by this paper
Dropout: a simple way to prevent neural networks from overfitting
2014, influential reference
Distributed Representations of Words and Phrases and their Compositionality
2013, cited by this paper
Efficient Estimation of Word Representations in Vector Space
2013, cited by this paper
Improving neural networks by preventing co-adaptation of feature detectors
2012, influential reference
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
Deep Sparse Rectifier Neural Networks
2011, cited by this paper
Strategies for training large scale neural network language models
2011, cited by this paper
Rectified Linear Units Improve Restricted Boltzmann Machines
2010, cited by this paper
IEEE Workshop on automatic speech recognition and understanding
2009, cited by this paper
Learning the Structure of Deep Sparse Graphical Models
2009, cited by this paper
IEEE Transactions on Neural Networks
2008, cited by this paper
Annals of Statistics
2006, cited by this paper
Hierarchical latent class models for cluster analysis
2002, cited by this paper
Gradient-based learning applied to document recognition
1998, cited by this paper
IEEE Transactions on Information Theory
1998, cited by this paper
Constructive algorithms for structure learning in feedforward neural networks for regression problems
1997, cited by this paper
Enhanced training algorithms, and integrated training/architecture selection for multilayer perceptron networks
1992, cited by this paper
Probabilistic reasoning in intelligent systems: Networks of plausible inference
1991, cited by this paper
Dynamic node creation in backpropagation networks
1989, cited by this paper
Estimating the Dimension of a Model
1978, cited by this paper
Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper
1977, cited by this paper
Approximating discrete probability distributions with dependence trees
1968, cited by this paper

CITED BY

GaterNet: Dynamic Filter Selection in Convolutional Neural Network via a Dedicated Global Gating Network
2018, cites this paper
You Look Twice: GaterNet for Dynamic Filter Selection in CNNs
2018, cites this paper