Training Restricted Boltzmann Machines on Word Observations

George E. Dahl, Ryan P. Adams, H. Larochelle

Published 2012 in International Conference on Machine Learning

ABSTRACT

The restricted Boltzmann machine (RBM) is a flexible model for complex data. However, using RBMs for high-dimensional multinomial observations poses significant computational difficulties. In natural language processing applications, words are naturally modeled by K-ary discrete distributions, where K is determined by the vocabulary size and can easily be in the hundreds of thousands. The conventional approach to training RBMs on word observations is limited because it requires sampling the states of K-way softmax visible units during block Gibbs updates, an operation that takes time linear in K. In this work, we address this issue with a more general class of Markov chain Monte Carlo operators on the visible units, yielding updates with computational complexity independent of K. We demonstrate the success of our approach by training RBMs on hundreds of millions of word n-grams using larger vocabularies than previously feasible with RBMs and by using the learned features to improve performance on chunking and sentiment classification tasks, achieving state-of-the-art results on the latter.
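The abstract does not say which Markov chain Monte Carlo operator is used, so the following is only a minimal sketch of one way an update on a K-way softmax unit can avoid cost linear in K: a Metropolis-Hastings step with an independence proposal that is sampled in constant time via Walker's alias method. The function names (`build_alias_table`, `alias_draw`, `mh_step`), the `log_score` callback, and the choice of a fixed proposal distribution are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def build_alias_table(probs):
    """Walker's alias method: O(K) setup once, O(1) per draw afterwards."""
    k = len(probs)
    scaled = [p * k for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    accept = [1.0] * k          # probability of keeping column i
    alias = list(range(k))      # fallback index for column i
    while small and large:
        s = small.pop()
        l = large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        # Column l donates the mass needed to fill column s up to 1.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return accept, alias


def alias_draw(accept, alias, rng):
    """Draw one index in O(1): pick a column, keep it or take its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


def mh_step(current, log_score, proposal, accept, alias, rng):
    """One Metropolis-Hastings step with an independence proposal.

    log_score(w) is assumed to return the unnormalized log-probability of
    word w given the hidden units. Only two evaluations are needed per
    step, so the cost does not grow with the vocabulary size K.
    """
    candidate = alias_draw(accept, alias, rng)
    log_ratio = (log_score(candidate) - log_score(current)
                 + math.log(proposal[current]) - math.log(proposal[candidate]))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return candidate
    return current
```

In this sketch the proposal must assign nonzero probability to every word, and the alias table is built once and reused for every update. An exact Gibbs draw would instead normalize over all K scores on every update, which is the linear cost the abstract describes.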

PUBLICATION RECORD

Publication year: 2012
Venue: International Conference on Machine Learning
Publication date: 2012-02-25
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1202.5695
Source metadata: Semantic Scholar


