Weightless: Lossy Weight Encoding For Deep Neural Network Compression
Brandon Reagen, Udit Gupta, Bob Adolf, M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, D. Brooks
Published 2017 in International Conference on Machine Learning

ABSTRACT
The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce these requirements, usually by applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding co-designed with weight simplification techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that saves space at the cost of introducing random errors. By leveraging the ability of neural networks to tolerate these imperfections, and by retraining around the errors, the proposed technique, named Weightless, can compress weights by up to 496x without loss of model accuracy. This results in up to a 1.51x improvement over the state of the art.
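The key ingredient named in the abstract is the Bloomier filter: an immutable key-value structure that answers lookups with a few XORs over a small hash table, and may return an arbitrary value for any key that was never inserted. The sketch below is a minimal, hypothetical Python construction of such a filter, following the standard peeling-based construction; the class name, parameters, and the toy encoding of 4-bit weight values are illustrative assumptions, not the authors' implementation.

```python
import collections
import hashlib
import random


class BloomierFilter:
    """Minimal immutable Bloomier filter sketch (illustrative only).

    Each inserted key decodes exactly to its t-bit value via XORs over
    k table slots; keys that were never inserted decode to arbitrary
    t-bit values. These wrong decodes are the "random errors" that
    Weightless retrains the network to tolerate.
    """

    def __init__(self, items, m, k=3, t=4, seed=0):
        self.m, self.k, self.t, self.seed = m, k, t, seed
        self.mask = (1 << t) - 1
        self.table = [0] * m  # t-bit entries; unset slots stay 0 here
        # Write slots in reverse peel order: once a key's designated
        # slot is written, no later write touches any of its slots.
        for key, value, slot in reversed(self._peel_order(list(items))):
            acc = value & self.mask
            for s in self._slots(key):
                if s != slot:
                    acc ^= self.table[s]
            self.table[slot] = acc

    def _slots(self, key):
        # k distinct, deterministic hash slots per key.
        slots, i = [], 0
        while len(slots) < self.k:
            h = hashlib.blake2b(f"{self.seed}:{i}:{key}".encode(),
                                digest_size=8).digest()
            s = int.from_bytes(h, "big") % self.m
            if s not in slots:
                slots.append(s)
            i += 1
        return slots

    def _peel_order(self, items):
        # Repeatedly peel a key owning a slot that no other remaining
        # key touches; that slot becomes the key's designated slot.
        neigh = {key: self._slots(key) for key, _ in items}
        users = collections.defaultdict(set)
        for key, slots in neigh.items():
            for s in slots:
                users[s].add(key)
        values, remaining, order = dict(items), set(neigh), []
        progress = True
        while remaining and progress:
            progress = False
            for key in list(remaining):
                slot = next((s for s in neigh[key] if users[s] == {key}), None)
                if slot is not None:
                    order.append((key, values[key], slot))
                    remaining.discard(key)
                    for s in neigh[key]:
                        users[s].discard(key)
                    progress = True
        if remaining:
            raise ValueError("peeling failed; increase m or change seed")
        return order

    def get(self, key):
        acc = 0
        for s in self._slots(key):
            acc ^= self.table[s]
        return acc & self.mask


# Toy usage: map 200 nonzero weight indices to 4-bit quantized values
# using a 400-slot table instead of a 10000-entry dense array.
random.seed(0)
weights = {i: random.randrange(1, 16) for i in random.sample(range(10000), 200)}
bf = BloomierFilter(weights.items(), m=400)
assert all(bf.get(i) == v for i, v in weights.items())  # exact on inserted keys
```

The compressed representation is just the m t-bit table entries, which is where the space saving comes from; queries on indices that were never inserted can decode to nonzero values, and it is exactly these imperfections that the abstract says the network is retrained around.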
PUBLICATION RECORD
- Publication year: 2017
- Venue: International Conference on Machine Learning
- Publication date: 2017-11-13
- Fields of study: Mathematics, Computer Science
- Source metadata: Semantic Scholar