Hardware-Oriented Compression of Long Short-Term Memory for Efficient Inference

Zhisheng Wang, Jun Lin, Zhongfeng Wang

Published 2018 in IEEE Signal Processing Letters

ABSTRACT

Long short-term memory (LSTM) networks and their variants have been widely adopted for processing sequential data. However, their intrinsically large memory requirement and high computational complexity make them hard to deploy in embedded systems. This motivates model compression and dedicated hardware acceleration for LSTM. In this letter, efficient clipped gating and top-$k$ pruning schemes are introduced to convert the dense matrix computations in LSTM into structured sparse-matrix-sparse-vector multiplications. Then, mixed quantization schemes are developed to eliminate most of the multiplications in LSTM. The proposed compression scheme is well suited for efficient hardware implementations. Experimental results show that the model size and the number of matrix operations can be reduced by $32\times$ and $18.5\times$, respectively, at a cost of less than $1\%$ accuracy loss on a word-level language modeling task.
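To illustrate the kind of top-$k$ pruning the abstract refers to, below is a minimal NumPy sketch of generic magnitude-based top-$k$ row pruning of a weight matrix. The function name `top_k_prune` and the per-row granularity are illustrative assumptions; the letter's exact scheme (applied jointly with clipped gating to produce structured sparsity) is not reproduced here.

```python
import numpy as np

def top_k_prune(W, k):
    """Keep the k largest-magnitude entries in each row of W; zero the rest.

    This is a generic magnitude-based top-k pruning sketch, not the
    paper's exact hardware-oriented scheme.
    """
    W_pruned = np.zeros_like(W)
    # Indices of the k largest-magnitude entries in each row.
    idx = np.argsort(np.abs(W), axis=1)[:, -k:]
    rows = np.arange(W.shape[0])[:, None]
    W_pruned[rows, idx] = W[rows, idx]
    return W_pruned

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
Wp = top_k_prune(W, 2)
# Every row of Wp now has exactly 2 nonzero entries,
# so each row-vector product needs only k multiply-accumulates.
```

With a fixed $k$ per row, the resulting sparsity pattern is regular, which is what makes the pruned matrix amenable to efficient sparse-matrix-sparse-vector hardware datapaths.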

