Efficient Quantization for Neural Networks with Binary Weights and Low Bitwidth Activations

Authors

  • Kun Huang, Shanghai Jiao Tong University
  • Bingbing Ni, Shanghai Jiao Tong University
  • Xiaokang Yang, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v33i01.33013854

Abstract

Quantization has shown stunning efficiency on deep neural networks, especially for portable devices with limited resources. Most existing works uncritically extend weight quantization methods to activations. However, we take the view that the best performance is obtained by applying different quantization methods to weights and activations. In this paper, we design a new activation function, dubbed CReLU, from the quantization perspective and complement this design with an appropriate initialization method and training procedure. Moreover, we develop a specific quantization strategy in which we formulate the forward and backward approximation of weights with binary values and quantize the activations to low bitwidth using a linear or logarithmic quantizer. We show, for the first time, that our final quantized model with binary weights and ultra-low-bitwidth activations outperforms the previous best models by large margins on ImageNet, while achieving a nearly 10.85× theoretical speedup with ResNet-18. Furthermore, ablation experiments and theoretical analysis demonstrate the effectiveness and robustness of CReLU in comparison with other activation functions.
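To make the quantization strategy described above concrete, the sketch below shows one common way to combine binary weights (forward/backward approximated with a straight-through estimator) with low-bitwidth activations produced by a clipped activation followed by a linear (uniform) quantizer. This is an illustrative sketch only, not the authors' released code or their exact CReLU design; the per-tensor scaling factor, the clipping threshold, and the straight-through gradients are assumptions made for the example.

```python
# Hypothetical sketch: binary weight quantization with a straight-through
# estimator (STE) and a k-bit linear activation quantizer after clipping.
import torch
import torch.nn as nn


class BinarizeWeight(torch.autograd.Function):
    """Forward: scaled sign of the weights; backward: pass gradients through (STE)."""

    @staticmethod
    def forward(ctx, w):
        alpha = w.abs().mean()          # assumed per-tensor scaling factor
        return alpha * torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output              # straight-through estimator


class LinearActQuant(nn.Module):
    """Clip activations to [0, clip] and quantize them uniformly to `bits` bits."""

    def __init__(self, bits=2, clip=1.0):
        super().__init__()
        self.levels = 2 ** bits - 1
        self.clip = clip

    def forward(self, x):
        x = torch.clamp(x, 0.0, self.clip)       # clipped-ReLU-style activation
        scale = self.levels / self.clip
        xq = torch.round(x * scale) / scale      # k-bit uniform (linear) quantization
        return x + (xq - x).detach()             # STE for the rounding step


class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized in the forward pass."""

    def forward(self, x):
        wb = BinarizeWeight.apply(self.weight)
        return nn.functional.conv2d(x, wb, self.bias, self.stride,
                                    self.padding, self.dilation, self.groups)


if __name__ == "__main__":
    layer = nn.Sequential(BinaryConv2d(3, 16, 3, padding=1), LinearActQuant(bits=2))
    y = layer(torch.randn(1, 3, 32, 32))
    print(y.shape)  # torch.Size([1, 16, 32, 32])
```

A logarithmic quantizer, as also mentioned in the abstract, would replace the uniform rounding step with rounding in the log domain (i.e., snapping activations to powers of two); the surrounding STE structure stays the same.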

Published

2019-07-17

How to Cite

Huang, K., Ni, B., & Yang, X. (2019). Efficient Quantization for Neural Networks with Binary Weights and Low Bitwidth Activations. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 3854-3861. https://doi.org/10.1609/aaai.v33i01.33013854

Section

AAAI Technical Track: Machine Learning