The Japan Society of Applied Physics

[PS-2-01 (Late News)] Analysis of Inference Accuracy of Convolutional Neural Networks due to Quantization of Weights, Gradients, Input and Output Signals Stemming from Computation-in-Memory

Adil Padiyal1, Ayumu Yamada1, Naoko Misawa1, Chihiro Matsui1, Ken Takeuchi1 (1.The University of Tokyo (Japan))

https://doi.org/10.7567/SSDM.2023.PS-2-01

This paper proposes a computation-in-memory (CiM) based partially quantized learning and inference scheme for neural networks. The inference accuracy is reported as a function of the nature and location of the quantization that CiM imposes on input/output data within the neural network. The results suggest constraints on the quantization bit precision for weights, input/output data, and gradients: with quantized backpropagation gradients in a conventional training algorithm and quantized inference on CNNs, inference accuracy degrades by around 2.8%, while the memory footprint improves by 62% during training and 93% during inference.
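The abstract does not specify the quantization scheme used; as a generic illustration of how quantizing weights, gradients, or input/output signals to a fixed bit width is commonly simulated, a uniform symmetric "fake quantization" of a tensor might be sketched as follows (the function name, per-tensor scaling, and bit widths below are assumptions, not details from the paper):

```python
import numpy as np

def quantize_uniform(x, n_bits):
    """Simulate uniform symmetric quantization of x to n_bits.

    Values are rounded to the nearest of 2**n_bits - 1 signed levels,
    then rescaled back to floating point ("fake quantization").
    """
    qmax = 2 ** (n_bits - 1) - 1        # largest signed integer level
    scale = np.max(np.abs(x)) / qmax    # one scale factor per tensor
    if scale == 0:
        return x.copy()                 # all-zero tensor: nothing to quantize
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

# Hypothetical usage: 4-bit weights, 8-bit activations
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3)).astype(np.float32)
w_q = quantize_uniform(w, 4)            # weights at 4-bit precision
a = rng.uniform(size=(3,)).astype(np.float32)
a_q = quantize_uniform(a, 8)            # activations at 8-bit precision
```

In such a setup, a lower bit width for gradients or I/O signals shrinks the memory footprint at the cost of rounding error, which is the trade-off the reported 2.8% accuracy degradation versus 62%/93% footprint improvement quantifies.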