4:00 PM - 4:15 PM
[K-8-03] Compact Edge Vision Transformer with 86% Non-volatile Memory Bit Reduction by Percentile Clipping, Per Layer Quantization, and Quantization Aware Training
This paper proposes a compact edge Vision Transformer (ViT) whose weight matrices are mapped to non-volatile memory (NVM)-based computation-in-memory (CiM) macros. ViT has attracted much attention for its high inference accuracy, but its memory footprint is too large for edge applications, and achieving high accuracy requires fine-tuning models pre-trained on large datasets. To map ViT to CiM compactly while preserving accuracy, three methods are proposed: bit-precision-aware percentile clipping, per-layer quantization, and quantization-aware training. As a result, the memory size of the proposed ViT is reduced by 85.8% while maintaining 95.4% inference accuracy on CIFAR-10, enabling deployment in edge applications.
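The abstract names the three techniques without detail; as a rough illustration only, the sketch below shows how percentile clipping and per-layer quantization might combine in practice. All function names, the percentile value, and the bit widths are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def percentile_clip_quantize(w, bits, pct=99.9):
    """Clip weights at a percentile of |w|, then uniformly quantize.

    In a bit-precision-aware scheme the percentile would be tuned per
    bit width; pct=99.9 here is only an illustrative default.
    """
    clip = np.percentile(np.abs(w), pct)      # percentile clipping threshold
    w_clipped = np.clip(w, -clip, clip)       # discard outlier weights
    qmax = 2 ** (bits - 1) - 1                # symmetric signed integer range
    scale = clip / qmax                       # one scale per layer
    q = np.round(w_clipped / scale).astype(np.int32)
    return q, scale                           # integer codes + per-layer scale

# Per-layer quantization: each layer gets its own bit width and scale.
# Quantization-aware training would additionally fine-tune the model with
# these fake-quantized weights in the forward pass (not shown here).
rng = np.random.default_rng(0)
layers = {"attn_qkv": (rng.standard_normal((64, 192)), 4),
          "mlp_fc1": (rng.standard_normal((64, 256)), 6)}
for name, (w, bits) in layers.items():
    q, s = percentile_clip_quantize(w, bits)
    err = np.abs(w - q * s).mean()
    print(f"{name}: {bits}-bit, scale={s:.4f}, mean abs error={err:.4f}")
```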
