The Japan Society of Applied Physics

4:00 PM - 4:15 PM

[K-8-03] Compact Edge Vision Transformer with 86% Non-volatile Memory Bit Reduction by Percentile Clipping, Per Layer Quantization, and Quantization Aware Training

Ryuhei Yamaguchi1, Ayumu Yamada1, Naoko Misawa1, Chihiro Matsui1, Ken Takeuchi1 (1. The University of Tokyo (Japan))

https://doi.org/10.7567/SSDM.2023.K-8-03

This paper proposes a compact edge Vision Transformer (ViT) whose weight matrices are mapped onto non-volatile memory (NVM)-based Computation-in-Memory (CiM) arrays. ViT has attracted much attention for its high inference accuracy, but its memory footprint is too large for edge applications, and it requires fine-tuning of models pre-trained on large datasets to achieve that accuracy. Therefore, to map ViT onto CiM compactly while preserving inference accuracy, three methods are proposed: bit-precision-aware percentile clipping, per-layer quantization, and quantization-aware training. As a result, the memory size of the proposed ViT is reduced by 85.8% while maintaining 95.4% inference accuracy on CIFAR-10, enabling deployment in edge applications.
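To make the three techniques concrete, below is a minimal PyTorch sketch of how they might compose for weight compression. All function names, the percentile value, and the 4-bit width are illustrative assumptions, not the paper's exact recipe; in particular, the paper's clipping is bit-precision-aware (the percentile is presumably tuned per bit width), whereas this sketch fixes it for simplicity.

```python
import torch

def percentile_clip(w: torch.Tensor, pct: float = 99.9) -> torch.Tensor:
    """Clip weights to a symmetric range set by the pct-th percentile of |w|,
    so rare outliers do not stretch the range the quantizer must cover."""
    t = torch.quantile(w.abs().flatten(), pct / 100.0).item()
    return w.clamp(-t, t)

def quantize_per_layer(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric quantization with one scale per layer (per tensor)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax  # guard against all-zero w
    q = torch.round(w / scale).clamp(-qmax, qmax)
    return q * scale  # dequantized weights, as the NVM cells would represent them

class FakeQuant(torch.autograd.Function):
    """Quantization-aware training via a straight-through estimator:
    quantize in the forward pass, let gradients flow through unchanged."""

    @staticmethod
    def forward(ctx, w, n_bits, pct):
        return quantize_per_layer(percentile_clip(w, pct), n_bits)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None, None  # no gradients for n_bits / pct

# Example: fake-quantize one layer's weights during a QAT forward pass
# (the 768-dim layer size is a typical ViT hidden width, assumed here).
layer = torch.nn.Linear(768, 768)
w_q = FakeQuant.apply(layer.weight, 4, 99.9)
```

Note the ordering: clipping is applied before the per-layer scale is computed, so the scale tracks the bulk of the weight distribution rather than a few outliers, which is how percentile clipping lets fewer NVM bits cover the useful range.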