4:00 PM - 4:15 PM
[K-8-03] Compact Edge Vision Transformer with 86% Non-volatile Memory Bit Reduction by Percentile Clipping, Per Layer Quantization, and Quantization Aware Training
This paper proposes a compact edge Vision Transformer (ViT) whose weight matrices are mapped to non-volatile memory (NVM)-based computation-in-memory (CiM) macros. ViT has attracted much attention for its high inference accuracy, but its memory footprint is too large for edge applications, and achieving high accuracy requires fine-tuning models pre-trained on large datasets. To map ViT to CiM compactly while preserving accuracy, three methods are proposed: bit-precision-aware percentile clipping, per-layer quantization, and quantization-aware training. As a result, the memory size of the proposed ViT is reduced by 85.8% while maintaining 95.4% inference accuracy on CIFAR-10, enabling deployment in edge applications.
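The abstract names the three techniques without detail; as a rough illustration only, the sketch below shows how percentile clipping and per-layer quantization might combine in practice. All function names, the percentile value, and the bit widths are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def percentile_clip_quantize(w, bits, pct=99.9):
    """Clip weights at a percentile of |w|, then uniformly quantize.

    In a bit-precision-aware scheme the percentile would be tuned per
    bit width; pct=99.9 here is only an illustrative default.
    """
    clip = np.percentile(np.abs(w), pct)      # percentile clipping threshold
    w_clipped = np.clip(w, -clip, clip)       # discard outlier weights
    qmax = 2 ** (bits - 1) - 1                # symmetric signed integer range
    scale = clip / qmax                       # one scale per layer
    q = np.round(w_clipped / scale).astype(np.int32)
    return q, scale                           # integer codes + per-layer scale

# Per-layer quantization: each layer gets its own bit width and scale.
# Quantization-aware training would additionally fine-tune the model with
# these fake-quantized weights in the forward pass (not shown here).
rng = np.random.default_rng(0)
layers = {"attn_qkv": (rng.standard_normal((64, 192)), 4),
          "mlp_fc1": (rng.standard_normal((64, 256)), 6)}
for name, (w, bits) in layers.items():
    q, s = percentile_clip_quantize(w, bits)
    err = np.abs(w - q * s).mean()
    print(f"{name}: {bits}-bit, scale={s:.4f}, mean abs error={err:.4f}")
```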
