Keywords: Object Detection, Edge AI, Inference Speed-up, Quantization, TensorFlow Lite
In recent years, demand for edge AI has grown, driven by requirements for real-time performance and data confidentiality. We use the quantization-aware training (QAT) method of TensorFlow and TensorFlow Lite to speed up inference and reduce memory usage in edge AI. Because new AI models are devised one after another, it is unlikely that QAT will support all operations; depending on the model used, the inclusion of unsupported operations therefore degrades both speed and accuracy. In this paper, we take YOLOv3-tiny, an object detection model in which this problem occurs, as an example and propose methods for improving speed and accuracy. We were able to halve the inference time on the Raspberry Pi 3 Model B+ and restore the inference accuracy to the same level as before quantization.