JSAI2022

Presentation information

General Session

[1O4-GS-7] Vision, speech media processing: detection / data set creation

Tue. Jun 14, 2022 2:20 PM - 4:00 PM Room O (Room 510)

Chair: 石原 賢太 (NEC) [Remote]

2:20 PM - 2:40 PM

[1O4-GS-7-01] Object detection-based card detection for OCR applications

〇Zhen Zhao1, Yoshiki Hasioka1 (1. AI inside Inc.)

Keywords: OCR, Image processing, Object detection, Position correction

Extracting information from images of cards such as driver's licenses or credit cards is a computer vision task with widespread demand. In many cases, the images are captured with a smartphone from an arbitrary position and angle chosen by the user. To recognize the card's text with OCR, it is first necessary to localize the card within the image, transform it into a rectangle, and then rotate it to the correct orientation.
Deep learning-based methods are able to perform these localization and rotation tasks with high accuracy. However, handling the two tasks with two separate models results in increased processing times. In this work, we propose a solution to this problem which uses a single object detection model to perform both the localization and rotation tasks, thereby allowing cards to be processed quickly without sacrificing accuracy.
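
The abstract does not say how the single model encodes orientation, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes a hypothetical detector that returns the four card corners labelled in the card's own reading frame (top_left, top_right, bottom_right, bottom_left). Under that assumption, one homography from the labelled corners to a canonical rectangle performs the rectangle transform and the rotation in a single step. The names CARD_W, CARD_H, card_homography, and the sample coordinates are all illustrative.

```python
import numpy as np

# Illustrative output size in pixels, roughly the 85.60 x 53.98 mm ID-1 card
# aspect ratio. This is an assumption, not a value from the abstract.
CARD_W, CARD_H = 856, 540

# Corner labels in the card's own reading frame (hypothetical detector classes).
CORNER_ORDER = ("top_left", "top_right", "bottom_right", "bottom_left")


def homography_from_corners(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from 4 point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    a = np.asarray(rows, dtype=float)
    # The solution is the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    # Normalise the scale; assumes a non-degenerate corner configuration.
    return h / h[2, 2]


def apply_homography(h, points):
    """Map an (N, 2) array of image points through the homography h."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


def card_homography(corners, width=CARD_W, height=CARD_H):
    """Build the homography that sends labelled corners to an upright rectangle."""
    src = [corners[name] for name in CORNER_ORDER]
    dst = [(0, 0), (width, 0), (width, height), (0, height)]
    return homography_from_corners(src, dst)


# Hypothetical detections for a card photographed almost upside down:
# the card's own top-left corner appears near the bottom-right of the image.
detections = {
    "top_left": (900, 700),
    "top_right": (100, 680),
    "bottom_right": (120, 180),
    "bottom_left": (880, 200),
}

H = card_homography(detections)
rectified = apply_homography(H, [detections[name] for name in CORNER_ORDER])
```

Because the corner labels carry the orientation, no separate rotation step is needed here. To resample actual pixels rather than points, the same matrix could be passed to an image-warping routine such as OpenCV's cv2.warpPerspective.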
