Person Image Extraction from Web Pages with Text Classification and Face-similarity-based Distant Supervision

Yuya Nakano; Kazuki Ashihara; Jun Ikeda; Kentaro Nishi; Takahiro Nagai

[4Xin1-46] Person Image Extraction from Web Pages with Text Classification and Face-similarity-based Distant Supervision

〇Yuya Nakano¹, Kazuki Ashihara¹, Jun Ikeda¹, Kentaro Nishi¹, Takahiro Nagai¹ (1. Yahoo Japan Corporation)

Keywords: Information Extraction, Distant Supervision, Text Classification, Face Image Recognition

Direct answers, which present information such as images and summary text about a person, are well known to users of search services. Images extracted from web content are linked to persons by matching the image caption against the person's name. However, when the name only partially matches the caption, the caption may be semantically ambiguous, which makes it difficult to determine whether the person in the image is the intended target. In this study, we build a classification model based on a language model that determines whether a partially matching image caption refers to the intended person. We also propose a method for creating the dataset by comparing facial images of persons: it vectorizes the facial region in each image and automates annotation based on the similarity between a correct image and each candidate image. The proposed method outperforms an existing rule-based filtering method in matching performance.
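
The face-similarity-based annotation step described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the face-embedding model, the similarity measure, or the decision threshold, so the sketch assumes face vectors have already been computed elsewhere, uses cosine similarity, and picks a hypothetical threshold of 0.6. The function names `cosine_similarity`, `distant_label`, and `build_dataset` are likewise invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def distant_label(
    reference: Sequence[float],
    candidate: Sequence[float],
    threshold: float = 0.6,  # hypothetical value; not given in the abstract
) -> int:
    """Return 1 if the candidate face is close enough to the reference face, else 0."""
    return 1 if cosine_similarity(reference, candidate) >= threshold else 0


def build_dataset(
    reference: Sequence[float],
    candidates: Sequence[Tuple[str, Sequence[float]]],
    threshold: float = 0.6,
) -> List[Tuple[str, int]]:
    """Pair each candidate caption with an automatically assigned label.

    `reference` is the face vector of a known-correct image of the person;
    each candidate is (caption, face vector) for a partially matching image.
    """
    return [
        (caption, distant_label(reference, vector, threshold))
        for caption, vector in candidates
    ]
```

Under these assumptions, the resulting (caption, label) pairs could serve as training data for the caption classifier, so that no manual annotation of captions is needed. In practice the threshold would have to be tuned against a small hand-checked sample.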

Presentation information

[4Xin1] Poster session 2