JSAI2023

Presentation information

General Session

[1O4-GS-7] Vision, speech media processing

Tue. Jun 6, 2023 3:00 PM - 4:40 PM Room O (E1+E2)

Chair: 渡辺 友樹 (Toshiba) [On-site]

3:40 PM - 4:00 PM

[1O4-GS-7-03] Predicate Classification Using Optimal Transport Loss in Scene Graph Generation

〇Sorachi Kurita1, Satoshi Oyama1, Itsuki Noda1 (1. Hokkaido University)

Keywords: Scene Graph, Optimal Transport, Computer Vision, Image Recognition, Deep Learning

We propose a method for scene graph generation that uses an optimal transport loss to compare two probability distributions. In scene graph generation, training with cross-entropy loss leads to biased predictions because the distribution of predicate labels in the dataset is severely imbalanced. We apply training with the optimal transport loss, which readily reflects the similarity between labels as a transportation cost, to predicate classification in scene graph generation. In the proposed method, the transportation cost is defined using word similarities obtained from a pre-trained model. Experimental evaluation shows that the method outperforms existing models in terms of mean Recall@50 and mean Recall@100. Furthermore, it can improve the recall of predicate labels that are scarce in the dataset.
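
The idea of an optimal transport loss whose cost reflects label similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the sketch assumes a one-hot ground-truth label (in which case all predicted mass must be transported to that label and the transport cost has a closed form) and a cost of one minus cosine similarity. The predicate names and two-dimensional embeddings are hypothetical toy values, not outputs of any real pre-trained model.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_cost(u, v):
    """Assumed transport cost: 1 - cosine similarity of two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cost_matrix(embeddings):
    """Pairwise transport costs between all predicate labels."""
    return [[cosine_cost(u, v) for v in embeddings] for u in embeddings]


def ot_loss_one_hot(probs, target, cost):
    """Optimal transport cost from `probs` to a one-hot target.

    With a one-hot target the only feasible plan moves every unit of
    predicted mass to the target label, so the cost is a weighted sum.
    """
    return sum(p * cost[i][target] for i, p in enumerate(probs))


# Hypothetical toy embeddings for three predicates (illustration only).
labels = ["on", "sitting on", "holding"]
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
cost = cost_matrix(embeddings)

target = labels.index("on")
near_miss = softmax([0.0, 5.0, 0.0])  # mass mostly on "sitting on"
far_miss = softmax([0.0, 0.0, 5.0])   # mass mostly on "holding"

loss_near = ot_loss_one_hot(near_miss, target, cost)
loss_far = ot_loss_one_hot(far_miss, target, cost)
print(loss_near < loss_far)
```

Under these assumptions, a prediction that confuses the target with a semantically close predicate is penalized less than one that confuses it with a distant predicate, which cross-entropy does not distinguish.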
