JSAI2024

Presentation information

General Session » GS-7 Vision, speech media processing

[4I1-GS-7] Language media processing

Fri. May 31, 2024 9:00 AM - 10:40 AM Room I (Room 41)

Chair: 石川 開 (NEC Corporation) [Online]

9:00 AM - 9:20 AM

[4I1-GS-7-01] Comparison of monocular relative depth judgements of humans and DNN models based on accuracy and error consistency

〇Yuki Kubota1, Taiki Fukiage1 (1. NTT Communication Science Laboratory)

Keywords: monocular depth estimation, error consistency, depth perception

Monocular depth estimation has improved markedly in accuracy alongside advances in deep learning. Although deep models are often evaluated by how well they align with human perception, depth estimation models have seldom been subjected to such comparisons. In this paper, we compare human and model judgements in monocular relative depth estimation in terms of accuracy and error consistency. We find that 27 of 34 models achieve higher accuracy (closer agreement with the ground truth) than humans (0.708, 95% CI: [0.702, 0.713]). However, for every model, error consistency with humans was lower than the error consistency among humans (0.447, 95% CI: [0.427, 0.465]). The results suggest that strategies for improving error consistency with human judgements include training on multiple datasets and avoiding direct training on a dataset that is i.i.d. with the test images.
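The abstract does not define error consistency. A minimal sketch follows, assuming the widely used definition: Cohen's kappa computed over trial-by-trial correctness of two observers, so that agreement expected from accuracy alone is factored out. The function name `error_consistency` and the toy response vectors are hypothetical and are not taken from the paper.

```python
import math


def error_consistency(correct_a, correct_b):
    """Cohen's kappa over per-trial correctness of two observers.

    correct_a, correct_b: equal-length sequences of booleans, one entry per
    trial, True when that observer's relative depth judgement matched the
    ground truth. (Assumed definition; the abstract does not spell it out.)
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct_a)
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n

    # Observed agreement: both correct or both wrong on the same trial.
    observed = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Agreement expected if the two observers erred independently.
    expected = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if expected == 1.0:
        return math.nan  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Toy data: two observers with identical accuracy (0.5) but different errors.
human = [True, True, False, False]
model_same_errors = [True, True, False, False]
model_unrelated_errors = [True, False, True, False]

kappa_same = error_consistency(human, model_same_errors)
kappa_unrelated = error_consistency(human, model_unrelated_errors)
```

Under this definition, two observers can share the same accuracy and still have a kappa near zero, which is why accuracy and error consistency are reported separately.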
