JSAI2024

Presentation information

General Session

General Session » GS-7 Vision, speech media processing

[2C6-GS-7] Language media processing:

Wed. May 29, 2024 5:30 PM - 7:10 PM Room C (Temporary room 1)

Chair: Naoyuki Terashita (Hitachi, Ltd.)

6:30 PM - 6:50 PM

[2C6-GS-7-04] Person-ReID: What is the Deep Learning Model looking at?

〇Dung Anh Dau1, Yasuhiro Nakamura1, Hiroshi Satou1 (1. National Defense Academy of Japan)

Keywords:Person Re-Identification, Grad-CAM, Vision Transformer, CNN

Person re-identification (Re-ID) is a crucial component of automatic visual surveillance systems, aiming to automatically identify and locate individuals across a multi-camera network. Because the appearance of pedestrians varies significantly between cameras, this task is challenging; although a number of models have been proposed that achieve high accuracy on existing benchmark datasets, they are still far from applicable to real-world environments. Addressing this issue requires insight into the black-box behavior of deep-learning models. In this study, we trained CNN and Vision Transformer models on the DukeMTMC-ReID dataset and performed cross-domain evaluation of the trained models on the Market1501 and CUHK03 datasets. The results revealed that the Vision Transformer outperformed the CNN in accuracy. To examine the stability of the Vision Transformer model, we employed Grad-CAM for visualization. The visualization confirmed the superior stability of the Vision Transformer, which focused on specific features of the person and their interrelationships while avoiding distraction from the background.
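The Grad-CAM visualization mentioned in the abstract weights each activation channel by the spatially averaged gradient of the target score, then keeps only the positive contributions. A minimal NumPy sketch of that computation (the activations and gradients below are random placeholders standing in for the outputs of a trained Re-ID model, not actual results):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one layer's tensors.

    activations: (C, H, W) feature maps from the chosen layer
    gradients:   (C, H, W) gradient of the target score w.r.t. those maps
    """
    # Channel weights: global-average-pool the gradients (alpha_c)
    weights = gradients.mean(axis=(1, 2))                    # shape (C,)
    # Weighted sum over channels, then ReLU to keep positive influence
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] for overlaying on the input image
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Placeholder tensors with a plausible last-conv-block shape
rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the pedestrian image, which is how one can judge whether the model attends to the person or to the background.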
