Referring Expression Segmentation With Large-Scale Visual Language Model and Diffusion Probabilistic Model in Household Tasks

Yui Iioka

4:30 PM - 4:50 PM

[3G5-OS-24b-04] Referring Expression Segmentation With Large-Scale Visual Language Model and Diffusion Probabilistic Model in Household Tasks

〇Yui Iioka¹, Yu Yoshida¹, Yuiga Wada¹, Syumpei Hatanaka¹, Komei Sugiura¹ (1. Keio University)

Keywords: Referring Expression Segmentation, Diffusion Probabilistic Model, Natural Language Processing, Image Processing, Object Manipulation

In this paper, we propose the Multimodal Diffusion Segmentation Model (MDSM), which generates a mask in a first stage and refines it in a second stage. We introduce a crossmodal parallel feature extraction mechanism and extend diffusion probabilistic models to handle crossmodal features. MDSM outperforms the baseline method by a large margin of +10.13 points in mean IoU.
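The headline result is stated in mean IoU, which the abstract does not define. As a minimal sketch, assuming the usual referring expression segmentation convention (average the per-sample intersection over union of binary predicted and ground-truth masks, then report on a 0 to 100 scale), the metric can be computed as below. The names `mask_iou` and `mean_iou` and the list-of-rows mask representation are hypothetical; this is not the authors' evaluation code.

```python
from typing import List, Sequence

# Hypothetical representation: a binary mask is a list of rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks with identical shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same shape")
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1  # pixel lies in both masks
            if p or g:
                union += 1  # pixel lies in at least one mask
    # Assumption: two empty masks count as a perfect match (conventions differ).
    if union == 0:
        return 1.0
    return inter / union


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Average per-sample IoU over paired predictions and ground-truth masks."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same, non-zero number of predictions and targets")
    return sum(mask_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


if __name__ == "__main__":
    preds = [[[1, 1, 0], [0, 1, 0]], [[1, 1], [1, 1]]]
    gts = [[[1, 0, 0], [0, 1, 1]], [[1, 1], [1, 1]]]
    # Per-sample IoU is 2/4 = 0.5 and 4/4 = 1.0; printed on a 0-100 scale.
    print(f"mIoU = {100 * mean_iou(preds, gts):.2f}")  # mIoU = 75.00
```

Averaging per sample weights every referring expression equally; some works instead report overall IoU (cumulative intersection divided by cumulative union over the whole dataset), which gives more weight to large objects.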

Presentation information

[3G5-OS-24b] Everyday Life Knowledge and AI