JSAI2023

Presentation information

Organized Session

[3G5-OS-24b] Everyday Life Knowledge and AI

Thu. Jun 8, 2023 3:30 PM - 5:10 PM Room G (A4)

Organizers: 福田 賢一郎, 江上 周作, 宮田 なつき, Qiu Yue, 鵜飼 孝典, 古崎 晃司, 川村 隆浩, 市瀬 龍太郎, 岡田 慧

4:30 PM - 4:50 PM

[3G5-OS-24b-04] Referring Expression Segmentation With Large-Scale Visual Language Model and Diffusion Probabilistic Model in Household Tasks

〇Yui Iioka1, Yu Yoshida1, Yuiga Wada1, Syumpei Hatanaka1, Komei Sugiura1 (1. Keio University)

Keywords: Referring Expression Segmentation, Diffusion Probabilistic Model, Natural Language Processing, Image Processing, Object Manipulation

In this paper, we propose the Multimodal Diffusion Segmentation Model (MDSM), which generates a mask in the first stage and refines it in the second stage. We introduce a crossmodal parallel feature extraction mechanism and extend diffusion probabilistic models to handle crossmodal features. The proposed MDSM outperforms the baseline method by a large margin of +10.13 in mean IoU.
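
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean IoU (intersection over union) can be computed for binary segmentation masks. It assumes per-sample IoU averaged over samples, and it treats two empty masks as a perfect match. The abstract does not state which averaging convention the authors use, so both choices are assumptions, and the function names `iou` and `mean_iou` are hypothetical.

```python
def iou(pred, target):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                inter += 1  # pixel is set in both masks
            if p or t:
                union += 1  # pixel is set in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds, targets):
    """Average of per-sample IoU scores (one assumed averaging convention)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2x2 masks, not data from the paper.
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 1]]]
    targets = [[[1, 0], [0, 0]], [[1, 0], [0, 1]]]
    print(mean_iou(preds, targets))
```

Under this convention, a predicted mask that covers the target plus an equal number of extra pixels scores 0.5 for that sample.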
