4:30 PM - 4:50 PM
[3G5-OS-24b-04] Referring Expression Segmentation With Large-Scale Visual Language Model and Diffusion Probabilistic Model in Household Tasks
Keywords: Referring Expression Segmentation, Diffusion Probabilistic Model, Natural Language Processing, Image Processing, Object Manipulation
In this paper, we propose the Multimodal Diffusion Segmentation Model (MDSM), which generates a mask in the first stage and refines it in the second stage. We introduce a crossmodal parallel feature extraction mechanism and extend diffusion probabilistic models to handle crossmodal features. MDSM outperforms the baseline method by a large margin of +10.13 mean IoU.
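For readers unfamiliar with the metric, mean IoU in segmentation tasks is commonly computed as the average of per-sample intersection over union between predicted and ground-truth masks. The following is a minimal sketch under that assumption. The abstract does not state the exact averaging protocol, and the names `iou` and `mean_iou` are illustrative, not taken from the paper.

```python
from typing import List, Sequence

# A binary mask as rows of 0/1 values (illustrative representation).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Average of per-sample IoU values (assumed averaging protocol)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        raise ValueError("at least one sample is required")
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

Under this definition, a gain of +10.13 mean IoU would correspond to an increase of 10.13 percentage points in the averaged score when it is reported on a 0 to 100 scale.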