2:20 PM - 2:40 PM
[1B3-GS-2-05] A method for improving the accuracy of multi-domain adaptive vision language model using prompt learning
Keywords: Prompt learning, Domain adaptation, Vision language model
Methods for analyzing image data associated with linguistic information have attracted recent attention, but they face challenges when the amount of available data varies across image domains. LADS was proposed in response: it can be trained without image data from domains with few samples by exploiting the shared embedding space between images and text in vision language models. However, LADS typically relies on simple domain description text, and better-suited text can improve model performance. To address this, we apply CoOp, a prompt-learning method that optimizes the domain text in CLIP to enhance accuracy. Rather than hand-crafting the text, CoOp learns the prompt tokens themselves, improving vision language model performance and raising CLIP's classification accuracy. We expect the learned prompts to represent the diverse domains within LADS effectively. Finally, we validate the proposed method on real data, demonstrating its ability to handle imbalanced data quantities across image domains.
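The core idea the abstract describes, replacing a hand-written domain description with learnable prompt tokens in front of a frozen CLIP-style text encoder, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the module and variable names (`PromptLearner`, `n_ctx`, `ctx_dim`), the toy pooling step standing in for CLIP's text encoder, and the random features standing in for encoder outputs are all assumptions for the sake of a self-contained example.

```python
# Sketch of CoOp-style prompt learning: shared learnable context vectors
# are prepended to fixed class/domain token embeddings, and only those
# context vectors would be trained while the encoders stay frozen.
import torch
import torch.nn as nn


class PromptLearner(nn.Module):
    """Learnable context tokens prepended to frozen class-name embeddings."""

    def __init__(self, n_classes: int, n_ctx: int = 4, ctx_dim: int = 32):
        super().__init__()
        # The learnable "prompt" shared across classes (what CoOp optimizes).
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Stand-in for frozen class-name token embeddings (not trained).
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, ctx_dim))

    def forward(self) -> torch.Tensor:
        n_classes = self.cls_emb.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # Each class prompt = [learned context tokens] + [class token].
        return torch.cat([ctx, self.cls_emb], dim=1)  # (n_classes, n_ctx+1, ctx_dim)


def clip_style_logits(image_feat: torch.Tensor,
                      text_feat: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Cosine-similarity logits between image features and per-class text features."""
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    return image_feat @ text_feat.t() / temperature


# Toy usage: mean-pool the prompt tokens as a crude stand-in for CLIP's
# frozen text encoder; image features are random stand-ins as well.
learner = PromptLearner(n_classes=3)
prompts = learner()                                 # (3, 5, 32)
text_feat = prompts.mean(dim=1)                     # (3, 32)
image_feat = torch.randn(8, 32)                     # pretend image-encoder outputs
logits = clip_style_logits(image_feat, text_feat)   # (8, 3)
```

In real use, a cross-entropy loss on these logits would backpropagate only into `learner.ctx`, which is why the approach needs no extra image data from under-represented domains: only the short prompt is tuned.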