JSAI2024

Presentation information

Poster Session

[4Xin2] Poster session 2

Fri. May 31, 2024 12:00 PM - 1:40 PM Room X (Event hall 1)

[4Xin2-27] Image Captioning into Left and Right Positional Relationships

〇Hibiki Moriya1, Junji Yamato1 (1. Kogakuin University)

Keywords: Image Captioning

Image caption generation automatically produces sentences that describe the content of an image, and it is expected to support a detailed understanding of the image. However, generated captions typically do not include the spatial relationships between objects in the image. In this study, we generate captions that include the left-right spatial relationship between two objects (such as people, animals, or vehicles) appearing in an image. Because the training datasets used for image caption generation generally do not contain spatial relationships, we created captions that add spatial relationships to the existing training datasets and used them for training. We trained the vision-and-language model GIT on these captions and then tested caption generation on images featuring two objects. The results confirmed that the generated captions include the left-right spatial relationship of the objects. Using the dataset created for this study increases the amount of information in the captions, which we believe leads to a more detailed understanding of the image.
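The abstract does not say how the left-right captions were constructed. As an illustration only, the sketch below assumes each of the two objects comes with a horizontal bounding-box extent and appends a relation sentence to an existing caption by comparing box centers. The names `Box`, `left_right_relation`, `augment_caption`, and the `min_gap` threshold are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Hypothetical object annotation: a label and its horizontal extent in pixels."""
    label: str
    x_min: float
    x_max: float

    @property
    def center_x(self) -> float:
        return (self.x_min + self.x_max) / 2.0


def left_right_relation(a: Box, b: Box, min_gap: float = 0.0) -> Optional[str]:
    """Return a left-right sentence, or None when the centers are too close to order."""
    if abs(a.center_x - b.center_x) <= min_gap:
        return None  # ambiguous horizontal order (assumed rule, not from the study)
    left, right = (a, b) if a.center_x < b.center_x else (b, a)
    return f"The {left.label} is to the left of the {right.label}."


def augment_caption(caption: str, a: Box, b: Box, min_gap: float = 0.0) -> str:
    """Append the relation sentence to an existing caption when one can be derived."""
    relation = left_right_relation(a, b, min_gap)
    if relation is None:
        return caption
    return f"{caption.rstrip('. ')}. {relation}"


if __name__ == "__main__":
    dog = Box("dog", 10, 110)
    person = Box("person", 200, 300)
    print(augment_caption("A dog and a person in a park.", person, dog))
```

Comparing box centers is only one possible rule; overlapping or vertically stacked objects would need a stricter `min_gap` or a different criterion.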
