JSAI2024

Presentation information

General Session

General Session » GS-7 Vision, speech media processing

[2C6-GS-7] Language media processing

Wed. May 29, 2024 5:30 PM - 7:10 PM Room C (Temporary room 1)

Chair: Naoyuki Terashita (Hitachi, Ltd.)

6:10 PM - 6:30 PM

[2C6-GS-7-03] Action Recognition of Public Spaces Using Multi-Modal Model

〇Masahiro Okano1, Ryuto Yoshida1, Junichiro Fujii1, Shuji Takamori1, Masazumi Amakata1 (1. Yachiyo Engineering Co., Ltd.)

Keywords: Multi-Modal Model, VQA, Action Recognition

In promoting smart cities, there is demand for evaluating both the quantity and the quality of activities in public spaces. Research on labor-saving AI for assessing the quantity of activities is progressing, but research on labor-saving quality assessment has only just begun. Previous AI models for labor-saving qualitative evaluation of public spaces faced issues such as 1) high model creation costs and 2) low model versatility, and therefore did not achieve sufficient labor savings. To address this problem, this study proposes a method for recognizing actions in public spaces using a multimodal model. A multimodal model integrates multiple data sources, and offers strengths such as 1) zero model creation cost and 2) high model versatility. By quantitatively evaluating the performance of a multimodal model for qualitative evaluation on small-scale video data, this study demonstrates the potential of multimodal models for labor saving in public space evaluation.
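As a rough illustration of the approach the abstract describes, the sketch below applies an off-the-shelf visual question answering model to frames sampled from public-space video, with no task-specific training (hence "zero model creation cost"). The specific model (Salesforce/blip-vqa-base via Hugging Face transformers), the question wording, and the frames/ directory are assumptions for illustration only; the abstract does not name the multimodal model the authors used.

```python
# Minimal sketch: zero-shot, VQA-based action recognition on video frames.
# The model choice and question phrasing are assumptions, not the paper's setup.
from pathlib import Path

from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

MODEL_ID = "Salesforce/blip-vqa-base"  # assumed off-the-shelf VQA model
processor = BlipProcessor.from_pretrained(MODEL_ID)
model = BlipForQuestionAnswering.from_pretrained(MODEL_ID)

def recognize_action(frame_path: str,
                     question: str = "What activity are the people doing?") -> str:
    """Ask the VQA model a free-form question about one video frame."""
    image = Image.open(frame_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20)
    return processor.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical usage: label actions in frames pre-extracted from video.
for frame in sorted(Path("frames").glob("*.jpg")):
    print(frame.name, "->", recognize_action(str(frame)))
```

Because the model is used as-is, evaluating it then reduces to comparing its free-form answers against human quality annotations on a small video dataset, which matches the evaluation strategy the abstract outlines.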
