Keywords: Comic Engineering, Multi-modal analysis, Convolutional AutoEncoder
Understanding human creations such as comics, novels, and music with artificial intelligence (AI) has become an attractive research topic in the AI field. However, creating an interesting story or comic remains a difficult task because it requires a great deal of human creativity. In this study, we investigate whether AI can understand comics, focusing on four-scene comics because they have a clear structure and format. Many studies have applied image models or natural language models to such tasks, but few have combined image and natural language features as multi-modal data. We propose a method that combines image and language features to understand four-scene comics using deep learning. The effectiveness of the proposed method is confirmed by computer simulations on koma (panel) prediction problems for four-scene comics.
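The abstract describes fusing image features and language features into one multi-modal representation per panel. The sketch below is only an illustration of that fusion idea, not the authors' implementation: the image encoder stands in for the learned Convolutional AutoEncoder, the bag-of-words encoder stands in for the language model, and all names, dimensions, and the toy vocabulary are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_image(img):
    # Toy stand-in for the image branch (e.g. a convolutional autoencoder):
    # average-pool 2x2 patches of an 8x8 panel and flatten to a 16-dim vector.
    h, w = img.shape
    pooled = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return pooled.flatten()

def encode_text(tokens, vocab):
    # Toy stand-in for the language branch: bag-of-words counts
    # over a fixed (hypothetical) dialogue vocabulary.
    vec = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] += 1.0
    return vec

def fuse(img_feat, txt_feat):
    # Multi-modal fusion by concatenation of the two feature vectors.
    return np.concatenate([img_feat, txt_feat])

# Hypothetical four-scene comic: 4 panels, each an 8x8 grayscale
# image plus its dialogue tokens.
vocab = {"hello": 0, "why": 1, "oops": 2, "haha": 3}
panels = [(rng.random((8, 8)), ["hello"]),
          (rng.random((8, 8)), ["why"]),
          (rng.random((8, 8)), ["oops"]),
          (rng.random((8, 8)), ["haha"])]

# Each panel becomes one joint feature vector: 16 image dims + 4 text dims.
features = [fuse(encode_image(img), encode_text(toks, vocab))
            for img, toks in panels]
print(features[0].shape)  # → (20,)
```

In a koma-prediction setup, such per-panel joint vectors would then feed a downstream classifier that scores candidate panel orderings; that classifier is omitted here.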