JSAI2025


[3Win5] Poster session 3

Thu. May 29, 2025 3:30 PM - 5:30 PM Room W (Event hall D-E)

[3Win5-38] Word-Level Sign Language Recognition with Video Vision Transformer using Transfer Learning

〇Kei Ito1,2, Yimeng Sun1, Takao Nakaguchi1, Masaharu Imai1 (1.The Kyoto College of Graduate Studies for Informatics, 2.Panasonic Information Systems Co., Ltd.)

Keywords: Sign Language, Machine Translation, Transfer Learning, Video Vision Transformer, ViViT

Real-time communication between individuals with hearing impairments and hearing individuals who have not mastered sign language remains challenging. Machine translation of sign language is therefore essential for promoting the social inclusion of people with hearing impairments. Since the introduction of Convolutional Neural Networks (CNNs), the accuracy of sign language translation has improved significantly, and alternative approaches based on Transformer models are now being explored. The Video Vision Transformer (ViViT), an extension of the Transformer designed for video recognition, accepts video data directly; however, prior work has required preprocessing of the input to achieve high accuracy. In this study, we fine-tuned a ViViT pretrained on the Kinetics-400 video dataset and evaluated it on word-level sign language recognition using two widely used sign language datasets, LSA64 and WLASL100. As a result, we achieved accuracy comparable to previous studies without any preprocessing of the input data.
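The transfer-learning recipe described in the abstract — take a video transformer pretrained on Kinetics-400, replace its 400-class head with one sized for the sign-language vocabulary, then fine-tune — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the tiny model, its dimensions, and the helper `prepare_for_finetuning` are all hypothetical stand-ins chosen so the example runs quickly.

```python
import torch
import torch.nn as nn

class TinyViViT(nn.Module):
    """Minimal ViViT-style model: tubelet embedding + Transformer encoder
    + classification head. All sizes are illustrative, not the paper's."""
    def __init__(self, num_classes, frames=8, size=32, tubelet=(2, 8, 8), dim=64):
        super().__init__()
        # Tubelet embedding: a 3D convolution turns the video into tokens.
        self.embed = nn.Conv3d(3, dim, kernel_size=tubelet, stride=tubelet)
        n_tokens = (frames // tubelet[0]) * (size // tubelet[1]) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_tokens + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):                              # video: (B, 3, T, H, W)
        x = self.embed(video).flatten(2).transpose(1, 2)   # (B, tokens, dim)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1) + self.pos
        x = self.encoder(x)
        return self.head(x[:, 0])                          # classify from [CLS]

def prepare_for_finetuning(model, num_classes):
    """Transfer-learning step: keep the (hypothetically pretrained) backbone
    weights and swap in a fresh head for the target vocabulary."""
    model.head = nn.Linear(model.head.in_features, num_classes)
    return model

model = TinyViViT(num_classes=400)            # stand-in for Kinetics-400 pretraining
model = prepare_for_finetuning(model, 100)    # e.g. WLASL100: 100 sign glosses
logits = model(torch.randn(2, 3, 8, 32, 32))  # two dummy 8-frame clips
print(logits.shape)                           # torch.Size([2, 100])
```

A key point of the abstract is that raw clips go in directly: the tubelet embedding tokenizes the video itself, so no hand-crafted preprocessing (pose extraction, cropping, optical flow) is needed before the model.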
