JSAI2022

[4C1-GS-7] Vision, speech media processing

Fri. Jun 17, 2022 10:00 AM - 11:40 AM Room C (Room C-2)

Chair: 籾山 悟至 (NEC) [on-site]

10:40 AM - 11:00 AM

[4C1-GS-7-03] Sign Language Recognition with 3D CNN Transformer

Ryota Takahashi¹ (presenter), Hiroaki Saito¹ (¹Faculty of Science and Technology, Keio University)

Keywords: Sign Language Recognition, Image Sequence Processing, Deep Learning, Transformer, Gesture Recognition

In this paper, we propose a network that combines a 3D CNN and a Transformer for isolated sign language recognition. In the sign language recognition field, LSTM-based modules have been used for sequence modeling; our aim is to improve recognition accuracy by replacing them with self-attention modules such as the Transformer. The proposed network is evaluated comprehensively on LSA64 and on our own dataset, which covers graded degrees of speaker dependency and varied recording conditions. Our dataset consists of three patterns: signers wearing colored gloves, barehanded signers, and signers wearing a mask. Experimental results on LSA64 demonstrate the effectiveness of the proposed method. Experimental results on our own dataset show that differences in shooting conditions, such as the background, have a significant impact on recognition accuracy. We consider that a more robust model will require a large dataset covering a variety of shooting conditions, as well as input formats other than RGB.
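The abstract does not give layer-level details, so the following is only a minimal, untrained, pure-Python sketch of the general 3D CNN + self-attention data flow, not the authors' network. It assumes single-channel (grayscale) frames, a few hypothetical 3×3×3 kernels followed by ReLU and spatial averaging, one single-head self-attention layer with a residual connection, mean pooling over time, and a linear classifier with 64 outputs (LSA64 contains 64 sign classes). All function names, sizes, and random weights are illustrative assumptions. The point it shows: the 3D CNN turns the frame sequence into one feature vector per time step, and self-attention lets every time step attend to every other step directly, instead of passing information step by step as an LSTM does.

```python
import math
import random


def conv3d_valid(video, kernel):
    """Single-channel 3D cross-correlation (as in DL libraries), stride 1, no padding."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def cnn3d_features(video, kernels):
    """Per-time-step feature vectors: ReLU + spatial average, one value per kernel."""
    maps = [conv3d_valid(video, k) for k in kernels]
    feats = []
    for t in range(len(maps[0])):
        vec = []
        for m in maps:
            vals = [max(0.0, v) for row in m[t] for v in row]
            vec.append(sum(vals) / len(vals))
        feats.append(vec)
    return feats  # shape: (T', C)


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention plus a residual connection."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
               for qi in q]  # (T', T'): every step attends to every step
    attended = matmul(weights, v)
    out = [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, attended)]
    return out, weights


def init_params(channels=4, num_classes=64, kshape=(3, 3, 3), seed=0):
    """Hypothetical random (untrained) parameters, for shape illustration only."""
    rng = random.Random(seed)

    def rand_matrix(r, c):
        return [[rng.gauss(0.0, 0.5) for _ in range(c)] for _ in range(r)]

    kt, kh, kw = kshape
    kernels = [[[[rng.gauss(0.0, 0.5) for _ in range(kw)] for _ in range(kh)]
                for _ in range(kt)] for _ in range(channels)]
    return {"kernels": kernels, "wq": rand_matrix(channels, channels),
            "wk": rand_matrix(channels, channels), "wv": rand_matrix(channels, channels),
            "w_out": rand_matrix(num_classes, channels)}


def classify(video, params):
    """3D CNN features -> self-attention over time -> mean pool -> class probabilities."""
    feats = cnn3d_features(video, params["kernels"])
    h, weights = self_attention(feats, params["wq"], params["wk"], params["wv"])
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logits = [sum(p * w for p, w in zip(pooled, wc)) for wc in params["w_out"]]
    return softmax(logits), weights


if __name__ == "__main__":
    rng = random.Random(1)
    video = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(10)]
    probs, attn = classify(video, init_params())
    print(len(probs), len(attn))  # 64 8
```

With 10 input frames and a temporal kernel size of 3, the valid convolution yields 8 time steps, so the attention matrix is 8×8 and the classifier returns 64 probabilities. A real system would of course use a deep-learning framework, multi-channel RGB input, stacked layers, and trained weights.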
