A Study on Deep Learning Models Suitable for Noisy Environment in Speech Source Discrimination Method

野中公貴

[A-5-3] A Study on Deep Learning Models Suitable for Noisy Environment in Speech Source Discrimination Method

○野中公貴¹, 山田宏樹², 吉田孝博³ (1. Tokyo University of Science, 2. Tokyo University of Agriculture and Technology, 3. Tokyo University of Science)

This presentation is eligible for consideration for the Society's Academic Encouragement Award.

Keywords: voice-interactive UI, speech source discrimination, dilated convolution

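When several devices equipped with voice-interactive UIs are present around a user, each device can operate properly only if it can discriminate between speech uttered directly by a person and speech played back by another device. A speech source discrimination method was proposed in prior work for this purpose, but learning models suited to discrimination in noisy environments had not been examined. In this study, to identify learning models suited to discrimination under noise, we compared the discrimination accuracy of three types of deep learning models. A CNN with five convolutional layers using a dilated convolution structure improved accuracy by up to 6.5 points over the conventional three-layer CNN and achieved a discrimination accuracy of 90.0% even at SNR = -10 dB.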
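
The two technical ingredients named in the abstract, dilated convolution and evaluation at a fixed SNR, can be illustrated with a minimal Python sketch. This is not the authors' model or evaluation code: the function names, the kernel size of 3, and the dilation schedules below are hypothetical choices made only for illustration, since the abstract does not state them.

```python
import math


def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution (cross-correlation, as in CNN layers)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * x[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(x) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add it."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)), solved for gain
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Hypothetical settings: kernel size 3; dilation rates are assumptions.
print(receptive_field(3, [1, 1, 1]))         # 3 plain layers  -> 7
print(receptive_field(3, [1, 2, 4, 8, 16]))  # 5 dilated layers -> 63
print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2))  # [-4, -4, -4]
```

Under these assumed settings, widening the dilation at each layer lets five layers cover 63 input samples where three undilated layers cover 7, without adding weights per layer. `mix_at_snr` shows how a test condition such as SNR = -10 dB is typically constructed: the noise is scaled until its power is ten times that of the speech.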

Presentation information

[A-5] Applied Acoustics
