Presentation information

Interactive Session


[2Xin5] Interactive 1

Wed. Jun 9, 2021 5:20 PM - 7:00 PM Room X (Poster room 1)

[2Xin5-22] Recovering Spectrograms using Contextual Attention

〇Shunsuke HABARA1, Yoshiaki KUROSAWA1, Kazuya MERA1, Toshiyuki TAKEZAWA1 (1.Graduate School of Information Sciences, Hiroshima City University)

Keywords: Voice, Deep Learning, Inpainting

There is a growing trend toward technologies that use deep neural networks to improve sound quality, such as signal denoising and systems that convert voice quality in real time for online conferences. In the field of computer vision, inpainting techniques based on deep neural networks have also been developed in recent years. In this paper, we focus on an inpainting technique with contextual attention to recover spectrograms. We apply a mask along the time axis of the spectrogram and examine whether the masked region can be recovered from the unmasked area. We propose a method that improves the accuracy of speech restoration by adding a gradient along the frequency axis of the spectrogram. As a result, our proposed method improved one of the sound quality metrics, Mel-Cepstral Distortion. We also confirmed that the attention map showed improved attention in the frequency direction.
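The abstract does not give implementation details, but the preprocessing it describes — masking a band of time frames and supplying a frequency-direction gradient as an extra input channel — can be sketched as follows. This is a minimal illustration with NumPy; the function names, array shapes, and the linear form of the gradient are assumptions, not the authors' actual code. A simple Mel-Cepstral Distortion function (the metric named in the abstract, in its common dB formulation excluding the 0th coefficient) is included for reference.

```python
import numpy as np

def mask_time(spec, start, width):
    # Zero out a band of time frames, simulating the region to be inpainted.
    # spec: (n_freq, n_time) magnitude spectrogram (shape is an assumption).
    masked = spec.copy()
    masked[:, start:start + width] = 0.0
    return masked

def add_frequency_gradient(spec):
    # Stack a channel that increases linearly along the frequency axis,
    # giving the network an explicit positional cue in frequency.
    # (The linear 0..1 ramp is a hypothetical choice for illustration.)
    n_freq, n_time = spec.shape
    grad = np.linspace(0.0, 1.0, n_freq)[:, None] * np.ones((1, n_time))
    return np.stack([spec, grad], axis=0)  # (2, n_freq, n_time)

def mel_cepstral_distortion(mcep_ref, mcep_est):
    # MCD in dB between time-aligned mel-cepstral sequences (frames x dims),
    # excluding the 0th (energy) coefficient, per the common convention.
    diff = mcep_ref[:, 1:] - mcep_est[:, 1:]
    dist = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return (10.0 / np.log(10.0)) * np.mean(dist)

# Toy example: 80 mel bins x 100 frames.
spec = np.random.rand(80, 100).astype(np.float32)
masked = mask_time(spec, start=40, width=20)
features = add_frequency_gradient(masked)
print(features.shape)  # (2, 80, 100)
```

In an inpainting setup the two-channel `features` array would be fed to the network, which is trained to reconstruct the original `spec` inside the masked band; MCD is then computed between cepstra derived from the reference and the reconstruction.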

Authentication for paper PDF access

A password is required to view paper PDFs. If you are a registered participant, please log in to the site from Participant Log In.
You can view the PDF by entering the PDF viewing password below.