JSAI2021

Presentation information

General Session » GS-7 Vision, speech media processing

[4I2-GS-7c] Vision and speech media processing: speech recognition and instruction understanding

Fri. Jun 11, 2021 11:00 AM - 12:40 PM Room I (GS room 4)

Chair: Taiki Miyanishi (Advanced Telecommunications Research Institute International)

12:20 PM - 12:40 PM

[4I2-GS-7c-05] Towards Sub-word Unit Discovery in Zero Resource Scenario

an Approach Based on Graph Neural Networks

〇Shun Takahashi1, Sakriani Sakti1,2, Satoshi Nakamura1,2 (1. Nara Institute of Science and Technology, 2. RIKEN AIP Center)

Keywords:automatic speech recognition, low-resource languages, unsupervised learning, zero-resource, graph neural networks

Zero resource speech technology aims to discover discrete units from a limited amount of unannotated, raw speech data. Previous studies have mainly focused on learning discrete units from acoustic features segmented into fixed, short time frames. While achieving high unit quality, they suffer from a high bitrate due to the frame-level encoding. In this work, in order to lower the bitrate, we propose a novel approach based on a discrete autoencoder and graph convolutional networks. We exploit speech features discretized by vector quantization. Since the maximum number of discretized features is predetermined, we consider a directed graph in which each node represents a discretized acoustic feature and each edge represents a transition from one feature to another. Using graph convolution, we extract the topological structure of the graph and encode it into each node; we then symmetrize the graph and apply spectral clustering to the node features. In terms of ABX error rate and estimated bitrate, we demonstrate that our model successfully decreases the bitrate while retaining unit quality.
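The abstract describes the pipeline only at a high level. As a rough illustration of the idea (build a transition graph over vector-quantized codes, propagate topology into node features with a graph convolution, symmetrize, and spectrally cluster the nodes into sub-word units), a minimal Python sketch follows. It is not the authors' implementation; the codebook size, one-hot node features, single untrained propagation step, RBF affinity, and target unit count are all assumptions made for the example.

```python
# Illustrative sketch only (not the authors' implementation): merging VQ codes
# into coarser sub-word units via a transition graph, one graph-convolution
# step, and spectral clustering. All sizes below are arbitrary assumptions.
import numpy as np
from sklearn.cluster import SpectralClustering

def transition_graph(code_seq, num_codes):
    """Directed adjacency A, where A[i, j] counts transitions code i -> code j."""
    A = np.zeros((num_codes, num_codes))
    for src, dst in zip(code_seq[:-1], code_seq[1:]):
        A[src, dst] += 1.0
    return A

def graph_convolution(A, X):
    """One propagation step H = D^(-1/2) (A + I) D^(-1/2) X, with no learned weights."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return np.diag(d_inv_sqrt) @ A_hat @ np.diag(d_inv_sqrt) @ X

rng = np.random.default_rng(0)
num_codes, num_units = 64, 16                      # assumed codebook / unit counts
code_seq = rng.integers(0, num_codes, size=5000)   # stand-in for VQ-encoded speech frames

A = transition_graph(code_seq, num_codes)
A_sym = 0.5 * (A + A.T)                            # symmetrize the transition graph
X = np.eye(num_codes)                              # one-hot node features
H = graph_convolution(A_sym, X)                    # topology-aware node embeddings

# Spectral clustering of the node embeddings (symmetric RBF affinity).
labels = SpectralClustering(n_clusters=num_units, affinity="rbf",
                            random_state=0).fit_predict(H)

# Relabel every frame's VQ code with its cluster and merge consecutive repeats;
# collapsing runs of identical units is what lowers the bitrate of the encoding.
unit_seq = labels[code_seq]
keep = np.insert(np.diff(unit_seq) != 0, 0, True)
print(len(code_seq), "frames ->", keep.sum(), "units after merging repeats")
```

In this toy version the bitrate reduction comes from mapping many codebook entries to a few unit labels and merging consecutive repeats; the discrete autoencoder that learns the VQ codes themselves is omitted.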
