Comparison of optimal acoustic features and machine learning in infant speech emotion recognition

Rika Tarumi

4:20 PM - 4:40 PM

[1L4-OS-4a-05] Comparison of optimal acoustic features and machine learning in infant speech emotion recognition

〇Rika Tarumi¹, Takayuki Itoh¹ (1. Ochanomizu University)

Keywords: Speech Emotion Recognition, Infant, Machine Learning, Acoustic Features

Children as young as two years old are exposed to many dangers, so it is risky for parents to take their eyes off them even for a moment. Balancing housework and childcare under these circumstances is extremely difficult. We therefore investigate automatic discrimination of behaviour and emotion from infant voices, with the aim of reducing the burden of childcare by informing parents of the infant's emotions while they do housework. However, there are few studies on recognising the babbling of two-year-olds. It may also be difficult to apply deep learning techniques to this problem, because sufficiently large open datasets of babbling voices are rare. In this report, we discuss the choice of optimal acoustic features and machine learning methods, examine their generalizability through visualisation with t-SNE, and conclude that the best system calculates acoustic features with CNN-2D and then discriminates with an SVM.
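
As a rough illustration of the second stage of such a two-stage system (fixed feature vectors classified by an SVM), the following is a minimal sketch and not the authors' implementation. It assumes a linear SVM trained with Pegasos-style subgradient steps, two classes only, and synthetic Gaussian vectors standing in for CNN-derived acoustic features. The abstract specifies none of these details, and all function names and parameter values here are hypothetical.

```python
import random


def make_synthetic_features(n_per_class, dim, seed=0):
    """Hypothetical stand-in for CNN-derived features: two Gaussian clusters."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            vector = [rng.gauss(centre, 1.0) for _ in range(dim)]
            data.append((vector, label))
    rng.shuffle(data)
    return data


def train_linear_svm(data, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty) via Pegasos-style subgradient steps.

    A constant 1.0 is appended to every vector so the last weight acts as a bias.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            x = x + [1.0]
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink the weights (gradient of the L2 penalty).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Move towards the sample only if it violates the margin.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


# Assumed sizes: 50 training and 20 held-out vectors per class, 8 dimensions.
train = make_synthetic_features(n_per_class=50, dim=8, seed=0)
test = make_synthetic_features(n_per_class=20, dim=8, seed=1)
w = train_linear_svm(train)
accuracy = sum(predict(w, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the feature vectors would come from a trained network rather than a random generator, and a multi-class or kernel SVM from an established library would be the more usual choice. The sketch only shows how a feature extractor and a margin-based classifier can be kept as separate stages.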

Presentation information

[1L4-OS-4a] OS-4
