JSAI2023

Presentation information

[1O3-GS-7] Vision, speech media processing

Tue. Jun 6, 2023 1:00 PM - 2:40 PM Room O (E1+E2)

Chair: 田崎 豪 (Meijo University) [Online]

1:00 PM - 1:20 PM

[1O3-GS-7-01] Automatic Production of Audio Descriptions Using Image Recognition for Live Baseball Broadcasts

〇Yuki Shimano1, Yuya Kuwano1, Masaki Takahashi1, Masaru Miyazaki1, Masanori Sano1, Atsushi Imai2, Toru Takagi2 (1. NHK Science & Technology Research Laboratories, 2. NHK Engineering System)

Keywords: Image Recognition, Automatic Production of Audio Descriptions, Media Accessibility

Audio descriptions enable visually impaired audiences to enjoy broadcast programs by providing supplementary information, such as a person's actions and facial expressions, that is difficult for them to grasp from the main audio alone. Although such descriptions would be ideal for live sports broadcasts, producing them for such events entails high costs and requires expert commentary skills. We therefore developed a system that creates audio descriptions of live baseball broadcasts and distributes them to users' smartphones in real time. The audio descriptions are created automatically from the superimposed captions of baseball broadcasts by using image recognition. The experimental results indicate that the proposed method recognizes the information in superimposed captions and robustly produces audio descriptions in real time.
