JSAI2022

Presentation information

General Session

[2O1-GS-7] Vision, speech media processing: generation

Wed. Jun 15, 2022 9:00 AM - 10:40 AM Room O (Room 510)

Chair: Shuhei Kurita (RIKEN) [On-site]

9:00 AM - 9:20 AM

[2O1-GS-7-01] Generating Subgoals with Adversarial Network on Vision-and-Language Navigation

〇Shintaro Ishikawa¹, Komei Sugiura¹ (1. Keio University)

Keywords: Vision-and-Language Navigation, Adversarial Training, Robot, Natural Language Processing, Image Processing

In this paper, we focus on a vision-and-language task in which a robot is instructed to execute household tasks. We propose Moment-based Adversarial Training (MAT), which uses two types of moments for perturbation updates in adversarial training. We apply MAT to the embedding spaces of the instruction, subgoals, and state representations to handle the variety in each. We validated our method on the ALFRED benchmark, and the results showed that it outperformed the baseline method on all of the benchmark's metrics.
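
As an illustration of what "two types of moments for perturbation updates" could look like, the following is a minimal sketch and not the authors' implementation. It assumes the two moments are Adam-style running estimates of the gradient and of the squared gradient, which the abstract does not state. The toy linear model, the L2 projection, and all function names and hyperparameters are hypothetical.

```python
import math

def loss_and_grad(w, emb, delta, target):
    """Squared error of a toy linear model on a perturbed embedding,
    plus the gradient of that loss with respect to the perturbation."""
    pred = sum(wi * (ei + di) for wi, ei, di in zip(w, emb, delta))
    err = pred - target
    return err * err, [2.0 * err * wi for wi in w]

def project_l2(delta, eps):
    """Scale delta back onto the L2 ball of radius eps if it lies outside."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return delta
    return [d * eps / norm for d in delta]

def moment_based_perturbation(w, emb, target, steps=5, lr=0.1, eps=0.5,
                              beta1=0.9, beta2=0.999, tiny=1e-8):
    """Gradient ASCENT on the loss w.r.t. the perturbation, with the step
    scaled by first- and second-moment estimates (assumed Adam-style)."""
    delta = [0.0] * len(emb)
    m = [0.0] * len(emb)  # first moment: running mean of gradients
    v = [0.0] * len(emb)  # second moment: running mean of squared gradients
    for t in range(1, steps + 1):
        _, g = loss_and_grad(w, emb, delta, target)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias correction for the zero-initialised moment estimates.
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        delta = [di + lr * mh / (math.sqrt(vh) + tiny)
                 for di, mh, vh in zip(delta, m_hat, v_hat)]
        delta = project_l2(delta, eps)
    return delta

# Toy usage: the perturbation should raise the loss while staying bounded.
w = [1.0, -2.0, 0.5]
emb = [0.2, 0.1, -0.3]
target = 1.0
clean_loss, _ = loss_and_grad(w, emb, [0.0, 0.0, 0.0], target)
delta = moment_based_perturbation(w, emb, target)
adv_loss, _ = loss_and_grad(w, emb, delta, target)
```

In adversarial training, the model would then be updated on the perturbed embedding `emb + delta`. In the setting described above, a separate perturbation would be kept for each of the instruction, subgoal, and state representation embeddings.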
