Keywords:Natural Language Processing, Image Processing, Object Manipulation, Referring Expression, Robot
Currently, domestic service robots have an insufficient ability to interact naturally through language. This is because understanding human instructions is complicated by a variety of ambiguities and missing information. Existing methods are insufficient to model reference expressions that specify relationships between objects. In this paper, we propose Target-dependent UNITER, which learns directly the relationship between the target object and other objects by focusing on the relevant regions within an image, instead of the whole image. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.
Authentication for paper PDF access
A password is required to view paper PDFs. If you are a registered participant, please log on the site from Participant Log In.
You could view the PDF with entering the PDF viewing password bellow.