Learning Beyond 2D Images

Winston Hsu

11:00 AM - 12:10 PM

[3A2-PS-3-01] Learning Beyond 2D Images

Winston Hsu¹ (1. National Taiwan University)

Current 2D convolutional networks have shown super-human capabilities on images, in both discriminative and generative models. In this talk, we present our recent work on visual cognitive computing beyond 2D images. We will first demonstrate the large opportunities that arise from augmenting learning with temporal cues, 3D (point cloud) data, raw data, audio, etc. in emerging domains such as entertainment, security, healthcare, and manufacturing. In an explainable manner, we will justify how to design neural networks that leverage these novel (and diverse) modalities, and we will demystify the pros and cons of these signals. We will showcase a few tangible applications, including video QA, robotic object referring, situation understanding, and autonomous driving. We will also review the lessons we learned in designing advanced neural networks that accommodate multimodal signals in an end-to-end manner.

Presentation information

[3A2-PS-3] Learning Beyond 2D Images
