JSAI2025

Presentation information

Poster Session

[1Win4] Poster session 1

Tue. May 27, 2025 3:30 PM - 5:30 PM Room W (Event hall D-E)

[1Win4-67] Event-Driven GPUDirect Inference for Reducing Overhead in Inference Serving

〇Kenji Tanaka1, Kento Kitamura1, Kazunori Seno1 (1. NTT IOWN Innovation Center)

Keywords: Inference Serving System, GPUDirect RDMA, DOCA

In this study, we developed a novel event-driven streaming GPU computing system that integrates DOCA GPUNetIO and CUDA Graph to support an AI-driven cyber-physical system operating on NTT’s next-generation data center infrastructure (IOWN). The goal is to enable concurrent execution of multiple models while minimizing latency overhead and GPU power consumption. Compared to existing methods, the proposed approach reduces inference overhead by 20% and increases throughput by 173.2%. Furthermore, by employing event-driven inference, our system can process inference requests for up to five models simultaneously without resource contention.
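The abstract pairs DOCA GPUNetIO's event-driven receive path with CUDA Graph replay to cut per-request launch overhead. The sketch below illustrates only the CUDA Graph side and is not the authors' implementation: a fixed kernel sequence is captured once and then replayed for each incoming request. The preprocess/infer kernels and the wait_for_request() stub are hypothetical stand-ins; in the actual system the NIC would deliver requests to GPU memory via DOCA GPUNetIO rather than a host-side polling stub. Assumes a CUDA 12 toolkit.

```cuda
// Minimal sketch (assumptions noted above): capture an inference-style kernel
// sequence into a CUDA Graph once, then replay it per request.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void preprocess(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 0.5f;            // stand-in for input normalization
}

__global__ void infer(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];    // stand-in for a model forward pass
}

// Hypothetical event source standing in for a DOCA GPUNetIO receive queue;
// here it simply reports a single pending request.
static bool wait_for_request(int iteration) { return iteration < 1; }

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture the fixed kernel sequence once; replaying the instantiated
    // graph avoids rebuilding the launch sequence on every request.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    preprocess<<<(n + 255) / 256, 256, 0, stream>>>(in, n);
    infer<<<(n + 255) / 256, 256, 0, stream>>>(in, out, n);
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graphExec, graph, 0);   // CUDA 12 signature

    // Event-driven serving loop: launch the pre-instantiated graph whenever
    // a request arrives.
    for (int it = 0; wait_for_request(it); ++it) {
        cudaGraphLaunch(graphExec, stream);
        cudaStreamSynchronize(stream);
    }

    printf("served requests via CUDA Graph replay\n");
    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```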
