JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models

Hiroshi Sasaki

[2Win5-87] JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models

〇Hiroshi Sasaki¹ (1. The Japan Research Institute, Limited)

Keywords: Dataset, LLM, Generative AI, Multimodal

Vision-and-language models (VLMs) are expected to analyse human-written documents in a question-answering (QA) style. Such VLMs must be able to interpret flowchart images in documents, which convey insights that text-based explanations do not. Building VLMs that understand flowcharts accurately requires a large number of flowchart images with corresponding text data for training and evaluation, but preparing such datasets is time-consuming. To address this, we create a synthesised flowchart visual QA dataset using large language models. Our dataset consists of descriptions of business job tasks, flowcharts of those tasks written as domain-specific language (DSL) code, QA data related to the flowcharts, and the flowchart images rendered from the DSL code. We introduce the dataset and its synthesis procedure, and show that fine-tuning VLMs on the dataset improves their performance on a flowchart QA task.
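
As an illustration of the dataset composition described above, a single record might be modelled roughly as follows. This is only a sketch: the class and field names, the Mermaid-style DSL, the file path and the sample content are all assumptions made for illustration, since the abstract does not specify the actual schema or DSL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAPair:
    """One question-answer pair about a flowchart (hypothetical structure)."""
    question: str
    answer: str


@dataclass
class FlowchartRecord:
    """One dataset record (hypothetical field names, not the actual schema)."""
    task_description: str  # description of the business job task
    dsl_code: str          # flowchart as DSL code (Mermaid-style syntax assumed)
    image_path: str        # flowchart image rendered from dsl_code
    qa_pairs: List[QAPair] = field(default_factory=list)


def to_training_examples(record: FlowchartRecord) -> List[Dict[str, str]]:
    """Flatten a record into one (image, question, answer) example per QA pair."""
    return [
        {"image": record.image_path, "question": qa.question, "answer": qa.answer}
        for qa in record.qa_pairs
    ]


# Invented sample content, for illustration only.
record = FlowchartRecord(
    task_description="Expense claim approval",
    dsl_code=(
        "flowchart TD\n"
        "    A[Submit claim] --> B{Amount over limit?}\n"
        "    B -- Yes --> C[Manager approval]\n"
        "    B -- No --> D[Automatic approval]\n"
    ),
    image_path="images/expense_claim.png",
    qa_pairs=[
        QAPair(
            question="What happens if the amount is over the limit?",
            answer="Manager approval",
        ),
    ],
)

examples = to_training_examples(record)
```

Flattening each record into per-question examples is one plausible way to prepare image-question-answer triples for fine-tuning; the procedure actually used for the dataset may differ.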

Presentation information

[2Win5] Poster session 2
