JSAI2024

Presentation information

General Session

[2G6-GS-6] Language media processing

Wed. May 29, 2024 5:30 PM - 7:10 PM Room G (Room 22+23)

Chair: Ayana Niwa (Recruit/Megagon Labs)

6:30 PM - 6:50 PM

[2G6-GS-6-04] Exploring the Analyst's Perspective: Extracting Insights from Code and Markdown Comments

〇Hisato Kuroiwa¹, Teruaki Hayashi¹ (1. The University of Tokyo)

Keywords: Natural Language Processing, Code mining

Despite the rapid proliferation of big data in recent years, the complexity of data analytics tools is thought to be a major contributor to the shortage of data scientists. This research seeks to understand how expert knowledge in data science is acquired by exploring the relationship between analysts and the Python notebooks they submitted to the Kaggle platform. Specifically, we clustered code titles, counted markdown cells, user-defined functions, and code lines across user tiers, and analyzed word patterns related to Covid-19. Our findings reveal that the thematic focus of analyses on the platform is tier-dependent. However, the analyses also show a blend of disparate methodologies within the same areas, and the markdown cells featured comments in multiple languages, suggesting the need for innovative analytical approaches in future studies.
