6:30 PM - 6:50 PM
[2G6-GS-6-04] Exploring the Analyst's Perspective: Extracting Insights from Code and Markdown Comments
Keywords: Natural Language Processing, Code mining
Despite the rapid proliferation of big data in recent years, the complexity of data analytics tools is thought to be a major contributor to the shortage of data scientists. This research seeks to understand how expert knowledge in data science is acquired by exploring the relationship between analysts and the Python notebooks they submit to the Kaggle platform. Specifically, we clustered code titles, counted markdown cells, user-defined functions, and lines of code across user tiers, and analyzed word patterns related to Covid-19. Our findings reveal that the thematic focus of analyses on the platform is tier-dependent. However, the analyses also show a blend of disparate methodologies within the same areas, and the markdown cells contained comments in multiple languages, suggesting the need for innovative analytical approaches in future studies.
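The per-notebook counts mentioned in the abstract (markdown cells, user-defined functions, and code lines) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `notebook_stats`, the choice to count only non-blank code lines, and the decision to skip function counting for cells that are not valid Python are all assumptions made for illustration. The sketch relies only on the standard Jupyter notebook JSON layout (`cells`, `cell_type`, `source`).

```python
import ast


def notebook_stats(nb: dict) -> dict:
    """Count markdown cells, function definitions, and non-blank code lines.

    `nb` is a notebook already loaded from .ipynb JSON (e.g. via json.load).
    """
    stats = {"markdown_cells": 0, "functions": 0, "code_lines": 0}
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)

        if cell.get("cell_type") == "markdown":
            stats["markdown_cells"] += 1
        elif cell.get("cell_type") == "code":
            # Assumption: blank lines are not counted as code lines.
            stats["code_lines"] += sum(1 for ln in source.splitlines() if ln.strip())
            try:
                tree = ast.parse(source)
            except SyntaxError:
                # Cells with IPython magics (%, !) are not valid Python;
                # their functions are left uncounted in this sketch.
                continue
            stats["functions"] += sum(
                isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                for node in ast.walk(tree)
            )
    return stats


# Hypothetical in-memory notebook used only to exercise the function.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Intro"]},
        {
            "cell_type": "code",
            "source": [
                "import pandas as pd\n",
                "\n",
                "def clean(df):\n",
                "    return df.dropna()\n",
            ],
        },
        {"cell_type": "code", "source": "%matplotlib inline\nx = 1"},
    ]
}
```

Calling `notebook_stats(sample)` yields one dictionary per notebook; aggregating such dictionaries by the author's Kaggle tier would give tier-level counts of the kind the abstract describes, though how tiers were joined to notebooks in the study is not stated here.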