[4-B-1-04] Reflections on NDB Data Analysis
From February to August 2018, we analyzed the National Database (NDB) of claims at the on-site visit center in Tokyo. The analysis assessed timely access to new drugs at the prefectural level, focusing on chemotherapy and diabetes drugs newly approved between 2011 and 2015. Based on this experience, I would like to offer three considerations.
First, the database's complex structure means that understanding the data comprehensively takes time. For example, one might encounter a DPC record that has no counterpart in the medical receipt data; analyzing the data without accounting for such structural inconsistencies may lead to misinterpreted results. Targeting the appropriate records and specifying precise extraction conditions are essential parts of NDB analysis.
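As an illustration of the kind of consistency check described above, the following minimal sketch lists DPC records whose identifier has no match among medical receipt records. It does not reflect the actual NDB schema: the field names (`receipt_id`, `prefecture`), the function name `find_unmatched_dpc`, and the sample values are all hypothetical.

```python
def find_unmatched_dpc(dpc_records, medical_records, key="receipt_id"):
    """Return the DPC records whose key has no counterpart in the medical receipt records."""
    medical_keys = {record[key] for record in medical_records}
    return [record for record in dpc_records if record[key] not in medical_keys]


# Hypothetical toy data; real NDB extracts use different fields and are far larger.
dpc_records = [
    {"receipt_id": "R001", "prefecture": "Tokyo"},
    {"receipt_id": "R002", "prefecture": "Osaka"},
    {"receipt_id": "R003", "prefecture": "Aichi"},
]
medical_records = [
    {"receipt_id": "R001", "prefecture": "Tokyo"},
    {"receipt_id": "R002", "prefecture": "Osaka"},
]

unmatched = find_unmatched_dpc(dpc_records, medical_records)
for record in unmatched:
    print(record["receipt_id"], record["prefecture"])
```

Counting and inspecting such unmatched records before the main analysis makes it an explicit decision whether to exclude them or handle them separately.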
Second, it is vital to assess the limitations and inherent assumptions of an analysis carefully before conducting it. Every analysis rests on assumptions about the data. For instance, current records do not list specific dates for hospital admission and discharge, so these must be determined by other means. Ultimately, the usefulness of data analytics at this scale depends on identifying the assumptions an analysis makes and on confirming that those assumptions hold.
Finally, validation of the analysis is a major concern. Handling the massive NDB datasets requires a range of data-handling skills. Few analysts are proficient in software capable of processing such quantities of data, and researchers may be tempted to cut corners in the time-consuming process of validation. To prevent burnout, one solution might be to conduct analyses that are easy to validate. NDB analysis is currently better suited to simpler analyses than to topics that test multiple hypotheses simultaneously.