[4-E-1-03] Preparation and processing pitfalls in the use of NDB special extraction data - From experience of use -
NDB, Database research, Data wrangling
The advantages of using the National Database (NDB) include 'the world's largest accumulation of cases', 'complete coverage' and 'the only (almost) complete tracking of patient journeys in Japan'.
However, making use of these characteristics with special extraction data requires extensive processing of a vast amount of data. It also requires work that differs from the statistical processing and other analyses of conventional research, such as preparing the processing environment and studying the processing methods.
Processing in particular has many pitfalls unique to NDB data, because the health insurance claims (receipt) data from which the NDB is built were never entered for the purpose of analysis.
For example, current NDB data are divided by month and by the medical institution that issued the claim (receipt), so records must be collated by patient. Because no unified ID identifying the patient is assigned, complete collation is not possible. It is therefore necessary to choose the patient-matching (record linkage) method best suited to the purpose of the analysis.
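The patient-level collation described above can be sketched as follows. This is only a minimal illustration, not the method used in the presentation: it assumes each record carries two imperfect surrogate identifiers, here given the hypothetical names `key_a` and `key_b`, and treats records sharing either key, directly or through a chain of other records, as one patient. The function name `link_patients` and all field names and values are invented for the example.

```python
from collections import defaultdict


def link_patients(records, key_fields=("key_a", "key_b")):
    """Group record indices that share any surrogate key, transitively.

    `key_a` and `key_b` are hypothetical field names, not actual NDB columns.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as representative for stable output.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, value) -> index of first record with that key
    for idx, rec in enumerate(records):
        for field in key_fields:
            value = rec.get(field)
            if not value:
                continue  # a missing key cannot link anything
            k = (field, value)
            if k in first_seen:
                union(idx, first_seen[k])
            else:
                first_seen[k] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Invented records: one per month and issuing institution.
records = [
    {"month": "2020-04", "institution": "H1", "key_a": "A1", "key_b": "B1"},
    {"month": "2020-05", "institution": "H2", "key_a": "A2", "key_b": "B1"},
    {"month": "2020-05", "institution": "H1", "key_a": "A3", "key_b": "B3"},
    {"month": "2020-06", "institution": "H3", "key_a": "A2", "key_b": "B2"},
]
print(link_patients(records))  # [[0, 1, 3], [2]]
```

Transitive linking of this kind recovers patients whose keys change over time, at the risk of merging distinct patients who happen to share a key. Linking on a single key, or requiring both keys to match, errs in the opposite direction, which is one reason the choice depends on the purpose of the analysis.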
Based on experience of using the NDB, this presentation introduces the common preparations required for using special extraction data, together with tips and points to note in data processing work.