*Shinsuke Satoh1, Fusako Isoda1, Hiroshi HANADO1, Katsuhiro Nakagawa1, Susumu Uchino2, Kohei Yamashita2, Kazuya Muranaga2
(1. National Institute of Information and Communications Technology, 2. Systems Engineering Consultants (SEC))
Keywords: Phased Array Weather Radar, Data Quality Control, Data Disclosure, Observation Big Data
The Phased Array Weather Radar (PAWR), which was developed for the early detection and prediction of torrential rain, performs a detailed 3D observation in 30 seconds and generates observation big data at 100 times the data rate of a conventional parabolic-antenna weather radar. NICT operates four PAWRs, installed in Suita (Osaka University), Kobe, Okinawa, and Saitama University, and stores and distributes their data. Weather radar observation data is generally used in real time to grasp and predict the rainfall distribution. The 30-second real-time PAWR data is also used in the smartphone apps "3D Amagumo Weather" and "RIKEN Weather Forecast." Past observation data, on the other hand, is also important for studying various rainfall events. Long-term data archiving is needed to investigate the mechanisms of heavy rainfall, to improve precipitation forecasting, and to study the recent increase in record-breaking heavy rainfall. On the PAWR web page (https://pawr.nict.go.jp/), a quick-look (QL) image showing the rainfall distribution at an altitude of 2 km is created and published within 1 minute of each observation. All QL images and rainfall summaries (graphs) are posted on the "Past Data" page. The rainfall summary gives the average rainfall, maximum rainfall, and rainfall area at an altitude of 2 km as daily and weekly graphs and as text data every 30 seconds. The QL images and rainfall summaries are created to help users search past data, but they are also useful data in themselves: the millions of QL images can be used for machine learning, and the rainfall summaries can be used not only to grasp long-term rainfall conditions but also to investigate fluctuations in data quality.
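To make the rainfall summary concrete, the following is a minimal sketch of how the three quantities could be computed for one 30-second scan. It assumes the 2 km field is already available as a 2D Cartesian grid of rain rate in mm/h. The function name `rainfall_summary`, the NaN convention for missing cells, the 0.1 mm/h rain threshold, the cell area, and the choice to average over all valid cells are illustrative assumptions, not the actual PAWR processing.

```python
import math


def rainfall_summary(rain_rate, cell_area_km2, rain_threshold=0.1):
    """Summarise one 2 km-altitude rain-rate field (values in mm/h).

    rain_rate      : 2D list of floats; NaN marks missing or rejected cells
                     (an assumed convention).
    cell_area_km2  : area of one grid cell in km^2 (assumed uniform).
    rain_threshold : minimum rate counted as rain (assumed value).
    """
    # Flatten the grid and drop cells without a usable value.
    valid = [v for row in rain_rate for v in row if not math.isnan(v)]
    if not valid:
        return {"mean": math.nan, "max": math.nan, "area_km2": 0.0}

    # Cells at or above the threshold contribute to the rainfall area.
    rainy = [v for v in valid if v >= rain_threshold]
    return {
        "mean": sum(valid) / len(valid),          # average over valid cells
        "max": max(valid),                        # peak rain rate
        "area_km2": len(rainy) * cell_area_km2,   # rainfall area
    }


# Tiny made-up field: 2 x 3 cells of 0.5 km x 0.5 km each.
field = [[0.0, 0.0, 2.0], [float("nan"), 4.0, 6.0]]
summary = rainfall_summary(field, cell_area_km2=0.25)
```

Writing one such record per scan yields a compact time series, and sudden jumps in the rainfall area or maximum are the kind of signal that can point to a change in data quality.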
Data quality control is important for the use of both real-time and past data. PAWR observation data includes unwanted data such as surface clutter, noise, and pseudo-echoes, so a data quality control (QC) flag is created within 10 seconds. However, because QC accuracy remains an issue across the wide variety of rainfall events, we are developing a clutter identification method using semantic segmentation. We are working toward disclosing past observation data as open data, but many issues remain. The first is data volume. Past data is saved on multiple storage servers as hourly tgz files to reduce volume and simplify handling, but the archive has already exceeded 3 PB. Since publishing all of the data is not realistic, only requested data is copied to the public data server. Another problem is that the data is saved in polar coordinates in its own binary format, which makes it difficult for general users to use. For the data to be widely used as open data, it must be quality-controlled and converted into an easy-to-use data format.
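As an illustration of that last step, the sketch below drops gates rejected by a QC flag and converts the remaining polar-coordinate gates (range, azimuth, elevation) to Cartesian positions. It uses the standard 4/3 effective-Earth-radius beam model commonly applied to weather radar. The tuple layout, the function names, the convention that flag 0 means "good", and azimuth measured clockwise from north are assumptions for illustration only; they do not describe the PAWR binary format or its QC flag definition.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
# 4/3 effective Earth radius: a common assumption for standard refraction.
EFFECTIVE_RADIUS_M = EARTH_RADIUS_M * 4.0 / 3.0


def gate_to_cartesian(range_m, azimuth_deg, elevation_deg, radar_height_m=0.0):
    """Return (x east, y north, z up) in metres for one radar gate."""
    az = math.radians(azimuth_deg)    # assumed clockwise from north
    el = math.radians(elevation_deg)
    re = EFFECTIVE_RADIUS_M
    # Beam height above the antenna over a curved Earth.
    height = math.sqrt(range_m**2 + re**2 + 2.0 * range_m * re * math.sin(el)) - re
    # Great-circle distance along the surface below the beam.
    ground = re * math.asin(range_m * math.cos(el) / (re + height))
    return ground * math.sin(az), ground * math.cos(az), height + radar_height_m


def polar_to_points(gates, qc_ok=0):
    """Keep gates whose QC flag equals qc_ok and convert them to x, y, z.

    gates: iterable of (range_m, azimuth_deg, elevation_deg, value, qc_flag),
           a hypothetical in-memory layout.
    """
    points = []
    for range_m, az, el, value, flag in gates:
        if flag != qc_ok:
            continue  # skip clutter, noise, and other rejected gates
        x, y, z = gate_to_cartesian(range_m, az, el)
        points.append((x, y, z, value))
    return points


# Made-up gates: due east at 0 deg elevation, straight up, and one rejected gate.
gates = [
    (10_000.0, 90.0, 0.0, 35.0, 0),
    (10_000.0, 0.0, 90.0, 20.0, 0),
    (500.0, 180.0, 0.0, 50.0, 1),
]
points = polar_to_points(gates)
```

The resulting points would still need to be interpolated onto a regular grid and written in a widely supported format before general users could work with them; that step is omitted here.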