JSAI2018

Presentation information

Oral presentation

[1P1] [General Session] 3. Data Mining

Tue. Jun 5, 2018 1:20 PM - 3:00 PM Room P (4F Emerald Lobby)

Chair: 成松 宏美 (NTT)

1:20 PM - 1:40 PM

[1P1-01] A Sampling Method based on Generalized Relative Square Error to Emphasize Low Probability Events

Hiroshi Hasegawa¹, 〇Tomomi Nakamura¹, Takashi Washio² (1. Dept. Math and Sciences, Ibaraki University, 2. ISIR, University of Osaka)

Keywords: data sampling, generalized relative square error, events of low probability, large deviation theory, Wang-Landau algorithm

We discuss a method for sampling data from a huge data set. We introduce a generalized relative square error that emphasizes low-probability events and derive the sampling weight that best reduces this error. Our arguments are based on large deviation theory. Numerical experiments confirmed a large reduction in the generalized relative square error with the best sampling weight. We also propose using the Wang-Landau algorithm for data sampling. This algorithm is not only efficient for estimating the distribution of the original data but also useful for suppressing statistical errors in data sampling.
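
For readers unfamiliar with the Wang-Landau algorithm named in the abstract, the following is a minimal sketch of the textbook flat-histogram version on a toy problem. It is not the authors' data-sampling procedure. The toy problem (counting 10-bit strings by their number of ones), the function name `wang_landau`, and all parameter values are assumptions chosen for illustration. The sketch shows why the algorithm suits low-probability events: the random walk is biased by the running estimate so that rare bins are visited about as often as common ones.

```python
import math
import random


def wang_landau(n_bits=10, flatness=0.8, ln_f_final=1e-4, sweep=10_000, seed=0):
    """Estimate ln g(k), the log-count of n_bits-bit strings with k ones.

    Toy problem chosen for illustration; the exact answer is ln C(n_bits, k).
    """
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n_bits)]
    k = sum(state)                      # current bin: number of ones
    ln_g = [0.0] * (n_bits + 1)         # running estimate of ln g(k)
    hist = [0] * (n_bits + 1)           # visit histogram for the flatness check
    ln_f = 1.0                          # log of the modification factor

    while ln_f > ln_f_final:
        for _ in range(sweep):
            i = rng.randrange(n_bits)           # propose flipping one bit
            k_new = k + (1 - 2 * state[i])      # 0 -> 1 adds a one, 1 -> 0 removes one
            # Accept with probability min(1, g(k) / g(k_new)).
            if math.log(1.0 - rng.random()) < ln_g[k] - ln_g[k_new]:
                state[i] ^= 1
                k = k_new
            ln_g[k] += ln_f                     # penalize the bin just visited
            hist[k] += 1
        # When every bin has been visited nearly equally, refine the estimate.
        if min(hist) >= flatness * (sum(hist) / len(hist)):
            ln_f /= 2.0
            hist = [0] * len(hist)

    # ln g is known only up to a constant; fix it so the counts sum to 2**n_bits.
    m = max(ln_g)
    log_total = m + math.log(sum(math.exp(x - m) for x in ln_g))
    shift = n_bits * math.log(2) - log_total
    return [x + shift for x in ln_g]


if __name__ == "__main__":
    estimate = wang_landau()
    for k, value in enumerate(estimate):
        print(k, round(value, 2), round(math.log(math.comb(10, k)), 2))
```

Running the script prints the estimated and exact values of ln g(k) side by side. The rarest bins (k = 0 and k = 10, one string each out of 1024) receive as many visits as the central bins, which is the property that makes the approach attractive when low-probability events matter.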