A new statistical method to identify geochemical data structure

Hikaru Iwamori; Kenta Yoshida; Hitomi Nakamura; Tatsu Kuwatani; Morihisa Hamada; Satoru Haraguchi; Kenta Ueki

15:30 〜 15:45

[SGC54-07] A new statistical method to identify geochemical data structure

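*Hikaru Iwamori¹, Kenta Yoshida¹, Hitomi Nakamura¹,²,³, Tatsu Kuwatani¹, Morihisa Hamada¹, Satoru Haraguchi¹, Kenta Ueki⁴ (1. Department of Solid Earth Geochemistry, Japan Agency for Marine-Earth Science and Technology; 2. Department of Earth and Planetary Sciences, Tokyo Institute of Technology; 3. Ocean Resources Research Center for Next Generation, Chiba Institute of Technology; 4. Earthquake Research Institute, The University of Tokyo)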

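Keywords: multivariate statistical analysis, cluster analysis, principal component analysis, independent component analysis, geochemical data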

Identifying the structure of geochemical data, including trends and groups/clusters, is essential for inferring the origin of sources and processes from the observed variability. The rapidly increasing number and high dimensionality of recent geochemical data require efficient and accurate methods for capturing this structure. For example, the two databases GEOROC and PetDB contain ~382,000 sets of data in total, and Jenner and O’Neill [2012] provided analyses of 60 elements in 616 ocean floor basaltic glasses. The structure of such data, including trends and groups, cannot be identified by graphical methods alone (e.g., plotting Harker diagrams and picking out trends/groups by eye). As will be demonstrated, even 2-dimensional data may be misinterpreted by graphical methods.
Here we propose a new multivariate statistical method that combines three conventional but powerful methods to capture the true structure of multivariate data [Iwamori et al., 2017, doi:10.1002/2016gc006663]: k-means cluster analysis (KCA), principal component analysis (PCA), and independent component analysis (ICA). The three methods were selected because (i) KCA and PCA are probably the most fundamental yet powerful tools for multivariate analysis; (ii) ICA is less common than PCA but is a unique tool for identifying hidden independent structures; and (iii) the three methods are newly found to be closely related and can be integrated to analyze the data effectively. In this study, we first describe the relationship among the three methods, based mainly on synthetic data, to elucidate the entire data structure. We then apply the approach to a natural data set of isotopic compositions of basalts for which ICA has previously been performed. On the basis of the results, we clarify an effective combination of the methods and provide an Excel program “KCA” at both doi:10.1002/2016gc006663 and http://dsap.jamstec.go.jp/ so that readers can test the program and apply it to individual problems.
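
As an illustration only, the following minimal Python sketch chains the three kinds of analysis on synthetic 2-dimensional data. It is not the Excel program “KCA” and does not reproduce the authors' procedure. Several assumptions are made: the data are a linear mixture of two independent uniform sources (the mixing coefficients and sample size are arbitrary), PCA is done in closed form for the 2-dimensional case and used to whiten the data, ICA is replaced by a simple grid search for the rotation that maximizes the total absolute excess kurtosis (a stand-in that only works in 2-D), and KCA is plain Lloyd's k-means. All function names are hypothetical.

```python
import math
import random

def pca_whiten(points):
    """Centre 2-D data, rotate onto its principal axes, scale each axis to unit variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cx = [x - mx for x, _ in points]
    cy = [y - my for _, y in points]
    sxx = sum(a * a for a in cx) / n
    syy = sum(b * b for b in cy) / n
    sxy = sum(a * b for a, b in zip(cx, cy)) / n
    # Orientation of the first principal axis of the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(cx, cy)]
    pc2 = [-s * a + c * b for a, b in zip(cx, cy)]
    sd1 = math.sqrt(sum(a * a for a in pc1) / n)
    sd2 = math.sqrt(sum(b * b for b in pc2) / n)
    return [(a / sd1, b / sd2) for a, b in zip(pc1, pc2)]

def excess_kurtosis(v):
    n = len(v)
    m = sum(v) / n
    var = sum((a - m) ** 2 for a in v) / n
    return sum((a - m) ** 4 for a in v) / n / var ** 2 - 3.0

def rotate(points, phi):
    c, s = math.cos(phi), math.sin(phi)
    return [(c * a + s * b, -s * a + c * b) for a, b in points]

def ica_by_rotation(white, steps=180):
    """Assumed stand-in for ICA: pick the rotation of whitened data
    that maximises the total |excess kurtosis| of the two components."""
    def score(phi):
        return sum(abs(excess_kurtosis(col)) for col in zip(*rotate(white, phi)))
    best = max((math.pi / 2 * i / steps for i in range(steps)), key=score)
    return rotate(white, best)

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm with initial centres drawn at random from the data."""
    centres = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centres[j])
        if updated == centres:
            break
        centres = updated
    return labels, centres

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

# Synthetic example: two independent uniform sources, linearly mixed (arbitrary coefficients).
rng = random.Random(42)
sources = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
observed = [(1.0 * s1 + 0.5 * s2, 0.3 * s1 + 1.0 * s2) for s1, s2 in sources]

white = pca_whiten(observed)          # PCA step: decorrelate and standardise
ics = ica_by_rotation(white)          # ICA step: rotate towards independent axes
labels, centres = kmeans(ics, k=4)    # KCA step: group samples in the rotated space
```

One way to check the sketch is to compare each column of `ics` with the synthetic sources using `pearson`; a recovered component should correlate strongly with one of the sources, up to sign and ordering. The choice of `k=4` is arbitrary and serves only to show where clustering enters the chain.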

Presentation information

[S-GC54] [JJ] Frontiers of Geochemistry
