[MGI29-P03] Sparse feature selection for clustering and sample-wise distance, with application to geochemical data
Keywords: MORB, Machine learning, Geochemical data
To analyze such data, we treated the observed values of a series of samples, or of multiple analyses, as a "distribution" and measured the distance between distributions. This distance is derived by a nonparametric method that does not assume any specific probability distribution. A distance is defined for each feature quantity, and the total distance is the weighted sum of these "element distances". By clustering with these weights and selecting features through sparse constraints on the weights, we can calculate both the distances between sets and the quantities that characterize those distances (in this study, elemental species and specific isotope ratios).
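The per-feature distance and its weighted sum can be sketched as follows. The abstract does not name the nonparametric distance it uses, so this sketch assumes the 1-Wasserstein distance between empirical distributions as a stand-in. Each sample set is a hypothetical dict mapping a feature name to a list of values, with `None` marking an element that was not analysed. The function names are illustrative, not the authors' code.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two empirical 1D distributions (assumed metric)."""
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right)
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def element_distances(set_a, set_b):
    """Per-feature distances, computed only from values actually present in both sets."""
    dists = {}
    for feature in set_a.keys() & set_b.keys():
        a = [x for x in set_a[feature] if x is not None]
        b = [x for x in set_b[feature] if x is not None]
        if a and b:  # skip features with no data on one side
            dists[feature] = wasserstein_1d(a, b)
    return dists


def total_distance(set_a, set_b, weights):
    """Weighted sum of element distances; features without a weight count as zero."""
    d = element_distances(set_a, set_b)
    return sum(weights.get(f, 0.0) * dist for f, dist in d.items())
```

Because each element distance is computed from whichever samples report that feature, a sample that lacks some elements still contributes to the features it does have.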
The advantages of this method are:
1. It identifies the variables that characterize the distance.
2. It does not require every sample to have analytical values for all elements.
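The first advantage comes from the sparse constraint on the weights. The abstract does not give the weight update rule. As an assumption, the sketch below borrows the soft-thresholding update from sparse k-means (Witten and Tibshirani, 2010). Each weight is driven by a hypothetical per-feature separation score, for example how strongly that feature's element distance separates the current clusters. An L1 bound `s` then forces weak features to exactly zero. The names `sparse_weights`, `separation`, and `s` are illustrative.

```python
import math


def sparse_weights(separation, s, tol=1e-12, max_iter=200):
    """Nonnegative feature weights with ||w||_2 = 1 and ||w||_1 <= s.

    separation: per-feature separation scores (larger = more informative).
    s: L1 bound, assumed to satisfy 1 <= s <= sqrt(len(separation)).
    """
    if s < 1.0:
        raise ValueError("s must be at least 1")
    a = [max(x, 0.0) for x in separation]  # negative scores get zero weight

    def normalized(delta):
        # Soft-threshold the scores by delta, then rescale to unit L2 norm
        w = [max(x - delta, 0.0) for x in a]
        norm = math.sqrt(sum(x * x for x in w))
        return [x / norm for x in w] if norm > 0 else w

    w = normalized(0.0)
    if sum(w) <= s:
        return w  # L1 bound inactive: no thresholding needed
    # Bisection on the threshold: lo violates the L1 bound, hi satisfies it
    lo, hi = 0.0, max(a)
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if sum(normalized(mid)) > s:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return normalized(hi)
```

With illustrative scores `[5.0, 3.0, 0.1, 0.0]` and `s = 1.2`, the two weakest features receive exactly zero weight. In this way, sparse selection singles out the few quantities that drive the distance.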
Using the rock chemical composition database PetDB (http://www.petdb.org), we compiled compositional data for 3988 MORB samples, covering up to 49 elements (including 5 isotope ratios and 10 major elements), and used them for the analysis. MORB samples were grouped into several clusters based on their spatial distribution, and the chemical compositions of the clusters and the distances between them were compared. This method identified the elements and isotope ratios that characterize the spatial variation and the distances between the clusters. As a result, MORB was divided into eastern and western hemisphere clusters, and the Sr isotope ratio was found to be the most important quantity characterizing the spatial variation of MORB. This east-west hemispheric clustering is consistent with the structure shown by Iwamori and Nakamura (2015). Since the Sr isotope ratio is sensitive to the amount of recycled material in the source mantle (e.g., Hofmann, 1997; Albarède, 2009), this suggests that the distribution of recycled material from subducted slabs differs systematically between the eastern and western hemispheres.
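Grouping sample sets from a matrix of pairwise weighted distances can be sketched with a simple k-medoids scheme. The abstract does not state which clustering algorithm was used, so k-medoids is a swapped-in, assumed choice. It fits here because it needs only the distances, not coordinates. The initialization (the first `k` sets) and the 4 x 4 example matrix are hypothetical and synthetic, not PetDB results.

```python
def k_medoids(dist, k, max_iter=100):
    """Partition n sets into k clusters from an n x n distance matrix.

    Alternating k-medoids with the first k sets as (hypothetical) initial
    medoids; returns (labels, medoids).
    """
    n = len(dist)
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        # Assign each set to its nearest medoid
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # New medoid: the member with the smallest total distance to the others
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # converged: labels match the current medoids
        medoids = new_medoids
    return labels, medoids


# Synthetic matrix: sets 0-1 and 2-3 are mutually close
example = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]
labels, medoids = k_medoids(example, k=2)
```

In practice each matrix entry would be a weighted total distance between two regional sample sets.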