17:15 〜 19:15
[MIS06-P10] Leveraging Big Data and Deep Learning for Quantifying XRF Core Scanning Data into Various Geological Proxies
Keywords: X-ray Fluorescence (XRF), Geochemistry, Machine Learning, Self-supervised Learning, Foundation Model, Spectral Analysis
X-ray fluorescence (XRF) core scanning is widely used in geological research because it is rapid, non-destructive, and high-resolution. Significant efforts have been made to convert XRF measurements into quantitative geological proxies; however, conventional quantification models remain largely project-specific. Because materials and target proxies vary from study to study, a model trained for one project rarely transfers to another, so each new project must gather a substantial dataset of its own to train an accurate model.
To address this challenge, we employ self-supervised learning using a masked deep autoencoder architecture on a global collection of XRF data and geological proxies. Our objective is to develop a foundation model that overcomes project-specific limitations and continuously improves by integrating diverse datasets, including legacy cores.
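The masking step behind this kind of self-supervised pre-training can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch size, the mask ratio, zero-filling of masked channels, and the function names `mask_spectrum` and `masked_mse` are all assumptions made for the example, since the abstract does not specify them. The idea shown is that contiguous blocks of spectral channels are hidden and a model is scored only on how well it reconstructs the hidden parts.

```python
import numpy as np


def mask_spectrum(spectrum, patch_size=16, mask_ratio=0.5, rng=None):
    """Hide random contiguous patches of a 1-D spectrum.

    patch_size and mask_ratio are illustrative values (assumptions),
    not settings taken from the abstract.
    Returns the masked spectrum and a boolean mask (True = hidden channel).
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.asarray(spectrum, dtype=float)
    if spectrum.ndim != 1 or spectrum.size % patch_size != 0:
        raise ValueError("spectrum must be 1-D with length divisible by patch_size")

    n_patches = spectrum.size // patch_size
    n_masked = int(round(n_patches * mask_ratio))

    # Pick which patches to hide, without replacement.
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    patch_mask = np.zeros(n_patches, dtype=bool)
    patch_mask[masked_idx] = True

    # Expand the per-patch mask to a per-channel mask.
    channel_mask = np.repeat(patch_mask, patch_size)

    masked = spectrum.copy()
    masked[channel_mask] = 0.0  # zero-filling is an assumption for this sketch
    return masked, channel_mask


def masked_mse(reconstruction, target, channel_mask):
    """Mean squared error computed only over the hidden channels."""
    reconstruction = np.asarray(reconstruction, dtype=float)
    target = np.asarray(target, dtype=float)
    diff = reconstruction[channel_mask] - target[channel_mask]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    # Synthetic stand-in for one XRF spectrum (64 channels of fake counts).
    fake_spectrum = np.arange(64) + 1.0
    masked, mask = mask_spectrum(fake_spectrum, rng=np.random.default_rng(0))
    print("hidden channels:", int(mask.sum()))
    print("loss of a do-nothing reconstruction:", masked_mse(masked, fake_spectrum, mask))
```

In an actual pre-training loop, an encoder-decoder network would take the place of the do-nothing reconstruction, and a loss of this kind would be minimized over the full collection of spectra.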
Our initial results demonstrate the effectiveness of this approach. The foundation model was pre-trained on 54,643 spectra from marine sediments collected in high-latitude regions of the Pacific and Southern Oceans. This pre-training phase lets the model learn a general representation of XRF spectra and recognize key spectral features. After fine-tuning with only one-third of the training data, the model outperforms conventional quantification methods in accuracy for calcium carbonate (CaCO3) and total organic carbon (TOC) measurements. Furthermore, it exhibits a 60% improvement in accuracy when tested on entirely unseen sediment cores located tens of kilometers away, demonstrating strong generalizability.
To further scale up this approach, we have included legacy cores from additional oceanic and terrestrial regions, such as the Indian Ocean, Japan Sea, Arctic, and European and Patagonian lakes. Our collaboration with Kochi University provides access to a broader range of core samples. By expanding the database to incorporate diverse materials and machine settings, we aim to enhance the model’s adaptability. Ultimately, this approach seeks to extend beyond core scanning and facilitate advancements in all XRF-based measurement techniques.