[STT52-P07] On the Optimum Solutions of Bayesian Inversions
Keywords: Fully Bayesian Inferences, Inversion Validations
Geophysical inversions are frequently built on Bayesian inference. Bayesian inference yields the posterior probability distribution function (pdf) from 1) the observation equation, which relates the data to the model parameters, and 2) the prior distribution of the model parameters, via Bayes' theorem. A more general approach, called fully Bayesian inference, yields a joint posterior pdf of the model parameters and the hyperparameters by additionally introducing the hyperprior distribution, that is, the prior for the hyperparameters.
Obtaining the most reliable (optimum) model parameters from the posterior pdf requires a criterion for reducing its information. The best way to perform this reduction is nontrivial, and the criteria in use can be summarized into three groups. The first, and the most intuitive, is the maximum of the posterior pdf (maximum a posteriori, MAP). The second takes the maximum of the marginal posterior of the hyperparameters, obtained by integrating the posterior pdf over the model parameters. The optimum model parameters are then evaluated as the ordinary weighted least-squares solution for the selected hyperparameters. This is exactly the approach of the Akaike Bayesian information criterion (ABIC), which has been adopted in wide classes of geophysical inversions (Yabuki & Matsu'ura, 1996). The third uses the marginal posterior pdf of the model parameters, obtained by integrating the posterior over the hyperparameters. It includes, for example, the mean, the median, and the most frequent value (the mode). This last group of estimates has been common in previous fully Bayesian inferences, such as Fukuda & Johnson (2008). We therefore call these the fully Bayesian inferences, in contrast to MAP and ABIC.
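The three reductions can be written compactly as follows. This is only a notational sketch: the symbols d (data), a (model parameters), and θ (hyperparameters) are assumptions introduced here and do not come from the abstract.

```latex
\begin{align}
  % joint posterior of model parameters and hyperparameters
  p(\mathbf{a}, \boldsymbol{\theta} \mid \mathbf{d})
    &\propto p(\mathbf{d} \mid \mathbf{a}, \boldsymbol{\theta})\,
             p(\mathbf{a} \mid \boldsymbol{\theta})\,
             p(\boldsymbol{\theta}) \\
  % 1) MAP: maximum of the joint posterior
  (\hat{\mathbf{a}}, \hat{\boldsymbol{\theta}})_{\mathrm{MAP}}
    &= \arg\max_{\mathbf{a},\, \boldsymbol{\theta}}\;
       p(\mathbf{a}, \boldsymbol{\theta} \mid \mathbf{d}) \\
  % 2) ABIC: maximize the marginal posterior of the hyperparameters,
  %    then solve for the model parameters at the selected hyperparameters
  \hat{\boldsymbol{\theta}}_{\mathrm{ABIC}}
    &= \arg\max_{\boldsymbol{\theta}}
       \int p(\mathbf{a}, \boldsymbol{\theta} \mid \mathbf{d})\, d\mathbf{a},
  \qquad
  \hat{\mathbf{a}}_{\mathrm{ABIC}}
     = \arg\max_{\mathbf{a}}\;
       p(\mathbf{a} \mid \mathbf{d}, \hat{\boldsymbol{\theta}}_{\mathrm{ABIC}}) \\
  % 3) marginal posterior of the model parameters (mean, median, mode, ...)
  p(\mathbf{a} \mid \mathbf{d})
    &= \int p(\mathbf{a}, \boldsymbol{\theta} \mid \mathbf{d})\,
       d\boldsymbol{\theta}
\end{align}
```

For a uniform hyperprior, the integral in the second reduction is proportional to the marginal likelihood of the hyperparameters, which is the form in which ABIC is usually stated.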
MAP, ABIC, and previous fully Bayesian inferences give similar results in most cases (e.g., Fukuda & Johnson, 2008). However, we report a clear discrepancy among them when the number of model parameters is large.
First, we derived analytic expressions for the solutions of MAP and previous fully Bayesian inferences, as well as their uncertainties represented by covariances, for problems in which the observation equation is linear in the model parameters. The derived solutions show that any of the previous fully Bayesian estimates (means, modes, medians, and so on) almost surely coincides with the MAP estimate in the limit of a large number of model parameters, for wide classes of hyperpriors such as the uniform distribution. The solutions of MAP and the previous fully Bayesian inferences are further shown to differ from the solution of ABIC.
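For orientation, the standard linear-Gaussian setting behind such analyses can be sketched as below. The abstract does not state its analytic expressions, so this shows only the textbook conditional posterior for fixed hyperparameters. All symbols are assumptions: H is the design matrix, E the data covariance shape, G a prior (for example roughness) matrix of rank P, and σ², ρ² the scale hyperparameters. The matrix being inverted is assumed to be nonsingular.

```latex
\begin{align}
  % linear observation equation with Gaussian errors
  \mathbf{d} &= \mathbf{H}\mathbf{a} + \mathbf{e},
  \qquad \mathbf{e} \sim \mathcal{N}(\mathbf{0},\, \sigma^{2}\mathbf{E}) \\
  % Gaussian prior controlled by the hyperparameter rho^2
  p(\mathbf{a} \mid \rho^{2})
    &\propto (\rho^{2})^{-P/2}
       \exp\!\left(-\frac{1}{2\rho^{2}}\,
       \mathbf{a}^{\mathsf{T}}\mathbf{G}\,\mathbf{a}\right) \\
  % conditional posterior for fixed hyperparameters, alpha^2 = sigma^2 / rho^2
  \mathbf{a} \mid \mathbf{d}, \sigma^{2}, \rho^{2}
    &\sim \mathcal{N}\!\left(\hat{\mathbf{a}},\;
       \sigma^{2}\left(\mathbf{H}^{\mathsf{T}}\mathbf{E}^{-1}\mathbf{H}
       + \alpha^{2}\mathbf{G}\right)^{-1}\right),
  \qquad \alpha^{2} = \frac{\sigma^{2}}{\rho^{2}} \\
  % weighted least-squares estimate
  \hat{\mathbf{a}}
    &= \left(\mathbf{H}^{\mathsf{T}}\mathbf{E}^{-1}\mathbf{H}
       + \alpha^{2}\mathbf{G}\right)^{-1}
       \mathbf{H}^{\mathsf{T}}\mathbf{E}^{-1}\mathbf{d}
\end{align}
```

The three reductions differ in how σ² and ρ² are chosen or integrated out, while the dependence on the data for fixed hyperparameters is always the weighted least-squares form in the last line.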
Next, we performed a synthetic test to clarify the difference between ABIC and the other estimates, using smallness of roughness as the prior (Yabuki & Matsu'ura, 1992). The numerical investigation showed that, as we improve the model resolution by increasing the number of model parameters, the distribution of MAP estimates (that is, the posterior pdf itself) converges to an overfitted solution for the given data, and the prior is unduly neglected. Meanwhile, the ABIC estimate did not suffer from this problem and was significantly closer to the true solution than the MAP estimates under fine discretization.
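The ABIC side of such a test can be illustrated with a minimal sketch. It does not reproduce the synthetic test of this study. Everything in it is an assumption made for illustration: the Gaussian smoothing kernel H, the sine-shaped true model, the noise level, the unit data covariance, a full-rank roughness matrix G built from second differences with zero boundary values, and all function names. ABIC is evaluated up to an additive constant on a grid of α² = σ²/ρ², with σ² eliminated analytically, and the model is then obtained by weighted least squares at the selected α².

```python
import numpy as np


def second_difference(m):
    """Roughness operator D (m x m): second differences, zero boundary values."""
    return -2.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)


def synthetic_problem(n=20, m=30, noise=0.01, width=0.1, seed=0):
    """Hypothetical toy problem d = H a + e with a Gaussian smoothing kernel."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, m + 1) / (m + 1)      # interior model nodes on (0, 1)
    t = np.linspace(0.0, 1.0, n)           # observation points
    H = np.exp(-0.5 * ((t[:, None] - x[None, :]) / width) ** 2) / (m + 1)
    a_true = np.sin(np.pi * x)             # smooth model, zero at both ends
    d = H @ a_true + noise * rng.standard_normal(n)
    return H, d, a_true


def weighted_least_squares(H, d, G, alpha2):
    """Estimate of a for fixed alpha2 = sigma^2 / rho^2 (unit data weights)."""
    return np.linalg.solve(H.T @ H + alpha2 * G, H.T @ d)


def abic(H, d, G, alpha2):
    """ABIC(alpha2) up to an additive constant, assuming G has full rank."""
    n, m = H.shape
    A = H.T @ H + alpha2 * G
    a_hat = np.linalg.solve(A, H.T @ d)
    r = d - H @ a_hat
    # Misfit plus weighted roughness at the conditional optimum.
    s = r @ r + alpha2 * (a_hat @ G @ a_hat)
    logdet_A = np.linalg.slogdet(A)[1]
    # sigma^2 has been replaced by its maximizer s / n; log|G| is constant.
    return n * np.log(s) - m * np.log(alpha2) + logdet_A


H, d, a_true = synthetic_problem()
D = second_difference(H.shape[1])
G = D.T @ D                                # positive definite roughness matrix
alpha2_grid = np.logspace(-4, 4, 41)
abic_values = np.array([abic(H, d, G, a2) for a2 in alpha2_grid])
alpha2_best = alpha2_grid[np.argmin(abic_values)]
a_abic = weighted_least_squares(H, d, G, alpha2_best)

print("selected alpha^2:", alpha2_best)
print("rms error vs. true model:", np.sqrt(np.mean((a_abic - a_true) ** 2)))
```

Rerunning the script with a larger `m` in `synthetic_problem` gives a rough way to see how the selected α² and the error behave as the discretization is refined. A grid search is used here only for transparency, and a one-dimensional optimizer would serve equally well.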
These results are counterintuitive from the standard view, in which ABIC is regarded as a tractable approximation for assessing the joint posterior pdf of the model parameters and hyperparameters (Gelman et al., 2013). A loss of accuracy due to overfitting at higher resolution, where discretization errors are smaller, can be a crucial problem in inversions of real data, where the true parameters are intrinsically unknown. Our results suggest a way of improving ordinary fully Bayesian inferences along the lines of ABIC, and warn of the potential risks hidden behind MAP and previous fully Bayesian inferences.