12:00 〜 12:15
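[MGI27-12] Real-time 3D super-resolution of urban micrometeorology using Schrödinger bridges and diffusion models
Keywords: super-resolution, urban micrometeorology, diffusion model, Schrödinger bridge, image correction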
Diffusion-based deep generative models (diffusion models) have demonstrated remarkable effectiveness in various image processing tasks, including super-resolution [1]. In conventional super-resolution, these models generate high-resolution (HR) images from pure noise through a reverse diffusion process. However, since super-resolution aims to enhance low-resolution (LR) images, initializing the reverse process directly from LR images rather than noise can significantly improve computational efficiency (see upper panels in Figure). Recently, the Schrödinger bridge formulation has emerged as a framework for constructing transformations between arbitrary data distributions, such as LR and HR images [2,3]. This framework remains largely unexplored in meteorological problems. We demonstrate that our extended Schrödinger bridge approach for urban flow fields outperforms conventional diffusion models in both inference accuracy and computational efficiency for real-time 3D super-resolution applications.
The diffusion model [4] implements stochastic processes governed by Eqs. (1) and (2) in Figure. The forward process described by Eq. (1) evolves an HR image y_0 into pure noise y_T, while Eq. (2) governs its reverse process. Here, a_t denotes a prescribed function and W_t is a standard Wiener process. Super-resolution is achieved by training a neural network to learn the score function s_t for the reverse process, with the LR image x as an input to s_t.
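As a rough illustration of a forward process of this kind, the sketch below integrates a generic noising SDE with the Euler-Maruyama scheme. Eqs. (1) and (2) themselves appear only in the figure, so the drift and diffusion terms used here (a variance-preserving form, dy = -0.5 a_t y dt + sqrt(a_t) dW_t) and the linear schedule `a(t)` are assumptions chosen for illustration, not the equations of the abstract; the function names are hypothetical.

```python
import numpy as np

def a(t):
    """Hypothetical linear schedule for a_t (the abstract only says it is prescribed)."""
    return 0.1 + 19.9 * t

def forward_diffusion(y0, n_steps=1000, T=1.0, seed=0):
    """Euler-Maruyama integration of the ASSUMED forward SDE
    dy = -0.5 * a_t * y dt + sqrt(a_t) dW_t, from t = 0 to t = T."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = np.array(y0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        # Wiener increment: zero mean, variance dt
        dW = rng.normal(0.0, np.sqrt(dt), size=y.shape)
        y = y - 0.5 * a(t) * y * dt + np.sqrt(a(t)) * dW
    return y

# A constant synthetic "HR image" is driven towards approximately unit Gaussian noise.
y0 = np.full((64, 64), 3.0)
yT = forward_diffusion(y0)
```

Under these assumptions the signal in `y0` is almost entirely erased by t = T, which is why a reverse process started from `yT` has to regenerate the whole HR field from noise.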
The Schrödinger bridge [2,3] employs stochastic processes governed by Eqs. (3) and (4) in Figure. The forward process in Eq. (3) transforms an HR image z_0 to an LR image z_T, while Eq. (4) describes its reverse process. The term b_t denotes a prescribed function, and B is a constant matrix incorporating building and flow field structures. The LR to HR transformation (i.e., super-resolution) is achieved by training neural networks to learn functions f_t and s_t for the reverse process.
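The idea of a process pinned to an HR image at one end and an LR image at the other can be sketched with a simple Brownian bridge. This is only an illustrative stand-in: Eqs. (3) and (4) are given in the figure, and the scalar noise scale `sigma`, the nearest-neighbour upsampling, and the synthetic data below are assumptions. In particular, the matrix B that encodes building and flow field structures is not modeled here.

```python
import numpy as np

def upsample_nearest(lr, factor=4):
    """Repeat each LR cell factor x factor times so LR and HR share one grid (assumption)."""
    return np.kron(lr, np.ones((factor, factor)))

def bridge_sample(z0, zT, t, T=1.0, sigma=0.5, rng=None):
    """Sample z_t from a Brownian bridge pinned at z0 (t = 0) and zT (t = T).
    The scalar sigma stands in for the structured noise of the abstract (assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    w = t / T
    mean = (1.0 - w) * z0 + w * zT
    std = sigma * np.sqrt(t * (T - t) / T)  # vanishes at both endpoints
    return mean + std * rng.standard_normal(z0.shape)

rng = np.random.default_rng(0)
hr = rng.normal(300.0, 1.0, size=(16, 16))      # synthetic HR temperature [K]
lr = hr.reshape(4, 4, 4, 4).mean(axis=(1, 3))   # 4 x 4 block average -> LR
zT = upsample_nearest(lr)                       # LR field on the HR grid

z_start = bridge_sample(hr, zT, 0.0, rng=rng)   # equals the HR image
z_mid = bridge_sample(hr, zT, 0.5, rng=rng)     # noisy blend of HR and LR
z_end = bridge_sample(hr, zT, 1.0, rng=rng)     # equals the LR image
```

Because the endpoint at t = T is the LR field rather than pure noise, a reverse process only has to restore the detail lost between LR and HR, which is consistent with the small number of reverse steps reported for the Schrödinger-bridge model.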
For evaluation, we performed 2D temperature super-resolution using a U-Net architecture [1]. The LR and HR grid spacings were set to 20 m and 5 m, respectively. The dataset was generated from numerical experiments using the multi-scale atmosphere-ocean model MSSG [5,6] over a 2 km × 2 km domain centered on Tokyo Station. We used the same numerical experiments to generate 3D temperature and velocity fields at 20 m (LR) and 5 m (HR) for 3D super-resolution.
The 2D super-resolution results show that the Schrödinger-bridge model outperforms the conventional diffusion model, achieving a mean absolute error (MAE) of 0.18 K compared to 0.76 K for the diffusion model. Furthermore, the Schrödinger-bridge model requires only 10 time steps for the reverse process, whereas the diffusion model needs 250 steps.
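For reference, the MAE metric and the ratios implied by the reported numbers can be written out as follows. The helper name `mean_absolute_error` is hypothetical; the four numerical values are taken directly from the text above.

```python
import numpy as np

def mean_absolute_error(pred, target):
    """MAE between a super-resolved field and the HR reference (same units, here K)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))

# Values reported in the abstract
mae_sb, mae_diffusion = 0.18, 0.76       # [K]
steps_sb, steps_diffusion = 10, 250      # reverse-process time steps

error_ratio = mae_diffusion / mae_sb     # about 4.2 times lower MAE
step_ratio = steps_diffusion / steps_sb  # 25 times fewer reverse steps
```

The step ratio counts reverse-process steps only; it would translate into a comparable wall-clock speed-up only if the cost per step is similar for both models, which the abstract does not state.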
The bottom panels in Figure show 3D super-resolution results from the Schrödinger-bridge model. Temperature fields are displayed with buildings shown in gray. Similar results were obtained for velocity fields. The figure indicates that the Schrödinger-bridge model successfully reconstructs 3D temperature fields that closely resemble the HR data while simultaneously performing inpainting and bias correction [7]. Super-resolution of all 3D data for 60-min predictions takes approximately 140 s, demonstrating the feasibility of real-time super-resolution simulation [7,8]. In our presentation, we will elaborate on the theoretical foundations of Eqs. (1)-(4), focusing particularly on the determination of the matrix B for 3D urban micrometeorology applications.
[1] Saharia et al. (2021), arXiv:2111.05826.
[2] Liu et al. (2023), arXiv:2302.05872.
[3] Albergo et al. (2023), arXiv:2303.08797.
[4] Ho et al. (2020), arXiv:2006.11239.
[5] Takahashi et al. (2013), J. Phys. Conf. Ser.
[6] Matsuda et al. (2018), J. Wind Eng. Indust. Aerodyn.
[7] Yasuda and Onishi (2025), Urban Clim.
[8] Onishi et al. (2019), SOLA.