Surrogate-based active learning for parameter inference in geosciences via Bayesian sparse² multi-adaptivity enhanced by information theory

PN 5 A-2

Project description

Simulations with well-calibrated models offer a unique way to predict the multifaceted behavior of subsurface flow. Reliable and feasible calibration frameworks are therefore needed for highly non-linear and computationally expensive models, and they must account for the uncertainty that remains after calibration. In addition, owing to the physics of carbon dioxide (CO2), the CO2 displacement front is strongly non-linear and very hard to capture. As a consequence, front speeds vary strongly across the parameter space under evaluation. Machine learning (ML) techniques, which are widespread in the scientific community, appear to be very suitable candidates for such non-linear problems. Classical ML approaches, however, require huge amounts of data on both model parameters and model response. Unfortunately, many problems addressed in the geosciences can provide only very sparse data. This sparsity is caused by the small amount of available measurement data and by the high computational cost of numerically simulating realistic models. Multiphase flow of CO2 in deep geological formations is an unquestionable representative of this class of problems. In this project, we propose to develop an ML approach that treats the local non-linearity of the physical problem adaptively while taking the sparse nature of the available data into account. The project intends to explore the link between Bayesian inference and information theory in a goal-oriented fashion, in order to localize the non-linearity of the physical problem adaptively according to the available observation data and computational resources. We will follow the recent trend in stochastic model reduction and train a mathematically optimal response surface using limited (sparse) information from the original CO2 model in the light of observed data.
The key novelty of this project is the extension of the arbitrary multi-resolution polynomial chaos (aMR-PC) framework, very recently developed by the applicants, towards an adaptive and sparse reconstruction based on Bayesian theory and supported by information-theoretic arguments. Following the idea of Bayesian experimental design, we will maximize the expected utility and identify the sparse set of parameters at which the original CO2 model has to be run. The information-theoretic arguments will help to localize the parameter region of highest interest. Moreover, we propose to identify the sparse structure of the aMR-PC representation by maximizing the quality of Bayesian parameter inference for a given computing-time budget, where the Bayesian model evidence (BME) provides the necessary mathematical tool for the optimal choice. Combining Bayesian inference with information theory will allow the response surface to be improved iteratively and adaptively, with relevant information incorporated at each iteration. With the novel approach, denoted Bayesian sparse² arbitrary multi-resolution polynomial chaos expansion (Bs²-aMR-PC), it will be possible to calibrate highly non-linear models at strongly reduced computational cost and with quantified post-calibration uncertainty, while the surrogate focuses its approximation quality on the parameter region of highest interest.
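The iterative idea described above can be illustrated with a minimal toy sketch. It is not the project's method: a global Legendre polynomial stands in for the aMR-PC surrogate, a one-parameter `tanh` front stands in for the CO2 model, and a simple "posterior weight times distance to the nearest model run" score stands in for the expected utility of Bayesian experimental design. All function names, the grid, the noise level `sigma` and the observation `y_obs` are assumptions made for illustration only.

```python
import numpy as np


def expensive_model(theta):
    # Hypothetical stand-in for a costly simulator: a steep front near theta = 0.6.
    return np.tanh(10.0 * (theta - 0.6))


def posterior_and_bme(candidates, x_runs, y_runs, y_obs, sigma, max_degree):
    # Fit a cheap polynomial surrogate to the model runs collected so far.
    degree = min(len(x_runs) - 1, max_degree)
    surrogate = np.polynomial.Legendre.fit(x_runs, y_runs, degree, domain=[0.0, 1.0])
    # Gaussian likelihood of the observation, evaluated on the surrogate.
    misfit = (surrogate(candidates) - y_obs) / sigma
    likelihood = np.exp(-0.5 * misfit**2) / (sigma * np.sqrt(2.0 * np.pi))
    # Uniform prior on the grid: the BME estimate is the prior mean of the likelihood.
    bme = likelihood.mean()
    posterior = likelihood / likelihood.sum()
    return posterior, bme


def active_learning(y_obs, sigma=0.1, n_init=3, n_iter=4, max_degree=4):
    candidates = np.linspace(0.0, 1.0, 201)  # discretized uniform prior on [0, 1]
    chosen = np.linspace(0, len(candidates) - 1, n_init).astype(int).tolist()
    y_runs = [float(expensive_model(candidates[i])) for i in chosen]
    bme_history = []
    for _ in range(n_iter):
        x_runs = candidates[chosen]
        posterior, bme = posterior_and_bme(
            candidates, x_runs, np.array(y_runs), y_obs, sigma, max_degree
        )
        bme_history.append(bme)
        # Heuristic score (an assumption, not the expected utility itself):
        # favor points that are probable a posteriori and far from existing runs.
        distance = np.abs(candidates[:, None] - x_runs[None, :]).min(axis=1)
        score = posterior * distance
        score[chosen] = -1.0  # never re-run the model at a known point
        new_index = int(np.argmax(score))
        chosen.append(new_index)
        y_runs.append(float(expensive_model(candidates[new_index])))  # one new model run
    # Final surrogate and posterior using all collected runs.
    posterior, bme = posterior_and_bme(
        candidates, candidates[chosen], np.array(y_runs), y_obs, sigma, max_degree
    )
    bme_history.append(bme)
    return candidates, posterior, chosen, bme_history


candidates, posterior, chosen, bme_history = active_learning(y_obs=0.0)
print("model runs at:", np.round(candidates[chosen], 3))
print("MAP estimate of theta:", candidates[np.argmax(posterior)])
print("BME estimates per iteration:", bme_history)
```

In this sketch the expensive model is run only once per selected parameter value, so the number of model runs equals `n_init + n_iter`. The BME is merely recorded here; using it to choose the sparse structure of the surrogate, as the project proposes, is not shown.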

Project information

Project title: Surrogate-based active learning for parameter inference in geosciences via Bayesian sparse² multi-adaptivity enhanced by information theory
Project leader: Sergey Oladyshkin
Project partner: Wolfgang Nowak
Project duration: September 2019 – August 2022
Project number: PN 5 A-2
Alternative project number: DFG OL 456/3-1

Publications PN 5 A-2
