The German Research Foundation (DFG) has approved funding for the project "Unified diagnostic evaluation of physically-based, data-driven and hybrid hydrological models based on information theory (UNITE)", which junior research group leader Anneli Guthke submitted together with Uwe Ehret from the Institute of Water and River Basin Management, Department of Hydrology, at the Karlsruhe Institute of Technology (KIT).
“It is a great opportunity to now work on a full joint project with Uwe Ehret. We met at a workshop on Information Theory in the Earth Sciences in 2018. Since then, we have regularly exchanged ideas and visions of model evaluation by Bayesian and information-based approaches and have found these informal meetings very insightful for both worlds. The funding of this project will now allow us to explore these ideas in depth,” explains Anneli Guthke.
In engineering and environmental sciences, we have increasingly relied on simulation models to deepen system understanding, to predict the state of the system in the near future (e.g., in operational use), and/or to predict a future state of the system that was previously unobservable (e.g., due to climate change, changes in land use, new infrastructure). The utility of model simulations depends largely on how well they can reproduce reality, and how honestly they can quantify prediction uncertainties. Rigorous statistical methods based on Bayesian probability theory are available for model evaluation and uncertainty quantification, but they rely on a core assumption that is generally violated: they assume that the considered model is the “true model”. Violating this assumption through simplified and often overly rigid structural choices leads to prediction intervals that shrink drastically, become over-confident and rarely cover future data. This effect compromises the utility of any simulation model.
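The over-confidence effect can be illustrated with a small toy experiment. The sketch below is not the project's method; it is a minimal illustration under assumed conditions: a hypothetical "true" system y = x², a deliberately too rigid model y = a·x, a known noise level `sigma`, and a flat prior on `a`, so that the posterior is Gaussian and available in closed form. The function name `fit_and_check` and all parameter values are chosen for illustration only.

```python
import math
import random


def truth(x):
    """Hypothetical 'true' system, unknown to the modeler (assumption)."""
    return x ** 2


def fit_and_check(n, sigma=0.1, x_new=2.0, n_future=1000, seed=0):
    """Fit the rigid model y = a*x to n noisy observations on [0, 1],
    then test its 95% prediction interval against future data at x_new.

    Returns (interval_width, coverage).
    """
    rng = random.Random(seed)

    # Training data from the true system, observed only for x in [0, 1].
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [truth(x) + rng.gauss(0.0, sigma) for x in xs]

    # Posterior of a under a flat prior and known noise sd: Gaussian with
    # mean sum(x*y)/sum(x^2) and variance sigma^2/sum(x^2).
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a_mean = sxy / sxx
    a_var = sigma ** 2 / sxx

    # Posterior predictive at x_new: parameter uncertainty plus noise.
    pred_mean = a_mean * x_new
    pred_sd = math.sqrt(x_new ** 2 * a_var + sigma ** 2)
    lower = pred_mean - 1.96 * pred_sd
    upper = pred_mean + 1.96 * pred_sd

    # Future observations from the true system at the unobserved state.
    future = [truth(x_new) + rng.gauss(0.0, sigma) for _ in range(n_future)]
    coverage = sum(lower <= y <= upper for y in future) / n_future
    return upper - lower, coverage


if __name__ == "__main__":
    for n in (10, 100, 1000):
        width, coverage = fit_and_check(n)
        print(f"n={n:5d}  interval width={width:.3f}  coverage={coverage:.2f}")
```

In this toy setting, more data makes the interval narrower because the model treats its own structure as true, yet the interval keeps missing the future observations at the previously unobserved state x = 2. The structural error is never reflected in the stated uncertainty.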
In contrast, data-driven and ML-based approaches quantify the information in the data but ignore physics knowledge and hence tend to overestimate uncertainty and “underexplore” what we know. Hybrid models (any type of model in between physics-based and data-driven) aim to combine the strengths of both worlds. However, to date, a rigorous evaluation framework to judge and compare the utility of such very different models is lacking.
The aim of this research project is to develop and apply information-theoretic concepts and methods for diagnostic model evaluation, which provide a more general and holistic understanding and evaluation of different models in hydrological sciences. Hydrological modeling will benefit from this proposed diagnostic framework through an improved understanding of the hydrological cycle and hydrological processes at different scales, and through insights into the intrinsic model errors of simulation models and into the black-box nature of ML-based approaches. ML-based approaches have gained substantial interest in the hydrological science community, partly due to their ease of application, but also due to their superior results, even in ungauged basin settings, compared to traditional modeling frameworks.
In summary, merging data-driven with physics-based modeling approaches, as targeted by SimTech, also requires merging them at the level of statistical evaluation. This project therefore aims to fuse Bayesian uncertainty assessment with information-theoretic measures to bring (lack of) information in physics-based models and information in data to the same scale. Expected benefits are:
- testable predictions that help advance science through improved system understanding,
- guidance towards model improvement across the continuum of physics-based to data-driven models,
- increased public acceptance of (hybrid) predictions through mapping of model entities and dynamics to real-world physical compartments, states and processes.