Bridging Data-Poor and Data-Rich Regimes

Focus Challenge 3

Data-rich regimes denote regions in space or time where massive amounts of data, whether experimental, simulated, or sensory, are readily available. Data-poor regimes are characterized by scarce data, due either to experimental limitations or to the lack of theoretical methods accurate enough to generate precise simulation data.

In our Engineered Geosystems vision, for example, data-rich regimes can lie at either small scales (e.g., tomographic images of rock samples with micrometer resolution) or large scales (e.g., seismic data from geophysical exploration on the kilometer scale). However, predicting the safety of energy storage in subsurface systems, for instance, requires knowledge of the intermediate scale, where the rock formation is heterogeneous and data are scarce.

The situation is analogous for our Digital Human Model: we have a large amount of data at the cellular and organ-system levels, but frequently little data at the intermediate scale of the neuromuscular system.

With respect to our Next-Generation Virtual Materials Design vision, it is difficult to predict desired macroscopic material properties, even though an enormous amount of experimental data is available on the atomic and molecular constituents.

For all these applications, the challenges are the same, namely:

  1. constructing and testing appropriate models that make use of the abundant data in data-rich regimes;
  2. developing or identifying appropriate scale-bridging and homogenization techniques that supply reliable forecasts on the data-poor scales;
  3. bridging data-rich and data-poor regimes with suitable numerical models that extrapolate and self-adapt from available data-rich regimes to yield reliable predictions for the relevant data-poor ones;
  4. developing model reduction techniques to assimilate data on multiple scales, guide the generation of better data in data-poor regimes, adapt models on the fly, and obtain real-time predictions; and
  5. gauging our success via measures suited to quantifying uncertainty.

These challenges thus require simulation methods and machine learning tools capable of intelligently exploiting massive amounts of data on specific scales to produce reliable predictions for data-poor regions in space and time.
