Setting New Standards in Research Data Management for Simulation Science

June 14, 2024

SimTech has developed new solutions and tools to significantly enhance research data management (RDM) in simulation science. These innovations aim to improve the management, accessibility, and reproducibility of research data, greatly facilitating scientific collaboration and discovery.

Effective RDM is crucial in today’s scientific landscape. As research becomes increasingly data-intensive, data must be managed so that it is findable, accessible, interoperable, and reusable (FAIR). Adhering to these principles ensures that data can be reliably used and shared across research disciplines, accelerating scientific progress and fostering collaboration. Proper RDM practices also help maintain the quality and integrity of scientific research.

The publication titled "Research Data Management in Simulation Science: Infrastructure, Tools, and Applications" by Bernd Flemisch, Sibylle Hermann, Melanie Herschel, Dirk Pflüger, Jürgen Pleiss, Jan Range, Sarbani Roy, Makoto Takamoto, and Benjamin Uekermann provides a comprehensive overview of the RDM infrastructure developed by SimTech. A key element is the central data repository DaRUS, built on the open-source software Dataverse, which supports various metadata schemas and facilitates the publication of datasets with proper annotations and quality control.

A significant part of SimTech's RDM strategy is the development of specialized tools to streamline data management processes. EasyDataverse is a Python library that simplifies the creation and management of metadata reports, making it easier for researchers to upload and update datasets in Dataverse. The Harvester-Curator tool automates the extraction and curation of metadata from research data, significantly reducing the effort required for metadata entry and enhancing data accessibility. EasyReview is a web-based tool designed to facilitate the review and quality control of datasets, ensuring that high-quality data and metadata are published.
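To make the idea of automated metadata extraction more concrete, here is a minimal Python sketch that uses only the standard library. It is not Harvester-Curator's implementation or schema: the functions `harvest_metadata` and `summarize_by_format` and the recorded fields (relative path, file format, size, SHA-256 checksum) are hypothetical choices that illustrate how basic file-level metadata can be collected without manual entry.

```python
import hashlib
from pathlib import Path


def harvest_metadata(root: str) -> list[dict]:
    """Collect basic, automatically derivable metadata for every file under `root`.

    Hypothetical illustration: these fields are assumptions, not the schema
    used by Harvester-Curator or Dataverse.
    """
    base = Path(root)
    records = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue  # skip directories, keep only files
        data = path.read_bytes()
        records.append({
            "path": path.relative_to(base).as_posix(),
            "format": path.suffix.lstrip(".").lower() or "unknown",
            "size_bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return records


def summarize_by_format(records: list[dict]) -> dict[str, int]:
    """Count harvested files per format, e.g. as input for a curation step."""
    counts: dict[str, int] = {}
    for record in records:
        counts[record["format"]] = counts.get(record["format"], 0) + 1
    return counts
```

For example, `harvest_metadata("simulation_output")` (a hypothetical directory) returns one record per file. A real curation step would then map such records onto the metadata schema of the target repository rather than keep this ad hoc format.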

These tools are integrated into a workflow that begins with locally produced datasets, which are then annotated with metadata and uploaded to a data repository. Before publication, datasets undergo a review process to ensure they meet quality standards. This systematic approach not only simplifies the management of research data but also ensures that the data is reliable and reusable.
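The workflow described above can be sketched as a small state machine in Python. This is a hypothetical model, not the API of EasyDataverse, EasyReview, or DaRUS: the `Dataset` class, the stage names, and the required metadata fields are illustrative assumptions. It shows the ordering the workflow enforces: a dataset must be annotated before upload, and only a dataset that passes review is published.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages of a dataset in the RDM workflow."""
    LOCAL = "local"
    ANNOTATED = "annotated"
    UPLOADED = "uploaded"
    IN_REVIEW = "in_review"
    PUBLISHED = "published"


# Assumed minimal metadata requirement; real repositories define their own schemas.
REQUIRED_FIELDS = ("title", "author", "description")


@dataclass
class Dataset:
    name: str
    metadata: dict = field(default_factory=dict)
    stage: Stage = Stage.LOCAL
    review_notes: list = field(default_factory=list)

    def _require(self, *allowed: Stage) -> None:
        # Reject any transition that would skip a step of the workflow.
        if self.stage not in allowed:
            raise RuntimeError(f"{self.name}: action not allowed in stage '{self.stage.value}'")

    def annotate(self, **entries: str) -> None:
        # Metadata can be added locally or revised in the repository draft.
        self._require(Stage.LOCAL, Stage.ANNOTATED, Stage.UPLOADED)
        self.metadata.update(entries)
        if self.stage is Stage.LOCAL:
            self.stage = Stage.ANNOTATED

    def upload(self) -> None:
        # Upload only datasets whose required metadata fields are filled in.
        self._require(Stage.ANNOTATED)
        missing = [f for f in REQUIRED_FIELDS if not self.metadata.get(f)]
        if missing:
            raise ValueError(f"{self.name}: missing metadata {missing}")
        self.stage = Stage.UPLOADED

    def submit_for_review(self) -> None:
        self._require(Stage.UPLOADED)
        self.stage = Stage.IN_REVIEW

    def review(self, approved: bool, note: str = "") -> None:
        # Approval publishes the dataset; rejection returns it to the draft stage.
        self._require(Stage.IN_REVIEW)
        if note:
            self.review_notes.append(note)
        self.stage = Stage.PUBLISHED if approved else Stage.UPLOADED


ds = Dataset("pde-benchmark-run")  # hypothetical dataset name
ds.annotate(title="Heat equation runs", author="Doe, J.", description="Solver output")
ds.upload()
ds.submit_for_review()
ds.review(approved=True)
print(ds.stage.value)  # published
```

Modeling the stages explicitly makes invalid shortcuts, such as publishing an unreviewed dataset, fail loudly. A rejected review sends the dataset back to the uploaded draft stage so its metadata can be revised before resubmission.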

SimTech’s efforts are exemplified in various domain applications. The EnzymeML Platform, for instance, integrates biocatalytic data with tools for data acquisition, analysis, and modeling, making data management scalable and reproducible. Another example is the PDEBench repository, which provides benchmarks for scientific machine learning based on partial differential equations, using tools like EasyDataverse and DVUploader to manage large datasets efficiently.

For SimTech, efficient RDM is particularly vital because simulation science is inherently interdisciplinary. Integrating data from domains ranging from biology and chemistry to engineering and physics requires robust data management solutions. SimTech's dedicated RDM infrastructure and tools help its researchers adhere to the FAIR principles and make their computational results more reproducible and reliable. Beyond improving the efficiency of SimTech's own research, these solutions set a benchmark for other research institutions seeking to strengthen their RDM capabilities.

