As a result of the EXC 2075 cluster’s focus on data-integrated simulation science, research data management (RDM) is becoming increasingly important. With this in mind, the Research Data Management Committee has been set up. It brings together the expertise of apl. Prof. Dr. Jürgen Pleiss, Prof. Dr. Melanie Herschel and apl. Prof. Dr. Bernd Flemisch so as to cover both sides of the topic: where does the data come from, and how can it be used in the various research communities?
Why is research data management actually necessary?
To make data reusable, research data management requires databases that contain not only the actual research findings but also a description of the experiment or simulation conditions under which the data was generated (metadata). In addition, standardized data exchange formats are necessary for sharing data and metadata between research groups. But the question of which (meta)data is collected, and how, concerns all scientific fields, not just the natural and engineering sciences. “The challenge facing us is that scientific data has to be reproducible when published. However, data that is key to ensuring reproducibility is often missing or not adequately documented. It’s not a purely technical problem – it’s a scientific one too. So the question we’re looking at on the committee is how data can be documented and published in a way that ensures complete reproducibility,” says Jürgen Pleiss, explaining the committee’s goals and tasks.
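The distinction between findings and metadata can be made concrete with a minimal sketch. It assumes a hypothetical record type, `DatasetRecord`, and uses JSON as a stand-in for a standardized exchange format. The field names, the list of required metadata and all values are invented for illustration and do not reflect any schema used by the committee, DaRUS or SimTech.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    """Hypothetical record: findings bundled with the metadata describing their origin."""
    title: str
    method: str        # e.g. "simulation" or "experiment"
    conditions: dict   # metadata: conditions under which the data was generated
    results: dict      # the actual research findings

    def missing_metadata(self, required):
        """Return the required condition keys that have not been documented."""
        return [key for key in required if key not in self.conditions]

    def to_json(self):
        """Serialize the record to JSON, standing in for an exchange format."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only; nothing here is real research data.
record = DatasetRecord(
    title="Example run",
    method="simulation",
    conditions={"temperature_K": 298.15, "solver": "hypothetical-solver 1.0"},
    results={"energy_kJ_per_mol": -12.3},
)

# Assumed list of metadata a group might require before publication.
REQUIRED = ["temperature_K", "solver", "random_seed"]
missing = record.missing_metadata(REQUIRED)

# Another group can rebuild the same record from the exchanged JSON text.
restored = DatasetRecord(**json.loads(record.to_json()))
```

Here `missing` flags the undocumented `random_seed`, which is the kind of gap in documentation that stands in the way of reproducibility.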
RDM measures being taken on various levels
The comprehensive nature of RDM calls for measures on various levels. At the university level, the Competence Center for Research Data Management (FoKUS) provides advice on research data management and runs the University of Stuttgart Data Repository (DaRUS). At the federal-state level, the “bwDataFederation” is establishing a data infrastructure, and at the national level SimTech is an active participant in the DFG’s NFDI (National Research Data Infrastructure) program. In addition, as we reported recently, SimTech has created two positions to provide support in this area: Sibylle Hermann, the new Data and Software Steward, is responsible for designing SimTech’s RDM strategy, and Research Software Engineer Ralf Diestelkämper has joined the cluster to assist with tool implementation. Finally, a committee has also been set up to deal with RDM issues.
Special Interest Group on Data Infrastructure (SIGDIUS) created
The committee deals specifically with RDM issues that arise in the field of simulation science. In essence, it examines how data from simulations can be captured, stored and made accessible. The committee’s work has led to the formation of a special interest group on data infrastructure (known by the acronym SIGDIUS). The group is intended as a forum for anyone looking to set up or enhance an RDM infrastructure at working-group or institutional level. The framework for the forum is provided by the monthly SIGDIUS seminars, which have been running since April 2019. They take place on the first Wednesday of the month and are an opportunity both to learn from in-house and external experts and to discuss experiences with others. The focus is thus very much on dialogue and networking, looking at questions such as, “What are the needs of the different disciplines?”, “What’s happening elsewhere?”, “What problems are there?” and “How can we identify needs and implement tools already used in other disciplines?”
Speakers at past seminars have included Wolfgang Wachter from the DFG, who gave a presentation on “Research data infrastructure in physics and chemistry”, and Frank Tristram from the Karlsruhe Institute of Technology (KIT) on “Bringing excellent data practices into a Cluster of Excellence: DFG rules meet software development”.
Variety of perspectives to be taken into account
The titles of these presentations are another clear indication of the need for RDM to incorporate a wide variety of perspectives – from infrastructure through to the definition of general standards. In the measures described above, consideration is now being given to how these standards need to be defined – particularly in terms of discipline-specific requirements. As a result, development of methods, standards and tools for RDM is currently very dynamic and there is a high degree of fragmentation between the various disciplines’ processes, standards and tools for data capture. This can be seen, for example, in tools that are key to systematic capture of experimental data: Electronic Laboratory Notebooks (ELNs) and Laboratory Information Management Systems (LIMS).
“There are currently more than 300 different LIMS/ELN products on the market. They attempt to meet the requirements of a wide range of users as well as offering very tailored solutions for specific disciplines. That makes it difficult to choose the right RDM strategy for the case in hand,” says Jürgen Pleiss, highlighting the current problems in the RDM field.
So there’s still a lot to do. SIGDIUS and the committee are doing their part and there will be more seminars on various topics in 2020 as well. A presentation by Ulrike Wittig from HITS in Heidelberg, on “Research data management by SEEK/FAIRDOMHub”, will start the ball rolling in February and we will keep you posted about further events on our website.
February 05, 2020, Ulrike Wittig (HITS, Heidelberg): “Research data management by SEEK/FAIRDOMHub”