In a face-to-face workshop at the end of July, SimTech's Research Data Software Engineer Jan Range and Fabian Zills from the Institute for Computational Physics introduced participants to Data Version Control and ZnTrack.
Data Version Control (DVC for short) is a research data management tool for setting up workflows and reproducing them from any past stage. What sets it apart from other tools is that it not only captures system-relevant parameters, source code and results, but also integrates seamlessly into a Git environment. This allows users to reproduce results from any point in a project's history. With Data Version Control and its graphical user interface, Iterative Studio, simulations and machine learning experiments can be evaluated in a single place. The data collected in this way can also be published directly on DaRUS using PyDaRUS.
In addition to a general introduction to the topic, the workshop focused on the practical implementation of a machine learning project with Data Version Control and on helping participants set up their own DVC workflows. Fabian Zills also introduced the participants to ZnTrack, a library he developed that provides a Python interface to Data Version Control and makes it easier to use. ZnTrack is an easy-to-use package for tracking parameters and creating computational graphs for Python projects.
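To make the idea of parameter tracking and computational graphs more concrete, the following is a minimal sketch in plain Python. It is not the API of DVC or ZnTrack: the `Stage` class, its methods and the two example functions are hypothetical names chosen for illustration. The sketch only shows the general principle of fingerprinting each stage's parameters and upstream stages, and rerunning a stage only when that fingerprint changes.

```python
import hashlib
import json


class Stage:
    """Hypothetical graph node: a function, its parameters, and its upstream stages."""

    def __init__(self, name, func, params=None, deps=()):
        self.name = name
        self.func = func
        self.params = dict(params or {})
        self.deps = list(deps)
        self.result = None
        self.run_count = 0      # how often the function was actually executed
        self._last_key = None   # fingerprint recorded at the last execution

    def key(self):
        """Fingerprint of this stage's parameters and of all upstream stages."""
        payload = {
            "name": self.name,
            "params": self.params,
            "deps": [dep.key() for dep in self.deps],
        }
        encoded = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()

    def reproduce(self):
        """Reproduce upstream stages first, then rerun only if the fingerprint changed."""
        inputs = [dep.reproduce() for dep in self.deps]
        current = self.key()
        if current != self._last_key:
            self.result = self.func(*inputs, **self.params)
            self._last_key = current
            self.run_count += 1
        return self.result


def load_data(n):
    """Stand-in for a data loading step."""
    return list(range(n))


def scale_data(data, factor):
    """Stand-in for a processing step that depends on the loaded data."""
    return [x * factor for x in data]


# A two-stage graph: "scale" depends on "load".
load = Stage("load", load_data, params={"n": 5})
scale = Stage("scale", scale_data, params={"factor": 2}, deps=[load])

first = scale.reproduce()    # both stages run
second = scale.reproduce()   # nothing changed, so nothing reruns
scale.params["factor"] = 3   # change one tracked parameter
third = scale.reproduce()    # only the "scale" stage reruns
```

In this sketch, changing `factor` leaves the `load` stage untouched, while changing `n` would invalidate both stages, because the fingerprint of `scale` includes the fingerprint of `load`. Real tools additionally persist such fingerprints and track files on disk, which this sketch leaves out.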
"The workshop went great. The participants were able to easily find their way into the concept, especially through practical applications. In the final discussion, we talked about how the concept can be applied to any problems in SimTech," says Jan Range.