Networking data islands
Exchanging, comparing, reusing: with research data, all of this is often difficult. The data tend to be scattered, hidden or not comparable. The National Research Data Infrastructure (NFDI) is set to change this.
In science, nothing works without data. They are the raw material for theories and insights. However, not all scientists have equal access to the mountain of data that grows by the day. This is because the data are distributed across many places, as if on isolated islands that are out of reach for many. Access is often available only to the institutes or working groups that collected the valuable raw material and stored it on their islands.
This is not due to a lack of willingness to share and exchange, but mainly due to the lack of connections between the islands – the lack of pipelines for raw materials, so to speak. “For us researchers, this is a big disadvantage: it prevents us from comparing and merging data. This delays or prevents new findings,” says bioinformatician Prof. Björn Usadel, Director at the Institute of Bio- and Geosciences (IBG 4). Sometimes scientists even repeat the same or similar experiments because they cannot access the data from another island or do not know about it at all.
A network of data pipelines alone is not enough to counteract this, however, because the collected data must be not only findable and accessible, but also comparable. That is often precisely the difficult part: “It starts with the details,” says Torsten Bronger from the Jülich Central Library. The research data manager supports scientists in their work with data: “A doctoral researcher wants to document her experiments. Should she create one file per attempt or several? What data does her measuring device provide? What information does she store? How does she name the files? Which file format does she choose? Does she store the data on a drive or in a database? Such things can be handled very differently by different institutes.” As a consequence, scientists from island A all too often cannot do anything with the raw material from island B, let alone discover it in the first place.
The National Research Data Infrastructure, NFDI for short, aims to change all that. Researchers from each discipline are to join forces in a consortium to build up this infrastructure. In this way, the isolated islands within a discipline are to be networked into island groups – of plant research or engineering sciences, for instance. Since there are no blanket solutions for the acquisition, storage and provision of raw materials, the members of each island group are to develop common standards. These include, for example, uniform rules for the preparation of data, their quality and their protection, as well as rules for the metadata. “Among other things, metadata describe the conditions under which an experiment was carried out, such as the measurement duration, the prevailing temperature or the air pressure. Without this information, the actual measurement data can hardly be compared,” says Björn Usadel. He is co-spokesperson of DataPLANT, the NFDI consortium that aims to strengthen data sharing and collaboration in plant sciences.
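What such a metadata rule might look like in practice can be sketched in a few lines of Python. This is only an illustration, not an NFDI or DataPLANT standard: the field names, the units and the one-degree tolerance are all assumptions made up for the example. The sketch records the three conditions named above and treats two measurements as comparable only if neither record has gaps.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    """Conditions of one experiment (hypothetical field names and units)."""
    duration_s: Optional[float] = None      # measurement duration in seconds
    temperature_c: Optional[float] = None   # prevailing temperature in deg C
    pressure_hpa: Optional[float] = None    # air pressure in hectopascal


def missing_fields(meta: Metadata) -> list[str]:
    """Return the names of all conditions that were not recorded."""
    return [name for name, value in vars(meta).items() if value is None]


def comparable(a: Metadata, b: Metadata, temp_tol_c: float = 1.0) -> bool:
    """Illustrative rule: both records must be complete, and the
    temperatures may differ by at most temp_tol_c (an arbitrary choice)."""
    if missing_fields(a) or missing_fields(b):
        return False
    return abs(a.temperature_c - b.temperature_c) <= temp_tol_c


# Island A documented everything; island B did not record the air pressure.
island_a = Metadata(duration_s=3600.0, temperature_c=21.5, pressure_hpa=1013.0)
island_b = Metadata(duration_s=3600.0, temperature_c=22.0)

print(missing_fields(island_b))
print(comparable(island_a, island_b))
```

In this toy setup, the incomplete record from island B cannot be compared with the one from island A, which is exactly the situation common standards are meant to prevent.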
In some disciplines, approaches to such standardization already exist. These are to be integrated into the NFDI. In the end, data are expected to be available quickly and easily, with pipeline networks accessible worldwide and even linked across disciplines. That way, information from a wide range of fields can be brought to bear on global challenges such as climate change. Torsten Bronger hopes for even more: “In the optimal case, the data will finally be so comprehensibly and completely organized, findable and accessible that even uses will become possible that one would not have dreamed of.”
FAIR data management
The “FAIR principles” are to be applied in the NFDI: data are to be handled in such a way that they are
- Findable,
- Accessible,
- Interoperable and
- Reusable.
Consortia with FZJ participation:
- NFDI4Ing – engineering and materials science
- DataPLANT – plant research
- DAPHNE4NFDI – neutron and photon users from different disciplines
- PUNCH4NFDI – particle, nuclear and astrophysics
- NFDI-MatWerk – materials science and engineering
- NFDI4Earth – Earth system sciences
- TEXT+ – text-based and language-based research data
- FAIRmat – condensed-matter physics and the chemical physics of solids
- NFDI4Microbiota – microbiology
On the recommendation of the German Council for Scientific Information Infrastructures (RfII), the Joint Science Conference of the German federal government and the state governments decided in 2018 to establish a National Research Data Infrastructure (NFDI). It will consist of up to 30 specialist consortia, the selection and review of which will be coordinated by the German Research Foundation (DFG). The federal and state governments will provide up to €90 million annually until 2028 for the development of the NFDI. After two of a total of three rounds of calls, 19 consortia have been selected, nine of them with Jülich participation. The federal government and the states founded the NFDI Association in October 2020 in order to coordinate the activities within the NFDI.
Photos: Forschungszentrum Jülich/Ralf-Uwe Limbach, Raimund Knauf, Illustration: SeitenPlan