WP5 wants to improve the entire data management process

Adil Hasan (UNINETT Sigma2), EOSC-Nordic WP5 Leader

Have you ever played the game 20 Questions? One person thinks of an object, which may be an animal, vegetable or mineral, and the others try to work out what it is by asking up to 20 questions, to which the person can only answer “yes” or “no.” Usually you start with general questions such as “Can it fly?” The questions become more and more specific as you narrow down the possibilities until, hopefully, you correctly identify the object.

Practically all research proceeds in this manner. The early projects studying a phenomenon ask fairly general questions, which produce fairly coarse data. New tools are then developed or adopted that let researchers in later projects ask more specific questions, which generate more detailed data. Early projects that produce little data can get by with ad hoc or manual data management procedures, but as the research becomes more complex and the amount of data grows, managing the data becomes a significant task. If this is left unaddressed, researchers end up spending most of their time managing their data, with little time left for actual research.

For research areas about to embark on more detailed investigations that will generate a significant increase in the amount of data to be studied, data management can be bewildering. However, all is not lost. More mature research areas have already tackled some of the data management problems now faced by younger ones, and their needs have produced tools that less mature areas could use. The problem is that the data management solutions built on these tools are often tailored to a particular research area, which makes it difficult to see how they, or parts of them, could be applied to a different area.

One of EOSC-Nordic’s aims is to see how existing tools, some of which are already available as EOSC services, can be used to improve the data management process in the candidate research areas of biodiversity, climate and natural language. In WP5, we hope to capture this process in the data management plan, so that the process can be driven from the plan itself. We view the data management plan as a description of how the data will be managed throughout the project. We are working on adopting the Research Data Alliance (RDA) machine-actionable data management plan (maDMP) schema, so that data management services such as storage can be provisioned automatically. Along with other groups, we in WP5 are actively involved in the adoption of the schema and have contributed to its updates. A highlight of the recent maDMP hackathon was exporting a plan from our pilot DMP tool, easyDMP, into another tool, DMPonline. This demonstrates that the plan is, to some degree, independent of the tool used to create it. The final result of the RDA working group will allow tools to evolve without the risk that existing plans can no longer be read.
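To make the idea of a plan driving service provisioning more concrete, here is a minimal Python sketch. It assumes a trimmed-down plan fragment loosely following the nesting used by the RDA maDMP schema (`dmp` → `dataset` → `distribution` → `host`, with a `byte_size` per distribution); the required top-level fields of the real schema are omitted, and the project title, sizes and host URLs are invented for illustration. The hypothetical function `storage_requests` simply adds up the planned data volume per storage host, which is the kind of information a provisioning service could act on. It is not how easyDMP or any EOSC service actually works.

```python
import json
from collections import defaultdict

# Hypothetical plan fragment, loosely following the maDMP nesting
# (dmp -> dataset -> distribution -> host). All values are invented.
EXAMPLE_MADMP = """
{
  "dmp": {
    "title": "Example biodiversity project",
    "dataset": [
      {
        "title": "Species occurrence records",
        "distribution": [
          {"title": "Raw observations", "byte_size": 500000000000,
           "host": {"title": "Example storage", "url": "https://storage.example.org"}}
        ]
      },
      {
        "title": "Processed maps",
        "distribution": [
          {"title": "Map tiles", "byte_size": 200000000000,
           "host": {"title": "Example storage", "url": "https://storage.example.org"}},
          {"title": "Public release", "byte_size": 50000000000,
           "host": {"title": "Example archive", "url": "https://archive.example.org"}}
        ]
      }
    ]
  }
}
"""


def storage_requests(plan: dict) -> dict:
    """Sum the planned byte_size of every distribution, grouped by host URL."""
    totals = defaultdict(int)
    for dataset in plan["dmp"].get("dataset", []):
        for dist in dataset.get("distribution", []):
            host = dist.get("host")
            size = dist.get("byte_size")
            if host is None or size is None:
                continue  # no host or size given, so nothing to provision
            totals[host["url"]] += size
    return dict(totals)


if __name__ == "__main__":
    plan = json.loads(EXAMPLE_MADMP)
    for url, size in storage_requests(plan).items():
        # A real provisioning service would call the storage provider here;
        # this sketch only prints the request it would make.
        print(f"request {size / 1e9:.0f} GB at {url}")
```

Because the sketch only reads fields from the plan, the same logic would work whichever DMP tool produced the file, which is the point of a tool-independent, machine-actionable plan.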

In WP5 we are also investigating open standards, such as the Common Workflow Language (CWL), to capture aspects of the data management process and incorporate them into the data management plan. By capturing enough information, it should be possible to replay any part of the data management process. This would make the data easier to verify and contribute to its reusability, the ‘R’ in the FAIR principles.

Our dream is to be able to drive the entire data management process through the data management plan. This may not be entirely achievable within the lifetime of EOSC-Nordic: there is a lot of work to do, and we also rely on developments that are only now coming into focus in other projects. Still, we hope that our efforts within EOSC-Nordic make it easier for researchers to improve their data management processes, so that they can spend more time on research and less on data management.