Each repository requires a unique approach
There is no straightforward way to harvest metadata into a data portal such as the European EUDAT B2FIND, simply because every repository is different.
That was one of the key findings of WP5.1, which examined the re-use of community-specific data in EOSC.
“The goal was actually to make a sort of cookbook or a how-to-do guide on how to harvest metadata into B2FIND, but that was not really possible. There will always be a need for adjustments because every repository is different,” says Hannah Mihai, Data Management consultant at DeiC in Denmark and part of WP5.
One of the EOSC-Nordic use cases that WP5 is responsible for is a collaboration between archaeologists in Denmark and Norway, who are integrating two national databases to make it easier for scientists to do research across borders and thereby expand the research possibilities.
Harvesting from two databases
First, Aarhus University (AU) in Denmark made 205,935 datasets of archaeological excavation data discoverable through the B2FIND portal. The archaeologists at AU made the Danish database “Fund og Fortidsminder” harvestable via the OAI-PMH protocol. Afterward, technical staff at Deutsches Klimarechenzentrum (DKRZ) harvested all the data and made it available through the B2FIND service. And just this month, the Norwegian database “Askeladden” also successfully had 181,869 datasets harvested into B2FIND.
“Culture does not respect modern state boundaries in a globalized world. But also modern history, pre-modern history, and even in prehistory, people travelled, married, traded, followed migrating herds, or had conflicts with their neighbours. Thus new ideas, ways of doing things, materials and moveable artefacts spread from one place to the other – by land, by sea, by air, and recently: by digital technologies,” the WP5.1 report states.
Once data from the two databases are integrated, researchers will need to look in only one place, even when their research crosses borders.
WP5.1 originally set out to write a “cookbook” or how-to guide for harvesting metadata into a portal, based on the experience of harvesting the two national databases into B2FIND. The group did end up producing one, but it should be read as a general guideline rather than a detailed step-by-step instruction, because all repositories are different and an individual approach is needed in most cases.
“It is necessary to develop a unique approach to each repository,” says Hannah Mihai.
Control your metadata
Another key finding of WP5.1 was the importance of having full control of one’s metadata. That means following the standards of the community so that the correct vocabulary, titles and terms are used.
“It is very important to have a good overview and control over your metadata, because the better you know how your metadata looks like and the better organized it is, the easier it is to get it harvested into one of these portals,” says Hannah Mihai.
Every portal is different, and so is every repository or library the metadata comes from. WP5.1 also concluded that although it is always a good idea to follow the standards of your community, the communities have often not agreed on any. To improve, they need to settle on a common vocabulary.
WP5.1 also examined how FAIR the metadata was when stored in the national repositories compared with how FAIR it was after being harvested. The result showed that the FAIR score of the national repository has a lot of room for improvement.
“There is a small but noticeable increase in the FAIR score of the metadata due to the harvesting from ‘Fund og Fortidsminder’ into B2FIND, so by harvesting the metadata into B2FIND, we are taking a step in the right direction,” explains Hannah Mihai.