FAIRification STEP 5 webinar: Value and limitations of FAIR assessment tools
On 8 February 2022, the EOSC-Nordic project team organized an exciting and successful FAIRification Step 5 webinar on the subject of FAIR Evaluators. This webinar was part of the planned activities under work package 4. It aimed to guide and assist the Nordic and Baltic repositories in increasing the “FAIRness” of their metadata and datasets. The event attracted about 120 data repository representatives across the region and beyond.
The webinar was the fifth in a series of FAIRification steps and focused on THE VALUE and LIMITATIONS OF FAIR EVALUATORS. While the earlier steps focused mainly on improving the repositories’ published metadata, Step 5 concentrated on FAIR evaluators: tools that “score” the FAIRness of a repository’s metadata by running a set of tests derived from the FAIR Principles. Based on these tests, the evaluators report a score and give the repositories concrete guidance on how to improve their FAIRness.
For a proper evaluation, the repository’s metadata must be machine-actionable, so that a machine agent can find, interpret, and process it (for instance, on the repository’s landing page). The FAIR Principles set out guidelines for making data FAIR and stress the importance of enriching datasets with clear, machine-actionable metadata. For more information on all 15 FAIR Principles, please visit the GO FAIR Initiative web page on the FAIR Principles.
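The following sketch makes “machine-actionable metadata on the landing page” more concrete. It assumes one common approach that the webinar did not prescribe: a schema.org `Dataset` description embedded as JSON-LD in a `<script type="application/ld+json">` element. Using only the Python standard library, it shows how a machine agent could extract that description from a page. The page, the DOI, and the function and class names are hypothetical. Real evaluators use their own harvesting logic.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return every embedded JSON-LD block found in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical landing page with an embedded schema.org Dataset description.
LANDING_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "@id": "https://doi.org/10.1234/example",
 "name": "Example survey data",
 "license": "https://creativecommons.org/licenses/by/4.0/"}
</script>
</head><body>Human-readable description only.</body></html>
"""

records = extract_jsonld(LANDING_PAGE)
dataset = records[0]
print(dataset["@type"], dataset["license"])
```

A human reader sees only the page body. A machine agent can read the typed `@type`, `@id`, and `license` fields directly, which is the kind of information the evaluators test for.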
The Step 5 webinar gave an interesting view of the value of these evaluators, with several examples of how repositories could improve their FAIRness level. It also made clear that multiple evaluators exist, that they do not necessarily give the same results, and that testing against concepts such as “community standards” is not easy. The challenge now is to work towards sharper criteria and convergence in defining, articulating, and measuring the different FAIR components and metrics, so that different evaluators give more or less similar scores. The webinar also showed that the a priori use of community standards, templates, vocabularies, and ontologies, collectively described as the “FAIR at the source” process, is seen as a much easier route than curating existing (meta)data sets.
The webinar further demonstrated that organizing a METADATA for MACHINES WORKSHOP (M4M) and/or defining a FAIR IMPLEMENTATION PROFILE (FIP) for a community or domain can be an excellent way to define and publish the implementation choices and enabling resources of that community.
Regarding the FAIR assessment of repositories in the Nordic and Baltic countries, the project team has followed a step-by-step process over the last two years to increase FAIR uptake in the region, hosting the following series of events:
April 2020 – First assessment hackathon – Initial exercise
November 2020 – Webinar Step 1 – Focus on PID
February 2021 – Webinar Step 2 – Focus on the split between Data and Metadata
April 2021 – Webinar Step 3 – Focus on Generic Metadata
October 2021 – Webinar Step 4 – Focus on Domain-Specific Metadata
February 2022 – Webinar Step 5 – Value and Limitations of FAIR evaluators
Summary of the webinar Step 5
The event featured several experts who gave the audience an overview of the subject from different angles, thereby articulating valuable conclusions, takeaways, and recommendations.
The following summary gives the main takeaways from each presentation. The presentations are in our Knowledge Hub; the links are in the material section of this article.
Introduction by Bert Meerman from GFF
- FAIR DATA is not the same as “OPEN and FREE DATA.”
- DATA needs to stay fully under the control of the DATA-OWNERS
- Communities play an important role in defining, publishing, and sharing metadata schemas and FAIR implementation choices.
- Machine readability/actionability is crucial: the machine has to be able to interpret and understand the meaning.
- Community implementation choices can be published in a FAIR Implementation Profile (FIP).
- Encourage Metadata for Machines (M4M) workshops for communities to define their metadata schemas and templates.
FAIR Principles, Interpretations, Implementation Considerations, Evaluation, Certification, and Convergence by Erik Schultes from GO FAIR
- FAIR Principles classification: Infrastructure / Technology vs. domain-driven community choices.
- Use “nanopublications” for machine-readability.
- The abundance of evaluators and the inconsistency in scoring mechanisms lead to divergence. Convergence can be achieved by community consensus.
- The RDA FAIR Data Maturity Model and FAIR Implementation Profiles (FIPs) as a starting point.
- Using the FIP Wizard to build and qualify the FIP for a community.
- Funders are demanding “FAIR at the source,” driving convergence and offering domain-relevant maDMPs (machine-actionable Data Management Plans).
How testing guides improvement in FAIRness by Mark Wilkinson from the Technical University of Madrid
- When using automated evaluators, differences in FAIR interpretations and scoring mechanisms lead to different outcomes.
- “R” (Reusable) tests are community-centric and therefore complex to perform, but recently more communities have been defining clear criteria that can be tested.
- Testing brings you to the “root cause” of the problem, giving the repository clues/guidance and creating room for improvements.
- FAIR evaluation is extremely useful in guiding the path to FAIRness.
- Plans exist for web services, links to FIPs, and a connection to the FAIRsharing API.
F-UJI FAIR Evaluator Capabilities and Limitations of FAIR assessments by Robert Huber from the University of Bremen/Pangaea
- F-UJI is an automated FAIR Data assessment tool.
- The F-UJI Tool performs a practical test against a specific metric derived from the FAIR principles.
- Differences in results derived by different versions of the F-UJI Tool are due to differences in metrics, tests, and software.
- There is a lack of standard samples and calibration procedures.
- Changing samples and/or test procedures may lead to inconsistent results.
- Connectivity and stability of third-party services (e.g., DataCite) are essential for reliable testing.
Real-world experience with evaluating repositories by Hannah Mihai from DeiC and Tuomas J. Alaterä from FSD
- A sample of about 100 repositories, with 10 datasets tested per repository.
- DataCite metadata adds FAIR value.
- Over time, a slight increase in scores was recorded, but changes in evaluator software versions could distort the results.
- For FSD, being part of EOSC-Nordic has provided a good framework for internal FAIRification.
- Key issues that resulted in low evaluation scores: no Linked Data declared, no machine-readable metadata declared, and no machine-readable license information declared.
- Focus on valid and rich metadata is highly recommended.
- Basic examples: Embedded JSON, Enriched Dublin Core, Typed Links, Signposting, Vocabularies, Ontologies.
- Do not focus on a 100% score.
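Several of the basic improvements listed above can be exposed as typed links. Signposting, for example, uses HTTP `Link` headers (RFC 8288) with relation types such as `cite-as`, `describedby`, and `license`. The Python sketch below checks a `Link` header for those three relations. This loosely mirrors the missing-metadata and missing-license issues listed above. It is a simplified illustration, not a description of how any particular evaluator tests Signposting. The header values and function names are hypothetical, and the parser ignores edge cases such as commas inside URIs or quoted parameters.

```python
import re

# A link is "<target>" followed by its parameters, up to the next comma.
# Simplification: commas inside URIs or quoted parameter values are not handled.
LINK_RE = re.compile(r"<([^>]*)>([^,]*)")
REL_RE = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?')

# Relation types checked by this sketch and the issue each one addresses.
REQUIRED = {
    "cite-as": "persistent identifier",
    "describedby": "machine-readable metadata",
    "license": "machine-readable license",
}


def parse_link_header(value):
    """Map each relation type in an HTTP Link header to its target URIs."""
    links = {}
    for target, params in LINK_RE.findall(value):
        rel_match = REL_RE.search(params)
        if rel_match:
            # A single rel parameter may list several space-separated types.
            for rel in rel_match.group(1).split():
                links.setdefault(rel, []).append(target)
    return links


def missing_signposts(link_header):
    """Return the issues whose relation type is absent from the header."""
    links = parse_link_header(link_header)
    return [issue for rel, issue in REQUIRED.items() if rel not in links]


# Hypothetical Link header returned by a dataset landing page.
header = (
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '<https://repo.example.org/meta/123.jsonld>; rel="describedby"; '
    'type="application/ld+json"'
)
print(missing_signposts(header))  # → ['machine-readable license']
```

A check like this shows the repository which link is missing rather than producing a single score, in line with the advice not to chase a 100% score.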
CEDAR: Promoting FAIRness at the Sources by Mark Musen from Stanford University
- Systems to evaluate data FAIRness have had difficulty finding an audience.
- FAIR principles depend on community standards that are not objectively computable.
- Metadata in public repositories is a mess!
- If we want to have FAIR data, we need good metadata. Good metadata needs Ontologies, Reporting guidelines, Technology, and Procedures.
- Don’t even try to measure FAIRness. Make data FAIR from the beginning!
- Online data will never be FAIR:
  - Until we standardize metadata structure using common templates
  - Until we can fill in those templates with controlled terms whenever possible
  - Until we create technology that will make it easy for investigators to annotate their datasets in standardized, searchable ways
  - Until we recognize the importance of creating FAIR data from the very beginning
Q&A Session
Josefine Nordling from CSC moderated the Q&A session. The following gives the highlights of the discussion. To see the entire Q&A session, please see the recording on our YouTube channel.
- Focus on FAIR assistance, rather than on FAIR assessment
- There should be different evaluators for different purposes/communities. Domains should select their domain-specific evaluators.
- A comparison of several FAIRness evaluators showed a lack of comparability between them. After weighting all individual tests and metrics equally, somewhat comparable results were found for individual datasets.
- It is “easier” to attain a high-level FAIRness for your dataset if you belong to a discipline/community with advanced repositories.
- The more FAIR-enabling features a repository provides, the easier it is to achieve relatively high scores.
- High-quality metadata is necessary for FAIR (especially for I and R). This does not come “free” for anyone, but it is essential for the FAIR ecosystem. We (especially researchers) need tools that gather the metadata from the beginning of the data lifecycle without extra effort.
- There is a need for a “Citation Tracking System.”
- Try to integrate CEDAR with, for instance, Dataverse.
- Reuse EXISTING, AGREED STANDARDS as much as possible, even though there is a tendency to create our own DDI dialects.
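The comparison mentioned in the Q&A weighted every individual test equally. The minimal Python sketch below illustrates that idea. Each evaluator's results are reduced to the fraction of tests passed, ignoring the evaluator's own weights, and are optionally broken down by F, A, I, and R. The test identifiers and pass/fail results are invented for illustration. They do not come from any real evaluator or from the comparison discussed in the session.

```python
from statistics import mean


def equal_weight_score(results):
    """Fraction of tests passed, counting every test once regardless of the
    points or weights the evaluator itself assigns."""
    return mean(1.0 if passed else 0.0 for passed in results.values())


def per_principle(results):
    """Equal-weight score per FAIR letter, taken from the first character
    of each (hypothetical) test id."""
    groups = {}
    for test_id, passed in results.items():
        groups.setdefault(test_id[0], []).append(1.0 if passed else 0.0)
    return {letter: mean(scores) for letter, scores in sorted(groups.items())}


# Invented results for one dataset from two hypothetical evaluators
# that run different numbers of tests.
evaluator_a = {"F1": True, "F2": True, "F3": False,
               "A1": True, "I1": False, "R1": False}
evaluator_b = {"F1": True, "F4": True, "A1": True, "A2": False,
               "I1": False, "I2": False, "R1": True, "R2": False}

for name, results in [("A", evaluator_a), ("B", evaluator_b)]:
    print(name, round(equal_weight_score(results), 2), per_principle(results))
```

With equal weighting, the two totals can be compared on the same scale. The per-principle breakdown shows, however, that the same overall score can hide quite different F, A, I, and R profiles.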
Material
- Introduction, Bert Meerman
- FAIR Principles, Interpretations, Implementation Considerations, Evaluation, Certification, and Convergence, Erik Schultes
- How testing guides improvement in FAIRness, Mark Wilkinson
- F-UJI FAIR Evaluator Capabilities and Limitations of FAIR assessments, Robert Huber
- Real-world experience with evaluating repositories, Hannah Mihai and Tuomas J. Alaterä
- CEDAR: Promoting FAIRness at the Sources, Mark Musen
Further support measures and webinars on FAIR
We are more than happy to receive feedback and questions from the research community, so please reach out to us.
Author: Bert Meerman, Director GFF, EOSC-Nordic WP4 member and Task-leader WP 4.1.3. firstname.lastname@example.org