OSFAIR 2017 - FAIR metrics – Starring your data sets: A lightweight FAIR data assessment tool

03 Oct 2017


The FAIR data principles – data should be findable, accessible, interoperable, and reusable – have rapidly gained wide support, but operationalising, measuring, and implementing them is no unambiguous task. Interestingly, the FAIR principles bear a great resemblance to the principles underlying the CoreTrustSeal (CTS), formerly known as the Data Seal of Approval (DSA), for trustworthy data repositories: data can be found on the internet, data are accessible, data are in a usable format, data are reliable, and data can be referred to. Although the latter two CTS principles have no direct equivalent in FAIR, the two sets of principles complement each other very well: the CTS can be used to assess the quality of repositories, FAIR the fitness for use of datasets.


DANS is the Netherlands Institute for permanent access to digital research resources. Its long-term research data repository currently preserves more than 36,000 datasets. Complementary to several initiatives to make new data FAIR, DANS is interested in assessing the level of FAIRness of existing data, both in its own holdings and in other data repositories. DANS has therefore started working on a practical operationalisation of the FAIR principles. The approach is designed so that F, A, and I can be measured independently, meaning that a dataset's rating under one principle does not depend on its ratings under the others. In practice, it appeared that the principles defined under R either fit better under, or are already covered by, F, A, and I. In the DANS approach, the level of Reusability therefore follows from the scores on the other three principles. Please refer to this blogpost for more details.
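The idea of deriving Reusability from the other three scores can be illustrated with a minimal sketch. This is a hypothetical example only: the star-scale range and the averaging rule are assumptions for illustration, not the actual FAIRdat aggregation logic.

```python
# Hypothetical sketch of the DANS scoring idea: F, A, and I are rated
# independently (here assumed to be 1-5 stars), and Reusability (R) is
# derived from them. The simple average below is an illustrative
# assumption, not the actual FAIRdat formula.

def reusability_score(f: int, a: int, i: int) -> float:
    """Derive an R rating from independent F, A, and I star ratings."""
    for name, score in (("F", f), ("A", a), ("I", i)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} rating must be between 1 and 5")
    return round((f + a + i) / 3, 1)

print(reusability_score(4, 5, 3))  # → 4.0
```

A design consequence of this model is that R cannot be rated higher or lower independently; any dataset characteristic that matters for reuse must surface through F, A, or I, which is exactly the point debated by workshop participants below.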


DANS has built an online prototype tool which guides the user through a set of questions – no more than five each for F, A, and I – to assess a specific dataset. In the Open Science FAIR workshop, more than thirty colleagues from universities, publishers, and international projects explored this so-called FAIRdat tool themselves. Overall, the participants responded very positively: they found the tool easy to use and well explained (for each question the tool provides guidance). When asked what they liked best, one participant even answered "that it exists".
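The question-driven assessment could be modelled roughly as follows. This is a hypothetical sketch: the real FAIRdat questions, their wording, and any weighting are not reproduced here, and the mapping from answers to stars is an illustrative assumption.

```python
# Hypothetical sketch of a question-driven assessment: each principle
# (F, A, or I) is scored from up to five yes/no answers. Scaling the
# fraction of 'yes' answers to a 1-5 star range is an assumption for
# illustration, not the actual FAIRdat logic.

def principle_stars(answers: list) -> int:
    """Convert up to five yes/no answers into a 1-5 star rating."""
    if not 0 < len(answers) <= 5:
        raise ValueError("expected between one and five answers")
    yes = sum(bool(a) for a in answers)
    # Scale the fraction of 'yes' answers to the star range, with a
    # floor of one star.
    return max(1, round(5 * yes / len(answers)))

print(principle_stars([True, True, False, True]))  # 3 of 4 'yes' → 4
```

A yes/no format keeps the assessment quick, but, as participants noted later in the workshop, it cannot express partial compliance; a scale per question would change this function's input from booleans to graded values.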


Understandably, there were also recommendations on how to improve the prototype. An important one concerns the target group or groups of such an assessment tool, because using it requires some familiarity with the dataset under assessment. During the workshop the testing was carried out with datasets previously unknown to the participants, and about a third of them reported that they did not feel capable and qualified to assess them. Envisioned target groups for FAIRdat are archivists or data managers of the repository where the dataset is preserved (9 votes from the 17 attendees who used the online feedback questionnaire), someone who has used the data (8/17), and the person(s) who created the dataset (6/17). A valuable suggestion was to provide different views, and perhaps different routes through the tool, for different roles.


A few attendees commented on the length of the underlying documentation; role-specific variants might be the way to go here as well. Interestingly, we learned that the question about a "clear and accessible data usage licence" needs more guidance, and probably discussion within the community: does a copyright marker count as a licence in this sense? It was also noted that some questions asked by the tool are hard to answer when one has not downloaded the data: file formats, for instance, are a crucial aspect of the interoperability of the data, but they can be "hidden" inside .zip files. It was also recommended to consider whether for some questions a scale would work better than the current yes/no format.


As in other settings where the FAIRdat tool is being tested, opinions differed on interpreting Reusability as the result of F, A, and I. Several participants indicated that this makes sense, but others were unsure or opposed to it. DANS invites everyone to submit data characteristics that really need to be covered under the R.


Finally, DANS envisions that FAIRdat will produce FAIR badges that a repository can display alongside a dataset, indicating its level of FAIRness. The majority of attendees who gave feedback (11/14) consider such badges "Great, that helps me decide if I'm interested in the data". The main recommendation, therefore, is for DANS to proceed with this approach.


Authors: Peter Doorn, Elly Dijk and Marjan Grootveld, Data Archiving and Networked Services/DANS