Data Quality

Jun 07 2018

Data quality is likely to be the most difficult element to standardize in any given set of rules of participation, since the usual standard of “fit for purpose” varies so much from one use case to another. There are two mechanisms to ensure appropriate data quality. The first derives from the FAIR principles: interoperability and reusability imply that a data set carries a minimum amount of metadata. Defining an appropriate metadata standard that can be efficiently supplied by data depositors and implemented by repositories and index/search services will be key to the implementation of the EOSC. The second mechanism is peer review and collective filtering, e.g. Yelp/TripAdvisor-style reviews provided by users. As data become more accessible, it may be useful to provide mechanisms in search systems/indices for users to post reviews that could supplement citation counts.
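To make the first mechanism concrete, below is a minimal sketch of the kind of check a repository might run at deposit time to enforce a minimum metadata standard. The required field names here are purely illustrative assumptions, not an agreed EOSC or FAIR specification.

```python
# Illustrative only: REQUIRED_FIELDS is a hypothetical minimum metadata
# set, not an actual EOSC/FAIR standard.
REQUIRED_FIELDS = {
    "title": str,
    "creators": list,       # e.g. ["Doe, Jane"]
    "identifier": str,      # e.g. a DOI
    "license": str,         # e.g. "CC-BY-4.0"
    "description": str,
    "date_published": str,  # ISO 8601 date
}


def validate_deposit(metadata: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = metadata.get(field)
        if value in (None, "", []):
            problems.append("missing required field: " + field)
        elif not isinstance(value, expected_type):
            problems.append(
                "field '%s' should be of type %s" % (field, expected_type.__name__)
            )
    return problems


if __name__ == "__main__":
    record = {
        "title": "Example imaging data set",
        "creators": ["Doe, Jane"],
        "identifier": "10.1234/example-doi",  # placeholder identifier
        "license": "CC-BY-4.0",
        "description": "Raw microscopy images with acquisition settings.",
        "date_published": "2018-06-07",
    }
    issues = validate_deposit(record)
    print("OK" if not issues else "\n".join(issues))
```

A check of this kind is cheap for repositories to run and gives depositors immediate, actionable feedback; the hard part, as noted above, is agreeing on which fields belong in the minimum set across disciplines.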




1 comment on "Data Quality"

  • jkh1

    FAIR principles need to be enforced more strictly, and repositories need to be actually usable. Currently, one can either not release data (in part or at all) or simply dump them anywhere on the web. A particular point of contention on data quality is the availability of information on proprietary reagents that are critical to the re-use and re-interpretation of data sets. There should be a mechanism for making this information accessible to people re-analyzing public data sets, or for excluding from public repositories data sets that can't be re-used because critical information is unavailable.