Life Science datasets

SD ID: Leveraging EOSC to offload updating and standardizing life sciences datasets and to improve the reproducibility, reusability and interoperability of studies

Organisations & Contacts:

Jordi Rambla, Centre for Genomic Regulation (CRG)

Cedric Notredame, Centre for Genomic Regulation (CRG)

Erik van den Bergh, EBI

Matthew Viljoen, EGI

OVERVIEW: This demonstrator will leverage EOSC resources to refresh datasets uploaded to the EGA using newly available or updated reference data. In doing so, the new dataset will also be made available in a FAIR manner, with metadata added for the attributes chosen as contributing most strongly to the FAIR principles. Pipelines and security mechanisms will be developed as part of this demonstrator to automate this process.

SCIENTIFIC OBJECTIVES OF THE DEMONSTRATOR:

  • A set of results data has been reproduced using a portable version of the pipeline.
  • The same result set has been updated by re-analyzing it with a current version.
  • FAIRified metadata on both result sets is available on a test EGA server and/or in an appropriate repository.
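
The first objective, reproducing a result set with a portable pipeline, implies some way of checking that the reproduced output matches the original. The sketch below is one possible check, not the demonstrator's actual method: it assumes both result sets are available as local directories and treats a file as reproduced only if its SHA-256 checksum is identical. The function names checksum_tree and compare_result_sets are hypothetical and introduced here for illustration only.

```python
import hashlib
from pathlib import Path


def checksum_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_result_sets(original: Path, reproduced: Path) -> dict[str, list[str]]:
    """Classify files as identical, different, missing or extra in the reproduced set."""
    a = checksum_tree(original)
    b = checksum_tree(reproduced)
    shared = a.keys() & b.keys()
    return {
        "identical": sorted(k for k in shared if a[k] == b[k]),
        "different": sorted(k for k in shared if a[k] != b[k]),
        "missing": sorted(a.keys() - b.keys()),  # in original only
        "extra": sorted(b.keys() - a.keys()),    # in reproduced only
    }
```

Byte-level comparison is deliberately strict. Output formats that embed timestamps or tool versions would be reported as different even when the scientific content is the same, so a real check would likely need format-aware comparison for such files.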

IMPACT: This pilot will have a pragmatic impact by demonstrating how to make analyses (tools and workflows) portable, how to increase findability by using persistent identifiers, how to apply security technologies to sensitive data, how to deploy the workflow in a cloud, and how to make data FAIR. It will also have a long-term impact by increasing the usability of EGA-hosted data, assuring potential users that up-to-date versions of assured quality are available for download.

The success of the project will be monitored using well-defined use cases and by ensuring their reproducibility across sites and platforms. This monitoring will span space (i.e. across sites) and time (i.e. reproduction and updating of existing results).
The potential scientific and socio-economic impact is significant at a time when in silico analyses are being routinely deployed in a medical context, an approach expected to dominate so-called precision medicine in the next decade.