SD ID: CryoEM workflows
Organisations & Contacts:
- Carlos Oscar Sorzano-Sanchez
- Jose-Maria Carazo
- Gergely Sipos, EGI
- Erik van den Bergh, EBI
OVERVIEW: CryoEM will develop ways to share detailed information on cryo-EM image processing workflows, concentrating on those processes usually run at the Facility level. This work should increase the reproducibility of scientific results. The idea is to write from Scipion a workflow file that fully describes the image processing steps, so that they can be re-executed and yield exactly the same results (making the data more FAIR). This file should accompany the raw data acquired at large facilities in Europe (such as the Diamond and ESRF synchrotrons) as well as at smaller EM facilities (such as Necen, SciLife Lab, or CNB-CSIC). We foresee that some facilities will run this image processing workflow in the cloud, so the technology employed must allow for this possibility.
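As an illustration of what such a workflow file could record, the following is a minimal Python sketch, not the format Scipion actually writes. The schema identifier, the field names and the helper functions (`sha256_of`, `build_workflow_record`, `write_workflow`) are all assumptions made for this example. It stores a checksum for each raw data file together with the ordered processing steps, their program versions and their parameters, which is the minimum needed to re-execute the steps on the same input.

```python
import hashlib
import json


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_workflow_record(raw_files, steps):
    """Bundle raw-data checksums and ordered processing steps into one dict.

    Each step is a dict with 'program', 'version' and 'parameters', plus an
    optional 'inputs' list holding the ids of the steps it depends on.
    """
    return {
        # Hypothetical schema identifier, invented for this sketch.
        "schema": "hypothetical-cryoem-workflow/0.1",
        "raw_data": [{"path": p, "sha256": sha256_of(p)} for p in raw_files],
        "steps": [
            {
                "id": index,
                "program": step["program"],
                "version": step["version"],
                "parameters": step["parameters"],
                "inputs": step.get("inputs", []),
            }
            for index, step in enumerate(steps, start=1)
        ],
    }


def write_workflow(record, out_path):
    """Write the record as human-readable, key-sorted JSON."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A facility could call `build_workflow_record` once preprocessing has finished and hand the resulting JSON file to the user together with the raw data. Because the checksums tie the file to the exact acquired images, a later re-execution can first verify that it is operating on the same input.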
SCIENTIFIC OBJECTIVES OF THE DEMONSTRATOR: Enable users of a representative subset of major CryoEM Facilities in Europe to take home raw and preprocessed data together with a file linking the acquired data to the analysis workflows. After one year of work, users of these Facilities will leave the Facility not only with raw and preprocessed data, but also with this file. This file:
- will contain detailed information enabling the reproduction of processing steps (assuming access to the original software),
- will be ready and accepted to be deposited in cryo-EM databases like EMDB and EMPIAR, and
- will be easy to browse and analyze over the Web.
By providing the means to properly record data analysis workflows in cryo-EM, we will contribute to data reproducibility, data reuse and data interoperability. This Demonstrator will improve the reproducibility of cryo-EM structures, allow better interoperation with distributed data and analysis sources, address provenance and increase data reuse in a multidisciplinary context.
More generally, the Demonstrator will be an exemplar for workflows that involve the acquisition of complex experimental data and the application of sophisticated processing and modelling. Such workflows may be more complicated than typical use cases for e-infrastructure, but would provide real benefit to European scientists.
RECOMMENDATIONS FOR THE IMPLEMENTATION
- Create a public repository of acquisition metadata and image processing workflows for new acquisitions, as a temporary repository until the data is finally analyzed and deposited in the standard public databases (EMDB and EMPIAR).
- Create an authentication policy such that biologists leaving an EM facility can continue the image processing on some of the EOSC cloud machines.