SD ID: CryoEM workflows

Organisations & Contacts:

Carlos Oscar Sorzano-Sanchez

Jose-Maria Carazo

Gergely Sipos, EGI

Erik van den Bergh, EBI

OVERVIEW: CryoEM will develop ways to share detailed information on cryo-EM image processing workflows, concentrating on those processes usually run at the Facility level. This work should increase reproducibility in science. The idea is for Scipion to write a workflow file that fully describes the image processing steps, so that they can be re-executed to produce exactly the same results (making the data more FAIR). This file should accompany the raw data acquired at large facilities in Europe (such as the Diamond and ESRF synchrotrons) as well as at smaller EM facilities (such as NeCEN, SciLifeLab, or CNB-CSIC). We foresee that some facilities will run this image processing workflow in the cloud, so the technology employed must allow for this possibility.

SCIENTIFIC OBJECTIVES OF THE DEMONSTRATOR: After one year of work, users of a representative subset of major CryoEM Facilities in Europe will leave the Facility not only with raw and preprocessed data, but also with a file linking the acquired data and the analysis workflows. This file:

  1. will contain detailed information enabling the reproduction of processing steps (assuming access to the original software),
  2. will be ready and accepted to be deposited in cryo-EM databases like EMDB and EMPIAR, and
  3. will be easy to browse and analyze over the Web.
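As an illustration only, the kind of file described above could be sketched as a JSON record that ties checksums of the raw data to an ordered list of processing steps. Everything in this sketch is an assumption made for the example: the field names, the step names, the parameter values and the helper functions are hypothetical, and it does not represent the actual Scipion export format or any schema accepted by EMDB or EMPIAR.

```python
import hashlib
import json


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest used to identify a raw data file."""
    return hashlib.sha256(data).hexdigest()


def build_workflow_record(raw_files: dict, steps: list) -> dict:
    """Link raw data (by checksum) to the ordered processing steps.

    raw_files maps a file name to its content; steps is a list of
    dictionaries describing each processing step (hypothetical layout).
    """
    return {
        "raw_data": [
            {"name": name, "sha256": sha256_of_bytes(content)}
            for name, content in sorted(raw_files.items())
        ],
        "steps": steps,
    }


def to_json(record: dict) -> str:
    """Serialize with sorted keys so the same record always yields the same text."""
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical example data: placeholder bytes stand in for a real micrograph movie.
raw_files = {"movie_0001.mrc": b"placeholder bytes"}
steps = [
    {
        "id": 1,
        "program": "motion_correction",  # hypothetical step name
        "version": "x.y",                # software version would be recorded here
        "params": {"dose_per_frame": 1.2},
        "inputs": ["raw_data"],
    },
    {
        "id": 2,
        "program": "ctf_estimation",     # hypothetical step name
        "version": "x.y",
        "params": {"defocus_search_min": 5000, "defocus_search_max": 40000},
        "inputs": [1],
    },
]

record = build_workflow_record(raw_files, steps)
workflow_json = to_json(record)
```

Recording checksums alongside the parameters and software versions is one way to let a later reader confirm that a workflow is being re-executed on the same data; a plain JSON file is also straightforward to browse and analyze over the Web.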


By providing means to properly record data analysis workflows in cryoEM, we will contribute to data reproducibility, data reuse and data interoperability. This Demonstrator will result in improved reproducibility of cryoEM structures, allowing better interoperation with distributed data and analysis sources, addressing provenance and increasing data reuse in a multidisciplinary context.

More generally, the Demonstrator will be an exemplar for workflows that involve the acquisition of complex experimental data and the application of sophisticated processing and modelling. Such workflows may be more complicated than typical use cases for e-infrastructure, but supporting them would provide real benefit to European scientists.