SD ID: Photon-Neutron Science Demonstrator
ORGANISATIONS: DESY, EMBL, ESRF, EU-XFeL, ESS, ILL, STFC
CONTACT: Volker Guelzow, DESY
Email: volker.guelzow(at)desy.de
OVERVIEW:
The Photon Neutron Data Science Demonstrator will draw on the photon-neutron community to improve computing facilities by creating a virtual platform for all users.
Photons and neutrons are widely used for research in many scientific fields, and they require large Research Infrastructures (RIs). Research at these RIs makes use of large-area detectors, multi-channel detection, and frequent repetition of measurements. This produces large quantities of data that must be analyzed efficiently. Thousands of RI users propose and conduct scientific experiments, and analyze the resulting data, across a wide range of application domains. Access is granted after a thorough peer review of the scientific proposals. These user groups are often small teams of scientists from universities and research organizations who use RIs at various locations in Europe, chosen according to the specific characteristics of the beamlines; in general, more than one analytical facility is needed for the same experiment. The critical issues are data storage, sustained access to the data and an efficient data analysis ecosystem.
Download Photon Neutron Presentation by Sune Rastad BAHN (European Spallation Source)
OBJECTIVES:
Building on a community of more than 35,000 unique users (as of 2011), the science demonstrator aims to enable cloud-based storage and compute solutions, foster standardized data formats and allow transparent and secure remote access to scientific data. For this demonstrator we will focus on a particular data analysis framework outlined in the diagram. The CrystFEL framework is increasingly used at various synchrotrons and FELs to analyze data from serial (femtosecond) X-ray crystallography. The nature of these experiments makes a cloud-based distributed pipeline particularly appealing, since the framework can fully exploit large computational resources with tunable demands. The framework is well documented, and vast amounts of data are readily and openly available.
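As a rough illustration of why such a workload distributes well, the sketch below splits a list of frames into independent chunks, processes them in parallel and merges the results in order. It assumes that frames can be analyzed independently of one another; the names `partition`, `process_chunk` and `run_pipeline` are hypothetical and are not part of CrystFEL, and the per-chunk step is a placeholder rather than real crystallographic analysis.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def partition(frames: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split frames into contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Never create more chunks than frames; keep one (empty) chunk if there are none.
    n_chunks = min(n_chunks, len(frames)) or 1
    size, extra = divmod(len(frames), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional frame each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(frames[start:end]))
        start = end
    return chunks


def process_chunk(chunk: List[str]) -> List[str]:
    """Hypothetical placeholder for the per-frame analysis step."""
    return [f"processed:{name}" for name in chunk]


def run_pipeline(
    frames: Sequence[str],
    n_workers: int,
    worker: Callable[[List[str]], List[str]] = process_chunk,
) -> List[str]:
    """Process chunks concurrently and merge the results in input order."""
    chunks = partition(frames, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order of its inputs.
        results = list(pool.map(worker, chunks))
    return [item for part in results for item in part]
```

The number of workers is the only tuning knob here, which mirrors the "tunable demands" mentioned above: the same frame list can be spread over a few local cores or many cloud instances. In a real deployment each chunk would be handed to a separate job or container instead of a thread.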
TECHNICAL FOCUS:
- Exploit and improve the CrystFEL framework for distributed computing
- Provide compatible data analysis software
- Allow transparent and secure remote access to data
- Standardize data formats (NeXus/HDF5) and the annotation of data
- Test and establish (if feasible) web-services for easy consumption and visualization of the data
- Exploit existing authentication and authorization solutions
- Allow long term preservation of data
- Promote data policies in laboratories
MAIN ACHIEVEMENTS:
- Deployment and testing of the software used by a large community in structural biology at Free Electron Lasers and synchrotrons, on a local OpenStack cloud platform and on local HPC clusters at DESY, Hamburg.
- Examination of the workflow to identify and establish community-specific cloud services and to gain insight into technical, organizational and legal issues as well as interoperability requirements.
- Containerization, deployment and partial profiling of several applications from the Photon and Neutron field.
- Identification of features that would help in a further deployment, such as tools and services facilitating server-less partitioning of data analysis pipelines.
- Scaling up of the on-site OpenStack infrastructure following the successful proof of concept, which has greatly raised interest and visibility in the user community. This development on the hardware side is complemented by the integration of new OpenStack modules into our cloud instance.
RECOMMENDATIONS FOR THE IMPLEMENTATION:
A substantial part of the applications in the Photon/Neutron science domain is “free to use for academic purposes” but subject to restrictive licensing conditions. Providing services based on such applications requires knowing that the user has agreed to the license terms and is indeed an academic user. This can of course be checked on a per-service basis; however, it would be more convenient, manageable and scalable to provide such attributes in a federated way, integrated into the EOSC ecosystem.
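The check described above could look roughly like the following sketch, in which a service decides on access purely from attributes released by a federated identity provider. Everything here is an assumption made for illustration: the `UserAttributes` structure, the `accepted_licenses` attribute, the set of affiliations treated as academic and the license identifier are hypothetical and do not describe an existing EOSC interface. The affiliation strings are modeled on scoped values of the form `role@domain`.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumption: which affiliation roles count as "academic" is a policy decision.
ACADEMIC_AFFILIATIONS = frozenset({"faculty", "student", "staff"})


@dataclass(frozen=True)
class UserAttributes:
    """Attributes a federated identity provider might release (hypothetical)."""
    affiliations: FrozenSet[str]                     # e.g. "faculty@example.org"
    accepted_licenses: FrozenSet[str] = frozenset()  # hypothetical attribute


def may_use(user: UserAttributes, license_id: str) -> bool:
    """Grant access only to academic users who accepted the given license."""
    is_academic = any(
        value.split("@", 1)[0] in ACADEMIC_AFFILIATIONS
        for value in user.affiliations
    )
    return is_academic and license_id in user.accepted_licenses
```

With attributes provided centrally, each service only needs a check of this kind and does not have to collect license agreements or verify academic status itself.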