SD ID: WLCG / DPHEP
CONTACT: Jamie Shiers, CERN
Funding agencies now require FAIR Data Management Plans that explain how data acquired or produced will be preserved for re-use, sharing and verification of results.
Preserving data from CERN’s Large Hadron Collider poses significant challenges, not least in terms of scale. The purpose of this demonstrator is to show how existing, fully generic services can be combined to meet these needs in a discipline-agnostic manner, i.e. one that other communities can adopt without modification.
Presentation: DPHEP Data Preservation in High Energy Physics, by John Kennedy (MPCDF)
The High Energy Physics science demonstrator aims to deploy services that provide the following functions:
- Trusted / certified digital repositories where data is referenced by a Persistent Identifier (PID);
- Scalable “digital library” services where documentation is referenced by a Digital Object Identifier (DOI);
- A versioning file system to capture and preserve the associated software and needed environment;
- A virtualised environment that allows the above to run in Cloud, Grid and many other environments.
The goal is to combine non-discipline-specific services in a simple and transparent manner (e.g. through PIDs) to build a system capable of storing and preserving Open Data at a scale of 100 TB or more.
Some limited success was achieved with the individual services identified (Zenodo, CVMFS, a Trustworthy Digital Repository), but it was not possible to integrate them into a usable service.
RECOMMENDATIONS FOR THE IMPLEMENTATION
The EOSC Pilot integrates services from the three well-established e-infrastructures mentioned above. Equivalent services are used in production by the CERN Open Data Portal, which is available via anonymous access over the Internet worldwide. While it was possible to upload a documentation file into the EUDAT B2SHARE test instance, and while software from the LHC experiments is stored in the RAL CVMFS instance, there were significant delays in finding a site that could act as a Trustworthy Digital Repository (TDR) for this pilot. There were also numerous misunderstandings regarding the scope, duration and scale of the demonstrator: no bulk upload of existing “Open Data” was achieved, anonymous access was not addressed, and the three services were not successfully integrated.