High energy physics

SD ID: WLCG / DPHEP

ORGANISATIONS: CERN

CONTACT: Jamie Shiers, CERN

Email: Jamie.Shiers(at)cern.ch

OVERVIEW:

Funding agencies today require (FAIR) Data Management Plans, explaining how data acquired or produced will be preserved for re-use, sharing and verification of results.

The preservation of data from CERN’s Large Hadron Collider poses significant challenges, not least in terms of scale. The purpose of this demonstrator is to show how existing, fully generic services can be combined to meet these needs in a manner that is discipline agnostic, i.e. can be used by others without modification.

Download DPHEP Data Preservation in High Energy Physics by John KENNEDY (MPCDF)

OBJECTIVE:

The high energy physics science demonstrator aims to deploy services that provide the following functions:

  1. Trusted / certified digital repositories where data is referenced by a Persistent Identifier (PID);
  2. Scalable “digital library” services where documentation is referenced by a Digital Object Identifier (DOI);
  3. A versioning file system to capture and preserve the associated software and needed environment;
  4. A virtualised environment that allows the above to run in Cloud, Grid and many other environments.

TECHNICAL FOCUS:

The goal is to use non-discipline-specific services, combined in a simple and transparent manner (e.g. through PIDs), to build a system capable of storing and preserving Open Data at a scale of 100 TB or more.
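
As a rough illustration of combining services through identifiers, the sketch below ties a dataset, its documentation and its software together using only a PID, a DOI and a file system path. The `PreservationRecord` class, its field names and all identifier values are hypothetical placeholders, not part of the demonstrator; only the two public resolver prefixes are real.

```python
from dataclasses import dataclass

# Public resolver prefixes; every identifier used below is a placeholder.
DOI_RESOLVER = "https://doi.org/"
HANDLE_RESOLVER = "https://hdl.handle.net/"


@dataclass(frozen=True)
class PreservationRecord:
    """Hypothetical record linking the three services by identifier only."""

    data_pid: str       # Handle-style PID of the dataset in the repository
    doc_doi: str        # DOI of the documentation in the digital library
    software_path: str  # path of the software on the versioning file system

    def links(self) -> dict:
        """Return a resolvable location for each component."""
        return {
            "data": HANDLE_RESOLVER + self.data_pid,
            "documentation": DOI_RESOLVER + self.doc_doi,
            "software": self.software_path,
        }


record = PreservationRecord(
    data_pid="00.00000/example-dataset",           # placeholder, not a real PID
    doc_doi="10.0000/example-doc",                 # placeholder, not a real DOI
    software_path="/cvmfs/example.org/release-1",  # placeholder path
)

for component, location in record.links().items():
    print(f"{component}: {location}")
```

Because the record holds identifiers rather than service-specific URLs, any of the underlying services could in principle be swapped without changing the record itself.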

EOSCpilot SD - High Energy Physics

MAIN ACHIEVEMENTS:

Some limited success was achieved with the individual services identified (Zenodo, CVMFS, a Trustworthy Digital Repository), but it was not possible to integrate them into a usable service.

RECOMMENDATIONS FOR THE IMPLEMENTATION:

The EOSC Pilot integrates services from three well-established e-infrastructures, mentioned above. Equivalent services are used in production by the CERN Open Data Portal, which is available worldwide via anonymous access over the Internet.

While it was possible to upload a documentation file into the EUDAT B2SHARE test instance, and while software from the LHC experiments is stored in the RAL CVMFS instance, there were significant delays in finding a site that could act as a Trustworthy Digital Repository (TDR) for this pilot.

There were numerous misunderstandings regarding the scope, duration and scale of the demonstrator: no bulk upload of existing “Open Data” was achieved, anonymous access was not addressed, and the three services were not successfully integrated.

