Pan-Cancer Analyses & Cloud Computing within the EOSC

The life sciences, health, food, fishery and agriculture sectors face a multitude of challenges. In cancer research, Europe has taken a technical lead within international consortia on cloud-based pan-cancer genomic analysis. This global competitive advantage can be maintained by leveraging open science analysis models around controlled-access data sets, developed in collaboration with researchers elsewhere in the world. These analysis frameworks could also be reused to analyse cardiovascular and neurodegenerative diseases, and could stimulate the biotech and pharmaceutical industries to use public cancer genomic data in R&D.

CONTEXT:

The Pan-Cancer Analysis of Whole Genomes Project (PCAWG)

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project is analysing large cohorts of cancer genomes and pursuing so-called pan-cancer studies to identify factors that may be involved in tumour formation and disease progression across multiple cancer types. PCAWG is currently analysing more than 2,800 cancer whole genomes, largely on academic and public clouds, and is also developing approaches for integrating these data with transcriptome and clinical data to address specific hypotheses.

The EOSCpilot science demonstrator aims to establish a portable, cloud-based, federated solution for collaborative cancer genomics and associated health data management, together with an analysis environment accessible to European scientists.

OBJECTIVES:

The aim of PanCancer is to develop interoperable IT frameworks for the standardized sharing and large-scale processing of cancer genomes together with other molecular and clinical data, in order to enable biological and translational breakthroughs. To this end, interoperable frameworks have been employed to process ~10,000 cancer Whole Genome Sequenced (WGS) tumour-normal pairs from the 20 most common cancer types. The main research objective is to uncover genome-wide patterns of different types of genetic variation, which requires the availability of WGS data and their integration with molecular, demographic and clinical data. PanCancer will make it possible to proactively create suitable standards and interoperability.

MAIN ACHIEVEMENTS:

The Butler scientific workflow framework has been set up and tested in three globally distributed cloud computing environments based on the OpenStack platform: the EMBL-EBI Embassy Cloud in the UK, the Cyfronet cloud in Poland, and the ComputeCanada cloud in Canada.

More than 400 high-coverage whole genome samples (~60 TB of data) from the ICGC pediatric brain cancer cohort were downloaded to the ComputeCanada cloud, and ~50 TB of public data from the 1000 Genomes project was loaded onto the Cyfronet environment from the EMBL-EBI data servers using Cyfronet’s Oneprovider software.
Butler was used to run a genomic alignment workflow (based on BWA and developed at the Sanger Institute) on more than 400 samples at ComputeCanada and more than 400 samples at the EMBL-EBI Embassy Cloud, with over 100 TB of data processed to date. Proper operation of the infrastructure was monitored through Butler’s detailed monitoring and self-healing capabilities.
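
As a rough sanity check on the figures quoted above, the average data volume per sample can be estimated from the ICGC cohort numbers (~60 TB across more than 400 samples). The short Python sketch below is illustrative only: the function name is hypothetical, decimal units (1 TB = 1000 GB) are assumed, and because the text states that more than 400 samples were downloaded, the result is best read as an approximate upper bound on the average rather than a measured value.

```python
# Back-of-envelope estimate based on the approximate figures in the text.
# Assumption: decimal units, i.e. 1 TB = 1000 GB.
GB_PER_TB = 1000


def avg_sample_size_gb(total_tb: float, n_samples: int) -> float:
    """Return the average data volume per sample, in GB."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_tb * GB_PER_TB / n_samples


# ~60 TB over at least 400 samples (figures from the ICGC cohort download).
estimate = avg_sample_size_gb(60, 400)
print(f"Roughly {estimate:.0f} GB per sample at most")
```

An average on the order of 150 GB per sample illustrates why storage capacity and transfer stability are central concerns for this kind of workload.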

IMPACT:

Engagement in the EOSC will enable Europe to maintain its current global competitive advantage in pan-cancer analyses and to foster cloud-based solutions for genomic data analysis. A broader impact and the reuse of solutions in other areas (e.g. cardiovascular and neurodegenerative diseases) are also foreseen.

RECOMMENDATIONS FOR THE IMPLEMENTATION:

Improve the availability of computational resources
Improve systems’ stability and storage capabilities
Introduce clear Service Level Agreements between service providers and service consumers, so that it is clear whether the consumers’ requirements can be met

Public Attachment:

fliera4_pancancer_web.pdf

