The conclusion from the studies performed so far is that the identified loci explain only a small proportion of the heritability of cancer, and that most of the individual alleles identified confer very small risks for cancer, which translates into a total lack of clinical application.

Our hypothesis was that patients with common cancer types carry a convergence of rare variants in a small number of genes in cancer-related pathways, such as DNA damage repair, cell cycle control and apoptosis, and that susceptibility genes and pathways overlap extensively between different cancer types. Thus, we proposed to identify the genes, pathways and protein interactions involved in cancer predisposition by evaluating the enrichment of rare germline variants across multiple cancer types. We postulated that some of these variants are uncommon in individuals without cancer but more frequent in subjects with cancer. Moreover, we hypothesized that some regions of the human genome contain common and less common variants that have not yet been explored. Since the analysis to be performed in this project was unbiased with respect to the regions explored, we expected to obtain a complete set of variants giving a comprehensive picture of variation in cancer predisposition. Finally, owing to the large sample sizes available through PCAWG, we could investigate the link between germline cancer predisposition and the sub-clonal structure and molecular readouts that can play a relevant role in the initial steps of cancer development.
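
As an illustration of what evaluating the enrichment of rare germline variants in a gene can look like, the sketch below runs a simple gene-level burden comparison with a one-sided Fisher's exact test. It is a minimal example under stated assumptions, not the method used in the project: the function name, the carrier-count inputs and the numbers in the usage lines are all hypothetical, and a real analysis would also need to account for ancestry, sequencing coverage and multiple testing.

```python
from math import comb


def burden_enrichment_p(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test for an excess of rare-variant carriers in cases.

    Returns P(X >= case_carriers) under the hypergeometric null, i.e. assuming
    carriers are distributed between cases and controls at random.
    Hypothetical helper written for illustration only.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    # Number of ways to choose which individuals are the cases.
    denom = comb(total, case_total)
    # The number of carriers among cases cannot exceed either margin.
    upper = min(carriers, case_total)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


# Hypothetical counts for one gene: 12 of 1000 cases and 3 of 1000 controls
# carry at least one rare variant in the gene.
p_value = burden_enrichment_p(12, 1000, 3, 1000)
print(f"one-sided p-value: {p_value:.4g}")
```

Collapsing variants to a per-individual carrier status is one common way to gain power when each variant is too rare to test on its own; the same counts could be tabulated per pathway rather than per gene.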

Our proposal responded to the need to advance the field of cancer genomics and promoted immediate uptake in clinical practice in several ways. First, we have defined new computational methods that permit the extraction of clinically relevant information from large-scale studies across cancer types (PCAWG project). In PanCanrisk we have further extended the goals of the germline analysis in the PCAWG project to a comprehensive analysis of genetic variants in the most complete WES datasets generated in the context of the ICGC and TCGA investigations.

We have developed new bioinformatic approaches to improve the analysis of germline variation in cancer sequencing, a largely uncharted area with tremendous potential for early diagnosis and risk management.

Second, we have pioneered the linking of different types of genetic variation data within cancer types with extensive molecular readouts and regulatory information on rare and private genomic variants.

Third, we have designed a multi-gene targeted sequencing (MGTS) panel coupled with a complete bioinformatics analysis platform tailored for use by clinicians.

Fourth, we replicated in large cancer cohorts the associations detected in the PCAWG project, and performed clinical validation in patients tested for cancer risk.

Fifth, we applied functional genomics in human cell lines to validate the mutations identified, thereby providing valuable data for training and testing of computational methods.

Finally, we developed new computational approaches to exploit omics data for rapid biomarker assay development, enabling early diagnosis and monitoring of disease progression.