Clinical validation in cancer patients


We evaluated 118 candidate genes, selected on the basis of results from the PCAWG project and previous CLL work, in breast and colorectal cancer patients from the Multi-Case Control Study (MCC-Spain) and EPICOLON2 (FPGMX) cohorts.

Briefly, we performed targeted resequencing of these genes in 591 breast cancer cases, 843 colon cancer cases and 1,479 matched controls. FASTQ files for the targeted regions were processed with EDiVa (see WP1) for alignment, duplicate marking, recalibration, variant calling and annotation. Rare variant association analysis was performed with BATI, our newly developed rare variant association pipeline. In addition to the standard Burden, SKAT-O and KBAC tests, BATI includes a novel association test that uses the Integrated Nested Laplace Approximation (INLA; Rue et al., 2009) to perform Bayesian inference within the generalised linear mixed model framework. We ran all three standard tests as well as the INLA test. The most relevant finding is the replication of two candidate genes for joint breast and colon cancer risk.
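BATI's own implementation is not reproduced here. As a minimal illustration of the idea behind the simplest of the standard tests, the sketch below implements a collapsing (CAST-style) burden test. It assumes each sample is represented by its alternate-allele counts at a gene's qualifying rare variants: a sample counts as a carrier if it has at least one rare allele, and carrier frequencies in cases and controls are compared with a two-sided Fisher's exact test. The function names and input layout are hypothetical, and the sketch omits the covariates, variant weights and mixed-model structure used by BATI's tests.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # P(top-left cell == x) given fixed row and column totals
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # tolerance for floating-point ties
            total += p
    return min(1.0, total)


def is_carrier(rare_allele_counts: list[int]) -> bool:
    """A sample is a carrier if it has >= 1 alternate allele at any rare variant."""
    return any(count > 0 for count in rare_allele_counts)


def burden_test(cases: list[list[int]], controls: list[list[int]]):
    """Collapse each sample to carrier status and compare cases vs controls."""
    case_carriers = sum(is_carrier(s) for s in cases)
    ctrl_carriers = sum(is_carrier(s) for s in controls)
    table = (case_carriers, len(cases) - case_carriers,
             ctrl_carriers, len(controls) - ctrl_carriers)
    return table, fisher_exact_two_sided(*table)


# Toy example: 3 of 4 cases vs 1 of 4 controls carry a rare allele.
table, p = burden_test([[0, 1], [1, 0], [0, 0], [2, 0]],
                       [[0, 0], [0, 0], [1, 0], [0, 0]])
print(table, round(p, 4))  # (3, 1, 1, 3) 0.4857
```

Collapsing to a carrier indicator is most powerful when the rare variants in a gene mostly act in the same direction; SKAT-O is designed to remain powerful when that assumption fails.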

We then selected 34 genomic variants in 4 genes that showed positive replication for evaluation in the clinical breast and colon cancer risk cohorts. These variants were genotyped using a 3-plex Sequenom MassARRAY design to assess their association with disease risk. This analysis replicated the association of loss-of-function and truncating variants in two of these genes with joint breast and colorectal cancer susceptibility.
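The statistic used to relate the genotyped variants to disease risk is not stated here. As a minimal sketch, assuming carriers of a given variant are counted in cases and controls, a per-variant carrier odds ratio with a Woolf (log-scale) confidence interval could be computed as follows. The function name, the default z value for a 95% interval and the zero-cell correction are illustrative choices, not part of the original analysis, and the counts are toy values rather than study data.

```python
import math


def odds_ratio_woolf(case_carriers: int, n_cases: int,
                     ctrl_carriers: int, n_controls: int,
                     z: float = 1.959964) -> tuple[float, float, float]:
    """Carrier odds ratio with a Woolf (log-scale) confidence interval."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = ctrl_carriers, n_controls - ctrl_carriers
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Toy counts, not study data: 10/100 cases vs 5/100 controls carry the variant.
or_, lo, hi = odds_ratio_woolf(10, 100, 5, 100)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

For a gene-level claim such as the one above, the same calculation can be applied after pooling carriers of any loss-of-function or truncating variant in the gene, at the cost of assuming those variants have similar effects.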


In parallel, we identified 676 CNVs in the complete cohort that overlap one or more of the targeted genes. The most frequently affected genes were CDC27, PMS2, SMAD4 and GREB1. CNVs in CDC27 showed an unusual, recurrent coverage pattern that is atypical for CNVs; a review of the literature led us to conclude that these events are retroduplications. Enrichment analysis revealed no significant difference between cases and controls for either breast or colon cancer, indicating that the CDC27 retroduplications are not associated with cancer risk.
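Counting CNVs that overlap at least one targeted gene is an interval-intersection problem. The sketch below assumes BED-style half-open coordinates and uses toy gene names and positions (GENE_A, GENE_B and GENE_C are hypothetical, not the study's targets). A linear scan is adequate for a panel of about a hundred genes; genome-wide data would call for an interval tree or a dedicated tool such as bedtools.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive (BED-style half-open)
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap iff each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def cnvs_in_genes(cnvs: list[Interval], genes: list[Interval]):
    """Return (#CNVs overlapping >= 1 gene, per-gene count of overlapping CNVs)."""
    per_gene: Counter = Counter()
    n_overlapping = 0
    for cnv in cnvs:
        hit = {g.name for g in genes if overlaps(cnv, g)}
        if hit:
            n_overlapping += 1
            per_gene.update(hit)
    return n_overlapping, per_gene


genes = [Interval("chr1", 100, 200, "GENE_A"),
         Interval("chr1", 150, 300, "GENE_B"),
         Interval("chr2", 0, 50, "GENE_C")]
cnvs = [Interval("chr1", 180, 190),   # overlaps GENE_A and GENE_B
        Interval("chr1", 200, 210),   # overlaps GENE_B only (GENE_A ends at 200)
        Interval("chr2", 50, 60),     # adjacent to GENE_C, no overlap
        Interval("chr1", 0, 100)]     # adjacent to GENE_A, no overlap
n, per_gene = cnvs_in_genes(cnvs, genes)
print(n, per_gene.most_common())  # 2 [('GENE_B', 2), ('GENE_A', 1)]
```

Splitting the per-gene counts by case and control status would give the input for a per-gene enrichment test such as Fisher's exact test.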

CNVs in PMS2 have previously been associated with Lynch syndrome, which is related to breast [5] and colon [6] cancers. However, as described in [6], “Molecular testing of PMS2 is especially challenging because of the presence of the PMS2CL pseudogene, which has more than 98% sequence identity with the 3′ region of PMS2, including exons 9 and 11 to 15”. We found no significant association between CNVs in PMS2 and breast or colon cancer in our study, and concluded that the large number of PMS2 CNVs is likely explained by changes in the pseudogene, which lead to deviations in read mapping efficiency. Similarly, CNVs in SMAD4, NFKB2 and MYCBP2 were not significantly enriched in breast or colon cancer cases compared with healthy controls. In summary, none of the targeted genes showed a significant enrichment of germline CNVs in colon or breast cancer. This negative result mirrors that of the germline CNV association analysis in the PCAWG cohort, where no significant association was identified either.
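The explanation offered for PMS2 can be made concrete with a simple read-depth model. The sketch assumes a depth-ratio CNV caller (the caller and cut-offs used in the study are not stated here; the thresholds below are hypothetical): each target's depth is normalised by the sample's total depth and divided by the median normalised depth of reference controls. If reads in a target that is homologous to a pseudogene are lost to ambiguous mapping, its ratio drops and it is called as a deletion even though the copy number is unchanged.

```python
from statistics import median


def normalise(depths: list[float]) -> list[float]:
    """Scale per-target depths so they sum to 1 (removes library-size effects)."""
    total = sum(depths)
    return [d / total for d in depths]


def depth_ratios(sample: list[float], controls: list[list[float]]) -> list[float]:
    """Per-target ratio of sample depth to the median control depth (both normalised)."""
    s = normalise(sample)
    refs = [normalise(c) for c in controls]
    return [s[i] / median([r[i] for r in refs]) for i in range(len(s))]


def call_targets(ratios: list[float], del_cut: float = 0.75,
                 dup_cut: float = 1.25) -> list[str]:
    """Label each target DEL, DUP or '.' using hypothetical ratio cut-offs."""
    return ["DEL" if r < del_cut else "DUP" if r > dup_cut else "." for r in ratios]


# Toy data: target 3 lies in a region shared with a pseudogene, and half of
# its reads are discarded as ambiguously mapped, although copy number is 2.
controls = [[100, 100, 100, 100]] * 3
sample = [100, 100, 50, 100]
print(call_targets(depth_ratios(sample, controls)))  # ['.', '.', 'DEL', '.']
```

Because such artefacts can arise in cases and controls alike, they inflate the raw CNV count without necessarily producing a case-control difference, which is consistent with the negative enrichment result reported above.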