Abstract
Large-scale whole cancer-genome sequencing projects have led to the identification of a handful of cis-regulatory driver mutations in cancer genomes. However, recent studies have demonstrated that very large cancer cohorts will be required to identify low-frequency non-coding drivers. To further this endeavour, we performed high-depth sequencing across 95 colorectal cancers and matched normal samples using a unique target capture sequencing (TCS) assay focusing on over 35 megabases of gene regulatory elements. We first assessed coverage and variant detection capability from our TCS data, and compared this with a sample that was additionally whole-genome sequenced (WGS). TCS enabled substantially deeper sequencing, and we therefore detected 51% more somatic single nucleotide variants (n = 2,457) and 144% more somatic insertions and deletions (n = 39) by TCS than by WGS. Variants obtained from TCS data were suitable for somatic mutational signature detection, enabling us to define the signatures associated with germline deleterious variants in MSH6 and MUTYH in samples within our cohort. Finally, we surveyed regulatory mutations to find putative drivers by assessing variant recurrence and function, identifying some regulatory variants that may influence oncogenesis. Our study demonstrates TCS to be a sequencing-efficient alternative to traditional WGS, enabling improved coverage and variant detection when seeking to identify variants at specific loci among larger cohorts. Interestingly, we found no candidate variants that have a clear driver function, suggesting that regulatory drivers may be rare in a colorectal cancer cohort of this size.
Author Summary
In recent years, part of the focus of cancer research has turned towards the role of somatic mutations in the 98% of the genome that is non-coding. To investigate such mutations, we performed deep sequencing of regulatory regions and a selection of coding genes across 95 colorectal cancer and matched-normal samples. To determine how accurately our targeted deep sequencing methodology detects variants, we compared our results with those from a sample that was additionally whole-genome sequenced. We found that target capture sequencing enabled greater sequencing depth, allowing the detection of 51% and 144% more somatic single nucleotide and insertion/deletion mutations, respectively. Our study demonstrates target capture sequencing to be a useful approach for researchers seeking to identify variants at specific loci among larger cohorts. Our results also enabled the generation of mutational signatures, implicating deleterious germline single nucleotide variants in coding exons of MSH6 and MUTYH in samples within our cohort. Finally, we surveyed regulatory elements in search of somatic cancer driver mutations. We identified some regulatory variants that may influence oncogenesis, but found no candidate variants with clear driver function. These findings suggest that regulatory driver mutations may be rare in a colorectal cancer cohort of this size.
List of Abbreviations
- bp: Base pairs
- BWA: Burrows-Wheeler Aligner
- ChIP-seq: Chromatin immunoprecipitation sequencing
- COSMIC: Catalogue of Somatic Mutations in Cancer
- DHS: DNase I hypersensitivity
- DNase-seq: DNase I hypersensitivity sequencing
- ENCODE: Encyclopedia of DNA Elements
- GEO: Gene Expression Omnibus
- IGV: Integrative Genomics Viewer
- Indel: Insertion and deletion
- lncRNA: Long non-coding RNA
- mb: Megabase
- miRNA: MicroRNA
- MSI: Microsatellite instability
- MSS: Microsatellite stable
- mtDNA: Mitochondrial DNA
- mTERF: Mitochondrial transcription termination factor
- PCR: Polymerase chain reaction
- POLE: Polymerase epsilon
- RNA-seq: Ribonucleic acid sequencing
- S.D.: Standard deviation
- TCGA: The Cancer Genome Atlas
- TCS: Target capture sequencing
- VAF: Variant allele frequency
- WGS: Whole-genome sequencing
- WXS: Whole exome sequencing