Abstract
Many cancer genomes contain large numbers of somatic mutations, but few of these mutations drive tumor development. Current approaches to identifying cancer driver genes are largely based on mutational recurrence, i.e., they search for genes with an increased number of nonsynonymous mutations relative to the local background mutation rate. Multiple studies have noted that the sensitivity of recurrence-based methods is limited in tumors with high background mutation rates, because passenger mutations dilute their statistical power. Here, we observe that passenger mutations tend to occur in characteristic nucleotide sequence contexts, while driver mutations follow a different distribution pattern determined by the location of functionally relevant genomic positions along the protein-coding sequence. To discover new cancer genes, we searched for genes with an excess of mutations in unusual nucleotide contexts that deviate from the characteristic context around passenger mutations. By applying this statistical framework to whole-exome sequencing data from 12,004 tumors, we discovered a long tail of novel candidate cancer genes with mutation frequencies as low as 1% and supporting functional evidence. Our results show that considering both the number of mutations and the nucleotide context around them helps identify novel cancer driver genes, particularly in tumors with high background mutation rates.
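
The idea of scoring genes by how unusual their mutation contexts are can be illustrated with a minimal toy sketch. It is not the statistical framework used in this study: the function names, the context labels, and the mean-surprisal score are all assumptions made for illustration. It estimates context frequencies from presumed passenger mutations and then measures how poorly a gene's mutations fit that distribution.

```python
import math
from collections import Counter


def context_frequencies(passenger_contexts):
    """Estimate the frequency of each nucleotide context among
    presumed passenger mutations (hypothetical helper)."""
    counts = Counter(passenger_contexts)
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def mean_surprisal(gene_contexts, freqs, floor=1e-6):
    """Average -log2 probability of a gene's mutation contexts under the
    passenger context model. Contexts never seen among passengers get a
    small floor probability (an arbitrary choice for this sketch)."""
    if not gene_contexts:
        raise ValueError("gene has no mutations to score")
    bits = [-math.log2(freqs.get(ctx, floor)) for ctx in gene_contexts]
    return sum(bits) / len(bits)


# Toy data: context labels such as "TCA>T" are illustrative only.
passengers = ["TCA>T"] * 70 + ["TCT>T"] * 20 + ["ACG>T"] * 10
freqs = context_frequencies(passengers)

typical_gene = ["TCA>T", "TCA>T", "TCT>T"]   # mutations in common contexts
unusual_gene = ["ACG>T", "GTA>A", "ACG>T"]   # mutations in rare or unseen contexts

typical_score = mean_surprisal(typical_gene, freqs)
unusual_score = mean_surprisal(unusual_gene, freqs)
```

In this toy setting, a gene whose mutations fall in rare contexts receives a higher score than one whose mutations follow the passenger pattern. A real analysis would additionally need a null model and a significance test that accounts for mutation counts, gene length, and sequence composition.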