PT - JOURNAL ARTICLE
AU - Johanna Bertl
AU - Qianyun Guo
AU - Malene Juul Rasmussen
AU - Søren Besenbacher
AU - Morten Muhlig Nielsen
AU - Henrik Hornshøj
AU - Jakob Skou Pedersen
AU - Asger Hobolth
TI - A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data
AID - 10.1101/122879
DP - 2017 Jan 01
TA - bioRxiv
PG - 122879
4099 - http://biorxiv.org/content/early/2017/03/31/122879.short
4100 - http://biorxiv.org/content/early/2017/03/31/122879.full
AB - Understanding and modelling the neutral mutational process in cancer cells is crucial for identifying the mutations that drive cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients, and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically within the element under consideration. Instead, we model the probabilities of the different types of mutations for each position in the genome by multinomial logistic regression. We apply our site-specific model to a data set of 505 cancer genomes from 14 different cancer types. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than region-based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors of the mutation rate. Finally, our model confirms certain well-known mutational signatures.
Our site-specific multinomial regression model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
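
The contrast the abstract draws between site-specific and region-averaged prediction can be illustrated with a minimal sketch. It is not the authors' implementation: the two mutation classes, the covariate layout (intercept, phyloP, replication timing), and all coefficient values are hypothetical and chosen only for illustration. Each site gets multinomial logistic probabilities with "no mutation" as the reference class, and the expected number of mutations in a region is either summed site by site or computed from the region's mean covariates.

```python
import math

# Hypothetical coefficients (intercept, phyloP, replication timing) per
# mutation class; "none" (no mutation) is the reference class.
COEFS = {
    "transition": [-10.0, -0.5, 0.3],
    "transversion": [-11.0, -0.4, 0.2],
}

def site_probabilities(x, coefs):
    """Multinomial logistic probabilities for one site.

    x is the covariate vector with a leading 1.0 for the intercept.
    """
    scores = {k: math.exp(sum(b * v for b, v in zip(beta, x)))
              for k, beta in coefs.items()}
    denom = 1.0 + sum(scores.values())  # the 1.0 is the reference class
    probs = {k: s / denom for k, s in scores.items()}
    probs["none"] = 1.0 / denom
    return probs

def expected_counts_site_specific(sites, coefs):
    """Sum the per-site probabilities over all sites in a region."""
    totals = {k: 0.0 for k in coefs}
    for x in sites:
        p = site_probabilities(x, coefs)
        for k in coefs:
            totals[k] += p[k]
    return totals

def expected_counts_region_average(sites, coefs):
    """Average the covariates over the region first, then predict once."""
    n = len(sites)
    mean_x = [sum(col) / n for col in zip(*sites)]
    p = site_probabilities(mean_x, coefs)
    return {k: n * p[k] for k in coefs}

# Made-up region of three sites whose phyloP score varies strongly.
REGION = [
    [1.0, 3.0, 0.2],
    [1.0, -1.0, 0.8],
    [1.0, 0.0, 0.5],
]

by_site = expected_counts_site_specific(REGION, COEFS)
by_region = expected_counts_region_average(REGION, COEFS)
print("site-specific :", by_site)
print("region-average:", by_region)
```

Because the model is nonlinear in the covariates, the two predictions disagree whenever a covariate varies within the region, which is the situation the abstract describes as particularly problematic for averaging.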