PT - JOURNAL ARTICLE AU - Louis J. Dijkstra AU - Johannes Köster AU - Tobias Marschall AU - Alexander Schönhuth TI - Enhancing sensitivity and controlling false discovery rate in somatic indel discovery using a latent variable model AID - 10.1101/121954 DP - 2017 Jan 01 TA - bioRxiv PG - 121954 4099 - http://biorxiv.org/content/early/2017/03/29/121954.short 4100 - http://biorxiv.org/content/early/2017/03/29/121954.full AB - Cancer is a genetic disorder in the first place. Therefore, next-generation sequencing (NGS) based discovery of somatically acquired genetic variants has gained widespread attention. Computational prediction of somatic variants, however, is affected by a variety of confounding factors. In addition to the uncertainties that one commonly encounters also in germline variation prediction, such as misplaced and/or inaccurate read alignments, cancer heterogeneity and impure samples significantly add to the issues. Overall, this hampers state-of-the-art indel discovery tools to discover somatic indels at operable performance rates, although they perform excellently when calling germline indels. While affecting all size ranges, both common and cancer-specific problems interfere in particularly unfavorable ways in the prediction of somatic midsize (30-150 bp) insertions and deletions.Here, we present a latent variable model that can take the major confounding factors and uncertainties into a unifying account. Using this modeling framework, we first demonstrate how to efficiently compute the probability for a (putative) indel to be somatic, thereby resolving a principled computational runtime bottleneck in Bayesian uncertainty quantification. Second, we show how to reliably estimate the allele frequencies for a given list of indels. Third, we also present an intuitive and effective way to control the false discovery rate, an issue in genetic variant discovery that has been found notoriously hard to deal with. As a tool that implements all methodology developed, we present PROSIC (PROcessing Somatic Indel Calls). PROSIC achieves significant improvements in particular in terms of recall when applied to deletion call sheets, as provided by prevalent state-of-the-art tools, in comparison to their integrated somatic indel calling routines.The software is publicly available at https://prosic.github.io and can be easily installed via https://bioconda.github.io.