RT Journal Article
SR Electronic
T1 DNA damage is a major cause of sequencing errors, directly confounding variant identification
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 070334
DO 10.1101/070334
A1 Lixin Chen
A1 Pingfang Liu
A1 Thomas C. Evans, Jr.
A1 Laurence Ettwiller
YR 2016
UL http://biorxiv.org/content/early/2016/08/19/070334.abstract
AB Pervasive mutations in somatic cells generate a heterogeneous genomic population within an organism and may result in serious medical conditions. While cancer is the most studied disease associated with somatic variations, recent advances in single-cell and ultra-deep sequencing indicate that a number of phenotypes and pathologies are impacted by cell-specific variants. Currently, the accurate identification of low allelic frequency somatic variants relies on a combination of deep sequencing coverage and multiple lines of evidence. However, in this study we show that false positive variants can account for more than 70% of identified somatic variations, rendering conventional detection methods inadequate for accurate determination of low allelic frequency variants. These false positive variants primarily originate from mutagenic DNA damage, which directly confounds determination of genuine somatic mutations. Furthermore, we developed and validated a simple metric to measure mutagenic DNA damage and demonstrated that mutagenic DNA damage is the leading cause of sequencing errors in widely used resources, including the 1000 Genomes Project and The Cancer Genome Atlas.