PT - JOURNAL ARTICLE
AU - Hussein Al-Asadi
AU - Kushal K Dey
AU - John Novembre
AU - Matthew Stephens
TI - Inference and visualization of DNA damage patterns using a grade of membership model
AID - 10.1101/327684
DP - 2018 Jan 01
TA - bioRxiv
PG - 327684
4099 - http://biorxiv.org/content/early/2018/05/21/327684.short
4100 - http://biorxiv.org/content/early/2018/05/21/327684.full
AB - Quality control plays a major role in the analysis of ancient DNA (aDNA). One key step in this quality control is the assessment of DNA damage: aDNA contains unique signatures of DNA damage that distinguish it from modern DNA, so analyses of damage patterns can help confirm that the DNA sequences obtained are from endogenous aDNA rather than from modern contamination. Predominant signatures of DNA damage include a high frequency of cytosine-to-thymine (C-to-T) substitutions at the ends of fragments and elevated rates of purines (A and G) before the 5’ strand-breaks. Existing QC procedures help assess damage by simply plotting, for each sample, the C-to-T mismatch rate along the read and the composition of bases before the 5’ strand-breaks. Here we present a more flexible and comprehensive model-based approach to infer and visualize damage patterns in aDNA, implemented in the R package aRchaic. This approach is based on a “grade of membership” model (also known as an “admixture” or “topic” model) in which each sample has an estimated grade of membership in each of K damage profiles that are estimated from the data. We illustrate aRchaic on data from several aDNA studies and on modern individuals from the 1000 Genomes Project Consortium (2012). Here, aRchaic clearly distinguishes modern from ancient samples irrespective of DNA extraction, lab and sequencing protocols. Additionally, through an in-silico contamination experiment, we show that the aRchaic grades of membership reflect relative levels of exogenous modern contamination. Together, the outputs of aRchaic provide a concise visual summary of DNA damage patterns, as well as other processes generating mismatches in the data. Availability: aRchaic is available for download from https://www.github.com/kkdey/aRchaic. Contact: halasadi{at}uchicago.edu, kkdey{at}uchicago.edu
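
The grade of membership idea described in the abstract can be illustrated with a minimal sketch. This is not the aRchaic implementation or its API: the function name `mix_profiles`, the three mismatch features, and the two toy profiles are hypothetical, chosen only to show how a sample's expected mismatch distribution could arise as a weighted mixture of K damage profiles, with the weights being that sample's grades of membership.

```python
def mix_profiles(memberships, profiles):
    """Expected feature distribution for one sample under a
    grade of membership (mixture) model.

    memberships: K non-negative weights summing to 1 (the sample's grades).
    profiles:    K lists, each a probability distribution over features.
    """
    if abs(sum(memberships) - 1.0) > 1e-9:
        raise ValueError("grades of membership must sum to 1")
    n_features = len(profiles[0])
    # Convex combination of the K profiles, computed feature by feature.
    return [
        sum(w * profile[j] for w, profile in zip(memberships, profiles))
        for j in range(n_features)
    ]


# Hypothetical toy profiles over three mismatch features (assumed values):
# (C-to-T near a fragment end, C-to-T in the interior, all other mismatches)
ancient_like = [0.60, 0.10, 0.30]
modern_like = [0.10, 0.10, 0.80]

# A sample with 75% membership in the ancient-like profile.
sample = mix_profiles([0.75, 0.25], [ancient_like, modern_like])
print(sample)
```

In this toy setting, raising the membership in the modern-like profile lowers the expected share of C-to-T mismatches near fragment ends, which is the intuition behind reading grades of membership as a reflection of relative contamination levels. Estimating the profiles and memberships from data is a separate fitting step that this sketch does not cover.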