TY  - JOUR
T1  - Chromosome-wide characterization of Y-STR mutation rates using ultra-deep genealogies
JF  - bioRxiv
DO  - 10.1101/036590
SP  - 036590
AU  - Thomas Willems
AU  - Melissa Gymrek
AU  - G. David Poznik
AU  - Chris Tyler-Smith
AU  - The 1000 Genomes Project Y-Chromosome Working Group
AU  - Yaniv Erlich
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/01/15/036590.abstract
N2  - Although the utility of short tandem repeats on the Y-chromosome (Y-STRs) has long been recognized and leveraged in forensics, genealogy and paternity testing, the bulk of these applications have relied on only a few dozen loci identified as having remarkably high mutation rates. Recent efforts have expanded the set of Y-STRs with known mutation rates to two hundred markers, but the limited throughput of the capillary method for estimating mutation rates has left the mutability of most Y-STRs uncharacterized, particularly those with dinucleotide repeat units. To address this limitation, we developed a novel method capable of concurrently estimating the mutation rates of all Y-STRs by leveraging population-scale whole-genome sequencing data. Extensive simulations confirmed that our method robustly accounts for PCR stutter artifacts and obtains unbiased mutation rate estimates. Application of the method to orthogonal datasets from the 1000 Genomes Project and Simons Genome Diversity Project utilized evolutionary data from over 250,000 meioses to estimate the mutation rates of more than 700 Y-STRs with 2–6 base pair repeat units, yielding the largest such set to date. Comparison of these estimates with those from father-son studies indicated a high degree of concordance for loci that have been previously characterized. In addition, we identified nearly 100 previously uncharacterized Y-STRs with per-generation mutation rates greater than 1 in 3000. Altogether, our study provides a broadly applicable method for estimating Y-STR mutation rates from whole-genome sequencing cohorts, outlines a framework for imputing Y-STRs, vastly expands the number of identified loci with high discriminative power and provides the first chromosome-wide characterization of the mutation rates of dinucleotide short tandem repeats.
ER  - 