RT Journal Article
SR Electronic
T1 Profiling the genome-wide landscape of tandem repeat expansions
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 361162
DO 10.1101/361162
A1 Nima Mousavi
A1 Sharona Shleizer-Burko
A1 Melissa Gymrek
YR 2018
UL http://biorxiv.org/content/early/2018/07/03/361162.abstract
AB Tandem Repeat (TR) expansions have been implicated in dozens of genetic diseases, including Huntington’s Disease, Fragile X Syndrome, and hereditary ataxias. Furthermore, TRs have recently been implicated in a range of complex traits, including gene expression and cancer risk. While the human genome harbors hundreds of thousands of TRs, analysis of TR expansions has been mainly limited to known pathogenic loci. A major challenge is that expanded repeats are beyond the read length of most next-generation sequencing (NGS) datasets. We present GangSTR, a novel algorithm for genome-wide profiling of both normal and expanded TRs. GangSTR extracts information from paired-end reads into a unified model to estimate maximum likelihood TR lengths. We validated GangSTR on real and simulated TR expansions and show that GangSTR outperforms alternative methods. We applied GangSTR to more than 150 individuals to profile the landscape of TR expansions in a healthy population and validated novel expansions using orthogonal technologies. Our analysis revealed that each individual harbors dozens of TR alleles longer than standard read lengths and identified hundreds of potentially mis-annotated TRs in the reference genome. GangSTR is packaged as a standalone tool that will likely enable discovery of novel pathogenic variants not currently accessible from NGS.