Abstract
Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian human diseases. Traditionally, pathogenic STR expansions could only be detected by single-locus techniques, such as PCR and electrophoresis. The ability to genotype STRs directly from next-generation sequencing data has the potential both to reduce the time and cost of reaching a diagnosis and to discover new causal STR loci. However, most existing tools only detect STR variation that falls within the read length, and so are unable to detect the majority of pathogenic expansions, which are longer than a single short read.
Here we present STRetch, a new genome-wide method to detect pathogenic STR expansions and estimate their approximate size directly from short-read sequencing data. We show that STRetch can detect pathogenic STR expansions in short-read whole-genome sequencing data. We apply STRetch to 97 whole genomes to characterize variation at STR loci. Finally, we demonstrate how STRetch can help solve cases of patients with undiagnosed disease in which STR expansions are a likely cause. A key advantage of STRetch over other tools is that it assesses expansions at all STR loci in the genome, and so can be used to detect novel disease-causing STR loci.
STRetch is open source software, available from github.com/Oshlack/STRetch.