RT Journal Article
SR Electronic
T1 STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.11.18.469113
DO 10.1101/2021.11.18.469113
A1 Harriet Dashnow
A1 Brent S. Pedersen
A1 Laurel Hiatt
A1 Joe Brown
A1 Sarah J. Beecroft
A1 Gianina Ravenscroft
A1 Amy J. LaCroix
A1 Phillipa Lamont
A1 Richard H. Roxburgh
A1 Miriam J. Rodrigues
A1 Mark Davis
A1 Heather C. Mefford
A1 Nigel G. Laing
A1 Aaron R. Quinlan
YR 2021
UL http://biorxiv.org/content/early/2021/11/20/2021.11.18.469113.abstract
AB Expansions of short tandem repeats (STRs) cause dozens of rare Mendelian diseases. However, STR expansions, especially those arising from repeats not present in the reference genome, are challenging to detect from short-read sequencing data. Such “novel” STRs include new repeat units occurring at known STR loci, or entirely new STR loci where the sequence is absent from the reference genome. A primary cause of difficulty detecting STR expansions is that reads arising from STR expansions are frequently mismapped or unmapped. To address this challenge, we have developed STRling, a new STR detection algorithm that counts k-mers (short DNA sequences of length k) in DNA sequencing reads, to efficiently recover reads that inform the presence and size of STR expansions. As a result, STRling can call expansions at both known and novel STR loci. STRling has a sensitivity of 83% for 14 known STR disease loci, including the novel STRs that cause CANVAS and DBQD2. It is the first method to resolve the position of novel STR expansions to base pair accuracy. Such accuracy is essential to interpreting the consequence of each expansion. STRling has an estimated 0.078 false discovery rate for known pathogenic loci in unaffected individuals and a 0.20 false discovery rate for genome-wide loci in unaffected individuals when using variants called from long-read data as truth.
STRling is fast, scalable on cloud computing, open-source, and freely available at https://github.com/quinlan-lab/STRling. Competing Interest Statement: The authors have declared no competing interest.
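
The abstract describes counting k-mers within reads to recover those that come from STR expansions. The following is a minimal sketch of that general idea only, not STRling's implementation, whose details are not given in this record. The function names, the rotation-based canonical form, and the `min_fraction` threshold of 0.8 are all assumptions made for illustration.

```python
from collections import Counter


def canonical_rotation(kmer):
    """Return the lexicographically smallest rotation of a k-mer.

    Rotations of a repeat unit (e.g. AAGGG, AGGGA, GGGAA) describe the same
    tandem repeat, so they are collapsed to one representative.
    """
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_repeat_unit(read, k):
    """Count overlapping k-mers in a read and report the most frequent one.

    Returns (unit, fraction), where fraction is the share of all k-mer
    windows in the read that are rotations of `unit`.
    """
    n_windows = len(read) - k + 1
    if n_windows <= 0:
        return None, 0.0
    counts = Counter(
        canonical_rotation(read[i:i + k]) for i in range(n_windows)
    )
    unit, count = counts.most_common(1)[0]
    return unit, count / n_windows


def looks_like_str_read(read, k, min_fraction=0.8):
    """Flag a read as repeat-rich. The 0.8 cutoff is a hypothetical value."""
    _, fraction = dominant_repeat_unit(read, k)
    return fraction >= min_fraction


if __name__ == "__main__":
    # A read made purely of the AAGGG pentamer versus an arbitrary read.
    repeat_read = "AAGGG" * 10
    other_read = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGTAACGT"
    print(dominant_repeat_unit(repeat_read, 5))
    print(looks_like_str_read(other_read, 5))
```

Because the check uses only the read sequence and no alignment, a sketch like this would still flag repeat-rich reads that are mismapped or unmapped, which is the difficulty the abstract identifies.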