bioRxiv
New Results

STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci

Harriet Dashnow, Brent S. Pedersen, Laurel Hiatt, Joe Brown, Sarah J. Beecroft, Gianina Ravenscroft, Amy J. LaCroix, Phillipa Lamont, Richard H. Roxburgh, Miriam J. Rodrigues, Mark Davis, Heather C. Mefford, Nigel G. Laing, Aaron R. Quinlan
doi: https://doi.org/10.1101/2021.11.18.469113
Harriet Dashnow
1Department of Human Genetics, University of Utah, Salt Lake City, UT
Brent S. Pedersen
1Department of Human Genetics, University of Utah, Salt Lake City, UT
2Utrecht University Medical Center, Utrecht, The Netherlands
Laurel Hiatt
1Department of Human Genetics, University of Utah, Salt Lake City, UT
Joe Brown
1Department of Human Genetics, University of Utah, Salt Lake City, UT
Sarah J. Beecroft
3Pawsey Supercomputing Research Centre, Kensington, Western Australia, Australia
4Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, Western Australia, Australia
Gianina Ravenscroft
4Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, Western Australia, Australia
Amy J. LaCroix
5Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle WA 98195
Phillipa Lamont
6Neurogenetic Unit, Royal Perth Hospital, Perth, WA, Australia
Richard H. Roxburgh
7Neurology, Auckland City Hospital, Auckland, New Zealand and Centre for Brain Research, University of Auckland, Auckland, New Zealand
Miriam J. Rodrigues
7Neurology, Auckland City Hospital, Auckland, New Zealand and Centre for Brain Research, University of Auckland, Auckland, New Zealand
Mark Davis
8Neurogenetic Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health
Heather C. Mefford
5Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle WA 98195
Nigel G. Laing
4Harry Perkins Institute of Medical Research and Centre for Medical Research, University of Western Australia, Perth, Western Australia, Australia
8Neurogenetic Unit, Department of Diagnostic Genomics, PathWest Laboratory Medicine, Western Australian Department of Health
Aaron R. Quinlan
1Department of Human Genetics, University of Utah, Salt Lake City, UT
  • For correspondence: aaronquinlan@gmail.com

Abstract

Expansions of short tandem repeats (STRs) cause dozens of rare Mendelian diseases. However, STR expansions, especially those arising from repeats not present in the reference genome, are challenging to detect from short-read sequencing data. Such “novel” STRs include new repeat units occurring at known STR loci, or entirely new STR loci where the sequence is absent from the reference genome. A primary cause of difficulty detecting STR expansions is that reads arising from STR expansions are frequently mismapped or unmapped. To address this challenge, we have developed STRling, a new STR detection algorithm that counts k-mers (short DNA sequences of length k) in DNA sequencing reads, to efficiently recover reads that inform the presence and size of STR expansions. As a result, STRling can call expansions at both known and novel STR loci. STRling has a sensitivity of 83% for 14 known STR disease loci, including the novel STRs that cause CANVAS and DBQD2. It is the first method to resolve the position of novel STR expansions to base pair accuracy. Such accuracy is essential to interpreting the consequence of each expansion. STRling has an estimated 0.078 false discovery rate for known pathogenic loci in unaffected individuals and a 0.20 false discovery rate for genome-wide loci in unaffected individuals when using variants called from long-read data as truth. STRling is fast, scalable on cloud computing, open-source, and freely available at https://github.com/quinlan-lab/STRling.
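The k-mer counting idea described above can be illustrated with a minimal sketch. This is not STRling's implementation: the function names, the rotation-based grouping of k-mers, and the `min_fraction` threshold are all assumptions made for illustration. The sketch counts every k-mer in a single read, treats rotations of the same repeat unit (for example CAG, AGC and GCA) as one unit, and reports the most frequent unit together with the fraction of k-mer windows it accounts for. A read drawn from inside an STR expansion should score close to 1.0, even if it could not be mapped to the reference.

```python
from collections import Counter


def canonical_rotation(kmer: str) -> str:
    """Return the lexicographically smallest rotation of a k-mer, so that
    CAG, AGC and GCA are all counted as the same repeat unit."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_repeat_unit(read: str, k: int):
    """Count all overlapping k-mers in a read, grouped by rotation.

    Returns (unit, fraction): the most frequent rotation-canonical k-mer and
    the fraction of k-mer windows in the read that belong to it.
    """
    n_windows = len(read) - k + 1
    if n_windows <= 0:
        return None, 0.0
    counts = Counter(
        canonical_rotation(read[i:i + k]) for i in range(n_windows)
    )
    unit, count = counts.most_common(1)[0]
    return unit, count / n_windows


def is_repeat_rich(read: str, k: int, min_fraction: float = 0.8) -> bool:
    """Flag a read as mostly composed of one repeat unit of length k.

    min_fraction is a hypothetical threshold chosen for this sketch only.
    """
    _, fraction = dominant_repeat_unit(read, k)
    return fraction >= min_fraction


if __name__ == "__main__":
    expanded = "CAG" * 10                      # read lying inside a CAG repeat
    unique = "ACGTTGCAAGCTTAGGCTAACCGTA"       # non-repetitive sequence
    print(dominant_repeat_unit(expanded, 3))
    print(is_repeat_rich(expanded, 3), is_repeat_rich(unique, 3))
```

Because the score depends only on the read's own sequence, such a filter could in principle recover informative reads regardless of where, or whether, they aligned. A real caller would also need to handle reverse complements, sequencing errors, and multiple repeat unit lengths, which this sketch omits.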

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/quinlan-lab/STRling

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted November 20, 2021.

Subject Area

  • Bioinformatics