Abstract
Tandem repeat (TR) variation is associated with gene expression changes and over 50 rare monogenic diseases. Recent advances in sequencing have enabled accurate, long reads that can characterize the full-length sequence and methylation profile of TRs. However, despite these advances in sequencing technology, computational methods to fully profile tandem repeats across the genome do not exist. To address this gap, we introduce tools for tandem repeat genotyping (TRGT), visualization and an accompanying TR database. TRGT accurately resolves the length and sequence composition of TR regions in the human genome. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 99.56%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all repeat expansions while also identifying methylation signals, mosaicism, and providing finer resolution of repeat length. Additionally, we release a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
Competing Interest Statement
Egor Dolzhenko, Guilherme De Sena Brandine, Tom Mokveld, William J. Rowell, Caitlin Karniski, Zev Kronenberg, Aaron Wenger, Michael A Eberle are employees and shareholders of Pacific Biosciences. Fritz J. Sedlazeck received research support from Illumina, Pacific Biosciences, Nanopore, and Genentech.