RT Journal Article
SR Electronic
T1 Haplotype Diversity and Sequence Heterogeneity of Human Telomeres
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.01.31.929307
DO 10.1101/2020.01.31.929307
A1 Kirill Grigorev
A1 Jonathan Foox
A1 Daniela Bezdan
A1 Daniel Butler
A1 Jared J. Luxton
A1 Jake Reed
A1 Cem Meydan
A1 Susan M. Bailey
A1 Christopher E. Mason
YR 2020
UL http://biorxiv.org/content/early/2020/01/31/2020.01.31.929307.abstract
AB Telomeres are regions of repetitive nucleotide sequences capping the ends of eukaryotic chromosomes that protect against deterioration, whose lengths can be correlated with age and disease risk factors. Given their length and repetitive nature, telomeric regions are not easily reconstructed from short read sequencing, making telomere sequence resolution a very costly and generally intractable problem. Recently, long-read sequencing, with read lengths measuring in hundreds of Kbp, has made it possible to routinely read into telomeric regions and inspect their structure. Here, we describe a framework for extracting telomeric reads from single-molecule sequencing experiments, describing their sequence variation and motifs, and for haplotype inference. We find that long telomeric stretches can be accurately captured with long-read sequencing, observe extensive sequence heterogeneity of human telomeres, discover and localize non-canonical motifs (both previously reported as well as novel), and report the first motif composition maps of human telomeric diplotypes on a multi-Kbp scale.