TY  - JOUR
T1  - Unexpected Properties of Short Genomic Tandem Repeats
JF  - bioRxiv
DO  - 10.1101/165308
SP  - 165308
AU  - Irina Glotova
AU  - Michael Molla
AU  - Arthur L. Delcher
AU  - Simon Kasif
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/07/18/165308.abstract
N2  - Length polymorphisms in genomic short tandem repeats have been implicated in a variety of diseases, most notably human neurodegenerative disorders. Expansions of tandem repeats are also associated with genomic instability in cancer. Our previous study of length-3 tandem repeats uncovered a surprising pattern in the length distribution of certain such repeats in the non-coding regions of the human reference genome: a bias towards repeats of length 3n - 1 (n > 3). That is, the observed frequency of repeats of this length in the human genome is higher than expected by chance, based on the frequency of shorter repeats. We hypothesized that this pattern may be a general property of genomic DNA. If true, this could have implications for the dynamics of repeat expansion generally. To test this hypothesis, we analyzed the genomic sequences of a broad range of eukaryotic organisms as well as several complete human genomes, and obtained a number of thought-provoking results. We establish that this unexpected elevation in the frequency of repeats of length 3n - 1 is statistically significant. We also extended this analysis to different classes of genomic regions and to tandem repeats of length four and five. The specific pattern was found in 13 of the 20 organisms analyzed, including all chordate and insect genomes tested. The bias pattern, however, was not confined to a single branch of the evolutionary tree. For some genomes, such as Drosophila melanogaster, the repeat bias was, surprisingly, also identified in exons. The pattern is present in both small and large genomes. A similar pattern was also found in tetranucleotide and pentanucleotide repeats in the human genome. Another surprising property was identified in the GC content of the regions flanking triplet repeats of length 3n. These findings indicate a puzzling new genomic phenomenon with possible evolutionary and disease-related implications.
ER  -