The landscape of human STR variation

  1. Yaniv Erlich (1)
  1. Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA;
  2. Computational and Systems Biology Program, MIT, Cambridge, Massachusetts 02139, USA;
  3. Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA;
  4. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA;
  5. Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
  6. Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA;
  7. Gene by Gene, Ltd., Houston, Texas 77008, USA

  Corresponding author: yaniv{at}wi.mit.edu

Abstract

Short tandem repeats (STRs) are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and are frequently used in forensics, population genetics, and genetic genealogy. Despite these many applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We use this call set to analyze determinants of STR variation, assess how well the human reference genome represents STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.

Footnotes

  • [Supplemental material is available for this article.]

  • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.177774.114.

    Freely available online through the Genome Research Open Access option.

  • Received April 30, 2014.
  • Accepted August 15, 2014.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0.
