Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals

  Ines Hellmann (1,6), Yuan Mang (2), Zhiping Gu (3), Peter Li (3), Francisco M. de la Vega (4), Andrew G. Clark (5), and Rasmus Nielsen (1)

  1. Departments of Integrative Biology and Statistics, University of California, Berkeley, California 94720, USA;
  2. Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, University of Copenhagen, 2200 Copenhagen, Denmark;
  3. Bioinformatics R&D, Applied Biosystems, Rockville, Maryland 20850, USA;
  4. Computational Genetics, Applied Biosystems, Foster City, California 94404, USA;
  5. Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA

Abstract

We introduce a simple, broadly applicable method for estimating nucleotide diversity θ from genomic shotgun sequencing data. The method takes into account the special nature of these data: random sampling of genomic segments from one or more individuals and a relatively high error rate for individual reads. Applying this method to data from the Celera human genome sequencing and SNP discovery project, we obtain estimates of nucleotide diversity in windows spanning the human genome and show that the diversity-to-divergence ratio is reduced in regions of low recombination. Furthermore, we show that the elevated diversity in telomeric regions is mainly due to elevated mutation rates rather than to decreased levels of background selection. However, we find indications that telomeres and centromeres are more strongly affected by natural selection than intrachromosomal regions. Finally, we identify a number of genomic regions with increased or reduced diversity compared with the local level of human–chimpanzee divergence and the local recombination rate.

Footnotes

  • 6 Corresponding author. E-mail inesh{at}berkeley.edu; fax (510) 642-2740.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.074187.107.

    • Received November 8, 2007.
    • Accepted April 7, 2008.