Inference of natural selection from interspersed genomic elements based on polymorphism and divergence

Mol Biol Evol. 2013 May;30(5):1159-71. doi: 10.1093/molbev/mst019. Epub 2013 Feb 5.

Abstract

Complete genome sequences contain valuable information about natural selection, but this information is difficult to access for short, widely scattered noncoding elements such as transcription factor binding sites or small noncoding RNAs. Here, we introduce a new computational method, called Inference of Natural Selection from Interspersed Genomically coHerent elemenTs (INSIGHT), for measuring the influence of natural selection on such elements. INSIGHT uses a generative probabilistic model to contrast patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites, pooling weak information from many short elements in a manner that accounts for variation among loci in mutation rates and coalescent times. The method is able to disentangle the contributions of weak negative, strong negative, and positive selection based on their distinct effects on patterns of polymorphism and divergence. It obtains information about divergence from multiple outgroup genomes using a general statistical phylogenetic approach. The INSIGHT model is efficiently fitted to genome-wide data using an approximate expectation maximization algorithm. Using simulations, we show that the method can accurately estimate the parameters of interest even in complex demographic scenarios, and that it significantly improves on methods based on summary statistics describing polymorphism and divergence. To demonstrate the usefulness of INSIGHT, we apply it to several classes of human noncoding RNAs and to GATA2-binding sites in the human genome.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • DNA / genetics
  • Evolution, Molecular*
  • Genetic Variation / genetics
  • Genetics, Population
  • Humans
  • Phylogeny
  • Polymorphism, Genetic / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics
  • Selection, Genetic / genetics*

Substances

  • DNA