Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

  1. Adam Siepel (1,6),
  2. Gill Bejerano (1),
  3. Jakob S. Pedersen (1),
  4. Angie S. Hinrichs (1),
  5. Minmei Hou (3),
  6. Kate Rosenbloom (1),
  7. Hiram Clawson (1),
  8. John Spieth (4),
  9. LaDeana W. Hillier (4),
  10. Stephen Richards (5),
  11. George M. Weinstock (5),
  12. Richard K. Wilson (4),
  13. Richard A. Gibbs (5),
  14. W. James Kent (1),
  15. Webb Miller (3), and
  16. David Haussler (1,2)
  1. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA
  2. Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, California 95064, USA
  3. Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania 16802, USA
  4. Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri 63108, USA
  5. Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA

Abstract

We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%–8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%–53%), Caenorhabditis elegans (18%–37%), and Saccharomyces cerevisiae (47%–68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3′ UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
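
To make the idea of a two-state model concrete, the following is a minimal Python sketch of how a two-state HMM can segment alignment columns into conserved and nonconserved runs. It is an illustration under stated assumptions, not the phastCons implementation: the per-column log-likelihoods are taken as given inputs (in a phylo-HMM they would come from phylogenetic models, which are not computed here), the function names and the transition parameters `mu` and `nu` with their default values are hypothetical, and Viterbi decoding is used simply as one standard way to obtain a state path.

```python
import math


def viterbi_two_state(log_emit_cons, log_emit_non, mu=0.1, nu=0.05):
    """Most likely state path ('C' conserved, 'N' nonconserved) per column.

    log_emit_cons[i] / log_emit_non[i] are assumed, precomputed
    log-likelihoods of alignment column i under each state.
    mu = P(C -> N) and nu = P(N -> C) are illustrative values.
    """
    n = len(log_emit_cons)
    if n == 0:
        return []
    states = ("C", "N")
    emit = {"C": log_emit_cons, "N": log_emit_non}
    log_trans = {
        ("C", "C"): math.log(1 - mu), ("C", "N"): math.log(mu),
        ("N", "C"): math.log(nu), ("N", "N"): math.log(1 - nu),
    }
    # Start from the stationary distribution of the two-state chain.
    log_init = {"C": math.log(nu / (mu + nu)), "N": math.log(mu / (mu + nu))}
    score = {s: log_init[s] + emit[s][0] for s in states}
    back = []  # back[i-1][s] = best predecessor of state s at column i
    for i in range(1, n):
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[(p, s)])
            new_score[s] = score[prev] + log_trans[(prev, s)] + emit[s][i]
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def conserved_elements(path):
    """Collapse runs of 'C' into half-open (start, end) column intervals."""
    elements, start = [], None
    for i, s in enumerate(path):
        if s == "C" and start is None:
            start = i
        elif s != "C" and start is not None:
            elements.append((start, i))
            start = None
    if start is not None:
        elements.append((start, len(path)))
    return elements
```

Calling `conserved_elements(viterbi_two_state(cons, non))` on two equal-length lists of column log-likelihoods returns the predicted conserved intervals. In this sketch, lower `mu` favors longer conserved runs and lower `nu` favors fewer of them; fitting such parameters by maximum likelihood is outside its scope.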

Footnotes

  • [Supplemental material is available online at www.genome.org. The multiple alignments, predicted conserved elements, and base-by-base conservation scores presented here can be downloaded from http://www.cse.ucsc.edu/~acs/conservation. Up-to-date versions of these data sets are displayed in the “Conservation” and “Most Conserved” tracks in the UCSC Genome Browser (http://genome.ucsc.edu). The phastCons program is part of a software package called PHAST (PHylogenetic Analysis with Space/Time models), which is available by request from acs{at}soe.ucsc.edu.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3715005. Article published online before print in July 2005.

  • 6 Corresponding author. E-mail acs{at}soe.ucsc.edu; fax (831) 459-1809.

    • Accepted June 2, 2005.
    • Received January 19, 2005.