Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish

Nucleic Acids Res. 2013 Aug;41(15):e151. doi: 10.1093/nar/gkt557. Epub 2013 Jun 27.

Abstract

Many important model organisms for biomedical and evolutionary research have sequenced genomes but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified by zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and to use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs), which are promising candidates for biologically important cis-regulatory elements, is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.
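
The idea of alignment transitivity mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Block` type, the `compose` function and all coordinates are hypothetical, and the sketch assumes ungapped, same-strand blocks. If genome A aligns to an intermediate genome B, and B aligns to genome C, then the part of B shared by both alignments induces an A-to-C alignment even when A and C are too diverged to align directly.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """Hypothetical ungapped aligned block (same strand assumed).

    Positions src_start .. src_start+length in the source genome align
    base-by-base to dst_start .. dst_start+length in the target genome.
    """
    src_start: int
    dst_start: int
    length: int


def compose(ab: List[Block], bc: List[Block]) -> List[Block]:
    """Compose A->B and B->C blocks into A->C blocks.

    Two blocks are chained only where their intervals overlap in the
    coordinates of the intermediate genome B.
    """
    out = []
    for x in ab:
        x_b_start, x_b_end = x.dst_start, x.dst_start + x.length
        for y in bc:
            y_b_start, y_b_end = y.src_start, y.src_start + y.length
            lo = max(x_b_start, y_b_start)  # overlap start in B
            hi = min(x_b_end, y_b_end)      # overlap end in B (exclusive)
            if lo < hi:
                out.append(Block(
                    src_start=x.src_start + (lo - x_b_start),  # shift into A
                    dst_start=y.dst_start + (lo - y_b_start),  # shift into C
                    length=hi - lo,
                ))
    return sorted(out)


# Illustrative coordinates only (not taken from the paper).
a_to_b = [Block(src_start=100, dst_start=500, length=50)]
b_to_c = [Block(src_start=520, dst_start=9000, length=100)]
a_to_c = compose(a_to_b, b_to_c)
print(a_to_c)
```

Here the two blocks share positions 520 to 550 in B, so the composed result is a single 30-base block mapping A position 120 to C position 9000. A real pipeline would additionally need to handle gaps, reverse-strand alignments and alignment quality filtering, none of which this sketch attempts.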

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Computational Biology / methods*
  • Conserved Sequence*
  • Evolution, Molecular
  • Genomics / methods
  • Internet
  • Molecular Sequence Annotation
  • Phylogeny*
  • Regulatory Sequences, Nucleic Acid
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Synteny
  • Zebrafish / classification
  • Zebrafish / genetics*