Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes

BMC Genomics. 2004 Dec 21;5(1):99. doi: 10.1186/1471-2164-5-99.

Abstract

Background: Evolutionarily conserved sequences within or adjoining orthologous genes often serve as critical cis-regulatory regions. Recent studies have identified long, non-coding genomic regions that are perfectly conserved between human and mouse, termed ultra-conserved regions (UCRs). Here, we focus on UCRs that cluster around genes involved in early vertebrate development, genes that have been conserved over 450 million years of vertebrate evolution.
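
As an illustration of what "perfectly conserved" means operationally, the sketch below scans a pairwise alignment for runs of identical, ungapped columns. It is a minimal example and not the detection procedure used in the paper; the function name `perfect_runs`, the `min_len` parameter, and the toy sequences are all assumptions made for illustration.

```python
def perfect_runs(aln_a, aln_b, min_len):
    """Return half-open (start, end) column ranges where two aligned
    sequences are identical and ungapped for at least min_len columns.

    Hypothetical helper: the abstract does not state the length
    threshold or the alignment method the authors used.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None  # column where the current identical run began
    for i, (a, b) in enumerate(zip(aln_a.upper(), aln_b.upper())):
        is_match = a == b and a != "-"  # gap columns never count as conserved
        if is_match and start is None:
            start = i
        elif not is_match and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None

    # Close a run that extends to the end of the alignment.
    if start is not None and len(aln_a) - start >= min_len:
        runs.append((start, len(aln_a)))
    return runs


# Toy alignment; real inputs would be whole-genome alignment blocks.
human = "ACGTACGT-ACGT"
mouse = "ACGTTCGT-ACGA"
runs = perfect_runs(human, mouse, min_len=3)
```

On real data, `min_len` would be set to tens or hundreds of base pairs rather than 3.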

Results: Based on a high resolution detection procedure, our UCR set enables novel insights into vertebrate genome organization and regulation of developmentally important genes. We find that the genomic positions of deeply conserved UCRs are strongly associated with the locations of genes encoding key regulators of development, with particularly strong positional correlation to transcription factor-encoding genes. Of particular importance is the observation that most UCRs are clustered into arrays that span hundreds of kilobases around their presumptive target genes. Such a hallmark signature is present around several uncharacterized human genes predicted to encode developmentally important DNA-binding proteins.
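
The grouping of UCRs into arrays can be pictured as merging neighboring intervals on a chromosome whenever the gap between them is below some distance. The sketch below shows that idea under stated assumptions: the `max_gap` threshold, the function name `cluster_into_arrays`, and the coordinates are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_into_arrays(regions, max_gap):
    """Group conserved regions into arrays by genomic proximity.

    regions: iterable of (chrom, start, end) tuples.
    max_gap: largest distance allowed between a region and the array
             it joins (an assumed parameter, not the authors' criterion).
    Returns a list of (chrom, array_start, array_end, n_regions).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    arrays = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start - cur_end <= max_gap:
                # Close enough: extend the current array.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # Too far: close the current array and start a new one.
                arrays.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        arrays.append((chrom, cur_start, cur_end, count))
    return arrays


# Made-up coordinates for illustration only.
ucrs = [
    ("chr1", 100, 300),
    ("chr1", 50_000, 50_250),
    ("chr1", 900_000, 900_200),
    ("chr2", 10, 210),
]
arrays = cluster_into_arrays(ucrs, max_gap=100_000)
```

Arrays containing many regions and spanning hundreds of kilobases would then be the candidates to compare against the positions of nearby genes.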

Conclusion: The genomic organization of UCRs, combined with previous findings, suggests that UCRs act as essential long-range modulators of gene expression. The exceptional sequence conservation and clustered structure suggest that UCR-mediated molecular events involve greater complexity than traditional DNA binding by transcription factors. The high-resolution UCR collection presented here provides a wealth of target sequences for future experimental studies to determine the nature of the biochemical mechanisms involved in the preservation of arrays of nearly identical non-coding sequences over the course of vertebrate evolution.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cluster Analysis
  • Conserved Sequence
  • DNA / metabolism
  • DNA-Binding Proteins / genetics
  • Evolution, Molecular
  • Gene Expression Regulation
  • Gene Expression Regulation, Developmental*
  • Genes, Developmental*
  • Genome*
  • Humans
  • Molecular Sequence Data
  • Multigene Family
  • Oligonucleotide Array Sequence Analysis / methods*
  • Protein Binding
  • Vertebrates / genetics

Substances

  • DNA-Binding Proteins
  • DNA