Eukaryotic Promoter Recognition

James W. Fickett¹,³ and Artemis G. Hatzigeorgiou²

¹Bioinformatics, SmithKline Beecham Pharmaceuticals, King of Prussia, Pennsylvania 19406; ²Synaptic Ltd., 13671 Acharnai, Greece

This extract was created in the absence of an abstract.

Computational analysis of polymerase II (Pol II) promoters may contribute to improved gene identification and to prediction of the expression context of genes. Before assessing the state of computational promoter recognition per se in the main body of this review, we provide context with a brief overview of these two problems.

Partitioning a Genome into Genes

Only recently has it become common to determine eukaryotic genomic sequences large enough to contain several genes. With these data comes a new problem for gene finding programs: to partition a set of exons correctly among several genes.

One line of development in eukaryotic gene identification begins with statistical identification of coding regions and adds pattern recognition for sites of transcriptional, splicing, and translational control, producing algorithms capable of suggesting overall gene structure (for review, see Gelfand 1995; Fickett 1996a). To date, most development effort has focused on integrating the various kinds of pattern information in the relatively simple case where the input sequence contains a single complete gene. In this case, current algorithms usually suggest a putative protein translation similar to that in the literature, although there is still significant room for improvement (Burset and Guigo 1996). Extending these algorithms to sequences containing multiple or partial genes has only just begun (Burge and Karlin 1997; http://gnomic.stanford.edu/~chris/GENSCANW.html). Because the signals that control the start and stop of transcription and translation, and the location of splicing, are still poorly understood, it is not uncommon for a gene-finding algorithm to confuse internal exons with initial and terminal exons and thus partition the exons wrongly. The problem is compounded by our incomplete understanding of the elements that control alternative splicing.
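To make the partitioning problem concrete, the following is a minimal Python sketch. It assumes, hypothetically, that each predicted exon on a single strand already carries a type label (initial, internal, terminal, or single) and groups exons into genes using those labels alone. The names `Exon`, `partition_exons`, and `spans` are illustrative and do not come from any published gene finder; programs such as GENSCAN make these decisions jointly within a probabilistic model rather than by a fixed rule like this one.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical exon labels; a real gene finder scores these probabilistically.
INITIAL, INTERNAL, TERMINAL, SINGLE = "initial", "internal", "terminal", "single"


@dataclass
class Exon:
    start: int
    end: int
    kind: str  # one of INITIAL, INTERNAL, TERMINAL, SINGLE


def partition_exons(exons: List[Exon]) -> List[List[Exon]]:
    """Group exons on one strand into putative genes using only their labels.

    An initial (or single) exon opens a new gene; a terminal (or single) exon
    closes the current one. Internal exons join whatever gene is open, or
    start an incomplete gene if none is (e.g. a partial gene at a sequence end).
    """
    genes: List[List[Exon]] = []
    current: List[Exon] = []
    for exon in sorted(exons, key=lambda e: e.start):
        if exon.kind in (INITIAL, SINGLE) and current:
            genes.append(current)  # previous gene never saw a terminal exon
            current = []
        current.append(exon)
        if exon.kind in (TERMINAL, SINGLE):
            genes.append(current)
            current = []
    if current:
        genes.append(current)  # trailing partial gene
    return genes


def spans(genes: List[List[Exon]]) -> List[Tuple[int, int]]:
    """Report each putative gene as (first exon start, last exon end)."""
    return [(g[0].start, g[-1].end) for g in genes]


# Two genes, labeled correctly.
correct = [Exon(100, 200, INITIAL), Exon(500, 600, INTERNAL), Exon(900, 1000, TERMINAL),
           Exon(3000, 3100, INITIAL), Exon(3500, 3600, TERMINAL)]
# Terminal exon of gene 1 and initial exon of gene 2 both called internal.
merged = [Exon(100, 200, INITIAL), Exon(500, 600, INTERNAL), Exon(900, 1000, INTERNAL),
          Exon(3000, 3100, INTERNAL), Exon(3500, 3600, TERMINAL)]
# An internal exon of gene 1 called terminal.
split = [Exon(100, 200, INITIAL), Exon(500, 600, TERMINAL), Exon(900, 1000, TERMINAL),
         Exon(3000, 3100, INITIAL), Exon(3500, 3600, TERMINAL)]

print(spans(partition_exons(correct)))  # [(100, 1000), (3000, 3600)]
print(spans(partition_exons(merged)))   # [(100, 3600)]
print(spans(partition_exons(split)))    # [(100, 600), (900, 1000), (3000, 3600)]
```

Because this toy rule depends only on exon labels, calling internal exons terminal or initial (or the reverse) can merge two neighboring genes into one or split one gene into several, even when every exon boundary is correct.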

Another line of development in gene identification is based on homology (e.g., Gish and States 1993; Gelfand et al. …

