Discovery of novel transcription factor binding sites by statistical overrepresentation

Saurabh Sinha; Martin Tompa

doi:10.1093/nar/gkf669

Nucleic Acids Res. 2002 Dec 15;30(24):5549-60. doi: 10.1093/nar/gkf669.

Authors

Saurabh Sinha¹, Martin Tompa

Affiliation

¹ Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195-2350, USA.

Abstract

Understanding the complex and varied mechanisms that regulate gene expression is an important and challenging problem. A fundamental sub-problem is to identify DNA binding sites for unknown regulatory factors, given a collection of genes believed to be co-regulated. We discuss a computational method that identifies good candidates for such binding sites. Unlike local search techniques such as expectation maximization and Gibbs samplers, which may not reach a global optimum, the method discussed enumerates all motifs in the search space and is guaranteed to produce the motifs with the greatest z-scores. We discuss the results of validation experiments in which this algorithm was used to identify candidate binding sites in several well-studied regulons of Saccharomyces cerevisiae, where the most prominent transcription factor binding sites are largely known. We then discuss the results on gene families in the functional and mutant phenotype catalogs of S. cerevisiae, where the algorithm suggests many promising novel transcription factor binding sites. The program is available at http://bio.cs.washington.edu/software.html.
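
The idea of enumerating every motif and ranking by z-score can be illustrated with a minimal sketch. It is not the authors' program: it assumes fixed-length motifs over A, C, G and T, an i.i.d. background with user-supplied base frequencies, and a binomial variance that ignores overlapping occurrences. The function names kmer_zscores and top_motifs are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_zscores(sequences, k, background):
    """Return a z-score for every k-mer over ACGT.

    Assumptions (not taken from the paper): bases are i.i.d. with the
    given background frequencies, and the variance of the count is the
    binomial n*p*(1-p), which ignores overlap between occurrences.
    """
    # Count observed occurrences of each k-mer (overlaps allowed) and
    # the total number of positions where a k-mer could start.
    observed = Counter()
    positions = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            observed[seq[i:i + k]] += 1
            positions += 1

    # Exhaustively enumerate all 4**k motifs; no local search is used.
    scores = {}
    for letters in product("ACGT", repeat=k):
        motif = "".join(letters)
        p = 1.0
        for base in letters:
            p *= background[base]
        expected = positions * p
        variance = positions * p * (1.0 - p)
        if variance > 0:
            scores[motif] = (observed[motif] - expected) / sqrt(variance)
    return scores


def top_motifs(sequences, k, background, n=5):
    """Return the n motifs with the greatest z-scores, best first."""
    scores = kmer_zscores(sequences, k, background)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Because every motif is scored, the top of the ranking is the true maximum under the chosen model, which is the property the abstract contrasts with local search. The cost is that the loop grows as 4^k, so this sketch is only practical for short motifs.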

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Binding Sites / genetics
Computational Biology / methods*
Computational Biology / statistics & numerical data
DNA, Fungal / genetics
DNA, Fungal / metabolism
Genes, Fungal / genetics
Phenotype
Promoter Regions, Genetic / genetics
Protein Binding
Regulon / genetics
Reproducibility of Results
Saccharomyces cerevisiae / genetics
Transcription Factors / metabolism*

Substances

DNA, Fungal
Transcription Factors