Discovering statistically significant biclusters in gene expression data

Amos Tanay; Roded Sharan; Ron Shamir

doi:10.1093/bioinformatics/18.suppl_1.s136

Bioinformatics. 2002;18 Suppl 1:S136-44. doi: 10.1093/bioinformatics/18.suppl_1.s136.

Authors

Amos Tanay¹, Roded Sharan, Ron Shamir

Affiliation

¹ School of Computer Science, Tel-Aviv University, Ramat-Aviv, Tel-Aviv, 69978, Israel. amos@tau.ac.il

PMID: 12169541
DOI: 10.1093/bioinformatics/18.suppl_1.s136

Abstract

In gene expression data, a bicluster is a subset of the genes exhibiting consistent patterns over a subset of the conditions. We propose a new method to detect significant biclusters in large expression datasets. Our approach combines graph-theoretic techniques with statistical modelling of the data. Under plausible assumptions, our algorithm is polynomial and is guaranteed to find the most significant biclusters. We tested our method on a collection of yeast expression profiles and on a human cancer dataset. Cross-validation results show high specificity in assigning function to genes based on their biclusters, and in this way we are able to annotate 196 uncharacterized yeast genes. We also demonstrate how the biclusters lead to detecting new concrete biological associations. In cancer data we are able to detect and relate finer tissue types than was previously possible. We also show that the method outperforms the biclustering algorithm of Cheng and Church (2000).
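
The definition of a bicluster given in the abstract can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the authors' algorithm or scoring scheme: the toy matrix, the names `submatrix` and `changed_fraction`, and the threshold of 1.0 are all hypothetical. It represents a bicluster as a pair of index sets (genes, conditions) and measures, as one possible notion of "consistent pattern", the fraction of cells in the selected submatrix whose expression change is large.

```python
from typing import List, Set, Tuple

# A bicluster is a pair: (gene row indices, condition column indices).
Bicluster = Tuple[Set[int], Set[int]]


def submatrix(expr: List[List[float]], bicluster: Bicluster) -> List[List[float]]:
    """Return the rows and columns of expr selected by the bicluster."""
    genes, conditions = bicluster
    return [[expr[g][c] for c in sorted(conditions)] for g in sorted(genes)]


def changed_fraction(expr: List[List[float]], bicluster: Bicluster,
                     threshold: float = 1.0) -> float:
    """Fraction of selected cells with |value| >= threshold.

    This is an assumed, simplified consistency measure chosen for
    illustration; it is not the statistical score used in the paper.
    """
    cells = [v for row in submatrix(expr, bicluster) for v in row]
    if not cells:
        return 0.0
    return sum(1 for v in cells if abs(v) >= threshold) / len(cells)


# Hypothetical toy data: 3 genes (rows) x 4 conditions (columns).
EXPR = [
    [2.1, 0.1, 1.8, -0.2],
    [1.9, -0.3, 2.4, 0.0],
    [0.2, 0.1, -0.1, 0.3],
]

# Genes 0 and 1 respond together under conditions 0 and 2.
CANDIDATE: Bicluster = ({0, 1}, {0, 2})
WHOLE_MATRIX: Bicluster = ({0, 1, 2}, {0, 1, 2, 3})

if __name__ == "__main__":
    print(submatrix(EXPR, CANDIDATE))
    print(changed_fraction(EXPR, CANDIDATE))
    print(changed_fraction(EXPR, WHOLE_MATRIX))
```

In this toy example every cell of the candidate bicluster passes the threshold, while only a third of the cells in the full matrix do, which is the kind of contrast a biclustering method looks for. Deciding whether such a contrast is statistically significant is the part the paper addresses and this sketch does not.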

Publication types

Comparative Study
Evaluation Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Validation Study

MeSH terms

Algorithms*
Cluster Analysis*
Gene Expression Profiling / methods*
Humans
Lymphoma / genetics
Models, Genetic*
Models, Statistical*
Oligonucleotide Array Sequence Analysis / methods*
Sequence Alignment / methods
Sequence Analysis, DNA / methods*
Sequence Homology, Nucleic Acid
Yeasts / genetics