PT - JOURNAL ARTICLE
AU - Sun, Guoli
AU - Krasnitz, Alexander
TI - Significantly distinct branches of hierarchical trees: A framework for statistical analysis and applications to biological data
AID - 10.1101/002188
DP - 2014 Jan 01
TA - bioRxiv
PG - 002188
4099 - http://biorxiv.org/content/early/2014/06/05/002188.short
4100 - http://biorxiv.org/content/early/2014/06/05/002188.full
AB - Background One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity. Results We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques. Conclusions Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/TBEST/index.html.