Summary
Quantifying the similarity of clusterings is a fundamental step in data analysis. Clustering similarity is the basis for method evaluation, consensus clustering, and tracking the temporal evolution of clusters, among many other tasks. Here we provide CluSim, a comprehensive Python package for the comparison of partitions, overlapping clusterings, and hierarchical clusterings (dendrograms) with more than 20 similarity measures. The CluSim package provides both analytic and empirical methods for assessing the similarity of clusterings in the context of a random model, and implements the element-centric clustering similarity measures that we recently introduced. We illustrate the use of the package through two examples: an evaluation of the clustering of gene expression data in the context of different random models, and a detailed analysis of model incongruence using element-centric comparisons between a set of phylogenetic trees (dendrograms).
Availability and implementation The CluSim Python package and accompanying Jupyter notebook are available at https://github.com/Hoosier-Clusters/clusim under the MIT open source licence.
Contact ajgates42{at}gmail.com or yyahn{at}iu.edu