RT Journal Article
SR Electronic
T1 KMD clustering: Robust generic clustering of biological data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.10.04.325233
DO 10.1101/2020.10.04.325233
A1 Zelig, Aviv
A1 Kaplan, Noam
YR 2020
UL http://biorxiv.org/content/early/2020/10/04/2020.10.04.325233.abstract
AB The challenges of clustering noisy high-dimensional biological data have spawned advanced clustering algorithms that are tailored for specific subtypes of biological datatypes. However, the performance of such methods varies greatly between datasets, they require post hoc tuning of cryptic hyperparameters, and they are often not transferable to other types of data. Here we present a novel generic clustering approach called k minimal distances (KMD) clustering, based on a simple generalization of single and average linkage hierarchical clustering. We show how a generalized silhouette-like function is predictive of clustering accuracy and exploit this property to eliminate the main hyperparameter k. We evaluated KMD clustering on standard simulated datasets, simulated datasets with high noise added, mass cytometry datasets and scRNA-seq datasets. When compared to standard generic and state-of-the-art specialized algorithms, KMD clustering’s performance was consistently better than or comparable to that of the best algorithm on each of the tested datasets. Competing Interest Statement: The authors have declared no competing interest.
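
The abstract describes KMD as a generalization of single and average linkage. One plausible reading, offered here only as an illustrative sketch and not as the authors' implementation, is a linkage equal to the mean of the k smallest pairwise distances between two clusters: k = 1 then reduces to single linkage, and k at least as large as the number of cross-cluster pairs reduces to average linkage. The function names `kmd_linkage` and `kmd_agglomerative` are hypothetical, the distance is assumed to be Euclidean, and the silhouette-like selection of k mentioned in the abstract is not covered.

```python
import numpy as np


def kmd_linkage(dist, a, b, k):
    """Mean of the k smallest pairwise distances between clusters a and b.

    Assumed definition (not taken from this record): k=1 gives single
    linkage; k >= len(a) * len(b) gives average linkage.
    """
    block = dist[np.ix_(a, b)].ravel()
    k_eff = min(k, block.size)
    # np.partition places the k_eff smallest values in the first k_eff slots.
    smallest = np.partition(block, k_eff - 1)[:k_eff]
    return float(smallest.mean())


def kmd_agglomerative(points, n_clusters, k):
    """Naive agglomerative clustering using the linkage sketched above."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distance matrix

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest linkage value.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = kmd_linkage(dist, clusters[i], clusters[j], k)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels
```

This brute-force loop is meant for small inputs only; it recomputes every linkage on each merge and makes no attempt at the efficiency or noise handling a real implementation would need.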