RT Journal Article
SR Electronic
T1 An efficient not-only-linear correlation coefficient based on machine learning
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.06.15.496326
DO 10.1101/2022.06.15.496326
A1 Pividori, Milton
A1 Ritchie, Marylyn D.
A1 Milone, Diego H.
A1 Greene, Casey S.
YR 2022
UL http://biorxiv.org/content/early/2022/06/17/2022.06.15.496326.abstract
AB Correlation coefficients are widely used to identify patterns in data that may be of particular interest. In transcriptomics, genes with correlated expression often share functions or are part of disease-relevant biological processes. Here we introduce the Clustermatch Correlation Coefficient (CCC), an efficient, easy-to-use and not-only-linear coefficient based on machine learning models. CCC reveals biologically meaningful linear and nonlinear patterns missed by standard, linear-only correlation coefficients. CCC captures general patterns in data by comparing clustering solutions while being much faster than state-of-the-art coefficients such as the Maximal Information Coefficient. When applied to human gene expression data, CCC identifies robust linear relationships while detecting nonlinear patterns associated, for example, with sex differences that are not captured by linear-only coefficients. Gene pairs highly ranked by CCC were enriched for interactions in integrated networks built from protein-protein interaction, transcription factor regulation, and chemical and genetic perturbations, suggesting that CCC could detect functional relationships that linear-only methods missed. CCC is a highly-efficient, next-generation not-only-linear correlation coefficient that can readily be applied to genome-scale data and other domains across different data types. Competing Interest Statement: The authors have declared no competing interest.