PT - JOURNAL ARTICLE
AU - Javad Rahimikollu
AU - Hanxi Xiao
AU - Anna E. Rosengart
AU - Tracy Tabib
AU - Paul Zdinak
AU - Kun He
AU - Xin Bing
AU - Florentina Bunea
AU - Marten Wegkamp
AU - Amanda C. Poholek
AU - Alok V Joglekar
AU - Robert A Lafyatis
AU - Jishnu Das
TI - SLIDE: Significant Latent Factor Interaction Discovery and Exploration across biological domains
AID - 10.1101/2022.11.25.518001
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.11.25.518001
4099 - http://biorxiv.org/content/early/2022/11/27/2022.11.25.518001.short
4100 - http://biorxiv.org/content/early/2022/11/27/2022.11.25.518001.full
AB - Modern multi-omic technologies can generate deep multi-scale profiles. However, differences in data modalities, multicollinearity of the data, and large numbers of irrelevant features make the analysis and integration of high-dimensional omic datasets challenging. Here, we present Significant Latent factor Interaction Discovery and Exploration (SLIDE), a first-in-class interpretable machine learning technique for identifying significant interacting latent factors underlying outcomes of interest from high-dimensional omic datasets. SLIDE makes no assumptions regarding data-generating mechanisms, comes with theoretical guarantees regarding identifiability of the latent factors and the corresponding inference, outperforms or performs at least as well as state-of-the-art approaches in terms of prediction, and provides inference beyond prediction. Using SLIDE on scRNA-seq data from systemic sclerosis (SSc) patients, we first uncovered significant interacting latent factors underlying SSc pathogenesis.
In addition to accurately predicting SSc severity and outperforming existing benchmarks, SLIDE uncovered significant factors that included well-elucidated altered transcriptomic states in myeloid cells and fibroblasts, an intriguing keratinocyte-centric signature validated by protein staining, and a novel mechanism involving altered HLA signaling in myeloid cells that is supported by genetic data. SLIDE also worked well on spatial transcriptomic data and was able to accurately identify significant interacting latent factors underlying immune cell partitioning by 3D location within lymph nodes. Finally, SLIDE leveraged paired scRNA-seq and TCR-seq data to elucidate latent factors underlying extents of clonal expansion of CD4 T cells in a nonobese diabetic model of T1D. The latent factors uncovered by SLIDE included well-known activation markers, inhibitory receptors and intracellular regulators of receptor signaling, but also homed in on several novel naïve and memory states that standard analyses missed. Overall, SLIDE is a versatile engine for biological discovery from modern multi-omic datasets.
Competing Interest Statement: The authors have declared no competing interest.