Deconfounded Dimension Reduction via Partial Embeddings

Andrew A. Chen; Kelly Clark; Blake Dewey; Anna DuVal; Nicole Pellegrini; Govind Nair; Youmna Jalkh; Samar Khalil; Jon Zurawski; Peter Calabresi; Daniel Reich; Rohit Bakshi; Haochang Shou; Russell T. Shinohara; the Alzheimer’s Disease Neuroimaging Initiative; the North American Imaging in Multiple Sclerosis Cooperative

doi:10.1101/2023.01.10.523448

Abstract

Dimension reduction tools that preserve similarity and graph structure, such as t-SNE and UMAP, can capture complex biological patterns in high-dimensional data. However, these tools are typically not designed to separate effects of interest from unwanted effects due to confounders. We introduce the partial embedding (PARE) framework, which enables removal of confounders from any distance-based dimension reduction method. We then develop partial t-SNE and partial UMAP and apply these methods to genomic and neuroimaging data. Our results show that the PARE framework can remove batch effects in single-cell sequencing data and can separate clinical from technical variability in neuroimaging measures. We demonstrate that the PARE framework extends dimension reduction methods to highlight biological patterns of interest while effectively removing confounding effects.
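The abstract does not specify how PARE removes confounders, so the following is only a minimal illustration of the general idea of deconfounding a distance matrix before embedding. It assumes a simple linear residualization of the features on the confounder, which is not necessarily the authors' procedure. The function names `residualize` and `pairwise_distances`, the simulated data, and the binary `batch` variable are all hypothetical.

```python
import numpy as np

def residualize(X, C):
    """Remove the linear effect of confounders C from features X.

    Assumption: ordinary least squares with an intercept. This is an
    illustrative stand-in, not the PARE estimator itself.
    """
    C1 = np.column_stack([np.ones(len(C)), C])      # add intercept column
    beta, *_ = np.linalg.lstsq(C1, X, rcond=None)   # fit X ~ C1
    return X - C1 @ beta                            # keep the residuals

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Simulated example: 60 samples, 5 features, one binary batch variable
rng = np.random.default_rng(0)
n = 60
batch = rng.integers(0, 2, size=n).astype(float)    # hypothetical confounder
signal = rng.normal(size=(n, 5))                    # effect of interest
X = signal + 3.0 * batch[:, None]                   # batch shifts every feature

X_res = residualize(X, batch[:, None])
D = pairwise_distances(X_res)                       # deconfounded distances
```

A distance matrix such as `D` could then be handed to any embedding method that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"`.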

Competing Interest Statement

RB has received consulting fees from Bristol-Myers Squibb and EMD Serono and research support from Bristol-Myers Squibb, EMD Serono, and Novartis. DSR has received research funding from Abata Therapeutics, Sanofi-Genzyme, and Vertex Pharmaceuticals, all unrelated to the current study. RTS receives consulting income from Octave Bioscience and compensation for scientific reviewing from the American Medical Association.

Footnotes

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.