New Results
Multiset correlation and factor analysis enables exploration of multi-omic data
Brielin C. Brown, Collin Wang, Silva Kasela, François Aguet, Daniel C. Nachun, Kent D. Taylor, Russell P. Tracy, Peter Durda, Yongmei Liu, W. Craig Johnson, David Van Den Berg, Namrata Gupta, Stacy Gabriel, Joshua D. Smith, Robert Gerzsten, Clary Clish, Quenna Wong, George Papanicolau, Thomas W. Blackwell, Jerome I. Rotter, Stephen S. Rich, Kristin G. Ardlie, David A. Knowles, Tuuli Lappalainen
doi: https://doi.org/10.1101/2022.07.18.500246
Brielin C. Brown
1New York Genome Center, New York, NY, USA
2Data Science Institute, Columbia University, New York, NY, USA
Collin Wang
1New York Genome Center, New York, NY, USA
3Department of Computer Science, Columbia University, New York, NY, USA
Silva Kasela
1New York Genome Center, New York, NY, USA
4Department of Systems Biology, Columbia University, New York, NY, USA
François Aguet
5Illumina Incorporated, San Francisco, CA, USA
6The Broad Institute of MIT and Harvard, Boston, MA, USA
Daniel C. Nachun
7Department of Pathology, Stanford University, Stanford, CA, USA
Kent D. Taylor
8Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Russell P. Tracy
9Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT, USA
Peter Durda
9Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT, USA
Yongmei Liu
10Department of Medicine, Duke University Medical Center, Durham, NC, USA
W. Craig Johnson
11Department of Biostatistics, University of Washington, Seattle, WA, USA
David Van Den Berg
12Department of Clinical Preventative Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Namrata Gupta
6The Broad Institute of MIT and Harvard, Boston, MA, USA
Stacy Gabriel
6The Broad Institute of MIT and Harvard, Boston, MA, USA
Joshua D. Smith
13Northwest Genomics Center, University of Washington, Seattle, WA, USA
Robert Gerzsten
14Beth Israel Deaconess Medical Center, Division of Cardiovascular Medicine, Boston, Massachusetts, USA
Clary Clish
6The Broad Institute of MIT and Harvard, Boston, MA, USA
Quenna Wong
15Department of Biostatistics, University of Washington, Seattle, WA, USA
George Papanicolau
16Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, Bethesda, MD, USA
Thomas W. Blackwell
17Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
Jerome I. Rotter
8Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
Stephen S. Rich
18Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
Kristin G. Ardlie
6The Broad Institute of MIT and Harvard, Boston, MA, USA
David A. Knowles
1New York Genome Center, New York, NY, USA
2Data Science Institute, Columbia University, New York, NY, USA
3Department of Computer Science, Columbia University, New York, NY, USA
4Department of Systems Biology, Columbia University, New York, NY, USA
Tuuli Lappalainen
1New York Genome Center, New York, NY, USA
4Department of Systems Biology, Columbia University, New York, NY, USA
19Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden

Abstract
Multi-omics datasets are becoming more common, necessitating better integration methods to realize their revolutionary potential. Here, we introduce Multi-set Correlation and Factor Analysis (MCFA), an unsupervised integration method that enables fast inference of shared and private factors in multi-modal data. Applied to 614 ancestry-diverse participant samples across five ‘omics types, MCFA infers a shared space that captures clinically relevant molecular processes.
Competing Interest Statement
Tuuli Lappalainen is a paid adviser or consultant of Variant Bio, GSK, Pfizer and Goldfinch Bio. François Aguet is an employee and shareholder of Illumina Inc.
Footnotes
† Co-first author
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted July 20, 2022.
bioRxiv 2022.07.18.500246; doi: https://doi.org/10.1101/2022.07.18.500246