TY  - JOUR
T1  - Protein prediction for trait mapping in diverse populations
JF  - bioRxiv
DO  - 10.1101/2021.08.11.455912
SP  - 2021.08.11.455912
AU  - Ryan Schubert
AU  - Elyse Geoffroy
AU  - Isabelle Gregga
AU  - Ashley J. Mulford
AU  - Francois Aguet
AU  - Kristin Ardlie
AU  - Robert Gerszten
AU  - Clary Clish
AU  - David Van Den Berg
AU  - Kent D. Taylor
AU  - Peter Durda
AU  - W. Craig Johnson
AU  - Elaine Cornell
AU  - Xiuqing Guo
AU  - Yongmei Liu
AU  - Russell Tracy
AU  - Matthew Conomos
AU  - Cathy Laurie
AU  - Tom Blackwell
AU  - George Papanicolaou
AU  - Tuuli Lappalainen
AU  - Anna V. Mikhaylova
AU  - Timothy A. Thornton
AU  - Michael H. Cho
AU  - Christopher R. Gignoux
AU  - Leslie Lange
AU  - Ethan Lange
AU  - Stephen S. Rich
AU  - Jerome I. Rotter
AU  - NHLBI TOPMed Consortium
AU  - Ani Manichaikul
AU  - Hae Kyung Im
AU  - Heather E. Wheeler
Y1  - 2021/01/01
UR  - http://biorxiv.org/content/early/2021/08/11/2021.08.11.455912.abstract
N2  - Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology now allows similar interrogation of the genetically regulated proteome to understand complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from the Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, in which 1,305 proteins were measured with a SOMAscan assay. We compared predictive models built via elastic net regression to models that integrate posterior inclusion probabilities, estimated by fine-mapping SNPs before elastic net regression.
To investigate the transferability of predictive models across ancestries, we built protein prediction models in each of the four TOPMed MESA populations, African American (n=183), Chinese (n=71), European (n=416), and Hispanic/Latino (n=301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African ancestry populations, potentially increasing opportunities for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestry prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ~50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. Proteome prediction models trained in populations with ancestries similar to PAGE yielded the most protein-trait associations that were discovered, colocalized, and replicated in large independent GWAS. At current training population sample sizes, baseline and fine-mapped protein prediction models performed similarly in PWAS, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837328.
Author summary: Gene regulation is a critical mechanism underlying complex traits. Transcriptome-wide association studies (TWAS) have helped elucidate potential mechanisms because each association connects a gene, rather than a variant, to the complex trait. Like genome-wide association studies (GWAS), most TWAS are still conducted exclusively in populations of European ancestry, missing the opportunity to test the full spectrum of human genetic variation for associations with complex traits.
Here, we move beyond the transcriptome: because protein measurement assays increasingly allow interrogation of the proteome, we use data from TOPMed MESA to develop genetic predictors of protein abundance in diverse ancestry populations. We compare model-building strategies with the goal of providing the best resource for protein association discovery with available data. We demonstrate how these prediction models can be used to perform proteome-wide association studies (PWAS) in diverse populations. We show that the most protein-trait associations were discovered, colocalized, and replicated in independent cohorts when proteome prediction models were trained in populations with ancestries similar to those of the individuals in the GWAS. We share our protein prediction models and performance statistics publicly to facilitate future proteome mapping studies in diverse populations.
Competing Interest Statement: The authors have declared no competing interest.
ER  - 