bioRxiv
New Results

Phenotype integration improves power and preserves specificity in biobank-based genetic studies of MDD

Andrew Dahl, Michael Thompson, Ulzee An, Morten Krebs, Vivek Appadurai, Richard Border, Silviu-Alin Bacanu, Thomas Werge, Jonathan Flint, Andrew J. Schork, Sriram Sankararaman, Kenneth Kendler, Na Cai
doi: https://doi.org/10.1101/2022.08.15.503980
Andrew Dahl
1 Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
For correspondence: andywdahl@uchicago.edu
Michael Thompson
2 Department of Computer Science, University of California, Los Angeles, CA, USA
Ulzee An
2 Department of Computer Science, University of California, Los Angeles, CA, USA
Morten Krebs
3 Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital – Mental Health Services CPH, Copenhagen, Denmark
Vivek Appadurai
3 Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital – Mental Health Services CPH, Copenhagen, Denmark
Richard Border
2 Department of Computer Science, University of California, Los Angeles, CA, USA
4 Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
5 Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Silviu-Alin Bacanu
6 Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
Thomas Werge
3 Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital – Mental Health Services CPH, Copenhagen, Denmark
7 Lundbeck Foundation GeoGenetics Centre, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
8 Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Jonathan Flint
4 Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
Andrew J. Schork
3 Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital – Mental Health Services CPH, Copenhagen, Denmark
9 Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA
10 Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Sciences, Copenhagen University, Copenhagen, Denmark
Sriram Sankararaman
2 Department of Computer Science, University of California, Los Angeles, CA, USA
4 Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
11 Department of Computational Medicine, University of California, Los Angeles, CA, USA
Kenneth Kendler
6 Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
Na Cai
12 Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
13 Computational Health Centre, Helmholtz Zentrum München, Neuherberg, Germany
14 School of Medicine, Technical University of Munich, Munich, Germany
For correspondence: na.cai@helmholtz-muenchen.de

Abstract

Biobanks often contain several phenotypes relevant to a given disorder, and researchers face complex tradeoffs between shallow phenotypes (high sample size, low specificity and sensitivity) and deep phenotypes (low sample size, high specificity and sensitivity). Here, we study an extreme case: Major Depressive Disorder (MDD) in UK Biobank. Previous studies found that shallow and deep MDD phenotypes have qualitatively distinct genetic architectures, but it remains unclear which are optimal for scientific study or clinical prediction. We propose a new framework to get the best of both worlds by integrating information across hundreds of MDD-relevant phenotypes. First, we use phenotype imputation to increase sample size for the deepest available MDD phenotype, which dramatically improves GWAS power (increasing the number of loci roughly 10-fold) and PRS accuracy (increasing R² roughly 2-fold). Further, we show that the genetic architecture of the imputed phenotype remains specific to MDD using genetic correlation, PRS prediction in external clinical cohorts, and a novel PRS-based pleiotropy metric. We also develop a complementary approach that improves the specificity of GWAS on shallow MDD phenotypes by adjusting for phenome-wide PCs. Finally, we study phenotype integration at the level of GWAS summary statistics, which can increase GWAS and PRS power but introduces non-MDD-specific signals. Our work provides a simple and scalable recipe to improve genetic studies in large biobanks by combining the sample size of shallow phenotypes with the sensitivity and specificity of deep phenotypes.
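The two individual-level steps named in the abstract, phenotype imputation and adjustment for phenome-wide PCs, can be illustrated with a minimal sketch. The abstract does not state which imputation algorithm or adjustment procedure is used, so everything below is an assumption made for illustration: the sketch fills missing phenotype entries with a generic soft-thresholded SVD (SoftImpute-style) matrix completion, and it "adjusts" a phenotype by regressing out the top principal components of the completed phenotype matrix. The function names `soft_impute` and `residualize_on_phenome_pcs` and the parameters `rank_penalty` and `n_pcs` are hypothetical.

```python
import numpy as np


def soft_impute(Y, rank_penalty=1.0, n_iter=100, tol=1e-6):
    """Fill NaN entries of Y (individuals x phenotypes) by low-rank completion.

    Assumed technique: iterative soft-thresholded SVD. Columns are expected
    to be centered and scaled by the caller before imputation.
    """
    Y = np.asarray(Y, dtype=float)
    observed = ~np.isnan(Y)
    # Initialize missing entries at 0, i.e. the column mean after centering.
    Z = np.where(observed, Y, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        # Shrink singular values toward zero to favor a low-rank fit.
        s = np.maximum(s - rank_penalty, 0.0)
        low_rank = (U * s) @ Vt
        # Keep observed values; overwrite only the missing entries.
        Z_new = np.where(observed, Y, low_rank)
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


def residualize_on_phenome_pcs(y, Z, n_pcs=5):
    """Regress phenotype y on the top n_pcs PCs of the completed matrix Z.

    Assumed reading of "adjusting for phenome-wide PCs": the residual is what
    remains of y after removing variation shared with the leading PCs.
    """
    y = np.asarray(y, dtype=float)
    Zc = Z - Z.mean(axis=0)
    U, _, _ = np.linalg.svd(Zc, full_matrices=False)
    # Design matrix: intercept plus the leading left singular vectors.
    X = np.column_stack([np.ones(len(y)), U[:, :n_pcs]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

In this sketch a deep phenotype with many missing values would be one column of `Y`, and the imputed column of `soft_impute(Y)` would then be passed to a standard GWAS. In an actual GWAS the PCs would more commonly enter the association model as covariates rather than being regressed out beforehand; the residualization shown here is only the simplest stand-in for that adjustment.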

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  1. We evaluated three alternatives to our phenotype imputation matrix: (1) sex-stratified; (2) adding BMI; and (3) restricting to MTAG phenotypes (Supplementary Figure 2). (1) slightly hurt performance, (2) had little impact, and (3) performed far worse.
  2. We significantly improved our presentation of the MTAG results, especially by expanding on our description of its utility in the Discussion and by modifying section titles.
  3. We now publicly release our GWAS summary statistics.
  4. We clarified/corrected our use of a few key terms, e.g. effective sample size and phenotype integration.
  5. We thoroughly evaluated the variances and correlations of the imputed phenotypes (Supplementary Figure 1).
  6. We added a major caveat to our Discussion on the potential for phenotype imputation to bias downstream analyses that others might perform, as well as potential solutions for these future directions.
  7. We formally show that the increase in GWAS power from phenotype integration is statistically significant (Supplementary Figure 3).
  8. We expanded our Discussion and added relevant references to place our work in context of prior efforts to improve power for MDD GWAS, including proxy GWAS (GWAX) and combining endorsements of multiple depression measures.
  9. We extensively characterized PRS Pleiotropy as a function of sample size (Extended Data Figure 5). Most importantly, our results confirm that the conclusions in our paper are robust to sample size differences between our PRS. More broadly, our results characterize the complex interplay between sample size, p-value threshold, and the number of SNPs in a PRS (Supplementary Figures 10-11, Extended Data Figure 6).
  10. We added two key references to prior work on MTAG; a theoretical result that complements our empirical results on power-specificity tradeoffs, and a smaller-scale observation consistent with our result that MTAG inflates genetic correlation.
  11. We corrected a small but important overstatement in our previous submission on the relationship between portability and biological causality.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 13, 2023.
Subject Area

  • Genetics