New Results

Phenotype prediction from genome-wide genotyping data: a crowdsourcing experiment

Olivier Naret, David AA Baranger, Sharada Prasanna Mohanty, Bastian Greshake Tzovaras, Marcel Salathé, Jacques Fellay, with the openSNP and crowdAI community
doi: https://doi.org/10.1101/2020.08.25.265900
Olivier Naret
1School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Switzerland
  • For correspondence: onaret@gmail.com
David AA Baranger
2Department of Psychological and Brain Sciences, Washington University, St. Louis, MO, USA
Sharada Prasanna Mohanty
1School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Switzerland
Bastian Greshake Tzovaras
3Lawrence Berkeley National Laboratory, Berkeley, CA, USA
4Department for Applied Bioinformatics, Goethe University, Frankfurt am Main, Germany
5Center for Research and Interdisciplinarity (CRI), Université de Paris, INSERM U1284, Paris, France
Marcel Salathé
1School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Switzerland
Jacques Fellay
1School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Switzerland

Abstract

Background The increasing statistical power of genome-wide association studies is fostering the development of precision medicine through genomic predictions of complex traits. Nevertheless, the accuracy of these predictions has been shown to remain relatively modest. One reason might be the nature of the methods typically used to construct genomic predictions. Recent machine learning techniques have properties that could help capture the architecture of complex traits better and improve genomic prediction accuracy.

Methods We relied on crowd-sourcing to efficiently compare multiple genomic prediction methods. This approach is innovative in genomics because of the privacy concerns linked to human genetic data. Our study has two crowd-sourcing elements. First, we constructed a dataset from openSNP (opensnp.org), an open repository where people voluntarily share their genotyping data and phenotypic information in an effort to participate in open science. To leverage this resource, we release the ‘openSNP Cohort Maker’, a tool that builds a homogeneous and up-to-date cohort from the data available on opensnp.org. Second, we organized an open online challenge on the CrowdAI platform (crowdai.org) aimed at predicting height from genome-wide genotyping data.

Results The ‘openSNP Height Prediction’ challenge lasted for three months. A total of 138 challengers contributed 1275 submissions. The winner computed a polygenic risk score using the publicly available summary statistics of the GIANT study to achieve the best result (r² = 0.53 versus r² = 0.49 for the second-best).
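
To make the winning approach concrete, here is a minimal sketch of how a polygenic score and its r² could be computed. It is not the winner's pipeline, which the abstract does not describe. It assumes the score is a weighted sum of effect-allele dosages (0, 1, or 2) with per-SNP effect sizes taken from summary statistics, and that r² is the squared Pearson correlation between predicted and observed height. The function names and the toy data are hypothetical.

```python
import random


def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    Assumption: effect_sizes are per-SNP weights, e.g. taken from published
    GWAS summary statistics, aligned to the same SNPs and effect alleles.
    """
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def r_squared(predicted, observed):
    """Squared Pearson correlation between predictions and observations."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


# Toy data only: simulated genotypes, effect sizes and heights (hypothetical).
rng = random.Random(0)
n_people, n_snps = 200, 50
betas = [rng.gauss(0.0, 0.5) for _ in range(n_snps)]
genotypes = [[rng.randint(0, 2) for _ in range(n_snps)] for _ in range(n_people)]

# Simulated height = baseline + genetic component + non-genetic noise.
heights = [170.0 + polygenic_score(g, betas) + rng.gauss(0.0, 3.0) for g in genotypes]

scores = [polygenic_score(g, betas) for g in genotypes]
toy_r2 = r_squared(scores, heights)
print(f"toy r2 = {toy_r2:.2f}")
```

In real use the effect sizes would come from an external study and the score would be evaluated on individuals who were not part of that study. The toy example reuses the simulated effect sizes only to show the arithmetic.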

Conclusion We report here the first crowd-sourced challenge on publicly available genome-wide genotyping data. We also deliver the ‘openSNP Cohort Maker’, which will allow people to make use of the data available on opensnp.org.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • * olivier.naret@epfl.ch

  • https://zenodo.org/record/1442755#.XlTwyHVKh1M

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted August 25, 2020.

Subject Area

  • Genomics