Abstract
Background The increasing statistical power of genome-wide association studies is fostering the development of precision medicine through genomic predictions of complex traits. Nevertheless, prediction accuracy has been shown to remain relatively modest. One reason may be the nature of the methods typically used to construct genomic predictions. Recent machine learning techniques have properties that could help capture the architecture of complex traits better and improve genomic prediction accuracy.
Methods We relied on crowd-sourcing to efficiently compare multiple genomic prediction methods. This is an innovative approach in genomics, where privacy concerns linked to human genetic data have limited its use. Our study builds on two crowd-sourcing elements. First, we constructed a dataset from openSNP (opensnp.org), an open repository where people voluntarily share their genotyping data and phenotypic information in an effort to participate in open science. To leverage this resource, we release the ‘openSNP Cohort Maker’, a tool that builds a homogeneous and up-to-date cohort from the data available on opensnp.org. Second, we organized an open online challenge on the CrowdAI platform (crowdai.org) aimed at predicting height from genome-wide genotyping data.
Results The ‘openSNP Height Prediction’ challenge lasted for three months. A total of 138 challengers contributed 1275 submissions. The winner computed a polygenic risk score using the publicly available summary statistics of the GIANT study to achieve the best result (r² = 0.53 versus r² = 0.49 for the second-best).
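As a rough illustration of the polygenic risk score mentioned in the Results, the sketch below computes a score as a weighted sum of effect-allele dosages and evaluates it with a squared Pearson correlation. It is not the winner's actual pipeline, which is not described here. The function names, the toy genotypes, effect sizes and heights are all hypothetical, and reading r² as the squared Pearson correlation between predicted and observed values is an assumption.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one individual."""
    return sum(beta * dosage for beta, dosage in zip(effect_sizes, dosages))


def r_squared(predicted, observed):
    """Squared Pearson correlation between predictions and observations."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)


# Toy, made-up numbers for illustration only (not openSNP or GIANT data).
effect_sizes = [0.5, -0.2, 0.1]  # hypothetical per-variant effects from summary statistics
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [2, 2, 2]]  # one row per individual
heights = [170.0, 172.0, 181.0, 178.0]  # hypothetical observed heights in cm

scores = [polygenic_score(g, effect_sizes) for g in genotypes]
print(scores, r_squared(scores, heights))
```

In practice the effect sizes would come from published summary statistics and the variants would typically be filtered before scoring; those steps are omitted from this sketch.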
Conclusion We report here the first crowd-sourced challenge on publicly available genome-wide genotyping data. We also release the ‘openSNP Cohort Maker’, which will allow people to make use of the data available on opensnp.org.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* olivier.naret@epfl.ch