Abstract
Polygenic risk scores (PRS) estimate disease risk from the genetic contribution of an individual’s genotype. Traditional PRS prediction methods have been developed predominantly for European populations. The accuracy of PRS prediction in non-European populations is diminished because genome-wide association studies (GWAS) in these populations have much smaller sample sizes. In this article, we introduce a novel method, abbreviated as TL-Multi, to construct PRS for non-European populations. TL-Multi uses a transfer learning framework to learn useful knowledge from the European population and correct the bias for non-European populations. We treat non-European GWAS data as the target data and European GWAS data as informative auxiliary data. TL-Multi borrows useful information from the auxiliary data to improve learning accuracy on the target data while preserving efficiency and accuracy. To demonstrate the practical applicability of the proposed method, we applied TL-Multi to predict systemic lupus erythematosus (SLE) risk in a Hong Kong population by borrowing information from a European population. TL-Multi achieved better prediction accuracy than alternative methods, including Lassosum, meta-analysis, and linkage disequilibrium (LD)-informed pruning and P-value thresholding for multiethnic PRS (PT-Multi). It also substantially improved prediction performance when the cross-population genetic correlation was moderate, in both the simulations and the SLE application.
Competing Interest Statement
The authors have declared no competing interest.