RT Journal Article
SR Electronic
T1 Accurate ethnicity prediction from placental DNA methylation data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 618470
DO 10.1101/618470
A1 Victor Yuan
A1 E Magda Price
A1 Giulia F Del Gobbo
A1 Sara Mostafavi
A1 Brian Cox
A1 Alexandra M. Binder
A1 Karin B. Michels
A1 Carmen Marsit
A1 Wendy P. Robinson
YR 2019
UL http://biorxiv.org/content/early/2019/05/07/618470.abstract
AB Background: The influence of genetics on variation in DNA methylation (DNAme) is well documented. Yet confounding from population stratification is often unaccounted for in DNAme association studies. Existing approaches to address confounding by population stratification using DNAme data may not generalize to populations or tissues outside those in which they were developed. To aid future placental DNAme studies in assessing population stratification, we developed an ethnicity classifier, PlaNET (Placental DNAme Elastic Net Ethnicity Tool), using placental samples from five cohorts with Infinium HumanMethylation450 BeadChip array (HM450K) data; the classifier is also compatible with the newer EPIC platform. Results: Data from 509 placental samples were used to develop PlaNET and show that it accurately predicts (accuracy = 0.938, kappa = 0.823) major classes of self-reported ethnicity/race (African: n = 58, Asian: n = 53, Caucasian: n = 389), and produces ethnicity probabilities that are highly correlated with genetic ancestry inferred from genome-wide SNP arrays (>2.5 million SNPs) and ancestry informative markers (n = 50 SNPs). PlaNET’s ethnicity classification relies on 1860 HM450K microarray sites, and over half of these were linked to nearby genetic polymorphisms (n = 955).
Our placental-optimized method outperforms existing approaches in assessing population stratification in placental samples from individuals of Asian, African, and Caucasian ethnicities. Conclusion: PlaNET provides an improved approach to address population stratification in placental DNAme association studies. The method can be applied to predict ethnicity as a discrete or continuous variable and will be especially useful when self-reported ethnicity information is missing and genotyping markers are unavailable. PlaNET is available as an R package at https://github.com/wvictor14/planet.
Abbreviations: PlaNET: Placental DNAme Elastic Net Ethnicity Tool; DNAme: DNA methylation; CpG: Cytosine-phosphate-guanine; SNP: Single-nucleotide polymorphism; AIMs: Ancestry informative genotyping markers; mQTL: methylation quantitative trait loci; PCA: Principal component analysis; PC: Principal component; HM450K: Infinium HumanMethylation450 BeadChip; EPIC: Infinium MethylationEPIC BeadChip; LODOCV: Leave-one-dataset-out cross validation; GLMNET: Generalized logistic regression with an elastic net penalty; SVM: Support vector machines; KNN: K-nearest neighbours; NSC: Nearest shrunken centroids; USA: United States of America; AFR: African; ASI: Asian; CAU: Caucasian; BMIQ: Beta-mixture interquantile normalization; NOOB: Normal exponential out-of-band normalization.
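
The abstract summarizes classifier performance with two statistics, accuracy and kappa. As a reading aid, the sketch below shows one common way such figures can be computed from paired self-reported and predicted labels, assuming that "kappa" refers to Cohen's kappa (the abstract does not say which variant was used). The function name `accuracy_and_kappa` and the ten toy labels are hypothetical illustrations, not the authors' code or data, and the sketch does not use the PlaNET R package.

```python
from collections import Counter


def accuracy_and_kappa(observed, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")

    n = len(observed)

    # Observed agreement: fraction of samples where the labels match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n

    # Chance agreement: for each class, the product of its marginal
    # frequencies in the observed and predicted labels, summed over classes.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    labels = set(obs_counts) | set(pred_counts)
    p_e = sum(obs_counts[label] * pred_counts[label] for label in labels) / (n * n)

    # Kappa is undefined when chance agreement is already perfect.
    if p_e == 1.0:
        return p_o, float("nan")

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Toy labels for illustration only (not data from the study).
observed = ["African"] * 2 + ["Asian"] * 2 + ["Caucasian"] * 6
predicted = ["African", "Caucasian"] + ["Asian"] * 2 + ["Caucasian"] * 6

acc, kappa = accuracy_and_kappa(observed, predicted)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Kappa discounts the agreement expected by chance, which is why it is typically reported alongside accuracy when one class dominates, as the Caucasian group does in the sample counts above.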