DNA methylation mediates genetic liability to non-syndromic cleft lip/palate

Background Non-syndromic cleft lip/palate (nsCL/P) is a complex trait with genetic and environmental risk factors. Around 40 distinct genetic risk loci have been identified for nsCL/P, but many reside in non-protein-coding regions with an unclear function. We hypothesised that one possibility is that the genetic risk variants influence susceptibility to nsCL/P through gene regulation pathways, such as those involving DNA methylation. Methods Using nsCL/P Genome-wide association study summary data and methylation data from four studies, we used Mendelian randomization and joint likelihood mapping to identify putative loci where genetic liability to nsCL/P may be mediated by variation in DNA methylation in blood. Results There was evidence at three independent loci, VAX1 (10q25.3), LOC146880 (17q23.3) and NTN1 (17p13.1), that liability to nsCL/P and variation in DNA methylation might be driven by the same genetic variant. Follow up analyses using DNA methylation data, derived from lip and palate tissue, and gene expression catalogues provided further insight into possible biological mechanisms. Conclusions Genetic variation may increase liability to nsCL/P by influencing DNA methylation and gene expression at VAX1, LOC146880 and NTN1.


Introduction
Orofacial clefts are a heterogenous group of birth disorders 1 . In epidemiology and genetics research, orofacial clefts can be divided into the subtypes cleft palate only (CPO) and cleft lip with or without cleft palate (CL/P), with strong evidence for distinct aetiologies 1 . There is also accumulating evidence suggesting that the CL/P subtypes cleft lip only (CLO) and cleft lip with cleft palate (CLP) may also differ aetiologically 2 3 . Mendelian syndromes can feature CL/P and CPO but around 70% of CL/P cases are non-syndromic (nsCL/P), with a complex aetiology likely to involve both genetic and environmental risk factors 4 5 .
Genome-wide association studies (GWAS) have identified around 40 distinct genetic risk variants for nsCL/P in European and Asian populations 2 6-13 but many variants reside in non-protein coding regions and so their functional relevance remains unclear. One possibility is that genetic risk variants may be affecting nsCL/P susceptibility through gene regulation pathways. Indeed, a non-coding interval at 8q24, a major nsCL/P risk locus, has previously been shown to regulate gene expression in the developing murine face 14 . There is increasing evidence that epigenetic mechanisms, such as DNA methylation, play a role in development of orofacial clefts [15][16][17][18] , potentially via changes to gene expression.
In this study, we applied a recently devised analysis framework 19 20 to explore whether genetic influences on liability to nsCL/P are mediated by DNA methylation assayed in whole blood ( Figure 1). Genetic variants that are associated with DNA methylation (methylation quantitative trait loci; mQTLs) have been previously identified in the Avon Longitudinal Study of Parents and Children (ALSPAC) 21 . We use these mQTLs to perform Mendelian randomization (MR), an epidemiological tool typically used to explore causal relationships between modifiable risk factors and health or disease outcomes 22 . In this instance, genetic variants robustly associated with DNA methylation were tested for association with nsCL/P. We considered four possible models to explain an association between an mQTL and nsCL/P: 1) DNA methylation mediates genetic influences on liability to nsCL/P; 2) the direction of effect is reversed, i.e. genetic liability to nsCL/P causes variation in DNA methylation; 3) DNA methylation and liability to nsCL/P are influenced by separate genetic variants that are in linkage disequilibrium (LD) with each other; 4) DNA methylation and liability to nsCL/P are influenced by the same genetic variant, but via independent pathways, i.e. the association is due to horizontal pleiotropy (Figure 1) 19 20 .
We systematically applied additional analyses, including bidirectional MR and co-localization, to estimate the most likely model as far as possible although we were unable to use recently derived MR methods 23 24 to distinguish between mediation and vertical or horizontal pleiotropy because most CpGs are instrumented by a single genetic variant. Where there was evidence that genetic influences on liability to nsCL/P may be mediated by DNA methylation, we explored associations with gene expression. We also compared our findings from the general population to results from an epigenome-wide association study (EWAS) of whole blood samples from nsCL/P cases and unaffected controls. Given accumulating evidence that different subtypes of OFCs have distinct aetiologies, we also explored whether identified CpGs are differentially methylated in blood samples from children with different OFC subtypes. Finally, since the majority of our analyses used DNA methylation derived in blood, which might not be representative of the developing orofacial tissues, as described previously 15 , we explored correlations between DNA methylation in blood and lip/palate tissue in the same individuals.

Methods
Da ta s ou rce s n s CL / P g e n e t i c r i s k v a r i a n t s We identified single nucleotide polymorphisms (SNPs) associated with nsCL/P by conducting a meta-analysis of summary statistics from two nsCL/P GWAS. Summary statistics for the first GWAS came from a case-control study of 399 cases and 1318 controls of Central European descent 7 . For the second GWAS, we generated summary statistics by conducting a GWAS using individual level data from 638 parent-offspring trios and 178 parent-offspring duos of European descent from the International Consortium to Identify Genes and Interactions Controlling Oral Clefts (ICC). These data were available to download from dbGaP (Study Accession phs000094.v1.p1) 25 . Full GWAS methods are described in the Supplementary Material, but briefly, we performed a transmission disequilibrium test (TDT) 26 on the pedigree data. We then performed a fixed-effects inverse-variance-weighted metaanalysis of the summary statistics from both GWAS using METAL, on the total sample of 1215 cases and 2772 controls 27 . The results compared well with those previously published using a very similar dataset but slightly different quality control and analysis methods 6 . We used LiftOver (genome.sph.umich.edu/wiki/LiftOver) to convert the genome positions in the nsCL/P summary statistics to the most recent genome build 37. Finally, we used PLINK 28 and ALSPAC as a reference panel clump the results according to LD (r 2 <0.001), within a 250 kb region around each index variant, and generate a set of independent SNPs for the pipeline.

M et hy l a t i o n g e ne t i c r i s k v ar i a nt s ( m Q T Ls ) A LS P A C
To identify mQTLs (SNPs associated with DNA methylation), we used data from the Avon Longitudinal Study of Parents and Children (ALSPAC) 29 30 . In addition to collecting detailed questionnaire and clinic data for the whole cohort, the study has generated genome-wide DNA methylation and genotype data for subsets of the cohort 31 (methods described in the Supplementary Material). These data have previously been used to generate a database of mQTLs (http://www.mqtldb.org/) 21 .
The database contains summary statistics for all mQTLs with a P-value <1x10 -7 for the association between SNP and CpG. For the purposes of this study, we focused on the mQTLs identified in cord blood samples collected at birth (the closest available time point to the orofacial developmental period). For part of our study (the reverse two sample MR), we required specific CpG-SNP associations that were unavailable from mQTLdb.org. Therefore, for required CpGs, we replicated the methods in the original study: we excluded individuals with missing genotype or covariate data, leaving 787 children. We then rank-normalised the methylation data to remove outliers and controlled for covariates, potential batch effects and the influence of cell heterogeneity by regressing data points on sex, the first 10 ancestry principal components, bisulfite-converted DNA batch and blood cell proportions estimated using the Houseman method 32 33 . We then calculated residuals, which were used as the outcome variable in a linear regression model in PLINK 28 to calculate the relevant CpG-SNP associations.
Finally, we excluded any mQTLs acting in trans (i.e. any SNP associated with a CpG site more than 1M base pairs away) and excluded any CpGs that have been flagged as potentially problematic (for example, cross-hybridising probes) according to a previous publication 34 .

G O Y A
We attempted to replicate mQTLs from ALSPAC using genotype and cord blood DNA methylation data from the Genetics of Overweight Young Adults (GOYA) cohort, which is a subset of the Danish National Birth Cohort (DNBC) 35 . Genotype and cord blood DNA methylation data were available for 1000 children. We replicated the methods described above for ALSPAC by first excluding individuals with missing genotype or covariate data, leaving 889 children and also removing SNPs with missingness (>5%) using PLINK. As in ALSPAC, we rank-normalised the methylation data to remove outliers and adjusted for covariates, potential batch effects and the influence of cell heterogeneity by regressing data points on sex, the first 10 ancestry principal components, DNA batch and blood cell proportions estimated using the Houseman method 32 33 . Residuals were then used as the outcome variable in a linear regression model in PLINK to calculate the relevant CpG-SNP associations. E xp r e s s i o n q u a n t i t a t i v e t r a i t l o c i ( e Q T L s ) a s g e n e t i c r i s k v a r i a n t s G T E x To identify eQTLs (SNPs associated with gene expression), we used Genotype-Tissue Expression (GTEx, www.gtexportal.org), which is a database of eQTLs generated using genotype and RNA sequencing gene expression data for 43 distinct tissue types from 175 individuals 36 37 .

N ES D A N TR C o n di t i o na l e Q T L C at a l o g
To explore the consistency of our findings, we also identified eQTLs using a second database: the NESDA NTR Conditional eQTL Catalog (https://eqtl.onderzoek.io/index.php?page=info). For this database, eQTLs were identified using genotype and gene expression microarray data from blood samples from 4896 individuals across two Dutch biobanks. Conditional eQTL analysis was applied to distinguish between dependent and independent eQTLs 38 .

D NA m et h yl at i on i n c h i l d r e n wi t h o r o f ac i a l c l ef t s B r a zi l i a n c oh o r t
To assess whether methylation at nsCL/P-associated CpGs (identified through MR) differs between nsCL/P cases and controls, we performed a look-up of results from a recently-published EWAS 15 . This EWAS compared blood DNA methylation profiles in 67 non-familial, nsCL/P cases and 59 age-and sex-matched controls from a Brazilian population. The average age at sampling was 5.29 years for cases and 6.45 years for controls. DNA methylation was measured using the Illumina Infinium HumanMethylation450 BeadChip platform.

Th e C l e f t Co l l e c t i v e
To explore whether methylation at nsCL/P-associated CpGs differs by cleft subtype, we compared mean methylation values in blood and matched lip/palate tissue samples from 150 children from the UK enrolled in the Cleft Collective birth cohort study. Methylation data were generated for a separate study, as previously described 2 . Briefly, a sample of 150 believed-to-be-non-syndromic children was randomly selected and stratified by cleft subtype: 50 with cleft lip only (CLO), 50 with cleft palate only (CPO), and 50 with cleft lip and palate (CLP). These children have been classified as non-syndromic because they have not been diagnosed as having any other anomaly, however, since the children are still very young, we cannot be completely sure of their non-syndromic status. Blood and either lip or palate tissue samples were available for each of the 150 children in this study. The orofacial tissue type was dependent on the OFC subtype; therefore, lip samples were available for children with CLO and palate samples for children with CPO. Of the 50 children with CLP, 43 contributed a lip sample and seven contributed a palate sample. Genomewide DNA methylation was measured using the Illumina Infinium HumanMethylation450 BeadChip platform and functional normalisation was performed on the blood and tissue samples together. Of the original 300 samples, three blood and two lip samples failed quality control. Surrogate variables were generated using the sva package in R to capture variation in the methylation data associated with technical batch and cellular heterogeneity 39 .
An a l ys is p ip e l in e Te s t i n g f o r m e d i a t i o n : M e n d e l i a n r a n d o m i z a t i o n o f t h e e f f e c t o f m et hy l a t i o n o n l i abi l i t y t o n s C L/ P nsCL/P meta-GWAS summary statistics for 543,150 SNPs were LD-pruned (r 2 <0.001) to 17,090 independent SNPs. These independent SNPs were then merged with 127,215 mQTLs from the ALSPAC mQTL database. After removing potentially problematic CpGs and CpGs acting in trans (which may increase the likelihood of horizontal pleiotropy), there were 7,091 independent CpG-SNP pairings for 6,425 distinct CpGs. We then used the MR-base R package 40 to perform twosample MR on all CpGs, using mQTLs as the exposure variables and nsCL/P as the outcome. In initial analysis, CpGs with one mQTL were tested using the Wald test and CpGs with more than one mQTL were tested using the Inverse Variance Weighted (IVW) method. To account for possible residual LD between mQTLs, CpGs with more than one mQTL, were retested adjusting for LD between the SNPs using a likelihood-based method 41 . Pair-wise SNP LD was computed using the Caucasian European (CEU) and British (GBR) populations in LDlink 42 . As a sensitivity analysis, we attempted to replicate the SNP-CpG associations with a Bonferroni-corrected MR p-value <0.05 in GOYA. To assess the possibility of reverse causation, we used MR-base to conduct the reverse two sample MR. Six genome-wide significant nsCL/P SNPs in Europeans 6 were used as the exposure and mQTLs from ALSPAC were used as the outcome. The IVW method was used as the primary analysis.
Tes t i ng f o r l i n k a g e: j o i nt -l i k el i h o od m ap p i n g t o a s s es s c ol o c a l i s a t i o n We used the Joint Likelihood Mapping (JLIM) package in R (jlim.R) 43 to test if liability to nsCL/P and methylation are driven by the same causal effect in each region of interest, i.e. the mQTL and nsCL/P risk variant are co-localised rather than simply being in LD. To distinguish between separate causal variants, we set the limit of genetic resolution in terms of r 2 to 0.8. 1000 genomes CEU data was used as the reference dataset for LD. Most CpGs were associated with only one independent mQTL, so we were not able to distinguish mediation/vertical pleiotropy (top left-hand panel of Figure 1) from horizontal pleiotropy (bottom right-hand panel of Figure 1).

C om p a r i s o n t o ge n e e x pr e s s i on
The previous steps identified CpGs that potentially mediate the effect of genetic variation on susceptibility to nsCL/P. Further evidence for a functional effect would be provided if mQTLs also affected gene expression. Therefore, we looked up relevant SNPs in two eQTL databases (GTEx 37 and NESDA NTR Conditional eQTL Catalog 38 ) and noted the estimated effect size and P-values for eQTLs in various tissues. C om p a r i s o n t o EW A S r es u l t s At identified CpGs, we looked-up the mean methylation values in nsCL/P cases and controls from the Brazilian EWAS study. We compared the direction of estimated effect and P-values obtained using the observational EWAS and MR approaches.
Ti s s u e a n d c l e f t -s u b t yp e -s p e c i f i c v a r i a t i o n At identified CpGs, we used data from the Cleft Collective to explore 1) whether methylation in blood was correlated with methylation at the site of the cleft (lip/palate), and 2) whether mean methylation varied according to cleft subtype (CLO, CPO or CLP). One-way ANOVA was used to compare the mean methylation of subtypes, adjusting for sex and surrogate variables designed to capture technical batch and cell composition effects.

Results
Te stin g fo r me d ia tio n : M en de lia n ra n d o mi za tio n of th e e ffe ct o f me th yla tio n o n lia b ility to n sCL /P To identify CpGs where methylation may mediate genetic liability to nsCL/P, we used two sample MR with mQTLs from the ALSPAC data as the exposure and liability to nsCL/P as the outcome (from the nsCL/P GWAS meta-analysis summary statistics). We found evidence for an effect of methylation on liability to nsCL/P at 26 CpGs after Bonferroni correction for 6,425 tests (Bonferroni-corrected P-value <0.05, corresponding to an uncorrected P-value <7.8x10 -6 ). Of these 26 CpGs, 20 were instrumented by single mQTLs and six were instrumented by two mQTLs each.
When the six CpGs with two mQTLs each were re-tested taking into account LD between the SNPs, only one (cg02598441 at LOC146880) survived correction for multiple testing. These 21 mQTLs were therefore taken forward to the reversecausation step.
As a sensitivity analysis, we investigated all 21 of the ALSPAC mQTLs in data from the GOYA cohort. 17 of the 21 CpG-SNP pairings passed quality control and were present in the GOYA data, of which 16 replicated in the same direction with P<0.05 (Supplementary Table 1).
Te stin g fo r re ve rse c au s atio n : M e n d e lian ran d o m iza tio n o f th e e ff e ct o f ge n e tic lia b ilit y to n sCL /P o n me th yla tio n Next, we tested if the association between the mQTLs and liability to nsCL/P arose because genetic liability to nsCL/P influences variation in methylation, by using two sample MR with genetic liability to nsCL/P as the exposure and methylation as the outcome. We found no evidence that genetic liability to nsCL/P influences variation in methylation at the 21 CpGs (Table 1). However, it should be noted that this step is very likely to be limited by statistical power Te stin g fo r l in ka g e : jo in t-like lih o o d ma pp in g to a sse ss coloc alis a t io n We used a co-localisation method to assess if there was evidence that methylation and liability to nsCL/P are driven by the same causal effect at each locus. Of the 20 CpGs instrumented by a single mQTL, we found evidence for colocalisation at four CpGs (cg11398452, cg01862363, cg02481697 and cg16107528), with three of these being associated with the same mQTL ( Table 1).
With the addition of the CpG with two mQTLs (cg02598441), we found strongest evidence that methylation at five CpGs are putative mediators of genetic liability to nsCL/P at four SNPs ( Table 1). Of these four SNPs, three SNPs were available in the imputed GOYA data (rs807647, rs1808191 and rs4752028). Two of the SNPs (intergenic rs8076457 and rs1808191 near PLEKHM1P1) consistently replicated as mQTLs in GOYA, with the third SNP rs4752028 replicating as an mQTL for two out of three CpG sites but not the CpG-SNP pairing with evidence of co-localisation (Supplementary Table 1).  * This region was too sparsely genotyped to apply the co-localisation analysis ** This CpG had two mQTLs, so we did not apply the co-localisation analysis Co mp a riso n b e twe e n m e th yla t io n a n d g e n e e xp re ssio n In a look-up of the four identified SNPs in the GTex and NESDA NTR Conditional eQTL databases, we found strong evidence that rs4752028 at SHTN1 (which is an mQTL for cg11398452 at VAX1) is an eQTL for the nearby SHTN1 gene (Table 2). There was also strong evidence that both rs1808191 at PLEKHM1P1 and rs1991401 at CEP95/DDX5 (which are mQTLs for cg02598441 at LOC146880) are eQTLs for six nearby genes, including CEP95 and DDX5, which were identified through both databases ( Table 2). There was no evidence that intergenic SNP rs8076457 (which is an mQTL for cg01862363, cg02481697 and cg16107528 at NTN1) is associated with gene expression ( Table 2).
The direction of estimated effect was also concordant between studies at the three CpGs at NTN1, but the smallest EWAS P-value was 0.12. At cg11398452 (VAX1), the direction of estimated effect was discordant between our MR analysis and the Brazilian EWAS, with a small EWAS P-value (9.4x10 -3 ) ( Table 3).
Tis su e a nd c le ft-su b t yp e -sp e cific va ria tio n At most of the five identified CpGs, methylation in blood, lip and palate tissues was somewhat correlated (correlation coefficients ranging -0.11 to 0.32), particularly between blood and lip tissue. We found weak evidence that mean methylation values in any of the three tissues differed between cleft subtypes. However, the analysis was likely underpowered to detect small to moderate correlations ( Table 3).

Discussion
In this study, we employed a framework that aims to identify putative mediators of genetic influences on nsCL/P via DNA methylation. We found five CpG sites, in three independent regions (VAX1, LOC146880, NTN1), with either evidence of co-localisation between variants influencing methylation and nsCL/P or evidence of two independent variants affecting both nsCL/P and methylation.
We found lower methylation at the CpG at VAX1 (cg11398452) in association with the nsCL/P risk allele C of the SNP rs4752028 at SHTN1. This SNP is strongly associated with lower expression of SHTN1 according to two eQTL databases.
VAX1 is a homeobox containing gene that has been shown to be expressed in the developing brain 44 45 and SNPs in VAX1 have been shown to be associated with nsCL/P in multiple independent GWAS across distinct populations 4 7 8 46 47 . VAX1 knock-out mice have been shown to develop cleft palate, suggesting VAX1 has a potentially important role in nsCL/P aetiology 4 . SHTN1, sometimes known as KIAA1598, codes for the protein shootin1 that is involved in neuronal polarization 48 and has also been reported to be relevant to the aetiology of nsCL/P in several studies 8 49 50 . It is difficult to distinguish the more significant locus between the VAX1 and SHTN1 genes because of their close proximity and similar expression profiles in mice and it is unclear which is the functional gene in the area 8 45 50 51 .
We did not replicate the association between methylation at cg11398452 and rs4752028 in the GOYA data, but we did find that the SNP was strongly associated with methylation at nearby probes. The lack of replication could be because of technical effects, ancestral differences between cohorts or the enrichment of GOYA for overweight and obese mothers, which may introduce selection bias. The direction of association between cg11398452 methylation and nsCL/P was opposite in our MR study compared to a previously published observational EWAS. However, there are several potential explanations for these discordant results, including differences in populations (European vs Brazilian), tissue (cord blood vs whole blood), age of participants (newborns vs children over six years old), or a lack of power giving rise to spurious associations in either analysis.
We found higher methylation at the CpG at LOC146880 (cg02598441) in association with the G allele of rs1991401 in DDX5 and the C allele of rs1808191 in PLEKHM1P. rs1991401 in DDX5 was associated with reduced expression of DDX5 and increased expression of CEP95 and MILR1 while rs1808191 in PLEKHM1P was associated with increased expression of SMURF2 but decreased expression of PLEKHM1P, RP13-104F24.3 and has-mir-6080. However, there was weak evidence that the SNPs affected expression of the same genes. DDX5 is involved in RNA helicase processes that are highly relevant to important cellular processes while PLEKHM1P and LOC146880 are pseudogenes 45 . There is no robust evidence from previous literature to support an association between genetic variation of these genes and nsCL/P. The SNP in PLEKHM1P replicated as an mQTL in the GOYA dataset but there was not sufficient data to test the SNP in DDX5.
We found lower methylation at three CpGs at NTN1 in association with the nsCL/P risk allele T of the SNP rs8076457, an intergenic SNP close to NTN1.
rs8076457, the mQTL for cg08162363, cg02481697 and cg16107528, did not robustly associate with gene expression levels in two datasets. The function of NTN1 is still largely unknown but is thought to be involved in cell migration during development 45 . NTN1 has been previously discussed as a strong candidate gene for nsCL/P 12 ; NTN1 may affect liability to nsCL/P via epistatic interactions, there is some evidence that NTN1 knock-out mice show consistency with the cleft palate phenotype and NTN1 expression is localised to the palate 12 52 . rs8076457 replicated as an mQTL across all relevant CpGs in the GOYA dataset.
Previous work has identified many functional possibilities for genetic risk variants for nsCL/P 14 52 53 but this study is the first to specifically look at the role of DNA methylation in mediating genetic susceptibility to nsCL/P. Additional strengths of this study include the integration of multiple data sources, for example ALSPAC, which provided access to detailed phenotype, genotype and epigenetic data. The nsCL/P GWAS summary statistics allowed a comprehensive genome-wide analysis in a large dataset. The Brazilian cohort EWAS results allowed a comparison of the influence of methylation on nsCL/P according to observational and MR studies. The use of the GOYA replication cohort, allowed triangulation of evidence for mQTLs across different studies. Finally, the Cleft Collective data allowed us to compare genome-wide DNA methylation in different tissues and subtypes of cleft.
There are, several limitations to this study. First, methylation and expression in the studied tissues (postnatal cord blood, whole blood, lip and palate tissue) may not accurately reflect that in the developing orofacial tissue where epigenetic processes could feasibly influence susceptibility to nsCL/P. However, a previous study has identified a high correlation between blood and lip tissue, both taken at the time of first surgery in a UK cohort of patients with non-familial nsCL/P 15 . Previous analysis looking for tissue-specific signals for nsCL/P did not find evidence of enrichment and concluded that this may be due to tissue type differences 2 . Second, cleft lip only (CLO) and cleft lip and palate (CLP) cases were analysed together as one group in the GWAS, MR analyses and the previously published EWAS.
Increasingly, evidence suggests that these subtypes are molecularly and aetiologically distinct and should be analysed separately 2 3 , but we were limited by the data available from previous studies. Although we found no evidence of differential methylation between subtypes at our five identified CpGs, there may be other loci where methylation mediates genetic influences on more specific cleft subtypes. Third, although efforts were made to select only non-syndromic cases for the Cleft Collective analysis, we cannot guarantee that no syndromic cases were included, and children with syndromes may have very different methylation profiles.
Fourth, a major limitation of this study is that some of the steps, particularly the reverse MR, are likely to be statistically underpowered. Fifth, as the majority of mQTLs were instrumented by just a single genetic variant, we were unable to distinguish between mediation and horizontal pleiotropy and therefore proposed mediation is putative. Finally, although mQTLs were largely concordant between ALSPAC and GOYA, the mQTL (in SHTN1) found to co-localise with liability to nsCL/P in ALSPAC did not replicate in GOYA.
In conclusion, we identified three putative loci where DNA methylation may mediate genetic susceptibility to nsCL/P. Future work, determining the function of these genes and the epigenetic modulation of their expression relevant to prenatal orofacial development could provide important aetiological insights.  m  e  a  s  u  r  e  s  o  f  c  e  l  l  m  i  x  t  u  r  e  d  i  s  t  r  i  b  u  t  i  o  n  .   B  M  C  B  i  o  i  n  f  o  r  m  a  t  i  c  s   2  0  1  2  ;  1  3  (  1  )  :  8  6  .   3  3  .  R  e  i  n  i  u  s  L  E  ,  A  c  e  v  e  d  o  N  ,  J  o  e  r  i  n  k  M  ,  e  t  a  l  .  D  i  f  f  e  r  e  n  t  i  a  l  D  N  A  m  e  t  h  y  l  a  t  i  o  n  i  n  p  u  r  i  f  i  e  d  h  u  m  a  n  b  l  o  o  d   c  e  l  l  s  :  i  m  p  l  i  c  a  t  i  o  n  s  f  o  r  c  e  l  l  l  i  n  e  a  g  e  a  n  d  s  t  u  d  i  e  s  o  n  d  i  s  e  a  s  e  s  u  s  c  e  p  t  i  b  i  l  i  t  y  .   P  l  o  S  O  n  e   2  0  1  2  ;  7  (  7  )  :  e  4  1  3  6  1  .   3  4  .  Z  h  o  u  W  ,  L  a  i  r  d  P  W  ,  S  h  e  n  H  .  C  o  m  p  r  e  h  e  n  s  i  v  e  c  h  a  r  a  c  t  e  r  i  z  a  t  i  o  n  ,  a  n  n  o  t  a  t  i  o  n  a  n  d  i  n  n  o  v  a  t  i  v  e  u  s  e  o

S u p pl e m e n t a r y M a t e r i a l ns C L/ P G W AS m e t h o ds
The transmission disequilibrium test (TDT) 1 evaluates the frequency with which parental alleles are transmitted to affected offspring and is a family based association test of genetic linkage in the presence of genetic association. The TDT was run on 638 parent-offspring trios and 178 parent-offspring duos of European, descent, publicly available from dbGAP, to determine genome-wide genetic variation associated with nsCL/P. GWAS genotypes and phenotypes available at dbGaP (https://www.ncbi.nih/gov/gap; accession number phs000094.v1.p1).
The Bonn-II study 2 summary statistics from a case-control GWAS of 399 nsCL/P cases and 1,318 controls were meta-analysed using a fixed effect inversevariance weighted method, in terms of effect size and standard error, with the TDT GWAS summary statistics using METAL 3 based on a previously described protocol for combining TDT and case-control studies 4 . The final sample including 1215 cases and 2772 controls.

A v o n L o n gi t u di n a l S t u d y o f P a r e n t s a n d C hi l dr e n ( A LS P A C )
To identify mQTLs (SNPs associated with DNA methylation), we used data ALSPAC methylation and genotype data have previously been used to generate a database of mQTLs (http://www.mqtldb.org/) 9 . The database contains summary statistics for all mQTLs with a P-value <1*10-5 for the association between SNP and CpG. For part of our study, we required specific CpG-SNP associations that were unavailable from mQTLdb.org. Therefore, for required CpGs, we replicated the methods in the original study: we excluded individuals with missing genotype or covariate data, leaving 787 children. We then rank-normalised the methylation data to remove outliers and then controlled for covariates, potential batch effects and the influence of cell heterogeneity by regressing data points on sex, the first 10 ancestry principal components, bisulfite-converted DNA batch and blood cell proportions 10 11 estimated using the Houseman method. The residuals were then used as the outcome variable in a linear regression model in PLINK 12 to calculate the relevant CpG-SNP associations.

Genetics of Overweight Young Adults (GOYA)
The