Grandmaternal smoking during pregnancy is associated with differential DNA methylation in their grandchildren

The idea that information can be transmitted to subsequent generation(s) by epigenetic means has been studied for decades but remains controversial in humans. Epidemiological studies have established that grandparental exposures are associated with health outcomes in their grandchildren, often with sex-specific effects; however the mechanism of transmission is still unclear. We conducted Epigenome Wide Association Studies (EWAS) to test whether grandmaternal smoking during pregnancy is associated with altered DNA methylation (DNAm) in their adolescent grandchildren. We used data from a birth cohort, with discovery and replication datasets of 1225 and 708 individuals (respectively), aged 15-17 years, and tested replication in the same individuals at birth and 7 years. We show for the first time that DNAm at a small number of loci is associated with grandmaternal smoking in humans, and their locations in the genome suggest hypotheses of transmission. We observe and replicate sex-specific associations at two sites on the X chromosome, one located in an imprinting control region and both within transcription factor binding sites (TFBSs). In fact, we observe enrichment for TFBSs among the CpG sites with the strongest associations, suggesting that TFBSs may be a mechanism by which grandmaternal exposures influence offspring DNA methylation. There is limited evidence that these associations appear at earlier timepoints, so effects are not static throughout development. The implication of this work is that effects of smoking during pregnancy may induce DNAm changes in later generations and that these changes are often sex-specific, in line with observational associations.


The idea that information can be transmitted to subsequent generation(s) by epigenetic means remains controversial in humans (1). The terminology used in the literature on this topic is not always consistent; here we use the term transgenerational to include all transmissions from one generation to subsequent generations. Of all epigenetic mechanisms that might be involved in transmission of information between generations, DNA methylation (DNAm) is a strong candidate because it is heritable over cell division. A frequent argument against this is [...]

[...] complete" cell type reference. Because known covariates can be imperfect and miss sources of unwanted variation, we conducted a sensitivity analysis adjusting for surrogate variables (using surrogate variable analysis (SVA) (27) as implemented in meffil), in which we assessed the correlation between effect sizes of the SVA and known covariates models. As there was high correlation between effect sizes (>0.97), the known covariates model was used for all analyses: the high correlation suggests that the main model accounted for all substantial sources of DNAm variation, and SVA risks removing biologically interesting sources of variation in the data.

Testing for replication

We used three complementary approaches to test for replication of the sites most strongly associated with the exposure in the discovery dataset, as no single measure can capture this. Firstly, we took the 25 top associated sites from the EPIC analyses that were also present on the 450k array and assessed them for association in the 450k analyses at the equivalent of p<0.05/25. Secondly, we correlated effect sizes between the discovery and replication datasets, for the top 10, 25, 50, 100 and 200 sites identified in each discovery EWAS. Finally, we conducted a binomial test for each discovery EWAS to ascertain whether the top 10, 25, 50, 100 and 200 sites replicated at p<0.05 with the same direction of effect.
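The three replication checks described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the effect sizes and p-values are made-up toy data, the function names are hypothetical, and the null probability used in the binomial test (0.05 for reaching p<0.05, multiplied by 0.5 for matching direction) is an assumption, since the text does not state which null proportion was used.

```python
from math import comb, sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of effect sizes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def count_replicated(discovery_betas, replication_betas, replication_pvalues, alpha=0.05):
    """Count sites with replication p < alpha and the same sign of effect."""
    return sum(
        1
        for d, r, p in zip(discovery_betas, replication_betas, replication_pvalues)
        if p < alpha and d * r > 0
    )

def binomial_upper_tail(k, n, p0):
    """Exact one-sided binomial test: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Toy data (hypothetical): five top discovery sites and their replication results.
discovery_betas = [0.020, -0.015, 0.030, 0.010, -0.025]
replication_betas = [0.018, -0.010, -0.005, 0.012, -0.020]
replication_pvalues = [0.001, 0.030, 0.400, 0.200, 0.004]

# Approach 1: Bonferroni threshold for 25 look-ups, as in the text.
bonferroni_alpha = 0.05 / 25

# Approach 2: correlation of effect sizes between discovery and replication.
effect_correlation = pearson_r(discovery_betas, replication_betas)

# Approach 3: binomial test on the number of sites replicating at p < 0.05
# with a consistent direction (null proportion 0.025 is an assumption).
n_replicated = count_replicated(discovery_betas, replication_betas, replication_pvalues)
p_binomial = binomial_upper_tail(n_replicated, len(discovery_betas), 0.05 * 0.5)
```

In practice the same three quantities would be computed separately for the top 10, 25, 50, 100 and 200 sites of each discovery EWAS.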

For each of the six EWAS (maternal grandmother smoking: all individuals, males, and females; and paternal grandmother smoking: all individuals, males, and females), we meta-analysed results from the EPIC and 450k analyses at 15-17 years, using all sites common to both arrays. We performed meta-analysis of the effect sizes and standard errors using METAL (28).
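A meta-analysis based on effect sizes and standard errors is commonly an inverse-variance weighted fixed-effect analysis. The sketch below shows that calculation for a single DNAm site in Python. It does not call METAL, the function name is hypothetical, and the input values are illustrative only.

```python
from math import erfc, sqrt

def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted meta-analysis for one site.

    Returns the pooled effect size, its standard error and a two-sided
    p-value from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return pooled_beta, pooled_se, p_value

# Illustrative values (hypothetical): one site measured on both arrays.
epic_beta, epic_se = 0.012, 0.004
k450_beta, k450_se = 0.009, 0.006
pooled_beta, pooled_se, pooled_p = inverse_variance_meta(
    [epic_beta, k450_beta], [epic_se, k450_se]
)
```

Because weights are the reciprocal of the squared standard error, the larger dataset contributes more to the pooled estimate, and the pooled standard error is never larger than the smallest input standard error.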

As previous work has identified that associations between ancestral exposures and health [...] we tested the hypothesis that sites on the X chromosome would associate with grandmaternal smoking during pregnancy. We tested the X chromosome separately for each sex-stratified EWAS and meta-analysis, adjusting for X chromosome significance (p<2.7e-06 in EPIC, p<4.5e-06 in 450k and p<4.9e-06 in the meta-analysis).

As some DNAm sites have been shown to escape the wave of de-methylation in germ cells (4), we tested the hypothesis that these sites are associated with grandmaternal smoking during pregnancy. To do this we took the 116,618 regions of the genome that have been identified as escaping de-methylation (4) (which have recently been made available as supplementary material in a bioRxiv paper (29)). We identified all sites on the EPIC array that were within those genomic regions (n=36,051) and tested them for association with grandmaternal smoking at Bonferroni corrected significance (p<0.05/36051=1.4e-6).

Imprinting control region analysis

As ICRs are not subject to the phases of de-methylation and re-methylation in the early embryo, we sought to test whether DNAm sites in identified regions might associate with grandmaternal smoking. We took the set of 984 DNAm sites present on the EPIC array identified as being within ICRs at FDR<0.05 (30). We tested these sites for association with grandmaternal smoking at Bonferroni corrected significance (p<0.05/984=5.1e-5). There were 29 ICR sites that overlapped with the escapees.
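The two candidate-region analyses above share the same steps: select the array sites that fall inside a set of genomic regions, then apply a Bonferroni threshold for the number of sites tested. A minimal Python sketch of both steps follows. The site names, coordinates and function names are hypothetical, and treating region ends as inclusive is an assumption. Only the site counts (36,051 and 984) come from the text.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals so a binary search is valid."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def sites_in_regions(sites, regions):
    """Return names of sites whose position lies inside any region.

    sites: list of (name, chrom, position)
    regions: list of (chrom, start, end), ends treated as inclusive
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {chrom: merge_intervals(v) for chrom, v in by_chrom.items()}
    starts = {chrom: [s for s, _ in v] for chrom, v in merged.items()}
    kept = []
    for name, chrom, position in sites:
        if chrom not in merged:
            continue
        # Index of the last merged interval starting at or before the position.
        i = bisect_right(starts[chrom], position) - 1
        if i >= 0 and position <= merged[chrom][i][1]:
            kept.append(name)
    return kept

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Toy example with made-up coordinates.
toy_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chrX", 50, 60)]
toy_sites = [
    ("site_a", "chr1", 120),
    ("site_b", "chr1", 301),
    ("site_c", "chrX", 55),
    ("site_d", "chr2", 10),
]
selected = sites_in_regions(toy_sites, toy_regions)

# Thresholds for the site counts reported in the text.
escapee_threshold = bonferroni_threshold(36051)
icr_threshold = bonferroni_threshold(984)
```

Rounded to two significant figures, the two thresholds are 1.4e-06 and 5.1e-05, matching the values quoted above.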

To ascertain whether any sites associated with grandmaternal smoking at 15-17 years are differentially methylated from birth, we repeated each EWAS using DNAm profiles for ALSPAC participants from blood samples collected at birth and 7 years (see supplementary table 1 for participant numbers). We included the same covariates as for the adolescents, aside from at birth where gestational age was substituted for age. As the birth and 7 years DNAm profiles were assayed from different sample types (blood spots and white cells at birth; white cells and whole blood at 7 years), sample type was also included as a covariate. In addition to using this analysis to assess replication of associations in the 15-17-year-olds, we assessed the opposite: replication of associations at birth and age 7 in the 15-17-year-old discovery dataset.

Transcription factor binding site (TFBS) enrichment analysis

To test the hypothesis that differential DNAm associated with grandmaternal smoking might be mediated by TFs preserving or maintaining methylation status, we tested whether DNAm sites were located near TFBS more than expected by chance. To do this we took the top 25 sites from each discovery EWAS and tested them for TFBS enrichment against all sites on the EPIC array used in our EWAS (n sites=838,019) using LOLA locus overlap (31). We used the ENCODE TFBS (32, 33) region set created by the LOLA team, comprising ChIP-seq data on 161 TFs, which is available through http://lolaweb.databio.org. We tested 100bp on either side of the DNAm site, removing overlapping sites to prevent inflation of results. Results were reduced to TFBS measured in blood which were associated in at least one EWAS at p<0.05. To assess whether individual sites identified in the main analysis were associated with a TFBS, we used the hg19 version of the UCSC genome browser (34); https://genome-euro.ucsc.edu/.
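The logic of the enrichment test can be sketched in Python as two steps: build 100bp windows around each site while dropping sites whose windows overlap, then ask whether the top sites overlap a TFBS more often than the background set. This sketch does not use LOLA. The rule for dropping overlapping sites (keep the first site in genomic order) and the one-sided hypergeometric test are assumptions made for illustration, and all names and toy numbers are hypothetical.

```python
from math import comb

def non_overlapping_windows(positions, flank=100):
    """Keep sites on one chromosome whose +/- flank windows do not overlap.

    Assumption: when two windows overlap, the site that comes first in
    genomic order is kept and the later one is dropped.
    """
    kept = []
    last_end = None
    for position in sorted(positions):
        start, end = position - flank, position + flank
        if last_end is None or start > last_end:
            kept.append(position)
            last_end = end
    return kept

def hypergeom_upper_tail(k, n, K, N):
    """One-sided enrichment p-value: P(X >= k).

    k: top sites overlapping the TFBS
    n: number of top sites tested
    K: background sites overlapping the TFBS
    N: total background sites
    """
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy positions: the window around 250 overlaps the one around 100.
kept_positions = non_overlapping_windows([100, 250, 400, 1000])

# Toy enrichment: 6 of 25 top sites overlap a TFBS that covers
# 20,000 of the 838,019 background sites (the 20,000 is made up).
p_enrichment = hypergeom_upper_tail(6, 25, 20000, 838019)
```

Python integers have arbitrary precision, so the binomial coefficients are computed exactly before the final division.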
Enrichment of prenatal- and own smoking-associated sites

We tested the hypothesis that DNAm sites that are established as being associated with prenatal smoking and own smoking would be enriched in our EWAS associations, to ascertain whether transgenerational transmission might be related to these sites. To do this we evaluated statistical inflation of EWAS associations among the 568 DNAm sites (of which 540 were available on the EPIC array) previously reported to be associated with maternal prenatal smoking in cord blood (17), and the 2623 sites (2445 available on the EPIC array) reported to be associated with own smoking (35). For each, inflation beyond expected levels was evaluated by generating QQ plots and lambda values. We then used a one-sided Wilcoxon rank sum test to ask if DNAm sites associated with prenatal- and own-smoking had lower p-values in our EWAS than expected from a random selection.

Enrichment of lean mass-associated sites

We finally sought to identify whether DNAm sites associated with grandmaternal smoking might be related to lean mass (a previously reported epidemiological association (13)). Although no published EWAS of lean mass is available, 47 sites associated with lean mass in the mothers in ALSPAC at p<1e-04 are available in the EWAS catalog (36); http://www.ewascatalog.org/. We checked for inflation of these sites in our data using QQ plots and lambda values, and tested enrichment for these sites using a Wilcoxon rank sum test.
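The inflation and enrichment measures used in these two analyses can be sketched in Python as follows. Lambda is computed here in the usual way, as the median observed 1-degree-of-freedom chi-squared statistic divided by its expected median under the null. The rank sum test uses a normal approximation without tie or continuity correction, which is a simplification and may differ from the software used in the study. Function names and p-values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

def genomic_inflation(pvalues):
    """Lambda: median observed chi-squared (1 df) over the expected median."""
    # A two-sided p-value p corresponds to chi-squared = z**2 with z = Phi^-1(p / 2).
    observed = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = _STD_NORMAL.inv_cdf(0.75) ** 2  # about 0.455
    return median(observed) / expected_median

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_sum_less(candidate_pvalues, other_pvalues):
    """One-sided rank sum test that candidate p-values tend to be lower.

    Normal approximation, no tie or continuity correction (simplification).
    """
    n_a, n_b = len(candidate_pvalues), len(other_pvalues)
    ranks = midranks(list(candidate_pvalues) + list(other_pvalues))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return _STD_NORMAL.cdf((u_a - mean_u) / sd_u)

# Toy p-values (hypothetical): candidate sites versus remaining sites.
candidate = [0.001, 0.002, 0.003]
remaining = [0.5, 0.6, 0.7, 0.8]
lambda_candidate = genomic_inflation(candidate)
p_enriched = rank_sum_less(candidate, remaining)
```

A lambda near 1 indicates no inflation; values above 1 indicate that the candidate sites have smaller p-values than expected under the null.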

Of the 1869 individuals with EPIC array DNAm profiles passing QC, we removed 267 because they were either of non-white ethnicity or had missing ethnicity data; this was because non-white ethnicity was associated with lower rates of smoking for both maternal and paternal grandmothers (p=0.03 and 0.007, respectively). Of the remaining 1602 participants, 285 were removed because their mother reported that she smoked during her pregnancy, and 73 further individuals were removed because they reported smoking themselves. Of the 910 individuals with 450k DNAm data passing QC and filtering, 125 individuals were removed because their mother reported smoking during pregnancy, and a further 59 were removed as they reported smoking themselves. All individuals in the 450k dataset were of white ethnicity.

[...]

When testing the X chromosome, only one association survived adjustment for multiple tests (p<2.7e-06). The association was with paternal grandmother smoking in the males (cg27456137; p=1.9e-06). The probe for this site has been flagged (23) as cross-hybridising to a 49bp sequence 500bp from cg27456137. Three probes on the EPIC array reside within that 49bp sequence; however, none were associated with either grandmother smoking near genome-wide significance (all p>0.03).

When testing whether DNAm sites located within escapee regions were associated at the Bonferroni corrected threshold (p<1.4e-06) in the discovery dataset, we find no sites associated with maternal or paternal grandmother smoking.

Imprinting control regions

We similarly tested the hypothesis that transmission might involve sites within ICRs. We observe one association that survives correction for multiple tests (p<5.1e-05); the association is with paternal grandmother smoking (cg15068552, p=2.2e-05).

Testing associations and replication earlier in life

In cord blood, we find one site associated with maternal grandmother smoking in all individuals, and two sites associated with paternal grandmother smoking in females [...] either grandmother smoking in any of the six analyses. All associations p>1e-04 using the main model are reported in supplementary tables 16-27. None of these associations were observed at adolescence (i.e., in the main discovery dataset) below the p<0.05/3 threshold (all p>0.07). We then tested whether two of the three associations observed at adolescence (i.e., in the main discovery dataset) were observed at birth and at 7 years (cg15068552 in all individuals when the paternal grandmother smoked, and cg19782749 in females when the maternal grandmother smoked; cg27456137 could not be tested because it was not measured by the 450k array). We see a suggestion of replication at cg15068552 at birth in all individuals when the paternal grandmother smoked (p=0.02), and at cg19782749 at 7 years in females when the [...]

Transcription factor binding site analysis

[...]

Among sites associated with prenatal smoking, we observe some inflation for associations with paternal grandmother smoking in males (lambda=1.27 ± 0.13) and females (lambda=1.12 ± 0.11). This inflation is replicated only for males in the 450k dataset (lambda=1.46 ± 0.12). Among sites associated with own smoking, there is weak inflation for associations with paternal grandmother smoking in females (lambda=1.16 ± 0.05), but this association is not replicated. Inflation results are summarised in Table 3.

In summary, we find some evidence for effects of grandmother smoking on DNA methylation in her adolescent grandchildren: on the X chromosome, in an ICR, in TFBS, and among prenatal smoking-associated DNAm sites. We also find three sites associated with grandmaternal smoking in cord blood, but these associations do not appear to persist. In most cases, associations appear to be sex-specific, in line with previous research (8-10). Associations are summarised in Figure 2.

We find some evidence for mechanisms by which DNAm might be preserved through phases of de- and re-methylation in the germ cell. Two of the six sites we identify are on the X chromosome, giving a possible route by which sex-specific differences in transmission of responses across generations might occur. We find evidence suggesting TFs might have a role in the transmission of epigenetic responses to smoking across generations, both from the enrichment analysis and from the location of all six individual sites within TFBS. We find evidence for a single site residing within an ICR, but find no evidence for sites in regions known to escape de-methylation in germ cells. Finally, we find suggestive evidence of replication of two sites identified in adolescents in earlier DNAm samples (one at birth and one at 7 years), although no site replicates across all three timepoints.

We find evidence of inflation and enrichment of sites associated with prenatal smoking only in males when their paternal grandmother smoked, and do not find consistent inflation of sites associated with own smoking. This could suggest that grandmaternal smoking affects DNAm through different mechanisms to maternal smoking. The inflation we see in males is contrary to previous null prenatal findings (18); the reason for this discrepancy may be that we test a larger number of sites. We do not see any inflation or enrichment of lean mass-associated DNAm sites in our analyses, suggesting that the differences in lean mass observed previously (13) may not be related to differences in DNAm.

Because TFBS are a consistent feature of our findings, our study supports the idea that DNAm changes may be linked to ancestral smoking by TF binding events. These binding events could either shield DNAm from being modified in early development or induce DNAm changes consistent with ancestral smoking, as DNAm status can be restored by TFs during germline and embryonic development following erasure (6, 7). However, it is not clear why the associations we do see would change over time, and so we cannot rule out the possibility that we find differences at these DNAm sites due to another factor that is influenced by grandmaternal smoking, such as parental behaviour. We suggest TFBS might present the most promising line of future work in transgenerational epigenetic responses in humans.

Strengths of our study are that we assessed grandmaternal smoking effects in a large cohort of humans with ancestral smoking data, alongside rich phenotypic data. We have DNAm data from birth, so were able to assess whether DNAm differences at these sites are present between birth and adolescence. Limitations include that the 450k and EPIC array platforms only cover around 2% and 4% of the genome, respectively, and that our replication dataset came from the same birth cohort as the discovery data.

[...] Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe.