Amplification of the epigenetic (gestational) age acceleration signal

Background
Epigenetic (gestational) age acceleration (E(G)AA) is associated with environmental exposures and health outcomes in humans. However, E(G)AA is the residual term from a regression of epigenetic age (outcome) on chronological (gestational) age (predictor) and is therefore strongly obscured by ‘noise’ from multiple sources. Here, we propose a simple procedure, based on regression, principal component analysis (PCA), and the Lasso, that amplifies E(G)AA signals. More specifically, we first regress given (gestational) age against each CpG used for epigenetic (gestational) age prediction. The CpGs are typically taken from one of the several available epigenetic clocks. PCA is subsequently performed on the resulting matrix of residual vectors, one for each CpG, as it projects the E(G)AA signal onto perpendicular principal components (PCs), thereby separating ‘signal’ from ‘noise’. Finally, we use the Lasso to select PCs associated with an outcome of interest. We apply our method to previous studies: EAA in patients with Down’s syndrome and Werner’s syndrome, and EGAA of newborns exposed to prenatal smoking as well as associations with maternal BMI.

Results
The extracted EAA components computed using our proposed procedure revealed a significant association with Down’s syndrome (PB<0.05, Bonferroni adjusted for multiple testing) as well as with Werner’s syndrome (PB<0.05). For EGAA, we find a significant association with maternal prenatal smoking (PB<0.05, also Bonferroni adjusted) and maternal BMI (PB<0.05). Additionally, by examining the loadings of the PCs of interest, and in contrast to residual EGAA, our method can identify implicated CpGs.

Conclusions
Our findings suggest that our proposed procedure leads to a remarkable amplification of the E(G)AA signal. Furthermore, our method reveals that E(G)AA is a composite signal that can be driven by multiple independent factors.

model itself [20]. We here suggest a simple procedure to extract and amplify EAA (or EGAA) signals from 'noise'. We demonstrate our procedure on 1) EAA with respect to Down's syndrome and Werner's syndrome [15] and 2) EGAA with respect to prenatal maternal smoking and association with maternal BMI [18,21]. We also compare how our method performs with respect to standard residual E(G)AA.

The procedure
To boost the E(G)AA signal, we first extract the CpGs associated with epigenetic […]

Revisited literature and study samples
We first revisited the study "Accelerated epigenetic aging in Down syndrome" [15]. This study was selected for two reasons: 1) the association between EAA and Down syndrome is already considered to be solid, and 2) the DNAm data used in the article are publicly available (GSE52588, […] Horvath skin and blood clock [9], all of which are applicable to GSE52588 (Figures 1 and 2). In addition, we re-examined a study on Werner's syndrome [16] (GSE131752, Table 1), for which we considered the same epigenetic clocks as for Down's syndrome (Additional file 2, Figures S1 and S2).

Secondly, we tested our EGAA-extraction method on cord blood DNAm taken from newborns in the MoBa cohort study (MoBa2, n=685, Table 1) to assess the EGAA previously reported with regard to maternal BMI and maternal smoking [18,21].

Case 1: Extracted EAAs in adults with Down's and Werner's syndrome
We tested whether residual EAA differs between unaffected controls (n=29) and patients with Down syndrome (n=29 […] Figure 1).

Our PCA-based EAA extraction method was applied to the same dataset (details of the EAA extraction method can be found in the Methods section). The Lasso method extracted several PCs (we refer to these as eEAAs), and we tested the differences in the PCs between patients with Down's syndrome and controls. In Figure 2, we demonstrate this using two eEAAs (eEAA2 and eEAA3, i.e., the second and third PC) from Hannum […] Figure 2 with Figure 1). In addition, the fact that several independent eEAAs were found to be significantly associated with Down's syndrome suggests that EAA is […] EAAs were found to be substantially weaker than those with eEAAs resulting from our PC-based method (PB<0.05, Figures S1 and S2, Additional file 2).
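The extraction step referred to above (per-CpG regression followed by PCA on the residual matrix) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `residual_matrix` and `principal_components` are hypothetical, the data are simulated, and the regression direction (age as outcome, a single CpG as predictor) is an assumption read from the wording of the abstract.

```python
import numpy as np

def residual_matrix(age, cpg):
    """Regress age on each CpG separately (simple regression with intercept).

    age: array of shape (n_samples,); cpg: array of shape (n_samples, n_cpgs).
    Returns an (n_samples, n_cpgs) matrix with one residual vector per CpG.
    Assumes no CpG column is constant (otherwise the slope is undefined).
    """
    age = np.asarray(age, dtype=float)
    cpg = np.asarray(cpg, dtype=float)
    age_c = age - age.mean()
    cpg_c = cpg - cpg.mean(axis=0)
    # Least-squares slope for CpG j: sum(x_j * age) / sum(x_j ** 2) on centred data.
    slopes = (cpg_c * age_c[:, None]).sum(axis=0) / (cpg_c ** 2).sum(axis=0)
    # Part of age not explained by CpG j.
    return age_c[:, None] - cpg_c * slopes

def principal_components(resid):
    """PCA via singular value decomposition of the column-centred matrix."""
    centred = resid - resid.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                       # one column of sample scores per PC
    loadings = vt.T                      # one column of CpG loadings per PC
    explained = s ** 2 / np.sum(s ** 2)  # share of variance per PC
    return scores, loadings, explained

# Simulated example (illustrative values only, not real DNAm data).
rng = np.random.default_rng(0)
n_samples, n_cpgs = 60, 8
age = rng.uniform(20, 70, n_samples)
cpg = (0.5
       + 0.004 * age[:, None] * rng.uniform(0.5, 1.5, n_cpgs)
       + rng.normal(0, 0.03, (n_samples, n_cpgs)))

resid = residual_matrix(age, cpg)
scores, loadings, explained = principal_components(resid)
```

Each column of `scores` would play the role of a candidate eEAA, and the corresponding column of `loadings` indicates how strongly each CpG contributes to it.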

Case 2: Extracted EGAAs in newborns exposed to prenatal smoking and maternal BMI
In MoBa2 (n=685), we associated residual EGAA with maternal smoking during pregnancy.

The standard residual-based EGAA was defined as the residuals from a regression of EGA on […]

We then applied our E(G)AA-extraction method and associated the resulting eEGAAs with prenatal smoking (Figure 3). We found that eEGAA9 and eEGAA90 were significantly associated (PB < 0.05, Bonferroni corrected) with prenatal smoking. Since effects from prenatal maternal smoking on newborns' methylome have been studied extensively we […]

We also found a statistical association between eEGAA5 and maternal BMI (PB<0.05, Figure 4) but not for residual EGAA (P=0.85). Two studies have previously reported the association between maternal BMI and EGAA [18,21].
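How the Lasso might pick out a few components, such as the eEGAAs reported above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' software: a small coordinate-descent solver stands in for whichever Lasso implementation was actually used, the outcome is treated as continuous, the penalty `alpha` is an arbitrary illustrative value, and the function names and simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1 by cyclic updates.

    Expects the columns of X to be standardised and y to be centred.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                        # equals y - X @ beta for beta = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # take column j out of the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * beta[j]      # put the updated column back
    return beta

def select_components(scores, outcome, alpha):
    """Return the indices of PCs that receive a non-zero Lasso coefficient."""
    X = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    y = outcome - outcome.mean()
    beta = lasso_coordinate_descent(X, y, alpha)
    return np.flatnonzero(beta != 0.0)

# Simulated example: only components 2 and 6 (0-based) carry signal.
rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 10))
outcome = 2.0 * scores[:, 2] - 1.5 * scores[:, 6] + rng.normal(0, 0.5, 200)
selected = select_components(scores, outcome, alpha=0.3)
```

In practice the penalty would normally be chosen by cross-validation, and the selected components would then be tested against the outcome with a multiple-testing correction such as Bonferroni.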

Our new procedure amplifies eE(G)AA signals as compared with previous findings on residual E(G)AA. The core of this method lies in its micro-level approach, which focuses on associating CA with each of the CpGs included in epigenetic clocks separately (Additional file 4, Figure S4). The method thereby breaks the composite residual E(G)AA signal into multiple independent components using PCA. Expectedly, in many of the CpG-CA […] numerous CpGs with small effects that would not have been identified separately.

One general weakness with the Lasso is that it can select covariates that are associated

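Description of the procedure
We assume CA (or GA) to be a linear combination of DNAm levels at n CpG sites as follows.

$$\mathrm{CA} = \beta_0 + \sum_{j=1}^{n} \beta_j x_j + \varepsilon$$

where $x_j$ is the DNAm level at the j-th CpG site, $\beta_0, \ldots, \beta_n$ are regression coefficients, and $\varepsilon$ is an error term assumed to be normally distributed with mean 0 and variance equal to $\sigma^2$ (i.e., $\varepsilon \sim N(0, \sigma^2)$).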
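For orientation, the two kinds of residual contrasted in this paper can be written out side by side. This is a sketch only: the symbols ($\mathrm{DNAmAge}_i$, $\hat{a}$, $\hat{b}$, $x_{ij}$, $r_{ij}$, $m$ for the number of samples) are assumed here for illustration and may differ from the authors' notation, and the direction of the per-CpG regression (CA as outcome) follows the wording of the abstract rather than a formula given in the text.

```latex
% Standard residual E(G)AA for sample i: residual from regressing
% epigenetic age on chronological (gestational) age.
\mathrm{EAA}_i = \mathrm{DNAmAge}_i - \bigl(\hat{a} + \hat{b}\,\mathrm{CA}_i\bigr)

% Per-CpG regression: residual of CA after fitting the single CpG j.
r_{ij} = \mathrm{CA}_i - \bigl(\hat{a}_j + \hat{b}_j\,x_{ij}\bigr),
\qquad i = 1,\dots,m,\quad j = 1,\dots,n

% Residual matrix on which PCA is performed (one column per CpG).
R = (r_{ij}) \in \mathbb{R}^{m \times n}
```

The first expression collapses all CpGs into one residual per sample, whereas the matrix $R$ keeps one residual vector per CpG, which is what allows PCA to split the signal into separate components.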