Differential methylation as a mediator of COVID-19 susceptibility

The COVID-19 outbreak shows a huge variation in prevalence and mortality on geographical level but also within populations1. The ACE2 gene, identified as the SARS-CoV2 receptor, has been shown to facilitate the viral invasion and people with higher ACE2 expression generally are more severely affected2, 3. As there is a lot of variability in ACE2 expression between individuals we hypothesized that differential DNA methylation profiles could be (one of) the confounding factors explaining this variability. Here we show that epigenetic profiling of host tissue, especially in the ACE2 promoter region and its homologue ACE1, may be important risk factors for COVID-19. Our results propose that variable methylation can explain (part of) the differential susceptibility, symptom severity and death rate for COVID-19. Our findings are a promising starting point to further evaluate the potential of ACE1/2 methylation and other candidates as a predictor for clinical outcome upon SARS-CoV2 infection.


List of Abbreviations
As of August 15th, 2020, SARS-CoV2 is responsible for more than 21 million detected COVID-19 cases and at least 762k deaths worldwide 4 . The outbreak of the COVID-19 pandemic is especially unnerving because it's hard to predict how the virus will affect any individual person. As symptoms vary a lot between infected persons -if they experience them at all -determining if a person is indeed infected by this novel coronavirus is a surprisingly hard and tricky question to answer without testing 5 .
The Centers for Disease Control and Prevention (CDC) reports that the main symptoms are fever/chills, headache, muscle pain, fatigue, coughing, sore throat and shortness of breath, which can appear somewhere between two and 14 days after exposure to the virus 6 . Confirmed COVID-19 cases have shown a wide range of reported symptoms, going from mild to severe illness. According to the CDC, approximately 80% of infected people present few to mild symptoms, while others have a more severe manifestation of the disease, with extreme cases relying on a ventilator to breathe.
There are a few clear risk factors 7 , including age and general health status but even within these subgroups, there is a huge range in severity. It has been shown that next to age and health status, also socio-economic and environmental factors 8 , sex 9 , and even vitamin D status 10 are associated with the immune response and differential susceptibility. Identifying what factors are responsible for the interindividual response to the virus is of utmost importance to aid in the identification of population groups at higher risk and to guide more effective strategies and protective measurements.
It should be noted that this difference in individual susceptibility is actually not that unusual for infectious diseases, and thus not unique for COVID-19. The same is true for tuberculosis, malaria and the 'common' flu 11 . The underlying reason, however, is mostly disease-specific due to the fact that the biological pathways involved in the manifestation of the illness often differ.
Certain genomic variants play a role in the specific immune response to viral infections. The last weeks, efforts were initiated to look into the role of genetics, and if each person's unique genetics may affect their susceptibility and severity to COVID-19. Especially the interplay between the ACE genes and SARS-CoV2 has been of great interest.
The angiotensin-converting enzyme 2 (ACE2) gene, which been identified as the SARS-CoV2 receptor, has been shown to facilitate in the viral invasion 2 . While most COVID-19 studies currently focus on ACE2, a Belgian research group very recently released data from 33 European countries where they looked at the gene coding for angiotensin-converting enzyme 1 (ACE1) 12 . Although ACE2 and ACE1 share only 42% of amino acid identity, they both act as carboxypeptidases to cleave amino acids from the peptides' carboxyl terminal. The ACE1 enzyme is characterized by a genetic deletion/insertion (D/I) polymorphism in intron 16, which is associated with alterations in circulating and tissue concentrations of ACE. Their results demonstrate that prevalence and mortality of COVID-19 infection correlate with the D allele frequency of the ACE1 D/I polymorphism. Another research group recently has launched a similar hypothesis 3 . Interestingly, the D allele has proven to be associated with a reduced expression of ACE2 3 .
It was already known that expression of ACE2 is significantly increased amongst people who smoke or suffer from diabetes, hypertension, heart diseases and other conditions 13 , and people with higher ACE2 expression generally seem also more severely affected by SARS-CoV2 14 . Interestingly, this mechanistically implicated ACE2 gene has been shown to be epigenetically regulated. Taken together with the patient characteristics for severe COVID-19 outcome this substantiates an essential role for epigenetic regulation.

Results
To explore the importance of ACE2 methylation, publicly available DNA methylation data from 5 research studies (~1000 samples) was used. All data originated from blood from "healthy" people and methylation analysis was performed on Illumina Infinium HumanMethylation450 BeadChip (450K) DNA methylation arrays. As in all Illumina methylation assays, methylation values of each probe are expressed as β values, which range from 0 (no methylation) to 1 (fully methylated). Raw β values were preprocessed, cleaned and subsequently normalized. In next step, the 8 probes falling into the ACE2 region were fetched and visualized. Figure 1 displays for each individual 8 probe the corresponding methylation levels per sample. The x-and y-axis show each sample's age and its normalized β value for that position/probe, respectively. Probes are sorted per genomic position in ACE2 (left to right per row).
Two groups are noticeable: (i) the big cloud of samples marked with circles and a lighter color, and (ii) the smaller cloud of darker triangles. This latter group represents data from a cohort of centenarians. Of special interest are the promoter (flanking) regions. Here, TSS200 covers the region from transcription start site (TSS) to -200 nucleotides upstream; TSS1500 represents the region -200 to -1500 nucleotides upstream of TSS. Probes falling into these regulatory regions are colored in orange, red and purple (TSS1500/200 + 1stExon/5'UTR) It is apparent that in general upon aging the average methylation levels gradually decline while the spread increases. However, when only looking at the group of centenarians methylation levels appear higher and show a remarkable higher consistency ( Table 1).
There is a lot of variability in ACE2 expression between individuals and many variants of ACE2 are associated with medical conditions such as diabetes, hypertension and cardiovascular disorders 13 . Thus, it could be that such predisposing genetic variants may also contribute to the susceptibility and explain the enormous variability in infection rate, symptom development/severity and death rate. But, while definitely plausible, to date, limited associations have been found between ACE2 variants and response to SARS-CoV2 infection. As ACE2 plays a role in the SARS-CoV2 infection pathway and ACE2 expression seems very variable between individuals, differential DNA methylation of ACE2 could be one of the confounding factors explaining (part of) this variability. The first column shows the squared Spearman correlation coefficient (r 2 ) between methylation and age. The second and third columns display for each probe the median β value as well as the interquartile range (IQR) for the non-centenarian group (°) and the centenarian group (*), separately. Hoffmann and coworkers showed that in addition to ACE2, SARS-CoV2 infection also depends on the host cell factor transmembrane protease serine 2 (TMPRSS2), a cellular protease. While ACE2 is used for initial binding and host cell entry, the spike protein of SARS-CoV2 is primed by TMPRSS2. A process of which the authors demonstrate can be blocked by a clinically proven protease inhibitor 15 . The TMPRSS2 gene is known to be regulated by androgen 16 . Interestingly, inhibition of the androgen receptor by for example androgen deprivation therapy (ADT) decreases TMPRSS2 levels 17 -the same effect as higher (promoter) methylation -resulting in less viral priming options and thus less viral entry 18 .
DNA methylation is not static. With age, the methylation state of various genes may change 19 . It is also known that general health status and lifestyle has an impact on the methylation signatures. These changes are quantifiable and serve as a means to determine one's "epigenetic age" which often differs from the chronological age 20 . Epigenetic age is a collection or footprint reflecting a combination of a person's genetic make-up but also his/her history of past external experiences/exposures. A compelling example is the case of super-centenarians (people who live > 100 years old and seem to age very healthy). When looking at their epigenetic age, it is significantly less than their chronological age. In these supercentenarians, the underlying reason is probably a combination of both genetic predisposition and lifestyle.
Could differential ACE2 promoter methylation in host tissue be causative to the differential expression that is seen amongst COVID-19 patients? Could variable DNA signatures of ACE2 (and TMPRSS2) explain (at least part of) the differential susceptibility, symptom development/severity and death rate for COVID-19? Here, we hypothesize that epigenetic profiling, especially DNA methylation signatures, may indeed permit discovery of population-, age-and gender-related risk factors for COVID-19. The more consistent methylation levels in the centenarian cohort is at least very compelling.
Our findings are a promising starting point to further evaluate the potential of ACE2 and TMPRSS2 methylation as a predictor for clinical outcome upon SARS-CoV2 infection. Note that blood DNA represents a mixture of DNA from various leucocyte types. However, due to close contact in the alveoli, methylation patterns found in blood can closely reflect methylation in lung tissue. Other studies focusing on the effect of smoking on DNA methylation indeed found that the effects in blood samples were very similar to the changes in lung tissue 21,22 . But to prove any association, follow-up research is needed on respiratory samples from patients. We are now collecting these samples and aim to compare methylation profiles of ACE2, TMPRSS2 and other interesting candidate genes from both COVID-19 negatives and positives (with ranging severity/complications). As the virus continues to spread, more mild cases will arise, and healthcare professionals need to recognize these to accurately portray total numbers of COVID-19 infections. Differentiating mild and moderate from severe disease may also help clinicians in more accurately triaging cases who need medical attention and minimize the risks on the population, health systems, and economy.

Methods
Methylation was measured on Illumina's 450K methylation array for which raw data was downloaded from the Gene Expression Omnibus (GEO) 23 (GSE30870, GSE32149, GSE36064, GSE41169, GSE42861). Details on the individual data sets and characteristics of the study cohort can be found in Table 2 26 . In BMIQ, the β-values of type II probes are adjusted into a statistical distribution characteristic of type I probes. After normalization, probes corresponding with ACE2 were fetched and the β-values of each sample were plotted against the subjects' respective chronological age.