The MERS-CoV receptor gene is among COVID-19 risk factors inherited from Neandertals

In the current SARS-CoV-2 pandemic, two genetic regions derived from Neandertals have been shown to increase and decrease, respectively, the risk of falling severely ill upon infection. Here, we show that 2-8% of people in Eurasia carry a variant promoter region of the DPP4 gene inherited from Neandertals. This gene encodes an enzyme that serves as a receptor for the coronavirus MERS-CoV and is currently not believed to be a receptor for SARS-CoV-2. However, the Neandertal DPP4 variant doubles the risk of becoming critically ill with COVID-19.

3 2020). We find that, under the rare disease assumption, the Neandertal-like alleles are associated with a ~80% increased risk per allele of being hospitalized upon infection with SARS-CoV-2 (Supplementary Table S1). The most strongly associated SNP (rs117888248) has an odds ratio of 1.84 (95% CI: 1.41-2.41, p = 7.7e-6). Carriers of this allele have a ~109% increased risk of requiring mechanical ventilation (OR = 2.09, 95% CI: 1.44-3.03, p = 1.2e-4). The Neandertal-like alleles form a ~26.3 kb haplotype ($r^2 > 0.8$). Of the 15 SNPs defining the haplotype (Supplementary Table S1), 14 carry alleles seen in heterozygous or homozygous form in a Neandertal genome (Prüfer et al. 2017). This haplotype is derived from Neandertals (p = 0.023) according to a published formula (Huerta-Sanchez et al. 2014) and using parameters as previously described ( 1B). Both these risk haplotypes have stronger effect sizes than the protective Neandertal haplotype on chromosome 12, which decreases the risk of becoming severely ill by ~23% (Zeberg and Pääbo 2020b). The Neandertal DPP4 haplotype is present in ~1% of Europeans, ~2.5% of South Asians, ~4% of East Asians, and ~0.7% of admixed Americans (Fig. 1C). It is absent among Africans south of the Sahara.
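As a rough consistency check on the reported effect sizes, the sketch below recovers a Wald z-score and two-sided p-value from an odds ratio and its 95% confidence interval. It assumes the interval is symmetric on the log-odds scale and that the test statistic is approximately normal; the function name `wald_from_or` is hypothetical and is not taken from the original analysis. Because the published values are rounded, the result should only be expected to agree with the reported p-values to within rounding error.

```python
import math
from statistics import NormalDist


def wald_from_or(odds_ratio, ci_low, ci_high, level=0.95):
    """Return (beta, se, z, p) implied by an odds ratio and its confidence interval.

    Assumes a Wald interval: log(OR) +/- z_crit * SE (an assumption, not
    necessarily how the original study computed its intervals).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% CI
    beta = math.log(odds_ratio)                          # log-odds effect size
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))                 # two-sided normal p-value
    return beta, se, z, p


# Values quoted in the text for rs117888248.
hosp = wald_from_or(1.84, 1.41, 2.41)   # hospitalization
vent = wald_from_or(2.09, 1.44, 3.03)   # mechanical ventilation

for label, (beta, se, z, p) in (("hospitalization", hosp), ("ventilation", vent)):
    print(f"{label}: beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.1e}")
```

The percentage increases quoted in the text correspond to (OR - 1) × 100, which approximates the increase in risk only under the rare disease assumption stated above.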
We calculated the statistical significance of the association between the Neandertal DPP4 haplotype and severe COVID-19 under the null hypothesis that Neandertal haplotypes have no impact on COVID-19. Because only a fraction of the Neandertal genome is found among present-day humans, and because Neandertal haplotypes are on average longer than other haplotypes, the statistical power to detect associations with Neandertal haplotypes is greater than in genome-wide analyses. When we consider Neandertal haplotypes that are present at a frequency of >1% among Europeans in the 1000 Genomes Project and are identified in previously published maps of Neandertal contributions, we find that the effective number of hypotheses is 5,761, yielding an 'introgression-wide' significance threshold of 8.7e-6 (Supplementary Material). Thus, under the null hypothesis that Neandertal gene variants have no impact on COVID-19, the association of the DPP4 haplotype with severe disease is significant.
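The arithmetic behind such a threshold can be sketched as a Bonferroni-style correction over the effective number of hypotheses. The snippet below assumes a family-wise error rate of 0.05, and it assumes that a "suggestive" threshold corresponds to one expected false positive per scan (1/M); both conventions are assumptions here, and the function names are illustrative only.

```python
def significance_threshold(n_effective, alpha=0.05):
    """Bonferroni-style threshold: family-wise alpha divided by the effective test count."""
    return alpha / n_effective


def suggestive_threshold(n_effective):
    """Threshold at which one false positive is expected per scan (assumed convention)."""
    return 1.0 / n_effective


n_eff = 5761  # effective number of hypotheses reported in the text

print(f"{significance_threshold(n_eff):.1e}")  # 8.7e-06
print(f"{suggestive_threshold(n_eff):.1e}")    # 1.7e-04
print(7.7e-6 < significance_threshold(n_eff))  # True: the reported p-value passes
```

For comparison, the conventional genome-wide threshold of 5e-8 corresponds to 0.05 divided by one million effective tests, which is why restricting the scan to introgressed haplotypes lowers the bar for significance.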
It was recently shown that the spike protein of SARS-CoV-2 binds to DPP4 (Li et al. The combination of large effect sizes and a small number of Neandertal loci (and a correspondingly smaller number of multiple tests requiring correction) may allow associations with infectious disease susceptibility to be detected in smaller cohorts than if all variants across the genome are considered. For the DPP4 locus, we estimate that approximately twice as many patients as are currently available in HGI will be needed to achieve an 80% probability of detecting the association between DPP4 and severe COVID-19 at the standard genome-wide significance threshold (p<5e-8) (Supplementary Materials).
The three Neandertal genomes available to date, which vary in age between ~120,000 years and ~50,000 years and come from Europe and southern Siberia, are all homozygous for the risk variants on chromosome 2. Furthermore, the late Neandertal genome in Europe, which is most closely related to the Neandertals that mixed with modern humans, was homozygous also for the risk variants
To exclude Neandertal-like variants due to incomplete lineage sorting, we further required the resulting haplotypes to have a length of at least 10 kb. Using these criteria, we identify 40,055 SNPs. We use these SNPs to estimate the effective number of hypotheses in European genomes from the 1000 Genomes Project using the Genetic Type I Error Calculator (Li et al. 2011). This yields a significance threshold of 8.7e-6 and a suggestive threshold of 1.7e-4, for a Neandertal "introgression-wide association study" ("IWAS").
Sample size needed to detect the DPP4 haplotype
As shown above, the power to detect a variant is improved if only introgressed Neandertal haplotypes are considered, although under different null hypotheses. We calculated the sample size needed to achieve genome-wide significance (p<5e-8), using standard techniques (Pirinen et al. 2020). If there is a non-zero effect, i.e., $\beta \neq 0$, then the z-score is distributed as $z \sim N(\beta/\mathrm{SE}, 1)$ and $z^2 \sim \chi^2_1\big((\beta/\mathrm{SE})^2\big)$. The parameter $(\beta/\mathrm{SE})^2$ is known as the non-centrality parameter and scales linearly with sample size. The non-central chi-squared distribution was used to calculate the probability of observing a sufficiently strong association, i.e., the statistical power. To reach 80% power to detect the Neandertal DPP4 haplotype, we find that we need approximately twice the current sample size. For a 99% detection probability, the sample size needs to triple.
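The calculation described above can be sketched with the standard library alone, because for one degree of freedom the tail of the non-central chi-squared distribution equals the two-sided tail of a normal variable with mean β/SE. This is a minimal sketch under stated assumptions, not the authors' code: it treats the currently observed z-score (derived here from the reported p = 7.7e-6) as the true β/SE at the current sample size, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(expected_z, alpha=5e-8):
    """P(z^2 exceeds the alpha critical value) when z ~ N(expected_z, 1).

    Equivalent to the upper tail of a 1-df chi-squared distribution with
    non-centrality parameter expected_z**2.
    """
    crit = -STD_NORMAL.inv_cdf(alpha / 2)  # two-sided critical z-score
    return STD_NORMAL.cdf(expected_z - crit) + STD_NORMAL.cdf(-expected_z - crit)


def sample_size_factor(current_z, target_power, alpha=5e-8):
    """Factor by which the sample size must grow to reach target_power.

    The non-centrality parameter scales linearly with n, so the expected
    z-score scales with sqrt(factor). Solved by bisection on the factor.
    """
    low, high = 1e-3, 1e3
    for _ in range(200):
        mid = (low + high) / 2
        if power(current_z * math.sqrt(mid), alpha) < target_power:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Assumption: the observed z-score equals the true beta/SE at the current n.
observed_z = -STD_NORMAL.inv_cdf(7.7e-6 / 2)

print(f"power at current n: {power(observed_z):.2f}")
print(f"factor for 80% power: {sample_size_factor(observed_z, 0.80):.2f}")
print(f"factor for 99% power: {sample_size_factor(observed_z, 0.99):.2f}")
```

Under these assumptions the factors come out close to two and three, in line with the figures quoted in the text. A winner's-curse correction to the observed effect, which this sketch does not apply, would increase the required sample size.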

Data availability
The archaic genomes are available at the server of the Max Planck Institute for Evolutionary Anthropology (http://cdna.eva.mpg.de/neandertal/Vindija/VCF/ and https://bioinf.eva.mpg.de/jbrowse/) and the modern human genomes at the 1000 Genomes Project server

Supplementary Table S1. SNPs in linkage disequilibrium with rs117888248 and the corresponding Neandertal alleles. LD data are from the 1000 Genomes Project; "Vindija" refers to a Croatian Neandertal genome (https://bioinf.eva.mpg.de/jbrowse/).