The protective effect of sickle cell haemoglobin against severe malaria depends on parasite genotype

Host genetic factors can confer resistance against malaria, raising the question of whether this has led to evolutionary adaptation of parasite populations. In this study we investigated the correlation between host and parasite genetic variation in 4,171 Gambian and Kenya children ascertained with severe malaria due to Plasmodium falciparum. We identified a strong association between sickle haemoglobin (HbS) in the host and variation in three regions of the parasite genome, including nonsynonymous variants in the acyl-CoA synthetase family member PfACS8 on chromosome 2, in a second region of chromosome 2, and in a region containing structural variation on chromosome 11. The HbS-associated parasite alleles are in strong linkage disequilibrium and have frequencies which covary with the frequency of HbS across populations, in particular being much more common in Africa than other parts of the world. The estimated protective effect of HbS against severe malaria, as determined by comparison of cases with population controls, varies greatly according to the parasite genotype at these three loci. These findings open up a new avenue of enquiry into the biological and epidemiological significance of the HbS-associated polymorphisms in the parasite genome, and the evolutionary forces that have led to their high frequency and strong linkage disequilibrium in African P. falciparum populations.

(GWAS) of human resistance to severe malaria that has been reported elsewhere 2,5,6 . 54 In brief, we sequenced the P. falciparum genome using the Illumina X Ten platform 55 using two approaches based on sequencing whole DNA and selective whole genome 56 amplification 7 . We used an established pipeline 8 to identify and call genotypes at 57 over 2 million single nucleotide polymorphisms (SNPs) and short insertion/deletion 58 variants across the Pf genome in these samples (Methods). The following analysis is 59 based on 4,171 samples that had high quality data for both parasite and human 60 genotypes and were not closely related, of which a subset of 3,346 had human 61 genome-wide genotyping available. We focussed on a set of 51,225 biallelic variants 62 in the P.falciparum genome that passed all quality control filters and were observed in 63 at least 25 infections in this subset. Our analyses exclude mixed genotype calls that 64 arise in malaria when a host is infected with multiple parasite lineages. Full details of 65 our sequencing and data processing can be found in Supplementary Methods. 66 67 We used a logistic regression approach to test for pairwise association between these 68 Dantu blood group phenotype 10 ; ii. variants that showed suggestive but not 73 conclusive evidence of association with severe malaria in our previous GWAS 5 ; iii. 74 HLA alleles and additional glycophorin structural variants that we previously imputed 75 in these samples; and iv. variants near genes that encode human blood group antigens, 76 which we tested against the subset of P.falciparum variants lying near genes which 77 encode proteins important for the merozoite stage 11,12 , as these might conceivably  and P = 1.6 x 10 -10 . At the chromosome 11 locus, the chr11: 1,058,035 T>A variant, 104 which lies in Pf3D7_1127000, was associated with BFHbS = 1.5x10 17 and P = 7.3x10 -105 respectively, and we shall use + andsigns to refer to the alleles that are positively 107 and negatively correlated with HbS, e.g. Pfsa1+ is the allele that is positively 108 correlated with HbS at the Pfsa1 locus. All three of the lead variants are 109 nonsynonymous mutations of their respective genes, as are additional associated 110 variants in these regions (Figure 1 and Supplementary Table 1). 111

125
We attempted to replicate this finding in a separate set of 825 samples in which the 126 HbS genotypes have previously been assayed 2 (Supplementary Table 2 corresponding large effect size estimates (estimated odds ratio (OR) = 11.8 for 133 Pfsa1+, 7.4 for Pfsa2+ and 21.7 for Pfsa3+). As described above, these estimates 134 assume an additive relationship between HbS and the Pf genotype at each locus, but 135 we also noted that genotype counts are most consistent with an overdominance effect 136 (Supplementary Figure 6). We further examined the effect of adjusting for 137 covariates including human and parasite principal components reflecting population 138 structure, year of sampling, clinical type of severe malaria and technical features 139 related to sequencing (Supplementary Figure 7). Inclusion of these covariates did 140 not substantially affect results with one exception: we found that parasite principal 141 components (PCs) computed across the whole P.falciparum genome in Kenya 142 included components that correlated with the Pfsa loci, and including these PCs 143 reduced the association signal. Altering the PCs by removing the Pfsa regions 144 restored the association, indicating that this is not due to a general population 145 structure effect that is reflected in genotypes across the P.falciparum genome, and we 146 further discuss the reasons for this finding below. Taken together, these data appear to 147 indicate genuine differences in the distribution of parasite genotypes between severe 148 infections of HbS-and non-HbS genotype individuals. 149

166
The level of protection afforded by HbS can be estimated by comparing its frequency 167 between severe malaria cases and population controls. As shown in Figure 2, the vast 168 majority of children with HbS genotype in our data were infected with parasites that 169 carry Pfsa+ alleles. Corresponding to this, our data show little evidence of a 170 protective effect of HbS against severe malaria with parasites of Pfsa1+, Pfsa2+, 171 Pfsa3+ genotype (estimated relative risk (RR) = 0.83, 95% CI = 0.53-1.30). In 172 contrast, HbS is strongly associated with reduced risk of disease caused by parasites 173 of Pfsa1-, Pfsa2-, Pfsa3-genotype (RR = 0.01, 95% CI = 0.007-0.03). These 174 estimates should be interpreted with caution because they are based on just 49 cases 175 of severe malaria that had an HbS genotype, because many of these samples were 176 included in the initial discovery dataset, and because there is some variation evident 177 between populations; however it can be concluded that the protective effect of HbS is 178 dependent on parasite genotype at the Pfsa loci. 179  Table 1). In Kenyan samples, the Pfsa loci have the highest between-228 chromosome LD of any pair of variants in the genome. In Gambia, between-229 chromosome LD at these SNPs is also extreme, but another pair of extensive regions 230 on chromosomes 6 and 7 also show strong LD (Table 1)

321
For the analyses presented in main text, we used the 3,346 samples with imputed human genotypes for our initial discovery 322 analysis, and the 4,071 individuals with directly-typed HbS genotypes for all other analysis. The individuals in these two datasets 323 substantially overlap (Supplementary Figure 1), but a subset of 825 individuals have directly-typed for HbS but were not in the 324 discovery data and we used these for replication.

326
Inference of genetic interaction from severe malaria cases 327 To describe our approach, we first consider a simplified model of infection in which parasites have a single definite (measurable) 328 genotype, acquired at time of biting, that is relevant to disease outcome -i.e. we neglect any effects of within-host mutation, co-

364
Testing for genome-to-genome correlation 365 We developed a C++ program (HPTEST) to efficiently estimate the odds ratio (4) across multiple human and parasite variants.

445
A fuller description of the context of these SNPs can be found in Supplementary Methods. Our interpretation is that the 446 statistical evidence for these associations is not sufficiently strong on its own to make these signals compelling without 447 additional evidence. processes, but we did not observe systeamtic differences in coverage between the different Pfsa genotypes at these loci. To 519 further establish alignment accuracy, we also inspected alignment metrics and noted that across all analysis samples, over 99% of 520 reads at each location carried either the reference or the identified non-reference allele, and over 99% of these reads had mapping 521 quality at least 50 (representing confident read alignment). These results suggest sequencie reads provide generally accurate 522 genotype calls at these sites.

524
Assessing the distribution of between-chromosome LD

525
We developed a C++ program (LDBIRD) to efficiently compute LD between all pairs of Pf variants. LDBIRD computes the 526 frequency of each variant, and computes the correlation between genotypes at each pair of variants with sufficiently high 527 frequency. It then generates a histogram of correlation values and reports pairs of variants with squared correlation above a 528 specified level. We applied LDBIRD separately to Pf data from Gambian and Kenyan severe malaria cases. We restricted 529 attention to comparisons between biallelic variants that had frequency at least 5% in the given population and with at least 75%

535
To summarise between-chromosome LD results we grouped signals into regions as follows. First, we observed that most variant 536 pairs have |r| < 0.15 and hence r 2 > 0.05 is typically a substantially outlying degree of inter-chromosomal LD (Figure 4)

550
Assessing the structure of Pfsa regions in available genome assemblies

551
We extracted 101bp and 1001bp flanking sequence centred at the chr2:631,190, chr2:814,288 and chr11:1,058,035 loci from the 552 Pf3D7 reference sequence. We then used minimap2 42 to align these sequences to a previously generated set of genome assemblies from P.falciparum isolates and laboratory strains 27 (Supplementary Table 4), allowing for multiple possible 554 mapping locations. Each flanking sequence aligned to a single location on the corresponding chromosome in all included 555 genomes, with the exception that sequence flanking the chromosome 11 locus aligned to two locations in the ML01 sample.

556
This sample was excluded from previous analysis 27 as it represents a multiple infection; we comment further on this below.

558
To further inspect sequence identity, we used MAFFT to generate a multiple sequence alignment (MSA) corresponding to the 559 1001bp sequence centred at each locus. Four isolates (GA01 from The Gabon, SN01 from Senegal, Congo CD01 and ML01 560 from Mali) carry the non-reference 'A' allele at the chr11:1,058,035 SNP; two of these (GA01 and CD01) also carry the non-561 reference allele at the chr2:631,190 SNP and one (CD01) carries the non-reference allele at all three SNPs. However, expansion 562 of alignments to include a 10,001bp segment indicated that these four samples also carry a structural rearrangement at the chr11 563 locus. Specifically, GA01, SN01, CD01 and ML01 genomes include a ~1kb insertion present approximately 900bp to the right 564 of chr11:1,058,035, and also a ~400bp deletion approximately 2400bp to the left of chr11:1,058,035. To investigate this, we 565 generated kmer sharing 'dot' plots for k=50 across the region (Supplementary Figure 10), revealing a complex rearrangement 566 carrying both deleted and duplicated segments. The duplicated sequence includes a segment (approx. coordinates 1,054,000-567 1,055,000 in Pf3D7) that contains the gene SNRPF ('small nuclear ribonucleoprotein F, putative') in the Pf3D7 reference.

568
Inspection of breakpoints did not reveal any other predicted gene copy number changes in this region, including for 569 Pf3D7_1127000.

571
As noted above, the chromosome 11 region aligns to a second contig in ML01 (contig chr0_142, Supplementary Table 4). This 572 contig appears to have a different tandem duplication of a ~4kb segment lying to the right of the associated SNP (approximately 573 corresponding to the range 11:1,060,100 -1,064,000 in Pf3D7; Supplementary Figure 8). This segment contains a number of 574 genes including dUTPase, which has been under investigation as a potential drug target 43 . We interpret this second contig as 575 arising due to the multiple infection in this sample 27 , and given challenges inherent in genome assembly of mixed samples it is 576 unclear whether this duplication represents an assembly artefact or a second genuine regional structural variant.

578
Data Availability

579
A full list of data generated by this study and relevant accessions can be found at http://www.malariagen.net/resource/32.