Abstract
Sequestration of Plasmodium falciparum-infected erythrocytes to host endothelium through the parasite-derived PfEMP1 adhesion proteins is central to the development of malaria pathogenesis. PfEMP1 proteins have diversified and expanded to encompass many sequence variants conferring the same array of human endothelial receptor binding phenotypes. Here, we analyzed RNA-seq profiles of parasites isolated from 32 infected travelers returning to Germany. Patients were categorized into either malaria naïve (n=15) or pre-exposed (n=17), and into severe (n=8) or non-severe (n=24) cases. Expression analysis of PfEMP1-encoding var genes showed that severe malaria was associated with expression of PfEMP1 containing the endothelial protein C receptor (EPCR)-binding CIDRα1 domain, whereas CD36-binding PfEMP1 was linked to non-severe malaria outcomes. In addition, gene expression-guided determination of parasite age suggested that circulating parasites from non-severe malaria patients were older than parasites from severe malaria patients. First-time infected patients were also more likely to develop severe symptoms and tended to be infected for a longer period, which thus appeared to select for parasites with more efficient sequestration and therefore more pathogenic PfEMP1 variants.
Introduction
Despite considerable efforts during recent years to combat malaria, this disease remains a major threat to public health in tropical countries. The most severe clinical courses of malaria are due to infections with the protozoan species Plasmodium falciparum. In 2018, there were 228 million cases of malaria worldwide, resulting in more than 400,000 deaths (WHO, 2019). Currently, about half of the world population is living at risk of infection and more than 90% of the malaria deaths occur in Africa. In particular, children under the age of five and pregnant women suffer from severe disease. The virulence of P. falciparum is linked to the infected erythrocytes binding to endothelial cell surface molecules expressed on blood vessel walls. This phenomenon, known as sequestration, prevents the passage of infected erythrocytes through the spleen, which would otherwise remove the infected erythrocytes from the circulation and kill the parasite (Saul, 1999). The membrane proteins mediating sequestration are exposed to the host’s immune system, and through evolution P. falciparum parasites have acquired several multi-copy gene families coding for variant surface antigens (VSAs), allowing immune escape through extensive sequence polymorphisms. Endothelial sequestration is mediated by the P. falciparum erythrocyte membrane protein 1 (PfEMP1) family, whose members have different binding capacities for host vascular tissue receptors such as CD36, EPCR, ICAM-1, PECAM1, receptor for complement component C1q (gC1qR) and CSA. PfEMP1 proteins are known to mediate adhesion of infected erythrocytes to the linings of small blood vessels (Magallón-Tejada et al, 2016; Turner et al, 2013; Rowe et al, 2009).
The long, variable, extracellular PfEMP1 region responsible for receptor binding contains a single N-terminal segment (NTS main classes A, B and pam) and a variable number of different Duffy-binding like (DBL main classes are DBLα-ζ and pam) and cysteine-rich inter-domain region (CIDR main classes are CIDRα-δ and pam) domains (Rask et al, 2010). Based on recent findings, the sub-classification within main domain classes, e.g. the DBLβ subclasses 1 – 13, was questioned due to recombination occurring frequently between members of the different subclasses (Otto et al, 2019). PfEMP1 molecules have been grouped into four categories (A, B, C and E) depending on the protein domain composition as well as the 5’ upstream sequence, the chromosomal localization and the direction of transcription of their encoding var genes (Rask et al, 2010; Kyes et al, 2007; Kraemer & Smith, 2003; Lavstsen et al, 2003). Each parasite possesses about 60 var genes with approximately the same distribution over the different groups (Rask et al, 2010). About 10% of the genes belong to A-type var genes, typically encoding longer PfEMP1 proteins with a head-structure containing a DBLα1 and either a CIDRα1 or a CIDRβ/γ/δ domain. In some A-type proteins, an ICAM-1-binding DBLβ1 or 3 domain follows this head-structure (Lennartz et al, 2017). Two conserved subfamilies also belong to the group A: the var1 gene present in two different allele forms in the parasite population (3D7- and IT-type) (Otto et al, 2019) and the shortest var3 gene family with only two extracellularly exposed domains (DBLα1.3 and DBLε8). Most PfEMP1 proteins belong to the B- and C-families and have a DBLα0-CIDRα2-6 head structure attached to a DBLδ1-CIDRβ/γ domain combination. Some B-type proteins possess a DBLα2-CIDRα1 head-structure typically followed by additional domains including DBLβ12 domains suggested to bind gC1qR (Magallón-Tejada et al, 2016).
Other B-type PfEMP1 have an ICAM-1-binding DBLβ5 domain (Lennartz et al, 2019). The VAR2CSA PfEMP1 binds placental CSA and causes pregnancy-associated malaria. The var2csa genes constitute the group E and most P. falciparum isolates possess one or two copies of this inter-strain conserved var gene variant. Based on the head structure domains, PfEMP1 divide into those with DBLα1 (A-type PfEMP1), DBLα2 (B-type PfEMP1) or DBLα0 (B- and C-type PfEMP1) domains as well as those binding CD36 via CIDRα2-6 domains (B- and C-type PfEMP1) or EPCR via CIDRα1 (found in A- and B-type PfEMP1) (Otto et al, 2019). Accordingly, the head structure confers mutually exclusive binding properties, either to EPCR, CD36 or to an unknown receptor via the CIDRβ/γ/δ domain of some A-type proteins. Some domain sequence variants are found to often co-occur (Otto et al, 2019; Berger et al, 2013; Rask et al, 2010). For example, one domain cassette (DC), DC8, includes specific CIDRα1.1/8 subtypes capable of binding EPCR.
Group A var gene expression has been associated with severe forms of malaria, whereas mild malaria may be associated with group C expression (Avril et al, 2012; Claessens et al, 2012; Lavstsen et al, 2012; Falk et al, 2009; Warimwe et al, 2009; Kyriacou et al, 2006; Rottmann et al, 2006; Jensen et al, 2004; Kirchgatter & Del Portillo, 2002). Conflicting results were reported on group B expression during severe disease; however, this is likely explained by the fact that a subset of B-type PfEMP1 share EPCR and ICAM-1 receptor-binding phenotypes with A-type PfEMP1. Indeed, consensus from a range of gene expression studies is that severe malaria is associated with expression of PfEMP1 with EPCR-binding CIDRα1 domains (Jespersen et al, 2016; Kessler et al, 2017; Storm et al, 2019; Shabani et al, 2017; Mkumbaye et al, 2017; Bernabeu et al, 2016; Magallón-Tejada et al, 2016). No other domain has been consistently associated with severity of disease, but elevated expression of some specific DCs has also been associated with severe disease, in particular the EPCR-binding DC8 (DBLα2-CIDRα1.1-DBLβ12-DBLγ4/6), DC13 (DBLα1.7-CIDRα1.4) and DC4 (DBLα1.4-CIDRα1.6-DBLβ3), but also the DC5 found in A-type PfEMP1 (DBLγ12-DBLδ5-CIDRβ3/5) and DC6 (DBLγ14-DBLζ5-DBLε4) found in both A- and B-type PfEMP1 (Bernabeu et al, 2016; Magallón-Tejada et al, 2016; Berger et al, 2013; Avril et al, 2013, 2012; Claessens et al, 2012; Lavstsen et al, 2012). In this study, we used an RNA-seq based analysis to study gene expression with special emphasis on var genes in parasites from hospitalized travelers returning from malaria endemic countries with certified P. falciparum infections. Individuals were clustered into i) first-time infected and ii) pre-exposed individuals on the basis of serological data or into iii) severe and iv) non-severe cases according to medical reports.
Our multi-dimensional analysis reveals a clear association of domain cassettes with EPCR-binding properties with a naïve immune status and severe malaria, whereas CD36-binding PfEMP1 proteins and the conserved var1-3D7 allele were expressed at higher levels in pre-exposed patients and non-severe cases. Interestingly, circulating parasites from severe cases tended to be younger than parasites from non-severe cases, indicating that the EPCR-binding phenotype confers more efficient sequestration of infected erythrocytes.
Results
Cohort characterization
This study is based on a cohort of 32 malaria patients hospitalized in Hamburg, Germany. Parasitemia, recorded clinical symptoms and patient sub-grouping are summarized in Table 1. For ten patients, the present malaria episode was their first recorded P. falciparum infection. Nine individuals had previously experienced malaria episodes according to the medical reports, whereas malaria exposure was unknown for 13 patients. In order to determine if patients already had an immune response to P. falciparum antigens, indicative of previous exposure to malaria, plasma samples were analyzed by a Luminex-based assay displaying the antigens AMA1, MSP1 and CSP (Table S1). Immune responses to AMA1 and MSP1 are known to be long-lasting and seroconversion to AMA1 is assumed to occur after only a single or very few malaria infections (Drakeley et al, 2005). Principal component analysis of the Luminex data resulted in separation of the patients into two discrete groups corresponding to first-time infected adults (‘naïve cluster’) and malaria pre-exposed individuals (‘pre-exposed cluster’) (Figure 1A). The 13 patients with unknown malaria exposure status could be grouped into either the naïve or pre-exposed groups defined by the PCA of the antigen reactivity. The only outlier in the clustering was a 19-year-old patient (#21) from Sudan, who reported several malaria episodes during childhood, but clustered with the malaria-naïve patients.
In order to further characterize the patient cohort, plasma samples (n=32) were subjected to Luminex analysis with the P. falciparum antigens AMA1, MSP1 and CSP, which are known to induce a strong antibody response in humans. With the exception of patient #21, unsupervised clustering of the PCA-reduced data clearly discriminates between first-time infected (naïve) and pre-exposed patients with higher antibody levels against tested P. falciparum antigens and also assigns plasma samples from patients with unknown immune status into naïve and pre-exposed clusters (A). Classification of patient #21 into the naïve subgroup was confirmed using different serological assays assessing antibody levels against P. falciparum on different levels: a merozoite-directed antibody-dependent respiratory burst (mADRB) assay (Kapelski et al, 2014) (B), a PEMS-specific ELISA (C) and a 262-feature protein microarray covering 228 well-known P. falciparum antigens detecting reactivity with individual antigens and the antibody breadth of IgG (upper panel) and IgM (lower panel) (D). The boxes represent medians with IQR; the whiskers depict minimum and maximum values (range) with outliers located outside the whiskers. Serological assays revealed significant differences between patient groups (Mann-Whitney U test). Reactivity of patient plasma IgG and IgM with individual antigens in the protein microarray is presented as a volcano plot highlighting the significant hits in red. Box plots represent antibody breadths by summarizing the number of recognized antigens out of 262 features tested. Data from all assays were used for an unsupervised random forest approach (E). The variable importance plot of the random forest model shows the decrease in prediction accuracy if values of a variable are permuted randomly. The decrease in accuracy was determined for each serological assay, indicating that the mADRB, ELISA and Luminex assays are most relevant in the prediction of patient clusters (F).
Patients with known immune status based on medical reports were marked in all plots with filled circles in blue (naïve) and grey (pre-exposed), samples from patients with unknown immune status are shown as open circles. ELISA: Enzyme-linked immunosorbent assay, IQR: interquartile range, PCA: Principal component analysis
Characteristics and classification of malaria patients.
Plasma samples were further subjected to i) a merozoite-directed antibody-dependent respiratory burst (mADRB) assay (Kapelski et al, 2014), ii) a PEMS-specific ELISA and iii) a protein microarray with 228 P. falciparum antigens (Borrmann, 2020). Analysis of these serological assays in relation to the patient clustering confirmed the expected higher and broader antigen recognition by ELISA and protein microarray, and the stronger ability of serum from the group of malaria pre-exposed patients to induce a neutrophil burst (Figure 1B–D, Table S1). Data from all the serological assays were next used for an unsupervised random forest machine learning approach to build models predictive of an individual’s protective status. This algorithm confirmed the classification of patient #21 as being non-immune (Figure 1E). The calculated variable importance highlighted the relevance of the mADRB assay, the ELISA and the Luminex assay for allocating patients into clusters (Figure 1F). A multidimensional scaling plot was used to visualize cluster allocation, and patient #26 was positioned at the borderline to pre-exposed patients (Figure 1E). The patient was grouped into the naïve cluster in accordance with the Luminex data and the medical report showing that this patient from Jamaica was infected during his first trip to Africa.
Using protein microarrays, the antibody response against described antigens was analyzed in detail. As expected, pre-exposed individuals showed significantly elevated IgG antibody responses against a wide range of parasite antigens, especially typical blood stage markers, including MSP1, MSP2, MSP4, MSP10, EBA175, ring exported protein 1 (REX1), and AMA1 (Figure 1D, upper panel). Markers for a recent infection, MSP1, MSP4, GLURP and Early transcribed membrane protein 5 (ETRAMP5) (Van Den Hoogen et al, 2019), were significantly elevated in the pre-exposed individuals in comparison to the defined first-time infected group. In addition, further members of the ETRAMP family, including ETRAMP10, ETRAMP14, ETRAMP10.2 and ETRAMP4, and also antibodies against pre-erythrocytic antigens, such as CSP, STARP and LSA3, were highly elevated. Similar effects were detectable for IgM antibodies; previous exposure to the malaria parasite led to higher antibody levels (Figure 1D, lower panel).
Based on medical reports eight patients from the malaria-naïve group were considered as more severe cases, having a substantially higher parasitemia and an impaired function of diverse organs especially the brain (Table 1). Patient #1 was included into the severe group due to increasing parasitemia during hospitalization and circulating schizonts indicative of a very high sequestering parasite biomass associated with severity (Bernabeu et al, 2016). The remaining 24 cases were summarized into the non-severe malaria group. Comparing the IgG antibody response of severe and non-severe cases within the previously malaria-naïve group, elevated antibody levels were detected in the severe subgroup. The highest fold change was observed in antibodies directed against intracellular proteins, such as DnaJ protein, GTPase-activating protein or heat shock protein 70 (Figure S1). Interestingly, IgM antibodies against ETRAMP5 were detectable in the severely infected individuals, suggesting they were infected for a prolonged period of time compared to the mild malaria population (Helb et al, 2015; Van Den Hoogen et al, 2019).
RNA-seq transcriptomics
Parasites were isolated from the venous blood of all patients for subsequent transcriptional profiling. Transcriptome libraries were sequenced for all 32 patient samples. The number of trimmed reads ranged between 29,142,684 and 82,000,248 (median: 41,383,289) within the individual libraries derived from patients. The proportion of total reads specific for P. falciparum was 87.7% (median; IQR: 76.7–91.3). Lower percentages (12.4% and 15.68%) were obtained only for patient isolates #1 and #2 (Table S2). These samples were not subjected to globin-mRNA depletion due to their low RNA content after multiple rounds of DNase treatment. Consequently, less than one million P. falciparum reads were obtained for each of these samples. Therefore, samples from patients #1 and #2 were omitted from assembly due to low coverage, but included in the differential gene expression analysis.
Variation in parasite ages in the different patient samples was analyzed with a mixture model in accordance with Tonkin-Hill et al. (Tonkin-Hill et al, 2018) using published data from López-Barragán et al. as a reference (López-Barragán et al, 2011). Parasites from first-time infected and pre-exposed patients revealed no obvious difference in the proportion of the different parasite stages (Figure 2A) or in median age (Figure 2B, Table 2). However, a small bias towards younger parasites in the severe cases was observed with a median age of 8.2 hpi (IQR: 8.0–9.8) in comparison to 9.8 hpi in the non-severe cases (IQR: 8.2–11.4) (Figure 2C,D, Table 2). None of the samples revealed high proportions of late trophozoites (all <3%), schizonts (0%) or gametocytes (all <6%) (Figure 2A, C).
Patient samples consist of a combination of different parasite stages. To estimate the proportion of different life cycle stages in each sample, a constrained linear model was fit using data from López-Barragán et al. (López-Barragán et al, 2011). The proportions of rings (8 hpi), early trophozoites (19 hpi), late trophozoites (30 hpi), schizonts (42 hpi) and gametocyte stages shown in the columns of the bar plots must add to 1 for each sample. Shown are the comparisons between first-time infected (naïve; blue) and pre-exposed samples (grey) (A) and severe (red) and non-severe cases (grey) (B). A bias towards the early trophozoite appears in the non-severe malaria sample group, which was confirmed by calculating the age in hours post infection (hpi) for each parasite sample. The boxes represent medians with IQR; the whiskers depict minimum and maximum values (range) with outliers located outside the whiskers (C, D). IQR: interquartile range
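The stage-proportion estimate can be illustrated with a minimal sketch. It assumes a least-squares fit of a sample's expression vector to reference stage profiles, with the proportions constrained to be non-negative and to sum to 1. The projected-gradient solver, the function names and the toy numbers are assumptions made for illustration; they are not the mixture-model implementation of Tonkin-Hill et al. used in the study.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    ordered = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, value in enumerate(ordered, start=1):
        cumulative += value
        shift = (cumulative - 1.0) / i
        if value - shift > 0:
            theta = shift  # last index satisfying the condition sets the shift
    return [max(x - theta, 0.0) for x in v]


def fit_stage_proportions(reference, sample, n_iter=2000):
    """Minimise ||reference @ p - sample||^2 subject to p on the simplex.

    reference: list of rows (one per gene), each a list of per-stage expression.
    sample:    list of observed expression values, one per gene.
    """
    n_stages = len(reference[0])
    # Conservative step size: 1 / (squared Frobenius norm of the reference).
    step = 1.0 / sum(x * x for row in reference for x in row)
    p = [1.0 / n_stages] * n_stages
    for _ in range(n_iter):
        residual = [sum(r * q for r, q in zip(row, p)) - s
                    for row, s in zip(reference, sample)]
        grad = [sum(row[j] * res for row, res in zip(reference, residual))
                for j in range(n_stages)]
        p = project_to_simplex([q - step * g for q, g in zip(p, grad)])
    return p


# Hypothetical toy reference (4 genes x 3 stages), not the published profiles.
REFERENCE = [[10, 1, 0], [2, 8, 1], [1, 3, 9], [5, 5, 5]]
# Toy sample built as an exact 0.6 / 0.3 / 0.1 mixture of the three stages.
SAMPLE = [6.3, 3.7, 2.4, 5.0]
estimated = fit_stage_proportions(REFERENCE, SAMPLE)
```

In this toy case the fit recovers the mixing proportions used to build the sample; with real data the residual would not vanish and the estimate depends on how well the reference profiles match the sample.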
Patient groups data.
Differential var gene expression on gene, domain and homology block level
To correlate individual var genes with a naïve immune status or disease severity, differential var gene expression was analyzed according to Tonkin-Hill et al. (Tonkin-Hill et al, 2018). First, var gene assembly was performed by assembling each sample separately, which reduces the risk of generating false chimeric genes and results in longer contigs compared to a combined all-sample assembly approach. In total, 6,441 contigs longer than 500 bp were generated with an N50 of 2,302 bp and a maximum length of 10,412 bp (Data S1). For 5,488 contigs PfEMP1 domains could be annotated; the remaining contigs contain only homology blocks defined by Rask et al. (Rask et al, 2010) (Table S3, S4).
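For readers unfamiliar with the N50 statistic quoted for the assembly, the following minimal sketch shows how it is commonly computed: contigs are sorted from longest to shortest and the N50 is the length of the contig at which the running total first reaches half of the summed length. The function name, the length filter argument and the toy lengths are assumptions for illustration, not the study's data or code.

```python
def n50(lengths, min_length=500):
    """Return the N50 of contigs longer than min_length (in bp)."""
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs pass the length filter")
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths; the two short contigs are filtered out.
toy_lengths = [6000, 5000, 4000, 3000, 2000, 400, 300]
toy_n50 = n50(toy_lengths)
```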
Var allele level
A median of 200 contigs (IQR: 137–279) with >500 bp was assembled per sample. Almost half of the transcripts, for which a PfEMP1 domain annotation could be made (47.2%: 2,592), showed >98% nucleotide identity for at least 80% of the length of a var gene in the varDB (Otto, 2019). Moreover, 203 transcripts matched with over 1 kb overlap and 98% identity to var genes from the 15 reference genomes (Otto et al, 2018a) (Table S4). The Salmon RNA-seq quantification pipeline, which identifies equivalence classes allowing reads to contribute to the expression estimates of multiple transcripts, was used to estimate expression levels for each transcript. As a result, it accounts for the redundancy present in our whole set of var gene contigs from all separate sample-specific assemblies. We compared this approach with Corset which previously has been used to investigate differential expression of var genes in severe malaria and found it gave similar results (see methods) (Tonkin-Hill et al, 2018). Due to the high diversity in var genes both of these approaches are only able to identify significant associations between transcripts and phenotypes when there is sufficient similarity within the associated sequences.
By comparing transcripts expressed in first-time infected patients with those from parasites isolated from pre-exposed patients using the Salmon approach, transcript levels were higher for twelve and lower for three genes in malaria-naïve hosts (Table 3, Figure 3A, B, Table S6). Assembled alleles of the conserved subfamilies, var1, var2csa and var3, were also found at higher frequencies in first-time infected patients. Notably, the var1-IT allele was expressed at higher levels in parasites from first-time infected patients, whereas the var1-3D7 allele was expressed at higher levels in parasites from pre-exposed and non-severe patients (Figure 3, Table 3, 4). This was confirmed by mapping normalized reads from all patients to the var1-3D7 and var1-IT allele forms (Figure S2). Several var fragments from B- or C-type var genes were associated with a naïve immune status and three transcripts from A, DC8 and B-type var genes as well as var2csa were linked to severe malaria patients (Figure 3, Table 3, 4, Table S6, S7).
RNA-seq reads of each patient sample were matched to de novo assembled var contigs with varying length, domain and homology block composition. Shown are significantly differentially expressed var gene contigs with an adjusted p-value of <0.05 in first-time infected (blue) and pre-exposed patient samples (grey) (A, B) as well as severe (red) and non-severe cases (grey) (C, D). Data are displayed as heatmaps showing expression levels in log transformed normalized Salmon read counts for each individual sample (A, C) or as box plot with median log transformed normalized Salmon read counts and interquartile range (IQR) for each group of samples (B, D). Normalized Salmon read counts for all assembled transcripts are available in Table S4.
var transcripts up- and down-regulated in first-time infected patients.
var transcripts up- and down-regulated in severe cases.
Domain level
Similar to Tonkin-Hill et al. (Tonkin-Hill et al, 2018), we quantified domain level expression by aligning reads to their single best ‘hit’ in the combined assembly. Domains were identified within each transcript and the sum of the read counts corresponding to domains with the same classification was calculated to provide domain level read counts. This showed that different EPCR-binding CIDRα1 domain variants and other domains found in DCs with CIDRα1 domains were expressed at significantly higher levels in first-time infected patients (Table 5, Figure 4A, B, Table S6). Thus, besides domains from DC8 (DBLα2, CIDRα1.1, DBLβ12) and DC13 (DBLα1.7, CIDRα1.4), a single domain from DC15 (DBLα1.2) was increased upon infection of malaria-naïve individuals. In all 32 gene assemblies in which the DBLα1.2 domain had an annotated adjacent CIDR domain, it was linked to an EPCR-binding CIDRα1 domain; 56% of these were CIDRα1.5 domains (Table S3, S4). The CIDRα1.6 domain from DC4 failed to reach statistical significance (padj=0.07), but its expression was 2.25-fold higher in parasites infecting naïve patients (Table S6). In addition to domains associated with EPCR-binding PfEMP1, parasites from first-time infected patients showed significantly more transcripts encoding the CIDRδ1 domain of DC16 (DBLα1.5/6-CIDRδ1/2) (Table 5, Figure 4). In general, all domains associated with the same domain cassettes showed the same trend even if some domains failed to reach statistical significance (Table S6). Moreover, the DBLβ6 domain was among the top hits of domains expressed at significantly higher levels in the naïve patient cluster. The DBLβ6 domain is associated with A-type var genes and often found adjacent to DC15 and DC16 (Otto et al, 2019), thus supporting the association of both domain cassettes with a naïve immune status.
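The domain-level aggregation described here can be sketched as a simple group-and-sum. The sketch assumes that each annotated domain on each contig already has a read count assigned to it; the record layout, the function name and the toy values are hypothetical and do not reproduce the study's pipeline. Domain names are written in ASCII (e.g. `CIDRa1.1` for CIDRα1.1).

```python
from collections import defaultdict


def domain_level_counts(records):
    """Sum read counts over all domains that share a classification.

    records: iterable of (contig_id, domain_class, read_count) tuples, where
    read_count is the number of reads assigned to that domain on that contig.
    """
    totals = defaultdict(int)
    for _contig_id, domain_class, read_count in records:
        totals[domain_class] += read_count
    return dict(totals)


# Hypothetical toy records for one sample.
toy_records = [
    ("contig_1", "CIDRa1.1", 120),
    ("contig_1", "DBLa2", 80),
    ("contig_2", "CIDRa1.1", 30),
    ("contig_3", "DBLa0.23", 50),
]
toy_domain_counts = domain_level_counts(toy_records)
```

Summing by classification rather than by contig is what lets sequence-diverse contigs that encode the same domain subtype contribute to a single test.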
Domain types found expressed at lower levels in malaria-naïve patients included several domains from the var1-3D7 allele (DBLα1.4, DBLγ15, DBLε5) as well as NTSB and N-terminal head domains from B- and C-type PfEMP1 (DBLα0.13/22/23) with CD36-binding CIDR domains (CIDRα2.8/9,6) and the C-terminal CIDRγ11 domain (Table 5, Figure 4A, B).
RNA-seq reads of each patient sample were matched to de novo assembled var contigs with varying length, domain and homology block composition. Shown are significantly differentially expressed PfEMP1 domain subfamilies from Rask et al. (Rask et al, 2010) with an adjusted p-value of <0.05 in first-time infected (blue) and pre-exposed patient samples (grey) (A, B) as well as severe (red) and non-severe cases (grey) (C, D) using HMMER3 models. The N-terminal head structure (NTS-DBLα-CIDRα/β/γ/δ) confers a mutually exclusive binding phenotype either to EPCR-, CD36-, CSA- or an unknown receptor. Expression values of the N-terminal domains were summarized for each patient and differences in the distribution among patient groups were tested using the Mann-Whitney U test (E, F). Data are displayed as heatmaps showing expression levels in log transcripts per million (TPM) for each individual sample (A, C) or as box plot with median log TPM and interquartile range (IQR) for each group of samples (B, D, E, F).
var domains defined by Rask et al. (Rask et al, 2010) up- and down-regulated in first-time infected patients.
When comparing the severe sample set to the non-severe, domains of DC8 (DBLα2, CIDRα1.1, DBLβ12) and DC15 (DBLα1.2) were found associated with severe disease (Table 6, Figure 4C, D, Table S7). The DC16 consists either of a DBLα1.5 or 1.6 attached to a CIDRδ1 domain. The DBLα1.6 domain was found expressed at higher levels in severe malaria patients whereas the DBLα1.5 domain was found to be highly expressed in non-severe cases. The CIDRδ1 domain showed no association with disease group, and the DBLα1.5 domain type was generally expressed at a very high level in multiple patient isolates (Table 6, Figure 4). As observed for pre-exposed individuals, domain types expressed at significantly higher levels in non-severe cases included the CIDRα1.3 domain from the var1-3D7 allele as well as N-terminal head domains from B- and C-type PfEMP1 with CD36-binding capacity (DBLα0.23, CIDRα2.4/9) (Table 6, Figure 4C, D).
var domains defined by Rask et al. (Rask et al, 2010) up- and down-regulated in severe cases.
Since the subclassification of domains is debatable (Otto et al, 2019) and different domain subclasses confer the same binding phenotype (Lau et al, 2015), the domains of the N-terminal head structure were grouped according to their binding phenotype and the normalized read counts (TPM) were summarized for each patient (Figure 4E, F). This showed clear differences for DBL and CIDR domains associated with EPCR- or CD36-binding PfEMP1. As expected, the EPCR-binding phenotype as well as the CIDRγ3 domain were associated with the naïve and more severe cases, and the CD36-binding phenotype with the pre-exposed and non-severe patients.
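The grouping-and-testing step can be sketched as follows, assuming per-patient domain TPM values are available as dictionaries. The phenotype map covers only a few example subtypes (CIDRα1.1/4-8 as EPCR-binding, CIDRα2-6 as CD36-binding, as stated in the text), the function names and toy values are hypothetical, and only the Mann-Whitney U statistic is computed; a p-value would need an exact or normal-approximation step, for example from a statistics library.

```python
# Partial, illustrative map from CIDR subtype to binding phenotype (ASCII names).
PHENOTYPE = {
    "CIDRa1.1": "EPCR",
    "CIDRa1.4": "EPCR",
    "CIDRa2.4": "CD36",
    "CIDRa2.9": "CD36",
}


def phenotype_tpm(domain_tpm):
    """Sum one patient's TPM values over domains sharing a binding phenotype."""
    totals = {}
    for domain, tpm in domain_tpm.items():
        phenotype = PHENOTYPE.get(domain)
        if phenotype is not None:  # domains without a mapped phenotype are skipped
            totals[phenotype] = totals.get(phenotype, 0.0) + tpm
    return totals


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical toy patient and toy group values.
toy_patient = {"CIDRa1.1": 10.0, "CIDRa1.4": 5.0, "CIDRa2.9": 3.0, "DBLb6": 7.0}
toy_summary = phenotype_tpm(toy_patient)
toy_u = mann_whitney_u([5, 6, 7], [1, 2, 6])
```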
Homology block level
Within PfEMP1 sequences 628 homology blocks were defined (Rask et al, 2010) and 613 were available for download and subsequent analysis from the VARDOM server (Tonkin-Hill et al, 2018). Homology block expression levels were obtained by aggregating read counts for each block after first identifying all occurrences of the block within the combined transcript assembly. Transcripts encoding blocks number 255, 584 and 614, all typically located within DBLβ domains of DC8 and CIDRα1-containing type A PfEMP1 (Table 7, Figure 5A, B, Table S6), number 557, located in the inter-domain region between DBLβ and DBLγ domains (no PfEMP1 type association) and block number 155 found in NTSA, were found associated with a naïve immune status. All these blocks are found in A-, B/A- or B-type genes. Conversely, transcripts encoding block 88 from DBLα0 domains and 269 from ATSB were found at lower levels in malaria-naïve patients, indicating that B- and C-type genes are more frequently expressed in pre-exposed individuals (Table 7, Figure 5A, B). Two homology blocks, 591 and 559, associated with B-type PfEMP1 were expressed at lower levels in severe malaria cases (Table 8, Figure 5C, D, Table S7).
RNA-seq reads of each patient sample were matched to de novo assembled var contigs with varying length, domain and homology block composition. Shown are significantly differently expressed homology blocks from Rask et al. (Rask et al, 2010) with an adjusted p-value of <0.05 in first-time infected (blue) and pre-exposed patient samples (grey) (A, B) as well as severe (red) and non-severe cases (grey) (C, D). Data are displayed as heatmaps showing expression levels in log transcripts per million (TPM) for each individual sample (A, C) or as box plot with median log TPM and interquartile range (IQR) for each group of samples (B, D).
var blocks defined by Rask et al. (Rask et al, 2010) up- and down-regulated in first-time infected patients.
var blocks defined by Rask et al. (Rask et al, 2010) down-regulated in severe cases.
A summary of the differential var gene expression data on the multi-, single- and subdomain level can be found in Figure 6.
A schematic presentation of typical PfEMP1 domain compositions (A). The N-terminal head structure confers mutually exclusive receptor binding phenotypes: EPCR (yellow: CIDRα1.1/4-8), CD36 (salmon: CIDRα2–6), CSA (VAR2CSA) and yet unknown phenotypes (orange: CIDRβ/γ/δ, red: CIDRα1.2/3 from VAR1, VAR3). Group A includes the conserved subfamilies VAR1 and VAR3, EPCR binding variants and those with unknown binding phenotypes sometimes associated with rosetting. Group B PfEMP1 can have EPCR-binding capacities, but most variants share a four-domain structure with group C-type variants capable of CD36-binding. Dual binders can be found within group A and B with a DBLβ domain responsible for ICAM-1- (DBLβ1/3/5) or gC1qr-binding (DBLβ12). Inter-strain conserved tandem arrangements of domains, so called domain cassettes (DC), can be found within all groups as selectively indicated.
Transcripts, domains and homology blocks according to Rask et al. (Rask et al, 2010) found significantly differentially expressed (p-value <0.05) between patient groups of both comparisons: first-time infected (blue) versus pre-exposed (black) cases and severe (red) versus non-severe (black) cases (B).
ATS: acidic terminal sequence, CIDR: cysteine-rich interdomain region, CSA: chondroitin sulphate A, DBL: Duffy binding-like, DC: domain cassette, EPCR: endothelial protein C receptor, gC1qr: receptor for complement component C1q, ICAM-1: intercellular adhesion molecule 1, NTS: N-terminal segment, TM: transmembrane domain
Var expression profiling by DBLα-tag sequencing
To supplement our RNA-seq analysis with an orthogonal analysis, we performed DBLα-tag RT-PCR combined with deep amplicon sequencing on 30 of our patient samples (Lavstsen et al, 2012). 851 to 3,368 reads with a median of 1,666 over all samples were analyzed. Identical DBLα-tag sequences were clustered to generate relative expression levels of each unique var gene tag. Overall, the relative expression levels were similar for sequences found in both the RNA-seq and the DBLα-tag approach with a mean log2(DBLα-PCR/RNA-seq) of 0.4 (CI of 95%: −2.5–3.3) determined by Bland-Altman plotting (Figure S3). In median, 82.6% of all detected individual DBLα-tag sequence clusters with >10 reads (92.9% of all DBLα-tag reads) were found in the RNA-seq approach; and 81.8% of the upper 75th percentile of RNA-seq contigs (with DBLα tag sequences) were found by the DBLα-tag approach.
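The Bland-Altman comparison of the two methods can be sketched as the mean of the per-sequence log2 ratios together with an interval around it. This sketch assumes the conventional 95% limits of agreement (mean ± 1.96 standard deviations); whether the reported interval was computed in exactly this way is an assumption, and the function name and toy values are hypothetical.

```python
import math
import statistics


def bland_altman_log2(pcr_levels, rnaseq_levels):
    """Return (mean, lower, upper) for log2(PCR / RNA-seq) differences.

    Both inputs hold relative expression levels (> 0) for the same sequences,
    in the same order. Limits are mean +/- 1.96 * sample standard deviation.
    """
    diffs = [math.log2(p / r) for p, r in zip(pcr_levels, rnaseq_levels)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical toy expression levels for four shared sequences.
toy_pcr = [2.0, 4.0, 8.0, 1.0]
toy_rnaseq = [1.0, 4.0, 2.0, 2.0]
toy_mean, toy_lower, toy_upper = bland_altman_log2(toy_pcr, toy_rnaseq)
```

A mean close to 0 indicates no systematic bias between the two methods, while the width of the interval reflects how much individual sequences disagree.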
Unique DBLα-tag sequences were searched for near identical sequences among all known var genes on varDB Version V1.1 (Otto, 2019) using the Varia tool (Mackenzie et al, 2020). Nearly identical database sequences were found for ~85% of the DBLα-tag sequences, allowing prediction of these query genes’ domain annotations (Table S10). In line with the RNA-seq data we found DBLα1 and DBLα2 sequences enriched in first-time infected patients and the severe malaria patients. Conversely, a significantly higher proportion of DBLα0 sequences was found in pre-exposed individuals and mild cases (Figure 7A, B). No difference was observed in the number of reads or unique DBLα-tags detected between patient groups, although a trend towards more DBLα-tag clusters could be observed in first-time infected patients and severe cases (Figure 7A, B). A prediction of the NTS and CIDR domains adjacent to the DBLα domain showed a significantly higher proportion of NTSA in severe cases as well as EPCR-binding CIDRα1 domains in first-time infected and severe cases. Expression of var genes encoding NTSB and CIDRα2-6 domains was significantly associated with pre-exposed and non-severe cases (Figure 7A, B). Analysis of var expression in relation to other domains showed that var transcripts with DBLβ, γ and ζ or CIDRγ domains were more frequently expressed in first-time infected and severe malaria patients, whereas those encoding DBLδ and CIDRβ were less frequent (Figure 7C, D).
Amplified DBLα-tag sequences were blasted against the ~2,400 genomes on varDB (Otto, 2019) to obtain subclassification into DBLα0/1/2 and prediction of adjacent head-structure NTS and CIDR domains and their related binding phenotype. The proportion of each NTS and DBLα subclass as well as of CIDR domains grouped according to binding phenotype (CIDRα1.1/4-8: EPCR-binding, CIDRα2-6: CD36-binding, CIDRβ/γ/δ: unknown binding phenotype/rosetting) was calculated and is shown separately on the left; the number of total reads and of individual sequence clusters with n ≥ 10 sequences is shown on the right. Differences in the distribution among first-time infected (blue) and pre-exposed individuals (grey) (A) as well as severe (red) and non-severe cases (grey) (B) were tested using the Mann-Whitney U test. The boxes represent medians with IQR; the whiskers depict minimum and maximum values (range) with outliers located outside the whiskers. Quantile regression was applied to look for differences between patient groups on the level of domain main classes (left) and subdomains (right). Shown are median differences with 95% confidence intervals for domains with values unequal to 0. Domains with positive values tend to be more highly expressed in naïve (C) and severe patients (D).
When expression was assessed in relation to domain subtype, CIDRα1.1/5, DBLβ12, DBLγ2/12 and DBLα2 were associated with severe malaria, whereas CIDRα3.1/4, DBLα0.12/16 and DBLδ1 were associated with non-severe cases (Figure 7D). Overall, these data corroborated the main observations from the RNA-seq analysis, confirming the association of EPCR-binding PfEMP1 variants with development of severe malaria symptoms and CD36-binding PfEMP1 variants with establishment of less severe infections in semi-immune individuals.
Correlation of var gene expression with antibody levels against head structure CIDR domains
A detailed analysis of the antibody repertoire of the patients against head structure CIDR domains of PfEMP1 was carried out using a panel of 19 different EPCR-binding CIDRα1 domains, 12 CD36-binding CIDRα2-6 domains, three CIDRδ1 domains as well as a single CIDRγ3 domain (Obeng-Adjei et al, 2020; Bachmann et al, 2019). Additionally, the minimal binding region of VAR2CSA was included (Figure 8). Generally, plasma samples from malaria-naïve as well as severe cases showed lower MFI values for all antigens tested in comparison to samples from pre-exposed or non-severe cases (Figure 8A, B). Mann-Whitney U testing revealed significant differences for CIDRα2-6, CIDRδ1 and CIDRγ3, but not for EPCR-binding CIDRα1 domains.
Patient plasma samples (n=32) were subjected to Luminex analysis with 35 PfEMP1 head structure CIDR domains. The panel includes EPCR-binding CIDRα1 domains (n = 19), CD36-binding CIDRα2-6 domains (n = 12) and CIDR domains with unknown binding phenotype (CIDRγ3: n = 1, CIDRδ1: n = 3) as well as the minimal binding region of VAR2CSA (VAR2). Box plots showing mean fluorescence intensities (MFI) extending from the 25th to the 75th percentiles with a line at the median indicate higher reactivity of the pre-exposed (A) and non-severe cases (B) with all PfEMP1 domains tested. Significant differences were observed for recognition of CIDRα2-6, CIDRδ1 and CIDRγ3; VAR2CSA recognition differed only between severe and non-severe cases (Mann Whitney U test). Furthermore, the breadth of IgG recognition (%) of CIDR domains for the different patient groups was calculated and shown as a heat map (C).
Another way of analyzing the samples is to take the average MFI from an unrelated control cohort plus two standard deviations as a cut off for seropositivity for calculation of the coverage of antigen recognition (Cham et al, 2010). By doing so, almost half of the tested antigens were recognized by pre-exposed (median: 46.3%) and non-severe patients (median: 45.1%), but only 1/4 of the antigens were recognized by first-time infected patients (median: 24.4%) and 1/20 by severely ill patients (median: 4.9%). Apart from controls, antigens recognized by over 60% of the pre-exposed and/or non-severe patient sera were i) four CIDRα1 domains capable of EPCR-binding (CIDRα1.5, CIDRα1.6, CIDRα1.7 and the DC8 domain CIDRα1.8), ii) two CD36-binding CIDRα domains (CIDRα2.10, CIDRα3.1) and iii) two CIDR domains with unknown binding phenotype (CIDRδ1 and CIDRγ3) (Figure 8C, Table S1).
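The seropositivity cutoff and the resulting breadth of recognition can be sketched as follows. This is an illustrative example only: it assumes an antigen-specific cutoff (control mean plus two sample standard deviations), counts an antigen as recognized when the patient MFI strictly exceeds the cutoff, and uses hypothetical antigen labels and MFI values.

```python
import statistics

def seropositivity_cutoff(control_mfi, n_sd=2.0):
    """Cutoff = mean of control MFI values + n_sd * sample SD."""
    return statistics.mean(control_mfi) + n_sd * statistics.stdev(control_mfi)

def recognition_breadth(patient_mfi, control_mfi_by_antigen, n_sd=2.0):
    """Percent of antigens whose MFI exceeds the antigen-specific cutoff."""
    positive = 0
    for antigen, mfi in patient_mfi.items():
        cutoff = seropositivity_cutoff(control_mfi_by_antigen[antigen], n_sd)
        if mfi > cutoff:
            positive += 1
    return 100.0 * positive / len(patient_mfi)

# Hypothetical MFI values for an unexposed control cohort (three sera).
controls = {
    "CIDRa1.5": [100, 120, 140],
    "CIDRa2.10": [200, 200, 260],
    "CIDRg3": [50, 70, 90],
    "CIDRd1": [80, 100, 120],
}
# Hypothetical MFI values for one patient plasma sample.
patient = {"CIDRa1.5": 500, "CIDRa2.10": 250, "CIDRg3": 111, "CIDRd1": 90}
breadth = recognition_breadth(patient, controls)
```

Because the cutoff is derived per antigen, breadth values remain comparable between patients even though raw MFI values are not comparable between antigens.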
Differential gene expression analysis of core genes
Global gene expression analysis (Tonkin-Hill et al, 2018) identified 420 genes expressed at higher levels and 236 genes expressed at lower levels in first-time infected patients, together corresponding to 11.3% of all P. falciparum genes (Table S9). Gene set enrichment analysis (GSEA) of GO terms and KEGG pathways showed a significantly lower expression level for genes with several GO terms involved in antigenic variation and host cell remodeling in first-time infected patients (Figure 9A, Table S9). These results may be distorted, since variant surface antigens like var, rif, stevor, surf and pfmc-2tm are largely clone-specific and reads from the clinical isolates would map only to homologous regions in 3D7 genes, which may not actually be present in the clinical isolate. Therefore, we manually screened differentially expressed genes known to be involved in var gene regulation or correct display of PfEMP1 at the host cell surface (Table S9). At the single-gene level, genes encoding the Maurer’s cleft proteins MAHRP1 and REX2 (Spycher et al, 2003; Spielmann et al, 2006), REX4 and MSRP7 located within the host cell cytosol (Spielmann et al, 2006; Heiber et al, 2013), the erythrocyte membrane-located glycophorin binding protein GBP130 (Perkins, 1988) as well as Sir2a and SWIB known to be involved in var gene regulation (Tonkin et al, 2009; Wang & Zhang, 2020) were expressed at significantly lower levels in malaria-naïve patients. MAHRP1 is essential for translocation of PfEMP1 to the surface of infected erythrocytes (Spycher et al, 2006, 2008) and was suggested to be part of the PfEMP1 loading hub (McHugh et al, 2020); in contrast, deletion of the REX2 and REX4-encoding genes via chromosome breakage was associated with the loss of cytoadherence, but not with aberrant trafficking of PfEMP1 (Nacer et al, 2011; Chaiyaroj et al, 1994; Day et al, 1993).
Genetic ablation of GBP130 increased the membrane rigidity of infected erythrocytes without negative impact on cytoadherence under flow conditions (Maier et al, 2008).
Gene set enrichment analysis (GSEA) of GO terms and KEGG pathways indicates gene sets deregulated in first-time infected malaria patients. GO terms related to antigenic variation and host cell remodeling are significantly down-regulated; only the KEGG pathway 03410 ‘base excision repair’ shows a significant upregulation in malaria-naïve patients (A). Log fold changes (logFC) for the 15 P. falciparum genes assigned to the KEGG pathway 03410 ‘base excision repair’ are plotted with the six significant hits marked with * for p<0.05 and ** for p<0.01 (B).
In contrast, several PHIST encoding genes were found expressed at higher levels in first-time infected patients including the lysine-rich membrane-associated PHISTb protein (LyMP), previously reported to interact with the ATS domain (Oberli et al, 2014, 2016). Furthermore, the exported proteins FIKK9.6 (Nunes et al, 2007), MSRP5 and MSRP6 (Heiber et al, 2013) as well as PF3D7_0721100 showed a significant increase compared to pre-exposed patients. The conserved Plasmodium protein of unknown function PF3D7_0721100 detected in detergent-resistant fractions of the red blood cell membrane (Yam et al, 2013) was also found in a putative PfEMP1 unloading hub together with REX1, MAHRP2, PTP5 and PF3D7_1353100 using protein interaction network analysis (McHugh et al, 2020). On the level of var gene expression regulation, the chromatin-associated exoribonuclease PfRNase II (PF3D7_0906000) was increased in first-time infected patients (Zhang et al, 2014).
Most of the associations with a naïve immune status were also observed in the comparison of severe versus non-severe cases (Table S10). Genes expressed at significantly higher levels included those encoding several PHIST and HYP proteins, MSRP5, PfRNase II, PF3D7_0721100 and, additionally, SET1, PTP1, SBP1, PTP5 and PF3D7_1353100. Overall, three proteins defined by McHugh et al. as members of the PfEMP1 unloading hub were expressed at significantly higher levels in severe malaria cases (PTP5, PF3D7_0721100 and PF3D7_1353100) and all remaining proteins showed the same trend for upregulation (PIESP2, PfJ23, REX1 and MAHRP2) (McHugh et al, 2020).
Furthermore, the KEGG pathway 03410 ‘base excision repair’, which facilitates the maintenance of genome integrity by repairing small base lesions in the DNA, was expressed at significantly higher levels in first-time infected patient samples (Figure 9A, Figure S4). In total, six out of 15 P. falciparum genes included in the KEGG pathway were found to be significantly enriched upon first-time infection, more precisely the putative endonuclease III (PF3D7_0614800) from the short-patch pathway and the putative A-/G-specific adenine glycosylase (PF3D7_1129500), the putative apurinic/apyrimidinic endonuclease Apn1 (PF3D7_1332600), the proliferating cell nuclear antigen 1 (PF3D7_1361900), and the catalytic (PF3D7_1017000) and small (PF3D7_0308000) subunits of DNA polymerase delta from the long-patch pathway (Figure 9B).
Discussion
Analysis of blood samples from travelers returning to Germany and hospitalized with P. falciparum malaria for the first time allows studies of parasite development and gene expression unaffected by host immune responses elicited in previous Plasmodium infections. Here we analyzed gene expression in 32 such patients using direct ex vivo RNA-seq to exclude transcriptional adaptation through in vitro cultivation. Most of the returning travelers studied were infected with a single or very few parasite genotypes, which likely simplified the analysis of var gene expression due to the restricted genomic repertoire. We were able to distinguish retrospectively between first-time infected and pre-exposed patients by medical history and by assessing the presence of antibodies to P. falciparum antigens. Eight out of 15 first-time infected patients were classified as severe malaria cases due to impaired organ function according to their medical reports. Despite the relatively low number of patient samples, the RNA-seq approach confirmed previously reported associations between transcripts encoding type A and B EPCR-binding PfEMP1 and infections in naïve hosts and disease severity (Table S6, S7) (Duffy et al, 2019; Tonkin-Hill et al, 2018; Kessler et al, 2017; Bernabeu et al, 2016; Jespersen et al, 2016; Lavstsen et al, 2012).
Overall, there was a high degree of consensus between analyses of the var transcriptome data in relation to the different levels of PfEMP1 domain annotation. Stratifying var gene expression according to the different main types and subtypes of DBL and CIDR domains showed only A- and DC8-type PfEMP1 domains, and predominantly those linked to EPCR-binding PfEMP1, to be associated with first-time infections. Conversely, domains typical for CD36-binding PfEMP1 proteins were found at higher levels in malaria-experienced patients. Specifically, expression of PfEMP1 domains included in DC8, DC13 and DC15 as well as all EPCR-binding CIDRα1 domains was associated with first-time infections, whereas DBLα0 and CD36-binding CIDRα2-6 domains were linked to pre-exposed individuals. These differences were largely due to the differential expression between the first-time infected patients with more severe symptoms and patients with non-severe malaria. Here, domains of DC8 and DC15 as well as all DBLα1/2 and CIDRα1 domains were associated with severe symptoms, while NTSB, DBLα0 and CIDRα2-6 domains, including specific subsets of CIDRα2, were linked to non-severe symptoms. These conclusions were closely mirrored in the DBLα-tag analysis and were further corroborated by the differential RNA-seq expression stratified according to the smaller homology blocks, which identified mainly homology blocks of DBLβ1, 3, 5 and 12 domains to be associated with first-time infected patients. These DBLβ domains are parts of DCs associated with EPCR-binding, so it is hard to distinguish between co-occurring domains and clear associations. DBLβ domains do not segregate distinctly by sequence similarity into groups reflecting their observed binding to ICAM-1 and gC1qR (Otto et al, 2019). Best defined are the ICAM-1-binding DBLβ5 domains found in CD36-binding B-type PfEMP1 (Janes et al, 2011; Lennartz et al, 2019) and the ICAM-1-binding DBLβ1 and 3 domains found in EPCR-binding type-A PfEMP1 (Lennartz et al, 2017).
The relative importance of ICAM-1 binding to CD36- or EPCR-binding PfEMP1 is not well understood, but ICAM-1 binding is believed to contribute to disease severity by either tethering endothelial binding (Bernabeu et al, 2019) or initiating or securing endothelial sequestration on inflamed endothelium, which is likely to shed EPCR (Jensen et al, 2020).
In addition, three other A-type-associated domains, CIDRγ3, CIDRδ from DC16 and DBLβ9 from DC5, were found associated with first-time infected patients. DC5 could have been detected due to its presence C-terminally to some EPCR-binding PfEMP1. However, the CIDRδ domain of DC16 (DBLα1.5/6-CIDRδ1/2) constitutes a different subset of A-type PfEMP1, which together with A-type PfEMP1 with CIDRβ2 (found in DC11) or CIDRγ3 domains may be associated with rosetting (Carlson et al, 1990; Ghumra et al, 2012). Direct evidence that any of these CIDR domains have intrinsic rosetting properties is lacking (Rowe et al, 2002). Rather, their association with rosetting may be related to their tandem expression with DBLα1 at the N-terminal head (Ghumra et al, 2012). The CIDRδ domain was not associated with the severe malaria patient group, and the two different DC16-associated DBLα1 domains were found associated with severe and non-severe malaria, respectively. CIDRγ3 expression was low, but it was found at higher levels in severe malaria patients.
The DC16 group A signature was not associated with severe disease outcome in previous DBLα-tag studies or qPCR studies by Lavstsen et al. (Lavstsen et al, 2012) and Bernabeu et al. (Bernabeu et al, 2016), but DBLα1.5/6 and CIDRδ of DC16 were enriched in cerebral malaria cases with retinopathy in the studies of Shabani et al. (Shabani et al, 2017) and Kessler et al. (Kessler et al, 2017) using the same qPCR primer set. Also, an association of DC11 with severe malaria in Indonesia was found using the same RNA-seq approach as used here (Tonkin-Hill et al, 2018). Rosetting is thought to enhance microvascular obstruction, but the role of rosetting in severe malaria pathogenesis remains unclear (McQuaid & Rowe, 2020). Together with previous observations, our data suggest that pediatric cerebral malaria infections are dominated by the expansion of parasites expressing EPCR-binding domains accompanied by parasites expressing other group A PfEMP1, possibly rosetting variants.
To the best of our knowledge, this study is the first description of expression differences between the two var1 alleles, 3D7 and IT. At the transcript level the var1-IT allele was found to be enriched in parasites from first-time infected patients; conversely, several transcripts covering almost the full-length protein and in total half of the domains from the var1-3D7 allele were increased in pre-exposed and non-severely ill patients. Expression of the var1 gene was previously observed to be elevated in malaria cases imported to France with an uncomplicated disease phenotype (Argy et al, 2017). In general, the var1 subfamily is ubiquitously transcribed (Winter et al, 2003; Duffy et al, 2006), atypically late in the cell cycle after transcription of var genes encoding the adhesion phenotype (Kyes et al, 2003; Duffy et al, 2002) and is annotated as a pseudogene in 3D7 due to its premature truncation. Similarly, numerous isolates display frame-shift mutations often in exon 2 in the full gene sequences (Rask et al, 2010). However, none of these studies addressed differences in the two var1-alleles just recently described by comparing var gene sequences from 714 P. falciparum genomes (Otto et al, 2019) and to date it is unclear if both allele forms fulfill the same function or harbor the same characteristics previously described. Overall, the var1 gene – and the first 3.2 kb of the 3D7 allele in particular - seems to be under high evolutionary pressure (Otto et al, 2019) and the bi-allelic pattern can be traced back before the split of P. reichenowi from P. praefalciparum and P. falciparum (Otto et al, 2018b). Our data indicate that the two alleles, 3D7 and IT, may have different roles during disease, however, this remains to be determined in future studies.
In general, data from immunologically naïve malaria patients are rather limited, restricting our comparison mainly to the severe phenotype described in numerous previous studies. However, a recent study explored var gene expression during infancy in Kenyan children and could correlate the waning of maternal antibodies with increasing transcription of DC8, DC13 and A-type var genes in general. After the first year of life the amount of these transcripts decreases with age and acquired immunity (Kivisi et al, 2019). A high expression of A-type var genes in naïve malaria cases imported to France and an association of DC4, 8 and 13 with disease severity have also been reported (Argy et al, 2017). Both qPCR studies are in agreement with the RNA-seq data from our cohort of immunologically malaria-naïve adults, but we could extend the list with DC15 and DC16, which are presumably involved in binding of infected erythrocytes to EPCR and may also mediate binding to uninfected red blood cells by rosetting.
Overall, most studies – including this one – are looking at differentially expressed genes. From CHMI studies we already know that at the early onset of infection the parasite population expresses a wide range of different var genes located in the subtelomeric regions (A- and mainly B-type var genes). Since B-types are also highly expressed in pre-exposed cases, domains of these PfEMP1s may be missing in the pattern of first-time infected patients. Furthermore, EPCR- and ICAM-1-binding and rosetting-mediating variants may confer a parasite growth advantage in malaria-naïve hosts, and in some circumstances increase the risk for severe malaria, so that a selection towards these variants may already have occurred in our patients. This would fit with the observation that the severe patients within the malaria-naïve patient group seem to have been infected for a longer time period. Moreover, patients with preformed immunity recognize several CIDRα1 domains capable of EPCR-binding as well as the two atypical CIDR head domains CIDRδ1 and CIDRγ3 more frequently than CIDRα2-6 domains with CD36-binding affinity. This is in agreement with studies from malaria-endemic settings indicating that IgG against EPCR-binding domains was acquired first, followed by IgG against domains with unknown binding phenotypes associated with rosetting and against CD36-binding domains, resulting in the earlier acquisition of antibodies against DBL and CIDR domains of group A and B/A, associated with EPCR binding, than against B- and C-type domains (Obeng-Adjei et al, 2020; Cham et al, 2009, 2010; Turner et al, 2015).
Genes involved in PfEMP1 biology were found to be expressed at lower levels in severe malaria patients (Tonkin-Hill et al, 2018). In concordance, GO term analysis revealed a generally lower expression of genes involved in antigenic variation or host cell remodeling during first-time infection and severe disease, but the expression analysis of clonally variant genes is complicated by the existence of multiple diverse families that have strain-specific members, leading to mis-mapping of reads to genes present in the 3D7 strain that do not exist in the clinical samples analyzed. Therefore, manually selected genes encoding regulators of var gene expression and PfEMP1 trafficking were additionally inspected for association with a naïve immune status and severity. The NAD+-dependent histone deacetylases Sir2a and Sir2b remove acetyl groups from the N-terminal tails of histone 3 and 4 and are therefore considered as var silencing factors. Sir2b regulates the most telomeric B-type var genes, whereas Sir2a regulates var genes of type A, C and E by deacetylation of H3K9ac, H3K14ac and H4K16ac (Duffy et al, 2014; Tonkin et al, 2009). A down-regulation of Sir2a indicates an elevated expression of these var types, which is in concordance with our data on A- and E-type var genes found more frequently in first-time infected patients. In contrast to a previous study showing that the exoribonuclease PfRNase II controls the silencing of A-type var genes and was negatively associated with var-A expression in severe malaria, PfRNase II expression was significantly enriched in first-time infected and severely ill patients (Zhang et al, 2014). Our results also contradicted the down-regulation of A- and partially B-type var genes by conditional knockout of SWIB (Wang & Zhang, 2020). However, high expression levels of the var1-3D7 allele, also an A-type var gene, in pre-exposed and non-severe cases may be responsible for these opposite results.
On the other hand, expression of genes encoding other factors was also found inverted in comparison to other studies. GBP was found to be highly enriched in severe relative to uncomplicated malaria cases (Lee et al, 2018), but is significantly lower expressed in first-time infected patients and has the same trend in severe cases in this study. Contrarily, LyMP, SBP1 and SET proteins were expressed on a significantly lower level in severe cases in the study of Tonkin-Hill et al. (Tonkin-Hill et al, 2018), but elevated levels were found in our study.
A recent study showed that parasites isolated from symptomatic infections were on average younger than blood-circulating parasites from asymptomatic infections presumably due to a more efficient sequestration of parasites in the symptomatic cases (Andrade et al, 2020). Based on this and our findings we hypothesize that parasites circulating in severely ill patients are younger due to the expression of EPCR-/ICAM-1-binding PfEMP1 variants, whereas parasites in non-severe patients may circulate longer due to their expression of PfEMP1 binding to CD36. Although affinity of the CIDR domains to each of these receptors was shown to be similar with median Kd values of 12 nM for CD36 (Hsieh et al, 2016) and 16.5 nM for EPCR (Lau et al, 2015), recent papers describe a rolling binding phenotype of infected red blood cells over CD36 and a static binding for EPCR and ICAM-1 under flow conditions (Lubiana et al, 2020; Bernabeu et al, 2019; Dasanna et al, 2017; Herricks et al, 2013). Due to their biconcave shape trophozoites seem to roll faster but less stable by flipping over CD36-expressing cells, whereas schizonts roll over longer distances at different shear stresses applied. This might explain why young trophozoites are found in blood samples from non-severe cases, but not older trophozoites or schizont stages as also previously described (Tonkin-Hill et al, 2018).
The parasite population in first-time infected individuals may have broader binding potential after liver release as there is no pre-existing immunity to clear previously experienced PfEMP1 variants, but during the blood stage infection variants with high-affinity binding to EPCR and ICAM-1 are selected, which may lead to severe symptoms. This hypothesis is supported by (1) the difference in parasite age between severe and non-severe malaria cases calculated by matching RNA-seq data to a reference data set (López-Barragán et al, 2011); (2) the correlation of this difference with a higher var1 expression in parasites from non-severe and pre-exposed patients, as var1 expression is not suppressed in trophozoites (Kyes et al, 2003); and (3) the significantly increased expression of EPCR- and ICAM-1-binding variants in parasites from severe and first-time infected patients, whereas transcripts of CD36-binding variants are found more frequently in parasites from non-severe and pre-exposed patients. For parasite survival and transmission, it may be highly beneficial to have more of the less virulent PfEMP1 variants able to bind CD36. This interaction may not, or is less likely to, lead to obstruction of blood flow, inflammation and organ failure, at least in the brain, where CD36 is nearly absent.
Material and methods
Ethics statement
The study was conducted according to the principles of the Declaration of Helsinki in its 6th revision as well as International Conference on Harmonization–Good Clinical Practice (ICH-GCP) guidelines. All patients, aged 19 to 70 years, provided written informed consent for this study, which was approved by the Ethical Review Board of the Medical Association of Hamburg (reference number PV3828).
Blood sampling and processing
Blood samples from P. falciparum malaria patients collected either at the diagnostic unit of the Bernhard Nocht Institute for Tropical Medicine, Hamburg, Germany, or at the Medical Clinic and Polyclinic of University Clinic Hamburg-Eppendorf, Hamburg, Germany, were used in this study. EDTA blood samples (1–30 mL) were obtained from the patients. The plasma was separated by centrifugation and immediately stored at −20°C. Erythrocytes were isolated by Ficoll gradient centrifugation followed by filtration through Plasmodipur filters (EuroProxima) to clear the remaining granulocytes. An aliquot of red blood cells (about 50–100 μl) was separated and further processed for gDNA purification. At least 400 μl of purified erythrocytes were rapidly lysed in 5 volumes pre-warmed TRIzol (ThermoFisher Scientific) and stored at −80°C until further processing (ex vivo samples, n = 32).
Serological assays
Luminex assay
The Luminex assay was conducted as previously described using the same plex of antigens tested (Bachmann et al, 2019). In brief, plasma samples from patients were screened for individual recognition of 19 different CIDRα1, 12 CIDRα2–6, three CIDRδ1 domains and a single CIDRγ3 domain as well as of the controls AMA1, MSP1, CSP, VAR2CSA (VAR2), tetanus toxin (TetTox) and BSA. The data are shown as mean fluorescence intensities (MFI) allowing comparison between different plasma samples, but not between different antigens. Alternatively, the breadth of antibody recognition (%) was calculated using MFI values from Danish controls plus two standard deviations (SD) as cut off.
Merozoite-triggered antibody-dependent respiratory burst (mADRB)
The assay to determine the mADRB activity of the patients was set up as described before (Kapelski et al, 2014). Polymorphonuclear neutrophil granulocytes (PMNs) from one healthy volunteer were isolated by a combination of dextran-sedimentation and Ficoll-gradient centrifugation. Meanwhile, 1.25 × 10⁶ merozoites were incubated with 50 μl of 1:5 diluted plasma (decomplemented) for 2 h. The opsonized merozoites were pelleted (20 min, 1500 g), re-suspended in 25 μl HBSS and then transferred to a previously blocked well of an opaque 96 well high-binding plate (Greiner Bio-One). Chemiluminescence was detected in HBSS using 83.3 μM luminol and 1.5 × 10⁵ PMNs at 37°C for 1 h to characterize the PMN response, with readings taken at 2 min intervals using a multiplate reader (CLARIOstar, BMG Labtech). PMNs were added in the dark, immediately before readings were initiated.
ELISA
Antibody reactivity against parasitophorous vacuolar membrane-enclosed merozoite structures (PEMS) was estimated by ELISA. PEMS were isolated as described before (Llewellyn et al, 2015). For the ELISA, 0.625 × 10⁵ PEMS were coated on the ELISA plates in PBS. Plates were blocked using 1% Casein (Thermo Scientific #37528) and incubated for 2 h at 37°C. After washing using PBS/0.1% Tween, plasma samples were added at two-fold dilutions of 1:200 to 1:12800 in PBS/0.1% Casein. The samples were incubated for 2 h at room temperature (RT). IgG was quantified using HRP-conjugated goat anti-human IgG at a dilution of 1:20,000 and incubated for 1 h. For the color reaction, 50 μl of TMB substrate was used and stopped by adding 1 M HCl after 20 min. Absorbance was quantified at 450 nm using a multiplate reader (CLARIOstar, BMG Labtech).
Protein microarray
Microarrays were produced at the University of California Irvine, Irvine, California, USA (Doolan et al, 2008). In total, 262 P. falciparum proteins representing 228 unique antigens were expressed using an E. coli lysate in vitro expression system and spotted on a 16-pad ONCYTE AVID slide. The selected P. falciparum antigens are known to frequently provide a positive signal when tested with sera from individuals with sterile and naturally acquired immunity against the parasite (Obiero et al, 2019; Dent et al, 2016; Doolan et al, 2008; Felgner et al, 2013). For the detection of binding antibodies, secondary IgG antibody (goat anti-human IgG QDot™800, Grace Bio-Labs #110635), secondary IgM antibody (biotin-SP-conjugated goat anti-human IgM, Jackson ImmunoResearch #109-065-043) and Qdot™585 Streptavidin Conjugate (Invitrogen #Q10111MP) were used (Taghavian et al, 2018).
Study serum samples as well as the European control serum were diluted 1:50 in 0.05X Super G Blocking Buffer (Grace Bio-Labs, Inc.) containing 10% E. coli lysate (GenScript, Piscataway, NJ) and incubated for 30 minutes on a shaker at RT. Meanwhile, microarray slides were rehydrated using 0.05X Super G Blocking buffer at RT. Rehydration buffer was subsequently removed and samples added onto the slides. Arrays were incubated overnight at 4°C on a shaker (180 rpm). Serum samples were removed the following day and microarrays were washed using 1X TBST buffer (Grace Bio-Labs, Inc.). Secondary antibodies were then applied at a dilution of 1:200 and incubated for two hours at RT on the shaker, followed by another washing step and a one-hour incubation in a 1:250 dilution of Qdot™585 Streptavidin Conjugate. After a final washing step, slides were dried by centrifugation at 500 g for 10 minutes. Slide images were taken using the ArrayCAM® Imaging System (Grace Bio-Labs) and the ArrayCAM 400-S Microarray Imager Software.
Microarray data were analyzed in R statistical software package version 3.6.2. All images were manually checked for any noise signal. Each antigen spot signal was corrected for local background reactivity by applying a normal-exponential convolution model (McGee & Chen, 2006) using the RMA-75 algorithm for parameter estimation (available in the LIMMA package v3.28.14) (Silver et al, 2009). Data was log2-transformed and further normalized by subtraction of the median signal intensity of mock expression spots on the particular array to correct for background activity of antibodies binding to E. coli lysate. After log2 transformation data approached normal distribution. Differential antibody levels (protein array signal) in the different patient groups were determined by Welch-corrected Student’s t-test. Antigens with p<0.05 and a fold change >2 of mean signal intensities were defined as differentially recognized between the tested sample groups. Volcano plots were generated using the PAA package (Turewicz et al, 2016) and GraphPad Prism 8. Individual antibody breadths were defined as number of seropositive features with signal intensities exceeding an antigen-specific threshold set at six standard deviations above the mean intensity in negative control samples.
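The normalization and breadth steps described above can be sketched in a few lines. This is a simplified illustration, not the R/LIMMA workflow used in the study: the normal-exponential background correction is omitted, the median is taken over log2-transformed mock spot signals (an assumption), and all antigen names and signal values are hypothetical.

```python
import math
import statistics

def normalize_array(antigen_signals, mock_signals):
    """log2-transform spot signals and subtract the median log2 mock signal."""
    mock_median = statistics.median([math.log2(s) for s in mock_signals])
    return {name: math.log2(s) - mock_median
            for name, s in antigen_signals.items()}

def antibody_breadth(normalized, negative_controls, n_sd=6.0):
    """Count antigens above mean + n_sd * SD of the negative-control signals."""
    breadth = 0
    for name, value in normalized.items():
        neg = negative_controls[name]
        threshold = statistics.mean(neg) + n_sd * statistics.stdev(neg)
        if value > threshold:
            breadth += 1
    return breadth

# Hypothetical background-corrected raw signals for one array.
signals = {"AgA": 4096.0, "AgB": 512.0}
mock_spots = [256.0, 512.0, 1024.0]
normalized = normalize_array(signals, mock_spots)

# Hypothetical normalized signals of three negative control sera per antigen.
negatives = {"AgA": [0.0, 0.2, 0.4], "AgB": [0.0, 0.2, 0.4]}
breadth = antibody_breadth(normalized, negatives)
```

Subtracting the mock-spot median on the log2 scale expresses each antigen signal as a log2 fold change over the E. coli lysate background of the same array.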
Unsupervised random forest model
An unsupervised random forest (RF) model, a machine learning method based on multiple classification and regression trees, was calculated to estimate proximity between patients. Variable importance was calculated, which shows the decrease in prediction accuracy if values of a variable are permuted randomly. The k-medoids clustering method was applied on the proximity matrix to group patients according to their serological profile. Input data for the RF model were Luminex measurements for MSP1, AMA1 and CSP reduced by principal component analysis (PCA; first principal component selected), mADRB, ELISA, and antibody breadth of IgG and IgM determined by protein microarray. Multidimensional scaling was used to display patient clusters. All analyses were done with R (4.02) using the packages randomForest (4.6-14) to run RF models and cluster (2.1.0) for k-medoids clustering.
DNA purification and MSP1 genotyping
Genomic DNA was isolated using the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s protocol. To assess the number of P. falciparum genotypes present in the patient isolates, MSP1 genotyping was carried out as described elsewhere (Robert et al, 1996).
RNA extraction, RNA-seq library preparation, and sequencing
TRIzol samples were thawed, mixed vigorously with 0.2 volumes of cold chloroform and incubated for 3 min at room temperature. After centrifugation for 30 min at 4°C and maximum speed, the supernatant was carefully transferred to a new tube and mixed with an equal volume of 70% ethanol. Afterwards, the manufacturer’s instructions for the RNeasy MinElute Kit (Qiagen) were followed, with DNase digestion (DNase I, Qiagen) for 30 min on column. Elution of the RNA was carried out in 14 μl. Human globin mRNA was depleted from all samples except samples #1 and #2 using the GLOBINclear kit (ThermoFisher Scientific). The quality of the RNA was assessed using the Agilent 6000 Pico kit with the Bioanalyzer 2100 (Agilent) (Figure S5), and the RNA quantity using the Qubit RNA HA assay kit and a Qubit 3.0 fluorometer (ThermoFisher Scientific). Upon arrival at BGI Genomics Co. (Hong Kong), the RNA quality of each sample was double-checked before sequencing. The median RIN value over all ex vivo samples was 6.75 (IQR: 5.93–7.40) (Figure S5), although this measurement has only limited significance for samples containing RNA of two species. Customized library construction in accordance with Tonkin-Hill et al. (Tonkin-Hill et al, 2018), including amplification with KAPA polymerase and HiSeq 2500 100 bp paired-end sequencing, was also performed by BGI Genomics Co. (Hong Kong).
RNA-seq read mapping and data analysis
Var gene assembly
Var genes were assembled using the pipeline described in Tonkin-Hill et al. (Tonkin-Hill et al, 2018). Briefly, non-var reads were first filtered out by removing reads that aligned to H. sapiens, P. vivax or non-var P. falciparum. Assembly of the remaining reads was then performed using a pipeline combining SOAPdenovo-Trans and Cap3 (Xie et al, 2014; Huang & Madan, 1999; Liao et al, 2013). Finally, contaminants were removed from the resulting contigs and they were then translated into the correct reading frame. Reads were mapped to the contigs using BWA-MEM (Li, 2013) and RPKM values were calculated for each var transcript to compare individual transcript levels in each patient. Although transcripts might be differentially covered by RNA-seq due to their variable GC content, this seems not to be an issue between var genes (Tonkin-Hill et al, 2018).
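As an illustration of the RPKM calculation, the following minimal sketch applies the standard definition (reads per kilobase of transcript per million mapped reads). It assumes that the total is the number of reads mapped to all var contigs of the same isolate; the contig names, read counts and lengths are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical contigs of one isolate: name -> (mapped reads, length in bp).
contigs = {
    "contig_1": (500, 2000),
    "contig_2": (1500, 6000),
}

# Assumption: normalization uses the reads mapped to all contigs of this isolate.
total_reads = sum(reads for reads, _ in contigs.values())
rpkm_values = {
    name: rpkm(reads, length, total_reads)
    for name, (reads, length) in contigs.items()
}
print(rpkm_values)
```

The two hypothetical contigs obtain the same RPKM although contig_2 has three times as many reads, because it is also three times as long. Values normalized in this way are comparable within an isolate, not between isolates.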
Var transcript differential expression
Expression for the assembled var genes was quantified using Salmon v0.14.1 (Patro et al, 2017) for 531 transcripts with five read counts in at least 3 patient isolates. Both the naïve and pre-exposed groups as well as the severe and non-severe groups were compared. The combined set of all de novo assembled transcripts was used as a reference in addition to the coding regions of the 3D7 reference genome. The Salmon algorithm identifies equivalence sets between transcripts, allowing a single read to support the expression of multiple transcripts. This enables it to account for the redundancy present in our dataset. To confirm the suitability of this approach we also ran the Corset algorithm as used in Tonkin-Hill et al. (Tonkin-Hill et al, 2018; Davidson & Oshlack, 2014). Unlike Salmon, which attempts to quantify the expression of the transcripts themselves, Corset copes with the redundancy present in de novo transcriptome assemblies by clustering similar transcripts together using both the sequence identity of the transcripts and multi-mapping read alignments. Of the transcripts identified using the Salmon analysis, 5/15 in the naïve versus pre-exposed comparison and 4/13 in the severe versus non-severe comparison were identified in the significant clusters produced using Corset. As the two algorithms take very different approaches, and as Salmon quantifies transcripts rather than the ‘gene’-like clusters of Corset, this represents a fairly reasonable level of concordance between the two methods. In both the Salmon and Corset pipelines, differential expression analysis of the resulting var expression values was performed using DESeq2 v1.26 (Love et al, 2014). The Benjamini-Hochberg method was used to control for multiple testing (Benjamini & Hochberg, 1995).
To check differential expression of the conserved var gene variants var1-3D7, var1-IT and var2csa, raw reads were mapped with BWA-MEM (AS score >110) to the reference genes from the 3D7 and the IT strains. The mapped raw read counts (bam files) were normalized with the number of 3D7 mappable reads in each isolate using bamCoverage by introducing a scaling factor to generate bigwig files displayed in Artemis (Carver et al, 2012).
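The expression filter and the multiple-testing correction can be sketched as follows. This is a plain re-implementation for illustration only, not the DESeq2 code; it assumes that a transcript is kept when at least three isolates have five or more reads, and the counts and p-values are hypothetical.

```python
def passes_expression_filter(counts, min_reads=5, min_samples=3):
    """Keep a transcript with >= min_reads reads in >= min_samples isolates.

    Assumption: 'five read counts' is read as 'at least five'.
    """
    return sum(1 for c in counts if c >= min_reads) >= min_samples

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical read counts per isolate for two transcripts.
kept = passes_expression_filter([0, 7, 5, 12, 1])
dropped = passes_expression_filter([6, 0, 0, 9])

# Hypothetical raw p-values from a differential expression test.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(kept, dropped, adjusted)
```

The adjusted values never fall below the raw p-values and preserve their ordering, which is the property that the step-up procedure guarantees.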
Var domain and segment differential expression
Differential expression analysis at the domain and segment level was performed using a similar approach to that described previously (Tonkin-Hill et al, 2018). Initially, the domain families and homology blocks defined in Rask et al. were annotated to the assembled transcripts using HMMER v3.1b2 (Rask et al, 2010; Eddy, 2011). Domains and segments previously identified to be significantly associated with severe disease in Tonkin-Hill et al., 2018 were also annotated by single pairwise comparison in the assembled transcripts using USEARCH v11.0.667 (Tonkin-Hill et al, 2018; Edgar, 2010). Overall, 336 contigs (5.22% of all contigs >500 bp) possess partial domains in an unusual order, e.g. an NTS in an internal region or a tandem arrangement of two DBLα or CIDRα domains. This might be caused by de novo assembly errors, since assembly from transcriptome data is challenging. Therefore, in both cases the domain or segment with the most significant alignment was taken as the best annotation for each region of the assembled var transcripts (E-value cutoff of 1e-8), once with the additional requirement that at least 60% of the domain was found. The expression at each of these annotations was then quantified using featureCounts v1.6.4 before the counts were aggregated to give a total for each domain and segment family in each sample. Finally, similar to the transcript level analysis, DESeq2 was used to test for differences in expression levels of both domain and segment families in the naïve versus pre-exposed groups as well as the severe versus non-severe groups. Again, more than five read counts in at least three patient isolates were required for inclusion in the differential expression analysis.
Differential expression of core genes
Differential gene expression analysis of P. falciparum core genes was done in accordance with Tonkin-Hill et al. (Tonkin-Hill et al, 2018) by applying the script as given in the Github repository (https://github.com/gtonkinhill/falciparum_transcriptome_manuscript/tree/master/all_gene_analysis). In brief, subread-align v1.4.6 (Liao et al, 2013) was used to align the reads to the H. sapiens and P. falciparum reference genomes. Read counts for each gene were obtained with FeatureCounts v1.20.2 (Liao et al, 2014). To account for the parasite life cycle, each sample was considered as a composition of six parasite life cycle stages excluding the ookinete stage (López-Barragán et al, 2011). Unwanted variations were determined with the ‘RUV’ (Remove Unwanted Variation) algorithm implemented in the R package ruv v0.9.6 (Gagnon-Bartsch & Speed, 2012), adjusting for systematic errors of unknown origin by using the genes with the 1009 lowest p-values as controls as described in (Vignali et al, 2011). The gene counts, the estimated ring-stage factor and the factors of unwanted variation were then used as input for the Limma/Voom (Law et al, 2014; Smyth, 2005) differential analysis pipeline.
Functional enrichment analysis of differentially expressed core genes
Genes that were identified as significantly differentially expressed (defined as logFC < −1 or logFC > 1, p<0.05) during prior differential gene expression analysis were used for functional enrichment analysis using the R package gprofiler2 (Kolberg et al, 2020). Enrichment analysis was performed on multiple input lists containing genes expressed significantly higher (logFC > 1, P < 0.05) and lower (logFC < −1, P < 0.05) between different patient cohorts. All var genes were excluded from the enrichment analysis. For custom visualization of results, gene set data sources available for P. falciparum were downloaded from gprofiler (Raudvere et al, 2019). Pathway data available in the KEGG database (https://www.kegg.jp/kegg/) were accessed via the KEGG API using KEGGREST (Tenenbaum, 2020) to supplement gprofiler data sources and build a custom data source in Gene Matrix Transposed file format (*.gmt) for subsequent visualization. Functional enrichment results were then output to a Generic Enrichment Map (GEM) for visualization using the Cytoscape EnrichmentMap app (Merico et al, 2010) and RCy3 (Gustavsen et al, 2019). Bar plots of differential gene expression values for genes of selected KEGG pathways were generated using ggplot2 (Wickham, 2016) and enriched KEGG pathways were visualized using KEGGprofile (Zhao & Guo, 2020).
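The construction of the two input lists for the enrichment analysis can be sketched as below. The gene identifiers, values and the set of excluded var genes are hypothetical, and the function only illustrates the thresholds stated above; it is not part of gprofiler2.

```python
def split_by_direction(results, excluded=frozenset(), lfc_cutoff=1.0, p_cutoff=0.05):
    """Split differential expression results into up- and down-regulated lists.

    results maps a gene ID to a (logFC, p-value) tuple; IDs listed in
    'excluded' (for example var genes) are skipped.
    """
    up, down = [], []
    for gene, (log_fc, p_value) in results.items():
        if gene in excluded or p_value >= p_cutoff:
            continue
        if log_fc > lfc_cutoff:
            up.append(gene)
        elif log_fc < -lfc_cutoff:
            down.append(gene)
    return up, down

# Hypothetical results: gene ID -> (logFC, p-value).
results = {
    "gene_1": (1.8, 0.01),    # higher expression
    "gene_2": (-2.3, 0.001),  # lower expression
    "gene_3": (0.4, 0.02),    # fold change too small
    "gene_4": (3.0, 0.20),    # not significant
    "var_like": (2.5, 0.001), # excluded as a var gene
}
up_genes, down_genes = split_by_direction(results, excluded={"var_like"})
print(up_genes, down_genes)
```

Keeping the two directions as separate lists allows each to be tested for enrichment on its own, as described for the multiple input lists.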
DBLα-tag sequencing
For DBLα-tag PCR the forward primer varF_dg2 (5’-tcgtcggcagcgtcagatgtgtataagagacagGCAMGMAGTTTYGCNGATATWGG-3’) and the reverse primer brlong2 (5’-gtctcgtgggctcggagatgtgtataagagacagTCTTCDSYCCATTCVTCRAACCA-3’) were used, resulting in an amplicon size of 350-500 bp (median 422 bp) plus the 67 bp overhang (lowercase). Template cDNA (1 μl) was mixed with 5x KAPA HiFi buffer, 0.3 μM of each dNTP, 2 μM of each primer and 0.5 U KAPA HiFi Hotstart Polymerase in a final reaction volume of 25 μl. Reaction mixtures were incubated at 95°C for 2 min and then subjected to 35 cycles of 98°C for 20 s, 54°C for 30 s and 68°C for 75 s with a final elongation step at 72°C for 2 min. For the first 5 cycles, cooling from denaturation temperature was performed to 65°C at a maximal ramp of 3°C per second, followed by cooling to 54°C with a 0.5°C per second ramp. Heating from annealing temperature to elongation temperature was performed with 1°C per second, all other steps with a ramp of 3°C per second. Agarose gel images taken afterwards showed clean amplicons. The DBLα-tag primers contain an overhang, which was used to conduct a second indexing PCR reaction using sample-specific indexing primers as described in Nag et al. (Nag et al, 2017). The overhang sequence also serves as annealing site for Illumina sequencing primers, and the indexing primers include individual 8-base combinations and adapter sequences that allow the final PCR product to bind in MiSeq Illumina sequencing flow cells. Indexing PCR reactions were performed with a final primer concentration of 0.065 μM and 1 μl of first PCR amplicon in a final volume of 20 μl, using the following steps: heat activation at 95 °C for 15 min, 20 cycles of 95 °C for 20 s, 60 °C for 1 min and 72 °C for 1 min, and one final elongation step at 72 °C for 10 min.
Indexing PCR amplicons were pooled (4 μl of each) and purified using AMPure XP beads (Beckman Coulter, California, United States) according to the manufacturer’s protocol, using 200 μl pooled PCR product and 0.6 x PCR-pool volume of beads, to eliminate primer dimers. The purified PCR pool was analyzed on agarose gels and an Agilent 2100 Bioanalyser to verify elimination of primer dimers and correct amplicon sizes. The concentration of purified PCR pools was measured by NanoDrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA) and an aliquot adjusted to 4 nM concentration was pooled with other unrelated DNA material and added to an Illumina MiSeq instrument for paired end 300 bp reads using a MiSeq v3 flow cell.
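The primer layout can be checked with a short sketch that splits each primer into its overhang and its DBLα-binding part and counts how many distinct sequences the degenerate part represents. The primer sequences are taken from the text with non-letter characters ignored; the helper names are hypothetical, and the degeneracy table follows the standard IUPAC nucleotide codes.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def split_primer(primer):
    """Return (overhang, target): lowercase overhang, uppercase binding part."""
    letters = "".join(ch for ch in primer if ch.isalpha())
    overhang = "".join(ch for ch in letters if ch.islower())
    target = "".join(ch for ch in letters if ch.isupper())
    return overhang, target

def degeneracy(target):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in target:
        total *= IUPAC_DEGENERACY[base]
    return total

var_f_dg2 = "tcgtcggcagcgtcagatgtgtataagagacagGCAMGMAGTTTYGCNGATATWGG"
brlong2 = "gtctcgtgggctcggagatgtgtataagagacagTCTTCDSYCCATTCVTCRAACCA"

fwd_overhang, fwd_target = split_primer(var_f_dg2)
rev_overhang, rev_target = split_primer(brlong2)
overhang_total = len(fwd_overhang) + len(rev_overhang)
print(overhang_total, degeneracy(fwd_target), degeneracy(rev_target))
```

The two overhangs add up to the 67 bp stated above, which gives a quick consistency check when primer sequences are copied between documents.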
DBLα-tag sequence analysis
The paired-end DBLα-tag sequences were identified and partitioned into correct sample origin based on unique index sequences. Each indexed raw sequence pair was then processed through the Galaxy webtool (usegalaxy.eu). A read quality check was first performed with FastQC to ensure a good NGS run (sufficient base quality, read length, duplication etc.). Next, the sequences were trimmed by the Trimmomatic application, with a four base sliding window approach and a Phred quality score above 20 to ensure high sequence quality output. The trimmed sequences were then paired and converted, followed by analysis using the Varia tool for quantification and prediction of the domain composition of the full-length var sequences from which the DBLα-tag originated (Mackenzie et al, 2020). In brief, Varia clusters DBLα-tags with 99% sequence identity using the Vsearch program (v2.14.2), and each unique tag is used to search a database consisting of roughly 235,000 annotated var genes for near identical var sequences (95% identity over 200 nucleotides). The domain composition of all “hit” sequences is checked for conflicting annotations and the most likely domain composition is returned. The tool validation indicated prediction of correct domain compositions for around 85% of randomly selected var tags, with higher hit rate and accuracy for the N-terminal domains. An average of 2,223.70 reads per patient sample was obtained and clusters consisting of less than 10 reads were excluded from the analysis. The raw Varia output file is given in Table S8. The proportion of transcripts encoding a given PfEMP1 domain type or subtype was calculated for each patient. These expression levels were used to first test the hypothesis that N-terminal domain types associated with EPCR are found more frequently in first-time infections or upon severity of disease, while those associated with non-EPCR binding were associated with pre-exposed or mild cases.
Secondly, quantile regression was used to calculate median differences (with 95%-confidence intervals) in expression levels for all main domain classes and subtypes between severity and exposure groups. All analyses were done with R (4.02) using the package quantreg (5.73) for quantile regression.
For the comparison of both approaches, DBLα-tag sequencing and RNA-seq, only RNA-seq contigs spanning the whole DBLα-tag region were considered. All conserved variants, the subfamilies var1, var2csa and var3, detected by RNA-seq were omitted from the analysis since they were not properly amplified by the DBLα-tag primers. To scan for the occurrence of DBLα-tag sequences within the contigs assembled from the RNA-seq data we applied BLAST (basic local alignment search tool) v2.9.0 software (Altschul et al, 1990). To this end, we created a BLAST database from the RNA-seq assemblies and screened for the occurrence of those DBLα-tag sequences with more than 97% sequence identity using the “megablast” option.
Calculation of the proportion of RNA-seq data covered by DBLα-tag was done with the upper 75th percentile based on total RPKM values determined for each patient. Vice versa, only DBLα-tag clusters with more than 10 reads were considered and percent coverage of reads and clusters calculated for each individual patient.
For all samples the agreement between the two molecular methods DBLα-tag sequencing and RNA-seq was analyzed with a Bland-Altman plot, each individually and summarized. The ratio between %-transformed measurements is plotted on the y-axis and the mean of the respective DBLα-tag and RNA-seq results is plotted on the x-axis. The bias and the 95% limits of agreement were calculated using GraphPad Prism 8.4.2.
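The quantities behind such a ratio-based Bland-Altman plot can be sketched as follows. This is not the GraphPad Prism procedure; it assumes that both methods are expressed as percentages, that ratios are analyzed on a log10 scale as described in the legend of Supplement figure 3, and that the limits of agreement are the bias plus or minus 1.96 standard deviations. The input values are hypothetical.

```python
from math import log10
from statistics import mean, stdev

def bland_altman_log_ratio(method_a, method_b):
    """Bias and 95% limits of agreement for log10 ratios of paired values.

    Pairs with a non-positive value in either method are skipped here,
    because their log ratio is undefined (an assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(method_a, method_b) if a > 0 and b > 0]
    log_ratios = [log10(a / b) for a, b in pairs]
    mean_logs = [(log10(a) + log10(b)) / 2 for a, b in pairs]
    bias = mean(log_ratios)
    sd = stdev(log_ratios)
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "points": list(zip(mean_logs, log_ratios)),  # (x, y) plot coordinates
    }

# Hypothetical expression values in % for three var transcripts of one patient.
tag_percent = [10.0, 20.0, 40.0]
rnaseq_percent = [10.0, 10.0, 80.0]

agreement = bland_altman_log_ratio(tag_percent, rnaseq_percent)
print(agreement["bias"], agreement["lower"], agreement["upper"])
```

A bias close to zero on the log scale means that neither method systematically reports higher proportions than the other, while the width of the limits reflects the scatter between the two methods.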
Supplement
Supplement figure 1: Early immune response in mild and severe malaria within the naïve patient cluster. Antibody reactivity against individual antigens within the three subgroups ‘naïve with mild symptoms’, ‘naïve with severe symptoms’ and ‘pre-exposed with mild symptoms’. Sera from all volunteers were assessed on protein microarrays and data normalized to control spots containing no antigen (no DNA control spots). Median reactivities of the mildly infected malaria-naïve, the severely infected malaria-naïve and the mildly infected pre-exposed patients are represented as bar charts. IgG data are given for all 262 P. falciparum proteins spotted on the microarray representing 228 unique antigens (A). To estimate differences in immune response in mild and severe malaria within the malaria-naïve population, normalized IgG (B) and IgM (C) antibody responses were compared in the two subpopulations. Differentially recognized antigens (p <0.05 and fold change >2) are depicted in red.
Supplement figure 2: Differential expression of the var1-allele forms and var2csa between patient groups. RNA-seq reads from each patient were normalized against the number of mappable reads to the 3D7 genome and aligned to the var1-3D7 and var1-IT allele forms as well as var2csa. The resulting bigwig files were displayed in Artemis (Carver et al, 2012). Individual samples are colored according to the patient group: first-time infected in blue (A), severe in red (B) and the respective pre-exposed or non-severe samples in grey.
Supplement figure 3: Comparison of the DBLα-tag sequencing with RNA-seq analysis. DBLα-tag sequencing and RNA-seq data compared in Bland-Altman plots for all patients summarized (A) and for each individual patient (B), where the mean log expression of each gene is indicated on the X-axis and the log ratio between normalized DBLα-tag (% of reads) and RNA-seq values (% of RPKM from all contigs containing both DBLα-tag primer binding sites) on the y-axis. The mean (equal to bias) of all ratios (line) and the confidence interval (CI) of 95% (dotted lines) are indicated. Data points with negative values for one of the approaches are displayed in dependence of their mean log expression on top (DBLα-tag sequence clusters not detected by RNA-seq) or bottom (RNA-seq contigs not found within DBLα-tag sequence cluster) of the graph.
Supplement figure 4: The base excision repair pathway (KEGG:03410) in P. falciparum. Orthologues present in P. falciparum are indicated by gene IDs, log fold changes (logFC) are indicated by color code (red: up-regulated, blue: down-regulated) (A). Summary of logFC in gene expression in first-time infected relative to pre-exposed patients and p-values for the logFC (B).
Supplement figure 5: RNA quality. The Bioanalyzer automated RNA electrophoresis system was used to characterize the total RNA quality prior to library synthesis. The calculated RIN value is provided, although this measurement is questionable for samples from mixed species. Of the four rRNA peaks visible in all samples, the inner peaks represent P. falciparum 18S and 28S rRNA, the outer peaks are of human origin.
Data S1: Sequences of assembled var contigs from all patient isolates.
Supplement table 1: Data from Luminex, mADRB, ELISA and protein microarray. Sero-prevalence of head-structure CIDR domains determined by applying a cut off from Danish controls (mean + 2 STD) to the Luminex data.
Supplement table 2: Raw read counts by sample for H. sapiens, P. falciparum, var exon 1 and percentage of reads that mapped either to P. falciparum or var exon 1 as well as the number of assembled var contigs >500 bp in length.
Supplement table 3: Features of the assembled var fragments annotated in accordance with Rask et al. (Rask et al, 2010) and Tonkin-Hill et al. (Tonkin-Hill et al, 2018). The reading frame used for translation is given after the contig ID, the position of each annotation is provided by starting and ending amino acid followed by the p-value from the blast search against the respective database. For annotations in accordance with Tonkin-Hill et al. (Tonkin-Hill et al, 2018) either the short ID or ‘NA’ (not applicable) is listed at the end. Short IDs are only available for domains and blocks that were significantly differentially expressed between severe and non-severe cases (Tonkin-Hill et al, 2018).
Supplement table 4: Summary of var gene fragments assembled for each patient isolate showing length, raw read counts, RPKM, blast hits, domain and block annotations in accordance with Rask et al. (Rask et al, 2010). The RPKM for the contigs was calculated from the number of mapped reads, normalized by the number of reads mapped against all transcripts in the respective isolate. Therefore, RPKM expression values are only valid for comparisons within a single sample, since RNA-seq reads were mapped only to the contigs of the respective patient isolate using BWA-MEM (Li, 2013). Further, the number of blast hits with 500 bp or 80% overlap against the ~2400 samples from varDB (Otto, 2019) at an identity cutoff of 98% is given. Further hits of 1 kb (>98% identity) against the var genes from the 15 reference genomes (Otto et al, 2018a) are listed. The last two columns show the annotations from Rask et al. (Rask et al, 2010) associated with each contig.
Supplement table 5: Log transformed normalized Salmon read counts for assembled var transcripts, TPM for collapsed domains and homology blocks from each patient isolate. Normalized counts and TPM values calculated for transcripts, domains and blocks with expression in at least three patient isolates with more than five read counts.
Supplement table 6: Differentially expressed var transcripts, domains and homology blocks between first-time infected and pre-exposed patient samples.
Supplement table 7: Differentially expressed var transcripts, domains and homology blocks between severe and non-severe patient samples.
Supplement table 8: Data from DBLα-tag sequencing.
Supplement table 9: Differentially expressed genes excluding var genes (all gene analysis) between first-time infected and pre-exposed patient samples.
Supplement table 10: Differentially expressed genes excluding var genes (all gene analysis) between severe and non-severe patient samples.
Acknowledgments
We thank all the patients who provided an extra blood sample for our research purposes. Furthermore, we thank Jürgen May for critical reading of the manuscript and Tobias Spielmann for helpful discussions. We thank Marlene Danner Dalgaard, Kathrine Hald Langhoff and Sif Ravn Søeborg for technical assistance with DBLα-tag sequencing.