Chromatin structure and var2csa – a tango in regulation of var gene expression in the human malaria parasite, Plasmodium falciparum?

Over the last few decades, novel methods have been developed to study how chromosome positioning within the nucleus may play a role in gene regulation. Adaptation of these methods in the human malaria parasite, Plasmodium falciparum, has recently led to the discovery that the three-dimensional structure of chromatin within the nucleus may be critical in controlling expression of virulence genes (var genes). Recent work has implicated an unusual, highly conserved var gene called var2csa in contributing to coordinated transcriptional switching, however how this gene functions in this capacity is unknown. To further understand how var2csa influences var gene switching, targeted DNA double-strand breaks (DSBs) within the sub-telomeric region of chromosome 12 were used to delete the gene and the surrounding chromosomal region. To characterize the changes in chromatin architecture stemming from this deletion and how these changes could affect var gene expression, we used a combination of RNA-seq, Chip-seq and Hi-C to pinpoint epigenetic and chromatin structural modifications in regions of differential gene expression. We observed a net gain of interactions in sub-telomeric regions and internal var gene regions following var2csa knockout, indicating an increase of tightly controlled heterochromatin structures. Our results suggest that disruption of var2csa results not only in changes in var gene transcriptional regulation but also a significant tightening of heterochromatin clusters thereby disrupting coordinated activation of var genes throughout the genome. Altogether our result confirms a strong link between the var2csa locus, chromatin structure and var gene expression.


INTRODUCTION
Although the recent coronavirus pandemic has dominated the news for the last few years, it is important to note that human malaria parasites still infect 249 million people worldwide annually.Plasmodium, the causative agent of malaria, remains one of the deadliest parasite-borne diseases, with an estimated 608,000 deaths in 2022 [1] affecting mostly children and immunocompromised individuals (e.g., pregnant women and the elderly) in sub-Saharan Africa and Southeast Asia.There are five members of the Plasmodium genus known to infect humans, Plasmodium falciparum being most deadly.P. falciparum can not only evade the host immune system but can sequester and cyto-adhere to blood vessels leading to circulatory obstruction resulting in dysfunction of multiple organs including the brain.Understanding the mechanisms by which P. falciparum evades the immune response, and thus sustains further replication within the host, is key to identifying novel drug targets and abating the impact of malaria on global health.
To evade the host immune responses, the parasite switches the expression of specific exported proteins or antigens on the surface of infected red blood cells.The molecular mechanisms underlying 'antigenic variation' have been extensively studied over the years with particular focus on the var gene family encoding Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP-1).Expression of this gene family is thought to be regulated, at least partially, by epigenetic mechanisms [2,3].In eukaryotes, epigenetics involves various covalent modifications of nucleic acids and histone proteins which lead to changes in chromatin structure and gene expression.Similar mechanisms of gene regulation have been detected in P. falciparum with tight control of virulence genes organized in heterochromatin clusters with repressive histone mark H3K9me3 [4, 5, 6] localized at the nuclear periphery [7,8].PfEMP1 is exported to the plasma membrane of an infected red blood cell and mediates the adhesion of the infected erythrocytes to receptors on endothelial cells of the post capillary endothelium.Although there are approximately 60 members of the var gene family, only one var gene is expressed at any given time through a process known as mutually exclusive gene expression [9,10].To circumvent the host immune response, the parasite will switch the var gene expressed at a rate of 2-18% per generation preventing the recognition of parasitized cells by host antibodies [3,11,12].During this switching process, it has been hypothesized that an intermediate state exists in which there is no dominantly expressed var gene and instead transcription is dispersed across several var genes before settling on a single dominant gene [13].This was originally referred to as the "many" state and data supporting this model were recently obtained through transcriptomic analysis [14].It is not known how parasites transition between these transcriptional states; however, this model implies a mechanism that unites the entire var gene family into a single network, potentially providing the basis for coordination of var gene expression within a large population of infected RBCs.Transcriptional dominance is complex and recent evidence in other models, including transcriptional regulation of odorant receptor genes, has identified possible non-coding mechanisms within mRNA that mediate the exclusive expression of a single gene within a large variable gene family [15].Of the 60 var genes comprising the PfEMP1 family, var2csa is one of the most conserved between parasite strains and is the only variant whose transcription is upregulated in P. falciparumparasitized erythrocytes in the placenta, causing clinical symptoms in the expectant mother and serious harm to the fetus [16,17,18].Pregnancy associated malaria (PAM) results from sequestration of infected erythrocytes by binding chondroitin sulfate A (CSA) on the surface of syncytiotrophoblasts in the placental tissue [16].Although expression on the surface of CSA-bound infected erythrocytes is exclusive to pregnant women, var2csa can be translationally repressed in infected erythrocytes while still being transcriptionally active at very low levels in non-pregnant populations, providing evidence that var2csa serves an additional, possible regulatory function [19,20,21,22].
Although considerable progress has been made in elucidating the genetic and epigenetic basis for coordinated var gene switching, the hierarchy of importance in mechanisms controlling regulation remains unknown.Switching between different var genes is highly coordinated and imperative to parasite survival, but evidence suggests that some var genes may be more important than others and thus the rate of switching is variable depending on the location of the expressed var gene-subtelomeric versus central regions of the chromosomes [23,13,24,25,26].Internal var genes are remarkably stable and transcriptional switching is uncommon compared to subtelomeric var genes which readily switch in the presence of environmental pressure [25,26,27].Manipulation of H3K9me3 and H3K36me3 epigenetic marks results in upregulation of var2csa expression within a mixed population, regardless of previous var gene expression, suggesting that var2csa inhabits a prominent position within the var gene activation hierarchy [25,28,29].Furthermore, when var2csa was deleted, var gene switching was disabled or greatly reduced, resulting in much more stable var gene expression over time [14].This coincided with an overall reduction in the level of var transcripts, in particular the more dispersed transcriptional pattern observed in parasites transitioning through the hypothetical switching intermediate state.
This implied that var gene expression had somehow become more tightly regulated, with fewer genes displaying even low-level transcriptional activity.However, how the loss of the var2csa locus led to this broader transcriptional phenotype is not known.Using a combination of transcriptomic, epigenetic and chromatin structure analyses we determined that var2csa deletion leads to a significant tightening of heterochromatin clusters throughout the genome, thus providing a likely explanation for reduced var transcription and disrupted transcriptional switching.These results confirm the importance of chromatin architecture during the process of var gene switching and provide a molecular basis for the phenotype resulting from deletion of the var2csa locus.

Deletion of var2csa alters var gene expression
To evaluate the effect of var2csa disruption on changes in overall var gene expression, we used Clustered Regularly Interspaced Short Palindromic Repeat/CRISPR-associated protein-9 (CRISPR/Cas9) targeted DNA double-strand breaks (DSBs) within the sub-telomeric region of chromosome 12 near the var2csa promoter (Fig 1A).The successfully edited cell line described in this manuscript as var2csa-deleted line (∆V2) was previously shown to have undergone repair of the DSB through telomere healing by the addition of telomeric heptad repeat elements near the site of the break [30].This resulted in the loss of the var2csa locus and surrounding region of the chromosome.Furthermore, although var gene transcription was not lost, deletion of var2csa was shown to greatly reduce var gene switching and resulted in a significant reduction in the var transcripts that make up a semi-conserved pattern of what were referred to as var "minor" transcripts [14].These are low-level transcripts that come from a subset of var genes dispersed across the genome.How loss of var2csa led to a reduction in expression of transcripts coming from var genes located on other chromosomes was not clear, and any effects on members of other variant gene families was not determined.
To further investigate the phenotype resulting from deletion of var2csa, we used RNA-seq to compare var gene expression of the ∆V2 line to an NF54 wild-type line (Fig 1B Furthermore, 11 of 14 rifin genes-an antigenic gene family found in close proximity to var gene clusters-are also downregulated.These significant differences are supported by the results of our gene ontology analysis, which shows significant enrichment of genes involved in antigenic variation for the downregulated genes (Fig S1C).These results confirm the total loss of var2csa expression following the introduction of the DSB, as well as the subsequent downregulation of several other var genes.Interestingly, the only highly upregulated var gene is located within an internal var gene cluster on chromosome 4, whereas all of the downregulated var genes are located in subtelomeric var gene clusters.This suggests that the knockout of var2csa results in epigenetic and chromatin structural changes that cause the downregulation of these subtelomerically located genes.

Tighter var gene expression control by repressive histone marks
To better understand the role of epigenetics in var gene expression in our edited ∆V2 line, we assessed the distribution of repressive histone mark H3K9me3 and examined whether ∆V2 displays differentially bound heterochromatin due to var2csa deletion.We performed ChIP-seq experiments in duplicate in ∆V2 and NF54 WT lines using anti-H3K9me3 antibodies to identify whether the downregulated var genes strongly correlate with repressive histone marks.Our analysis confirmed an abundance of H3K9me3 on both subtelomeric and internal var gene regions in each chromosome.This result correlates well with previous evidence suggesting var genecontaining regions are tightly controlled in heterochromatin cluster(s is characteristically absent from other regions of the genome, confirming that a sizable percentage of the genome is available for openly permissive gene expression with the exception of members of the variably expressed gene families, including var gene loci [31].Peak calling analysis found 198 and 227 significant consensus peaks for the WT and ∆V2, respectively, covering less than 10% of the entire genome.Of the significantly called peaks located within 1 kb of a gene coding region, ~25% of peaks are near var genes and an additional ~36% are located near rifin genes (S4 Table and S5 Table).The transcription start site (TSS) coverage was ~3-4x higher in var gene promoter regions and >5x higher across the gene body in ∆V2 compared to the wild type (Fig 2B , C).These data support our transcriptomic analysis and indicate that the down regulation of var genes following var2csa knockout is associated with an increase in heterochromatin-associated histone marks.Differential peak calling analysis found significant overlap between WT and ∆V2 peaks and only 12 differential binding sites (Fig S2 correlation between WT and ∆V2 samples was also strong (SCC > 0.80) suggesting concentrated structural differences in chromatin architecture.Due to strong correlation, biological replicates for each condition/genotype were merged to increase the data resolution for downstream analysis.
Random sampling was then performed on the WT due to differences in sequencing depth and number of total valid interaction pairs.Interactions between telomeres and var gene-containing regions across all chromosomes, including internal var genes, are significantly higher than the background for both intrachromosomal and interchromosomal interactions in both WT and ∆V2 The difference in interaction counts between the background and subtelomeric regions in the wild type and ∆V2 is made even more apparent by the differential interaction analysis (q < 0.05), which revealed an increase in interchromosomal interactions between bins containing var genes and a general trend of decreased interactions between subtelomeric and internal var gene DNA repair via a cascade of non-homologous recombination events with the telomeres of other chromosomes produces chimeric var genes and ultimately supports parasite survival [30].Previous work indicated that loss of var2csa reduced var transcriptional switching and led to a reduction in detectable var transcripts [14].Our analysis confirmed a moderate downregulation of several var genes and other nearby subtelomeric genes.Differential expression analysis indicated that var2csa deletion affected transcription of subtelomeric genes almost exclusively, while transcription within the rest of the genome remained steady.A model of promoter competition is frequently described as a possible mechanism underlying mutually exclusive expression within multicopy gene families in model organisms [39,40].We have similarly proposed that competition between var promoters could play an important role in selecting which var gene is active as well as contributing to var gene switching [41,14].
The var2csa promoter appears to be unique in its competitiveness compared to other var promoters [21,42], and thus its presence in the genome could serve to maintain expression levels of other var promoters within a range that is conducive to periodic switching.Its loss therefore could upset this balance, leading to reduced switching and lower levels of the var "minor" transcripts associated with var promoter activity.Consistent with this model, our Hi-C experiments showed that compared to the wildtype line, the var2csa deleted line displayed greater compaction of heterochromatin with increased interchromosomal interactions across subtelomeric var gene regions of all chromosomes.Compartment analysis also highlighted chromatin modifications near the centromere and loss of jet-like structures.The 3D chromatin models further emphasized the clear compaction of subtelomeric regions, which is in accordance with increased H3K9me3 silencing and downregulation of subtelomeric var gene expression.All together our data confirm the importance of chromatin structure and var2csa expression in var gene regulation.Detection of additional molecular factors, protein(s) or RNA(s), interacting with this locus will most likely be needed to fully comprehend at the mechanistic level sequential coding and non-coding components required for var gene expression.

RNA library preparation
Infected erythrocytes were treated with 0.15% saponin then flash frozen.Total RNA was extracted using TRIzol LS Reagent (Invitrogen) and chloroform and incubated overnight at -20°C in isopropanol.Samples were then treated with 4 units of DNase I (NEB) for 1 hr at 37°C. mRNA was then purified using NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB, E7490) according to manufacturer's instructions.Libraries were prepared using NEBNext Ultra Directional RNA Library Prep Kit (NEB, E7420) and PCR amplified (15 min at 37°C, 12 cycles of 30 sec at 98°C, 10 sec at 55°C, and 12 sec at 62°C, then 5 min at 62°C) with KAPA HiFi HotStart Ready Mix (Roche).Libraries were quantified by Bioanalyzer (Agilent) and sequenced using the NovaSeq 6000 (Illumina) and S4 300 flow cell for paired-end reads.

RNA-seq data processing and differential expression analysis
Sample quality of the paired-end RNA-seq libraries was first assessed via FastQC (v.0.11.9) and the per-base sequence quality and content was used to determine read trimming length.The index adapters and an additional ~12 bp were trimmed using Trimmomatic (v0.39) [43].Paired and trimmed reads were then aligned to the P. falciparum 3D7 genome (PlasmoDB v58) using HISAT2 (v2.1.0)[44].Non-properly mapped or paired reads and reads with a mapping quality score less than 30 were filtered using Samtools [45].High quality properly paired and aligned reads were then mapped to protein coding genes using HTseq (v1.99.2) [46].DESeq2 (v1.32.0) [47] was utilized to distinguish differentially expressed genes between the NF54 control line (WT) and the CRISPR-Cas9 modified line (∆V2).

ChIP-seq data processing and peak calling
Paired-end ChIP-seq libraries were processed by FastQC (v0.11.9) for quality assessment.
Indexes and low-quality reads were trimmed from the ends before pairing forward and reverse reads using Trimmomatic (v0.39) [43].Bowtie2 [48] was then used to align paired reads to the P. falciparum 3D7 genome (PlasmoDB v58) with the "-very-sensitive" option to ensure the highest quality alignment.PCR duplicates were tagged using Picardtools (v2.26.11) [49].Samtools [45] was utilized to filter, sort and index the aligned, de-duplicated reads with a mapping quality cutoff of 40 and keeping only mapped and properly paired reads.To remove background noise and analyze genome-wide coverage, reads were mapped to the genome at 1-bp resolution using Samtools and then counts-per-million normalized input reads were subtracted from H3K9me3 samples.Broad peak calling was performed using MACS3 (v3.0.0a7) [50] for peaks with FDR < 0.05.Coverage of the transcription start site (TSS) was evaluated with deeptools2 (v3.5.1) [51] by first using bamCompare to subtract input reads and then computeMatrix to map reads from 500 bp 5' of the TSS to 1.5 kb 3' of the TSS at 1-bp resolution.Differential peak calling was then performed using DiffBind (v3.2.7) [52].
Crosslinked DNA was digested using MboI (NEB) restriction enzyme then 5' overhangs were filled in by dNTPs with Biotin-14-dCTP (Invitrogen) using DNA Polymerase I and Large (Klenow) Fragment (NEB).Blunt-ends were ligated with 4000 units T4 DNA Ligase and chromatin de-crosslinked using de-crosslinking buffer (50 mM Tris-HCl at pH 8.0, 1% SDS, 1 mM EDTA, and 500 mM NaCl).Biotinylated DNA was sheared to 300-500 bp using a Covaris S220 (settings: 10% duty factor, 140 peak incident power, and 200 cycles per burst for 65 seconds) and biotinylated DNA fragments were pulled down using MyOne Streptavidin T1 beads (Invitrogen).End repair, A-tailing, and adapter ligation were all performed in lo-bind tubes using the KAPA library preparation kit (KAPA biosystems).Library was amplified using the HiFi HotStart ReadyMix (KAPA Biosystems) as well as the universal forward primer and barcoded reverse primer and incubated following the PCR program: 45 sec at 98°C, 12 cycles of 15 sec at 98°C, 30 sec at 55°C, 30 sec at 62°C and a final extension of 5 min at 62°C.The library was then purified using double-SPRI size selection, with 0.5Å~ right-side selection (25 μl AMPure XP beads) and 1.0Å~ left-side selection (25 μl AMPure XP beads).Libraries were quantified by NanoDrop (Thermo Scientific) and Bioanalyzer (Agilent), prior to multiplexing and sequencing in a 150-bp paired-end run on a NovaSeq 6000 (Illumina).

Hi-C Data Analysis
Paired-end HiC library reads were processed (i.e., mapping, filtering, and pairing, and normalization) using the HiC-Pro suite [53] and the P. falciparum 3D7 genome (PlasmoDB v58) with mapping quality cutoff set to 30 and mapping at 10 kb resolution.Intra-chromosomal and inter-chromosomal ICED-normalized interaction matrices were interaction counts-per-million normalized before generating interaction heatmaps, with all contacts less than 2 bins apart set to 0 to enhance visualization.Differential interaction analysis was conducted at 10 kb resolution using Selfish while removing sparse regions [54].HiCExplorer was used to identify and plot A/B compartments and O/E plots [55].3-dimensional chromatin remodeling was performed by generating 3-D coordinate matrices using Pastis [35] and then visualizing using ChimeraX [56].

S1 Table. Sample data.
Table listing all samples for each experiment and the raw read count contained within the fastq files.The aligned/filtered read count is the number of valid reads contained within the output files following genome alignment and filtering.
regions and other bins throughout the genome(Fig 3A and Fig S6).This indicates a chromatin restructuring where telomeres are moving away from the rest of the genome and condensing into even tighter heterochromatin structures.Decreases in chromatin interactions are also detected near the centromeres, resulting in a loss of centromeric A compartments in ∆V2 (Fig.3B).The chromatin structure of the centromere is so distinct that principal component 1 (PC1) of the compartment analysis highlighted the B compartment surrounding the centromere in the WT rather than the dense heterochromatin var gene regions (Fig.S7B).The interactions near P. falciparum centromeres resemble the jet-like projections found in mammalian chromatin[35].These chromatin jets are small regions of open chromatin that permeate nearby dense B compartments and extend perpendicularly, similar to the X-shaped structures near annotated centromeres and internal var gene clusters in the WT heatmaps (Fig.3A, and Fig.S5).Hi-C analysis of chromatin jets in cohesin-depleted human lymphocytes shows substantial weakening of cohesin-dependent loop-extrusion and loss of chromatin jets[35].Thus far no studies have been conducted on jet-like structures in Plasmodium spp.; therefore, additional experiments will be required to identify the breadth of molecular factors responsible for centromere and chromatin maintenance which result in these features.However, we provide compelling evidence to suggest that expression of var2csa is a key component necessary for the maintenance of the overall chromatin structure.Increased interaction in telomeric regions may lead to a loss of the jet-like structures observed near centromeres and internal var genes in P. falciparum.To gain a more comprehensive view of the chromatin architecture and identify structural changes resulting from deletion of var2csa, we generated 3D models using Pastis[36].The goal was to identify regions that show clear structural changes that would indicate chromatin remodeling resulting from deletion of var2csa, and perhaps the resulting recombination events, which would likely result in more compacted chromatin structure within var gene regions.Our models are consistent with previous work that utilized 3D chromatin modeling in that our models show centromeres and telomeres clustered into two separate distinct regions (Fig.4)[37, 7, 38].We see that compared to wildtype parasites, ∆V2 shows a more compacted chromatin structure involving var regions, with telomeres concentrated into closed structures with a decreased distance between telomeric var gene bins.Compaction of the telomeres in closed structures prevents access to transcriptional machinery and thus suppresses their expression, which is supported by the RNAseq data showing downregulation of several subtelomeric var genes.DISCUSSIONConservation of the highly variable var gene family within subtelomeric regions throughout the P. falciparum genome is imperative for adhesion of infected erythrocytes to receptors on endothelial cells.Expression of PfEMP1 and excessive cytoadherence of iRBCs within the microvasculature of various organs increases the severity of symptoms in malaria infections.Erythrocytes infected by VAR2CSA-expressing parasites bind to syncytiotrophoblasts within the placenta of pregnant women and can induce symptoms in previously immunocompetent women and cause significant damage to the fetus.The var2csa gene is unique among the var gene repertoire due to its specialized receptor preference and highly conserved sequence between parasite genomes, as well as being both transcriptionally and translationally regulated.Low-level transcription of var2csa in non-pregnant populations allows the parasite to be primed for proliferation within the placental tissue when infecting a pregnant woman while also not wasting energy on translation.Maintenance of functional var genes aids the parasite in surviving the host immune response and avoiding splenic clearance through intermittent var gene switching, preventing immune recognition of the PfEMP1 surface proteins on iRBCs.Due to the importance of var2csa and maintenance of functional expression of PfEMP1, it is important to fully understand the underlying mechanisms controlling var gene expression and antigenic variation.The CRISPR-Cas9 targeted DSB within the promoter region of var2csa resulted in a complete loss of the locus.
Parasite lines were NF54 isolates, one WT control line contributed by Megan G. Dowler and obtained through BEI Resources, NIAID, NIH: Plasmodium falciparum strain NF54 (Patient Line E), Catalog No. MRA-1000; and one line containing CRISPR/Cas9 targeted DSBs in a subtelomeric region of chromosome 12 (∆V2).The parasites were maintained at ~5-10% parasitemia in human erythrocytes at 5% hematocrit.Cultures were synchronized twice at the ring stage ~8 hr apart using 5% sorbitol and then samples collected during the trophozoite stage at ~6% parasitemia.

Figure 1 .
Figure 1.Overall var gene transcription decreased following a targeted DSB within the

Figure 2 .
Figure 2. H3K9me3 is concentrated within var gene clusters and occupancy increased

Figure 3 .
Figure 3.The targeted DSB resulted in a loss of chromatin interactions throughout most of

Figure 4 .
Figure 4. Genome-wide 3D models.(A) 3D chromatin model of the WT generated from ICED

Figure S2 .
Figure S2.Differential peak calling between the WT and ∆V2 cell lines.(A) Pearson

Figure S3 .
Figure S3.Hi-C correlation analysis.(A) Stratum-adjusted correlation between all samples used