ABSTRACT
Genome-wide analysis of transcription in the human malaria parasite Plasmodium falciparum has revealed robust variation in steady-state mRNA abundance throughout the 48-hour intraerythrocytic developmental cycle (IDC) suggesting that this process is highly dynamic and tightly regulated. However, the precise timing of mRNA transcription and decay remains poorly understood due to the utilization of methods that only measure total RNA and cannot differentiate between newly transcribed, decaying and stable cellular RNAs. Here we utilize rapid 4-thiouracil (4-TU) incorporation via pyrimidine salvage to specifically label, capture and quantify newly-synthesized P. falciparum RNA transcripts at every hour throughout the IDC following erythrocyte invasion. This high resolution global analysis of the transcriptome captures the timing and rate of transcription for each newly synthesized mRNA in vivo, revealing active transcription throughout all stages of the IDC. To determine the fraction of active transcription and/or transcript stabilization contributing to the total mRNA abundance at each timepoint we have generated a statistical model to fit the data for each gene which reveals varying degrees of transcription and stabilization for each mRNA corresponding to developmental transitions and independent of abundance profile. Finally, our results provide new insight into co-regulation of mRNAs throughout the IDC through regulatory DNA sequence motifs associated with these processes, thereby expanding our understanding of P. falciparum mRNA dynamics.
INTRODUCTION
Hundreds of millions of annual malaria cases are caused by intracellular protozoan Plasmodium parasites resulting in almost half a million deaths and an enormous economic impact on almost half the world’s population annually1. Efforts to eradicate this pathogen have been at the forefront of infectious disease research for decades. Although it has been over 15 years since the completion of the whole genome sequence2 of P. falciparum, transcriptional regulation throughout the parasite’s complex life cycle remains poorly understood3.
The malaria parasite life cycle commences upon the transfer of Plasmodium sporozoites from the saliva of an infected Anopheles spp. female mosquito vector to a human host. These sporozoites travel to and colonize a small number of hepatocytes within the liver where they replicate, forming thousands of merozoites4. Liver-stage merozoites are released into the circulatory system and initiate the blood stage of infection, which is responsible for the clinical symptoms of malaria. Development during the 48-hour intraerythrocytic development cycle (IDC) is comprised of parasite maturation and cell division, resulting in the formation of up to 32 daughter cells, which are released and invade new erythrocytes5. A small proportion of asexual parasites commit to an alternative cellular fate of sexual differentiation during the IDC, eventually resulting in the development of male and female gametocytes. Maturation of P. falciparum sexual stage parasites takes 10-12 days, and it is these sexual forms that are taken up by a mosquito vector for eventual onward transmission to another human host, thus completing the Plasmodium full life cycle6. Differentiation of the malaria parasite in these various cell types and tissues is regulated by coordinated programs of gene expression7-13.
Gene expression during the IDC is highly periodic and follows a transcriptional cascade with the majority of genes expressed in a “just-in-time” fashion7,13,14. Although canonical transcriptional machinery is present, the mechanisms that underlie the regulation and specificity of active transcription remain largely uncharacterized15,16. To date, only a single family of specific transcription factors, the 27-member Apicomplexan AP2 (ApiAP2) proteins, has emerged as transcriptional regulators with functions across all developmental stages15,17-25. However, the challenge of regulating roughly 5,500 Plasmodium genes with a small repertoire of transcription regulators suggests a significant role for epigenetic control and post-transcriptional regulation in Plasmodium spp.3,16,26-30. The control of gene expression at the RNA level is evidenced by several studies reporting a significant delay between transcription and translation31-34, ribosomal influence on the timing of mRNA translation35,36, and a lack of coordination between active transcription and mRNA abundance during parasite development37,38. Bioinformatic analyses have suggested that between 4-10% of the Plasmodium genome encodes RNA-binding proteins (RBPs)39,40. While the majority of predicted RBPs remain uncharacterized, several post-transcriptional regulatory factors have been well-studied such as DOZI (DDX-6 class DEAD box RNA helicase), CITH (Sm-like factor homolog of CAR-I and Trailer Hitch) and pumilio family proteins (PUF1 and PUF2), that have been demonstrated to play critical roles in the translational repression of genes essential for Plasmodium transmission41-45. Additionally, the ALBA (acetylation lowers binding affinity) domain-containing proteins (PfAlba1-4)46-48, Bruno/CELF family members (CELF1 and CELF2)49, and Pfg2750 have been demonstrated to interact with RNA; however, the exact role has yet to be defined for a majority of these RBPs in parasite development.
In all living systems, cellular mRNA levels are determined by the interplay of tightly regulated processes of RNA production, processing, stabilization and degradation51. Thus, changes in transcript abundance measured by DNA microarray, RNA-sequencing, or RT-qPCR studies reflect the relative rates of both transcription and mRNA stabilization. Since the advent of the complete P. falciparum genome sequence, numerous studies have utilized whole-genome methods to measure transcript abundance from populations of parasite-infected red blood cells11-13,31,52 as well as, more recently, single cells53,54. In addition, other studies have measured transcriptional activity38,55, mRNA half-lives56, RNA Pol II binding57 and transcription start sites58 throughout development. Although these studies have revealed many important details of the regulation of gene expression in Plasmodium during the IDC, none capture mRNA dynamics on a whole-genome scale without perturbations to the parasite. Measurement of mRNA half-lives is commonly performed following chemical inhibition of transcription which induces a stress response or cellular death thereby impacting the ability to accurately measure physiological mRNA turnover parameters59. Similarly, whole genome capture of transcriptional activity via nuclear run-on55 or GRO-seq38 methods require isolation of nuclei or permeabilization of cells, which compromises cellular physiology and nascent nuclear transcriptional activity. Therefore, we characterized the mRNA dynamics of the P. falciparum transcriptome by comprehensively capturing nascent mRNA transcription, stabilization, and total abundance of mRNA in vivo throughout the IDC using 4-TU biosynthetic RNA labeling.
Biosynthetic labeling of pyrimidines has revolutionized the capture and measurement of in vivo cellular mRNA dynamics during development and in response to various stressors60-64 across a diverse array of organisms62,64-70 including Plasmodium37. In human and yeast cells, short periods of metabolic labeling into nascent RNA has been used to capture RNA synthesis, decay and splicing rates62,63,69,71-74. However, thiol-modified RNA capture and kinetic profiling of RNA dynamics is dependent upon pyrimidine salvage, and the evolutionary lineage of Plasmodium has lost this biochemical capacity75. We have previously demonstrated that genetic engineering of the pyrimidine salvage pathway into Plasmodium parasites enables the efficient uptake of 4-TU and biosynthetic capture of thiol-labeled RNAs to examine RNA dynamics during parasite development37.
Herein, we report the application of biosynthetic RNA labeling and capture37 to measure genome-wide nascent mRNA transcription and stability in vivo throughout the intraerythrocytic development cycle of the malaria parasite. This method relies on the genetically facilitated scavenging of 4-TU and genome-wide DNA microarray measurements of labeled and unlabeled RNAs. Using this method, we have captured mRNA dynamics during asexual development at hourly resolution and apply a quantitative normalization model to define gene-specific mRNA metabolism. These data provide dynamic transcription and stability information for all mRNAs expressed during the IDC and also recapture known features of the “just-in-time” cascade of mRNA transcription. Additionally, modeling of the mRNA dynamics reveals regulatory mechanisms that cooperate during parasite development, which we used to predict functional roles for putative transcriptional regulators. Our results provide an updated in-depth view of mRNA transcriptional dynamics and gene regulation in P. falciparum throughout the IDC.
RESULTS
Engineering P. falciparum to establish biosynthetic pyrimidine salvage
In vivo mRNA labeling is dependent upon the salvage of 4-thiouracil (4-TU); however, P. falciparum is not capable of pyrimidine precursor uptake and incorporation into RNA2,75. We previously reported the use of episomal expression of the bifunctional fusion gene encoding Cytosine Deaminase and Uracil Phosphoribosyltransferase (FCU) from yeast to enable parasite salvage of 4-TU37. To avoid cell-to-cell heterogeneity in plasmid copy number expressing FCU, this study uses a single copy cassette that is stably 5 integrated into the attB locus of 3D7attb 76 to constitutively express a GFP-tagged FCU protein (Fig. S1A)37. Stable integration of fcu-gfp (3D7attb::FCU-GFP) was verified by PCR (Fig. S1A) and whole-genome sequencing of a final clonal parasite line (data not shown). Expression of FCU-GFP was confirmed by Western blot and localized to the cytoplasm by immunofluorescent imaging analysis (Fig. S1B and C). Salvage and incorporation of a 4-TU into total cellular RNA by the 3D7attb::FCU-GFP transgenic parasites was confirmed by Northern blot (Fig. S1D).
Nascent in vivo mRNA labeling and capture throughout the IDC
The goal of this study was to produce high-resolution RNA dynamics data for each transcript expressed during the 48-hour P. falciparum IDC transcriptome. To capture nascent RNA transcription and stabilization we exposed highly synchronized 3D7attb::FCU-GFP to 10 min pulses of 40µM 4-TU followed by total RNA extraction and capture of labeled and unlabeled transcripts (Fig. 1A and Fig. S2A)37. We chose a short labeling time of 10 min based on the time required for 4-TU uptake and incorporation into 4-thiol-RNA transcripts37. When the length of 4-TU incubation is sufficiently short, the labeled RNA is likely still in the nucleus and subjected to minimal degradation, and thus is reflective of the average transcription rate72. To capture nascent transcription throughout the IDC, 10 min pulses were repeated on individual timepoints every hour for the full 48-hour developmental cycle (Fig. 1A) followed by total RNA extractions for each hourly sample and biotinylation of the thiolated RNA transcripts. Nascently transcribed (4-TU labeled) mRNA was affinity-captured via streptavidin magnetic beads (Fig. S2A)37,64. For each timepoint the genome-wide profiles for the resulting RNA populations, total, labeled (transcribed), and unlabeled (stabilized) RNAs, were quantified using DNA microarray analysis (Fig. S2A). The RNAs present in the cell that were not modified during the 10 min 4-TU labeling pulse are designated as “stabilized” throughout this study. However, we note that our method cannot distinguish whether these “stabilized” RNAs are free or bound by RBPs and are likely turning over at variable rates. Therefore, the term “stabilized” herein is simply used to describe the persistence of an unlabeled transcript that was present prior to the labeling pulse.
To investigate whether in vivo RNA labeling perturbed transcription dynamics, we correlated the total RNA abundance levels from the 4-TU treated 3D7attb::FCU-GFP timecourse with an earlier P. falciparum transcriptome 6 from untreated wild-type 3D7 cells 13 and found no significant changes in the pattern of RNA levels during the IDC (median corr = 0.72, Fig. S2B). This suggests that transcription or steady-state mRNA levels are not perturbed in the presence of a short pulse of 4-TU. We next determined if there was bias in our RNA labeling due to the brevity of the 4-TU pulse by comparing gene length and dUTP content to intensity of the signal captured (Fig. S2C and S2D). Although the distribution of gene length and dUTPs/per gene distribution is broad (23-30864 bps and 7-9242 dUTPs, respectively; Fig. S2C), there is no significant bias in the mRNAs that are labeled with 4-TU and captured by biotin-streptavidin interaction (Fig. S2D). Therefore, the chosen 10 minute pulse length allows for the capture and accurate measurement of nascent transcription genome-wide without further normalization. Overall, genome-wide biosynthetic labeling and capture throughout the IDC revealed nascent transcription profiles for 5,198 genes (91.4% of P. falciparum annotated transcripts).
Statistical model of P. falciparum asexual mRNA dynamics
Because standard two-color DNA microarray analysis normalizes log2(green/red) values within individual timecourses, the log2 values do not distinguish the relative contributions of the various pools of RNA (labeled, unlabeled) to the total RNA signal (Fig. 1B; Raw Log2 Ratios) when timecourses are directly compared to each other. To circumvent this issue, we developed a statistical model to quantitatively determine the contribution of nascent transcription (labeled) and mRNA stabilization (unlabeled) to the total abundance profile of each gene by comparing the signal intensities for each probe on the DNA microarrays across all three timecourses (Materials and Methods). After the three timecourses were separately normalized by Rnits77 (Fig. S3), the mRNA dynamics were estimated by non-negative least squares since the contribution of unlabeled and labeled mRNA to the total cellular RNA is always additive (Fig. S3). The balance between nascent transcription and mRNA stabilization is estimated by optimizing the following equation:
Estimated coefficients for α and β were utilized to calculate the Expression Value of nascent transcription and mRNA stability, respectively (Fig. S3). Expression Values representing the total abundance, transcription and stabilization were averaged across the gene to which they mapped using Tukey’s biweight78,79 and plotted across the 48 hour IDC (Fig. 1C, Calculated Expression Values), resulting in mRNA dynamic profiles for most P. falciparum genes (Table S1). Gene plots can be found on individual gene pages on http://PlasmoDB.org 80 and http://iris.bx.psu.edu/projects/manuel-llinas/output/results.html where both the raw log2(green/red) ratios and normalized Expression Values are also available for download.
Based on the calculated normalized nascent mRNA expression values, we found that active transcription from the nuclear genome occurs throughout the IDC (Fig. 1C and Table S1). We also found that transcription of nascent mRNAs from both the mitochondrial and apicoplast organellar genomes is tightly coordinated and begins 8-10 hpi (Fig. S4A and 4B). Transcription of these non-nuclear genes peaks at 31 hpi (Fig. S4B), coinciding with the start of organellar genome replication and division81-83. In contrast, nuclear encoded genes destined for the apicoplast are transcribed prior to apicoplast encoded genes (Fig. S4C), likely due to their role in transcription and translation of the apicoplast genome 84. Interestingly, we also observe that the capture of nascent transcription from the apicoplast genome reveals a wide range of expression levels regardless of their genome position (Fig. S4D).
Based on the statistical model of the data, we found that the abundance profiles of individual genes are dictated by a combination of nascent transcription and mRNA stability as the parasite develops (Fig. 1B). When considering the global transcriptome, the calculated contribution (from the statistical model) of nascent transcription and mRNA stabilization to total abundance profiles varies as the parasite progresses through the IDC (Fig. 1C and S5). To confirm the accuracy of the model, we summed the Expression Values of both nascent transcription and stabilization to simulate the total RNA values (Fig. 1C). When the estimated total mRNA abundances are plotted in the same order as the observed values, the canonical phaseogram of the P. falciparum transcriptome observed by traditional methods is recapitulated (Fig. 1C)7,11-13, with a median correlation of 0.94 (Fig. S5A). Additionally, the gene-specific patterns of mRNA stabilization or transcription correlate with the overall transcript abundance profiles (Fig. S5B and S5C; transcription vs. total abundance median corr = 0.86, stabilization median corr = 0.85), confirming that our method and model capture the mRNA dynamics that dictate the abundance of a given transcript throughout the IDC. In contrast, gene-specific transcription and stabilization dynamic profiles are not highly correlated with each other (median r = 0.69) (Fig. 2A), likely due to the differences in peak timing of nascent transcription versus stabilization. By calculating the difference in the maximum transcription and stabilization to the total abundance, we estimate that peak transcription precedes the peak abundance (−80 min), while peak stabilization follows approximately 41 minutes after (Fig. 2B). This supports the current notion that the majority of genes are transcribed in a “just-in-time” manner with nascent transcription peaking prior to mRNA abundance, while post-transcriptional regulation likely influences the delay from peak time of mRNA abundance.
Variable rates of transcription and stabilization shape temporal mRNA dynamics throughout the IDC
To determine the stage-specific effects of mRNA dynamics on transcript abundance, we binned genes into six groups based on their peak timing of abundance during the IDC as captured by the total RNA timecourse (Fig. 2C). These six groups reflect the major morphological stages of development during the IDC including early ring (0-10 hpi), mid-ring (11-15 hpi), late ring/early trophozoite (16-21 hpi), mid-trophozoite (22-26 hpi), late trophozoite (27-32 hpi) and schizont (33-48 hpi). We then predicted the predominant ( r ≥ 0.50) mechanism acting on each gene based on the correlation of either transcription (Fig. 2C, red) or stabilization (blue) to the total abundance profile (Fig. 2C, pie charts). Overall, the influence of transcriptional activity or mRNA stability on total transcript abundance across the genome is relatively equal (transcription = 2767 genes; stabilization = 2660 10 genes) (Fig. 2C). However, there is a higher proportion of contribution from nascent transcription to mRNA abundance profiles at a majority of developmental stages (early ring = 55%, late ring/early trophozoite = 68%, mid-trophozoites = 61%, late trophozoites = 52%), while stabilization dominates at the mid-ring (64%) and schizont (62%) stage transitions (Fig. 2C and S5D). These results reveal that there are stage-specific regulatory mechanisms that influence the patterns of mRNA abundance through both transcription and mRNA stability during the IDC.
Previous studies have suggested that both transcriptional activity and mRNA half-life increase as the parasite progresses through the IDC55,56. These results were based on methodologies that relied on measuring ex vivo transcription from isolated nuclei38,55 or inhibition of transcription with Actinomycin D56 followed by RNA-seq or DNA microarray to quantify mRNA. Because our dataset was generated without physical or chemical perturbations to the parasite, we analyzed our results in the context of these earlier studies55,56. To do this, the rate of transcription for each gene was calculated based on the slope of the expression value of the labeled mRNA just prior to the peak of nascent transcript levels, while the rate of decay was determined by the slope of the unlabeled mRNA expression value following the peak of stabilization (Table S2). The transcription and decay rates of genes with peak expression during the early ring, mid-ring, late ring/early trophozoite, mid-trophozoite, late trophozoite and schizont stages were grouped together. The rate of transcription and decay are represented in Fig. 2D and 2E for each phase of development. These data show that genes expressed during the early and mid-ring stage are transcribed at a faster rate (median = 1.77 and 2.6 gene transcripts/min, respectively), compared to the rest of the life-cycle (median = 4.1 – 6.3 gene transcripts/min) (Fig. 2D, Table S2). Conversely, the mean rate of decay varies significantly as the parasite progresses through the IDC (Fig. 2E; Table S2), suggesting that the regulation of mRNA turnover is stage-specific. Surprisingly, the median rate of decay is slowest during the mid- to late-ring stages and becomes significantly more rapid during the mid-trophozoite stage of parasite development (Fig. 2E). However, there are an increased number of genes that are stabilized during the later stages of parasite development (Fig. 2E positive outliers 36-48 hpi = 21 genes, Table S2). Although the increase in stability of a small number of genes during the late IDC stage confirms earlier findings by Shock et al.56, it is important to note that the vast majority of genes with peak abundance during the late trophozoite and schizont stage do not display a significantly lower rate of decay (Fig. 2E).
mRNA regulatory “bursts” coincide with major morphological transitions of asexual development
Although it is conventionally accepted that transcription during the Plasmodium IDC occurs in a “just-in-time” fashion13, this is largely based on transcript abundance measurements and annotated gene function. Using the measured mRNA dynamics, we examined the timing of active transcription and stabilization of the parasite as it develops throughout the IDC. By binning the genes based on their peak time of nascent transcription or peak time of stabilization every 30 minutes throughout the IDC, our data reveal peaks in the numbers of genes being transcribed and/or stabilized at four major time periods during parasite development (Fig. 3 and Table S3). Interestingly, these ‘bursts’ of mRNA regulatory activity coincide with major morphological transitions during the IDC (Fig. 3). Next, we determined the total number of genes within each ‘burst’ by placing the boundaries at the nearest minimum trough adjacent to each peak and then performed GO-term enrichment analysis on all genes. The first peaks of both transcription and stability occurs during the mid-ring stage of development (12 hpi), and is characterized by peak transcription of genes whose products are enriched for RNA processing, translation, and antigenic variation (Fig. 3 and Table S3). During this stage, genes encoding for variant surface antigens and cell adhesion also reach their peak stability (Fig. 3 and Table S3). As expected, peaks of gene regulation that occur in the early-, mid- and late-trophozoite, are enriched for genes whose functions are necessary and unique for each phase of trophozoite development (Table S3). For example, the mid-trophozoite is known to be the most metabolically active stage of development, and during this phase there is an enrichment in transcription and stabilization of genes which encode for amino acid, tRNA, ncRNA, pyruvate, glycolytic, and carbohydrate metabolic processes (Fig. 3 and Table S3). In contrast, the late-trophozoite phase is characterized by a peak in the regulation of genes involved in DNA metabolism and replication in preparation for mitotic production of daughter cells.
Beyond the identification of genes enriched during developmental transitions, we also identified a number of genes whose peak timing of transcription and stabilization occurred at either 1 or 48 hpi. As expected, at 48 hpi the number of stabilized genes (378 genes) outnumbers those that are actively transcribed (140 genes) (Fig 3 and Table S3). This corroborates earlier studies demonstrating the increased presence of mRNA-interacting proteins during late schizogony 39 and the increased half-life of genes at this stage56. As expected, these genes are enriched in processes necessary for parasite invasion and the establishment of a niche within the erythrocyte that will allow for the uptake and utilization of nutrients from the extracellular space (Table S3 Biological process GO-term enrichment; vesicle mediated transport (GO:0006888, p-value = 4.52e-3) and cytoskeletal protein binding (GO:0008092, p-value = 8.59e-10) including kinase signaling, and microtubule formation. Conversely, genes with peak transcription occurring immediately following invasion outnumber any other hour of the IDC (Fig. 3; Peak Transcription 1 hpi, 684 genes; Table S3). These genes are enriched for RNA metabolic processes (GO:0016071, p-value = 7.96e-5), likely in preparation to increase cellular transcription levels necessary for parasite development and aid in post-transcriptional regulatory processes. Identification of genes stabilized prior to or transcribed upon invasion may facilitate the dissection of the cellular and molecular processes that govern the establishment of the parasite within a new erythrocyte and provide molecular targets for future investigation.
Nascent transcription captures clonal variation within the parasite population
One advantage to biosynthetic mRNA capture is the ability to detect and compare the level of transcriptional activity for individual genes. Due to the importance of parasite-derived erythrocyte surface antigens in modulating immunity to malaria infections, transcriptional regulation of these antigens is one of the most highly-studied areas of Plasmodium biology85,86. It has long been established that members of surface antigen gene families (vars, rifins, stevors, pfmc-2tms, and surfins) are transcribed during the late ring to early trophozoite stages of development and exhibit variation in their expression among parasite clones85. To identify the timing of transcription, we calculated the average normalized expression values (Log2 mean centered) for each gene family to determine the peak time of nascent transcription during the IDC (Fig. 4A). Transcription of the var and rifin gene families peaked at 11 hpi, while stevors reached their peak transcription at 12 hpi (Fig. 4A). Interestingly, the peak expression of surfins and pfmc-2tm genes occur at 10 and 16 hpi respectively (Fig. S6A), which may be explained by the demonstration that the transcriptional regulation of these two families occurs through different factors87. To determine which member(s) of these surface antigen gene families was being transcribed in this clonal population of transgenic parasites, we determined the expression values of each gene at the peak hour of transcription for each family of surface antigens (Fig. 4B). Our data demonstrate transcription of a single dominant var, as well as multiple rifin and stevor genes (Fig. 4B), supporting the mutually exclusive expression of only the var gene family of surface antigens. Although pfmc-2tm and surfin genes are also considered surface antigens, they do not demonstrate clonal variation (Fig. S5B), as reported previously88. Taken together, the biosynthetic labeling and capture of nascent transcription detects transcriptional variation within clonally variant gene families.
Regulatory dynamics of genes involved in erythrocyte invasion are predictive of protein function
A closer examination of the mRNAs stabilized during the schizont stage of development revealed that a majority encode proteins that function in the invasion of merozoites into erythrocytes. Because of this observation, we determined if these mRNAs are co-regulated by comparing their similarity in total abundance, transcription, and stabilization (Fig. 5A). As previously demonstrated13 we found that the total mRNA abundance profiles of invasion genes peak during schizogony (Fig. 5A). However, the levels of active transcription and stabilization vary drastically (Fig. 5A and B), revealing that there are at least two distinct regulatory mechanisms that dictate these patterns of total abundance. For example, a majority of mRNAs that encode merozoite surface proteins (msps) including msp7 and msp2 are predominantly regulated at the level of transcription and appear to be minimally influenced by stabilization (Fig. 5A and 5B). This may suggest that msps are rapidly transcribed and translated during schizogony for presentation on the surface of the merozoite prior to rupture from the red blood cell89. Conversely, although rhoptry genes rhoph2 and rhoph3 are also transcribed during the schizont stage, their peak abundance is contributed by mRNA stabilization (Fig. 5B), suggesting that these mRNAs may be translationally repressed or are slow to turn over until their gene products are necessary. Previous reports have demonstrated a “step-wise” role for proteins involved in invasion which is required after the merozoite has formed a tight interaction with the erythrocyte surface90. More recently, invasion proteins (such as RhopH2 and RhopH3) have been shown to assist in the establishment of nutrient transporters across the parasitophorous vacuolar membrane following invasion89,91-93. Although these genes are transcribed during the schizont stage, their stabilization guarantees that the mRNAs will be available for translation immediately upon invasion. Interestingly, the mRNA dynamics of both sera3 and sera4 reveal a balance of transcription and stabilization (Fig. 6B), potentially due to their dual role in two stages of the IDC, rupture as well as reinvasion. The SERA proteins have been demonstrated to be essential for schizont rupture and remain associated with the merozoite surface until attachment and invasion of a new red blood cell94.
Global mRNA dynamics aid in the prediction of cis -regulatory DNA-sequence motifs
High resolution capture of mRNA dynamics provides a more refined set of data for the identification of DNA regulatory motifs associated with co-transcription. Using the FIRE (Finding Informative Regulatory Elements) algorithm95, we used co-regulated mRNAs to predict DNA sequence motifs that are enriched in the 5’ untranslated regions of these genes (up to −1000bp from ATG start site) (Fig. 1C and Table S1). We found enrichment of 21 DNA motifs associated with co-transcription during the IDC (Fig. 6A). Of these 21 motifs, seven have been previously identified as ApiAP2 cis-interacting motifs96. To verify the accuracy of our motif identification, we calculated the average transcription of genes enriched for the known DNA recognition motif (GTGCA), which is bound by the third AP2 domain of PfAP2-I96, compared to the expression of pfap2-i (Fig. 6B). As expected for a transcriptional activator, genes containing the GTGCA regulatory motif are actively transcribed following the peak expression of pfap2-i (Fig. 6B). Additionally, our bioinformatic analysis identified 120 genes previously demonstrated to be directly regulated by PfAP2-I via ChIP-seq analysis (Fig. 6C)19. Conversely, transcription of genes enriched for the cis- regulatory motif ACAAC decreases following the transcription of ap2-g2 (Fig. 6C), which has been suggested to act as a transcriptional repressor during the IDC in Plasmodium berghei18. We compared the target genes enriched with the ACAAC DNA-motif with the AP2-G2 ChIP-seq gene targets and found that capture of nascent transcription, combined with FIRE analysis, identified greater than 50% of the reported targets of AP2-G218 (623/1222) (Fig. 6C). Taken together, using motif identification analysis, we can predict the potential regulatory role of ApiAP2 proteins based upon the transcriptional patterns of genes containing their cognate DNA-interaction motif compared to the mRNA dynamics of individual ApiAP2s (Fig. S7A). Finally, we used the nascent transcription data to predict the possible regulatory roles for ApiAP2s proteins for which no identified cis-motif has been reported96. To do this we compared the timing and intensity of all genes enriched for the orphan AGACA motif (Fig. 6A) to the transcriptional pattern of two functionally uncharacterized ApiAP2 protein-coding genes: pf3D7_1139300 and pf3D7_0404100. The peak timing for both of these apiap2 genes occurs just prior to a decrease in transcription of genes enriched with the AGACA motif (Fig S7B) (as seen for AP2-G2 (Fig. 6C)).
DISCUSSION
Cellular RNA levels are subject to extensive regulation involving alterations in the rates of RNA synthesis (transcription), processing (e.g., splicing, polyadenylation, transport), stabilization, and decay51. Data on gene expression levels in P. falciparum obtained by microarray or RNA-seq9,11,13,14,88 reflect steady state quantities of intracellular RNA which are a product of the rate of both genome-wide transcription and stabilization. However, a major constraint of these analyses is the inability of these methods to differentiate between transcription and stabilization which contribute to global abundance profiles throughout the IDC. As we have demonstrated, this constraint can be overcome by 4-TU biosynthetic labeling of newly transcribed mRNA providing direct isolation and analysis of nascent transcripts with minimal perturbations and toxic effects to the cell37.
Here we genetically modified P. falciparum to constitutively express the pyrimidine salvage pathway by single-copy genome integration, enabling the rapid incorporation of 4-TU into nascent mRNA transcripts. Using these parasites we labeled and captured nascent transcription and mRNA stability and determined their contribution to the total abundance profiles of each gene throughout intraerythrocytic development. Earlier transcriptome studies captured timing and mRNA abundance of each gene by DNA microarray analysis7,13, revealing a periodicity to gene expression during the 48h IDC. Here we demonstrate that the periodicity of each gene is defined by contributions from both transcription and stabilization. However, mRNA regulation is also time-dependent and gene-specific. Given the complexities of genome-wide mRNA dynamics, we examined the contribution of mRNA regulation to gene expression as the parasite progresses through the IDC revealing that transcription and stabilization do not contribute equally to the patterns of abundance observed for a given gene. Our results identify active transcription throughout the IDC (Fig. 1C), in contrast to previous reports which suggest that there is minimal transcriptional activity during the ring and schizont stages of development38. We also find that there are distinct periods of transcriptional and post-transcriptional regulatory bursts which coincide with breaks in protein production at the late-ring and trophozoite stages of development33, supporting the suggestion that replenishment of proteins at these stages is regulated at the level of gene expression31. Our data also support previous studies which suggest that the half-life of the transcripts increases during the intraerythrocytic developmental cycle56. However, this slow rate of decay applies to less than 25 genes and the majority of genes undergo a fairly consistent rate of decay through the IDC. Therefore, the degradation machinery is consistently active throughout the IDC40 and it is more likely that post-transcriptional regulation leads to stabilization of this small subset of genes. Interestingly, during the ring stage of development both the rate of transcription and the rate of decay are the highest, suggesting that ring-stage metabolism is dependent upon mRNA recycling. Perhaps this is due to the absence of active transcription of genes encoding for pyrimidine synthesis. This observation is consistent with reports that ring-stage parasites are more resistant to anti-malarial chemotherapies97 and supports evidence suggesting that they may be able to respond to stressors more readily than other stages98.
Beyond the examination of genome-wide P. falciparum mRNA dynamics throughout the IDC, these global transcriptional dynamics provide gene-specific profiles of nascent transcription and stabilization. From this we distinguished regulatory dynamics of genes with similar patterns of abundance and determine mutually exclusive expression of single genes within clonally variant families. Co-transcription and co-stabilization of genes provide a useful resource for the detection of cis- acting regulatory motifs and, the prediction of motifs with associated trans- factors. These orphan motifs can now be used to identify new DNA-binding proteins which remain uncharacterized. Our data, provides a comprehensive resource that will enable further investigation into the gene regulatory systems and mRNA metabolism of the malaria parasite.
Future studies will need to clarify the mechanisms regulating genome-wide rates of decay by evaluating the RNA-interacting proteins that regulate transcript stability throughout the IDC. By combining biosynthetic mRNA labeling and capture with crosslinking and mass spectrometry99-104 a complete RNA-protein interactome of developmentally regulated transcripts can now be achieved. Additional future applications of this method will also focus on measuring immediate transcriptional responses to environmental or chemotherapeutic treatments. This data set not only provides the malaria research community with measurements of the RNA dynamics for all individual genes throughout the IDC, but will also serve to enhance the bioinformatics analysis of transcription models, co-expression networks, and will aid in defining the timing of biological processes.
MATERIALS AND METHODS
Transgene construction
The open reading frame of the yeast FCU gene was PCR amplified from Plasmodium falciparum vector pCC1105, and cloned as a translational fusion into the unique AvrII and BsiWI sites of the pLN-ENR-GFP vector76, resulting in pLN-5’cam-FCU-GFP plasmid and placing transcriptional control under the P. falciparum calmodulin promoter (pf3d7_1434200). These plasmids were transformed into, replicated in, and isolated from DH5α E. coli for transfection into P. falciparum.
Strains and culture maintenance
P. falciparum strain 3D7attB 76 have been described in previous studies and parasite cultures were maintained under standard conditions106 at 5% hematocrit of O+ human erythrocytes in RPMI1640 containing hypoxanthine, NaH2CO3, HEPES, glutamine and 5 g/L AlbuMAX II (Life Technologies). The 3D7attB lines has been genetically altered to contain an acceptor attB site (from M. smegmatis) into the non-essential cg6 locus. This recombinant locus is maintained by continuous selection with 2.5 nM WR99210 (Jacobus Pharmaceuticals, Princeton, NJ)107.
Generation of genetically modified parasite lines
Transfection of P. falciparum strains was performed as previously described107. Briefly, 5-7% ring-stage parasite cultures were washed three times with 10 times the pellet volume of cytomix. The parasitized RBC pellet was resuspended to 50% hematocrit in cytomix. In preparation for transfection, 100µg of both pLN-5’cam-FCU-GFP and pINT plasmid, expressing a Mycobacterium Bxb1 integrase that mediates attP and attB recombination, were precipitated and resuspended in 100μL cytomix. The plasmid and 250μL of the 50% parasitized RBC suspension were combined and transferred to a 0.2cm electroporation cuvette on ice. Electroporation was carried out using a BioRad GenePulser set at 0.31kV, 960uF. The electroporated cells were immediately transferred to a T-25 flask containing 0.2mL uninfected 50% RBCs and 7mL medium. To select for parasites containing both plasmids, medium containing 1.5ug/µL Blasticidin S (Sigma-Aldrich) and 250 µg/mL G418 Sulfate (Geneticin®, Thermo Fisher Scientific) was added at 48 hours post transfection. Cultures were maintained under constant 1.5ug/µL Blasticidin S (Sigma-Aldrich) pressure, splitting weekly, until viable parasites were observed. Viable transgenic parasites were then cloned by limiting dilution and genomic DNA isolated to detect the presence of the integrated fusion gene of fcugfp was PCR verified using previously published primer pairs 76.
Verification of transgene expression
Western blot analysis of FCU-GFP protein expression was carried out on mixed stage transgenic P. falciparum (10% parasitemia, 5% hematocrit), isolated by saponin (0.01%) lysis. Protein extracts from both 3D7attB wild-type parasites and 3D7attB::fcu-gfp (clone A2) transgenic parasites were run on a SDS-10% polyacrylamide gel and transferred to a nitrocellulose membrane. The membrane was blocked with 5% milk for 30 min. To detect protein, the membrane was exposed to Sheep IgG anti-Cytosine Deaminase primary antibody (Thermo Fisher Scientific) diluted 1:500 in 3% bovine serum albumin and incubated overnight at 4 °C followed by a 1 hour incubation with goat-anti sheep-HRP secondary antibody (Thermo Fisher Scientific) diluted 1:1000 in 3% BSA. The membrane was incubated for 1 min in Pierce® ECL-reagent (Thermo Fisher Scientific) and protein detected by exposure to film.
Verification of transgene expression by immunofluorescence microscopy
FCU-GFP expression in transgenic parasite lines was verified by immunofluorescence imaging analysis. Mixed stage transgenic P. falciparum infected erythrocytes were fixed with 4% formaldehyde/0.0075% gluteraldehyde (Polysciences, Inc.) using the method of Tonkin et al. (Tonkin et al. 2004). Fixed cells were permeabilized with 0.01% Triton-X 100 (Sigma-Aldrich) and blocked with 10% normal goat serum and 3% bovine serum albumin (BSA). To detect FCU-GFP localization, the suspension was incubated with a 1:1000 dilution of a mouse anti-GFP antibody (Thermo Fisher Scientific) followed by secondary antibody Alexaflour-488 rat anti-mouse (Molecular Probes) diluted to 1:500. Cells were resuspended in Slowfade Antifade reagent with DAPI to stain parasite nuclei (Thermo Fisher Scientific) and mounted on slides using Flouromount-GTM (Southern Biotech). Fluorescent microscopic images were obtained with an Olympus BX61 system and deconvoluted using SlideBook 5.0 software (Intelligent Imaging Innovations).
Assessment of FCU-GFP transgene ability to salvage 4-thioluracil via Northern blot
Function and FCU-GFP transgene was verified in the 3D7attB::fcu-gfp compared to wild-type 3D7attb by the addition of 40 µM of 4-thiouracil (4-TU) (Sigma Aldrich) to the culture medium from a 200mM stock concentration prepared in DMSO as previously described37. After parasites were grown in the presence of 4-TU for 12 hours, total RNA was prepared from parasites in 5ml of Trizol and resuspended in DEPC-treated water to a final concentration of 0.5 µg/µL. As described in Painter et al.37, Total RNA was biotinylated with EZ-link Biotin-HPDP dissolved in formamide to a concentration of (Thermo Fisher Scientific) for three hours at room temperature under RNase-free conditions to reduce the amount of RNA degradation that can occur. Biotinylated RNA (2.5 µg) was run on an Ethidium Bromide 1% agarose gel at 4 °C for 30 min at 200V and was transferred to Hybond-N+ nitrocellulose (Amersham) membrane using traditional northern blotting techniques. The RNA was UV-crosslinked to the membrane and probed with streptavidin-HRP (1:1000) (Thermo Fisher Scientific). Biotinylated RNA was detected via incubation with ECL-reagent and exposure to film.
Thiol-labeling timecourse
P. falciparum transgenic parasites were cultured in human erythrocytes at 5% hematocrit in RPMI1640 complete medium as described above. All transgenic parasites were synchronized with three successive (48 hours apart) L-Alanine treatments to remove late stage parasites108. To ensure a sufficient amount of mRNA was labeled and captured for further analysis, we grew 48 individual parasite cultures to 15% parasitemia in 100 mL of medium at 5% hematocrit, one culture flask (150 cm2) for each timepoint. Upon invasion 40 µM 4-TU was added to the medium and parasites were incubated for 10 min followed by total RNA isolation with TriZol. This 4-TU addition, incubation, and RNA extraction was repeated every hour for 48 successive timepoints throughout intraerythrocytic development.
Biosynthetic modification and isolation of labeled mRNA throughout development
Briefly, transgenic P. falciparum was incubated with 4-TU and total RNA was extracted as described above. All RNA was subjected to the biosynthetic modification and magnetic separation as previously published with a few adjustments60,64,109. Specifically, 80 µg of total RNA (at a concentration of 0.4 µg/µl) was incubated at room temperature protected from light for 3 hours in the presence of 160 µl of 1 mg/ml solution of EZ-link Biotin-HPDP (Thermo Fisher Scientific). Biotinylated total RNA was precipitated and resuspended in DEPC-treated water to a final concentration of 0.5 µg/ml. Incorporation of 4-TU and biotinylation was determined by NanoDrop analysis and Northern blot probed with streptavidin-HRP. 4-TU labeled-biotinylated RNA was purified using Dynabeads® MyOne™ Streptavidin C1 magnetic beads (Life Technologies) at a concentration of 2 µl/µg of RNA. Beads were prewashed as per manufacturers protocol to remove any RNases and blocked with 16 µg of yeast tRNA (Life Technologies). 4-TU labeled-biotinylated RNA was added to the bead slurry and incubated at room temperature for 20min with rotation. The RNA-bead slurry was placed on a magnetic stand for 1 min, and liquid carefully removed and saved for RNA precipitation. This sample contained RNA that was not thiolated or biotinylated. Then the RNA-bound beads underwent five rounds of stringent washes with buffer consisting of 1M NaCl, 5mM Tris-HCL (pH 7.5), 500µM EDTA in DEPC-treated water. 4-TU labeled-biotinylated RNA was eluted from the beads with 5% 2-mercaptoethanol incubated for 10 min, placed back on the magnetic stand, liquid removed and saved for RNA precipitation. The RNA in this fraction contained mRNA transcribed in the presence of 4-TU from FCU-GFP expressing cells.
Reverse Transcription and Microarray Analysis
Total RNA, unbound, and bound fractions of mRNA are precipitated by traditional methods (0.1 vol NaCl, 0.1 vol linear acrylamide and 3 x volume 100% EtOH) followed by resuspension in 20 μl DEPC-treated water. The concentration of each sample is determined by NanoDrop ND-1000. Starting with 1.0-2.5 μg of RNA, single-strand aminoallyl-containing cDNA synthesis and Amersham CyDye-coupling (GE Healthcare) was carried out as previously described13 with the addition of control RNAs to aid in the normalization between samples and arrays (two color RNA spike-in kit, Agilent Technologies). To prevent Cy5 degradation by ozone110, all steps starting with dye resuspension were carried out in an ozone-free environment. Final cDNA concentration and dye-incorporation was assessed on a NanoDrop ND-1000 spectrophotometer. To reduce sample and time-point bias, cDNA from each time-point was labeled with Cy5 and combined with an equal amount of Cy3-labelled cDNA reference pool generated from equal amounts of ring, trophozoite, and schizont stage mRNA and hybridized to P. falciparum custom arrays (Agilent Technologies 60mer SurePrint platform, AMADID #037237111). Hybridized arrays were incubated for 16 hrs in a rotating hybridization oven (10 rpm) at 65 °C. Prior to scanning, arrays were washed in 6X and 0.06X SSPE (both containing 0.005 % N-lauryl-sarcosine (Sigma-Aldrich, St. Louis, MO, USA), followed by an acetonitrile rinse.
Array scanning and data acquisition
Arrays were scanned on an Agilent G2505B Microarray Scanner (Agilent Technologies) with 5μm resolution at wavelengths of 532 nm (Cy3) and 633 nm (Cy5) using the extended dynamic range (10–100%) setting. Normalized intensities were extracted using Agilent Feature Extractor Software Version 9.5 employing the GE2_1100_Jul11_no_spikein extraction protocol.
Array data normalization and expression value calculation
Each set of 48 arrays for Total, Labeled, and Unlabeled RNA was normalized using the Rnits (v1.2.0) from the Bioconductor project in R77. Arrays for 45 hpi contained an abundance of extreme outliers and were excluded from further analysis. Prior to data smoothing and fitting, probes that are Agilent Controls, rRNAs (28 probes), or found to be not unique (133 probes) were removed. The timecourse profiles of each probe were then smoothed using cubic splines and all negative values set to 0.5. For each probe, contribution of unlabeled (stabilized) and labeled (transcribed) RNAs was modeled to sum up to the total abundances, assuming that contributions are naturally non-negative:
In this equation α and β are the coefficients applied to the probe intensity values on the labeled and unlabeled RNA arrays, respectively. This problem of non-negative least squares is solved by the Lawson-Hanson algorithm (the nnls package in R). As a simple verification that the expression values estimated for both transcription and stabilization were accurate, the predicted total mRNA abundance was computed by adding the estimated unlabeled and labeled expression values. The Pearson correlation between the predicted total mRNA abundances and the conventional observed abundances was calculated. Each probe was summarized by gene using Tukey’s biweight with a parameter c = 5. Finally, three timecourses for each gene are plotted over 48 hrs. These data are available on http://www.PlasmoDB.org80 on individual gene pages or for download of the full dataset. The same data are available on Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo) under study accession number GSE66669.
Determination of Peak Timing and Rates of Transcription or Decay
We utilized the Lomb-Scargle periodogram approach to determine the peak time of total abundance, transcription, and stabilization for each gene as described previously112 using the available LombScargle package for R113. The resulting peak times for each gene can be found in Table S1. These peak times were used to order genes based on peak time of total abundance and to bin genes based on their peak times of transcription and stabilization.
Rates of transcription for each gene were calculated based upon the slope of the calculated expression values from the start of transcription to the adjacent peak. Conversely, rates of decay were calculated by determining the slope of the expression values of unlabeled RNA (stabilized fraction) from the peak to the following trough. Calculated rates of transcription and decay for each gene were then binned into groups based upon the peak timing of nascent transcription or stabilization of that gene (Table S2). These groups represent six major morphological stage transitions; early ring (0-10 hpi), mid-ring (11-15 hpi), late ring/early trophozoite (16-21 hpi), mid-trophozoite (22-26 hpi), late trophozoite (27-32 hpi) and schizont (33-48 hpi). The calculated rates are plotted as a box-whisker plot with Tukey’s Biweight distribution and median represented.
Motif and GO-term enrichment analysis
We identified motifs enriched in the 5’ UTRs of genes co-transcribed throughout the IDC using a regulatory element discovery algorithm (FIRE)95 and these results are represented as a heatmap. The activity profile of each motif was used to identify a refined list of target genes that were both enriched for the motif and similarly transcribed. Gene lists identified by FIRE analysis to be enriched for either the AP2-I or AP2-G2 motif were compared to the previously published target gene listed identified by ChIP-seq18,19.
GO-term enrichment of select gene groups or clusters was carried out using the Analysis Tool at http://www.PlasmoDB.org80 and terms that are significantly enriched (Bonferroni corrected p-value ≤ 0.05) are summarized in Table S3.
AUTHOR CONTRIBUTIONS
H.J.P. and M.L. designed the experiments. H.J.P. performed experiments and H.J.P. and N.C.C. analyzed the data. H.J.P. and M.L. wrote the manuscript and generated figures. N.C.C. and J.D.S. created and implemented the statistical model used for data analysis. A.S. and I.A. contributed to data analysis and public availability for hosting the data. M.L. and J.D.S. provided critical comments and project oversight.
MATERIALS & CORRESPONDANCE
Address all correspondence to Manuel Llinás, manuel{at}psu.edu.
ACKNOWLEDGEMENTS
This work was funded through support from NIH R21 AI133379. We would like to thank the many people who have contributed to this project including Simon Cobbold, Bjorn F.C. Kafsack, Jess O’Hara, Lindsey Orchard, Yoanna Pompalova, Joana Santos, April Williams, and the Genomics Core Facility at Princeton University and its staff, especially Donna Storton and Jessica Buckles Wiggins. We also thank Omar Harb, Brian Brunk and Shon Cade at http://www.PlasmoDB.org for making this data publicly available ahead of publication. Lastly, we thank Erik Allman, Gabrielle Josling, Scott Lindner and Timothy Russell for critical reading of the manuscript.