Messenger RNAs with large numbers of upstream open reading frames are translated via leaky scanning and reinitiation in the asexual stages of Plasmodium falciparum

Abstract
The genome of Plasmodium falciparum has one of the most skewed base-pair compositions of any eukaryote, with an AT content of 80–90%. As start and stop codons are AT-rich, the probability of finding upstream open reading frames (uORFs) in messenger RNAs (mRNAs) is high and parasite mRNAs have an average of 11 uORFs in their leader sequences. Similar to other eukaryotes, uORFs repress the translation of the downstream open reading frame (dORF) in P. falciparum, yet the parasite translation machinery is able to bypass these uORFs and reach the dORF to initiate translation. This can happen by leaky scanning and/or reinitiation. In this report, we assessed leaky scanning and reinitiation by studying the effect of uORFs on the translation of a dORF, in this case, the luciferase reporter gene, and showed that both mechanisms are employed in the asexual blood stages of P. falciparum. Furthermore, in addition to the codon usage of the uORF, translation of the dORF is governed by the Kozak sequence and length of the uORF, and inter-cistronic distance between the uORF and dORF. Based on these features whole-genome data was analysed to uncover classes of genes that might be regulated by uORFs. This study indicates that leaky scanning and reinitiation appear to be widespread in asexual stages of P. falciparum, which may require modifications of existing factors that are involved in translation initiation in addition to novel, parasite-specific proteins.


Introduction
Malaria affects millions of people in tropical and sub-tropical regions of the world. Over the years, many attempts have been made to control the disease and recently, the World Health Organization renewed the call towards the global eradication of malaria by 2030 (World Malaria Report, 2018). These efforts could be thwarted through resistance to anti-malarial drugs and as a result, new drug targets are still being identified, especially for the most virulent species of the parasite that causes malaria: Plasmodium falciparum. In this parasite, an essential pathway proposed as a drug target is the protein synthesis machinery (Goodman et al., 2016). Plasmodium falciparum employs eukaryotic protein synthesis machinery for translation of nuclear-encoded mRNAs, however, an endosymbiotic organelle, the apicoplast, carries out translation using machinery that resembles prokaryotes (Fichera and Roos, 1997;Roy et al., 1999;Chaubey et al., 2005) and is the target of several anti-malarial drugs (Goodman et al., 2016;Pasaje et al., 2016). Interestingly, the cytosolic translation machinery of P. falciparum also exhibits differences from that of the human host (Jackson et al., 2011;Wong et al., 2014) and hence, has also been proposed to be a drug target (Wong et al., 2017;Sheridan et al., 2018). Another noteworthy distinction between the cytosolic translation of the parasite and its host is a result of the high overall content of adenine and thymine (80.6%) of the P. falciparum genome that rises to 90% in the intergenic regions (Gardner et al., 2002). As start and stop codons are AT-rich, this biased sequence leads to the presence of numerous upstream open reading frames (uORFs) in the 5 ′ leader sequences of messenger RNAs (mRNAs) (Caro et al., 2014;Kumar et al., 2015).
Unlike human and mouse genomes, from which ∼50% of the transcribed mRNAs would be expected to contain one or more uORFs (Calvo et al., 2009;Ye et al., 2015), 96.5% of P. falciparum mRNAs are predicted to contain at least one uORF per coding sequence (CDS) (Srinivas et al., 2016). An average of ∼4 uORFs has been predicted within 350 nucleotides upstream of the start codon of each CDS (Kumar et al., 2015). Such a widespread presence of large numbers of uORFs has serious implications for translation. The scanning model of translation initiation in eukaryotes proposes that the ribosome recognizes the 5 ′ cap and moves along with the transcript until the start codon of the CDS is reached (Kozak, 1978;Aitken and Lorsch, 2012;Hinnebusch et al., 2016). Scanning ribosomes will surely encounter numerous uORFs in the transcripts of P. falciparum before reaching downstream ORFs (dORFs), which are the protein-coding sequences (CDS) that encode the multitude of proteins required for the parasite's complex life cycle.
As is the case with other eukaryotes (Child et Kumar et al., 2015). This post-transcriptional gene regulation (PTGR) mediated by uORFs takes place by engaging or stalling the ribosome at the uORF, subsequently decreasing the probability of the ribosome to initiate translation of the dORF (Morris and Geballe, 2000). It is remarkable that despite the frequent occurrence of uORFs in the majority of transcripts of P. falciparum, ribosomes are still able to translate the coding sequences present downstream (Lasonder et al., 2002). One mechanism through which the ribosome can reach the start codon of the dORF is leaky scanning where the ribosome skips the start codon of the uORF, continues scanning and reaches the AUG of the dORF to start translation (Kozak, 1984;Liu et al., 1984). Another mechanism employed by eukaryotic ribosomes to reach the dORF is reinitiation, where the ribosome translates the uORF and then reinitiates translation at the downstream AUG (Hughes et al., 1984;Kozak, 1984;Hinnebusch et al., 2016;Johnstone et al., 2016). The mechanism employed by the ribosome to reach the dORF is dictated by sequence features of the mRNA leader such as length of the uORF and the inter-cistronic length, which have been discussed in detail in this study.
Considering the large number of upstream AUGs (uAUGs) and uORFs present in P. falciparum mRNAs, it is of interest to understand how ribosomes are able to reach the dORF. In this report, we study the stages of the intra-erythrocytic developmental cycle (IDC) to investigate the effect of varying three features, viz. uORF length, inter-cistronic length, and Kozak sequence, on the translation of a dORF in P. falciparum. Codon usage is also expected to affect the ability of a uORF to influence translation at the dORF; however, as codon usage tables are available for P. falciparum (Saul and Battistutta, 1988;Nakamura et al., 2000) and algorithms for calculating codon adaptability indices (CAI) are also well studied (Sharp and Li, 1987), this feature is not assessed in this report. Classes of genes that contain uORFs having each of the features are further studied using gene ontology (GO) enrichment analysis. We also attempt to shed light on the conundrum that the majority of P. falciparum transcripts contain multiple uORFs, yet the translation of the CDS still takes place. This work brings new insights into the mechanisms of cytoplasmic translation of mRNAs during the asexual life cycle of P. falciparum.

Materials and methods
Mutational analysis of the Kozak sequence of the reporter gene
Plasmid Pf86 (a kind gift from Kevin Militello and Dyann Wirth, Harvard School of Public Health, Boston) contains a firefly luciferase reporter gene which is flanked by the 5′ leader sequence and 3′ untranslated region (UTR) of the gene for P. falciparum heat shock protein 86 (Pfhsp86; PF3D7_0708400) (Militello et al., 2004). In this plasmid, 1837 base pairs of DNA (encompassing the promoter region and a leader sequence of 686 base pairs from Pfhsp86) are cloned upstream of the firefly luciferase coding region. To mutate the Kozak sequence of luciferase, the plasmid was treated with BstBI restriction enzyme (Thermo Fisher Scientific) whose recognition sites are present 16 nucleotides upstream and 165 nucleotides downstream of the start codon of a luciferase reporter gene. This fragment was replaced by a fragment which had a different Kozak sequence. DNA fragments with different Kozak sequences were generated by site-directed mutagenesis (SDM) of plasmid Pf86 using a set of forward primers to introduce desired Kozak sequences and a common reverse primer complementary to the ∼165 nucleotide region. The primers used for this cloning are included in Additional File 1. The polymerase chain reaction (PCR) products were then digested with BstBI and ligated to BstBI-digested plasmid Pf86. The resulting colonies were screened for the presence of an insert with the desired mutation by restriction digestion as well as colony PCR. Clones were confirmed by sequencing.
Generating clones for recombinant firefly luciferase, expressed in bacteria
Wild type firefly luciferase had 'G' at its +4 position. To generate mutants of the gene with 'A'/'T'/'C' at +4 positions, SDM was performed with PCR using a forward primer having the desired mutation and a reverse primer complementary to the end of the CDS. Sequences of the primers are included in Additional File 1. This PCR fragment and the vector pET43a were separately digested with XhoI and NdeI restriction enzymes (Thermo Fisher Scientific). The insert fragment was ligated in the digested vector and transformed into Escherichia coli DH5α. Colonies were screened by digestion and clones were confirmed by sequencing. The resulting plasmid had a luciferase gene with mutations at the +4 position driven by the lac promoter and a C-terminal 6-Histidine tag for purification.
Induction and purification of variant forms of firefly luciferase
pET43a plasmids containing the firefly luciferase coding sequences with 'A', 'T', 'G' and 'C' at +4 positions were transformed into E. coli BL21 competent cells. To obtain variants of luciferase protein, the secondary culture of each variant was induced with 0.25 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). This was followed by 16 hours of incubation at 18°C under shaking conditions until the OD600 reached 0.6. The harvested cells were re-suspended in 15 mL of ice-cold binding buffer (50 mM NaH2PO4, 300 mM NaCl and 10 mM imidazole). The cells were lysed by sonication (Vibrosonics) at 60% amplitude with a pulse of 13 s (3 s ON and 10 s OFF) for 30 min on ice. Cell lysates were centrifuged at 16 000 g at 4°C for 15 min. The supernatants were loaded onto pre-equilibrated nickel-nitrilotriacetic acid (Ni-NTA) columns (Qiagen Ni-NTA Superflow Cartridge) and the proteins were purified as per the manufacturer's protocol. The enzymes were eluted in 5 mL of elution buffer (50 mM NaH2PO4, 300 mM NaCl and 500 mM imidazole) and dialysed overnight at 4°C in dialysis tubes (Spectrum Labs Float-A-Lyzer G2 Dialysis Device) in 1.5 L of dialysis buffer (50 mM NaH2PO4 and 300 mM NaCl). Purified enzymes before and after dialysis were analysed on sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) gels to check the purity of the enzymes.

Luciferase assay of recombinant luciferase proteins
Protein concentrations of purified luciferase enzyme variants (+4G, +4C, +4A and +4T) were quantified using the bicinchoninic acid (BCA) kit by following the manufacturer's protocol (Sigma-Aldrich). An equal amount of each luciferase variant was diluted in Passive Lysis Buffer (Promega) and freeze-thawed for three cycles in liquid nitrogen to mimic the conditions of the luciferase assay done with parasites. The assay was performed using 90 μL of LAR reagent (Promega) and 10 μL of diluted proteins. Readings were captured for 30 s using a luminometer (Berthold Junior LB 9509).
Removal of the native uORF from the Pf86 plasmid
Plasmid Pf86 was used to test the effect of synthetic uORFs on the expression of the firefly luciferase reporter gene. The native uORF present 474 bases upstream in the 5′ leader was removed by mutating the start codon (ATG) to TTG by SDM to get Pf86*, a plasmid devoid of any uORF in the 5′ leader. Primers used for this cloning are included in Additional File 1.
Site directed mutagenesis to introduce uORFs
SDM was used to introduce different uORFs in the 5′ leader sequence of Pfhsp86 cloned in plasmid Pf86*. Specific regions in the 5′ leader sequence were chosen for the introduction of the uORFs. This choice was dictated by the sequence of the 5′ leader since the presence of AT-rich repeats resulted in technical problems with primer-binding during the procedure of SDM. Hence, the regions 191, 40 and 29 nucleotides upstream of the start codon of the luciferase reporter gene were chosen, as they had enough GC content to allow for annealing of the SDM primers. Non-overlapping forward and reverse primers containing the desired mutation were phosphorylated by polynucleotide kinase (New England Biolabs) as per the manufacturer's protocol. PCR was carried out by Q5 HiFi DNA polymerase (New England Biolabs). Primers used for SDM are included in Additional File 1. Purified PCR products were treated with FastDigest DpnI restriction enzyme (Thermo Fisher Scientific) to eliminate the parental plasmid vector. Linear PCR products were circularized by ligation with T4 DNA ligase (Thermo Fisher Scientific). The final product was transformed into competent E. coli DH5α. The clones were confirmed to contain the desired mutation by sequencing.

Increasing the length of uORF4 by introducing repeating units
The sequence of uORF4 was designed in a way that it contained a recognition site for restriction enzyme AvrII (Thermo Fisher Scientific). The length of this uORF was thus increased by digesting the plasmid uORF4-191 and ligating it to annealed oligonucleotides of increasing length. In this process, the recognition site was lost after ligation. Screening was done by restriction digestion reaction with AvrII and colony PCR to confirm the presence of the insert. The sequences of the oligonucleotides used to generate the plasmids with increasing uORF4 length are included in Additional File 1.

Transfection and luciferase assay
Transient transfection of P. falciparum 3D7 was carried out by following the pre-loaded RBCs protocol (Deitsch et al., 2001). A plasmid containing the Renilla luciferase gene (pPfrluc) was used as a control to analyse different transfection reactions. Renilla luciferase acts as an internal control to normalize the firefly luciferase readings and thus eliminate any error due to varying transfection efficiencies. An amount of 100 μg each of the plasmids (the desired plasmid construct with insertion/point mutation and the plasmid containing the Renilla control) was co-transfected into uninfected RBCs. The transfected RBCs were washed with 10 mL of the RPMI medium. Late trophozoite stage infected RBCs were added to the pre-loaded RBCs to give a final parasitaemia of 0.2-0.4%. The medium was changed every 24 h until the luciferase assay was carried out 85-90 h post transfection. The parasite pellet obtained after saponin treatment was re-suspended in 60 μL of 1X Passive Lysis Buffer (Promega). Parasites were lysed by flash freezing in liquid nitrogen and thawing at 37°C for three cycles followed by centrifugation at 6300 g for 2 min to remove the cell debris. Luciferase assay was performed using the Dual-Luciferase kit (Promega) as per the manufacturer's instructions. Relative Light Units (RLUs) were measured over 30 s with a luminometer (Berthold Junior LB 9509).
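The normalization step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that normalization is a simple firefly/Renilla ratio per lysate and that each construct is then expressed as a percentage of a control construct; the function names and the readings are hypothetical and are not taken from the study.

```python
from statistics import mean


def normalized_ratio(firefly_rlu, renilla_rlu):
    """Firefly signal divided by the Renilla signal from the same lysate."""
    if renilla_rlu <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly_rlu / renilla_rlu


def percent_of_control(construct_reads, control_reads):
    """Mean normalized ratio of a construct as a percentage of the control.

    Each argument is a list of (firefly_rlu, renilla_rlu) pairs, one pair
    per replicate transfection.
    """
    construct = mean(normalized_ratio(f, r) for f, r in construct_reads)
    control = mean(normalized_ratio(f, r) for f, r in control_reads)
    return 100.0 * construct / control


# Made-up readings, for illustration only (not data from the study).
control = [(2000.0, 1000.0), (4000.0, 2000.0)]
construct = [(500.0, 1000.0), (1500.0, 1000.0)]
print(percent_of_control(construct, control))
```

Dividing by the Renilla reading before averaging means that a replicate with poor transfection efficiency contributes a ratio rather than a raw signal, which is the purpose of the internal control.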

Gene ontology (GO) term enrichment analysis
Previous studies have used a fixed length of leader sequence (i.e. ∼350 nucleotides) for each gene to extract uORFs giving an average of ∼4 uORFs per CDS in P. falciparum (Kumar et al., 2015;Srinivas et al., 2016). Since the length of the leader sequence varies for each gene, hidden Markov model (HMM)-defined leader sequences were used to extract uORFs in this study for more accuracy (Caro et al., 2014). A Python script was written to extract uORFs from the 5′ leader sequences of 3137 transcripts that are expressed in asexual blood stages. The nucleotide at the +4 position of each uORF, the length of the uORF, and the inter-cistronic length from the CDS were calculated. The list of uORFs with these features is given in Additional File 2. Categorization of uORFs based on different features (nucleotide at +4 position, length and inter-cistronic length) was done. GO term enrichment analysis was done by using the GO enrichment tool of PlasmoDB.
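The script used in the study is not reproduced here. The sketch below only illustrates the kind of extraction described, under several stated assumptions: a uORF is taken to be an ATG in the leader followed by the first in-frame stop codon that also lies within the leader, the leader is assumed to end immediately before the ATG of the CDS, the +4 nucleotide is the base directly after the ATG, and the inter-cistronic length is counted from the base after the stop codon to the end of the leader. The function name and record fields are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(leader):
    """Return uORFs fully contained in a 5' leader sequence (DNA, 5'->3').

    Assumption: the leader ends immediately before the ATG of the CDS, so
    uORFs that would run into the CDS are not reported.
    """
    leader = leader.upper()
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                end = pos + 3  # index just past the stop codon
                uorfs.append({
                    "start": start,                      # 0-based index of the A of ATG
                    "plus4": leader[start + 3],          # base right after the ATG
                    "codons": (pos - start) // 3,        # sense codons, ATG included
                    "intercistronic": len(leader) - end  # bases between stop and CDS ATG
                })
                break
    return uorfs


# Toy leader, for illustration only.
for record in find_uorfs("CCATGGAATAACCCCC"):
    print(record)
```

Every ATG is treated as a potential start, so overlapping uORFs in different frames are reported separately; whether the original analysis did the same is not stated in the text.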

Codon Adaptability Index (CAI) calculations
The CAI of all the uORFs used in this study was calculated by using the algorithm described previously (Sharp and Li, 1987). The frequency of each codon in P. falciparum was obtained from previously published work (Nakamura et al., 2000).
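As a rough illustration of the calculation, the CAI of Sharp and Li (1987) is the geometric mean of the relative adaptiveness w of each codon, where w is the frequency of a codon divided by the frequency of the most-used synonymous codon. The sketch below assumes the standard genetic code, skips Met, Trp and stop codons, and uses a made-up two-amino-acid frequency table; a real calculation would use the published P. falciparum codon usage table, and every codon scored must have a positive frequency in that table.

```python
import math
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(codon_freq):
    """w = frequency of a codon / frequency of its most-used synonym."""
    best = {}
    for codon, freq in codon_freq.items():
        aa = CODON_TO_AA[codon]
        best[aa] = max(best.get(aa, 0.0), freq)
    return {c: f / best[CODON_TO_AA[c]] for c, f in codon_freq.items()}


def cai(orf, codon_freq):
    """Geometric mean of w over the scorable codons of an ORF (DNA string)."""
    w = relative_adaptiveness(codon_freq)
    orf = orf.upper()
    logs = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if CODON_TO_AA[codon] in "MW*":
            continue  # no synonymous choice (Met, Trp) or a stop codon
        logs.append(math.log(w[codon]))
    if not logs:
        raise ValueError("ORF has no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical frequencies (not the P. falciparum table).
toy_freq = {"AAA": 80.0, "AAG": 20.0, "GAA": 90.0, "GAG": 10.0}
print(cai("ATGAAAGAATAA", toy_freq))  # only the most-used codons
print(cai("ATGAAGGAGTAA", toy_freq))  # only the rarer synonyms
```

An ORF built only from the most-used codon of each amino acid scores 1, which is how a synthetic uORF can be designed to have a CAI of 1.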

Results
The +4 position of the Kozak sequence plays a major role in translation initiation
Previous work indicated that the +4 position of the Kozak sequence plays a critical role in translation initiation while the nucleotides preceding the start codon (−5 to −1) have no significant contribution towards the strength of the Kozak sequence in P. falciparum (Kumar et al., 2015). In that report, the effect of a 'T' at the +4 position could not be determined, as changing the 'G' at the +4 position to 'T' in the luciferase expression vector (Pf86, designated as control) altered the wild type codon 'GAA' to a stop codon. To address this lacuna, we decided to change the +5 position to 'C' in Pf86 such that now a 'T' at the +4 position would generate a TCA codon. In order to facilitate a direct comparison to the data reported for 'A', 'C' and 'G' at the +4 position, the experimental approach taken in this section was identical to the one described in the study by Kumar et al. (2015).
The control plasmid Pf86 was modified to generate seven constructs where the luciferase start codon was surrounded by Kozak sequences all of which had 'T' at the +4 position and different sequences from the −5 to −1 positions. These were transiently transfected into P. falciparum 3D7 parasites along with 50 μg of plasmid containing Renilla luciferase as a control for transfection efficiency. The activity of the reporter gene in each of these mutants was measured and compared with that of wild type Pf86. Attempts to quantify luciferase transcript levels were unsuccessful due to the low transfection efficiency that has been reported for P. falciparum (Deitsch et al., 2001;Skinner-Adams et al., 2003;Hasenkamp et al., 2012;Rug and Maier, 2013). Therefore, the results shown in this section have been described as a combined effect of variation in transcription and translation of the luciferase reporter gene owing to the changes introduced in the Kozak sequence; hence, the word 'expression' is used. When the expression of constructs with variant Kozak sequence with 'T' at the +4 position was assayed, it was observed that despite differences in the −5 to −1 positions, the luciferase expression for each construct with 'T' at +4 position was ∼60% of the control (Fig. 1A). Consistent with the earlier study, the strength of the different Kozak sequences having 'T' at the +4 position and varying nucleotides at the −5 to −1 positions, does not correlate with the frequency of each Kozak sequence in the genome of P. falciparum (Fig. 1A).
It is important to note that changing the +4 position changes the second amino acid of the reporter gene from wild type glutamate (GAA) to glutamine (CAA), lysine (AAA) or serine (TCA). The amino acid differences at the second position might have effects on luciferase reporter activity, as deleting the first seven amino acid residues from firefly luciferase resulted in a significant decrease in the enzyme activity of the protein (Wang et al., 2002). If this was the case for the mutants reported here, the altered luciferase activities would be the outcome of altered luciferase enzyme, rather than alterations in the expression levels of the enzyme due to the Kozak sequence.
In order to eliminate this possibility, the four variants of the luciferase gene (corresponding to the luciferase proteins generated by making mutations at the +4 position) were cloned into a pET43a expression vector and expressed as recombinant proteins (Fig. 1B). The activities of these recombinant luciferase enzyme variants were compared to the wild type luciferase enzyme that has glutamate immediately following the start methionine. It was found that changing the second amino acid from glutamate (GAA) to glutamine (CAA), lysine (AAA) or serine (TCA) in the firefly luciferase enzyme did not change the luciferase activity significantly (Single-factor ANOVA test; P = 0.1026) (Fig. 1C). Extrapolating from this data, luciferase enzyme activity in parasites transfected with constructs containing each of the four nucleotides at the +4 position of the Kozak sequence in Pf86, would not result from non-functional enzymes and instead, could be explained by changes in translation initiation driven by altered strengths of the P. falciparum Kozak sequence.
By incorporating the data shown here for 'T' at the +4 position with the data published by Kumar et al. (2015), the average firefly luminescence readings obtained for different Kozak sequences were grouped according to the nucleotide at the +4 position ( Fig. 1D). Constructs that had Kozak sequences with 'G' at the +4 position show the highest reporter gene activity (∼95% as compared with the wild type variant), while constructs containing Kozak sequences with 'A' at the +4 position show the lowest activity (∼8%) of the reporter gene. The constructs with 'T' or 'C' at the +4 positions showed intermediate reporter gene activities of ∼64% and 30%, respectively.
These data led us to assess the +4 positions of coding sequences (CDS) and uORFs in the P. falciparum 3D7 genome (PlasmoDB v24), and using bioinformatics, the sequences of all uORFs were extracted from 5 ′ leader sequences of 3137 genes expressed in different stages of the IDC (Additional File 2). Data for the 5 ′ leader lengths was exported from a previous study which predicted the length of leader sequences using RNA sequencing of ribosome footprints in RNAs isolated from asexual cultures (Caro et al., 2014).
Out of 3137 transcripts, only 67 contained no uORF. In contrast, 3070 transcripts (i.e. greater than 97% of the transcripts analysed) contained at least one uORF, as reported earlier (Kumar et al., 2015). A total of 36 769 uORFs were predicted from 3070 transcripts. To understand the distribution of uORFs across the genome, the number of uORFs was plotted against the number of CDS (Fig. 2A). A large number of transcripts have uORFs in the range of 6-10 with an average of 11 uORFs per CDS. However, the transcript of the gene PF3D7_1107800 (AP2 domain transcription factor, putative) contains 123 uORFs, which is the highest number of uORFs reported for any gene in P. falciparum (Additional File 2). It was also observed that the number of uORFs is strongly correlated with the length of the leader sequence with a high R-square value of 0.8144 (Fig. 2B). These data suggest that uORFs are dispersed along the 5′ leader lengths in P. falciparum with longer leader sequences (Watanabe et al., 2002;Caro et al., 2014) having a higher probability of containing uORFs. Having extracted uORFs from the genome, analysis of the nucleotide at the +4 position of their Kozak sequences was performed.
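For readers who want to reproduce this kind of statistic from Additional File 2, the R-square of a simple linear fit of uORF count against leader length equals the squared Pearson correlation of the two variables. The sketch below is a generic illustration with made-up numbers; it assumes an ordinary least-squares fit with an intercept, which the text does not state explicitly.

```python
def r_squared(x, y):
    """R-square of a least-squares line y = a + b*x (squared Pearson r)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy ** 2 / (sxx * syy)


# Hypothetical leader lengths (nt) and uORF counts, for illustration only.
leader_lengths = [200, 400, 600, 800, 1000]
uorf_counts = [3, 5, 9, 10, 14]
print(r_squared(leader_lengths, uorf_counts))
```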
The frequency of finding each of the four nucleotides at the +4 position of the Kozak sequences of annotated CDS and uORFs was computed ( Fig. 3A and B). This genome-wide analysis revealed that CDS are most likely to have 'A' at the +4 position (47%) while 'G' follows next, being found in 30% of the CDS. The proportion of CDS with 'T' and 'C' at their +4 position is 16% and 7%, respectively. On the other hand, only 8% of uORFs had 'G' at the +4 position, and 9% of uORFs had 'C' at the +4 position. As expected, 'T' was seen in the +4 position of 48% of the uORFs and 'A' was found in 35%, consistent with the AT-bias of intergenic regions (>90% AT) being higher than that of the CDS (∼60-70% AT).
Given that the 'G' at the +4 position resulted in a strong Kozak sequence, it is likely that 92% of uORFs have lower probabilities of engaging the ribosome. However, 8% of the total uORFs present in the P. falciparum genome have a 'G' at the +4 position, indicating that ∼2900 uORFs engage the ribosome with high probability and have the potential to repress expression of the downstream CDS. These uORFs may result in lower translation efficiency of the dORFs with which they are associated or may play regulatory roles in translation. Alternatively, although reinitiation is not commonly observed in model eukaryotes, in P. falciparum, these data suggest that thousands of uORFs may have properties that allow the ribosome to reinitiate at the dORF.
Reinitiation is suggested by another observation from this analysis. Interestingly, 56% of the 36 769 uORFs predicted in 3070 protein-coding transcripts have 'G' or 'T' at the +4 position (Fig. 3B), and we show that these types of features yield a stronger Kozak sequence as compared to the ones which have 'A' and 'C' at their +4 position (Fig. 1D). This is consistent with a previous study, which reports that approximately half of the total ribosome footprint coverage in 5 ′ leader sequences of mRNAs overlaps with predicted uORFs (Caro et al., 2014).
In an attempt to find categories of genes with uORFs having different Kozak sequences, GO term analysis was performed with a P value cut off of 0.05 using the Gene Ontology Enrichment tool in PlasmoDB Release 41 (Aurrecoechea et al., 2009). GO terms were enriched for gene sets that are associated with uORFs having Kozak sequences with different nucleotides at their +4 position (Additional File 3). The terms that were common between all four sets had high − log 10 (P value) in the range of 15 and were associated with DNA replication, translation, response to stress, cellular transport, and localization (data not shown). This indicates that uORFs with different nucleotides at +4 position are distributed across different classes of genes.
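The enrichment P values come from the PlasmoDB tool, whose implementation is not described here. As a hedged illustration of the underlying statistic, a one-sided Fisher's exact test for over-representation is the upper tail of a hypergeometric distribution. The function name, argument names and counts below are hypothetical.

```python
from math import comb, log10


def enrichment_p_value(overlap, set_size, term_size, background_size):
    """One-sided Fisher's exact test: P(overlap >= observed).

    overlap         genes in the set that carry the GO term
    set_size        genes in the set being tested
    term_size       genes in the background that carry the GO term
    background_size all genes in the background
    """
    total = comb(background_size, set_size)
    upper = min(set_size, term_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Made-up counts: 12 of 200 genes in the set carry a term that is
# present on 40 of 3000 background genes.
p = enrichment_p_value(12, 200, 40, 3000)
print(p, -log10(p))
```

A term passes the cut-off used in the text when this P value is below 0.05; the −log10 transform is what places strongly enriched terms far from the origin in a plot.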
Chhaminder Kaur et al.
However, classes of genes having GO terms that were different in each set were seen at lower values of − log 10 (P value). These classes of genes are shown in the radar chart according to their P value from Fisher's exact test (Fig. 3C). Genes involved in tightly regulated processes such as protein sumoylation and methylation of macromolecules including histones, DNA, tRNA and proteins, emerged in the set of genes associated with uORFs having 'G' at their +4 position. Apart from these broad categories, specific genes such as the translocon component PTEX88/150, involved in translocation of proteins to the host RBC surface, were also seen in this set. Interestingly, an analysis of all CDS in the genome, regardless of whether the gene is transcribed during the IDC, showed a large enrichment of genes involved in host-parasite interactions having 'G' at the +4 position of their Kozak sequences. These genes include the var, rifin and stevor genes that play roles in antigenic variation, suggesting that these gene families may be under translational control. Indeed, a member of the var gene family, the var2csa gene, also has a uORF with a Kozak sequence having 'G' at the +4 position and this uORF plays a role in translational repression of the dORF (Lavstsen et al., 2003;Amulic et al., 2009;Bancells and Deitsch, 2013;Chan et al., 2017). Analysis of gene sets associated with uORFs having nucleotide 'A' and 'C' at their +4 position (weak Kozak context) revealed gene categories that are mainly involved in housekeeping functions. Phosphatases and polyadenylation factors were enriched in the gene set having uORFs with nucleotide 'C' at the +4 position. On the other hand, genes that are involved in the processing of tRNA, rRNA and ncRNA were enriched in the gene set having uORFs with nucleotide 'A' at +4 position. Proteins involved in RNA Polymerase III and RNA Polymerase I transcription were enriched in gene sets with uORFs having 'A' and 'C' at the +4 position, respectively. Since having nucleotides 'C' or 'A' at the +4 position does not correspond to a strong Kozak sequence, it is likely that the ribosome can scan past these uORFs and reach the dORF by leaky scanning.
A list of these uORFs along with their genomic location and dORF has been given in Additional File 2.
Decreasing the inter-cistronic length leads to increased repression of reporter gene expression
The presence of 8% of uORFs having strong Kozak sequences with a 'G' at the +4 position suggested that in addition to leaky scanning, the translation machinery of P. falciparum might also employ reinitiation to reach the dORF. For reinitiation to occur, after translation termination at the uORF, a fraction of ribosomes would remain associated with the mRNA (Morris and Geballe, 2000). Successful reinitiation at the dORF would depend on the probability of these ribosomes regaining the ternary complex (eIF2, GTP, and initiator Met-tRNAiMet). In mammalian cells, longer inter-cistronic lengths between the uORF and the dORF lead to efficient reinitiation of the dORF (Kozak, 1987;Luukkonen et al., 1995;Child et al., 1999) and similar results have been reported for the var2csa gene in P. falciparum (Bancells and Deitsch, 2013). It has been proposed that the long inter-cistronic lengths allow the ribosomes to reacquire the ternary complex thereby increasing the probability of reinitiation.
Therefore, the effect of changing the inter-cistronic length on expression of the dORF was tested. Here, the 5 ′ leader sequence of the hsp86 gene cloned upstream of a luciferase reporter, was analysed in the plasmid Pf86. The 5 ′ leader sequence has a native ORF (7 amino acids in length), 474 bases upstream of the luciferase start codon. This uORF was termed native-474 and its translatability score based on the Codon Adaptability Index (CAI) (Sharp and Li, 1987) was computed to be 0.759. This uORF has a 'T' at the +4 position of the Kozak sequence, a nucleotide that should give an intermediate Kozak strength (∼65% that of 'G' at the +4 position).
The start codon of the native ORF was mutated in order to generate a construct which does not have any uAUG/uORF (termed Pf86*). In the subsequent experiments, Pf86* was taken as a control and activity of luciferase obtained by this construct was taken as 100%. Activity from all the subsequent constructs has been shown as a percentage of the control. The constructs were transiently transfected into P. falciparum 3D7 parasites along with plasmid containing Renilla luciferase as an internal control of transfection. Attempts to quantify luciferase transcripts did not yield results because of the characteristically low transfection efficiency observed for P. falciparum (Deitsch et al., 2001;Skinner-Adams et al., 2003;Hasenkamp et al., 2012;Rug and Maier, 2013). Therefore, in the results shown in this section and the subsequent sections, the term 'expression' has been used to describe a combined effect of variation in transcription and translation of the reporter gene due to the changes introduced in the leader sequence.
When the expression of luciferase from Pf86* was compared to that from the plasmid containing native-474, it was seen that the native-474 construct gave ∼40% of the luciferase activity of Pf86*. Therefore, mutating the start codon of the native ORF led to a 2.5-fold increase in the expression of the reporter gene (Fig. 4a). This result is consistent with a previous study which reports that even one uAUG/uORF can repress expression of dORF (Kumar et al., 2015). To test the effect of changing intercistronic distance, the native uORF was moved to a position corresponding to 29 bases upstream of the luciferase start codon (native-29). Decreasing the inter-cistronic length to 29 nucleotides gave ∼20% luciferase expression (Fig. 4a) compared to Pf86*.
Another uORF, uORF1 (a synthetic uORF with a high CAI value of 1), was introduced at either 474 or 29 nucleotides upstream of the start codon of luciferase. For these constructs, uORF1-474 and uORF1-29 respectively, luciferase expression reduced to 55% and 12% of the Pf86* control (Fig. 4b). The +4 position for the Kozak sequence of uORF1 was 'C', a nucleotide that shows weaker ability to engage the ribosome (∼30% that of 'G' at the +4 position). Both the uORFs tested, native uORF and uORF1, showed approximately equal levels of repression suggesting that repression depends on a combination of Kozak sequences and CAI scores. Finally, a short upstream ORF, uORF2 (a synthetic uORF with a high CAI value of 1), coding for two amino acids and having a 'T' at the +4 position of the Kozak sequence, was introduced in the 5 ′ leader. Despite having a Kozak sequence of a strength corresponding to 64% of that of the strongest Kozak tested, the introduction of uORF2 191 nucleotides upstream of the start codon (uORF2-191) did not reduce luciferase expression. The combination of a strong Kozak sequence with a short coding sequence of merely two amino acids appears to make this uORF one that allows the ribosome to reinitiate at the luciferase gene. This hypothesis will be addressed in a subsequent section of this report. Interestingly, when uORF2 was introduced 40 nucleotides upstream of the luciferase start codon (uORF2-40), luciferase expression reduced to 16% as compared to Pf86* (Fig. 4c). In sum, data from all three uORFs tested so far indicates that intercistronic length plays an important role in expression of the dORF.
From our data, it can be concluded that introducing a uORF close to the start codon of dORF leads to a significant decrease in the expression of the dORF. In all the cases, the maximal reduction in reporter activity was seen when the uORF was introduced within 50 nucleotides upstream of the start codon of luciferase, irrespective of differences in CAI, Kozak sequence and length of the different uORFs. This is similar to mammalian cells, where a decrease in the inter-cistronic length leads to inefficient translation of the dORF (Kozak, 1987;Luukkonen et al., 1995). Importantly, as inter-cistronic length has no effect on leaky scanning, the data are suggestive of reinitiation being the mechanism for translation of dORFs in P. falciparum when uORFs can engage the scanning ribosomes.
Driven by the experimental results on inter-cistronic lengths, a bioinformatics analysis of the genome of P. falciparum was undertaken to extract the sequences of uORFs that are present at different inter-cistronic lengths from the annotated CDS. Since uORFs that are closer to the CDS are expected to contribute more towards repression, all uORFs nearest to the CDS were extracted and the nucleotides at the +4 position of their Kozak sequences analysed. The frequency of each of the four nucleotides in the Kozak sequences of these uORFs was almost identical to the frequency seen for the dataset of all uORFs (Fig. 3b), suggesting no enrichment of a particular Kozak sequence based on the distance of the uORF from the CDS. Further, uORFs that lie within 50 bases of the CDS were identified as they might cause repression. Using the 5′ leader sequences, we mapped all the uORFs found in the leaders of all the CDS transcribed in asexual blood stages of P. falciparum (Additional File 2). GO term analysis of gene sets having uORFs within 50 nucleotides upstream of the start codon showed enrichment of genes associated with centromere and kinetochore assembly, snRNA processing, protein neddylation and ubiquitination, tRNA charging and amino acid activation, and purine and pyrimidine biosynthesis (Fig. 4d) (Additional File 3).
Analysis of the uORFs associated with all CDS (regardless of whether they are transcribed in the IDC stages), revealed enrichment of GO terms associated with var and rifin genes (data not shown). It is well known that var genes are subjected to transcriptional control such that out of the repertoire of var genes, only one is translated in the late asexual stages (Scherf et al., 1998;Kyes et al., 2003;Dzikowski et al., 2007). Our data on Kozak sequences and inter-cistronic lengths of uORFs associated with var genes suggest that for the transcribed var genes, there appears to be a high likelihood of translation repression by uORFs which would be relieved by mechanisms such as reinitiation.
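The features extracted in this analysis (the +4 nucleotide, uORF length and inter-cistronic length) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function name `find_uorfs`, the toy leader sequence and the following definitions are assumptions made for illustration: a uORF runs from an ATG to the first in-frame stop codon and must terminate within the leader; its length includes the start and stop codons; and the inter-cistronic length is the number of nucleotides between the end of the uORF stop codon and the start codon of the CDS.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(leader):
    """Return uORFs that start and terminate within a 5' leader.

    `leader` is a DNA string (5'->3') assumed to end immediately
    before the A of the CDS start codon.
    """
    leader = leader.upper()
    uorfs = []
    for start in range(len(leader) - 2):
        if leader[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(leader) - 2, 3):
            if leader[pos:pos + 3] in STOP_CODONS:
                end = pos + 3  # index just past the stop codon
                uorfs.append({
                    "start": start,
                    "length_nt": end - start,      # includes start and stop codons
                    "plus4": leader[start + 3],    # nucleotide following the ATG
                    "intercistronic_nt": len(leader) - end,
                })
                break
    return uorfs

# Toy leader (hypothetical sequence, not taken from the genome).
toy_leader = "TTATGAAATAATTTTT"
toy_uorfs = find_uorfs(toy_leader)

# uORFs within 50 nucleotides of the CDS, the cut-off used in the text.
close_uorfs = [u for u in toy_uorfs if u["intercistronic_nt"] <= 50]
print(close_uorfs)
```

Under these assumptions, the toy leader contains a single 9-nucleotide uORF with 'A' at the +4 position, ending 5 nucleotides upstream of the CDS.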

Expression of the reporter gene depends on the length of uORF
So far, results shown in this report have indicated that in asexual stages of P. falciparum, reinitiation takes place to allow the ribosome to handle the large number of uORFs found in mRNAs. Reinitiation efficiency depends on the length of the uORF, with longer uORFs presumably resulting in the loss of initiation factors from the ribosome, in turn decreasing reinitiation efficiency in mammalian cells (Kozak, 1987, 2001; Luukkonen et al., 1995; Hood et al., 2009). It is proposed that initiation factors involved in scanning remain bound to the 40S subunit of the ribosome briefly after translation has been initiated. As the ribosome elongates a longer uORF, these factors dissociate from the ribosome, thereby decreasing the probability of reinitiation at the dORF.
Possible effects of the length of the uORF were investigated in a bi-cistronic transcript by introducing ORFs of varying lengths upstream of the luciferase reporter gene. A 12-nucleotide long upstream ORF, uORF4 (CAI: 0.484), was introduced 191 nucleotides upstream of the start codon of the reporter gene. Keeping the Kozak sequence and inter-cistronic distance constant, the length of this uORF was increased by inserting a repeating unit, 24 nucleotides in length, in a restriction site present in the uORF to give uORFs of different lengths. This cloning strategy generated constructs termed uORF4-1X to uORF4-5X that were 36, 60, 84, 108 and 132 nucleotides long and encoded peptides consisting of 11, 19, 27, 35 and 43 amino acids with CAI values of 0.837, 0.890, 0.911, 0.922 and 0.929, respectively. Plasmid Pf86* with no uORF was used as a control for measuring luciferase reporter activity in transient transfection assays (Fig. 5a).
Fig. 4. The effect of inter-cistronic distance on the repression of reporter gene expression. Pf86*, a construct which does not harbour any uORF, has been used as a positive control, while Pf86end, a construct in which the second codon of the luciferase reporter gene is mutated to a stop codon, has been used as a negative control. The firefly luciferase readings and the standard deviations have been calculated from three replicates. The firefly luminescence units were normalized against those of Renilla luciferase for each construct. The normalized firefly readings of Pf86* were in the range of 1500-5000 RLUs, measured in a Berthold luminometer. (A) The firefly luciferase units obtained when a uORF present natively (Native-474) in the Pf86 plasmid 474 nucleotides upstream of the start codon was moved to 29 nucleotides upstream of the luciferase reporter gene (Native-29). (B) The firefly luminescence units obtained when uORF1 was present at 474 and 29 nucleotides upstream of the start codon (uORF1-474 and uORF1-29, respectively). (C) The firefly luminescence units obtained when uORF2 was present at 191 and 40 nucleotides upstream of the start codon (uORF2-191 and uORF2-40, respectively). (D) The GO terms enriched in the genes associated with uORFs with an inter-cistronic length less than or equal to 50 nucleotides. The enriched GO terms are plotted against their −log10 P values from Fisher's exact test (Additional File 3).
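The lengths of the uORF4 series follow directly from the cloning strategy: each construct adds one 24-nucleotide repeat to the 12-nucleotide uORF4. The short Python check below reproduces the reported nucleotide and peptide lengths. It assumes, as the reported numbers imply but the text does not state explicitly, that uORF length includes the stop codon and that peptide length counts the initiator methionine; the function name `uorf4_variant` is illustrative only.

```python
BASE_LENGTH_NT = 12   # uORF4, as described in the text
REPEAT_UNIT_NT = 24   # repeating unit inserted at the restriction site

def uorf4_variant(n_repeats):
    """Return (length in nucleotides, peptide length in amino acids)."""
    length_nt = BASE_LENGTH_NT + n_repeats * REPEAT_UNIT_NT
    codons = length_nt // 3
    peptide_aa = codons - 1  # assumption: the stop codon is counted in the length
    return length_nt, peptide_aa

variants = [uorf4_variant(n) for n in range(1, 6)]
for n, (length_nt, peptide_aa) in enumerate(variants, start=1):
    print(f"uORF4-{n}X: {length_nt} nt, {peptide_aa} aa")
```

The same convention gives 14 amino acids for a 45-nucleotide uORF and 19 amino acids for a 60-nucleotide uORF, matching the values quoted for the genome-wide length analysis.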
Increasing the length of upstream ORF, while keeping the Kozak sequence and inter-cistronic length constant, led to a gradual decrease in reporter gene expression. Although the CAI value of uORF4 was approximately half as compared to the other uORFs, the CAI values of uORF4-1X through uORF4-5X increased incrementally from 0.837 to 0.929. It is expected that increased CAI values would reflect uORFs that are easier to translate, and should result in a higher probability of the ribosome reaching the dORF and therefore, an increase in luciferase activity. Instead, luciferase activity showed a decrease in the presence of longer uORFs.
This suggests that, as in other eukaryotes, in P. falciparum the length of the uORF, apart from the Kozak sequence and inter-cistronic length, also has a role to play in the expression of the dORF. These findings could be due to multiple reasons. For example, the addition of repeat elements to increase the length of the uORF could change the secondary structure of the mRNA, which might affect the translation of the dORF. Nevertheless, given similar results in other eukaryotes (Kozak, 1987, 2001; Luukkonen et al., 1995; Hood et al., 2009), one of the mechanisms by which uORF length affects the expression of the dORF in P. falciparum could also be at the level of translation. In the presence of longer uORFs, the ribosomes have a lower probability of reaching the reporter gene and hence, the expression is repressed. In accordance with the data presented in the previous section, these results also suggest that reinitiation occurs in P. falciparum.
Here we show that uORFs coding for peptides longer than 19 amino acids (60 nucleotides) result in repression greater than 50% as compared to the control plasmid having no uORF (Pf86*). As the P. falciparum genome has a large number of uORFs, for the dORF to be translated, one would expect that the majority of uORFs should be less than 60 nucleotides long. As expected, the average length of predicted uORFs in the transcripts is 45 nucleotides (14 amino acids). The frequency distribution of the lengths of uORFs shows that 67% of uORFs are encoded by sequences of less than 45 nucleotides in length (Fig. 5b). The majority of these uORFs would allow the ribosome to reinitiate successfully at the start codon of the CDS. However, longer uORFs (length >45 nucleotides) may engage the ribosomes for a longer time and hence, reduce the efficiency of reinitiation at the start codon of the dORF. A list of these uORFs and their downstream CDS is given in Additional File 2.
Fig. 5. [...] sequence are present at the same distance upstream of the start codon of the reporter gene. The firefly luminescence units have been obtained in the presence of uORF4 and its derivatives with increasing length, uORF4-1X, uORF4-2X, uORF4-3X, uORF4-4X, and uORF4-5X, upstream of the luciferase reporter. Pf86* and Pf86end are the positive and the negative control, respectively. The firefly luminescence units were normalized against those of Renilla luciferase for each construct. The normalized firefly readings of Pf86* were in the range of 2000-3500 RLUs, measured in a Berthold luminometer. The firefly luciferase readings and the standard deviations have been calculated from three replicates. (B) Frequency distribution of the length of uORFs predicted from 3070 expressed CDS. (C) The GO terms enriched in the genes associated with the uORFs whose length is greater than 199 nucleotides. The enriched GO terms are plotted against −log10 (P value) from Fisher's exact test (Additional File 3).
Differential GO term analysis between the sets of genes associated with uORFs whose length is less than and those greater than 45 nucleotides does not reveal enrichment of any category (data not shown). However, a set of genes that are associated with uORFs with length greater than 200 nucleotides was enriched in GO categories involved in transcription regulation, housekeeping processes (lipid catabolism and pH regulation) and protein ubiquitination and phosphorylation (Fig. 5c) (Additional File 3). Interestingly, the AP2 domain transcription factor family which is involved in transcriptional regulation during the development of P. falciparum (Painter et al., 2011) has numerous uORFs that are longer than 200 nucleotides. The uORFs present in the 5 ′ leaders of these genes might be involved in regulating the expression of this transcription factor and hence, controlling development in the parasite.

Assessing the contribution of re-initiation in the expression of a dORF
Results shown so far have indicated that P. falciparum parasites, despite having multiple uAUGs/uORFs in the 5 ′ leaders of their mRNAs, are able to translate their CDS by leaky scanning to bypass weak Kozak sequences and by reinitiation at the dORF. Reinitiation has been described previously in P. falciparum for translation of the var2csa gene in the presence of a 360 nucleotide long uORF (Bancells and Deitsch, 2013), with a parasite translation factor, PTEF (Chan et al., 2017), being induced in parasites that are found in the placenta of pregnant women suffering from malaria. Our data show that reinitiation may occur frequently in asexual parasites that are not sequestered to chondroitin sulphate in the placenta. Next, we analysed whether leaky scanning and reinitiation occur simultaneously in the asexual stages, in the presence of a single repressive uORF.
The strategy involved cloning a single uORF, out of frame with the dORF (luciferase reporter gene). The next step was to eliminate all the in-frame stop codons downstream of the uORF by mutation. To test the extent of reinitiation, the stop codon of the uORF was also mutated. If ribosomes rely on reinitiation alone as the mode of translation of the dORF, the expression of the dORF should be eliminated after mutating the uORF stop codon. This is because the uORF is out of frame with the dORF, so that if initiation takes place at the uORF and no in-frame stop codons are encountered, the resulting protein will not be luciferase. If the ribosomes do not rely on reinitiation at all and instead use leaky scanning and other mechanisms, the expression of the dORF should remain the same as it was before mutating the stop codon. However, if a mix of reinitiation and other mechanisms, including leaky scanning, occurs, the expression should be less than that seen before mutation of the stop codon of the uORF (Fig. 6a).
To assess the contribution of reinitiation in the translation of the dORF, the construct containing uORF2-191 ('T' at the +4 position; 9 nucleotides in length), which does not repress the dORF (Fig. 4c), was selected. The presence of this uORF results in a high probability of the ribosome reaching the dORF either through reinitiation, leaky scanning or other mechanisms, thus providing an excellent scenario to test the contribution of reinitiation in the translation of the dORF. The stop codon of the uORF and all the in-frame stop codons were mutated to create a construct termed uORF2-191stop. It was observed that the expression of luciferase decreased to 25% as compared to Pf86* (Fig. 6b). This reduced expression indicated that the probability of ribosomes reaching the reporter gene by reinitiation would be 0.76, calculated as [1 − the ratio of the luciferase activity of uORF2-191stop (25%) to that of uORF2-191 (106%)] (Fig. 6c). It is important to reiterate here that the calculation derives from the notion that any reduction in luciferase activity due to mutation of the uORF stop codon would be due to a dependence on reinitiation. In the case of uORF2-191, the chances of reinitiating at the dORF could be high due to the 191-nucleotide inter-cistronic length and the small uORF length, paving the way for the ribosome to reacquire initiation factors.
Fig. 6. [...] present 191 and 65 nucleotides upstream of the luciferase start codon, respectively. Mutating all the in-frame stop codons eliminates any chance of reinitiation, leading to the expression of luciferase only via mechanisms other than reinitiation. The firefly luciferase readings and the standard deviations have been calculated from three replicates. The firefly luminescence units were normalized against those of Renilla luciferase for each construct. The normalized firefly readings of Pf86* were in the range of 2000 to 5000 RLUs, measured in a Berthold luminometer. (c) The contribution of reinitiation and other mechanisms including leaky scanning for two constructs containing the uORFs of different lengths present at different inter-cistronic distances.
Another uORF, uORF3-65, which is 135 nucleotides in length, was introduced 65 nucleotides upstream of the luciferase start codon. The Kozak sequences of uORF2-191 and uORF3-65 are the same, so the extent of leaky scanning would be expected to be the same in both cases; however, the extent of reinitiation would differ for the two uORFs. As discussed in the previous sections, the length of this uORF and the short inter-cistronic distance would predict that the ribosomes have a lower probability of reacquiring initiation factors. Therefore, the contribution of reinitiation to the translation of the dORF should be less than that seen for uORF2-191. Consistent with these predictions, the luciferase expression obtained in the presence of this uORF was 39% of that of Pf86*, and mutation of the stop codon of uORF3-65 led to a further, albeit small, decrease in expression (25% of Pf86*) as compared to the case when the stop codon was present (Fig. 6b). The probability of ribosomes reaching the reporter gene via reinitiation was calculated using the same strategy as used for uORF2-191 and seen to be 0.36 (Fig. 6c). This result is consistent with the expectation that the extent of reinitiation from uORF3-65 should be less than that of uORF2-191 (probability of 0.36 compared to 0.76).
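The calculation used for both constructs can be written out as a short Python sketch. It takes the luciferase activities (as a percentage of Pf86*) reported in the text and applies the stated formula, 1 − (activity with the uORF stop codon mutated / activity with the stop codon intact). The function name `reinitiation_contribution` is illustrative, and the sketch inherits the assumption stated in the text that any loss of activity on mutating the stop codon reflects dependence on reinitiation.

```python
def reinitiation_contribution(activity_with_stop, activity_stop_mutated):
    """Fraction of dORF expression attributed to reinitiation.

    Both activities must be in the same units (here, % of Pf86*).
    """
    if activity_with_stop <= 0:
        raise ValueError("activity_with_stop must be positive")
    return 1.0 - activity_stop_mutated / activity_with_stop

# Values reported in the text (% of Pf86* luciferase activity).
uorf2_191 = reinitiation_contribution(106, 25)
uorf3_65 = reinitiation_contribution(39, 25)

print(f"uORF2-191: {uorf2_191:.2f}")  # prints 0.76
print(f"uORF3-65:  {uorf3_65:.2f}")   # prints 0.36
```

The remainder (about 0.24 for uORF2-191 and 0.64 for uORF3-65) corresponds to expression via mechanisms other than reinitiation, including leaky scanning.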
For these two uORFs, a mix of leaky scanning and reinitiation is seen for expression of the dORF. These results reinforce the observations that reinitiation may be widespread in asexual stages of P. falciparum, and not restricted to parasites isolated from pregnancy-associated malaria (PAM) samples.

Discussion
In this report, a systematic study of the features that contribute to repression by uORFs resulted in the observation that P. falciparum asexual stage parasites employ widespread leaky scanning and reinitiation to allow the scanning ribosome to reach the dORF and express a multitude of proteins. Some classes of proteins may undergo translational regulation as uORFs associated with their CDS have features associated with repression: strong Kozak sequences, inter-cistronic lengths less than 50 nucleotides and/or lengths greater than 200 nucleotides. Specifically, these classes of proteins include antigenic variation gene families, including var, rifin and stevor, proteins involved in ubiquitination and members of the AP2 transcription factor family. Potential mechanisms by which translational regulation of these classes of proteins by uORFs might occur are discussed.

Reinitiation of the downstream ORF is a strategy used by P. falciparum to handle large numbers of repressive uORFs
We show that uORF length and inter-cistronic distance affect the expression of the downstream luciferase reporter gene, both suggesting that reinitiation takes place in asexual stages of P. falciparum, in addition to the parasites that bind to the placenta, where it was previously reported (Bancells and Deitsch, 2013). Data suggest that the ribosome is able to reinitiate translation of the CDS when uORFs are less than 60 nucleotides long. This is comparable to other eukaryotes, where uORFs that allow reinitiation are usually less than 30, 48 and 90 nucleotides in Saccharomyces cerevisiae, Arabidopsis thaliana and mammals, respectively (Kozak, 2001; Hinnebusch, 2005; Calvo et al., 2009; von Arnim et al., 2014). In the case of inter-cistronic length, we show that uORFs within 50 nucleotides of the start codon repress the expression of the dORF drastically in P. falciparum. A similar observation has been made for viruses, where expression of the dORF decreases when the inter-cistronic length is reduced from 64 to 16 nucleotides (Luukkonen et al., 1995). Our analysis shows that 40% of transcripts in the IDC stages of P. falciparum have at least one uORF within 50 bases upstream of the start codon of the CDS. We propose that to translate the CDS, P. falciparum would utilize molecular factors to enhance the probability of reinitiation.
One such player is PTEF (Plasmodium translation enhancing factor), which is required for reinitiation of translation of the var2csa CDS in the presence of a uORF (Chan et al., 2017). This 360-nucleotide long uORF has a 'G' at the +4 position of the Kozak sequence; both these features have been shown in the present study to repress the expression of the dORF, presumably because scanning ribosomes initiate at the strong Kozak sequence, translate the uORF and terminate before reaching the var2csa CDS. In order to relieve this repression, PTEF, a factor that promotes reinitiation, is required. PTEF is expressed in intra-erythrocytic stages with expression levels increasing from ring to schizont stage (Pease et al., 2013). However, in the case of pregnancy-associated malaria, 7-15-fold higher expression of PTEF appears to be needed for translation of var2csa due to the presence of the 360-nucleotide long uORF, which is significantly longer than the average length of predicted uORFs (Chan et al., 2017). One might speculate that in asexual stage parasites, low levels of PTEF are sufficient to enable reinitiation in the presence of the numerous smaller uORFs which are abundant in transcripts. The presence of widespread reinitiation, possibly driven by PTEF, leads to the speculation that, as this protein shows low sequence identity with human proteins, it might be a potential drug target for the blood stages of P. falciparum.
Another molecular player that is involved in reinitiation is the translation initiation factor, eIF2α. Evidence of reinitiation mediated by phosphorylation of eIF2α during nutritional stress responses is well documented for the S. cerevisiae GCN4 transcript (Hinnebusch, 1993, 2005) and for the integrated stress response in mammalian cells (Young et al., 2015). In the case of P. falciparum, one would expect that if there is a significant amount of reinitiation, PfeIF2α should be phosphorylated. Interestingly, phosphorylation of PfeIF2α is not seen in ring and trophozoite stages; however, a sudden increase is observed in schizonts (Zhang et al., 2017). Although this is suggestive of reinitiation occurring predominantly in schizonts, due to a lag observed between the peaks of mRNA synthesis and corresponding protein abundance in P. falciparum (Le Roch et al., 2004; Foth et al., 2011), it is also possible that the phosphorylated PfeIF2α might promote reinitiation from mRNAs expressed at earlier IDC stages.
Another molecular factor that plays a role in reinitiation is eIF3, warranting further work on this protein in P. falciparum. Eukaryotic IF3 (eIF3) remains transiently attached to the elongating ribosome and stabilizes the post-termination complex to stimulate reinitiation of the dORF in S. cerevisiae (Cuchalová et al., 2010;Hronová et al., 2017;Mohammad et al., 2017). Similarly, the H subunit of eIF3 helps in an efficient resumption of scanning after the translation of the uORF in Arabidopsis (Roy et al., 2010).

The Kozak sequence determines the extent of leaky scanning
Alleviation of translational repression due to numerous uORFs can also be achieved through leaky scanning, possibly the most metabolically efficient way to handle multiple uORFs. A known contributor to leaky scanning is the strength of the Kozak sequence (Kozak, 1999; Ferreira et al., 2013, 2014). This report confirms published observations that the nucleotide following the start codon (+4 position) plays a significant role in determining the strength of the Kozak sequence in P. falciparum (Kumar et al., 2015), unlike other eukaryotes where nucleotides at −3 and +4 positions are both important (Pisarev et al., 2006). In addition to Kozak sequences, eIF1, a translation initiation factor in the preinitiation complex, can facilitate selection of the start codon (Cheung et al., 2007), with the release of eIF1 from the preinitiation complex leading to translation initiation (Pestova et al., 1998; Passmore et al., 2007; Nanda et al., 2009). High concentrations of eIF1 lead to the stringent selection of AUGs having strong Kozak sequences (Loughran et al., 2012; Andreev et al., 2015; Fijalkowska et al., 2017) and the phosphorylation state of eIF1 under stress conditions also determines the selection of the start codon, helping to bypass start codons with weak Kozak sequences (Zach et al., 2014). Another factor, the m7G-cap-binding factor eIF4G1, also enhances leaky scanning of uORFs near the cap when bound to eIF1 in mammalian cells (Haimov et al., 2018). Yet another factor that affects leaky scanning in eukaryotes is eIF2α. When eIF2α is phosphorylated under stress conditions, uORFs with weak Kozak sequences are more likely to be bypassed by leaky scanning (Palam et al., 2011). Interestingly, eIF2α phosphorylation also facilitates reinitiation (Hinnebusch, 2005), suggesting that this protein could be a key player in handling repression by uORFs.

Implications of uORFs in gene regulation
Like other eukaryotes, P. falciparum utilizes a spectrum of mechanisms of gene regulation ranging from epigenetic (Caro et al., 2014; Saraf et al., 2016) to post-transcriptional gene regulation (Bunnik et al., 2013). However, unlike other eukaryotes, P. falciparum seems to lack the variety of canonical eukaryotic transcription factors or stage-specific transcription factors (Gardner et al., 2002; Coulson et al., 2004). Stage-specific regulation at the transcriptional level is achieved via the AP2 family of transcription factors (Balaji et al., 2005). This family of transcription factors is known to control the development of the parasite through various stages (Painter et al., 2011). Interestingly, transcript and protein levels during each stage of the parasite's lifecycle are poorly correlated, which points towards translationally controlled expression (Le Roch et al., 2004), and a translational regulator, PfALBA1, represses translation of transcripts involved in invasion until the parasite is mature enough to invade (Vembar et al., 2015, 2016). Apart from PfALBA1, cis-regulatory factors such as uORFs downregulate the expression of dORFs. The frequent occurrence of uORFs in the transcripts of P. falciparum poses an additional layer of post-transcriptional gene regulation (Kumar et al., 2015). One way to regulate the number of uORFs is by changing the leader length of the transcripts, which has been observed during different stages (Caro et al., 2014). Another regulatory mechanism to remove uORFs could involve alternative splicing of the leader sequence. Evidence for this possibility lies in observations that 431 splice junctions fall in intergenic regions of genes (Sorber et al., 2011).
For the remaining uORFs that still pose a hindrance to the ribosome, we propose that a mix of leaky scanning and reinitiation could regulate the expression of the dORF, not only during the IDC but also during the numerous situations of stress that are faced by the parasite during multiplication and development in the human host.
Widespread use of reinitiation and leaky scanning has been observed during stress conditions in a multitude of eukaryotes (Hinnebusch, 1993, 2005; von Arnim et al., 2014; Zach et al., 2014; Andreev et al., 2015; Young et al., 2015; Hinnebusch et al., 2016; Fijalkowska et al., 2017). Seemingly, organisms facing stress undergo non-canonical translation to economize their energy and resource usage. Hence, investigating translation initiation factors during stress conditions in P. falciparum would provide insights into the role of multiple uORFs harboured by the parasite's transcripts.
In addition to regulation of translation during stress, an assortment of non-canonical translation mechanisms including reinitiation, leaky scanning, internal ribosome entry and ribosome shunting has been observed in viruses (Ryabova et al., 2006;de Breyne and Ohlmann, 2018). Given their small genome size, viruses have adopted multiple strategies to transcribe and translate their protein repertoire. In the case of P. falciparum, the constraint is not genome size, but the high AT content of the genome. Owing to this, mRNAs have numerous uORFs and uAUGs which the parasite's translation machinery appears to handle by widespread use of non-canonical translation mechanisms.
In conclusion, this is the first report that systematically delineates the features of uORFs that affect translation of the downstream gene in P. falciparum. Additionally, we show that a mix of reinitiation and leaky scanning is employed in asexual stages of P. falciparum to translate the dORF in the presence of upstream ORFs. Therefore, initiation factors such as PTEF, PfeIF1, PfeIF2 and PfeIF3 may be critically involved in translation and regulation of parasite proteins. This distinguishing feature of the P. falciparum cytoplasmic translation machinery has the potential to become a novel target for anti-malarial drugs.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0031182020000840.