Short 5′ UTR enables optimal translation of plant virus tricistronic RNA via leaky scanning

Contrary to the general model of translation in eukaryotic cells, a number of studies have suggested that many mRNAs encode multiple proteins. Leaky scanning, which supplies ribosomes to downstream open reading frames (ORFs) by read-through of upstream ORFs, is the major mechanism for translating polycistronic mRNAs. However, apart from the Kozak sequence, the general regulatory factors controlling leaky scanning and their biological relevance have rarely been elucidated. Here, we analyzed the strategy by which a plant RNA virus translates three movement proteins from a single RNA molecule through leaky scanning. Our in planta and in vitro results indicate that a markedly short 5′ UTR of the most upstream ORF promotes leaky scanning, potentially fine-tuning the translation efficiency of the three proteins encoded in a single RNA molecule to optimize viral propagation. Moreover, in plant endogenous mRNAs, we found that short 5′ UTRs were more frequently associated with the upstream ORFs (uORFs) of polycistronic mRNAs. We propose that the promotion of leaky scanning induced by a short 5′ UTR (LISH), together with the Kozak sequence, is a conserved gene regulation mechanism not only in viruses but also in eukaryotes.


Introduction
According to the general model of translation in eukaryotic cells, the translation machinery recognizes a single open reading frame (ORF) in a monocistronic mRNA. This process has three steps: initiation, elongation, and termination. The first step, translation initiation, is a highly regulated process dependent on the 5′ cap structure of the mRNA (1). The 5′ cap structure is recognized by the cap-binding protein eukaryotic translation initiation factor 4E (eIF4E), followed by recruitment of the 43S preinitiation complex (PIC) to the mRNA. The PIC then scans the mRNA from the 5′ end in the 3′ direction until it recognizes the first appropriate initiation codon (2). Recognition of the initiation codon induces joining of the 60S ribosomal subunit to form the 80S ribosome, which then proceeds to polypeptide synthesis, referred to as the elongation step. Finally, in the termination step, the ribosome and the synthesized protein are released from the mRNA after the ribosome reaches the termination codon.
Increasing experimental evidence has demonstrated that many eukaryotic mRNAs have a polycistronic structure, in which a single RNA molecule encodes multiple proteins. In fact, 44% of human mRNAs contain upstream ORFs (uORFs) located upstream of the start codons of annotated ORFs (3). Moreover, uORF start codons are more conserved among mammalian species than would be expected under neutral evolution, suggesting that these uORFs play specific biological roles (4,5). Similarly, approximately 35% of mRNAs in the model plant Arabidopsis thaliana contain uORFs, and half of them have multiple uORFs (6). Translation from polycistronic mRNAs is accomplished mainly by read-through of an initiation codon, referred to as leaky scanning. In leaky scanning, the PIC does not recognize the upstream AUG (uAUG) and scans past it, arriving at a downstream AUG (dAUG) to start translation of the downstream ORF (dORF). Because the dAUG competes with the uAUG for scanning PICs, an increase in translation from the uAUG decreases translation from the dAUG, whereas a decrease in translation from the uAUG increases translation from the dAUG.
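The competition between uAUG and dAUG described above can be illustrated with a deliberately simplified toy model. Here we assume, purely for illustration, that each AUG met in 5′→3′ order captures a fixed fraction of the PICs reaching it and that the remainder leak past. The function name `initiation_shares` and all probabilities are hypothetical; they are not parameters measured in this study.

```python
from typing import List, Sequence


def initiation_shares(capture_probs: Sequence[float]) -> List[float]:
    """Fraction of scanning PICs that initiate at each AUG, in 5'->3' order.

    capture_probs[i] is a hypothetical probability that a PIC reaching AUG i
    recognizes it; PICs that do not recognize it leak past and keep scanning.
    """
    shares = []
    still_scanning = 1.0  # fraction of PICs that have not yet initiated
    for p in capture_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("capture probabilities must lie in [0, 1]")
        shares.append(still_scanning * p)  # PICs that start translation here
        still_scanning *= 1.0 - p          # PICs that leak past this AUG
    return shares


# Illustrative values only: a strong vs. a weak uAUG followed by two dAUGs.
strong_u = initiation_shares([0.8, 0.5, 0.5])
weak_u = initiation_shares([0.4, 0.5, 0.5])
print([round(s, 3) for s in strong_u])  # [0.8, 0.1, 0.05]
print([round(s, 3) for s in weak_u])    # [0.4, 0.3, 0.15]
```

In this sketch, halving the capture probability at the uAUG triples the share of PICs that initiate at each downstream AUG. This is the qualitative trade-off the text describes. Real initiation efficiencies depend on sequence context and other factors that the model ignores.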
The efficiency of AUG codon recognition by the PIC depends on the surrounding nucleotide sequence, and the optimal context is called the Kozak sequence (7). The context of the uAUG is an important regulator of dORF expression, because efficient uAUG recognition promoted by a strong initiation context reduces translation initiation from the dAUG. Indeed, mutations in the translation initiation context of a uORF can cause serious physiological disorders at the organismal level, such as tumorigenesis in humans caused by natural variants in the Kozak sequence of a single gene (8). However, the regulatory mechanism of leaky scanning in eukaryotic cells is not fully understood.
Viruses are intracellular parasites that rely heavily on the host machinery for their propagation. To synthesize viral proteins, viruses have evolved sophisticated mechanisms to exploit the host translation system (9,10). Presumably because of their limited genome size, RNA virus genomes are frequently polycistronic and are expressed by multiple strategies. The genus Potexvirus includes a group of monopartite, positive-sense, single-stranded RNA viruses that infect a wide range of plant species. The potexviral genome, which has a 5′ cap and a 3′ poly(A) tail, contains five ORFs encoding RNA-dependent RNA polymerase (RdRp); triple gene block proteins (TGBp) 1, 2, and 3; and coat protein (CP) (11,12). TGBp1/2/3, which function as movement proteins (MPs) required for the cell-to-cell movement of viruses in plants, are translated from three partially overlapping ORFs. Both the gene structure and the functions of TGBps are conserved in multiple genera of plant viruses besides potexviruses. Translation of these three proteins is thought to require two subgenomic RNAs (sgRNAs) (13), but the detailed mechanism of this translation system is not fully understood.
Here, we investigated the translation mechanism of TGBps and found that all three proteins are translated from a single RNA molecule, sgRNA1, via leaky scanning. Through in vitro, in planta, and in silico analyses of the regulation of TGBp translation from sgRNA1 and of the translation of Arabidopsis thaliana mRNAs, we showed that, in addition to the Kozak sequence, the length of the uORF 5′ UTR regulates leaky scanning.
Thus, we propose a model named leaky scanning induced by a short 5′ UTR (LISH) as a highly conserved translation regulatory mechanism, not only in viruses but also in their eukaryotic hosts.

Plant materials and growth conditions
Nicotiana benthamiana plants were maintained in a growth chamber under a 16-h light (25°C)/8-h dark (20°C) cycle throughout the assays. Detailed conditions for Arabidopsis and Nicotiana cell culture were described previously (14) and (15), respectively.
The sequence data are shown in Supplementary Table 3. The GFP coding fragment was ligated with plasmid fragments using a GeneArt Seamless Cloning and Assembly Kit.
Plasmid fragments were obtained from pPlAMV using the GRF/35SR primer set. The fragments were ligated using a GeneArt Seamless Cloning and Assembly Kit.
(xi) Leader-sequence variants of PVX-GFP for agroinoculation (PVX-GFP-5U4nt and -10nt). The sgRNA1 sequence was amplified from pPVX-GFP via PCR using the primer sets shown in Supplementary Table 4. The plasmid fragments were obtained from pPVX-GFP using the PVXTGB1F/35SR primer set. The fragments were ligated using a GeneArt Seamless Cloning and Assembly Kit.

Northern blot analysis
Total RNA (1 µg) was analyzed using the digoxigenin (DIG) system (Roche). A DIG-labeled probe for PlAMV RNA detection was produced as described previously (16).
Probes for PVX, LoLV, and PVM RNA detection were transcribed with T7 RNA polymerase from the DNA fragments amplified from each viral cDNA clone using the primer sets PVX5435F/PVX-RT7, LoLV6611F/LoLVRT7, and PVM7531F/PVMRT7, respectively.

Protoplast preparation and transfection
Arabidopsis suspension culture cells (30)

Immunoblotting
Agroinfiltrated leaves were harvested at 4 dpi, total protein was extracted using RIPA buffer

Luciferase assay
Transfected protoplasts were collected at 19 hpi, and luciferase protein was extracted using extraction buffer [0.1 M phosphate (pH 7.0) and 5 µM DTT]. Luciferase activity was measured using a dual-luciferase reporter assay system (TOYO B-Net, Tokyo, Japan) according to the manufacturer's instructions.

5′ RACE
The TSS of PlAMV sgRNA was detected in 5′ RACE analysis using a GeneRacer Kit

Analysis of Arabidopsis UTRs
Sequence information for mRNAs was obtained from the TAIR9 release of the A. thaliana Col-0 genome (http://www.arabidopsis.org/). For each mRNA, we identified the first AUG triplet encountered when scanning from the 5′ end and classified the mRNAs into two groups according to whether this first AUG matched the start codon of the annotated protein. The Perl script used for classification is shown in Supplementary Table 5. For each of the two groups, we analyzed the length of the sequence upstream of the first AUG, that is, the 5′ UTR of the first AUG.
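The classification above was performed with the authors' Perl script (Supplementary Table 5). The following is only an independent Python sketch of the same idea and is not a port of that script. We assume each transcript is supplied as a sequence string plus the 0-based index of its annotated start codon. The names `classify`, `summarize`, and the toy transcripts are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class FirstAug(NamedTuple):
    group: str                  # "matched", "mismatched" or "no_aug"
    leader_len: Optional[int]   # nt upstream of the first AUG (its 5' UTR)


def classify(seq: str, annotated_start: int) -> FirstAug:
    """Classify one transcript by its 5'-proximal AUG.

    annotated_start is the (assumed) 0-based index of the annotated start codon.
    """
    rna = seq.upper().replace("T", "U")  # accept cDNA or RNA letters
    first = rna.find("AUG")              # first AUG met when scanning 5'->3'
    if first == -1:
        return FirstAug("no_aug", None)
    group = "matched" if first == annotated_start else "mismatched"
    return FirstAug(group, first)        # 0-based index of A == 5' UTR length


def summarize(transcripts: Iterable[Tuple[str, int]], cutoff: int = 20) -> dict:
    """Per group: (number of transcripts, number with a 5' UTR < cutoff nt).

    Matched transcripts whose AUG is at position 0 are skipped, mirroring the
    exclusion of matched mRNAs with the mAUG at the 5' terminus.
    """
    totals: Counter = Counter()
    short: Counter = Counter()
    for seq, start in transcripts:
        hit = classify(seq, start)
        if hit.group == "no_aug" or (hit.group == "matched" and hit.leader_len == 0):
            continue
        totals[hit.group] += 1
        if hit.leader_len < cutoff:
            short[hit.group] += 1
    return {g: (totals[g], short[g]) for g in totals}


# Hypothetical toy transcripts: (sequence, annotated start index)
toy = [
    ("GCUCAAGAUGGCUUAA", 7),    # first AUG is the annotated start -> matched
    ("CAUGCCUAAGGAUGGCU", 11),  # uAUG at index 1 precedes the mAUG -> mismatched
    ("AUGGCC", 0),              # matched but AUG at the 5' end -> excluded
]
print(summarize(toy))
```

Because the index of the first AUG equals the number of nucleotides upstream of it, the same value serves as both the position and the 5′ UTR length.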

All three TGB proteins are translated mainly from sgRNA1
Although TGBp2 and TGBp3 of potexviruses have been suggested to be translated from sgRNA2 (13) potexviruses (32,33). To determine the transcription start sites (TSSs) of the sgRNAs presumably transcribed from PlAMV genomic RNA, we used 5′-rapid amplification of cDNA ends (5′-RACE) with two gene-specific primers. The TSSs of sgRNA1 and sgRNA3 were mapped to two major sites, G4224 and G5339, respectively, where the 5′ ends of all the cloned transcripts aligned (S. Fig. 1a, 1b). The same sites are predicted to be TSSs based on consensus promoter sequences in the viral genomic RNA (26). By contrast, no potential TSS was predicted for sgRNA2. The 5′ ends of cloned transcripts starting between the initiation codons of TGBp1 and TGBp2, which likely corresponded to sgRNA2, were highly variable (S. Fig. 1c). These results indicated that a major TSS could not be defined for sgRNA2.
We hypothesized that the absence of sgRNA2 signals in northern blot analysis may be due in part to low sgRNA2 accumulation, and reasoned that another viral RNA, most likely sgRNA1, participates in the translation of TGBp2/3. We prepared an sgRNA1-deleted PlAMV mutant (∆sg1), in which several nucleotide substitutions were introduced into the sgRNA1 promoter sequence following a previously reported procedure (26; S. Fig. 1d). To test whether the ∆sg1 mutant produced TGBp1, TGBp2, and TGBp3, we inoculated the ∆sg1 mutant into N. benthamiana leaves by agroinoculation. Total RNA extracted from the inoculated leaves at 1.5 days post-inoculation (dpi) was analyzed by northern blot hybridization. Genomic RNA (gRNA) and sgRNA3 accumulated in ∆sg1-inoculated leaves to levels similar to those with wild-type (WT) PlAMV, but sgRNA1 did not (Fig. 1a).
Immunoblot analysis revealed that TGBp2 and TGBp3, as well as TGBp1, were not detected in the ∆sg1-inoculated leaves, in which RdRp and CP accumulated to the same levels as in the WT (Fig. 1a). These results suggest that sgRNA1 is responsible for the translation of TGBp2 and TGBp3 as well as TGBp1.
To further confirm the role of sgRNA1 in the production of TGBp2 and TGBp3 proteins, we examined whether TGBp2/3 were translated in sgRNA1-transfected protoplasts. Given the findings that TGBp1/2/3 are translated mainly from sgRNA1, we next analyzed the mechanism underlying the translation of TGBp1/2/3 from sgRNA1. Translation of dORFs in polycistronic mRNAs requires non-canonical mechanisms, such as leaky scanning (9). We hypothesized that the dORFs encoding TGBp2 and TGBp3 in sgRNA1 were translated via leaky scanning, for two reasons. First, according to the conventional potexviral TGBp expression model, leaky scanning regulates the translation of TGBp3 from sgRNA2 (13). Second, there is no AUG triplet between the TGBp1 and TGBp2 initiation codons in the PlAMV genome, a feature that would favor leaky scanning and ensure adequate translation of TGBp2/3 from sgRNA1.

Protoplasts isolated from
To exclude the possibility that a non-canonical translation mechanism other than leaky scanning regulates the translation of TGBp2/3 from sgRNA1, we tested the involvement of two potential non-canonical translation mechanisms, IRES and re-initiation.
An IRES is a structured RNA element that permits cap-independent translation initiation at the dORF by directly recruiting the translation apparatus, including the PIC. We inserted a 40-nt Kozak-stem loop sequence (KS-sg1) to block PIC scanning from the 5′ terminus (34,35; Fig. 2a). In addition, a green fluorescent protein (GFP)-coding sequence with a 21-nt 5′ leader sequence was fused to the 5′ end of sgRNA1 (GFP-sg1), so that most PICs entering at the 5′ terminus would initiate at the GFP AUG and fail to reach the TGBp initiation codons. If an IRES located upstream of the TGBp2 start codon allowed translation of TGBp2/3, TGBp2/3 accumulation with KS-sg1 or GFP-sg1 would be similar to that with WT sgRNA1. Immunoblot analysis showed no accumulation of TGBp1/2/3 in Arabidopsis protoplasts transfected with KS-sg1 or GFP-sg1 (Fig. 2b), suggesting that sgRNA1 does not contain a functional IRES. Expression of GFP was confirmed by immunoblotting (Fig. 2b).
To further evaluate the translational efficiency of TGBp1/2/3 in the KS-sg1 and GFP-sg1 mutants, we performed dual luciferase assays. We constructed modified KS-sg1, GFP-sg1, and WT sgRNA1 in which the coding region of TGBp1, 2, or 3 was replaced with the Renilla luciferase (Rluc) gene (Fig. 2a). The Rluc-containing sgRNA1 constructs were transfected into Arabidopsis protoplasts, and Rluc activity was normalized to firefly luciferase (Fluc) activity to account for differences in transfection efficiency among protoplast samples. For all TGBp proteins, the relative Rluc activity of each mutant (KS and GFP constructs) was significantly lower than that of the corresponding WT construct (Fig. 2c), supporting the notion that sgRNA1 does not contain a functional IRES.
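The normalization step can be sketched in a few lines. This sketch makes several assumptions: each replicate yields one Rluc and one Fluc reading; the Rluc/Fluc ratio is computed per replicate; and ratios are expressed relative to the mean WT ratio. The text does not state how replicates were aggregated, and the function names and readings below are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def rluc_fluc_ratios(rluc: Sequence[float], fluc: Sequence[float]) -> List[float]:
    """Per-replicate Rluc/Fluc ratios (Fluc serves as the transfection control)."""
    if len(rluc) != len(fluc):
        raise ValueError("each Rluc reading needs a paired Fluc reading")
    return [r / f for r, f in zip(rluc, fluc)]


def relative_to_wt(sample_rluc, sample_fluc, wt_rluc, wt_fluc) -> List[float]:
    """Normalized ratios of a construct, scaled so that the WT mean equals 1."""
    wt_mean = mean(rluc_fluc_ratios(wt_rluc, wt_fluc))
    return [x / wt_mean for x in rluc_fluc_ratios(sample_rluc, sample_fluc)]


# Hypothetical luminescence readings for two replicates of each construct.
wt = relative_to_wt([100.0, 120.0], [10.0, 12.0], [100.0, 120.0], [10.0, 12.0])
ks = relative_to_wt([20.0, 30.0], [10.0, 10.0], [100.0, 120.0], [10.0, 12.0])
print(wt, ks)  # [1.0, 1.0] [0.2, 0.3]
```

Keeping per-replicate ratios, rather than dividing summed Rluc by summed Fluc, leaves one value per replicate available for the statistical comparison with the WT construct.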
Next, to examine whether reinitiation is involved, we replaced the UGA stop codon of TGBp1 with a GGA triplet (∆stp; S. Fig. 2a). The frequency of reinitiation depends on the distance between the uORF stop codon and the dAUG (36,37). If TGBp2 and TGBp3 were translated via reinitiation, removal of the TGBp1 stop codon would clearly separate the end of the TGBp1 ORF from the TGBp2 initiation codon, thereby reducing TGBp2/3 expression. As expected, immunoblotting with an anti-TGB1 antibody detected a band of slightly higher molecular weight in ∆stp-transfected protoplasts than in WT sg1-transfected protoplasts, together with a slightly smaller signal that presumably represents a degradation product (S. Fig. 2b). TGBp2 and TGBp3 accumulated in ∆stp-transfected protoplasts to the same levels as in WT-transfected samples (S. Fig. 2b). These results indicate that TGBp2 and TGBp3 are not translated from sgRNA1 via reinitiation.

TGBp2/3 are translated via leaky scanning through the TGBp1 initiation codon
We tested our hypothesis that TGBp2 and TGBp3 are translated from sgRNA1 via leaky scanning of the TGBp1 initiation codon. In leaky scanning, the uAUG competes with the dAUG for initiating ribosomes: the nucleotide context of the uAUG determines the efficiency of initiation there and, consequently, how many PICs continue scanning to reach the dAUG. In dicots, the optimal sequence context for translation initiation is "RNN AUG G," (13), we also confirmed that TGBp3 translation is actually mediated by leaky scanning of the TGBp2 initiation codon in sgRNA1 (S. Fig. 2c, d, e).
To further test the leaky scanning hypothesis, we introduced six AUG codons throughout the TGBp1 ORF by nucleotide substitutions that did not change the TGBp1 amino-acid sequence. In addition, the nucleotide at position −3 of each introduced AUG was replaced with G to match the optimal Kozak context wherever this did not alter the TGBp1 amino-acid sequence (S. Fig. 4a, midAUG). If TGBp2 and TGBp3 were translated from sgRNA1 by leaky scanning, these additional AUG codons upstream of the TGBp2 and TGBp3 AUGs would reduce their translation efficiency by capturing most scanning PICs. The midAUG mutant and its Rluc derivatives were transfected into protoplasts, and expression of TGBp1/2/3 was analyzed by immunoblotting and luciferase assays. As expected, although immunoblotting showed that accumulation of the mutated TGBp1 did not differ from that of the WT, the levels of TGBp2 and TGBp3 and their translation efficiencies as quantified by luciferase assays were significantly reduced (S. Fig. 4b, c). These results also agree with our hypothesis that leaky scanning of the TGBp1 initiation codon regulates the translation of TGBp2/3 from sgRNA1.
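Designing or checking constructs such as midAUG requires locating AUG codons and inspecting positions −3 and +4. The sketch below lists every AUG together with a context class. It assumes the common three-tier convention: "strong" has a purine at −3 and G at +4, matching the dicot "RNN AUG G" consensus; "adequate" has one of the two; "weak" has neither. This scoring scheme and the function names are illustrative assumptions and are not taken from the study.

```python
from typing import List, Optional, Tuple

PURINES = {"A", "G"}


def context_class(minus3: Optional[str], plus4: Optional[str]) -> str:
    """Strong: purine at -3 and G at +4; adequate: one of the two; weak: neither."""
    hits = (minus3 in PURINES) + (plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


def scan_augs(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based position of the A, context class) for every AUG, 5'->3'.

    The A of AUG is +1, so -3 lies three nucleotides upstream and +4 is the base
    right after the G. Positions outside the sequence (e.g. a leader shorter
    than 3 nt) are treated as not matching the consensus.
    """
    rna = seq.upper().replace("T", "U")
    found = []
    for i in range(len(rna) - 2):
        if rna[i:i + 3] != "AUG":
            continue
        minus3 = rna[i - 3] if i >= 3 else None
        plus4 = rna[i + 3] if i + 3 < len(rna) else None
        found.append((i, context_class(minus3, plus4)))
    return found


# Hypothetical sequence with a strong and a weak AUG.
print(scan_augs("ACAAUGGCUAAUGCC"))  # [(3, 'strong'), (10, 'weak')]
```

For example, filtering the output to positions upstream of a given start codon counts the AUGs a scanning PIC would meet before reaching it.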
As shown in Supplementary Figure 2c remarkably short (7 nt). Because the genomic structure is conserved among flexiviruses, we predicted that the mechanism of TGBp2/3 translation from sgRNA1 would also be conserved among other flexiviruses. To examine whether this short sgRNA1 leader sequence is widely conserved among potexviruses and related viruses, we determined the TSSs of the sgRNA1s of several potexviruses, a lolavirus, and a carlavirus. The sgRNA1 5′ leaders of these viruses were all no more than 8 nt long (Fig. 4), supporting the notion that a short sgRNA1 5′ leader for the translation of TGBp2/3 is conserved among these viruses. We therefore hypothesized that the short leader sequence of TGBp1 is another factor that regulates leaky scanning of sgRNA1.
Immunoblotting revealed that TGBp1 levels gradually decreased as the leader became shorter, whereas accumulation of TGBp2 and TGBp3 increased (Fig. 5b). Luciferase assays likewise showed that the translational efficiency of TGBp2/3 increased stepwise as the 5′ leader was shortened from 7 nt to 1 nt, whereas that of TGBp1 decreased (Fig. 5c), indicating that shortening the leader sequence promoted leaky scanning. To further validate this hypothesis, we reasoned that extending the sgRNA1 5′ leader would reduce the efficiency of leaky scanning. To test this possibility, we constructed three sgRNA1 variants: dp5U, with a tandemly duplicated WT leader sequence (14 nt); 5Usub1, with a long heterologous leader sequence (42 nt); and 5Usub2, with a GC-rich heterologous leader sequence of the same length as the WT (7 nt) (S. Fig. 5a). Immunoblotting and luciferase assays showed that translation levels of TGBp2/3 in protoplasts transfected with the long-leader variants (dp5U and 5Usub1) were lower than those from WT sgRNA1 (sg1), whereas TGBp1 accumulation did not differ (S. Fig. 5b, c). By contrast, 5Usub2, which had the same leader length as the WT but a higher GC content, was equivalent to the WT in the accumulation levels of all TGBps (S. Fig. 5b, c). Taken together, these results indicate that leaky scanning induced by a short 5′ UTR (LISH) enables ribosomes to efficiently translate TGBp2 and TGBp3.

Leader sequence length is optimized for viral infection
Our results so far indicate that the amounts of the three TGBps translated from sgRNA1 vary with the length of the leader sequence. To test whether this variation is biologically relevant, we employed a trans-complementation system combining a GFP-labeled PVX mutant lacking sgRNA1 (PVX-GFP-∆sg1) with transient transfection of PVX-sgRNA1 variants with various leader lengths (1, 4, 7, or 10 nt). With GFP-tagged PVX, the efficiency of viral spread from infected cells to neighboring healthy cells (cell-to-cell movement) was quantified by measuring the area of GFP fluorescence (16).
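The method for measuring GFP fluorescence area is cited to (16) and is not detailed here. As an illustration only, the sketch below measures the areas of fluorescent foci in a 2D intensity grid with an intensity threshold and 4-connected flood fill. Several choices are assumptions rather than the authors' pipeline: the threshold, the connectivity rule, the pure-Python grid representation, and the helper names `foci_areas` and `relative_mean_area`.

```python
from collections import deque
from statistics import mean
from typing import List, Sequence


def foci_areas(image: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Pixel areas of 4-connected regions whose intensity exceeds `threshold`."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill of one GFP focus.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def relative_mean_area(mutant: Sequence[int], wild_type: Sequence[int]) -> float:
    """Mean focus area of a mutant relative to the WT mean."""
    return mean(mutant) / mean(wild_type)


toy_image = [
    [0, 9, 9, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 8],
    [0, 0, 0, 8],
]
print(foci_areas(toy_image, threshold=5))  # [3, 2]
```

In practice, an image-analysis package would usually handle background subtraction and segmentation. The sketch only shows how per-focus areas translate into a relative size that can be compared with the WT.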
As shown in Supplementary Figure 6, GFP foci resulting from trans-complementation with sgRNA1 variants of altered leader length were smaller than those resulting from sgRNA1 with the WT leader. This suggests that altering LISH by changing the leader length modulated the cell-to-cell movement efficiency of PVX, likely because of an imbalance in TGBp accumulation levels (S. Fig. 6b, c). Similar results were obtained for PlAMV with a corresponding trans-complementation assay combining a GFP-labeled PlAMV mutant lacking sgRNA1 (PlAMV-GFP-∆sg1) and PlAMV-sgRNA1 variants with various leader lengths (1, 3, 5, 7, or 10 nt) (S. Fig. 7a). The cell-to-cell movement efficiency of PlAMV-GFP increased as the leader sequence became longer; notably, the movement efficiency with a 10-nt leader was nearly the same as that with the WT leader (S. Fig. 7b, c). This may be because higher cell-to-cell movement efficiency does not always maximize viral fitness (39).
Furthermore, we constructed PVX-GFP mutants that express sgRNA1 with various leader lengths in cis (Fig. 6a). Corresponding PlAMV variants could not be constructed without changing the RdRp amino-acid sequence, because the sgRNA1 leader sequence overlaps the RdRp coding region. RACE analysis confirmed that the sgRNA1 TSSs of these PVX mutants were as intended. In addition, northern blotting and quantitative reverse transcription PCR (qRT-PCR) indicated that these mutations did not significantly affect viral RNA multiplication in protoplasts isolated from Nicotiana suspension culture cells (BY-2) (Fig. 6b, c). When GFP-tagged PVX variants expressing sgRNA1 with 4- or 10-nt leaders were inoculated into N. benthamiana, the GFP fluorescence spots produced by the WT virus were significantly larger than those produced by the mutants with shorter or longer leaders (Fig. 6d, e). Taken together, our results suggest that the native leader length tunes the efficiency of LISH to achieve optimal cell-to-cell movement.
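The text does not state how the qRT-PCR data were normalized. One widely used option is the comparative 2^−ΔΔCt method, sketched below only as an example of how relative viral RNA levels could be computed. Several elements are assumptions, not details of this study: the choice of method, a reference gene, near-100% amplification efficiency, and the hypothetical Ct values.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative quantity by the 2^-ΔΔCt method.

    Assumes roughly 100% amplification efficiency for both the viral target and
    the (hypothetical) reference gene.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # ΔCt of the mutant
    delta_control = ct_target_control - ct_ref_control   # ΔCt of the WT calibrator
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: after reference normalization, the mutant target
# crosses the threshold one cycle earlier than WT, i.e. about twofold more RNA.
print(fold_change_ddct(20.0, 18.0, 21.0, 18.0))  # 2.0
```

A fold change close to 1 for each leader-length mutant would be consistent with the statement that the mutations did not significantly affect viral RNA multiplication.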

A short 5′ UTR is a widely conserved feature among uORFs in host plant mRNAs
Our results suggest that LISH finely controls the efficiency of translation from viral polycistronic RNAs. Meanwhile, accumulating evidence has shown that the translation of many eukaryotic polycistronic mRNAs is regulated by leaky scanning of the uORF AUG codon (6). We therefore hypothesized that LISH may be widely employed in the translational regulation of such eukaryotic polycistronic mRNAs.
In the Arabidopsis genome, uORFs are found in almost 35% of mRNAs and are thought to play crucial roles in controlling the translation of the main ORFs (mORFs) via non-canonical translation mechanisms, primarily leaky scanning. To test our hypothesis, we computationally analyzed the positions of the 5′-proximal first AUGs (fAUGs) and compared them with the annotated mORF AUGs (mAUGs) in the registered Arabidopsis mRNA database (TAIR9). Of 48,034 registered mRNAs, 21,546 (~44.9%) were canonical in structure, with no uORF and matching fAUGs and mAUGs.
Among these 21,546 canonical fAUG-mAUG-matched mRNAs (matched-mRNAs), the mAUGs of 5,339 mRNAs were located at the very 5′ terminus, likely because the 5′ ends of these mRNAs were not adequately determined; we therefore omitted them from further analysis. As a result, 16,207 mRNAs were regarded as canonical matched-mRNAs, whereas 26,488 were regarded as fAUG-mAUG-mismatched mRNAs (mismatched-mRNAs) containing uORFs.
We analyzed the 5′ UTR lengths of fAUGs in all matched- and mismatched-mRNAs. In mismatched-mRNAs, the two most frequent 5′ UTR lengths of fAUGs were both shorter than 20 nt. Of the 26,488 mismatched-mRNAs, 7,705 (29.1%) had fAUGs with 5′ UTRs shorter than 20 nt (Fig. 7). By contrast, of the 16,207 matched-mRNAs, only 1,101 (6.8%) had fAUGs with 5′ UTRs shorter than 20 nt. Interestingly, the most frequent 5′ UTR lengths in matched-mRNAs were between 50 and 100 nt, suggesting a bias against short 5′ UTRs in these mRNAs. Although we cannot rule out the possibility that some sequences of mismatched-mRNAs with short 5′ UTRs, as well as of matched-mRNAs, are incompletely characterized, these results suggest that LISH is an intrinsic mechanism of uORF-mediated translational gene regulation in plants.
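The counts and percentages reported in this section are internally consistent, as the short check below shows. It uses only numbers stated in the text. One assumption is that every mRNA not classified as matched was counted as mismatched; the reported totals imply this. The constant names are illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


TOTAL = 48_034             # TAIR9 mRNAs analysed
MATCHED_ALL = 21_546       # fAUG == mAUG (no uORF)
MATCHED_AT_5P_END = 5_339  # matched mRNAs with the mAUG at the 5' terminus (excluded)
SHORT_MISMATCHED = 7_705   # mismatched-mRNAs whose fAUG 5' UTR is < 20 nt
SHORT_MATCHED = 1_101      # matched-mRNAs whose fAUG 5' UTR is < 20 nt

matched = MATCHED_ALL - MATCHED_AT_5P_END  # canonical matched-mRNAs kept
mismatched = TOTAL - MATCHED_ALL           # all remaining mRNAs counted as mismatched

print(matched, mismatched)                                             # 16207 26488
print(pct(MATCHED_ALL, TOTAL))                                         # 44.9
print(pct(SHORT_MISMATCHED, mismatched), pct(SHORT_MATCHED, matched))  # 29.1 6.8
```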

Discussion
In this study, we elucidated a novel mechanism that regulates leaky scanning. We found that the TGBps encoded in potexviral genomes are all translated from a single RNA molecule, sgRNA1, via leaky scanning. Moreover, the remarkably short leader sequence upstream of TGBp1, the first ORF in sgRNA1, is well conserved among potexviruses. Mutational analyses showed that the efficiency of leaky scanning depended strongly on the 5′ UTR length. Surprisingly, genome-wide analysis of A. thaliana mRNAs showed that the distribution of 5′ UTR lengths in mRNAs with a uORF differed markedly from that in mRNAs lacking a uORF. This suggests that 5′ UTR length-dependent regulation of leaky scanning is a regulatory mechanism employed not only by viruses but also by eukaryotes. Taken together, we present a mechanism that regulates leaky scanning in addition to the Kozak sequence.

All TGB proteins are translated from sgRNA1
We analyzed the molecular mechanism underlying the translation of three TGBps from sgRNA to re-verify the potexviral TGBps expression model proposed previously (13

5′ UTR lengths in eukaryotic mRNA, viral gRNA, and viral sgRNA
We found that the 5′ UTR of the flexiviral sgRNA1 is very short, ranging from 1 to 8 nt.
Given that the 5′ UTR of the flexiviral RdRp, which is translated directly from gRNA, is 70 to 110 nt long, the 5′ UTR of sgRNA1 is extremely short. Although plant virus MPs are in many cases translated from sgRNAs, the 5′ UTR of flexiviral sgRNA1 is much shorter than those of MP-encoding sgRNAs of other RNA viruses. For instance, the 5′ UTR of the sgRNA encoding the MP of tobacco mosaic virus (TMV) is approximately 60 nt long, much longer than those in flexiviruses. This suggests that LISH is not required for translation of the TMV MP, consistent with its translation as a single ORF. On the other hand,

TGBp-type MPs are also encoded by several species in the genus Benyvirus and the family Virgaviridae, but the 5′ UTRs of these MPs are all more than 100 nt long. This may be because TGBp2 in these viruses has been reported to be translated from its own sgRNA, not by leaky scanning of TGBp1.
Strikingly, we found that short 5′ UTRs are conserved not only in viral RNAs but also in plant mRNAs. Short 5′ UTRs are considerably more common upstream of the uORFs of endogenous polycistronic mRNAs in Arabidopsis than upstream of the ORFs of monocistronic mRNAs: only 6.8% of the mORFs had a 5′ UTR shorter than 20 nt, compared with 29.1% of the uORFs. Taken together, LISH may be a general mechanism of translational regulation by leaky scanning that derives from the intrinsic function of eukaryotic ribosomes and acts together with the Kozak sequence.

Molecular mechanism underlying LISH
In this study, we demonstrated that the efficiency of leaky scanning is regulated by the length of the uORF 5′ UTR. Previous studies using in vitro translation systems and artificial mRNAs showed that a short 5′ UTR can in principle upregulate the efficiency of leaky scanning (43,44). To our knowledge, however, this is the first study to show that LISH is actually employed to regulate the efficiency of leaky scanning of viral and eukaryotic mRNAs in living cells. Our results raise the question of how such short leader sequences enhance leaky scanning. A previous study reported that when the mammalian 43S PIC is positioned with the AUG initiation codon in its P site, the PIC contacts multiple nucleotides around the AUG codon, including 17 nt upstream and 11 nt downstream (45). Therefore, when the 5′ UTR is shorter than 17 nt, the absence of RNA from the space within the PIC normally occupied by the 5′ UTR may alter the overall conformation of the PIC and impair start codon recognition, thereby increasing the efficiency of leaky scanning. On the other hand, leaky scanning of sgRNA1 still depended on the Kozak sequence when the 5′ UTR was 7 nt (Fig. 3), which suggests that 43S PIC scanning still functions under LISH conditions.

Biological significance of LISH in viral infection
Using a trans-complementation assay, we showed that the infection efficiency of sgRNA1-deficient PVX was highest when it was complemented with sgRNA1 carrying a WT-length 5′ UTR. Complementation with sgRNA1s carrying elongated or shortened 5′ UTRs reduced infection efficiency (S. Fig. 6). The same result was obtained when the intact genomic sequence of a PVX-GFP infectious clone was manipulated to generate virus mutants with elongated or shortened 5′ UTRs (Fig. 6). These results suggest that the length of the sgRNA1 leader sequence is optimized for cell-to-cell movement, potentially by fine-tuning the accumulation levels of the three TGBps. In PlAMV sgRNA1, the only two AUG codons upstream of the TGBp3 initiation codon are those of TGBp1 and TGBp2, implying evolutionary pressure to avoid additional AUGs that would impair translation of TGBp2/3.
All three TGBps act as MPs, but their specific properties differ (46). TGBp1 is a multifunctional protein with RNA-binding and ATPase activities that acts as both an MP and an RNA-silencing suppressor. TGBp1 modifies the plasmodesmal size exclusion limit and reorganizes the viral genome, viral proteins, host actin, and endomembranes to form the viral X-body for cell-to-cell movement (46). Both TGBp2 and TGBp3 are transmembrane proteins, but they localize to different subdomains of the endoplasmic reticulum (47). Viruses may therefore orchestrate the distinct functions of the TGBps to adapt to the host cellular environment by optimizing the levels of the three MPs. Indeed, the sequence contexts of the TGBp2 and TGBp3 AUGs are not optimal, and recognition of the TGBp1 AUG is restricted by LISH. Thus, the accumulation of all three TGBps is restrained to some degree by LISH together with the Kozak context, which regulates their translation initiation efficiency. This may be because the accumulation of TGBps can be stressful to plant cells. PVX TGBp1 triggers cell death resulting from endoplasmic reticulum stress in N. benthamiana (48). Moreover, potexviral TGBp3 drastically modifies the membrane structure of the endoplasmic reticulum (49), and its overexpression causes veinal necrosis (50). Consistent with this, TGBp3 of shallot virus X and lily virus X, both members of the genus Potexvirus, has a non-AUG initiation codon with lower translation initiation efficiency than a normal AUG codon (51,52). Furthermore, a simulation study showed that excessive cell-to-cell movement efficiency impairs selection against defective genomes and degrades the quality of viral genomes (39). The highest movement efficiency does not always maximize viral fitness.
Suppressing MP translation may therefore also increase viral fitness by restraining movement efficiency.
In conclusion, our study elucidated LISH, a novel translation regulatory mechanism involving leaky scanning, in which the length of the uORF 5′ UTR regulates the translation efficiency of the dORF. Future study will unveil the functional universality of

Figure 7. Short leader-sequence length is conserved among endogenous Arabidopsis uORFs.
The value for matched-mRNAs with a 0-nt leader (matched-0) is omitted because it was thought to reflect inadequate sequence data. Values for mRNAs with leaders longer than 401 nt are not shown because they were small.