Main

Positive-strand RNA viruses constitute more than one-third of all virus genera and cause various diseases in plants and animals. Notable members of this group include hepaciviruses, coronaviruses, flaviviruses and tombusviruses. These viruses have single-stranded RNA genomes that have the same coding sense as host mRNAs and that thus function directly as templates for the translation of viral proteins (Fig. 1). One of the proteins that is expressed during infection is RNA-dependent RNA polymerase (RdRp), which is responsible for the replication of viral genomes and, in certain viruses, the transcription of smaller viral subgenomic mRNAs (sgmRNAs)1 (Fig. 1). The regulation of fundamental viral processes generally involves local RNA elements within genomes, which recruit viral and host factors2,3. Such regulatory units are known as cis-acting RNA elements, and the factors that they interact with, such as viral or cellular proteins, are generically known as trans-acting factors.

Figure 1: Fundamental steps in positive-strand RNA virus replication.

A linear representation of a generic positive-strand RNA viral genome is shown with the encoded proteins represented by cylinders. The viral protein that is encoded at the 5′ end, p1 (for example, the viral RNA-dependent RNA polymerase (RdRp)), is translated directly from the genome. RdRp synthesizes a full-length complementary negative-strand RNA using the genome as a template, which is subsequently used for the synthesis of progeny genomes. Certain viral genomes also function as templates for the transcription of subgenomic mRNAs (sgmRNAs), which is a process that is also mediated by the viral RdRp and that involves a negative-strand intermediate. The ORFs of additional viral genes that are located downstream of the first ORF are generally not efficiently translated, owing to poor ribosome access (not shown). The transcription of smaller viral sgmRNAs enables the expression of these additional proteins (for example, p2), as downstream ORFs in sgmRNAs are relocated to 5′-proximal positions, which enables efficient ribosome access.

Although most of the regulatory cis-acting RNA elements in positive-strand RNA viruses are local sequences or confined secondary or tertiary structures, there is mounting evidence that long-range intragenomic RNA base pairing can also have essential functions, such as executing and modulating different viral processes4,5,6,7. The definition of a long-range RNA–RNA interaction can vary and, depending on the scientific perspective, can range from tens to thousands of nucleotides. There are many examples of interactions in the lower range that regulate viral processes, and many of these have been described in other reviews3,4. Conversely, interactions in the upper range tend to be less prevalent, but can also have important functional consequences for viral gene expression and/or genome replication in plant and animal positive-strand RNA viruses (Table 1). Such long-range interactions were first reported for positive-strand RNA bacteriophages in the 1990s8,9,10,11 and, since then, have been discovered in various positive-strand RNA viruses that infect eukaryotes4,5,6,7. All of the long-range interactions that have been characterized so far involve Watson–Crick RNA base pairing, and they often include sequences that are positioned in loop regions or in internal bulges within RNA structures. In this Review, we focus on the burgeoning field of interactions that span large distances, that is, ≥1,000 nucleotides, and we discuss how these interactions modulate viral processes. In particular, we focus on: the regulation of translation initiation by 3′ cap-independent translational enhancers (3′ CITEs) and internal ribosome entry sites (IRESs); translational recoding events, including ribosomal frameshifting and stop codon readthrough; genome replication; and sgmRNA transcription. We then consider the possible roles and regulatory mechanisms of such interactions and discuss how they function in the complex context of viral RNA genomes. 
Last, we discuss outstanding questions that will determine the future directions of research into long-range intragenomic RNA–RNA interactions.

Table 1 Functional intragenomic interactions in positive-strand RNA viruses

Translation of viral proteins

Positive-strand RNA viral genomes are directly translated into viral proteins by host ribosomes, and a minority of these proteins are generated by translational recoding mechanisms that involve either ribosomal frameshifting or stop codon readthrough12. Regardless of the translational strategy that is used, all viral genomes compete against abundant cellular mRNAs for ribosomes. Accordingly, these RNA genomes contain structural features that assist in the recruitment of the host translational machinery. Some viruses use the conventional terminal structures of mRNAs, that is, the 5′ cap and the 3′ poly(A) tail13. In other cases, less typical elements are used, such as 3′ CITEs5,6 or IRESs14, which are located in genomic 3′ and 5′ UTRs, respectively. Some of the viruses that use 3′ CITEs or IRESs, or that rely on translational recoding mechanisms, also require associated long-range RNA–RNA interactions.

3′ CITE-mediated translation. 3′ CITEs have been identified in several genera of the virus family Tombusviridae, as well as in the related genera Luteovirus (family Luteoviridae) and Umbravirus (family unclassified)5,6. These positive-strand RNA genomes lack both 5′ caps and 3′ poly(A) tails and instead rely on 3′ CITEs for protein translation. Several different structural classes of 3′ CITE have been uncovered, including a 68 nucleotide RNA stem–loop15 and more complex multihelix pseudoknot16,17 or tRNA-shaped structures18,19. Most 3′ CITEs function by binding to the ribosome-recruiting eukaryotic translation initiation factor 4F (eIF4F) complex via one or both of its eIF4G and eIF4E subunits15,16,20,21,22, whereas others bind directly to ribosomal subunits23,24. Thus, 3′ CITEs can recruit ribosomes by either direct or indirect means. Despite their structural and functional diversity, 3′ CITEs share a common relative position near to, or within, the 3′ UTRs of viral genomes. Paradoxically, this 3′-proximal position places them, and the recruited translational machinery, at the opposite end to the site of translation initiation. Consequently, many of these viruses use long-range RNA-based interactions to relocate 3′ CITEs that are bound to the translational machinery to positions that are close to their cognate 5′ UTRs.

In the genome of barley yellow dwarf virus (BYDV; genus Luteovirus)25, complementary sequences that are located in the 5′ UTR and 3′ CITE form a long-distance RNA–RNA bridge via a base-pair-mediated kissing-loop interaction25. The importance of this interaction for viral translation was confirmed by compensatory mutational analysis, whereby disruption, and then restoration, of base-pairing potential by substitutions correlated with low and high levels of translational activity, respectively25. Similar functional base-pairing interactions between 5′ UTRs and 3′ CITEs have also been shown for different members of the genus Tombusvirus, including maize necrotic streak virus (MNeSV), carnation Italian ringspot virus (CIRV) and tomato bushy stunt virus (TBSV)15,22,26,27,28, as well as for saguaro cactus virus of the genus Carmovirus29. These findings have led to a general model for 3′ CITE activity, in which simultaneous binding of the 3′ CITE to both the eIF4F complex and the 5′ UTR enables the eIF4F-mediated recruitment of the 40S ribosomal subunit to the 5′ end15,16,21,25,27,30 (Fig. 2a). In vitro studies with the MNeSV 3′ CITE have confirmed the formation of the proposed tripartite 5′ UTR−3′ CITE–eIF4F complex and its requirement for efficient ribosome recruitment to the 5′-proximal start codon15.
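The logic of compensatory mutational analysis can be illustrated with a minimal Python sketch. The two five-nucleotide loop sequences, the function names and the mutated positions below are hypothetical and were chosen only for illustration; they are not the BYDV sequences, and counting Watson–Crick pairs is only a crude stand-in for measured translational activity.

```python
# Hypothetical illustration of compensatory mutational analysis.
# The sequences are invented for this example and are not real viral sequences.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_positions(five_prime_loop: str, three_prime_loop: str) -> int:
    """Count Watson-Crick pairs when two RNA segments align antiparallel."""
    if len(five_prime_loop) != len(three_prime_loop):
        raise ValueError("segments must have equal length")
    partner = three_prime_loop[::-1]  # antiparallel alignment
    return sum((a, b) in WATSON_CRICK for a, b in zip(five_prime_loop, partner))


def mutate(seq: str, index: int, base: str) -> str:
    """Return a copy of seq with a single substitution at index."""
    return seq[:index] + base + seq[index + 1:]


wild_5 = "UGUCA"  # hypothetical 5' UTR loop
wild_3 = "UGACA"  # hypothetical 3' CITE loop (reverse complement of wild_5)

# Step 1: a substitution in the 5' partner disrupts one base pair.
mutant_5 = mutate(wild_5, 1, "C")
# Step 2: the compensatory substitution in the 3' partner restores pairing.
mutant_3 = mutate(wild_3, len(wild_3) - 1 - 1, "G")

wild_type = paired_positions(wild_5, wild_3)
disrupted = paired_positions(mutant_5, wild_3)
restored = paired_positions(mutant_5, mutant_3)
```

In an experiment of this design, the single mutants would be expected to show reduced activity and the double mutant restored activity, which is the pattern that supports a functional role for base pairing rather than for the primary sequence itself.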

Figure 2: Translation initiation regulated by long-distance interactions.

a,b | 3′ cap-independent translational enhancer (3′ CITE)-mediated translation. a | General 3′ CITE model showing the long-range interaction (indicated by the double-headed arrow) that positions the eukaryotic translation initiation factor 4F (eIF4F)-bound 3′ CITE close to the 5′ end of the genome, where it enables eIF4F-mediated recruitment of the 40S ribosomal subunit to the 5′ end to initiate translation. Interacting sequences are shown in green. b | Dual 3′ CITE model. Pea enation mosaic virus (PEMV) contains two different types of 3′ CITE: a panicum mosaic virus-like translational enhancer (PTE), which is located in the 3′UTR and binds to eIF4F; and a kissing-loop T-shaped structure (kl-TSS), which is positioned immediately upstream and binds directly to the 60S ribosome subunit and mediates long-distance RNA–RNA base pairing with a 5′-proximal hairpin. c–e | Internal ribosome entry site (IRES)-mediated translation. c | The foot-and-mouth disease virus (FMDV) IRES is stimulated by the 3′ UTR, which engages in long-range contacts with two regions in the 5′ UTR: the IRES and a region that has been shown to be involved in genome replication, which is known as the S-region. The specific sequences that are involved in this interaction have not been identified and the interaction with the S-region may modulate genome replication. d | The 3′-terminal hexamer CGGCCC in classical swine fever virus (CSFV) is a negative modulator of IRES-mediated translation and may confer its inhibition by pairing with a ribosome-binding region in the IRES, thus blocking ribosome binding. e | Hepatitis C virus (HCV) IRES activity is negatively regulated by an interaction between helix IIId of the IRES and a bulge in the structure 5BSL3.2, which is located in the coding region of non-structural protein 5B (NS5B). 
The same bulge in 5BSL3.2 also interacts with a genomic sequence around position 9110 of the genome, and the terminal loop of 5BSL3.2 can pair with the 3′ SL2 element located in the 3′ UTR, which may modulate genome replication. These interactions may coordinate viral translation and genome replication.

3′ CITE-dependent translation is more complicated in pea enation mosaic virus (PEMV; Umbravirus)31. PEMV contains two different types of 3′ CITE: a panicum mosaic virus-like translational enhancer (PTE), which is located in its 3′ UTR and binds to eIF4F16,17, and a kissing-loop T-shaped structure (kl-TSS), which is positioned immediately upstream of the PTE and binds directly to 60S ribosome subunits24 (Fig. 2b). Accordingly, both direct and indirect modes of ribosome recruitment probably occur in PEMV. However, unlike the PTEs that have been identified in other viruses29, the PTE in PEMV does not interact with the 5′ end of the viral genome16. Instead, the adjacent kl-TSS engages in a long-distance interaction with a 5′-proximal hairpin, thereby uniting the terminal regions31 (Fig. 2b). The specific contribution of each of the 3′ CITEs to the PEMV translational process remains to be fully determined; however, it is clear that the kl-TSS-mediated long-range interaction could be beneficial to the activity of both of the 3′ CITEs by repositioning them close to the site of translation initiation.

IRES-mediated translation. IRESs are structured RNA elements that recruit ribosomes — either directly or with the assistance of cellular proteins — to the vicinity of a start codon14. Certain viral IRESs are modulated by regions that are far downstream of translation initiation sites. For example, the 5′-uncapped but 3′-polyadenylated genome of the picornavirus foot-and-mouth disease virus (FMDV) contains a 5′ IRES that is positively regulated by interactions with the genomic 3′ UTR32. In vitro, this 3′ UTR participates in two long-range interactions with the 5′ UTR — one with the IRES and the other with a 5′-terminal S-region that is involved in genome replication33 (Fig. 2c). Although the specific sequences involved were not identified, the interaction of the 3′ UTR with the IRES was found to be independent of its interaction with the S-region, and the two interactions could not form simultaneously. Therefore, the detected contacts could potentially modulate both translation and genome replication.

Negative regulators of viral IRES activity have also been identified. The genome of the pestivirus classical swine fever virus (CSFV) lacks both a 5′ cap and a 3′ poly(A) tail, and the 3′ UTR inhibits the translational activity of the IRES in the 5′ UTR34. The negative regulatory sequence that affected IRES activity was mapped to a 3′-terminal RNA hairpin that ends with CGGCCC-OH. This terminal sequence was also found to be complementary to a sequence that is located in the ribosome-binding region of the IRES (Fig. 2d), which suggests that a CGGCCC–IRES base-pairing interaction inhibits ribosome recruitment34.

Regulation in hepatitis C virus (HCV) is more complex and involves a network of RNA–RNA interactions. The HCV genome does not contain a 5′ cap or a 3′ poly(A) tail but instead has an IRES that binds directly to the 40S ribosomal subunit. IRES activity is downregulated by a long-range RNA–RNA interaction35,36,37 that occurs between the apical loop of helix IIId in the IRES and a bulge in an essential 3′-proximal cis-acting replication element in the coding region of the non-structural protein 5B (NS5B), which is known as 5BSL3.2 (Ref. 35) (Fig. 2e). Interestingly, the same 5BSL3.2 bulge sequence also mediates genome replication by interacting with a nearby upstream sequence located around nucleotide 9110 (Refs 38, 39, 40, 41) (Fig. 2e). Since the two interactions are equally probable in a thermodynamic context41, shifting of the conformational equilibrium between the two interactions could regulate viral translation and genome replication39,41,42. In addition, there is both genetic and structural evidence that the terminal loop of 5BSL3.2 base pairs with the 3′ SL2 element, which is an RNA element that is located in the 3′ UTR and is involved in genome replication39,40,41,43 (Fig. 2e). Accordingly, 5BSL3.2 functions as a central hub for a network of interactions that collectively modulate IRES-mediated translation and genome replication.

Intriguingly, some uncapped and non-polyadenylated virus genomes use both a 5′ IRES and a 3′ CITE; for example, the plant nepovirus blackcurrant reversion virus (order Picornavirales) uses a hybrid 5′ IRES−3′ CITE-mediated translation mechanism and also requires a long-distance interaction between the 5′ and 3′ genomic ends for optimal translation44,45,46. Although the details of the individual or combined functions of the 5′ IRES and 3′ CITE remain to be determined, it was suggested that this terminal interaction might help to facilitate the re-recruitment of terminated ribosomes46. Indeed, this potential function in ribosome recycling is also applicable to some of the above examples in which 5′−3′ interactions increase translational efficiency.

Translational recoding. Recoding via stop codon readthrough or ribosomal frameshifting leads to the production of carboxy-terminally extended proteins. In certain viruses, functional long-range base-pairing interactions were found to be required for both of these types of recoding events47,48,49,50, and the involvement of such interactions is particularly prevalent in positive-strand RNA plant viruses12.

The most common form of ribosomal frameshifting involves a small proportion of elongating ribosomes moving backwards one base and then resuming translation in the new −1 reading frame12. This process is facilitated by a 'slippery' heptanucleotide sequence at the frameshift site and a stimulatory RNA structure that is located a few nucleotides downstream12 (Fig. 3a). In addition, in the plant viruses BYDV and red clover necrotic mosaic virus (RCNMV; genus Dianthovirus), the efficient −1 frameshifting that produces their viral RdRps requires base pairing between their proximal stimulatory RNA structures and complementary sequences that are located 4,000 and 2,500 nucleotides downstream, respectively47,48. In BYDV, a bulge in the stimulatory RNA structure next to the frameshift site interacts with the terminal loop of an RNA hairpin near to the 3′ end of the genome (Fig. 3a), and a similar interaction occurs in RCNMV48. In addition to mediating frameshifting, the interaction is also proposed to assist in the coordination of translation and negative-strand synthesis, which are directionally opposed processes for an RNA genome47,48.
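The effect of a −1 frameshift on the reading frame can be sketched in a few lines of Python. The short RNA sequence, the function name read_codons and the slip position are hypothetical; the model simply steps the reading position back by one nucleotide at a chosen codon boundary and ignores the stimulatory structure, the long-range interaction and the low frequency of the event.

```python
# Minimal sketch of -1 ribosomal frameshifting on a hypothetical RNA.
# The sequence and slip position are invented for illustration only.

STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_codons(rna, start=0, slip_after=None):
    """Return the sense codons read from `start` until a stop codon.

    If `slip_after` is given, the reader steps back one nucleotide the
    first time it reaches that position, which models a -1 frameshift.
    """
    codons = []
    pos = start
    slipped = False
    while pos + 3 <= len(rna):
        if slip_after is not None and not slipped and pos == slip_after:
            pos -= 1  # move back one base into the -1 frame
            slipped = True
        codon = rna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
        pos += 3
    return codons


# Hypothetical message: the heptamer UUUAAAC (positions 5-11) is followed
# by a stop codon (UAA) in the original reading frame.
rna = "AUGGCUUUAAACUAAGGCCAUAG"

no_shift = read_codons(rna)                  # terminates at the UAA stop codon
shifted = read_codons(rna, slip_after=12)    # slips back after the heptamer
```

Here the unshifted ribosome stops at the in-frame stop codon, whereas the shifted ribosome bypasses it and produces a carboxy-terminally extended product, which is analogous to the p39 and p39–p60 fusion outcomes described for BYDV.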

Figure 3: Translational recoding facilitated by long-distance interactions.

a | Linear representation of the barley yellow dwarf virus (BYDV) RNA genome, which shows coding regions as cylinders. p39 and p60 correspond to proteins of 39 kDa and 60 kDa, respectively, which are encoded in two separate ORFs in different reading frames. p39 is produced when no ribosome frameshifting occurs. However, when a −1 frameshift occurs near the very 3′ end of the p39 ORF, ribosomes are shifted into the p60 reading frame and translate the p60 ORF as a carboxy-terminal extension of p39. The resulting frameshift protein is the viral RNA-dependent RNA polymerase (RdRp), which is approximately 99 kDa. Frameshifting is stimulated by a long-range interaction (double-headed arrow) between a bulge in an RNA structure that is near to the frameshift site and the terminal loop of a 3′-proximal stem–loop. Interacting sequences are shown in green. b | Linear representation of the carnation Italian ringspot virus (CIRV) RNA genome, which shows readthrough translation of its RdRp. Ribosomes that initiate at the 5′ end of the genome normally terminate at the stop codon at the end of the p36 ORF, producing a protein of 36 kDa. Translational readthrough of the p36 stop codon produces a C-terminally extended readthrough protein of 95 kDa (p95), which is the viral RdRp. Readthrough requires base pairing between the proximal readthrough element (PRTE) that is located near to the stop codon and the 3′-proximal distal readthrough element (DRTE). The DRTE 3′ sequence is associated with one of two mutually exclusive RNA conformations. The SL-T conformation facilitates readthrough and prevents genome replication, whereas the alternative conformation, which contains SL-2, promotes replication and inhibits readthrough. These two conformations thus represent a type of RNA switch that probably coordinates translation and replication.

The interactions between the stimulatory RNA structures and the 3′-proximal RNA hairpins in BYDV and RCNMV are presumed to occur intramolecularly; however, there has been a recent report of a −1 ribosomal frameshifting event that is enhanced by an intermolecular genomic interaction49. In the severe acute respiratory syndrome coronavirus (SARS-CoV), such an interaction involves a palindromic loop sequence in a local pseudoknot that is positioned just downstream of the frameshift site. The palindromic loop sequences in two SARS-CoV genomes form a kissing-loop structure that increases frameshifting efficiency in vitro. Mutations that disrupted the base pairing abolished dimerization, reduced frameshifting and inhibited the accumulation of viral RNA in infected cells49. The disruption also affected the ratio of genomic RNA to sgmRNA levels and growth kinetics, which suggests that this intermolecular interaction has a genuine regulatory role in the viral life cycle49.

Another common viral recoding strategy is stop codon readthrough12, whereby, instead of ribosome termination, the stop codon is decoded as a sense codon. Translation then proceeds in the original reading frame, which results in an extended protein that is produced at a low frequency. As with frameshifting, the efficiency of codon readthrough is typically influenced by RNA sequences and structures that immediately surround the stop codon12. In the plant tombusvirus CIRV, stop codon readthrough generates the viral RdRp, and this process requires a long-distance interaction between an RNA structure that is immediately downstream of the readthrough site (which is known as the proximal readthrough element (PRTE)) and a sequence in the 3′ UTR (which is known as the 3′-proximal distal readthrough element (DRTE))50 (Fig. 3b). The DRTE is associated with one of two mutually exclusive stem–loop structures, SL-T and SL-2, of which SL-2 is essential for genome replication50. Formation of SL-T positions the complementary 3′ sequence in its terminal loop, which facilitates the establishment of the long-distance interaction that improves translational readthrough and simultaneously inhibits genome replication by precluding the formation of SL-2. Conversely, the SL-2-containing conformation promotes genome replication and impedes readthrough (Fig. 3b). On the basis of these observations, SL-T and SL-2 were proposed to function as an RNA switch that assists in the coordination of translation and replication50.

Viral genome replication

The replication of positive-strand RNA virus genomes occurs via the synthesis of a complementary negative-strand RNA, which is subsequently used as a template for the production of progeny positive-strand RNA genomes (Fig. 1). This process is catalysed by a virally encoded RdRp and is assisted by viral and host proteins. The initiation of negative-strand synthesis involves the RdRp accessing the 3′ terminus of a genome, and RNA sequences and structures that facilitate this are usually located near to the 3′ end. However, there is compelling evidence that RNA elements that are considerably distal to 3′ ends can also influence the efficiency of complementary strand production51,52.

Flavivirus genome replication. Several members of the genus Flavivirus, including dengue virus (DENV), West Nile virus (WNV) and yellow fever virus (YFV), require genome circularization, which is mediated by base-pairing interactions between sequences in their genomic termini, for replication51,53,54,55,56,57,58,59,60,61. For DENV, three different complementary sequences are involved in these interactions; these sequences are known as the cyclization sequence, the upstream of AUG region (UAR) and the downstream of AUG region (DAR) (Fig. 4a). The resulting RNA circularization has been observed directly by atomic force microscopy (AFM) in the absence of proteins, which shows that circularization can be entirely RNA-based56 (Fig. 4a). In addition, structural analysis by chemical probing also supports protein-independent interactions between the 5′ and 3′ termini of the genome62. However, although circularization can occur autonomously, protein factors might assist in the process, as both the flavivirus core protein and NS3 helicase have been shown to mediate 5′- and 3′-end base pairing of genomic RNA in vitro63,64. The observed circularization is required for flavivirus genome replication because RdRp binds to an RNA stem–loop in the 5′UTR, which positions RdRp 11 kb upstream of the 3′ terminus where negative-strand synthesis initiates65,66. The long-distance RNA–RNA interaction between the genomic termini thus repositions the RdRp to the 3′ end, where it can commence initiation65 (Fig. 4a).

Figure 4: Viral genome replication directed by long-distance interactions.

a | A simplified representation of relevant RNA secondary structures in the dengue virus (DENV) RNA genome. Genome replication requires three sets of long-distance interactions: 5′−3′ upstream of AUG region (UAR), 5′−3′ downstream of AUG region (DAR) and 5′ cyclization sequence−3′ cyclization sequence (CS). Formation of these interactions results in genome circularization, which repositions the 5′-bound RNA-dependent RNA polymerase (RdRp) to the 3′ end of the genome, where it initiates negative-strand synthesis. Interacting sequences are shown in green. The formation of the alternate regulatory structure small hairpin (sHP; shown in red) inhibits circularization. A circularized version of the genome, as observed by atomic force microscopy (AFM), is shown, with the 5′−3′ base-paired region denoted by a white arrow65. b | A simplified representation of the RNA secondary structures in the tomato bushy stunt virus (TBSV) RNA genome that are involved in genome replication. The 3′ cap-independent translational enhancer (3′ CITE) region is also included, as the downstream linker is located immediately upstream of it. The RdRp interacts with its auxiliary replication protein p33, which binds as a dimer to an internal RNA element, RII. A long-distance interaction that involves the upstream linker and downstream linker (UL–DL) sequences brings RdRp, which is associated with RII close to the 3′ end of the genome, where it can initiate negative-strand synthesis. The UL–DL interaction also mediates the formation of a RII–RIV RNA platform that facilitates the assembly of the viral replicase complex. The AFM image in part a is reproduced, with permission, from Filomatori, C. V. et al. A 5′ RNA element promotes dengue virus RNA synthesis on a circular genome. Genes Dev. 20, 2238–2249 (2006) © Cold Spring Harbor Laboratory Press.

DENV genome circularization can be modulated by localized regulatory elements, such as a conserved RNA pseudoknot that is located adjacent to the cyclization sequence in the 5′ UTR67. Moreover, it was recently shown that the DENV genome requires a specific balance between circularized and linear (that is, non-circularized) conformations68. Parts of the 3′ UAR and 3′ DAR sequences can fold into a small local RNA hairpin in the 3′ UTR, which is known as sHP, and the formation of sHP inhibits the interaction with 5′-proximal partner sequences68 (Fig. 4a). Virus replication was found to be sensitive to mutations that altered the natural balance between local sHP formation and long-range pairing, which indicates that a defined ratio between circularized and linear conformations is necessary for viability68. The regulatory function of sHP might be even more complex, as only base pairing of sHP is important for replication in mammalian cells, whereas, unexpectedly, both base pairing and sequence identity are important in mosquito cells69.

Although the requirement for genome circularization in flavivirus replication is now generally accepted, it should be noted that an alternative model has recently been proposed, whereby the 5′ and 3′ complementary sequences would function in trans and generate dimers and/or oligomers of flavivirus genomes70. The formation of such concatamers would presumably be concentration-dependent and could have regulatory effects that differ from those of circularized monomers70. Although such alternative pairing scenarios are theoretically feasible, their existence and possible biological relevance remain to be investigated.

Tombusvirus genome replication. As in flaviviruses, the RdRp of the tombusvirus TBSV associates with the viral genome far upstream of the 3′ terminus. In this case, RdRp forms a complex with its auxiliary replication protein, p33, which binds specifically to an internal RNA element, known as RII, that is located more than 3 kb upstream of the 3′ end71,72 (Fig. 4b). The 3′ terminus of the genome contains an RNA element known as RIV, which is essential for genome replication. RII and RIV are united by a long-range base-pairing interaction that occurs between an upstream linker sequence, which is 3′-proximal to RII, and a partner downstream linker sequence, which is near to the 3′ terminus52 (Fig. 4b). In addition to facilitating 3′ end access to the RdRp, the association between these linker sequences generates a bipartite RII–RIV RNA platform that is necessary for the assembly of the replicase complex, which is composed of viral and host proteins52,73. As RII-like internal replication elements are also present in other viruses of the Tombusviridae family, it is probable that other members of this family also require similar long-range intragenomic interactions for genome replication74.

Viral sgmRNA transcription

Many positive-strand RNA viruses are polycistronic, which means that they encode multiple viral proteins within a single genome segment. ORFs that are located downstream are usually not efficiently translated, owing to poor ribosome access. Thus, to enable the robust expression of these proteins, these viruses transcribe smaller viral sgmRNAs in which the downstream ORFs are relocated to 5′-proximal positions, which thereby enables efficient ribosome access (Fig. 1). The mechanism that is involved in the generation of sgmRNAs depends on the virus, and some viruses that use discontinuous template synthesis or premature termination mechanisms require long-range RNA–RNA interactions.

Coronavirus sgmRNA transcription. Coronavirus genomes are extraordinarily large — they can be up to 32 kb in length — and expression of their 3′-proximal genes depends on sgmRNA transcription75. These sgmRNAs consist of a common 5′-terminal region (known as the leader) that is fused to a variable 3′-terminal segment (known as the body). They are transcribed from complementary negative-strand templates, which are generated by a discontinuous template synthesis mechanism76 (Fig. 5a). During negative-strand synthesis, RdRp dissociates from the positive-strand genome at specified locations for each sgmRNA (which defines the body segment) and then reprimes on the template within the 5′ UTR (where it copies the common leader sequence). The positions of RdRp release and repriming are guided by transcription-regulating sequences (TRSs)76. The discontinuous RNA template is then used to transcribe corresponding sgmRNAs, which contain a common 5′ leader that is connected to different 3′-terminal body sections that encode the different viral ORFs. In transmissible gastroenteritis virus (TGEV), two long-distance interactions control the transcription of the sgmRNA that encodes the nucleocapsid protein (sgmRNA-N)77,78,79 (Fig. 5a). One of these interactions spans almost 26 kb, which is the longest that has been reported so far; it forms between complementary sequences known as the B-motif (BM) and the complementary BM (cBM), which are located upstream of the TRS for sgmRNA-N (TRS-N) and downstream of the leader TRS (TRS-L), respectively79. This interaction, in combination with a shorter-range interaction between a distal element and a proximal element (DE–PE), brings the two TRS elements into close proximity, which promotes efficient RdRp transfer77,78,79.
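The leader–body architecture that results from discontinuous template synthesis can be sketched as follows. This is a simplified illustration that assembles the final positive-strand product directly and does not model the negative-strand intermediate or the long-range interactions; the toy genome, the placeholder TRS motif and the function names are assumptions made for the example.

```python
# Simplified sketch of leader-body joining in discontinuous transcription.
# The genome and the TRS motif are placeholders, not a real viral sequence.


def trs_positions(genome, trs):
    """Return the start index of every occurrence of the TRS motif."""
    positions = []
    i = genome.find(trs)
    while i != -1:
        positions.append(i)
        i = genome.find(trs, i + 1)
    return positions


def subgenomic_mrna(genome, trs, body_index):
    """Join the 5' leader to the body that starts at the chosen body TRS.

    The first TRS in the genome is treated as the leader TRS (TRS-L);
    every later occurrence is treated as a body TRS.
    """
    sites = trs_positions(genome, trs)
    leader_site, body_sites = sites[0], sites[1:]
    leader = genome[:leader_site]
    body = genome[body_sites[body_index]:]  # begins with one copy of the TRS
    return leader + body


TRS = "CUAAAC"  # placeholder motif
LEADER = "ACUUUU"
genome = LEADER + TRS + "GGGAUGAAA" + TRS + "AUGCCC" + TRS + "AUGUUU"

sg_longer = subgenomic_mrna(genome, TRS, 0)
sg_shorter = subgenomic_mrna(genome, TRS, 1)
```

Each product carries the same 5′ leader and a different 3′-coterminal body, so the ORF that follows the selected body TRS becomes 5′-proximal in the resulting sgmRNA.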

Figure 5: sgmRNA transcription mediated by long-range interactions.

a | The upper panel shows a linear representation of the transmissible gastroenteritis virus (TGEV) RNA genome with different encoded viral proteins, 1A, 1B, S, 3a, 3b, E, M, N and 7. The relative positions of long-range interactions that are involved in the transcription of the subgenomic mRNA (sgmRNA) encoding the viral nucleocapsid protein (sgmRNA-N) are indicated (double-headed arrows; complementary B-motif–B-motif (cBM–BM) and distal element–proximal element (DE–PE)). Transcription of sgmRNA-N involves discontinuous synthesis (dashed arrow) of a negative-strand RNA that contains sequences from the body (purple) region, which is located at the 3′ end of the genome, and leader (orange) region, which is located at the 5′ end of the genome. The RNA-dependent RNA polymerase (RdRp) repriming step is guided by the transcription-regulating sequences (TRS; red), and the negative strand that is generated is used as a template for sgmRNA-N transcription. In the lower panel, a simplified RNA secondary structure shows the two interactions, cBM–BM and DE–PE (green), bringing the TRS-N and TRS-leader (TRS-L) into close proximity to mediate RdRp repriming. b | The upper panel is a linear representation of the tomato bushy stunt virus (TBSV) genome, which shows the relative positions of long-range interactions that are involved in sgmRNA transcription. The transcription pathway that leads to sgmRNA2 production is shown on the right-hand side. The negative-strand template is generated by premature termination of RdRp, which is caused by an attenuation signal in the genome that is formed by a long-range interaction between an activator sequence and a receptor sequence (AS2–RS2). The lower left-hand panel shows a simplified RNA secondary structure that depicts the interactions that are involved in transcription of both sgmRNA1 and sgmRNA2. The long-range interactions AS1–RS1 and AS2–RS2 constitute the attenuation signals for sgmRNA1 and sgmRNA2, respectively. SgmRNA2 transcription requires an additional interaction between a distal element and a core element (DE–CE).


Tombusvirus sgmRNA transcription. An alternative mechanism for the production of sgmRNAs involves the premature termination of the viral RdRp during negative-strand synthesis of the genome. This results in the generation of a 3′-truncated negative strand that is then used as a template for the synthesis of positive-strand sgmRNAs1,80 (Fig. 5b). Premature termination is facilitated by an RNA attenuation signal in the positive-strand genome, which is formed by a base-paired RNA segment that is located just upstream of the sgmRNA start site. This signal functions as a physical 'roadblock' that induces polymerase termination. Notably, the attenuation signals that promote the transcription of sgmRNA1 and sgmRNA2 in TBSV are formed by long-range interactions that involve activator and receptor sequences, which are known as AS1–RS1 and AS2–RS2, respectively80,81 (Fig. 5b). AS1–RS1 spans 1.1 kb, whereas AS2–RS2 spans 2.2 kb and requires an auxiliary 1.0 kb long-range interaction between a distal element and a core element (DE–CE)82,83. Interestingly, the identity of the nucleotides that form the attenuation signals is not important83; what matters is the stability of the base-paired segments84.

Nodavirus and dianthovirus sgmRNA transcription. sgmRNA transcription in the bisegmented insect nodavirus flock house virus (FHV) is also likely to occur via a premature termination mechanism. A sequence that is located in the central region of its larger genome segment interacts with two different downstream sequences, one of which is located more than 1.4 kb away and is positioned directly in front of the sgmRNA transcription start site85. The double-stranded RNA structure that forms ahead of the initiation site is probably functionally comparable to the attenuation signals in TBSV.

It has also been proposed that another bisegmented virus — the dianthovirus RCNMV — uses a premature termination mechanism for transcription86. However, for RCNMV, the attenuation signal forms in trans between two complementary sequences that are located separately in the RNA1 and RNA2 genome segments. During infections, the increase in the concentrations of the two genome segments promotes the formation of this bimolecular interaction, which, in turn, activates sgmRNA transcription. Accordingly, it was proposed that this interaction provides a concentration-dependent mechanism to temporally synchronize the transcription of the capsid protein-encoding sgmRNA with the accumulation of the two genomic segments that are co-packaged86.
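The concentration dependence that is proposed for this bimolecular interaction can be illustrated with a simple equilibrium binding model. The following Python sketch assumes a reversible 1:1 association between the two genome segments that is governed by a single dissociation constant; the function names, the dissociation constant and the concentrations are hypothetical and are not taken from the cited study.

```python
import math


def duplex_concentration(total_rna1: float, total_rna2: float, kd: float) -> float:
    """Equilibrium concentration of an RNA1:RNA2 duplex for 1:1 binding.

    Solves [AB]^2 - (A_t + B_t + Kd)[AB] + A_t * B_t = 0 and returns the
    smaller root, which is the physically meaningful one.
    All arguments must use the same (arbitrary) concentration unit.
    """
    if kd <= 0:
        raise ValueError("kd must be positive")
    total = total_rna1 + total_rna2 + kd
    return (total - math.sqrt(total * total - 4.0 * total_rna1 * total_rna2)) / 2.0


def fraction_paired(total_rna1: float, total_rna2: float, kd: float) -> float:
    """Fraction of the limiting segment that is held in the duplex."""
    limiting = min(total_rna1, total_rna2)
    if limiting <= 0:
        return 0.0
    return duplex_concentration(total_rna1, total_rna2, kd) / limiting


if __name__ == "__main__":
    hypothetical_kd = 1.0  # assumed value, arbitrary units
    for conc in (0.1, 1.0, 10.0, 100.0):
        print(conc, round(fraction_paired(conc, conc, hypothetical_kd), 3))
```

In this model, the paired fraction rises as both segments accumulate, which is the qualitative behaviour that a concentration-dependent switch would require. An intramolecular interaction has no equivalent dependence on the total RNA concentration in this simple picture.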

Roles and regulation of long-range interactions

As described in the sections above, long-range interactions have diverse functions. These include the relocalization of bound proteins to a distal genomic location (for example, repositioning 3′ CITE-bound factors near to 5′ termini or repositioning 5′-proximally bound RdRps near to 3′ termini), the generation of a bipartite RNA platform for the assembly of protein complexes (for example, the RII–RIV RNA platform that is used for tombusvirus replicase complex assembly), the colocalization of two RNA elements that require proximity for function (for example, TRS elements involved in RdRp repriming in coronavirus transcription) and the formation of RNA structures that directly regulate a viral process (for example, the double-stranded attenuation signals that direct the premature termination of RdRps during sgmRNA transcription). In some cases, the need for a long-range interaction is clear, such as in relocating proteins; however, in other cases, the requirement is less obvious. For example, the attenuation signals that are involved in sgmRNA transcription in tombusviruses can be functionally replaced by local RNA hairpins, which indicates that the long-distance interactions are not essential83. Interestingly, some viruses that are related to tombusviruses (for example, carmoviruses and necroviruses) use local, rather than long-range, transcription-attenuation signals87,88,89. Thus, the differences that are observed could simply reflect the random nature of the emergence of local versus long-range interactions during virus evolution. Nevertheless, long-range interactions may provide yet-to-be-discovered regulatory advantages that could be mediated via genome-level RNA rearrangement.

Long-distance base-pairing interactions can be regulated by several mechanisms. The most basic strategy is to modulate the stability of the base-paired region by altering the composition and/or number of nucleotides that are involved. The presence of competing RNA structures provides an additional mechanism, which is exemplified by the RNA switches that regulate the interactions that are required for translation and/or replication in HCV40, replication in DENV68 and readthrough in CIRV50 (as described in the previous sections). Furthermore, long-range interactions might also be regulated by proteins that could facilitate or prevent the formation of an interaction and/or destabilize or stabilize an interaction. In line with such concepts, the flavivirus core protein and NS3 helicase have been shown to facilitate circularization in vitro63,64, and it seems probable that proteins also modulate some of the interactions in other systems. Interactions can also be mediated by several different contacts, as has been reported for flavivirus circularization53,54,55,56,57,58,59,60,61, and this suggests that cooperative effects and different regulatory mechanisms for individual contacts could also have a role60.
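The most basic strategy mentioned above, changing the composition or number of paired nucleotides, can be made concrete by tallying the pair types in a candidate duplex. The following Python sketch is only a rough illustration: the function name and the example sequences are hypothetical, and a simple tally of G–C, A–U and G–U pairs is a crude proxy for stability, not a substitute for nearest-neighbour thermodynamic calculations.

```python
from typing import Dict

# Watson-Crick and G-U wobble pairs, keyed by (upstream base, downstream base).
PAIR_TYPES = {
    ("G", "C"): "GC", ("C", "G"): "GC",
    ("A", "U"): "AU", ("U", "A"): "AU",
    ("G", "U"): "GU", ("U", "G"): "GU",
}


def pair_counts(upstream: str, downstream: str) -> Dict[str, int]:
    """Tally pair types for two segments that pair in antiparallel fashion.

    Both segments are written 5'-to-3'. The first base of the upstream
    segment is paired with the last base of the downstream segment.
    """
    if len(upstream) != len(downstream):
        raise ValueError("segments must have the same length")
    counts = {"GC": 0, "AU": 0, "GU": 0, "mismatch": 0}
    for x, y in zip(upstream.upper(), reversed(downstream.upper())):
        counts[PAIR_TYPES.get((x, y), "mismatch")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 4-nt segments; a single substitution disrupts one pair.
    print(pair_counts("GGAU", "AUCC"))
    print(pair_counts("GAAU", "AUCC"))
```

A tally of this kind also shows the logic of compensatory mutational analysis: a substitution in one segment introduces a mismatch, and a matching substitution in the partner segment restores the pair.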

Intramolecular interactions are not generally predicted to be influenced by the local genome concentration; however, as crowding agents have been shown to increase the folding of small RNAs in vitro90, it is possible that high concentrations of viral RNAs in vivo could also influence the formation or stabilization of intramolecular long-range interactions. Alternatively, if some of the proposed cis-interactions do actually occur in trans, as has been suggested70, then genome abundance would clearly be an additional mode of control (as has been proposed for sgmRNA transcription in RCNMV86 and for frameshifting in SARS-CoV49). However, a dependence on cis-only interactions could be advantageous, as it could provide a form of quality control for genetic completeness to viruses that use intragenomic interactions by selecting for viral genomes that maintain the interacting sequences and presumably intact intervening sequences4,65.

Whole-genome context of long-range interactions

Although we are beginning to understand the function and regulation of long-range interactions, it remains unclear how they are able to function in the complex context of viral RNA genomes that have multiple functions. Some interactions involve overlapping sequences and are therefore mutually exclusive, whereas others promote processes that are opposed either physically (such as translation and replication) or temporally (such as replication and encapsidation). Accordingly, proper coordination of these interactions must be essential for their function. This regulation would be particularly relevant for viruses that have multiple long-range interactions, such as tombusviruses, which have at least six different functional long-distance interactions, all of which span distances of 1 kb or more (Table 1). In such cases, the global structure of the viral genome must have features that enable it to form each of the different interactions at the correct time. Thus, genomes must be dynamic and able to adopt alternative conformations, which would be influenced by the intrinsic features of the RNA and its environmental context. Indeed, the structure of the viral genome is likely to be distinct during different steps of the viral life cycle — for example, when the genome is encapsidated, has just been released from its capsid, is being translated, is being replicated or transcribed or is undergoing packaging. Other events, such as co-replicative 5′-to-3′ folding of the genome during its synthesis, would also influence initial and ultimate structures and the ability to form different long-range interactions10. The factors that govern structural transitions at the genomic level are therefore of great interest.

The study of diverse genome states under different in vitro and in vivo conditions will assist in gaining an understanding of the complex coordination of alternative conformations. Ideally, long-range interactions should be studied in their natural genomic contexts, and a logical first step would be to obtain information about the secondary structure of viral RNA genomes. These studies are now possible, owing to technical advances, such as high-throughput SHAPE structural mapping (selective 2′-hydroxyl acylation analysed by primer extension structural mapping)91,92. This chemical probing method provides information about nucleotide flexibility at each position in an RNA, which positively correlates with the likelihood that a residue is single-stranded in the structure. The SHAPE reactivity data can be incorporated into a thermodynamics-based RNA secondary structure prediction program as a pseudo-free energy parameter to improve model prediction91,92. Using this approach, the global organizations of the 1,058 nucleotide-long satellite tobacco mosaic virus (STMV)93,94 RNA genome and the 4,778 nucleotide-long TBSV RNA genome95 have been predicted, and the results have provided insights into the genomic contexts of long-range interactions (Box 1).
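The conversion of SHAPE reactivities into a pseudo-free energy term is commonly expressed as a log-linear function of the reactivity at each nucleotide, ΔG_SHAPE(i) = m ln(reactivity(i) + 1) + b. The following Python sketch shows this conversion under stated assumptions: the function names are hypothetical, the slope and intercept defaults are widely quoted values that should be checked against the folding program in use, and the handling of missing or negative reactivities is a simplification.

```python
import math
from typing import List, Optional


def shape_pseudo_energy(
    reactivity: Optional[float],
    slope: float = 2.6,       # m in kcal/mol; assumed default
    intercept: float = -0.8,  # b in kcal/mol; assumed default
) -> float:
    """Pseudo-free energy (kcal/mol) for one nucleotide in a paired context.

    Low reactivity gives a negative (favourable) term and high reactivity
    gives a positive (unfavourable) term for pairing at that position.
    """
    if reactivity is None:
        return 0.0  # no data: no contribution (assumption)
    clipped = max(reactivity, 0.0)  # treat negative values as zero (assumption)
    return slope * math.log(clipped + 1.0) + intercept


def profile_pseudo_energies(reactivities: List[Optional[float]]) -> List[float]:
    """Apply the conversion to a whole reactivity profile."""
    return [shape_pseudo_energy(r) for r in reactivities]


if __name__ == "__main__":
    hypothetical_profile = [0.0, 0.2, 1.0, None, 2.5]
    print([round(e, 2) for e in profile_pseudo_energies(hypothetical_profile)])
```

In a folding program, a term of this kind is added to the energy of base pairs that involve the nucleotide, so that unreactive positions are biased towards pairing and highly reactive positions are biased against it.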

Conclusions and perspectives

In this Review, we have discussed examples that highlight the important and varied roles of long-range RNA–RNA interactions in fundamental viral processes, including the translation of viral proteins, the replication of the viral genome and the transcription of sgmRNAs. It is clear that substantially different types of positive-strand RNA viruses use this distinctive regulatory strategy. These viruses have evolved to integrate long-range interactions within their genomes in a manner that provides them with mechanisms to regulate a diverse array of viral functions. Indeed, it is quite remarkable that these relatively simple interactive structural features are able to carry out such a broad range of structure-based functions. Importantly, these interactions provide the viruses with a unique opportunity for regulation that is linked to genome structure; this, in turn, may provide benefits in addition to those of local RNA elements. Many important advances have been made in understanding the structure and function of long-range interactions. Unfortunately, not all interactions are amenable to the available investigative methodologies. For example, interactions that are transient or that require special conditions in order to form may be difficult to detect biochemically, and interactions that have unknown, functionally relevant, alternative base-pair partners may be intractable to genetic techniques, such as compensatory mutational analyses. Regardless of the challenges, the interactions that are supported by both biophysical and genetic evidence should be viewed as having increased credibility. For the future, additional research and technological advances are needed to expand on existing findings and to clarify open questions; for example, it will be crucial to determine the detailed structures of different interactions and to establish how these structures influence activity. Other important challenges will be to identify the dynamics and energetic barriers of transitions and folding pathways. In addition, future research is needed to determine the regulatory advantage of long-range interactions, how these interactions are integrated and coordinated within viral genomes and whether any of these interactions occur in trans. Moreover, it will be crucial to investigate the possible involvement of viral and/or host proteins in regulating interactions and how the cellular environment affects interactions. An increased understanding of long-range interactions will also help to determine whether such interactions are plausible targets for antiviral therapies. Finally, another area for future consideration extends beyond viral contexts to cellular messages: how common are functional long-range RNA–RNA interactions in cellular mRNA biogenesis and function? A limited number of recent reports suggest that these structures can indeed participate in the regulation of pre-mRNA splicing96 as well as in the control of eukaryotic97 and bacterial98 mRNA translation. Consequently, the phenomenon that is observed in viruses may foreshadow a similar prevalence and diversity of functional long-range RNA–RNA interactions in cellular mRNAs.