Abstract
Reassortment of the Rotavirus A (RVA) 11-segment dsRNA genome may generate new genome constellations that allow RVA to expand its host range or evade immune responses. Reassortment may also produce phylogenetic incongruities and weakly linked evolutionary histories across the 11 segments, obscuring reassortant-specific epistasis and changes in substitution rates. To determine the co-segregation patterns of RVA segments, we generated time-scaled phylogenetic trees for each of the 11 segments of 789 complete RVA genomes isolated from mammalian hosts and compared the segments’ geodesic distances. We found that segments 4 (VP4) and 9 (VP7) occupied significantly different treespaces from each other and from the rest of the genome. By contrast, segments 10 and 11 (NSP4 and NSP5/6) occupied nearly indistinguishable treespaces, suggesting strong co-segregation. Host-species barriers appeared to vary by segment, with segment 9 (VP7) presenting the least conservation by host species. Bayesian skyride plots were generated for each segment to compare relative genetic diversity among segments over time. All segments showed a dramatic decrease in diversity around 2007 coinciding with the introduction of RVA vaccines. To assess selection pressures, codon adaptation indices and relative codon deoptimization indices were calculated with respect to common host genomes. Codon usage varied by segment with segment 11 (NSP5) exhibiting significantly higher adaptation to host genomes. Furthermore, RVA codon usage patterns appeared optimized for expression in humans and birds relative to the other hosts examined, suggesting that translational efficiency is not a barrier in RVA zoonosis.
1. Introduction
The high mutation rates and large population sizes of RNA viruses allow them to rapidly explore adaptive landscapes, expand host-ranges, and adapt to new environments. Segmented RNA viruses also may undergo ‘reassortment’ whereby viruses swap entire genome segments during coinfection (1). Reassortment may allow rapid evolution of specific viral traits, such as the acquisition of novel spike glycoproteins during the emergence of H1N1 influenza A in 2009 (2). Similarly, reassortment among segmented dsRNA rotaviruses may have significant implications for human health (3), but it is challenging to determine the prevalence of rotavirus reassortment in nature. Our motivation here is to elucidate apparent restrictions (or lack thereof) to RVA genetic exchange in nature by comparing the relative linkage between each of the RVA segments as shown by phylogeny. In addition, we parse the evolutionary constraints that may contribute to the distinct phylogenies of each segment.
The rotavirus genome consists of 11 segments of double-stranded RNA, each possessing a single open reading frame, except for segment 11, which contains two genes. Rotaviruses are classified based on the antigenicity of the VP6 protein into groups A through I (4). The consensus is that viruses from different groups cannot reassort with one another (5), though some rare cross-group reassortment events appear to have occurred (6). While mammalian RVA strains routinely reassort with other mammalian RVA strains, reassortment between avian and mammalian RVA strains does not seem to occur outside of laboratories (7). Nevertheless, there have been cases of avian strains infecting mammals (8) and causing encephalitis (9), evidence of an avian RVA isolate with a mammalian VP4 gene (10), and some in vitro reassortment assays (7, 11), most recently confirming that mono-reassortants with avian segments 3 and 4 can be recovered using the SA11 reverse genetics system (12). Reassortment between RVA strains from different mammalian species is also less common; however, it is not clear whether this lack of reassortment is due to biological incompatibilities as opposed to genetic incompatibilities. Genotypes resulting from interhost-species reassortment, while rare, have not only occurred, but have fixed in populations (13-21), indicating that reassortment between even distantly related strains is potentially a significant driver of rotavirus evolution (22-25).
1.1 Rotavirus A Genome and Proteins
While the RVA genome is double-stranded, RVA RNA is packaged into and exits capsids as positive-sense, single-stranded RNA (+ssRNA). The RNA-dependent-RNA-polymerase, VP1, and the capping enzyme, VP3, are anchored to VP2 pentamers to form the replicase complex. This complex is present at each of the rotavirus capsid vertices through which +ssRNAs are released (26). The inner VP2 layer is enclosed by an intermediate layer composed of VP6 protein and an outermost layer composed of the glycoprotein (VP7) and the multimeric spike protein (VP4) (27). For reassortment to occur, +ssRNAs from two or more parents must be packaged into the same virion.
In addition to the six structural proteins, the RVA genome also encodes five to six non-structural proteins on five separate segments. The NTPase (NSP2) is a +ssRNA binding protein that forms a doughnut-shaped octamer with a positively charged periphery, allowing +ssRNA to wrap around and bind within its grooves during genome packaging (28, 29). NSP2 protein interactions with the +ssRNA are therefore critical to stabilizing +ssRNA contacts (30). NSP2 also interacts with NSP5 to form viroplasms (along with VP1, VP2, VP3, VP6 and NSP4) (31) where genome packaging, replication, and assembly of progeny cores and double-layered particles (DLPs) occur (32-34). To effect transmission, RVA encodes an enterotoxin, NSP4, which elevates cytosolic Ca2+ in cells. This Ca2+ elevation causes diarrhea in hosts (35-39). NSP4 also interacts with VP6 on DLPs during RVA production (40-43).
The non-structural protein encoded on segment 7, NSP3, hijacks cellular translation and is a functional analog of the cellular poly(A)-binding protein (44-46). NSP3 binds the group-specific consensus tetranucleotide UGACC (44, 45, 47) located at the 3’ end of RVA ssRNA, which suggests inter-group reassortment does not occur. The protein NSP1 disrupts cellular antiviral responses (48) and may also play a role in host-range (49). Both segment 5 (NSP1) and segment 11 (NSP5/6) are especially prone to genome insertions (50-52).
When reassortment occurs, the resulting reassortant virus must maintain the many protein and RNA interactions required for efficient packaging and replication. Even if the reassortant virus is functional, it may still be outcompeted by other genotypes, go extinct due to transmission bottlenecks, or evolve compensatory mutations. These factors, along with the high genetic diversity of RVA strains, make predicting reassortment facility or barriers to reassortment difficult.
1.2 Selective Pressures on Synonymous Sites
RVA is the most common cause of diarrheal disease in young children and is also an important agricultural pathogen, particularly for cows and pigs. RVA vaccines RotaTeq and Rotarix have produced substantial selective pressure on circulating RVA strains since they were first administered on a large scale in 2006 and 2008 respectively. This selective pressure seems to have favored certain genome constellations. Strains relevant to humans mostly consist of genogroup 1 (Wa-like) genes, genogroup 2 (DS-1-like) genes, or genogroup 3 (AU-1-like) genes. Specific G and P types are also associated more with specific genogroups (53, 54).
Selective pressure on amino acid sequences to conserve protein interactions is a barrier to reassortment compatibility, but segmented viruses are also under considerable selective pressure on synonymous sites as RNA-protein and RNA-RNA interactions are critical in virus packaging (30, 39-42). RVA genome assortment requires +ssRNA molecules from each segment to complex with one another before being packaged, which relies on packaging signals on each segment.
Synonymous sites may also be under selective pressure for certain codon usage patterns. Codon usage bias results in varying levels of efficiency in translation, with highly expressed genes showing stronger bias for codons abundantly available in the host tRNA pool (55, 56), and marginally expressed genes displaying less codon bias (57). Codon usage adaptation to the host is a well-documented phenomenon in DNA viruses (56, 58, 59) and is observed in RNA viruses as well (60-66), though the constraints of secondary RNA structure, as well as the high rate of mutation in RNA viruses, may lower the relative effects of translational selection in RNA viruses. Codon bias is also explained by drift and mutational pressure (i.e., bias towards A/T/U or G/C mutations) as well as translational selection (67). Codon usage that is too similar to the host can also lower efficiency if viral proteins cannot fold properly (68), and differential codon adaptation can be a mechanism of controlling viral gene expression (69, 70).
Rotaviruses have been shown to exhibit codon bias (71, 72), but have especially divergent codon usage patterns from humans relative to other RNA viruses (73). Codon usage can be a potential hindrance to zoonosis if a virus infects a host that cannot efficiently translate the virus’ proteins. RVA’s broad host range and ability to undergo genetic exchange make RVA’s zoonotic potential a cause for some concern.
We show that RVA’s segments have distinctive evolutionary histories, demonstrating the impact reassortment has had on mammalian RVA between the late 1950s and 2017. Because each RVA segment is under different selective pressure, we also tested whether there was evidence for translational selection on synonymous sites for each of the segments. To assess whether certain segments showed more codon adaptation against different common host genomes, indicating translational efficiency differences, we tested for neutrality in codon position 3, as well as variations in codon adaptation indices and relative codon deoptimization indices for each segment. To test whether RVA showed signs of codon optimization to a particular host, we compared strains isolated from specific hosts against their host genome and other RVA host genomes. We found differences in codon usage patterns between segments, with segment 11 (NSP5) having significantly higher codon adaptation to host genomes; however, our study indicates codon usage is not a barrier to rotavirus zoonosis.
2. Materials and Methods
2.1 Sequence Alignment and Phylogenetic Analysis
From all complete RVA genomes in NCBI’s Virus Variation Resource, 789 complete mammalian rotavirus A genomes were chosen by excluding laboratory strains, avian strains, and, to avoid sampling bias, any genomes where three or more genomes shared the same sequence identity for NSP4. To ensure that the analysis of each RVA segment employed the same set of strains, we extracted the selected strains from files containing all available genomes for each segment using Python v3.9.2. The pooled sequences of each of the 11 RVA segments were independently aligned using MUSCLE v3.8.31 (74), and visually inspected for obvious sequencing errors or low-quality sequences. Sequences that were unusually long, short, or contained ‘N’ nucleotides were all removed.
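The strain-selection step can be illustrated with a minimal Python sketch. The original scripts are not described, so everything here is an assumption made for illustration: the FASTA header layout (`strain|segment`), the function names `read_fasta` and `select_strains`, and the choice to drop ‘N’-containing sequences in the same pass rather than after alignment.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def select_strains(handle, wanted, strain_of=lambda h: h.split("|")[0]):
    """Keep records whose strain name is in `wanted` and whose sequence has no 'N'.

    `strain_of` maps a header to a strain name; the default assumes a
    hypothetical 'strain|segment' header layout.
    """
    kept = {}
    for header, seq in read_fasta(handle):
        strain = strain_of(header)
        if strain in wanted and "N" not in seq.upper():
            kept[strain] = seq
    return kept

# Toy input standing in for one per-segment file of all available genomes.
segment_file = io.StringIO(
    ">strainA|seg10\nATGGAAAAG\nCTTACC\n"
    ">strainB|seg10\nATGNNNAAG\n"
    ">strainC|seg10\nATGGATAAG\n"
)
selected = select_strains(segment_file, wanted={"strainA", "strainB"})
print(selected)
```

To keep the strain set identical across all 11 segments, a strain dropped from one segment file in this way would also need to be dropped from the other ten.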
We performed phylogenetic analyses using BEAST v1.10.4 (75). We used tip dating to calibrate molecular clocks and generate time-scaled phylogenies. The analyses were run under an uncorrelated relaxed clock model using a time-aware Gaussian Markov random field Bayesian skyride tree prior (76). For VP1, VP2, VP3, VP6 and VP7, the alignments were run using a GTR + Γ + I substitution model and partitioned by codon position. NSP1-NSP4 were further partitioned into non-translated and translated regions. VP4 was further partitioned by the VP5* and VP8* protein domains. Due to the large insertions in segment 11 (NSP5/6), this segment required three additional partitions based on insertion locations. Each alignment was run as three independent chains with a Markov chain Monte Carlo (MCMC) chain length of 500,000,000. Log files were analyzed in Tracer v1.7.1 (77) to confirm sufficient effective sample size (ESS) values, and the chains were combined using LogCombiner v1.10.4 (78). Trees were annotated in TreeAnnotator v1.10.4 (77) using a 10% burn-in. The best trees were visualized using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/), with the nodes labeled with posterior probabilities and node bars representing 95% confidence intervals for the divergence dates (SI Figures 1-11). Bayesian skyride plots were made in Tracer v1.7.1 to compare each segment’s changes in effective population size (used as a proxy for relative diversity) since the root of the tree (∼50 years prior to 2017 for most segments).
The clusters of points represent the geodesic distances between different trees. 350 randomly sampled post-burn-in trees were taken from each segment to account for phylogenetic uncertainty. Points sharing the same color are from the same segment as shown in the legend. Points closer to each other indicate close geodesic distances and high levels of evolutionary linkage.
We used the R package ‘distory’ to calculate the geodesic distances between segments. Geodesic distance uses topology and branch length to visualize the tree space of the 11 segments to determine which segments share a close evolutionary history. To account for phylogenetic uncertainty, 350 post-burn-in, randomly chosen trees were sampled from the BEAST v1.10.4 tree file for each segment. We applied multi-dimensional scaling using the R package, ‘treespace’ to determine whether the ‘time to the most recent common ancestor’ (TMRCA) was consistent between segments. The correlation coefficient of TMRCA estimates from all pairwise comparisons of the 11 segment trees was used to estimate tree-distance and then the matrix of tree distance was plotted. Variation in branch length between different segment trees was visualized as a cloud of points where the center represents the mean of several hundred trees. Segments that co-segregate overlapped in three-dimensional space, while segments that did not co-segregate were isolated in space.
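The step of turning TMRCA correlations into a tree-distance matrix can be sketched as follows. This is an illustrative Python sketch and not the R workflow used in the analysis. It assumes that each segment tree is summarized by a vector of TMRCA values for the same ordered list of strain pairs, and that distance is taken as 1 minus the Pearson correlation; both choices, along with the names `pearson` and `correlation_distances`, are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_distances(tmrca):
    """Map each pair of segments to 1 - r (assumed distance transform)."""
    names = sorted(tmrca)
    dist = {(a, a): 0.0 for a in names}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(tmrca[a], tmrca[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist

# Toy TMRCA vectors (years before the most recent tip) for three strain pairs.
toy = {
    "seg10": [10.0, 20.0, 30.0],
    "seg11": [11.0, 21.0, 31.0],  # same ordering of ages as seg10
    "seg4": [30.0, 20.0, 10.0],   # reversed ordering of ages
}
distances = correlation_distances(toy)
print(distances[("seg10", "seg11")], distances[("seg10", "seg4")])
```

Under this transform, segments whose pairwise TMRCAs rise and fall together sit at a distance near 0, while segments with discordant histories sit farther apart, which is the pattern the scaling plot is meant to expose.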
2.2 Codon Bias Analysis Comparison
The relative codon deoptimization index (RCDI) was used to assess whether the codon usage of a gene was similar to the codon usage of a reference genome (2,953 coding sequences for Sus scrofa (pig), 93,487 coding sequences for Homo sapiens (human), 6,017 coding sequences for Gallus gallus (chicken), 13,374 coding sequences for Bos taurus (cow), 1,194 coding sequences for Canis familiaris (dog), and 1,115 coding sequences for Oryctolagus cuniculus (rabbit)) (79). An RCDI value of 1.0 indicates maximum codon usage compatibility with a reference genome, and values above 1.0 indicate increasing deviation from the reference codon usage. Similarly, the codon adaptation index (CAI) was used as a measure of codon usage adaptation to the most used synonymous codons of a reference genome, and was used to predict the expression levels of genes (79). CAI values range between 0.0 and 1.0, where higher CAI values for a particular reference set indicate higher predicted expression levels. To determine if there were significant differences in CAI/RCDI values between the different segments, we calculated CAI and RCDI values using the CAIcal server (http://genomes.urv.es/CAIcal/) for each segment using a subset of RVA isolates containing multiple representatives of common mammalian RVA genotypes (80). Statistical analyses were performed to assess whether RVA was genetically compatible in its codon usage patterns with a set of RVA host reference genomes. To reduce bias that would result from analyzing a small number of genes, the RVA host reference genomes were chosen based on the availability of a large number of genes analyzed for their codon usage tables in the Codon Usage Database (http://www.kazusa.or.jp/codon/). Reference sets for chicken, human, pig, dog, rabbit, and cow genomes were used for the analysis. We performed Tukey’s Honest Significant Difference (HSD) test to compare mean CAI and RCDI values between segments.
We also used Tukey’s HSD tests to compare mean CAI and RCDI values between different host genomes after combining all segments’ values.
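The two indices can be illustrated with a small Python sketch. It is not the CAIcal implementation: it follows the commonly used definitions (CAI as the geometric mean of each codon's frequency relative to the most used synonymous codon in the reference; RCDI as the codon-count-weighted mean ratio of gene to reference synonymous codon fractions), and the function names, the pseudocount of 0.5 for unseen reference codons, and the toy reference are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import exp, log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_counts(seq):
    """Count in-frame codons of a coding sequence (RNA or DNA alphabet)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    return Counter(c for c in codons if c in CODON_TO_AA)

def synonymous_fractions(counts, pseudocount=0.0):
    """Fraction of each codon among all codons encoding the same amino acid."""
    totals = Counter()
    for codon, aa in CODON_TO_AA.items():
        totals[aa] += counts.get(codon, 0) + pseudocount
    return {
        codon: (counts.get(codon, 0) + pseudocount) / totals[aa] if totals[aa] else 0.0
        for codon, aa in CODON_TO_AA.items()
    }

def cai(gene, ref_counts, pseudocount=0.5):
    """Geometric mean of relative adaptiveness; Met, Trp and stops are skipped."""
    ref = synonymous_fractions(ref_counts, pseudocount)
    best = {}
    for codon, aa in CODON_TO_AA.items():
        best[aa] = max(best.get(aa, 0.0), ref[codon])
    log_sum, n_codons = 0.0, 0
    for codon, n in codon_counts(gene).items():
        aa = CODON_TO_AA[codon]
        if aa in "MW*":
            continue
        log_sum += n * log(ref[codon] / best[aa])
        n_codons += n
    return exp(log_sum / n_codons)

def rcdi(gene, ref_counts, pseudocount=0.5):
    """Count-weighted mean of gene/reference synonymous codon fractions."""
    ref = synonymous_fractions(ref_counts, pseudocount)
    counts = Counter(
        {c: n for c, n in codon_counts(gene).items() if CODON_TO_AA[c] != "*"}
    )
    own = synonymous_fractions(counts)
    total = sum(counts.values())
    return sum(own[c] / ref[c] * n for c, n in counts.items()) / total

# Toy reference: Leu prefers CTG over CTA, Ala prefers GCC over GCT.
reference = codon_counts("CTGCTGCTGCTAGCCGCCGCT")
print(cai("CTGGCC", reference), cai("CTAGCT", reference))
print(rcdi("CTGCTGCTGCTAGCCGCCGCT", reference, pseudocount=0.0))
```

In this formulation a gene built only from the reference's preferred codons reaches a CAI of 1, and a gene whose synonymous codon fractions match the reference exactly has an RCDI of 1, with larger RCDI values reflecting greater divergence from the reference.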
To test for neutrality at the third codon position, a neutrality plot was made by comparing the GC contents at the first, second and third codon positions. GC12, the average of GC1 and GC2, was plotted against GC3. If GC12 and GC3 are significantly correlated with one another, and the slope of the regression line is close to 1, mutational bias is assumed to be primarily responsible for shaping codon usage patterns rather than translational selection. Selection against mutation bias can lead to larger differences in GC content between positions 1 and 2 and position 3, and to little or no correlation between GC12 and GC3 (81).
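The neutrality-plot calculation can be sketched in Python as below. This is a minimal illustration and not the authors' code: the function names `gc_by_position` and `regress` are hypothetical, the regression is ordinary least squares of GC12 on GC3, and the toy ORFs are invented solely to exercise the functions.

```python
from math import sqrt

def gc_by_position(orf):
    """Return (GC12, GC3) for an in-frame ORF given as a nucleotide string."""
    orf = orf.upper().replace("U", "T")
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = orf[pos:usable:3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    gc1, gc2, gc3 = fractions
    return (gc1 + gc2) / 2, gc3

def regress(x, y):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Toy ORFs standing in for the sampled sequences of one segment.
orfs = ["GCAGCTGCC", "ATGGCCGCG", "ATAGCAGCC", "GGCGCGGCA"]
points = [gc_by_position(o) for o in orfs]
gc12 = [p[0] for p in points]
gc3 = [p[1] for p in points]
slope, r = regress(gc3, gc12)  # GC12 on the y-axis, GC3 on the x-axis
print(slope, r)
```

A slope near 1 with a strong correlation would point to mutational bias acting alike on all three codon positions, whereas a slope near 0 means GC3 varies largely independently of GC12.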
RVA strains isolated from specific hosts (pig, human, cow and avian) were also compared to the pig, human, cow and chicken genomes to test whether RVA genomes showed evidence of adapting to specific hosts. If RVA genomes did not appear to differ in CAI and RCDI patterns based on host type, then it would suggest mutational selection was the dominating force over translational selection, and that selection at the translation level was weak or undetectable.
3. Results
3.1 Different Tree Space Occupied by the 11 Segments
Multi-dimensional scaling of the random, post-burn-in sampling of BEAST trees for each of the 11 segments revealed that segments 4 and 9 occupy distinct tree spaces from each other and from the rest of the genome (Fig. 1). Segments 10 and 11 occupied indistinguishable treespace, indicating close geodesic distances and high levels of evolutionary linkage between them. Segments 3 (VP3), 5 (NSP1) and 6 (VP6) also shared highly overlapping treespace with one another. Segment 2’s (VP2) evolutionary history was most like that of segment 6 (VP6). None of the segments overlapped with segment 1 (VP1) except for segment 7 (NSP3). Segment 7 also had the most phylogenetic uncertainty of all 11 segments, as shown by the larger spread of its post-burn-in trees in Fig. 1. The best-supported trees for segments 1-3 (Fig. 2), segments 4-6 and 9 (Fig. 3), and segments 7, 8, 10, and 11 (Fig. 4) are shown with the host species coded on the tree and the branches color coded by relative evolutionary rate. While segment 4 had a more independent evolutionary history from other segments, its tree suggested this segment has stricter host boundaries than segment 9 (Fig. 3), indicating less opportunity for divergence due to selection responses to host species change.
The phylogenies are time-scaled using tip-dating. Scale bars below each tree represent branch length time in years. The branches are colored by rate. Cyan indicates the fastest evolutionary rate among all lineages, and black represents the slowest rate of evolution. Colored asterisks specify the host species the strain was isolated from as shown in the legend. Posterior probabilities, node bars for confidence intervals of the divergence dates and tip labels for the strain names can be viewed in SI Figures 1-3.
Phylogenies are time-scaled using tip-dating. Scale bars below each tree represent branch length time in years. The branches are colored by rate. Cyan indicates the fastest evolutionary rate among all lineages, and black represents the slowest rate of evolution. Colored asterisks specify the host species the strain was isolated from as shown in Figure 1. Posterior probabilities, node bars for confidence intervals of the divergence dates and tip labels for the strain names can be viewed in SI Figures 4-7.
Phylogenies are time-scaled using tip-dating. Scale bars below each tree represent branch length time in years. The branches are colored by rate. Cyan indicates the fastest evolutionary rate among all lineages, and black represents the slowest rate of evolution. Colored asterisks specify the host species the strain was isolated from as shown in Figure 1. Posterior probabilities, node bars for confidence intervals of the divergence dates and tip labels for the strain names can be viewed in SI Figures 8-11.
3.2 Evolutionary rates
Segment 8 (NSP2) displayed the lowest mean substitution rate (1.48 x 10−3 substitutions per site per year), while segment 4 (VP4) had the highest (3.77 x 10−3 substitutions per site per year). The mean rate for segment 11 was likely skewed higher due to the frequency of large insertions into the segment. Segments 1-3 showed similar evolutionary rate changes at corresponding clades and time periods in their trees, with the higher evolutionary rates occurring earlier in their evolutionary histories (Fig. 2) (for node confidence intervals at divergence dates see SI Figures 1-11). Higher evolutionary rates tended to be observed along branches leading towards non-human host isolates, particularly for VP7, which had more rate variation across the tree than the other segments (Fig. 3). For example, the Ailuropoda melanoleuca (giant panda) strain (represented by the green asterisk in Figures 2-4) possesses a genomic backbone that is clearly within a cluster of pig and cow strains, except for segment 9. The giant panda’s RVA segment 9 occupies a more divergent branch associated with a higher evolutionary rate than the rest of its segments.
Coefficients of variation (CoV) for each segment (Table 1) were consistently high, supporting the assumption that a strict molecular clock is inappropriate for this analysis, and that a relaxed clock is a better choice. All segments exhibited relatively similar TMRCA estimates with segments 1-3 possessing slightly older TMRCA dates than the other segments (Table 1). Segment 1 had the oldest TMRCA estimate (∼1957) of all the segments, while segment 11 had the most recent TMRCA estimate (∼1969). Node bars for 95% confidence intervals of the node divergence dates are shown in SI Figures 1-11 for each segment. Since this data set of 789 genomes includes strains collected from geographic areas across the world, it appears that most of the mammalian RVA genetic diversity present today has evolved in the past ∼60 years.
Mean rates, 95% highest posterior density (HPD), coefficient of variation, and date for the ‘time to most recent common ancestor’ (TMRCA) for each segment’s phylogenetic tree.
The Bayesian skyride plots indicate that RVA segments reached their peak diversity levels around the year 2000, with a steep decline around the year 2007, which coincides with the introduction and broad-scale use of RVA vaccination (Fig. 5).
The X-axis represents years before 2017, with 0 indicating ∼2017. The Y-axis represents effective population size and is a proxy for genetic diversity. All segments show a sharp decrease in diversity roughly 10 years before 2017 (2006-2008).
3.3 Differing codon usage patterns by segment
Compared to the other ORFs, NSP5 possessed significantly higher CAI scores and significantly lower RCDI scores across all host genomes (see SI Figure 12 for Tukey-Kramer 95% pair-wise confidence intervals). The wide range of values for each RVA segment suggests that, while there may be differences in selective pressure on codon usage by segment, translational selection was relatively weak compared to mutational selection for the rest of the genome.
Codon usage patterns for both mammalian and avian RVA appear more compatible with avian genomes than with mammalian genomes (Figs. 6, 7); however, human and avian genomes showed similar compatibility with RVA genomes, and their CAI scores were not significantly different. RVA had higher CAI values and RCDI scores closer to 1 for both human and avian genomes relative to rabbit, cow, pig, and dog genomes. RVA compared against the pig genome resulted in the lowest CAI scores for all segments and strains. In other words, based on CAI and RCDI metrics, RVA is predicted to be least translationally efficient within pigs.
A. Sampled sequences representing common genotypes for each of the 11 segments were measured for CAI with respect to the six hosts shown. The Y-axis value ranges are different for each host as some hosts result in higher CAI values. Higher CAI values indicate more efficient expression. B. RCDI values for the same sampled sequences from A plotted for each segment with respect to each host. RCDI values closest to 1 indicate more similar codon usage patterns. The Y-axis value ranges are different for each host. Tukey-Kramer 95% pair-wise confidence intervals are shown in SI figure 12.
Values used for this test are the combined values for all segments used in Figure 6.
A neutrality plot revealed that NSP4 had the highest GC content in position 3 (GC3) relative to the other segments. VP6 had the highest GC content in positions 1 and 2 (GC12) relative to the other segments, while NSP1 and VP3 had the lowest GC12 content. While VP6’s GC12 content was higher than the rest of the RVA genome, VP6’s GC3 content was not. The slopes for all regression lines deviated from 1 (Fig. 8), ranging from 0.043 (VP7) to 0.316 (NSP1), indicating that there was significant selective pressure on position 3 for codon usage for all segments. The slopes indicate that NSP1 is under more mutational pressure than the other segments, while VP7 is under more selective pressure at position 3 compared with the other segments; however, the lower GC3 values relative to GC12 in VP7 indicate lower adaptation to the host genome.
GC percentages of codon position 3 plotted against GC percentages of codon positions 1 and 2 (GC12) for the ORF for 11 RVA genes. Slopes significantly deviating from 1 show evidence of natural selection, while slopes near 1 suggest neutrality wherein mutational selection is the driving force of codon usage patterns.
VP1 and VP3, which also had lower GC3 values relative to GC12, showed low slopes (0.06 and 0.05, respectively), suggesting that they may also be under less mutation pressure and more translational selection; however, as with VP7, the relatively low GC3 content in an already AU-rich genome suggests that translational efficiency would be lower. The overall correlation coefficient for GC12 and GC3 was 0.261 (p < 0.001). While some segments/ORFs (VP1, VP2, VP4, NSP1, NSP2, NSP3 and NSP5) showed a more significant positive correlation between GC12 and GC3, segments 9 and 10 displayed no significant correlation.
3.4 Synonymous sites are under selection, but evidence does not suggest translational selection for specific hosts
When comparing strains isolated from different species, there was no evidence to support the conclusion that RVAs adapt their codon usage to specific hosts (Fig. 9). While the bovine strains and avian strains had RCDI values closer to 1.0 than the pig and human RVA strains, there were no significant differences between RCDI values of avian isolates compared to avian genomes vs. RCDI values of bovine isolates compared to avian genomes. Based on RCDI values, bovine strains were “most compatible” with avian genomes, and human strains were “less compatible” to the human genome than to bovine strains. Given that avian and mammalian RVAs do not exchange genetic information in nature, there is substantial divergence between avian RVA isolates and mammalian RVA isolates. The fact that RVA strains show more similar codon patterns between bovine and avian strains than between human and bovine strains suggests that translational selection by host does not play a large role in codon bias.
RCDI and CAI values were derived by sampling 7 complete genomes (77 segment genomes) for each host, isolated from either avian (red), cow (green), human (blue), or pig (purple) hosts. These RVA genomes were then separately compared to cow, chicken, human or pig codon usage charts.
Rotavirus genes overall had higher CAI scores and lower RCDI scores when contrasted with the avian genome codon usage patterns. The lowest CAI and highest RCDI scores were observed when RVA genes were compared with rabbit genomes. However, there was little variation dependent on the host the viral strain was isolated from (i.e., avian RVA strains did not have significantly different scores when compared to bovine RVA strains using the same reference codon usage patterns). There was no evidence for detectable translational selection by host; however, the observation that RVA was generally less optimized for the non-human mammalian genomes suggests that RVA may have higher protein expression levels in humans.
4. Discussion
While segment exchange between different RVA genotypes is common, reassortment is not a random process (82-86). However, the limits of segment exchange, whether ecological or mechanical, are poorly understood. Some segment combinations work well together, whereas others are incompatible (87). Numerous factors could potentially affect whether segments from different RVA genotypes are able to reassort, including protein interactions, RNA-RNA and RNA-protein interactions, and the need to maintain host range and ensure RNA packaging efficiency.
The goal of this study was to better predict potential (or unlikely) genome constellations that may emerge in nature and enhance our understanding of why some segments co-segregate and some do not. To this end, we compared the evolutionary histories as well as some of the selective pressures acting on the 11 RVA segments, first by performing phylogenetic analyses on a large collection of complete RVA genomes, and then by assessing the selective pressures acting on synonymous sites by segment and host type. We estimated the diversity levels of each RVA segment over the last ∼60 years and linked decreases in diversity to the introduction of RVA vaccines between 2006 and 2008.
The segments that share highly similar tree space may share important interactions that rely on a higher percentage of the sequence (e.g., selective pressure at synonymous sites), while the segments that inhabit distinct tree space are likely more flexible on a variety of genetic backgrounds. RVA’s RNA segments are under different evolutionary pressures, which is clear from the distinct evolutionary histories of the segments (Fig. 1), and the significant difference in nucleotide composition between the segments, despite coming from the same genome (Figs. 6 and 8). The zoonotic potential for rotaviruses makes understanding restrictions on genetic exchange important, as outcrossing events can result in novel strains which may cause more severe disease or be better able to evade a vaccine or immune response. Both codon usage and reassortment potential can be important factors in viral host-range, and both are constrained by RNA interactions, translational selection, and mutational bias. Understanding the relative constraints of the RVA genome can help better assess the risk of zoonotic outbreaks and emerging strains.
Potential impact of rotavirus RNA and protein interactions on segment co-segregation
RVA protein and RNA molecules interact with each other in a variety of ways during infection and assembly processes. Incompatibilities among different genotypes resulting from these interactions may limit the diversity of genome constellations observed in nature. For example, RNA secondary structures and segment-specific sequences found in the non-translated terminal regions (NTRs) may govern the formation of the supramolecular RNA complex associated with segment packaging (30, 86, 88-91) Sequence mismatches between segments from coinfecting RVAs may prevent segment interactions and co-packaging, and hence, the generation of reassortant RVAs. One of the studies (88) also suggested that VP4 has less conserved terminal RNA structure, so the importance of these RNA interactions may vary significantly by segment.
The order in which the segments associate to form the supramolecular RNA complex may be sequential. In this scenario, the smallest segments interact first, then they recruit intermediate sized segments before finally incorporating the largest segments (89). In addition, incompatibilities in the 3’-NTRs of the smaller segments may have stronger effects on segment co-segregation than do the larger segments. For instance, the smallest RVA segments, segments 10 and 11 must directly interact before they can interact with larger segments. Evidence for this supposition that the smaller segments’ RNA structure is under strong selective pressure along with its protein structure (i.e., synonymous mutations are often not neutral) also comes from our observation that segments 10 and 11 co-segregated with one another more strongly than other pairs of segments (Fig. 1), despite their many protein interactions with other segments.
There is evidence for frequent inter-host-species reassortment of segment 10, which encodes the enterotoxin NSP4, in nature (14, 92), so reassortment of segment 10 may depend more on the sequence conservation of terminal +ssRNA between segments 10 and 11 than on protein interactions. In addition to NSP5, segment 11 also encodes a second, out-of-frame protein, NSP6, via leaky scanning. NSP6 is not required for virion function and is sometimes not expressed in rotaviruses (93); however, it is constrained because it overlaps with NSP5. Segment 11’s RNA structure is thought to have some functional importance (88, 94); however, segment 11 appears more tolerant of genome insertions than other segments (50, 51), likely due to packaging signal duplication (51). The close evolutionary histories of segments 10 and 11 may also relate to NSP4-NSP5 interactions during viroplasm formation (33).
The treespaces of both segments 4 and 9 (Fig. 1) were notably distinct from the treespaces of the rest of the genome. RVA segment 4 encodes the spike protein, VP4, which is cleaved by trypsin into VP8* and VP5*. VP4 interacts with different receptors depending on the strain, including sialoglycans and histo-blood glycans (95-97). These different receptors partly explain why certain P types tend to dominate in different populations, species, and age groups (97, 98). Our results showed that the gene tree of segment 4 was distinct from the rest of the genome, suggesting that segment 4 may reassort more readily than other segments. However, segment 4 also appeared to have stricter host barriers than segment 9, which encodes the other serotype protein, VP7 (Fig. 3). Other environmental studies have also found that segment 4 and segment 9 are more likely to appear in different genetic backgrounds (99-101). Given the high genetic diversity of segments 4 and 9, and the seemingly less conserved role of segment 4 in RVA +ssRNA assortment, reassortment into new genetic backgrounds may confer a selective advantage and a broader host range: strains gain an opportunity to evolve in a novel host that they might otherwise be unable to infect simply because of a barrier imposed by another segment (e.g., poor receptor binding). The fact that segment 4 had the greatest geodesic distance from the rest of the genome, and also possesses the most diversity in RNA secondary structures, suggests that segment 4’s RNA secondary structures are less critical to the segment’s function. Segment 9, on the other hand, is critical for the formation/stabilization of the supramolecular RNA complex and for packaging the genome. Both segments 4 and 9 have been shown to tolerate homologous recombination among highly divergent genotypes, including recombination events resulting in the disruption of many amino acids, whilst still maintaining overall tertiary structure (102).
Due to the importance of VP1, VP2, and VP3 during the formation of the virion, synthesis of dsRNA, and complexing with the 11 +ssRNA segments, one might expect these segments to be the least likely to reassort independently. However, the critical VP1, VP2, and VP3 interactions are mostly protein-protein interactions, so even a genetically distant strain could maintain a conserved amino acid sequence, allowing these segments some flexibility with their genetic background. Our results showed that, while these three segments are generally associated with the larger “gene tree” of the rest of the genome, they have a more independent evolutionary history than, for example, segments 10 and 11, which almost entirely overlap in tree space.
Interestingly, segment 7 (encoding NSP3) shared the closest evolutionary history with segment 1 (Fig. 1). We expected segment 1 to have the closest evolutionary history with segments 2 or 3 given their proteins’ interactions; however, segment 1’s high degree of structural conservation (5) and its less functionally important RNA structure relative to the other segments may explain its tolerance for novel genetic backgrounds. Segment 7/NSP3 may have fewer strain-specific interactions with other segments, resulting in less fitness variation following a reassortment event. Segment 7 may have undergone a significant reassortment event around 1970 (Fig. 4 and SI Fig. 7), which may also explain its geodesic distance from the other segments. NSP3’s primary function is to recognize a conserved group-specific sequence, * UGACC, present on all group A RV segments and to interact with host eIF4G, which is highly conserved among orthologs (103). This suggests NSP3 genes should be flexible to many RVA genetic backgrounds and hosts.
Segment 5 (NSP1) was only particularly distinct in the treespace (Fig. 1) from segments 1, 4, and 9. This was somewhat surprising, as NSP1 has relatively low conservation, can tolerate insertions and deletions, and is not required for rotavirus replication in vitro, although the RNA is still required for packaging. NSP1 does, however, serve important roles in targeting the host’s antiviral response. It acts as an interferon antagonist (104), inhibits apoptosis (105, 106) and activation of NF-κB (107), and may play a role in host range (49). A stricter host-range boundary could help explain NSP1’s more similar evolutionary history with the rest of the genome, as NSP1 reassortment with different host strains may confer a deleterious effect in vivo, despite the reassortant’s ability to compete in vitro (i.e., in the absence of a significant immune response).
Rotavirus evolution following the introduction of rotavirus vaccines
The live-attenuated pentavalent vaccine, RotaTeq (Merck), was introduced in 2006, and the live-attenuated monovalent vaccine, Rotarix (GlaxoSmithKline), was disseminated in 2008. RotaTeq contains five human-bovine reassortant viruses with strain serotypes of G1P[5], G2P[5], G3P[5], G4P[5], and G6P[8], while Rotarix comprises the human G1P[8] strain, RIX4414. These vaccines provided effective protection against the contemporaneous globally dominant strains G1P[8], G2P[4], G3P[8], G4P[8], and G9P[8]. Our results show a coincident decline in the relative diversities and effective population sizes of all RVA segments after 2006. Supporting this analysis, several studies, including an analysis of G12 strains in Africa, G12 strains in Spain, and a lineage of P[8] strains, also show a general decline in diversity/effective population size after 2008 (108-110).
The introduction of the RVA vaccines resulted in a global reduction in RVA-associated mortality. However, vaccine effectiveness varied substantially by region (111), and vaccination resulted in changes to the prevalence of circulating strains. For example, in the post-vaccine era, the prevalence of G9P[8], G2P[4], and G9P[4] increased, while the prevalence of G1P[8] and G3P[8] declined (112-115). The emergence of rare genotypes or animal RVA reassortants in children also appears to be connected to the selective pressure imposed by vaccination. For example, the increase in abundance of G12 and G11 genotypes (113, 114) and the appearance of atypical Wa-like and DS-1-like reassortants, such as the emergence of G1P[8] with a DS-1-like backbone in Malawi (116), appear linked to increases in RVA vaccination. Given that DS-1-like and Wa-like segments are thought to have incompatibilities with one another, limiting their reassortment potential (87), the vaccine-induced selection for mutations can be difficult to assess. That is, it is difficult to determine whether emerging fixed mutations are the direct result of escape mutants or are compensatory mutations resulting from novel reassortants. A study on G2P[8] evolution did not observe evidence of vaccine-induced selection; however, another study focusing on P[6], P[4], and P[8] genes did report substantial divergence from the vaccine strains.
Vaccine-induced selective pressure may partly explain the pattern observed when comparing the geodesic distances of the segments. Like the present study, which found that segments 4 and 9 (encoding serotype proteins VP4 and VP7) were especially amenable to reassorting into new genetic backgrounds, a study of European Bluetongue virus (BTV) isolates found that segments 2 and 6 (encoding the BTV homologs of VP4 and VP7) were quite distant from the rest of the genome in treespace (117). However, BTV segments 2 and 6 were closer to one another in treespace, whereas RVA segments 4 and 9 were highly distinct from each other. Additionally, BTV segments 7 (encoding the inner capsid protein) and 10 (encoding NS3) shared close evolutionary histories with one another and were distinct from the rest of the genome. Interestingly, in another BTV study on strains from India, segment 4 (encoding a protein homologous to VP3 in RVA) was found to be the most isolated segment in treespace (118). While there are BTV vaccines available, BTV does not infect humans, is not as globally distributed as RVA, and has far fewer circulating serotypes, so there may be more selective pressure on RVA serotype segments to reassort.
While RVA segments were unlinked and had different phylogenies and rate variations along their trees, the segments’ patterns in relative diversity over time mostly matched each other. In Rift Valley fever virus (119), which is a three-segmented -ssRNA arbovirus, different skyline plots were observed for each segment, suggesting that the segments are evolving independently to some extent. In the Rift Valley fever virus study, the medium segment had a much larger effective population size than the small and large segments, indicating that the medium segment experienced reassortment events more frequently than the small and large segments, which tended to co-segregate. The differences in evolutionary patterns between RVA and Rift Valley fever virus may be a consequence of their differing epidemiologies. Frequent outbreaks of Rift Valley fever are limited to sub-Saharan Africa, and transmission relies heavily on mosquitoes rather than on human-to-human spread. Rotavirus, conversely, is a globally present pathogen with many dominant strains constantly circulating amongst humans, likely making it less sensitive to bottleneck effects.
A more thorough comparison of the geodesic distances between the spike and outer capsid proteins and the rest of the genome in other dsRNA viruses is warranted. It would be interesting to ascertain whether the patterns observed in RVA are also seen in other dsRNA viruses. While the G and P types are thought to maintain host barriers, this study suggests that this is only the case for VP4. By contrast, VP7 may be more flexible in host range and not evolutionarily linked to VP4. Rotavirus disease is typically discussed in terms of ‘GXP[Y]’ strains; however, segment 4 and segment 9 evolve independently both from each other and from the rest of the genome. The prevalence of common G and P combinations in association with certain backbones seems to have more to do with the ecological abundance of those strains and less to do with a functional constraint on the virus. As this study analyzed strains from prior to 2017, the global declines in RVA diversity in response to RVA vaccination may change.
Codon bias analysis shows codon usage differs between segments, but not between host strains
We contrasted the codon usage patterns of several common mammalian RVA genotypes with a set of RVA host reference genomes. Our results showed that mammalian and avian RVA codon usage patterns were most compatible with avian genomes (Fig. 6). For example, the highest CAI values and lowest RCDI values in our comparative analysis suggested that RVA NSP5’s codon usage pattern was best adapted to the chicken genome (Fig. 6). This finding suggests that RVA originated as an avian virus that subsequently expanded its host range to include mammalian hosts. The fact that RVA appears better adapted to human and avian genomes than to the other mammalian genomes tested further suggests that RVA spread to other mammalian hosts after adapting to humans, though more evidence is required to confirm this hypothesis (Fig. 7). Additionally, NSP5’s relatively high CAI value and low RCDI value (Fig. 6; both values approach 1) may indicate a strong selective advantage for NSP5’s codon optimization. Since efficient NSP5 protein expression is critical for RVA viroplasm formation and replication (120-122), the higher CAI value may indicate that NSP5’s codons are optimized to match avian hosts to maximize NSP5 expression in host cells. The variation in GC content between segments was also notable: because the segments come from the same genome, it suggests that mutational bias towards AU or GC is not entirely responsible for RVA’s observed codon bias. The low GC3 content and especially low GC3:GC12 ratio seen in VP3 (Fig. 8), in contrast with the high GC3 content of NSP5 and NSP4 (relative to the rest of the genome), could suggest that it is beneficial for the rotavirus to have lower-efficiency VP3 expression relative to NSP4 and NSP5 during infection. Varying the codon bias by gene, or maintaining suboptimal codon bias relative to the host, may sometimes be beneficial (73, 123), a phenomenon that has been observed, for example, in hepatitis A virus (60).
Although the bovine and avian RVA strains sampled tended to have higher CAI scores and RCDI scores closer to 1, there was no evidence of divergence in codon usage among bovine and avian strains. The similarity in codon usage between host strains contrasts with a study of influenza A, which found that avian and human influenza strains have distinctly different G/C vs. A/U contents from one another (124). Our finding suggests that, while RVA strains may indeed experience an advantage from matching the codon usage patterns of their hosts, it is unlikely that translational selection can counteract nucleic acid selection (i.e., selection favoring synonymous substitutions improving virus survival and reproduction). The wide range of CAI/RCDI values and GC3 content for some of the segments suggests that translational selection is not an especially strong selective pressure for every gene. However, translational selection does seem to be a stronger selective pressure at least in NSP5, based on its significant divergence from the rest of the RVA genome’s codon usage (Fig. 8). The neutrality plot also shows that GC percentages at position 3 are frequently significantly different from the GC percentages at positions 1 and 2, which would indicate that strong selection is occurring at position 3 (Fig. 8). The variation in GC content and codon usage by segment could point to RVA using codon bias as a mechanism for controlling viral gene expression. While there is no significant difference in codon usage between strains isolated from different hosts, RVA in general is notably more optimized to human and avian genomes than to pig and cow genomes. This indicates that translational efficiency is not a barrier for zoonosis between avian and mammalian strains.
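The GC3 versus GC12 comparison behind a neutrality plot can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names are hypothetical, and each input is assumed to be an in-frame coding sequence. In a neutrality plot, GC12 is regressed on GC3; a slope near 1 is conventionally read as mutational bias acting similarly on all three codon positions, whereas a slope near 0 points to constraints at positions 1 and 2 that position 3 does not share.

```python
def gc_by_codon_position(cds):
    """Return (GC12, GC3) as fractions for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    gc = [0, 0, 0]
    for i in range(n_codons * 3):  # a trailing partial codon is ignored
        if seq[i] in "GC":
            gc[i % 3] += 1
    gc1, gc2, gc3 = (count / n_codons for count in gc)
    return (gc1 + gc2) / 2, gc3


def neutrality_slope(points):
    """Least-squares slope of GC12 (y) regressed on GC3 (x).

    points: iterable of (gc12, gc3) pairs, one per gene or segment.
    """
    pts = list(points)
    mean_x = sum(gc3 for _, gc3 in pts) / len(pts)
    mean_y = sum(gc12 for gc12, _ in pts) / len(pts)
    sxx = sum((gc3 - mean_x) ** 2 for _, gc3 in pts)
    if sxx == 0:
        raise ValueError("GC3 does not vary; slope is undefined")
    sxy = sum((gc3 - mean_x) * (gc12 - mean_y) for gc12, gc3 in pts)
    return sxy / sxx
```

Applying `gc_by_codon_position` to every coding sequence of a segment and passing the resulting pairs to `neutrality_slope` gives one slope per segment, which is one way the segments could be compared.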
There are some limitations in the assessment of codon usage patterns. Despite the potential advantage of possessing a codon usage pattern that strongly resembles that of a host, codon usage patterns on their own are inadequate for making inferences or conclusions about the selective forces acting on virus populations. Furthermore, a significant amount of codon usage pattern variation exists between genes of the same host, so forming conclusions regarding viral codon-level adaptation to the host is especially difficult. That is, having high CAI values or RCDIs close to 1.0 does not necessarily mean that the virus is more adapted to that host. It could, however, provide evidence that the viral genes are better expressed in a particular host or that a specific gene of a virus may be more efficiently expressed. Additionally, we note that rotaviruses have especially AU-rich genomes and that their codon usage patterns diverge from human usage patterns more than those of other human viruses (73). It would appear that the benefits of being AU-rich outweigh any benefits conferred through codon optimization.
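For readers unfamiliar with the index, the following minimal sketch shows how a CAI value is commonly defined: the geometric mean, over the codons of a gene, of each codon's frequency in a reference set relative to the most frequent synonymous codon. It is an illustration only and not the software used in this study. The function names, the toy reference counts, and the pseudocount of 0.5 for codons absent from the reference are assumptions.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TO_AA = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def relative_adaptiveness(ref_counts, pseudocount=0.5):
    """Map each codon to w = count / max count among its synonymous codons."""
    weights = {}
    # Met, Trp (single codon each) and stop codons are not scored.
    for aa in set(AMINO_ACIDS) - set("MW*"):
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        # Assumption: codons unseen in the reference get a small pseudocount.
        counts = {c: ref_counts.get(c, 0) or pseudocount for c in family}
        top = max(counts.values())
        for codon, count in counts.items():
            weights[codon] = count / top
    return weights


def cai(cds, weights):
    """Geometric mean of w over the scored codons of an in-frame sequence."""
    seq = cds.upper().replace("U", "T")
    logs = [math.log(weights[seq[i:i + 3]])
            for i in range(0, len(seq) - len(seq) % 3, 3)
            if seq[i:i + 3] in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts covering only Lys (AAA/AAG) and Glu (GAA/GAG).
toy_reference = {"AAA": 10, "AAG": 30, "GAA": 20, "GAG": 20}
toy_weights = relative_adaptiveness(toy_reference)
```

Because the weights depend entirely on the chosen reference set, the same viral gene can receive different CAI values against different host genomes, which is why the values are best read comparatively.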
This study further supports caution when testing for selection in rotaviruses by comparing dN/dS ratios, as the selective pressure at synonymous sites varies significantly by segment. While the evidence did not support RVA nucleotide composition or translational selection varying by host strain, certain segments were under stronger selective (translational) pressure than mutational pressure at codon position 3. The lack of distinct codon usage by host strain suggests that translational efficiency is not an important host-range barrier for RVA.
Funding
We appreciate funding from the PSC-CUNY Research Award Program, the CUNY Advanced Science Center Seed Program, and from the Queens College Biology Department’s Seymour Fogel Endowment Fund for Genetic and Genomic Research.
Conflicts of Interest
The authors declare no conflict of interest.
Author Contributions
Conceptualization, I.H.; Methodology, I.H.; Formal Analysis, I.H.; Investigation, I.H.; Resources, I.H. and J.J.D.; Data Curation, I.H.; Writing – Original Draft Preparation, I.H.; Writing – Review & Editing, I.H. and J.J.D.; Visualization, I.H.; Supervision, J.J.D.; Project Administration, J.J.D.; Funding Acquisition, I.H. and J.J.D.
Data Availability Statement
All genomic data analyzed in this manuscript are publicly available through NCBI’s Virus Variation Resource (https://www.ncbi.nlm.nih.gov/genome/viruses/variation/).
Acknowledgments
We appreciate useful conversations with Nanami Kubota and Sarah McDonald Esstman relating to codon bias and rotavirus evolution. We are also grateful to Thomas Hoxie for editing drafts of this article.