Rapid multilocus adaptation of clonal cabbage leaf curl virus populations to Arabidopsis thaliana

Cabbage leaf curl virus (CabLCV) has a bipartite single-stranded DNA genome and infects the model plant Arabidopsis thaliana. CabLCV serves as a model for the genus Begomovirus, members of which cause tremendous crop losses worldwide. We have used CabLCV as a model for within-plant virus evolution by inoculating individual plants with infectious clones of either a wild-type or mutagenized version of the CabLCV genome. Consistent with previous reports, detrimental substitutions in the Replication-associated gene (Rep) were readily compensated for by direct reversion and/or alternative mutations. A surprising number of common mutations were detected elsewhere in both viral segments (DNA-A and DNA-B) indicating convergent evolution and suggesting that CabLCV may not be as well adapted to A. thaliana as commonly presumed. Consistent with this idea, a spontaneous coat protein variant consistently rose to high allele frequency in susceptible accession Col-0, at a higher rate than in hypersusceptible accession Sei-0. Numerous high-frequency mutations were also detected in a candidate Rep binding site in DNA-B. Our results reinforce the fact that spontaneous mutation of this type of virus occurs rapidly and can change the majority consensus sequence of a within-plant virus population in weeks.


Introduction
New sequencing technologies have accelerated both virus discovery (Jeske 2018; Cobbin et al. 2021) and studies of within-host virus diversity. Short-read technologies such as Illumina sequencing-by-synthesis provide limited information about viral haplotypes but can yield deep coverage for quantitative analysis of minority (subconsensus) alleles (Lauring 2020). Measuring and functionally characterizing such variation, including spontaneous mutations, is necessary to clarify how virus populations interact with other components of the phytobiome (Roossinck 2019).
Viruses in the family Geminiviridae are transmitted by phloem-feeding insects and cause major crop losses (Rojas et al. 2018). Geminivirus populations vary within individual host plants, a fact evident from the first complete-genome sequence dataset (Stanley and Gay 1983). Controlled experiments have demonstrated high geminivirus mutation frequencies (Isnard et al. 1998) and rapid substitution in response to selection (Ge et al. 2007). Geminiviruses have single-stranded DNA genomes, yet their substitution rates are comparable to those of RNA viruses. It is thought that rapid evolution enables geminiviruses to adapt and evade host defenses, but the mechanistic details and implications for disease management are largely unclear (Acosta-Leal et al. 2011; García-Arenal and Zerbini 2019).

Cabbage leaf curl virus is a whitefly-transmitted bipartite virus in the genus Begomovirus that broadly infects Brassicaceae (Strandberg et al. 1991). We abbreviate the virus name here as 'CabLCV', per the Virus Metadata Resource of the International Committee on Taxonomy of Viruses (Calisher et al. 2019), but note that 'CbLCV', proteins involved in replication and counterdefense (Figure 1A) and the DNA-B segment encodes two proteins that function in virus movement. CabLCV infects the model plant Arabidopsis thaliana (L.) Heynh. (Hill et al. 1998) and dramatically reprograms host gene expression (Ascencio-Ibáñez et al. 2008).
In this work, we used the CabLCV-A. thaliana pathosystem for genome-wide analysis of spontaneous virus mutation, both in the presence and absence of a strong selective pressure. We built on previous work using infectious clones of CabLCV in which the Replication-associated gene (Rep) was mutagenized (Argüello-Astorga et al. 2007), extending analyses that demonstrated rapid single-nucleotide mutation in this mutagenized region. Such substitutions are thought to restore high-affinity interaction with the host RETINOBLASTOMA family protein required for reprogramming of DNA replication (Argüello-Astorga et al. 2004). Surprisingly, we identified additional convergent mutations in both virus segments (DNA-A and DNA-B) in a majority of inoculated plants. The allele frequency of these mutations varied with virus genotype (wild-type [WT] vs. mutagenized) and host genotype.

Materials and Methods
Two A. thaliana accessions were used: the susceptible standard laboratory accession Columbia-0 (Col-0) and the highly susceptible natural inbred line Sei-0 (from Seis am Schlern, Italy; Kranz and Kirchheim 1987).
Figure 1: DNA-A mutagenesis and reversion alleles described by Argüello-Astorga et al. (2007). A. Circular and linear diagram of the CabLCV DNA-A segment in the conventional orientation, starting and ending at the nick site within the taatatt/ac nonanucleotide (black triangle). The five canonical New World begomovirus genes are indicated, along with the genomic positions of four mutagenized nucleotides in the pNSB1101 plasmid inoculum, in blue (t1911c) or purple (a2005g, g2006c, and a2007c). The sequence display shows the virion-encapsidated strand (v-DNA; v-sense), but note that Rep is encoded on the opposite (complementary) strand. Codons for two amino acid residues in the Rep protein (A144 and L145 for parental wild-type virus) are shown along with amino acid replacements resulting from different mutations; these codons are in complementary sense (reverse complement of v-sense). The observed single-nucleotide reversion mutation (g2005a) is indicated in bold and another spontaneous mutation (c2006t) is indicated in red. The resulting amino acid replacements in the AC4 protein (L118R and S119P) are also indicated. B. CabLCV Rep protein domains and motifs, including iteron-related (IR) residues 7-11 (amino acids SFRLA) and the geminivirus Rep sequence (GRS; Nash et al. 2011). The coordinates of the oligomerization domain and RETINOBLASTOMA protein interaction surface are inferred based on alignment to the Rep protein sequence from tomato golden mosaic virus (Kong et al. 2000; Reyes 2012). Walker motifs were inferred based on Ruhel et al. (2021). Positions of amino acids L145 and E176 are shown with a purple line and a blue line, respectively.
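The relationship between the v-sense substitution labels in Figure 1 and the complementary-sense Rep codons can be sketched in a few lines of Python. This is an illustrative sketch, not part of the published analysis. The function names are hypothetical, and the reading frame (v-sense positions 2006 and 2005 pairing with the first and second bases of the L145 codon) is an assumption, adopted here because it reproduces the L145A replacement named in the text. Only the first two codon bases are used, so the lookup table is limited to codon families in which the third base does not change the amino acid.

```python
import re

COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}

# Codon families of the standard genetic code in which the third base never
# changes the amino acid, keyed by the first two bases.
FOURFOLD_FAMILIES = {
    "ct": "L", "gt": "V", "tc": "S", "cc": "P",
    "ac": "T", "gc": "A", "cg": "R", "gg": "G",
}


def parse_substitution(label):
    """Split a label such as 'a2005g' into (reference, position, alternate)."""
    match = re.fullmatch(r"([acgt])(\d+)([acgt])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def apply_substitutions(v_bases, labels):
    """Return a copy of a {position: base} map with the substitutions applied."""
    updated = dict(v_bases)
    for label in labels:
        ref, pos, alt = parse_substitution(label)
        if updated.get(pos) != ref:
            raise ValueError(f"{label}: reference base does not match position {pos}")
        updated[pos] = alt
    return updated


def c_sense_residue(v_bases, first_pos):
    """Translate the complementary-sense codon whose first base pairs with first_pos.

    The complementary strand is read toward lower v-sense coordinates, so the
    second codon base pairs with first_pos - 1. Returns '?' when the first two
    bases do not determine the amino acid on their own.
    """
    dinucleotide = COMPLEMENT[v_bases[first_pos]] + COMPLEMENT[v_bases[first_pos - 1]]
    return FOURFOLD_FAMILIES.get(dinucleotide, "?")


# Mutagenized positions from Figure 1; the wild-type bases are the reference
# bases encoded in the labels themselves.
MUTAGENIZED = ["a2005g", "g2006c", "a2007c"]
wild_type = {pos: ref for ref, pos, _ in map(parse_substitution, MUTAGENIZED)}
mutant = apply_substitutions(wild_type, MUTAGENIZED)

# Assumed frame: v-sense position 2006 pairs with the first base of codon 145.
print(c_sense_residue(wild_type, 2006), c_sense_residue(mutant, 2006))  # L A
```

The same helpers could be applied to other labels in the caption, such as g2005a or c2006t, by passing them to `apply_substitutions` on top of the mutagenized map.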
CabLCV infectious clones were previously described (Turnage et al. 2002). Plasmids pCPCbLCVA.003 and pCPCbLCVB.002 include partial tandem dimers of DNA-A and DNA-B. Resequencing these clones (see below) confirmed that the monomer unit segment sequences (2583 and 2512 nt, respectively) are identical to GenBank accessions U65529.2 and U65530.2 (Abouzid et al. 1992), with the exception of a single-nucleotide deletion adjacent to base 2368 (relative to the nick site) of DNA-B.
Construction of mutagenized DNA-A plasmid pNSB1101 was previously described.
Illumina data were deposited in the NCBI Sequence Read Archive as PRJNA782339.

Rep reversion and global mutation detection
for each plant is shown in Supplemental Figure S1. We called variants (single-nucleotide replacements, small insertions, and small deletions) and, for simplicity, initially considered them as independent mutant alleles, not attempting to infer their local or global haplotype context.
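Treating each called variant as an independent allele means its frequency can be estimated per site as the fraction of reads supporting the alternate allele. The sketch below illustrates that bookkeeping under stated assumptions: the `VariantCount` class, the read counts, and the `c100t` label are hypothetical, and the 0.01 default simply mirrors the 1% level mentioned in this section rather than any documented setting of the variant caller used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCount:
    """Read support for one variant at one site (all values hypothetical)."""
    segment: str    # "DNA-A" or "DNA-B"
    label: str      # substitution label, e.g. "g2005a"
    alt_reads: int  # reads supporting the alternate allele
    depth: int      # total reads covering the site

    @property
    def allele_frequency(self):
        # Sites without coverage are reported as frequency 0.0.
        return self.alt_reads / self.depth if self.depth else 0.0


def above_threshold(variants, min_frequency=0.01):
    """Keep variants whose estimated allele frequency exceeds min_frequency."""
    return [v for v in variants if v.allele_frequency > min_frequency]


# Made-up counts for illustration only; they are not data from this study.
example = [
    VariantCount("DNA-A", "g2005a", alt_reads=450, depth=1000),
    VariantCount("DNA-A", "c100t", alt_reads=4, depth=1000),
    VariantCount("DNA-B", "g2386a", alt_reads=120, depth=800),
]
kept = above_threshold(example)
for v in kept:
    print(v.segment, v.label, round(v.allele_frequency, 3))
```

Because each variant is scored on its own, this approach says nothing about whether two variants occur on the same genome copy, which is the haplotype question set aside above.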
No variants with estimated allele frequencies above 1% were identified for DNA-A in

Recurrent coat protein S56N substitution
The

Discussion
We have documented rapid and reproducible spontaneous mutations in three regions of the CabLCV genome, one engineered (Rep helix 4) and two unanticipated (CP S56 and a DNA-B iteron). High-throughput sequencing revealed subconsensus variants across the genome, but the degree to which such spontaneous mutations are selectively advantageous is unknown. Future studies could benefit from increased biological and technical replication, particularly because detectable contamination with infectious clones limited our ability to make quantitative comparisons of mutant allele frequencies.
We saw no obvious patterns of variant co-occurrence across plant samples that would indicate sample-to-sample cross-contamination with actual virus DNA (Supplemental Tables S2 to S5), but including additional controls (spike-in sequence identifiers or mock-inoculated plant DNA libraries) could rule out or mitigate this possibility. Higher sensitivity for rare variants could potentially be achieved with alternative DNA preparation strategies (Aimone et al. 2021b; Pinto et al. 2021).

Implications for Rep-DNA and Rep-protein interactions
Although the mutagenized CabLCV used here and in the previous study by Argüello-Astorga et al. (2007) is an artificial system, consistent results were obtained with a geminivirus from a different genus, maize streak virus (Shepherd et al. 2005). Similar to the K125M substitution observed here (Figure 2D), Argüello-Astorga et al. detected a substitution (I167L) on the other side of the RETINOBLASTOMA protein interaction region. It is unclear whether these second-site substitutions directly compensate for disruption of the protein-protein interaction, but these results suggest that epistatic interactions among Rep codons should be explored further.
The strong selective sweep for restored Rep function did not prevent selection on other loci but may have weakened selection for the CP S56N substitution in Col-0 plants. By contrast, the use of mutagenized inoculum appears to have enhanced selection for mutant alleles at the DNA-B iteron, particularly for the g2386a substitution (Figure 4). This iteron may be associated with one or more Rep-mediated functions, including in DNA-B replication, transcriptional repression, or other as-yet-unknown molecular interactions. Rep auto-represses its own transcription via a DNA-A iteron (Haley et al. 1992; Hanley-Bowdoin et al. 1999), though no similar function has been described for a corresponding DNA-B iteron. Alternatively, the normal function of this region may be unrelated to Rep binding and/or the region may have no molecular function at all.
CabLCV has an atypical Rep structure and the CabLCV DNA-A/DNA-B common regions have a greater number of differences than is typical for begomoviruses (Hill et al. 1998), making it difficult to extrapolate from better-studied viruses such as tomato golden mosaic virus. Experimental assessment of the function of this iteron could clarify why these mutations increase fitness. It may be that the substitutions disrupt Rep binding to DNA-B, reducing competitive sequestration of Rep molecules and thus enabling better DNA-A replication. If Rep binding to this DNA-B site is less important, increased DNA-A replication could indirectly provide a fitness benefit to DNA-B. We would expect the Rep L145A substitution, which impairs replication of both molecules, to enhance this benefit, consistent with our observations. The DNA-B long intergenic region was also a mutational hotspot during experimental evolution in cassava (Aimone et al. 2021c), suggesting this pattern will occur in other experimental and natural systems.

CP and host-genotype-specific differences in selection pressure
The CP gene is dispensable for infection in many laboratory situations (Stanley and Townsend 1986; Pooma et al. 1996; Turnage et al. 2002), so strong selection on a CP substitution was not necessarily expected. This result highlights the use of experimental evolution as an efficient tool for 'forward' genetic screening (Cooper 2018). Similar to what we have found here, an ssDNA circovirus also displayed high mutation frequencies within the coat protein gene (Correa-Fiz et al. 2020). Similar results were also observed for tomato leaf curl begomoviruses (Sánchez-Campos et al. 2018).
Sequencing virus populations from experimental inoculation studies may become a routine step in characterizing new infectious clones as sequencing costs decline.
Introducing mutations that rapidly rise in frequency back into their 'parent' infectious clones may reduce experimental variability due to possible phenotypic effects from these mutations.
The S56N substitution detected in 7 of the 11 plants in this experiment likely prevents phosphorylation, providing a possible explanation for its selective benefit. Hipp et al. (2019) detected, by mass spectrometry, partial phosphorylation of three N-terminal residues (one homologous to the CabLCV residue) in the CP of African cassava mosaic virus. Hipp et al. suggested that this phosphorylation may promote ubiquitin-dependent proteasomal degradation of CP, similar to degradation of tomato yellow leaf curl virus CP (Gorovits et al. 2014, 2016). The S56N substitution may stabilize CP, enhancing its (largely unknown) functions in cell-to-cell and long-distance movement. The N terminus of CP also functions in nuclear localization and in ssDNA binding during virion assembly (Unseld et al. 2001, 2004), so other effects remain possible. This amino acid position is conserved across many geminiviruses, with the serine sometimes replaced with threonine (Hipp et al. 2019), another residue that can be phosphorylated. Therefore, it is possible that the within-plant selective advantage we have observed is counterbalanced by long-term costs, particularly if the substitution interferes with packaging and/or whitefly transmission.
The basis for the stronger effect of S56N in Col-0 relative to Sei-0 is not clear but should be tractable for future study. The geminivirus beet curly top virus also causes strong symptoms on Sei-0, likely because it accumulates to high levels (Park et al. 2004; Lee et al. 1994), but differences in DNA-A and DNA-B levels in Col-0 vs. Sei-0 were variable in our experiment (Supplemental Table S1). Sei-0 can be infected with African cassava mosaic virus, whereas Col-0 cannot (Aimone et al. 2021a). The genetic basis for the hypersusceptibility of Sei-0 is currently unknown but could be genetically mapped.
A. thaliana can be infected with a number of geminiviruses (Ouibrahim and Caranta 2013), but it is unclear which, if any, of these viruses are well-adapted to it. Serially passaging CabLCV in A. thaliana would presumably lead to further substitutions, some of which may involve trade-offs including reduced ability to adapt to changing conditions, as observed for turnip mosaic virus (Butković et al. 2020; González et al. 2019). CabLCV is thought to be an ecologically realistic begomovirus for challenging A. thaliana based on its broad host range within the Brassicaceae and Fabaceae (Strandberg et al. 1991; Fiallo-Olivé et al. 2018), but this assumption should be reexamined. Measuring the prevalence of geminiviruses in natural A. thaliana populations, as has been done for several RNA viruses (Pagán et al. 2010), would be informative and further enhance the value of this experimental system.