Characterization of endogenous Rubus yellow net virus in raspberries

Rubus yellow net virus (RYNV) belongs to genus Badnavirus. Badnavirids are found in plants as endogenous, inactive sequences, and/or in episomal (infectious and active) forms. To assess the state of RYNV infections, we sequenced the genomes of various Rubus cultivars and mined eight additional published whole genome sequencing datasets. Sequence analysis revealed the presence of a diverse array of endogenous RYNV (endoRYNV) sequences that differ significantly in their structure; some lineages have nearly complete, yet non-functional genomes whereas others have rudimentary, small sequence fragments. We developed assays to genotype the six main endoRYNV lineages as well as the only known episomal lineage in commercial Rubus. This study discloses the widespread presence of endoRYNVs in commercial raspberries, likely because breeding programs have been using a limited pool of germplasm that harbored endoRYNVs.


Raspberry is an economically important crop, with global production in 2018 exceeding 820,000 tons grown on 125,000 hectares on all continents except Antarctica ("Production of raspberries in 2018". United Nations, Corporate Statistical Database (FAOSTAT) 2019, retrieved January 14, 2021). Commercial breeding for red raspberry (Rubus idaeus) began about 200 years ago, and most of the currently available cultivars share the same germplasm pedigree dating back to the late 1800s and early 1900s (Jennings, 2018).

More than 40 virus species are known to infect Rubus; yet Rubus yellow net virus (RYNV) is one of only two badnaviruses known to infect the genus (Diaz-Lara et al., 2015; Shahid et al., 2017 counterpart (Chabannes et al., 2013; Gayral et al., 2008; Ndowora et al., 1999), whereas no other badnavirid is known to reactivate from integrated sequences (reviewed by Bhat et al., 2016).

RYNV is a component of raspberry mosaic, an important disease first described in the 1920s (Bennett, 1927; Stace-Smith, 1955). The virus infects all red raspberry and most blackberry and hybrid berry cultivars in North America and Europe (Stace-Smith and Jones, 1987a) and can reduce yield by 30-75% in the first year and by up to 15% in subsequent years in mixed infections with black raspberry necrosis virus (Stace-Smith and Jones, 1987b). Partial RYNV sequences were first obtained at the turn of the century (GenBank accession number AF468454; Jones et al., 2002). The plant used in the Jones et al. (2002) study had virus-like symptoms, and bacilliform particles were observed under the electron microscope. The first RYNV genome (RYNV-Ca, GenBank accession number KF241951), assembled from two PCR amplicons, was obtained by Kalischuk et al. (2008). Another genome (RYNV-BS, KM078034; Diaz-Lara et al., 2015) was sequenced from red raspberry 'Baumforth's Seedling A' using DNA from rolling circle amplification. Since then, several RYNV sequences have been published using small RNA (Rajamäki et al., 2019) or whole genome sequencing (MN245240).

Diaz-Lara (2016) observed that red raspberry plants, supposedly free of RYNV based on aphid or graft transmission onto the R. occidentalis 'Munger' indicator, yielded positive results when indexed by PCR-based assays. Moreover, those plants tested positive for RYNV by PCR even after heat therapy and meristem-tip culture for virus elimination. It was demonstrated that RYNV integrates into the red raspberry genome (Diaz-Lara et al., 2020), but no further analysis was conducted for the reported endogenous RYNV (endoRYNV) sequences. In this study, multiple cultivars were assayed to determine the prevalence of endoRYNV, and the lineages identified were validated and characterized in depth.

Materials and methods

Twenty-five raspberry cultivars maintained as tissue culture plantlets in Watsonville, California were used in the study (Table 1). For 'Baumforth's Seedling A', an additional mature plant with RYNV-BS (Diaz-Lara et al., 2015; Table 1) was obtained from Corvallis, Oregon and used as a positive control for the episomal form, hereafter referred to as epiRYNV-BS. In addition, the genomes of 75 proprietary red raspberry and 100 proprietary blackberry cultivars were sequenced and assayed for integration of RYNV-BS and the episomal form of the virus, but their identity is not provided to protect intellectual property rights.

DNA was extracted using either the DNeasy(R) kit (Qiagen) or the method described by Poudel et al. (2013). All DNA libraries were constructed using a TruSeq DNA HT Sample Prep(R) kit and sequenced individually using paired-end (2 × 300 bp) Illumina HiSeq configuration by Novogene (Sacramento, CA). Raw Illumina reads were subjected to de novo assembly using SPAdes (Bankevich et al., 2012). A BLASTn search (Camacho et al., 2009) was performed on the output contigs with e-value=10 against published RYNV nucleotide (nt) sequences downloaded from the GenBank nt database (January 16, 2021). After RYNV hits were filtered out, the remaining contigs were processed using BLASTx against a database containing all RYNV protein sequences downloaded from GenBank nr (January 16, 2021). All Illumina datasets were also submitted to VirFind (http://virfind.org, Ho and Tzanetakis, 2014) for virus detection and discovery. Bowtie2 (Langmead and Salzberg, 2012) was used to map raw reads to RYNV contigs, and the mapping assemblies were confirmed visually with Tablet (Milne et al., 2013). BioEdit (Hall, 1999) was used to calculate the sequence identity matrix, and ClustalW (Thompson et al., 1994), as implemented in MEGA X (Kumar et al., 2018), was applied to align nucleotide and amino acid sequences. Expasy (https://web.expasy.org/translate/) was used to predict open reading frames (ORFs). Conserved domain searches were done using the NCBI tool of the same name (Lu et al., 2020). Breakpoints of the RYNV lineages were identified by aligning raw Illumina reads with BLASTn against the assembled sequences, and partially aligned reads were manually analyzed for sequence identities.
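The identity matrix step can be illustrated with a minimal Python sketch. This is not the BioEdit implementation: it assumes the sequences are already aligned to equal length, and it assumes that alignment columns with a gap in either sequence are skipped (other tools may count gaps differently). The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of identical positions between two aligned sequences.

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def identity_matrix(aligned: dict) -> dict:
    """Identity for every unordered pair of named, aligned sequences."""
    return {
        (name_1, name_2): pairwise_identity(aligned[name_1], aligned[name_2])
        for name_1, name_2 in combinations(aligned, 2)
    }


# Toy alignment (hypothetical sequences, not RYNV data)
toy = {
    "seqA": "ATGC-ATGCA",
    "seqB": "ATGCTATGAA",
    "seqC": "ATGC-TTGCA",
}
matrix = identity_matrix(toy)
for pair, identity in matrix.items():
    print(pair, f"{identity:.3f}")
```

In practice the input would be the ClustalW alignment of the endoRYNV lineages and epiRYNV-BS, read from a FASTA or similar file.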

Published datasets were mined for RYNV sequences (Table 1)  there was amplification with the correct melting point. The previously published assay of Diaz-Lara et al. (2020) was also included in this validation for specificity comparison.

The presence of RYNV DNA in commercial raspberries was investigated by whole genome sequencing and mining data of 25 and 8 cultivars, respectively ( identities to each other and the epiRYNV-BS (Table 2).

The third 'Cuthbert' endoRYNV (CU-3) is present in five cultivars and is heavily truncated. It lacks the 5' IG, ORF1 and ORF2. Its sequence starts with a truncated ORF3, which together with the 3' IG accounts for a 4550-bp stretch. Its RT_RNaseH region shares 99.6% identity with RYNV-Cu from Chile (MN245240). The integrant is fragmented at nt4423 and 4550, and has plant-virus junctions at nt1 and 1492.

The endoRYNV-BS was first detected in 'Baumforth's Seedling A' (1880 release, UK) and is present in 17 cultivars. The lineage is 7602 bp and has an intact 5' IG, ORF1, and ORF2. When aligned against epiRYNV-BS, ORF3 is missing a 132-nt stretch after nt2445 corresponding to 44 aa, and the 3' IG lacks 83 bp. This lineage has all five conserved badnavirid domains. The lineage is present in the assembled genome of 'Joan J' as a single copy on chromosome 4. The integrant is 12,143 bp and composed of two fragments. The first is in the forward orientation and contains the complete 5' IG that forms a junction with the plant DNA at its 5' end, followed by complete ORF1, ORF2, and part of ORF3 that is truncated at nt6317. The second follows immediately after in the reverse orientation, with a truncated 3' IG at nt7602, then continues with ORF3 but truncated at nt1777, fusing to the plant genome. No full-length ORF3 is present in either of the fragments.

First detected in 'Phoenix' (1896 release, UK), endoRYNV-PH1 is present in nine cultivars. The 6631-nt sequence starts with the 5' IG, followed by an ORF1 of 177 nt, with 380 nt missing after nt561 when aligned against epiRYNV-BS, before an intact ORF2. ORF3 is missing 675 nt after nt1307 as well as the ribosomal L25/TL5/CTC N-terminal 5S rRNA binding domain. It has the four other conserved domains, similar to epiRYNV-BS. The integrant is fragmented at genomic nt positions 335, 1373, 1814, and 5859, with nt1 and nt6631 connected to the plant DNA.
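One way to read the fragmentation coordinates above is as boundaries that split the lineage into contiguous blocks. The sketch below is only an illustration under a stated assumption: each listed position is taken as the last nucleotide of a block, with the next block starting one nucleotide later. The text does not specify this convention, and the function name is hypothetical.

```python
def blocks_from_breakpoints(length, breakpoints):
    """Split positions 1..length into 1-based, inclusive blocks.

    Assumption (not stated in the text): each breakpoint is the last
    nucleotide of a block, and the next block starts at breakpoint + 1.
    """
    blocks = []
    start = 1
    for bp in sorted(set(breakpoints)):
        if not 1 <= bp < length:
            raise ValueError(f"breakpoint {bp} is outside 1..{length - 1}")
        blocks.append((start, bp))
        start = bp + 1
    blocks.append((start, length))
    return blocks


# endoRYNV-PH1 values quoted in the text: 6631 nt, breaks at 335, 1373, 1814, 5859
ph1_blocks = blocks_from_breakpoints(6631, [335, 1373, 1814, 5859])
for block_start, block_end in ph1_blocks:
    print(block_start, block_end, block_end - block_start + 1)
```

Under this convention the four breakpoints give five blocks whose lengths sum to 6631 nt; the orientation and order of the blocks in the plant genome are not captured here.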

The last substantial integrated RYNV sequence, endoRYNV-PH2, was only found in 'Phoenix'. The 7091-nt fragment's 5' end has the intact 5' IG, ORF1 and ORF2. ORF3 is missing 141 nt after nt2461, corresponding to 47 aa, and the fragment terminates at nt7091. This sequence has all conserved domains found in the epiRYNV-BS. The integrant is fragmented at nt2201 and 2432, and has two plant-virus junctions at nt3094 and 7091.
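The nucleotide-to-amino-acid correspondences quoted for the ORF3 deletions (132 nt and 44 aa in endoRYNV-BS, 141 nt and 47 aa in endoRYNV-PH2) follow from dividing by the codon length of three. The short sketch below checks that arithmetic; the function name is hypothetical, and a multiple of three says nothing about whether the downstream reading frame is actually preserved, which depends on the exact deletion boundaries.

```python
from typing import Optional


def deleted_codons(deleted_nt: int) -> Optional[int]:
    """Return the number of whole codons in a deletion, or None if the
    deletion length is not a multiple of three."""
    if deleted_nt < 0:
        raise ValueError("deletion length must be non-negative")
    codons, remainder = divmod(deleted_nt, 3)
    return codons if remainder == 0 else None


# Deletion lengths quoted in the text, relative to epiRYNV-BS
deletions = {
    "endoRYNV-BS ORF3": 132,
    "endoRYNV-PH2 ORF3": 141,
    "endoRYNV-PH1 ORF3": 675,
    "endoRYNV-PH1, after nt561": 380,
}
codon_counts = {name: deleted_codons(nt) for name, nt in deletions.items()}
for name, codons in codon_counts.items():
    print(name, codons)
```

A result of `None` marks a length that is not divisible by three, as is the case for the 380-nt stretch.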

We analyzed the genome sequence data of commercial cultivars from around the globe released as early as 1802 and as recently as 2006. Integrated RYNV sequences were present in 27/33 cultivars (82%). The endoRYNV population could be categorized into six main lineages and other short endogenous fragments. The diversity of endoRYNV is complex, with some sequences having inversions, duplications, or deletions.

Rubus domestication has resulted in a reduction of genetic diversity (Haskell, 1960; Jennings, 1988), and modern cultivars are genetically similar to each other (Dale et al., 1993; Graham and McNicol, 1995).

This can be seen in the case of the cultivars analyzed in this study. All breeding programs share the same endoRYNV lineages, which in turn were discovered in three cultivars commercialized in the 19th century: 'Cuthbert' (1865), 'Baumforth's Seedling A' (1880), and 'Phoenix' (1896). These endogenous sequences presumably became widespread as the three aforementioned cultivars were used as parents, or are in the lineages of most raspberry breeding programs worldwide.

The endoRYNV-CU1 lineage is the closest to the published RYNV-Ca sequence (KF241951) (Kalischuk et al., 2013). Since RYNV-Ca has two inverted repeats, lacks the true 3' IG, and hence is likely an endogenous sequence, we consider epiRYNV-BS the sole episomal RYNV lineage known to infect Rubus. It is important to note that when aligned against the epiRYNV-BS sequence, all endoRYNV lineages are truncated and missing genomic DNA stretches. From these data, we hypothesize that the endoRYNVs are unable to reactivate and become episomal due to their incomplete genomes. In addition to the raspberry cultivars of this study, we sequenced the whole genomes of an additional 75 proprietary red raspberry cultivars, and epiRYNV-BS was absent in all (data not shown), indicating that this lineage may be unable to integrate in the raspberry genome. We also sequenced the genomes of 100 proprietary blackberry cultivars (data not shown) but did not find any evidence of endoRYNV, suggesting that endoRYNV sequences may be limited to red raspberry.

Diagnostic tests for infectious agents are necessary so that phytosanitary agencies can protect a country's natural resources and agriculture. However, the Rubus industry could be significantly impacted if a diagnostic test was positive for RYNV but had inadvertently detected a no-risk endoRYNV.
Published PCR primers were designed to target either RYNV-Ca or epiRYNV-BS as they had been the only known RYNV lineages (Diaz-Lara et al., 2020; Jones et al., 2002; Kalischuk et al., 2008). Diaz-Lara et al. (2020) showed that primers currently used for RYNV detection could produce positive results in cultivars only harboring endoRYNV DNA, indicating the urgent need for a diagnostic test that can clearly differentiate the two forms, similar to epiBS-2604F/2715R. This test should be developed and validated for accuracy and sensitivity against a wide range of episomal isolates.

Theoretically, endoRYNV can be removed from red raspberry by traditional breeding. However, the effort required to remove endoRYNV DNA through multiple generations of backcrossing would be considerable, especially when desired traits must be retained. CRISPR-Cas9 could be used to remove the endoRYNVs, but for cultivars with multiple endoRYNV fragments, multiple gene-editing events would be needed. We believe that such actions are not necessary as endoRYNV fragments could not reconstruct a full, infectious genome.