Chromosome Evolution of Octoploid Strawberry

The allo-octoploid cultivated strawberry (Fragaria × ananassa) originated through a combination of polyploid and homoploid hybridization, domestication of an interspecific hybrid lineage, and continued admixture of wild species over the last 300 years. While genes appear to flow freely between the octoploid progenitors, the genome structures and diversity of the octoploid species remain poorly understood. The complexity and absence of an octoploid genome frustrated early efforts to study chromosome evolution, resolve subgenomic structure, and develop a single coherent linkage group nomenclature. Here, we show that octoploid Fragaria species harbor millions of subgenome-specific DNA variants. Their diversity was sufficient to distinguish duplicated (homoeologous and paralogous) DNA sequences and develop 50K and 850K SNP genotyping arrays populated with co-dominant, disomic SNP markers distributed throughout the octoploid genome. Whole-genome shotgun genotyping of an interspecific segregating population yielded 1.9M genetically mapped subgenome variants in 5,521 haploblocks spanning 3,394 cM in F. chiloensis subsp. lucida, and 1.6M genetically mapped subgenome variants in 3,179 haploblocks spanning 2,017 cM in F. × ananassa. These studies provide a dense genomic framework of subgenome-specific DNA markers for seamlessly cross-referencing genetic and physical mapping information, and unifying existing chromosome nomenclatures. Through comparative genetic mapping, we show that the genomes of geographically diverse wild octoploids are effectively diploidized and completely collinear. The preservation of genome structure among allo-octoploid taxa is a critical factor in the unique history of garden strawberry, where unimpeded gene flow supported both its origin and domestication through repeated cycles of interspecific hybridization.

However, reproductive barriers among octoploid Fragaria taxa remain essentially nonexistent, fueling the recurrence of interspecific homoploid hybridization in the origin, domestication, and modern-day breeding of F. × ananassa.

The modern F. × ananassa lineage traces its origin to extinct cultivars developed in western Europe in the 1700s. These cultivars were interspecific hybrids of non-sympatric wild octoploids from the New World: F. chiloensis subsp. chiloensis from South America and F. virginiana subsp. virginiana from North America (Darrow, 1966).

isolation implies that homoploid and polyploid hybridization events have not produced significant chromosome rearrangements among octoploid taxa. We hypothesized that the octoploids carry nearly collinear chromosomes tracing to the most recent common ancestor, despite one million years of evolution which produced multiple recognized species and subspecies.

The octoploid strawberry genome has been described as "notoriously complex" and an "extreme example of difficulty" for study (Folta and

show that massive genetic diversity has been preserved in F. × ananassa, with negligible difference between wild species and domesticated germplasm. The subgenome nucleotide diversity (π) of F. × ananassa (π = 5.857 × 10⁻³) was equivalent to wild progenitors F. chiloensis (π = 5.854 × 10⁻³) and F. virginiana (π = 5.954 × 10⁻³), and comparable to the sequence diversity of Zea mays landraces (π = 4.9 × 10⁻³) and wild Zea mays ssp. parviglumis progenitors (π = 5.9 × 10⁻³) (

Carolina and Jucunda could be regarded as similarly heterozygous to autopolyploid species such as potato. However, assembly of the allo-octoploid strawberry genome uncovered rampant gene silencing, gene loss, and rearrangements relative to diploid ancestors (Edger et al., 2019b), eroding the conservation of ancestral allele function. The frequency of unique sequence alignment (Figure S1) and unbroken distribution of subgenomic variant detection (Figure S2) in our analysis underscore the extensive divergence of the four subgenomes. Thus, traditional polyploid allele dosage models assuming genome-wide fixed heterozygosity may be of limited usefulness for strawberry.
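For readers unfamiliar with the statistic, per-site nucleotide diversity values such as the π estimates quoted above are commonly computed from allele counts at variant sites. The following is a minimal sketch, assuming biallelic sites and the standard unbiased pairwise-difference estimator; the function name and its inputs are hypothetical, and this is not necessarily the procedure used in the present study.

```python
def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Mean pairwise differences per site (pi) for biallelic variants.

    alt_counts    -- alternate-allele count at each variant site (hypothetical input)
    n_chromosomes -- number of sampled sequences at every site
    n_sites       -- total number of callable sites, including invariant ones
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least two sequences and one callable site")
    n = n_chromosomes
    total = 0.0
    for k in alt_counts:
        p = k / n
        # 2pq scaled by n/(n-1) equals the fraction of sequence pairs that differ
        total += 2.0 * p * (1.0 - p) * n / (n - 1)
    return total / n_sites
```

Invariant sites contribute zero to the sum but must be counted in `n_sites`, otherwise π is overestimated.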

Recombination Breakpoint Mapping of Octoploid Strawberry
We used WGS sequence analysis and recombination breakpoint mapping of an octoploid strawberry population to explore the breadth of disomic variation as an indicator of bivalent pairing during meiosis. Several cytogenetic and DNA marker studies have proposed the occurrence of polysomy in strawberry (Fedorova, 1946

× AB), and 0.2M co-heterozygous sites (AB × AB). We used the high-density variant data to perform haplotype mapping based on recombination breakpoint prediction, and evaluated segregation ratios of parental alleles across the 28 octoploid chromosomes.
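Under disomic inheritance, a site that is heterozygous in one parent and homozygous in the other is expected to segregate 1:1 among the progeny. The sketch below illustrates one common way to evaluate such a segregation ratio, a chi-square goodness-of-fit test with one degree of freedom. The function name and the example counts are hypothetical, and the test shown is an assumption for illustration rather than the exact procedure applied here.

```python
import math


def segregation_chi_square(n_a, n_b):
    """Chi-square test (1 df) of two observed allele classes against a 1:1 ratio.

    n_a, n_b -- progeny counts carrying each parental allele (hypothetical inputs)
    Returns (statistic, p_value).
    """
    total = n_a + n_b
    if total == 0:
        raise ValueError("no observations")
    expected = total / 2.0
    stat = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 60 progeny carry allele A and 40 carry allele B
stat, p = segregation_chi_square(60, 40)
```

A small p-value indicates departure from the 1:1 expectation, which could reflect segregation distortion, genotyping error, or non-disomic inheritance.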

We bypassed the computational demand of analyzing pairwise linkage across millions of DNA variants with missing data and genotyping errors by implementing the haplotype

that affected mapping of F. × ananassa. Artificial selection pressure in commercially bred hybrids almost certainly accounts for the lower subgenomic heterozygosity of Camarosa relative to Del Norte, which does not support a critical role for genome-wide interspecific heterozygosity in driving cultivar performance.

850K Octoploid Screening Array
We designed Affymetrix SNP genotyping arrays populated with subgenome-specific marker probes to enable genetic mapping, genome-wide association studies (GWAS), and genomic prediction in octoploid strawberry. DNA variants were selected for array design from the subgenomic diversity identified in the WGS panel (Figure 3).

were then filtered to remove candidates that were problematic for array tiling. These included duplicate or near-duplicate probe sequences, probes that inherited ambiguous reference sequences (Ns), probes requiring double-tiling (A/T or C/G alleles), and probes that Affymetrix scored as having low buildability. We retained 6.6M probes that targeted high-confidence F. × ananassa variants and were acceptable for array tiling.

We applied three selection criteria for determining a subset of 850K marker probes for tiling a screening array: likelihood of probe binding interference by off-target variants, likelihood of off-target (non-single copy) probe binding, and physical genome distribution. The likelihood of probe binding interference was scored as the sum of non-reference allele frequencies for off-target variants in the 35-nt binding region adjacent to the target SNP. The likelihood of off-target probe binding was scored by performing BLAST alignment of the 71-nt probe sequences to the Camarosa v1.0 genome and quantifying the number of off-target alignments with query coverage above 90% and sequence identity above 90%. We then iteratively parsed the Camarosa v1.0 genome using 10 kilobase (kb) non-overlapping physical windows, extracting the best available marker from each window based on probe binding interference and off-target binding likelihoods, until reaching an 850K probe threshold. We reserved 16K positions for

were polymorphic in a previous strawberry diversity study

single-copy binding, in addition to measuring subgenome-specific DNA variation (Figure S4). The complete set of 446,644 validated probes is made available for public use (Dataset S3).
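The window-based selection of screening-array probes described above can be sketched as follows. This is an illustrative outline only: the dictionary keys, the function names, and in particular the ranking order (fewest off-target alignments first, then lowest interference score) are assumptions, since the text does not specify how the two scores were combined.

```python
from collections import defaultdict


def interference_score(offtarget_alt_freqs):
    """Sum of non-reference allele frequencies of off-target variants
    within the probe binding region (as described in the text)."""
    return sum(offtarget_alt_freqs)


def select_probes(candidates, target, window_bp=10_000):
    """Round-robin selection of the best remaining probe per physical window.

    candidates -- list of dicts with hypothetical keys
                  'chrom', 'pos', 'interference', 'offtarget_hits'
    target     -- number of probes to select (850K in the text)
    """
    windows = defaultdict(list)
    for c in candidates:
        windows[(c["chrom"], c["pos"] // window_bp)].append(c)
    # Sort worst-first so that pop() returns the best remaining probe.
    # Ranking by (off-target hits, interference) is an assumption.
    for members in windows.values():
        members.sort(key=lambda c: (c["offtarget_hits"], c["interference"]),
                     reverse=True)
    selected = []
    while len(selected) < target:
        took_any = False
        for key in sorted(windows):
            if windows[key]:
                selected.append(windows[key].pop())
                took_any = True
                if len(selected) == target:
                    break
        if not took_any:  # every window is exhausted
            break
    return selected
```

Taking one probe per window on each pass keeps the physical distribution even, and repeated passes add the next-best probes only after every window has been visited.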

50K Octoploid Production Array
We selected 49,483 polymorphic marker probes from the 850K validated probe set to build a 50K production array (Dataset S4). 5,809 LD-pruned (r² < 0.50) marker probes were retained from the iStraw panel to support cross-referencing of octoploid QTL studies and linkage group nomenclatures. We targeted 2,878 genes based on Camarosa

Candidate genes were pre-allocated up to two markers (within 1 kb) from the screening panel. We next selected a set of the most commonly segregating markers to support genetic mapping. We identified this set by selecting the marker with the highest pairwise diversity (π) in F. × ananassa across non-overlapping 50 kb physical genome windows. The remainder of the 50K array was populated by iteratively parsing the genome with 50 kb physical windows and selecting random QC-passing markers to provide an unbiased genome distribution. Both the 850K and 50K probe sets provide unbroken, telomere-to-telomere physical coverage of the 28 octoploid strawberry chromosomes (Figure S5). Within the 50K probe set, 53% of the probes were located within genes, and 79% were located within 1 kb of a gene. The 50K probe set was provided to Affymetrix for building the production array.
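The last two selection tiers described above (the most diverse marker per 50 kb window, followed by random QC-passing markers per window) can be sketched as below. The function name, dictionary keys, and fixed random seed are hypothetical, and the pre-allocated iStraw and candidate-gene markers are assumed to have been set aside before this step.

```python
import random
from collections import defaultdict


def fill_production_array(markers, target, window_bp=50_000, seed=0):
    """Pick the highest-pi marker per window, then fill with random markers.

    markers -- QC-passing markers as dicts with hypothetical keys
               'chrom', 'pos', 'pi'
    target  -- number of array positions to fill
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    windows = defaultdict(list)
    for m in markers:
        windows[(m["chrom"], m["pos"] // window_bp)].append(m)

    selected = []
    # First pass: the most diverse marker in each window
    for key in sorted(windows):
        if len(selected) >= target:
            break
        best = max(windows[key], key=lambda m: m["pi"])
        windows[key].remove(best)
        selected.append(best)

    # Later passes: one random remaining marker per window until full
    while len(selected) < target:
        progressed = False
        for key in sorted(windows):
            if len(selected) >= target:
                break
            if windows[key]:
                idx = rng.randrange(len(windows[key]))
                selected.append(windows[key].pop(idx))
                progressed = True
        if not progressed:  # no markers left in any window
            break
    return selected
```

Random draws in the later passes avoid biasing the remaining positions toward high-diversity markers, which is consistent with the stated aim of an unbiased genome distribution.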

The wild octoploid maps revealed large (Mb+) chromosomal rearrangements relative to the Camarosa v1.0 physical genome on chromosomes 1-2, 1-4, 2-1, 2-3, 6-2, and 6-4. These rearrangements were conserved across the wild species genomes, and supported by corresponding regions represented in the Camarosa genetic map (1-2, 1-4, 2-1) (Figure S3), indicating intra-chromosomal scaffolding errors in the physical reference genome. The fraction of SNPs genetically mapping to non-reference chromosomes ranged from 1.

chromosomes of the octoploid progenitor subspecies were completely syntenic (Figure 5, Figure S6). Based on these results, large-scale chromosome rearrangements in octoploid

(Table 1). The existing octoploid nomenclatures each contained subgenome assignments that were incongruent with ancestral chromosomal origins determined by phylogenetic

day octoploid taxa appear to be highly diploidized. We observed disomic inheritance of DNA variants across the genomes of the octoploids in the present study, and similar ranges of subgenomic heterozygosity for wild individuals and commercial hybrids. The success of F. × ananassa should not be solely attributed to "fixed heterosis" because neither octoploid progenitor species, which share the effects of fixed heterosis and show similar subgenomic heterozygosity, was commercially successful before the hybrid (Darrow, 1966; Finn et al., 2013). We hypothesize that interspecific complementation, a broader pool of potentially adaptive alleles, and masking of deleterious mutations could be more important than fixed heterosis in F. × ananassa (Alix et al., 2017; Comai, 2005).

We have shown that the purported complexity and previous intractability of octoploid strawberry genomics were largely associated with the technical challenge of distinguishing subgenome level variation from the broader pool of ancestral sequence homology. The use of an allo-octoploid reference genome addressed this problem by allowing variant calling based on unique sequence alignments to the respective subgenomes. While local subgenome homology could remain an issue, we identified a nearly continuous distribution of subgenome-specific variation spanning the octoploid genome by traditional short-read sequencing.

Clustering was performed in "polyploid" mode with a marker call-rate threshold of 0.89. Samples were filtered with a dQC threshold of 0.82 and QC CR threshold of 93. A subset of 49,483 probes was selected from polymorphic, QC-passing markers ("PolyHighResolution", "NoMinorHomozygote", "OffTargetVariant") on the 850K screening array to populate the 50K production array. 5,809 LD-pruned (r² < 0.50) probes were pre-selected from the iStraw design, in addition to 47 probes associated with QTL for Fusarium oxysporum resistance and the Wasatch day neutral flowering locus (unpublished data). We assigned two markers per gene to a set of 2,878 genes located in expression networks related to flowering and fruit development (

We performed single-marker linkage mapping of populations genotyped using the 50K array or DNA capture sequences because each contained fewer than 10,000 segregating markers per parent. Individual parent genotypes were mapped separately using their respective informative marker subsets. We filtered markers based on a chi-square test for segregation distortion (p-value < 0.10), and excluded markers with >5% missing data. ONEMAP was used to bin co-segregating markers, calculate pairwise recombination fractions, determine optimal LOD thresholds, and cluster markers into linkage groups based on a LOD threshold of 8, and maximum recombination fraction of

All authors contributed to manuscript revision, read and approved the submitted version.

Acknowledgments
We thank our collaborators at Affymetrix for constructing the 50K and 850K octoploid strawberry SNP genotyping arrays.

The datasets generated for this study can be found in the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra).