Genomic novelty and process-level convergence in adaptation to whole genome duplication

Whole genome duplication (WGD) occurs across kingdoms and can promote adaptation. However, a sudden increase in chromosome number, as well as changes in physiology, are traumatic to conserved processes. Previous work in Arabidopsis arenosa revealed a coordinated genomic response to WGD, involving physically interacting meiosis proteins, as well as changes related to cell cycle and ion homeostasis. Here we ask: is this coordinated shift in the same processes repeated in another species following WGD? To answer this, we resequenced and cytologically assessed replicated populations from a diploid/autotetraploid system, Cardamine amara, and test the hypothesis that gene and process-level convergence will be prevalent between these two WGDs adaptation events. Interestingly, we find that gene-level convergence is negligible, with no more in common than would be expected by chance. This was most clear at meiosis-related genes, consistent with our cytological assessment of somewhat lower meiotic stability in C. amara, despite establishment and broad occurrence of the autotetraploid in nature. In contrast, obvious convergence at the level of functional processes, including meiotic cell cycle, chromosome organisation and stress signalling was evident. This indicates that the two autotetraploids survived challenges attendant to WGD via contrasting solutions, modifying different players from similar processes. Overall, this work gives the first insight into the salient adaptations required to cope with a genome-doubled state and brings the first genomic evidence that autopolyploids can utilize multiple trajectories to achieve adaptation to WGD. We speculate that this flexibility increases the likelihood a nascent polyploid overcomes early stringent challenges to later access the spectrum of evolutionary opportunities of polyploidy. Significance statement Whole genome duplication (WGD) is a tremendous mutation and an important evolutionary force. It also presents immediate changes to meiosis and cell physiology that nascent polyploids must overcome to survive. Given the dual facts that WGD adaptation is difficult, but many lineages nevertheless survive WGD, we ask: how constrained are the evolutionary responses to a genome-doubled state? We previously identified candidate genes for WGD adaptation in Arabidopsis arenosa, which has natural diploid and tetraploid variants. Here we test for evolutionary convergence in adaptation to WGD in a species 17 million years distant, Cardamine amara. This work gives the first genomic insight into of how autopolyploids utilize multiple adaptive trajectories to manage a genome-doubled state.

Introduction emerge exactly at these adaptive alleles between A. arenosa and A. lyrata (9, 15). Therefore, A. lyrata 95 and A. arenosa WGD stabilisation events are not independent. 96 97 Here we use an independent system, ~17 million years diverged from both A. arenosa and A. lyrata 98 (16), to test the hypothesis that this solution of nimble meiosis gene evolution is repeated, and if not, 99 whether changes in other genes from analogous processes are associated with adaptation to WGD. 100 Given the clear results in A. arenosa and A. lyrata, we hypothesised that the adaptive trajectories which 101 are available to mediate adaptation to a WGD state may be constrained, leading to repeated selection 102 of the same meiosis genes. Such a result would offer a striking case of convergent evolution in core 103 cellular processes. To test this hypothesis, we take advantage of a well-characterised model, Cardamine 104 amara (Brassicaceae, tribe Cardamineae). A large-scale cytotyping survey and genetic analysis 105 demonstrated an autopolyploid origin of the successful autotetraploid cytotype found in the Eastern 106 and Central Alps (17)(18)(19)(20). Importantly, C. amara is a perennial herb harbouring a high level of genetic 107 diversity (similar to both A. arenosa and A. lyrata) and shares a similar distribution range and 108 evolutionary history, with a likely single origin, followed by autotetraploid expansion associated with 109 glacial oscillations (18,19). 110

111
To test our hypothesis of gene and process-level convergence, we performed genome scans for 112 selection, contrasting natural autotetraploid and diploid populations. We found the strongest selection 113 signals specifically at genes involved in cellular functions central to adaptation to WGD: chromosome 114 remodelling, meiosis, cell cycle regulation, and ion transport. However, the evolutionary response to 115 WGD in C. amara is very different to that of A. arenosa: overall, we saw minimal gene-level convergence 116 in loci putatively mediating adaptation to WGD. In particular, none of the same meiosis-related genes 117 that control meiotic chromosome crossovers in A. arenosa were under selection in C. amara. This is 118 consistent with observations of clonal spreading and lower meiotic stability in both diploid and 119 autotetraploid C. amara, suggesting that C. amara autotetraploids are already prepared by their diploid 120 lifestyle to thrive, at least temporarily, while suffering reduced meiotic fidelity. However, in contrast to 121 a lack of gene convergence, we find a strong signal of process-level convergence in core processes 122 controlling DNA management, chromosome organisation, stress signalling, and ion homeostasis. 123 Overall, our results provide sharp contrast to widespread reports of gene-level convergence across the 124 tree of life and suggest that the genomic changes associated with a WGD state might not be as 125 constrained as would be expected based on their functional conservation across eukaryotes. 126 127

Results and Discussion
128 Population selection, sampling and genetic structure. To assess the genetic basis of adaptation to WGD 129 in C. amara, we generated a novel synthetic long-read reference genome (N50 = 1.82 mb, 95% 130 complete BUSCOs; see Methods) and resequenced in triplicate four populations of contrasting ploidy, 131 sampling 100 individuals: two diploid (LUZ, VRK) and two autotetraploid (CEZ, PIC; Fig. 1A; SI Appendix, 132 Table S1). We chose these populations based on a comprehensive cytological survey of over 3,000 C. degenerate SNPs. C, Rank Sum design used to minimise any influence of population-specific divergence in tests 142 for directional selection. 'p1' to 'p4' represent the between-ploidy contrasts used for the rank sum calculations.

143
'dd' and 'tt' represent within-ploidy contrasts used to subtract signal of local population history.

145
To obtain robust population allele frequency estimates across genomes, we performed a replicated 146 pooled sequencing approach. From each population we pooled DNA from 25 individuals in triplicate 147 and generated on average 31 million reads per pooled sample. We mapped the reads onto our C. amara 148 assembly. After read mapping, variant calling and quality filtration, we obtained a final dataset of 149 2,477,517 SNPs (mean coverage depth per population = 86, SI Appendix, Table S2). 150

151
Population structure of C. amara (Fig. 1B) showed primary differentiation by ploidy (first axis explained 152 43% of all variability) while the second axis (28% of variability explained) differentiated the two diploid 153 populations from each other. The two autotetraploid populations had the lowest genetic differentiation 154 of all contrasts (Fst = 0.04, mean allele frequency difference = 0.06) and showed a complete absence of 155 fixed differences (Table 1). Close genetic similarity together with spatial arrangement (the populations 156 represent part of a continuous range of autotetraploid cytotype spanning to Eastern Alps) suggest that 157 both autotetraploid populations represent the outcome of a single polyploidization event, in line with 158 previous population genetic inference based on large-scale sampling (18). The similar level of 159 interploidy divergence within both C. amara and A. arenosa (average Fst between diploids and 160 autotetraploids = 0.10 and 0.11, respectively) suggests that the polyploidization events in both species 161 happened at roughly comparable time points in the past (Table 1)

168
Directional selection specifically associated with WGD in C. amara. To minimise false positives due to 169 local population history we leveraged a quartet-based sampling design (21), consisting of two diploid 170 and two autotetraploid populations (Fig. 1C). We calculated Fst for 1 kb windows with a minimum 20 171 SNPs for all six possible population contrasts, and ranked windows based on Fst values. The mean 172 number of SNPs per population contrast was 2,270,868 (Table 1). To focus on WGD-associated 173 adaptation, we first assigned ranks to each window based on the Fst values in each of four possible 174 pairwise diploid-autotetraploid contrasts and identified windows in the top 1% outliers of the resultant 175 combined rank sum (Fig. 1C, contrasts p1-p4). We then excluded any window which was also present 176 in the top 1% Fst outliers in diploid-diploid or autotetraploid-autotetraploid population contrasts to 177 avoid misattribution caused by local population history (Fig. 1C,contrasts' tt' and 'dd'). By this 178 conservative approach, we identified 440 windows that intersected 229 gene coding loci (SI Appendix, 179 Table S3; 'WGD adaptation candidates' below). Among these 229 gene coding loci, a Gene Ontology 180 (GO) term enrichment analysis yielded 22 significantly enriched biological processes (Fisher's exact test 181 with conservative 'elim' method, adjusted p < 0.05, SI Appendix, Table S4). To further refine the gene 182 list to putatively functional candidates we complemented these differentiation measures with a 183 quantitative estimate following the fineMAV method (22) (see Methods). SNPs were assigned a 184 fineMAV score based on the predicted functional consequences of amino acid substitutions, using 185 Grantham scores, amplified by the allele frequency difference between the two amino acids (22). From 186 our 229 Fst WGD adaptation candidates, 120 contained at least one 1% fineMAV outlier amino acid 187 substitution (SI Appendix, Table S3, S5). remodelling factor (29, 30) that also has highly pleiotropic roles in osmotic stress response (31), 207 stomatal aperture (32), root meristem activity (33), and flowering time (34). Beyond this cluster, other 208 related DNA metabolism genes among our top outliers include DAYSLEEPER (Fig. 2)

227
Evolution of stress, signalling, and ion homeostasis genes. The remainder of the enriched GO 228 categories in C. amara revolved around a diversity of intracellular processes, including abiotic and biotic 229 stress response, protein phosphorylation, root development, ABA signalling, and ion homeostasis. The 230 intersection of these processes was often represented by several genes. For example, two of the top 231 20 highest-scoring SNPs in the genome-wide fineMAV analysis reside in SNF1-related protein kinase 232 SnRK2.9 (SI Appendix, Table S5). SnRKs have been implicated in osmotic stress and root development 233 (40, 41), and their activity also mediates the prominent roles of Clade A protein phosphatase 2C 234 proteins in ABA and stress signalling (42). Interesting in this respect is a strong signature of selection in 235 Table S3). Stress-related 236 phosphoinositide phosphatases are represented by SAC9, mutants of which exhibit constitutive stress 237 responses (43). Diverse other genes related to these categories exhibit the strongest signatures of 238 selection, such as PP2-A8 (44) and AT4G19090, a transmembrane protein strongly expressed in young 239 buds (45) (Fig. 2). 240

241
Given the observed increase in potassium and dehydration stress tolerance in first generation 242 autotetraploid A. thaliana (5), it is very interesting that our window-based outliers include an especially 243 dramatic selective sweep at K + Efflux Antiporter 2 (KEA2, Fig. 2), a K + antiporter that modulates 244 osmoregulation, ion, and pH homeostasis (46). Recent evidence indicates that KEA2 is important for 245 eliciting a rapid hyperosmotic-induced Ca 2+ response to water limitation imposed by osmotic stress 246 (47). The KEA2 locus in autotetraploid C. amara features an exceptional ten fineMAV-outlier SNPs ( Fig.  247 2, SI Appendix, Table S3, S5), indicating that the sweep contains a run of radical amino acid changes at 248 high allele frequency difference between the ploidies, strongly suggesting a ploidy-selected functional 249 change. We also detect cation-chloride co-transporter 1 (HAP 5) a Na+, K+, Cl− co-transporter, involved 250 in diverse developmental processes and Cl− homeostasis (48). 251 252 Limited gene-level convergence between C. amara and A. arenosa. We hypothesized that WGD 253 imposed strong, specific selection pressures leading to convergent directional selection on the same 254 genes or at least on different genes playing a role in the same process (gene-or function-level 255 convergence, respectively) between C. amara and A. arenosa. To test for this, we complemented our 256 C. amara genome scan with an analysis of A. arenosa divergence outliers based on an expanded 257 sampling relative to the original A. arenosa genome scan studies (13,14). We selected the 80 diploid 258 and 40 autotetraploid individuals sequenced most deeply in a recent range-wide survey (8) of genomic 259 variation in A. arenosa (mean coverage depth per individual = 18; 160 haploid genomes sampled of 260 each ploidy), and scanned for Fst outliers in 1 kb windows, as we did for C. amara. We identified 696 261 windows among 1% Fst outliers, overlapping 452 gene-coding loci (SI Appendix,

276
To determine whether we may have failed to detect convergent loci due to missing data or if genes that 277 stand as top outliers in A. arenosa had few, but potentially functionally-implicated, differentiated SNPs 278 in C. amara, we performed a targeted search in C. amara for the interacting set of meiosis proteins 279 found to exhibit the most robust signatures of selection in A. arenosa (14). All meiosis-related orthologs 280 in C. amara that also exhibit selection signatures in A. arenosa (13 in total) passed our data quality 281 criteria and were included in our analyses. Three showed any signal at all by fineMAV analysis: PDS5b 282 harbours an unusually high three fineMAV outlier SNPs, although it is not a window based Fst outlier. 283 ASY3, which controls crossover distribution at meiosis, has only one fineMAV 1% outlier polymorphism. 284 Error! Bookmark not defined.Finally, a regulator of endoreduplication, CYCA2;3, also harbours a 285 single fineMAV 1% outlier SNP in C. amara, although it was not included in the Fst window analysis 286 (number of SNPs < 20). However, upon inspection of Fst values of the (unusually low) 7 SNPs in the 287 window overlapping this gene, the selection signal in CYCA2;3 would be high (mean Fst = 0.55). Thus, 288 while we detect varying signal in these three meiosis-related genes following WGD, we do not see 289 signals of selection in the conspicuous set of interacting crossover-controlling genes that were obvious 290 in A. arenosa (14). 291

292
Meiotic stability in C. amara. Despite our broad overall analysis of selection in C. amara, as well as a 293 targeted assessment of the particular meiosis genes, we did not detect a signal of selection in those 294 genes in C. amara (SI Appendix, Table S8). The C. amara autotetraploid is well-established lineage that 295 underwent significant niche expansion in nature (18), but we still wondered if a contrast in meiotic 296 behaviour underlies this difference in specific loci under selection. Therefore, we cytologically assessed 297 the degree of male meiotic stability in C. amara (Fig. 3A). This revealed a low degree of stability in both 298 C. amara cytotypes (mean proportion stable metaphase I cells in diploid maternal seed lines = 0.38 -299 0.69, n = 133 scored cells; in tetraploids = 0.03 -0.38; n = 348 scored cells; SI Appendix, Table S9). 300 However, the degree of meiotic stability was lower in autotetraploids compared to diploids (differing 301 proportion of stable to unstable meiotic cells for each ploidy; D = 62.7, df = 1, p < 0.0001, GLM with 302 binomial errors; Fig. 3B, SI Appendix, Table S9), which corresponds with the lack of selection signal in 303 crossover-controlling meiosis genes. Interestingly, we did find that the degree of stability was variable 304 within each cytotype, suggesting the existence of standing genetic variation that controls stability. In 305 contrast, higher frequencies of stable metaphase I cells (>80%) have been observed for diploid and 306 autotetraploid A. arenosa (9). This, together with the observation of frequent clonal spreading of C. 307 amara (49), indicates that the species has an ability to maintain stable populations, even under varying 308 efficiencies of euploid gamete production, thus perhaps decreasing the urgency to fully stabilise meiosis 309 in either cytotype. This, in turn, may have facilitated the establishment of the autotetraploid cytotype. 310

314
For a complete overview of all scored chromosome spreads see Figure S4. B, Distribution of meiotic stability 315 (calculated as proportion of stable and partly stable to all scored meiotic spreads) in diploid and autotetraploid 316 individuals of C. amara. *** -p < 0.001, GLM with binomial errors.

318
Evidence for process-level convergence. While we found no excess convergence at the level of 319 orthologous genes under selection, we speculated that convergence may occur nevertheless at the 320 level of functional processes. To test this, we used two complementary approaches: overlap of GO term 321 enrichment and evidence of shared protein function from interaction networks. First, of the 73 322 significantly (p < 0.05) enriched GO terms in A. arenosa (SI Appendix, Table S7), we found that five were 323 identical to those significantly enriched in C. amara, which is more than expected by chance (p < 0.001, 324 Fisher's exact test; Table 3). In addition, some processes were found in both species, but were 325 represented by slightly different terms, especially in the case of meiosis ("meiotic cell cycle" in C. amara, 326 "meiotic cell cycle process" in A. arenosa: Tables S4 and S7). Remarkably, the relative ranking of 327 enrichments of all five convergent terms was identical in both C. amara and A. arenosa (Table 3). This 328 stands in strong contrast to the fact that A. arenosa presented an obvious set of physically and 329 functionally interacting genes in the top two categories ('DNA metabolic process' and 'chromosome 330 organisation'), while the genes in these categories in C. amara are implicated in more diverse DNA 331 management roles. 332 333

339
Second, we sought for evidence that genes under selection in C. amara might interact with those found 340 under selection in A. arenosa, which would further support process-level convergence between the 341 species. Thus, we took advantage of protein interaction information from the STRING database, which 342 provides an estimate of proteins' joint contributions to a shared function (50). For each C. amara WGD 343 adaptation candidate we searched for the presence of STRING interactors among the A. arenosa WGD 344 adaptation candidates, reasoning that finding such an association between candidates in two species 345 may suggest that directional selection has targeted the same processes in both species through 346 different genes. Following this approach, we found that out of the 229 C. amara WGD adaptation 347 candidates, 90 were predicted to interact with at least one of the 452 WGD adaptation candidates in 348 A. arenosa. In fact, 57 likely interacted with more than one A. arenosa candidate protein ( Fig. 4

and SI 349
Appendix, Table S10). This level of overlap was greater than expected by chance (p = 0.001 for both 350 "any interaction" and "more-than-one interaction", as determined by permutation tests with the same 351 database and 1000 randomly generated candidate lists). 352 354 Plots show C. amara candidate genes in blue and STRING-associated A. arenosa candidate genes in grey. We 355 used only medium confidence associations and higher (increasing thickness of lines connecting genes indicates 356 greater confidence). A, meiosis-and chromatin remodelling-related genes. B, ion transport-related genes.

358
Several large STRING clusters were evident among WGD adaptation candidates in C. amara and A. 359 arenosa (Fig. 4). The largest of these clusters centre on genome maintenance, specifically meiosis and 360 chromatin remodelling (Fig 4A), and ion homeostasis (especially K+ and Ca2+), along with stress (ABA) 361 signalling (Fig 4B), consistent with the results of GO analysis. Taken together, both STRING and GO 362 analyses support our hypothesis of functional convergence of these processes following WGD in C. Given the expected shared challenges attendant to WGD in C. amara and A. arenosa, we hypothesised 367 at least partially convergent evolutionary responses to WGD. While we found obvious convergent 368 recruitment at the level of functional processes, we did not detect excess convergence at the genic 369 level. This was consistent with the probable absence of shared standing variation between these 370 species (51), which are 17 million years diverged. Nevertheless, we note that if any shared variation has 371 persisted, it was not selected upon convergently in both young autotetraploids, thus strengthening the 372 conclusion that the genes selected in response to WGD are not highly constrained between these two 373 species. The most prominent difference we observed here is the lack of an obvious coordinated 374 evolutionary response in genes stabilizing early meiotic chromosome segregation in C. amara, relative 375 to the striking coevolution of physically and functionally interacting proteins governing crossover 376 formation in A. arenosa. This could be explained to some extent by the observation that in C. amara 377 both diploids and autotetraploids are not uniformly meiotically stable, and autotetraploids may 378 therefore enjoy a less strict reliance on the generation of a high percentage of euploid gametes. This 379 may allow the decoupling of crossover reduction from broader changes across meiosis and other 380 processes we observe. This is not to say that we see no signal of WGD adaptation in C. amara: factors 381 governing timing during later meiosis, especially the exit from meiotic divisions as evidenced by the 382 interacting trio of SMG7, SDS and MS5, along with other chromatin remodelling factors and DNA repair-383 related proteins, such as MSH6 and DAYSLEEPER give very strong signals. The convergent functions we 384 did detect (other meiosis processes, chromosome organisation/chromatin remodelling, ABA signalling 385 and ion transport) provide first insights into the salient challenges associated with WGD. 386

387
We conclude that evolutionary solutions to WGD-associated challenges vary from case to case, 388 suggesting minimal constraint. This may explain how many species manage to thrive following WGD 389 and, once established as polyploids, experience evolutionary success. In fact, we envision that the 390 meiotic instability experienced by some WGD lineages, such as C. amara, could serve as a diversity-391 generating engine promoting large effect genomic structural variation, as has been observed in 392 aggressive polyploid gliomas (3). 393 394

395
Reference Genome Assembly and Alignment 396 We generated a de novo assembly using the 10x Genomics Chromium approach. In brief, a single diploid 397 individual from pop LUZ (SI Appendix, To isolate genomic regions subjected to directional selection acting specifically between diploids and 439 autotetraploids, we sampled a set of two diploid and two tetraploid populations (Fig. 1C). We used 440 comparisons between populations of the same ploidy to constitute a null model for shared 441 heterogeneity in genetic differentiation arising through processes unrelated to WGD (following an 442 approach successfully applied in (21)).

444
Library preparation and sequencing 445 We extracted DNA in triplicate from 25 individuals for each of the following populations: CEZ (4x), PIC 446 (4x), VKR (2x), and LUZ (2x). All plants used for DNA extraction were verified for expected ploidy by flow 447 cytometry. We then pooled samples of each population, constructed Illumina Truseq libraries 448 (Illumina), and sequenced them on an Illumina NextSeq at a 150 base pair, paired-end specification.

450
Data preparation, alignment, and genotyping 451 Fastq files from the two runs were combined and concatenated to give an average of 30.5 million reads 452 per sample. Data processing steps are given in SI Appendix, Methods, along with variant calling steps.

454
Population genetic structure 455 We first calculated genome-wide between-population metrics (Nei's Fst (59) and allele frequency 456 difference). We calculated allele frequencies (AF) as the average AF of all the pools. The AF in individual 457 pools has been calculated as the fraction of the total number of reads supporting the alternative allele 458 (60). We used the python3 PoolSeqBPM pipeline, designed to input pooled data 459 (https://github.com/mbohutinska/PoolSeqBPM). Then we inferred relationships between populations 460 as genetic distances calculated over putatively neutral four-fold degenerate sites using principal 461 component analysis (PCA) implemented in adegenet (61).

463
Window-based selection scan using a quartet design 464 We performed a window-based Fst (59) scan for directional selection in C. amara, taking advantage of 465 quartet of two diploid and two autotetraploid populations (Fig. 1C). Using such quartet design, we 466 identified top candidate windows for selective sweeps associated with ploidy differentiation, while 467 excluding differentiation patterns private to a single population or ploidy-uninformative selective 468 sweeps. To do so, we calculated Fst for 1 kb windows with minimum 20 SNPs for all six population pairs 469 in the quartet (Fig. 1C) and ranked windows based on their Fst value. We excluded windows which were 470 top 1% outliers in diploid-diploid (dd in Fig. 1C) or autotetraploid-autotetraploid (tt) populations 471 contrasts, as they represent variation inconsistent with diploid-autotetraploid divergence but rather 472 signal local differentiation within a cytotype. Next, we assigned ranks to each window based on the Fst 473 values in four diploid-autotetraploid contrasts and identified windows being top 1% outliers of 474 minimum rank sum.

476
To account for possible confounding effect of comparing windows from genic and non-genic regions, 477 we calculated the number of base pairs overlapping with any gene within each window. There was not 478 any relationship between the proportion of genic space within a window and Fst (Pearson's R 2 = −0.057, 479 Figure S3), indicating that our analyses were unaffected by unequal proportion of genic space in a 480 window. 481 In A. arenosa, we performed window-based Fst scan for directional selection using the same criteria as 482 for C. amara (1kb windows, min 20 SNPs per window). We did not use the quartet design as the range-483 wide dataset of 80 diploid and 40 autotetraploid individuals drawn from many populations assured 484 power to detect genomic regions with WGD-associated differentiation.

486
FineMAV 487 We adopted the Fine-Mapping of Adaptive Variation (fineMAV (22)), and modified it to fit the resources 488 available for reference genome of C. hirsuta. Specifically, we replaced CADD, the functional score 489 available for amino acids in human reference (22,62) , by the Grantham score (63), which is a purely 490 theoretical amino acid substitution value, encoded in the Grantham matrix, where each element shows 491 the differences of physicochemical properties between two amino acids. Details on FineMAV data 492 processing are given in SI Appendix, Methods.

494
Arabidopsis arenosa population genomic dataset 495 We complemented our analysis of adaptation to WGD in C. amara with analysis of A. arenosa, based 496 on an expanded sampling (8) relative to the original A. arenosa WGD adaptation studies (13,14). We 497 first aligned the short read sequences to the A. lyrata reference genome, called variants and filtered as 498 previously using the Genome Analysis Toolkit (GATK 3.5 and 3.6 (64)). We used a subset of the dataset 499 consisting of 80 diploid individuals (samples selected based on the highest mean depth of coverage) 500 and 40 tetraploid individuals from populations unaffected by secondary introgression from diploid 501 lineages (8). Such sub-sampling gave us a balanced number of 160 high-quality haploid genomes of 502 each ploidy suitable for selection scans. Finally, we filtered each subsampled dataset for genotype read 503 depth > 8 and maximum fraction of missing genotypes < 0.5 in each lineage. We calculated Fst using 504 python3 ScanTools pipeline (github.com/mbohutinska/ScanTools_ProtEvol). All subsequent analyses 505 were performed following the same procedure as with C. amara data. 506 507 GO enrichment analysis 508 To infer functions significantly associated with directional selection following WGD, we performed gene 509 ontology enrichment of gene list using the R package topGO (65), using A. thaliana orthologs of C. 510 amara/A. lyrata genes, obtained using biomaRt (66). We used Fisher's exact test with conservative 511 'elim' method, which tests for enrichment of terms from the bottom of the GO hierarchy to the top and 512 discards any genes that are significantly enriched in a descendant GO terms (67). We used 'biological 513 process' ontology with minimum node size 150 genes. 514 515 Protein associations from STRING database 516 We searched for association among C. amara and A. arenosa candidate genes using STRING (50)  517 database. We used multiple proteins search in A. thaliana, with text mining, experiments, databases, 518 co-expression, neighbourhood, gene fusion and co-occurrence as information sources. We used 519 minimum confidence 0.4 and retained only 1st shell associations (proteins that are directly associated 520 with the candidate protein: i.e., neighbouring circles in the network). 521 522 Quantifying convergence 523 We considered convergent candidates all candidate genes or significantly enriched GO categories that 524 overlapped across both species. Convergent candidate genes had to be members of the same 525 orthogroups (58). To test for higher than random number of overlapping items we used Fisher's Exact 526 Test for Count Data in R (68). 527

555
The authors thank Veronika Konečná for assistance with the map figure and Doubravka Požárová and 556 Paolo Other supplementary materials for this manuscript include the following:

29
Datasets S1 to S12 30 mapped to our C. amara assembly, while only 74.5% to C. hirsuta. We retained only the alignment to 71 C. amara for all analysis. Using the picard software tool (v. 1.134) (5), first duplicate reads were 72 removed via 'MarkDuplicates' followed by the addition of read group IDs to the bam files via 73 'AddOrReplaceReadGroups'. Finally, to handle the presence of indels, GATK (v. 3.6.0) (6) was used to 74 realign reads to the C. amara assembly via 'RealignerTargetCreator' and 'IndelRealigner'. 75 76 Variant Calling 77 Text files describing sample populations and ploidy were prepared, and variants called for the 12 bam 78 files using Freebayes (v. 1.1.0.46)(7) to generate a single VCF output. Due to working with pooled (high 79 ploidy) samples, Freebayes was run with '--pooled-discrete' (assumes samples result from pooled 80 sequencing). In addition, the software was restricted to biallelic sites ('--use-best-n-alleles 2') and indel 81 sites were excluded ('--no-indels'). The VCF was filtered via bcftools (v 1.8) (8) to remove sites where 82 the read depth was < 10 or greater than 1.6x the second mode (determined as 1.6 x 31 = 50, Figure S2).