Genes derived from ancient polyploidy have higher genetic diversity and are associated with domestication in Brassica rapa

Many crops are polyploid or have a polyploid ancestry. Recent phylogenetic analyses have found that polyploidy often preceded the domestication of crop plants. One explanation for this observation is that increased genetic diversity following polyploidy may have been important during the strong artificial selection that occurs during domestication. To test the connection between domestication and polyploidy, we identified and examined candidate genes associated with the domestication of the diverse crops of Brassica rapa. Like all “diploid” flowering plants, B. rapa has a diploidized paleopolyploid genome and has experienced many rounds of whole genome duplication (WGD). We analyzed transcriptome data from more than a hundred cultivated B. rapa accessions. Using a combination of approaches, we identified more than 3,000 candidate genes associated with the domestication of four major B. rapa crops. Consistent with our expectation, we found that the candidate genes were significantly enriched for genes derived from the Brassiceae mesohexaploidy. We also observed that paleologs contained significantly more genetic diversity than non-paleologs, suggesting that elevated genetic variation may explain why paleologs are enriched among domestication candidate genes. Our analyses demonstrate the key role of polyploidy in the domestication of B. rapa and provide support for its importance in the success of modern agriculture.


Introduction
Polyploidy, or whole genome duplication (WGD), has long been associated with crop domestication and diversity [1-8]. Many desirable crop traits, such as larger seed size, greater stress tolerance, and increased disease resistance, are often attributed to polyploidy [2,9]. A recent phylogenetic analysis found that domesticated plants have experienced significantly more polyploidy than their wild relatives [10]. Polyploidy often precedes domestication, and crops are nearly twice as likely to be domesticated in lineages with a relatively recent WGD compared to those without [10]. Among the potential explanations for the relationship between polyploidy and domestication, the expanded genetic diversity and plasticity of polyploid plants may be especially advantageous during domestication and crop improvement [11-15]. Analyses in yeast have shown that polyploid lineages not only have higher genetic diversity but also adapt to new environments faster than their lower-ploidy relatives [16]. Similarly, the niches of polyploid plants evolve faster than those of their diploid relatives [17]. These features may collectively give polyploids unique advantages over diploids during domestication and the global spread of crops that occurred with human population expansion.
Although nearly 30% of plant species are recent polyploids, all flowering plants are paleopolyploids with varying histories of WGD [18]. Given that the genetic consequences of polyploidy play out over time as genomes diploidize and paralogs fractionate [13,19,20], we may expect that the effects of polyploidy extend to diploidized species. Here, we sought to test whether past polyploidy is associated with increased diversity and domestication in the crops of Brassica rapa. As in all flowering plants, the genome of B. rapa has been multiplied and fractionated many times over. The most recent polyploidization in the ancestry of B. rapa was a mesohexaploidy that occurred approximately 9-28 MYA [21-27]. Further, B. rapa has been domesticated into many different types of crops across Europe and Asia. These include turnips, oil seeds, pak choi, Chinese cabbage, and mustard seeds. Many researchers have suggested that there is a connection between the mesohexaploidy and the diversity of B. rapa crops [26,28], but the relationship has never been explicitly tested. Using recently sequenced transcriptomes from a diverse array of B. rapa accessions [29], we tested whether polyploid-derived regions of the genome are enriched with candidate genes associated with domestication. We also compared genetic variation in the polyploid vs non-polyploid derived regions of the B. rapa genome. Given the frequency of ancient polyploidy and its contribution to the evolution of plants, our analyses demonstrate the key role of polyploidy in the domestication of B. rapa and provide support for its importance in the success of modern agriculture.
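As an illustration of what testing for enrichment can look like in practice, the following is a minimal sketch of a one-sided hypergeometric test. It is an assumption made for illustration, not necessarily the test used in this study; the function name `enrichment_p_value`, its parameters, and the toy counts are hypothetical and are not values from our data.

```python
from math import comb


def enrichment_p_value(n_genes, n_paleologs, n_candidates, n_candidate_paleologs):
    """One-sided hypergeometric test for enrichment.

    Returns P(X >= n_candidate_paleologs), where X is the number of paleologs
    in a random draw (without replacement) of n_candidates genes from a genome
    of n_genes genes, n_paleologs of which are paleologs.
    """
    if not 0 <= n_paleologs <= n_genes:
        raise ValueError("n_paleologs must lie between 0 and n_genes")
    if not 0 <= n_candidates <= n_genes:
        raise ValueError("n_candidates must lie between 0 and n_genes")
    max_overlap = min(n_paleologs, n_candidates)
    if not 0 <= n_candidate_paleologs <= max_overlap:
        raise ValueError("n_candidate_paleologs is outside the possible range")

    # Number of ways to draw n_candidates genes from the whole genome.
    total_draws = comb(n_genes, n_candidates)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_paleologs, k) * comb(n_genes - n_paleologs, n_candidates - k)
        for k in range(n_candidate_paleologs, max_overlap + 1)
    )
    return tail / total_draws


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only, NOT values from this study.
    p = enrichment_p_value(
        n_genes=1000, n_paleologs=400, n_candidates=50, n_candidate_paleologs=30
    )
    print(f"one-sided enrichment p-value: {p:.3g}")
```

A small p-value would indicate that paleologs are over-represented among the candidate genes relative to their share of the genome. Exact integer arithmetic via `math.comb` keeps the sketch free of external dependencies.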

Partitioning the B. rapa genome into paleologs vs non-paleologs
To test the contribution of paleopolyploidy to the domestication of Brassica rapa, we used an integrated approach to classify genes as paleologs (genes derived from the Brassica mesohexaploidy) or non-paleologs. An [...] genes in the European-Central Asian group. On average, 70% of these genes were found in only one crop (Fig. 3a), indicating that many of these genes may have swept during the putative independent domestication of each crop. We also used [...] Similarly, most of the genes found to be associated with recent positive selection or differential gene expression were unique to a particular crop lineage, as expected with independent domestication and differentiation of these distinct crops.
Across all of the analyses, only five genes were repeatedly present in tests among one or more of the crops (Fig. 3d and Supplementary [...]). [...] we also developed a list of candidate genes from the literature. We focused on studies published over the last 10 years that identified genes through other approaches, such as fine mapping or bulk segregant analysis, to better establish a causal relationship between loci and crop traits. In total, we identified 40 candidate genes from the literature that fit these criteria (Supplementary Table 5). Many of these genes are associated with leaf and seed color variation, clubroot resistance, and cuticular wax biosynthesis. Notably, 15 of these genes were recovered in our candidate gene scans (Supplementary Table 5). Four of these genes were identified in our selective sweep and differential gene expression analyses. Mapping studies previously identified these genes as being associated with leaf color variation (Bra006208) [41], cuticular wax biosynthesis (Bra011470 and Bra032670) [42,43], and clubroot resistance (Bra019410) [44]. For other loci, mapping studies have identified a small collection of candidate genes in target regions.
For example, Rcr5 is a gene of major effect for clubroot resistance in Brassica rapa. A recent bulk segregant analysis and fine mapping study found eight genes in the Rcr5 target region of B. rapa [45]. In our analyses, three genes in the center of that region were found to be significantly differentially expressed. These results suggest that future mapping studies in B. rapa may be able to leverage our candidate gene lists to improve gene identification.
Why did artificial selection preferentially target paleologs? One possible explanation is that these genes may harbor more genetic diversity due to their paralogous history over the past 20 million years. To test this hypothesis, we examined the nucleotide diversity per gene across the B. rapa genome. Only uniquely mapped reads were used to estimate nucleotide diversity, to minimize error from incorrectly mapped reads. We found that the genes derived from the mesohexaploidy had, on average, nearly four times the nucleotide diversity of the non-paleolog fraction of the genome (Fig. 5): the mean nucleotide diversity of paleologs was 0.408 × 10⁻³, compared with 0.111 × 10⁻³ for non-paleologs, a ratio of approximately 3.7. Further, we observed that both the nonsynonymous and synonymous diversity of the paleologs were higher than those of the non-paleologs. The increased genetic diversity of paleologs in B. rapa may have been important for the rapid response of these plants to artificial selection during domestication.
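For readers unfamiliar with the statistic, nucleotide diversity is conventionally defined as the average number of pairwise differences per site among sampled sequences. The sketch below illustrates that textbook definition on a toy alignment. It assumes equal-length, already aligned sequences for a single gene, and it is not the pipeline used to produce the estimates above, which were computed from mapped reads; the function name `nucleotide_diversity` and the example sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for aligned, equal-length sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    # Count mismatched sites over every unordered pair of sequences.
    total_differences = 0
    for a, b in combinations(seqs, 2):
        total_differences += sum(1 for x, y in zip(a, b) if x != y)

    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


if __name__ == "__main__":
    # Hypothetical toy alignment for one gene, not data from this study.
    toy_gene = ["ACGT", "ACGA", "ACGT"]
    print(f"pi = {nucleotide_diversity(toy_gene):.4f}")
```

Computing this value per gene and then averaging within the paleolog and non-paleolog classes gives class means of the kind compared above. This simple version treats every character mismatch as a difference, so gaps or ambiguous bases would need to be handled before use on real data.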

Discussion
Our results establish a connection between ancient polyploidy and [...]

Candidate genes from the literature. To confirm our findings, we compared our observations of paleolog enrichment with a list of candidate genes from published studies of B. rapa. We surveyed the literature for fine mapping and bulk segregant analyses of B. rapa that mapped traits to one or a few candidate genes. These studies and the candidate genes are listed in Supplemental Table 6.
Overall, our survey identified 40 candidate genes in B. rapa that have been [...]
[...] whereas each row represents one gene with significantly different expression between the focal group and control group.
Table S5: Detailed information of the five Brassica rapa domestication candidate genes identified in all three of our genome scan approaches.
Annotation information was obtained from the Brassica Database (brassicadb.org/brad/). Brassica rapa gene IDs are given for each candidate gene. Paleolog status is indicated as Y (yes) or N (no). If the candidate gene was also identified in our genome scans, the type of scan is indicated with S (SweeD) or D (Differential gene expression).