Conversion between 100-million-year-old duplicated genes contributes to rice subspecies divergence

Background Duplicated gene pairs produced by ancient polyploidy maintain high sequence similarity over a long period of time and may result from illegitimate recombination between homeologous chromosomes. The genomes of Asian cultivated rice Oryza sativa ssp. indica (XI) and Oryza sativa ssp. japonica (GJ) have recently been updated, providing new opportunities for investigating ongoing gene conversion events and their impact on genome evolution. Results Using comparative genomics and phylogenetic analyses, we evaluated gene conversion rates between duplicated genes produced by polyploidization 100 million years ago (mya) in GJ and XI. At least 5.19–5.77% of genes duplicated across the three rice genomes were affected by whole-gene conversion after the divergence of GJ and XI at ~ 0.4 mya, with more (7.77–9.53%) showing conversion of only portions of genes. Independently converted duplicates surviving in the genomes of different subspecies often use the same donor genes. The ongoing gene conversion frequency was higher near chromosome termini, with a single pair of homoeologous chromosomes, 11 and 12, in each rice genome being most affected. Notably, ongoing gene conversion has maintained similarity between very ancient duplicates, provided opportunities for further gene conversion, and accelerated rice divergence. Chromosome rearrangements after polyploidization are associated with ongoing gene conversion events, and they directly restrict recombination and inhibit duplicated gene conversion between homeologous regions. Furthermore, we found that the converted genes tended to have more similar expression patterns than nonconverted duplicates. Gene conversion affects biological functions associated with multiple genes, such as catalytic activity, implying opportunities for interaction among members of large gene families, such as NBS-LRR disease-resistance genes, contributing to the occurrence of the gene conversion. Conclusion Duplicated genes in rice subspecies generated by grass polyploidization ~ 100 mya remain affected by gene conversion at high frequency, with important implications for the divergence of rice subspecies. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07776-y.


Introduction
Homologous recombination provides a major source of genetic innovation (Kurosawa   We performed genomic colinearity and structure analysis and identified duplicated genes 157 generated by the WGD event common to grasses in the GJ, XI-MH63, and XI-ZS97 genomes. 158 For blocks containing more than four colinear genes, there were more duplicated genes in GJ 159 (3,314 pairs) than in XI-MH63 or XI-ZS97 (2,629 pairs and 2,889 pairs, respectively). We 160 identified 46, 18, and 10 homologous blocks with more than 10, 20, and 50 colinear gene 161 pairs in GJ, respectively. XI genomes had much shorter duplicated blocks, with fewer than 10 162 blocks possessing more than 50 colinear gene pairs (Supplemental Table 1). We also used a 163 bidirectional best BLAST homology search to identify homologous gene pairs residing in 164 paralogous regions because some pairs might have been removed from the colinearity 165 analysis. Finally, 3,256, 2,502, and 2,816 homologous gene pairs were identified in GJ, 166 XI-MH63, and XI-ZS97, respectively (Figure 1). Compared with GJ, the XI varieties had 167 fewer homologous genes because XI has experienced more chromosomal rearrangement 168 events. 169 We used colinearity and structure analysis of intergenomic homologous genes to infer 170 orthologous genes generated by the recent species divergence (Supplemental Table 1). 171 Colinearity analysis identified 19,089 orthologous gene pairs in 103 blocks between the GJ 172 and XI-MH63 genomes. Between GJ and XI-ZS97, there were 18,498 orthologous gene pairs 173 in 119 blocks. The two varieties of XI, XI-MH63 and XI-ZS97, showed better colinearity, 174 with 25,262 orthologous gene pairs between them in 146 blocks. We again performed a 175 bidirectional best BLAST homology search among the three genomes to identify additional 176 orthologous genes. There were 23,719 orthologous gene pairs between GJ and XI-MH63, and 177 23,056 orthologous gene pairs between GJ and XI-ZS97. Since XI-MH63 and XI-ZS97 are 178 more closely related, we identified 35,049 orthologous gene pairs between the genomes of 179 these varieties (Supplemental Table 2).

180
Homologous gene quartets 181 To detect possible gene conversion between homologous genes produced by WGD, we used 182 homology and colinearity information to identify homologous gene combinations for WGD 183 and species divergence, which we defined 'homologous gene quartets.' Assuming that the 184 genomes of two species ('O' and 'S') retain a pair of duplicated chromosomal generated in a 185 common ancestor through WGD, the paralogous genes O1 and O2 in species O and the 186 respective orthologous genes S1 and S2 in species S constitute a homologous gene quartet 187 (Figure 2A). Sequence similarity between orthologous gene pairs is more similar than that 188 between paralogous gene pairs if there is no gene conversion (or nonreciprocal recombination) 189 between the duplicated gene pairs after species divergence ( Figure 2B). However, if gene 190 conversion occurs between duplicated genes, we might find that the gene tree topology has a 191 different structure than expected (Figure 2C-E). Changes in the topological structure of the 192 gene tree can be determined from the similarity of homologous sequences in homologous 193 gene quartets. As the gene sequence may be converted in whole or in part, we used different 194 methods to infer whole-gene conversion (WCV) and partial-gene conversion (PCV) (see 195 Materials and Methods for details).

196
Based on colinearity information of intragenomic and intergenomic homologous genes, 197 we identified 2,788 quartets between GJ and XI-MH63, and 2,879 quartets between GJ and 198 XI-ZS97. Although XI-MH63 and XI-ZS97 are varieties of the same subspecies, relatively 199 few quartets (2,566) were identified between them, probably due mainly to differences in 200 gene loss after the three genomes diverged. By comparing the three genomes, we inferred a 201 possible ancestral gene content before divergence of 19,104. Rates of gene loss or 202 translocation were 6.13%, 13.31%, and 7.89% in GJ, XI-MH63, and XI-ZS97, respectively.

207
Gene conversion and occurrence patterns 208 We removed highly divergent sequences to reduce the possibility of inferring gene 209 conversion events from unreliable sequences (see Materials and Methods for details). After 210 this, 2,788 gene quartets were identified between GJ and XI-MH63, 2,879 quartets between 211 GJ and XI-ZS97, and 2,566 quartets between XI-ZS97 and XI-MH63 (Supplemental Table   212 2). We used two methods to infer gene tree topology, one based on synonymous nucleotide 213 substitution rate (Ks) as a similarity measure and the other based on amino acid identity ratio, 214 which we called whole-gene conversion type I and type II (WCV-I and WCV-II), respectively. 215 We used a combination of dynamic planning and phylogenetic analysis to infer possible 216 partial-gene conversion (PCV) events (Supplemental Table 3). Since paralogous gene pairs 217 may be identified in different quartets, we merged the paralogous gene pairs affected by gene 218 conversion in each genome. This gave us the gene conversion events of each genome after 219 the divergence of rice.

226
In XI-ZS97, 468 pairs (~14%) of paralogs had been converted: 185 pairs (5.69%) were 227 WCVs, comprising 8 pairs inferred by WCV-I and 177 pairs inferred by WCV-II. Another 228 310 pairs (9.53%) were PCVs, which was also more than the number of WCVs (Table 1) Zs12g0396.01, and one gene fragment from 335 to 462 bp was converted through one-way 231 genetic information transmission (or rearrangement) ( Figure 3A). We discovered that the 232 gene conversion rate in XI was significantly higher than that in GJ ( Figure 3B). By analyzing 233 topological changes in the gene trees reconstructed using homologous genes, we further  High-frequency on-going gene conversion 237 By comparing the similarity of homologous gene quartets between different genomes, we 238 inferred gene conversion events in the three genomes during different evolutionary periods.

239
Duplicated gene pairs produced ~100 mya were still being affected by gene conversion. In GJ, 240 we identified 398 pairs of paralogous genes that might have undergone gene conversion after 241 the divergence of rice ( Figure 3D). The amino acid identity of four (1.01%) pairs of 242 paralogous genes was > 99%, with Ks < 0.01 (Supplemental Figures 1 and 2). A relatively 243 large number of duplicated genes were affected by gene conversion in the two XI varieties, 244 XI-MH63 and XI-ZS97. In XI-MH63, we found 466 pairs of paralogous genes that might 245 have undergone gene conversion ( Figure 3D) after the divergence of rice subspecies; six 246 (1.29%) of these pairs of paralogous genes had > 99% amino acid identity between them and 247 Ks < 0.01 (Supplemental Figures 1 and 2). Similarly, we identified 471 pairs of paralogous 248 genes in XI-ZS97 that might have undergone gene conversion after GJ diverged from XI 249 ( Figure 3D), and six (1.27%) of these pairs of paralogous genes had > 99% amino acid 250 identity between them and Ks < 0.01 (Supplemental Figures 1 and 2). We identified small 251 synonymous and nonsynonymous nucleotide substitutions and high sequence identity 252 between duplicated gene pairs in which gene conversion had occurred, suggesting that gene 253 conversion may have occurred over a very short time.

254
Another striking indication was that 407 and 391 pairs of paralogous genes were 255 affected by gene conversion before the formation of XI-MH63 and XI-ZS97, respectively; 78 256 and 79 pairs of paralogous genes were converted after formation of the two varieties, 257 accounting for 16.7% and 16.6% of the total gene conversion, respectively ( Figure 3D). 258 Duplicated genes in XI-MH63 and XI-ZS97 sharing a homologous region showed nearly 99% 259 amino acid identity and 0.99 nonsynonymous nucleotide substitution rate (Ks) 260 (Supplemental Figures 1 and 2). These data suggest that gene conversion between 261 paralogous gene pairs is on-going and occurs at high frequencies in rice subspecies.  Table 4). This suggested that the 272 duplicated gene that had undergone gene conversion was usually present as a donor locus in  Table 5). In GJ, XI-MH63, and 284 XI-ZS97, gene conversions were clustered in the 2 Mb region at the termini of chromosomes 285 11 and 12, and the gene conversion rate was 74.60%, 67.11%, and 73.02%, respectively. This 286 suggests that gene conversion usually occurs at the termini of chromosomes. (Figure 1D). 287 The physical location of genes on chromosomes may influence the chance of gene 288 conversion. Gene conversion is usually found at the termini of chromosomes, where gene 289 density is high (Figure 1; Table 2). In GJ, 692 paralogs were located in the 2 Mb at the 290 termini of chromosomes and about 17.20% of the paralogs were converted. This was higher 291 than the gene conversion rate for the whole genome (12.09%). In XI-MH63, we found 584 292 paralogs in the 2 Mb at the termini of chromosomes, and approximately 25.34% showed gene 293 conversion, which was also higher than the gene conversion rate for the whole genome 294 (18.57%). In XI-ZS97, there were 675 paralogs located in the 2 Mb close to the termini of 295 chromosomes, of which about 20.59% had undergone gene conversion, which was higher 296 than the gene conversion rate for the whole genome (16.62%). We found that the physical  conversion rate in XI-MH63 (R 2 = 0.85, P-value < 0.01). There was also a significant positive 308 correlation between block number of the chromosomes and gene conversion rate in XI-ZS97 309 (R 2 = 0.75, P-value < 0.01) and GJ (R 2 = 0.74, P-value < 0.01) ( Figure 5B). 310 Correlation does not imply a direct factor leading to gene conversion. For this reason, 311 we further analyzed the relationship between block length and gene conversion rate on each 312 chromosome (Supplemental Table 7). We found that longer blocks had a higher gene   We could not determine whether converted genes evolve slowly based on the paralogs 332 themselves, since pairwise distances between paralogs are converted. However, Pn and Ps 333 were slightly larger between orthologous gene pairs affected by gene conversion than 334 between orthologs not showing gene conversion. This suggests that the orthologs in which 335 gene conversion has occurred have evolved faster than those not affected by gene conversion. 336 We used Ps and Pn for determining whether gene conversion was affected by 337 evolutionary selection pressure. The ratio of Pn/Ps reflects the selection pressure between 338 gene pairs during evolution. We compared the Pn/Ps ratio between genes subjected to 339 conversion and those with no conversion. The average Pn/Ps ratio for XI-MH63 gene 340 conversion was 0.41, and the average Pn/Ps ratio of non-converted paralogs was 0.48. This 341 indicates that converted genes were subject to purifying selection ( Table 3). The Pn/Ps ratios 342 for gene conversion in XI-ZS97 and GJ were also smaller than those for non-converted genes.

343
The selection pressure for gene conversion or no gene conversion did not change much.  Table 8). 352 We analyzed 761, 910, and 912 duplicated genes with gene conversion and 5,262, 5,224, and  (Table 4). Four secondary-level terms were significantly enriched at the level of 361 molecular function and biological processes, and accounted for about 30% of the 362 corresponding gene sets. For example, the number of catalytic activity genes and metabolic 363 process genes in the three genomes in which gene conversion occurred (31.4% -37.7%) was 364 significantly more than that in which no gene conversion occurred (26.6% -30.6%) (P-value 365 < 0.01). Similarly, binding genes and cellular process genes showed higher gene conversion 366 (27.4% -39.9%) than duplicated genes without gene conversion (24.6.6% -38.4%), 367 suggesting that they are more likely to be converted.  Table 9). Among these, we identified 462 NBS-LRR genes in GJ, were found on chromosomes 2 and 11. These NBS-LRR genes showed a pattern of proximal 392 localization and young origin in the three genomes, as well as similarity in gene conversion. 393 We found a positive correlation between NBS-LRR genes and converted genes in regions 394 with more than 1% of the NBS-LRR genes in the three genomes. This suggested that during 395 their evolution, NBS-LRR genes might have had many chances to interact with one another, 396 leading to gene conversion. (Figure 6D).    Whole-genome conversion (WCV) inference: Since paralogous genes arise before 505 species divergence, the similarity between orthologous gene pairs in two species should be 506 higher than the similarity between paralogous gene pairs. However, gene conversion events 507 change the similarity between gene pairs. The first whole-genome conversion inference                           A n d Z s me a n s X I -Z S 9 7 ; Mh me a n s X I -MH6 3 ; Oj me a n s GJ .  1 1 -1 2 ) .