Homoeologous non-reciprocal translocation explains a major QTL for seed lignin content in oilseed rape (Brassica napus L.)

Oilseed rape is a major oil crop and a valuable protein source for animal and human nutrition. Lignin is a non-digestible, major component of the seed coat with a negative effect on the sensory quality, bioavailability and usage of oilseed rape’s protein. Hence, seed lignin reduction is of economic and nutritional importance. In this study, the major QTL for reduced lignin content found on chromosome C05 in the DH population SGDH14 x Express 617 was further examined. SGDH14 had lower seed lignin content than Express 617. Harvested seeds from an F2 population of the same cross were additionally field tested and used for seed quality analysis. The F2 population showed a bimodal distribution for seed lignin content. F2 plants with low lignin content had thinner seed coats compared to high lignin lines. Both groups showed a dark seed colour with a slightly lighter colour in the low lignin group, indicating that a low lignin content is not necessarily associated with yellow seed colour. Mapping of genomic long-reads from SGDH14 against the Express 617 genome assembly revealed a homoeologous non-reciprocal translocation (HNRT) in the confidence interval of the major QTL for lignin content. A homoeologous A05 region was duplicated and replaced the C05 region in SGDH14. As a consequence, several genes located in the C05 region were lost in SGDH14. Thus, a HNRT was identified in the major QTL region for reduced lignin content in the low lignin line SGDH14. The most promising candidate gene related to lignin biosynthesis on C05, PAL4, was deleted.

Key message
A homoeologous non-reciprocal translocation was identified in the major QTL for seed lignin content in the low lignin line SGDH14. The lignin biosynthetic gene PAL4 was deleted.

exchange between homoeologous chromosomes (crossover) and non-reciprocal exchange (non-crossover). Non-reciprocal exchanges known as homoeologous non-reciprocal translocations (HNRTs) are duplication/deletion events where an additional copy of a DNA sequence has replaced its homoeologous copy, leading to gain and loss of genetic material (reviewed in Gaeta and Pires (2010) and Mason and Wendel (2020)).

B. napus genome, genomic segments and genes can be found in multiple copies. Two copies of the A and C subgenomes which share the most recent ancestry are primary homologues, while the remaining ones are secondary homologues and paralogous to one or the other copy (Nicolas et al. 2007, Mason and Wendel 2020). It was demonstrated that regions of primary homology show a high rate of chromosomal rearrangements. This represents an up to 100-fold increase compared to the frequency of homologous recombination measured in euploid lines (Nicolas et al. 2007). Further, for B. napus a subgenome bias towards replacing larger C-subgenome segments by smaller, homoeologous A-subgenome segments was observed (Samans et

Having both adaptive and agronomic importance, HNRTs and HEs could play an important role in the breeding of improved allopolyploid crops.

Due to a growing demand for plant-based protein for food and feed (Ismail et al. 2020), the breeding aim in B. napus is increasing the seed oil and protein content (Si et al. 2003). However, in comparison to soybean meal, which is the most important vegetable protein source, oilseed rape meal has a lower protein content as well as a higher content of anti-nutritive fibre compounds (Nesi et al. 2008). Therefore, genetic reduction of anti-nutritive and non-digestible compounds, such as lignin, is required to develop genotypes with improved nutritional quality for the production of optimised vegetable protein products (Wittkop et al. 2009

corresponding oligonucleotides and applied PCR parameters are shown in Figure S1, Table S1 and S2 of Supplementary

Determination and molecular validation of HNRT border sequences
For the identification of the border sequences of the HNRT event, a k-mer approach was used. First, border-spanning reads of SGDH14 were divided into 15 bp k-mers. Afterwards, each k-mer was assigned to C05, A05, or considered unspecific. If a k-mer was present in both subgenomes (unspecific), it was excluded from further analyses, yielding bona fide subgenome-specific k-mers. These k-mers were then mapped back to the border-spanning read sequences to narrow down the exact position of the left and right substitution border. Finally, manual curation was used to extract subgenome-specific SNPs around the borders of the HNRT. These SNPs were further used for oligonucleotide design to validate the HNRT borders via Sanger sequencing (Table S1, Supplementary Information 1).
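
The k-mer assignment described above can be illustrated with a minimal Python sketch. It is not the script used in this study: the function names, the toy sequences and the short k in the example are assumptions made for illustration, reverse-complement k-mers are ignored, and k-mers found in neither reference are skipped in addition to the shared (unspecific) ones.

```python
def kmers(seq, k=15):
    """Yield (offset, k-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def kmer_set(seq, k=15):
    """Return the set of all k-mers occurring in seq."""
    return {kmer for _, kmer in kmers(seq, k)}


def classify_read(read, c05_ref, a05_ref, k=15):
    """Label each read k-mer as C05- or A05-specific.

    k-mers present in both references (unspecific) or in neither
    reference are not informative and are left out of the result.
    """
    c05 = kmer_set(c05_ref, k)
    a05 = kmer_set(a05_ref, k)
    labels = []
    for offset, kmer in kmers(read, k):
        in_c05, in_a05 = kmer in c05, kmer in a05
        if in_c05 and not in_a05:
            labels.append((offset, "C05"))
        elif in_a05 and not in_c05:
            labels.append((offset, "A05"))
    return labels


def border_interval(labels):
    """Return the offsets of the two specific k-mers flanking the first
    change of subgenome label, or None if the read is not chimeric."""
    for (pos0, lab0), (pos1, lab1) in zip(labels, labels[1:]):
        if lab0 != lab1:
            return pos0, pos1
    return None


# Toy example (hypothetical sequences, k shortened to 5 for readability)
c05_ref = "ACGTACGGTTCA"
a05_ref = "TTGCAAGCTTAG"
chimeric_read = "ACGTACGG" + "AAGCTTAG"  # C05 part followed by A05 part
labels = classify_read(chimeric_read, c05_ref, a05_ref, k=5)
print(border_interval(labels))
```

In this toy read the last C05-specific k-mer starts at offset 3 and the first A05-specific k-mer at offset 8, so the junction is bracketed by these two k-mers. On real reads the bracketed interval would still need manual inspection, as described above.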

Coverage analysis
For the genomic coverage analysis, BAM-derived coverage files were generated as described before (

and their amino acid and coding sequences can be found in Table S6 of Supplementary Information 4.

Development of a PAL specific marker
For the development of a marker, which can be applied in plant breeding, the subgenome-specific SNPs of the A05 and C05 PAL4 homoeologues were harnessed. By using oligonucleotides (Table S1, Supplementary Information 1), which flank the subgenome-specific SNPs in the second exon, a subgenome-specific marker for the presence of the A05 and/or C05 PAL4 homoeologue was developed.
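
The logic behind such a marker can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used in this study: the two homoeologue sequences are assumed to be already aligned to equal length, the bases observed at each position of the Sanger trace are assumed to be given as sets, and all names and sequences are hypothetical.

```python
def snp_positions(a05_aln, c05_aln):
    """Return alignment columns where the two homoeologues differ (gaps ignored)."""
    if len(a05_aln) != len(c05_aln):
        raise ValueError("aligned sequences must have equal length")
    return [
        i
        for i, (a, c) in enumerate(zip(a05_aln, c05_aln))
        if a != c and "-" not in (a, c)
    ]


def call_homoeologues(observed_bases, a05_aln, c05_aln):
    """Infer which homoeologues are present in an amplicon.

    observed_bases maps an alignment column to the set of bases seen there,
    e.g. {"G", "A"} for a double peak and {"G"} for a single peak.
    """
    present = set()
    for pos in snp_positions(a05_aln, c05_aln):
        seen = observed_bases.get(pos, set())
        if a05_aln[pos] in seen:
            present.add("A05")
        if c05_aln[pos] in seen:
            present.add("C05")
    return present


# Hypothetical aligned amplicon sequences with two subgenome-specific SNPs
a05 = "ACGTTGCA"
c05 = "ACATTGTA"
both_copies = {2: {"G", "A"}, 6: {"C", "T"}}  # double peaks at both SNPs
a05_only = {2: {"G"}, 6: {"C"}}               # only A05 bases observed
print(call_homoeologues(both_copies, a05, c05))
print(call_homoeologues(a05_only, a05, c05))
```

Because the oligonucleotides flank the SNPs rather than overlap them, both homoeologues are amplified together and the presence of each copy is read from the SNP positions in the sequenced amplicon.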

Phenotypic analysis of the F2 population derived from a cross of the low lignin line SGDH14 and high lignin line Express 617 reveals variation in fibre traits

SGDH14 had lower seed lignin content and higher contents of oil and the sum of oil and protein compared to Express 617 (Table 1). In the SGDH14 x Express 617 F2 population, namely SGEF2, the lignin content of seeds ranged from 3.6 to 13.8 %, hemicellulose from 1.6 to 7.9 % and cellulose from 12.5 to 17.7 % in the defatted meal, respectively (Table 1). Lignin showed a strong negative correlation to hemicellulose and a weak positive correlation to cellulose (Figure 1B). Particularly, the lignin content showed a bimodal, skewed distribution. The low and high lignin genotypes showed a 1:2 segregation (Figure 1A). Comparing the low and high lignin groups of the SGEF2 population, the low lignin group showed significantly higher contents of oil, protein in the defatted meal and hemicellulose, as well as a higher sum of oil and protein, compared to the high lignin group. The oil and protein content ranged from 38.7 to 47.1 % and from 17.1 to 22.3 %, respectively. The protein content in the defatted meal ranged from 31.4 to 37.4 % and the sum of oil and protein content varied between 60.6 and 66.1 %. The thousand-kernel weight (TKW) ranged from 3.8 to 6.8 g and was significantly higher in the low lignin group (Table 1). All those traits showed a weak negative correlation to lignin (Figure 1B). A positive correlation between lignin and darker seed colour as well as seed coat content was found (Figure 1B). The pigmentation of SGEF2 seed ranged from dark brown (7.0) to black (9.0) (Table 1). Transgressive segregation was observed for most of the traits in the F2 population. The high lignin group exhibited a significantly higher seed coat content (16.1 %) than the low lignin group (14.1 %; Table 1 and Figure 1C).
To summarise, the SGEF2 population was extensively phenotypically characterised and showed a large variation in seed lignin content, which was also identified in the parents, with SGDH14 possessing a low lignin content and Express 617 showing a high lignin content.

Thus, the QTL region is ~197 kbp bigger in the Express 617 genome assembly. To identify genomic differences located near or inside the major QTL between the parental genotypes, the genomic long-reads of SGDH14 were mapped against the Express 617 genome assembly. A ~208 kbp region ranging from 41,563 to 41,771 kbp on C05 located near the centre of the major QTL revealed no read coverage in SGDH14 (Figure 2; Figure S2 of Supplementary Information 2). Corresponding flanking reads of SGDH14 were identified to be chimeric, having a C05 and an A05 sequence part. By separating the chimeric reads into subgenomic k-mers (Figure S3, Supplementary Information 2), the genomic border sequences were narrowed down and validated via Sanger sequencing (Figure 2). Ultimately, these results showed that the homoeologous A05 sequence ranging from 27,121 kbp to 27,289 kbp was inserted between 41,563 kbp and 41,771 kbp on C05 in SGDH14 (Figure 2). Further, this homoeologous A05 locus was identified to be duplicated, as the average coverage of the A05 homoeologous locus was ~2-fold higher compared to the average coverage of the whole A05 chromosome (Figure S2, Supplementary Information 2). Therefore, the name A05'' is used for sequence features located within this duplicated homoeologous A05'' region (Figure 2). A homoeologous non-reciprocal translocation (HNRT) was thus identified in SGDH14.
Importantly, in the Express 617 genome assembly on chromosome A05 at position 27,213,244 to 27,213,343 a stretch of ambiguous bases was identified.
This is indicative of an assembly break that could not be resolved in the Express 617 genome assembly. In accordance, the ONT long-reads of SGDH14 revealed a highly repetitive region located at this position inside the HNRT region, which could not be spanned. Although border-spanning chimeric reads reaching into this repeat could be identified, their length was not sufficient to span this large repeat (Figure S3, Supplementary Information 2). As the region sizes were estimated based on the Express 617 genome assembly, irrespective of this repeat, they likely underestimate the true sizes. The nGBS data support all findings described above. In summary, in SGDH14 a ~208 kbp region on C05 of the major QTL is deleted and is substituted by a ~168 kbp duplication of the homoeologous A05'' sequence.

To analyse the association between the HNRT event and the lignin phenotype, we screened a second independent population for the presence or absence of the HNRT event. We used the SGDH14 x Express 617 DH population of Behnke et al. (2018). SGDH14 shows the low lignin phenotype and harbours a HNRT event in which a duplicated A05 chromosome segment replaces the homoeologous C05 region. Analysis of 60K Illumina SNP data revealed that several markers in the QTL region were scored as "failed" not only in SGDH14, but also in its ancestral genotypes Sollux and Gaoyou, as well as in SGEDH13 and Zheyou 50, but not in Express 617 and Adriana (Table S8, Table S9, Supplementary Information 5). Carefully studying the frequency distribution of markers scored as "failed" in the C05 region in the original SGDH14 x Express 617 DH genotyping table allowed a clear separation of DH lines with and without "failed" markers, i.e. with and without HNRT. Comparison of the DH lines with and without the HNRT event showed a clear, significant difference in several quality traits (Table 2).
The lignin content was significantly reduced in lines with the HNRT event, whereas hemicellulose and cellulose showed increased contents. Oil and protein content in defatted meal and the sum of oil and protein content were increased in the low lignin lines. The seed protein content was not different between the groups. These results were concordant with the results of the high and low lignin groups of the SGEF2 population (Table 1).

Figure S5, Supplementary Information 4). The respective gene was identified as PAL4 homologue by in silico gene prediction followed by a BLASTp search. The organ-specific expression analysis revealed that both BnaPAL4 homoeologues are expressed at much higher levels in developing seeds than the other identified paralogues of Express 617 (Table S7, Supplementary Information 4).
In SGDH14, two A05 PAL4 copies (A05 and A05'' PAL4) were identified by inspecting SNPs of the genomic and transcriptomic SGDH14 read mappings against the Express 617 genome assembly. In total, 16 SNPs (12 in exon No. 3, 3 in 3'UTR, 1 upstream of 5'UTR) revealed a ~50:50 ratio of reference vs alternative alleles (Figure 3, Figure S6 of Supplementary Information 4). Twelve of these SNPs are located in the third exon, of which six SNPs are identical to the C05 PAL allele, while the remaining six were specific for the new A05'' PAL4 allele as they differ from the A05 and C05 PAL4 allele (Figure S6, Supplementary Information 4). Some of these SNPs result in amino acid exchanges compared to the A05 PAL4 amino acid sequence, of which three are specific to the A05'' PAL4 (V586I, D596E, V638I), while the remaining two (E610G, V629G) are identical to the C05 PAL4 sequence.
Analysis of the putative promoter sequence using 1 kbp upstream of the translational start site showed only one SNP differentiating the A05'' PAL from the A05 PAL allele, while several subgenome-specific SNPs were identified (Figure S6, Supplementary Information 4). Finally, a marker for the presence or absence of the C05 PAL4 homoeologue was designed by using non-subgenome-specific oligonucleotides. Sequencing the PCR amplicons derived from the second exon showed the absence of C05-specific SNPs in SGDH14, while Express 617 revealed heterozygous SNPs for each homoeologue (Figure 3). In accordance with the genomic results stressing the loss of the C05 PAL4 homologue in SGDH14, the C05 PAL4 was not expressed in SGDH14 but in Express 617. The two A05 PAL4 homologues of SGDH14 revealed a ~1.5-fold higher expression compared to the combined expression of the C05 and A05 PAL4 homoeologues of Express 617 (Table S7, Supplementary Information 4).

2007). In the present study, the allotetraploid B. napus shows similar chromosomal rearrangements with a subgenome bias. In SGDH14 the 208 kbp large C05 chromosome segment was replaced by a smaller segment of at least 168 kbp of the homoeologous A05 chromosome. The exact size of this segment could not be determined since a large repeat in this sequence region was not spanned by the long-read sequencing data. In the Express 617 genome assembly this repeat is marked as a stretch of ambiguous bases (Lee et al. 2020). Although the Express 617 genome assembly is based on 54.5x coverage with Pacific Biosciences long-reads, linked reads, optical map-, and high-density genetic map data (Lee et al. 2020), this repeat cluster could not be resolved. With the long-read data of SGDH14 the repeat could also not be resolved due to the lack of spanning reads. To determine the exact size of the HNRT, longer reads which span the repeat cluster are necessary.
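
As a worked check of the segment sizes discussed here, the following Python sketch recomputes them from the Express 617 coordinates reported in the results (C05: 41,563 to 41,771 kbp; A05: 27,121 to 27,289 kbp). The helper names are assumptions, and the per-window depth values in the coverage example are hypothetical numbers chosen only to illustrate the ~2-fold criterion; they are not measurements from this study.

```python
def span_kbp(start_kbp, end_kbp):
    """Length of a genomic interval given its start and end in kbp."""
    return end_kbp - start_kbp


def fold_coverage(region_depths, chromosome_mean_depth):
    """Mean read depth of a region relative to the chromosome-wide mean."""
    region_mean = sum(region_depths) / len(region_depths)
    return region_mean / chromosome_mean_depth


# Coordinates from the Express 617 genome assembly as reported in the text
c05_deleted_kbp = span_kbp(41_563, 41_771)   # C05 segment without read coverage
a05_inserted_kbp = span_kbp(27_121, 27_289)  # homoeologous A05 segment
net_loss_kbp = c05_deleted_kbp - a05_inserted_kbp

print(c05_deleted_kbp, a05_inserted_kbp, net_loss_kbp)

# Hypothetical depths: a duplicated locus is expected at roughly twice
# the chromosome-wide average coverage.
print(fold_coverage([40, 42, 38], chromosome_mean_depth=20.0))
```

The sizes of 208 kbp and 168 kbp follow directly from the coordinates. Since the unresolved repeat inside the A05 segment is not counted, the 168 kbp value is a lower bound, and the 40 kbp difference between the two segments is correspondingly an upper bound.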
The major QTL C05 was genotyped and mapped previously with the Illumina Infinium Brassica 60K SNP array with a total number of 58,464 markers. The relevant region on C05 with the deleted fragment marked is shown in Table S8 along with the most likely and second most likely position of the SNP markers of Express 617 in Table S9

The DH lines with the HNRT had a significantly lower seed lignin content than the ones without HNRT (Table 2).

Method for QTL and HE detection
When did the HNRT event occur in SGDH14? To answer this question, the Illumina 60K SNP marker data of the parental genotypes Sollux and Gaoyou of SGDH14 were analysed. Based on the SNP marker data it became obvious that the same SNP markers gave "failed" results for the HNRT region on C05 in both parents. Furthermore, SGEDH13, an offspring of SGDH14 x Express 617, showed the same lignin QTL on C05 and the same "failed" results for the HNRT region on C05 (Yusuf et al. 2022). The same was found for the Chinese cultivar Zheyou 50, and Yusuf et al. (2022) discussed that Zheyou 50 may be an offspring from the SGDH14 x Express 617 cross, since it originates from the Zhejiang Academy of Agricultural Sciences like SGDH14, carries the same major QTL on C05 and is of canola quality. Although the HNRT event in Sollux, Gaoyou and Zheyou 50 was not confirmed by sequencing, it appears likely that the HNRT event occurred much earlier, which indicates that positive selection may have led to its enrichment. The nGBS data indicated the HNRT event for Sollux and Gaoyou, but the exact borders could not be determined. The HNRT in Sollux is potentially even bigger (~610 kbp) than in SGDH14. Yield testing of selected high and low lignin bulks of the crosses SGDH14 x Express 617 and of Adriana x SGEDH13 recently indicated an advantage of the low lignin over the high lignin bulks, since a higher yield could be detected (Holzenkamp et al. 2022).

As a consequence of genomic rearrangements, skewed marker segregation patterns often prevent standard mapping procedures from accurately localising QTL and causal genes in these regions (Stein et al. 2017). It was proposed to use SNP marker data from loci spanning deletions or duplications, which would be discarded in standard mapping procedures (

Similar results were determined in this study with the SGEF2 population.
The high and the low lignin groups of F2:3 seeds showed significant differences in seed coat content (Figure 1C).

The seed coat consists mainly of three layers from inside to outside: the mucous epidermal cells, the palisade and the endothelial layers (Moïse et al. 2005

and in aspen trees (Hu et al. 1999). Obviously, there is a crosstalk between the lignin and hemicellulose biosynthetic pathway, which seems not to be the case for cellulose.

The thinner the seed coat of B. napus cultivars, the larger the proportion occupied by the embryo, which leads to increasing contents of oil and protein (Slominski et al. 1999

All data generated in this study can be found under the ENA/NCBI Bioproject ID PRJEB55241. The applied scripts in this study are freely available on GitHub: Major_low_ADL_QTL (DOI: 10.5281/zenodo.6970026).