Genomic rearrangements generate hypervariable mini-chromosomes in host-specific lineages of the blast fungus

Supernumerary mini-chromosomes–a unique type of genomic structural variation–have been implicated in the emergence of virulence traits in plant pathogenic fungi. However, the mechanisms that facilitate the emergence and maintenance of mini-chromosomes across fungi remain poorly understood. In the blast fungus Magnaporthe oryzae, mini-chromosomes have been first described in the early 1990s but, until very recently, have been overlooked in genomic studies. Here we investigated structural variation in four isolates of the blast fungus M. oryzae from different grass hosts and analyzed the sequences of mini-chromosomes in the rice, foxtail millet and goosegrass isolates. The mini-chromosomes of these isolates turned out to be highly diverse with distinct sequence composition. They are enriched in repetitive elements and have lower gene density than core-chromosomes. We identified several virulence-related genes in the mini-chromosome of the rice isolate, including the polyketide synthase Ace1 and the effector gene AVR-Pik. Macrosynteny analyses around these loci revealed structural rearrangements, including inter-chromosomal translocations between core- and mini-chromosomes. Our findings provide evidence that mini-chromosomes independently emerge from structural rearrangements of core-chromosomes and might contribute to adaptive evolution of the blast fungus. Author summary The genomes of plant pathogens often exhibit an architecture that facilitates high rates of dynamic rearrangements and genetic diversification in virulence associated regions. These regions, which tend to be gene sparse and repeat rich, are thought to serve as a cradle for adaptive evolution. Supernumerary chromosomes, i.e. chromosomes that are only present in some but not all individuals of a species, are a special type of structural variation that have been observed in plants, animals, and fungi. Here we identified and studied supernumerary mini-chromosomes in the blast fungus Magnaporthe oryzae, a pathogen that causes some of the most destructive plant diseases. We found that rice, foxtail millet and goosegrass isolates of this pathogen contain mini-chromosomes with distinct sequence composition. All mini-chromosomes are rich in repetitive genetic elements and have lower gene densities than core-chromosomes. Further, we identified virulence-related genes on the mini-chromosome of the rice isolate. We observed large-scale genomic rearrangements around these loci, indicative of a role of mini-chromosomes in facilitating genome dynamics. Taken together, our results indicate that mini-chromosomes facilitate genome rearrangements and possibly adaptive evolution of the blast fungus.


Introduction
Genomes of plant pathogens are highly dynamic and typically exhibit an architecture that facilitates rapid adaptation to their hosts. Since the rise of genome sequencing it became evident that plant pathogen genomes are often structured to accommodate high genetic diversification rates at virulence-related loci while maintaining relative stability in housekeeping regions, a phenomenon that shaped the term "two-speed genome" [1]. Since then, the two-speed genome concept has been widely documented in a number of plant pathogenic species [2,3]. Interestingly, various types of genome architecture have been observed in different species. These include effector gene clusters [4,5], lineage-specific genomic regions that are rich in transposable elements [6][7][8][9][10], or enrichment of virulence related genes in specific genomic regions, e.g. unstable telomeric and sub-telomeric regions [11]. Typically, these genomic compartments display higher rates of adaptive mutations compared to the rest of the genome [12]. In addition to signatures of single nucleotide polymorphisms (SNPs) indicative of positive selection and presence/absence polymorphisms, structural variation is common in pathogen populations and ranges from copy number variations of single genes to chromosome-scale rearrangements [13,14]. Extreme cases of genomic rearrangement are large-scale, chromosome length variations and the presence of isolate-specific, supernumerary chromosomes (syn. B-, accessory-, conditionally dispensable, mini-chromosomes). These are usually small, non-essential chromosomes that occur in addition to the regular set of conserved chromosomes within a species and have been described in animals, plants, and fungi [14,15]. Supernumerary chromosomes are present at different frequencies in natural populations of eukaryotic plant pathogens [14,16]. In some fungal species, supernumerary chromosomes have been directly implicated in the emergence of new virulence traits, underpinning the importance of understanding their role in evolution and pathogen adaptation [17][18][19]. However, the diversity of supernumerary chromosomes across plant pathogens and their contribution to genome plasticity is still poorly known.
Supernumerary chromosomes share common features that distinguish them from corechromosomes. Although they are variable in size, they tend to be smaller than corechromosomes with their size ranging from ~400 kb to 3 Mb in plant pathogenic fungi [8,14]. Yet, despite their small size, supernumerary chromosomes can contribute up to 15% to the total genome in certain species. Their number is also variable with some plant pathogenic fungi containing up to eight supernumerary chromosomes in addition to the corechromosome set [9]. Dynamic loss or gain of supernumerary chromosomes has been observed especially under stress conditions indicating that supernumerary chromosomes are major drivers of genome plasticity [20]. In addition to frequent genomic rearrangements, supernumerary chromosomes often do not follow Mendelian inheritance. They tend to be meiotically unstable, and thus, frequently lost [21]. However, in some cases meiotic gene drives have been observed increasing their potential to be inherited and potentially explaining their abundance in pathogen populations [22]. Supernumerary chromosomes are usually gene poor and repeat rich compared to the corechromosomes, and form one illustration of the two-speed genome concept [12,23]. However, a clear association with adaptive evolution is not always evident as they do not necessarily carry virulence-related genes [9,24]. Nonetheless, in some plant pathogenic fungi, supernumerary chromosomes are directly implicated in virulence [17,18]. The origin of supernumerary chromosomes is still debated but there is evidence for segmental duplications from core-chromosomes (or ancient core-chromosome duplication followed by partial chromosome loss) and horizontal chromosome transfer [19,[25][26][27]. Interestingly, supernumerary chromosomes can be transferred between isolates independently of the core genome and can alter the virulence spectrum of plant pathogens [19,27]. It is thus possible that supernumerary chromosomes facilitate gene flow in natural populations leading to new pathotypes.
Plant pathogens are ubiquitous in the environment and can cause severe damage to both cultivated and wild plant species [28][29][30][31]. Plant pathogens are often specialized on a specific host species or taxon. At the center of the co-evolutionary dynamics between pathogens and plants are effector proteins, i.e. secreted proteins that manipulate host processes to facilitate infection and colonization [32][33][34][35].In return, host plants have evolved immune receptors that can detect conserved molecular patterns and effector proteins to defend against invading pathogens [36,37]. This generally leads to fast-paced co-evolution between pathogens and their host plants that often follows arms-race dynamics where the frequency of adaptive mutations rises quickly in pathogen populations [16].
The blast fungus M. oryzae (Syn. Pyricularia oryzae) causes blast disease, one of the most devastating crop diseases worldwide resulting in yield losses in rice and wheat that make it a threat to global food security [28,29,38,39]. Despite its Linnaean binomial name, M. oryzae is a multihost pathogen that can infect more than 50 cultivated and wild grass species. Population genomics of M. oryzae revealed that the species is formed by an assemblage of differentiated lineages that are associated with particular host taxa, such as important cereals like rice (Oryza sativa), finger millet (Eleusine coracana), wheat (Triticum aestivum), and foxtail millet (Setaria italica), as well as weeds such as Indian goosegrass (Eleusine indica) [40,41]. Records of rice blast disease in China date back to the early 17 th century and until today it is recognized as one of the most threatening and widely distributed rice diseases [39]. Recent population genetics studies revealed that the rice-infecting lineage of M. oryzae consists of both a recombining population and multiple, clonally expanded lineages [42][43][44]. In Europe, the rice blast fungus population consists of only one of three major clonal lineages and it is possible that mating type isolation led to local asexual expansions. Another impactful blast disease is wheat blast. In the mid 1980s the disease emerged on wheat plants in Brazil and has since spread across large regions of South America and, more recently, South Asia, demonstrating the ability of M. oryzae to rapidly undergo host-range expansions and global pandemics [45][46][47].
Although signatures of gene flow have been observed within and between lineages [41], M. oryzae is thought to predominantly propagate asexually in agricultural settings. Given that genetically differentiated lineages tend to be associated with particular host genera, selection pressure imposed by the host plant is probably the main driver of adaptive evolution [41]. Adaptation to a specific host can be accompanied by gain or loss of effector genes that define the pathogen host range highlighting the importance of structural genomic variation [35,[48][49][50]. The degree to which genome architecture facilitates structural variation and even gene flow is a fascinating and still poorly understood question [6,11,23,51].
M. oryzae mini-chromosomes have first been described in the early 1990s, when studies on karyotype diversity revealed that large chromosomal rearrangements occur frequently within and between clonal lineages [52,53]. More recently, Chuma et al. [11] demonstrated that the effector gene AVR-Pita underwent multiple translocation events and proposed that genomic location of effector genes to subtelomeric regions could favor rearrangements within the genome and gene flow within asexual populations. Interestingly, AVR-Pita genes occur on different chromosomes, including supernumerary mini-chromosomes. Additionally, Luo et al. [54] and Kusaba et al. [55] showed that two variants of the AVR-Pik effector gene are present on a 1.6 Mb mini-chromosome in the Japanese rice blast isolate 84R-62B and that parts of this mini-chromosome can translocate to core-chromosomes in crosses. Further, loss of the mini-chromosome resulted in gain of virulence on host plants carrying the rice immune receptor Pik, a phenotype due to the associated loss of the AVR-Pik genes [55].
Despite the fact that M. oryzae mini-chromosomes have been known for 30 years, genome sequencing projects have somehow overlooked them. This has changed recently. Peng et al. [56] reported the first sequences of mini-chromosomes of M. oryzae. They analyzed the karyotypes of the three wheat blast isolates T25, B71, and P3. T25 was collected in Brazil in 1988 and B71 and P3 were collected in 2012 in Brazil and Paraguay, respectively. B71 contains a 2 Mb mini-chromosome and P3 contains two mini-chromosomes of 1.5 and 3 Mb, whereas T25 does not contain any mini-chromosomes. The mini-chromosomes of B71 and P3 only share partial sequence similarity, have lower gene and higher repeat content, and display partial similarity to sub-telomeric regions of core-chromosomes. The mini-chromosome of isolate B71 contains the effector genes Pwl2 and Bas1 in close proximity while they are located on separate core-chromosomes in other M. oryzae isolates. This raised the hypothesis that mini-chromosomes are sites of structural rearrangements associated with virulence factors within blast genomes. However, the genetic diversity of mini-chromosomes in other lineages of M. oryzae, and their association with genomic rearrangements and effector diversification remains poorly understood.
The objective of this study was to gain an overall picture of genomic structural variation across lineages of M. oryzae using the host-specific isolates from rice, foxtail millet, goosegrass and wheat that were previously sequenced using Illumina short reads by Chiapello et al. [48]. Our analyses led us to focus on mini-chromosomes, which we detected in three of the examined isolates of M. oryzae. We used long and short read sequencing data in combination with minichromosome isolation sequencing (MCIS) to improve the previous genome assemblies [48] to near chromosome quality. We found that the sequence composition of mini-chromosomes is highly variable indicating independent emergence of individual mini-chromosomes during M. oryzae lineage evolution. Further, we identified effector and virulence-related genes in the mini-chromosome of the rice-infecting isolate FR13 although this protein class was not generally enriched in the three sequenced mini-chromosomes compared to the core chromosomes. We documented several structural rearrangements around virulence-related loci which raises the possibility that chromosome-scale variation, notably mini-chromosome genesis, plays a role in driving adaptive genome plasticity in the blast fungus.

Near chromosome quality genome assemblies of four host-specific isolates of M. oryzae
Following emerging evidence underpinning the abundance of structural, genomic variation in plant pathogens we re-examined 4 previously sequenced M. oryzae genomes of the host specific strains FR13 (rice), US71 (foxtail millet), CD156 (goosegrass), and BR32 (wheat). We used long read sequencing to generate highly contiguous assemblies [57] and improved accuracy of the assemblies by applying a polishing pipeline using nanopore and published Illumina raw reads of the same strains [48]. To assess the completeness and quality of our assemblies we performed a benchmarking universal single-copy orthologs (BUSCO) analysis using the Sordariomycota database (https://busco.ezlab.org/) which confirmed 98 -98.2 % completeness, similar to the chromosome quality reference genome of strain 70-15 (98.2 %; Table 1). Overall, the new assemblies have vastly improved contiguity with contig numbers reduced from 111-2,051 in previous assemblies to 17-55 in the new assemblies (Table 1). Moreover, we increased the proportion of well-assembled repeat rich regions, reduced the number of ambiguous bases ("Ns") to zero, and improved overall completeness (Table 1). These highly contiguous assemblies can thus serve as new reference genomes for hostspecific isolates. The rice, foxtail millet and goosegrass isolates of M. oryzae contain unique sets of minichromosomes To assess the genomes for karyotype variation, we separated and visualized intact chromosomes of each strain by contour-clamped homogenous electric field (CHEF) gel electrophoresis. Strikingly, we found large-scale, structural variation in form of chromosome length polymorphisms and supernumerary mini-chromosomes in the isolates FR13, US71, and CD156 ranging in size between approximately 800 kb and 1.5 Mb (Fig 1). We hypothesized that these mini-chromosomes contribute to structural variation and might be similar to supernumerary, lineage specific chromosomes reported in other plant pathogenic fungi. It has been shown that supernumerary chromosomes can have important functions in pathogenicity, but their sequence composition and intraspecies variation in M. oryzae is still poorly understood. To further analyze the sequence composition of the observed mini-chromosomes, we identified corresponding contigs in our whole genome assemblies by isolating minichromosomal DNA from CHEF gels by electro-elution for mini-chromosome isolation sequencing (detailed protocol available on protocol.io [57]; adapted from [56]). We then mapped mini-chromosome derived raw reads against the whole genome assemblies to identify specific contigs with high coverage compared to the rest of the genome, indicating their mini-chromosomal origin ( Repetitive sequences could lead to ambiguous mapping of reads due to the presence of similar repetitive regions in core-chromosomes. To confirm that the observed increase in coverage is linked to mini-chromosome enrichment and not to ambiguous read mapping we plotted the repeat content per 10 kb sliding window and compared it to the coverage of uniquely mapping reads derived from mini-chromosome isolation sequencing. This analysis showed that repeat-rich regions did not overlap with regions of high coverage, which confirmed that the enrichment is valid and not due to repetitive sequences and ambiguous read mapping. Using this method, we identified 2, 4, and 3 contigs with high depth and breadth of coverage for strains FR13 ( respectively. The combined size of mini-chromosome contigs was 1.7 Mb for FR13, 1.58 Mb for US71, and 860 kb for CD156. For FR13 and CD156 the combined length of mini-chromosome contigs matched the size observed on CHEF gels. However, the combined length of US71 mini-chromosome contigs adds up to approximately double the size of the minichromosome identified by CHEF analysis, indicating that US71 most likely contains two minichromosomes of similar size that cannot be separated by CHEF gel electrophoresis. We also noticed an enrichment of mapped reads to the start of contig 2 in strain CD156 (S4 Fig), which might indicate that a segmental duplication contributed to the emergence of the minichromosome in this strain. Taken together we identified mini-chromosomes of varying size in three strains that likely represent strain-specific, genomic compartments in M. oryzae.

The three examined mini-chromosomes of M. oryzae have different sequence composition
Based on the variation in mini-chromosome size and number we hypothesized that they represent strain-specific genome compartments with unique sequence composition. We first asked the question whether the content of mini-chromosomes is conserved in the core genome of reference strain 70-15, by globally aligning the mini-chromosomes to 70-15. However, only a 761 kb fragment of the FR13 mini-chromosome generated an alignment matching a ~900 kb region of core chromosome 2 of 70-15, whereas mini-chromosomes of US71 and CD156 did not generate significant alignments with the reference genome.
As synteny breaks might disrupt the global alignments between the mini-chromosomes and the 70-15 genome we further mapped mini-chromosome derived raw reads against the 70-15 assembly. The ~900 kb region on chromosome 2 of strain 70-15 showed high coverage after mapping the mini-chromosome reads of strain FR13, confirming that this region of the mini-chromosome corresponds to the end of core-chromosome 2 in 70-15. However, we did not observe unique regions with high coverage after mapping of mini-chromosome derived reads from the more distantly related strains US71 and CD156 (S5 Fig). This indicates that the mini-chromosomes of these strains contain unique sequences compared to strains FR13 and 70-15. We further examined the total amount of reads that mapped to any position in the reference genome. We found that 82.78 %, 59.77 %, and 55.49 % of mini-chromosome reads of FR13, US71, and CD156, respectively, mapped to largely overlapping regions in the reference genome, indicative for unspecific mapping. However, the fraction of unmapped reads varied between FR13 (17.21 %) and US71 (40.23 %) or CD156 (44.51 %) indicating that almost half of the mini-chromosome derived reads of US71 and CD156 originate from isolatespecific regions.
To further assess the uniqueness of mini-chromosomes between isolates we performed pairwise alignments between extracted mini-chromosome contigs of each strain. We filtered the alignments for regions that align over > 10kb to exclude unspecific, short alignments generated by repetitive regions or local similarities. Only a small fraction of the minichromosomes formed alignments under these parameters, whereas core-chromosomes were highly similar to each other ( Fig 3A). Between CD156 and US71 only 28.39 % and 15.39 % of the mini-chromosomes generated alignments, respectively. Between FR13 and US71 the fraction of the mini-chromosomes that generate alignments was even lower with 15.01 % and 14.02 % and between FR13 and CD156 only 5.14 % or 2.61 %, respectively ( Fig 3B). In contrast, between 73.88 % and 81.26 % of the core-genome generated alignments. The difference in sequence similarity between core-and mini-chromosomes supports the hypothesis that minichromosomes represent isolate-specific genomic entities, possibly involved in host specialization and suggests that mini-chromosomes emerged independently in host-adapted lineages.

Mini-chromosomes have lower gene and higher repeat density than core-chromosomes
To determine the gene content of the mini chromosomes, we mapped the GEMO gene annotations [48] to our new genome assemblies using BLASTn in combination with a sequence similarity approach (see methods). Of 14515, 14013 and 14415 genes that were used as queries, 13828, 13746 and 14201 mapped to a single site in the nanopore assemblies for strains FR13, US71 and CD156, respectively. Another 175, 179 and 44 genes mapped to multiple sites reflecting duplicated genes that were likely collapsed in the previous short-read assemblies. Taking these expansions into account, we assigned 14322, 14348 and 14304 genes to FR13, US71 and CD156, respectively. Of these genes, 13963, 13985 and 14143 mapped to the core chromosomes and 359, 363 and 161 mapped to the mini chromosomes of each isolate (S1 Table). On average, in each of the three mini-chromosomes the gene content was approximately 30% lower when compared to the core genome (Fig. 4A, S1 Table). Bar plots of gene and repeat content of mini-(red) and core-chromosomes (grey) of isolates FR13, US71, and CD156. B) Density plots resulting from a bootstrapping analysis of gene and repeat content of isolates FR13, US71, and CD156. 10000 core-genomic fragments were randomly sampled for each mini-chromosome contig and distribution of genes and basepairs of repeats are shown as densities. Number of genes and repeat content of mini-chromosome contigs are indicated by arrowheads. The size of mini-chromosome contigs is given in parentheses. Lower and upper 2.5% frequency intervals are shown in red and blue. X-axis: number of genes and basepairs of repeats. Y-axis: Frequency in 10000 fragments. Y-axis limits set to min/max. We next analyzed the repeat and GC content of the mini chromosome contigs compared to the core genomes. The GC content of core-and mini-chromosomes was similar at ~50%, whereas the repeat content of the three mini-chromosomes differed from the core genome (Fig 4A; S1 Table). Indeed, the repeat content was more than twice as high in minichromosomes (31.38%, 27.87%, and 17.25%) compared to core chromosomes (13.78%, 12.71% and 5.99% in strains FR13, US71, and CD156 respectively (Fig 4A).
To exclude the possibility that the observed differences are due to biases in sample size, we performed a bootstrapping analysis for each mini-chromosome contig (see methods). Briefly, we sampled 10,000 random fragments of the size of each mini-chromosome contig from the core genome and analyzed gene and repeat content. This bootstrapping approach confirmed higher repeat and lower gene content of mini-chromosomes (Fig 4B).
These results indicate that even though the mini chromosomes have distinct sequences, they have common genomic features, low gene and high repeat density, that deviate from the typical composition of core chromosomes.

Mini-chromosomes contain highly dynamic effector and virulence-related loci
Our gene content analysis resulted in a total of 42,974 genes between all three isolates. We grouped these genes into 12,900 orthogroups, of which 9,003 were conserved across all three strains and 3,897 were absent in at least one strain (S2 Table). Among the 3,897 orthogroups that were absent in at least one strain, we found 1,827 that were conserved between two strains and 2,070 were unique to single strains. Based on our mini-chromosome analysis we further analyzed the location of these orthogroups and categorized them into "core-genome specific", "mini-chromosome specific" and orthogroups that contain members that are present on both core-and mini-chromosomes. We found that the vast majority (99.98%) of conserved tribes (orthogroups) was encoded exclusively on the core chromosomes or contained members on both core-and mini-chromosomes and only 0.02% are encoded exclusively on mini-chromosomes. Conversely, strain-specific tribes were slightly more abundant on mini-chromosomes (S2 Table), substantiating the observation that minichromosomes are strain specific and likely emerged independently in different strains or host-adapted lineages.
To determine functional categories of mini-chromosome encoded genes we performed a Hidden Markov Model (HMM) scan against the Pfam database (https://pfam.xfam.org/) using a total of 772 non-redundant, mini-chromosome encoded proteins. More than 50% of these proteins (415) did not contain any known domain, whereas 357 proteins contained Pfam domains. Certain domains were shared between at least two mini-chromosomes and included transcription factor DNA-binding domains, especially Zn(2)-Cys(6) zinc-finger, DDEsuperfamily endonuclease, Tc5 transposase DNA-binding, and Methyl-transferase domains as well as domains of unknown function, amino-acid permease, protein kinase, and glycosylhydrolase domains. However, the relative abundance of these domains varied between individual isolates (S3 Table) and it is unclear if the presence of these protein domains on the mini-chromosomes has functional implications.
The most abundant, isolate-specific domains on mini-chromosomes were present in the rice isolate FR13 and included cytochrome P450 and polyketide-synthase domains. Both of these domains are involved in pathogenicity in M. oryzae and other plant pathogens [17,58]. Interestingly, we identified a well predicted polyketide synthase as Ace1 (avirulence conferring enzyme 1), which is part of a large secondary metabolite cluster and triggers an immune response in host plants carrying the resistance gene Pi33 [58].
Virulence and host adaptation of M. oryzae are largely determined by genes encoding secreted effector. To identify secreted proteins on mini-chromosomes, we predicted the secretome of all isolates using signalp 2.1 in combination with targetp and tmhmm to remove mitochondrial and transmembrane proteins (see Methods). This resulted in 1,394, 1,558, and 1,611 secreted proteins in strains FR13, US71, and CD156, respectively. We found 50, 8, and 2 secreted proteins encoded on mini-chromosomes, most of which correspond to uncharacterized, hypothetical proteins (S4 Table).
We further investigated whether candidate effector genes are present on the mini-and corechromosomes using 27 characterized effectors and 167 predicted MAX-effectors (Magnaporthe AVRs and ToxB) identified by de Guillen et al. [59]. MAX-effectors represent a unique class of proteins that is expanded in M. oryzae and contain proteins that are sequence unrelated but share a similar core structural fold. Using tblastn we identified 75, 67, and 63 proteins with similarity to known or predicted MAX-effectors in the genomes of FR13, US71, or CD156, respectively. Most of these genes were located on the core-chromosomes (S6 Fig). However, we found 5 genes in FR13 and 2 genes in US71 that were located on the minichromosomes. Interestingly, 3 of these genes were duplicated either on the same minichromosome or between the mini-and core chromosomes, suggesting structural, genomic rearrangements following segmental duplication events. Among the duplicated effector genes, we found the rice isolate specific effector AVR-Pik on the mini-chromosome of FR13 (S6A Fig). Interestingly, we found two genes encoding different variants, AVR-PikD and AVR-PikA, located towards both ends of the FR13 mini-chromosome (S6A Fig). This is consistent with previous observations by Kusaba et al. [55], who identified the same effector variants on a 1.6 Mb mini-chromosome in the Japanese isolate 84R-62B and might indicate that certain mini-chromosomes are maintained in M. oryzae populations.

Patterns of genomic rearrangements around effector and virulence-related loci in minichromosomes
Our Pfam and effector analyses suggested that virulence related genes are located on minichromosomes and that these regions undergo rearrangements that possibly involve interchromosomal translocation events. To explore this, we extracted the AVR-Pik and Ace1 loci from other highly contiguous genome assemblies of the related strains 70-15, FJ81278, and Guy11 and analyzed the macro-synteny of corresponding regions for signs of genomic rearrangements. AVR-Pik is present in the reference strain 70-15 and is encoded in the subtelomeric region at the end of chromosome 2 and on contigs 13 and 21 in strains Guy11 and FJ81278, respectively. In strain 70-15 the AVR-Pik gene resides in the region that is syntenic to the 761 kb region on the mini-chromosome of FR13 described earlier. However, comparison of the corresponding region of Guy11 suggested large scale rearrangements and variable degrees of synteny around the AVR-Pik locus in different rice isolates (Fig 5A).
Additionally, we observed that the syntenic regions that are shared between the analyzed strains encode different variants of the AVR-Pik effector. Whereas the first syntenic region of the FR13 mini-chromosome encodes the variant AVR-PikA, syntenic regions in 70-15 and Guy11 encode the variant AVR-PikC. Furthermore, we identified a 82 kb region in isolate FJ81278 encoding AVR-PikD with high sequence similarity and synteny to the end of the FR13 mini-chromosome that encodes the same AVR-Pik variant, suggesting that the two copies of AVR-Pik on the mini-chromosome originated from independent genomic locations and recombined on the mini-chromosome. The polyketide synthase Ace1 is also located on the FR13 mini-chromosome. Ace1 is part of a large secondary metabolite cluster that spans a region of approximately 70 kb [58]. To analyze the genomic context around this cluster on the mini-chromosome, we extracted the gene sequences of the whole cluster in FR13 and analyzed syntenic regions in the aforementioned isolates as well as US71 and CD156. In rice isolates and the foxtail millet isolate US71 the Ace1 cluster and the order of the macrosyntenic region are well conserved (Fig 5B). Interestingly, the ACE1 cluster is encoded on core-chromosomes in strains 70-15, Guy11, and US71, based on the size of the contigs, suggesting inter-chromosomal rearrangements that involve coreand mini-chromosomes. In the more distantly related strain CD156 the Ace1 cluster is less conserved and the syntenic region only spans across 195 kb, reflecting the genetic diversity between rice isolates and distantly related lineages. Strikingly, the macrosyntenic region on the FR13 mini-chromosome is disrupted by a 392 kb insertion after the Ace1 cluster that is absent in the core-chromosomes of other isolates, further substantiating the hypothesis that mini-chromosomes might facilitate genomic rearrangements.

Discussion
Genomes of eukaryotic plant pathogens are notorious for being highly dynamic. Chromosome scale, structural variation, including supernumerary mini-chromosomes, has been documented by electrophoretic karyotyping [14] but has not been studied extensively at the sequence level, mainly due to methodological limitations. Here, we took advantage of recent technical developments to reassess the genomes of 4 previously sequenced, host-specific isolates of the blast fungus M. oryzae. We used a combination of long-and short-read sequence data to generate near chromosome quality genome assemblies. We found that supernumerary mini-chromosomes are common in the independent lineages of the blast fungus M. oryzae. We then used a technique we termed mini-chromosome isolation sequencing to collect short-read data from isolated mini-chromosomes and match them to contigs in the whole genome assemblies. We detected single mini-chromosomes in the rice and goosegrass isolates and two mini-chromosomes in the foxtail millet isolate of M. oryzae and discovered that the sequence of the mini-chromosomes is highly variable between strains, suggesting that they emerged independently probably from rearrangements of corechromosomes. However, the mini-chromosomes share common features such as low gene and high repeat density. Additionally, we located the genes for the effector AVR-Pik and the polyketide synthase Ace1 to the mini-chromosome of the rice-infecting isolate FR13 even though these genes are located on core-chromosomes in other isolates. Analyses of macrosynteny around these regions showed that mini-chromosomes undergo large-scale rearrangements, and thus, could contribute to genomic plasticity within M. oryzae populations. Overall, our results demonstrate the value of mini-chromosome isolation sequencing as a scalable method to reliably identify mini-chromosomes in whole genome assemblies. This will allow us to study this unique genomic compartment at the sequence level and help to unravel the mechanism that compartmentalizes plant pathogen genomes.
Our study, along with Peng et al. [56], reveals that mini-chromosomes are present in independent lineages of M. oryzae. Although we found mini-chromosomes in the examined rice, foxtail millet and goosegrass isolates of M. oryzae, we didn't detect any in the wheat blast isolate BR32 (Fig 1), an isolate that traces back to early outbreaks of wheat blast in 1991. This is consistent with the recent study by Peng et al. [56], which documented minichromosomes in the wheat blast isolates B71 and P3 collected in 2012 but not in T25 collected in 1988. It is possible that wheat blast isolates collected soon after the emergence of this disease in the 1980s lack mini-chromosomes. The lack of meiotically unstable minichromosomes in these isolates may be due to sexual reproduction, which was common among isolates from the early phases of the epidemic [60]. However, we cannot rule out the possibility that these isolates have lost their mini-chromosomes over time due to long term culturing under laboratory conditions. Similar loss of accessory chromosomes within weeks has been demonstrated in vitro for Z. tritici [20]. The observed variation in mini-chromosome content is consistent with earlier studies showing that the occurrence of mini-chromosomes is variable within and between lineages of M. oryzae [52,53].
The three mini-chromosomes we studied share little sequence similarity with each other indicating that they are not conserved across host specific lineages but rather emerged independently throughout the diversification of M. oryzae (Fig 3). In contrast, mapping of mini-chromosome derived reads of wheat isolate P3 to the B71 mini-chromosome sequence revealed several overlapping regions [56]. Despite the overlaps, these regions seem to be interspersed with sequences that are specific to each individual mini-chromosome. It is likely that mini-chromosome divergence follows overall genetic distance between isolates explaining the low sequence similarity between the mini-chromosomes we sequenced. This would be consistent with the view that mini-chromosomes emerge from rearrangements of core-chromosomes. However, given that vast variation in mini-chromosome size was observed among rice isolates [52], variability of mini-chromosomes within host specific lineages is likely to occur. Future comparative analyses of natural populations of M. oryzae will shed light on the mechanisms of mini-chromosome evolution.
Despite their independent origin, mini-chromosomes share common genetic features. We notice that elevated repeat content and lower gene densities are consistent features of minichromosomes (Fig 4) [56]. This is surprising given that all mini-chromosomes share overall little sequence similarity and seem to have emerged independently. What is the underlying mechanism that facilitates emergence of sequence unrelated mini-chromosomes with common genomic features? Our results suggest that mini-chromosomes might emerge from core genomic rearrangements involving primarily core-chromosome ends (Fig 5; S4 Fig). These regions are known to contain higher repeat content and to be more dynamic than central regions of chromosomes [11,56,61]. It seems plausible that repeats are involved in mini-chromosome formation. Repeats facilitate genomic rearrangements, such as interchromosomal translocations, in several plant pathogenic fungi [6,11,20,62,63]. Under stress conditions, specific classes of transposable elements can get activated and induce genomic rearrangements [64]. Interestingly, Peng et al. [56] observed that the B71 mini-chromosome contains more active repeats than core chromosomes based on lower rates of repeat-induced point (RIP) mutations. The emerging model is that transposon activation generates genomic instability, primarily at core-chromosome ends, resulting in the genesis of minichromosomes. Future comparative analyses of mini-chromosomes from closely related isolates, ideally within the same clonal lineage, will further reveal the precise genetic elements associated with the emergence and divergence of mini-chromosomes.
What are the implications of mini-chromosomes for adaptive evolution of the blast fungus? Our results, together with other's suggest that mini-chromosomes form a gene poor, repeatrich genomic compartment that contributes to genome rearrangements and likely gene flow [11,55,56]. Given that effector genes are associated with mini-chromosomes of M. oryzae, this genomic architecture is another example of the two-speed genome concept in which particular compartments contribute to adaptive evolution. This could happen through several mechanisms. Mini-chromosomes may represent an intermediate stage of large rearrangements that facilitate generation of structural variations across the genome. Additionally, mini-chromosomes might facilitate transfer of genetic material between individuals by enabling gene flow independently of core genomic recombination. Transfer of supernumerary chromosomes has been reported for several plant pathogenic fungi [19,27,55,65]. A species such as M. oryzae that consists of genetically defined host-specific lineages may benefit from a process that facilitates horizontal transfer of genetic material. Genetically diverse isolates that infect a common host may give rise to new variants that increase the diversity and adaptive potential of local blast populations [66]. Therefore, it is of utter importance to understand the biology of supernumerary mini-chromosomes and its impact on genomic diversity and gene flow. Our study lays the basis for studying the role of mini-chromosomes in defining the genetic identity of host specific lineages of M. oryzae. Future analyses of a larger number of field isolates will further define the degree to which mini-chromosomes shape the evolution of host-specific lineages of M. oryzae.

Biological material
The M. oryzae isolates analyzed in this study were field isolates collected from different hosts and different regions and represent four host-adapted lineages. FR13 was isolated from japonica rice in France in 1988, US71 was isolated from Setaria sp. (foxtail millet) in the USA, CD156 was isolated from Eleusine indica (goosegrass) in Ferkessedougou, Ivory Coast in 1989, and BR32 was isolated from wheat in Brazil in 1991. All isolates were acquired from Elisabeth Fournier and have been previously reported in the GEMO project [48].

DNA extraction for whole genome sequencing and mini-chromosome isolation sequencing
For mycelium propagation and whole genome sequencing, M. oryzae isolates were cultured on complete medium agar (CM). High molecular weight genomic DNA from M. oryzae was extracted from mycelia of 7-day old cultures following the method described in [67]. Genomic DNA was quantified on a TapeStation (Agilent) and treated with DNAse-free RNAse. RNAse treated DNA was sheared using either a gTUBE or a 22 Gauge needle. Sheared DNA was captured using AMPure beads (Beckman Coulter, Indianapolis, US) and eluted in 45 μl water.
For mini-chromosome isolation sequencing, 8 blocks of 7 days old mycelium were transferred into 150 ml YG-medium (5g/L yeast extract, 20g/L glucose) and incubated at 24˚C and 120 rpm for 3 days. Mycelium was harvested by filtering the culture through two layers of miracloth. Protoplasts were generated by incubation in sterile Trichoderma harzianum lysing enzymes solution (150 mg lysing enzymes in 15 mL 1 M Sorbitol) for 2-4 h. Protoplasts were harvested by filtering through two layers of miracloth followed by centrifugation for 10 min at 1,500 rpm and washed twice in 1 M Sorbitol. Quality of protoplasts was observed microscopically. Protoplasts were then resuspended in 100-200 µl 1 M Sorbitol / 50 mM EDTA, embedded in 2x volumes of 1% certified megabase agarose (Biorad) and incubated in proteinase K containing NDS buffer (10 mg/mL laurylsarcosine, 100 mM TRIS-HCl pH9.5, 500 mM EDTA, proteinase K 200 µg/ml) at 50˚C for 48 h. Plugs were then washed three times with fresh 50 mM EDTA for 1 h prior to CHEF gel electrophoresis (detailed protocol available on protols.io, [68]).

Whole genome sequencing and assembly
Libraries for whole genome sequencing were prepared by following the 1D protocol from Oxford Nanopore. Sequencing runs were performed using MinION R9.4 (Oxford Nanopore Technologies, Oxford, UK). Sequence reads were assembled into contigs using Canu (v1.6 and v1.7) [69]. Contiguity was further improved by merging highly similar contig ends with identity >98% and alignment length >10 kb (for three contigs the length of the alignment was between 8 kb and 10 kb) after whole genome alignments using the nucmer application of the MUMMER3 package [70] and SSPACE [71]. After extracting and merging matching contig ends we tested for read support for both ends and reads spanning the entire region (Sup. Table 5). We then improved base calling quality of the assemblies by applying a polishing pipeline (https://github.com/nanoporetech/ont-assembly-polish) consisting of two iterations of racon (https://github.com/isovic/racon; v1.3.2) and two iterations of pilon v1.22 [72] polishing using Nanopore and published Illumina reads (Illumina reads from [48]). To assess the performance of the polishing pipeline and the overall quality of the genome assemblies we performed a benchmarking universal single-copy orthologs (BUSCO) [73] analysis using the sordariomyceta database (https://busco.ezlab.org/). Details about the assemblies, isolates, and accession numbers are available in [57]. Nanopore sequencing reads are deposited in the European Nucleotide Archive (ENA) under the accession numbers ERR2612751 (BR32), ERR2612749 (FR13), ERR2612750 (US71), and ERR2612752 (CD156).

Mini-chromosome isolation, library preparation and sequencing
Mini-chromosomes of M. oryzae were separated from core-chromosomes by contourclamped homogenous electric field (CHEF) gel electrophoresis. Therefore, plugs containing the digested protoplasts (see above) were placed in a 1% megabase agarose gel. We used 0.5% TAE buffer for subsequent DNA extraction or 0.5% TBE buffer for visualization (Fig. 1).
We separated mini-chromosomes using a Biorad CHEF DRII system with the following settings: Run time: 96 h; Voltage: 1.8-2 V/cm; Initial switch interval: 120 s; End switch interval: 3600 s. The gel was then dyed with ethidium bromide solution (1 µg/ml) for visualization on a UV-transilluminator and individual mini-chromosome bands were excised. Mini-chromosomal DNA was eluted from the gel plugs by electroelution using a D-Tube dialyzer midi, MWCO 3.5 kD (Merck). Therefore, individual plugs were placed in the dialysis tube and covered with 0.5% TAE buffer. The mini-chromosomal DNA was electroeluted for 3 h at 90 V, resuspended by slowly pipetting up and down and concentrated in a vacuum centrifugal evaporator to a concentration of 50-150 ng/µl. Concentration of the extracted DNA was measured spectrophotometrically and fluorometrically by Nanodrop and Qubit, respectively, prior to library preparation.
Libraries for mini-chromosome sequencing were generated following the general guidelines of the Nextera Flex library preparation kit (Illumina). We modified the protocol as follows. For tagmentation we used a total volume of 5 µl consisting of 2.5 µl Tn5 transposase, 0.5 µl of 10x reaction buffer, and 2 µl mini-chromosomal DNA set to a concentration of 0.5 ng/µl. The tagmentation mix was incubated in a thermocycler at 55˚C for 7 min. The tagmentation product was then used in a PCR reaction containing 2.5 µl custom barcoding primers (10 µM) (Sup. Table 6), 25 µl 2x NEBNext High-fidelity PCR mix (New England Biolabs), and 15 µl H2O. PCR conditions: 72˚C for 5 min, 98˚C for 30 sec, 5 cycles of 98˚C for 10 sec followed by 63˚C for 30 sec and 72˚C for 1 min. The numbers of amplification cycles needed for library preparation of each sample was then determined by quantitative PCR as follows: 25 µl SYBR Green Jumpstart Taq ReadyMic (Sigma-Aldrich), 5µl PCR product from previous reaction, 2.5 µl of each barcoding primer, 15 µl H2O. PCR conditions as described above, without the initial 72˚C step. Same PCR conditions were then applied to amplify the tagmented DNA. Libraries were then cleaned up using AMPure PB Bead purification kit (Pacific Biosciences). Concentration and quality of the libraries was confirmed by Nanodrop, Qubit, and BioAnalyzer 2100 using the high sensitivity DNA Kit (Agilent Technologies). Sequencing of mini-chromosomal DNA libraries was carried out on a NextSeq 500 system (Illumina) using the NextSeq 500/550 Mid output Kit (Illumina). Mini-chromosome derived reads were deposited at the European nucleotide archive under the accession numbers ERR3771227-ERR3771238.

Identification of mini-chromosomes in whole genome assemblies
The quality of reads obtained from mini-chromosome isolation sequencing was confirmed with fastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and low quality sequences as well as adaptor sequences were removed using trimmomatic [74]. Minichromosome reads of each strain were then mapped against the whole genome assembly of the same strain using the BWA-mem (burrows-wheeler aligner) algorithm with default parameters. Reads were then filtered to keep only uniquely mapping reads using the samtools package [75] to prevent ambiguous mapping of reads originating from repeat rich regions. MCIS read coverage was calculated per 1 kb sliding window (window size: 1000 bp; slide: 500 bp) using the bedtools package (https://bedtools.readthedocs.io/en/latest/) and plotted using the R package circlize [76]. Mini-chromosome coverage and repeat content used to generate circus plots are shown in S7 Table. Read mapping against the reference genome of strain 70-15 and visualization was carried out as described above. The number of reads mapping to the reference genome was calculated for total reads and uniquely mapping reads. Uniquely mapping reads were used for visualization to exclude ambiguous mapping sites. Total amount of mapping reads is reported as percentage of total reads per sample.

Whole genome and mini-chromosome alignments
Global alignments were generated using the nucmer algorithm of the MUMMER3 package [70]. The alignments were further filtered using the delta-filter utility (MUMMER3) with parameters -l 10000 and -i 80 (length >10 kb; percent identity >80%) to retrieve continuous alignments. Alignment coordinates were extracted with the show-coords utility and the output was modified to generate coordinate files in "bed" format for plotting in R. Plots were generated in R using the packages circlize [76], for circular representations of mini-and corechromosome alignments, and karyoploteR [77], for linear representation of syntenic regions.

Analysis of gene and repeat content
Gene models for all strains were retrieved from previous assemblies published via the Magnaporthe GEMO database ( [48]; http://genome.jouy.inra.fr/gemo/). We then identified genes using a similarity-based approach. We used blastn [78] with all strain specific gene models to identify genes in our assemblies using a threshold of >90 query coverage and 99% identity to account for differences in base calling quality and sequencing errors between two assemblies of the same strain. The >90 coverage threshold was chosen to account for errors in mononucleotide repeats that can occur from nanopore sequencing and the 99% identity threshold was empirically determined by whole genome alignments of both assemblies of the same strain which resulted in an average sequence identity between 99% and 99.5%. Resulting genes were assigned to mini-and core-chromosome encoded. Gene density was calculated as number of features per 10 kb window across each genomic compartment (coreand mini-chromosomes).
Repetitive sequences were annotated with RepeatMasker (http://www.repeatmasker.org/) using a merged library of repeats consisting of the RepBase repeat library for fungi (https://www.girinst.org/repbase/) and Magnaporthe repeats identified by Chiapello et al. [48]. Total repeat content for the whole genome as well as for mini-and core-chromosome contigs was extracted from RepeatMasker.
For bootstrapping of gene and repeat content we extracted the coordinates of features from the gene model blastn and RepeatMasker output. We then transformed the data into bed format for analysis in R. For each mini-chromosome contig we sampled 10,000 random regions from the core genome with equal size to the respective mini-chromosome contig using the randomizeRegions function of the regionR package. We then determined the numbers of genes and the basepairs occupied by repeats for each core-chromosome fragment and the mini-chromosome contigs using the countOverlaps function of the GenomicRanges package. Density plots of the frequency of features per core-chromosome fragment were generated with the R package ggplot.

Prediction of secreted proteins
To predict secreted proteins, we retrieved the protein annotation of all strains from the Magnaporthe GEMO database ( [48]; http://genome.jouy.inra.fr/gemo/). We then predicted proteins containing a signal peptide using the program SignalP v2.1 [79]. We then used TargetP-2.0 [80] to identify mitochondrial proteins and the hidden-markov model TMHMM [81] to identify proteins containing transmembrane domains. After removing mitochondrial and transmembrane proteins we matched secreted proteins to the blastn output used to transfer gene models (described above) to determine the number and genomic location of secreted proteins in the nanopore/canu assemblies.

Pfam domain annotation
Pfam domains were predicted by a hidden-markov model scan using the Pfam protein families database ( [82]; https://pfam.xfam.org/). Protein sequences were retrieved from the Magnaporthe GEMO database as described above.

Predicting orthologous groups by TRIBE-MCL
Orthologous families were predicted using the software TRIBE-MCL [83]; http://micans.org/mcl/). We combined the proteomes of all strains and identified homologous proteins using blastp with a query coverage threshold of 90% and e-value of 10e^10. We then transform the blastp output into an MCL readable format using the mcxdeblast command of TRIBE-MCL and identify tribes by mcl and extracted tribes and protein identifiers using custom perl scripts. Of 12951 tribes predicted from the combined proteome retrieved from the GEMO Magnaporthe database, the members of 12900 matched the new assemblies with high coverage (>90%) and identity (>99%). Resulting tribes were then categorized into three groups: i) tribes that are conserved in all isolates, ii) tribes that are missing in one of the isolates, and iii) tribes only present in one isolate. Then we assigned the genomic location of each of the tribe members as core-chromosome or mini-chromosome encoded, based on our mini-chromosome identification.