Comparative genome analysis indicates rapid evolution of pathogenicity genes in Colletotrichum tanaceti

Colletotrichum tanaceti is an emerging foliar fungal pathogen of pyrethrum (Tanacetum cinerariifolium), posing a threat to the global pyrethrum industry. Despite being reported consistently from field surveys in Australia, the molecular basis of pathogenicity of C. tanaceti on pyrethrum is unknown. Herein, the genome of C. tanaceti (isolate BRIP57314) was assembled de novo and annotated using transcriptomic evidence. The inferred pathogenicity gene suite of C. tanaceti comprised a large array of genes encoding secreted effectors, proteases, CAZymes and secondary metabolites. Comparative analysis of its CAZyme pathogenicity profiles with those of closely related species suggested that C. tanaceti has hosts in addition to pyrethrum. The genome of C. tanaceti had a high repeat content, and repetitive elements were located significantly closer to genes inferred to influence pathogenicity than to other genes. These repeats are likely to have accelerated mutational and transposition rates in the genome, resulting in a rapid evolution of certain CAZyme families in this species. The C. tanaceti genome contained a gene-sparse, A-T rich region facilitating a “two-speed” genome. Pathogenicity genes within this region were likely to have a higher evolutionary rate than the ‘core’ genome. This “two-speed” genome phenomenon in certain Colletotrichum spp. was hypothesized to have caused the clustering of species based on the pathogenicity genes to deviate from taxonomy. With the large repertoire of pathogenicity factors that can potentially evolve rapidly in response to control measures, C. tanaceti may pose a high risk to global pyrethrum production. Knowledge of the pathogenicity genes will facilitate future research in disease management of C. tanaceti and other Colletotrichum spp.


The availability of only one genome of a member of the destructivum complex, C. […]

[…] including those associated with translation and chromosome telomeric region (S1 Table).
Putative proteins of C. tanaceti were subjected to KEGG pathway analysis, which assigned 5,883 proteins to known pathways (S2 Table). The highest number of KO identifiers was among the metabolic pathway assignments (n=693), of which the majority (n=363) were for amino acid metabolism, followed by carbohydrate metabolism (n=290) (S3 Table). Among the environmental information processing pathways, 81 C. tanaceti genes were assigned to 47 KO identifiers belonging to the MAPK pathway (S4 Table). Furthermore, 24 C. tanaceti proteins were annotated with 10 aflatoxin biosynthesis pathway KO assignments (S5 Table) and 56 proteins were assigned KOs for ABC transporters (S6 Table).

Genome alignment and synteny
The global alignment coverage of 13 other Colletotrichum genomes by C. tanaceti contigs was proportionate to their evolutionary proximity to C. tanaceti (Fig 1a). The highest coverage was in C. higginsianum (63.8%) and the lowest was in C. orbiculare (4.26%). Among the C. tanaceti contigs aligned to the chromosomes of C. higginsianum, the best alignment coverage was to chromosome NC_030961.1 (chromosome 9) (S7 Table).

[…] tanaceti were assigned to 10,074 groups containing orthologs and/or recent paralogs and/or co-orthologs across all species. A total of 6,002 genes were conserved in the genus Colletotrichum. Colletotrichum tanaceti had 9,679 orthologs with C. higginsianum, which was the highest ortholog count among Colletotrichum spp., followed by 8,855 orthologs with C. nymphaeae (Fig 1b). Twenty of these groups, with 48 genes among them, were exclusive to C. tanaceti and were defined as recent paralogs (in-paralogs) of C. tanaceti with no homology to the 16 other species tested.

In the C. tanaceti genome, 1,457 genes had homologs in the fungal cytochrome P450 database (S14 Table), of which 911 had >30% identity. There were 1,824 homologs in the transporter classification database for C. tanaceti (S15 Table), of which 1,276 had >30% identity.
Within the genus Colletotrichum, members of the gloeosporioides complex had the highest number of homologs for both P450s and transporters (Fig 3b).

Homologs in PHI-base
A total of 3,497 homologs were recorded in C. tanaceti from the pathogen-host interaction database (PHI), of which 1,592 represented mutated phenotypes with reduced virulence (S16 Table). The second most common category (n=1,514) was unaffected pathogenicity; 382 homologs were for loss of pathogenicity and 42 were in the effector category. Notably, the mutant phenotype of 141 homologs was lethal to the particular pathogen, and 103 homologs had increased virulence after mutation (Fig 3c). The two gloeosporioides complex members had the highest number of homologs in the database among the Colletotrichum spp., followed by the acutatum complex species, C. simmondsii, C. fioriniae and C. nymphaeae. Despite C. higginsianum having a large number of homologs, C. tanaceti had a below-average number for all the categories among the Colletotrichum spp., with a profile similar to C. […]

At the divergence of Colletotrichum spp., 39 expansions and 12 contractions were predicted with respect to its MRCA with Verticillium species (S19 Table). Expansions included the […]

Among the other species considered, Fusarium oxysporum had the highest number of genes (n=344) that were gained, with 75 expanded CAZyme families with respect to its MRCA (S20 Table).

[…] (Fig 4). When the overall pathogenicity gene profiles of all Colletotrichum spp. were compared, which included the numbers of SMB clusters, transporters, P450s, CAZymes and homologs to the PHI database, the profile of C. tanaceti was most similar to those of C. orchidophilum and C. chlorophyti (Fig 5).

[…] (Table 4). The negative Z-scores confirmed that the mean distance between those genes and the nearest repetitive element was less than the mean of a random sample of the genome.
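
The Z-score comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes gene and repeat positions are given as single coordinates on one contig, that the null distribution is built by repeatedly drawing random gene sets of the same size from all genes, and that the repeat list is non-empty. The function names `nearest_distance`, `mean_nearest_distance` and `distance_z_score` are hypothetical.

```python
import bisect
import random
import statistics


def nearest_distance(pos, sorted_repeats):
    """Distance from a position to the closest entry in a sorted, non-empty list."""
    i = bisect.bisect_left(sorted_repeats, pos)
    candidates = []
    if i < len(sorted_repeats):
        candidates.append(sorted_repeats[i] - pos)      # nearest repeat at or after pos
    if i > 0:
        candidates.append(pos - sorted_repeats[i - 1])  # nearest repeat before pos
    return min(candidates)


def mean_nearest_distance(gene_positions, sorted_repeats):
    """Mean distance from each gene to its nearest repeat."""
    return statistics.mean([nearest_distance(p, sorted_repeats) for p in gene_positions])


def distance_z_score(target_genes, all_genes, repeat_positions, n_samples=1000, seed=0):
    """Z-score of the observed mean distance against random gene sets.

    Assumption: the null is sampled from all genes (without replacement),
    using sets of the same size as the target category.
    A negative value means the target genes sit closer to repeats than expected.
    """
    repeats = sorted(repeat_positions)
    observed = mean_nearest_distance(target_genes, repeats)
    rng = random.Random(seed)
    null_means = [
        mean_nearest_distance(rng.sample(all_genes, len(target_genes)), repeats)
        for _ in range(n_samples)
    ]
    return (observed - statistics.mean(null_means)) / statistics.stdev(null_means)


if __name__ == "__main__":
    # Toy data only: 100 evenly spaced genes and three repeats on one contig.
    genes = list(range(0, 10000, 100))
    repeats = [1000, 5000, 9000]
    candidate_genes = [1000, 5000, 9000, 1100, 4900]  # deliberately close to repeats
    print(distance_z_score(candidate_genes, genes, repeats, n_samples=200, seed=1))
```

With real data, distances would be computed per contig from annotation coordinates, and the choice of null (random genes versus random genomic positions) would need to match the analysis being reproduced.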
Furthermore, all gene categories except the CAZymes were located significantly closer to the interspersed repeats. However, the expanded and the contracted subgroups of the total CAZome were significantly associated with interspersed repeats (Table 4).

[…] tanaceti (Fig 6). A total of 24.3% of the genome was rich in A-T, in regions that had an average length of 3.77 Kb and a maximum G-C content of 29%. A total of 85 genes were reported in these regions, which had a gene density of 6.04 genes per Mb, but the majority (68.25%) of these genes were hypothetical. Two secondary metabolite biosynthetic genes, 3 CAZymes, 2 cytochrome P450s, 2 lipases, 4 transporters, one transcription factor and one DNA polymerase were also among the genes in the A-T rich regions (S21 Table). The G-C equilibrated regions accounted for 75.7% of the genome and their average length was 14.6 Kb.
The maximum G-C percentage in these regions was 55.6, and 12,087 genes were reported with a gene density of 276 genes per Mb.

Repeat-induced point mutation (RIP) is a fungal-specific mechanism for limiting transposon proliferation below destructive levels [39]. RIP is known to generate A-T rich regions with lower gene densities and higher evolutionary rates than the core genome, thus generating "two-speed" genomes in several fungi [117,121-123]. The presence of A-T rich, gene-sparse regions in the C. tanaceti genome could therefore be a byproduct of RIP due to TE proliferation. Accumulating repeats followed by expanding genome size with respect to the non-pathogenic strains is a trend observed in many plant pathogenic fungi and can provide an