Abstract
The Global Panzootic Lineage (GPL) of the amphibian pathogen Batrachochytrium dendrobatidis (Bd) has been described as a main driver of amphibian extinctions on nearly every continent. Near-complete genomes of three Bd-GPL strains have enabled studies of the pathogen, but the genomic features that set Bd-GPL apart from other Bd lineages are not well understood due to a lack of high-quality genome assemblies and annotations from other lineages. We used long-read DNA sequencing to assemble high-quality genomes of three Bd-BRAZIL isolates and one non-pathogenic outgroup species, Polyrhizophydium stewartii (Ps) strain JEL0888, and compared these to genomes of previously sequenced Bd-GPL strains. The Bd-BRAZIL assemblies range in size between 22.0 and 26.1 Mb and encode 8495-8620 protein-coding genes per strain. Our pan-genome analysis provided insight into shared and lineage-specific gene content. The core genome of Bd consists of 6278 conserved gene families, with 202 Bd-BRAZIL-specific and 172 Bd-GPL-specific gene families. We discovered gene copy number variation in pathogenicity gene families between Bd-BRAZIL and Bd-GPL strains, though none were consistently expanded in Bd-GPL or Bd-BRAZIL strains. Comparison within the Batrachochytrium genus and two closely related non-pathogenic saprophytic chytrids identified variation in sequence and protein domain counts. We further tested these new Bd-BRAZIL genomes to assess their utility as reference genomes for transcriptome alignment and analysis. Our analysis examines the genomic variation between strains in Bd-BRAZIL and Bd-GPL and offers insights into the application of these genomes as reference genomes for future studies.
Introduction
Batrachochytrium dendrobatidis (Bd) is a chytrid fungus and the causative agent of the disease chytridiomycosis (Kilpatrick et al. 2010; Daszak et al. 1999; Longcore et al. 1999). The disease is widespread among amphibian populations and has contributed to declines on five continents (Scheele et al. 2019). The pathogen is globally distributed and genetically diverse, with strains assigned to four described lineages: Bd-CAPE, Bd-ASIA-2/BRAZIL (hereafter Bd-BRAZIL), Bd-ASIA, and Bd-GPL (Global Panzootic Lineage) (Farrer et al. 2011; Schloegel et al. 2012; Rosenblum et al. 2013; Farrer et al. 2013; O’Hanlon et al. 2018; Byrne et al. 2019). Among these lineages, the Bd-GPL lineage alone is implicated in the majority of amphibian declines (Belasen et al. 2022; Becker et al. 2017; Farrer et al. 2011). The emergence of the Bd-GPL lineage has been suggested to be a more recent expansion, occurring in the late 20th century (O’Hanlon et al. 2018). The other three lineages are genetically divergent from the Bd-GPL strains but are less widespread (Belasen et al. 2022). The genetic diversity between the other Bd lineages and the recent evolution of Bd-GPL suggest the possibility of gene expansions in Bd-GPL strains related to pathogenicity (Joneson et al. 2011; Farrer et al. 2017).
Despite the pathogen’s importance, little is known about the gene content variation between Bd strains. The majority of genetic and genomic studies have relied on the reference genomes of two Bd-GPL strains, JEL423 and JAM81 (Joneson et al. 2011; Farrer et al. 2017). No contiguous genomes of Bd strains from any other lineage have been provided until now. Among the enzootic Bd lineages, Bd-BRAZIL has been previously compared with the Bd-GPL through comparative transcriptomics, regional distribution and virulence analysis (Becker et al. 2017; McDonald et al. 2020). Although strains from both lineages co-occur in South America, the Bd-GPL strains have been observed to be geographically unstructured and more infective towards amphibians in the Brazilian Atlantic Forest (Jenkinson et al. 2016; Greenspan et al. 2018).
Close relatives of Bd (e.g., Homolaphlyctis polyrhiza [Hp]) are saprophytic while the genus Batrachochytrium alone parasitizes amphibians (Joneson et al. 2011; Martel et al. 2013; Berger et al. 1998). Expansions in peptidase, including aspartyl-, M36 metallo-, and S41 serine-like peptidases, have been observed in Bd compared to Hp (Joneson et al. 2011; Farrer et al. 2017). A class of Crinkler Like Necrosis (CRN) genes, thought to be specific to Oomycetes, are also found in abundance in the Bd reference genome and relatives (Joneson et al. 2011; Sun et al. 2011; James et al. 2013). Chitin Binding Module 18 Domain (CBM18) containing proteins, implicated in protecting fungal pathogens against host chitinases, are similarly noted to be expanded in Bd (Abramyan & Stajich 2012; Farrer et al. 2017). Additionally, peptidases, CRNs and CBMs appear to be up-regulated during Bd infection compared to when the fungus is grown on media, further implicating their role in pathogenicity (Rosenblum et al. 2012; Ellison et al. 2017). Recently a newly described species, Polyrhizophydium stewartii (Ps) has been identified and classified as a closer saprophytic relative of Bd than Hp (Simmons et al. 2021; Amses et al. 2022). The discovery of Ps provides a resource to improve understanding of the transition towards pathogenicity in chytrids. Genomic comparisons between Ps and Bd will expand upon previous comparisons between Bd and saprophytic chytrids.
Pangenomes are valuable tools to elucidate the genomic variation within a population. The mechanisms that allow Bd-GPL to proliferate globally while strains from other lineages remain endemic are not understood. Competition between Bd lineages, differences in virulence between Bd-GPL and endemic lineages, and unique genomic recombination in Bd-GPL have been demonstrated (Belasen et al. 2022; Jenkinson et al. 2016; Farrer et al. 2011). Here we utilized Oxford Nanopore sequencing of three Bd-BRAZIL strains and the species Ps JEL0888 to aid in the establishment of a Bd pangenome. In particular, our assembly of Bd-BRAZIL strain CLFT044 suggests an improvement on previous assemblies, possessing three telomere-to-telomere scaffolds. Furthermore, the genome completeness estimates indicate that all three Bd-BRAZIL genomes are of similar quality to the previous assemblies, suggesting their utility in future genomic analysis. We present these new genomes to elucidate the genome content variability between Bd-GPL and Bd-BRAZIL strains.
Materials and Methods
DNA extraction
Three Bd strains from the Bd-BRAZIL lineage, CLFT044, CLFT067, and CLFT071, were grown on 1% Tryptone agar for seven days at 21°C. Zoospore and sporangia tissue was harvested from each culture by flooding with 1 mL sterile reverse osmosis (RO) water and scraping colonies with an L-spreader; the tissue was then pelleted at 6500g for seven minutes and flash frozen in liquid nitrogen. Ps JEL0888 was grown in PmTG broth (1 g peptonized milk, 1 g tryptone, 5 g glucose, 1 L distilled water) for 14 days at 23°C before the samples were centrifuged at 6500g for seven minutes to remove broth and flash frozen in liquid nitrogen. Cetyltrimethylammonium Bromide (CTAB) DNA extraction was performed on the frozen tissue samples from each isolate (Carter-House et al. 2020).
RNA extraction
Three biological replicates of three Bd-BRAZIL and two Bd-GPL strains were inoculated onto 1% Tryptone agar plates using 5 x 10^6 total active zoospores in 2mL of sterile RO water. Cultures were left to incubate under the same conditions as the DNA Bd tissue samples until active zoospores were visualized around every colony (5-7 days depending on the isolate). After incubation tissue samples were harvested with an L-spreader and pelleted at 6500g for seven minutes and flash-frozen in liquid nitrogen prior to RNA extraction. Because the samples were not filtered to remove sporangia, tissue samples consisted of both sporangia and zoospores. RNA was extracted from the tissue samples using TRIzol solution (Invitrogen, Mulgrave, VIC, Australia) under manufacturer’s protocol coupled with an overnight precipitation in isopropanol at −21°C to increase yields. Total RNA was sent to Novogene (Davis, CA) for poly A enrichment, cDNA synthesis and NovaSeq PE150 sequencing to achieve 6Gb data per sample.
DNA sequencing
gDNA from the three Bd-BRAZIL strains was sent to MiGS (SeqCenter) for Oxford Nanopore Technologies (ONT) sequencing to obtain 900 Mbp of reads (∼30X coverage) per strain for genome assembly. DNA from Ps JEL0888 was sequenced with the Oxford Nanopore MinION using the NBD104 barcoding kit and LSK109 ligation sequencing kit following manufacturer’s protocols, resulting in 696.915 Mb of reads (∼25X coverage).
Genome Assembly and Annotation
The ONT reads from all four samples were assembled de novo using Canu v2.2 (Koren et al. 2017) with an estimated genome size of 25 Mb. Canu provided scaffolds with telomeres but did not produce the most contiguous assemblies. Assembly was repeated with MaSuRCA v4.0.9 (Zimin et al. 2013), incorporating publicly available Illumina sequence data for each strain (O’Hanlon et al. 2018; Amses et al. 2022), which produced a more contiguous consensus assembly. The Canu assemblies were scaffolded against their respective MaSuRCA assemblies with RagTag v2.1.0 (Alonge et al. 2022) to achieve contiguous genomes with telomeres. The assemblies were polished by 10 iterations of Pilon v1.24 (Walker et al. 2014) run within AAFTF (Stajich & Palmer 2022) using publicly available Illumina sequence data (Rosenblum et al. 2013; Farrer et al. 2013; O’Hanlon et al. 2018; Amses et al. 2022; Clemons et al. 2023).
Annotation was performed using Funannotate v1.8.14 (Palmer & Stajich 2023), utilizing the RNAseq data from the three Bd-BRAZIL strains to increase the accuracy of gene predictions for those genomes. Briefly, this entailed sorting the scaffolds by size, masking the repetitive elements in the assemblies with RepeatMasker v4.1.4, training gene predictors on the Bd-BRAZIL assemblies with our RNAseq data, and performing functional annotation using default parameters. We downloaded the genome assemblies and annotations for three Bd-GPL strains, JEL423 (GCA_000149865), RTP6 (GCA_003595275.1) (Sumpter et al. 2018), and JAM81 (GCF_000203795) (Amses et al. 2022), to compare genomic content between BRAZIL and GPL strains.
We assessed the quality of our genome assemblies using QUAST v5.0.0 (Gurevich et al. 2013) and BUSCO v5.5.0 (Seppey et al. 2019) in genome mode against the fungi_odb10 gene sets. We compared the BUSCO and QUAST results with those from the reference and high-quality Bd-GPL genomes (supplementary table 1). We calculated telomere counts using pattern searching with find_telomeres.py (https://github.com/markhilt/genome_analysis_tools) to assess chromosome completeness for all assemblies. We used PHYling v2.0 (Stajich & Tsai) to generate a multi-gene alignment for phylogenetic analysis of Bsal, Bd, Hp, and Ps to confirm relatedness of strains and species. A phylogenetic tree was constructed with RAxML v8.2.12 (Stamatakis 2014) and rendered with ggtree v3.8.2 (Yu 2022).
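The two assembly checks described above, contiguity and telomere presence at scaffold ends, can be illustrated with a minimal Python sketch. This is not the code of QUAST or find_telomeres.py. The motif TTAGGG, the repeat threshold, and the search window are assumptions chosen for illustration only, and the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown symbols become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def telomere_ends(seq, motif="TTAGGG", min_repeats=3, window=100):
    """Return (start_has_telomere, end_has_telomere) for one scaffold.

    ASSUMPTIONS: the motif, min_repeats and window are illustrative
    defaults, not the settings used in the study. A tandem run of the
    motif (or its reverse complement) inside the first or last `window`
    bases counts as a telomere.
    """
    seq = seq.upper()
    forward = motif * min_repeats
    reverse = reverse_complement(motif) * min_repeats
    head, tail = seq[:window], seq[-window:]
    return (forward in head or reverse in head,
            forward in tail or reverse in tail)
```

A scaffold for which both values are true would be a candidate telomere-to-telomere chromosome under these assumptions.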
Transposable Element Content
We used RepeatModeler v2.0.4 (Flynn et al. 2020) and EDTA v2.1.0 (Ou et al. 2022) to generate a library of transposable elements in the six Bd genome assemblies, Ps, Hp, and the Batrachochytrium salamandrivorans (Bsal) strain AMP13/1 (GCA_002006685.2) (Wacker et al. 2023). We merged the libraries generated from the different assemblies and collapsed duplicate TEs using cd-hit v4.8.1 at 98% identity (Huang et al. 2010). We used this condensed TE library and RepeatMasker v4.1.5 (Smit et al. 2013-2022) to determine counts of LTR, LINE, and DNA TEs in the genomes. Unknown TEs were searched with BLASTN v2.14.1 (Altschul et al. 1997; Camacho et al. 2009) against the RepBase database (Bao et al. 2015) to classify them as LTR Copia, Gypsy, or DNA Type II transposable elements.
Synteny analysis between representative Bd-BRAZIL and Bd-GPL strains
Scaffold synteny was estimated between the 10 longest scaffolds in Bd-BRAZIL strain CLFT044 against the Bd-GPL JEL423 Sanger and RTP6 PacBio genomes using GENESPACE v1.2.0 (Lovell et al. 2022). The syntenic blocks between these three genomes were inferred using default settings, including Ps JEL0888 as the outgroup. The GENESPACE v1.2.0 riparian plot was constructed using JEL423 as the reference genome and flipping the orientation of CLFT044 scaffolds; scaffold_1, scaffold_2, scaffold_4, scaffold_5, and scaffold_8 which were inverted with respect to their RTP6 and JEL423 counterparts. We aligned the ONT DNA reads from CLFT044 back against the CLFT044 genome using minimap2 v2.24 (Li 2018) to confirm that reads supported merged scaffolds observed in CLFT044 but not JEL423 or RTP6.
Gene Family Variation between Bd and related species
We used Orthofinder v2.5.4 (Emms & Kelly 2019) as an initial step towards comparative genomics by identifying gene families unique to the Batrachochytrium genus (Bd and Bsal) but absent in their saprophytic relatives. This included the three Bd-BRAZIL ONT-hybrid genomes and Ps, the three public Bd-GPL genomes (JEL423, JAM81, and RTP6), the published PacBio Bsal genome AMP13/1, and the Illumina genome for Hp. JEL423 gene IDs belonging to gene families specific to the pathogenic chytrids Bd and Bsal were searched against the FungiDB database to identify Gene Ontology (GO) terms. Additionally, we used Orthofinder to identify gene content differences between the three Bd-GPL and three Bd-BRAZIL long-read genomes and thereby define the core and pangenome of Bd. Orthofinder visualizations were rendered with UpsetR v1.4.0 (Conway et al. 2017). We combined the species tree generated with PHYling and the Orthofinder results to assess gene family expansions and contractions with CAFE v5.0.0 (Mendes et al. 2021). CAFE visualizations were rendered with CafePlotter v.0.2.0 (https://github.com/moshi4/CafePlotter).
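The partition of gene families into core, accessory, and singleton sets can be sketched in a few lines of Python. The sketch assumes the orthogroup table has already been reduced to a mapping from each gene family to the set of strains that carry it. The function name and the rule that a lineage-specific family is an accessory family confined to one lineage are illustrative assumptions, not the exact procedure used in the study.

```python
def classify_orthogroups(presence, lineages):
    """Label each gene family as core, accessory or singleton.

    presence: dict mapping family ID -> set of strain names carrying it
    lineages: dict mapping lineage name -> set of strain names

    Returns a dict mapping family ID -> (category, lineage), where
    lineage is set only for accessory families found in two or more
    strains of a single lineage (an ASSUMED definition of
    "lineage specific") and is None otherwise.
    """
    all_strains = set().union(*lineages.values())
    labels = {}
    for family, strains in presence.items():
        strains = set(strains) & all_strains
        if not strains:
            continue  # family absent from every strain of interest
        if strains == all_strains:
            labels[family] = ("core", None)
        elif len(strains) == 1:
            labels[family] = ("singleton", None)
        else:
            owner = None
            for lineage, members in lineages.items():
                if strains <= members:
                    owner = lineage
            labels[family] = ("accessory", owner)
    return labels
```

Counting the labels per category then gives pangenome summary numbers of the kind reported for the six long-read Bd genomes.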
Presence Absence Variation (PAV) of Pathogenicity genes in Bd and other chytrids
We performed HMMsearch 3.3.2 with an e-value of 1e-15 for PFAMs PF02128, PF03572, PF00026, and PF00187 to count the number of M36, S41, ASP, and CBM18 proteins respectively in each annotated genome (Eddy 2011). HMMsearches using the Crinkler necrosis (CRN) PFAM PF20147 were unsuccessful at detecting copies in Bd. We used HMMbuild on the previously identified CRN proteins in JEL423 to construct an hmm profile to screen and count CRN proteins in Bd strains (Farrer et al. 2017). We constructed a heatmap for the protein family counts using the R package pheatmap v.1.0.12 (Kolde 2019).
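The counting step can be illustrated with a short Python sketch that tallies proteins per Pfam accession from the tabular output of hmmsearch (the `--tblout` format, in which the first column is the target sequence, the fourth is the profile accession, and the fifth is the full-sequence E-value). The function name is hypothetical, and the sketch assumes the table is supplied as text; it is not the script used in the study.

```python
from collections import defaultdict


def count_domain_proteins(tblout_text, max_evalue=1e-15):
    """Count distinct proteins per Pfam accession in hmmsearch --tblout text.

    Lines starting with '#' are comments. The accession version suffix
    (for example the '.1' in 'PF02128.1') is stripped so that counts are
    keyed by the bare Pfam ID. Only hits with a full-sequence E-value at
    or below max_evalue are kept.
    """
    proteins = defaultdict(set)
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, accession, evalue = fields[0], fields[3], float(fields[4])
        if evalue <= max_evalue:
            proteins[accession.split(".")[0]].add(target)
    return {pfam: len(names) for pfam, names in proteins.items()}
```

Using a set per accession means a protein matched more than once by the same profile is still counted as one family member.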
Since homologous genes are likely to be syntenic with homologs from closely related strains, we used Cblaster v1.3.18 (Gilchrist et al. 2021) with a minimum percent identity of 80% and two flanking genes on the 5’ and 3’ ends to identify the syntenic homologs of every M36 and CBM18 across the Bd strains. Genes were required to share the same four flanking genes, two upstream and two downstream of the M36 or CBM18, to be considered homologous. This analysis identified the non-redundant set of orthologous CBM18 or M36 genes across all strains, preferentially using the JEL423 copies given its status as a primary reference genome. If orthologs were missing in JEL423, we represented them by a copy from another strain. We used MUSCLE v5.1 (Edgar 2004) to align the non-redundant M36 and CBM18 encoding genes and constructed phylogenetic trees for both gene sets with IQTREE2 v2.2.2.6 (Minh et al. 2020) using the ModelFinder function, which selected the VT+R10 model for CBM18 and the TWM+F+R5 model for M36, with 1000 SH-like bootstrap replicates (Kalyaanamoorthy et al. 2017). We included the M36 and CBM18 genes from Hp and Ps as outgroups to root the phylogenies. Phylogenetic visualizations were rendered with the R package ggtree v3.8.2 (Yu 2022).
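The flanking-gene criterion can be sketched as follows. This is an illustration of the rule only, not of Cblaster itself: it ignores the percent identity filter and assumes each scaffold is given as an ordered list of (gene ID, gene family) pairs, with family assignments taken from a prior orthology step. The flanks are compared as unordered sets so that an inverted scaffold still matches. All names are hypothetical.

```python
def flank_families(scaffold, index, n=2):
    """Families of the n genes upstream and n genes downstream of a gene.

    scaffold: ordered list of (gene_id, family_id) tuples
    Returns a frozenset, or None when the gene sits too close to a
    scaffold end to have n neighbours on both sides.
    """
    if index < n or index + n >= len(scaffold):
        return None
    neighbours = scaffold[index - n:index] + scaffold[index + 1:index + 1 + n]
    return frozenset(family for _gene, family in neighbours)


def syntenic_matches(query, subject, target_family, n=2):
    """Pair genes of target_family whose flanking families are identical.

    Returns a list of (query_gene_id, subject_gene_id) tuples.
    """
    pairs = []
    for i, (q_gene, q_family) in enumerate(query):
        if q_family != target_family:
            continue
        q_flanks = flank_families(query, i, n)
        if q_flanks is None:
            continue
        for j, (s_gene, s_family) in enumerate(subject):
            if s_family != target_family:
                continue
            if flank_families(subject, j, n) == q_flanks:
                pairs.append((q_gene, s_gene))
    return pairs
```

Genes left unpaired by this test would be candidates for strain-specific copies, subject to checking for assembly gaps at scaffold ends.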
CBM18 domain-containing proteins have been reported to contain multiple domains per gene in Bd strain JEL423 (Abramyan & Stajich 2012). The diversity of JEL423’s CBM18 repertoire includes variable counts of Tyrosinase and Deacetylase domains, as well as Lectin-like CBM18 proteins without additional domains (Abramyan & Stajich 2012). We used HMMSCAN to catalog the variation in CBM18 domain copy number between CBM18 homologs in the Bd Long-read genomes (Eddy 2011).
RNAseq Read Mapping
Based on the genomic diversity we observed between Bd strains, we questioned the efficacy of using a single reference genome for transcriptomic analysis of different Bd strains. To assess whether a Bd-BRAZIL reference genome increases recovery of Bd-BRAZIL transcripts, we aligned RNAseq reads from four Bd-BRAZIL and two Bd-GPL strains to the Bd-BRAZIL CLFT044 assembly and to the Bd-GPL genome for JEL423 (GCA_000149865) using HISAT2 v2.2.1 (Kim et al. 2019). All sequence reads were mapped to the indexed genomes. Raw read counts were generated with featureCounts from Rsubread (Liao et al. 2019) and TPM values were calculated with edgeR (Robinson et al. 2010). Given the genetic variation between Bd strains, we tested how much the reference genome matters when calculating gene expression in Bd RNAseq data. We aligned the three replicates of PE RNAseq reads from CLFT044 against the CLFT044 genome and against the JEL423 genome to calculate the ratio of TPMs when aligning to CLFT044 (self) over JEL423 (ref). We focused this analysis on single-copy orthologous gene families (genes determined by Orthofinder to be single copy in both genomes) to avoid ambiguity caused by comparing distant orthologs. Using a global pairwise sequence alignment of the two orthologous coding sequences (Pearson 2000), we scored sequence pairs for their alignability and then evaluated sequence divergence, number of secondary blast hits, gap openings, and mismatch differences between CLFT044 and JEL423 to test whether length or sequence differences between homologous genes explain the variation in TPM calculations.
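The self-over-ref comparison can be made concrete with a small Python sketch. It uses the standard definition of TPM (read count divided by gene length, rescaled so that values sum to one million) rather than the edgeR API, and it applies the ratio cut-offs of 1.2 and 0.79 reported in the Results. The function names and the handling of zero values are assumptions made for illustration.

```python
def tpm(counts, lengths):
    """Transcripts per million from raw counts and gene lengths (bp).

    counts, lengths: dicts keyed by gene ID. Each count is divided by
    the gene length, and the resulting rates are rescaled to sum to 1e6.
    """
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def classify_ratio(self_tpm, ref_tpm, upper=1.2, lower=0.79):
    """Compare TPM from self alignment with TPM from the reference.

    ratio = self / ref. A ratio above `upper` means the reference
    alignment under-represents the gene; a ratio below `lower` means it
    over-represents the gene. A zero reference TPM is reported as
    'undefined' (an ASSUMED convention).
    """
    if ref_tpm == 0:
        return "undefined"
    ratio = self_tpm / ref_tpm
    if ratio > upper:
        return "under-represented in ref"
    if ratio < lower:
        return "over-represented in ref"
    return "consistent"
```

Because each genome has its own gene lengths and count totals, TPM is computed separately for the self and ref alignments before the per-gene ratio is taken.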
We tested whether genomic differences have inflated the number of differentially expressed genes (DEGs) between Bd-BRAZIL and Bd-GPL strains. A previous study reported DEGs between the Bd-BRAZIL and Bd-GPL lineages when RNA from two Bd-BRAZIL strains (CLFT044 and CLFT001) and four Bd-GPL strains (CLFT023, CLFT026, JEL410, and JEL422) was aligned to the reference genome strain JEL423 (McDonald et al. 2020). We performed HISAT2 alignments of the RNAseq data from that study to the reference genome of strain JEL423 and, separately, to CLFT044. We used featureCounts and DESeq2 v1.4.2 (Love et al. 2014) in R to re-calculate the number of differentially expressed genes (log2 fold change > 1.5 and Bonferroni-adjusted p-value < 0.05) between the Bd-BRAZIL and Bd-GPL lineages. We intersected the identities of differentially expressed genes with the identities of Bd-GPL/Bd-BRAZIL specific genes from Orthofinder and the identities of the single-copy orthologous genes with reference-dependent transcript counts. Images depicting this intersection were rendered using the R package ggvenn v. 0.1.10 (Yan).
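The DEG thresholding and the subsequent intersection can be sketched in Python as below. This is not DESeq2: it assumes per-gene log2 fold changes and raw p-values are already available, applies a plain Bonferroni correction, and treats the fold-change cut-off as an absolute value, which is an assumption about how the threshold was applied. Function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of
    tests and cap the result at 1.0."""
    n_tests = len(pvalues)
    return {gene: min(1.0, p * n_tests) for gene, p in pvalues.items()}


def call_degs(log2fc, pvalues, lfc_cutoff=1.5, alpha=0.05):
    """Return the set of genes passing both thresholds.

    ASSUMPTION: the fold-change cut-off is applied to the absolute
    value so that up- and down-regulated genes are both retained.
    """
    adjusted = bonferroni(pvalues)
    return {gene for gene, lfc in log2fc.items()
            if abs(lfc) > lfc_cutoff and adjusted[gene] < alpha}


def explained_by_genome(degs, lineage_specific, reference_dependent):
    """DEGs that coincide with lineage-specific genes or with genes whose
    transcript counts depend on the reference genome."""
    return degs & (set(lineage_specific) | set(reference_dependent))
```

Dividing the size of the returned set by the total number of DEGs gives the fraction of differential expression calls that could reflect genomic differences rather than regulation.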
Results
Assembly quality of Bd-BRAZIL genomes is comparable to that of the reference genomes
The Bd-BRAZIL genomes were assessed to be of similar quality to the three published GPL genomes in measures of total contig count, N50, and BUSCO completeness (supplementary table 1). After assembly, the Bd-BRAZIL genome assemblies comprised 79, 86, and 85 contigs for CLFT044, CLFT067, and CLFT071 respectively, with total lengths between 22 Mb and 26 Mb. BUSCO assessment of the assemblies gave similar scores, with an average completeness of 89.9% for the Bd-BRAZIL genomes against the fungi_odb10 gene sets, slightly higher than the reference genomes JEL423 and JAM81. The assembly of Ps JEL0888 contained 291 scaffolds with a BUSCO completeness of 81.4%. The Ps genome assembly was slightly larger than all the Bd assemblies at 31 Mb. The Hp assembly (GCA_000235945.1) was far less contiguous and complete than those of the other chytrids, being assembled exclusively from short-read sequencing data. The Hp assembly possesses 11986 scaffolds and a BUSCO completeness of 80.9%. Furthermore, its largest scaffold was only 227.1 kb long, far shorter than the largest scaffolds of the other chytrid assemblies.
Bd is expanded in genome size and TE content relative to Hp but reduced compared to Ps
Bd-BRAZIL and Bd-GPL strains were overall similar in genome size, genic space, and TE content (Figure 1). The Bd-BRAZIL strain, CLFT044 alone possesses an abundance of LTR elements and DNA transposons compared to the other strains. Our results support previous findings that Hp has a reduced genome size and TE content compared to Bd (Farrer et al. 2017).
The gene counts found in all Bd strains were higher than the count in Hp, however Ps possesses a greater number of genes than any of the Bd strains. All other species were dwarfed in gene count compared to Bsal while overall gene counts were mostly similar between the Bd genomes.
TE expansion has been suggested as a driving force in the acquisition of pathogenicity genes in Batrachochytrium (Wacker et al. 2023). We identified expansions in LTR, LINE, and DNA transposable elements in Bd and Bsal compared to Hp. Ps was expanded in LTR and LINE elements compared to Bd, however its genome contains fewer DNA TEs than any of the Bd strains. All Bd strains are expanded in DNA TEs compared to the other species including Bsal.
Synteny is conserved along the 10 largest scaffolds between CLFT044 and the GPL reference genomes
Conserved syntenic regions between the longest 10 scaffolds in Bd-BRAZIL strain CLFT044 were compared with their homologs in Bd-GPL strains JEL423 and RTP6 (Figure 2). The 10 largest scaffolds in CLFT044 were present in the RTP6 and JEL423 assemblies. Scaffold_5 and Scaffold_9 appear to be split in CLFT044, being combined as DS022301.1 and QUAD01000002.1 in JEL423 and RTP6 respectively. Scaffold_5 contains one telomere on the 5’ end while Scaffold_9 does not possess telomeric sequences. This suggests completeness on the 5’ end of this region while the center and 3’ ends of the chromosome were not contiguously assembled.
Scaffold_2 in CLFT044 represents a more contiguous assembly, being a merge of the two contigs DS022302.1 and DS022309.1 in JEL423. ONT reads from CLFT044 spanned the entire Scaffold_2 region, indicating the validity of this merged scaffold compared to other assemblies.
Scaffold_3, Scaffold_4, and Scaffold_6 in CLFT044 all contained forward and reverse telomeres based on the telomere search results suggesting their status as complete chromosome assemblies. Although Scaffold_4 in CLFT044 is likely a complete chromosomal assembly, its homolog extends farther on the 5’ end in JEL423 and RTP6. This difference in length could be due to genome size differences between strains rather than assembly quality.
Bd-BRAZIL and Bd-GPL possess many lineage specific gene families
Analysis of shared orthogroups between Bd and closely related species (Figure 3A) reveals 348 gene families that are distinct to the Batrachochytrium genus (all Bd strains and Bsal) but absent in the saprophytic chytrids. Additionally, 435 gene families were unique to Bd, being present in all Bd strains but absent in the other chytrid species. GO analysis of the JEL423 genes found in the pathogen-specific gene families indicates pathogen-specific expansions in Metallopeptidase gene families (Supplemental figure S2). Our CAFE analysis revealed significant expansions of orthogroups between Hp, Ps, and Bd (Supplemental Figure S3). We observed 148 expanded and 147 contracted gene families in Bd compared to Ps. Additionally, the Bd-BRAZIL lineage demonstrated 153 expanded and 37 contracted gene families, while 63 expansions and 22 contractions were observed in Bd-GPL.
Orthofinder results between Bd strains indicate that there are 6278 core gene families in the Bd pangenome (Figure 3B). We identified 1,934 accessory gene families that were present in two or more strains and 160 singletons present in single strains. Among the accessory gene families, we find 202 to be distinct to the Bd-BRAZIL strains while 172 are unique to Bd-GPL. The singletons are unequally divided among strains with strains JEL423 and JAM81 having the highest number of unique gene families (51 and 50 respectively) while the remaining strains each possessed only 10-30 singleton gene families. We identified a gene family of putative Meiotically up-regulated gene 113 (mug113) proteins found in other fungi and in the Bd-BRAZIL lineage but absent in Bd-GPL. Among the Bd-GPL specific gene families were many proteins of unknown function that were absent in all Bd-BRAZIL strains. Interestingly the reference genome strains JEL423 and JAM81 lacked 105 gene families that are present in all the other Bd genomes (Supplementary figure S4).
Pathogenicity genes vary in count and sequence between Bd strains and other chytrids
Our HMMsearch analysis of the five pathogenicity gene families revealed variation in copy number and sequence diversity among Bd strains and sister species (Figure 4). Among the putative pathogenicity genes, all Bd strains were expanded in copy number for S41 Peptidase, ASP protease, CBM18, and CRN relative to saprophytic chytrids. CRN had the highest observed expansion in Bd compared to its saprophytic relatives, ranging from one to six copies in Hp and Ps and from 27 to 108 copies in the Bd strains. Although we observed variation in pathogenicity gene counts between Bd strains, the genes were not consistently expanded in Bd-GPL over Bd-BRAZIL. We compared the pathogenicity gene counts detected in long-read genomes with the counts found in annotated Illumina-only short-read genomes for the same strains to determine whether sequencing technology plays a significant role in capturing pathogenicity gene diversity (Supplementary figure S5). While counts for the SWEET and Adenylate kinase gene families were consistent between Illumina and long-read genomes, the pathogenicity genes were consistently under-represented in the Illumina assemblies compared to the long-read genomes for the same strain.
Using cblaster we resolved M36 and CBM18 genes into orthologous loci using flanking syntenic genes. To identify core, variable, and singleton M36 and CBM18 genes in the Bd pangenome we allowed co-located homologous sequences found in multiple strains to be represented by an ortholog from a single strain. We searched with cblaster to find these representative orthologs in the Bd pangenome and determine the distribution of core and variable CBM18s and M36s in the Bd pangenome. If the core or variable M36/CBM18 was found in JEL423, the representative was named with the JEL423 copy, otherwise it was named for the representative from another Bd strain. Prior to collapsing duplicate orthologs there were a total of 210 M36 and 91 CBM18 copies in all six Bd strains. Collapsing the orthologous copies from separate strains resulted in 59 and 20 unique orthologs of M36 and CBM18 respectively.
Phylogenetic analyses of the M36 orthologs revealed ancestral and Bd-exclusive clades of M36 (Figure 5A). Despite similar counts of M36 genes in Ps and Bd, we observed sequence-level variation that segregated the Bd M36 orthologs from the Ps orthologs. We defined the ancestral M36 clade as the clade of M36 orthologs that contained members found in all genomes and species, while the Bd-specific clade contains orthologs found only in Bd. Furthermore, we identified a previously unknown clade of M36 genes unique to Ps. Bsal’s M36 repertoire is expanded compared to Bd, with two Bsal-specific M36 clades sister to the Bd-specific clade (Supplementary figure S6). Bsal and Bd possess similar counts of M36 genes in the ancestral M36 clade, with three copies in Bsal and six in Bd.
We searched with cblaster for the newly identified M36 ortholog clusters against the six Long-read Bd genomes to establish the conservation of these M36 orthologs across the Bd strains (Figure 5B). We identified 15 core M36 orthologs conserved in all six genomes. While we did not identify any Bd-GPL specific M36 orthologs we discovered two Bd-BRAZIL specific M36 clusters that were present in two-three Bd-BRAZIL strains but absent in all Bd-GPL. Additionally we reveal the presence of nine M36 singletons, present in only single strains. We analyzed the region of MT418_006102, a singleton M36 from Bd-BRAZIL strain CLFT044, and compared its syntenic structure to its closest relative BDEG_24855, a core M36 from the ancestral clade. We discovered that the flanking region of MT418_006102 was duplicated and downstream of the BDEG_24855 syntenic cluster in CLFT044 (Supplementary figure S7).
Annotation error has likely contributed to the M36 count variation between Bd strains. Cblaster analysis revealed the JEL423 gene BDEG_27858 to be orthologous with a pair of tandem M36 genes in all other strains (Supplementary figure S8). The combined length of the M36 pair in other strains was equal to the length of BDEG_27858 further suggesting the genes were erroneously merged during annotation of the reference genome. One of the M36 loci in this region was likewise fragmented in JAM81 resulting in three M36 genes rather than two.
We identified variation in the domain content of CBM18 genes between Bd, Ps, and Hp (Figure 6A). Bd and Hp both possess only one copy of Tyrosinase containing CBM18 proteins while Ps contains three that are co-localized, possibly a result of tandem duplication. Lectin-like CBM18 proteins were expanded in Bd compared to either of its relative species, with only two present in Ps, none in Hp, and 10-14 orthologs in each Bd strain. Additionally we discovered differences in CBM18 Domain counts within homologs between the Bd strains (Figure 6B). We found that most CBM18 homologs (with the exception of BDEG_23733) varied in CBM18 domain count between different strains. One example displaying such diversity was BDEG_21734 which contains five domain copies in all Bd-GPL strains and four per Bd-BRAZIL. Furthermore, homologs of BDEG_20255 which, although present and syntenic in all Bd genomes, varied significantly in CBM18 domain counts with three to five domains per strain (Supplementary figure S9).
Aligning transcripts from Bd-BRAZIL strains to JEL423 may under or over-estimate gene expression for some genes
Our HISAT2 alignment revealed that transcripts from Bd-BRAZIL strains aligned back to the CLFT044 genome at an average rate of 97.52% and to the JEL423 genome at 96.85%. We found that overall, the TPM ratios for CLFT044 transcripts when aligned to CLFT044 vs. JEL423 varied little, with the average TPM ratio at approximately 1 (0.993) (Figure 7B). Despite the largely consistent average, we found that the CLFT044 transcripts from 145 single-copy genes are under-represented (TPM ratio > 1.2) and 279 are over-represented (TPM ratio < 0.79) when aligning to the JEL423 genome (Figure 7C). Gene length variance, percent identity, and number of secondary blast hits between JEL423 and CLFT044 were significantly correlated with TPM variance; however, we were unable to determine a conclusive cause for this variance (Supplementary figure S10).
Differential expression analysis using RNAseq data from a previous study (McDonald et al. 2020) revealed 1083 DEGs between the Bd-BRAZIL and Bd-GPL lineages using JEL423 as the reference genome. Among these DEGs, 89 genes were specific to the Bd-GPL lineage and 43 were among the single-copy orthologous genes identified as over- or under-represented when aligning RNA to the JEL423 reference genome (Supplementary figure 11A). Aligning the same RNAseq data to the CLFT044 genome resulted in 788 DEGs, of which 31 were Bd-BRAZIL specific genes and 49 were genes with reference-dependent transcript counts (Supplementary figure 11B).
Discussion
Bd is the causative agent of chytridiomycosis in amphibian populations around the world, including North and South America (Longcore et al. 1999). However, the pathogen is not a monolith, as strains from different lineages exhibit variable distribution and virulence on their amphibian hosts (Dang et al. 2017; Rosenblum et al. 2013). Little is known about the functional differences that drive pathogenicity variance between the Bd-GPL and Bd-BRAZIL lineages; therefore these new Bd-BRAZIL genomes represent a valuable dataset through which to examine this variance. Although genome expansions and TE invasions have been implicated as a driving force in hypervirulence between some strains of plant pathogenic fungi (Grandaubert et al. 2014), we did not detect an abundance of TEs in Bd-GPL strains compared to Bd-BRAZIL. Similarly, our counts of pathogenicity genes reveal no clear case of Bd-GPL consistently possessing more members of a pathogenicity gene family, although there is substantial copy number variation of pathogenicity genes between Bd strains.
We observed sequence-level variation between Bd and Ps M36 genes despite overall similarity in M36 counts between the two species. Phylogenetic analysis revealed an ancestral clade of M36 with copies from Bd, Ps, Bsal, and Hp, as well as a clade of M36 genes exclusive to Bd. While the ancestral M36 genes were universally present among the Bd-BRAZIL and Bd-GPL lineages, the Bd-exclusive M36 genes were more variable in presence/absence among the Bd genomes. Furthermore, we detected 23 M36 genes from Bd-BRAZIL genomes without homologs in the reference genome JEL423, illustrating the breadth of gene family diversity that is lost when relying upon a single reference genome. Two of the newly discovered M36 loci in Bd were co-localized and shared high sequence similarity with conserved M36 genes, suggesting they may have arisen from tandem duplication.
Along with overall gains and losses in pathogenicity gene content between Bd strains, we observed diversity in domain counts within homologs of the CBM18 protein family. The phenomenon of domain gains and losses within a single protein is rare, yet has been previously reported in other systems (Prakash & Bateman 2015). In one CBM18 ortholog (BDEG_21734) we note a consistent domain gain from four to five domains between Bd-GPL and Bd-BRAZIL strains. It is possible that such changes in protein structure affect the functions of these homologs; however, this cannot be confirmed without rigorous functional analysis.
Studies have focused on the genomic repertoire of the reference genomes JEL423 and JAM81; however, we show that there is a breadth of genomic diversity that is lost when relying on a single reference genome. Previous work on gene expression variance between Bd-BRAZIL and Bd-GPL strains has been conducted by aligning transcripts to the JEL423 genome, indicating an up-regulation of some pathogenicity genes in Bd-GPL with respect to Bd-BRAZIL strains (McDonald et al. 2020). Our analysis of CLFT044 transcripts aligned against the CLFT044 and JEL423 genomes suggests that, although estimates are mostly accurate, transcription will likely be over- or under-estimated for many genes when reads are aligned to a reference from a different lineage. The results of our differential expression analysis between Bd-BRAZIL and Bd-GPL lineages suggest that genomic differences (lineage-specific gene families and genes with reference-dependent transcript counts) could explain ∼12.8% of the DEGs identified. Additionally, we determined that aligning RNA from Bd-BRAZIL isolates to the JEL423 assembly will recover ∼2% fewer reads than aligning to CLFT044. We therefore suggest establishing a candidate reference genome for every Bd lineage so that future transcriptomic comparisons can be assessed more accurately.
While our analysis elucidated a partial pangenome between Bd-BRAZIL and the hyper-pathogenic Bd-GPL strains, it does not include representatives from the Bd-CAPE and Bd-ASIA lineages. Although there are Illumina genomes for strains from these lineages, high-quality long-read assemblies are currently unavailable, and we did not have access to those strains for deep sequencing. Additionally, Illumina Bd assemblies were not comparable to the long-read genomes, as they did not capture the full diversity of multi-copy gene families such as pathogenicity genes. Future studies including high-quality long-read genomes from all lineages will improve upon the Bd pangenome analysis that we report here. We hope these genomic resources will help future exploration of genomic variation in Bd, potentially elucidating the mechanisms that make Bd-GPL a more globally successful pathogen than the other lineages.
Data Availability
The primary Nanopore and Illumina DNA sequencing data are deposited under BioProjects PRJNA987700 (Polyrhizophydium stewartii JEL0888), PRJNA987741 (Batrachochytrium dendrobatidis CLFT067), PRJNA821523 (Batrachochytrium dendrobatidis CLFT044), and PRJNA913953 (Batrachochytrium dendrobatidis CLFT071). RNA sequencing data are deposited under the accession numbers GSE253912 and GSE246809. Genome assemblies for Bd strains CLFT044, CLFT067, and CLFT071 are deposited under the accession numbers GCA_036783925.1, GCA_036289345.1, and GCA_029704095.1, respectively.
Acknowledgements
JES is a Fellow in the CIFAR program Fungal Kingdom: Threats and Opportunities. The work was partially supported by a catalyst grant from CIFAR, CIFAR fellowship funds, and U.S. Department of Agriculture, National Institute of Food and Agriculture Hatch project CA-R-PPA-211-5062-H. The Gordon and Betty Moore Foundation Award #9337 (10.37807/GBMF9337) to Lilian Fritz-Laylin (PI), Timothy Y. James, and Jason Stajich supported Mark Yacoub. Genome assembly and annotation were performed on the IIGB High-Performance Computing Cluster supported by NSF DBI-1429826, DBI-2215705, and NIH S10-OD016290 grants. We thank Dr. Timothy Y. James and the culture contributors of the Collection of Zoosporic EuFungi of Michigan (CZEUM) for providing the Bd-BRAZIL and Ps isolates used in this study. We would also like to thank Dr. Timothy Y. James, Dr. Cassie Ettinger, Dr. Tania Kurbessoian, Dr. Jessica Huang, Kian Kelly, and Julia Adams for helpful suggestions on this manuscript.
Footnotes
We have performed an analysis to enumerate expanding and contracting gene families among Bd, Ps, Hp, and Bsal. Additionally, we include the accession numbers for the Bd-BRAZIL genomes sequenced as part of this study.