Abstract
Aphelenchoides besseyi is a plant-parasitic nematode (PPN) in the family Aphelenchoididae that is capable of infecting more than 200 plants. A. besseyi is also a species complex, with strains exhibiting varying pathogenicity to plants. We present the genomes and annotations of six Aphelenchoides species, four of which belong to the A. besseyi species complex. Most Aphelenchoides genomes are 44.7–47.4 Mb in size and are among the smallest in clade IV. The exception is A. fujianensis, whose genome, at 143.8 Mb, is the largest. Phylogenomic analysis successfully delimited the species complex into A. oryzae and A. pseudobesseyi and revealed a reduction of transposable elements in the last common ancestor of Aphelenchoides. Synteny analyses between reference genomes indicated that the three chromosomes of A. besseyi were derived from fission and fusion events. A systematic identification of horizontal gene transfer (HGT) genes across 27 representative nematodes allowed us to identify two major episodes of acquisition, corresponding to the last common ancestors of clade IV and of the major PPNs, respectively. These genes were mostly lost and differentially retained between clades or strains. Most HGT genes were acquired from bacteria, followed by fungi, and some from plants, an origin that was especially prevalent in Bursaphelenchus mucronatus. Our results provide a comprehensive understanding of the origins of horizontally transferred genes in nematodes.
Introduction
The ability to parasitise plants has evolved in the phylum Nematoda on at least four occasions1,2. The major plant parasites belong to the Aphelenchoididae and Parasitaphelenchidae families, which make up the Aphelenchoidea superfamily, and to the order Tylenchida, all of which are clade IV nematodes3; these plant parasitic nematodes (PPNs) collectively cause agricultural damage of over US$80 billion worldwide each year4. Root-knot nematodes in the genus Meloidogyne cause the majority of these losses and were the first PPNs to have their genomes sequenced5, followed by the pinewood nematode Bursaphelenchus xylophilus6,7, the potato cyst nematode Globodera pallida8, the soybean cyst nematode Heterodera glycines9 and others10,11. Comparing these genomes has yielded insight into several adaptations that allow PPNs to parasitise plants. Examples include effectors such as carbohydrate-active enzymes (CAZymes), which are known to be secreted by PPNs and are hypothesised to degrade or modify the composition of different plant structural tissues12,13. Some of these PPN-specific genes are known to have been acquired from bacteria or fungi through horizontal gene transfer (HGT)14, giving nematodes the ability to adapt to different environments14. Although numerous HGT genes have been identified and documented in different nematodes, research on when they were acquired, how they were subsequently maintained and why their copy numbers differ has been restricted to a few PPN clades15.
Currently, the only major groups containing plant parasitic nematodes that lack a reference genome are Trichodoridae and Aphelenchoididae. Of particular interest is Aphelenchoides besseyi, a foliar nematode that infects almost 200 plants in 35 genera16. This nematode is 10 mm in body size, has a life cycle of around 10 to 12 days and can reproduce in extreme environments, making it hard to eliminate. Better known as the rice white tip nematode, A. besseyi infects important agronomic crops such as rice, soybeans and strawberries17,18, causing necrosis and distortion of its hosts' leaves16,18,19. The nematode has reportedly been responsible for crop losses of up to 60% in some cases20,21 and was listed among the top ten plant parasitic nematodes in a recent review22. Despite the economic damage these parasitic nematodes inflict, particularly in Asia, little is known about the basic biology, genetic diversity or evolution of A. besseyi and other Aphelenchoididae members. A. besseyi isolated from different hosts have been reported to differ in pathogenicity. For instance, populations of A. besseyi isolated from strawberries were unable to parasitise rice21, whereas populations from bird's-nest fern can reproduce on both rice and strawberries19. Despite their nearly indistinguishable morphology, we previously identified copy number variation in genes encoding cell-wall-degrading enzymes, including glycosyl hydrolase family 5 (GH5) and GH45 cellulases, between A. besseyi of different host origins23. An 18S phylogeny unambiguously separated the strains isolated from rice and fern, suggesting that A. besseyi might be a species complex, and other studies have likewise reported variation in different molecular markers among A. besseyi isolates from different hosts18,24. Subbotin et al. recently used a combination of molecular markers (28S, ITS and the mitochondrial COI gene)17 to reclassify foliar nematodes into three separate clades: A. besseyi, isolated mainly from strawberries; A. oryzae, mainly from rice; and A. pseudobesseyi, from wood fern, suggesting that this species complex may be well differentiated at the genome level. From an evolutionary perspective, A. besseyi is also interesting because its primitive plant parasitism was a relatively recent evolutionary adaptation25.
In this study, we sequenced and annotated the genomes of four strains of the A. besseyi species complex isolated from different plants, which we later designated as A. pseudobesseyi and A. oryzae, and of two other species in the Aphelenchoididae family (Aphelenchoides bicaudatus and Aphelenchoides fujianensis). We compared the proteomes of these six Aphelenchoididae members with those of 21 other representative nematodes to delimit species relationships and investigate their gene family dynamics. We identified synteny with representative nematodes and inferred rearrangement events to determine how the three chromosomes of A. besseyi evolved. The availability of the Aphelenchoides assemblies allowed us to systematically identify genes acquired by horizontal gene transfer in nematode genomes. By inferring the evolutionary origins of these HGT genes, we identified historical HGT events that shaped nematode evolution. The major event occurred in the last common ancestor of clade IV nematodes and may have contributed to the early adaptation of these nematodes.
Results
Genome assemblies and annotations of six Aphelenchoides species
We sequenced and assembled the genomes of six nematodes in the Aphelenchoides genus (four A. besseyi, one A. bicaudatus and one A. fujianensis). These were chosen to represent the Aphelenchoididae family and A. besseyi strains isolated from three plant hosts (supplementary table S1), so that relationships within the species complex could be delimited. For each species, an initial assembly was produced from either 70–148X Oxford Nanopore or 113–422X PacBio reads using the Flye assembler26 and further polished using Illumina reads (supplementary table S2). Among the A. besseyi assemblies, the VT strain isolated from tape grass Vallisneria spiralis (hereafter APVT) had the highest assembly quality (N50 = 5.4 Mb). The contigs of this strain were further scaffolded with 150X Hi-C reads using the Juicer program27 (supplementary fig. S1), yielding a final assembly of 44.7 Mb (N50 = 16.9 Mb). More than 99% of this assembly was contained in three scaffolds, presumably corresponding to the three chromosomes28 (2n = 6). Five Aphelenchoides assemblies ranged from 44.7 to 47.4 Mb (N50 = 12.2–17.8 Mb; supplementary table S3), whereas the sixth assembly (A. fujianensis) was 143.8 Mb (N50 = 553 kb; supplementary table S3) and was estimated to be triploid29 (supplementary fig. S2). Although absent from the assemblies, the telomere motif TTAGGC was identified at low coverage in the reads of A. pseudobesseyi (Supplementary Info), consistent with the motif in the sister group B. xylophilus (supplementary table S4) and indicating the presence of telomeres in these species.
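The assembly statistics above are reported as N50: the length such that contigs or scaffolds of that length or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the function name and the toy lengths are illustrative and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (bp).

    Sequences are sorted longest first and their lengths accumulated; the N50
    is the length at which the running total first reaches half the assembly.
    """
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


# Hypothetical three-chromosome assembly plus one small unplaced contig.
print(n50([17_000_000, 15_000_000, 12_000_000, 400_000]))  # 15000000
```

In a chromosome-scale assembly a handful of scaffolds hold almost all bases, so the N50 is simply the length of one of those scaffolds.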
Using the proteomes of Bursaphelenchus xylophilus and Caenorhabditis elegans, and the transcriptomes of pooled worms of each species, as evidence, we predicted 11,701 to 12,948 protein-coding genes in the six Aphelenchoides species with the MAKER2 pipeline30 (supplementary table S3). With the exception of A. fujianensis, these species had fewer protein-coding genes than Tylenchida nematodes (12,762 to 19,212) and free-living nematodes (20,184 to 20,992). The completeness of the annotated genes was estimated at 76.4–81.3% by BUSCO assessment, lower than that of Bursaphelenchus species (83.0–89.4%) but higher than that of Tylenchida nematodes (59.8–73.8%). The lower BUSCO completeness in Aphelenchoides was likely clade-specific, as re-annotation of the APVT strain with models trained on 975 manually curated genes gave a similar score of 78.2%. Between 66.5% and 71.0% of genes in Aphelenchoides could be assigned at least one domain from the Protein family (Pfam) database31. In addition, orthologous groups were inferred from the proteomes of the six Aphelenchoides and 21 other nematodes using OrthoFinder32. In Aphelenchoides other than A. fujianensis, 78.5–85.4% (A. fujianensis: 48.7%), 69.4–76.9% (A. fujianensis: 42.8%) and 87.5–98.7% of genes were orthologous to B. xylophilus, C. elegans and at least one other nematode species, respectively, suggesting that the reduced proteome of most Aphelenchoides consists mainly of genes conserved among nematodes.
Phylogenomics delimit species complex of Aphelenchoides besseyi
To investigate the evolution of plant-parasitic nematodes and the relationships among members of the A. besseyi species complex, a maximum-likelihood phylogenetic tree was constructed from 74 low-copy orthologues. The phylogeny is consistent with a previous study33: the major plant parasitic nematodes were divided into Aphelenchoidea and Tylenchida, and the six Aphelenchoides species were grouped as sister to Bursaphelenchus (fig. 1a). The A. besseyi strains clustered into two groups according to their hosts, suggesting that relationships within the A. besseyi species complex can be unambiguously resolved and reflect different lifestyles and host preferences. Combined with the previous 28S phylogeny of the A. besseyi species complex17 (supplementary fig. S3), this led us to designate the two groups as A. oryzae, isolated from rice, and A. pseudobesseyi, isolated from other plants (tape grass and bird's-nest fern). The median nucleotide and amino acid identities between these two groups were 86.6% and 90%, respectively (supplementary fig. S4). Strains in each group also differed in heterozygosity (0.017–0.019% in A. oryzae vs. 0.071–0.075% in A. pseudobesseyi) and in changes in recent effective population size inferred using pairwise sequentially Markovian coalescent (PSMC) analysis34 (supplementary fig. S5). Together, these results emphasise that members of the A. besseyi species complex are highly differentiated at the genome level, despite being difficult to distinguish by morphology alone17.
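The heterozygosity values above are percentages of genome positions; 0.017% corresponds to about 1.7 heterozygous sites per 10 kb. The text does not say how they were computed, so the sketch below assumes one common definition (heterozygous genotype calls divided by callable bases) and a hypothetical input of VCF-style genotype strings.

```python
def heterozygosity_percent(genotypes, callable_bases):
    """Percentage of callable bases that carry a heterozygous genotype call.

    `genotypes` holds VCF-style GT strings such as "0/1", "1|1" or "./.";
    a call counts as heterozygous when it contains two different alleles.
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    het = 0
    for gt in genotypes:
        alleles = set(gt.replace("|", "/").split("/"))
        alleles.discard(".")  # ignore missing alleles
        if len(alleles) > 1:
            het += 1
    return 100.0 * het / callable_bases


# Hypothetical calls: two heterozygous sites among 10,000 callable bases.
print(heterozygosity_percent(["0/1", "1/1", "0|1", "./."], 10_000))  # 0.02
```

In practice the denominator (callable bases after depth and mapping-quality filters) influences the estimate as much as the SNP calls do.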
Genome reduction as a result of transposable element loss
The Aphelenchoides genomes were smaller than those of other plant-parasitic nematodes (fig. 1b and supplementary table S3), indicating that genome reduction took place in the last common ancestor of the Aphelenchoides genus. Much of the reduction can be explained by a lower repeat content than in other nematodes (fig. 1b). The dominant transposable elements of Aphelenchoides were DNA transposons, which were reduced relative to other nematodes both in content (0.14–1.36 Mb vs. 4.2–22.1 Mb) and in number of families (1–7 in Aphelenchoides compared with 9 and 26 in B. xylophilus and H. glycines, respectively). Fewer LTR (0.07–0.8 Mb vs. 0.24–9.3 Mb) and LINE (0.0006–0.66 Mb vs. 0.02–4.5 Mb) retrotransposons were also observed in this genus. These results suggest that the reduced genome sizes of Aphelenchoides might have been caused by rapid loss of transposable elements, which in some cases led to the eventual loss of entire families (fig. 2a). Within the A. besseyi species complex, A. pseudobesseyi contained significantly fewer DNA transposons and LTR and LINE retrotransposons than A. oryzae (fig. 2a and supplementary fig. S6).
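The repeat totals above are base-pair sums per transposable-element class. When tabulating such totals from per-hit annotations, overlapping hits of the same class have to be merged so that no base is counted twice. The minimal sketch below assumes half-open, 0-based coordinates and a pre-parsed list of hits; the input format is hypothetical and is not the study's RepeatMasker output.

```python
from collections import defaultdict


def masked_bp_by_class(hits):
    """Sum non-redundant masked bases per repeat class.

    `hits` is an iterable of (contig, start, end, repeat_class) tuples with
    half-open [start, end) coordinates. Overlapping hits of the same class on
    the same contig are merged; a base annotated with two different classes
    is still counted once in each class.
    """
    by_key = defaultdict(list)
    for contig, start, end, rclass in hits:
        by_key[(contig, rclass)].append((start, end))

    totals = defaultdict(int)
    for (_contig, rclass), intervals in by_key.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or touching: extend current run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                totals[rclass] += cur_end - cur_start
                cur_start, cur_end = start, end
        totals[rclass] += cur_end - cur_start
    return dict(totals)


hits = [("c1", 0, 100, "DNA"), ("c1", 50, 150, "DNA"),
        ("c1", 200, 300, "LTR"), ("c2", 0, 10, "DNA")]
print(masked_bp_by_class(hits))  # {'DNA': 160, 'LTR': 100}
```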
Gene family specialization in the Aphelenchoides species
We observed 66 enriched and 31 reduced protein domains in the four members of the A. besseyi species complex compared with 21 other nematodes (fig. 2b and supplementary table S5). Reduced domains included collagen (90–109 copies in the A. besseyi species complex vs. 72–407 in others), Somatomedin B and BTG. Genes containing collagen domains have reportedly been associated with capsule formation; the reduced copy number of collagen domains in Trichinella spiralis was thought to contribute to its lower host specificity compared with other nematodes35, and may be related to the wide host range of A. besseyi. In contrast, Aphelenchoidea members possess on average four-fold more aspartic proteases (ASPs) than other nematodes (91–314 vs. 4–555 copies; supplementary table S5). ASPs have been reported to be associated with the digestion of host haemoglobin in Haemonchus contortus36 and with skin penetration in hookworms37, and may play an important role in the parasitism of Aphelenchoides. Other expansions included LIM and peptidase C13 domains, which participate in the regulation of cell motility and cell growth38 or in the degradation of host protein tissues39, suggesting that these domain dynamics are associated with adaptations to plant parasitism.
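The section does not state which statistical test was used to call a domain enriched or reduced. Purely as an illustration of one option, and not as the study's method, an exact one-sided permutation test asks how often a random relabelling of the pooled copy numbers gives the focal genomes a total at least as high as observed. The function name and the toy counts below are hypothetical.

```python
from itertools import combinations
from math import comb


def exact_expansion_pvalue(focal, others):
    """One-sided exact permutation p-value that `focal` copy numbers are higher.

    Every way of relabelling len(focal) of the pooled values as "focal" is
    enumerated. With group sizes fixed, the focal-group sum is equivalent to
    the difference in means, so it is used as the test statistic.
    """
    pooled = list(focal) + list(others)
    k = len(focal)
    observed = sum(focal)
    as_extreme = sum(
        1
        for idx in combinations(range(len(pooled)), k)
        if sum(pooled[i] for i in idx) >= observed
    )
    return as_extreme / comb(len(pooled), k)


# Hypothetical copy numbers: 4 focal genomes vs. 9 others.
print(exact_expansion_pvalue([5, 6, 7, 8], [0] * 8 + [1]))  # 1/715, about 0.0014
```

Full enumeration stays cheap for 4 of 27 genomes (17,550 relabellings) but not for balanced groups (27 choose 13 is over 20 million), where sampled permutations or a rank-based test would be more practical; testing many domains would also call for a multiple-testing correction.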
The plant cell wall acts as a primary defensive barrier, and the production of carbohydrate-active enzymes (CAZymes) is important for PPNs to infect plants. A total of 132 CAZyme families were identified in the 27 representative nematodes. Of these, 59–67% were observed in Aphelenchoidea, similar to the 55–66% and 58–68% observed in Tylenchida and free-living nematodes, respectively (supplementary table S6). A total of 13 families were significantly expanded or lost in the Aphelenchoides genus (fig. 2c), including GH16, GH27 and GH45. GH16 comprises putative β-glycanases involved in the degradation or remodelling of cell wall polysaccharides40; it had one to six copies in Aphelenchoididae nematodes and was not identified outside this clade except in D. destructor, which had three copies. Aphelenchoidea have three to 11 copies of GH27, which is reportedly involved in hemicellulose metabolism and associated with α-galactosidase activity in both bacteria and fungi41, whereas Tylenchida nematodes have fewer. The GH45 cellulases previously identified in Aphelenchoidea nematodes23, which are involved in the degradation of β-1,4-glucans in the plant cell wall19, differ in copy number between A. pseudobesseyi and A. oryzae and are absent from A. fujianensis and A. bicaudatus, suggesting that differential maintenance of these genes within the same genus may have contributed to variation in pathogenicity to plants.
Chromosome evolution of PPNs
To investigate the extent of karyotype rearrangements in Aphelenchoides, we inferred synteny relationships among A. pseudobesseyi (n = 3)28, B. xylophilus (n = 6) and C. elegans (n = 6) using single-copy orthologs. Within the three A. pseudobesseyi chromosomes, orthologs from each C. elegans chromosome were clustered into distinct blocks (fig. 3a), suggesting fusion of ancestral chromosomes. These blocks remained contiguous, each containing 148–801 orthologous genes assignable to a single ancestral chromosome, presumably because they have not yet been broken down by recombination. This allowed us to pinpoint the fusion points and infer the order of rearrangement events from the composition of the chromosomes (fig. 3b). We also encountered instances where an ancestral chromosome was found in different parts of the A. pseudobesseyi chromosomes, suggesting that fission also took place. For example, synteny blocks corresponding to chr IV, which has remained homologous between C. elegans and B. xylophilus, were identified in A. pseudobesseyi on the arms of chr 2 and chr 1, separated by regions of chr III origin (fig. 3b). The majority of the ancestral sex chromosomes were unambiguously assigned to chr 2, and remapping of male sequences showed equal coverage along the chromosomes (supplementary fig. S7), suggesting that the Aphelenchoidea superfamily, including A. besseyi, shares the stochastic sex determination system recently characterised in B. xylophilus42. Within the A. besseyi species complex, 91% of the APVT genome and 88% of the AORT genome were in synteny with each other. Intra-chromosomal inversions were common on chromosome arms. In addition, we identified a major 3.4 Mb inversion in the centre of chr 2 (fig. 3c), suggesting that rearrangement is still ongoing. Both LTR and LINE retrotransposons were enriched on the chromosome arms of the A. oryzae strain AORT (fig. 3c and supplementary fig. S6), consistent with a hallmark of nematode chromosome evolution43. In contrast, only LTR retrotransposons were found on the two chromosome arms of A. pseudobesseyi, suggesting that these repeats were differentially maintained after speciation.
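The block-based reasoning above (contiguous runs of orthologs from one ancestral chromosome, with fusion or fission points at their boundaries) can be sketched as a run-length collapse along one chromosome. The minimal example below assumes each single-copy ortholog has already been assigned a position and an ancestral chromosome label; the input format and function names are hypothetical, and no noise filtering is applied, so a single misassigned ortholog would split a block.

```python
from itertools import groupby


def synteny_blocks(orthologs):
    """Collapse consecutive orthologs that share an ancestral chromosome label.

    `orthologs` is a list of (position, ancestral_chromosome) pairs for the
    single-copy orthologs on one chromosome. Returns a list of
    (ancestral_chromosome, first_position, last_position, n_genes) tuples.
    """
    ordered = sorted(orthologs, key=lambda pair: pair[0])
    blocks = []
    for label, run in groupby(ordered, key=lambda pair: pair[1]):
        run = list(run)
        blocks.append((label, run[0][0], run[-1][0], len(run)))
    return blocks


def junctions(blocks):
    """Midpoints between adjacent blocks: candidate fusion or fission points."""
    return [(left[2] + right[1]) / 2 for left, right in zip(blocks, blocks[1:])]


# Toy chromosome: a chr IV-derived block split by chr III-derived sequence.
blocks = synteny_blocks([(100, "IV"), (300, "III"), (200, "IV"), (400, "III"), (500, "IV")])
print(blocks)             # [('IV', 100, 200, 2), ('III', 300, 400, 2), ('IV', 500, 500, 1)]
print(junctions(blocks))  # [250.0, 450.0]
```

An ancestral label that reappears in separate blocks, as "IV" does here, is the pattern the text interprets as evidence of fission followed by fusion.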
Major episode of HGT in clade IV nematodes
Among plant parasitic nematodes, the GH5 cellulase has been found in Tylenchida and, within the Aphelenchoidea clade, only in A. pseudobesseyi and A. bicaudatus23,44, raising the possibility that many horizontally transferred genes were acquired in the last common ancestor of major PPNs but were differentially lost. To identify such events, a total of 27 proteomes from representative nematodes, including the Aphelenchoides genomes, were searched for evidence of HGT by calculating the Alien Index (AI) score using Alienness45. We identified a total of 1,675 HGT orthogroups in 21 nematodes. Mapping these orthogroups as acquisition events onto the species phylogeny under a parsimony assumption46 indicated that the earliest episode of HGT took place in the last common ancestor of clade IV nematodes (fig. 4a). Examples include GH16, GH32, GH43 and the aforementioned GH5 cellulases. We inferred that a total of 161 orthogroups were acquired in this episode; most (78.3%) were inferred to be of bacterial origin (supplementary table S7), from donors belonging to different genera, suggesting that multiple acquisitions took place. Among these, we found 36 Pfam terms, such as ABC transporter, that were identified in multiple orthogroups, suggesting some convergence in the acquired functions (supplementary table S8).
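Placing acquisitions on the phylogeny rests on Dollo parsimony (the Methods use Phylip-Dollop46): a gene family can be gained only once but lost many times, so its gain sits on the last common ancestor of all taxa that carry it, and every descendant lacking it implies a loss. The simplified sketch below illustrates that logic on a hypothetical toy tree, not the species tree in fig. 1a. It reports losses per leaf rather than placing shared losses on internal branches, and it is not the Phylip implementation.

```python
# Hypothetical toy topology (child -> parent); not the study's species tree.
PARENT = {
    "C_elegans": "root",
    "cladeIV": "root",
    "Panagrolaimus": "cladeIV",
    "PPN": "cladeIV",
    "Aphelenchoidea": "PPN",
    "M_incognita": "PPN",
    "B_xylophilus": "Aphelenchoidea",
    "A_besseyi": "Aphelenchoidea",
}
LEAVES = ["C_elegans", "Panagrolaimus", "M_incognita", "B_xylophilus", "A_besseyi"]


def path_to_root(node, parent):
    """List node, its parent, grandparent, ... up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def gain_node(present, parent):
    """Dollo: a family is gained once, at the last common ancestor of its carriers."""
    taxa = sorted(present)
    if not taxa:
        raise ValueError("family absent from all taxa")
    shared = set(path_to_root(taxa[0], parent))
    for taxon in taxa[1:]:
        shared &= set(path_to_root(taxon, parent))
    # Walking up from any carrier, the first shared node is the most recent common ancestor.
    return next(n for n in path_to_root(taxa[0], parent) if n in shared)


def loss_leaves(present, leaves, parent):
    """Leaves below the gain node that lack the family (each implies a loss)."""
    gain = gain_node(present, parent)
    return sorted(leaf for leaf in leaves
                  if leaf not in present and gain in path_to_root(leaf, parent))


family = {"B_xylophilus", "M_incognita"}
print(gain_node(family, PARENT))            # PPN
print(loss_leaves(family, LEAVES, PARENT))  # ['A_besseyi']
```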
The revised GH5 cellulase phylogeny indicated that an ancient duplication took place before the divergence of PPNs (fig. 5a). One clade contains orthologs from three Panagrolaimus species (P. sp. PS1159, P. superbus and P. davidi) and Tylenchida, and the other contains members of Aphelenchoidea and Tylenchida, suggesting that the fate of these HGT genes was governed by duplication and loss. Interestingly, the closest bacterial GH5 orthologs were from Salinimicrobium xinjiangense and Leeuwenhoekiella sp., which belong to the family Flavobacteriaceae and are from marine environments. We observed two GH16 subfamilies in nematodes. GH16_3 from Tylenchida and Bursaphelenchus nematodes clustered with sequences of bacterial origin, whereas GH16_1 from Aphelenchoides and Panagrolaimus nematodes clustered with sequences of fungal origin (fig. 5b), suggesting that the two GH16 groups arose independently. GH32, which in G. pallida is believed to function in fructose hydrolysis13, was found in one Panagrolaimus species in addition to several Tylenchida nematodes (supplementary fig. S8). GH43, which has been proposed to be involved in the degradation of plant hemicellulose47, was identified in two distinct clusters of bacterial origin in Tylenchida and Panagrolaimid nematodes (supplementary fig. S9).
The next major episode of acquisition took place in the common ancestor of PPNs and involved 47 orthogroups (fig. 4a). These families included pectate lyase 3 (PL3), which is associated with cell wall degradation48. The PL3 orthologs of Aphelenchus avenae and two Bursaphelenchus nematodes grouped together, with Meloidogyne species forming distinct clusters (fig. 5c), consistent with previous phylogenetic findings in PPNs44. The closest bacterial ortholog in the Meloidogyne clade was from Curtobacterium flaccumfaciens, which is also known to cause bacterial wilt in the Fabaceae family49. Together, these results suggest that some genes thought to play important roles in plant parasitism were in fact acquired earlier than the common ancestor of plant parasitic nematodes.
The majority of HGT gene families were of bacterial origin, followed by fungal origin (fig. 4b). We also identified genes acquired from non-bacterial donors in the last common ancestor of clade IV, as well as more recently in different PPN lineages (fig. 4a). These included GH45, previously characterised as being of fungal origin23,50; this cellulase family is present in most Aphelenchoidea nematodes except A. fujianensis and A. bicaudatus. The GH16 family was independently acquired from a bacterial donor in the last common ancestor of clade IV nematodes and from a fungal donor in the Aphelenchoides genus (fig. 5b). Notably, we identified 40 orthogroups among PPNs that were transferred from the plant phylum Streptophyta, consistent with the finding of several sequences highly similar to plant sequences in H. glycines51 (fig. 4b). The closest plant orthologs included those of rice, maple and oak (fig. 5d), which are common hosts of many PPNs. Strikingly, 27 of these orthogroups were present in B. mucronatus and were enriched in functions related to cadmium and copper ion detoxification (supplementary table S9), suggesting that these genes may help Bursaphelenchus nematodes cope with toxins in their pine wood hosts.
We found that 0.3–2.4%, 0.6–2.1% and 0.1–5.4% of the proteomes of Aphelenchoidea, Tylenchida and Panagrolaimomorpha nematodes, respectively, were predicted to be HGT genes (fig. 4c). Most of these differences resulted from clade-specific evolution after speciation. The high copy number of HGT genes observed in M. incognita was a result of duplication53, as indicated by its number of HGT orthologs of bacterial origin being more than twice that of any other species (supplementary fig. S10). The high number of HGT genes in P. superbus was consistent with a previous study54 and is likely to be species-specific.
To independently assess the accuracy of our approach and interrogate the fate of HGT genes, we constructed a phylogeny for every orthogroup containing identified HGT candidates. In the majority of these orthogroups, all Aphelenchoididae and Tylenchida members were predicted to be HGT genes (AI > 0; 54.6–76.5% vs. 77.3–89.4%, respectively). Genes from a species typically grouped together in the orthogroup phylogeny regardless of whether they were identified as HGT candidates, suggesting that the genes not detected at our threshold share common ancestry with those that were. Presumably, the undetected genes had diverged through substitutions accumulated over time. Consistent with this observation, orthogroups acquired earlier in PPNs contained a higher copy number of such genes than recently acquired families (supplementary fig. S11). For instance, 12.5–70.6% of GH5 copies in Tylenchida were not identified as HGT candidates, suggesting duplication and possibly neofunctionalisation of this GH family in PPNs after its acquisition from a bacterial donor (fig. 5a). This differentiation appears to be ongoing and was observed within the A. besseyi species complex, for example in the GH45 orthogroup, which had negative AI in two A. oryzae strains (supplementary fig. S12).
Discussion
Characterising the diversity and comparing the genomes of plant parasitic nematodes is of fundamental importance for understanding how such lifestyles arise and of practical importance for identifying candidate effectors and control methods. The latter has been addressed in several studies, focusing mainly on Meloidogyne55. The Aphelenchoides genome assemblies presented in this study allowed us to gain a holistic view of the evolution of clade IV nematodes, which appear to have gained and lost many adaptations, including plant parasitism56. Over their evolution, HGT genes have played important roles in functions related to these adaptations. The most recent comprehensive analysis of HGT in nematodes focused on plant parasitic nematodes15 and found that many of these genes were PPN-specific. Of these, the donors of gene families involved in plant cell-wall modification were previously found to be plant-associated organisms that were sympatric with plant parasitic nematodes, which would have made HGT possible44. Additional HGT events were identified in other clade IV nematodes13,54,57–59, but only as part of analyses of newly assembled genomes.
Our systematic investigation of HGT has instead shown that many of the aforementioned families were acquired much earlier, in the last common ancestor of clade IV. Many clade IV nematodes are known to survive extreme desiccation54,59,60, and the acquired HGT genes may be central to their resistance to harsh environments and may subsequently have acted as a catalyst for successful plant parasitism61. The donors may have been symbionts, as is the case in insects62, but currently known nematode endosymbionts are restricted to Wolbachia and Cardinium63, neither of which was identified in our analyses. Interestingly, many of the closest bacterial donors were from marine environments, raising the possibility that the last common ancestor of clade IV lived in a marine environment and later underwent a habitat transition64. However, we also identified donors of non-bacterial origin that are usually found in environments matching the present-day lifestyles of nematodes. Now that more genome sequences are available, historical HGT events have been detected in the most recent common ancestors of major organism groups such as land plants65 and moths and butterflies66, where they contributed to the hosts' development and adaptation. These acquisitions were found to be episodic and likely took place at times when either host development or genome defence was vulnerable. We speculate that the gain and loss of gene families in clade IV nematodes may have played a role in the retention of HGT genes.
The successful and unambiguous delimitation of the A. besseyi species complex into A. oryzae and the recently proposed A. pseudobesseyi has important implications for nematode management. Delimitation was congruent between the genome and 28S phylogenies, confirming the utility of existing molecular markers for species identification18. A. besseyi is generally believed to have limited mobility in natural habitats, so its lack of population structure in China24 was attributed to human-mediated dispersal. Our results also support the view that A. oryzae appears to be more specific to rice than A. pseudobesseyi, which has been isolated more frequently from ornamental plants and other agronomic crops18. A comprehensive collection across a wider geographical range and resequencing of strains previously designated as A. besseyi could confirm whether A. oryzae is responsible for all white tip disease in rice and may lead to a better characterisation of the biogeography and evolution of the different cryptic species.
The reduced genome size and chromosome number of A. besseyi represent an interesting departure from the typical nematode genome of six ancestral chromosomes and around a hundred megabases in length. Genome rearrangement and reduction are common across the tree of life, including in plants67, butterflies68 and nematodes69. We show that A. besseyi underwent multiple chromosome fission and fusion events. One possible explanation for these events, together with genome reduction, is the loss of meiosis genes and telomeric repeat maintenance genes (supplementary table S10), which may have resulted in truncated meiosis, as observed in the extreme case of Diploscapter pachys70, which possesses a single chromosome. Alterations in meiosis may lead to genome shrinkage through a loss of transposable elements as a result of imbalanced chromatin, as observed in Caenorhabditis nigoni71. It is likely that Aphelenchoides underwent a similar scenario. However, we also failed to identify these orthologs in members of Bursaphelenchus, which have six chromosomes (supplementary table S10), suggesting that these genes had already diverged or been lost by the last common ancestor of Aphelenchoidea. Further cellular and molecular evidence is needed to confirm the integrity of meiosis in A. besseyi.
To conclude, the availability of the Aphelenchoides genomes and our comparative analyses allowed us to pinpoint the major events of horizontal gene transfer in clade IV nematodes. The results reinforce the importance of horizontal gene transfer in contributing to multiple adaptations of these nematodes, including plant parasitism. In addition, the various A. besseyi genomes will assist in developing molecular diagnostic tools to distinguish the specific diseases caused by members of the species complex.
Methods
DNA and RNA extraction and sequencing
Nematodes were cultured with Alternaria citri on PDA (potato dextrose agar) medium. Nematodes of all stages were collected from the medium, washed with sterile distilled water and purified by sucrose gradient. Genomic DNA was extracted using a Qiagen Genomic-tip 100/G according to the manufacturer's instructions; RNA was extracted using TRIzol and then purified using a lithium chloride purification method. The DNA paired-end libraries were constructed using either a Nextera DNA Flex or a KAPA Hyper library prep kit (Illumina, San Diego, USA); the RNA paired-end libraries were constructed using a TruSeq Stranded mRNA library prep kit (Illumina, San Diego, USA). Both DNA and RNA paired-end libraries were prepared following the standard protocols and sequenced on an Illumina HiSeq 2500 (Illumina, USA) to produce 150-bp paired-end reads. The Hi-C library was prepared following the Phase Genomics (Seattle, WA, USA) Proximo Hi-C animal protocol, with some modifications to tissue processing. The enriched worms were finely chopped with microtube pellet pestle rods for about 2 minutes. The tissues were crosslinked by adding 1 ml of crosslinking solution and incubating for 25 minutes with occasional mixing by rotation. Then, 100 µl of quenching solution was added to the crosslinked tissue and mixed by rotation for 20 minutes. The remaining preparation steps followed the protocol. The library was sequenced on an Illumina HiSeq 2500 (Illumina, USA) to produce 150-bp paired-end reads. Long reads for APFT and AORT were generated on the PacBio sequencing system, and the remaining four Aphelenchoides strains (APVT, AORJ, A. bicaudatus and A. fujianensis) were sequenced on the Oxford Nanopore platform. The raw nanopore signals were basecalled using Guppy72 (ver 0.5.1), producing a total of 5.0–28.4 Gb of sequence in reads at least 1 kb long.
Assemblies of six Aphelenchoides spp
Raw reads of each species were assembled using the Flye assembler (ver 2.8.2)26. Assemblies from Nanopore reads were polished with the Nanopore reads using Racon73 (ver 1.4.6) and Medaka74 (ver 0.10.0). All assemblies were further polished with Illumina reads using Pilon75 (ver 1.22) over five iterations. The A. pseudobesseyi VT assembly was scaffolded using Hi-C reads and subsequently curated in Juicebox27. The other five Aphelenchoides genomes were reference-scaffolded against this assembly using RagTag76 (ver 1.1).
Gene prediction and functional annotation
Repetitive elements were identified using RepeatModeler77 (ver 1.0.8), TransposonPSI (ver 1.0.0; https://github.com/NBISweden/TransposonPSI) and USEARCH78 (ver 8.1), following the protocol of Berriman et al.79. Repeat locations were then identified using RepeatMasker80 (ver 4.0.9). RNA-seq reads of the six Aphelenchoides strains were trimmed with Trimmomatic81 (ver 0.36) and aligned to the corresponding assemblies using STAR82 (ver 2.7.1a). From these mappings, transcripts were inferred using three approaches: i) genome-guided assembly with Trinity83 (ver 2.8.4; default settings), and reconstruction with ii) StringTie84 (ver 1.3.4; default settings) and iii) CLASS285 (ver 2.17; default settings). Transcripts generated by Trinity were realigned to the reference using GMAP86 (ver 2017-11-15). The RNA-seq mappings were also used in BRAKER87 to train species-specific parameters and generate an initial set of annotations. Proteomes of Bursaphelenchus xylophilus and Caenorhabditis elegans were downloaded from WormBase (WBPS14; https://wormbase.org) and used as homology guides to pick the best transcript for each putative locus using Mikado88 (ver 1.2.4; three Mikado steps: "prepare", "serialize" and "pick"); they were also used to train MAKER2. Finally, MAKER2 was run to generate the final set of gene annotations, using the transcripts picked by Mikado as EST evidence, the proteomes of closely related species (Bursaphelenchus xylophilus and Caenorhabditis elegans) as protein evidence, and gene models from BUSCO89, BRAKER, SNAP90 and Augustus91 as hints, with training repeated over three iterations.
Comparative analyses
Proteomes of five plant-parasitic nematodes (Bursaphelenchus xylophilus, Meloidogyne hapla, Meloidogyne incognita, Globodera pallida and Ditylenchus destructor), two free-living nematodes (Caenorhabditis elegans and Caenorhabditis briggsae), six Panagrolaimomorpha (Propanagrolaimus sp. JU765, Panagrellus redivivus, Panagrolaimus superbus, Panagrolaimus sp. PS1159, Panagrolaimus davidi and Halicephalobus mephisto) and one animal-parasitic nematode (Brugia malayi) were downloaded from WormBase92,93 (ver 14). Orthogroups were determined with OrthoFinder32 (ver 2.2.7; options: -S diamond). Each single-copy orthogroup was aligned with MAFFT (ver 7.310; options: --maxiterate 1000). The concatenated alignment of all single-copy orthogroups was then used to infer a maximum-likelihood phylogeny with RAxML94 (ver 8.2.3; options: -s -T 32 -N 100 -f a -m PROTGAMMAILGF) with 100 bootstrap replicates. Pfam domain copy numbers for all 27 nematodes were obtained by searching the nematode proteomes against the Pfam database (ver 31; https://pfam.xfam.org/) with HMMER, using an e-value cutoff of 0.001. Carbohydrate-active enzyme (CAZyme) effectors were identified by searching the nematode proteomes against the CAZyme95 database (http://www.cazy.org) with HMMER. Hits were retained if the sequence was longer than 80 amino acids, covered at least 35% of the conserved domain and had an e-value below 1e-15.
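Before tree inference, the per-orthogroup MAFFT alignments are joined into a single supermatrix. The Python sketch below shows that concatenation. It assumes each alignment is held as a dict mapping taxon to aligned sequence, and the helper names are hypothetical. It also records the column range contributed by each orthogroup, which could be used for a partitioned analysis. Any taxon missing from an orthogroup is padded with gaps, although strictly single-copy orthogroups should contain every species once.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-orthogroup alignments into one supermatrix.

    alignments: list of dicts mapping taxon -> aligned sequence.
    Returns (supermatrix, partitions), where partitions holds the 1-based,
    inclusive column range contributed by each orthogroup.
    """
    parts = {taxon: [] for taxon in taxa}
    partitions, offset = [], 0
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment needs sequences of one common length")
        length = lengths.pop()
        for taxon in taxa:
            # Absent taxa are filled with gaps so every row stays the same length.
            parts[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((offset + 1, offset + length))
        offset += length
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}, partitions


def to_fasta(matrix):
    """Serialise the supermatrix as FASTA text."""
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in matrix.items())
```

Fixing the taxon order up front keeps every row aligned column-for-column across orthogroups, whatever order the input dictionaries use.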
Identification of the HGT genes
The probability that genes had been acquired via HGT was estimated using the Alien Index (AI)45. The donor database comprised non-metazoan sequences from the NCBI nr database. The recipient database comprised metazoan sequences, excluding the following taxa to prevent self-hits: Aphelenchoidea, Tylenchida, Rhabditina, Spirurina and Cephaloboidea. AI was calculated from the e-values of the best DIAMOND96 (ver 2.0.14; options: blastx --evalue 0.001) hits against the donor and recipient databases. Orthogroups containing at least one gene with AI above 30 were selected for further analysis. Gains and losses at each node were inferred using PHYLIP Dollop46 (ver 3.69; options: fdollop -method d -ancseq). For some HGT families, the branch of acquisition was manually curated according to the placement of the genes in the gene phylogeny, because nematode genes with AI < 0 clustered with other HGT genes. In each orthogroup, the nematode gene with the highest AI among those with a taxonomically classified hit was chosen to represent the HGT origin. For CAZyme families, orthogroups sharing the same CAZyme annotation and containing nematode genes with AI above −50 were selected, and genes with AI < 0 were labelled with an asterisk (*). These orthologs were combined with the putative donor sequences identified in the nr database and with the corresponding cellulase sequences from the CAZyme database. To reduce contamination, orthologs were annotated with Pfam domains and retained only if they contained at least one major domain (cellulase or pectate lyase). Sequences of each HGT orthogroup were aligned using MAFFT (options: --maxiterate 1000 --genafpair) and trimmed using trimAl97 (ver 1.4; options: -gappyout). Ortholog phylogenies were computed using IQ-TREE98 (ver 1.6.6; options: -bb 1000 -alrt 1000). For HGT orthogroups without a CAZyme classification, the top two BLAST hits from each of the separate UniProt subsets (bacteria, fungi, land plants and insects) were used to confirm the HGT origin.
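The text above does not spell out the AI formula. A commonly used definition (Gladyshev et al., 2008) is AI = ln(best recipient e-value + 1e-200) − ln(best donor e-value + 1e-200), with an e-value of 1 assigned when there is no hit. Under this definition, positive values indicate a better match to non-metazoan than to metazoan sequences. The Python sketch below assumes that definition, including the natural logarithm and the no-hit convention, and applies the AI > 30 orthogroup filter described above. The helper names and the hit-table layout are hypothetical.

```python
import math

EPS = 1e-200  # added to every e-value so that the logarithm of 0 is defined


def alien_index(recipient_evalue=None, donor_evalue=None):
    """AI = ln(E_recipient + 1e-200) - ln(E_donor + 1e-200).

    A missing hit (None) is treated as e-value 1 (assumed convention);
    positive AI means the gene matches non-metazoans better.
    """
    rec = 1.0 if recipient_evalue is None else recipient_evalue
    don = 1.0 if donor_evalue is None else donor_evalue
    return math.log(rec + EPS) - math.log(don + EPS)


def ai_per_gene(hits):
    """hits: iterable of (gene, database, evalue), database in {'donor', 'recipient'}."""
    best = {}
    for gene, db, evalue in hits:
        slot = best.setdefault(gene, {"donor": None, "recipient": None})
        if slot[db] is None or evalue < slot[db]:
            slot[db] = evalue  # keep only the best (lowest) e-value per database
    return {gene: alien_index(b["recipient"], b["donor"]) for gene, b in best.items()}


def candidate_orthogroups(orthogroups, ai_by_gene, threshold=30.0):
    """Orthogroups (id -> gene list) with at least one gene whose AI exceeds the threshold."""
    return {
        og for og, genes in orthogroups.items()
        if any(ai_by_gene.get(g, -math.inf) > threshold for g in genes)
    }
```

Because of the 1e-200 floor, AI under this definition is bounded at roughly ±460. The numeric meaning of the thresholds used above (30 and −50) depends on which logarithm base the authors actually used.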
AUTHOR CONTRIBUTIONS
IJT and PJC conceived the study. IJT led the study. YCL, TY and PJC sampled the Aphelenchoides nematodes. YiCL, HMK and WAL conducted the experiments. CKL analysed the data with input from HHL, YuCL and MRL. IJT and CKL wrote the manuscript with input from TY, TK and PJC.
DATA AVAILABILITY
The sequencing data and annotations of the six Aphelenchoides nematodes are publicly available at NCBI under BioProject accession PRJNA834627 and are scheduled for inclusion in the next WormBase ParaSite release (WBPS18). Accession numbers of individual samples are listed in supplementary table S2. Information on the HGT orthogroups acquired in clade IV is available on GitHub (https://github.com/lihowfun/CladeIV_HGT.git).
Funding
This work was supported by Academia Sinica grant (AS-CDA-107-L01) to IJT.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.