Summary
Xylella fastidiosa (Xf) is a polyphagous gram-negative bacterial plant pathogen that can infect more than 300 plant species. It is endemic in America while, in 2013, Xf subsp. pauca was for the first time reported in Europe on olive tree in the Southern Italy. The availability of fast and reliable diagnostic tools is indispensable for managing current and future outbreaks of Xf.
In this work, we used the Oxford Nanopore Technologies (ONT) device MinION platform for detecting and identifying Xf at species, subspecies and Sequence Type (ST) level straight from infected plant material. The study showed the possibility to detect Xf by direct DNA sequencing and identify the subspecies in highly infected samples. In order to improve sensitivity, Nanopore amplicon sequencing was assessed. Using primers within the set of the seven MLST officially adopted for identifying Xf at type strain level, we developed a workflow consisting in a multiple PCR and an ad hoc pipeline to generate MLST consensus after Nanopore-sequencing of the amplicons. The here-developed combined approach achieved a sensitivity higher than real-time PCR allowing within few hours, the detection and identification of Xf at ST level in infected plant material, also at low level of contamination.
Originality Significance Statement In this work we developed a methodology that allows the detection and identification of Xylella fastidiosa in plant using the Nanopore technology portable device MinION. The approach that we develop resulted more sensitive than methods currently used for detecting X. fastidiosa, like real-time PCR. This approach can be extensively used for X. fastidiosa detection and it may pave the road for the detection of other tedious vascular pathogens.
Introduction
Xylella fastidiosa (Xf), a gram-negative phytopathogenic bacterium (Wells et al., 1987), has a very broad host range, causing different diseases in important crops (Hopkins, 1989) and in many urban shade trees (Sherald and Kostka, 1992). Xf symptoms may not be evident and many hosts are symptomless making this pathogen very difficult to manage. The pathogen is transmitted by xylem sap feeding insects and colonizes the host xylem vessel, causing the typical leaf scorching. Xf is an endemic pathogen in America and only recently was identified in southern Italy on olive trees (Saponari et al., 2013), in Corsica and in the south-east Mediterranean coast of France on several hosts (Denancé et al., 2019). Subsequently, records of Xf were reported in Germany on oleander (EPPO, 2016a) and in Spain, Mallorca Island and Andalusia, on different plant species (Landa, 2017; Olmo et al., 2017). Recently, a new finding was also reported in Tuscany (Marchi et al., 2018). Different Xf subspecies were identified in Europe in association to the above-mentioned natural outbreaks and interceptions on ornamental plants. There are three formally accepted subspecies of Xf, i.e. fastidiosa, pauca and multiplex, based on DNA–DNA hybridization, as recently confirmed by comparative genomic analysis (Schaad et al., 2002; Marcelletti and Scortichini, 2016; Denancé et al., 2019). However, the International Society of Plant Pathology Committee on the Taxonomy of Plant Pathogenic Bacteria (ISPP-CTPPB) considered valid names only the subsp. fastidiosa and subsp. multiplex whereas other authors classified five subspecies (fastidiosa, pauca, multiplex, sandyi and morus) (Bull et al., 2012; Nunney et al., 2012). Complicating the whole taxonomic scenario, Xf subspecies includes different sequence types (ST) which are determined by Multi Locus Sequence Typing (MLST) analysis based on the Sanger sequencing technology of seven housekeeping gene (Yuan et al., 2010; Nunney et al., 2012; EPPO, 2016b). MLST analysis is recommended for new findings for the undoubtful identification of the subspecies and ST in case of new outbreak or new plant hosts (EPPO, 2016b). Since the isolation of Xf is tedious, the MLST analysis can be performed directly from DNA of the host plant; however, low concentration of the bacteria and contaminants derived from the plant material makes the MLST amplification, thus ST determination, extremely challenging.
Currently, next generation sequencing (NGS) platforms represent high throughput technologies which allow obtaining large datasets of genetic information. In biomedical field, sequencing technologies are rapidly being adopted for bacterial outbreak investigations (Faria et al., 2017) and for rapid clinical diagnostics (Votintseva et al., 2017). A combination of whole transcriptome amplification and Nanopore sequencing device has been used to detect ‘Candidatus Liberibacter asiaticus’ or plum pox virus in plants and insect vectors (Bronzato Badial et al., 2018). The reliability of Nanopore for the diagnosis of several plant pathogen was demonstrated (Chalupowicz et al., 2019) and a rapid MLST determination method was described for ST estimation of Klebsiella pneumoniae isolates by the ONT MinION device (Page and Keane, 2018).
Direct Nanopore sequencing and Nanopore amplicon sequencing of two or seven housekeeping genes were assessed on naturally infected and spiked samples. Additionally, a consensus was generated from all seven MLST using amplicons sequenced by MinION device. Nanopore amplicon sequencing provides the possibility, within few hours, to detect and identify Xf, its subspecie and ST with an equal or slightly better sensitivity than the usual methods of detection and identification or Xylella spp. such as real-time PCR.
Results
Detection and identification of Xylella fastidiosa in infected plant material by direct Nanopore sequencing
Direct Nanopore sequencing was assessed to detect Xf in naturally infected plants by using the ONT MinION device. The assay was firstly performed on DNA of 21 olive samples (dataset 1) and 7 samples of different plant species infected by Xf (dataset 2) for a total of 28 samples. All samples were analysed by real-time PCR which revealed a variable level of infection among samples, e.g. in samples of dataset 1, Ct ranged from 21.28 to 36.17, indicating a difference of about 105 times between the most infected (Olive-1) and the less infected (Olive-11) sample (Tab. 1). The direct Nanopore sequencing was developed in three experiments, the first one included samples Olive-1 to Olive-12 (dataset 1), the second one, samples from Olive-13 to Olive-21 (dataset 1) and the third experiment was composed of samples from different plant species (dataset 2) (Tab. 2). The first flowcell generated ∼3.2 Gb of data for 12 samples in 14 hours of run, the second flowcell produced ∼2.7 Gb for the remaining nine olive samples in ∼30 hours while the last flowcell generated ∼5.5 Gb in ∼30 hours. All these flowcells were stopped before reaching the 48 hours run due to pore saturation. The three subsets of samples were then subjected to independent de-barcoding and analysis (see methods). After demultiplex of the samples, we were able to retain ∼1.3 Gb (∼40% of the total reads) of data from the first subset of 12 samples, ∼800 Mb (∼30% of the total reads) for the second subset while we got ∼2 Gb of data for the samples from other plant species. The amount of data for individual samples ranged from ∼600 Mb in the Cystus sp. sample to ∼10 Mb in Olive-17 displaying a high variability between samples although we tried to use the same amount of input gDNA (Tab. 1, 2). Each sample was analysed using a custom python script (called detection_script) (see material and methods) in order to identify Xf. The analysis showed that each sample had a similar bacterial composition and that Xylella was the most abundant genus (Tab. 1). Twenty-two out of 28 samples showed reads that map uniquely to the genome of Xylella spp. It is worth of mention, that all samples with more reads mapping to the Xf genome had a low Ct value (< 31), suggesting that Xf can be detected only in heavily infected samples with the flowcell throughput achieved in our work (Tab. 1). To better assess the performance of the direct Nanopore DNA sequencing, we prepared 12 samples of uninfected olive DNA amended with a range of known concentrations of DNA of X. fastidiosa subspecies pauca strain De Donno (CFBP 8402) (Xfp). The amount of DNA was next or below the real-time PCR limit of detection, estimated around 10 fg/PCR reaction (Modesti et al., 2017). Three samples (from LOD-10 to LOD-12) were not detected by real-time PCR giving inconsistent results (Tab. 3). A new flowcell was used to sequence these 12 samples generating ∼667 Mb of data in 22 hours. The quantity of data/sample ranged from 45 Mb to 7.7 Mb and Xfp was identified in four samples using the detection_script (Tab. 3). These data confirm that Nanopore direct DNA sequencing can reliably detect Xf in samples with a high bacterial concentration.
Nanopore amplicon sequencing
The MinION device generated about 290 Mb of data for all 12 samples (dataset 3) and after reads demultiplex, we retained ∼155 Mb of data. The number of reads is in a range of 14,360 and 54,405 for LOD-10 and LOD-2, respectively (Tab. 4). The detection_script was used to detect Xf in all samples. The results showed that the number of reads matching uniquely a bacterial genome varied between 3.39% in sample LOD-9 and 50.53% in sample LOD-5 (Tab. 4). As expected, the most abundant genera represented in the mapped reads, >95% of the reads mapped to Xf (Tab. 4, Fig. 1). The only exception is the sample LOD-9 for which low number of reads mapped to bacterial genomes suggesting a problem in the sample preparation (Tab. 4, Fig. 1). The reduction of sequencing error – since the same region is sequenced more than once - produces a more accurate output allowing us to identify Xf at the subspecies level. To validate this, we used the detection_script in combination with a specific Xf database formed by every Xf genome available at the NCBI database (∼60 genomes). The results showed that all 11 samples positive for Xf had >95% of the reads correctly mapped to a genome identified as Xf subsp. pauca which is the subspecies amended to the olive gDNA (Fig. 2). The same approach was used – amplification of cysG and malF, Nanopore sequencing, querying the Xf-customised database – with the dataset 2 composed of gDNA from different Xf naturally infected plant species (Tab. 5, Fig. 3). The results showed that for all samples, except for Rhamnus alaternus, more than half of the reads mapped to Xf. Additionally, the analysis using the specific Xf database showed that >90% of the reads mapped to Xf subsp. multiplex genome (Fig. 3). To further support the subspecies identification results in the unknown samples, we generated a consensus for the cysG and malF sequences using the consensus_script. These consensus sequences were aligned to the Xf specific database matching 100% to multiplex subspecies. This device correctly identify the Xf subspecie multiplex associated to the Tuscany samples, according to the recent report of Marchi et al. (2018).
Identification of Xylella sequence-type using Nanopore amplicon sequencing
MLST consensus for all seven sequences for three (Cystus sp., Rosmarinus officinalis, Lavandula sp.) out of the seven samples within the dataset 3 was generated using both Sanger and Nanopore sequencing. These plant species were selected because the ST of Xf had not already been determined. The consensus_script was used to generate a consensus using only Nanopore errors-prone reads. As expected, the script generated seven sequences for each sample that were compared with the same sequences generated by Sanger technology (Yuan et al., 2010) followed by querying the database at htpp://pubmlst.org/xfastidiosa/ (Jolley et al., 2018). Pairwise alignment of concatenated sequences of all sever MLST derived from Sanger and Nanopore consensus resulted in a 100% identity (Fig. 4). Additionally, when was compared the concatenated sequences from both Sanger and Nanopore consensus sequences to the Xf MLST database, was found that samples of Cystus sp., Rosmarinus officinalis and Lavandula sp. were infected by the ST 87 (Fig. 4). This result is in agreement with recent findings of Xf ST 87 in Prunus amygdalus, Polygala myrtifolia, Spartium junceum and Rhamnus alaternus (Saponari et al., 2019).
Discussion
In the last fifteen years, NGS transformed genomic research with an inevitable impact also on diagnostics, taking us to a new scenario. These technologies have recently been applied for the diagnosis of plant pathogens (Bronzato Badial et al., 2018; Chalupowicz et al., 2019) and for the detection and identification of Xf (Bonants et al., 2019). Although several diagnostic methods are available for the detection of Xf (EPPO, 2016b), the identification of subspecies and ST are currently laborious and time-consuming. Recently, a tetraplex qPCR assays was developed for simultaneous detection and identification of subspecies in plant tissues (Enora et al., 2019). The detection of Xf at subspecies level is fundamental and in case of new outbreaks or new plant hosts, the identification of the ST is strongly recommended (EPPO Standard PM7/24 (4)). To circumvent the complexity of the detection and identification of Xf subspecies and ST, we investigate the use of Oxford Nanopore Technologies (ONT) MinION device.
Direct Nanopore DNA sequencing was firstly assessed by using naturally infected samples (dataset 1 and 2). One of the advantages of MinION device compared to other NGS platform is the portability.
However, our results showed that only in highly infected samples Xf can be reliably detected and its subspecies identified. All samples lacking reads mapping to the Xf genomes, showed a high Ct value by real-time PCR that, associated to low throughput for these samples, made Xf undetectable. This evidence was confirmed by using artificially spiked olive samples with DNA concentration of Xf next and below the limit of detection of the real-time PCR (dataset 3), currently considered the most sensitive assay for Xf detection (EPPO, 2016b; Modesti et al., 2017). Nanopore direct DNA sequencing of these samples provides a very low number of reads which reflect an even lower number of reads mapping to Xf genomes making the analysis doubtable. A similar result was also obtained by Bonants et al. (2019) using Illumina, who reported the ability of NGS to determine the ST of Xf only in highly infected samples. The reason of the low sensitivity of the direct Nanopore sequencing should be addressed to the low quality of the DNA for the tested samples. Nanopore sequencing suffer of low throughput when the DNA used for the sequencing has contaminants and/or when reads are shorts (Chalupowicz et al., 2019). Further optimization of the DNA purification and size selection step may be necessary to maximize the performance of the Nanopore system. This aspect is of relevant importance because Xf has a very broad host-range and even using the same extraction method, the DNA quality can be compromised by contaminants present in the host matrix. However, carrying out massive analyzes cannot provide time-consuming or expensive DNA purifications and generally neither the DNA concentration nor the DNA quality are determined before a routinely analysis. Charalampous et al. (2019) developed a DNA extraction in order to deplete host DNA from the samples and have more pathogen DNA. Using this approach, they were able to identify respiratory pathogens in human samples. An alternative to sophisticated DNA extraction methodology is amplicon sequencing (Kilianski et al., 2015; Radhakrishnan et al., 2019). This latter approach was assessed in order to overcome the low throughput of the flowcells and to set up a more reliable and sensitive method, based on the amplification of two housekeeping genes followed by Nanopore sequencing. Sequencing of cysG and malF for subspecies discrimination is suggested by the EPPO Standard PM7/24 (4) and required, since 2018, in France (Enora et al., 2019). Nanopore amplicon sequencing was more efficacious than real-time PCR in Xf detection of the spiked samples with low concentration of Xf target DNA (dataset 3, Tab. 3). These results suggest that Nanopore amplicon sequencing has better sensitivity than real-time PCR. The addition of the amplification step, even if lengthening the procedure, showed several advantages: i) reduces the influence of DNA quality; ii) mitigates sequence error by repeatedly sequencing the same region; iii) produces usable data faster than genomic DNA sequencing and iv) by using lower stringency condition (40 cycles) in the amplification step, it obtains higher number of copies of the target genes unusable for Sanger sequencing but exploitable for Nanopore sequencing (data not shown).
The addition of the amplification allows in a single sequencing step the detection of the pathogen as well as the identification of its subspecies and ST. The results of amplicon sequencing of the dataset 2 (Tab. 2) (Spartium junceum, Polygala myrtifolia, Rosmarinus offcinalis, Lavandula, Prunus amygdalus, Cystus) confirmed previous findings which identified as multiplex the subspecies infecting these plants (Marchi et al., 2018; Saponari et al., 2019). These evidences showed that by using an amplification step the method was able to detect and identify Xf in different plant species, without interference due to the DNA quality.
The results obtained by amplicon sequencing of cysG and malF encourage us to test all the seven housekeeping genes (leuA, petC, malF, cysG, holC, nuoL, gltT) to define the ST in our naturally infected samples. For this purpose, Nanopore and Sanger sequencing of the seven housekeeping genes were performed in Lavandula sp., Rosmarinus officinalis and Cystus sp. collected in Tuscany Region, for which the ST has not yet been reported. Our results showed that Xf recovered from these samples belong to the new ST 87 accordingly with previous findings on Prunus amygdalus, Polygala myrtifolia, Spartium junceum and Rhamnus alaternus (Saponari et al., 2019). The simultaneous detection and identification of Xf, its subspecie and ST, developed in this study, lead the Nanopore amplicon sequencing assay to be a powerful tool for a quick Xf diagnosis. The evidences obtained in this study shows that the sequencing of two or seven housekeeping genes by MinION is a promising alternative to detect and identify Xf from infected plant material, also in low bacterial concentrations. This higher sensitivity is of interest for Xf detection in traded plants and for latently infected material that represent one of the most serious threat for the dissemination. For an “in field” application, further studies are required to make DNA direct Nanopore sequencing reliable in detecting low concentration pathogens, i.e. using single flow-cells for the processing of individual samples. This allows a greater depth in sequencing, with consequent higher possibility of detection and identification of Xf /subspecies/STs. In conclusion, the evidences obtained in this study paving the way for new opportunities of Nanopore sequencing as an effective survellaince tool for Xf early detection.
Experimental procedures
Samples and DNA extraction
Sample preparation
Three datasets of samples were prepared as following: the first dataset (1) consists of twenty-one DNA samples extracted from naturally infected olive plants (Olea europea L.) collected in Apulia region (southern Italy) as described in Scortichini et al. (2018). The second dataset (2) consisted of naturally infected samples of Spartium junceum, Polygala myrtifolia, Rosmarinus officinalis, Rhamnus alaternus, Prunus amygdalus, Cistus and Lavandula spp., collected in Tuscany. The third dataset (3) consists of DNA of healthy olives tree (collected in Latium) spiked with known quantity of DNA of Xfp strain CFBP 8402. Plant DNA extraction of dataset 2 and 3 were performed by CTAB-based method as reported in EPPO Standard PM7/ 24 (4).
A pure colture of Xfp strain CFBP 8402 was grown for seven days at 28° C in BYCE medium. The culture scraped and resuspended in 100 μl of PBS, was grown in 10 ml of PD2 broth at 28° C, 170 rpm, for 7-10 days. The bacterial DNA was extracted from 700 µl of pure culture using the Gentra Puregene Yeast/Bact Kit (Qiagen, The Netherlands). DNA (about 40 ng/µl) of healthy olive samples was amended with 100, 10, 8, 4 fg/PCR reaction of Xfp DNA, each in three independent replicates.
Real-time PCR and Multi-Locus Sequence Typing (MLST)
Datasets DNA were quantified by DS-11 FX+ spectrophotometer (DENOVIX) and diluted to a final concentration of 20 ng/μL. Xf was detected in dataset 1 DNA by real-time PCR as described by Harper et al. (2010) in a final volume of 10 µl and in datasets 2 and 3 by Francis et al. (2006). All samples were also tested by Li et al. (2013) to confirm the bacterial infection. MLST analysis was performed on Cystus, Lavandula and Rosmarinus as previously described (Yuan et al., 2010; EPPO, 2016b). Subspecie identification by sequencing cysG and malF was performed in dataset 2. For Nanopore amplicon sequencing the amplification of housekeeping genes was modified increasing the cycles of PCR condition from 35 to 40.
Datasets used for Nanopore sequencing
Direct Nanopore DNA sequencing was performed on the three previously described datasets (Tab. 1, 2, 3). Dataset 3 was used to test the direct DNA Nanopore sequencing in condition that was next the limit of detection (LoD) of the real-time PCR (Tab. 3). DNA samples from dataset 2 and 3 were used for Nanopore amplicon sequencing of the two genes, cysG and malF. Three out of seven samples of the dataset 2 (i.e. Cystus, Lavandula and Rosmarinus) were assessed by Nanopore amplicon sequencing of all seven housekeeping genes.
Nanopore sequencing
Nanopore sequencing libraries were prepared according to manufacture instruction for the kit SQK-RBK004 for both direct DNA and amplicon sequencing. In brief, about ∼400 ng of DNA was purified using AMPure beads, ligated to the indexing adapter, combined in one sample and subsequently ligated to the RAP adapter prior sequencing. For amplicon sequencing, PCR amplicons of each sample were pulled together and purified using Isolate II PCR and gel Kit (BIOLINE) and about 100 ng of DNA was used in library preparation. DNA samples were run on the flowcell until pore life ended while amplicon sequencing runs were performed for shorted time (∼5hours). After sequencing, Deepbinner (v0.2.0)(Wick et al., 2018) was used to de-multiples the samples by default parameters. Subsequently, basecalling and a new run of demultiplex was performed using Guppy (v2.3.1; default parameters)(Wick et al., 2019). Deepbinner and guppy basecalling were performed on a GPU card Nvidia GTX 1070 8Gb.
Pipeline for gDNA Xylella detection: detection_script
Reads generated by the two-step de-barcoding analysis were analysed using a custom pipeline. In brief, reads were mapped to a database by using Minimap2 (v2.17-r941)((Li, 2018) software using --MD and --secondary=no as additionally parameters. The mapping is split in two steps: the first step aligns all the reads against a database that includes one representative genome for all sequenced bacterial species (about 10,500 bacterial genomes) present at the NCBI Genebank database and a second step that align all reads mapped to Xf in the first step to a Xf specific database. The Xf database it is formed by ∼60 genomes. For both steps, alignment output files are parsed to retrieve only the best match for reads mapping on multiple genomes. Finally, the number of reads mapping on the same subspecies/strain are combine and summarized in plot (https://github.com/lfaino/xylella).
MLST consensus generation: consensus_script
A second pipeline was written in order to generate MLST consensus after Nanopore sequencing. Briefly, reads from Nanopore sequencing are demultiplexed by using Deepbinner software (v0.2.0)(Wick et al., 2018). Subsequently, porechop software (v0.2.4) is used to remove adapters from the kept reads. Two rounds of reads correction by racon software (v1.3.3) (Vaser et al., 2017) are performed. The corrected reads are passed to jellyfish software (v2.2.8) (Marçais and Kingsford, 2011) and reads of 100 nt are generated. These reads are assembled by SPADES software (v3.12.0) (Vyahhi et al., 2012) and the assembled contigs polished by Nanopolish (v0.11.1) and subsequently by bcftools mpileup software (v1.7.2) (Danecek and McCarthy, 2017). The reconstructed MLST sequences are compared to all other MLST deposited at https://pubmlst.org/ (Jolley et al., 2018). The script used for MLST reconstruction can be found at GitHub (https://github.com/lfaino/xylella).
Acknowledgements
This work was supported by MIPAAFT, Project Oli.Di.X.I.It (“OLIvicoltura e Difesa da Xylella fastidiosa e da Insetti vettori in Italia”), D.M. 23773 del 6/09/2017.