Comparative analysis of two NGS platforms and different databases for analysis of AMR genes

The use of antibiotics in human medicine and livestock production has contributed to the widespread occurrence of antimicrobial resistance (AMR). Given the relevance of AMR to human and livestock health, it is important to assess the occurrence of genetic determinants of resistance in medical, veterinary, and public health settings in order to understand the risks of transmission and treatment failure. Advances in Next Generation Sequencing (NGS) technologies have had a significant impact on research in microbial genetics and microbiome analyses. Strategies for high-throughput sequencing of panels of PCR amplicons representing known AMR genes now offer opportunities for targeted characterization of complex microbial populations. The aim of the present study was to compare the Illumina MiSeq and Ion Torrent S5 Plus sequencing platforms for use with the Ion AmpliSeq™ AMR Research Panel in a veterinary/public health setting. All samples were processed in parallel for the two sequencing technologies and then passed through a common bioinformatics workflow to define the occurrence and abundance of AMR gene sequences. Regardless of sequencing platform, the results were closely comparable, with minor differences. The Comprehensive Antibiotic Resistance Database (CARD), QIAGEN Microbial Insight - Antimicrobial Resistance (QMI-AR), Antimicrobial resistance database (AR), and CARD-CLC databases were compared for analysis, with the most genes identified using CARD. Drawing on these results, we describe an end-to-end workflow for AMR gene analysis using NGS.

To support national and global priority setting, public health initiatives, and treatment decisions, knowledge of the occurrence and transmission of AMR is required. In this study, efforts were made to compare different sequencing technologies and databases to provide a comprehensive analysis pipeline for data defining AMR gene occurrence. All experimental variables were fixed with the exception of the sequencing platform. The library preparation kit, data analysis pipeline, and database stringency were all kept the same to maintain uniformity. In total, ~15M reads were obtained using the Ion Torrent S5 Plus platform in a single FastQ file. [...] Forward and reverse reads were first merged using PandaSeq's default parameters; later, the merge length was optimized. Forward-reverse read overlaps of 5, 10, and 15 bp were analyzed in addition to the default parameters. The 10 bp overlap was found to be optimal due to its appropriate representation of merged reads (Fig. S1). These results showed that the overlap parameter used for merging forward-reverse amplicon reads may cause important differences in apparent gene abundance, because an inappropriate overlap parameter leads to false positive and false negative results on Illumina sequencing platforms when analyzing AMR data. [...] BLAST query coverage per HSP percentage of 90 (Fig. S2). The default overlap with default BLAST could not be used for analysis due to nonspecific read merges. Specifically, the PandaSeq default merge length is 1 bp, meaning that any two reads possessing a common base at the 5′ end will be merged.
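The effect of the minimum overlap on nonspecific merging can be illustrated with a minimal sketch. This is a toy exact-match merger written for illustration only; it is not PandaSeq's algorithm, which scores candidate overlaps rather than requiring exact matches. The function names `reverse_complement` and `merge_pair` and all sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(forward, reverse, min_overlap):
    """Merge a read pair on the longest exact suffix/prefix overlap.

    Toy model only: the 3' end of the forward read must match the start of
    the reverse-complemented reverse read over at least `min_overlap` bases.
    Returns the merged sequence, or None if no such overlap exists.
    """
    rc = reverse_complement(reverse)
    longest = min(len(forward), len(rc))
    # Try the longest candidate overlap first, down to the minimum allowed.
    for k in range(longest, min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    return None


# Two unrelated (made-up) reads that share a single base at the junction.
unrelated_fwd = "AAAAACCCCCG"
unrelated_rev = "AAAAAAAAAAC"  # reverse complement is GTTTTTTTTTT

print(merge_pair(unrelated_fwd, unrelated_rev, min_overlap=1))   # spurious merge
print(merge_pair(unrelated_fwd, unrelated_rev, min_overlap=10))  # None
```

With a 1 bp minimum the unrelated pair is merged on a single shared base, whereas a 10 bp minimum rejects it while still accepting read pairs that genuinely overlap by 10 or more bases.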
The 10 bp overlap and BLAST qcov hsp percentage of 90 was also not efficient as it hampered [...] found to be most accurate, as it avoided these issues, and was applied for all subsequent analyses. [...] low (i.e. less than 0.004%). Additionally, 6% of the genes detected using Illumina MiSeq were missing from the Ion Torrent results, but again the percentage abundance of each gene found only by Illumina was very low (i.e. less than 0.004%). Many genes were discovered in only one or two samples, and with a small number of hits. Overall, 62.1% of the genes detected were common to both platforms. However, when only genes accounting for ≥ 1% of sequencing reads were considered, the results from the two sequencing platforms were similar (Table 1, Fig. 1).
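The comparison described above (the share of genes common to both platforms, with and without a 1% read-abundance cut-off) can be sketched as follows. This is a minimal illustration under stated assumptions: the gene names and read counts are invented placeholders, not data from this study, and "common" is assumed here to mean the intersection divided by the union of the genes detected on either platform.

```python
def percent_abundance(counts):
    """Convert per-gene read counts into percentages of all reads."""
    total = sum(counts.values())
    return {gene: 100.0 * n / total for gene, n in counts.items()}


def shared_percentage(counts_a, counts_b, min_pct=0.0):
    """Percentage of detected genes found on both platforms.

    Only genes with at least one read and abundance >= min_pct on a
    platform count as detected on that platform (assumed definition).
    """
    pct_a = percent_abundance(counts_a)
    pct_b = percent_abundance(counts_b)
    genes_a = {g for g, p in pct_a.items() if p > 0 and p >= min_pct}
    genes_b = {g for g, p in pct_b.items() if p > 0 and p >= min_pct}
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 100.0 * len(genes_a & genes_b) / len(union)


# Hypothetical read counts per gene (placeholders, not study data).
platform_1 = {"gene_A": 500, "gene_B": 300, "gene_C": 195, "gene_D": 4, "gene_E": 1}
platform_2 = {"gene_A": 520, "gene_B": 290, "gene_C": 188, "gene_F": 2}

print(shared_percentage(platform_1, platform_2))               # all detected genes
print(shared_percentage(platform_1, platform_2, min_pct=1.0))  # genes at >= 1% only
```

In this toy data set the platforms disagree only on genes present at a fraction of a percent, so applying the 1% cut-off removes every discrepancy, which mirrors the pattern reported in the text.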

The APH(3')-IIIa gene was the most abundant on both platforms, followed by tetW and tetQ. The occurrence of only nine genes differed significantly between the sequencing platforms (Fig. S3). Of these nine genes, tet(40) was the most variable (4%). Sample-specific comparison highlighted similar platform-associated variation in the occurrence of tetO and aminoglycoside phosphotransferase genes (Fig. S4, Fig. S5). Direct [...] and Ion Torrent S5 Plus for genes with greater than 1% read abundance (Fig. 2). The abundance of tet(40) [...] Streptococcus agalactiae and hence the same trend in the percentage of S. agalactiae could be observed (Fig. 1, Fig. 3).

Prediction of bacterial identity associated with AMR gene carriage was found to be comparable on both platforms ([...] found to be significantly differently represented between the platforms (q-value (corrected) = [...] samples was also undertaken, illustrating the stability of taxonomic classification between sequencing platforms (Fig. 4, Fig. S7, Fig. S8).

[...] correlation or similarity (Fig. 5). In the absence of clear complementarity, the CARD database [...] genes and organisms among the four databases. CARD is used primarily with genome sequence data, and each reference sequence has its own detection 'model', meaning that the criteria for matching a query to a CARD reference sequence are defined per sequence. The [...]

The random forest method generates decision trees from data samples, generating multiple predictions before identifying the best solution. Random forest is an ensemble method that is [...] supporting the accuracy of each (Fig. S9). Similarly, PCoA analysis was used to confirm that all Illumina MiSeq and Ion Torrent sample sequences were located in the same cluster (Fig. 6). LEfSe was performed for both gene and organism at Log LDA 3.0 and P-value ≤ 0.05.
Only four of 300 organisms were found to be significantly different between sequencing platforms. [...] Ion Torrent dataset, while uncultured bacteria were more common using the Illumina platform.

However, the abundance of all four organisms was low, less than 0.07% and 0.02% in the Ion Torrent and Illumina datasets, respectively, both below the 1% threshold set earlier (Fig. S10).

LEfSe analysis of the AMR genes detected indicated that five genes were significantly different between the platforms (Fig. S11). The genes tet32, ErmT, tetS and Erm35 were found to be more abundant in Ion Torrent sequencing, while tet(40) was more common in the Illumina data.

Again, the percent abundance of these gene-specific reads was less than 0.04% in the Ion Torrent sequencing. Only the gene tet(40) was found to be significantly different at more than 1% read abundance, with a two-fold higher abundance in the Illumina MiSeq [...] abundance is very low.

Two different platforms were used to identify database correlation, if any. One of these local databases was preferred because it can target a higher number of genes than the other. Moreover, the main disadvantage of CLC Workbench is that it is not freely available.

Both the CLC Workbench license and the microbiological insight module must be paid for separately.
The present study has effectively demonstrated that the analysis platform used to detect AMR in [...] present study is that we did not perform the same exercise on a mock community, as such a mock community was not available. Irrespective of the sequencing chemistry and platform used, standard methods and a standard pipeline for sample analysis must be established for comparative analysis among AMR biological samples.

Database selection and analysis parameters can change the outcome considerably.

Likely host organism (as predicted by CARD) and amplicon length are shown.

Organism PERMANOVA analysis with F-value 0.82178, R² value 0.036009 and p-value < 0.514.
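For readers unfamiliar with the PERMANOVA statistics quoted above, the pseudo-F and R² values can be related to a sample distance matrix as in the following minimal sketch. It uses the standard sums-of-squares partition of squared pairwise distances and a simple label-permutation p-value. The function names, the toy distance matrix, and the group labels are hypothetical, and this is not the software used in the study.

```python
import random
from itertools import combinations


def permanova_stats(dist, groups):
    """Return (pseudo-F, R^2) for a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, computed group by group.
    ss_within = 0.0
    for label in labels:
        idx = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    f_stat = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f_stat, ss_among / ss_total


def permanova_pvalue(dist, groups, permutations=999, seed=0):
    """Permutation p-value: how often shuffled labels give F >= observed F."""
    rng = random.Random(seed)
    observed, _ = permanova_stats(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if permanova_stats(dist, shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Toy example: four samples at positions 0, 1, 10, 11 on a line.
positions = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
toy_groups = ["platform_1", "platform_1", "platform_2", "platform_2"]

f_stat, r2 = permanova_stats(toy_dist, toy_groups)
print(f_stat, r2, permanova_pvalue(toy_dist, toy_groups))
```

A small F with a small R², as reported for the organism-level comparison, indicates that grouping samples by sequencing platform explains little of the between-sample variation.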