Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data

Abstract


Introduction
Variant calling is a cornerstone of bacterial genomics as well as one of the major applications of next generation sequencing. Its downstream applications include identification of disease transmission clusters, prediction of antimicrobial resistance, and phylogenetic tree construction and subsequent evolutionary analyses, to name a few [1][2][3][4]. Variant calling is used extensively in public health laboratories to inform decisions on managing bacterial outbreaks [5] and in molecular diagnostic laboratories as the basis for clinical decisions on how to best treat patients with disease [6]. Over the last 15 years, short-read sequencing technologies, such as Illumina, have been the mainstay of variant calling in bacterial genomes, largely due to their relatively high level of basecalling accuracy. However, nanopore sequencing on devices from Oxford Nanopore Technologies (ONT) has emerged as an alternative technology. One of the major advantages of ONT sequencing from an infectious diseases public health perspective is the ability to generate sequencing data in near real-time, as well as the portability of the devices, which has enabled researchers to sequence in remote regions, closer to the source of the disease outbreak [7,8]. Limitations in ONT basecalling accuracy have historically restricted its widespread adoption for bacterial genome variant calling [9]. ONT have recently released a new R10.4.1 pore, along with a new basecaller [10] with three different accuracy modes (fast, high-accuracy [hac] and super-accuracy [sup]). The basecaller also has the ability to identify a subset of paired reads for which both strands have been sequenced (duplex), leading to impressive gains in basecalling accuracy [11][12][13].
A number of variant callers have been developed for ONT sequencing [14][15][16]. However, to date, benchmarking studies have focused on human genome variant calling, and have mostly used the older pores, which do not have the ability to identify duplex reads [17][18][19]. In addition, modern deep learning-based variant callers use models trained on human DNA sequence only, leaving an open question of their generalisability to bacteria [15,16,20]. Human genomes have a very different distribution of k-mers (segments of DNA sequence of length k) and patterns of DNA modification, and as such, results from human studies may not directly carry over into bacterial genomics. Moreover, there is substantial k-mer and DNA modification variation within bacteria, mandating a broad multi-species approach for evaluation [21]. Existing benchmarks for bacterial genomes, while immensely beneficial and thorough, only assess short-read Illumina data [22,23].
In this study, we conduct a benchmark of SNP and indel variant calling using ONT and Illumina sequencing across a comprehensive spectrum of 14 Gram-positive and Gram-negative bacterial species. We used the same DNA extractions for both Illumina and ONT sequencing to ensure our results are not biased by acquisition of new mutations during culture. We develop a novel strategy for generating benchmark variant truthsets in which we project variations from different strains onto our gold standard reference genomes in order to create a biologically realistic distribution of SNPs and indels. We assess both deep learning-based and traditional variant calling methods and investigate the sources of errors and the impact of read depth on variant accuracy.

Genome and variant truthset
Ground truth reference assemblies were generated for each sample using ONT and Illumina reads (see Genome assembly).
Creating a variant truthset for benchmarking is challenging [24,25]. Calling variants against a sample's own reference yields no variants, so we generated a mutated reference. Instead of random mutations, we used a pseudo-real approach, applying real variants from a donor genome to the sample's reference [25,26]. This approach has the advantage of a simulation, in that we can be certain of the truthset of variants, but with the added benefit of the variants being real differences between two genomes.
For each sample, we selected a donor genome with average nucleotide identity (ANI; a measure of similarity between two genomes) closest to 99.5% (see Truthset and reference generation). We identified all variants between the sample and donor using minimap2 [27] and mummer [28], intersected the variant sets, and removed overlaps and indels longer than 50bp. This variant truthset was then applied to the sample's reference to create a mutated reference, ensuring no complications from large structural differences. While incorporating structural variation would be an interesting and useful addition to the current work, we chose to focus here on small (<50bp) variants.
Table 1 summarises the samples used, the number of variants, and the ANI between each sample and its donor. We analysed 14 samples from different species, spanning a wide range of GC content (30-66%). Despite the variation in SNP counts (2102-57887), the number of indels was consistent across samples (see Suppl. Table S2 for details).
Alignments of ONT reads to each sample's mutated reference (see Genome and variant truthset) were generated with minimap2 and provided to each variant caller (except Medaka, which takes reads directly). Variant calls were assessed against the truthset using vcfdist (v2.3.3 [33]), classifying each variant as true positive (TP), false positive (FP), or false negative (FN). Precision, recall, and the F1 score were calculated for SNPs and indels at each VCF quality score increment. Table S3 for a summary and S4 for full details) as well as results broken down by species for Clair3 with the sup model in Suppl. Figures S5-7. Reads basecalled with the fast model are an order of magnitude worse than the hac and sup models.

Understanding missed and false calls
Conventional wisdom may leave readers surprised at finding that ONT data can provide better variant calls than Illumina. In order to convince ourselves (and others) of these results, we investigate the main causes for this difference.
Given the ONT read-level accuracy now exceeding Q20 (Figure 1; simplex sup), read length remains the primary difference between the two technologies. Suppl. Figure S4 shows that Illumina's lower F1 score is mainly due to recall rather than precision (Suppl. Figure S3). We hypothesised that Illumina errors are related to alignment difficulties in repetitive or variant-dense regions due to its shorter reads.
Figure 4 shows that variant density and repetitive regions account for many false negatives, lowering recall. We define variant density as the number of variants (missed or called) in a 100bp window around each call. Figure 4a reveals a bimodal distribution of variant density for Illumina FNs, with a second peak at 20 variants per 100bp, unlike the distribution for TP and FP calls. In contrast, Clair3, a top-performing ONT variant caller, shows no bimodal distribution and few missed or false calls at this density (Figure 4b). Illumina reads struggle to align in variant-dense regions, whereas ONT reads can (Suppl. Figure S10), as 20 variants per 100bp represent a larger portion of an Illumina read than an ONT read.
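The variant density definition can be sketched in a few lines of Python. This is only an illustration: it assumes variants are given as positions on a single contig, that the window extends 50bp either side of the call, and that the call itself is counted. The function name `variant_density` is hypothetical and not taken from the study's code.

```python
from bisect import bisect_left, bisect_right


def variant_density(positions, window=100):
    """Count, for each variant position, the variants within a window
    of `window` bp centred on it (assumption: the call itself is included)."""
    ordered = sorted(positions)
    half = window // 2
    densities = {}
    for pos in ordered:
        # Binary search for the first and last variants inside the window.
        lo = bisect_left(ordered, pos - half)
        hi = bisect_right(ordered, pos + half)
        densities[pos] = hi - lo
    return densities


if __name__ == "__main__":
    example = [100, 120, 149, 150, 151, 400]
    print(variant_density(example))
```

Under this definition, an isolated variant has a density of 1, whereas a call in the second Illumina FN peak would have a density of around 20.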
We also assessed the change in F1 score when masking repetitive regions of the genome (see Identifying repetitive regions). Due to their shorter length, Illumina reads struggle more with alignment in these regions compared to ONT reads [34]. Suppl. Figure S11 highlights missed variants and alignment gaps in Illumina data. This is further quantified by the increase in Illumina's F1 score when repetitive regions are masked (Figure 4c), rising from 99.3% to 99.7%. In contrast, Clair3 100x simplex sup data shows only a 0.003% increase.
In terms of ONT missed calls, a variant-dense repetitive region in the E. coli sample ATCC_25922 was the cause of the simplex sup SNP outlier from Figure 2 (see Suppl. Section S2). In addition, the duplex sup SNP outlier was caused by very low read depth for sample KPC2_202310 (K. pneumoniae; Suppl. Section S2).
Indels have traditionally been a systematic weakness for ONT sequencing data, primarily driven by variability in the length of homopolymeric regions as determined by basecallers [9]. Having seen the drastic improvements in read accuracy in Figure 1, we sought to determine whether false positive indel calls are still a byproduct of homopolymer-driven errors.
When analysing Clair3, the best-performing ONT caller, we found that reads basecalled with the fast model often miscalculate homopolymer lengths by 1 or 2bp (Figure 5), though there is an equal number of non-homopolymeric false indel calls. In contrast, the sup model significantly reduced false indel calls, matching Illumina's error profile. Of the eight false indel calls by Clair3 on sup data, five were homopolymers and three occurred within one or two bases of another insertion with a similar sequence. The hac model improved over the fast model but still produced notable false indel calls, mainly miscalculating homopolymers by 1bp. DeepVariant showed a similar error profile to Clair3 (Suppl. Figure S13), with 8/11 false indels being homopolymers. FreeBayes (Suppl. Figure S14), Medaka (Suppl. Figure S15) and NanoCaller (Suppl. Figure S16) performed similarly, while BCFtools (Suppl. Figure S12) exhibited a persistent bias for homopolymeric indel errors, even with sup model reads. This indicates that while the sup basecaller reduces bias, deep learning methods like Clair3 and DeepVariant further mitigate it by training models to account for these systematic issues. An honourable mention goes to FreeBayes, a traditional variant caller that handles errors without inherent bias.
Lastly, we did not see any systematic indel bias in the context of missed calls (Suppl. Figures S17-S22), especially when compared to Illumina indel error profiles.

How much read depth is enough?
Having established the accuracy of variant calls from 'full depth' ONT datasets (100x), we investigated the required ONT read depth to achieve desired precision or recall, which varies by use case and resource availability. This is particularly relevant for ONT, where sequencing can be stopped in real-time once 'sufficient' data is obtained.
We subsampled each ONT read set with rasusa (v0.8.0 [35]) to average depths of 5, 10, 25, 50, and 100x and called variants with these reduced sets. Due to limited duplex depth, 50x was the maximum used for duplex reads, while 100x was used for simplex reads. Figure 6 and Figure 7 show F1 score, precision, and recall as functions of read depth for SNPs and indels. Precision and recall decrease as read depth is reduced, notably below 25x. Remarkably, Clair3 or DeepVariant on 10x ONT sup simplex data provides F1 scores consistent with, or better than, full-depth Illumina for both SNPs and indels (see Table S1 for Illumina read depths). The same is true for duplex hac or sup reads. With 5x of ONT read depth, the F1 score is lower than Illumina for almost all variant callers and basecalling models. However, BCFtools surprisingly produces SNP F1 scores on par with Illumina on duplex sup reads. Despite the inferior F1 scores across the board at 5x, SNP precision remains above Illumina with duplex reads for all methods except NanoCaller, and for Clair3 and DeepVariant calls from simplex sup data.
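The idea of subsampling to an average depth can be illustrated as follows. This sketch is not rasusa's implementation. It assumes only that the target number of bases is the depth multiplied by the genome size, and that reads are drawn at random until that target is reached. The function name `subsample_to_depth` and the `seed` parameter are hypothetical.

```python
import random


def subsample_to_depth(read_lengths, genome_size, depth, seed=0):
    """Randomly select reads until their summed length reaches
    depth * genome_size. Returns the indices of the selected reads."""
    target_bases = depth * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # fixed seed for reproducibility
    kept, total = [], 0
    for idx in order:
        if total >= target_bases:
            break
        kept.append(idx)
        total += read_lengths[idx]
    return kept


if __name__ == "__main__":
    lengths = [1000] * 100  # toy read set: 100 reads of 1 kbp
    for depth in (5, 10, 25, 50, 100):
        n = len(subsample_to_depth(lengths, genome_size=1000, depth=depth))
        print(f"{depth}x -> {n} reads")
```

If the read set holds fewer bases than the target, every read is returned, which mirrors why samples with insufficient depth are excluded from the higher depth bins.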

What computational resources do I need?
The final consideration for variant calling is the required computational resources. While this may be trivial for those with high-performance computing (HPC) access, many analyse bacterial genomes on personal computers due to their smaller size compared to eukaryotes. The main resource constraints are memory and runtime, especially for aligning reads to a reference and calling variants. Additionally, if working with raw (pod5) ONT data, basecalling is also a resource-intensive step.
Figure 8 shows the runtime (seconds per megabase of sequencing data) and maximum memory usage for read alignment and variant calling (see Suppl. Figure S23 and Table S7 for basecalling GPU runtimes). DeepVariant was the slowest (median 5.7s/Mbp) and most memory-intensive (median 8GB), with a runtime of 38 minutes for a 4Mbp genome at 100x depth. FreeBayes had the largest runtime variation, with a maximum of 597s/Mbp, equating to 2.75 days for the same genome. In contrast, basecalling with a single GPU using the super-accuracy model required a median runtime of 0.77s/Mbp, or just over 5 minutes for a 4Mbp genome at 100x depth. Clair3 had a median memory usage of 1.6GB and a runtime of 0.86s/Mbp (<6 minutes for a 4Mbp 100x genome). Full details are in Suppl. Table S6.
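The conversion from a per-megabase rate to a wall-clock estimate is simple arithmetic: a 4Mbp genome at 100x depth corresponds to 400Mbp of sequencing data. The short sketch below reproduces the figures quoted above; the function name `estimated_runtime_seconds` is hypothetical.

```python
def estimated_runtime_seconds(rate_s_per_mbp, genome_size_mbp, depth):
    """Runtime estimate: rate (s/Mbp) times total sequenced megabases."""
    sequenced_mbp = genome_size_mbp * depth
    return rate_s_per_mbp * sequenced_mbp


if __name__ == "__main__":
    rates = {
        "DeepVariant (median)": 5.7,
        "FreeBayes (maximum)": 597,
        "Basecalling, sup (median)": 0.77,
        "Clair3 (median)": 0.86,
    }
    for name, rate in rates.items():
        seconds = estimated_runtime_seconds(rate, genome_size_mbp=4, depth=100)
        print(f"{name}: {seconds / 60:.1f} min ({seconds / 86400:.2f} days)")
```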

Discussion
In this study, we evaluated the accuracy of bacterial variant calls derived from Oxford Nanopore Technologies (ONT) using both conventional and deep learning-based tools. Our findings show that deep learning approaches, specifically Clair3 and DeepVariant, deliver high accuracy in SNP and indel calls from the latest high-accuracy basecalled ONT data, outperforming Illumina-based methods, with Clair3 achieving median F1 scores of 99.99% for SNPs and 99.53% for indels.
The high-quality sequencing data enabled the creation of near-perfect reference genomes, crucial for evaluating variant calling accuracy. While not claiming perfection for these genomes, we consider them to be as accurate as current technology allows (or as philosophically possible) [13,36].
To benchmark variant calling, we utilised a variant truthset generated by applying known differences between closely related genomes to a reference. This pseudo-real method offers a realistic evaluation framework and a reliable truthset for assessing variant calling accuracy [25,26].
Our comparison of variant calling methods showed that deep learning techniques achieved the highest F1 scores for SNP and indel detection, indicating their potential in genomic analyses and suggesting a shift towards more advanced computational approaches. While the superior performance of these methods has been established for human variant calls [17,18], our results confirm their effectiveness for bacterial genomes as well.
Our investigation into missed and false variant calls highlights inherent challenges posed by sequencing technology limitations, particularly read length, alignment in complex regions, and indel length in homopolymers. We found that variant density and repetitive regions hinder Illumina variant calling due to short read alignment issues. However, we found recent improvements in ONT read accuracy and deep learning-based variant callers have mitigated homopolymer-induced false positive indel calls, previously a major systematic issue with ONT data [9,13].
Having established the accuracy and error sources of modern methods, we examined the impact of read depth on variant calling accuracy. Our results show that high accuracy is achievable at reduced read depths of 10x, especially with super-accuracy basecalling models and deep learning algorithms. This is significant for resource-limited projects, as 10x super-accuracy simplex data can match or exceed Illumina accuracy. For optimal clinical and public health applications, we recommend a minimum of 25x depth. Notably, 5x depth with duplex super-accuracy ONT data achieved SNP accuracy comparable to Illumina. Having such confidence in low-depth calls will no doubt be a boon for many clinical and public health applications where sequencing direct-from-sample is desired [2,37,38,39].
Lastly, considering computational resource requirements is crucial, especially for those without high-performance computing facilities [7,8,40]. Our findings show a wide range of demands among variant calling methods, with the worst-case scenario (FreeBayes) taking over two days. Most methods, however, run in less than 40 minutes, with Clair3 having a median runtime of about 6 minutes. All methods use less than 8GB of memory, making them compatible with most laptops. Basecalling is generally faster than variant calling, assuming GPU access, which is likely considered when acquiring ONT-related equipment.
There are three main limitations to this work. The first is that we only assessed small variants and ignored structural variants. Zhou et al. benchmarked structural variant calling from ONT data [41], though this focused on human sequencing data. Generating a truthset of structural variants between two genomes is, in itself, a difficult task. However, we believe such an undertaking with a thorough investigation of structural variant calling methods for bacterial genomes would be highly beneficial.
The second limitation is not using a diverse range of ANI values for selecting the variant donor genomes when generating the truthset. Previous work from Bush et al. examined different diversity thresholds for selecting reference genomes when calling variants from Illumina data, and found it to be one of the main differentiating factors in accuracy [23]. Our results mirror this to an extent, showing the reduction in Illumina accuracy as the variant density increases, though it would be interesting to determine whether the divergence in reference genomes has an effect on ONT variant calling accuracy. Nevertheless, to maintain our focus on the nuances of variant calling methods, including basecalling models, read types, error types, and the influence of read depth, we decided that introducing another layer of complexity into our benchmark could potentially obscure some of the insights.
The third limitation is that Illumina sequencing was performed on different models: three samples on the NextSeq 500 and the rest on the NextSeq 2000. While differences in error rates exist between Illumina instruments, no specific assessment has been made between these NextSeq models [42]. However, the absolute differences in error rates are minor and unlikely to impact our study significantly. This is particularly relevant since Illumina's lower F1 score compared to ONT was due to missed calls rather than erroneous ones.
In conclusion, this study comprehensively evaluates bacterial variant calls using Oxford Nanopore Technologies (ONT), highlighting the superior performance of deep learning tools, particularly Clair3 and DeepVariant, in SNP and indel detection. Our extensive dataset and rigorous benchmarking demonstrate significant advancements in sequencing accuracy with the latest ONT technologies.
Improvements in ONT read accuracy and deep learning variant callers have mitigated previous challenges like homopolymer-associated errors. We also found that high accuracy can be achieved at lower read depths, making these methods practical for resource-limited settings. This capability marks a significant step in making advanced genomic analysis more accessible and impactful.
Illumina reads were preprocessed with fastp (v0.23.4 [44]) to remove adapter sequences, trim low-quality bases from the ends of the reads, and remove duplicate reads and reads shorter than 30bp.

Genome assembly
Ground truth assemblies were generated for each sample as per Wick et al. [36]. Briefly, the unfiltered ONT simplex sup reads were filtered with Filtlong (v0.2.1 [45]) to keep the best 90% (-p 90) and fastp (default settings) was used to process the raw Illumina reads. We performed 24 separate assemblies using the Extra-thorough assembly instructions in Trycycler's (v0.5.4 [46]) documentation. Assemblies were combined into a single consensus assembly with Trycycler and Illumina reads were used to polish that assembly using Polypolish (v0.6.0; default settings [47]) and Pypolca (v0.3.1 [48,49]) with --careful. All polishing changes were manually curated and investigated as per Wick et al. [36] (e.g., for very long homopolymers, the correct length was chosen according to Illumina read support).

Truthset and reference generation
To generate the variant truthset for each sample, we identified all variants between the sample and a donor genome. To select the variant-donor genome for a given sample, we downloaded all RefSeq assemblies for that species (up to a maximum of 10000) using genome_updater (v0.6.3 [50]). ANI was calculated between each downloaded genome and the sample reference using skani (v0.2.1 [51]). We only kept genomes with an ANI, a, such that 98.40% ≤ a ≤ 99.80%. In addition, we excluded any genomes with CheckM [52] completeness less than 98% and contamination greater than 5%. We then selected the genome with the ANI closest to 99.50%. Our reasoning for this range exclusion is that genomes with a > 99.80% are almost always members of the same sequence type (ST) [53,54], and we found very little variation between them (data not shown).
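The donor selection logic can be sketched as a filter followed by a nearest-to-target choice. This is an illustration rather than the study's code: the record layout (dictionaries with `name`, `ani`, `completeness` and `contamination` keys) and the function name `select_donor` are hypothetical, and the sketch assumes that a genome is discarded if it fails either CheckM criterion.

```python
def select_donor(candidates, target_ani=99.50, min_ani=98.40, max_ani=99.80,
                 min_completeness=98.0, max_contamination=5.0):
    """Return the candidate whose ANI is closest to `target_ani` after
    filtering on ANI range and CheckM quality, or None if none pass."""
    passing = [
        c for c in candidates
        if min_ani <= c["ani"] <= max_ani
        # Assumption: failing either quality criterion excludes the genome.
        and c["completeness"] >= min_completeness
        and c["contamination"] <= max_contamination
    ]
    if not passing:
        return None
    return min(passing, key=lambda c: abs(c["ani"] - target_ani))


if __name__ == "__main__":
    genomes = [
        {"name": "g1", "ani": 99.90, "completeness": 99.5, "contamination": 0.5},
        {"name": "g2", "ani": 99.45, "completeness": 99.0, "contamination": 1.0},
        {"name": "g3", "ani": 99.60, "completeness": 99.8, "contamination": 0.2},
        {"name": "g4", "ani": 99.52, "completeness": 99.9, "contamination": 6.0},
    ]
    print(select_donor(genomes)["name"])
```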
We then identified variants between the reference and donor genomes using both minimap2 (v2.26 [27]) and mummer (v4.0.0rc1 [28]). We took the intersection of the variants identified by minimap2 and mummer into a single VCF and used BCFtools (v1.19 [29]) to decompose multi-nucleotide polymorphisms (MNPs) into SNPs, left-align and normalise indels, remove duplicate and overlapping variants, and exclude any indel longer than 50bp. The resulting VCF file is our truthset.
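The intersection and indel length filter can be illustrated on plain tuples. This sketch assumes that both call sets have already been normalised so that identical variants share the same (contig, position, REF, ALT) representation; it does not perform MNP decomposition, left-alignment or overlap removal, which the study delegates to BCFtools. The function name `build_truthset` is hypothetical.

```python
def build_truthset(calls_a, calls_b, max_indel_len=50):
    """Keep variants present in both call sets and drop indels whose
    length (difference between REF and ALT lengths) exceeds the limit.
    Each variant is a (contig, pos, ref, alt) tuple."""
    shared = set(calls_a) & set(calls_b)
    kept = [
        v for v in shared
        if abs(len(v[2]) - len(v[3])) <= max_indel_len
    ]
    return sorted(kept)


if __name__ == "__main__":
    minimap2_calls = [("chr", 10, "A", "G"), ("chr", 50, "AT", "A"),
                      ("chr", 90, "C", "C" + "T" * 60)]
    mummer_calls = [("chr", 10, "A", "G"), ("chr", 70, "G", "T"),
                    ("chr", 90, "C", "C" + "T" * 60)]
    print(build_truthset(minimap2_calls, mummer_calls))
```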
Next, we generated a mutated reference genome, which we used as the reference against which variants were called by the different methods we assess. BCFtools' consensus subcommand was used to apply the truthset of variants to the sample reference, thus producing a mutated reference.
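Conceptually, producing the mutated reference amounts to walking along the sequence and substituting each REF allele with its ALT allele. The sketch below illustrates this for a single contig, assuming 1-based positions and non-overlapping variants. It is not how BCFtools implements its consensus subcommand, and the function name `apply_variants` is hypothetical.

```python
def apply_variants(reference, variants):
    """Apply (pos, ref, alt) variants (1-based pos) to a reference string.
    Raises ValueError on overlapping variants or a REF allele mismatch."""
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(reference[cursor:start])  # unchanged sequence
        pieces.append(alt)                      # substituted allele
        cursor = start + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    # One SNP, one 1bp deletion and one 2bp insertion.
    print(apply_variants("ACGTACGT", [(2, "C", "T"), (4, "TA", "T"), (7, "G", "GAA")]))
```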

Alignment and variant calling
ONT reads were aligned to the mutated reference with minimap2 using options --cs, --MD, and -aLx map-ont and output to a BAM alignment file.
Where a variant caller provided an option to set the expected ploidy, haploid was given. In addition, where a minimum read depth or base quality option was available, values of 2 and 10, respectively, were used to make downstream assessment and filtering consistent across callers.
However, as no fast model is available, we used the hac model with the fast-basecalled reads.
The pretrained model option --model_type ONT_R104 was used with DeepVariant, and the default model was used for NanoCaller. For Medaka, the provided v4.3.0 sup and hac models were used, with the hac model being used for fast data as no fast model is available.
For the Illumina variant calls that act as a benchmark to compare ONT against, we chose Snippy [32] because it is tailored for haploid genomes and is one of the best performing variant callers on Illumina data [23]. Snippy performs alignment of reads with BWA-MEM [58] and calls variants with FreeBayes.
Variant call files (VCFs) are then filtered to remove overlapping variants, make heterozygous calls homozygous for the allele with the most depth, normalise and left-align indels, break MNPs into SNPs and remove indels longer than 50bp, all with BCFtools.

Variant call assessment
Filtered VCFs were assessed with vcfdist (v2.3.3 [33]) using the truth VCFs and mutated references from Truthset and reference generation. We disabled partial credit with --credit-threshold 1.0 and set the maximum variant quality threshold (-mx) to the maximum in the VCF being assessed.
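The way precision, recall and F1 vary with a variant quality threshold can be sketched as follows. This is a simplified illustration and not vcfdist's algorithm: it assumes each call has already been labelled as a true or false positive, and that any truth variant not recovered by a retained call counts as a false negative. The function names are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions; each metric is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def sweep_quality_thresholds(calls, n_truth):
    """calls: list of (quality, is_true_positive) pairs.
    Returns (threshold, precision, recall, f1) for each distinct quality,
    keeping only calls with quality >= threshold."""
    results = []
    for threshold in sorted({qual for qual, _ in calls}):
        kept = [is_tp for qual, is_tp in calls if qual >= threshold]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = n_truth - tp  # truth variants not recovered at this threshold
        results.append((threshold, *precision_recall_f1(tp, fp, fn)))
    return results


if __name__ == "__main__":
    calls = [(30, True), (20, True), (10, True), (5, False)]
    for row in sweep_quality_thresholds(calls, n_truth=4):
        print(row)
    best = max(sweep_quality_thresholds(calls, n_truth=4), key=lambda r: r[3])
    print("best F1 at quality threshold", best[0])
```

Taking the maximum F1 over thresholds corresponds to the 'best' F1 scores reported per caller, and plotting precision against recall across thresholds gives a precision-recall curve.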

Identifying repetitive regions
To identify repetitive regions in the mutated reference, we used the following mummer utilities. We ran nucmer --maxmatch --nosimplify to align the reference against itself and retain non-unique alignments. We then passed the output into show-coords -rTH -I 60 to obtain the coordinates for all alignments with an identity of 60% or greater. Alignments where the start and end coordinates of the alignment do not match are considered as repeats and these are output in the BED format, with intervals being merged with BEDtools [59].
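The post-processing of the self-alignment coordinates can be sketched as below. This reflects one reading of the description and is an assumption throughout: an alignment is treated as a repeat when its reference coordinates differ from its query coordinates, coordinates are taken to be 1-based and inclusive, only the reference-side interval is reported, and overlapping or directly adjacent intervals are merged. The function names are hypothetical, and the merging is a stand-in for the BEDtools step.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_intervals(alignments):
    """alignments: (ref_start, ref_end, qry_start, qry_end) tuples with
    1-based inclusive coordinates. Returns merged 0-based half-open
    intervals (BED convention) for alignments flagged as repeats."""
    intervals = []
    for ref_start, ref_end, qry_start, qry_end in alignments:
        if (ref_start, ref_end) == (qry_start, qry_end):
            continue  # a region aligned to itself, not a repeat
        lo, hi = sorted((ref_start, ref_end))
        intervals.append((lo - 1, hi))  # convert to 0-based half-open
    return merge_intervals(intervals)


if __name__ == "__main__":
    coords = [
        (1, 1000, 1, 1000),    # trivial self-alignment
        (100, 200, 500, 600),  # repeat copy elsewhere
        (150, 250, 800, 700),  # reverse-strand repeat copy
        (400, 450, 900, 950),
    ]
    for start, end in repeat_intervals(coords):
        print(f"contig\t{start}\t{end}")
```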

and super-accuracy (sup), along with different read types, simplex and duplex (see Basecalling and quality control). Duplex reads are those in which both DNA strands from a single molecule are sequenced back-to-back and basecalled together, whereas simplex reads are basecalled only using a single DNA strand. The median, unfiltered read identities, calculated by aligning reads to their respective assembly, are shown in Figure 1. Duplex reads basecalled with the sup model had

Figure 3 shows the precision-recall curves for the sup basecalling model (see Suppl. Figures S8 and S9 for the hac and fast model curves, respectively) for each variant and read type, aggregated across samples to produce a single curve for each variant caller. Due to the right-angle-like shape of the Clair3 and DeepVariant curves, filtering out variants with low quality scores improves precision considerably, without losing much recall. A similar pattern holds true for BCFtools SNP calls. The best Clair3 and DeepVariant F1 scores are obtained with no quality filtering on sup data, except for indels from duplex data where a quality filter of 4 provides the best F1. See Suppl. Table S5 for the full details.

Figure 3. Precision and recall curves for each variant caller (colours and line styles) on sequencing data basecalled with the sup model, stratified by variant type (rows) and read type (columns) and aggregated across samples. The curves are generated by using increasing variant quality score thresholds to filter variants and calculating precision and recall at each threshold. The lowest threshold is the lower right part of the curve, moving to the highest at the top left. Note, Longshot does not provide indel calls.

Figure 4. Impact of variant density and repetitive regions on Illumina variant calling. Variant density is the number of (true or false) variants in a 100bp window centred on a call. a and b) the distribution of variant densities for true positive (TP), false positive (FP) and false negative (FN) calls. The y-axis, percent, indicates the percent of all calls of that decision that fall within the density bin on the x-axis. Illumina calls, aggregated across all samples, are shown in a, while b shows Clair3 calls from simplex sup-basecalled reads at 100x depth. c) impact of repetitive regions on the F1 score (y-axis) for Clair3 (100x simplex sup) and Illumina. The x-axis indicates whether variants that fall within repetitive regions are excluded from the calculation of the F1 score. Points indicate the F1 score for a single sample.

Figure 5. Relationship between indel length (y-axis) and homopolymer length (x-axis) for false positive (FP) indel calls for Clair3 100x simplex fast (top left), hac (top right), and sup (lower left) calls. Illumina is shown in the lower right for reference. The vertical red line indicates the threshold above which we deem a run of the same nucleotide to be a 'true' homopolymer. Indel length is the number of bases inserted/deleted for an indel, whereas the homopolymer length indicates how long the tract of the same nucleotide is after the indel. The colour of a cell indicates how many FP indels have that combination of indel and homopolymer length.

Figure 8. Computational resource usage of alignment and each variant caller (y-axis and colours). The top panel shows the maximum memory usage (x-axis) and the lower panel shows the runtime as a function of the CPU time (seconds) divided by the number of basepairs in the readset (seconds per megabasepairs; x-axis). Each point represents a single run across read depths, basecalling models, read types, and samples for that variant caller (or alignment). s=seconds; m=minutes; MB=megabytes; GB=gigabytes; Mbp=megabasepairs.

Table 1. Summary of the ANI and number of variants found between each sample and its donor genome.

A striking feature of Figure 2 and Figure 3 is the comparison of deep learning-based variant callers (Clair3, DeepVariant, Medaka, and NanoCaller) to Illumina. For all variant and read types with hac or sup data, these deep learning methods match or surpass Illumina, with median best SNP and indel F1 scores of 99.45% and 95.76% for Illumina. Clair3 and DeepVariant, in particular, perform an order of magnitude better. Traditional variant callers (Longshot, BCFtools, and FreeBayes) match or slightly exceed Illumina for SNP calls with hac and sup data. FreeBayes matches Illumina for indel calls, but BCFtools shows reduced indel accuracy across all models and read types. Fast model ONT data has a lower F1 score than Illumina, only achieving parity in the best case for SNPs.
Figure 6. Effect of read depth (x-axis) on the highest SNP F1 score, and precision and recall at that F1 score (y-axis), for each variant caller (colours). Each column is a basecall model and read type combination. The grey bars indicate the number of samples with at least that much read depth in the full read set. Samples with less than that depth were not used to calculate that depth's metrics. Bars on each point at each depth depict the 95% confidence interval. The horizontal red dashed line is the full-depth Illumina value for that metric, with the red bands indicating the 95% confidence interval.

Figure 7. Effect of read depth (x-axis) on the highest indel F1 score, and precision and recall at that F1 score (y-axis), for each variant caller (colours). Each column is a basecall model and read type combination. The grey bars indicate the number of samples with at least that much read depth in the full read set. Samples with less than that depth were not used to calculate that depth's metrics. Bars on each point at each depth depict the 95% confidence interval. The horizontal red dashed line is the full-depth Illumina value for that metric, with the red bands indicating the 95% confidence interval.
Bacterial isolates were streaked onto agar plates and grown overnight at 37°C. Mycobacterium tuberculosis, Streptococcus pyogenes, and Streptococcus dysgalactiae subsp. equisimilis were grown in liquid media of 7H9 or TSB with shaking until reaching high cell density (OD ∼ 1; see Suppl. Section S1 for Streptococcus sample selection). The cultures were centrifuged at 13000rpm for 10 minutes and cell pellets were collected. Bacteria were lysed with appropriate enzymatic treatment except for Mycobacterium and Streptococcus, which were lysed by bead beating (PowerBead, 0.5mm glass beads [13116-50] or Lysing Matrix Y [116960050-CF] and Precellys or Tissue lyser [Qiagen]).