A comparison of bioinformatics pipelines for compositional analysis of the human gut microbiome

Investigating the impact of gut microbiome on human health is a rapidly growing area of research. A significant limiting factor in the progress in this field is the lack of consistency between study results, which hampers the correct biological interpretation of findings. One of the reasons is variation of the applied bioinformatics analysis pipelines. This study aimed to compare five frequently used bioinformatics pipelines (NG-Tax 1.0, NG-Tax 2.0, QIIME, QIIME2 and mothur) for the analysis of 16S rRNA marker gene sequencing data and determine whether and how the analytical methods affect the downstream statistical analysis results. Based on publicly available case-control analysis of ADHD and two mock communities, we show that the choice of bioinformatic pipeline does not only impact the analysis of 16S rRNA gene sequencing data but consequently also the downstream association results. The differences were observed in obtained number of ASVs/OTUs (range: 1,958 - 20,140), number of unclassified ASVs/OTUs (range: 210 - 8,092) or number of genera (range: 176 - 343). Also, the case versus control comparison resulted in different results across the pipelines. Based on our results we recommend: i) QIIME1 and mothur when interested in rare and/or low-abundant taxa, ii) NG-Tax1 or NG-Tax2 when favouring stringent artefact filtering, iii) QIIME2 for a balance between two abovementioned points, and iv) to use at least two pipelines to assess robustness of the results. This work illustrates the strengths and limitations of frequently used microbial bioinformatics pipelines in the context of biological conclusions of case-control comparisons. With this, we hope to contribute to “best practice” approaches for microbiome analysis, promoting methodological consistency and replication of microbial findings. Author Summary Studies increasingly demonstrate the relevance of gut microbiota in understanding human health and disease. However, the lack of consistency between study results is a significant limiting factor of progress in this field. The reasons for this include variation in study design, sample size, bacterial DNA extraction and sequencing method, bioinformatics analysis pipeline and statistical analysis methodology. This paper focuses on the variation generated by bioinformatics pipelines. A choice of a bioinformatic pipeline can influence the assessment of microbial diversity. However, it is unclear to what extent and how the results and conclusion of a case-control study can be influenced. Therefore, we compared the results of a case-control study across different pipelines (applying default settings) while using the same dataset. Our results indicate a lack of consistency across the pipelines. We show that the choice of bioinformatic pipeline not only affects the analysis results of 16S rRNA gene sequencing data from the gut microbiome, but also the associated conclusions for the case-control study. This means different conclusions would be drawn from the same data analysed with different bioinformatic pipeline.


23
Investigating the impact of gut microbiome on human health is a rapidly growing area of research. A 24 significant limiting factor in the progress in this field is the lack of consistency between study results, 25 which hampers the correct biological interpretation of findings. One of the reasons is variation of the 26 applied bioinformatics analysis pipelines. This study aimed to compare five frequently used  Before applying the pipelines, we applied an in-house script to make sure that the input was the same 156 for all the pipelines. First, we had to deal with the mixed orientation of the sequences. This means that 157 forward and reverse files contained both forward and reverse sequences.

261
Of the genera detected by NG-Tax1, NG-Tax2, QIIME1, QIIME2 and mothur, only 40% overlapped 262 between all pipelines ( Figure 2A). After applying the 10% prevalence cut-off to preserve the most 263 informative data for the downstream statistical analysis, 41.4% to 46.5% of the genera remained (Table   264 1). The prevalence cut-off did not improve the percentage of overlapping genera ( Figure 2B), indicating 265 that more prevalent genera are not necessarily shared across the results from the different pipelines.

275
Unconstrained PCoA plots based on the Bray-Curtis measure revealed that samples clustered based 276 on the sample ID rather than the bioinformatics pipelines ( Figure 3A). However, the constrained 277 ordination method, CAP analysis, indicated relevant differences between the pipelines in terms of 278 microbial composition (Bray-Curtis index) at the genus level ( Figure 3B). The CAP analyses captured 279 the variation in community structure in the first two components (CAP 1 and CAP 2) accounting for 11.1% of the total variance ( Figure 3B). The same results were observed in terms of microbiome 281 structure using Jaccard's similarity index ( Figure S1). PERMANOVA analysis supported the results by 282 revealing that microbial composition (Bray-Curtis: R 2 =13.9%, p<0.001) and structure (Jaccard: R 2 =9.5%, 283 p<0.001) differed significantly between the pipelines and, as expected, more variability was explained 284 by the same sample ID (Bray-Curtis: R 2 =89.5%; p<0.001 and Jaccard: R 2 =82.8%; p<0.001). Additionally, 285 we performed a pairwise comparison of group means dispersions (TukeyHSD). The analysis confirmed 286 that the intra-sample variation is quite similar across the pipelines, except for QIIME1 ( Figure 3C).

287
The CAP analysis also showed that NG-Tax1 and NG-Tax2 clustered together, and QIIME2 clustered 288 with mothur ( Figure 3C,D  We also compared the distribution of the ten most abundant genera found by each pipeline (Figure 4).

303
We excluded unclassified genera, since they represent a group of genera rather a single genus. 304

305
We carried out univariate testing of the relative abundance of individual genera between ADHD cases 306 (N=40) and controls (N=50) in order to investigate if the downstream statistical conclusions were 307 consistent across the pipelines. In total, 10 genera showed nominally significant differences (p< 0.05) 308 between cases and controls in at least one pipeline (Table 2), but these differences were not consistent 309 across all pipelines. Based on the P-value consistency score (PCS), only one of the 10 genera showed 310 total agreement in terms of PCS (PCS=5), none showed high agreement (PCS=4), three genera showed 311 moderate agreement (PCS=3), and two genera showed partial agreement (PCS=2). The rest of the 312 genera (N=4) showed no agreement (PCS=1) ( Table 2). The descriptive statistics of the 10 genera can 313 be found in the Supplementary Table S3.

314
In order to determine the effect of the differences in genus abundance on the case-control comparison 315 between the pipelines, we compared Fold Change (FC) based on genera relative abundance (

327
Testing the correlation between PCS and two measures of frequency, relative abundance and the 328 percentage of zeros, we found the correlation coefficient between PCS and relative abundance to be 329 r PCS-RA =0.58 and the one between PCS and percentage of zeros to be r PCS-%0 =-0.24 ( Figure S2A,B). Both 330 correlations were non-significant (p>0.05), however, suggesting that the consistency across the 331 pipelines was independent of bacterial relative abundance and the observed percentage of zeros. The 332 lack of significance should be treated with caution, as it could be a result of the low number of features 333 included in the analysis (n=10 genera).     (Figure S3 A,B).

362
Comparison of individual genera showed inconsistencies across pipelines for both MCs ( Figure S4, S5).

517
The data underlying this article will be shared on reasonable request to the corresponding author. 518

519
The authors declare that they have no competing interests. that the intra-sample variation is quite similar across pipelines, except for QIIME1. Figure S2. Scatter plot representing a correlation of the PCS with the relative abundance (A) and 537 prevalence (B) based on the 10 genera showing nominally significant differences (p< 0.05) between 538 cases and controls in at least one pipeline (Table 2).
554 Table S3. Descriptive statistics of 10 genera shown in Table 2. 555 Table S4. Number of genera based on observed and expected (EXP) MC.