Evaluation of the effects of library preparation procedure and sample characteristics on the accuracy of metagenomic profiles

Christopher A Gaulke; Emily R Schmeltzer; Mark Dasenko; Brett M. Tyler; Rebecca Vega Thurber; Thomas J Sharpton

doi:10.1101/2021.04.12.439578

ABSTRACT

Shotgun metagenomic sequencing has transformed our understanding of microbial community ecology. However, preparing metagenomic libraries for high-throughput DNA sequencing remains a costly, labor-intensive, and time-consuming procedure, which in turn limits the utility of metagenomes. Several library preparation procedures have recently been developed to offset these costs, but it is unclear how these newer procedures compare to current standards in the field. In particular, it is not clear if all such procedures perform equally well across different types of microbial communities, or if features of the biological samples being processed (e.g., DNA amount) impact the accuracy of the approach. To address these questions, we assessed how five different shotgun DNA sequence library preparation methods, including the commonly used Nextera^® Flex kit, perform when applied to metagenomic DNA. We measured each method’s ability to produce metagenomic data that accurately represents the underlying taxonomic and genetic diversity of the community. We performed these analyses across a range of microbial community types (e.g., soil, coral-associated, mouse-gut-associated) and input DNA amounts. We find that the type of community and amount of input DNA influence each method’s performance, indicating that careful consideration may be needed when selecting between methods, especially for low complexity communities. However, cost-effective preparation methods we assessed are generally comparable to the current gold standard Nextera^® DNA Flex kit for high-complexity communities. Overall, the results from this analysis will help expand and even facilitate access to metagenomic approaches in future studies.

IMPORTANCE Metagenomic library preparation methods and sequencing technologies continue to advance rapidly, allowing researchers to characterize microbial communities in previously underexplored environmental samples and systems. However, widely-accepted standardized library preparation methods can be cost-prohibitive. Newly available approaches may be less expensive, but their efficacy in comparison to standardized methods remains unknown. In this study, we compared five different metagenomic library preparation methods. We evaluated each method across a range of microbial communities varying in complexity and quantity of input DNA. Our findings demonstrate the importance of considering sample properties, including community type, composition, and DNA amount, when choosing the most appropriate metagenomic library preparation method.

INTRODUCTION

Recent advancements in high-throughput sequencing have revolutionized genomic discovery and unlocked new insights regarding the diversity and function of microbial communities (1–4). For example, shotgun metagenomic sequencing has clarified how the functional capacity of the gut microbiome links to human health (5–8), improved the efficacy of antibiotic resistance gene discovery (9–12), identified beneficial soil microbes for agricultural use (13–15), and uncovered novel, medically relevant biosynthetic gene clusters in marine microbes (16–18). However, while metagenomes offer rich opportunity to transform discovery, the financial cost of producing metagenomic data limits their application. Because much of this cost is associated with the preparation of metagenomic DNA for high-throughput sequencing, there is hope that emergent economical products and procedures can expand the scope of metagenomic investigations.

Illumina’s Nextera^® XT and DNA Flex kits (the latter now known as the “Illumina^® DNA Prep”) have been the most widely used methods for preparing metagenomic libraries and have effectively served as industry standard approaches. Indeed, Illumina DNA sequencing platforms remain the most widely utilized for generating genomic and metagenomic data, and their library preparation kits are accordingly used to prepare samples for sequencing. Due to their frequent use, these kits are subject to extensive evaluation and refinement. For example, Illumina recently released an updated version of their “gold standard” Nextera^® XT kit, which was rebranded as the Nextera^® DNA Flex (and now Illumina^® DNA Prep). This new kit allows greater flexibility across a wider range of genomes, from small genomes (microbial and amplicons) to more complex genomes found in eukaryotic and human systems. The Flex kit also resolved sequencing biases identified in the Nextera^® XT kit that occur in genomic regions with extreme GC-content (19). These features of the Nextera^® DNA Flex kit have contributed to its broad adoption in metagenomic investigations.

One downside to the Nextera^® DNA Flex kit is its relatively high price, presently roughly $46 per sample. While this cost may be reasonable considering the demand for the product and its observed efficacy, it is high enough that it limits the scale of many metagenomic investigations. For example, studies performing high-throughput analyses on hundreds or thousands of samples may be forced to utilize non-metagenomic approaches (e.g., 16S rRNA gene sequencing) due to this library preparation expense. In an effort to circumvent this challenge, several alternative and competitive genomic library preparation methods have recently been developed and applied to metagenomic investigations. These approaches fall into two categories: methods that increase the economy of the Illumina Nextera^® kits by modifying various aspects of the manufacturer’s protocols (e.g., Baym et al. 2015 (20)), and those that use entirely different technologies (e.g., seqWell plexWell™ 96). These approaches hold great promise to improve the throughput of metagenomic investigations by reducing library preparation costs. For example, the recent method known as ‘Hackflex’ achieves an eleven-fold decrease in per sample reagent costs compared to the Illumina kit protocols (21).

Although several alternative library preparation approaches have been assessed from the perspective of whole genome sequencing, very little is known about their accuracy and precision when applied to metagenomic investigations. It is crucial that the performance of novel library preparation procedures be specifically assessed in diverse metagenomic communities, as different community types present unique sequencing challenges not common to traditional whole genome sequencing. For example, metagenomic communities vary in complexity, with some communities having few distinct taxa (e.g., insect gut) while others are highly diverse (e.g., soil). Library preparation procedures may vary in their ability to sample DNA without bias across the different genomes present in the community. Biological samples also vary in their biomass, which affects the amount of whole community DNA that is subject to the library preparation approach. The sensitivity of these approaches to the amount of input DNA may hence impact study outcomes (25–28).

To advance the utility of low cost metagenomic library preparation methods, we quantified the performance of five recently developed approaches. Our investigation assessed how different features of metagenomic samples, including community complexity and biomass, impact the performance of these procedures. In particular, we compared the Illumina Nextera^® DNA Flex, a modified DNA Flex protocol (20), QIAGEN^® QIAseq FX DNA, Perkin Elmer NEXTFLEX^® Rapid DNA-Seq 2.0, and seqWell plexWell™ 96 library preparation methods using community acquired DNA obtained from three different types of microbial communities: low complexity communities (as represented by Acropora hyacinthus microbiomes), moderately complex communities (as represented by Mus musculus fecal microbiomes), and a highly complex community (as represented by a soil microbiome). We also evaluated how each approach performs on a commercially available mock community comprised of ten microbial species. Our analysis clarifies the performance of these approaches across these different sample conditions, and the results will assist investigators in identifying appropriate approaches for their metagenomic investigations.

RESULTS

Library preparation procedure, community type, and input concentration influence metagenomic library characteristics

To determine how metagenomic library characteristics (e.g., insert size, millions of sequences generated) varied across different metagenomic library preparations, we regressed each library characteristic on community type, library preparation, and input DNA concentration (Table 1). We found that these predictor variables significantly affected the following characteristics: median fragment size (F_(31,48) = 29.94, R² = 0.90, P < 2.2×10⁻¹⁶), library concentration (F_(31,48) = 12.44, R² = 0.82, P = 3.51×10⁻¹⁴), library molarity (F_(31,48) = 11.75, R² = 0.81, P = 1.09×10⁻¹³), sequence read length (F_(39,60) = 56.81, R² = 0.96, P < 2.2×10⁻¹⁶), number of reads generated (F_(39,60) = 11.69, R² = 0.81, P < 2.2×10⁻¹⁶), read GC content (F_(39,60) = 2285, R² = 0.99, P < 2.2×10⁻¹⁶), duplication rate (F_(8,91) = 2.494, R² = 0.11, P = 0.02), and percentage of reads filtered (F_(39,60) = 1716, R² = 0.99, P < 2.2×10⁻¹⁶). Sequence duplication rate was only sensitive to community type (F_(3,91) = 5.93, P = 9.0×10⁻⁰⁴). Specifically, communities with low microbial diversity such as the coral (t = 2.885, P = 4.88×10⁻⁰³) and mock communities (t = 2.16, P = 0.03) had elevated duplication rates. All other library characteristics were sensitive to interactions between community type, library preparation, and input DNA concentration, and many characteristics were also impacted by the independent effects of these variables. For example, library preparation method (F_(3,48) = 148.85, P = 2.61×10⁻²⁴) and community type (F_(3,48) = 11.83, P = 6.39×10⁻⁰⁶) affected median fragment size independent of the interaction between these parameters and DNA input (F_(9,48) = 5.93, P = 1.56×10⁻⁰⁵). Library concentration was impacted by community type (F_(3,48) = 5.72, P = 1.98×10⁻⁰³), library preparation (F_(3,48) = 20.75, P = 9.22×10⁻⁰⁹), input DNA concentration (F_(1,48) = 186.86, P < 2.2×10⁻¹⁶) and their interaction (F_(9,48) = 6.09, P = 1.16×10⁻⁰⁵). Library molarity was similarly impacted by community type (F_(3,48) = 5.63, P = 2.18×10⁻⁰³), library preparation (F_(3,48) = 35.69, P = 2.82×10⁻¹²), input DNA concentration (F_(1,48) = 126.21, P = 4.89×10⁻¹⁵) and their interaction (F_(9,48) = 4.40, P = 3.16×10⁻⁰⁴).

Table 1. Library preparation summary statistics.

Each column shows means and ranges of NGS library parameters in coral, feces, mock, and soil sample types. Symbols mark parameters that varied significantly (P < 0.05) by community type (*), input concentration (†), and library preparation (‡).

The number of sequences generated was sensitive to preparation procedure (F_(4,60) = 41.68, P = 1.10×10⁻¹⁶), community type (F_(3,60) = 67.67, P = 3.09×10⁻¹⁹), and DNA input concentration (F_(1,60) = 5.62, P = 2.09×10⁻⁰²) as well as the interaction between these variables (F_(12,60) = 2.25, P = 2.00×10⁻⁰²). The number of sequence reads that were quality filtered and derived from the host genome was also significantly affected by this interaction (F_(12,60) = 5.16, P = 7.44×10⁻⁰⁶). Metagenome libraries constructed for coral communities had increased levels of quality filtering (t = 62.11, P = 3.66×10⁻⁵⁶) while filtering in soil (t = −6.65, P = 9.84×10⁻⁰⁹) and mock (t = −6.56, P = 1.39×10⁻⁰⁸) communities was decreased when compared to fecal samples. Read filtering was also consistently increased in samples prepared with the Nextera^® Flex Reduced method (t = −12.38, P = 3.59×10⁻¹⁸) and reduced in the samples prepared using the QIASeqFX procedure (t = −4.89, P = 7.84×10⁻⁰⁶).

Metagenomic read characteristics were also significantly impacted by the examined variables. For example, the GC content of reads was significantly dependent on community type (F_(3,60) = 28,941.39, P = 9.38×10⁻⁹⁵). GC content was also affected by library preparation method (F_(4,60) = 442.80, P = 8.71×10⁻⁴⁴); a significant effect of the interaction between library preparation, community type, and input DNA concentration (F_(12,60) = 2.68, P = 5.98×10⁻⁰³) was also observed for GC content. Finally, average read length after quality filtering varied by both community type (F_(3,60) = 60.78, P = 3.54×10⁻¹⁸) and preparation procedure (F_(4,60) = 404.25, P = 1.21×10⁻⁴²). Collectively, these findings indicate that metagenomic library preparation procedures yield distinct library characteristics for different community types.
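For readers less familiar with the read-level characteristics compared above, the following minimal sketch shows one common way to define GC content, duplication rate, and mean read length for a set of reads. It is an illustration only, not the pipeline used in this study: the function names and the toy reads are hypothetical, and duplication is counted here as exact sequence identity, which is an assumption.

```python
from collections import Counter


def gc_content(reads):
    """Percent of called bases (A, C, G, T) that are G or C; N is ignored."""
    gc = at = 0
    for read in reads:
        seq = read.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def duplication_rate(reads):
    """Percent of reads that are exact copies of another read already seen."""
    if not reads:
        return 0.0
    unique = len(Counter(reads))
    return 100.0 * (len(reads) - unique) / len(reads)


def mean_read_length(reads):
    """Average read length in bases."""
    return sum(len(r) for r in reads) / len(reads) if reads else 0.0


# Hypothetical toy reads, for illustration only.
reads = ["ACGTACGT", "ACGTACGT", "GGGCCC", "ATATAT", "GCGCNN"]
print(gc_content(reads), duplication_rate(reads), mean_read_length(reads))
```

Quality-control tools may define duplication differently (for example, by alignment position or on a subsample of reads), so values from this sketch are not directly comparable to those reported in Table 1.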

Different library preparation methods result in similar taxonomic profiles of a standardized mock community

While library preparation procedures varied in the resulting metagenomic library and sequence characteristics, it is unclear if this variation results in different downstream assessment of community composition. To address this question, we quantified how each library preparation method predicted the taxonomic composition of a defined mock community. We compared the taxonomic composition generated by each library preparation to the ZymoBIOMICS™ Microbial Community standard’s defined taxonomic composition of the mock community. Strong correlations (ρ = 0.93 - 0.97, P = 1.29×10⁻⁶ - P < 2.2×10⁻¹⁶, fdr < 1.0×10⁻⁵) were observed between the MetaPhlAn2 inferred taxonomic abundances and theoretical taxonomic abundances of taxa present in the mock community (Figure 1). To confirm that this was not due to bias in the MetaPhlAn2 database, we also compared the inferred taxonomic abundances using Kraken2 and observed similar taxonomic associations (ρ = 0.93 - 0.96, all P < 2.2×10⁻¹⁶, fdr < 2.2×10⁻¹⁶) and abundance profiles (Supp. Fig. 1).

Figure S1. Kraken2 imputed abundance profiles strongly correlate with theoretical mock community compositions across all metagenomic library preparation methods.

Scatter plots of the inferred Kraken2 taxonomic profiles and theoretical taxonomic compositions of a mock community. Library preparation methodology is indicated by point color and input concentration is denoted with point shape. Values in each panel represent the Pearson correlation coefficient.

Figure 1. Metagenomic library preparation methods accurately predict taxonomic composition of a simplified mock community.

Scatter plots of the observed and theoretical taxonomic compositions of a mock community. Library preparation methodology is indicated by point color and input concentration is denoted with point shape. Values in each panel represent the Pearson correlation coefficient.

MetaPhlAn2 results showed that the Nextera^® Flex Full and Nextera^® Flex Reduced methods, which are widely used in metagenomic studies, had the lowest correlations (ρ = 0.93 – 0.96, P = 1.29×10⁻⁶ - 2.66×10⁻⁸) with the theoretical composition of the mock community. The lower correlations produced by these two library preparation procedures are driven by an underestimation of Lactobacillus fermentum and an overestimation of the abundance of Staphylococcus aureus and Enterococcus faecalis. The strongest correlations were observed in the QIASeqFX (ρ = 0.97, P = 1.21×10⁻⁸ - 5.79×10⁻⁹), plexWell96 (ρ = 0.96 - 0.97, P = 2.58×10⁻⁸ - 1.24×10⁻⁸), and NEXTFLEX Rapid (ρ = 0.96 - 0.97, P = 1.02×10⁻⁷ - 2.38×10⁻⁸) methods (Figure 1). Together these data indicate that library preparation methods subtly influence some taxonomic estimates but that all methods examined overall performed well at recapitulating simple, defined microbial communities.
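The comparison of inferred and theoretical abundances can be sketched as a correlation between two vectors. The example below assumes that ρ denotes Spearman's rank correlation, computed as Pearson's coefficient on average ranks; the figure captions refer to Pearson's coefficient, which corresponds to calling the `pearson` helper on the raw values. The abundance values are hypothetical placeholders and are not the composition of the ZymoBIOMICS standard.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))


# Hypothetical relative abundances (%) for five taxa, for illustration only.
theoretical = [30.0, 25.0, 20.0, 15.0, 10.0]
observed = [27.0, 28.0, 18.0, 17.0, 10.0]
rho = spearman(theoretical, observed)
print(rho)
```

Because the rank transform discards magnitudes, a method can over- or underestimate individual taxa and still produce a high ρ, which is one reason to inspect per-taxon deviations alongside the coefficient.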

Community taxonomic profiles are significantly impacted by library preparation procedure and input concentration

Next, we quantified the impact of library preparation methods and input concentrations on the taxonomic profiles of soil, coral, mock, and fecal metagenomes. Library preparation procedure was significantly associated with the resulting taxonomic microbiome profiles as measured by PERMANOVA in coral (R² = 0.70, P = 2.00×10⁻⁴), feces (R² = 0.85, P = 2.00×10⁻⁴), mock (R² = 0.72, P = 4.00×10⁻⁴), and soil (R² = 0.72, P = 4.00×10⁻⁴) communities (Figure 2A). In the fecal, soil, and mock communities, no association was found between input concentration and microbiome beta-diversity. However, in coral communities we identified a significant interaction effect between input concentration and metagenomic preparation method (R² = 0.13, P = 3.0×10⁻²). The taxonomic abundance profiles of each library were highly correlated (ρ = 0.63 – 1, fdr < 2.2×10⁻¹⁶) across library preparation methods (Figure 2B), suggesting that preparation methodology associates with distinct community profiles.
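To make the PERMANOVA statistics above more concrete, the sketch below partitions squared Bray-Curtis distances for a single grouping factor and estimates a P value by permuting group labels. It is a simplified, one-factor illustration and not the model fitted in this study, which also includes input concentration and an interaction term; the function names, profiles, and labels are hypothetical.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def partition(dist, labels):
    """Return (pseudo-F, R2) for one factor with at least two groups."""
    n = len(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    # Total and within-group sums of squares from pairwise distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for members in groups.values():
        pairs = sum(dist[i][j] ** 2
                    for k, i in enumerate(members) for j in members[k + 1:])
        ss_within += pairs / len(members)
    ss_among = ss_total - ss_within
    a = len(groups)
    r2 = ss_among / ss_total if ss_total else 0.0
    f = float("inf") if ss_within == 0 else (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, r2


def permanova(profiles, labels, permutations=999, seed=0):
    """One-factor PERMANOVA with a label-permutation P value."""
    n = len(profiles)
    dist = [[bray_curtis(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    f_obs, r2 = partition(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if partition(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Hypothetical profiles: three preparation methods, three replicates each.
profiles = [[50, 30, 20], [52, 29, 19], [49, 31, 20],
            [30, 50, 20], [29, 52, 19], [31, 49, 20],
            [20, 30, 50], [19, 29, 52], [20, 31, 49]]
labels = ["prep_a"] * 3 + ["prep_b"] * 3 + ["prep_c"] * 3
f_obs, r2, p = permanova(profiles, labels)
print(f_obs, r2, p)
```

Here R² is the share of total squared distance attributable to the grouping factor, which is how the R² values reported above can be read. The smallest attainable P value is bounded by the number of permutations, so 999 permutations cannot yield a value below 0.001.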

Figure 2. Microbial taxonomic diversity varies by library preparation method and input concentration.

A) PCA ordinations of microbial taxonomic diversity for each sample type. B) Correlation heat maps of taxonomic abundances generated from each library type. Row and column side plots indicate the library preparation methodology and the input concentration. Boxplots of dissimilarities (Bray Curtis) between C) technical replicates (1ng inputs), and D) different input concentrations. Letters indicate significant differences, P < 0.05.

To quantify how variable individual replicates were across different library preparation methods, we examined the dissimilarity in taxonomic beta-diversity within each library preparation method. Low intra-concentration (i.e., 1 ng/μL input) dissimilarities indicate high reproducibility, while low inter-concentration dissimilarities (i.e., across all concentrations) indicate that the taxonomic profiles generated using a method are robust to variation in library input concentration. We found that variation in intra-concentration taxonomic dissimilarity was low and significant differences were not observed across library preparation methodologies (Figure 2C). Inter-concentration dissimilarity was also low in fecal, mock, and soil samples but varied significantly across preparation methods in coral (H = 12.28, P = 0.02), potentially due to elevated dissimilarity in the QIASeqFX libraries. Together these data suggest that the taxonomic profiles generated using the methods under investigation are similarly reproducible, but that their robustness to input concentration varies across methods.
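The two dissimilarity summaries described above can be sketched as pairwise Bray-Curtis dissimilarities computed within one preparation method, first among replicates at a single input amount and then among libraries from all input amounts. This is a minimal illustration under assumed inputs: the profiles are hypothetical and the helper names are not taken from the study's code.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def pairwise_dissimilarities(profiles):
    """Bray-Curtis dissimilarity for every unordered pair of profiles."""
    return [bray_curtis(a, b) for a, b in combinations(profiles, 2)]


# Hypothetical taxonomic profiles for a single preparation method.
replicates_1ng = [[40, 35, 25], [42, 33, 25], [39, 36, 25]]
lower_input = [[30, 40, 30]]  # hypothetical library from a different input amount

intra = pairwise_dissimilarities(replicates_1ng)                # reproducibility
inter = pairwise_dissimilarities(replicates_1ng + lower_input)  # robustness to input
print(max(intra), max(inter))
```

Collecting these values per method gives the distributions summarized in the boxplots, which can then be compared across methods with a rank-based test.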

Community gene abundance profiles are sensitive to library preparation procedures and input DNA concentration

Metagenomic investigations frequently seek to define the genetic diversity of microbial communities. Using the number of distinct gene families observed in the data (i.e., gene family richness) as well as the functional composition of the community (i.e., gene family beta-diversity), we measured how different library preparation procedures affected determination of a community’s functional capacity. Gene family richness varied by library preparation method and input concentration in coral (F_(9,15) = 31.90, R² = 0.92, P = 3.47×10⁻⁰⁸), feces (F_(9,15) = 9.13, R² = 0.75, P = 1.20×10⁻⁰⁴), and mock (F_(9,15) = 8.78, R² = 0.74, P = 1.51×10⁻⁰⁴) communities, while soil richness (F_(9,15) = 1.41, R² = 0.13, P = 0.27) was less sensitive to these effects (Figure 3A). We also observed a significant interaction between library preparation procedure and DNA input on the predicted functional profiles of coral (F_(4,15) = 8.00, P = 1.16×10⁻⁰³), feces (F_(4,15) = 5.64, P = 5.60×10⁻⁰³), and mock microbiomes (F_(4,15) = 7.78, P = 1.33×10⁻⁰³). However, these associations were not always consistent across different library preparation procedures. For example, gene richness was elevated in coral samples prepared with the plexWell™96 method (t = 5.53, P = 5.77×10⁻⁰⁵), while a contrasting pattern was observed in both fecal (t = −6.38, P = 1.23×10⁻⁰⁵) and soil (t = −1.958, P = 6.91×10⁻⁰²) samples, and no difference was identified in mock community samples (t = −0.58, P = 0.57). Similar effects of library preparation method and input concentration were observed on Shannon entropy (Figure 3B). Specifically, significant effects were observed for coral (F_(9,15) = 11.09, R² = 0.79, P = 3.75×10⁻⁰⁵), feces (F_(9,15) = 62.29, R² = 0.96, P = 1.89×10⁻¹⁰), mock (F_(9,15) = 11.04, R² = 0.79, P = 3.85×10⁻⁰⁵), and soil (F_(9,15) = 5.06, R² = 0.60, P = 2.95×10⁻⁰³).
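The two alpha-diversity measures used above can be sketched directly from a vector of gene family abundances. This is a minimal illustration: the natural logarithm is an assumption (other bases only rescale the value), and the example abundances are hypothetical.

```python
import math


def richness(abundances):
    """Number of gene families with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_entropy(abundances):
    """Shannon entropy H = -sum(p_i * ln p_i) over non-zero proportions."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


# Hypothetical gene family abundances for two libraries.
even_library = [25, 25, 25, 25]
skewed_library = [97, 1, 1, 1]
print(richness(even_library), shannon_entropy(even_library))
print(richness(skewed_library), shannon_entropy(skewed_library))
```

Both toy libraries have the same richness but different entropy, which illustrates why the two measures can respond differently to library preparation method and input concentration.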

Figure 3. Metagenomic diversity varies by library preparation method and input concentration.

A) Gene family richness and B) Shannon entropy plots for each sample type.

As measured by PERMANOVA, we found that gene family beta-diversity (Bray Curtis) was significantly associated with library preparation method for coral (F_(4,15) = 3.62, R² = 0.44, P = 2.00×10⁻⁰⁴), fecal (F_(4,15) = 7.82, R² = 0.60, P = 2.00×10⁻⁰⁴), mock (F_(4,15) = 3.36, R² = 0.41, P = 2.00×10⁻⁰⁴), and soil (F_(4,15) = 3.39, R² = 0.40, P = 2.00×10⁻⁰⁴) communities (Figure 4A). However, the association between library preparation procedure and gene family beta-diversity is muted in comparison to taxonomic beta-diversity (Figure 2A), possibly as a result of the increased overall similarity in gene family abundances across samples of the same community type (ρ = 0.99 – 1.00, P < 2.2×10⁻¹⁶; Figure 4B). Despite this increased similarity between library preparation methods for gene family abundances, the beta-diversity of technical replicates did vary across library preparation methods for coral (H = 11.43, P = 2.21×10⁻⁰²), fecal (H = 9.03, P = 6.02×10⁻⁰²), mock (H = 12.1, P = 1.66×10⁻⁰²), and soil (H = 11.5, P = 2.15×10⁻⁰²) samples.

Figure 4. Microbial functional diversity varies by library preparation method and input concentration.

A) PCA ordinations of gene family beta-diversity for each sample type. B) Correlation heat maps of gene family abundances generated from each library type. Row and column side plots indicate the library preparation methodology and the input concentration. Boxplots of dissimilarities (Bray Curtis) between C) technical replicates (1ng inputs), and D) different input concentrations. Letters indicate significant differences, P < 0.05.

However, significant differences were not detected between individual library preparation methods (Figure 4C). The robustness of metagenome beta-diversity to input concentration differed across library preparation methods for coral (H = 24.33, P = 6.86×10⁻⁰⁵), fecal (H = 24.53, P = 6.26×10⁻⁰⁵), mock (H = 25.65, P = 3.72×10⁻⁰⁵), and soil (H = 13.83, P = 7.85×10⁻⁰³) samples (Figure 4D). Notably, the plexWell™96 method had elevated variability compared to the other library prep methods in coral, fecal, and mock community samples. In soil, these patterns were mitigated (Figure 4D). Overall, these data demonstrate that, similar to the taxonomic profiles, library preparation methods affect gene diversity profile predictions.
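The H statistics reported above are consistent with a Kruskal-Wallis rank test across preparation methods. The sketch below computes H with the standard tie correction and, assuming a chi-square approximation with k − 1 degrees of freedom for k methods, converts it to a P value using a closed form that holds only for even degrees of freedom (for example, 4 when five methods are compared). The function names and example values are hypothetical.

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with tie correction for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction else 0.0


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires an even, positive df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Hypothetical replicate dissimilarities for three preparation methods.
groups = [[0.01, 0.02, 0.03], [0.04, 0.05, 0.06], [0.07, 0.08, 0.09]]
h = kruskal_wallis_h(groups)
print(h, chi2_sf_even_df(h, len(groups) - 1))
```

Under these assumptions, `chi2_sf_even_df(11.43, 4)` evaluates to roughly 0.022, in line with the P value reported for coral in the preceding paragraph. A significant H indicates only that at least one method differs, so pairwise comparisons are still needed to identify which one.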

DISCUSSION

Taxonomic and functional profile predictions are similar across methodologies

Although the Nextera^® kits are widely used and considered the ‘gold standard’ for metagenomic sample preparation, their cost can limit researchers from conducting expansive project aims. As applications for metagenomic sequencing continue to increase, researchers are left with the difficult task of balancing the need for high-quality data with the cost of its generation. The development of new protocols that modify the standard Nextera^® kit protocol as well as several new economical library preparation kits have the potential to dramatically alter the field by expanding the accessibility of shotgun metagenomics. However, the quality of libraries prepared using more economical methods varies substantially (29). While prior studies have demonstrated that different library preparation procedures can affect metagenome characteristics (30–33), these studies did not evaluate contemporary procedures nor did they consider the sensitivity of the approaches to different metagenome sample types. Here we demonstrate that library quality as well as taxonomic and functional profiles vary as a function of environmental community type and biomass. Our findings suggest that while researchers need to be aware of differences between kits, overall taxonomic and functional profiles produced by these kits are similar.

Several investigations have identified key differences in library characteristics across metagenomic library preparation procedures, often by incorporating multiple study designs. These variations can result in substantial changes in the quality of the metagenomic library and are important considerations in preparation method selection. For example, Baym et al. demonstrated that a custom Nextera^® XT protocol yielded a substantially reduced insert fragment length (20). Smaller fragments have higher proportions of adapter contamination in reads, while fragments that are too large may be preferentially lost during the Illumina cluster generation process (34). We observed significant effects of community type, library preparation procedure, and input DNA concentration on fragment size. In our hands, the Nextera^® Flex protocols generated the largest library insert sizes, while the QIAseq FX and plexWell™96 procedures consistently produced the smallest. However, the Nextera^® procedures also produced libraries with the lowest average GC content when compared to the other procedures examined. This reduced representation of GC could impact the representation of genes with high GC content and skew both taxonomic and functional profiles (29, 35). The interacting effects of library preparation procedure, community type, and DNA input on GC content further indicate that specific library preparation procedures may have distinct insertion site biases.

Comparing library characteristics across environmental sample types, samples with low relative diversity (i.e., coral) had both a higher percentage of duplicate reads and a high number of reads filtered and removed from the resulting libraries regardless of input concentration. This high level of filtering is likely due to the extreme levels of host DNA contaminants relative to the other sample types, and additionally may point to the larger issue of host sequence contamination, regardless of library preparation method, in similar research studies. However, coral samples also had similar levels of precision across kit types, with the exception of lower precision with QIASeq FX, demonstrating that different kit types may still be viable options in other low complexity study systems.

Samples with moderate (fecal) and high (soil) microbial diversity had a much lower average percentage of sequencing reads filtered than coral samples, and a higher average GC content across libraries. Fecal sample libraries had the highest variability in precision between kit types, with the exception of Nextera^® Flex Reduced, likely due to the more complex community composition. However, our study also had a relatively low average sequencing depth across all samples due to sample number and financial constraints; the high intra-community variability in precision that we observed may be resolved with higher sequencing coverage (37). It is also possible that longer insert fragment sizes introduce greater variability in the inferred community composition, regardless of sample type, due to lower base quality (38), though this was not the case for coral, soil, or mock libraries with the longest fragment sizes.

In successfully recapitulating the taxonomic profiles of a mock microbial community, all library preparation methods performed similarly overall; however, the taxonomic profiles of the environmental sample types showed subtle differences between methods. While higher levels of intra-community variation per method could again be due to low sequencing depth, our results of higher variation for the coral sample types with lower relative diversity are consistent with previous findings that library coverage is increased for highly complex microbial communities. Furthermore, while it may appear that all preparation methods perform poorly in both taxonomic and functional resolution for low (coral) and high (soil) diversity sample types, it must be noted that these profiles may only be as complete as the reference databases used for assignment, and it is well known that these databases are preferentially curated with human microbiome sequences and studies in mind (39).

Financial and opportunity costs of metagenomic preparation methods differ

Decreasing the costs of kits and reagents associated with library preparation improves access to metagenomic approaches. The Nextera^® DNA Flex Full preparation actualized cost remains the most expensive of the five methods tested, with the NEXTFLEX Rapid XP and QIASeq FX in the median relative expense range, and Nextera^® Flex Reduced prep and plexWell™96 as the most economical choices for metagenomic library generation. However, due to the noted effect of specificity of environmental sample type on performance of preparation method, neither the most economical choice nor the most expensive may necessarily suit every study or generate the highest quality libraries. Due to the effects of preparation procedure, community type, and DNA input on fragment size and both taxonomic and functional profiles of metagenomic samples, comparing communities across multiple study designs may require additional covariates in statistical design. For future studies we recommend incorporating library preparation technique as a potential covariate in statistical design to account for these known differences and potential biases.

Conclusion

Collectively, these findings demonstrate that no single metagenomic library preparation approach performed the best across all community types and conditions evaluated. Rather, the performance of approaches varied as a function of sample type and the amount of input DNA. Consequently, researchers should consider these variables when selecting library preparation approaches, especially when attempting to optimize data quality, accuracy, and precision. To aid in this effort, we provide Figure 5 as a reference guide for choosing preparation methods with cost and performance in mind. Our hope is that this information helps improve the accessibility and utility of metagenomic investigations. Further study is needed to determine what community properties (e.g., GC content, taxonomic diversity, etc.) dictate these differences in library procedure performance in order to generate more generalizable guidance for procedure selection. That said, our results show that the different approaches generally produced relatively consistent taxonomic and gene family diversity profiles, which indicates that selecting approaches based on cost and ease of implementation may be appropriate for some studies (namely those in which the loss of accuracy and precision is tolerable). However, we recommend careful consideration of community type and the amount of input DNA when selecting a metagenomic library preparation procedure to ensure optimal performance.

Figure 5. Library preparation summary and cost metrics reference guide.

Hands-on time refers to active time necessary for essential benchwork tasks. Fragment size categories are relative to other kit fragment sizes for each sample type. Recapitulation of mock community refers to correlation coefficient of given mock community and community produced by each kit. Precision refers to the level of variability in taxonomic community composition between 1.0ng DNA input technical replicates for each sample type.

MATERIALS AND METHODS

Genomic DNA extraction

Prior to metagenomic library construction, genomic DNA was extracted from environmental samples originating from: soil from the North American Project to Evaluate Soil Health Measurements (40), coral (Acropora hyacinthus), and mammalian feces (Mus musculus; C57BL/6) using methods outlined below. In addition to environmental samples, we used the ZymoBIOMICS™ Microbial Community DNA Standard to more efficiently assess bias associated with library preparation methods on a standard mock community.

For coral slurry, coral nubbins preserved in RNA/DNA Shield (ZymoBIOMICS™) were vortexed in 15ml tubes with a combination of ceramic and garnet bead lysing matrix at ~2500 RPM for 25 minutes. DNA was extracted from 300μl of the resulting coral slurry using ZymoBIOMICS™ DNA/RNA Miniprep (Zymo Research Corp., Irvine, CA, USA) following an additional 2-step enzyme incubation to increase recovery of bacterial DNA: 1) addition of 30μl chicken egg-white lysozyme (10mg/ml, Novagen^®), 1.8μL mutanolysin (50KU/ml from Streptomyces globisporus ATCC 21553, Sigma-Aldrich), 1.8μL lysostaphin (4KU/ml from Staphylococcus staphylolyticus, Sigma-Aldrich) and incubation at 37°C for 1 hr and 2) 1 hour incubation at 50°C following addition of 15μl Proteinase K (20 mg/ml, Thermo Scientific™) and 30μl Proteinase K digestion buffer (0.1M NaCl, 10mM Tris pH 9.0, 1mM EDTA, 0.5% SDS, nuclease-free water). Following digestion, one volume of kit-specific DNA/RNA Lysis Buffer was added in order to proceed with the manufacturer’s recommended extraction protocol.

For soil, the sample was taken on 2/27/2019 at the Virginia Tech Eastern Shore Agricultural Research and Extension Center. Samples were collected as 12 composite knife slices of soil to a depth of 15 cm, and each of the 12 slices was passed through an 8mm filter. Detailed sampling methods are described in Norris et al. 2020 (40). Following collection, 0.25g aliquots of soil were stored at −80°C after overnight shipment from the collection site. Soil aliquots were then extracted following the Earth Microbiome Protocol (41) using a KingFisher™ Flex (Thermo Fisher^®).

For mouse feces, DNA was isolated from a single fecal pellet using the DNeasy PowerSoil isolation kit (QIAGEN^®) following the manufacturer’s instructions. An additional 10-minute incubation at 65°C directly before bead beating was added to enhance microbial cell lysis. The samples were then homogenized using a Vortex-Genie 2 and vortex adapter (QIAGEN^®) at the highest setting for 10 minutes.

Metagenomic library preparation and sequencing

Environmental DNA samples were prepared for metagenomic sequencing following the manufacturers’ protocols using four commercially available kits: 1) Illumina Nextera^® DNA Flex Library Kit, 2) QIAGEN^® QIAseq FX DNA Library Kit, 3) Perkin Elmer NEXTFLEX^® Rapid DNA-Seq Kit 2.0, and 4) seqWell plexWell™ 96. In addition, we included a fifth preparation method using the modified “reduced” protocol established by Baym et al. to increase the number of libraries the Nextera^® DNA Flex could generate (20). Genomic DNA was quantified using a Qubit 1X dsDNA HS Assay Kit for soil, fecal, and coral communities. Mock community DNA concentration was not quantified as ZymoBIOMICS™ manufacturer information provided a known concentration of 100ng/μl. Following quantification, all samples prepared using Nextera^®Flex Full, QIASeq FX, and NEXTFLEX^® Rapid XP were diluted with water to 0.2ng/μl. To determine how DNA input affected library generation, the appropriate volume of each standardized dilution was then added to obtain the respective 0.5ng, 1.0ng, and 5.0ng inputs. Samples prepared using plexWell™96 were diluted to 0.25ng/μl with water and the appropriate volumes were added to obtain 0.5ng and 1.0ng inputs. For samples with 5.0ng inputs, samples were diluted to 1.25ng/μl and then 4μl of sample was used to obtain the 5.0ng input. For the Nextera^®Flex Reduced reaction, all samples were diluted to 5.0ng/μl with nuclease-free water. A 1μl aliquot of this dilution was used for 5.0ng input libraries. For 0.5ng and 1.0ng input libraries, sufficient water was added to the 1μl dilution to bring respective concentrations to 0.5ng/μl and 1.0ng/μl.
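The input volumes implied by the dilution scheme above follow from dividing the target DNA mass by the working concentration. The following is a minimal sketch of that arithmetic, not part of the published protocol; the function name `input_volume_ul` and the dictionary keys are hypothetical labels. Only the 4μl volume for the 5.0ng plexWell™ input is stated in the text, and the other volumes are derived here.

```python
import math

def input_volume_ul(target_ng, stock_ng_per_ul):
    """Volume (microliters) of a diluted stock that contains target_ng of DNA."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    return target_ng / stock_ng_per_ul

# Working concentrations taken from the text (ng/ul); the keys are illustrative labels.
WORKING_STOCKS = {
    "flex_full_qiaseq_nextflex": 0.2,
    "plexwell_low_input": 0.25,
    "plexwell_5ng_input": 1.25,
}

if __name__ == "__main__":
    for target_ng in (0.5, 1.0, 5.0):
        vol = input_volume_ul(target_ng, WORKING_STOCKS["flex_full_qiaseq_nextflex"])
        print(f"0.2 ng/ul stock, {target_ng} ng input: {vol:.1f} ul")
    for target_ng in (0.5, 1.0):
        vol = input_volume_ul(target_ng, WORKING_STOCKS["plexwell_low_input"])
        print(f"0.25 ng/ul stock, {target_ng} ng input: {vol:.1f} ul")
    # Consistency check against the volume stated in the text (4 ul at 1.25 ng/ul).
    assert math.isclose(input_volume_ul(5.0, WORKING_STOCKS["plexwell_5ng_input"]), 4.0)
```

Under these assumptions the 0.2ng/μl stock gives 2.5μl, 5μl, and 25μl for the three input amounts, and the 0.25ng/μl stock gives 2μl and 4μl.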

Library insert size was assessed for the Nextera^® DNA Flex (full and reduced) and plexWell™96 methods using the Agilent TapeStation 4200 high sensitivity D5000 DNA ScreenTape. Insert size for the QIAseq FX and NEXTFLEX^® Rapid XP methods was quantified using the Agilent Bioanalyzer 2100 high-sensitivity DNA chip as these libraries are more prone to have adapter dimers which are poorly resolved using the TapeStation. Library concentration was assessed with the Qubit 1X high-sensitivity dsDNA quantification kit (ThermoFisher). Resulting libraries were diluted, pooled, and sequenced for paired-end reads of 150bp on Illumina HiSeq3000.

Microbial community gene family abundance and taxonomic diversity

Quality filtering, adaptor removal, and host read filtering were performed using shotcleaner v0.1 (42) with default parameters. For mouse fecal samples, host reads were removed by aligning to the mouse reference genome (GRCm38). A similar procedure was used for coral samples except that these reads were filtered against a concatenated version of the coral (Acropora millepora, Genbank: QTZP00000000.1 (43)) and symbiont (Symbiodiniaceae sp. clade A MAC-Cass KB8; Tax ID: 671378, UniProt) genomes. Quality controlled sequence reads were input into HUMAnN 2.0 (44) for taxonomic and functional classification using the UniRef90 database and default parameters. HUMAnN 2.0 outputs for each community type were combined and renormalized to counts per million using HUMAnN 2.0 utility scripts before downstream analysis. High-quality reads were also taxonomically classified using Kraken2 v2.0.8-beta (45, 46) and a custom reference database that included sequences from all human, mouse (GRCm38), UniVec Core, bacteria, archaea, virus, fungi, and protozoa in the NCBI RefSeq database (accessed October 8, 2019) as well as the Symbiodiniaceae sp. clade A MAC-Cass KB8 and A. millepora genomes.
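The counts-per-million renormalization mentioned above can be illustrated with a short sketch. This is a simplified stand-in and not the HUMAnN 2.0 utility itself: it ignores stratified rows and special features that the real script handles, and the function name and example gene identifiers are hypothetical.

```python
import math
from typing import Dict

def renormalize_cpm(abundances: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sample's feature abundances so that they sum to one million."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no positive abundance to renormalize")
    return {feature: value / total * 1_000_000 for feature, value in abundances.items()}

if __name__ == "__main__":
    # Hypothetical gene family abundances for a single sample.
    sample = {"geneA": 30.0, "geneB": 10.0, "geneC": 60.0}
    cpm = renormalize_cpm(sample)
    for feature, value in cpm.items():
        print(f"{feature}\t{value:.1f}")
    assert math.isclose(sum(cpm.values()), 1_000_000)
```

Expressing every sample on the same total makes profiles comparable across libraries that differ in sequencing depth.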

Statistical Analyses

Independent linear models (R::stats::lm) were used to determine how community type, library preparation method, and input DNA concentration affect variance in the resulting library characteristics, including the number of reads generated, median fragment size, library concentration, library molarity, mean read length, read duplication rate, mean read GC content, and total reads filtered and removed. Because we reasoned that interactions between the predictors likely exist, we employed a model selection procedure to identify the most parsimonious model for each characteristic examined. For each characteristic we built a set of models of increasing complexity: 1) a reduced model with only additive effects (eq. 1), and 2) a model with interaction terms for community type, library preparation procedure, and DNA input concentration (eq. 2). We then used the Akaike information criterion (AIC) to select the most parsimonious model and analysis of variance (ANOVA) to determine the significance of each term in the selected model. Because sequencing libraries produced from distinct samples are pooled as part of the plexWell™ library preparation protocol, only a single value is available for median fragment size, sequence library concentration, and sequence library molarity for this method. The similarity between taxonomic profiles generated by library preparation methods and the known taxonomic composition of the ZymoBIOMICS™ Microbial Community DNA Standard was assessed by Pearson’s correlation test (R::stats::cor.test). To measure the variation in taxonomic profiles generated by different library preparation methods across soil, coral, and fecal communities, we calculated the Pearson’s correlation coefficient of generated taxonomic abundance profiles for each pair of samples. This analysis was conducted using Kraken2, a sensitive read binning tool, rather than MetaPhlAn2, a marker-gene-based abundance estimation tool, to eliminate the possibility that mammalian biases in marker gene databases would skew results in environmental samples. We accounted for the effects of multiple correlation tests using false discovery rate (R::stats::p.adjust, method = fdr).
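To make the correlation and multiple-testing steps concrete, the sketch below computes a Pearson correlation coefficient between two abundance profiles and applies the Benjamini-Hochberg adjustment, which is the procedure that the `fdr` method of `p.adjust` refers to. It is an illustrative Python re-implementation and not the authors' R code; the function names and the example values are hypothetical.

```python
import math
from typing import List, Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, tracking the running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

if __name__ == "__main__":
    expected = [0.4, 0.3, 0.2, 0.1]      # hypothetical known composition
    observed = [0.38, 0.33, 0.18, 0.11]  # hypothetical profile from one library
    print(f"r = {pearson_r(expected, observed):.3f}")
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

The adjustment is applied across the whole family of correlation tests at once, so adjusted values depend on how many tests were run.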

The additive and interactive statistical effects of library preparation and DNA input concentration on microbiome composition, as measured by the Bray-Curtis dissimilarity metric, were evaluated using PERMANOVA analysis (R::vegan::adonis, permutations = 5000, method = bray) and visualized using an ordination of principal components analysis for each community. Differences in the Bray-Curtis dissimilarity of taxonomic abundance profiles within and across library preparation methods were measured using Kruskal-Wallis tests (R::stats::kruskal.test) with a post-hoc pairwise Wilcoxon test (R::stats::pairwise.wilcox.test). A Holm correction was used to control the family-wise error rate of the Wilcoxon tests.
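The Bray-Curtis dissimilarity that underlies these comparisons can be sketched as follows. This is a minimal illustration of the metric only, not of the PERMANOVA or Kruskal-Wallis tests; the function names and example profiles are hypothetical.

```python
import math
from typing import List, Sequence

def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same features")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("dissimilarity is undefined when both profiles are empty")
    return numerator / denominator

def pairwise_bray_curtis(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of Bray-Curtis dissimilarities for a list of samples."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = bray_curtis(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

if __name__ == "__main__":
    # Hypothetical taxon counts for three libraries from the same community.
    samples = [[10, 0, 5], [5, 5, 5], [10, 0, 5]]
    for row in pairwise_bray_curtis(samples):
        print(["{:.3f}".format(d) for d in row])
    assert math.isclose(bray_curtis(samples[0], samples[2]), 0.0)
```

Values range from 0 for identical profiles to 1 for profiles that share no features, which is why within-method and across-method dissimilarities can be compared directly.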

Shannon entropy and gene richness were calculated for HUMAnN 2.0 gene abundance profiles using R and vegan. Linear regression was used to quantify associations between gene-level alpha diversity and library preparation method and input DNA concentration for each community type. Associations between gene-level Bray-Curtis dissimilarity and preparation method and DNA input were quantified using PERMANOVA (R::vegan::adonis, permutations = 5000, method = bray). Differences in gene abundances and metagenomic dissimilarity were quantified as with taxonomy.
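The two alpha diversity measures named above can be sketched in a few lines. This is an illustrative Python version and not the vegan code used in the study; it assumes the natural logarithm for Shannon entropy and counts richness as the number of features with non-zero abundance. Function names and example values are hypothetical.

```python
import math
from typing import Sequence

def richness(abundances: Sequence[float]) -> int:
    """Number of features (e.g., gene families) with abundance above zero."""
    return sum(1 for a in abundances if a > 0)

def shannon_entropy(abundances: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the relative abundances in one sample."""
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    if total <= 0:
        raise ValueError("sample has no positive abundance")
    return -sum((a / total) * math.log(a / total) for a in positive)

if __name__ == "__main__":
    # Hypothetical gene family abundances for one library.
    profile = [120.0, 30.0, 0.0, 50.0]
    print(f"richness = {richness(profile)}")
    print(f"Shannon entropy = {shannon_entropy(profile):.3f}")
```

Richness responds only to presence or absence, whereas Shannon entropy also reflects evenness, so the two can diverge when a library preparation method shifts relative abundances without losing features.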

ACKNOWLEDGEMENTS

Research reported in this publication was supported by the National Science Foundation under Grant No. OCE-1933165 and BIO-2025457, as well as the National Institute of Allergy and Infectious Diseases under award number R21AI135641 and the National Institute of Environmental Health Sciences under award number R01ES030226, and a contract to the Center for Genome Research and Biocomputing from the Soil Health Institute. Library preparation kits used in this study were generously donated by PerkinElmer, Inc. and soil samples were provided by Elizabeth Rieke (Soil Health Institute).

REFERENCES

1.
Paszkiewicz K, Studholme DJ. 2010. De novo assembly of short sequence reads. Brief Bioinform 11:457–472.
2.
Brumfield KD, Huq A, Colwell RR, Olds JL, Leddy MB. 2020. Microbial resolution of whole genome shotgun and 16S amplicon metagenomic sequencing using publicly available NEON data. PLoS ONE 15.
3.
Yoon SS, Kim E-K, Lee W-J. 2015. Functional genomic and metagenomic approaches to understanding gut microbiota–animal mutualism. Curr Opin Microbiol 24:38–46.
4.
Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. 2017. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol 35:833–844.
5.
Laudadio I, Fulci V, Palone F, Stronati L, Cucchiara S, Carissimi C. 2018. Quantitative Assessment of Shotgun Metagenomics and 16S rDNA Amplicon Sequencing in the Study of Human Gut Microbiome. Omics J Integr Biol 22:248–254.
6.
Jovel J, Patterson J, Wang W, Hotte N, O’Keefe S, Mitchel T, Perry T, Kao D, Mason AL, Madsen KL, Wong GK-S. 2016. Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics. Front Microbiol 7.
7.
Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, Creasy HH, Earl AM, FitzGerald MG, Fulton RS, Giglio MG, Hallsworth-Pepin K, Lobos EA, Madupu R, Magrini V, Martin JC, Mitreva M, Muzny DM, Sodergren EJ, Versalovic J, Wollam AM, Worley KC, Wortman JR, Young SK, Zeng Q, Aagaard KM, Abolude OO, Allen-Vercoe E, Alm EJ, Alvarado L, Andersen GL, Anderson S, Appelbaum E, Arachchi HM, Armitage G, Arze CA, Ayvaz T, Baker CC, Begg L, Belachew T, Bhonagiri V, Bihan M, Blaser MJ, Bloom T, Bonazzi V, Paul Brooks J, Buck GA, Buhay CJ, Busam DA, Campbell JL, Canon SR, Cantarel BL, Chain PSG, Chen I-MA, Chen L, Chhibba S, Chu K, Ciulla DM, Clemente JC, Clifton SW, Conlan S, Crabtree J, Cutting MA, Davidovics NJ, Davis CC, DeSantis TZ, Deal C, Delehaunty KD, Dewhirst FE, Deych E, Ding Y, Dooling DJ, Dugan SP, Michael Dunne W, Scott Durkin A, Edgar RC, Erlich RL, Farmer CN, Farrell RM, Faust K, Feldgarden M, Felix VM, Fisher S, Fodor AA, Forney LJ, Foster L, Di Francesco V, Friedman J, Friedrich DC, Fronick CC, Fulton LL, Gao H, Garcia N, Giannoukos G, Giblin C, Giovanni MY, Goldberg JM, Goll J, Gonzalez A, Griggs A, Gujja S, Kinder Haake S, Haas BJ, Hamilton HA, Harris EL, Hepburn TA, Herter B, Hoffmann DE, Holder ME, Howarth C, Huang KH, Huse SM, Izard J, Jansson JK, Jiang H, Jordan C, Joshi V, Katancik JA, Keitel WA, Kelley ST, Kells C, King NB, Knights D, Kong HH, Koren O, Koren S, Kota KC, Kovar CL, Kyrpides NC, La Rosa PS, Lee SL, Lemon KP, Lennon N, Lewis CM, Lewis L, Ley RE, Li K, Liolios K, Liu B, Liu Y, Lo C-C, Lozupone CA, Dwayne Lunsford R, Madden T, Mahurkar AA, Mannon PJ, Mardis ER, Markowitz VM, Mavromatis K, McCorrison JM, McDonald D, McEwen J, McGuire AL, McInnes P, Mehta T, Mihindukulasuriya KA, Miller JR, Minx PJ, Newsham I, Nusbaum C, O’Laughlin M, Orvis J, Pagani I, Palaniappan K, Patel SM, Pearson M, Peterson J, Podar M, Pohl C, Pollard KS, Pop M, Priest ME, Proctor LM, Qin X, Raes J, Ravel J, Reid JG, Rho M, Rhodes R, Riehle KP, Rivera 
MC, Rodriguez-Mueller B, Rogers Y-H, Ross MC, Russ C, Sanka RK, Sankar P, Fah Sathirapongsasuti J, Schloss JA, Schloss PD, Schmidt TM, Scholz M, Schriml L, Schubert AM, Segata N, Segre JA, Shannon WD, Sharp RR, Sharpton TJ, Shenoy N, Sheth NU, Simone GA, Singh I, Smillie CS, Sobel JD, Sommer DD, Spicer P, Sutton GG, Sykes SM, Tabbaa DG, Thiagarajan M, Tomlinson CM, Torralba M, Treangen TJ, Truty RM, Vishnivetskaya TA, Walker J, Wang L, Wang Z, Ward DV, Warren W, Watson MA, Wellington C, Wetterstrand KA, White JR, Wilczek-Boney K, Wu Y, Wylie KM, Wylie T, Yandava C, Ye L, Ye Y, Yooseph S, Youmans BP, Zhang L, Zhou Y, Zhu Y, Zoloth L, Zucker JD, Birren BW, Gibbs RA, Highlander SK, Methé BA, Nelson KE, Petrosino JF, Weinstock GM, Wilson RK, White O, The Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. 7402. Nature 486:207–214.
8.
Armour CR, Nayfach S, Pollard KS, Sharpton TJ. 2019. A Metagenomic Meta-analysis Reveals Functional Signatures of Health and Disease in the Human Gut Microbiome. mSystems 4.
9.
Guo J, Li J, Chen H, Bond PL, Yuan Z. 2017. Metagenomic analysis reveals wastewater treatment plants as hotspots of antibiotic resistance genes and mobile genetic elements. Water Res 123:468–478.
10.
Jadeja NB, Purohit HJ, Kapley A. 2019. Decoding microbial community intelligence through metagenomics for efficient wastewater treatment. Funct Integr Genomics 19:839–851.
11.
Li A-D, Li L-G, Zhang T. 2015. Exploring antibiotic resistance genes and metal resistance genes in plasmid metagenomes from wastewater treatment plants. Front Microbiol 6.
12.
Wang Z, Zhang X-X, Huang K, Miao Y, Shi P, Liu B, Long C, Li A. 2013. Metagenomic Profiling of Antibiotic Resistance Genes and Mobile Genetic Elements in a Tannery Wastewater Treatment Plant. PLOS ONE 8:e76079.
13.
Fierer N, Lauber CL, Ramirez KS, Zaneveld J, Bradford MA, Knight R. 2012. Comparative metagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients. ISME J 6:1007–1017.
14.
Blaya J, Marhuenda FC, Pascual JA, Ros M. 2016. Microbiota Characterization of Compost Using Omics Approaches Opens New Perspectives for Phytophthora Root Rot Control. PLOS ONE 11:e0158048.
15.
Lutz S, Thuerig B, Oberhaensli T, Mayerhofer J, Fuchs JG, Widmer F, Freimoser FM, Ahrens CH. 2020. Harnessing the Microbiomes of Suppressive Composts for Plant Protection: From Metagenomes to Beneficial Microorganisms and Reliable Diagnostics. Front Microbiol 11.
16.
Blockley A, Elliott DR, Roberts AP, Sweet M. 2017. Symbiotic Microbes from Marine Invertebrates: Driving a New Era of Natural Product Drug Discovery. Diversity 9:49.
17.
Trindade M, van Zyl LJ, Navarro-Fernández J, Abd Elrazak A. 2015. Targeted metagenomics as a tool to tap into marine natural product diversity for the discovery and production of drug candidates. Front Microbiol 6.
18.
Kennedy J, Marchesi JR, Dobson ADW. 2007. Metagenomic approaches to exploit the biotechnological potential of the microbial consortia of marine sponges. Appl Microbiol Biotechnol 75:11–20.
19.
Sato MP, Ogura Y, Nakamura K, Nishida R, Gotoh Y, Hayashi M, Hisatsune J, Sugai M, Takehiko I, Hayashi T. 2019. Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes. DNA Res 26:391–398.
20.
Baym M, Kryazhimskiy S, Lieberman TD, Chung H, Desai MM, Kishony R. 2015. Inexpensive Multiplexed Library Preparation for Megabase-Sized Genomes. PLOS ONE 10:e0128036.
21.
Gaio D, To J, Liu M, Monahan L, Anantanawat K, Darling AE. 2019. Hackflex: low cost Illumina sequencing library construction for high sample counts. bioRxiv 779215.
22.
Couto N, Schuele L, Raangs EC, Machado MP, Mendes CI, Jesus TF, Chlebowicz M, Rosema S, Ramirez M, Carriço JA, Autenrieth IB, Friedrich AW, Peter S, Rossen JW. 2018. Critical steps in clinical shotgun metagenomics for the concomitant detection and typing of microbial pathogens. Sci Rep 8:13767.
23.
Lloyd-Price J, Mahurkar A, Rahnavard G, Crabtree J, Orvis J, Hall AB, Brady A, Creasy HH, McCracken C, Giglio MG, McDonald D, Franzosa EA, Knight R, White O, Huttenhower C. 2017. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 550:61–66.
24.
Pereira-Marques J, Hout A, Ferreira RM, Weber M, Pinto-Ribeiro I, van Doorn L-J, Knetsch CW, Figueiredo C. 2019. Impact of Host DNA and Sequencing Depth on the Taxonomic Resolution of Whole Metagenome Sequencing for Microbiome Analysis. Front Microbiol 10.
25.
Marine R, Polson SW, Ravel J, Hatfull G, Russell D, Sullivan M, Syed F, Dumas M, Wommack KE. 2011. Evaluation of a Transposase Protocol for Rapid Generation of Shotgun High-Throughput Sequencing Libraries from Nanogram Quantities of DNA. Appl Environ Microbiol 77:8071–8079.
26.
Solonenko SA, Ignacio-Espinoza JC, Alberti A, Cruaud C, Hallam S, Konstantinidis K, Tyson G, Wincker P, Sullivan MB. 2013. Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics 14:320.
27.
Chafee M, Maignien L, Simmons SL. 2015. The effects of variable sample biomass on comparative metagenomics. Environ Microbiol 17:2239–2253.
28.
Duhaime MB, Deng L, Poulos BT, Sullivan MB. 2012. Towards quantitative metagenomics of wild viruses and other ultra-low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method. Environ Microbiol 14:2526–2537.
29.
Jones MB, Highlander SK, Anderson EL, Li W, Dayrit M, Klitgord N, Fabani MM, Seguritan V, Green J, Pride DT, Yooseph S, Biggs W, Nelson KE, Venter JC. 2015. Library preparation methodology can influence genomic and functional predictions in human microbiome research. Proc Natl Acad Sci 112:14024–14029.
30.
Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. 2015. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis 26:27663.
31.
Simon C, Daniel R. 2011. Metagenomic Analyses: Past and Future Trends. Appl Environ Microbiol 77:1153–1161.
32.
Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R. 2015. Modeling and Analysis of Compositional Data. John Wiley & Sons.
33.
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. 2017. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol 8.
34.
Kircher M, Heyn P, Kelso J. 2011. Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics 12:382.
35.
Browne PD, Nielsen TK, Kot W, Aggerholm A, Gilbert MTP, Puetz L, Rasmussen M, Zervas A, Hansen LH. 2020. GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms. GigaScience 9.
36.
Huptas C, Scherer S, Wenning M. 2016. Optimized Illumina PCR-free library preparation for bacterial whole genome sequencing and analysis of factors influencing de novo assembly. BMC Res Notes 9:269.
37.
Rodriguez-R LM, Konstantinidis KT. 2014. Estimating coverage in metagenomic data sets and why it matters. ISME J 8:2349–2351.
38.
Tan G, Opitz L, Schlapbach R, Rehrauer H. 2019. Long fragments achieve lower base quality in Illumina paired-end sequencing. Sci Rep 9:2856.
39.
Lindgreen S, Adair KL, Gardner PP. 2016. An evaluation of the accuracy and speed of metagenome analysis tools. Sci Rep 6:19233.
40.
Norris CE, Bean GM, Cappellazzi SB, Cope M, Greub KLH, Liptzin D, Rieke EL, Tracy PW, Morgan CLS, Honeycutt CW. 2020. Introducing the North American project to evaluate soil health measurements. Agron J 112:3195–3215.
41.
Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R, The Earth Microbiome Project Consortium, Rivera JLA, Al-Moosawi L, Alverdy J, Amato KR, Andras J, Angenent LT, Antonopoulos DA, Apprill A, Armitage D, Ballantine K, Bárta J, Baum JK, Berry A, Bhatnagar A, Bhatnagar M, Biddle JF, Bittner L, Boldgiv B, Bottos E, Boyer DM, Braun J, Brazelton W, Brearley FQ, Campbell AH, Caporaso JG, Cardona C, Carroll J, Cary SC, Casper BB, Charles TC, Chu H, Claar DC, Clark RG, Clayton JB, Clemente JC, Cochran A, Coleman ML, Collins G, Colwell RR, Contreras M, Crary BB, Creer S, Cristol DA, Crump BC, Cui D, Daly SE, Davalos L, Dawson RD, Defazio J, Delsuc F, Dionisi HM, Dominguez-Bello MG, Dowell R, Dubinsky EA, Dunn PO, Ercolini D, Espinoza RE, Ezenwa V, Fenner N, Findlay HS, Fleming ID, Fogliano V, Forsman A, Freeman C, Friedman ES, Galindo G, Garcia L, Garcia-Amado MA, Garshelis D, Gasser RB, Gerdts G, Gibson MK, Gifford I, Gill RT, Giray T, Gittel A, Golyshin P, Gong D, Grossart H-P, Guyton K, Haig S-J, Hale V, Hall RS, Hallam SJ, Handley KM, Hasan NA, Haydon SR, Hickman JE, Hidalgo G, Hofmockel KS, Hooker J, Hulth S, Hultman J, Hyde E, Ibáñez-Álamo JD, Jastrow JD, Jex AR, Johnson LS, Johnston ER, Joseph S, Jurburg SD, Jurelevicius D, Karlsson A, Karlsson R, Kauppinen S, Kellogg CTE, Kennedy SJ, Kerkhof LJ, King GM, Kling GW, Koehler AV, Krezalek M, Kueneman J, Lamendella R, Landon EM, Lane-deGraaf K, LaRoche J, Larsen P, Laverock B, Lax S, Lentino M, Levin II, Liancourt P, Liang W, Linz AM, Lipson DA, Liu Y, Lladser ME, Lozada M, Spirito CM, 
MacCormack WP, MacRae-Crerar A, Magris M, Martín-Platero AM, Martín-Vivaldi M, Martínez LM, Martínez-Bueno M, Marzinelli EM, Mason OU, Mayer GD, McDevitt-Irwin JM, McDonald JE, McGuire KL, McMahon KD, McMinds R, Medina M, Mendelson JR, Metcalf JL, Meyer F, Michelangeli F, Miller K, Mills DA, Minich J, Mocali S, Moitinho-Silva L, Moore A, Morgan-Kiss RM, Munroe P, Myrold D, Neufeld JD, Ni Y, Nicol GW, Nielsen S, Nissimov JI, Niu K, Nolan MJ, Noyce K, O’Brien SL, Okamoto N, Orlando L, Castellano YO, Osuolale O, Oswald W, Parnell J, Peralta-Sánchez JM, Petraitis P, Pfister C, Pilon-Smits E, Piombino P, Pointing SB, Pollock FJ, Potter C, Prithiviraj B, Quince C, Rani A, Ranjan R, Rao S, Rees AP, Richardson M, Riebesell U, Robinson C, Rockne KJ, Rodriguezl SM, Rohwer F, Roundstone W, Safran RJ, Sangwan N, Sanz V, Schrenk M, Schrenzel MD, Scott NM, Seger RL, Seguin-Orlando A, Seldin L, Seyler LM, Shakhsheer B, Sheets GM, Shen C, Shi Y, Shin H, Shogan BD, Shutler D, Siegel J, Simmons S, Sjöling S, Smith DP, Soler JJ, Sperling M, Steinberg PD, Stephens B, Stevens MA, Taghavi S, Tai V, Tait K, Tan CL, Tas, N, Taylor DL, Thomas T, Timling I, Turner BL, Urich T, Ursell LK, van der Lelie D, Van Treuren W, van Zwieten L, Vargas-Robles D, Thurber RV, Vitaglione P, Walker DA, Walters WA, Wang S, Wang T, Weaver T, Webster NS, Wehrle B, Weisenhorn P, Weiss S, Werner JJ, West K, Whitehead A, Whitehead SR, Whittingham LA, Willerslev E, Williams AE, Wood SA, Woodhams DC, Yang Y, Zaneveld J, Zarraonaindia I, Zhang Q, Zhao H. 2017. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551:457–463.
42.
Sharpton T. 2017. sharpton/shotcleaner. Perl6.
43.
Ying H, Hayward DC, Cooke I, Wang W, Moya A, Siemering KR, Sprungala S, Ball EE, Forêt S, Miller DJ. 2019. The Whole-Genome Sequence of the Coral Acropora millepora. Genome Biol Evol 11:1374–1379.
44.
Franzosa EA, McIver LJ, Rahnavard G, Thompson LR, Schirmer M, Weingart G, Lipson KS, Knight R, Caporaso JG, Segata N, Huttenhower C. 2018. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods 15:962–968.
45.
Wood DE, Salzberg SL. 2014. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46.
46.
Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol 20:257.


Posted April 13, 2021.


Subject Area

Microbiology


[1] 1.↵
Paszkiewicz K, Studholme DJ. 2010. De novo assembly of short sequence reads. Brief Bioinform 11:457–472.
OpenUrl CrossRef PubMed

[2] 2.
Brumfield KD, Huq A, Colwell RR, Olds JL, Leddy MB. 2020. Microbial resolution of whole genome shotgun and 16S amplicon metagenomic sequencing using publicly available NEON data. PLoS ONE 15.

[3] 3.
Yoon SS, Kim E-K, Lee W-J. 2015. Functional genomic and metagenomic approaches to understanding gut microbiota–animal mutualism. Curr Opin Microbiol 24:38–46.
OpenUrl CrossRef PubMed

[4] 4.↵
Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. 2017. Shotgun metagenomics, from sampling to analysis. 9. Nat Biotechnol 35:833–844.
OpenUrl CrossRef PubMed

[5] 5.↵
Laudadio I, Fulci V, Palone F, Stronati L, Cucchiara S, Carissimi C. 2018. Quantitative Assessment of Shotgun Metagenomics and 16S rDNA Amplicon Sequencing in the Study of Human Gut Microbiome. Omics J Integr Biol 22:248–254.
OpenUrl

[6] 6.
Jovel J, Patterson J, Wang W, Hotte N, O’Keefe S, Mitchel T, Perry T, Kao D, Mason AL, Madsen KL, Wong GK-S. 2016. Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics. Front Microbiol 7.

[8] 8.↵
Armour CR, Nayfach S, Pollard KS, Sharpton TJ. 2019. A Metagenomic Meta-analysis Reveals Functional Signatures of Health and Disease in the Human Gut Microbiome. mSystems 4.

[9] 9.↵
Guo J, Li J, Chen H, Bond PL, Yuan Z. 2017. Metagenomic analysis reveals wastewater treatment plants as hotspots of antibiotic resistance genes and mobile genetic elements. Water Res 123:468–478.
OpenUrl CrossRef

[10] 10.
Jadeja NB, Purohit HJ, Kapley A. 2019. Decoding microbial community intelligence through metagenomics for efficient wastewater treatment. Funct Integr Genomics 19:839–851.
OpenUrl

[11] 11.
Li A-D, Li L-G, Zhang T. 2015. Exploring antibiotic resistance genes and metal resistance genes in plasmid metagenomes from wastewater treatment plants. Front Microbiol 6.

[12] 12.↵
Wang Z, Zhang X-X, Huang K, Miao Y, Shi P, Liu B, Long C, Li A. 2013. Metagenomic Profiling of Antibiotic Resistance Genes and Mobile Genetic Elements in a Tannery Wastewater Treatment Plant. PLOS ONE 8:e76079.
OpenUrl CrossRef PubMed

[13] 13.↵
Fierer N, Lauber CL, Ramirez KS, Zaneveld J, Bradford MA, Knight R. 2012. Comparative metagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients. 5. ISME J 6:1007–1017.
OpenUrl CrossRef PubMed Web of Science

[14] 14.
Blaya J, Marhuenda FC, Pascual JA, Ros M. 2016. Microbiota Characterization of Compost Using Omics Approaches Opens New Perspectives for Phytophthora Root Rot Control. PLOS ONE 11:e0158048.
OpenUrl CrossRef

[15] 15.↵
Lutz S, Thuerig B, Oberhaensli T, Mayerhofer J, Fuchs JG, Widmer F, Freimoser FM, Ahrens CH. 2020. Harnessing the Microbiomes of Suppressive Composts for Plant Protection: From Metagenomes to Beneficial Microorganisms and Reliable Diagnostics. Front Microbiol 11.

[16] 16.↵
Blockley A, Elliott DR, Roberts AP, Sweet M. 2017. Symbiotic Microbes from Marine Invertebrates: Driving a New Era of Natural Product Drug Discovery. 4. Diversity 9:49.
OpenUrl

[17] 17.
Trindade M, van Zyl LJ, Navarro-Fernández J, Abd Elrazak A. 2015. Targeted metagenomics as a tool to tap into marine natural product diversity for the discovery and production of drug candidates. Front Microbiol 6.

[18] 18.↵
Kennedy J, Marchesi JR, Dobson ADW. 2007. Metagenomic approaches to exploit the biotechnological potential of the microbial consortia of marine sponges. Appl Microbiol Biotechnol 75:11–20.
OpenUrl CrossRef PubMed Web of Science

[19] 19.↵
Sato MP, Ogura Y, Nakamura K, Nishida R, Gotoh Y, Hayashi M, Hisatsune J, Sugai M, Takehiko I, Hayashi T. 2019. Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes. DNA Res 26:391–398.
OpenUrl CrossRef

[20] 20.↵
Baym M, Kryazhimskiy S, Lieberman TD, Chung H, Desai MM, Kishony R. 2015. Inexpensive Multiplexed Library Preparation for Megabase-Sized Genomes. PLOS ONE 10:e0128036.
OpenUrl CrossRef PubMed

[21] 21.↵
Gaio D, To J, Liu M, Monahan L, Anantanawat K, Darling AE. 2019. Hackflex: low cost Illumina sequencing library construction for high sample counts. bioRxiv 779215.

[22] 22.
Couto N, Schuele L, Raangs EC, Machado MP, Mendes CI, Jesus TF, Chlebowicz M, Rosema S, Ramirez M, Carriço JA, Autenrieth IB, Friedrich AW, Peter S, Rossen JW. 2018. Critical steps in clinical shotgun metagenomics for the concomitant detection and typing of microbial pathogens. 1. Sci Rep 8:13767.
OpenUrl CrossRef

[23] 23.
Lloyd-Price J, Mahurkar A, Rahnavard G, Crabtree J, Orvis J, Hall AB, Brady A, Creasy HH, McCracken C, Giglio MG, McDonald D, Franzosa EA, Knight R, White O, Huttenhower C. 2017. Strains, functions and dynamics in the expanded Human Microbiome Project. 7674. Nature 550:61–66.
OpenUrl CrossRef PubMed

[24] 24.
Pereira-Marques J, Hout A, Ferreira RM, Weber M, Pinto-Ribeiro I, van Doorn L-J, Knetsch CW, Figueiredo C. 2019. Impact of Host DNA and Sequencing Depth on the Taxonomic Resolution of Whole Metagenome Sequencing for Microbiome Analysis. Front Microbiol 10.

[25] 25.↵
Marine R, Polson SW, Ravel J, Hatfull G, Russell D, Sullivan M, Syed F, Dumas M, Wommack KE. 2011. Evaluation of a Transposase Protocol for Rapid Generation of Shotgun High-Throughput Sequencing Libraries from Nanogram Quantities of DNA. Appl Environ Microbiol 77:8071–8079.
OpenUrl Abstract/FREE Full Text

[26] 26.
Solonenko SA, Ignacio-Espinoza JC, Alberti A, Cruaud C, Hallam S, Konstantinidis K, Tyson G, Wincker P, Sullivan MB. 2013. Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics 14:320.
OpenUrl CrossRef PubMed

[27] 27.
Chafee M, Maignien L, Simmons SL. 2015. The effects of variable sample biomass on comparative metagenomics. Environ Microbiol 17:2239–2253.

[28]
Duhaime MB, Deng L, Poulos BT, Sullivan MB. 2012. Towards quantitative metagenomics of wild viruses and other ultra-low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method. Environ Microbiol 14:2526–2537.

[29]
Jones MB, Highlander SK, Anderson EL, Li W, Dayrit M, Klitgord N, Fabani MM, Seguritan V, Green J, Pride DT, Yooseph S, Biggs W, Nelson KE, Venter JC. 2015. Library preparation methodology can influence genomic and functional predictions in human microbiome research. Proc Natl Acad Sci 112:14024–14029.

[30]
Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. 2015. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis 26:27663.

[31]
Simon C, Daniel R. 2011. Metagenomic Analyses: Past and Future Trends. Appl Environ Microbiol 77:1153–1161.

[32]
Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R. 2015. Modeling and Analysis of Compositional Data. John Wiley & Sons.

[33]
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. 2017. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol 8.

[34]
Kircher M, Heyn P, Kelso J. 2011. Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics 12:382.

[35]
Browne PD, Nielsen TK, Kot W, Aggerholm A, Gilbert MTP, Puetz L, Rasmussen M, Zervas A, Hansen LH. 2020. GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms. GigaScience 9.

[36]
Huptas C, Scherer S, Wenning M. 2016. Optimized Illumina PCR-free library preparation for bacterial whole genome sequencing and analysis of factors influencing de novo assembly. BMC Res Notes 9:269.

[37]
Rodriguez-R LM, Konstantinidis KT. 2014. Estimating coverage in metagenomic data sets and why it matters. ISME J 8:2349–2351.

[38]
Tan G, Opitz L, Schlapbach R, Rehrauer H. 2019. Long fragments achieve lower base quality in Illumina paired-end sequencing. Sci Rep 9:2856.

[39]
Lindgreen S, Adair KL, Gardner PP. 2016. An evaluation of the accuracy and speed of metagenome analysis tools. Sci Rep 6:19233.

[40]
Norris CE, Bean GM, Cappellazzi SB, Cope M, Greub KLH, Liptzin D, Rieke EL, Tracy PW, Morgan CLS, Honeycutt CW. 2020. Introducing the North American project to evaluate soil health measurements. Agron J 112:3195–3215.

[42]
Sharpton T. 2017. sharpton/shotcleaner. Perl6.

[43]
Ying H, Hayward DC, Cooke I, Wang W, Moya A, Siemering KR, Sprungala S, Ball EE, Forêt S, Miller DJ. 2019. The Whole-Genome Sequence of the Coral Acropora millepora. Genome Biol Evol 11:1374–1379.

[44]
Franzosa EA, McIver LJ, Rahnavard G, Thompson LR, Schirmer M, Weingart G, Lipson KS, Knight R, Caporaso JG, Segata N, Huttenhower C. 2018. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods 15:962–968.

[45]
Wood DE, Salzberg SL. 2014. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46.

[46]
Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol 20:257.