ABSTRACT
Molecular profiling of complex microbial communities has become the basis for examining the relationship between the microbiome composition, structure, and metabolic functions of those communities. Microbial community structure can be partially assessed with universal PCR targeting taxonomic or functional gene markers. Increasingly, shotgun metagenomic DNA sequencing is providing more quantitative insight into microbiomes. Unfortunately, both amplicon-based and shotgun sequencing approaches have significant shortcomings that limit the ability to study microbiome dynamics. We present a novel, amplicon-free, hybridization-based method (CaptureSeq) for profiling complex microbial communities using probes based on the chaperonin-60 gene. This new method generates a quantitative, pan-Domain community profile with significantly less expenditure and sequencing effort than a shotgun metagenomic sequencing approach. Molecular microbial profiles were compared for antibiotic-amended soil samples using CaptureSeq, shotgun metagenomics, and amplicon-based techniques. The CaptureSeq method generated a microbial profile with much greater depth and sensitivity than shotgun metagenomic sequencing while mitigating the bias effects associated with amplicon-based methods. The resulting community profile provided quantitatively reliable information about all three Domains of life (Bacteria, Archaea, and Eukarya). CaptureSeq has broad applications and will facilitate highly accurate studies of host-microbiome interactions for environmental, crop, animal, and human health.
DNA sequencing data associated with this work has been deposited at NCBI under BioProject PRJNA406970 and SRA deposits SRX3181274-SRX3181276 and SRX3187583-SRX3187601.
INTRODUCTION
Life on Earth is classified into hierarchical taxonomic lineages that describe all living systems as having descended from a common ancestor along three evolutionary lines. Using ribosomal RNA-encoding gene sequences, Woese and Fox 1 delineated these Domains, which are now known as Bacteria, Archaea, and Eukarya 2. Most complex microbial communities are assemblages containing representatives of each of these Domains; the total genomic complement of such a community is called its microbiome. Understanding microbial community dynamics requires tools to examine the composition of these complex ecosystems. Advances in DNA sequencing technology have created new opportunities to simplify the profiling of microbial communities from a diverse range of environments. As new insights are gained into the diversity of microbiomes in soil, water, and plant- and animal-associated ecosystems, we are collectively realizing the powerful effects that microbiome composition and structure can have on community function 3. To characterize the multifaceted relationships between microorganisms and their environment, it is critical to obtain a comprehensive microbial community profile that reflects the community's original composition and quantitative structure as accurately as possible.
Microbiologists have increasingly embraced culture-independent methods of identification in recent decades 4. By far the most commonly employed culture-independent method is PCR-based amplification of informative gene sequences. By adapting PCR to amplify a conserved region of 16S rRNA, Weller and Ward provided the first example of microbial profiling 5. More recently, the DNA barcoding criteria proposed by Paul Hebert for Eukarya have established standards for what constitutes a robust target for phylogenetic profiling 6. Universal gene markers such as 16S 7, cpn60 8, rpoB 9, mcrA 10 and ITS 11 have been used to profile microorganisms from the bacterial, archaeal and eukaryotic Domains; however, no single amplification is able to profile microbes from all three Domains simultaneously. Obtaining phylogenetic information for microorganisms across all three Domains of life therefore requires separate target amplification and processing protocols 12, which increases the cost and analytical complexity of accurately assessing dynamic changes in the community across Domains. Moreover, stochastic effects of primer interaction with a complex template, together with the difficulty of designing primers and amplification conditions that target all members of a community equally 13, result in an unavoidable bias in community representation, both in terms of presence/absence and relative abundances 13–16.
In recent years, metagenomic approaches, in which total nucleic acid recovered from a sample is fragmented and sequenced using shotgun methods, have become increasingly popular. This approach has a significant advantage over barcode-specific methods: shotgun sequencing data can overcome the issues of bias and representation that are inherent in amplicon sequencing approaches, and they also describe the metabolic potential of the microbial community 17–19. Sequencing all DNA present in an environmental sample can therefore be considered something of a “gold standard” for taxonomic profiling. However, this approach has its own limitations. For example, it can be wasteful in terms of the phylogenetic information recovered per unit of sequencing cost. Shotgun sequencing also cannot easily connect the functional potential observed in the sequencing data with the specific microbe in which that function resides. Additionally, DNA acquired from a community of microorganisms is inherently unbalanced; taxa are not present in equal numbers, nor do all taxa have genomes of equal size. Shotgun sequencing can therefore provide a view of microbial community composition that is biased by genome size and microbial abundance. Overcoming this bias requires substantial amounts of sequencing, so detecting the rarest microbes by shotgun metagenomic sequencing carries a high financial cost 14,15,20,21. The abundances of microbes within characterized complex microbial communities range over many orders of magnitude. While shotgun sequencing provides a reasonable estimate of abundance, it suffers a significant loss in dynamic range compared to PCR-based profiling.
The chaperonin 60 gene 8 (type I chaperonin) and its archaeal homologue, the thermosome complex 22 (type II chaperonin), have previously been recognized as highly discriminating targets across all three Domains of life 23; they meet standard International Barcode of Life criteria 24 and enable de novo assembly of operational taxonomic units (OTU) 25. While “universal” PCR primers are available 8,26, amplification with them is not expected to capture the pan-Domain diversity of a complex microbial community. Moreover, cpn60 amplification yields OTU abundances that do not always correlate with the true abundance of the microorganism in the sample 27. If these limitations can be overcome, there is a significant opportunity to improve research on host-microbiome interactions in plant, human and animal settings.
Recent advances in hybridization-based DNA capture combined with high-throughput sequencing (CaptureSeq), which has proven to be a remarkably powerful means of enriching samples for DNA sequences of interest 28–30, led us to consider exploiting the unique features of cpn60 to provide a pan-Domain microbial community profile without universal PCR amplification. A custom array of biotinylated RNA capture baits was designed based on the entire taxonomic composition of the chaperonin database cpnDB (www.cpndb.ca) 8 and evaluated as a tool for simultaneously enriching total genomic DNA for type I and type II chaperonin target sequences. Samples were selected to encompass taxonomic diversity across all three Domains of life. Soil samples composed primarily of Bacteria, manure samples with increased archaeal diversity, and a terrestrial pond sample with a larger number of Eukarya were used to compare the CaptureSeq method with standard shotgun metagenomic and amplicon-based approaches. The results indicate that CaptureSeq combines the taxonomic reach of shotgun metagenomic sequencing with the sampling depth of amplicon-based sequencing, giving an essentially complete, balanced, and quantitatively accurate view of complex microbial ecosystems with reduced sequencing effort.
RESULTS
CaptureSeq generates Pan-Domain microbial community profiles
Microbial profiles were generated by CaptureSeq for samples from very different environmental ecosystems, including soil, manure and a non-aerated terrestrial pond. These profiles provided a simultaneous taxonomic overview of Bacteria, Archaea and Eukarya, and identified sequencing reads from 9,361 (soil), 9,306 (manure), and 6,568 (pond) distinct taxonomic clusters (Supplemental Dataset S1). Additionally, the CaptureSeq profile facilitated inter-Domain comparisons of read abundances between taxonomic groups, because abundances could be expressed relative to the total pan-Domain community rather than only as proportions within a single Domain (Figure 1).
The soil sample microbiomes were composed primarily of Bacteria, with Proteobacteria and Actinobacteria comprising 60% and 25% of the pan-Domain community, respectively. Members of the phyla Acidobacteria and Gemmatimonadetes each represented an additional 5% of the microbiome. Archaeal reads accounted for only 0.03-0.08% of the soil pan-Domain community; nonetheless, 165 archaeal taxonomic clusters were identified in the soil. Eukarya represented 0.18-0.21% of the soil microbiome, with Fungi and Metazoa the most abundant taxonomic groups. While the manure samples also contained a diverse array of Bacteria, these represented only 77-80% of the microbiome, compared to >99% for all of the soil samples. CaptureSeq libraries from the manure samples contained 19-22% archaeal reads, the vast majority of which were from methanogens of the Phylum Euryarchaeota. The terrestrial pond contained a much greater proportion and diversity of Eukarya, representing 6.7% of the sequencing reads and 361 taxonomic clusters (Supplemental Dataset S1). De novo assembly of eukaryotic sequencing reads from the terrestrial pond sample generated 11 OTU most closely related to members of the Phylum Chlorophyta (green algae). Additionally, the assembly of OTU most similar to Anopheles sp. (mosquitoes) and to three members of the Alveolata (protists) suggests that CaptureSeq was able to retrieve cpn60 DNA from higher Eukarya. Compared to reference sequences in cpnDB, these de novo assembled OTU had nucleotide identities of 59-84%, suggesting that the current probe array design and hybridization conditions were sufficiently permissive to allow capture of novel cpn60 sequences (true unknowns).
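Identity values such as the 59-84% above depend on how identity is defined. As a minimal sketch, assuming a pairwise alignment of an assembled OTU against its closest cpnDB reference has already been produced (the alignment tool is not specified here), percent identity can be computed over aligned columns while ignoring columns that are gaps in both sequences. The function name and gap convention are illustrative; other tools may define the denominator differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are ignored; a column counts as
    a match only when both bases are identical and not gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100 * matches / len(columns)

if __name__ == "__main__":
    # Hypothetical aligned fragments: one gap column, five matches.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
```

With these hypothetical fragments the function reports 83.3%, since five of the six aligned columns match.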
CaptureSeq provides a similar microbial community profile to shotgun metagenomic sequencing
The complex taxonomic diversity found in soil provided an opportunity to determine whether CaptureSeq yields a microbial community profile that accurately reflects the composition of the community and provides insight into the community's response to perturbation. Therefore, soil from replicate plots amended with antibiotics was compared with control (unamended) soil using CaptureSeq, shotgun metagenomics, and cpn60-based amplicon sequencing. In this setting, in-depth sampling that accurately reflects community composition is critical for elucidating the effects of antimicrobial exposure on microbial ecosystem dynamics.
Both CaptureSeq and shotgun metagenomic techniques generated type I chaperonin sequences from all three Domains, unencumbered by amplification and primer design biases. However, chaperonin-containing sequences represented only 0.08% of the total reads from the shotgun metagenomic library, compared with an average of 16.7% (± 0.8%) for CaptureSeq and 94.8% (± 0.6%) for amplicon libraries (Supplemental Table S1). For a complex community such as soil, greater sampling depth is required to draw meaningful conclusions regarding microbial community composition and structure. A metagenomic approach requires orders of magnitude more sequencing effort to achieve a high level of community coverage and is not financially feasible for a large number of samples (Figure 2).
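The practical consequence of these on-target fractions can be shown with simple arithmetic: the total number of reads needed to obtain a given number of chaperonin reads scales as the inverse of the on-target fraction. The sketch below uses the mean fractions reported above; the goal of 10,000 chaperonin reads per sample is a hypothetical figure chosen only for illustration.

```python
import math
from fractions import Fraction

# Mean fraction of chaperonin-containing reads per method (Supplemental
# Table S1, as reported above). Fractions keep the arithmetic exact.
ON_TARGET_FRACTION = {
    "shotgun": Fraction("0.0008"),    # 0.08%
    "CaptureSeq": Fraction("0.167"),  # 16.7%
    "amplicon": Fraction("0.948"),    # 94.8%
}

def reads_required(target_reads, on_target_fraction):
    """Total reads needed so that, on average, `target_reads` are on target."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return math.ceil(target_reads / on_target_fraction)

if __name__ == "__main__":
    goal = 10_000  # hypothetical number of chaperonin reads wanted per sample
    for method, fraction in ON_TARGET_FRACTION.items():
        print(f"{method:>10}: {reads_required(goal, fraction):,} total reads")
```

This prints 12,500,000 reads for shotgun, 59,881 for CaptureSeq and 10,549 for amplicon sequencing. The shotgun-to-CaptureSeq ratio (about 209) matches the approximately 200-fold enrichment reported in the Discussion.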
Examination of OTU abundance patterns revealed that the CaptureSeq and shotgun metagenomic profiles were more similar to one another than to the amplicon datasets, which showed a distinct pattern (Figure 3). Moreover, of the three methods analyzed, only CaptureSeq produced a hierarchical clustering that separated the antibiotic-treated from the untreated soil samples (Figure 3). Similarly, when intra-technique beta diversity was assessed, only the CaptureSeq data separated the soil samples by antibiotic treatment (Supplemental Figure S1). These results highlight the influence of profiling method on the ability to gain meaningful insights into microbiome structure and function.
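This section does not state which dissimilarity measure or linkage method underlies the clustering in Figure 3. For illustration only, assuming Bray-Curtis dissimilarity on OTU abundances and average linkage (UPGMA), a dependency-free sketch is shown below. The six samples (three control, three antibiotic-treated) and their abundances are made up to show what a clean two-group split looks like.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def upgma(samples):
    """Average-linkage (UPGMA) agglomerative clustering.

    `samples` maps sample name -> abundance vector. Returns the merges in
    order as (cluster_a, cluster_b, distance); clusters are frozensets.
    """
    dist = {frozenset(pair): bray_curtis(samples[pair[0]], samples[pair[1]])
            for pair in combinations(samples, 2)}

    def linkage(a, b):
        # Mean of all pairwise sample distances between the two clusters.
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    clusters = [frozenset([name]) for name in samples]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical relative abundances (%) of four OTUs in six soil samples.
SAMPLES = {
    "C1": [50, 30, 15, 5], "C2": [48, 32, 14, 6], "C3": [52, 28, 16, 4],
    "A1": [30, 20, 35, 15], "A2": [28, 22, 33, 17], "A3": [32, 18, 36, 14],
}

if __name__ == "__main__":
    for a, b, d in upgma(SAMPLES):
        print(sorted(a), "+", sorted(b), f"at {d:.3f}")
```

In this toy data the last merge joins the control group with the treated group, which is the pattern described above for CaptureSeq. A production analysis would more likely use an established package than this hand-rolled loop.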
Comparison of alpha diversity metrics of the soil communities across the three profiling techniques suggested that both richness (Chao1) and diversity (Shannon H’) were higher with shotgun metagenomic sequencing than with amplicon sequencing (Supplemental Figure S2). CaptureSeq provided alpha diversity metrics intermediate between those of shotgun metagenomic and amplicon sequencing (Supplemental Figure S2). Additionally, the alpha diversity metrics from CaptureSeq showed the least variability among the biological replicates of each treatment, even when libraries were down-sampled to very low depths (Supplemental Figure S2). Samples examined by cpn60 amplification and sequencing displayed the highest inter-sample variability of the three methods.
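A minimal sketch of the two alpha diversity metrics, computed from per-OTU read counts, is given below. It assumes natural-log Shannon diversity and the bias-corrected form of Chao1. This section does not state which variants or software were used, so these choices are illustrative.

```python
import math
from collections import Counter

def shannon(counts):
    """Shannon diversity H' (natural log) from per-OTU read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate.

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the numbers
    of OTUs observed exactly once (singletons) and twice (doubletons).
    """
    freq = Counter(c for c in counts if c > 0)  # read count -> number of OTUs
    s_obs = sum(freq.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

if __name__ == "__main__":
    otu_counts = [5, 3, 2, 2, 1, 1, 1]  # hypothetical down-sampled library
    print(f"H' = {shannon(otu_counts):.3f}, Chao1 = {chao1(otu_counts):.1f}")
```

For the hypothetical library, 7 OTUs are observed, including 3 singletons and 2 doubletons, so Chao1 estimates 8 OTUs. Singletons are exactly the OTUs most affected by down-sampling depth, which is one reason replicate variability in Chao1 is informative.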
CaptureSeq permits de novo assembly of OTU from taxonomic clusters
To determine whether de novo assembly of OTU representing individual organisms was reliable using CaptureSeq, we selected one target microorganism from each Domain for quantification using OTU-specific droplet digital PCR (ddPCR). For Bacteria, we quantified Microbacterium sp. C448, which was cultured from these soil samples and has previously been shown to degrade and metabolize the sulfonamide antibiotic added to the field plots 31. Although the presence of this target in the soil samples was confirmed by culture, it was under-represented in the amplicon and shotgun metagenomic libraries compared to the CaptureSeq profiles. Only the CaptureSeq library provided enough target sequencing reads for de novo assembly, generating a 1,066 bp OTU that was >99% identical to the cpn60 sequence obtained from the genome of this organism 32. We also assembled OTU targets from the Domains Eukarya (type I, Phytophthora infestans) and Archaea (type II, Methanoculleus sp.). Reads that mapped to the reference chaperonin sequences for these organisms were assembled de novo into OTU, which were then quantified in each soil sample using ddPCR. Quantification of Microbacterium sp. C448 showed that the bacterium was present at a low level in all soil samples, between 10³ and 10⁴ gene copies per gram of soil, and that levels were significantly higher in the antibiotic-treated soil samples (Table 1). The archaeal OTU was quantified at 495-527 gene copies per gram of soil. The OTU corresponding to P. infestans was below the limit of detection of ddPCR for these samples, yet was detectable by CaptureSeq (Table 1). These results demonstrate the potential of CaptureSeq to sample complex microbial communities almost completely, with a limit of detection below that of even very sensitive quantification methods such as ddPCR.
CaptureSeq provides a quantitatively accurate view of bacterial abundance
Spiking a synthetic community of 20 microorganisms into carrier DNA from a seed wash allowed a quantitative examination of microbial community profiles generated using CaptureSeq. qPCR quantification of cpn60 DNA from the synthetic community before and after hybridization revealed an enrichment of 3-4 orders of magnitude for cpn60-containing DNA fragments relative to 16S rRNA-encoding genes (Supplemental Figure S3). For the five microorganisms that were quantified, the ∼10-fold reduction in gene copy number observed between the high, medium, and low spike levels was consistent with the starting composition of the synthetic community samples (Supplemental Figure S3). Furthermore, the number of cpn60 gene copies for the microorganisms added to the seed wash DNA extract was highly reproducible within each spike level across the 1000-fold difference analyzed (Supplemental Figure S3). Across the spike levels, there was a linear correlation between qPCR-determined input gene copies and the number of CaptureSeq sequencing reads observed for each of the five targets, with squared Pearson correlation coefficients (r²) of 0.995-1.000. In contrast, libraries profiled by amplicon sequencing gave r² values of 0.532-0.878, with more apparent distortion at the higher spike levels, where targets were most abundant (Figure 4).
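The r² values above summarize how closely read counts track input copy number. A minimal sketch of the squared Pearson correlation is shown below, computed on raw values. This section does not say whether raw or log-transformed counts were used, and the input copies and read counts in the example are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Squared Pearson correlation (r^2)."""
    return pearson_r(x, y) ** 2

if __name__ == "__main__":
    input_copies = [1.0e6, 1.0e7, 1.0e8]  # hypothetical qPCR input per spike level
    mapped_reads = [118, 1_205, 11_950]   # hypothetical CaptureSeq read counts
    print(f"r^2 = {r_squared(input_copies, mapped_reads):.4f}")
```

Because the spike levels span two orders of magnitude, r² on raw values is dominated by the highest level. Computing it on log10 values would weight the three levels more evenly, so the choice of scale matters when comparing methods.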
While all 20 bacteria from the synthetic community were identified using both amplicon and CaptureSeq profiling techniques, only the CaptureSeq method generated profiles that accurately reflected the relative amounts of DNA spiked into the seed wash background (Figure 5 and Supplemental Table S2). In the CaptureSeq libraries, the number of mapped sequencing reads for each member of the synthetic community was within one order of magnitude of the mean for each spike level. In the amplicon libraries, however, read counts for the cpn60 sequences of Bifidobacterium infantis and Bifidobacterium bifidum, which have high G/C content, were more than 10-fold and 100-fold lower than the mean for both the High and Medium spiked samples (Supplemental Figure S4). This improved representation of high G/C Actinobacteria by CaptureSeq was also apparent in the microbial community profiles generated for the soil samples. The 25 taxonomic clusters most under-represented in the amplicon libraries relative to the CaptureSeq libraries had cpn60 sequences with very high G/C content (64-71%) and included several members of the genera Nocardioides, Marmoricola and Pseudonocardia (Supplemental Table S3).
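The G/C comparison above relies on two simple quantities: the G/C percentage of each cpn60 sequence, and the degree to which each taxonomic cluster is under-represented in amplicon libraries relative to CaptureSeq. A sketch of both follows. It assumes relative abundances as inputs and uses a small pseudocount (a hypothetical choice) to avoid dividing by zero for clusters absent from one library.

```python
import math

def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / acgt

def log10_fold(amplicon_frac: float, capture_frac: float, pseudo: float = 1e-6) -> float:
    """log10 ratio of relative abundance, amplicon vs. CaptureSeq.

    Negative values mean the cluster is under-represented in the amplicon
    library; -1 is a 10-fold and -2 a 100-fold deficit.
    """
    return math.log10((amplicon_frac + pseudo) / (capture_frac + pseudo))

if __name__ == "__main__":
    # Hypothetical cluster: 0.01% of amplicon reads, 1% of CaptureSeq reads.
    print(round(log10_fold(0.0001, 0.01, pseudo=0.0), 2))  # about -2 (100-fold)
    print(round(gc_content("GGCGCCATGC"), 1))
```

Ranking clusters by this log ratio and then inspecting the G/C content of the lowest-ranked ones is one way to reproduce the kind of comparison in Supplemental Table S3.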
De novo assembly of the mapped sequencing reads for each microorganism from the synthetic panel for both amplicon and CaptureSeq libraries generated OTU that were >99% identical to the known cpn60 sequences.
DISCUSSION
Targeted capture of cpn60 gene fragments resulted in an approximately 200-fold enrichment of the soil samples for the taxonomic marker of interest, from under 0.1% of reads in the shotgun metagenomic sequencing to over 15% of reads in the CaptureSeq datasets. This level of enrichment enabled very deep sampling of the soil microbial communities (similar to that attained using PCR-based enrichment) with far less sequencing data, and therefore at substantially lower cost. This is of particular importance when the organisms of interest are present at very low abundance, such as Microbacterium sp. C448 in this study. OTU were observed in the CaptureSeq datasets that were present at extremely low levels in the soil genomic DNA, near or below the detection limit of ddPCR. Based on the assay setup and dilution factors we used, the theoretical ddPCR detection limit was 3,570 copies/g soil, assuming detection of 10 copies per assay 33. Although increased sequencing effort can yield more complete coverage of complex microbial communities using shotgun metagenomic sequencing 15,21, applying this method to investigate the taxonomic composition of a sample is not an efficient use of budgetary resources. In addition, CaptureSeq provided a balanced view of the relative abundances of microorganisms within the community. PCR-associated representational bias, which skews the apparent abundance of microbial taxa 34, is a well-known phenomenon 35–37. It likely results from using end-point PCR product to generate the sequencing library, because the exponential accumulation of amplicon compresses the dynamic range of relative DNA abundance in the end product of the reaction. CaptureSeq also improved the representation of high G/C content microorganisms compared to amplification. Difficulty in amplifying high G/C content targets from mixed communities has previously been observed using both 16S and cpn60 taxonomic markers 26,38.
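The conversion from a per-reaction detection limit to copies per gram of soil is a chain of volume and dilution factors, made explicit in the sketch below. The 3.5 g of soil and 5 mL elution volume come from the Methods. The 4 μl template volume and the absence of any further dilution are hypothetical values chosen for illustration, not the reported assay setup.

```python
def lod_copies_per_gram(copies_per_assay, template_ul, elution_ul, soil_g,
                        dilution_factor=1.0):
    """Convert a per-reaction detection limit to gene copies per gram of soil.

    copies/g = copies_per_assay * dilution_factor * (elution_ul / template_ul) / soil_g
    where dilution_factor is any dilution of the extract before it is added
    to the reaction (1.0 = undiluted).
    """
    if min(template_ul, elution_ul, soil_g) <= 0:
        raise ValueError("volumes and soil mass must be positive")
    return copies_per_assay * dilution_factor * (elution_ul / template_ul) / soil_g

if __name__ == "__main__":
    # 10 copies/assay as assumed in the text; 5,000 ul elution and 3.5 g soil
    # from the Methods; 4 ul of undiluted template is a hypothetical value.
    print(round(lod_copies_per_gram(10, 4, 5000, 3.5)))
```

With these illustrative inputs the formula gives about 3,571 copies/g, the same order as the limit stated above. Each additional 10-fold dilution of the extract raises the limit 10-fold.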
De novo assembly of taxonomic clusters from the CaptureSeq datasets into OTU for which probes were not explicitly designed, such as Microbacterium sp. C448, also suggests that off-target cpn60 sequence capture can extend the OTU observed in the dataset beyond the sequences represented in the probe array, including sequences that have not been previously observed. Although CaptureSeq may be biased by the probe sequences employed, it is clearly capable of detecting novel microbes, expanding the microbial community profile beyond previously identified microorganisms.
The overall patterns of OTU abundances from the three methods showed that the amplicon-based method produced a pattern distinct from those of CaptureSeq and shotgun metagenomic sequencing, which were more similar to one another. While the three methods all provided discernibly different overall community profiles, the differences in the relative abundances of microorganisms likely resulted from the different biases inherent in each method. The over-representation in the amplicon datasets of several microorganisms that were very rare in the metagenomic and CaptureSeq libraries was likely the result of amplification effects on relative abundances 16,39. PCR amplification also introduced higher experimental error in alpha diversity parameters (Chao1, Shannon, Simpson) among the biological replicates analyzed, compared to CaptureSeq and shotgun metagenomic sequencing. This observation is consistent with previous studies using 16S rRNA amplicon profiles of soil communities 16,40. Among the three methods, CaptureSeq displayed the lowest inter-sample variation for these diversity parameters. CaptureSeq therefore has the potential to improve insight into microbial community dynamics by reducing experimental variability, and thereby improving reproducibility, compared to both amplicon-based and shotgun metagenomic sequencing. The consistency of the alpha diversity estimates likely reflects the reduced biases inherent in the CaptureSeq protocol and supports meaningful conclusions about community richness and diversity.
The cpn60 taxonomic marker enables de novo assembly of OTU 23,25, providing greater discrimination between closely related microorganisms and facilitating OTU-specific assay design. The cpn60-based CaptureSeq approach generates assembled chaperonin sequences that may also include regions flanking the sequence amplified by the universal primers, as observed for the OTU over 1 kb in length generated for Microbacterium sp. C448 and Methanoculleus marisnigri in this study. This additional sequence information can provide further taxonomic discrimination for many prokaryotes, especially if the assembled region includes the cpn10 co-chaperonin gene, which is adjacent to cpn60 in many bacterial genomes 41. The de novo assembled OTU provided suitable targets for ddPCR, enabling enumeration of targeted microorganisms from each Domain that had initially been identified by sequencing and assembly. Such an approach can be used to identify biological interactions among microorganisms that may explain their relative abundance patterns 23.
Both CaptureSeq and shotgun metagenomic sequencing made it possible to identify OTU from all Domains simultaneously, facilitating the characterization of inter-Domain relationships among microorganisms. The ability to calculate the abundances of organisms as a proportion of the entire pan-Domain community facilitates the identification of inter-Domain relationships and syntrophies. This is of particular importance in settings such as manure or gut health, for identifying the syntrophic relationships between volatile fatty acid-producing Bacteria and methanogenic Archaea 42. In soil, the complex relationship between saprophytic Fungi and Bacteria is critical to examining the role of the microbiome in nutrient cycling 43. Similarly, in the terrestrial pond, the bacterial and eukaryotic components of the microbial ecosystem can be compared numerically, which may provide insights into inter-Domain relationships that affect elemental cycles or other ecosystem services. Amplification of universal targets does not offer this advantage, although it does provide very deep coverage of complex microbial communities. Shotgun metagenomic sequencing does not provide the community coverage of either the amplicon-based or CaptureSeq methods at similar sequencing effort, suggesting that complex microbiomes will likely require additional phylogenetic data for any informed examination of microbial diversity metrics. CaptureSeq enabled deep coverage of complex microbial communities, although the community representation is naturally biased by the hybridization probes used. However, we observed off-target hybridization, as evidenced by the appearance in the CaptureSeq datasets of cpn60 OTU for which probes were not explicitly designed. Optimizing the hybridization parameters may further improve the enrichment of taxonomic markers from complex templates, increasing the efficiency of this approach to microbial community profiling.
Shotgun metagenomics can reasonably be considered the least biased means of determining the taxonomic composition of an environmental sample, and may be a suitable choice when sufficient sequencing resources are available. However, the enduring popularity of amplicon-based profiling is at least partially a result of the high degree of enrichment of taxonomically informative sequence reads that it generates. CaptureSeq provides an alternative that avoids the amplification biases associated with PCR while retaining the sequencing efficiency of amplicon-based profiling.
Molecular microbial community profiling is one of the foundational steps in exploring microbiome structure-function relationships in an experimental system 44–46. To generate and evaluate scientific hypotheses, it is critical to generate a microbiome profile that reflects the natural state as closely as possible, with sufficient sensitivity to evaluate both abundant and rare microorganisms. The cpn60-based method described herein permits taxonomically broad and deep microbial community profiling of complex microbiomes. Thus CaptureSeq has the potential to impact life sciences research wherever microbes are thought to be important, including human health and nutrition 47, agriculture 48, biotechnology 49, and environmental sciences 50. Several methodologies are available for microbial community profiling, including 16S and ITS amplification and sequencing, as well as profiling using 16S rRNA-based capture probes 30. While all microbial community profiling techniques have inherent limitations and biases, CaptureSeq is a suitable alternative to shotgun metagenomic sequencing and universal target amplification, providing quantitative, pan-Domain analysis of complex communities.
MATERIALS AND METHODS
Soil sample preparation
Soil samples were obtained from a long-term study, initiated in 1999, evaluating the effect of annual antibiotic exposure on soil microbial communities, described in Cleary et al. 51. Soil samples evaluated in the present study were obtained in 2013 following 15 sequential annual applications of a mixture of sulfamethazine, chlortetracycline and tylosin, each added at 10 mg kg⁻¹ soil. Soil was sampled 30 days after the spring application of antibiotics. The plots were planted with soybeans (Glycine max cv. Harosoy) immediately after incorporation of the antibiotics. One triplicate set of plots had received no antibiotic treatment, and the other triplicate set had received yearly antibiotic treatments since 1999 as described 51. Genomic DNA was extracted from 3.5 g of each soil sample using the PowerMax Soil DNA isolation kit (Mo-Bio Laboratories, Carlsbad, CA) with a 5 mL elution volume. DNA extracts were quantified using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and stored at −80°C until processing and analysis.
Terrestrial pond sample preparation
A water sample was obtained from a pond on a Saskatchewan farm (51.99°N, 106.46°W) on May 13, 2016. Biological material was recovered from 2 L of water by centrifugation at 20,000 × g for 20 minutes. Total DNA was extracted using a PowerWater DNA extraction kit (Mo-Bio Laboratories, Carlsbad, CA) and quantified as described above.
Seed wash carrier DNA preparation
Genomic DNA used as carrier for spiking 10-fold decreasing amounts of a synthetic community was generated by washing wheat seeds as previously described 23; this DNA is known to lack all of the microorganisms comprising the synthetic community panel 23.
Synthetic community sample preparation
Amplicons corresponding to the cpn60 UT of 20 bacteria associated with the human vaginal tract 25 were cloned into the pGEM-T Easy plasmid (Promega, WI, USA) and purified using the Qiagen Miniprep kit (Qiagen, CA, USA). The synthetic community was formed by combining equimolar concentrations of plasmids containing the cpn60 UT for all 20 microorganisms 25. Dilutions of this mixture (corresponding to 0.4, 0.04, and 0.004 ng plasmid DNA, or approximately 10⁸, 10⁷, and 10⁶ copies of each plasmid) were spiked into a background of 10 ng/μl of wheat seed carrier DNA. Spiked genomic DNA samples prepared in this way were sequenced using cpn60 universal target amplification and CaptureSeq as described below.
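Spike levels expressed as plasmid mass translate to copy numbers through the standard mass-to-copies conversion. In the sketch below, the average mass of 650 g/mol per base pair is a common approximation. The 3,570 bp construct size (the 3,015 bp pGEM-T Easy vector plus an assumed insert of about 555 bp) is a round figure for illustration, and the masses are taken to refer to each plasmid. Under these assumptions, 0.4 ng corresponds to roughly 1 × 10⁸ copies.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # approximate mass of one double-stranded base pair

def plasmid_copies(mass_ng: float, length_bp: int) -> float:
    """Number of plasmid molecules in `mass_ng` nanograms of dsDNA."""
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (length_bp * BP_MASS_G_PER_MOL)

CONSTRUCT_BP = 3_015 + 555  # pGEM-T Easy + hypothetical cpn60 UT insert length

if __name__ == "__main__":
    for ng in (0.4, 0.04, 0.004):
        print(f"{ng} ng -> {plasmid_copies(ng, CONSTRUCT_BP):.2e} copies")
```

The three spike levels come out at about 1.04 × 10⁸, 10⁷ and 10⁶ copies. Insert length has little effect because the vector dominates the construct size.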
The efficacy of the CaptureSeq hybridization was assessed prior to sequencing using quantitative PCR (qPCR) targeting the plasmids added to the seed wash background. qPCR primers and amplification conditions were as described previously 52. Total bacteria were enumerated using qPCR targeting the 16S ribosomal RNA-encoding gene as described previously 53.
Amplicon-based sequencing
The cpn60 UT was amplified from synthetic community-spiked DNA or soil genomic DNA samples using 40 cycles of PCR with the type I chaperonin universal primer cocktail containing a 1:3 ratio of H279/H280:H1612/H1613 26, with the following cycling conditions: 95°C for 5 min; 40 cycles of 95°C for 30 s, 42-60°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 2 min. Replicate reactions from each annealing temperature for each sample were pooled, gel purified using the Blue Pippin Prep system (Sage Science, MA, USA) with a 2% agarose cassette, and concentrated using Amicon 30K 0.5 ml spin columns (EMD Millipore, MA, USA). Amplicons from all samples were prepared for sequencing using the NEBNext Illumina library preparation kit (New England Biolabs, MA, USA) and sequenced with 400 forward cycles of v2 MiSeq chemistry.
CaptureSeq array design
Capture probes were designed based on all type I and type II chaperonin sequences in the public domain (i.e., cpnDB; www.cpndb.ca) 8. A total of 15,733 probes were designed to be complementary to these type I and type II chaperonin sequences. Probes were designed by extracting 120 bp sequences from the reference database at 60 bp increments, so that each probe should overlap the next by 50% in a tiling fashion. The custom oligos were bound to magnetic beads in equimolar concentration as a custom MYbaits array by MYcroarray (Ann Arbor, MI, USA).
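The 120 bp / 60 bp tiling rule can be sketched as follows. This is a hypothetical reconstruction for illustration only. The original design's handling of sequence ends, redundancy between similar references, and strand orientation is not described, so the end-anchored final window and the reverse-complement helper are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def tile_probes(seq: str, probe_len: int = 120, step: int = 60):
    """Return (start, window) pairs tiled across `seq`.

    Windows start every `step` bases, so consecutive windows overlap by
    probe_len - step bases (50% for 120/60). If the last window stops short
    of the sequence end, one extra window anchored at the 3' end is added
    (an assumption). Sequences shorter than `probe_len` yield no probes.
    """
    if len(seq) < probe_len:
        return []
    starts = list(range(0, len(seq) - probe_len + 1, step))
    if starts[-1] + probe_len < len(seq):
        starts.append(len(seq) - probe_len)
    return [(s, seq[s:s + probe_len]) for s in starts]

if __name__ == "__main__":
    reference = "ACGT" * 75  # hypothetical 300 bp reference sequence
    baits = [reverse_complement(w) for _, w in tile_probes(reference)]
    print(len(baits), "probes of", len(baits[0]), "bp")
```

For a 300 bp reference this yields four probes starting at positions 0, 60, 120 and 180. Applying the rule across a reference database would typically also involve collapsing identical or near-identical windows, which this sketch does not attempt.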
Shotgun metagenomic sequencing and CaptureSeq preparation
Genomic DNA from each of the soil samples was diluted to 2.5 ng/μl and split into two aliquots of 100 μl each for shearing using a water bath sonicator, as described 54. Shotgun metagenomic sequencing libraries were prepared directly from one aliquot of each sheared genomic DNA sample using the NEBNext Illumina library preparation kit according to the manufacturer’s directions (New England Biolabs, MA, USA). Samples were then sequenced with 2×250 bp cycles of v2 MiSeq chemistry (Illumina, CA, USA).
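Preparing two 100 μl aliquots at 2.5 ng/μl (250 ng of DNA each) is a standard C₁V₁ = C₂V₂ dilution. The sketch below assumes a hypothetical stock concentration of 25 ng/μl; the actual stock concentration of each extract would come from DNA quantification.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float,
                     final_volume_ul: float) -> tuple[float, float]:
    """Return (stock_ul, diluent_ul) to reach the target concentration, via C1*V1 = C2*V2."""
    if stock_ng_per_ul <= 0 or target_ng_per_ul < 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target exceeds stock; dilution cannot concentrate DNA")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical 25 ng/ul stock, diluted to 2.5 ng/ul in 200 ul (two 100 ul aliquots).
    print(dilution_volumes(25.0, 2.5, 200.0))
```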
To generate the CaptureSeq libraries, the second aliquot of each sheared genomic DNA sample was subjected to end repair and index addition using the NEBNext kit as above, and then hybridized to the capture probe array as described 54. The chaperonin-enriched products were then sequenced with 2×250 bp cycles of v2 MiSeq chemistry (Illumina, CA, USA).
Sequencing analysis
To compare the number of output sequencing reads across spiking levels, sequencing reads from the synthetic community-spiked samples were down-sampled to the smallest library size for each profiling technique (30,091 reads for amplicon and 506,247 reads for CaptureSeq libraries) and mapped to a reference set of cpn60 UT sequences for the 20 microorganisms in the panel by local paired-end alignment using bowtie2 (v. 2.2.3) 55.
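Down-sampling to the smallest library size amounts to drawing reads at random, without replacement, until every library has the same depth. In this sketch each library is represented as a hypothetical list with one label per read: the panel reference it aligned to, or a placeholder such as `"unmapped"`. Because each read keeps its label, subsampling before or after mapping gives the same counts.

```python
import random
from collections import Counter


def downsample_to_smallest(libraries: dict[str, list[str]], seed: int = 0) -> dict[str, Counter]:
    """Subsample every library, without replacement, to the size of the smallest one.

    libraries: library name -> list of per-read labels (e.g., the cpn60 UT reference
    each read mapped to, or "unmapped"). Returns per-label read counts at equal depth.
    """
    depth = min(len(reads) for reads in libraries.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return {name: Counter(rng.sample(reads, depth)) for name, reads in libraries.items()}


if __name__ == "__main__":
    libs = {
        "spike_high": ["ref_A"] * 60 + ["ref_B"] * 40,
        "spike_low": ["ref_A"] * 5 + ["unmapped"] * 15,
    }
    print(downsample_to_smallest(libs))
```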
A reference database of all publicly available chaperonin sequences was generated by selecting a list of seven chaperonin protein sequences representing each of the following taxonomic groups: fungi, bacteria, archaea, plant mitochondria, plant chloroplasts, and animal mitochondria. These sequences were used as queries for a BLAST search of GenBank using default blastp parameters. Matching protein sequences were manually vetted to generate a list of 30,141 protein identifiers, which were then used to retrieve the corresponding 30,120 nucleotide sequences available in GenBank according to the procedure described in Supplemental Information. The accession numbers of these nucleotide sequences are provided in Supplemental Dataset S2. The taxonomic breadth retrieved by this method was similar to that represented in the 16S and ITS reference datasets (Supplemental Dataset S3). Sequencing reads from all soil samples were grouped into taxonomic clusters by paired local alignment to this reference set of chaperonin genes using bowtie2. The sequencing libraries were down-sampled to the size of the smallest shotgun metagenomic library (2,777 mapped paired reads), and the relative abundance of each resulting taxonomic cluster was used as the basis for comparing the alpha and beta diversity metrics of the three profiling methods at equivalent sampling effort.
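Once each read pair has been assigned to a reference chaperonin sequence, cluster relative abundances follow from a simple tally. The sketch assumes a hypothetical mapping from reference accession to cluster label; how references were grouped into clusters in this study is not detailed here, and the accession and taxon names below are placeholders.

```python
from collections import Counter


def cluster_relative_abundance(read_hits, accession_to_cluster):
    """Tally mapped read pairs per taxonomic cluster and convert tallies to relative abundances.

    read_hits: iterable of reference accessions, one per mapped read pair.
    accession_to_cluster: dict mapping each reference accession to a cluster label.
    """
    counts = Counter(accession_to_cluster[acc] for acc in read_hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cluster: n / total for cluster, n in counts.items()}


if __name__ == "__main__":
    hits = ["ACC_1", "ACC_2", "ACC_3", "ACC_1"]            # placeholder accessions
    clusters = {"ACC_1": "Bacillus", "ACC_2": "Bacillus", "ACC_3": "Fusarium"}
    print(cluster_relative_abundance(hits, clusters))
```

Computing relative abundances only after every library has been down-sampled to the same number of mapped read pairs keeps the comparison between methods at equal sampling effort.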
De novo OTU assembly and quantification
Read pairs from target taxonomic clusters were assembled de novo into cpn60 OTUs using Trinity (v. 2.4.0) with a k-mer size of 31. OTU-specific primer and hydrolysis probe sets were designed using Primer3 56 or Beacon Designer (v.7) (Premier Biosoft, Palo Alto, CA, USA), as described previously 57. Annealing temperatures were optimized for each reaction by gradient PCR with ddPCR Supermix for Probes (Bio-Rad, Mississauga, ON, Canada), using 900 nM of each primer and 250 nM of hydrolysis probe in a 20 μl reaction volume. Primer/probe sequences and optimized amplification conditions are shown in Supplemental Table S1. Template DNA was digested with EcoRI at 37°C for 60 minutes before amplification, and 2–5 μl of digested DNA was used as template for droplet digital PCR (ddPCR). Emulsions were formed using a QX100 droplet generator (Bio-Rad, Hercules, CA, USA), and amplifications were carried out using a C1000 Touch thermocycler (Bio-Rad). Reactions were analyzed using a QX100 droplet reader (Bio-Rad) and quantified using QuantaSoft (v.1.6.6) (Bio-Rad). Results were converted to copies per gram of soil extracted by accounting for sample preparation and dilution. For the prepared CaptureSeq libraries, results were converted to copies per μl by accounting for dilution factors.
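ddPCR quantification rests on a Poisson correction: if a fraction p of droplets is positive, the mean number of target copies per droplet is λ = −ln(1 − p). QuantaSoft reports the resulting concentration directly; the sketch below only illustrates the underlying arithmetic and the scaling back to copies per gram of soil. The droplet volume (0.85 nl) is a placeholder that should be checked against the value used by the QuantaSoft version in question, and all preparation parameters in the conversion (reaction volume, dilution factor, elution volume, soil mass) are hypothetical.

```python
import math


def ddpcr_copies_per_ul(positive: int, total: int, droplet_nl: float = 0.85) -> float:
    """Poisson-corrected target concentration (copies per ul of reaction) from droplet counts."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total (saturated wells cannot be quantified)")
    mean_copies_per_droplet = -math.log(1.0 - positive / total)  # lambda = -ln(1 - p)
    return mean_copies_per_droplet / (droplet_nl * 1e-3)          # 1 nl = 1e-3 ul


def copies_per_g_soil(copies_per_ul_rxn: float, rxn_ul: float, template_ul: float,
                      dilution_factor: float, elution_ul: float, soil_g: float) -> float:
    """Scale a reaction concentration back to copies per gram of extracted soil."""
    copies_per_ul_template = copies_per_ul_rxn * rxn_ul / template_ul
    copies_per_ul_extract = copies_per_ul_template * dilution_factor
    return copies_per_ul_extract * elution_ul / soil_g


if __name__ == "__main__":
    conc = ddpcr_copies_per_ul(positive=1200, total=15000)   # hypothetical droplet counts
    print(round(conc, 1), "copies/ul reaction")
    print(f"{copies_per_g_soil(conc, 20, 5, 10, 100, 0.25):.3e} copies/g soil")
```

Wells in which every droplet is positive are rejected because λ is undefined when p = 1; such samples need further dilution before quantification.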
Alpha diversity analysis
To compare richness and diversity metrics across the three profiling techniques, mapped sequencing reads were down-sampled to depths of 250–2,750 reads to simulate a uniform sampling effort. Metrics were averaged across 100 rarefied (randomly subsampled) datasets using the multiple_rarefactions.py and alpha_diversity.py scripts from QIIME (v. 1.8.0) 58.
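The rarefaction-and-average procedure can be illustrated with two common alpha diversity metrics, observed richness and the Shannon index. This is a sketch of the idea, not the QIIME implementation: it uses a base-2 logarithm for the Shannon index (the base should be confirmed against the metric definition in the QIIME version used), and the depth series in the usage example is a hypothetical choice of step within the 250–2,750 read range.

```python
import math
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, rng: random.Random) -> Counter:
    """Draw `depth` reads without replacement from per-taxon read counts."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds library size")
    return Counter(rng.sample(pool, depth))


def shannon(counts: Counter, base: float = 2.0) -> float:
    """Shannon index H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base) for n in counts.values() if n > 0)


def mean_alpha(counts: dict[str, int], depth: int, iterations: int = 100, seed: int = 0):
    """Average observed richness and Shannon index over repeated rarefactions at one depth."""
    rng = random.Random(seed)
    richness, diversity = [], []
    for _ in range(iterations):
        sub = rarefy(counts, depth, rng)
        richness.append(len(sub))       # taxa observed at this depth
        diversity.append(shannon(sub))
    return sum(richness) / iterations, sum(diversity) / iterations


if __name__ == "__main__":
    sample = {"taxon_a": 2000, "taxon_b": 600, "taxon_c": 150, "taxon_d": 30}
    for depth in range(250, 2751, 250):  # hypothetical 250-read step
        print(depth, mean_alpha(sample, depth, iterations=20))
```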
Where comparisons of estimated community coverage needed to account for total sequencing effort, read thresholds were transformed to reflect the total sequencing effort for each sample.
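The exact transformation is not specified here. One plausible reading, assumed in this sketch, is proportional scaling: if only a fraction of a library's reads map to chaperonin references, a threshold of mapped reads corresponds to that threshold divided by the mapped fraction in total sequenced reads. The library sizes in the example are hypothetical.

```python
def total_effort(mapped_threshold: float, mapped_reads: int, total_reads: int) -> float:
    """Convert a mapped-read threshold into the equivalent number of total sequenced reads.

    Assumption: mapped reads are a constant fraction of all reads in the library.
    """
    if mapped_reads <= 0 or total_reads < mapped_reads:
        raise ValueError("need 0 < mapped_reads <= total_reads")
    return mapped_threshold * total_reads / mapped_reads


if __name__ == "__main__":
    # Hypothetical libraries: low on-target fraction (shotgun-like) vs. high (capture-like).
    print(total_effort(2750, mapped_reads=3_000, total_reads=1_000_000))
    print(total_effort(2750, mapped_reads=500_000, total_reads=1_000_000))
```

Under this assumption, the same mapped-read threshold represents far more sequencing for a library with a low on-target fraction, which is why comparisons based on mapped reads alone can understate differences in cost between methods.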
Beta diversity analysis
To compare community similarity between sequencing methods, mapped sequencing reads were down-sampled to the size of the smallest metagenomic library (2,777 mapped reads). For intra-technique comparisons, mapped sequencing reads were down-sampled to the smallest library size within each profiling method: 2,777 reads for metagenomic, 127,642 for CaptureSeq, and 27,388 for amplicon libraries. Bray-Curtis distances within and between techniques were calculated and ordinated by principal coordinate analysis (PCoA) using the vegan package (v. 2.4.2) in R (v. 3.2.4).
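For readers unfamiliar with the ordination, the sketch below computes Bray-Curtis dissimilarities and a classical (Gower) PCoA with NumPy. It is an illustration of the technique, not the vegan code used in this study. Because Bray-Curtis is not a Euclidean distance, PCoA can produce negative eigenvalues; this sketch simply discards them and applies no correction (such as the Lingoes or Cailliez adjustments).

```python
import numpy as np


def bray_curtis(x, y) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i) over shared taxa."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    denom = (x + y).sum()
    if denom == 0:
        raise ValueError("both samples are empty")
    return float(np.abs(x - y).sum() / denom)


def distance_matrix(samples) -> np.ndarray:
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = bray_curtis(samples[i], samples[j])
    return d


def pcoa(d, k: int = 2):
    """Classical PCoA: coordinates on the first k axes and their share of positive eigenvalues."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                 # double-centred Gower matrix
    vals, vecs = np.linalg.eigh(b)              # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    positive_total = vals[vals > 0].sum()
    if positive_total <= 0:
        raise ValueError("no positive eigenvalues (are all samples identical?)")
    top = np.clip(vals[:k], 0.0, None)          # negative eigenvalues are dropped
    return vecs[:, :k] * np.sqrt(top), top / positive_total


if __name__ == "__main__":
    counts = [[50, 30, 20, 0], [45, 35, 15, 5], [5, 10, 40, 45]]  # toy equal-depth profiles
    coords, explained = pcoa(distance_matrix(counts))
    print(coords.round(3), explained.round(3))
```

Because all libraries in a comparison are first down-sampled to the same depth, computing Bray-Curtis on raw counts is equivalent to computing it on relative abundances.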
Authors’ contributions
ET, SH, TD and AC performed collection, processing and sequencing of all samples. ML, LM and JT performed bioinformatics analysis of sequencing data. All authors contributed to writing the manuscript.
Competing interests
The author(s) declare no competing financial or non-financial interests.
Acknowledgements
This work was funded through Agriculture and Agri-Food Canada A-base project 1562: Optimizing soil health and protecting environmental quality through judicious manure management, and innovative cover cropping.