Abstract
Due to their fundamentally different biology, archaea are consistently overlooked in conventional 16S rRNA gene amplicon surveys. Herein, we evaluate different methodological set-ups to detect and quantify archaea signatures in human samples (nose, oral, appendix, stool, and skin) using amplicon sequencing and quantitative PCR.
With our optimized protocol, we were able to increase the detection of archaeal RSVs from one (using a so-called “universal” approach) to 81 RSVs in a representative sample set. Moreover, we confirmed the presence of about 5% archaeal signatures in the human gut, but found, unexpectedly, an almost 1:1 ratio of archaeal to bacterial 16S rRNA genes in appendix and nose samples. This finding indicates a high prevalence of archaeal signatures in body regions thus far not analyzed for the presence of archaea using appropriate methods.
In order to assess the archaeome diversity and archaeal abundance, a specific archaea-targeting methodology is required, for which we propose two standard procedures. These methodologies might not only prove useful for analyzing the human archaeome in more detail, but could also be used for other holobionts’ samples.
Introduction
The importance of microbial communities to human and environmental health motivates microbiome research to uncover their diversity and function. While the era of metagenomics and metatranscriptomics has begun, 16S rRNA gene amplicon sequencing still remains one of the most used methods to explore microbial communities, mainly due to the relatively low cost, the number of available pipelines for data analysis, and the comparably low computational power required.
It has been recognized that methodological issues in sample processing can significantly influence the outcome of microbiome studies, affecting comparability between different studies (Clooney et al., 2016; de la Cuesta-Zuluaga and Escobar, 2016) or leading to an over- or under-estimation of certain microbial clades (Eisenstein, 2018; Eloe-Fadrosh et al., 2016). For better comparability among different studies, standard operating procedures for sampling, storing samples, DNA extraction, amplification and analysis were set up (e.g. the Earth Microbiome Project (Gilbert et al., 2014) and the Human Microbiome Project (Methé et al., 2012)). This includes the use of so-called “universal primers” (Caporaso et al., 2012; Klindworth et al., 2013; Walters et al., 2016) to cover the broadest possible prokaryotic diversity.
The human microbiome consists of bacteria, archaea, eukaryotes and viruses. The overwhelming majority of microbiome studies are bacteria-centric, but in recent years, awareness of eukaryotes (in particular fungi) and viruses has increased (Halwachs et al., 2017; Seed, 2014; Zou et al., 2016). However, most microbiome studies still remain blind to the human archaeome (Eisenstein, 2018; Moissl-Eichinger et al., 2018). A few of the underlying reasons for the under-representation of archaea in microbiome studies are (i) primer mismatches of the “universal primers” (Raymann et al., 2017), (ii) the sometimes too low abundance of archaeal DNA in the studied samples, (iii) improper DNA extraction methods, and (iv) the incompleteness of the 16S rRNA gene reference databases. This leads not only to an insufficient taxonomic assignment (unclassified reads) of archaea, but also, in the worst case, to the removal of archaeal signatures from the retrieved datasets (Ding and Schloss, 2014; Fischer et al., 2016). Moreover, clinical interest in archaea is minor, because there are no known or proven archaeal pathogens yet (Gill et al., 2006).
Nevertheless, (methanogenic) archaea are part of the commensal microorganisms inhabiting the human body, being regularly detected in the oral cavity and the gastrointestinal tract (Chaudhary et al., 2015; Gaci et al., 2014; Horz and Conrads, 2011; Nkamga et al., 2017); in the latter they sometimes even outnumber the most abundant bacterial species (14%; Tyakht et al., 2013). Most human archaea studies use either cultivation or qPCR methods (Grine et al., 2017; Koskinen et al., 2017; van de Pol et al., 2017; Wampach et al., 2017), and only a few 16S rRNA gene-based archaea-centric studies are available (Koskinen et al., 2017; Moissl-Eichinger et al., 2017). These recent studies have shown that archaea are also present in the human respiratory tract (Koskinen et al., 2017) and on human skin in considerable amounts (Moissl-Eichinger et al., 2017; Probst et al., 2013). Furthermore, Koskinen et al. (2017) have shown for the first time that archaea reveal a body-site-specific pattern, similar to bacteria: the gastrointestinal tract being dominated by methanogens, the skin by Thaumarchaeota, the lungs by Woesearchaeota, and the nose archaeal communities being composed mainly of methanogens and Thaumarchaeota. Altogether, this indicates a substantial presence of archaea in some, or even all, human tissues.
As a logical consequence of our previous studies, we have started to optimize the detection and quantification methods for archaea as human commensals. We tested, in silico and experimentally, 27 different 16S rRNA gene-targeting primer pair combinations suitable for NGS amplicon sequencing, to detect the archaeal diversity in samples from different body sites, including the respiratory tract (nose samples), the digestive tract (oral samples, stool and appendix specimens), and skin. Furthermore, we optimized qPCR protocols for quantifying the archaeal 16S rRNA gene to assess the bacteria/archaea ratios. Our results culminate in a proposed standard operating procedure for archaeal diversity analysis and quantification in human samples.
Material and methods
Ethics statement
Research involving human material was performed in accordance with the Declaration of Helsinki and was approved by the local ethics committees (the Ethics Committee at the Medical University of Graz, Graz, Austria). (Bacterial) microbiome studies of some of the samples used in this study have already been published elsewhere (oral, nose, skin samples: Klymiuk et al., 2016; Koskinen et al., 2018; Santigli et al., 2016). Details of the ethics approvals obtained are given there. Appendix samples and stool samples were obtained under the ethics votes 25-469 ex12/13 and 27-151 ex 14/15.
Selection of samples and DNA extraction
Five representative sample types from various body sites, including the respiratory tract (nose swabs), the digestive tract (oral biofilm, appendix biopsy and stool samples) and skin (skin swabs), were selected for the comparison of amplification-based protocols. All samples underwent pre-screening for archaea-positive qPCR and NGS signals with previously described protocols (Koskinen et al., 2017).
The nose swabs were obtained from healthy volunteers and were taken from the olfactory mucosa located at the ceiling of the nasal cavity using ultra minitip nylon flocked swabs (Copan, Brescia, Italy; n=7) (Koskinen et al., 2018). The oral samples were obtained using a standardized protocol for paper point sampling (Santigli et al., 2017) from healthy volunteers who participated in a microbiome study investigating subgingival biofilm formation in children (n=7) (Santigli et al., 2016). Appendix samples were obtained from collaboration partners at the Department of Pediatric and Adolescent Surgery and the Institute of Pathology, both at the Medical University of Graz (appendix specimens were obtained during pediatric appendectomies from either acute or ulcerous appendicitis; n=6). Stool samples were obtained from healthy adult volunteers (n=5) and from one patient with above-average methane production after metronidazole treatment (n=1; this sample was used for comparing different amplification protocols). Skin samples were obtained from healthy adult volunteers from either the back (n=1; this sample was used for comparing different amplification protocols) or the left forearm, using BD Culture Swabs™ (Franklin Lakes, New Jersey, USA; n=7).
In all cases, the genomic DNA was extracted by a combination of mechanical and enzymatic lysis. However, depending on the sample type, different protocols were chosen. For the stool samples, around 200 mg of sample was used for DNA extraction with the E.Z.N.A. stool DNA kit according to the manufacturer’s instructions. The DNA from the appendix samples was obtained using the AllPrep DNA/RNA/Protein Mini Kit (QIAGEN); beforehand, small pieces of cryotissue were homogenized 3 times for 30 s at 6500 rpm using the MagNALyzer® instrument (Roche Molecular Systems) with buffer RLT and β-mercaptoethanol (according to the manufacturer’s instructions). For the nose samples and the skin samples from the forearm, the DNA was extracted using the FastDNA Spin Kit (MP Biomedicals, Germany) according to the provided instructions. The DNA from the oral samples and from the skin samples from the back was isolated using the MagnaPure LC DNA Isolation Kit III (Bacteria, Fungi; Roche, Mannheim, Germany) as described by Santigli et al. (2016) and Klymiuk et al. (2016).
NOTE: Sample set 1 (one representative sample from each body site) was used to initially evaluate the primers and methods, whereas sample set 2 (7 nose samples, 7 oral samples, 6 appendices, 7 skin samples) was then used for assessing the archaeal diversity and quantity, based on the chosen, optimized protocol.
16S rRNA gene primer selection and pre-analysis in silico evaluation
Different primer pairs targeting the archaeal 16S rRNA gene were selected from recent publications (Klindworth et al., 2013; Koskinen et al., 2017). The main criteria for selection were: (a) specificity for archaea in silico, (b) low or no amplification of eukaryotic DNA, and (c) an amplicon length between 150 and 300 bp, suitable for NGS platforms such as Illumina MiSeq. In addition, three “universal” primer pairs (Caporaso et al., 2012; Klindworth et al., 2013; Walters et al., 2016) were tested in parallel to determine their efficiency in detecting archaea in human samples. Full information on the selected primer pairs is given in Table 1. In silico evaluation of the selected primer pairs was performed using the online tool TestPrime 1.0 (Klindworth et al., 2013) and the non-redundant SILVA database SSU132 (Quast et al., 2013). Two of the primers were also tested using TestProbe 3.0 (Klindworth et al., 2013) and the SILVA database SSU132 to assess their individual coverage for the archaeal domain.
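The idea behind an in silico coverage estimate can be illustrated with a short Python sketch. This is not the TestPrime algorithm, only a toy model under stated assumptions: a primer "covers" a sequence if any window of the forward strand matches the degenerate primer with at most a given number of mismatches. The function names and the four target sequences are hypothetical; the primer is the 519F sequence quoted in the Results.

```python
# IUPAC nucleotide codes mapped to the bases each code permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the target base is not permitted by the primer code."""
    site = site.upper().replace("U", "T")
    return sum(1 for p, s in zip(primer.upper(), site) if s not in IUPAC[p])


def primer_binds(primer, target, max_mismatches=0):
    """True if any window of the target matches the primer within the mismatch budget."""
    n = len(primer)
    return any(
        count_mismatches(primer, target[i:i + n]) <= max_mismatches
        for i in range(len(target) - n + 1)
    )


def coverage(primer, targets, max_mismatches=0):
    """Fraction of target sequences with at least one binding site."""
    hits = sum(primer_binds(primer, t, max_mismatches) for t in targets)
    return hits / len(targets)


# Toy data (hypothetical sequences, forward strand only).
primer_519f = "CAGCMGCCGCGGTAA"
targets = [
    "TTCAGCAGCCGCGGTAATT",  # perfect match (M = A)
    "TTCAGCCGCCGCGGTAATT",  # perfect match (M = C)
    "TTCAGCTGCCGCGGTAATT",  # one mismatch at the degenerate position
    "GGGGGGGGGGGGGGGGGGG",  # no binding site
]
print(coverage(primer_519f, targets, max_mismatches=0))
print(coverage(primer_519f, targets, max_mismatches=1))
```

A real evaluation additionally has to test the reverse primer against the reverse complement and to check that both sites lie at a plausible distance; the sketch leaves these steps out.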
PCR and library preparation
For archaea-targeting PCR, a nested approach was chosen to increase the specificity for archaea and to avoid the formation of primer dimers caused by the tag, necessary for Illumina sequencing, attached to the primers (Koskinen et al., 2017; Peng et al., 2015).
In addition to the nested approach, a standard PCR was performed with three different universal primer pairs and one archaeal primer pair for comparison, and to test whether a universal approach is capable of covering archaea in human samples in sufficient depth. All primer combinations (27 in total) used for the PCR reactions are provided in Table 2.
For the first PCR, each reaction was performed in a final volume of 20 µl containing: TAKARA Ex Taq® buffer with MgCl2(10 X; Takara Bio Inc., Tokyo, Japan), primers 500 nM, BSA (Roche Lifescience, Basel, Switzerland) 1 mg/ml, dNTP mix 200 µM, TAKARA Ex Taq® Polymerase 0.5 U, water (Lichrosolv®; Merck, Darmstadt, Germany), and DNA template (1-50 ng/µl).
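As a worked example, the pipetting volumes for one 20 µl reaction follow from C1·V1 = C2·V2. The sketch below uses the final concentrations stated above; the stock concentrations, the template volume, and the reading of "primers 500 nM" as 500 nM per primer are assumptions made only for illustration and are not taken from the protocol.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul):
    """Solve C1*V1 = C2*V2 for the stock volume V1 (both concentrations in the same unit)."""
    return final_conc * reaction_volume_ul / stock_conc


REACTION_UL = 20.0  # final volume of the first PCR (from the text)

# Stock concentrations and template volume below are ASSUMED, not from the source.
mix = {
    "10X buffer": stock_volume_ul(1, 10, REACTION_UL),        # 10X -> 1X
    "forward primer": stock_volume_ul(0.5, 10, REACTION_UL),  # 500 nM from a 10 µM stock
    "reverse primer": stock_volume_ul(0.5, 10, REACTION_UL),  # 500 nM from a 10 µM stock
    "BSA": stock_volume_ul(1, 20, REACTION_UL),               # 1 mg/ml from a 20 mg/ml stock
    "dNTP mix": stock_volume_ul(200, 2500, REACTION_UL),      # 200 µM from a 2.5 mM stock
    "polymerase": 0.5 / 5,                                    # 0.5 U from a 5 U/µl stock
    "template": 1.0,                                          # assumed template volume
}
# Water fills the reaction up to the final volume.
mix["water"] = REACTION_UL - sum(mix.values())

for component, volume in mix.items():
    print(f"{component:>15}: {volume:5.2f} µl")
```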
After the first PCR, the resulting amplicons were purified to remove primer remnants. This purification was performed with three different kits to compare the different yields and efficiencies, namely MinElute PCR Purification kit (Qiagen; Hilden, Germany), Monarch® PCR & DNA Cleanup Kit (5 μg)(New England Biolabs GmbH; Ipswich, USA), or innuPREP DOUBLEpure Kit (Analytik Jena, Germany) as indicated in Table 2. The purified PCR product was eluted in 10 µl water (Lichrosolv®; Merck, Darmstadt, Germany).
Two µl of the resulting, purified PCR products were transferred into a subsequent 2nd PCR containing the following mixture: TAKARA Ex Taq® buffer with MgCl2(10 X; Takara Bio Inc., Tokyo, Japan), primers 500 nM, BSA (Roche Lifescience, Basel, Switzerland) 1 mg/ml, dNTP mix 200 µM, TAKARA Ex Taq® Polymerase 0.5 U, and water (Lichrosolv®; Merck, Darmstadt, Germany) up to a volume of 25 µL.
The PCR cycling conditions are listed in Table 3, according to the primer pairs used. For all primer pairs, annealing temperatures were either determined experimentally by gradient PCR or adopted from literature information.
Sample set 2 was amplified using the primer combination 344F-1041R/519F-806R (Table 2). For the first PCR, each reaction was performed in a final volume of 20 µl as described above. After the first PCR, the PCR products were purified using Monarch® PCR & DNA Cleanup Kit (5 μg; New England Biolabs GmbH). For the second PCR, the final volume was 25 µl, as described above, only the volume of the DNA template varied: 2 µl purified PCR product for stool and nose samples, 4 µl for all other samples.
Next generation sequencing, bioinformatics and statistical analyses
Amplicons were sequenced at the ZMF Core Facility Molecular Biology in Graz, Austria, using the Illumina MiSeq platform (Klymiuk et al., 2016). The MiSeq amplicon sequence data was deposited in the European Nucleotide Archive under the study accession number PRJEB27023.
The obtained MiSeq sequence data were processed using the open-source package DADA2 (Divisive Amplicon Denoising Algorithm; Callahan et al., 2016) as described previously (Mora et al., 2016). Briefly, DADA2 turns paired-end fastq files into merged, denoised, chimera-free, inferred sample sequences called ribosomal sequence variants (RSVs). The taxonomic affiliations were determined using the SILVA v128 database as the reference database (Quast et al., 2013). In the resulting RSV table, each row corresponds to a non-chimeric inferred sample sequence with a separate taxonomic classification. RSV tables are given in Supplementary Tables 1(a-c) and 2 (available on request).
Negative controls (extraction controls and no-template controls) were included during PCR amplification. RSVs detected in both the negative controls and the samples were either subtracted or completely removed from the data sets. RSVs detected in the negative controls are provided in Supplementary Table 3 (available on request).
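The two ways of handling control-derived RSVs mentioned above can be sketched as follows. This is a minimal illustration, not the procedure actually scripted for this study; the function names, RSV identifiers, and read counts are hypothetical, and the rule used to choose between the two strategies is not specified here.

```python
def subtract_controls(sample_counts, control_counts):
    """Subtract control read counts per RSV and drop RSVs that fall to zero or below."""
    cleaned = {}
    for rsv, reads in sample_counts.items():
        remaining = reads - control_counts.get(rsv, 0)
        if remaining > 0:
            cleaned[rsv] = remaining
    return cleaned


def remove_control_rsvs(sample_counts, control_counts):
    """Drop every RSV that was observed in a negative control, regardless of abundance."""
    return {
        rsv: reads
        for rsv, reads in sample_counts.items()
        if control_counts.get(rsv, 0) == 0
    }


# Hypothetical read counts for one sample and the pooled negative controls.
sample = {"RSV_1": 1200, "RSV_2": 40, "RSV_3": 15}
controls = {"RSV_2": 10, "RSV_3": 30}

print(subtract_controls(sample, controls))    # RSV_2 keeps 30 reads, RSV_3 is dropped
print(remove_control_rsvs(sample, controls))  # only RSV_1 remains
```

Subtraction retains abundant taxa that also appear at low levels in controls, whereas complete removal is the more conservative choice.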
Processing of sequencing data was performed using the in-house Galaxy set-up (Klymiuk et al., 2016) and subsequent statistical analyses were performed in R version 3.4.3 (R Core Team, 2013). Alpha diversity was calculated using the Shannon index. In order to identify differences in archaeal diversity, a Wilcoxon rank test was performed. The diversity of the archaeal communities within sample set 2 was determined using two diversity metrics (Shannon index and richness). Analysis of variance (ANOVA) was performed to test for differences in archaeal diversity based on the body location. Principal Coordinates Analysis (PCoA) based on Bray-Curtis distances was used to visualize differences between the samples from different body sites. Redundancy analysis (RDA) was used to analyze the association between archaeal community composition and body site. RDA, alpha diversity and PCoA analyses were performed using Calypso version 8.62 (Zakrzewski et al., 2016). The RSV tables obtained were used to summarize taxon abundance at different taxonomic levels. The taxonomic profiles obtained at the genus level were used to generate bar graphs for all samples.
A phylogenetic tree was constructed with the archaeal RSVs obtained from the universal approach, the archaeal primer pair 519F-806R, and the archaea-specific primer pair combination 344F-1041R/519F-806R. The alignment was performed using SILVA SINA (Pruesse et al., 2012), and the 5 most closely related available sequences (neighbors) were downloaded together with the aligned sequences. All sequences were cropped to the same length and used to construct a maximum-likelihood tree in MEGA7 (Kumar et al., 2016) with 500 bootstrap replicates. The Newick output was further processed with the iTOL interactive online platform (Letunic and Bork, 2007).
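For readers unfamiliar with the diversity measures named above, the following sketch computes richness, the Shannon index, and the Bray-Curtis dissimilarity from raw RSV counts. It is an illustration only: the analyses in this study were run in Calypso, and the use of the natural logarithm and of unnormalized counts here are assumptions. The example count vectors are hypothetical.

```python
import math


def richness(counts):
    """Number of RSVs observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over RSVs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same RSVs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


# Hypothetical RSV counts for two samples (same RSV order in both vectors).
stool = [900, 100, 0, 0]
skin = [50, 0, 600, 350]

print(richness(stool), richness(skin))
print(round(shannon(stool), 3), round(shannon(skin), 3))
print(round(bray_curtis(stool, skin), 3))
```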
Quantitative PCR
To determine the optimal qPCR approach for assessing the absolute number of archaeal 16S rRNA gene copies in relevant samples, three different procedures were tested, two based on SYBR, one on TaqMan chemistry. In addition, two bacteria targeting approaches, one based on SYBR, one on TaqMan chemistry, were evaluated in parallel to determine the ratio of bacteria: archaea in these samples.
The used primers are given in Table 4 and the in silico evaluation results thereof are given in Table 5.
For SYBR-based qPCR, the reaction mix contained: 1x SsoAdvanced™ Universal SYBR® Green Supermix (Bio-Rad, Hercules, USA), 300 nM of forward and reverse primer, gDNA template, water (Lichrosolv®; Merck, Darmstadt, Germany). For TaqMan-based qPCR, the reaction mix contained: 1x SsoAdvanced™ Universal Probes Supermix (Bio-Rad, Hercules, USA), 800 nM of forward and reverse primer, 200 nM of FAM-marked probe for archaea and HEX-marked probe for bacteria, gDNA template, water (Lichrosolv®; Merck, Darmstadt, Germany).
The qPCR was performed using the CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad, Hercules, USA). The qPCR conditions used are given in Table 6.
Crossing point (Cq) values were determined using the regression method within the Bio-Rad CFX Manager software version 3.1. Absolute copy numbers of bacterial and archaeal 16S rRNA genes were calculated using the Cq values and the reaction efficiencies based on standard curves obtained from defined DNA samples from Nitrososphaera viennensis and Escherichia coli (Probst et al., 2013). The qPCR efficiencies and R² values of the standard curves can be found in Table 6 (last two columns). Efficiencies ranged from 92.5% to 106.4% and R² values from 0.837 to 0.983, with the archaeal TaqMan approach having the lowest R² value (0.837).
Detection limits were defined based on the average Cq values of non-template controls (triplicates) and the corresponding standard curves of the positive controls. The detection limits varied with the primer pair used; herein, the detection limit is defined as the last positive signal before the signal of the negative control, in order to exclude false-positive results. For archaea, the detection limit was 480 copies/µl with the primer pair 806aF-958aR, 7.46 × 10⁴ copies/µl with the primer pair 344aF-517uR, and 186 copies/µl with 349aF-806aR. For bacterial 16S rRNA genes, the detection limit was 6.45 × 10³ copies/µl for the primer pair 338bF-517uR (SYBR) and 508 copies/µl for the primer pair 331bF-797R (TaqMan). The archaeal (806aF-958aR) and bacterial (338bF-517uR) qPCR assays were then used to determine the absolute copy numbers of 16S rRNA genes for sample set 2. The Cq values were determined using the single threshold method in Bio-Rad CFX Manager software version 3.1. The absolute copy numbers were calculated as described above. The qPCR efficiency was 108.6% for archaea and 90.9% for bacteria, and the R² values were 0.976 for archaea and 0.929 for bacteria. The detection limits were 715 copies/µl for archaea and 7.06 × 10⁴ copies/µl for bacteria.
All qPCR reactions were performed in triplicate. Only samples with positive results in 2 out of 3 or 3 out of 3 replicates were considered for further analysis.
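The relation between a standard curve, the amplification efficiency, and absolute copy numbers can be sketched as below. This is a generic textbook calculation, not the CFX Manager implementation: a line Cq = slope · log10(copies) + intercept is fitted by least squares, the efficiency is taken as 10^(−1/slope) − 1, and unknown samples are read off the fitted line. The dilution series is synthetic and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, cqs):
    """Least-squares fit of Cq against log10(copies); returns slope, intercept, R^2."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cqs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def efficiency(slope):
    """Amplification efficiency; 1.0 corresponds to perfect doubling per cycle (100%)."""
    return 10 ** (-1 / slope) - 1


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((cq - intercept) / slope)


# Synthetic ten-fold dilution series (illustrative values, not measured data).
standards = [1e3, 1e4, 1e5, 1e6, 1e7]
cqs = [40 - 3.32 * math.log10(c) for c in standards]

slope, intercept, r2 = fit_standard_curve(standards, cqs)
print(f"slope={slope:.2f}, efficiency={efficiency(slope):.1%}, R2={r2:.3f}")
print(f"copies at Cq 25: {copies_from_cq(25, slope, intercept):.0f}")
```

A slope near −3.32 corresponds to an efficiency near 100%, which is why the 90% to 110% window is commonly used as an acceptance range.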
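The replicate rule can be expressed as a small filter. The sketch below assumes that a replicate counts as positive when its estimated copy number is at or above the detection limit, and that the reported value is the mean of the positive replicates; both choices are assumptions for illustration, as is the function name. The limit of 480 copies/µl is the one reported above for primer pair 806aF-958aR.

```python
def sample_result(copies_per_ul, detection_limit, min_positive=2):
    """Mean copies/µl of positive replicates, or None if too few replicates are positive.

    A replicate is None when no amplification was recorded.
    """
    positives = [
        c for c in copies_per_ul if c is not None and c >= detection_limit
    ]
    if len(positives) < min_positive:
        return None
    return sum(positives) / len(positives)


LIMIT = 480  # copies/µl, archaeal SYBR assay 806aF-958aR (from the text)

print(sample_result([900, 1100, None], LIMIT))  # two positives: sample is kept
print(sample_result([500, None, 100], LIMIT))   # one positive: sample is excluded
```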
Results
Primer pairs were evaluated with respect to the following characteristics: i) high in silico specificity for archaeal 16S rRNA genes and an amplicon length of 150 to 300 bp, suitability for NGS and quantitative PCR, ii) in vitro capability to amplify diverse archaeal 16S rRNA genes from a variety of human specimens, and iii) in vitro suitability for qPCR with stringent requirements in efficiency and specificity.
Besides archaea-specific primer pairs, two widely used “universal” primers (515F-806uR original; 515FB-806RB modified;(Caporaso et al., 2012; Walters et al., 2016)) were evaluated all along to assess the potential of “universal” primers to display archaeal diversity associated with the human body.
Most archaea-targeting primers reveal good coverage in silico
A total of 12 different primer pairs were evaluated in silico (Table 1) using sample set 1. Most primer pairs showed high coverage for the archaeal domain ranging from 46% to 89% and revealed a high domain-specificity (8 of 12 primer pairs without matches outside of the archaeal domain). When one mismatch was allowed, the coverage increased to values from 68% to 95%.
One designated archaeal primer pair, 519F-806R, was found to additionally target sequences of the bacterial and eukaryotic domains when one mismatch was allowed, with a coverage of the bacterial domain of >90%.
We further evaluated the detailed coverage of the primer pairs for specific archaeal phyla and genera of particular interest in human archaeome studies: Euryarchaeota, Thaumarchaeota, and Woesearchaeota, as well as Nitrososphaera, Methanobrevibacter, Methanosphaera and Methanomassiliicoccus. For all subsequent in silico analyses we allowed one mismatch.
All primer pairs revealed a high coverage for the Euryarchaeota phylum (in total >90%), for genera Methanobrevibacter (between 94.6% and 98.9%) and Methanomassiliicoccus (between 92.9% and 100%), while the coverage for Methanosphaera was below 90% for most primer pairs except for 519F-806R and 349F-519R (Table 7).
The coverage of the Thaumarchaeota phylum depended on the primer pair used. Most analyses that included the primer 344F showed a low in silico coverage for Thaumarchaeota (below 30%), while all other primer pair combinations revealed a high coverage of this phylum (>90%; Table 7). The coverage for Nitrososphaera in particular varied between 86.9% and 94.4%. The class Woesearchaeia showed variable coverage between 65.2% and 89.5%.
As the archaeal primer 344F has often been used for detecting archaea in a variety of environmental samples (Fontana et al., 2016; Zhang et al., 2014), we took a closer look at its coverage capacity using TestProbe 3.0 (Klindworth et al., 2013) and the SILVA database SSU132 (Quast et al., 2013). Overall, the primer revealed 73.2% coverage of the archaeal domain. The in silico results showed a high coverage of the Euryarchaeota phylum (93.8%) and the genera within, especially Methanobrevibacter with 96.1%, Methanosphaera with 89.9% and Methanomassiliicoccus with 100%. It also revealed a good coverage for Woesearchaeia with 74.6%, but showed, despite a high coverage for the genus Nitrososphaera (93.6%), a generally low coverage of the Thaumarchaeota phylum with only 24%, indicating a potentially low capacity for studies focusing on thaumarchaeotal diversity.
Another primer that we analyzed in more detail was primer 519F, also known as S-D-Arch-0519-a-S-15. As the sequence of this primer (5’ - CAGCMGCCGCGGTAA - 3’) overlaps with the sequence of the “universal” primer S-*-Univ-0519-a-S-18 (5’ - CAGCMGCCGCGGTAATWC - 3’), we were interested to compare their coverages.
As expected, the results from the in silico analysis indicated that the primer S-D-Arch-0519-a-S-15 targets Bacteria (coverage 98%), archaea (coverage 98.2%) and Eukarya (coverage 96.4%). The universal primer S-*-Univ-0519-a-S-18 has a similar coverage and specificity for the three domains of life: Bacteria (coverage 97.5%), archaea (coverage 96.4%), and eukarya (coverage 95.6%). Considering our in silico results, the primer S-D-Arch-0519-a-S-15 cannot be used to target archaea specifically and should be re-named to S-D-Univ-0519-a-S-15.
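The sequence relationship behind this result is easy to verify. The short sketch below checks that the 15-mer is identical to the first 15 bases of the 18-mer and counts how many concrete sequences each degenerate primer represents; the helper name `degeneracy` is hypothetical, and the two primer sequences are those quoted above.

```python
# Number of bases permitted by each IUPAC code used in the two primers.
BASES_PER_CODE = {"A": 1, "C": 1, "G": 1, "T": 1, "M": 2, "W": 2}


def degeneracy(primer):
    """Number of distinct non-degenerate sequences encoded by a degenerate primer."""
    result = 1
    for code in primer:
        result *= BASES_PER_CODE[code]
    return result


ARCH_519F = "CAGCMGCCGCGGTAA"     # S-D-Arch-0519-a-S-15
UNIV_519F = "CAGCMGCCGCGGTAATWC"  # S-*-Univ-0519-a-S-18

# The "archaeal" primer is a strict prefix of the universal one.
is_prefix = UNIV_519F.startswith(ARCH_519F)
print(is_prefix, len(ARCH_519F), len(UNIV_519F))
print(degeneracy(ARCH_519F), degeneracy(UNIV_519F))
```

Because the shorter primer is a prefix of the universal one and carries no additional archaea-specific positions, it is plausible that it binds at least as broadly, which is consistent with the similar coverage values reported for both.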
As most selected archaea-targeting primers revealed a good coverage of the archaeal domain in general, all primer pairs were used for subsequent wet-lab experiments.
Archaeal community composition varies according to the used primer pairs and universal primers fail to detect the archaeal diversity
Herein we sought to identify the optimal primer pair for amplicon sequencing of the archaeomes in human samples. For this, we selected five representative sample types from different body sites: nose (upper nasal cavity), oral (subgingival sites), stool and appendix specimens, and skin (back)(sample set 1). The stool sample represented the positive control and served as a natural mock community.
Next generation sequencing was performed, after a two-step nested PCR (for archaea) or a single-step PCR (“universal” target). The nested PCR approach was selected based on the reasons given in the Materials and Methods section. In brief, the first PCR was intended to select the archaeal community of interest, the second to further amplify the archaeal signal.
The results obtained after amplification, NGS and data analysis based on DADA2 algorithm (Callahan et al., 2016; Koskinen et al., 2017) are summarized in Supplementary Table 4(a-c)(available on request), which includes the number of reads and observed ribosomal sequence variants (RSVs) obtained for all samples covering the three domains, Archaea, Bacteria, and Eukarya (plus unclassified taxa).
The use of universal primers (primer pairs 515F-806uR, 515FB-806RB and 519F-785R) in the PCR reaction resulted in reads that were classified mainly within the bacterial domain, with almost no reads classified within the archaea, confirming our previous observations (Koskinen et al., 2017). In fact, when the two universal primer pairs (515F-806uR original and 515FB-806RB) were compared with regard to the archaeal domain, only primer pair 515F-806uR detected any archaeal signal: a single RSV from a single sample, the stool sample (Supplementary Table 1a; available on request).
Universal primer pair 519F-785R yielded slightly better results, allowing the detection of three different archaeal RSVs from two different samples: Methanobrevibacter and Methanosphaera in the stool sample, and one RSV from the nose sample, classified within the Thaumarchaeota phylum. Very similar results (detection of the same methanoarchaeal signatures in the stool sample, and one thaumarchaeal signature in the oral sample instead of the nose sample) were obtained from primer pair 519F-806R, which was originally described to be archaea-specific, but revealed wide coverage of the bacterial and archaeal domain (>90%, when one mismatch allowed) in silico (see previous chapter).
The archaeal RSVs obtained from the universal approaches were used to construct a phylogenetic tree to identify whether the universal primer pairs allow the detection of the same or closely related RSVs in the analyzed samples (Fig. 1). In addition, the RSVs obtained from the archaea-specific primer pair combination 344F-1041R/519F-806R were included for comparison. This approach allowed the detection of 20 RSVs in the nose, 19 RSVs in the oral, one RSV in the appendix, 3 RSVs in the stool, and 39 RSVs in the skin sample. For the stool sample, the RSVs obtained from the universal and archaea-specific approaches grouped together in clades, either within the Methanobrevibacter or the Methanosphaera clade (Fig. 1). The RSV from the nose sample detected through the universal approach grouped separately from the RSVs identified using the archaea-specific primers. The oral RSV from the universal approach also grouped separately from the RSVs obtained with the archaea-specific primers: the archaea-specific primers did not detect any Thaumarchaeota in the oral sample, only RSVs that grouped within Woesearchaeota.
Overall, 10 out of 24 primer pair combinations allowed the detection of archaeal signatures in all analyzed samples (Supplementary Table 1a; provided on request). But all 24 primer pair combinations were able to identify archaeal reads in at least one of the sample types analyzed, for example all primer pair combinations detected archaeal RSVs in the stool sample; the number of RSVs, however, varied according to the used primer pair combination from 1 RSV to 7 RSVs.
Depending on the primer pair used, the archaeal community composition was found to be highly variable (Fig. 2). We observed that the variation in the archaeal composition was due to the primer pair used in the first PCR, which selects the communities, while the second PCR and primer pair enhanced the signal of the first PCR (Fig. 2). It should be noted that only three different primer pairs were used for the second PCR, namely 349F-519R, 519F-785R and 519F-806R, of which the first two had been used before to explore archaeal communities in human samples (Koskinen et al., 2017) and in confined habitats (Mora et al., 2016).
To further explore the influence of the primer pair selection on the archaeal community composition, the alpha diversity was calculated using the Shannon index (Fig. 3). For this analysis, we excluded the results obtained with the second primer pair 349F-519R, as most samples (except stool samples) yielded fewer than 500 reads with it. To compare the alpha diversity between the results obtained, we performed a Wilcoxon rank test to identify potential differences in the performance of the primer combinations.
The highest archaeal diversity could be detected with the primer combination 344F-1041R/519F-806R (PCR34); this result was found to be significantly higher (p<0.05) compared to PCR 33 (344F-1041R/519F-785R), PCR Q7 (344F-806R/519F-806R) and PCR M7 (344F-806R/519F-806R; see Table 2 and Fig. 3), whereas no other significant differences could be detected.
According to the comparison of the alpha diversity of the archaeal communities between the different primer pair combinations, we recommend the use of the nested approach with the primer pair 344F-1041R in the first PCR, followed by a second PCR with the primers 519F-806R for studying and exploring the archaeal communities in human samples.
The primer combination with superior performance revealed a broad archaeal diversity in stool, appendix, nose, oral and skin samples
To further test and validate the use of the primer pair combination 344F-1041R/519F-806R for studying the archaeal communities within human samples, we selected additional samples from the same body sites: nose (n=5), oral (n=6), appendix (n=5), stool (n=5), and skin (n=7) (sample set 2).
Our selected PCR approach allowed the detection of archaea in all samples investigated with an average of 102,366 reads and 8 observed RSVs for the nose, 56,480 reads and 35 observed RSVs for oral, 46,022 reads and 8 observed RSVs for the appendix, 93,948 reads and 4 observed RSVs for the stool sample, and 76,001 reads and 30 observed RSVs for the skin samples.
A summary of the number of archaeal, bacterial and eukaryotic reads/RSVs can be found in Supplementary Table 5(provided on request). The results were plotted to indicate the archaeal communities present at genus level in the analyzed samples (Fig. 4).
We further characterized the archaeal community information with respect to alpha and beta diversity. Depending on the body site a significant difference (p-value < 0.05) could be shown for alpha (Shannon index and richness) and beta diversity (PCoA and RDA)(Fig 5). Our results confirm the findings that archaeal communities are body site specific (Koskinen et al., 2017).
Notably, the stool samples revealed the overall lowest diversity of Archaea, with only 3-5 identified archaeal RSVs, while skin and oral samples contained a higher diversity, with 5 to 49 RSVs found in the skin samples and 14 to 49 RSVs in the oral samples.
Optimization of quantitative PCR for determination of the archaea:bacteria ratio
The ratio of archaea to bacteria in human tissues and samples is largely unknown and can only be inadequately assessed from amplicon-based studies or metagenomics. It was our goal to set up a suitable qPCR-based methodology to determine the ratios of bacterial and archaeal 16S rRNA gene contents in diverse samples.
Based on literature review, we selected three different set-ups for the quantification of archaeal 16S rRNA genes, and two different set-ups for the quantification of bacterial 16S rRNA genes. We included both chemistries, namely TaqMan and SYBR (Table 4). The same samples (sample set 1) that have been used for amplicon sequencing (nose, oral, appendix, stool and skin) have been tested for the different qPCR set-ups.
In silico analyses of the qPCR primers indicated a low coverage for archaea when no mismatch was allowed (below 50% for two of the primers); when one mismatch was allowed, the coverage increased to between 71.2% and 91.1%. The coverage of the bacterial primers ranged from 75.3% to 90.5% when zero mismatches were allowed and from 82.8% to 95.8% when one mismatch was allowed (Table 5).
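The idea of primer coverage with zero or one allowed mismatch can be sketched as follows. This is a simplified illustration and not the tool used for Table 5: it assumes upper-case sequences, searches only the given strand (no reverse complement), ignores gaps, and uses a toy primer and toy target sequences that are entirely hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(primer, site):
    """Count positions where the target base is not allowed by the primer code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])

def binds(primer, sequence, max_mismatches=0):
    """True if any window of the sequence matches the primer within the limit."""
    n = len(primer)
    return any(
        mismatches(primer, sequence[i:i + n]) <= max_mismatches
        for i in range(len(sequence) - n + 1)
    )

def coverage(primer, sequences, max_mismatches=0):
    """Percentage of sequences with at least one acceptable binding site."""
    if not sequences:
        return 0.0
    hits = sum(binds(primer, s, max_mismatches) for s in sequences)
    return 100.0 * hits / len(sequences)

# Hypothetical degenerate primer and target sequences (illustrative only)
toy_primer = "ACGYG"
toy_sequences = ["TTACGCGTT", "TTACGTGAA", "TTACGAGAA", "TTTTTTTTT"]

print(coverage(toy_primer, toy_sequences, max_mismatches=0))  # 50.0
print(coverage(toy_primer, toy_sequences, max_mismatches=1))  # 75.0
```

In this toy set, the third sequence differs from the primer at a single position, so it is only counted once one mismatch is tolerated, mirroring the increase in coverage described above.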
For quantification of archaeal 16S rRNA genes, three different approaches were investigated: two SYBR approaches using two different primer pairs (806aF-958aR and 344aF-517uR), and one TaqMan approach using the primer pair 349aF-806aR and the probe 515F-FAM (Hunter et al., 2002; Probst et al., 2013; Takai and Horikoshi, 2000).
All primer combinations showed qPCR efficiencies within the set limits (90% to 110%). However, the SYBR protocols proved superior to the TaqMan approach, as the latter allowed the quantification of archaeal signatures in only one out of five samples, namely the stool sample (Fig. 6a).
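The efficiency criterion can be made explicit with a short sketch. It assumes the commonly used relation between amplification efficiency and the slope of a standard curve (Cq plotted against log10 of the template copy number), E = 10^(-1/slope) - 1; the dilution series below is hypothetical and the function names are illustrative, not part of any qPCR software.

```python
import math

def fit_standard_curve(log10_copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def efficiency_percent(slope):
    """Amplification efficiency in percent; a perfect doubling gives 100%."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0

def within_limits(efficiency, low=90.0, high=110.0):
    """Check an efficiency against the 90% to 110% acceptance window."""
    return low <= efficiency <= high

# Hypothetical 10-fold dilution series of a standard (illustrative only)
log10_copies = [3, 4, 5, 6, 7]
cq_values = [30.1, 26.8, 23.4, 20.1, 16.7]

slope, intercept = fit_standard_curve(log10_copies, cq_values)
eff = efficiency_percent(slope)
print(round(slope, 2), round(eff, 1), within_limits(eff))
```

A slope close to -3.32 corresponds to a doubling of product per cycle; flatter or steeper slopes push the efficiency outside the acceptance window.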
The SYBR 806aF-958aR allowed the quantification of archaeal 16S rRNA genes in 3 (stool, nose and appendix) out of 5 samples (Fig. 6a). However, no archaeal signatures could be detected in the oral and skin samples as signals were below the detection limit of 480 copies/µl.
The SYBR approach based on primer pair 344aF-517uR allowed the quantification of archaea in 4 out of 5 samples (Fig. 6a), with visible bands on the electrophoresis gel for stool, nose and appendix samples. Due to the high detection limit, we consider that this approach does not allow the detection of archaeal signals in samples with low biomass and the results might be overestimated.
Therefore, we suggest the use of the SYBR approach based on the primer pair 806aF-958aR for optimally detecting archaeal signatures within human samples. With this method we detected around 10⁸ copies of archaeal genes in the stool sample, 10⁵ copies in the nose sample and 10⁶ copies in the appendix sample. Moreover, it was the most conservative approach, with the lowest chance of overestimating the archaeal abundance.
For the quantification of bacterial 16S rRNA genes, we used two different approaches: a SYBR approach using the primer pair 338bF-517uR, and a TaqMan approach using the primer pair 331bF-797R and the probe 528bR labeled with HEX. Both methods allowed an adequate quantification of the bacterial load in all samples (Fig. 6b). The highest bacterial load was found in the stool samples, with around 10¹¹ copies/total DNA (200 mg sample), followed by the nose samples containing around 10⁷ copies/total DNA. The appendix bacterial load was around 10⁶ copies/total DNA, and the oral sample had the lowest bacterial load with only 10⁵ copies/total DNA. For the skin samples the absolute number is reported per area sampled, since the DNA could not be quantified using the Qubit HS detection method (Fig. 6).
Based on our experiments and their results, we propose to use primer pair 806aF-958aR for archaea-targeted SYBR qPCR.
Optimized quantitative PCR protocols reveal the presence of up to 50% archaeal signatures in certain human samples
The selected qPCR approach for archaea was further tested using a set of different human samples (sample set 2). We selected nose (n=5), oral (n=6), appendix (n=5) and stool (n=5) samples to determine the absolute number of archaeal 16S rRNA genes and to assess the ratio between bacterial and archaeal 16S rRNA genes. Skin samples could not be included in the experiment due to their low DNA content.
For the archaeal qPCR we used the SYBR approach based on primer pair 806aF-958aR, and for the bacteria we used the SYBR qPCR approach with the primer pair 338bF-517uR.
The results are plotted in Fig. 7. The ratio between bacterial and archaeal 16S rRNA genes was determined using the average for each sample type. For stool samples the ratio was 20:1 (0.1 to 21.3% archaeal signatures). For nose and appendix, the ratio was around 1:1 (21.8 to 70.7% archaeal signatures for appendix, and 22.8 to 82.8% for nose), and for oral samples the ratio was 77:1 (0.3 to 5.3% archaeal signatures).
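The relation between a bacteria:archaea ratio and the percentage of archaeal signatures can be illustrated with a small sketch. It assumes that the percentage refers to the archaeal share of the summed archaeal and bacterial 16S rRNA gene copies; the function names are illustrative only.

```python
def archaeal_percent(archaeal_copies, bacterial_copies):
    """Archaeal share (%) of the combined archaeal and bacterial 16S copies."""
    total = archaeal_copies + bacterial_copies
    if total == 0:
        raise ValueError("no 16S rRNA gene copies detected")
    return 100.0 * archaeal_copies / total

def percent_from_ratio(bacteria, archaea=1.0):
    """Convert a bacteria:archaea ratio (e.g. 20:1) into % archaeal signatures."""
    return archaeal_percent(archaea, bacteria)

# Average ratios reported above for each sample type
for site, ratio in [("stool", 20), ("nose/appendix", 1), ("oral", 77)]:
    print(site, round(percent_from_ratio(ratio), 1))
# stool 4.8
# nose/appendix 50.0
# oral 1.3
```

Under this assumption, the 20:1 ratio for stool corresponds to roughly 5% archaeal signatures, and the 1:1 ratio for nose and appendix corresponds to 50%.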
Discussion
Up to now, little is known about the composition of the healthy human archaeome or the real bacteria/archaea ratios in certain body parts. It is unknown whether archaeal communities are affected by dysbiosis or human disease, or how we acquire these microorganisms after birth, although several studies have shown that archaea are present in the first year of life (Palmer et al., 2007; Wampach et al., 2017). Additionally, it is largely unexplored how archaeal communities interact and communicate with other commensal microorganisms inhabiting the human body. Furthermore, the most pressing question remains whether there are really no archaeal pathogens. Given these numerous open questions, we argue that more studies of the human archaeome are needed. These, however, require standardized protocols that reliably assess archaeal diversity and abundance based on 16S rRNA gene signatures.
To address the need for an archaea-targeted amplicon method for NGS in human samples, we herein tested 12 different primers previously described in the literature (Klindworth et al., 2013) in 27 primer pair combinations, and evaluated their performance using in silico and experimental approaches on five different human sample types.
Despite their overall good in silico results, the three universal primer pairs tested failed to capture the archaeal diversity in the experiments. Two of these primer pairs represent the most widely used universal primers for amplicon sequencing (Caporaso et al., 2012; Walters et al., 2016), yet they detected only one (515F-806uR) or zero (515FB-806RB) archaeal RSVs in five sample types that evidently possessed a variety of archaeal signatures. This was particularly intriguing, as the presence of archaeal signatures in the appendix and nose samples was confirmed by qPCR, with bacteria:archaea ratios of 1:1 and 8:1, respectively (sample set 1).
The reasons for the failure of the universal primers to detect Archaea are unclear; however, it seems bacterial signatures outcompete archaeal signatures, just due to slightly better primer matches, depending on the diversity within the sample.
Furthermore, an archaeal primer pair (519F-806R) that has been used before for amplicon sequencing (Siles et al., 2018) detected only a small proportion of the archaeal diversity in the analyzed samples, but the same primer pair performed better when used in a nested PCR together with the primer pair 344F-1041R for the first PCR.
Nested PCR has been shown to improve sensitivity and specificity and is useful for suboptimal DNA samples (Bomberg et al., 2003; Vissers et al., 2009). Based on our experience in the past (Koskinen et al., 2017), other reports (De Vrieze et al., 2018), and the fact that all attempts to use Illumina-tagged archaeal primers to directly identify archaeal 16S rRNA genes in human samples failed, we kept to this approach for the archaeal diversity assessment.
We combined an archaea-specific first PCR (9 different primer combinations) with a second PCR using two archaea-specific primer pairs and one universal primer pair, resulting in 24 different approaches (Table 2).
Notably, although the primer pair combinations 344F-915R/349F-519R and 344F-915R/519F-785R had been used earlier to detect archaeal signatures in human samples and confined environments (Koskinen et al., 2017; Mora et al., 2016), our study revealed that when the second PCR contained the Illumina-tagged primers 349F-519R, almost no reads were retrieved from any samples other than stool (Suppl. Table 4a; available on request).
Ten out of the 24 different primer combinations allowed the detection of archaeal signatures in all analyzed samples (sample set 1). Two of the primer pair combinations, namely 344F-1041R/519F-806R and 344F-1041R/519F-785R, gave outstanding results regarding the number of reads and observed RSVs identified in each sample (Supplementary Table 4a; available on request). The comparison of the alpha diversity (based on Shannon index) indicated that the archaeal diversity uncovered with the primer pair combination 344F-1041R/519F-806R was significantly higher than that obtained with the primer pair combination 344F-1041R/519F-785R (Fig. 3); the combination 344F-1041R/519F-806R was thus considered superior.
To further test and validate the use of the primer pair 344F-1041R/519F-806R, we selected 29 samples from different body sites (nose, oral, appendix, stool, skin; sample set 2), resulting in overall 85 archaeal RSVs from 6 different phyla. We were able to confirm body-site specificity through PCoA and RDA analysis (Koskinen et al., 2017), with the gastrointestinal tract (stool and appendix samples) being dominated by euryarchaeal communities, the oral samples dominated by archaeal communities from the Euryarchaeota phylum but different from the ones found in the gastrointestinal tract, and the nose dominated by Euryarchaeota and Thaumarchaeota signatures. The skin revealed a mix of Euryarchaeota, Thaumarchaeota, Aenigmarchaeota, and, in very low amounts, also Crenarchaeota, confirming previous results (Koskinen et al., 2017; Moissl-Eichinger et al., 2017; Tsai et al., 2016).
According to the obtained results we recommend the use of the primer pair combination 344F-1041R/519F-806R to identify and characterize archaeal communities within human samples, even though the second primer pair 519F-806R is a universal primer pair according to the in silico results. Although this led to retrieval of not only archaeal reads, but also reads classified within Bacteria and Eukarya which had to be filtered bioinformatically, this procedure proved superior to all the other primer pairs tested in identifying archaeal signatures in the analyzed samples.
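The bioinformatic filtering step mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the pipeline used in this study: it assumes an RSV table held as a dictionary of read counts per sample, and semicolon-separated taxonomy strings whose first rank is the domain (optionally with a rank prefix such as "D_0__"). All identifiers and example entries are hypothetical.

```python
def is_archaeal(taxonomy_string):
    """True if the first (domain) rank of the taxonomy string is Archaea."""
    domain = taxonomy_string.split(";")[0].strip().lower()
    return domain.endswith("archaea")

def keep_archaea(rsv_table, taxonomy):
    """Return only the RSVs classified within the domain Archaea."""
    return {
        rsv: counts
        for rsv, counts in rsv_table.items()
        if is_archaeal(taxonomy.get(rsv, ""))
    }

# Hypothetical RSV table (reads per sample) and taxonomy assignments
rsv_table = {
    "rsv1": [120, 0, 15],
    "rsv2": [300, 40, 0],
    "rsv3": [0, 22, 7],
    "rsv4": [5, 0, 0],
}
taxonomy = {
    "rsv1": "Archaea;Euryarchaeota;Methanobacteria",
    "rsv2": "Bacteria;Firmicutes;Clostridia",
    "rsv3": "D_0__Archaea;D_1__Thaumarchaeota",
    "rsv4": "Eukaryota;Opisthokonta",
}

archaeal_table = keep_archaea(rsv_table, taxonomy)
print(sorted(archaeal_table))  # ['rsv1', 'rsv3']
```

RSVs without a taxonomy assignment are dropped in this sketch; whether unclassified reads should be kept for manual inspection is a design choice that depends on the study.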
Another issue we addressed in this study is the need for a quantitative assay to determine the ratio of archaeal and bacterial signatures in human samples. Published qPCR methodology focuses on specific taxa, such as Methanobrevibacter, Methanosphaera and Methanomassiliicoccus (for more details see Table S1 in Koskinen et al., 2017), or has not yet been fully evaluated for various human samples (Hunter et al., 2002; Probst et al., 2013; Takai and Horikoshi, 2000). Our results revealed that two qPCR approaches (SYBR 344aF-517uR and TaqMan 331bF-797R with probe 528bR) were unsuitable, due to the high detection limit, low efficiency, extraordinarily long run times and potential over-estimation of the archaeal signatures. Therefore, we recommend the use of a SYBR approach based on primer pair 806aF-958aR, as this method allowed the quantification of archaeal 16S rRNA genes in stool, appendix and nose samples, and had a low detection limit of 480 16S rRNA gene copies/µl.
In conclusion, we have shown that the choice of the archaeal primer pair substantially influences the picture obtained of the archaeal community in the analyzed samples. Therefore, future studies that explore and characterize the archaeal community in human samples by amplicon sequencing should use the same, standardized methodology to remain comparable. For this we recommend the use of a nested approach with the primer pair 344F-1041R for the first PCR, followed by a second PCR with the primer pair 519F-806R. Furthermore, for quantifying the number of archaeal 16S rRNA gene copies we recommend the use of the SYBR approach based on the primer pair 806aF-958aR.
Conclusions
The optimized and evaluated protocol for archaeal signature detection and quantification can now be used for all human samples and might also be useful for samples from other environments and holobionts, such as plants or animals.
Acknowledgements
The authors acknowledge the support of the ZMF Galaxy Team: Core Facility Computational Bioanalytics, Medical University of Graz, funded by the Austrian Federal Ministry of Science, Research and Economy (BMWFW), Hochschulraum-Strukturmittel 2016 grant as part of BioTechMed Graz. M.-R. Pausan and M. Blohs were trained within the frame of the Ph.D. program in Molecular Medicine of the Medical University of Graz.