Abstract
The most common strategy to assess microbiota is sequencing specific hypervariable regions of 16S rRNA gene using 2nd generation platforms (such as MiSeq or Ion Torrent PGM). Despite obtaining high-quality reads, many sequences fail to be classified at the genus or species levels due to their short length. This pitfall can be overcome sequencing the full-length 16S rRNA gene (1,500bp) by 3rd generation sequencers.
We aimed to assess the performance of nanopore sequencing using MinION™ on characterizing microbiota complex samples. First set-up step was performed using a staggered mock community (HM-783D). Then, we sequenced a pool of several dog skin microbiota samples previously sequenced by Ion Torrent PGM. Sequences obtained for full-length 16S rRNA with degenerated primers retrieved increased richness estimates at high taxonomic level (Bacteria and Archaea) that were missed with short-reads. Besides, we were able to obtain taxonomic assignments down to species level, although it was not always feasible due to: i) incomplete database; ii) primer set chosen; iii) low taxonomic resolution of 16S rRNA gene within some genera; and/or iv) sequencing errors. Nanopore sequencing of the full-length 16S rRNA gene using MinION™ with 1D sequencing kit allowed us inferring microbiota composition of a complex microbial community to lower taxonomic levels than short-reads from 2nd generation sequencers.
Introduction
Bacteria, fungi, viruses and archaea are the main microorganisms constituting the microbiota, which is defined as the microbial communities inhabiting a specific environment (1). In humans, many efforts have been made to characterize the different body site ecosystems and their associated microbial communities, mainly at bacterial level (2,3), which are the most abundant microorganisms on the human-associated microbiota (4,5).
Studying host-associated microbiota has provided many insights on health and diseases for many different body sites (6,7). In human skin, alterations on skin microbiota have been associated to numerous cutaneous diseases, such as acne vulgaris (8,9), psoriasis (10–12), or atopic dermatitis (13–17). Not only humans, but also dogs presented, for example, altered microbiota states during atopic dermatitis disease (18–20).
The most common strategy to assess bacterial microbiota is amplifying and sequencing specific regions of 16S rRNA gene using 2nd generation massive sequencing technologies (for a review see (21)). This bacterial marker gene is ubiquitously found in bacteria, and has nine hypervariable regions (V1-V9) that can be used to infer taxonomy (22).
The ability to classify sequences to the genus or species level is a function of read length, sample type, the reference database (23), and the quality of the sequence. High-quality short-reads obtained from 2nd generation sequencers (250-350 bp) bias and limit the taxonomic resolution of this gene. The most common region amplified with Illumina MiSeq or Ion Torrent PGM™ for bacterial taxonomic classification is V4, but this region fails to amplify some significant species for skin microbiota studies, such as Propionibacterium acnes. So, when performing a skin microbiota study the preferred choice is V1-V2 regions, although they lack sensitivity for the genus Bifidobacterium and poorly amplify the phylum Verrucomicrobia (21). On the other hand, near full-length 16S rRNA gene sequences are required for accurate richness estimations especially at higher taxa (24), which are necessary on microbiota studies. Besides, full-length reference sequences are needed for performing phylogenetic analyses or designing lineage specific primers (23), especially in species different to human or mouse, in which previous metagenomics approaches deciphered the richness of bacterial species in the great and different variety of microbiome samples analyzed.
With the launching of 3rd generation single-molecule technology sequencers, these short-length associated problems can be overcome by sequencing the full or almost full-length of 16S rRNA gene with different sets of universal primers (25). Results for full-length 16S rRNA gene have been reported for Pacific Biosciences (PacBio) platform (23,26–30). Schloss and collaborators reported the possibility of generating near full-length 16S rRNA gene sequences with error rates slightly higher, but comparable to the 2nd generation platforms (0.03%) (23). The primary limitation on the PacBio platform is the accessibility to the sequencers and the cost of generating the data.
MinION™ sequencer of Oxford Nanopore Technologies (ONT) (https://nanoporetech.com) is a 3rd generation sequencer that is portable, affordable with a small budget and offers long-read output (only limited by DNA extraction protocol). Besides, it can provide a rapid real-time and on-demand analysis very useful on clinical applications. Several studies targeting the full 16S rRNA gene have already been performed using MinION™ to: i) identify pure bacterial culture (31); ii) characterize artificial and already-characterized bacterial communities (mock community) (32–34); and to iii) characterize complex microbiota samples, from mouse gut (35), wastewater (31) and pleural effusion from a patient with empyema (34).
Here we aim to assess the potential of Nanopore sequencing in complex microbiota samples using the full-length 16S rRNA (1,500bp). First set-up step is performed using a staggered mock community (HM-783D). Then, we sequenced a pool of several skin microbiota samples previously sequenced by Ion Torrent PGM™.
Material and methods
Samples and DNA extraction
As simple microbial community, we used a Microbial Mock Community HM-783D kindly donated by BEI resources (http://www.beiresources.org) that contained genomic DNA from 20 bacterial strains with staggered ribosomal RNA operon counts (1,000 to 1,000,000 copies per organism per μL). This mock community allowed us to perform the MinION™ sequencing and analysis protocol set-up.
As complex microbial community, we used a sample pool from inner pinna skin microbiota of healthy dogs, which had been previously characterized using Ion Torrent PGM™. Skin microbiota samples were collected using Sterile Catch-All™ Sample Collection Swabs (Epicentre Biotechnologies) soaked in sterile SCF-1 solution (50 mM Tris buffer (pH = 8), 1 mM EDTA, and 0.5% Tween-20). Bacterial DNA was extracted from the swabs using the PowerSoil™ DNA isolation kit (MO BIO) (for further details on sample collection and DNA extraction see (36)).
MinION™: PCR amplification and barcoding
To prepare the DNA and the library we followed the Oxford Nanopore protocol 1D PCR barcoding amplicons (SQK-LSK108), however we used the Phusion Taq polymerase rather than the LongAmp Taq recommended in this protocol. Specifically, we amplified ~1,500bp fragments of the full 16S rRNA gene.
Bacterial DNA was amplified using a nested PCR with a first round to add the 16S rRNA gene primer sets and a second round to add the barcodes. In this study we used two sets of 16S universal primers. On one hand, primer set 27F-1391R (also named S-D-Bact-0008-c-S-20 and S-D-Bact-1391-a-A-17 (37)) amplified V1-V8 hypervariable regions of 16S rRNA gene. On the other hand, primer set 27F-1492R (also named S-D-Bact-0008-c-S-20 and S-D-Bact-1492-a-A-22 (37)) amplified V1-V9 hypervariable regions of 16S rRNA gene. These two sets of universal primers are the most commonly used when assessing full-length 16S rRNA gene, because they have shown a really low non-coverage rate, even at phylum level (38). The primers used in this study are listed in Table 1 and contain some ambiguous bases previously described to make the primers more universal (25).
Primer sequences and hypervariable regions (HVR) targeted for full-length 16S rRNA gene amplification and sequencing.
We will distinguish among primer sets used referring to the hypervariable regions they are amplifying, so: 27F-1391R will be V1-V8; and 27F-1492R will be V1-V9.
We ordered the 16S rRNA gene primers with the Oxford Nanopore Universal Tag added to their 5’ end. The universal tag was 5’-TTTCTGTTGGTGCTGATATTGC-3’ for forward primers and 5’-ACTTGCCTGTCGCTCTATCTTC-3’ for reverse primers. These universal tags will allow the second barcoding PCR using the PCR Barcoding kit (EXP-PBC001).
In the first round, PCR mixture (25 μl) contained initial DNA sample (1 μl of DNA of the mock community and 5 μl of DNA of the skin microbiota), 5 μl of 5X Phusion Buffer HF, 0.2 mM of dNTPs, 0.02 U/μl of Phusion High Fidelity Taq Polymerase (Thermo Scientific). Primer concentrations were adapted to each primer set: for 27F-1391R, 0.4 μM of each primer and for 27F-1492R, 0.4μM of 27F and 0.8μM of 1492R. The PCR thermal profile consisted of an initial denaturation step for 30s at 98°C, followed by 25 cycles for 15s at 98°C, 15s at primer-adjusted annealing temperature, 45s at 72°C for extension, and a final step for 7 min at 72°C. The annealing temperature was adjusted to the primer set: 55ºC for 27F-1391R and 51ºC for 27F-1492R. To assess possible reagent contamination, each PCR reaction included a no template control (NTC) sample, which did not amplify.
In the second round, PCR mixture (100 μl) contained 0.5 nM of the first-round PCR product, 20 μl of 5X Phusion Buffer HF, 0.2 mM of dNTPs, 0.02 U/μl of Phusion High Fidelity Taq Polymerase (Thermo Scientific), and 2 μl of each specific barcode (EXP-PBC001) as recommended in the Oxford Nanopore protocol 1D PCR barcoding amplicons (SQK-LSK108). The PCR thermal profile consisted of an initial denaturation step for 30s at 98°C, followed by 15 cycles for 15s at 98°C, 15s at 62ºC for annealing, 45s at 72°C for extension, and a final extension step for 7 min at 72°C.
Following each PCR round, a clean-up step using AMPure XP beads at 0.5X concentration was used to discard short fragments as recommended by the manufacturer. DNA quantity was assessed using Qubit fluorimeter.
A final equimolar pool containing 1ug of the barcoded DNA samples in 45 uL of DNAse and RNAse free water were used to prepare the sequencing library.
MinION™: Library preparation and sequencing
The Ligation Sequencing Kit 1D (SQK-LSK108) was used to prepare the amplicon library to load into the MinION™ following the instructions of the 1D PCR barcoding amplicon protocol of ONT. Input DNA samples were 1 μg of the barcoded DNA pool in a volume of 45 μL and 5 μL of DNA CS (DNA from lambda phage, used as a sequencing positive control). The DNA was processed for end repair and dA-tailing using the NEBNext End Repair / dA-tailing Module (New England Biolabs). A purification step using Agencourt AMPure XP beads (Beckman Coulter) was performed and approximately the expected 700 ng of total DNA were recovered as assessed by Qubit quantification.
For the adapter ligation step, a total of 0.2 pmol of the end-prepped DNA (approximately 200 ng of our 1,500 bp fragment) were added in a mix containing 50 μL of Blunt/TA ligase master mix (New England Biolabs) and 20 μL of adapter mix, and were incubated at room temperature for 10 min. We performed a purification step using Agencourt AMPure XP beads (Beckman Coulter) and Adapter Bead Binding buffer provided on SQK-LSK108 kit to finally obtain the DNA library.
We prepared the pre-sequencing mix (12 μL of DNA library) to be loaded by mixing it with Library Loading beads (25.5 μL) and Running Buffer with fuel mix (37.5 μL).
We used SpotON Flow Cell Mk I (R9.4) (FLO-MIN106). After the quality control, we primed the flowcell with a mixture of Running Buffer with fuel mix (RBF from SQK-LSK108) and Nuclease-free water (500 μL + 500 μL). Immediately after priming, the nanopore sequencing library was loaded in a dropwise fashion using the spot-on port.
Once the library was loaded we initiated a standard 48h sequencing protocol using the MinKNOW™ software.
MinION™: Data pre-processing and analysis
The first flow cell contained two technical replicates of the mock community (M1 and M2) amplified with V1-V9 primer set together with other skin microbiota samples not included in this study. Basecalling was performed using the Metrichor™ agent 1D barcoding for pre-existing basecalls and demultiplexing using EPI2ME debarcoding workflow. Finally, fast5 files were converted to fastq files using poRe (39) and adapters were trimmed using Porechop (40) (Figure 1).
HM783D was a mock community representing a simple microbial community, which was sequenced per duplicate using V1-V9 primer set (technical replicates). 5 samples from the canine inner pinna were representing a complex microbial community, and were sequenced with V1-V8 and V1-V9 (biological replicates). The same sequencing kit was used, and different pieces of software were applied in the different steps.
The second flow cell included a pool of 5 canine skin microbiota samples from the inner pinna amplified by V1-V8 and V1-V9 primer sets (biological replicates). Basecalling was performed using Albacore v0.8.4 software. Again, fast5 files were converted to fastq files using poRe (39). Afterwards sequences were demultiplexed and adapters trimmed using Porechop (40) (Figure 1).
As a final step, we trimmed the universal tags of the sequences using a custom script and filtered out those sequences shorter than 1,100 bp for V1-V8 and 1,200 bp for V1-V9 amplifications respectively, using split_libraries.py from QIIME software (41).
We performed the analysis using NanoOK (42) with LAST aligner (43) against two databases: a subset of Greengenes database (44,45) and the rrn database (33).
– Greengenes database offers annotated, chimera-checked, and full-length 16S rRNA gene sequences and it is one of the most commonly used databases when performing microbiota analyses (44,45). We used the Greengenes database clustered at 99% of similarity to reduce redundancy and filtered out those sequences that did not reach species level or that did not have a minimum length of 1,400bp. This adapted database contained 20,745 sequences belonging to 3,147 different species.
– rrn database is a custom database created by Benitez-Paez and Sanz to analyze the rrn operons (33). It contains information of the ribosomal RNA operon (16S-ITS-23S genes) retrieved from bacterial genomes of GenBank at NCBI. This database contained 22,351 sequences belonging to 2,384 different species.
IonTorrent PGM®: PCR amplification, massive sequencing and downstream analyses
V1–V2 regions of 16S rRNA gene were amplified using the widely used primer pair F27 (5′-AGAGTTTGATCCTGGCTCAG-3′) and R338 (5′-TGCTGCCTCCCGTAGGAGT-3′). PCR mixture (25 μl) contained 2 μl of DNA template, 5 μl of 5x Phusion High Fidelity Buffer, 2.5 μL of dNTPs (2 mM), 0.2 μM of each primer and 0.5 U of Phusion Hot Start II Taq Polymerase (Thermo Fisher).
The PCR thermal profile consisted of an initial denaturation of 30 sec at 98 °C, followed by 30 cycles of 15 sec at 98 °C, 15 sec at 55 °C, 20 sec at 72 °C and a final step of 7 min at 72 °C. To assess possible reagent contamination, each PCR reaction included a NTC sample.
For each amplicon, quality and quantity were assessed using Agilent Bioanalyzer 2100 and Qubit™ fluorometer, respectively. Both primers included sequencing adaptors at the 5′ end and forward primers were tagged with different barcodes to pool samples in the same sequencing reaction.
The 5 samples included in this study were sequenced in a pool with other skin microbiota samples. A sequencing pool included forty barcoded samples that were sequenced on an Ion Torrent™ Personal Genome Machine (PGM) with the Ion 318 Chip Kit v2 and the Ion PGM Sequencing 400 Kit (Life Technologies) under manufacturer’s conditions.
We will distinguish among primer sets referring the hypervariable regions they are amplifying, so: 27F-338R will be named V1-V2.
Raw sequencing reads were demultiplexed and quality-filtered using QIIME 1.9.1 (41). Reads included presented: a length greater than 300 bp; a mean quality score above 25 in sliding window of 50 nucleotides; no mismatches on the primer; and default values for other quality parameters. After that, quality-filtered reads were processed using vsearch v1.1 pipeline (46): a first de-replication step was applied, followed by clustering into operational taxonomic units (OTUs) at 97% similarity with a de novo approach and finally chimera checking was performed using uchime de novo. The raw OTU table was transferred into QIIME 1.9.1 and taxonomic assignment of representative OTUs was performed using the Ribosomal Database Project (RDP) Classifier (47) against Greengenes v13.8 database (44). Alignment of sequences was performed using PyNast (48). We sequentially applied some extra filtering steps in aligned and taxonomy-assigned OTU table to filter out: 1) sequences that belonged to Chloroplasts class; and 2) sequences representing less than 0.005% of total OTUs (as previously done in (49)). The final OTU table for skin microbiota samples can be found in Additional File 1.
Results and discussion
Mock community analyses
We amplified full-length 16S rRNA sequences from the staggered community with primers V1-V9 by duplicate (M1 and M2). We processed a total of 11,284 sequences for M1 and 22,995 for M2. The taxonomic results obtained are shown in Table 2 and 3, both for Greengenes and rrn databases.
Results obtained at the genus level after MinION™ sequencing of two replicates of the staggered mock community (M1 and M2) aligned against Greengenes and rrn databases.
Results obtained after MinION™ sequencing of two replicates of the staggered mock community (M1 and M2) aligned against Greengenes and rrn databases. Relative abundances that correlated to operon counts are in bold. (*) Is the species annotated in the database? (n.d.): not detected
Greengenes is one of the most commonly used databases in microbiota studies because it is curated and checked for chimeras (45). We used a subset of this database, which contained 20,745 sequences belonging to 3,147 different species. However, only 11 out of the 20 bacterial species of the mock community were annotated in the database down to the species level, so we expected seeing only genus level for those specific taxa (marked as “No” in Table 2).
rrn database (33) contains information of the ribosomal RNA operon (16S-ITS-23S genes) for 22,351 sequences belonging to 2,384 different species. This database lacks information for only one member of the mock community (Deinococcus radiodurans).
At the genus level, we were able to identify all the members of the mock community except Actinomyces even when it was represented on both databases (Table 2). The overall trend is that relative abundances correlated to operon counts: taxa with >10% of relative abundance represent operons with 1,000,000 copies; taxa with >1% of relative abundance represent operons with 100,000 copies; and successively. The exceptions were Actinomyces and Rhodobacter with lower abundances than expected that suggesting primer set V1-V9 was not optimal for these bacterial genera.
Approximately 10% of the total reads for Greengenes and 2% for rrn database belonged to other genera theoretically not present in the mock community. Among those “other genera”, the most abundant belonged to Shigella, Enterobacter and Salmonella that present a 16S rRNA gene with high similarity to Escherichia coli (present in the mock community) (50). If we consider these taxa were probably wrongly assigned because they are closely related to Escherichia coli, only ~2.5% and ~0.5 % of the reads aligned to Greengenes and rrn database respectively belong to unexpected other genera, which could be either due to sequencing errors and wrong taxonomical assignation or to cross-contamination from dog skin microbiota samples.
Delving to species level, Greengenes contains taxonomic annotation for 11 out of 20 bacterial species included in the mock bacterial community. From these, we were able to detect all of them (with the exception of Actinomyces odontolyticus). Rhodobacter sphaeroides, Pseudomonas aeruginosa were detected at lower abundances than expected. Moreover, it’s worthy to note that despite Streptococcus mutans was not in Greengenes database at species level, we detected the closely related species Streptococcus sobrinus in high abundance (M1=14.4% and M2=11.3%); in fact both belong to the mutans group (51). Moreover, we saw Streptococcus infantis (M1=9.6 and M2= 7.9%) as another abundant species. On the other hand, rrn database contained species level information from 19 out of the 20 bacterial species of the mock community and we were able to detect all of them (with the exception of Actinomyces odontolyticus) (Table 3).
When looking at the results of rrn database, we could see that not only Rhodobacter sphaeroides and Pseudomonas aeruginosa were underrepresented but also Bacillus cereus and Clostridium beijerinckii. Finally, Streptococcus pneumoniae was overrepresented, probably suggesting that the sequencing errors together with the large amount of Streptococcus entries in the rrn database (44 Streptococcus species in rrn vs 9 Streptococcus species in Greengenes) produced an incorrect identification of this species.
We can conclude from mock community analyses that full-length 16S rRNA sequencing with MinION™ is able to detect taxonomy assignments and retrieve diversity information, provided that the target species are in the database. At the genus level, we were able to accurately retrieve the mock community composition. It’s also worthy to note the good technical replicates obtained for M1 and M2 samples. Some of the biases observed in the expected taxonomic profile could be due to: i) sequencing errors; ii) primer set used; iii) low taxonomic resolution of 16S rRNA gene within some genera; and/or iv) incomplete database.
Evaluation of primer sets V1-V8 and V1-V9 in microbial richness
Dog skin microbiota samples were sequenced as a pool with MinION™ after amplification of full-length 16S rRNA gene with primers targeting regions V1-V8 or V1-V9 (see Table 1). These complex microbiota samples were basecalled with Albacore v0.8.4 and fast5 files were converted to fastq files. After demultiplexing, adapters and universal tags were trimmed and sequences analyzed using NanoOK (42) with LAST aligner (43). We finally obtained a total of 79,083 sequences for V1-V9 and 74,243 for V1-V8.
The same samples had been previously sequenced individually with Ion Torrent PGM™ with primers targeting V1-V2 hypervariable regions. In that case sequences were analyzed with QIIME 1.9.1 (41) with operational taxonomic units (OTUs) picking a representative sequence of a group of sequences with a 97% similarity and taxonomy was assigned with RDP classifier (47) against the whole Greengenes database (it contains many entries that do not reach low taxonomic levels) (44,45). Using RDP classifier, if the taxonomy assignment does not reach a specific threshold, the sequences included in the OTU are set as “Other”. We finally obtained a total of 249,572 sequences for V1-V2 region (Additional File 1).
We performed the evaluation of the primer sets using exclusively the Greengenes database, because V1-V2 short-reads were analyzed using this database. We used a subset of the Greengenes database that contained only those taxa that reached species level for the long-reads obtained for V1-V8 and V1-V9 regions with MinION™. We compared diversity estimates of higher taxa (from kingdom to order).
Both V1-V8 and V1-V9 primer sets for long-reads were able to retrieve more bacterial taxa than V1-V2 short reads. The bacterial richness was higher at different taxonomic levels when assessed with long-reads rather than with short-reads, as seen in Table 4, and this trend increased as we were lowering taxonomic level.
At the highest taxonomic level, we were able to detect not only Bacteria but also Archaea kingdom, despite using universal primers specific for Bacteria (25). However, they were present at really low proportions (< 0.01% of total reads), which agrees with previous results on human skin microbiota samples (52).
Bacterial richness retrieved with different primer pairs targeting short (V1-V2) or full-length (V1-V8 and V1-V9) 16S rRNA gene.
Delving into phylum level, we detected that the most common and better characterized phyla were retrieved by all the primers. These taxa represented >98% of the total skin microbiota composition. Long-read primers were able to detect 8 phyla previously unseen using V1-V2 short-reads (Table 5). It has already been reported the low coverage of this primer set for some specific phyla (21,53). Some phyla were only detected with a specific primer set, such as Lentisphaerae with V1-V8 or Fibrobacteres with V1-V9. On the other hand, GN02, TM7 and Thermi phyla belong to candidate divisions and none of their members have been cultivated (54), so databases do not have taxonomy information down to species level. Thus, since we used the Greengenes subset database with species-level sequences, no representative of those phyla were included for taxonomy assignment of V1-V8 and V1-V9 long-reads and that is probably the reason why they are only detected with V1-V2 primers.
Table containing all the observed phyla per primer subset. Long-reads taxonomy was obtained from a species-level subset of Greengenes database (see materials and methods) that did not contain information of GN02, SR1 and TM7. *phyla that belong to Archaea kingdom. ** phyla with no representative at the species level in the database used for assigning taxonomy to long-reads (V1-V8 and V1-V9).
So, we were able to detect previously unseen bacteria phyla on dog skin using MinION™ long-amplicons for full-length 16S rRNA sequences, which provided better richness estimates. We cannot discard that this increased richness could also be due to the primers used for long amplicons, which presented some degenerated positions. Although these previously unseen bacteria phyla presented low relative abundances on canine skin microbiota, the use of long-amplicons in more uncharacterized environments will provide better diversity estimates.
Skin microbiota analyses
We assessed canine skin microbiota composition using MinION™ and full-length 16S rRNA gene in a pool of 5 inner pinna samples that were previously individually sequenced with Ion Torrent PGM® using V1-V2 short-reads. In Figure 2a we can see the microbiota profile of the most abundant taxa (> 0.5% of total relative abundance) per individual sample included in the pool when sequencing V1-V2 region with Ion Torrent PGM®. We can see that all of them have been identified down to family level and some of them also to genus level.
Only abundant taxa included (>0.5% of total relative abundance). Bar plots of (a) the most abundant taxa on individual skin samples sequenced with V1-V2 primers in short-reads. “g__” means there is no information at genus level; and b) the main families on dog skin microbiota when sequencing the full-length 16S rRNA gene (V1-V9 and V1-V8) and when averaging the V1-V2 individual results.
We have averaged the results from V1-V2 regions (sequenced by IonTorrent PGM®) to compare the taxonomic profile with that obtained from V1-V8 and V1-V9 regions (sequenced by MinION™) (Figure 2b). The global taxonomic profile at family level was equivalent when comparing V1-V8 and V1-V9 that are biological replicates. V1-V2 taxonomic profile was similar for the abundant species, and differed when looking at low-abundant taxa. The pools used for V1-V8 and V1-V9 nanopore sequencing were not equally representing all the individual samples despite working with an equimolar pool. Moreover, Lactobacillaceae seems to be an abundant family on dog inner’s pinna and in fact is almost exclusive to a unique sample (18A).
The pool of canine inner pinna samples was sequenced twice using a different set of primers (V1-V8 and V1-V9) and gave highly similar taxonomic results within each database, making the taxonomic assignment robust (Figure 3 and Additional File 2). However, mock community results showed that species-level resolution was challenging even with full-length 16S rRNA because sometimes the target species was not present in the database and the assigned taxonomy corresponded to a closed-related species rather than the actual one. Because of each database contained different taxonomic annotations, we considered a species-level assignment reliable when the two unrelated databases gave identical annotation.
Comparison of the abundant species (>1% of the total microbiota composition) detected against the (a) Greengenes and (b) rrn databases in the dog skin microbiota samples amplified with V1-V8 and V1-V9 16S rRNA primers and sequenced with MinION™. Additional File 2 contains all the taxa identified (*) coincident taxonomic assignments using independent databases.
When comparing the most abundant species (>1% of total relative abundances), we could see 7 bacterial species identified by two independent databases: Bergeyella zoohelcum, Capnocytophaga canimorsus, Pasteurella multocida, Neisseria shayeganii, Lactobacillus reuteri, Bibersteinia trehalosi, and Neisseria weaver (Figure 3). Other 13 bacterial species presented the same identical taxonomy assignment in the two databases with lower abundances (>0.01%) (Table 6).
% of total skin microbiota composition.
When comparing the taxonomic information obtained by V1-V2 (Figure 2a) to that obtained with long-reads (Figure 3) we could reach lower taxonomic levels for: 1) [Weeksellaceae] family, with Bergeyella zoohelcum; 2) Neisseriaceae, with Neisseria shayeganii and Neisseria weaveri; 3) Pasteurellaceae, with Pasteurella multocida and Bibersteinia trehalosi; and 4) Flavobacteriaceae, with Capnocitophaga canimorsus.
Some of these species detected on dog inner pinna skin microbiota have been previously isolated or detected in dogs. Bergeyella zoohelcum and Neisseria shayeganii have been detected in canine oral microbiota of healthy dogs, as well as Pasteurellaceae sp. and Capnocytophaga (55). In fact, Bergeyella zoohelcum, Capnocytophaga canimorsus and Pasteurella multocida are classically considered as zoonotic pathogens because they can be responsible for bacteremia produced after dog bites (56–58). Also Neisseria weaveri was associated to dog bites (59). On the other hand, Lactobacillus reuteri was one of the most prevalent and abundant lactic acid bacteria isolated from fecal and intestinal microbiota of healthy dogs (60,61). Thus, these taxa are likely to be normal inhabitants of the skin microbiota of healthy dogs.
We could also see that Fusobacteriaceae (specifically Fusobacterium genus) was one of the most abundant families on the inner pinna skin microbiota with V1-V2 short-reads. However, the subset of Greengenes database used lacked entries of Fusobacterium at the species level so this genus is not detected. Porphyromonadaceae was another abundant taxon of dog skin microbiota, but the species level taxonomy differed when using each database. The only representative at species level in Greengenes is P. endodontalis, whereas rrn database contains 9 different species. When analyzing the results against Greengenes, all the Porphyromonas sequences are classified as P. endodontalis. On the other hand when aligning the same sequences against rrn database, these are classified in 7 different Porphyromonas species (see rrn summary in Additional File 2). This result highlights again the need databases that contain the most relevant taxa at species level.
Conclusions
Nanopore sequencing of the full-length 16S rRNA gene with MinION™ allowed us inferring microbiota composition from both simple and complex microbial communities. Moreover, long-reads and degenerated primers were able to retrieve increased richness estimates, which show us previously unseen phyla on dog skin microbiota, despite being at really low abundances.
Taxonomy assignment down to species level was obtained, although it was not always feasible due to: i) sequencing errors; ii) primer set chosen; iii) low taxonomic resolution of 16S rRNA gene within some genera; and/or iv) incomplete database.
With the nanopore reads, we assigned taxonomy through alignment strategies executing all-vs-all comparisons that need many computational resources, so we needed a small database to obtain results. When working with a 16S database subset, we should be sure to include the most relevant taxa even if they do not have representative members at species level. Oxford Nanopore Technologies offers other bioinformatics tools, such as the cloud-based EPI2ME platform. However, we run out of memory on our hard disk when trying to perform these analyses.
Future studies should be relying on the new 1D2 kit that presents higher accuracy and also other experimental strategies could be assessed to perform microbiota studies taking profit of 3rd generation sequencing (Box 1). Other amplicon-based strategies have shown potential, such as: i) sequencing the whole rrn operon constituted by 16S rRNA-ITS-23S rRNA (33); ii) sequencing the cDNA from size selected SSU rRNA (62); or iii) sequencing other bacterial barcodes, such as cpn60 (63). Another potential approach would be sequencing the 16S rRNA directly that now is feasible with nanopore sequencing, Smith and colleagues were able to detect the modified bases of Escherichia coli that can identify a pathogenic strain (64). Finally, metagenomics approach would allow seeing not only taxonomic information of the whole community but also potential functions of the community (genes).
Acknowledgments
This work was supported by a grant awarded by Generalitat de Catalunya to AC (Industrial Doctorate program, 2013 DI 011) and SD (Industrial Doctorate program, 2015 DI 044).