ABSTRACT
We developed ResCap, a targeted sequence capture platform based on SeqCapEZ technology, to analyse resistomes and other genes related to antimicrobial resistance (heavy metals, biocides and plasmids). ResCap includes probes for 8,667 canonical resistance genes (7,963 antibiotic resistance genes and 704 genes conferring resistance to metals or biocides), plus 2,517 relaxase genes (plasmid markers). Besides, it includes 78.600 genes homologous to the previous ones (47,806 for antibiotics and 30,794 for biocide or metals). ResCap enriched 279-fold the targeted sequences detected by metagenomic shotgun sequencing and improves their identification. Novel bioinformatic approaches allow quantifying “gene abundance” and “gene diversity”. ResCap, the first targeted sequence capture specifically developed to analyse resistomes, enhances the sensitivity and specificity of available metagenomic methods to analyse antibiotic resistance in complex populations, enables the analysis of other genes related to antimicrobial resistance and opens the possibility to accurately study other complex microbial systems.
INTRODUCTION
Antimicrobial resistance is considered a major Global Health challenge recently included in the political agendas of international bodies such as G8 and IMF1. The adoption of measures to face the “antibiotic resistance crisis” 2 is impaired by the controversy about “what” is resistance and “how” and “where” should be detected and analysed 3–4,5
Metagenomic methods are increasingly used to analyse the ensemble of genes encoding antibiotic resistance in different microbial ecosystems which has recently been defined as the “resistome”6–16. An important hurdle of the available resistome analyses is the low discrimination in the detection of minority populations harbouring resistance genes (often present at concentrations below the detection level of the methods used)17 and/or the identification of allelic variants that might confer different resistance phenotypes.
A sensitive and specific identification of antibiotic resistance genes in a metagenome background is required for assessing the associated risks in terms of Public Health 18a, 18b. Such methodological challenge parallels the difficulties of analysing sets of orthologous genes of many individuals for the diagnosis of human multifactorial inherited diseases 19. In this case, the use of “capture-based” or “targeted” sequencing strategies, was a cost-effective and high-throughput alternative that overcame the limitations of metagenomic shotgun sequencing (MSS)20,21. In-solution t argeted capture platforms (TCP) take advantage of Next Generation Sequencing to provide technical improvements over array-based platforms or other genome-partitioning approaches in terms of scalability, cost-effectiveness, and enhanced data quality (lower variance in target coverage, more accurate SNP calling, higher reproducibility and longer assembled contigs)22. Currently, TCPs offer a tremendous potential for boosting advances in environmental and ecological studies, particularly involving micro-biodiversity research, which requires the isolation of sequences of interest from a mixture of DNAs of a complex multiplicity of organisms23.
Our work reports the development and validation of the first TCP for the analysis of bacterial resistomes, which we designate as ResCap (for Resistome Capture). We show that ResCap results in a significant improvement in sensitivity and specificity over previous metagenomic analysis of antimicrobial resistance. ResCap also allows the analysis of the presence and diversity of genes conferring resistance to other antimicrobials (heavy metals and biocides), which are frequently co-selected with antibiotic resistance genes and also genes from replicons of the mobilome (as plasmids). An ad-hoc advanced bioinformatics pipeline, developed in parallel, exploits the capabilities of Rescap comparative metagenomic analysis. The metagenomic approach described here opens the way for a series of applications in the identification, epidemiological surveillance, ecology, and study of evolutionary trajectories of resistance genes.
RESULTS
Targeted metagenomics, a tool for high-resolution analysis of resistome
ResCap was designed to establish a standarized framework that would allow performing both quantitative and qualitative analysis of resistomes. Also, to facilitate the analysis of novel genes potentially involved in the resistance to antibiotics, metals, biocide or any combination of them. As a proof of concept, we compared the performance of ResCap with metagenomic shotgun sequencing (MSS) by analysing the resistome in 17 fecal samples, 9 from humans and 8 from swine.
ResCap exhibits a target capacity (total amount of targeted sequences) of 88.13 Mb, and includes probes for 78,600 non-redundant genes (81,117 redundant genes), including 7,963 functionally validated antibiotic resistance genes, 704 functionally validated metal & biocide resistance genes, and 2,517 relaxase genes (genes used for plasmid identification and classification)24. Besides the 8,667 genes that confer functionally proved resistance to antimicrobials (canonical genes), the platform also includes targets for 78,600 homologous of resistance genes (47,806 for antibiotics and 30,794 for biocide and metals resistance). The criteria used to select the targeted genes are explained in the section Material and Methods.
ResCap performance was compared with MSS in two ways. First, by applying a reference-based approach that maps metagenome reads against specific databases (AbR, Metal & Biocides and Relaxases). Second, by applying a reference-free approach that assembles metagenomic reads and performs a functional annotation. The results of both evaluations are detailed below.
Reference-based evaluation
This section addresses how the abundance and diversity of resistance genes (ResCap or those already validated) were calculated.
ResCap achieves better recovery of target genes than MSS
An average of 1.9×107 paired-reads was obtained from the MSS and ResCap datasets (0.92-3.2×107). The on-target average (the number of reads mapping on the target genes relative to the total read number) against the selected databases (see Material and Methods) was 0.11% (0.07-0.18) for MSS data and 30.26% (20.27–41.83%) for ResCap data, which represents an enrichment of 279-fold (Table 1).
The analysis of the gene abundance, expressed in RPKMs (reads per kb per million reads, see Materials and Methods), demonstrates better recovery of genes coding for resistance to antibiotics, heavy metals, biocides and relaxases (plasmid genes) when using ResCap than when using MSS. Figure 1 represents the RPKMs inferred before (MSS) and after capture(ResCap) for all the samples analysed, while Figure S1 shows the gain plots for each sample. Most canonical genes (99.3%, 1339/1.348) detected by MSS were also detected with ResCap. Furthermore, almost half of the genes detected by ResCap (42%,975/2323), were not detected by MSS. The linearity of the system was evaluated by using a linear regression model for the genes detected in each paired-sample (MSS vs ResCap). An R2 mean of 0.813 (0.85-0.99) shows a good match between both protocols.
The enrichment of canonical resistance genes when using ResCap was similar in samples from humans and swine. Nonetheless, the differences in the relative abundance of genes encoding resistance to antimicrobials (antibiotics, heavy metals, biocides) and relaxases in different samples (Figure 2) is not surprising due to the variability of microbiotas of different hosts 25,26
ResCap addresses gene diversity
Allele redundancy of some resistance genes hinders the correct estimation of “gene diversity” and precludes a correct estimation of “gene abundance” in metagenomes when using most available metagenomic tools.
To overcome this issue, we define the term Mapping Gene Cluster (MGC) as the group of alleles/genes detected by the same set of reads (see Material and Methods). MGCs, firstly defined in this work, allow an estimation of gene diversity across samples, and are measured as the number of MGCs per million reads (MPM). The number of MPMs increased 1.3 fold in humans (0.7-1.74) and 2.1-fold (2.3-1.9) in swine when using ResCap instead of MSS (Figure 2).
An increase in reads per MGC does not imply a homogeneous distribution of the reads. Therefore, we also determined the “gene horizontal alignment coverage”, which was defined as the fraction of a gene that is covered by reads. The probability of identifying an allele-specific mutation will also increase with the number of reads per nucleotide or “gene depth coverage”. Figure 3 shows the improvement of “gene alignment horizontal coverage” using ResCap and MSS (average= 97.5%, range = 66%-99% vs. average= 73.4%, range = 35.9%-94.8%, respectively). Most genes were almost fully covered by reads and there was also a general increase in “gene depth coverage” (Figure S2). As a consequence, the number of genes unequivocally detected by ResCap almost doubled that of MSS (n=26, range 17.1-30.0 genes per sample per million of reads vs. n=14.9, range 12-17.6 genes per sample per million of reads). The number of reads unequivocally mapped increased up to 300 fold (2×105 for ResCap vs 8×102 for MSS) (Figure 4).
Figures S3 shows the abundance (RPKMs) and diversity (MPMs) obtained by ResCap and MSS for individual categories of resistance genes (antibiotics, biocides and metals), which also illustrates the improved sensitivity of ResCap vs. MSS. Figure S4 reflects that although both ResCap and MSS can track the most abundant gene families as those conferring resistance to beta-lactams, macrolides, aminoglycosides and tetracyclines, followed by those conferring resistance to phenicols and sulphonamides, many canonical resistance genes were only detected by the ResCap platform (e.g. mecA, blaZ in beta-lactams; ermA, ermC, ermD, erm33 or lnu among macrolides; fexA, catA and catB alleles among phenicols). Genes encoding resistance to fluoroquinolones, glycopeptides, or trimethoprim, first line antibiotics families used to treat community and hospital-based infections, were barely detected using MSS but unequivocally detected with ResCap (e.g. dfrA16, dfrA15, dfrG, dfrK among those conferring resistance to trimethoprim, oqxAB, qnrB, qnrS among those producing resistance to quinolones, or vanB,vanA for glycopeptides-resistance). ResCap also detected more genes conferring resistance to heavy metals (e.g. cadmium, copper, silver or mercury), and relaxases, which are markers of plasmid families that carry antibiotic resistance genes (MOBC, MOBF, MOBP1, MOBP2) (Figures S5-S7).
Comparative analysis of resistomes from different samples
A statistical analysis of “gene abundance”, analogous to that used for comparing the abundance of mRNA among samples in differential expression analysis27, was performed to quantify the improvement of ResCap over MSS in samples from different hosts. The need for such comparisons is based on the known differences in microbiotas of different hosts.
“Gene abundance” data without normalization were processed as “count data” and used as “input data” for differential analysis of the genes (detection of the genes only present in either human or swine samples) ( Figure 5 ). Using MSS, the resistome of the total sample analyzed comprises 88 MGCs differentially detected (60 MGCs from humans and 28 MGCs from swine) with a p-value lower than 0.001. Conversely, ResCap detected 262 MGCs (186 from humans and 76 from swine) (Figure 5, panel a). Out of these 262 MGCs, 185 were differentially detected by ResCap and not by MSS, 77 were differentially detected by both approaches and 11 MGCs were only differentially detected by MSS (Figure 5, panel C). This result means that ResCap detected roughly three times more the MGCs on each resistome than MSS. The 11 MGCs that were only detected by MSS belong to common (“present in both human and swine samples”) MGCs by ResCap, suggesting that these differentially detected MGCs might in fact represent false positives. Meanwhile, the number of common MGCs detected in human and swine sets was 437 with MSS and 569 with ResCap, of which 269 MGCs were disclosed by both approaches, 300 MGCs being specific for ResCap and 168 for MSS. The 168 MGCs detected as common between human and swine metagenomes with MSS but not with ResCap were identified to be differentially present by ResCap as false negatives. This can be explained because the count ofreads by MSS is lower than that of ResCap which makes the statistical analysis confidence values by ResCap better for a given MGC.
Reference-free evaluation
ResCap includes approximately 78,600 genes that are homologous to “known” resistance genes, with different degrees of sequence identity with defined resistance genes, which might be involved in antibiotic resistance.
Assembly statistics and coverage show that the information obtained with ResCap only covers a small portion of the metagenome, as intended by design (Figure 7). As expected, ResCap increases, with respect to MSS, the number of sequenced genes that are homologous (evolutionary close) to the canonical genes included in Arg-ANNOT, BACMet and ConjDB databases. To perform a comparative analysis, the genes were catalogued as “ResCap”, “UniProt” or “Novel”. The “ResCap” gene set includes genes within the ResCap database of canonical genes. The “UniProt” gene set comprehends those that are already described in UniProtKB database and result in a positive blast against ResCap database. The “Novel” gene set corresponds to those genes not included in UniProtKB but resulting in a positive blast against ResCap canonical database. Only Blast hits with e-values lower than 10-100 were considered as positive and included in the analysis.
The annotation of the genes shows that ResCap also improves the recovery of genes homology with genes coding for resistance, (UniProtKB 752±237 genes with ResCap vs 237±107 for humans and 441±71 genes vs 82 ±46 for swine with MSS; Novel genes, 79±38 genes with ResCap vs 20±7 107 for humans and 105±26 genes vs 9±4 for swine with MSS) as presented in Figure 8. Although the actual role of these genes in antibiotic resistance will require functional validation that is beyond the scope of the current study, its identification as bona fide resistance genes as well as the analysis of their abundance upon antibiotic challenge might have a deep impact in further studies on the evolution of antibiotic resistance. Figure S8 shows the better resolution of ResCap expressed by number of blast hits per gene per megabase.
DISCUSSION
This work reports the development of a novel resistance gene capture platform ResCap and on its comparative evaluation with MSS in resistance gene identification in a collection of human and swine faecal samples. Our study shows that ResCap is ideally suited for high-resolution analysis of resistomes and also offers the possibility to detect genes homologous to “known” resistance genes to further analyse the evolution of antibiotic resistance.
ResCap also provides several technical advantages to study resistomes in comparison with current metagenomic methods. First, the enrichment of ResCap resides in its targeted metagenomics approach, which significantly increases the recovery of sequences of resistance genes. Thus, ResCap reduces the sequencing depth needed to comprehensively detect the targeted genes and, consequently, contributes to lower sequencing costs. More importantly, it can significantly lower the limit of detection of resistance genes in complex microbiomes. Our results indicate that the resistome represents barely 0.2% of the gut metagenome. As a consequence, MSS needs at least 3.75×109 reads per sample to reach a similar coverage to that obtained by using ResCap (average of 1.9 × 107 paired reads that represents a relative enrichment of 279x). Second, the tiling of capture probes greatly facilitates the higher level of “gene horizontal alignment” coverage of ResCap relative to MSS (Figure 2, Figure S2), thus increasing specificity (Figure S9). Third, ResCap ability to detect previously unrecognized DNA fragments with homology to canonical resistance genes will facilitate the discovery of novel genes potentially involved in antibiotic resistance. In case they are antibiotic-selectable, the novel genes will be enriched in the presence of antibiotics. In addition, ResCap will be of interest in Public Health, because it allows a more accurate “ranking risk analysis”18 of the genes within the resistomes of different microbiotas. Finally, the substantial capacity of the platform (200Mb) makes ResCap extensible up to two fold of its current capacity, thus making possible its updating with new sequences published or added to resistance gene databases. ResCap updates will be publicly available through the GitHub repository and the Nimblegene webpage. Nonetheless, the threshold of detection of ResCap remains unknown due to the lack of a negative control that demonstrate the ability of ResCap to pick antibiotic resistance genes from quantified minority populations (e.g. mock genomic populations). Although appropriate, the complexity and variability of the metagenomic samples makes difficult to use a good negative control to this kind of studies.
The definition of parameters that accurately express antibiotic resistance “gene abundance” and antibiotic resistance “gene diversity” constitute a requirement to comparatively analyse the resistomes of different samples. Relative abundance parameters are widely used in computational analysis of MSS datasets, but require specialized statistics, as thesecompositional parameters are influenced by the variability in metagenomes of different samples. The application of the novel concept of MGCs (Mapping Gene Clusters, groups of alleles detected by the same set of reads) provides a set of normalized variables that can be measured in abundance and diversity among samples. Furthermore, the MGC-based system permits to evaluate the diversity within and between different functional groups (in our case, families of antibiotics, groups of genes conferring resistance to heavy metals or biocides and plasmid relaxases). To date, only a very few quantitative metagenomic approaches to analyze resistomes are available but they do not achieve this level of accuracy14,16.
Because of its sensitivity, specificity, and the possibility to accurately compare results between samples, ResCap complies with the needs of public health epidemiology of antibiotic resistance that include: i) the detection of emerging antibiotic resistance risks in different microbial environments28; (http://www.efsa.europa.eu/en/press/news/140325) ii) the need for implementation of accurate risk assessment studies based on resistome analysis in healthy humans, hospitalized patients, animal husbandry, food industry, and the environment; iii) the quality control of sewage and water bodies decontamination of antibiotic resistant genes iv) the update and refining of the list of resistance genes to be considered in monitoring the adverse effects of drugs in microbiomes, including pharmacomicrobiomic applications in clinical trials; v) the close monitoring of the efficacy of microbiome reconstitution/re-biosis, whether through targeted probiotic-live culture administration or fecal microbiota transplantation, to alleviate the adverse impact of antibiotic administration, and vi) to analyse the effect of eco-evo drugs and strategies to combat antibiotic resistance29.
In summary, ResCap provides an opportunity to meet the challenge of analyzing samples with complex and heterogeneous mix of genes in low and high concentration DNA samples. Thus, ResCap-like approaches might also be used to other complex microbial systems and their minority bacterial populations (e.g. virulence determinants, key-ecological traits involved in biosynthesis or biodegradation, or relevant genes of biotechnological interest).
METHODS
ResCap design
The ResCap capture library was based on a homemade core reference database (it will be available as per request) that comprises both well–known and hypothetical genes encoding resistance to antimicrobials (antibiotics, heavy metals, biocides) and genes coding for plasmid family markers (relaxases). The core reference database was built by downloading sequences associated with non-redundant antimicrobial genes available in curated databases Arg-ANNOT 30, CARD31, RED-DB, http://www.fibim.unisi.it/REDDB/Default.asp ResFinder32 and Bacmet 33.
The putative antibiotic resistance genes dataset was constructed as follows. All antibiotic resistance databases were combined in a non-redundant set. Proteins were clustered in protein families by homology, using CD-HIT with parameters of 80% identity and 80% coverage. First, each protein family was aligned by MUSCLE v. 3.7 34 with default parameters. Then, a Hidden Markov Model (HMM) was built for each family with hmmbuild function of the HMMER335 using default parameters. Hmmer search function (hmmsearch) was used against UniProtDB for each HMM profile to search homologous proteins for each family of proteins that confer antibiotic resistance. Manual curation of datasets was performed to remove false positives. Final protein data set was translated to DNA sequence using ENA accession numbers associated with each UniProtDB entry.
As a result, the final ResCap targeted sequence panel consists of 78,600 non-redundant genes (81,117 redundant genes) that would search a target space of 88.13Mb, not reaching yet the 200Mb target capacity offered by the custom SeqCap EZ library format (NimbleGen). Probes targeting the antibiotic resistome include 47,806 putative antibiotic resistance genes and 7,963 functionally characterized, canonical, antibiotic resistance genes. Probes targeting the metal and biocide resistome include 30,794 putative resistant genes and 704 canonical resistance genes. The platform also includes probes for 2,517 relaxases of the Conj database.
The consolidated list of target sequences was submitted to Roche NimbleGen for capture library design and synthesis and further implemented under the custom NimbleGen SeqCap EZ Developer Library format. Redistribution of probes for better capture uniformity, redundancy, and comprehensive target base coverage relied on NimbleGen, and was based on patented algorithms. ResCap design covers the 98.3% of the 88.13Mb and 99.6% of the genes have more than 50% of their sequence covered. (Figure S9).
The ResCap workflow
The Rescap workflow consists of: i) whole-metagenome shotgun library construction, ii) hybridization, and iii) capture. All steps were performed according to NimbleGen standard protocols for Illumina platforms. To evaluate ResCvariable that would make it possibleap efficiency, samples were sequenced before and after capture.
Whole-metagenome shotgun library construction. Total nucleic acid was extracted following the standardized Metahit protocol36 (http://www.metahit.eu/ and using the FastPrep instrument (MP Biomedicals, USA). Libraries were prepared following the instructions of “Kapa Library Preparation Kit for Illumina platforms” (Kapabiosystems, KR0935-v1.13). Briefly, 1.0 µg input DNA (measure by Picogreen) was fragmented to 500-600 bp insert size by sonication with Bioruptor (FastPrep®-24). After End repair, A-tailing and Adapter ligation, we follow Dual-SPRI size selection adding 0.5 vol in first cut and 0.2 vol in the second cut to get 650-750pb libraries. Library amplification was carried out using LM-PCR of 7 cycles, as indicated in the SeqCap EZ Library SR User´s Guide v4.2. At this level, samples were labelled with specific barcodes for further sample identification. A first aliquot of the resulting amplified libraries were quality checked in a Bioanalyzer 2100 (Agilent) and pooled in equimolecular amounts for sequencing on Illumina HiSeq 2000 instrument, generating 100-150-bp paired-end reads (“pre-capture” samples).
Hybridization and capture. The second part of each DNA library was subjected to targeted sequence capture with the custom ResCap probes prior to sequencing (“post-capture” samples). Both experiments were made in separate sequencing runs. Targeted sequence capture was carried out according to the manufacturer’s specifications. The captured DNA was checked for quality and integrity in a Bioanalyzer and titrated by quantitative PCR using the “Kapa-SYBR FAST qPCR kit forLightCycler480” and a reference standard for quantification. The captured libraries were denatured prior to be loaded on a flow-cell at a density of 2,2pM, where clusters were formed and sequenced using a HiSeq 2000 in a 2x100 pair-end mode for swine samples and NextSeq 500 in a 2x150 pair-end mode for human samples. Raw sequences were processed using FastX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/).
Bioinformatic analysis
Reference-based workflow
Analysis of sequence data from metagenomes constitutes a challenge because of the inherent variability of the samples analysed, and the limitations of current bioinformatics’ methods to unequivocally identify specific alleles from short length reads (100 −150 bp). To overcome such limitations, we developed a novel approach to define variables suitable for inferring “gene abundance” and “gene diversity” and, in our case, to perform quantitative analysis of antimicrobial resistance genes. Moreover, we suggest a workflow of variable normalization in relation to the information content of the targeted variable that would make it possible to compare different samples of different hosts. These tools were developed for ResCap but could be implemented for any other metagenomic sequence dataset. Shotgun metagenomic sequencing allowed assembling the sequences into contigs to infer the functionality of the sequenced metagenome. Figure 9 shows the workflow that illustrates and defines the variables used.
Raw Data Processing
Reads were mapped against our database comprising ARG-ANNOT, BacMET and ConjDB databases independently, using Bowtie2 software37. Bowtie was set up to retrieve all end to end possible alignments and suppress both discordant alignments and mixed alignments. The output SAM file was parsed to get the fields of Query template NAME, Reference sequence NAME,1-based leftmost mapping Position, MAPping Quality, Position of the mate/next read. Reads with unavailable information (field Query Template NAME equal to ‘*’) were suppressed. Subsequently, a homemade perl script (available per request) was used to count matched reads per gene. Using the SAM parsed file and the length of the reference genes, the perl script generated a table with the following fields i) the number of reads per gene mapped (RPG, “gene depth coverage”), ii) the number of reads per kb of gene (RPK), iii) the number of the reads that were mapped unequivocally to a given gene and iv) the percentage of coverage of the gene sequence (“gene horizontal alignment coverage”) of each mapped gene. Table fields Unique, RPG and RPK were normalized by the total amount of reads in each sample, transforming such fields in “read per gene per million reads” and “reads per kb per million reads” (RPKM), respectively, the last one being a common unit of “gene abundance”38. Several ways to normalize abundance data have been applied to different studies (e.g., expression data in RNA-Seq experiment). The aim of our approach was to estimate the proportion of antimicrobial resistance genes among samples that putatively contained the same amount of DNA, so the normalization using the total amount of DNA (i.e reads) among samples fits better with the initial approach.
The redundancy of mapped reads may be represented as a network where the nodes are the genes (usually alleles of the same gene) and the edges are the reads that map in the different nodes. Because one read can map in different alleles/genes, all the genes mapped by these reads are linked among them. The resulting network that comprises all the nodes and edges in a set of samples is named “allele Network” (Figure S10). In our context, the allele network must be unique for all samples of a given assay, so an allele network was built joining all the SAM parsed files of the study.
Each cluster of the Allele Network represents the set of genes that are detected by a set of reads. They are defined as a Mapping Gene Clusters (MGCs) and each one may include hundreds of genes or just one gene. A given MGC will be detected when at least one read maps against any of the genes within that MGC (diversity). To quantify the MGCs in each sample, the highest value shown by an allele (node) within a given MGC is taken as the occurrence of such MGC (abundance). The MGCs system builds a set of normalized variables that can be measured in abundance and diversity among samples and thus, allows comparing datasets of different sources, while maximizing the accuracy of the observable information.
A homemade perl script was used to build the allele network from the SAM parsed files, taking the mapped genes as nodes and searching the ambiguously mapped reads to create the edges. Perl script calculates the edges-weight as the number of reads that map the linked nodes at the same time. Allele Network was loaded in R environment39 using the igraph package40. MGCs were defined using mcl, from MCL R package41, with default parameters except allow loops and cluster with only one member on the allele network.
Data Analysis
The resistome of a given experiment was analysed in terms of gene abundance and diversity according to the methodology described above. The abundance and the diversity of genes in a particular resistome are the (dependent) variables that define this resistome and are measured as the number of RPKM per MGC and the number of MGCs, respectively.
The number of MGCs was normalized by the total number of sequencing reads per each sample expressed in millions of reads (MPM), this value being considered as a unit of diversity. MGCs of the antibiotic resistance gene database were divided according to antibiotic families 30. MGCs of the relaxase database were organized in known different relaxase families 42. The MGCs of biocide and heavy metal resistance gene database were classified according the susceptibility to specific compounds 33. Genes that belong to more than one functional category (e.g. some conferring resistance to different metals) contribute equally for any of them. Figure S9 shows the 839 MGCs determined in our sample (237 for AbR, 283 for Biocide and Metals and 319 for relaxases). Descriptive statistic was performed using dplyr43, tidyr44 and ggplot245 packages of R39.
Differential analysis was performed using DESeq2 package46. Although DESeq2 was originally designed for differential expression analysis, it also works well with abundance data. Tables containing the original abundance data obtained by ResCap and MSS datasets were used separately as input for DESeq2 package to determine the MGCs differentially detected between swine and human hosts. Normalization and statistical analysis were performed with the default parameters of DESeq2. MGCs with p-value lower than 0.001 were classified as differentially detected, rest of the MGCs (p-value above 0.001) were classified as commonly detected.
Reference-free workflow
Assemblies were performed by MegaHit software with default parameters47. Prodigal48 was used for gene recognition and translation with the specific parameters for metagenomic sequences. Quality assemblies’ quantification was performed by Quast software49. Predicted genes were first annotated against the ResCap database by Best Blast Hit approach using blastn software50. In order to identify only genes belonging to ResCap database or their homologs, and minimize the false positive ratio, Blast hits were filtered by e-value of 10 and 80% of coverage. Genes with identities higher than 95% and coverage higher than 80% were considered as belonging to ResCap. The remaining genes were translated to proteins. These proteins were classified as non-ResCap and were compared against UniProt by blastp. Again, hits with higher identity than 95%, coverage higher than 80% and e-value lower than 10−100 were considered as UniProt known proteins. The set of proteins that did not accomplish this threshold were considered as novel proteins.
Samples analyzed
ResCap was validated by analysing fecal samples from 9 human and 8 swine individuals, all collected as part of FP7 European Research Consortium EvoTAR (www.evotar.eu). Swine samples were collected in Spanish farms linked to large companies which supply broilers and swine processed meat in the EU. Antibiotics as growth promoters or with preventive purposes are not used in these farms. Human samples were collected in the Hôpital Bichat, Paris, France, under the protocol approved by its local ethics committee. DNA preparation was accomplished for animal and human samples using standardized protocols (MetaHIT Protocol). Robustness of the platform was tested by comparative analysis of two technical replicates of two swine samples.
COMPETING FINANTIAL INTEREST
The authors declare no competing financial interests
ACKNOWLEDGEMENTS
This work was supported by the European Commission, Seven Framework Program (EVOTARFP7-HEALTH-282004 for VFL, FB, JLM, AA, DE, ER, RJLW, WvS, FdlC and TMC), the Join Programming Initiative in Water (JPI Water StARE JPIW2013-089-C02-01 to JLM), the Ministry of Economy and Competitiveness of Spain (BIO2014-54507-R to JLM, and PLASWIRES-612146/FP7-ICT-2013-10 and BFU2014-55534-C2-1-P for FdlC). Authors also acknowledge the European Development Regional Fund “A way to achieve Europe” (ERDF) for cofounding the Plan Nacional de I+D+ I 2012-2015 (BIO2014-54507-R to JLM, PI12-01581 to TMC and BFU2014-55534-C2-1-P for FdlC) and CIBER (CIBER in Epidemiology and Public Health, CIBERESP; CB06/02/0053 to FB) and the Spanish Network for Research on Infectious Diseases (REIPI RD12/0015 to JLM), and the Regional Goverment of Madrid (PROMPT-S2010/BMD2414). Val F. Lanza was further funded by a Research Award Grant 2016 of the European Society for Clinical Microbiology and Infectious Diseases (ESCMID). Conflict of Interest: none declared