Abstract
Monitoring wastewater samples at building-level resolution screens large populations for SARS-CoV-2, prioritizing testing and isolation efforts. Here we perform untargeted metatranscriptomics on virally-enriched wastewater samples from 10 locations on the UC San Diego campus, demonstrating that resulting bacterial taxonomic and functional profiles discriminate SARS-CoV-2 status even without direct detection of viral transcripts. Our proof-of-principle reveals emergent threats through changes in the human microbiome, suggesting new approaches for untargeted wastewater-based epidemiology.
Body
Our past work deploying a highly spatially resolved, high-throughput wastewater monitoring system on a college campus (1) enabled collection and qPCR characterization of thousands of wastewater samples, identified 85% of SARS-CoV-2 clinical cases (2), and enabled genomic surveillance for emerging variants of concern by complete genome sequencing from extracted RNA (3). Wastewater-based epidemiology (WBE) provides additional advantages in that it (i) is non-invasive, (ii) is cost-effective relative to individual clinical testing, (iii) does not require individuals to consent to clinical testing that is often reported to public health agencies, and (iv) can therefore benefit under-served populations (4–6). However, this WBE scheme is currently limited to pathogen detection and characterization through targeted qPCR and sequencing, and cannot detect agents of disease for which a screening test has not been developed.
Here we describe an untargeted community/population-level disease monitoring strategy using metatranscriptomics, which leverages correlations between observable changes in wastewater microbiomes and human microbiome disruptions associated with disease state. SARS-CoV-2, like many pathogens, has been reported to cause systematic disruptions in the human gut microbiome (7–9), which is the principal human microbial input to wastewater (10). As a proof-of-principle, we employed this strategy to test whether information in the wastewater metatranscriptome could discriminate SARS-CoV-2 positive from negative wastewater samples (assessed by qPCR).
We present a high-throughput wastewater metatranscriptomics pipeline that lowers the barrier to an otherwise cost-prohibitive sequencing method at scale through miniaturization, parallelization, and automation (11–12) (Sup. Fig. S1). Using this pipeline, we generated metatranscriptomics sequencing data for 313 virally-enriched (VE) wastewater samples collected from manholes servicing different residential buildings across a college campus, including isolation housing buildings (Manhole IDs: C6M095-C6M098), from November 23, 2020 to January 7, 2021. Sequencing reads were demultiplexed, trimmed, and quality filtered before being deposited in Qiita (13), where ribosomal reads were removed with SortMeRNA (14) following default processing recommendations. Non-ribosomal reads were aligned to genomes or genes using Woltka (15), resulting in two different feature tables: taxonomic and functional (details in Materials and Methods).
Samples obtained from each manhole have a distinct microbiome signature, likely a composite of the individual microbiomes of the people contributing to each wastewater stream. Beta-diversity analyses of both metatranscriptomic feature tables (taxonomic and functional), measured by Aitchison distance and robust Aitchison principal component analysis (RPCA) (16), reveal that wastewater samples cluster primarily by manhole source (manhole_id) (Fig. 1A), with a stronger signal than SARS-CoV-2 detection status (Fig. 1B) (Sup. Table ST1). Wastewater samples also separate according to SARS-CoV-2 status based on these bacterial profiles alone, but this signal is obscured in the RPCA ordination by the stronger manhole_id clustering effect. Taxonomic features provide better separation by both SARS-CoV-2 status and manhole_id than functional features (Sup. Table ST1), suggesting that microbial community membership is more strongly affected by infection than current functional gene expression.
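As an illustration of the distance underlying these beta-diversity comparisons, the following minimal sketch computes a plain Aitchison distance, that is, the Euclidean distance between centered log-ratio (CLR) transformed samples. It is not the RPCA implementation cited above, which treats zeros differently; the pseudocount, the function names, and the example counts are all assumptions made for illustration only.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's feature counts.

    The pseudocount (an assumption of this sketch) keeps log() defined for zeros.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(sample_a, sample_b, pseudocount=1.0):
    """Euclidean distance between two CLR-transformed samples."""
    a = clr(sample_a, pseudocount)
    b = clr(sample_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical read counts for the same three features in three samples.
site_a_day1 = [120, 30, 5]
site_a_day2 = [240, 60, 10]  # same proportions as day 1, twice the depth
site_b_day1 = [10, 200, 40]

within_site = aitchison_distance(site_a_day1, site_a_day2)
between_sites = aitchison_distance(site_a_day1, site_b_day1)
print(within_site < between_sites)
```

Because the CLR transform subtracts the mean log abundance, the distance depends on feature proportions rather than sequencing depth, which is why two samples with the same composition at different depths sit close together.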
To test whether the SARS-CoV-2 detection status-dependent microbiome signal can be identified even against the stronger manhole_id clustering effect, we selected a subset of samples for paired comparisons between SARS-CoV-2 positive and negative samples within specific manholes across one week (selection process detailed in Materials and Methods). This subset (squares, n=28, Fig. 1A-B) was analyzed by dimensionality reduction with compositional tensor factorization (CTF) (17), which accounts for the intra-manhole sample correlation. The resulting ordination shows that the microbiome samples from any specific manhole undergo a pronounced shift along one of the main principal components (PC1 for taxonomic, PC2 for functional) when the population that manhole services becomes infected with SARS-CoV-2 (Fig. 1C-D). Consequently, taxonomic features (genomes) that drive segregation along PC1 (Fig. 1E), or functional features (genes) that drive segregation along PC2 (Fig. 1F), can be positively or negatively correlated with SARS-CoV-2 detection. Log-ratio analysis of the top and bottom ranked taxonomic features, as numerator and denominator respectively, shows a significant difference in the means of the SARS-CoV-2 detection sample groupings (Fig. 1G). Similarly, a log-ratio of six functional features positively and negatively ranked along PC2 also shows a significant difference in the means of the SARS-CoV-2 detection sample groupings (Fig. 1H) (see Materials and Methods).
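The log-ratio described here can be sketched as follows: for each sample, the counts of the top-ranked features are summed into a numerator, the counts of the bottom-ranked features into a denominator, and the natural log of their ratio is taken. This is a minimal sketch and not the authors' code; the feature identifiers and counts are hypothetical, and leaving the ratio undefined when either sum is zero is an assumption of this example.

```python
import math


def log_ratio(sample_counts, numerator_ids, denominator_ids):
    """Natural log of summed numerator counts over summed denominator counts.

    Returns None when either sum is zero, because the ratio is then undefined
    (an assumption of this sketch; other zero-handling choices are possible).
    """
    numerator = sum(sample_counts.get(f, 0) for f in numerator_ids)
    denominator = sum(sample_counts.get(f, 0) for f in denominator_ids)
    if numerator == 0 or denominator == 0:
        return None
    return math.log(numerator / denominator)


# Hypothetical feature IDs standing in for features ranked at the two
# extremes of an ordination axis.
top_ranked = ["genome_A", "genome_B"]
bottom_ranked = ["genome_Y", "genome_Z"]

# Hypothetical per-sample counts keyed by feature ID.
positive_sample = {"genome_A": 80, "genome_B": 40, "genome_Y": 10, "genome_Z": 5}
negative_sample = {"genome_A": 12, "genome_B": 8, "genome_Y": 60, "genome_Z": 30}

print(log_ratio(positive_sample, top_ranked, bottom_ranked))
print(log_ratio(negative_sample, top_ranked, bottom_ranked))
```

Reducing each sample to a single log-ratio gives one number per sample that can be compared between detection groups with a standard two-sample test, independent of sequencing depth.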
The predictive power for wastewater SARS-CoV-2 status discrimination of the features selected through CTF analysis was validated via log-ratios and random forest machine learning (RFML) classification, using the remaining samples in this study (circles, Fig. 1A-B) plus an additional validation set (total n=285, positive=179, negative=106, Sup. Table ST2). Log-ratios of selected taxonomic and functional features showed a significant difference by SARS-CoV-2 detection status across the validation sample set, with function (t-test, T=-3.9, p=0.0001) (Fig. 2A) showing a smaller effect than taxonomy (t-test, T=-8.8, p=1.3e-16) (Fig. 2B). Type II ANOVA of both log-ratios shows that differences in sample means are larger across SARS-CoV-2 status groups than across the manhole_id or sample_plate confounders (Sup. Fig. S2). The performance of the RFML classification models was evaluated through the average area under the precision-recall curve (AUC-PR) over stratified 5-fold cross-validation classification tasks distinguishing samples’ SARS-CoV-2 status, manhole_id, and sample_plate. Lower dimensional feature tables from feature selection show SARS-CoV-2 status classification performance comparable to that of full feature tables for both data modalities (taxonomic and functional) (Fig. 2C), but reduced classification performance when distinguishing the confounding manhole_id (Fig. 2D) or sample_plate (Sup. Fig. S3).
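The evaluation scheme described here, stratified folds scored by area under the precision-recall curve, can be sketched without any particular classifier. The example below is an illustration under stated assumptions rather than the authors' pipeline: it omits the random forest itself, assigns folds deterministically without shuffling, summarizes the precision-recall curve as step-wise average precision, and assumes scores have no ties. The function names and example values are hypothetical.

```python
def stratified_folds(labels, k=5):
    """Assign sample indices to k folds, round-robin within each class,
    so every fold keeps roughly the overall class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 1 for positive, 0 for negative.
    scores: higher means more likely positive (assumed to have no ties).
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at this rank, weighted by the recall gained (1 / positives).
            area += (true_positives / rank) / total_positives
    return area


# Hypothetical held-out labels and classifier scores for one fold.
fold_labels = [1, 0, 1, 0]
fold_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(fold_labels, fold_scores))
```

In a full cross-validation loop, a classifier would be trained on four folds, scored on the fifth with `average_precision`, and the five scores averaged. Precision-recall summaries are a reasonable choice when the classes are imbalanced, as they are in the validation set described above.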
Our results demonstrate that wastewater metatranscriptomes can reveal traces of rare pathogens through alterations of the microbiome of the afflicted individuals, which are eventually reflected in the wastewater microbiome. When effects are confounded by site/population, leveraging generalizable log-ratios separating positive/negative groupings across sites reduces overfitting. This proof-of-principle justifies further research on high-throughput wastewater metatranscriptome biomarker discovery for WBE. The untargeted nature of this data modality makes it flexible enough to monitor multiple diseases at the population scale (through traditional direct detection of known sequences from pathogens, but also by leveraging microbiome perturbations as a proxy), and the modality offers an advantage over metagenomic monitoring because it encompasses all living organisms and viruses (18). One of the limitations of the proposed strategy is the limited stability of the samples’ RNA molecules. However, our methods do not claim to comprehensively characterize the wastewater metatranscriptome and instead focus on the fact that changes in the observable bacterial metatranscriptome are sufficient to discriminate the wastewater’s viral status, with SARS-CoV-2 detection status serving as a relevant case study. Although key features of the bacterial metatranscriptome discriminate SARS-CoV-2 detection, further work is needed to determine how broadly this phenomenon generalizes to other pathogens. Lastly, our methodology allows automated high-throughput metatranscriptomics processing, is applicable to many biospecimen types, and could have considerable impact beyond WBE.
Acknowledgments
This work is supported in part by IBM Research AI through the AI Horizons Network, IBM Artificial Intelligence for Healthy Living (A1770534), UC San Diego’s Return to Learn Program, an NIH Director’s Pioneer Award (DP1AT010885), an NSF RAPID Award (#2038509), and an Emerald Foundation Distinguished Investigator Award.
Conflict of Interest
A.D.S. is currently Chief Technology Officer of InterOme, Inc., a digital health company that offers wastewater testing and monitoring of pathogens, including SARS-CoV-2, among its services.