Metatranscriptomics reveals a shift in microbial community composition and function during summer months in a coastal marine environment

Temperate coastal marine waters are often thermally stratified from spring through fall but can be dynamic and disrupted by tidal currents and wind-driven upwelling. These mixing events introduce deeper, cooler water with a higher partial pressure of CO2 (pCO2), along with its associated microbial communities, to the surface. Anecdotally, these events impact shellfish hatcheries and farms, warranting improved understanding of changes in composition and activity of marine microbial communities in relation to environmental processes. To characterize both compositional and functional changes associated with abiotic factors, here we generate a reference metatranscriptome from the Strait of Georgia over representative seasons and analyze metatranscriptomic profiles of the microorganisms present within intake water containing different pCO2 levels at a shellfish hatchery in British Columbia from June through October. Abiotic factors studied include pH, temperature, alkalinity, aragonite, calcite and pCO2. Community composition changed at broad taxonomic levels and varied most notably with temperature and pCO2. Functional gene expression profiles indicated a strong difference between early (June-July) and late summer (August-October) associated with viral activity. The taxonomic data suggest this could be due to the termination of cyanobacteria and phytoplankton blooms by viral lysis in the late season. Functional analysis indicated fewer differentially expressed transcripts associated with abiotic variables (e.g., pCO2) than with the temporal effect. Microbial composition and activity in these waters vary with both short-term effects observed alongside abiotic variation and long-term effects observed across seasons.
The analysis of both taxonomy and functional gene expression simultaneously in the same samples by environmental RNA (eRNA metatranscriptomics) provided a more comprehensive view for monitoring water bodies than either would in isolation.


(i.e., the 'merged assembly approach'). These assemblies were compared based on the total number of contigs, total length, multi-mapping proportions, and mapping percentages to select the best assembly.

Reads for each sample were aligned against the reference metatranscriptome using Bowtie2 (Langmead and Salzberg 2012) in end-to-end mode allowing for multi-mappings. A maximum of 40 alignments were retained for each read. Alignments were then filtered to remove low-quality mappings (i.e., retain mapq ≥ 2). Retained alignments were quantified using eXpress (Roberts et al. 2013). Effective counts from eXpress were output into a table in R and imported into edgeR (Robinson and Oshlack 2010).
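The alignment retention rules described above (at most 40 alignments per read, then removal of alignments with mapq below 2) can be illustrated with a minimal Python sketch. It assumes alignments have already been parsed into (read_id, contig_id, mapq) tuples; the function name `filter_alignments` and this tuple layout are hypothetical and are not taken from the pipeline used in the study.

```python
from collections import defaultdict

MAX_ALIGNMENTS_PER_READ = 40  # per-read cap stated in the text
MIN_MAPQ = 2                  # retain alignments with mapq >= 2


def filter_alignments(alignments):
    """Keep at most 40 alignments per read, then drop those with mapq < 2.

    `alignments` is an iterable of (read_id, contig_id, mapq) tuples.
    This input layout is an assumption made for illustration only.
    """
    seen_per_read = defaultdict(int)
    kept = []
    for read_id, contig_id, mapq in alignments:
        seen_per_read[read_id] += 1
        if seen_per_read[read_id] > MAX_ALIGNMENTS_PER_READ:
            continue  # beyond the per-read cap
        if mapq >= MIN_MAPQ:
            kept.append((read_id, contig_id, mapq))
    return kept


example = [
    ("read1", "contigA", 42),
    ("read1", "contigB", 1),  # removed: mapq below the threshold
    ("read2", "contigA", 2),  # kept: mapq equal to the threshold
]
print(filter_alignments(example))
```

Because multi-mapped reads are retained, a single read can contribute alignments to several contigs; resolving those ambiguous assignments is left to the quantification step.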

Filtering was conducted to only retain contigs against which at least five reads mapped in the sample with

were grouped into binary groups for pCO2 (low/normal versus high) and season for differential expression analysis. pCO2 was considered high when the value was greater than 700 ppm and low/normal when it was less than 700 ppm. Early summer was defined as June through July, and late summer as August through October (see Table 1). Expression levels for each transcript were analyzed in a generalized linear model (i.e., glmFit and glmLRT) in edgeR to test the effect of pCO2, the effect of early vs. late summer, and their interaction. Genes with pairwise p ≤ 0.05 after Benjamini-Hochberg multiple test correction were considered differentially expressed.
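The binary grouping and the Benjamini-Hochberg adjustment can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, a value of exactly 700 ppm is assigned to the low/normal group as an assumption (the text does not specify this case), and the study itself performed the analysis in edgeR.

```python
EARLY_MONTHS = {6, 7}      # June through July
LATE_MONTHS = {8, 9, 10}   # August through October


def pco2_group(pco2_ppm, threshold=700.0):
    """Return 'high' above the threshold, otherwise 'low/normal'.

    Assumption: a value exactly at the threshold counts as low/normal.
    """
    return "high" if pco2_ppm > threshold else "low/normal"


def season_group(month):
    """Map a calendar month number to the early or late summer group."""
    if month in EARLY_MONTHS:
        return "early"
    if month in LATE_MONTHS:
        return "late"
    raise ValueError(f"month {month} is outside the June-October window")


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_raw = [0.01, 0.04, 0.03, 0.005]
p_adj = benjamini_hochberg(p_raw)
# Indices of tests that pass the 0.05 cutoff after adjustment.
significant = [i for i, p in enumerate(p_adj) if p <= 0.05]
print(pco2_group(850.0), season_group(9), significant)
```

The adjustment multiplies each p-value by the number of tests divided by its rank, then takes a running minimum from the largest p-value downward so that adjusted values never decrease with rank.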

To annotate transcripts with functions, expressed transcripts were assigned UniProt descriptions and identifiers by using BLASTx (Altschul 1997) to align contigs against the Swiss-Prot database (UniProt 2017) using the pipeline go_enrichment (Eric Normandeau, see Data Accessibility), with flags --max_target_seqs 1 in outfmt 6 format, and only retaining hits with E < 10⁻⁵. The UniProt identifier was used as an input for Gene Ontology (GO) enrichment analysis in DAVID bioinformatics (Huang, Sherman, and Lempicki 2009), using differentially expressed lists compared against all expressed genes in the

collected being seasonally balanced between the early summer (i.e., June-July, n = 11) and late summer (i.e., August-October, n = 9). Environmental variables were measured from the intake water, including pH, temperature, salinity, alkalinity, aragonite, calcite and pCO2, as shown in Table 1 (for complete data see Supplemental File S1).
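As an illustration of the annotation filter described above (one BLASTx hit per contig, retained only when the E-value is below 1e-5), the following Python sketch parses tabular BLAST output. It assumes the default 12-column outfmt 6 layout, in which the E-value is the eleventh column; the function name `best_hits` and the example identifiers are hypothetical, and this is not the go_enrichment pipeline itself.

```python
E_VALUE_CUTOFF = 1e-5  # hits must have an E-value below this cutoff


def best_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest E-value hit per query.

    Assumes default outfmt 6 columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue  # fails the E-value filter
        if query not in hits or evalue < hits[query][1]:
            hits[query] = (subject, evalue)
    return hits


# Made-up records for illustration; the accessions are not real entries.
example_lines = [
    "contig1\tsp|P00001|FAKE1_HUMAN\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t200",
    "contig2\tsp|P00002|FAKE2_HUMAN\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.001\t30",
]
print(best_hits(example_lines))
```

Only contig1 survives in this example, because the contig2 hit has an E-value above the cutoff. Contigs without a retained hit simply remain unannotated.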

The reference metatranscriptome was assembled from a total of 191,303,324 mRNA reads

Figure S2). Therefore, this collectively-assembled assembly was chosen to be used for all downstream functional analyses (i.e., the 'final reference metatranscriptome assembly').

Dimension 1 explains the most variation, separating the late and early season samples. Samples are labeled by sample number and colors indicate early vs. late season. Samples S13, S15, S16, and S19 are from 2014, and the rest are from 2015. Full details on samples can be viewed in Supplemental File S1.

The individual assembly of eight libraries that were subsequently merged together (shown to the right of

Viral transcripts identified in BP in the season differential analysis are also included.