Genomic evidence for a chemical link between redox conditions and microbial community composition

Environmental influences on community structure are often assessed through multivariate analyses in order to relate microbial abundances to separately measured physicochemical variables. However, genes and proteins are themselves chemical entities; in combination with genome databases, differences in microbial abundances directly encode for chemical variability. We predicted that the carbon oxidation state of inferred community proteomes, obtained by combining taxonomic abundances from published 16S rRNA gene sequencing datasets with predicted microbial proteomes from the NCBI Reference Sequence (RefSeq) database, would reflect environmental oxidation-reduction conditions in various natural and engineered settings including shale gas wells. Our analysis confirms the geobiochemical predictions for environmental redox gradients within and between hydrothermal systems and stratified lakes and marine environments. Where they are present, a common set of taxonomic groups (Gamma- and Deltaproteobacteria and Clostridia) act as drivers of the community-level differences in oxidation state, whereas Flavobacteria most often oppose the overall changes. The geobiochemical signal is largest for the steep redox gradients associated with hydrothermal systems and between surface water and produced fluids from shale gas wells, demonstrating the ability to determine the magnitude of redox effects on microbial communities from 16S sequencing alone.


Figure 3. Lower carbon oxidation state is tied to oxygen depletion in water columns.
All sites except for ETNP and station C4 outside the Blue Hole are permanently stratified. Z_C values were calculated for inferred community proteomes using published microbial 16S rRNA gene sequences (see Table 1). Oxygen concentrations were taken from the same publications, except for locations inside and outside the Blue Hole (Xie et al., 2019). For the Blue Hole, ratios of nitrate to nitrite (NO3-/NO2-) are also plotted based on NO3- and NO2- concentrations reported by Xie et al. (2019). No Z_C value is shown for 1 m depth in Ursu Lake in April 2016 because fewer than 200 sequences remained for this sample after all sequence processing steps.

[...] (Figure 3). At the ETNP, the Z_C decreases strongly with depth in the free-living communities (0.2-1.6 μm size fraction), but to a lesser extent in particle-associated communities (1.6-30 μm size fraction).

This might reflect environmental microniches and cell-cell interactions that to some extent reduce the sensitivity of these communities to external redox conditions.

Although Z_C trends are dominated by oxygen gradients, a closer look suggests that other [...] of Z_C has a "C" shape, but maintains relatively high values at greater depths. In this case, the overall higher Z_C with depth appears to be more closely associated with the ratio of nitrate to [...] lines.

At the phylum level, the high-temperature samples in the Manus Basin are associated with greater numbers of Aquificae and Campylobacterota (formerly Epsilonproteobacteria) and fewer Proteobacteria. These groups have relatively low and high Z_C, respectively, which to a large extent explains the chemical difference at the whole-community level (leftmost plot).

However, the campylobacterotal sequences themselves are affiliated with organisms whose proteomes have lower Z_C in the higher-temperature samples. Therefore, the whole-community chemical differences are due to both differential taxonomic abundances at the phylum level, which have the largest effect, and differential abundances within phyla. A similar finding applies to the major classes; at this level it is apparent that the proteobacterial contribution is mainly due to lower numbers of Gammaproteobacteria. The differential abundances of the identified genera yield a small Z_C difference between hotter and cooler fluids in the same direction as the whole-community trend.

Analogous reasoning can be used to interpret the trends in the Baltic Sea. The relatively high n_H2O in low-salinity samples is mostly controlled by an increase in Actinobacteria. In contrast, Proteobacteria become less abundant at lower salinity, which to some extent counteracts the n_H2O rise, but the within-group variation of Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria is toward higher n_H2O at lower salinity. The genus-level assignments suggest an opposite trend (higher n_H2O at higher salinity), but this is less likely to represent the actual differences because the low classification rate to the genus level (37%; see Table S1) together with the 1% abundance cutoff for genera in Figure 4 results in a low fraction of assignments represented at this level (23%).

[...] (Cluff et al., 2014), Denver-Julesburg Basin (Hull et al., 2018), and Duvernay Formation (Zhong et al., 2019)

The main finding of this study is that the oxidation state of inferred community proteomes decreases in more reducing conditions at global and local scales. Closer examination of selected datasets (Figure 4) indicates that the whole-community chemical differences are mainly associated with changes in abundances of particular phyla and that within-group variation of particular classes is often in the same direction, indicating that physicochemical conditions shape microbial communities at multiple taxonomic levels.
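To make the link between taxonomic abundances and community-level oxidation state concrete, the following is a minimal sketch, not the authors' code. It uses the standard definition of carbon oxidation state for a neutral formula with c, h, n, o and s atoms of C, H, N, O and S, namely (-h + 3n + 2o + 2s) / c, and assumes that an inferred community proteome can be approximated by summing the amino acid compositions of taxa weighted by their abundances. The taxon names, amino acid compositions and abundances are hypothetical toy values chosen only for illustration, and all function names are assumptions.

```python
from fractions import Fraction

# Elemental formulas (C, H, N, O, S) of the 20 neutral free amino acids.
AA_FORMULAS = {
    "A": (3, 7, 1, 2, 0),   "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0),   "C": (3, 7, 1, 2, 1),  "Q": (5, 10, 2, 3, 0),
    "E": (5, 9, 1, 4, 0),   "G": (2, 5, 1, 2, 0),  "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0),  "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1),  "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0),   "T": (4, 9, 1, 3, 0),  "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0),  "V": (5, 11, 1, 2, 0),
}

def elemental_sum(aa_counts):
    """Sum elemental counts over an amino acid composition {letter: count}."""
    totals = [0, 0, 0, 0, 0]
    for aa, count in aa_counts.items():
        for k, atoms in enumerate(AA_FORMULAS[aa]):
            totals[k] += count * atoms
    return tuple(totals)

def carbon_oxidation_state(formula):
    """Z_C = (-h + 3n + 2o + 2s) / c for a neutral formula (c, h, n, o, s).

    Water lost in peptide bonds contributes -2 (H) + 2 (O) = 0 to the
    numerator and no carbon, so summing free amino acids gives the same Z_C
    as the polymerized protein.
    """
    c, h, n, o, s = formula
    return Fraction(-h + 3 * n + 2 * o + 2 * s, c)

def community_composition(abundances, taxon_compositions):
    """Abundance-weighted sum of amino acid compositions (integer counts)."""
    combined = {}
    for taxon, abundance in abundances.items():
        for aa, count in taxon_compositions[taxon].items():
            combined[aa] = combined.get(aa, 0) + abundance * count
    return combined

def community_zc(abundances, taxon_compositions):
    """Z_C of the pooled (community-level) amino acid composition."""
    pooled = community_composition(abundances, taxon_compositions)
    return carbon_oxidation_state(elemental_sum(pooled))

# Hypothetical toy compositions; real values would come from reference proteomes.
TOY_COMPOSITIONS = {
    "taxon_ox": {"G": 2, "D": 1},   # relatively oxidized amino acids
    "taxon_red": {"L": 2, "K": 1},  # relatively reduced amino acids
}

if __name__ == "__main__":
    oxic = {"taxon_ox": 3, "taxon_red": 1}    # hypothetical abundances
    anoxic = {"taxon_ox": 1, "taxon_red": 3}
    print(float(community_zc(oxic, TOY_COMPOSITIONS)))
    print(float(community_zc(anoxic, TOY_COMPOSITIONS)))
```

Because the pooled value is a ratio of summed numerators to summed carbon counts, it is carbon-weighted rather than a simple mean of the per-taxon Z_C values. In this toy case, shifting abundance toward the more reduced taxon lowers the community Z_C, which is the kind of abundance-driven difference described above.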

The results here are consistent with our earlier analysis of shotgun metagenomic data that [...] (Figure 2) and is lower than all but the most reduced community proteomes inferred for anoxic waters (Figure 3). Therefore, Z_C values calculated using inferred community proteomes (this study) or from shotgun metagenomic sequences (Dick et al., 2019; Lecoeuvre et al., 2021) appear to be commensurate.

In our previous analysis of shotgun metagenomic data, we found conflicting trends for Z_C in oxygen minimum zones (Dick et al., 2019). We speculated that preferential degradation of low-GC content extracellular DNA by heterotrophs could leave behind resistant genes that are more likely to undergo horizontal gene transfer; the putative enrichment of AT-rich genes would be manifested by higher Z_C of the inferred proteins (Dick et al., 2019). It has also been proposed that horizontal gene transfer could hinder the use of shotgun sequences for accurate taxonomic identification (Tessler et al., 2017), although we note that our previous analysis of Z_C of shotgun sequences was not based on taxonomic assignments. In contrast to the results for shotgun metagenomes, the present study uncovered a strong signal of decreasing Z_C with depth in multiple 16S rRNA datasets for stratified water bodies.

The highly saline produced waters from many shales converge toward a common profile dominated by the halophilic and anaerobic Halanaerobium (Mouser et al., 2016), but in the Denver-Julesburg Basin, Thermoanaerobacter, which has similar metabolic capabilities, is present instead (Hull et al., 2018). The predicted RefSeq proteomes of these groups have [...]

More work is needed to derive a chemical metric that captures the relationship between community composition and salinity. Because of osmotic forces, higher salinity should have a dehydrating effect, but this prediction is not supported by the increase of n_H2O of inferred community proteomes that is observed for produced fluids. Higher n_H2O may be intrinsically linked to lower Z_C as a result of the background correlation between these metrics when n_H2O is calculated using the QEC basis species (gray lines in Figure 5A and Figure 5C; see also [...] Figure 5B.

At this point it is not possible to predict with this method which taxa are actually present in a community. Nevertheless, analysis of the contribution of individual classes to the differences in Z_C within datasets reveals some provocative patterns for those taxa that are present (Figure 6). In all datasets where they make a major contribution to the oxidation-reduction differences, the Gammaproteobacteria, Deltaproteobacteria, and Clostridia contribute to lowering the overall Z_C of the communities in more reducing environments, due to either changes in abundance, [...] Flavobacteriia opposes the community-level ΔZ_C for many datasets; a similar trend is also apparent for Cytophagia, which is another group within the Bacteroidetes, in some other datasets. These classes have relatively reduced proteomes (i.e., more negative Z_C), but they tend to be more abundant in more oxidizing environments, which explains their opposing contribution (Figure 6-Figure Supplement 1). Therefore, these classes can be described [...] (Fernández-Gómez et al., 2013), may underlie the unique behavior of these groups remains unknown; whether these trends could be related to the very low n_H2O of their proteomes (Figure 1) is also an open question.
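Since the n_H2O values discussed in this section depend on the choice of basis species, a small worked example may help. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes that the QEC basis consists of glutamine (C5H10N2O3), glutamic acid (C5H9NO4), cysteine (C3H7NO2S), H2O and O2, and solves the elemental mass balance for a single neutral formula. It does not apply any per-residue normalization for proteins, and the function name is hypothetical.

```python
from fractions import Fraction

def qec_coefficients(c, h, n, o, s):
    """Coefficients of the assumed QEC basis species needed to compose a
    neutral formula with c, h, n, o, s atoms of C, H, N, O, S.

    Mass balance (q = glutamine, e = glutamic acid, y = cysteine,
    w = H2O, x = O2):
        C: 5q + 5e + 3y           = c
        H: 10q + 9e + 7y + 2w     = h
        N: 2q + e + y             = n
        O: 3q + 4e + 2y + w + 2x  = o
        S: y                      = s
    """
    n_cys = Fraction(s)                      # sulfur fixes cysteine
    q_plus_e = Fraction(c - 3 * s, 5)        # from the carbon balance
    n_gln = (n - s) - q_plus_e               # from the nitrogen balance
    n_glu = q_plus_e - n_gln
    n_h2o = (h - 10 * n_gln - 9 * n_glu - 7 * n_cys) / 2   # hydrogen balance
    n_o2 = (o - 3 * n_gln - 4 * n_glu - 2 * n_cys - n_h2o) / 2  # oxygen balance
    return {
        "glutamine": n_gln,
        "glutamic_acid": n_glu,
        "cysteine": n_cys,
        "H2O": n_h2o,
        "O2": n_o2,
    }

if __name__ == "__main__":
    # Alanine, C3H7NO2
    for species, coefficient in qec_coefficients(3, 7, 1, 2, 0).items():
        print(species, coefficient)
```

For alanine this gives an H2O coefficient of 3/5 and an O2 coefficient of -3/10, while a basis species such as glutamine maps onto itself with zero H2O and O2. Because the water coefficient is determined jointly with the O2 coefficient, n_H2O and oxidation state are not independent in this projection, which is one way to picture the background correlation mentioned above.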

Our findings for shale gas systems suggest that in addition to salinity, the steep redox [...] (Ulrich et al., 2018; Mumford et al., 2020).

Because of the different sample depths and O2 profiles of the Swiss lakes (Figure 3), the deepest and shallowest samples from both Lake Zug and Lake Lugano were selected. [...] (Markelova et al., 2017).

By leveraging the chemical information contained in genomic sequences, it is possible to achieve a broader view of the coupling between inorganic and organic oxidation-reduction reactions that is essential for all ecosystems (Burgin et al., 2011; Orcutt et al., 2011). Analysis of inferred community proteomes indicates that redox gradients have the strongest [...] sequences. Sequence processing statistics and additional details are given in Table S1.

After filtering, the remaining sequences for all samples in each dataset were pooled and [...] (Hördt et al., 2020). The RDP taxon Spartobacteria genera incertae sedis, which is relatively abundant in the Baltic Sea (Herlemann et al., 2016) [...]

Values of n_C, Z_C and n_H2O were computed for each taxonomic group in RefSeq whose amino [...] counts. The percent contribution of the ith taxon (ΔZ_C%) to the difference in Z_C between sample subsets "o" and "r" (for oxidized and reduced) was calculated from [...] where n_i,o and n_i,r are the abundances of the ith taxon in the oxidized and reduced subsets, [...]

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.