Top-heavy trophic structure within benthic viral dark matter

Viruses exert considerable influence on microbial population dynamics and community structure, with cascading effects on ecosystem-scale biogeochemical cycling and functional trajectories. Creating broadly generalizable theory on viral trophic ecology requires further inquiry into historically unexplored microbial systems that currently lack empirically demonstrated patterns in viral infectivity, such as structurally complex benthic communities. This becomes increasingly relevant considering recently proposed revisions to the fundamental mechanisms that modulate the strength and direction viral trophic linkages. Here, we employed deep longitudinal multiomic sequencing to characterize the viral assemblage (including ssDNA, dsDNA, and dsRNA viruses) and profile lineage-specific host-virus interactions within benthic cyanobacterial mats sampled from Bonaire, Caribbean Netherlands, over a complete diel time-series, and reconstruct patterns in intra-mat trophic structure. We recovered 11,020 unique viral populations spanning at least 10 viral families across the orders Caudovirales, Petitvirales, and Mindivirales, with evidence for extensive genomic novelty from reference and environmental viral sequences. Analysis of coverage ratios of viral sequences and computationally predicted hosts spanning 15 phyla and 21 classes revealed virus:host abundance and activity ratios consistently exceeding 1:1, with overall power-law scaling indicating an increasingly top-heavy intra-mat trophic structure with significant top-down pressure. Diel activity of cyanophages showed clear temporal patterns that seem to follow host physiological condition. These data generate important hypotheses concerning taxon-dependent variation in the relative contribution of top-down vs. bottom-up forcing in driving mat community dynamics, and establish a useful database of viral sequences from this previously unexplored system toward the generation of generalizable trans-system theory on viral trophic ecology. SIGNIFICANCE STATEMENT Recent advances in viral ecological theory suggest a better understanding of system-specific viral ecology is needed from diverse environments to create generalizable theory that accurately predicts patterns of trophic interaction strengths across systems, especially in the Anthropocene. This study characterized viral-host trophic structure within coral reef benthic cyanobacterial mats - a globally proliferating cause and consequence of coral reef degradation - using paired multiomic sequencing. Recovered viral sequences displayed remarkable genomic novelty from other well-characterized viruses and spanned diverse viral taxa. Unexpectedly, lineage-resolved trophic linkages displayed a strongly active top-heavy trophic structure, suggesting extensive top-down forcing. These results highlight the context-dependence of viral trophic interaction strengths and suggest that viruses strongly influence reef cyanobacterial mat and reef ecosystem functional trajectories.

currently lack empirically demonstrated patterns in viral infectivity are especially critical toward 69 was performed to reconstruct mat metagenomes and metatranscriptomes (Table S4 & S5), which  116 were subsequently screened to isolate the cyanobacterial mat virome. 117

118
Overview of viral community structure 119 We used combined profile Hidden Markov Models (HMM) and BLASTp with non-120 reference based HMM searches to extract viral signatures from coassembled metagenomes and 121 metatranscriptomes, identifying a total of 13,026 manually curated viral sequences dereplicated 122 at 95% identity within mat samples (approximating viral populations; vOTUs; 11,020 non-123 redundant vOTUs dereplicated across all samples; vMAT database). Recovered viral sequences 124 included dsDNA, ssDNA, and dsRNA viruses, but primarily dsDNA viral sequences were 125 recovered (Fig 1). A genomic protein-sharing network was constructed among viral sequences 126 reconstructed in this study and viral sequences present in RefSeq 88 , and integrated with 127 taxonomic predictions from trained convolutional-neural networks against the Caudovirales 128 database to assign putative viral taxonomy, and explore viral diversity from reference viral 129 sequences. Identified vOTUs could be assigned to 10 unique viral families, primarily belonging 130 to the order Caudovirales ( Fig. 1). Unsurprisingly, few viral sequences recovered in the vMAT 131 database shared significant numbers of gene regions to viral sequences found in the RefSeq 132 database, highlighting the highly novel diversity of viral sequences recovered from this system 133 ( Fig. 1b). Viral sequences assigned to the Podoviridae from convolutional-neural network 134 analysis and those that shared edges (significant gene overlap) in the gene-sharing network with 135 RefSeq Podovirdae sequence nodes displayed remarkably low edge betweenness centrality, 136 suggesting that these sequences possess highly unique or divergent gene regions from other viral 137 families (Fig. 1a). 138 Viral abundance and absolute sequence counts in metagenomes and metatranscriptomes 139 was dominated by viral sequences that could not be placed within a resolved taxonomic lineage, 140 followed by members of the Myoviridae, Siphoviridae, and Podoviridae (Fig. 1c, d). Viral 141 sequences assigned to the Cystoviridae were also abundant in mat metatranscriptomes, 142 suggesting RNA viruses are important members of cyanobacterial mat communities. Overview 143 of viral sequences, including taxonomic assignments, are provided in Dataset S1. 144 145

Cyanobacterial mat virome harbors substantial diversity unique from other environments 146
An additional genomic-based protein-sharing network was constructed among the viruses 147 reconstructed from this study (vMAT), the viruses reconstructed from the Global Ocean Virome 148 2.0 (GOV2), and the viruses reconstructed from permafrost soil (vFROST) to contextualize the 149 diversity uncovered in this dataset with other relevant publicly available environmental viral 150 databases (Fig. 2a). Viruses recovered from the cyanobacterial mats sampled here displayed 151 remarkable diversity from the sequences in these previously characterized environmental viral 152 databases. Out of a total of 14,158 unique clusters (approximating genus-level relatedness), only 153 11 were shared among all datasets (Fig. 2b). As expected, viruses recovered from marine 154 cyanobacterial mats (vMAT) shared fewer proteins and formed fewer unique clusters with 155 permafrost viruses than with pelagic marine viruses (Fig. 2), likely owing to strong differences in 156 abiotic filters between permafrost and benthic marine environments. The broad coverage of edge 157 space across the diversity of the GOV2.0 dataset nodes suggests a broad commonality of gene 158 regions across viruses in this dataset, with restricted distribution among viral sequences isolated 159 from cyanobacterial mats, further highlighting the immense genetic diversity found in this 160 system. More viral sequences were classified as singletons (n = 6,401) than those that clustered (n = 4,177), and the majority of those that clustered formed clusters exclusively with other 162 cyanobacterial mat viral sequences (Fig. 2b). Taken together, these data reveal a high degree of 163 endemism in mat community viruses, distinct even from viruses in surrounding seawater. Our finding of such strong evidence for relatively systemic top-down control across host 268 taxa in cyanobacterial mats is surprising (Fig. 4, 5). Previous assessments of fast-growing, high-269 density microbial systems have suggested an overwhelming predominance of lysogenic 270 interactions over lytic interactions (9, 26, 60), with few exceptions (61-63), which support our 271 initial hypothesis of a weak top-down signal in similarly fast-growing high-density 272 cyanobacterial mat communities. Indeed, aquatic benthic systems in general have long been 273 thought to have generally low rates of bacterial mortality from infecting viruses, despite high 274 mortality in surrounding water column communities (64, 65). While we did not explicitly interrogate ratios of lysogeny vs. lysis in this system, the VMRs and VMARs documented herein 276 would be highly unlikely in a system dominated by lysogeny, as we would expect VMR and (to a 277 lesser extent) VMAR to be closer to 1:1 (60). Additionally, the shape of the relationships 278 between log-log viral host abundance/expression were both upward curved on traditional axes, 279 pointing to increasing viral control as host abundance and activity increase (Fig. 4, 5). Many 280 phyla-specific and class-specific prey abundance gradients span ~1 or less order of magnitude, 281 which unfortunately precluded us from meaningfully resolving phyla-specific host-viral power 282 law behavior, or from effectively interrogating the relationship between host abundance and 283 VM{A}R (i.e. [x/y]/y), which would be useful for further analyzing the relationship between 284 viral ecology and host abundance in this system. Taken together, these results highlight 1) the 285 dire need to incorporate patterns from diverse contemporaneously unexplored microbial systems 286 into theory and quantitative frameworks linking describing viral influence on microbial 287 communities, and 2) the need to understand how biotic and abiotic context can drive fine-scale 288 nuance in virus-microbe trophic interactions and contribute to patterns that depart from, and 289 overwhelmingly subvert, general predictions from theory (53). 290 Viral sequences recovered in this study were highly divergent from reference viruses, and 291 previously curated viruses from seawater and permafrost (Fig. 1, 2). Strong endemism of benthic 292 viruses has previously been demonstrated in deep sea sediment viral communities (63), and 293 suggest that marine benthic viromes may generally harbor strongly unique viral communities 294 from previously explored environments. Many of the viral sequences reconstructed herein likely 295 belong to novel taxonomic groups with no current representatives in viral databases, as a 296 significant proportion of vOTUs could not be assigned to a taxonomic family (Fig. 1). Large temporal patterns across host phyla suggests that the strongest temporal signal in VMAR is 318 within the phyla Cyanobacteria, with relative temporal consistency across other phyla (Fig S5).

Overview of sampling, sequencing, quality control, and metagenome assembly 345
Sampling strategy, library preparation, sequencing, data quality control, metagenome 346 assembly, metagenome binning, and MAG taxonomic placement are described in detail

Metatranscriptome quality control and assembly 373
Raw cDNA reads were quality trimmed to a Phred quality score threshold of 20, 374 minimum length threshold of 50bp, and had adapters removed using TrimGalore as previously 375 'good quality' is inherently difficult to define in complex mixed-assemblage reference-free de-381 novo metatranscriptomic assemblies, overall quality of metatranscriptome coassemblies was 382 assessed using multiple quantitative assembly summary statistics generated using METAQUAST 383 (Table S5;  ssDNA, and RNA viral sequences were clustered at 95% minimum sequence identity within 406 coassembly using CD-HIT-EST ver. 4.7 (86), creating non-redundant 95% databases with vOTUs 407 that represent a combination of free viruses, integrated prophages, and actively infecting viruses 408 (vMAT database). Clustered vOTUs were further validated using non-reference-based HMM 409 similarity searches implemented in VIBRANT ver. 1.2.0 (87) using the -virome flag, with a mean 410 74% ± 1.6% SD and 21% ± 9.0% SD of all DNA and RNA curated viral contigs validated, 411 respectively (Dataset S1). It should be noted that VIBRANT performs relatively poorly predictor and response variables. The assumption of homoskedasticity was still violated 506 following weighted regression, however WLS coefficient estimates remain robust against 507 heteroskedasticity (unbiased estimators). Error structures presented herein are from the fit WLS 508 models and should be interpreted carefully. To better understand how VMR varies by host 509 taxonomic affiliation, we fit random-intercept linear mixed effects models (lme4::lmer) using 510 Restricted Maximum Likelihood (REML) to log 10 transformed VMR (log 10 [P/H]) against a fixed 511 effect of either phyla or class (below class taxonomy was generally not informative in these 512 MAGs with few exceptions) where the intercept varies by spatially distinct mat. Log 10 513 transformed relative abundances (CPM) were preferred in these analyses to centered log ratio or 514 isometric log ratio transformed counts to facilitate easier data integration and comparison with 515 preexisting literature values (9, 60, 99). 516 To complement patterns characterized using DNA-based evidence, expressed viral 517 activity was assessed from RNA recruitment against gene-level features. To explore the 518 relationship between VMAR and VMR, a WLS model was fit to log 10 transformed VMAR (from 519 TPM normalized viral / host expression relationships) against log 10 transformed VMR 520 (calculated as above) with weighting structure defined as above to extract goodness of fit (r 2 ) and 521 significance of relationship (from p). Spatial variability was packaged into the random effects 522 structure of an additional random-slopes linear mixed effects model fit using REML with an 523 identical fixed effects structure to the OLS model while allowing for the intercept and slope to 524 vary across spatially distinct mats with respect to logVMR. There was strong concordance in 525 fixed effect coefficient estimations between the OLS and mixed effects model. An OLS 526 regression was fit to log 10 transformed viral and host RNA abundances normalized to TPM to explore power-law scaling among viral and host activity. To better understand temporal and host 528 phyla-specific patterns in VMAR, a linear mixed effects model was fit to log 10 transformed 529 VMAR with both sampling time point and host phyla fit as fixed predictors where both the slope 530 and intercept were allowed to vary by sampled mat with respect to host phylum. Further, activity 531 data were subset to only include putative Cyanophage-Cyanobacteria interactions, and temporal 532 patterns from this subset were interrogated in a linear mixed effects model fit to log 10 533 transformed VMAR with sampling time point as the fixed predictor and random intercepts fit to 534 sampled mat. Model assumptions were assessed graphically as above. demonstrating an increasingly top heavy trophic structure as host abundance increases. 820