ABSTRACT
Although microbial interactions underpin ocean ecosystem functions, they remain barely known. Different studies have analyzed microbial interactions using static association networks based on omics data. However, microbial associations are dynamic and can change across physicochemical gradients and spatial scales, which needs to be considered to understand the ocean ecosystem better. We explored associations between archaea, bacteria, and picoeukaryotes along the water column from the surface to the deep ocean across the northern subtropical to the southern temperate ocean and the Mediterranean Sea by defining sample-specific subnetworks. Quantifying spatial association recurrence, we found the lowest fraction of global associations in the bathypelagic zone, while associations endemic to certain regions increased with depth. Overall, our results highlight the need to study the dynamic nature of plankton networks, and our approach represents a step towards a better comprehension of the biogeography of microbial interactions across ocean regions and depth layers.
INTRODUCTION
Microorganisms play fundamental roles in ecosystem functioning (DeLong, 2009; Krabberød et al., 2017) and ocean biogeochemical cycling (Falkowski et al., 2008). The main processes shaping microbial community composition are selection, dispersal, and drift (Vellend, 2020). Selection, exerted via environmental conditions and biotic interactions, is essential in structuring the ocean microbiome (Logares et al., 2020), leading to heterogeneities reflecting those in the ocean environment, mainly in terms of temperature, light, pressure, nutrients and salinity. In particular, global-scale studies of the surface ocean reported strong associations of microbial community composition and diversity with temperature (Sunagawa et al., 2015; Ibarbalz et al., 2019; Salazar et al., 2019; Logares et al., 2020). Marked changes in microbial communities with ocean depth have also been reported (Cram et al., 2015; Parada & Fuhrman, 2017; Mestre et al., 2018; Peoples et al., 2018; Xu et al., 2018; Giner et al., 2020), reflecting the steep vertical gradients in light, temperature, nutrients and pressure.
Prokaryotes (bacteria and archaea) and unicellular eukaryotes are fundamentally different in terms of ecological roles, functional versatility, and evolutionary history (Massana & Logares, 2013) and are connected through biogeochemical and food web interaction networks (Layeghifard et al., 2017; Seymour et al., 2017). Still, knowledge about these interactions remains limited despite their importance for better understanding microbial life in the oceans (Krabberød et al., 2017; Bjorbækmo et al., 2019). Such interactions are very difficult to resolve experimentally, mainly because most microorganisms are hard to cultivate (Baldauf, 2008; Lewis et al., 2020) and synthetic laboratory communities are unlikely to mirror the complexity of wild communities. However, metabarcoding approaches to identify and quantify marine microbial taxa allow the inference of association networks, where nodes represent microorganisms and edges represent potential interactions.
Association networks provide a general overview of the microbial ecosystem aggregated over a given period of time (Steele et al., 2011; Chow et al., 2013, 2014; Cram et al., 2015; Needham et al., 2017; Parada & Fuhrman, 2017) or through space (Lima-Mendez et al., 2015; Milici et al., 2016; Chaffron et al., 2020). Previous work characterized potential marine microbial interactions, including associations within and across depths. For example, monthly sampling allowed the investigation of prokaryotic associations in the San Pedro Channel, off the coast of Los Angeles, California, covering the water column from the surface (5 m) to the seafloor (890 m) (Cram et al., 2015; Parada & Fuhrman, 2017). Furthermore, a global spatial survey conducted within the TARA Oceans expedition allowed the investigation of planktonic associations between a range of organismal size fractions in the epipelagic zone, from pole to pole (Lima-Mendez et al., 2015; Chaffron et al., 2020). However, these studies did not include the bathypelagic realm, below 1000 m depth, which represents the largest microbial habitat in the biosphere (Arístegui et al., 2009).
A single static network determined from spatially distributed samples over the global ocean captures global, regional and local associations. Also, given that global-ocean expeditions collect samples over several months, networks could include temporal associations, yet disentangling them from spatial associations is normally complicated and not considered. Global associations may constitute the core interactome, that is, the set of microbial interactions essential for the functioning of the ocean ecosystem (Shade & Handelsman, 2012). Core associations may be detected by constructing a single network from numerous locations and identifying the most significant and strongest associations (Coutinho et al., 2015). On the other hand, regional and local associations may point to interactions occurring in specific spatial areas of different sizes due to particular taxa distributions resulting from environmental selection, dispersal limitation, specific ecological niches or biotic/abiotic filtering. The fraction of regional associations may be determined by excluding all samples belonging to one region and recomputing network inference with the reduced dataset (Lima-Mendez et al., 2015). Alternatively, regional networks can be built, which allows both global and regional associations to be determined (Mandakovic et al., 2018) by investigating which edges the networks have in common and which are unique. Such regional networks could contribute to understanding how the architecture of potential microbial interactions changes with environmental heterogeneity, also helping to comprehend associations that are stable (i.e., two partners always together) or variable (one partner able to interact with multiple partners across locations).
Regional networks, however, require a high number of samples per delineated zone, but these may not be available due to logistic or budgetary limitations. Recent approaches circumvent this limitation by deriving sample-specific subnetworks from a single static, i.e., all-sample network, which allows quantifying association recurrence over spatiotemporal scales (Chaffron et al., 2020; Deutschmann et al., 2021). Here, we adjusted this approach and used it to determine global and regional associations along vertical and horizontal ocean scales, which allowed us to determine the biogeography of marine microbial associations. We analyzed associations between archaea, bacteria, and picoeukaryotes covering the water column, from surface to deep waters, in the Mediterranean Sea (hereafter MS) and five ocean basins: North and South Atlantic Ocean, North and South Pacific Ocean, and Indian Ocean (hereafter NAO, SAO, NPO, SPO, and IO). We estimated microbial taxa abundances using 397 globally distributed samples from the epipelagic to the bathypelagic zone in six ocean regions (Figure 1). We separated most epipelagic samples into surface and deep-chlorophyll maximum (DCM) samples. Next, we constructed a first global network comprising 5457 nodes and 31966 edges, 30657 (95.9%) positive and 1309 (4.1%) negative. Then, we applied a filter strategy including the removal of environmentally driven edges due to nutrients (4.9% NO3−, 4.2% PO43−, 2.0% SiO2), temperature (1.9%), salinity (0.2%), and fluorescence (0.01%) (Supplementary Table 1). Altogether, our sample-specific network-based exploration allowed us to determine core associations in the global ocean and specific regions, to analyze changes in associations and network topology with depth and regions, and to investigate the vertical connectivity of planktonic associations.
RESULTS
From a global static network to sample-specific subnetworks
The resulting global static network contained 5448 nodes and 29118 edges, 28178 (96.8%) positive and 940 (3.2%) negative. It served as the underlying structure from which we generated 397 sample-specific subnetworks following three criteria. First, we required that an edge must be present in the global static network. Second, an edge can only be present within a subnetwork if both microorganisms associated with the edge have a sequence abundance above zero in the corresponding sample. Third, the associated microorganisms need to appear together (intersection) in more than 20% of the samples in which one or both appear (union) for that specific region and depth. This third condition was robust since random subsets retained most associations compared with the associations obtained when using all samples (Supplementary Figure 1). In addition to these three conditions, a node is present in a subnetwork only if it has at least one association partner. Consequently, each subnetwork is contained in the global static network.
Spatial recurrence
We determined the spatial recurrence of each association using its prevalence, computed as the fraction of subnetworks in which a given association was present across the 397 samples (Figure 2A) and within each region-depth-layer combination (Figure 2B). The global ocean surface layer (contributing 40% of samples) had more associations than the other depths (Figure 2B). Remarkably, 14971 of 18234 (82.1%) global ocean surface associations were absent from the MS. In turn, the number of surface associations was similar across ocean basins (Figure 2B).
Considering the most prevalent associations (those found in over 70% of subnetworks), we found that major vertical taxonomic patterns were conserved across regions: the epipelagic layers (surface and DCM) and the two lower layers (meso- and bathypelagic zones) were more similar to each other, respectively (Figure 3). The fraction of associations including Alphaproteobacteria was moderate to high in all zones in contrast to Cyanobacteria appearing mainly, as expected, in the epipelagic zone (Figure 3). The fraction of Dinoflagellata associations was moderate to high in the epipelagic zone and lower in the meso- and bathypelagic zones. While Dinoflagellata associations dominated most epipelagic layers, fewer were found in the MS and SAO surface and NAO DCM (Figure 3). Thaumarchaeota associations were moderate to high especially in the mesopelagic (dominant in the MS), moderate in the bathypelagic, and lower in the epipelagic zone (Figure 3). Another interesting pattern is the increase in associations including Gammaproteobacteria with depth being higher in the meso- and bathypelagic than in the epipelagic, especially in the SAO, SPO, NPO and IO.
Highly prevalent associations present across all regions are candidates to represent putative core interactions in the global ocean, which are likely to perform processes crucial for ecosystem functioning. We defined global associations as those appearing in more than 70% of subnetworks in each region. While we found several (21-26) global associations in the epi- and mesopelagic zones, no global associations were identified in the bathypelagic zone (Table 1, Supplementary Figure 2). In addition, we resolved prevalent (>50%) and low-frequency (>20%) associations. These three types of associations are distinct by definition, i.e., a global association cannot be assigned to another type. The fraction of global, prevalent, and low-frequency associations was highest in the DCM layer and lowest in the bathypelagic zone (third and fifth column in Table 1, Supplementary Figure 2B, 2D). Given that the MS bathypelagic is warmer (median temperature of 13.78°C) than the global ocean bathypelagic (median temperature between 1.4°C in SPO and 4.41°C in NAO), we calculated these associations for the global ocean only. We found slightly to moderately more global, prevalent, and low-frequency associations in the global ocean when not considering the MS (fifth to seventh row in Table 1, Supplementary Figure 2E-H).
Next, we determined regional associations within each depth layer. A regional association was defined as detected in at least one sample-specific subnetwork of one region and absent from all subnetworks of the other five regions. Results indicated an increasing proportion of regional associations with depth (Table 1, Figure 4A-B, Supplementary Figure 3). We found substantially more associations in the DCM and mesopelagic layers of the MS than corresponding layers of the global ocean. This may reflect the different characteristics of these layers in the MS vs. the global ocean or the massive differences in spatial dimensions between the global ocean and the MS. More surface and bathypelagic regional associations corresponded to the MS and NAO than in other regions (Table 1). Most regional associations had low prevalence, i.e., they were present in a few sample-specific subnetworks within the region (Figure 4C). We found 235 prokaryotic highly prevalent (>70%) regional associations in contrast to 89 eukaryotic and 24 associations between domains (Supplementary Material 1).
Previous studies have found substantial vertical connectivity in the ocean microbiota, with surface microorganisms having an impact on their deep-sea counterparts (Mestre et al., 2018; Ruiz-González et al., 2020). Thus, here, we analyzed the vertical connectivity of microbial associations. Few associations appeared throughout the water column within a region: 327 prokaryotic, 119 eukaryotic, and 13 associations between domains (Supplementary Material 2). In general, most associations appearing in the meso- and bathypelagic did not appear in upper layers, except for the MS and NAO, where most and about half, respectively, of the bathypelagic associations already appeared in the mesopelagic (Figure 5). Specifically, 81.77-90.90% of mesopelagic and 43.54-72.71% of bathypelagic associations appeared for the first time in the five ocean basins (Supplementary Table 2). In the MS, 71.24% of mesopelagic and 22.44% of bathypelagic associations appeared for the first time, and 69.71% of bathypelagic associations already appeared in the mesopelagic (Supplementary Table 2). This points to specific microbial interactions occurring in the deep ocean that do not occur in upper layers. In addition, while most surface associations also appeared in the DCM in the MS, most surface associations disappeared with depth in the five ocean basins (Figure 5), suggesting that most surface ocean associations are not transferred to the deep sea, despite microbial sinking (Mestre et al., 2018). In fact, we observed that most deep ocean ASVs already appeared in the upper layers (Supplementary Figure 4), in agreement with previous work showing that a large proportion of deep-sea microbial taxa are also found in surface waters, and that their presence in the deep sea is related to sinking processes (Mestre et al., 2018).
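The "appeared for the first time" percentages can be illustrated with a short Python sketch. It is a minimal illustration rather than the code used in the study: it assumes that an association counts as present in a layer of a region if it occurs in at least one subnetwork of that layer, and the names `LAYERS` and `first_appearance_fraction` are hypothetical.

```python
# Depth layers ordered from shallow to deep (ordering taken from the text).
LAYERS = ["surface", "DCM", "mesopelagic", "bathypelagic"]


def first_appearance_fraction(edges_by_layer, layers=LAYERS):
    """For each layer, return the fraction of its associations that do not
    occur in any shallower layer of the same region.

    edges_by_layer: dict mapping layer name -> set of associations
    (assumption: pooled over all subnetworks of that region and layer).
    """
    seen = set()      # associations observed in shallower layers so far
    result = {}
    for layer in layers:
        current = edges_by_layer.get(layer, set())
        new = current - seen
        result[layer] = len(new) / len(current) if current else 0.0
        seen |= current
    return result


if __name__ == "__main__":
    toy_region = {
        "surface": {"a-b", "b-c"},
        "DCM": {"b-c", "c-d"},
        "mesopelagic": {"c-d", "e-f", "f-g", "g-h"},
        "bathypelagic": {"e-f", "x-y"},
    }
    print(first_appearance_fraction(toy_region))
```

In the toy region, three of the four mesopelagic associations are new with respect to the surface and DCM, which corresponds to the kind of fraction reported in Supplementary Table 2.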
Comparing subnetworks
Vertical and horizontal spatial variability is expected to affect network topology via biotic and abiotic variables as well as through dispersal processes (e.g., dispersal limitation). Yet, we have a limited understanding of how much marine microbial networks change due to these processes; thus, analyzing the topology of subnetworks from specific ocean regions and depths is a first step to address this question. We compared the subnetworks of the six regions and depth layers using eight global network metrics (see Methods). We found that global network metrics change along the water column (Supplementary Figure 5). As a general trend, subnetworks from deeper zones were more clustered (transitivity), with higher average path length, stronger associations (average positive association scores) and lower assortativity (based on degree) compared to those in surface waters. Most DCM and bathypelagic subnetworks had the highest connectivity (edge density). In contrast, in the MS, the surface subnetworks had the highest connectivity (Supplementary Figure 5).
To avoid predefined groupings into regions and depth layers, we grouped similar subnetworks via a local network metric (see Methods) and identified 36 clusters of 5 to 28 subnetworks (Supplementary Table 3). We found 13 (36.1%) clusters that were dominated by surface subnetworks: six clusters (100% surface subnetworks) from three to five oceans but not MS and seven clusters with 55-86% surface networks from two to five of the six ocean regions. In turn, 11 clusters were dominated by a deeper layer: two DCM (64-90%), five mesopelagic (62-83%) and four bathypelagic dominated clusters (60-69%). Nine of these 11 clusters combined different regions except for one mesopelagic and one bathypelagic dominated cluster representing exclusively the MS (Supplementary Table 3). Furthermore, we found 11 clusters containing exclusively or mainly MS subnetworks in contrast to only one cluster dominated by an ocean basin (NAO).
Next, we built a more comprehensive representation of network similarities between subnetworks via a minimal spanning tree (MST, see Methods) to underline the pervasive connectivity of associations across depth and environmental gradients. The depth layers, ocean regions, location of clusters, and environmental factors were projected onto the MST (Figure 6). Most surface subnetworks were centrally located, while subnetworks from other depths appeared in different MST areas. Most MS subnetworks were located in a specific branch of the MST, while the five oceans were mixed, indicating homogeneity within oceans but network-based differences between the oceans and the MS. However, subnetworks in the MST tended to connect to subnetworks from the same depth layer, cluster or similar environmental conditions. All in all, the above results suggest a strong influence of environmental gradients in shaping network topology and plankton associations, as previously observed in epipelagic communities at global scale (Chaffron et al., 2020).
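To illustrate how a minimal spanning tree links similar subnetworks, the following Python sketch builds one with Kruskal's algorithm. The dissimilarity used here (Jaccard distance between edge sets) is an assumption made for illustration only; the measure actually used is described in the Methods and may differ. The function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(edges_a, edges_b):
    """Assumed dissimilarity: 1 - |intersection| / |union| of two edge sets."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)


def minimal_spanning_tree(subnetworks):
    """Kruskal's algorithm over all pairs of subnetworks.

    subnetworks: dict mapping subnetwork name -> set of edges.
    Returns a list of (distance, name_u, name_v) tree links.
    """
    names = sorted(subnetworks)
    parent = {n: n for n in names}          # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    candidates = sorted(
        (jaccard_distance(subnetworks[u], subnetworks[v]), u, v)
        for u, v in combinations(names, 2)
    )
    tree = []
    for dist, u, v in candidates:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # link only if no cycle is created
            parent[root_u] = root_v
            tree.append((dist, u, v))
    return tree


if __name__ == "__main__":
    toy = {"s1": {1, 2, 3}, "s2": {1, 2, 3, 4}, "s3": {7, 8}}
    print(minimal_spanning_tree(toy))
```

Metadata such as depth layer or temperature can then be attached to the tree nodes to inspect whether neighbouring subnetworks share similar conditions.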
DISCUSSION
In this work, we disentangled and analyzed global and regional microbial associations across the oceans’ vertical and horizontal dimensions. We found a low number of global associations, indicating a potentially small global core interactome within each depth layer across six oceanic regions. Core microorganisms are often defined as those appearing in most or all samples from similar habitats (Shade & Handelsman, 2012). We previously identified a core microbiota in a coastal MS observatory based on both association patterns (Krabberød et al., 2021) and temporal recurrence of associations (Deutschmann et al., 2021). Both studies indicate more robust microbial connectivity, suggesting a broader core, in colder than in warmer seasons. In contrast, within each region, we found fewer highly prevalent associations in the bathypelagic zone of the global ocean (pointing to a smaller regional core) than in the upper layers, except for the NPO, which had fewer highly prevalent associations in the meso- than in the bathypelagic. In agreement, we found more regional associations in the bathypelagic than in upper layers. Thus, associations may reflect the heterogeneity and isolation of the deep ocean regions due to deep currents, water masses, or the topography of the seafloor that may prevent microbial dispersal. Moreover, the higher complexity of the deep ocean ecosystem may provide a higher number of ecological niches, potentially resulting in more regional associations and agreeing with our observations. A high diversification of niches may be associated with the different quality and types (labile, recalcitrant, etc.) of organic matter reaching the deep ocean from the epipelagic zone (Arístegui et al., 2009), which is significantly different across oceanic regions (Hansell & Carlson, 1998).
In an exploration of generalist versus specialist prokaryotic metagenome-assembled genomes (MAGs) in the Arctic Ocean, most of the specialists were linked to mesopelagic samples, indicating that their distribution was uneven across depth layers (Royo-Llonch et al., 2020). This is in agreement with putatively more niches in the deep ocean than in upper ocean layers, leading to more specialist taxa and subsequently more regional associations.
Vertical connectivity in the ocean microbiome is partially modulated by surface productivity through sinking particles (Mestre et al., 2018; Boeuf et al., 2019; Ruiz-González et al., 2020). An analysis of eight stations, distributed across the Atlantic, Pacific and Indian oceans (including 4 depths: surface, DCM, meso- and bathypelagic), indicated that bathypelagic communities comprise both endemic taxa and surface-related taxa arriving via sinking particles (Mestre et al., 2018). Ruiz-González et al. (2020) identified the dominating phylogenetic groups for both components (i.e., surface-related and deep-endemic): while Thaumarchaeota, Deltaproteobacteria, OM190 (Planctomycetes) and Planctomycetacia (Planctomycetes) dominated the endemic bathypelagic communities, Actinobacteria, Alphaproteobacteria, Gammaproteobacteria and Flavobacteriia (Bacteroidetes) dominated the surface-related taxa in the bathypelagic zone. We found association partners for each dominating phylogenetic group within each investigated type of association, i.e., highly prevalent, regional, global, prevalent, and low-frequency associations. While ASVs belonging to these taxonomic groups were present throughout the water column, specific associations were observed especially in the mesopelagic and bathypelagic zones, which suggests specific associations between deep-sea endemic taxa. This is in agreement with a recent study that found remarkable taxonomic novelty in the deep ocean after analyzing 58 microbial metagenomes from global samples, unveiling ∼68% archaeal and ∼58% bacterial novel species (Acinas et al., 2021). Less is known about associations found along the entire or a substantial fraction of the water column, suggesting consortia of associated microorganisms that sink together or that populate large vertical ranges of the water column.
Associations present across all layers were few but may represent interacting taxa that populate the entire water column or that sink together. However, given that we targeted mainly picoplankton, we would not expect a considerable influence of sinking particles on the vertical distribution of associations in this study. Some associations observed in the deep ocean may correspond to consortia of taxa degrading sinking particles, or taxa that might have detached from sinking particles, i.e., dual life-style taxa as observed in (Sebastián, Sánchez, et al., 2021). Alternatively, microorganisms may have reached bathypelagic waters via fast-sinking processes, embedded in (larger) particles (Agusti et al., 2015). In line with this observation, a previous study found that the abundances of microorganisms in deeper layers mirrored the changes in abundance of microorganisms in shallower layers at a single sampling station, indicating that communities populating different ocean depths are not isolated from each other but linked, possibly through sinking particles or migrating organisms transporting nutrients through the water column (Cram et al., 2015). However, microbial co-occurrence alone does not suffice to infer microbial interactions, because different mechanisms, such as selection or dispersal, influence species as well as their interactions (Poisot et al., 2012). Our results suggest that microorganisms can potentially change their interaction partners along vertical (and horizontal) scales and, to a lesser extent, maintain interactions along the water column.
A study of global-ocean picoplanktonic eukaryotes through the water column (from the epi- to the bathypelagic zone) found the highest and lowest relative metabolic activity for most eukaryotes in the meso- and bathypelagic zones, respectively (Giner et al., 2020). Thus, we could hypothesize more competition in the mesopelagic zone and more beneficial interactions in the bathypelagic zone. In our study, mesopelagic subnetworks displayed the lowest connectivity in most regions on average, and we found the strongest associations among both meso- and bathypelagic subnetworks. Moreover, we found the highest clustering (transitivity) in the meso- and bathypelagic zones (relatively colder waters) compared to the epipelagic zone (warmer waters). Similarly, a previous global-scale study (Chaffron et al., 2020) concentrating on the epipelagic zone and including polar waters found higher edge density, association strength and clustering in polar (colder) waters compared to warmer waters. These results suggest that either microorganisms interact more in colder and darker environments or that their recurrence is higher due to a higher environmental selection exerted by low temperatures and no light. Alternatively, limited resources (primarily nutrients) in the surface versus deep ocean may prevent the establishment of specific microbial interactions. Furthermore, another explanation could be the higher diversity of ecological niches and, thus, a higher diversity of associations in the meso- and bathypelagic.
Through quantifying regional associations, our results indicated distinct associations in the MS, where most regional associations were observed compared to the global ocean, as previously shown in an epipelagic network (Lima-Mendez et al., 2015). Furthermore, we found a substantial number of regional associations in the NAO compared to other ocean basins, contrasting with the NAO having the lowest number of regional associations in a previous epipelagic network (Lima-Mendez et al., 2015).
To conclude, our network-based exploration disentangles the spatial distribution of associations of the global ocean microbiome, from top to bottom layers, suggesting both global and regional interactions. Our analysis demonstrated the change of network topology across vertical (water column) and horizontal (different regions) dimensions of the ocean. Furthermore, our results indicate that associations have specific spatial distributions that are not just mirroring ASV distributions.
METHODS
Dataset
Samples originated from two expeditions, Malaspina-2010 (Duarte, 2015) and Hotmix (Martínez-Pérez et al., 2017). The former was conducted onboard the R/V Hespérides, and most ocean basins were sampled between December 2010 and July 2011. Malaspina samples included i) MalaSurf, surface samples (Ruiz-González et al., 2019; Logares et al., 2020), ii) MalaVP, vertical profiles (Giner et al., 2020), and iii) MalaDeep, deep-sea samples (Pernice et al., 2016; Salazar et al., 2016; Sanz-Sáez, 2021). For the Hotmix expedition, sampling took place onboard the R/V Sarmiento de Gamboa between 27th April and 29th May 2014 and represented a quasi-synoptic transect across the MS and the adjacent North-East of the NAO. See details in Table 2.
DNA extractions are indicated in the papers associated with each dataset (Table 2). From the DNA extractions, the 16S and 18S rRNA genes were amplified and sequenced. PCR amplification and sequencing of MalaSurf, MalaVP (18S), and Hotmix (16S) are indicated in the papers associated with each dataset in Table 2. MalaVP (16S) and Hotmix (18S) were PCR-amplified and sequenced following the same approach as in (Logares et al., 2020). MalaDeep samples were obtained from (Pernice et al., 2016; Salazar et al., 2016) but re-sequenced in Genoscope (France) with different primers, as described below. MalaSurf, MalaVP and Hotmix datasets were sequenced at RTL Genomics (Texas, USA).
We used the same amplification primers for all samples. For the 16S, we amplified the V4-V5 hypervariable region using the primers 515F-Y and 926R (Parada et al., 2016). For the 18S, we amplified the V4 hypervariable region with the primers TAReukFWD1 and TAReukREV3 (Stoeck et al., 2010). See more details in (Logares et al., 2020). Amplicons were sequenced on Illumina MiSeq or HiSeq2500 platforms (2×250 or 2×300 bp reads). Operational Taxonomic Units were delineated as Amplicon Sequence Variants (ASVs) using DADA2 (Callahan et al., 2016), running each dataset separately before merging the results. ASVs were assigned taxonomy using SILVA (Quast et al., 2012), v132, for prokaryotes, and PR2 (Guillou et al., 2012), v4.11.1, for eukaryotes. ASVs corresponding to Plastids, Mitochondria, Metazoa, and Plantae were removed. Only samples with at least 2000 reads were kept. The dataset contained several MalaDeep replicates, which we merged, and two filter sizes: given the cell sizes of prokaryotes versus microeukaryotes, we selected the smallest available filter size (0.2-0.8 µm) for prokaryotes and the larger one (0.8-20 µm) for microeukaryotes. The other three datasets used filter sizes of 0.2-3 µm. Additionally, we required that samples had eukaryotic and prokaryotic data, resulting in 397 samples for downstream analysis: 122 MalaSurf, 83 MalaVP, 13 MalaDeep, and 179 Hotmix. We separated the samples into epipelagic, mesopelagic and bathypelagic zones (Figure 1). Furthermore, we separated most epipelagic samples into surface and deep-chlorophyll maximum (DCM) samples, but 18 MS and 4 NAO samples belonged to neither. We also considered environmental variables: temperature (2 missing values, hereafter mv), salinity (2 mv), fluorescence (3 mv), and the inorganic nutrients NO3− (36 mv), PO43− (38 mv), and SiO2 (37 mv), which were measured as indicated elsewhere (Giner et al., 2020; Logares et al., 2020; Sebastián, Ortega-Retuerta, et al., 2021). In specific samples, missing data on nutrient concentrations were estimated from the World Ocean Database (Boyer et al., 2013).
Single static network
We constructed the single static network in four steps. First, we prepared the data for network construction. We excluded rare microorganisms by keeping ASVs with a sequence abundance sum above 100 reads and appearing in at least 20 samples (>5%). The latter condition removes bigger eukaryotes only appearing in the 13 MalaDeep eukaryotic samples of a bigger size fraction. To control for data compositionality (Gloor et al., 2017), we applied a centered-log-ratio transformation separately to the prokaryotic and eukaryotic tables before merging them.
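The data preparation described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, and the pseudocount added before taking logarithms is an assumption, since the text does not state how zeros were handled in the transformation.

```python
import numpy as np


def filter_rare_asvs(counts, min_reads=100, min_samples=20):
    """Keep ASVs with a read sum above `min_reads` that appear in at least
    `min_samples` samples.

    counts: 2D array of raw read counts, rows = ASVs, columns = samples.
    Returns the filtered table and the boolean mask of kept rows.
    """
    total_reads = counts.sum(axis=1)
    n_present = (counts > 0).sum(axis=1)
    keep = (total_reads > min_reads) & (n_present >= min_samples)
    return counts[keep], keep


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transformation applied per sample (column).

    The pseudocount is an assumption to avoid log(0).
    """
    logged = np.log(counts + pseudocount)
    return logged - logged.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prokaryotes = rng.poisson(30, size=(6, 25))   # toy tables
    eukaryotes = rng.poisson(30, size=(4, 25))
    # Transform each domain separately, then merge the tables.
    merged = np.vstack([clr(filter_rare_asvs(prokaryotes)[0]),
                        clr(filter_rare_asvs(eukaryotes)[0])])
    print(merged.shape)
```

Transforming the two tables separately before stacking them mirrors the order of operations given in the text, so that each domain is centered on its own geometric mean.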
Second, we inferred a (preliminary) network using FlashWeave (Tackmann et al., 2019), selecting the options “heterogeneous” and “sensitive”. FlashWeave was chosen as it can handle sparse datasets like ours, taking zeros into account and avoiding spurious correlations between ASVs that share many zeros.
Third, we aimed to remove environmentally driven edges. FlashWeave can detect indirect edges and allows supplying additional metadata such as environmental variables, but it currently does not support missing data. Thus, we applied EnDED (Deutschmann et al., 2020), combining the methods Interaction Information (with 0.05 significance threshold and 10000 iterations) and Data Processing Inequality, as done previously, via artificially inserted edges that connect all microbial nodes to the six environmental parameters (Deutschmann et al., 2021). Although EnDED can handle missing environmental data when calculating intermediate values relating ASVs and environmental factors, it would compute intermediate values for microbial edges using all samples. Thus, to avoid a possible bias and speed up the calculation process, we applied EnDED individually for each environmental factor, using only the samples containing values for the specific environmental factor.
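The per-factor sample selection can be sketched as follows. This Python fragment only illustrates how the sample subsets could be prepared before running the environmental filter once per factor; it does not call EnDED, whose interface is not described here, and the data layout and the function name `samples_with_values` are assumptions.

```python
import math


def samples_with_values(env_table, factor):
    """Return the IDs of samples with a non-missing value for `factor`.

    env_table: dict mapping sample ID -> dict of factor name -> value,
    where a missing measurement is either absent, None, or NaN (assumed layout).
    """
    selected = []
    for sample_id, measurements in env_table.items():
        value = measurements.get(factor)
        if value is None or math.isnan(value):
            continue
        selected.append(sample_id)
    return selected


if __name__ == "__main__":
    env = {
        "s1": {"temperature": 18.2, "NO3": 0.4},
        "s2": {"temperature": 3.1, "NO3": None},
        "s3": {"temperature": float("nan"), "NO3": 22.0},
    }
    # One subset per factor; each subset would feed a separate filter run.
    for factor in ("temperature", "NO3"):
        print(factor, samples_with_values(env, factor))
```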
Fourth, we removed isolated nodes, i.e., nodes without any edge. The resulting network represented the single static network in our study.
Sample-specific subnetwork
We constructed 397 sample-specific subnetworks. Each subnetwork represented one sample and was derived from the single static network, i.e., a subnetwork contained nodes and edges present in the single static network but not vice versa. Consider sample sRL with R being the marine region, and L the sample’s depth layer. Let e be an association between microorganisms A and B. Then, association e is present in the sample-specific subnetwork Ns, if
i) e is an association in the single static network,
ii) the microorganisms A and B are present within sample s, i.e., the abundances are above zero within that particular sample, and
iii) the association has a region and depth specific Jaccard index, JRL, above 20% (see below).
In addition to these three conditions, a node is present in a sample-specific subnetwork when connected to at least one edge, i.e., we removed isolated nodes.
Regarding the third condition, we determined JRL for each association by computing, within each region and depth layer, the number of samples in which both microorganisms appeared together (intersection) divided by the number of samples in which at least one of them appeared (union). Supplementary Table 4 shows the number of edges retained under different thresholds. Given the heterogeneity of the dataset within regions and depth layers, we decided to use a low threshold, keeping edges with a Jaccard index above 20% and removing edges with an index below or equal to 20%. We tested robustness by randomly drawing a subset of samples from each region and depth combination. The subset contained between 10% and 90% of the original samples. We rounded up decimal numbers to avoid empty subsets, e.g., 10% of 7 samples results in a subset of 1 sample. We excluded the DCM of the SPO because it contained only one sample. Next, we recomputed the Jaccard index for the random subset. Lastly, requiring J>20%, we evaluated robustness by determining i) how many edges were kept in the random subset compared to all samples, and ii) how many edges kept in the random subset were also kept when all samples were used. We repeated the procedure 1000 times for each region-depth combination.
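The Jaccard index and the three conditions for keeping an edge in a sample-specific subnetwork can be sketched as below. This is an illustrative outline under assumed data structures (lists of abundances per region-layer, a set of static edges, a dictionary of abundances for one sample); the function names are hypothetical and do not come from the published code.

```python
def jaccard_index(abund_a, abund_b):
    """Samples with both ASVs present divided by samples with at least one present.

    abund_a and abund_b hold the abundances of two ASVs across the samples of
    one region and depth layer, in the same sample order.
    """
    both = sum(1 for a, b in zip(abund_a, abund_b) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(abund_a, abund_b) if a > 0 or b > 0)
    return both / either if either else 0.0

def edge_in_subnetwork(edge, static_edges, sample_abund, jaccard_rl, threshold=0.2):
    """Apply the three conditions: static edge, both ASVs present, J strictly above threshold."""
    a, b = edge
    return (
        edge in static_edges
        and sample_abund.get(a, 0) > 0
        and sample_abund.get(b, 0) > 0
        and jaccard_rl > threshold
    )

def subset_size(n_samples, percent):
    """Number of samples in a random subset, rounded up (integer arithmetic)."""
    return (n_samples * percent + 99) // 100
```

Working with an integer percentage in subset_size avoids floating-point surprises when rounding up, and reproduces the example in the text: 10% of 7 samples gives a subset of 1 sample.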
Spatial recurrence
To determine an association’s spatial recurrence, we calculated its prevalence as the fraction of subnetworks in which the association was present. We determined association prevalence across the 397 samples and within each region-layer combination. We mapped the scores onto the single static network, visualized in Gephi (Bastian et al., 2009), v.0.9.2, using the Fruchterman Reingold Layout (Fruchterman & Reingold, 1991) with a low gravity score of 0.5. We used the region-layer prevalence to determine global and regional associations. We considered an association to be global within a specific depth layer if its prevalence was above 70% in all regions. In turn, a regional association had a prevalence above 0% within a particular region-layer (present, appearing in at least one subnetwork) and 0% within the other regions of the same layer (absent, appearing in no subnetwork). In addition, associations that are not global but have a prevalence above 50% in all regions are considered prevalent. Similarly, associations that are neither global nor prevalent but have a prevalence above 20% in all regions are considered low-frequency. Thus, an association can be classified as i) global, ii) regional, iii) prevalent, iv) low-frequency, and v) “other”, i.e., associations that have not been classified into the previous categories.
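The prevalence calculation and the classification rules above can be expressed compactly. The following is a minimal sketch with hypothetical function names and data structures (a list of edge sets for the subnetworks, a dictionary of per-region prevalences for one depth layer); it is an illustration of the stated thresholds, not the analysis code itself.

```python
def prevalence(edge, subnetworks):
    """Fraction of subnetworks (each given as a set of edges) that contain the edge."""
    return sum(1 for edges in subnetworks if edge in edges) / len(subnetworks)

def classify_association(prevalence_by_region):
    """Classify one association within one depth layer.

    prevalence_by_region maps a region name to the association's prevalence
    (a fraction between 0 and 1) in that region and layer.
    """
    values = list(prevalence_by_region.values())
    if all(v > 0.7 for v in values):
        return "global"
    if all(v > 0.5 for v in values):
        return "prevalent"
    if all(v > 0.2 for v in values):
        return "low-frequency"
    # Regional: present in exactly one region and absent from all others.
    if sum(1 for v in values if v > 0) == 1:
        return "regional"
    # In this sketch, associations absent from every region also end up here.
    return "other"
```

Checking the thresholds from strictest to loosest ensures that each association receives exactly one label, matching the "not global but prevalent" wording of the definitions.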
Global network metrics
We considered the number of nodes and edges and six other global network metrics, most of which were computed with functions of the igraph R-package (Csardi & Nepusz, 2006). Edge density, an indicator of connectivity, is the number of actual edges divided by the number of possible edges. The average path length is the average length of all shortest paths between nodes in a network. Transitivity, which indicates how well a network is clustered, is the probability that the neighbors of a node are connected. Assortativity measures whether similar nodes tend to be connected, i.e., assortativity (degree) is positive if high degree nodes tend to connect to other high degree nodes and negative otherwise. Similarly, assortativity (Euk-Prok) is positive if eukaryotes tend to connect to other eukaryotes and prokaryotes tend to connect to other prokaryotes. Lastly, we computed the average positive association strength as the mean of all positive association scores provided by FlashWeave.
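Three of these metrics are simple enough to illustrate directly. The sketch below is plain Python rather than the igraph calls used in the study; it assumes an undirected network given as a list of node pairs and computes transitivity in its global form (closed connected triples divided by all connected triples), which is an assumption about the variant meant here.

```python
from itertools import combinations

def edge_density(n_nodes, edges):
    """Number of actual edges divided by the number of possible undirected edges."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0

def transitivity(edges):
    """Global transitivity: closed connected triples divided by all connected triples."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    triples = 0
    closed = 0
    for nbrs in neighbors.values():
        # Every pair of neighbors of a node forms a connected triple centered on it.
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1
    return closed / triples if triples else 0.0

def mean_positive_strength(weights):
    """Mean of the positive association scores only."""
    positive = [w for w in weights if w > 0]
    return sum(positive) / len(positive) if positive else 0.0
```

For a triangle with one extra pendant edge, for example, five connected triples exist and three of them are closed, so the transitivity is 0.6.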
Local network metric
The previous global metrics disregard the complexity of local structures, and topological analyses should include local metrics (Espejo et al., 2020), e.g., graphlets (Pržulj et al., 2004). Here, we determined the network-dissimilarity between each pair of sample-specific subnetworks as proposed by Yaveroǧlu et al. (2014), comparing network topology without considering specific ASVs. The network-dissimilarity is a distance measurement that is never negative: it is 0 if the networks are identical, and greater values indicate greater dissimilarity.
Next, we constructed a Network Similarity Network (NSN), where each node is a subnetwork and each node connects with all other nodes, i.e., the NSN was a complete graph. We assigned the network-dissimilarity score as edge weight within the NSN. To simplify the NSN while preserving its main patterns, we determined the minimum spanning tree (MST) of the NSN. The MST had 397 nodes and 396 edges. The MST is a backbone, with no circular path, in which the edges are chosen so that the sum of the edge weights is minimal and all nodes are connected, i.e., a path exists between any two nodes. We determined the MST using the function mst in the igraph package in R (Prim, 1957; Csardi & Nepusz, 2006).
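To make the MST step concrete, the following is a minimal sketch of Prim's algorithm on a dense dissimilarity matrix. It is written in plain Python for illustration and stands in for, rather than reproduces, the igraph mst function used in the study; the function name and the list-of-lists input format are assumptions.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric dissimilarity matrix (list of lists).

    Returns the tree as a list of (i, j, weight) edges; a complete graph with
    n nodes yields n - 1 edges.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best_weight = [float("inf")] * n  # cheapest known link into the tree
    best_parent = [-1] * n            # tree node providing that link
    best_weight[0] = 0.0              # start the tree at node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest link to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_weight[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, dist[best_parent[u]][u]))
        # Update the cheapest links of the remaining nodes.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best_weight[v]:
                best_weight[v] = dist[u][v]
                best_parent[v] = u
    return edges
```

Applied to a complete graph of 397 subnetworks, such a procedure returns 396 edges, consistent with the node and edge counts reported above.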
Using the network-dissimilarity (distance) matrix, we determined clusters of similar subnetworks in Python. First, we reduced the matrix to ten dimensions using umap (McInnes et al., 2018) with the following parameter settings: n_neighbors=3, min_dist=0, n_components=10, random_state=123, and metric=‘precomputed’. Second, we clustered the subnetworks (represented via ten dimensions) with hdbscan (McInnes et al., 2017), setting the parameters to min_samples=3 and min_cluster_size=5.
Reproducibility
R-Markdowns for data analysis including commands to run FlashWeave and EnDED (environmentally-driven-edge-detection and computing Jaccard index) are publicly available: https://github.com/InaMariaDeutschmann/GlobalNetworkMalaspinaHotmix. While the networks are already available, the microbial sequence abundances (ASV table), taxonomic classifications, environmental data including nutrients will be publicly available after acceptance. The data are of course available upon request to reviewers.
Authors’ contributions
The overall project was conceived and designed by RL. JMG, CMD, SGA, RM, JA were responsible for the sampling and acquisition of contextual data. CRG, JP and MS processed specific samples in the laboratory. RL processed the amplicon data generating the two ASV tables. They were the starting point of the present study, which is part of the overall project. IMD developed the conceptual approach and DE, SC, and RL contributed to its finalization. IMD performed the data analysis. ED, MS, CMD, SGA, RM, JMG, DE, SC, and RL contributed with interpretation of the results. IMD wrote the original draft. All authors contributed to manuscript revisions and approved the final version of the manuscript.
Competing interests
The authors declare that they have no competing interests.
SUPPLEMENTARY MATERIAL
SUPPLEMENTARY FIGURES
Supplementary Figure 1: Robustness of the third condition for generating sample-specific subnetworks for each region and depth with sufficient samples (the DCM layer from the SPO was removed because it contained only one sample). Within each region and depth, the set of samples was randomly subsampled to contain between 10% and 90% of the original set. The y-axis shows the fraction of edges that were kept in the subsampled set compared to the original set using all samples. We considered A) only the number of kept edges and B) which edges were kept.
Supplementary Figure 2: Associations occurring in each region and depth layer. If an association appears in more than 20% of subnetworks in each region, it is classified as low-frequency, >50% prevalent, and >70% global. The number of samples appears in the upper left corner, the number of edges in the upper right corner, and the depth range in the lower right corner (in m below surface). We classified the associations considering all six regions (A-D) and considering the five ocean basins neglecting the MS (E-H).
Supplementary Figure 3: Regional associations occurring in each region and depth layer. Within a particular depth layer, if an association appears in at least one subnetwork (present) in one region and in no subnetwork (absent) in other regions, it is classified as regional. The four ocean layers (rows) are surface (SRF), DCM, mesopelagic (MES), and bathypelagic (BAT). The number of samples appears in the upper left corner, the number of edges in the upper right corner, and the depth range in the lower right corner (in m below surface).
Supplementary Figure 4: ASVs across depth layers. For each region, we color ASVs based on the layer they first appeared: surface (S, yellow), DCM (D, orange), mesopelagic (M, red), and bathypelagic (B, black). Absent ASVs are grouped in box “a”. An ASV only appearing in the bathypelagic, is assigned to box “a” in above layers. That is, an ASV detected in the surface and present in the DCM but absent in lower layers, appears in the box (S) in the surface and DCM layer, but in box “a” in the meso- and bathypelagic layer. An ASV cannot be assigned to two layers. Note that most ASVs in the bathypelagic zone have been already detected in upper layers because most ASVs are assigned to the boxes “S”, “D”, and “M” instead of “B”.
Supplementary Figure 5: Global network metrics grouped by region and depth layer.
SUPPLEMENTARY TABLES
SUPPLEMENTARY MATERIAL
Supplementary Material 1: Highly prevalent (>70%) regional associations. For each association between two ASVs (first and second column) we list: region (third column), depth layer (fourth column), prevalence in that region and depth layer (fifth column), type: eukaryotic (Euk_Euk), prokaryotic (Prok_Prok), and association between domains (Euk_Prok) (sixth column), and the phyla (seventh and eighth column).
Supplementary Material 2: Associations appearing in all layers in at least one region. For each association between two ASVs (first and second column) we list: the classification in each layer (columns 3-6), overall prevalence (column 8), prevalence in each region and depth layer (columns 9-34), the number of regions in which the association appeared in all layers (AllLayers, column 35), the number of layers an association appears in a region (columns 36-41), type: eukaryotic (Euk_Euk), prokaryotic (Prok_Prok), and association between domains (Euk_Prok) (column 42), and the phyla (columns 43-44).
Acknowledgements
We thank all members of the Malaspina and Hotmix expeditions with the multiple projects funding these collaborative efforts. Sampling was carried out thanks to the Consolider-Ingenio programme (project Malaspina 2010 Expedition, ref. CSD2008–00077) and HOTMIX project (CTM2011-30010/MAR), funded by the Spanish Ministry of Economy and Competitiveness Science and Innovation. Part of the analyses have been performed at the Marbits bioinformatics core at ICM-CSIC (https://marbits.icm.csic.es). This project and IMD received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 675752 (ESR2, http://www.singek.eu) to RL. RL was supported by a Ramón y Cajal fellowship (RYC-2013-12554, MINECO, Spain). This work was also supported by the projects INTERACTOMICS (CTM2015-69936-P, MINECO, Spain), MicroEcoSystems (240904, RCN, Norway) and MINIME (PID2019-105775RB-I00, AEI, Spain) to RL. SC was supported by the CNRS MITI through the interdisciplinary program Modélisation du Vivant (GOBITMAP grant). SC, DE and SGA were funded by the H2020 project AtlantECO (award number 862923).