Regional oceanographic features and hydrothermal activity influence protist diversity and biogeography in the Okinawa Trough

10 Microbial eukaryotes (protists) contribute substantially to ecological functioning in marine ecosystems, but factors shaping protist diversity, such as dispersal barriers and environmental selection, remain difficult to parse. Deep-sea water masses, which form geographic barriers, and hydrothermal vents, which represent isolated productivity hotspots, are ideal opportunities for studying the effects of dispersal barriers and 15 environmental selection on protist communities. The Okinawa Trough, a deep, back-arc spreading basin, contains distinct water masses in the bottom waters of northern and southern regions and at least twenty-five active hydrothermal vents. In this study, we used metabarcoding to characterize protist communities from fourteen stations spanning the length of the Okinawa Trough, including three hydrothermal vent sites. 20 Significant differences in community structure reflecting regional oceanography and water mass composition were present, indicating the importance of geographic factors in shaping protist communities. Protist communities in bottom waters affected by hydrothermal activity were significantly different from communities in other bottom waters, suggesting that environmental factors can be especially important in shaping 25 community composition under specific conditions. Amplicon sequence variants that were enriched in hydrothermally influenced bottom waters largely derived from cosmopolitan protists that were present, but rare, in other near-bottom samples, thus highlighting the importance of the rare biosphere.

Introduction hydrothermal vent plumes. Vertical profiles from stations 2, 10, and 11 showed nearbottom turbidity and CDOM maxima (Supplemental Figure 1A), which is typical for OT vent plumes [16,20]. In addition, nutrient analysis showed elevated NH4 concentrations in bottom water samples from stations 2 and 10, providing further evidence of hydrothermal influence [16,20] (Supplemental Figure 1B). Total carbon was only 110 measured for samples from stations 2, 10, 14, and 15; consistent with hydrothermal influence [16,20], stations 2 and 10 exhibit near bottom peaks in carbon concentrations (Supplemental Figure 1C). Stations 2, 10 and 11 are, therefore, included in analyses aimed at determining how hydrothermal activity influences protist community structure in the OT, while the other sampling stations near vent sites are not. Knoll, station 10 is at the Hatoma Knoll, station 11 is near the Dai-Yon Yonaguni Knoll, station 5 is near the Higa vent site, and station 12 is near the Iheya North vent field.

Sample collection
A Niskin rosette with 30 10-liter bottles and fitted with a conductivity-temperature-depth 130 (CTD) probe (SBE 911plus, Sea-Bird Scientific, Bellevue, WA) was deployed at each cruise station to collect water from the subsurface chlorophyll maximum (SCM, 50-100 m), the middle water column (mid, 700-1500 m), and approximately 10 m above the seafloor (bottom, 772-2957 m) (Supplemental Table 1

Amplicon sequence analysis
Sequence data from each of four flow-cells were denoised separately using the Divisive Amplicon Denoising Algorithm [25] through the DADA2 plug-in for QIIME 2 [26]. We 160 analyzed amplicon sequence variants (ASVs) to maximize the amount of diversity included in the study [27] and detect changes in community composition that may not be apparent if analyzed at higher taxonomic levels. Denoised ASV tables were merged before taxonomy was assigned to ASVs with a naive Bayes classifier trained on the Protist Ribosomal Reference (PR2) database [28] using the QIIME 2 feature-classifier 165 plug-in [29]. We imported results into the R statistical environment [30] for further processing with the R package phyloseq [31]. Sequences were initially filtered to remove those that were not assigned taxonomy at the Kingdom level-likely data Protist diversity and biogeography in the Okinawa Trough 7 artifacts-and all metazoan sequences. We did not apply additional prevalence or minimum abundance filtering since samples were from varied locations and depths, but 170 the DADA2 algorithm discards singletons.

Statistical analyses
To test whether community composition varied significantly by depth, filter-size, region or presence or absence of hydrothermal influence in bottom water samples, we 175 computed the Aitchison distance between samples, which minimizes compositional bias inherent in metabarcoding data [32], and performed Permutational Multivariate Analyses of Variance (PERMANOVA, 999 permutations) with the adonis2 function from the R package vegan [33]. We further used pairwise PERMANOVA (999 permutations) to test which regions differed in community composition from each other. Because stations 3 180 and 18 were much deeper than other stations (Supplemental Table 1), we used midwater samples from these stations in the bottom-water analyses and excluded these stations from the mid-water analyses. This is further justified because the mid-water samples at these stations were from the salinity layer contributing to bottom waters at the other stations (Supplemental Figure 2). To determine if environmental variables 185 contributed to changes in community composition, we performed Redundancy Analysis (RDA) and applied ANOVA to RDA results to test whether RDA models were statistically significant and which environmental variables significantly contributed to variation in community composition [34]. Before performing RDA, we tested for collinearity of environmental variables by computing Pearson correlation coefficients 190 between environmental variables in each depth layer (Supplemental Figure 3). If the absolute value of correlation coefficients was greater than 0.8, we only included one of the correlated variables in the RDA for that depth [35]. We further applied variance partitioning on variables that significantly contributed to variation in community composition to determine the percent of variation attributable to each variable [36].

Protist diversity in the Okinawa Trough
Overall, 31 Figure 4A). Shannon indices for SCM and surface samples were not significantly different but were both significantly higher than for mid and bottom waters   . The plot is faceted by sampling depth (columns) and filter pore-size in μm (rows). When replicates were available for a particular station/depth/filter combination, replicates were collapsed and represented with a single stacked bar. Regional community differences are 255 not visible at the high taxonomic level represented in the plot, but there are visible differences by depth and filter pore-size. Hydrothermally influenced sites include station 10 at the Hatoma Knoll vent field, station 11 near the Dai-Yon Yonaguni Knoll, and station 2 at the ANA vent at the Daisan-Kume Knoll (highlighted in red). 260

Protist biogeography in the Okinawa Trough
To investigate biogeography of protist communities in the OT, we merged smaller and Redundancy analysis was applied to evaluate the contribution of environmental factors to regional differences in protist community composition (Figure 4)

Protist communities at hydrothermal vent sites
To evaluate the effect of hydrothermal activity on protist diversity, we first checked whether alpha diversity was significantly different at hydrothermally influenced sites; hydrothermal influence did not significantly increase or decrease observed richness or 315 Shannon indices in bottom water samples (Supplemental Figure 4C-D Spirotrichea ciliates, and Picozoa ASVs ( Figure 5).

Discussion
Due to their small size, protists, like bacteria, face few dispersal barriers in the ocean [38]. In addition, rapid asexual reproduction in many protists, as well as the ability to 345 form resting cysts, increases their dispersal potential. As a result, many protists are cosmopolitan, suggesting that global protist diversity is low while local diversity in any available in metabarcoding datasets, we investigated whether biogeographic patterns exist among protists in the Okinawa Trough in regard to regional ocean circulation patterns, water mass composition, and hydrothermal influence. Our results demonstrate relatively weak biogeography in surface waters through to the mid-water column, but show strong regional differentiation of protist communities in deeper water. In addition, 360 protist communities in bottom waters affected by hydrothermal vent activity were significantly different from communities in other bottom water samples.

Protist biogeography in the Okinawa Trough
Consistent with previous studies on protist biogeography, the strongest overall influence 365 on community composition was the depth sampled [43]. Depth-dependent diversity patterns were similar to metabarcoding results from other regions, with higher diversity in the surface and SCM than the meso-and bathypelagic [8,43]. In addition, overall community structure by depth was comparable to global patterns: Dinoflagellata, whereas surface water regularly enters and occasionally exits [11,44]. Regional 380 differences in protist community composition are mostly consistent with these prevailing current patterns. In surface and mid-water samples, NOT and SOT communities were both significantly different from KG samples, but NOT and SOT communities were not different from each other in the surface waters, where the Kuroshio current is strongest.
Protist diversity and biogeography in the Okinawa Trough 16 At the SCM, NOT and SOT samples differed significantly but KG samples did not differ 385 from NOT or SOT samples. These results may be due, in part, to Changjiang (a.k.a. Yangtze) Diluted Water (CDW) and Kuroshio water were mixing at the continental shelf near station 13. The Changjiang/Yangtze is the longest river in China and transports terrestrial nutrients and anthropogenic pollutants into the East China Sea, causing eutrophication that regularly triggers causes diatom blooms [45]. CTD and nutrient 390 measurements at station 13 showed decreased subsurface salinity and increased nutrient concentrations and chlorophyll fluorescence compared to other sampling sites (Supplemental Figure 6). In addition, there as was an increased contribution of Bacillariophyceae diatoms (Ochrophyta, Figure 4) [5]. Water 405 masses in the ocean may either act as geographic boundaries to dispersal or select for microbes adapted to specific environmental conditions [5,6]. If differences in community composition were due to distinct chemical properties of water masses, it would support the latter [6]. In the OT, salinity is the major marker for water mass composition [10], and the salinities measured during our sampling were consistent with those reported for 410 the different water mass compositions (Supplemental Figure 2A-B, [10]). However, salinity did not significantly contribute to variation in community composition in the bottom waters of the OT ( Figure 4D). Therefore, our results support the idea that particular water mass histories influence community composition and suggest that water masses may act as barriers to microbial dispersal [6].

Protist communities at hydrothermal vent sites
The protist communities in samples from hydrothermally influenced sites were significantly different from the communities in bottom waters at sites without hydrothermal influence. Furthermore, environmental factors that significantly contributed 420 to variation in community composition in bottom waters-turbidity, NH4-are related to hydrothermal activity in the OT, where vents resuspend overlying terrigenous material creating turbid vent plumes [16]. Syndiniales, Ciliophora, and MAST (Marine Stramenopiles) ASVs were significantly enriched in hydrothermally influenced OT bottom waters, which is consistent with previous work comparing protists in bottom 425 waters with varying proximity to hydrothermal vents [3]; Ochrophyta, environmental clades of Radiolaria, and Picozoa ASVs were additionally enriched near hydrothermally influenced sites in this study (Figures 2, 5). Quantitative models of deep-water circulation suggest that vents in back-arc basins, such as the Okinawa Trough, are wellconnected, but basin to basin dispersal may be restricted [12]. A remaining question, interface near the Juan de Fuca Ridge vent system [49,50]; the Syndinales ASVs enriched in hydrothermally influenced samples clearly derive from cosmopolitan organisms. Although they are not restricted to vent systems, parasitic Syndiniales do seem to thrive in vent systems and may take advantage of increased host availability in such regions [51]. 450 Ochrophyta accounted for the second-most enriched ASVs in hydrothermally influenced samples ( Figure 5). The enriched Ochrophyta ASVs belong to several clades within the order Chrysophyceae. Two belong to clades with unknown morphology made up of environmental sequences (clades EC2H and EC2I) recovered from a diverse array of marine and freshwater environments [52]. The remaining ASVs belong to the 455 genus Paraphysomonas, which are colorless phagotrophs that are globally distributed in freshwater, marine, and soil ecosystems and have previously been recovered from vent sites [53]. Although several of the enriched ASVs share > 99%-100% identity with previously published vent-associated sequences [53], they are equally similar to sequences recovered from many other marine environments, indicating that these 460 enriched ASVs also derive from cosmopolitan organisms. Atlantic [59], and the Southern Ocean [60].
The RAD-A and RAD-B radiolarian groups represent environmental clades that have no morphological description-other than being picoplankton-or known 475 ecological roles [61]. The RAD-A ASV enriched in hydrothermally influenced samples ( Figure 5) shares > 99% identity with many GenBank sequences recovered from oxic and anoxic waters globally, including the Cariaco basin in the Caribbean Sea [50], the Gulf Stream [49], and the South East Pacific [62]. The enriched RAD-B ASV shares   100% identity with multiple sequences recovered from the East Pacific Rise (1500 m   480 and 2500 m), the Arctic Ocean (500 m), oxygenated water in the Cariaco Basin [50], the Juan de Fuca Ridge [63], and the Southern Ocean [64]. While the radiolarian ASVs enriched in hydrothermally influenced samples appear to be cosmopolitan, it is notable that identical and similar sequences have repeatedly been isolated from vent fields and anoxic regions. 485 Previous studies indicated Ciliophora are abundant in sediments and bacterial mats near hydrothermal vents [65][66][67]. It is unsurprising, then, that two Ciliophora ASVs were significantly more abundant in hydrothermally influenced samples ( Figure 5). The enriched Spirotrichea ASV was classified to genus level by our classifier and belongs to Leegaardiella, a recently described genus collected in the North Atlantic [68]. The 490 Leegardiella ASV shared 100% coverage and identity with one sequence in GenBankan uncultured eukaryote clone recovered from 2500 m at the East Pacific Rise-and had greater than 99% shared identity with another sequence recovered from the East Pacific Rise, also from 2500 m [49]. However, the Leegardiella ASV also had 100% coverage and > 99% identity matches with several sequences recovered from Arctic 495 surface water [69]. This Leegardiella ASV may, therefore, derive from a widely distributed genus that opportunistically becomes more abundant at vent sites. The Oligohymenophorea ASV belongs to the environmental OLIGO 5 clade, but did not have > 96% identity matches to any sequences in GenBank; the closest match was to an uncultured eukaryote clone recovered from 500 m depth off the coast of California 500 [49]. The EukRef-Ciliophora curated Oligohymenophorea phylogeny groups the OLIGO 5 clade with the subclass Scuticiliatia [70]. Interestingly, Zhao and Zu (2016) also found Spirotrichea and Scuticiliatia enriched among ciliates found near a vent site in the Northern Okinawa Trough [71]. While spirotrich ciliates are generally bacterivores [65], Scuticiliatia ciliates are known to be symbionts of aquatic organisms [72], including giant 505 hydrothermal bivalves [3]. If the Scuticiliata ASV is a symbiont, it could provide some explanation as to why more similar sequences have not been recovered elsewhere; coevolution with a dispersal-limited host could also limit symbiont dispersal [73].
However, it is equally likely that sequences with high similarity to the Scuticiliata ASV have not yet been recovered elsewhere simply due to insufficient sampling. 510 A single ASV enriched near vent sites belonged to Picozoa ( Figure 5). The first Picozoa species was described in 2013 [74] and Picozoa ecology remains poorly resolved. Information regarding their distribution derives from environmental molecular surveys, which have established five clades within Picozoa. While all Picozoa are marine, biogeographical patterns vary between clades: clade BP2 has only been found 515 in surface waters, group 2 is mostly found in deep waters, and the other three groups are globally distributed throughout the water column [75]. The enriched Picozoa ASV belongs to one of the cosmopolitan clades (group 1). The sequence shares 100% identity with three others in Genbank, from a variety of environments-surface water in the Arctic [76] and Southern [77] oceans, as well as mesopelagic water near southern 520 California [49].

Conclusions and future directions
While the Okinawa Trough provides an ideal setting to investigate protist diversity and biogeography, this study represents the first comprehensive survey of protists in the 525 region. Our results show that protist communities largely reflect regional oceanography and are influenced by water mass composition in near-bottom waters. Disentangling environmental and geographic effects on communities is challenging since many variables, such as temperature, covary with latitude. However, region consistently contributed more to variance in community composition than any environmental factors 530 in this study. We additionally found distinct protist communities associated with hydrothermally influenced sites, with bacterivore, parasitic, and putatively symbiotic protists enriched in hydrothermally influenced bottom waters. The enriched protists were mostly globally distributed, and many were previously found at vent sites in other parts of the world. These results emphasize the importance of the rare biosphere; most 535 protists seem to be widely distributed, even if rare, and can become opportunistically more abundant under certain conditions. A major limitation of metabarcoding studies with DNA is that it is impossible to know whether sequences derive from actively metabolizing cells, dead or inactive cells, or even environmental DNA; future studies will benefit from incorporating methods that better delineate metabolic state. Additionally, 540 metabarcoding studies suffer from focusing on a short sequence to assess diversity; metagenomic and metatranscriptomic approaches are becoming increasingly feasible for large-scale protist surveys, which will allow for finer resolution of diversity. In the future, comparisons between the Okinawa Trough and other hydrothermal regions will continue to advance our understanding of important protists in hydrothermal 545 ecosystems and how hydrothermal activity influences protist diversity and biogeography in the global ocean.