Consistent marine biogeographic boundaries across the tree of life despite centuries of human impacts

Over millennia ecological and evolutionary mechanisms have shaped macroecological distributions across the tree of life. Research describing patterns of regional and global biogeography has traditionally focussed on the study of conspicuous species. Consequently, there is limited understanding of cross-phyla biogeographical structuring, and an escalating need to understand the macroecology of macrobial and microbial life. Here we used environmental DNA metabarcoding to explore the biodiversity of marine metazoans, micro-eukaryotes and prokaryotes along an extensive and highly heterogeneous coastline. We expected environmental forces to be responsible for creating similar biogeographical assemblages across kingdoms of life, while life-history strategies and pervasive anthropogenic impacts were predicted to disturb such patterns. We found that despite centuries of significant human activities, biogeographic patterns were consistent across kingdoms, underpinned primarily by abiotic factors. Our work highlights the importance of studying multiple domains of life to understand the maintenance of biodiversity and potential drivers of future change.


Introduction 41 42
Researchers have long recognised the importance of grouping global biota into distinct, 43 geographically separated regions. Delineating these biogeographic areas is important to 44 understand the factors shaping the range limits of species 1 , to designate key areas for biodiversity 45 conservation 2 and in formulating predictive responses to environmental change 3,4 . One of the 46 first efforts to define geographic regions of terrestrial biota were Alfred Russel Wallace's so-47 called 'Zoological Regions' 5 , which included six major regions (hereafter realms) that are still 48 recognised in subsequent classifications today 6 . The drivers responsible for these geographical 49 classifications are frequently environmental conditions or physical barriers. For example, broad 50 tectonic dynamics can produce waterbodies or mountains, and dramatically different climatic 51 conditions are known to define biogeographic boundaries and shape species distributions. 52 Studies have shown that deep divergence in the geographic arrangement of terrestrial biota arose 53 as a result of plate tectonics, while shallow divergence has been most frequently attributed to 54 climate 7 . In aquatic ecosystems, the relative importance of macroecological drivers is less 55 understood, although both climatic (e.g. temperature) 8 and tectonic forces 9 have been identified 56 as key determinants of biogeographic patterns. Recent studies have successfully partitioned the 57 oceans into distinct ecoregions (i.e. a geographically defined area, smaller than realm, that 58 contains characteristic assemblages of species) 1,10 , but the description of marine ecoregions has 59 mostly considered conspicuous or well-described species. Similarly, most macroecological 60 marine research has focussed on easily identifiable eukaryotic species, principally metazoans 11 , 61 although some progress has been made in understanding global patterns of marine microbes 12 . In 62 line with recent studies demonstrating strong cross-phylum interdependence 13 , there is an 63 increasing need to include prokaryotic species in our assessment of biogeographic patterns. The 64 language of macroecology and microbial ecology is similar, both examining the incidence of 65 species across different spatial scales, but these fields have long progressed independently. As a 66 result, only a few studies have simultaneously explored biogeographic patterns of both 67 microscopic and macroscopic life to date 13,14 , and to our knowledge no study has compared 68 broad-scale biogeographic patterns across different kingdoms of life. 69 70 Human activities such as habitat destruction, pollution and the introduction of non-native species 71 are the main drivers of more recent global biodiversity change 3 and therefore have the potential 72 to alter geographic patterns of biota at multiple spatial scales. Cumulatively anthropogenic 73 stressors not only threaten vulnerable native species but whole-community structure and 74 function 3,15,16 . The magnitude and direction of human impacts is complex, with evidence for both 75 gains and losses in local species richness across biomes 17-19 . However, a consistent global pattern 76 is emerging, with a recent and rapid increase in species turnover 18 and an associated increase in 77 community similarity (b diversity) between two or more geographically separated sites 19 . These 78 incidences of increased community similarity are known as biotic homogenisation 20 and are 79 driven by human activities that promote extinctions and the introductions of non-native species. 80 In light of growing evidence that taxonomic, phylogenetic and functional diversity are strongly 81 correlated 21 , the homogenisation of biological communities has the potential to negatively affect 82 ecosystem function. Furthermore, even uncommon species within an ecological community can 83 contribute significantly to ecosystem function 22 , demonstrating the importance of studying 84 inconspicuous species to preserve ecosystems. Cumulatively, biotic homogenisation directly 85 disrupts biogeographic patterns, threatens endemism at both local and regional scales, and has 86 the potential to affect ecosystem function. Studies  PCR was performed in a PCR-free laboratory. The three eDNA samples per site were pooled and 178 three independent technical replicates were sequenced per pool. The process per sequenced pool 179 was as follows. The first PCR reaction was conducted in triplicate in a total reaction volume of 180 20 μl. Each reaction contained 10 μl Amplitaq GOLD 360 2X Mastermix (Applied Biosystems, 181 California, USA), 0.8 μl (5 nmol ml −1 ) of each forward and reverse primers and 2 μl of undiluted 182 environmental DNA template. The reaction conditions for PCR were an initial denaturation step 183 at 95°C for 10 minutes followed by 20 cycles of 95°C for 30 seconds, variable annealing temp 184 (46°C for COI, 50°C for 18S and 55°C for 16S) for 30 seconds, and extension at 72°C for 1 185 minute. A final extension at 72°C was performed for 10 minutes. The triplicate first PCR 186 replicates were pooled and cleaned using AMPure XP beads (Beckman Coulter, California, 187 USA) at 0.8 beads:sample volume ratio following manufacturer's instructions. The second PCR 188 reaction was conducted in a total volume of 20 μl containing 10 μl Amplitaq GOLD 360 2X 189 Mastermix, 0.5 μl (10 nmol ml −1 ) of both forward and reverse primers and 5 μl of undiluted 190 cleaned PCR product from the first reaction. PCR conditions were an initial denaturation step at 191 95°C for 10 minutes followed by 15 cycles of 95°C for 30 seconds, annealing at 55°C for 30 192 seconds, and extension at 72°C for 1 minute. A final extension at 72°C was performed for 10 193 minutes. PCR 2 products were cleaned using AMpure XP beads as above. Negative control 194 samples for the filters, extraction kit, PCR1 and 2 were included in library building and 195 sequenced alongside experimental samples. Products were quantified following the 196 manufacturer's instructions using the NEBNext Library Quant qPCR kit (New England Biolabs,197 Massachusetts, USA) and then normalised and pooled at an equimolar concentration for each 198 marker. Each gene region was sequenced independently using a V3 paired-end 300bp reagent kit 199 on the Illumina MiSeq Instrument with 5% PhiX genomic library added to increase sequence was performed after manual examination of the read quality profile, the forward reads were 211 trimmed to 250bp (COI), 240bp (18S) and 240 bp (16S) and the reverse reads were trimmed to 212 230bp (COI), 220bp (18S) and 220 bp (16S). As each marker was sequenced separately, the 213 differences in read trimming length reflect typical variation in sequencing runs rather than any 214 biological difference. The error rates per run were estimated and used to perform the denoising 215 using the DADA2 algorithm. The denoised sequence pairs were then merged and resulting 216 sequences were truncated if they were outside of the expected gene fragment range (303-323bp 217 for COI, 400-450bp for 18S and 390-450bp for 16S). Chimeras were identified and removed 218 before assembling a sample by ASV (amplicon sequence variant) table for analysis. The 219 denoised ASVs were then curated using the default settings of the LULU algorithm 43 which 220 merges sequences based on sequence similarity and co-occurrence. Assigning taxonomy to a set 221 of unknown sequences is a difficult task, particularly considering many marine species lack 222 DNA barcodes, are undescribed, or have erroneous barcodes in online public databases. We 223 therefore focused our analysis at a higher taxonomic level than species, assigning taxonomy to 224 sequences from the COI and 18S runs as follows: an unconstrained (no limits on sequence 225 similarity or match length) BLAST search (v 2.6.0+) was performed for each sequence against 226 the entire nt database (downloaded on 16 th May 2019), 200 hits per sequence were retained (-227 num_alignments). These sequences were then parsed using an R script as follows. All hits below 228 65% coverage and 85% percentage identity were removed, remaining sequences with percent 229 identity above 97% for COI and 99% for 18S were marked as 'high confidence'. Remaining 230 sequences were marked 'low confidence'. For all remaining sequences with conflicting (non-231 identical) taxonomic assignments an R script selected the lowest common ancestor (broadest 232 taxonomic lineage) that covered all assignments and assigned this taxa to the sequence. The 233 NCBI taxonomy (downloaded 16 th May 2019) was used for the lowest common ancestor and the 234 NCBItax2lin utility from Mahmoudabadi and Phillips 44 was used to convert NCBI taxonomic 235 IDs into lineages. Recent analyses have suggested that only exact (100% identity) matching of 236 sequences to reference data is appropriate for species assignment for the prokaryotic 16S 237 region 45 . The 16S sequences were matched to the SILVA database (release 132) 46 using the 238 default settings of the assignTaxonomy function from the DADA2 package to assign taxonomy 239 at genus level or above. The incidence of NUMTs (nuclear mitochondrial DNA) and chimeras in 240 the final ASV list was evaluated following Supplementary Information 1. 241 242

Abiotic, human impact and geographic data 243
In situ temperature data reflects a snapshot of the total conditions experienced across the lifetime 244 of the species that make up marine communities. Therefore, abiotic variables for the sites 245 covering an ecologically relevant timescale were sourced as follows. High resolution (1 km 2 ) 246 remote sensing average daily sea surface temperature (SST) data derived from multiple satellite 247 missions combined with in situ data 47 was parsed in R to find the nearest datapoint to each site. 248 any smaller values found in non-control samples were set to zero). Samples were then rarefied to 270 the smallest number of reads found within each gene fragment (see Supplementary Table 2). 271 Technical replicates were then collapsed to produce a dataset containing the mean value of 272 rarefied reads per ASV. Finally, ASVs assigned with high confidence (and no cases of multiple 273 matches of equal quality) to the same species in both the COI and the 18S datasets were 274 combined. The taxonomic assignment method used for the 16S data assigns to genus, so no 275 ASVs from the 16S data were collapsed. In order to explore broad scale patterns and avoid 276 difficulties with taxonomic assignment, the number of ASVs per phyla and number of rarefied 277 reads per phyla were collapsed to produce per site assessments of taxonomic composition. For 278 plots phyla represented by less than 2% of ASV counts were concatenated in an 'other' category. show the unique information provided by each variable. The dbRDA ordination allows us to 306 examine linear changes in the beta diversity in response to a number of predictor variables in 307 tandem, and also to explore their relative impact. In addition, we used a generalised additive 308 model to assess the significance of each independent variable in explaining the variation in the 309 two non-metric multidimensional axes from above using a restricted maximum likelihood 2D 310 smoother, implemented in the function ordisurf in the R package vegan. 311 312 It has previously been common to use a partial Mantel test to evaluate the effect of a distance 313 matrix (frequently environmental variables) on a second distance matrix (species composition) 314 while 'cancelling out' the effect of a third matrix (geographic distance). However, this approach 315 has been shown to be sensitive to spatial autocorrelation common in ecological datasets 57  Across all markers the greatest mean ASV richness was found along the southern coast (Fig. 1). 335 However, a one-way ANOVA showed a significant difference (F2,15=7.18, p=0.007) between 336 coastlines only in the 16S data with no difference found in the COI (F2,15=3.64, p=0.051) or 18S 337 data (F2,15=1.82, p=0.195). A post hoc Tukey test of the 16S data (Supplementary Table 3) 338 showed that the east and west coasts had significantly less ASVs compared to the south coast, 339 but that they were not significantly different to one another in terms of overall richness. The 340 taxonomic composition, measured by ASV richness per phyla, was consistent across sites and 341 markers (Fig. 1c). In contrast, the proportion of reads assigned to each phylum exhibited more 342 variation across sites ( Supplementary Fig. 1) in the COI and 18S data, although the 16S data 343 were more even. 344

353
Beta diversity 354 Across all three markers, the non-metric multidimensional ordinations showed clustering of sites 355 consistent with ecoregions previously described in conspicuous metazoan species (Fig. 2). 356 Furthermore, PERMANOVA modelling showed a significant (p<0.001) effect of coastline in all 357 cases (see Supplementary Table 4

Composition-distance-decay relationship 409
Distance-decay slopes for all observations showed an exponential decrease in compositional 410 similarity as the distance between sites increased (Fig. 3). Regression models of log10 411 transformed compositional similarity indicated that this slope was statistically significant in all 412 cases (P<0.005 for all markers, full model output in Supplementary Table 10

Phylum level analyses 429
Removing phyla with fewer than 20 ASVs resulted in 9, 11 and 6 Phyla remaining for the COI, 430 18S and 16S data respectively with a mean of 152 ASVs per phyla. All but two phyla showed a 431 significant difference between coastlines in PERMANOVA models, with Chordata and 432 Cryptophyta in the 18S data having no global statistical difference between coasts. In contrast 433 only three phyla showed a difference in the intercept or slope between artificial and natural sites. 434 Regression models for the Bacillariophyta from the COI data and Epsilonbacteraeota from the 435 16S data showed a significant (p < 0.05) difference in intercept and the Proteobacteria from the 436 16S data showed a difference in both the intercept and slope. Full model outputs for both 437 PERMANOVA and regression models are shown in Supplementary Table 11. of taxon-specific responses to produce patterns that are not universal across systems at different 464 spatial scales 13 . Indeed evidence from metazoan species demonstrates that biogeography can 465 differ among different classes 6 , which is a relatively shallow taxonomic level. Therefore, lack of 466 any universal biogeographic structure across taxa should be considered the null hypothesis for 467 the study of any system, particularly in the ocean where there are comparatively fewer barriers to 468 dispersal and migration. In our study, we observed remarkably similar biogeographic patterns 469 across kingdoms (Fig. 2), rejecting the above null hypothesis, and providing an example of 470 cross-phyla biogeographic congruence. 471

472
Our analyses suggested that temperature was the primary environmental variable structuring 473 marine communities across the study region (Fig. 2). In line with this, global studies of 474 biogeographic patterns have shown a central role of temperature in the structuring of both 475 microbial 12 and larger planktonic life 28,61 across the ocean. There is growing evidence that the 476 range boundaries of marine organisms closely track their thermal limits 4 . Therefore, a general 477 expectation was that regardless of taxonomic kingdom, species would remain within their 478 thermal niche resulting in temperature-structured communities as observed here. In contrast to 479 temperature, salinity had a minor role in structuring the studied communities (Fig. 2) structure, but to a much lesser extent compared to temperature (Fig. 2). The human impact index 492 used here 50 covered a large number of different types of impact (e.g. pollution, shipping 493 intensity) but even this granular approach explained the variation in ASVs observed among 494 sampling sites. Previous work on marine metazoans has shown a strong effect of proximate 495 urbanisation 35 and the ecological drivers produced through anthropogenic activities are well 496 documented 64 . Interestingly, the pervasive and conspicuous urbanisation of the marine 497 environment in the study area showed a much weaker effect on biogeographic patterns than 498 abiotic factors (Fig. 2). Anthropogenic pressures have become a major ecological driver only 499 relatively recently in evolutionary time, with the most dramatic changes in biodiversity occurring 500 within the 21st century 3 . It is clear human activities are altering evolutionary trajectories 64 , either 501 through extinction, speciation or range expansion. However, centuries of human impacts in our 502 study system have not yet demonstrably altered the main biogeographic patterns across 503 kingdoms of life. 504 505 Previous work on biotic homogenisation has shown a dramatic effect on whole communities at 506 both regional 17,65 and global scales 23,24 . Here, we found limited evidence for biotic 507 homogenisation along the South African coastline. More specifically, we found no major 508 differences in the distance-decay relationship between artificial and natural sites. This pattern 509 was consistent for the vast majority of gene regions considered (Fig. 3) and subsets of the genes 510 when individual phyla were analysed (see Supplementary Table 11). However, we did find a 511 significant difference in the distance-decay relationship between site types in the COI dataset as 512 a whole (Fig. 3), with more homogenous community composition between artificial sites. As the 513 majority of COI phyla did not individually provide evidence of biotic homogenisation, the 514 difference in the distance decay relationship observed across the whole COI dataset must be 515 driven by the portion of the community not assigned to taxonomy, making interpretation 516 difficult. The remaining markers and phyla showed no evidence of biotic homogenisation when 517 sites were compared, likely a result of consistent species introductions and extirpations in both 518 natural and artificial sites, despite the higher rate of species introductions in anthropogenic 519 habitats. Further work should incorporate time series data to explore biotic homogenisation, 520 given the significant but minor role of human impact in structuring ecological communities 521 across the region. Additionally, seasonal effects may reveal some variation that might explain 522 biotic homogenisation patterns along highly dynamic coastlines. 523 524 Environmental parameters have a clear-cut effect on the community structure across kingdoms of 525 life, but the proportion of the total observed variance explained in the best RDA model was less 526 than half. Therefore, the comparative role of deterministic (environmental filtering, niche 527 processes, etc.) and stochastic processes (ecological drift, random extinction, etc.) in explaining 528 the observed patterns remains uncertain. The classical deterministic theory (Baas Becking 529 hypothesis 66 ) of microbial biogeography (often summarised as 'everything is everywhere') 530 postulates that due to vast population sizes and dispersal microbes are found in all environments 531 and the variation in abiotic conditions selects for those that make up the vast majority of species 532 in each region. This theory ignores neutral processes which have been shown to have a critical 533 role in structuring microbial biogeography across biomes 67 . In line with previous efforts studying 534 deterministic and stochastic processes across taxonomic kingdoms 68,69 , we found that the 535 majority of the observed variation could not be explained for both prokaryotic and eukaryotic 536 species (Fig 2.). Indeed recent biogeographic research in the oceans has provided both 537 theoretical 70 and empirical 28 evidence of strong biogeographic patterns driven by both stochastic 538 and deterministic forces, but much of the observed variation between communities remains 539 unexplained. Understanding the comparative roles of different community structuring processes 540 requires a more comprehensive view of the observed variance between communities, the 541 interactions between members of these communities, and the broader role of the environmental 542 conditions where they live. 543 544 Several recent innovations will provide valuable data to help uncover the unexplained variation 545 in community structure. For example, the extraction and analysis of sedimentary ancient DNA 546 allows the reconstruction of high-resolution biodiversity change over time 71 . By rewinding the 547 ecological clock, the approach provides evidence to evaluate the role of deterministic processes 548 in reconstructions of community composition through time relative to changes in environmental 549 proxies. In conjunction, the analysis of networks from molecular data facilitate the testing of 550 species interaction hypotheses that may further elucidate the role of ecological interactions (e.g. 551 Djurhuus, et al. 14 ) in structuring biogeographic patterns. Finally, very high-resolution multi-552 spectral remote sensing data (e.g. WorldView-3, <100m) will provide unparalleled insights into 553 the role of environmental forces on the distribution of ecological communities 72 . Taken