Metagenome-assembled genomes of phytoplankton communities across the Arctic Circle

Phytoplankton communities significantly contribute to global biogeochemical cycles of elements and underpin marine food webs. Although their uncultured genetic diversity has been estimated by planetary-scale metagenome sequencing and subsequent reconstruction of metagenome-assembled genomes (MAGs), this approach has yet to be applied for eukaryote-enriched polar and non-polar phytoplankton communities. Here, we have assembled draft prokaryotic and eukaryotic MAGs from environmental DNA extracted from chlorophyll a maximum layers in the surface ocean across the Arctic Circle in the Atlantic. From 679 Gbp and estimated 50 million genes in total, we recovered 140 MAGs of medium to high quality. Although there was a strict demarcation between polar and non-polar MAGs, adjacent sampling stations in each environment on either side of the Arctic Circle had MAGs in common. Furthermore, phylogenetic placement revealed eukaryotic MAGs to be more diverse in the Arctic whereas prokaryotic MAGs were more diverse in the Atlantic south of the Arctic Circle. Approximately 60% of protein families were shared between polar and non-polar MAGs for both prokaryotes and eukaryotes. However, eukaryotic MAGs had more protein families unique to the Arctic whereas prokaryotic MAGs had more families unique to south of the Arctic circle. Thus, our study enabled us to place differences in functional plankton diversity in a genomic context to reveal that the evolution of these MAGs likely was driven by significant differences in the seascape on either side of an ecosystem boundary that separates polar from non-polar surface ocean waters in the North Atlantic.


38
The global ocean arguably harbours the largest microbial diversity on planet Earth. To reveal 39 insights into global marine microbial diversity, which is also considered to be the biogeochemical 40 engine of our planet, multiple large-scale international projects, of which TARA Oceans [1] might 41 be the most significant, have been conducted over the past 10 years. The outcome of these projects 42 has provided a step change in our understanding of marine microbial diversity especially in the 43 surface ocean. One of the most important revelations from these initiatives was the realisation that 44 we have significantly underestimated plankton diversity in the past because we were too reliant on 45 culture-dependent methods [2]. As a consequence, some of the groups we thought of as being 46 insignificant in the oceans turned out to be highly diverse with a major contribution to the global 47 carbon cycle and marine food webs [3]. Furthermore, the significance of organism interactions and 48 specifically symbiosis for cycling of energy and matter was revealed, with viral-host dynamics as 49 the most impactful form of these biotic interactions [4,5]. 50 51 Linking functional microbial diversity, estimated by metagenomics and metatranscriptomics, with 52 microbial activity as part of physico-chemical ecosystem properties shed light on how different 53 microbial groups contribute to biogeochemical cycling of elements [1,6,7]. These results built the 54 foundation for estimating how changing oceans due to global warming might impact the diversity 55 and activity of ocean microbes [7,8]. However, to fully explore the role of microbes and their 56 interactions in changing environmental conditions, we must understand their metabolic capabilities 57 in an evolutionary context [9, 10]. As the majority of marine microbes are unculturable and because 58 genomic information is required to reconstruct their metabolic evolution, metagenome-assembled 59 July 2012; Atlantic samples were collected on ANT-XXIX/1 (PS81) between 1 st and 24 th November 105 2012. After water samples were pre-filtered with a 100 μ m mesh to remove bigger zooplankton, 106 they were filtered onto 1.2 μ m Nucleopore membrane filters and stored at -80°C until further 107 analysis. DNA was extracted using the EasyDNA Kit (Invitrogen, Carlsbad, CA, USA) with some 108 adjustments. Cells were washed off the filter with pre-heated (65 ºC) solution A from the kit and the 109 supernatant was transferred into a new tube with one small spoon of glass beads (425-600 um, acid 110 washed) (Sigma-Aldrich, USA). The samples were then vortexed three times in intervals of 3 111 seconds to break the cells. RNAse A was added to the samples and incubated for 30 min at 65 ºC. 112 The supernatant was transferred into a new tube and solution B from the kit was added followed by applied. Reads mapping to the human HG19 genome with over 93% identity were discarded. 124 Remaining reads were assembled with MEGAHIT [27]. The quality-controlled reads were mapped 125 back to the assembly to generate coverage information using seal [28]. Some of these samples were 126 later reassembled using SPAdes [29]. For eukaryotic binning, we used only the MEGAHIT 127 reason, we repeated taxonomic classification for all samples to ensure differences are not due to 139 differences in reference databases or pipelines. 140

141
The IMG/M pipeline identified a number of prokaryotic bins. Samples were binned by JGI as 142 described in [24]. Briefly, each assembly was binned separately, using MetaBat [32] and a 143 minimum contig size of 3000bp. Resulting bins were assessed for completion and contamination 144 with CheckM [33] which also provides initial estimate of taxonomy. While eukaryotic sequences 145 were not excluded from binning, all bins were labelled as archaea, bacteria or unknown by CheckM,146 prompting the distinct binning attempt for eukaryotes. 147 148 For eukaryotic binning, each assembly was binned separately, the process for binning one assembly 149 is given below. Eukaryotic contigs were predicted with EukRep [12], which uses a linear support 150 vector machine to classify sequences as eukaryotic or prokaryotic using k-mer frequencies. 151 Coverage of the eukaryotic contigs was estimated by pseudoaligning the reads from each sample to 152 the contigs using Kallisto [34]. Binning was performed using MetaBat [32] with the coverage 153 information, and a minimum contig size of 1500bp. Completeness and contamination of resulting 154 bins were assessed with BUSCO v3 [35], using the eukaryota_odb9 set of genes. Bins which were 155 less than 50% complete were discarded from further analysis. Completion is defined as the 156 percentage of expected single-copy genes from a selected gene set observed in a MAG, and 157 contamination is defined as the percentage of single copy genes observed in two or more copies. 158 all protists and green algae labelled representative from NCBI were used, as well as two diatom 172

202
Metagenome sequencing and annotation of contigs 203 Sampling stations have been named according to their geographical location in relation to the Arctic 204 Circle. P-stations (polar) were located north and NP-stations (non-polar) south of the Arctic Circle 205 in the North and South Atlantic (Figure 1a). In total, eleven stations were sampled (P1-6; NP1-5) 206 and one metagenome was generated per station except for P3, which was used to sequence two 207 metagenomes from two independent samples obtained from the chlorophyll a maximum layer. contributing between 22% and 27% of the total abundance of reads, whereas they only contribute 218 between 12% and 19% non-polar station. In non-polar stations with lower abundance of eukaryotes, 219 there is a corresponding increase in the abundance of archaea. This is most pronounced in stations 220 NP1 and NP2 (Figure 1b)  The evenness of abundance within phyla could lead to differing levels of coverage for genomes 282 within phyla. A diverse phylum whose species are more evenly distributed would have low 283 coverage compared to a less even phylum, affecting the ability to recover MAGs. To investigate the 284 effect of intraphylum evenness on recovering MAGs, we calculated Simpson's evenness measure 285 using the number of reads assigned to species for all phyla of bacteria with a mean relative 286 abundance equal to or higher than Verrucomicrobia, which was the least abundant phyla for which 287 MAGs were recovered. Only bacteria were used, as for eukaryotic phyla other than Ascomycota 288 there were few reference genomes available for the taxonomic classification database. Phyla from 289 which MAGs were recovered had a lower mean evenness (0.31) than those from which no MAGs 290 were recovered (0.50). A t-test showed this difference is significant for p = 0.01. Ascomycota 291 similarly have a high mean evenness (0.74). MAGs have a greater than 95% ANI to three surface genomes, suggesting a species level 315 relationship. The ANI between these MAGs and deep ocean A. macleodii is below 95%. This is 316 supported by the assignment of contigs within the MAGs based on BLAST searches against the NT 317 database, for both MAGs at least 89% of contigs are assigned to the A. macleodii node or a strain 318 below it. 319 320 Other groups of MAGs display similarly close relationships to each other, but are more distant from 321 reference genomes. Four polar MAGs which placed among Bacteroidetes, P6_35P, P3b_8P, 322 P1_34P, and P3a_27P, share over 95% identity to each other, but less than that to their closest 323 reference genome, an unclassified species of genus Aureitalea. The results of assigning contigs via 324 BLAST searches is similarly mixed, most contigs being assigned to a mix of Flavobacterieaceae or 325 uncultured bacterium. These four MAGs could represent members of the same novel species of 326 Bacteroidetes. 327 328 There are few close relationships between polar and non-polar MAGs evident in the tree. The 329 median distance from a polar MAG to the nearest polar MAG is lower than to the nearest non-polar 330 MAG, and the same for non-polar to non-polar (Supplementary Data 5). In both cases the difference 331 in medians is significantly different at p < 0.01 using Mood's median test. One clade of 332 Bacteroidetes is an exception, where polar MAG P1_21P appears closely related to NP2_14P, 333 NP3_30P and NP4_11P. The closest reference is Croecibacter atlanticus which is in different clade. 334 Pairwise ANI between these mags and the C. atlanticus reference genome is greater than 95%, 335 suggesting these MAGs could represent genomes of the species C. atlanticus. 336 Some MAGs had been classified at a species level by CheckM where the phylogenomic tree does 338 not suggest a similarly specific classification. MAGs P3a_28P, P6_14P, P5_21P, P2_21P, and 339 P6_33P were classified as Coraliomargarita akajimnesis by CheckM. The first three placed closest 340 to C. akajimnesis but with longer branches than observed between taxa from the same species 341 elsewhere in the tree. The latter two lacked the amount of marker genes required to be included in 342 the tree. Looking at the ANI also suggests these MAGs and C. akajimensis are not the same species, 343 no pair shares above 95% ANI. 344 345 Three MAGs, NP1_5P, NP2_10P, and NP3_4P, were classified as planctomycetes by CheckM, but 346 were more ambiguously placed in the phylogenomic tree. This may be a result of only one reference 347 planctomycete genome, for Phycisphaera mikurensi, being used for tree construction.   419 Next, we used read-coverage to analyse MAG distribution across polar and non-polar samples. 420

3) Coverage of MAGs and associations
Where less than 90% of bases had at least one read aligned to them, we considered a MAG to not be 421 This domain has been observed in complete genomes for Bacillariophyta, but in lower numbers. 493 Another transposase related domain, rve_2, was observed in a high numbers of genes in P6_1E and 494 groups of bacteria in the surface ocean are Proteobacteria, Actinobacteria and Bacteroidetes [1,14]. 505 We did not find any significant differences in their read abundance between polar and non-polar 506 metagenomes, which confirms their ubiquity. Surprisingly, Ascomycota was the most abundant 507 group of microbial eukaryotes in our metagenomes without significant geographic differences, 508 which suggests that these fungi are very ubiquitous [56,57]. However, this could be partially due to 509 the greater number of Ascomycota reference sequences available; in the database used for 510 taxonomic classification over half the eukaryotic minimisers mapped to Ascomycota species. They 511 are known to be commensals or parasites of many different pelagic species including algae and 512 animals but their roles and ecological functions in the surface ocean are far from understood [58]. 513 Previous surveys based on phylogenetic marker genes revealed their diversity in the surface ocean 514 including the Arctic [59], but our dataset suggests that at least in samples enriched for microbial 515 eukaryotes, they appear to be more prominent than any autotrophic prokaryotes and eukaryotes 516 regardless of geographic location. For reads from photosynthetic microbes, there appear to be 517 geographical preferences relative to either side of the Arctic circle. Reads from cyanobacteria were 518 more abundant in non-polar waters whereas reads from chlorophytes and bacillariophytes were 519 more abundant north of the Arctic circle. All other groups identified in our metagenomes including 520 the groups of Apicomplexa and archaea had a more patchy geographical distribution. were further north of the 15°C annual-mean isotherm, it is likely causative for the strong 559 demarcation between polar and non-polar MAGs. This suggests that this ecological boundary does 560 not only affect the distribution of individual sequences in complex meta-omics datasets but also the 561 diversity and evolution of genomes. However, some prokaryotic MAGs (e.g. P1_16P, P1_21P, 562 NP_23P, NP_10P) have crossed this boundary, which might indicate the presence of locally adapted 563 ecotypes. None of the eukaryotic MAGs has been found on both sides of the boundary, which 564 suggests that the environmental differences might have had a stronger impact on diversification and 565 therefore adaptation and evolution. These MAG-specific geographical distribution patterns are 566 reflected in cross-kingdom co-occurrences between eukaryotes and prokaryotes in these 567 phytoplankton communities (Supplementary Data 6). The co-occurrence patterns we identified were 568 limited to either the Arctic or Atlantic side of the ecological boundary. Thus, none of them was 569 crossing it, which indicates that co-evolution under significantly different environmental conditions example of how ecological boundaries in the seascape not only influence the spatial heterogeneity 572 of sequences but genomes from co-occurring species in complex phytoplankton communities. 573

574
The separation of Pfams into taxonomic groups confirms the overall taxonomic placements of the 575 MAGs based on concatenated phylogenetic approaches, even though the Pfam separation is less 576 clear for prokaryotes (Figure 4). The latter might be caused by a higher proportion of genetic 577 exchange between bacterial strains compared to their eukaryotic counterparts. Although whole 578 population metagenomics already provided some evidence that there are region-specific Pfams 579 (Figure 4), only the specific analysis of MAGs has revealed significant differences between 580 prokaryotes and eukaryotes in terms of their genetic repertoire in relation to either side of the Arctic 581 circle. The reason for eukaryotic MAGs to have more unique Pfams in polar waters and vice versa 582 for prokaryotes remains enigmatic but suggests that identical environmental conditions and 583 therefore similar selection pressures would impose differences in how prokaryotic and eukaryotic 584 genomes evolve in the surface ocean. It appears that for eukaryotes, a dynamic surface ocean with 585 seasonal mixing and sea-ice formation requires genomes to diversify because of the high abundance 586 of transposable elements [19]. In contrast, prokaryotic MAGs in the same environment were 587 characterised by a high abundance of domains of unknown function. Non-polar environments 588 characterised by higher temperatures, stratified waters and weaker seasonality appear to enrich for 589 PSD domains that are shared by chytochrome c -like proteins for electron transport as part of the 590 respiratory chain in prokaryotes (Figure 4). This potentially suggests that respiratory activity is 591 enhanced in non-polar prokaryotes compared to their polar counterparts, which would be expected 592 according to the positive relationship between temperature and metabolic activity [65]. 593 Interestingly, Pfams related to phosphate acquisition and metabolism in addition to Pfams involved MAG-based analyses of phytoplankton communities not only offer the identification of novel 605 genomic resources, they might reveal unifying concepts responsible for how differences in 606 ecosystem properties shape the genomes of their inhabitants and even species associations, which 607 underpin the evolution of complex microbial communities. 608