Whole-genome scanning reveals selection mechanisms in epipelagic Chaetoceros diatom populations

Charlotte Nef; Mohammed-Amin Madoui; Éric Pelletier; Chris Bowler

doi:10.1101/2022.05.19.492674

Abstract

Diatoms form a diverse and abundant group of photosynthetic protists that are essential players in marine ecosystems. However, the microevolutionary structure of their populations remains poorly understood, particularly in polar regions. Exploring how closely related diatoms adapt to different oceanic ecoregions is essential given their short generation times, which may allow rapid adaptation to different environments, and their prevalence in marine regions dramatically impacted by climate change, such as the Arctic and Southern Oceans. Here, we address genetic diversity patterns in Chaetoceros, the most abundant diatom genus and one of the most diverse, using 11 metagenome-assembled genomes (MAGs) reconstructed from Tara Oceans metagenomes. Genome-resolved metagenomics on these MAGs confirmed a prevalent distribution of Chaetoceros in the Arctic Ocean with lower dispersal in the Pacific and Southern Oceans as well as in the Mediterranean Sea. Single nucleotide variants identified within the different MAG populations allowed us to draw a first landscape of Chaetoceros genetic diversity and to reveal an elevated genetic structure in some Arctic Ocean populations, with F_ST values reaching ≥ 0.2. Genetic differentiation patterns of closely related Chaetoceros populations appear to be correlated with abiotic factors rather than with geographic distance. We found clear positive selection of genes involved in nutrient availability responses, in particular for iron (e.g., ISIP2a, flavodoxin), silicate and phosphate (e.g., polyamine synthase), that were further confirmed in Chaetoceros transcriptomes. Altogether, these results provide new insights and perspectives into diatom metapopulation genomics through the integration of metagenomic and environmental data.

Introduction

About half of primary productivity on Earth is supported by aquatic phytoplankton, a phylogenetically diverse group of photosynthetic organisms composed of eukaryotic algae and cyanobacteria that provide essential ecosystem services, from nutrient cycling and CO2 regulation to sustaining higher trophic levels as the base of marine food webs (Nelson et al. 1995; Field et al. 1998; Falkowski et al. 2004). Among phytoplankton, diatoms are pivotal in marine ecosystems since they account for an estimated 40% of marine primary productivity and 20% of global carbon fixation (Field et al. 1998), as well as being important contributors to global carbon export (Guidi et al. 2016). Moreover, they link silicon and carbon biogeochemical cycles through the synthesis of their elaborate silicified cell wall, surrounded and embedded by glycoproteins that prevent its dissolution (Lewin 1961; Kröger and Sumper 1998). Diatoms are therefore also key players in the global silicon cycle, particularly in the Southern Ocean (Tréguer and de La Rocha 2013; Llopis Monferrer et al. 2021).

Like other pelagic plankton, diatoms are thought to display high dispersion potential due to their rapid generation times and large population sizes, combined with the few apparent oceanic barriers to dispersal (Norris 2000; Cermeño and Falkowski 2009). As a consequence, they are expected to show reduced diversity patterns and biogeographic structure due to homogenised genetic pools (Finlay 2002). Instead, molecular surveys have revealed that diatom populations exhibit tremendous diversity, with more than 4,000 different operational taxonomic units (OTUs) (de Vargas et al. 2015), while being widely distributed across all major oceanic provinces (Malviya et al. 2016) encompassing high latitudes, upwelling regions as well as stratified waters (Kemp and Villareal 2018; Leblanc et al. 2018). The ecological success of diatoms is undoubtedly linked to their complex evolutionary history, which was found to be sustained by horizontal gene transfers from bacteria (Armbrust et al. 2004; Bowler et al. 2008), and mosaic plastid evolution derived from both red and green algae (Moustafa et al. 2009; Dorrell and Smith 2011; Dorrell et al. 2017). This chimeric origin led to specific physiological innovations, such as silicon utilisation for cell protection, efficient nutrient uptake systems allowing rapid responses to environmental fluctuations, a functional urea cycle and potential carbon concentration mechanisms (Armbrust et al. 2004; Bowler et al. 2008; Pierella Karlusich, Bowler, et al. 2021). In contrast to these genetically-encoded functions, diatom genomes themselves appear to display a wide variety of dynamics, through specific transposable elements in the model diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum (Armbrust et al. 2004; Maumus et al. 2009), alternative splicing (Rastogi et al. 2018), as well as gene copy number variation and mitotic recombination between homologous chromosomes (Bulankova et al. 2021). 
Altogether, these characteristics likely fuel diatom diversity, leading to rapid diversification rates (Bowler et al. 2008) while increasing their ability to respond to changing environmental conditions.

Climate change is expected to induce a range of environmental stressors on phytoplankton (Hays et al. 2005). Among these are increased water temperature and stratification, nutrient paucity and acidification (Bopp et al. 2013). Moreover, a recent study indicated that numerous important diatom genera, such as Chaetoceros, Porosira and Proboscia, are predicted to be vulnerable to climate change, particularly in polar plankton communities (Chaffron et al. 2021). Diatoms appear therefore to be valuable candidates to investigate the fundamental links between their genomes, physiology and population dynamics, in light of predicted environmental changes. Understanding such principles would require access to the genome of natural diatom populations as well as precise contextual information. With the emergence of new sequencing technologies and processes to recover genomes from environmental data, either from metagenomes or single-cell genomes, it is now possible to access the genomic information of organisms by going beyond culture-dependent approaches, allowing us to gain insights into the biology and ecology of natural populations (Iverson et al. 2012; Mangot et al. 2017). This is of particular interest for organisms for which culture conditions cannot be mimicked easily, as for instance organisms thriving in polar environments. These new techniques have enabled the scientific community to access sequences from taxa lacking significant information, such as Euryarchaeota (Iverson et al. 2012), Picozoa (Not et al. 2007; Seenivasan et al. 2013), MOCHs (for Marine OCHrophytes) (Massana et al. 2014), MAST-4 (for MArine STramenopiles) (Mangot et al. 2017), and rappemonads (Kim et al. 2011).

Among diatoms, the genus Chaetoceros holds a particular position as it is the most widespread diatom genus, presenting a worldwide distribution from pole to pole with a prevalence at high latitudes (Malviya et al. 2016; De Luca, Kooistra, et al. 2019; Sommeria-Klein et al. 2021). As such, it is considered an important driver of carbon export and silica sinking in modern oceans (Smetacek 1999; Tréguer et al. 2018). The genus displays a high level of diversity, with 239 accepted species names in AlgaeBase, in addition to 153 names under debate or yet to be verified (https://www.algaebase.org, as of May 2022). It is generally accepted that the Chaetoceros genus is subdivided into the Hyalochaete and Phaeoceros subgenera, the latter including the type species Chaetoceros dichaeta, though their exact subdivision remains under debate (De Luca, Sarno, et al. 2019). Chaetoceros presents peculiar physiological properties that may be responsible for its prevalent distribution. For instance, some Chaetoceros species have been shown to display unusually high C:N ratios unaffected by light regime and nitrogen source, suggesting a capacity to accumulate more carbon per unit of nitrogen than other Arctic diatoms, while showing physiological responses similar to those of more temperate diatoms (Schiffrine et al. 2020). Besides its particular physiological characteristics, Chaetoceros is known to participate in a significant range of associations with a wide variety of microorganisms. The Chaetoceros phycosphere has been shown to gather a diverse set of epibiotic bacteria, the composition of which simplifies over successive subcultures (Crenn et al. 2018), and is significantly influenced by nutrient availability and host growth stage (Baker et al. 2016). Some associated bacteria have even been observed to favour resistance of Chaetoceros cells against viral infection and lysis compared to axenic controls (Kimura and Tomaru 2014). 
Chaetoceros can be involved in photosymbioses with epibiotic peritrich and tintinnid ciliates (Gómez 2020), interact with nitrogen-fixing cyanobacteria in diatom-diazotroph associations (Foster et al. 2011; Pierella Karlusich, Pelletier, et al. 2021) and is globally highly connected with other plankton members in the Tara Oceans network of planktonic associations (Vincent and Bowler 2020). Therefore, given the ecological significance of Chaetoceros and its prevalence in regions particularly predicted to be vulnerable to climate change, the present study focuses on describing patterns of genetic diversity and population structure of this diatom genus. To this end, we leveraged 11 metagenome-assembled genomes (MAGs) originating from the Tara Oceans expeditions (Delmont et al. 2022), and that are associated with highly contextualised metadata. We aimed to answer the following questions: How are natural Chaetoceros populations structured? Is geographic distance a barrier to gene flow and, if not, what main ecological factors are correlated with Chaetoceros micro-diversification? What are the genetic functions undergoing selection among different Chaetoceros populations?

Results

Description and comparative analysis of Chaetoceros MAGs

The MAGs (see Supplementary Table S1 for details on their names) displayed genome sizes ranging from 10.6 (ARC_232) to 44.4 (PSW_256) Mbp, which is of the same order of magnitude as the genomes of the model diatoms T. pseudonana and P. tricornutum (Fig. 1A). The MAG gene numbers, ranging from 5,000 to 17,000 genes, adequately mirrored the genome sizes (Fig. 1C). Overall, the genomes displayed good completion percentages, with 7 out of the 11 MAGs having a BUSCO score of at least 50% (Fig. 1B). The average percentage of GC ranged from 39% (ARC_267) to 44% (SOC_37), which is lower than those of T. pseudonana and P. tricornutum (Fig. 1E), with a global decrease in GC percentage from the first to the third codon position (Fig. 1F). Mean gene lengths varied between 400 and 500 bp (Fig. 1D), again lower than those of T. pseudonana and P. tricornutum, as expected because metagenome-assembled genomes are generally more fragmented than genomes sequenced from cultured organisms (Bowers et al. 2017). A principal component analysis (PCA) on eight genome and gene metrics was conducted to test whether the MAGs belonging to the same geographical region displayed common genome characteristics (Fig. 1G-H). The PCA showed three defined groups. The first consisted of ARC_116, ARC_217, PSE_171, PSE_253 and SOC_37, which are characterised by a larger number of genes, large genome and gene sizes as well as higher GC content, indicating a more compact genome. A second group consisted of ARC_267 and PSW_256, which display the largest intron sizes, and a last one grouped ARC_232 and SOC_60, the smallest Chaetoceros MAGs. This analysis did not reveal any clustering of the MAGs based on their geographical origins. It must be noted that some of the observed differences, for instance in genome size, may be linked to the reconstruction methods applied rather than to bona fide biological differences, as exemplified by different genome completion levels (see Fig. 1B).
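The per-codon-position GC content summarised in Fig. 1F can be illustrated with a minimal sketch. It assumes in-frame coding sequences supplied as plain strings; the function name and the toy sequences are hypothetical and are not taken from the pipeline used in this study.

```python
def gc_by_codon_position(cds_list):
    """Return the GC fraction at codon positions 1, 2 and 3.

    Assumes every sequence starts in frame; a trailing partial codon
    is ignored, and ambiguous bases (e.g. N) are skipped.
    """
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for seq in cds_list:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # drop incomplete last codon
        for i in range(usable):
            base = seq[i]
            if base not in "ACGT":
                continue
            pos = i % 3
            total[pos] += 1
            if base in "GC":
                gc[pos] += 1
    return [g / t if t else float("nan") for g, t in zip(gc, total)]


# Toy input (hypothetical sequences, for illustration only)
gc1, gc2, gc3 = gc_by_codon_position(["ATGGCATAA", "ATGCGTTGA"])
```

Applied to all predicted coding sequences of a MAG, the three returned fractions correspond to the GC1, GC2 and GC3 values that such a figure would compare.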

Figure 1.

Characteristics of the different Chaetoceros MAGs. (A) Genome size, (B) level of BUSCO completion, (C) number of genes, and (D) boxplots of mean gene length (mean gene length is represented by the blue dot) of the MAGs and reference diatoms P. tricornutum (P.t.) and T. pseudonana (T.p.). (E) Mean GC content of MAGs and reference diatom genomes. (F) Distribution of GC content along codon positions of the MAGs. (G-H) Principal Component Analysis of different gene and genome metrics of the MAGs, shaded by geographical origin (blue: Arctic Ocean; purple: Mediterranean; orange: Pacific South Eastern Ocean; green: Pacific South Western Ocean; black: Southern Ocean).

Relatively weak average nucleotide identity (ANI) (< 80%) and average amino acid identity (AAI) (< 60%) were observed between the MAGs that were derived from the same geographical areas, suggesting that the populations are not necessarily closely related (Fig. 2A, Supplementary Figure S1A). Pairwise ANI and AAI values were highly positively correlated particularly for pairwise ANI > 85% and AAI > 75% (Supplementary Figure S1A-B). The most elevated ANI (95.1%) and AAI (94%) were observed between the MAGs ARC_217 and PSE_171. Elevated pairwise ANI (> 80%) and AAI (> 60%) values were also observed for PSE_253 and ARC_189 as well as for PSW_256 and ARC_267.

Figure 2.

Comparative analysis of Chaetoceros MAGs and their coding potential. (A) Average Nucleotide Identity. (B) Concatenated multigene ML tree generated with RAxML (100 bootstrap), based on 83 BUSCO gene clusters with the Chaetoceros MAGs highlighted in gold. Bootstrap values ≥ 50% are indicated. (C) Upset plot representing the top 30 shared orthogroups among the MAGs, with the orthogroups shared by all genomes highlighted in blue. (D) Frequency of the most variable amino acids compared to their global means across all MAGs. The MAG respective number of genes is indicated for comparison. (E) Heatmap of 46 PFAM domains displaying the most variable copy number (SD ≥ 10) among the MAGs (see Supplementary Table S2 for details).

We then evaluated the relatedness of the Chaetoceros MAGs to one another and with respect to other taxa based on a concatenated tree of 34 taxa for 83 single-copy nuclear genes (total 42,525 amino acids) across the eukaryotic tree of life (Fig. 2B). We obtained a relatively good phylogeny of the taxa, with a monophyly of the diatoms. As observed previously, MAGs from the same geographical area did not appear to resolve together. In accordance with the ANI/AAI patterns, three MAG pairs resolved together with high support values, namely ARC_189 and PSE_253, ARC_217 and PSE_171, and ARC_267 and PSW_256 (Fig. 2B). Both ARC_189 and PSE_253 MAGs resolved at the level of C. dichaeta, which suggested that they belonged to the Phaeoceros subgenus. Conversely, the 9 other MAGs resolved in the same clade as C. affinis, C. curvisetus and C. debilis, suggesting closeness to the Hyalochaete subgenus. The MAGs ARC_267, PSW_256, ARC_232 and MED_399 resolved in clades close to C. affinis CCMP159. The MAG SOC_60 appears to be most closely related to C. debilis with high support (bootstrap value > 90%), while ARC_116 was closely related to the C. neogracile RCC1993 strain but not to C. neogracile CCMP1317.

Interestingly, identifying the orthologous genes shared between the MAGs showed that the largest number of orthogroups was not shared by the 11 genomes, as one would have expected, but rather by the two MAGs ARC_217 and PSE_171 (969 orthogroups) (Fig. 2C). The second largest set of orthogroups was shared between ARC_189 and PSE_253 (612 orthogroups). The third group consisted of all the genomes except ARC_232 and SOC_60 (500 orthogroups), suggesting that the two excluded MAGs are the most divergent. This pattern is consistent with the result of the multigene phylogeny, and may be partly explained by the fact that both these genomes are the smallest in size, number of genes and BUSCO completion (Fig. 1A-C). Another explanation would be an artifactual result due to low genome completion. A total of 472 orthogroups appeared common to the 11 genomes. MAG-specific orthogroup sets were retrieved, with PSW_256 displaying the highest number of MAG-specific orthogroups (303), which may be because this genome is the largest both in size and gene number (Fig. 1).

Clear discrepancies were observed regarding the proportions of amino acids encoded in the MAGs. A first group, composed of ARC_116-217 and all genomes associated with PSE and SOC, exhibited an enrichment in arginine (R) as charged residues, in alanine (A) for the hydrophobic ones as well as in cysteine (C); the other group, consisting of ARC_232 and ARC_267, MED_399 and PSW_256, formed a monophyletic group, and showed a larger proportion of aspartate (D), glutamate (E) and lysine (K) as charged residues, and of asparagine (N) (Fig. 2D), suggesting a replacement of some residues by others displaying the same chemical properties. Such differences in amino acid composition of predicted proteomes have been previously identified in a study investigating more than 100 algal genomes, in which saltwater algae encoded higher proportions of D, E and K residues and lower proportions of A and C compared to freshwater species (Nelson et al. 2021).

We further conducted a comparative analysis at the level of PFAM domains to test the relative genome enrichment in putative biological functions. Most MAGs showed a median of two PFAM domains per gene, with ARC_116, ARC_189 and SOC_37, among the most complete MAGs, displaying 3 domains per gene (Supplementary Figure S2). The most variable PFAM domains among the MAGs were selected as those whose copy number displayed a standard deviation of at least 10, leading to the identification of 46 PFAMs (see Supplementary Table S2 for detailed information) whose relative enrichments are represented on a heatmap (Fig. 2E). Again, no clear distinction between the MAGs based on their geographical localisation was observed. The group formed by ARC_232 and SOC_60 displayed globally the same patterns of PFAM enrichment and depletion compared to the other MAGs, with comparatively fewer domains, a pattern consistent with their smaller size. These MAGs were particularly depleted in chaperone-associated domains (PF00004 and PF00226) and in an IQ calmodulin-binding motif (PF00612) involved in protein binding. ARC_232 showed a dramatic depletion of a pentatricopeptide repeat domain potentially involved in RNA metabolism (PF13812). Another group gathered PSE_171, ARC_217, ARC_116, SOC_37 and ARC_189, most of them showing the same gene and genome characteristics. Compared to all the other MAGs, ARC_189 was dramatically enriched in domains associated with regulators of chromatin structure, domains containing repeat motifs and chromosome condensation repeats (PF00415, PF13540, PF08238, PF12796, PF13517 and PF00651). ARC_217 and PSE_171 were both enriched in the ATP-binding domain of ABC transporters and cyclins (PF00005 and PF00134) while the latter showed a strong depletion in PT/TIG domains (PF01833), a putative family of transcription factors. 
On the other hand, ARC_116 displayed a strong enrichment in heat shock transcription factor and chlorophyll a-b binding protein domains (PF00447 and PF00504) while SOC_37 displayed a significant enrichment in RNA polymerase Rpb1 C-terminal domain (PF05001). Another group consisted of the genomes PSE_253, ARC_267, MED_399 and PSW_256, all exhibiting medium BUSCO completion. Both PSE_253 and ARC_267 showed an enrichment in zinc-finger domain PF00098 while MED_399 was enriched in domains involved in ubiquitination processes (PF00066 and PF13475). Finally, of all the MAGs, PSW_256 was found strongly enriched in reverse transcriptase (PF00078) and transposase IS4 (PF13843, PiggyBac transposon) domains, a pattern that appeared consistent since this genome was the largest.
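The selection of the most variable PFAM domains (standard deviation of copy number ≥ 10 across MAGs) can be sketched as follows. This is only an illustration: the copy numbers below are invented, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was applied.

```python
from statistics import stdev


def most_variable_domains(copy_numbers, min_sd=10.0):
    """Keep PFAM domains whose copy number varies strongly across MAGs.

    copy_numbers: dict mapping a PFAM accession to a list of copy
    numbers, one value per MAG. Uses the sample standard deviation
    (an assumption; the study may have used another estimator).
    """
    selected = {}
    for pfam, counts in copy_numbers.items():
        sd = stdev(counts)
        if sd >= min_sd:
            selected[pfam] = sd
    return selected


# Invented copy numbers for two accessions mentioned in the text
toy_counts = {
    "PF00078": [2, 5, 3, 60, 4],      # one MAG strongly enriched
    "PF00004": [20, 22, 21, 19, 23],  # nearly constant across MAGs
}
variable = most_variable_domains(toy_counts)
```

With these toy values only the first accession passes the threshold, which mirrors how a domain enriched in a single MAG would end up on the heatmap.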

Genome-resolved biogeography of Chaetoceros

The biogeographical distribution of the MAGs was investigated by estimating the proportion of Tara Oceans metagenomic reads that mapped on the eleven genomes. After filtering of the reads based on their identity and coverage (see Materials and Methods, Supplementary Figures S3 and S4 for details), a final number of 20 different sampling stations and/or depths were retained. The Chaetoceros MAGs together recruited 0.71% of reads from these stations (all size fractions combined) and presented an amphitropical distribution, with a large prevalence in the Arctic Ocean and minor dispersal in the Pacific and Southern Oceans, as well as in the Mediterranean Sea. The relative contribution of the MAGs to the total metagenomic reads ranged from 0.03% for SOC_60 in the Southern Ocean to a local maximum of up to 4% for ARC_116 in the Arctic Ocean (Supplementary Table S3A-B). For a given depth, some stations (i.e., TARA_92, TARA_173, TARA_188, TARA_189, TARA_194, TARA_201 and TARA_205) appeared to harbour two and up to three different Chaetoceros MAGs (Fig. 3A), which suggests that our approach was precise enough to discriminate a mixture of populations from strains that are expected to be closely related. The four MAGs ARC_116, ARC_217, ARC_232 and SOC_37 were retrieved at both the surface and the deep-chlorophyll maximum (DCM) of the water column, with ARC_116 being the most widespread and dominant Chaetoceros MAG at the surface and SOC_37 at the DCM. The six MAGs ARC_189, ARC_267, PSE_171, PSE_253, PSW_256 and SOC_60 were found only at the surface of the water column while MED_399 was the only one retrieved solely at the DCM. Of note, SOC_37 was expected to be associated with both the Arctic and Southern Oceans (see Supplementary Table S4 in Delmont et al. (2022)), but it appeared to be restricted to the Arctic Ocean in our analysis, likely due to the stringency of our filtering parameters. 
The co-occurrence patterns of the MAGs were addressed by performing pairwise correlation tests between the eleven MAGs using their metagenomic abundance. It appeared that none of the MAGs displayed significant co-occurrences or mutual exclusions, except for MAGs PSE_171 and PSE_253 (p-value = 0) that were associated with the same unique station (TARA_92) (Fig. 3B). Each of the different MAGs appeared restricted to a distinct environment characterised by a narrow range of temperature between 1 and 4 °C for the same MAG (Fig. 3C). Conversely, most of the MAG populations appeared to be distributed across a rather large spectrum of iron, silicate, phosphate and nitrate concentrations (Supplementary Figure S5).

Figure 3.

Biogeography of the Chaetoceros MAGs throughout Tara Oceans sampling sites. (A) Relative contributions of the Chaetoceros MAGs in surface (SUR) and deep-chlorophyll maximum (DCM) depths. (B) Pairwise correlation patterns of the MAGs (Pearson’s correlation rho shaded if superior to 0.05). (C) Bubble plot corresponding to the measured temperature at the sampling stations where each MAG was detected.

Investigating genomic differentiation between Chaetoceros MAGs

Chaetoceros SNV landscape

We then investigated the level of genomic diversity in the different Chaetoceros populations by identifying for each MAG their respective single nucleotide variants. The number of variants of the Chaetoceros populations ranged from 1e5 to ~ 8e5 (Fig. 4B, Supplementary Figure S6), accounting for a total of 8,425,600 variants recruited. Globally, no significant correlation between the genome coverage and number of variants retrieved was observed (Pearson’s correlation rho = −0.22; p-value = 0.26) (Fig. 4A, Supplementary Table S4). Some MAGs, such as ARC_217, ARC_232 and PSW_256 nonetheless displayed variant patterns that followed the number of reads. Consequently, we assumed that the number of variants did not necessarily follow genome coverage and was rather dependent on the genome considered. The highest local SNV level ranged from 0.63% for ARC_217 at station TARA_205 (SUR) to as much as 2.34%, observed for ARC_116, which exhibited the highest range in terms of SNV levels, at station TARA_194 (DCM) in the Arctic Ocean (Fig. 4B, Supplementary Figure S6). This suggests that the average nucleotide identity of each MAG population to its respective consensus genome ranged between about 98% and 99%, a strong indicator illustrating the occurrence of local species harbouring non-negligible micro-diversity traits in different populations. We did not observe a significant correlation between the amount of SNVs and latitude but we noticed a rather strong correlation between SNVs and longitude (Supplementary Figure S7), which might be explained by the effect of oceanic currents in the Arctic Ocean. The most elevated mean SNV level depending on the MAG oceanic regions was observed for the Chaetoceros populations in the Pacific South Eastern Ocean (1.21%), followed by those in the Arctic Ocean (1.16%), the Southern Ocean (0.81%) and the Mediterranean (0.76%). 
Transition to transversion ratios ranged from 1.31 (ARC_217) to 1.96 (PSW_256), with a global average of 1.50 (Supplementary Figure S8). Overall, most of the Chaetoceros population variants were observed in the coding regions (49.28% mean value), followed by the intergenic (32.54%), UTR (14.11%) and intronic (4.07%) regions, a pattern consistently observed independently of the genome considered (Supplementary Figure S9). The variant effects were mostly missense mutations (53.29% mean value), followed by silent ones (45.97%) and a slight proportion of nonsense mutations (0.74%) (Supplementary Figure S10).
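As an illustration of two summary statistics used above, the sketch below classifies SNVs into transitions and transversions and converts an SNV level (percentage of variable positions) into an approximate identity to the consensus genome. The helper names and the toy variant list are hypothetical, and the identity conversion is the simple "100 minus SNV percentage" approximation implied by the text, not a full ANI computation.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs):
    """Transition/transversion ratio for (ref, alt) single-base pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in "ACGT" or alt not in "ACGT":
            continue  # not a usable single nucleotide variant
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def identity_to_consensus(snv_percent):
    """Approximate percent identity implied by an SNV level."""
    return 100.0 - snv_percent


# Toy variants (invented): three transitions, two transversions
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
ratio = ts_tv_ratio(toy_snvs)

# SNV levels quoted in the text: 0.63% and 2.34%
identity_range = (identity_to_consensus(2.34), identity_to_consensus(0.63))
```

The two SNV levels quoted above give identities of roughly 97.7% and 99.4%, in line with the "about 98% and 99%" range stated in the text.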

Figure 4.

Population genomic analyses of Chaetoceros MAGs. (A) Scatterplot representing the number of SNVs compared to the number of reads for all samples considered in this study with Pearson’s correlation rho (n = 29). (B) Relative number of SNVs within ARC_116 populations. (C) F_ST distribution profile of ARC_116. (D) Pairwise F_ST matrix of ARC_116 populations. (E) Global pairwise F_ST matrix of all MAGs among Arctic Ocean regions (refers to ARC_116, ARC_189, ARC_217, PSW_256 and SOC_37). D: deep-chlorophyll maximum; S: surface.

Analysis of Chaetoceros population structure

We then investigated the level of population structure of the MAGs using the previously identified SNVs associated with the different populations and computed their pairwise fixation index (F-statistic or F_ST). This index, which can range from 0 (no genetic differentiation) to 1 (complete differentiation), measures the extent of genetic inbreeding between populations using allele frequencies, and is thus a proxy of their genetic distance (Wright 1965; Wright 1984). Among the detected variable loci, we selected the SNVs associated with the different populations for the MAGs that were present in at least two different sampling points (Tara stations and/or depths), i.e., for ARC_116 (9 samples), ARC_217 (4), ARC_232 (2), PSW_256 (3) and SOC_37 (5). Plotting the population-wide F_ST distributions revealed globally unimodal patterns, indicative of a single species for each MAG (Fig. 4C, Supplementary Figure S11). For ARC_116, which was the largest extant MAG, populations from stations TARA_175, TARA_188 and TARA_189 appeared to be genetically similar (pairwise F_ST of 0), consistent with their respective distance, indicating that they formed one homogeneous population (Fig. 4D). Notably, this MAG showed great (≥ 0.15) genetic differentiation at the DCM of station TARA_194, according to Wright’s guidelines for analysing bi-allelic loci (Wright 1984), and the surface population at the same station also displayed pairwise F_ST values distinct from the others but of lower magnitude, ranging from moderate to high (~0.05-0.15) genetic differentiation. This difference with the other ARC_116 populations may be at least partly explained by local marine currents, as station TARA_194 is influenced by inflow waters from the Pacific Ocean through the Bering Strait, and TARA_193 is enriched in cold waters circulating back to the Pacific Ocean. As evidenced by a pairwise F_ST of 0.14, the two TARA_194 depths appeared to be moderately differentiated from one another. 
This result suggested that this Tara station displayed rather important stratification patterns, resulting in the genetic differentiation of two sub-populations. Examining the metadata associated with this station revealed that the DCM was sampled 30 m deeper (35 m) than the surface (5 m) (Supplementary Table S5). Moreover, this sampling point showed distinct patterns of oxygen concentration and salinity between the depths as well as a phosphate enrichment at the DCM. The MAG ARC_217 showed relatively low pairwise F_ST values indicating elevated connectivity between the populations, except between stations TARA_205 and TARA_173 (both SUR) located in the Davis-Baffin Bay and in the Kara-Laptev Seas (Supplementary Figures S12 and S13A). Both genomes ARC_232 and PSW_256 showed genetically similar populations (Supplementary Figure S13B-C), with PSW_256 exhibiting populations in stations TARA_113, TARA_119 and TARA_120 equally distinct from one another genetically. This pattern might be linked to these Tara stations being located between the Gambier Island archipelago in French Polynesia. Finally, the SOC_37 genome displayed globally low genetic differentiation, albeit slightly more elevated when compared with station TARA_173 at the surface than with the others. No clear differentiation with station TARA_201 was observed, although it is located on the opposite side of the Davis-Baffin Bay (Supplementary Figures S12 and S13D), suggesting a low effect of dispersal on their connectivity.
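To make the fixation index used in this section more concrete, the following minimal sketch computes a pairwise F_ST for a single bi-allelic locus from the allele frequencies of two populations, using the textbook heterozygosity form F_ST = (H_T − H_S) / H_T with equal population weights. This estimator is an assumption chosen for illustration (the one actually used is described in the Materials and Methods and may differ), and the qualitative categories are the commonly quoted ranges attributed to Wright, matching the thresholds used in the text.

```python
def fst_biallelic(p1, p2):
    """Pairwise F_ST at one bi-allelic locus from two allele frequencies.

    Uses F_ST = (H_T - H_S) / H_T, with H_T the expected heterozygosity
    of the pooled populations and H_S the mean within-population value.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # same allele fixed in both populations
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


def wright_category(fst):
    """Qualitative reading of F_ST (commonly quoted Wright ranges)."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "great"
    return "very great"


# Hypothetical allele frequencies at one locus in two populations
example_fst = fst_biallelic(0.2, 0.8)
label_014 = wright_category(0.14)  # value reported between TARA_194 depths
```

In practice such per-locus values are summarised over many SNVs (for example as a median or a distribution, as in Fig. 4C) rather than interpreted one locus at a time.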

Global population structure among Arctic Ocean regions

We further compared the genomic differentiation of Chaetoceros populations between the Arctic regions, which were divided into five groups depending on their localisation: Pacific-Arctic, Kara-Laptev, Atlantic-Arctic, Arctic archipelago and Davis-Baffin. The most elevated genomic differentiation was between the Kara-Laptev and Davis-Baffin regions, which consistently appear opposite from one another (Fig. 4E; Supplementary Figure S12). The Chaetoceros populations located in the Arctic archipelago displayed globally moderate genetic differentiation compared to those in the other regions, with the largest difference being with the Davis-Baffin population. Low genetic structure was observed compared to the populations from the Atlantic-Arctic and Pacific-Arctic, both of which are located in zones with water influx from either the Atlantic or Pacific Oceans. Low differentiation was also noted between the Kara-Laptev and Pacific-Arctic. Finally, no genetic differentiation was observed between populations in the Kara-Laptev and the Atlantic-Arctic regions. This was rather expected given that a single Tara Oceans station in the Atlantic-Arctic, TARA_175, appeared to harbour Chaetoceros populations in the present analysis and is located at the interface between the Kara-Laptev regions (Supplementary Figure S12).

Examining the correlation of abiotic parameters with population structure

The above results show that, depending on the MAG considered, there are noticeable patterns of population structure among Chaetoceros populations. We then investigated the correlation of different environmental parameters and geographic distance with the genetic differentiation of the MAG populations. For this, we selected the MAGs that were present in at least three different stations or depths and with a variance greater than zero, namely MAGs ARC_116, ARC_217 and SOC_37, all of them having populations in the Arctic Ocean. Pairwise F_ST values between the MAG populations were modelled as a function of a range of environmental parameters and Euclidean distance by applying a linear mixed model (LMM), as described in Laso-Jadart et al. (2021), to perform a variance partitioning analysis. The fixed part of the unexplained variance was below 10% for the three analyses, and was therefore considered negligible. We further applied Mantel tests to verify these results. For ARC_116, most of the genomic variation was correlated with silicate concentrations (35%), followed by phosphate (14%) and nitrite concentrations (13%), but these correlations were not validated by the Mantel tests (Supplementary Figure S14A-C). A small correlation with the geographic distance was noted (7%), which was validated by the Mantel test (Supplementary Figure S14D).

On the other hand, iron was the environmental parameter most strongly correlated (77%) with genomic differentiation of ARC_217, although this was not validated by a Mantel test (Supplementary Figure S15). Finally, it was phosphate (45%), nitrite (14%), silicate (13%) and temperature (11%) that were the most correlated with genomic differentiation of SOC_37 populations. Phosphate, nitrite and silicate were not validated by the Mantel tests but temperature was (Supplementary Figure S16A-D). That some correlations were not validated by Mantel tests despite a strong contribution of one parameter in the variance partitioning analyses was expected given that our sample sizes were small, particularly for ARC_217 and SOC_37. It is indeed evident that most of the data points for these two MAGs are fairly dispersed around the regression curve, with minor exceptions (Supplementary Figures S14-S16). Moreover, Mantel tests may sometimes give biased p-values given the autocorrelation of some environmental variables examined in ecology studies (Guillot and Rousset 2013; Diniz-Filho et al. 2013). Taking this into account, some Mantel tests nonetheless confirmed the correlation between abiotic parameters and genetic differentiation of Chaetoceros populations observed in the variance partitioning analyses. From this we conclude that micro-diversification appeared to occur in response to different environmental factors in at least some closely related Chaetoceros populations.

Identification of Chaetoceros genes under selection

Given the correlation observed between abiotic parameters and the Chaetoceros population structure patterns, we then examined whether some genes were undergoing selection. This analysis was conducted on the MAGs presenting variants in at least 3 different populations with discriminant F_ST values, namely MAGs ARC_116, ARC_217 and SOC_37. The LK distribution of the respective loci followed the expected chi-square distribution (Supplementary Figure S17), indicating that the loci followed the neutral evolution model of a single species. We identified several strong candidates potentially under positive selection on the genome contigs, namely 28 loci for ARC_116, 1,116 loci for ARC_217 and 1,101 loci for SOC_37, representing 0.17% (28), 6.89% (802) and 6.60% (534) of the genome contigs, respectively (Supplementary Table S6A-C). Globally, an inspection of the B-allele frequencies (BAF) showed that loci under selection were derived principally from Tara Oceans stations 175, 188, 193 and 201 (all SUR) for ARC_116, while they were mainly from Tara Oceans station 201 (SUR or DCM) for ARC_217 and stations 173 and 201 (both DCM) for SOC_37.
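For readers unfamiliar with the LK statistic, the sketch below shows one common formulation, the Lewontin-Krakauer statistic, in which each locus's F_ST is rescaled by the mean F_ST and compared with a chi-square distribution with n - 1 degrees of freedom, n being the number of populations. This formulation, the function names and the restriction of the p-value helper to even degrees of freedom (where the chi-square survival function has a simple closed form) are assumptions for illustration; the exact computation used in this study may differ.

```python
import math


def lk_statistics(fst_per_locus, n_populations):
    """Lewontin-Krakauer statistic per locus: (n - 1) * F_ST / mean(F_ST)."""
    mean_fst = sum(fst_per_locus) / len(fst_per_locus)
    if mean_fst <= 0:
        raise ValueError("mean F_ST must be positive")
    df = n_populations - 1
    return [df * fst / mean_fst for fst in fst_per_locus]


def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X > x) for even df only.

    For df = 2m this equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper only handles positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Toy per-locus F_ST values for three populations (df = 2).
    fst_values = [0.02, 0.05, 0.03, 0.40, 0.04]
    for fst, lk in zip(fst_values, lk_statistics(fst_values, 3)):
        print(fst, round(lk, 3), round(chi2_sf_even_df(lk, 2), 4))
```

Loci with an unusually small p-value under this neutral expectation are the outlier candidates; in practice the p-values would then be corrected for multiple testing, for instance to obtain q-values.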

We examined the Gene Ontology (GO) terms generated during the Interproscan analysis to gain insights into the functional repertoire of the genes under selection. All three MAGs presented GO terms associated with cellular components, with ARC_116 displaying a more elevated proportion of genes associated with membrane domains (Supplementary Figure S18). Regarding molecular functions, the three MAGs displayed an elevated proportion of genes associated with binding and catalytic activities, and ARC_217 showed the most diverse GO terms in this category. The GOs associated with biological processes were mostly represented by general cellular and metabolic processes for all three genomes, followed by cellular organisation or biogenesis as well as localization.

To focus our analysis on describing the potential functions associated with the genes under selection, we selected the SNVs located within a gene sequence with an assigned PFAM domain. A total of 31 PFAM domains were found in the genes harbouring the loci under selection for ARC_116, while we identified 697 for ARC_217 and 805 for SOC_37. Among the PFAMs associated with ARC_116 loci, 54.84% (17) presented an associated GO term, while the percentage was lower for ARC_217 (46.92%) and SOC_37 (40.75%). Most of the outlier loci were associated with coding or untranslated regions (UTRs), with a minor contribution of loci within intronic regions (Supplementary Figure S19). Some variants were responsible for loss-of-function events, such as stop codon gain and frame-shift mutations. We subsequently searched for domain functions potentially linked to the environmental parameters possibly driving the micro-diversification patterns.

Strikingly, all variants under selection within the ARC_116 populations were completely absent from station TARA_189 (DCM). Most of the non-synonymous loci displayed domains associated with kinases, oxidoreductases and transferases. Among the loci under selection were domains involved in redox balance, such as missense or 3’ UTR variants in genes harbouring glutaredoxin (PF00462, PF13417) and cytochrome c oxidase domains (PF02683), all mostly fixed in surface populations of stations TARA_175, TARA_193 and TARA_201 (Supplementary Table S6D). Other loci included synonymous, missense and 3’ UTR variants in domains involved in intracellular transport (e.g., PF04811 and PF08318). Both these domains appeared associated with endoplasmic reticulum to Golgi transport and their variants were almost fixed at the surface of stations TARA_175, TARA_188, TARA_193 and TARA_201.

Conversely, both ARC_217 and SOC_37 showed loci under selection for potential chlorophyll-binding and CobW proteins. The former are found in light-harvesting complexes of the photosynthetic apparatus while the latter form a large family of metal chaperones associated with metal homeostasis processes involving zinc, iron or cobalt (Haas et al. 2009; Hsieh et al. 2013). Interestingly, both these functions may have a link with iron availability status, as phytoplankton can cope with iron limitation through remodelling of light-harvesting complexes, while some CobW proteins may exhibit iron-responsive patterns (Behrenfeld and Milligan 2013; Kotabova et al. 2021). Comparing the frequency of these variants among the two genomes, both ARC_217 and SOC_37 populations were found at stations TARA_201 (Arctic archipelago) and TARA_173 (Kara-Laptev) and showed elevated frequency of these variants at the former, whereas those at station TARA_173 exhibited lower frequency (Supplementary Table S6E-F). Looking at the environmental metadata of these stations using the PANGAEA database (Ardyna et al. 2017; Guidi, Morin, et al. 2017; Guidi, Picheral, et al. 2017), we observed that station TARA_173 was characterised by higher iron and nitrate levels but was lower in phosphate (Fig. 6E, Supplementary Table S5). ARC_217 populations were also found at station TARA_205 (SUR), which displayed the lowest iron concentration (51% and 38% lower than the ones of stations TARA_173 and TARA_201 (both SUR), respectively, Fig. 6E, Supplementary Table S5), where the CobW variant was completely fixed and the chlorophyll-binding domain variant was completely absent. Moreover, we identified a synonymous variant of ARC_217 located in gene TARA_ARC_108_MAG_00217_000000002161.2.2 within a flavodoxin domain (PF00258). Flavodoxin proteins are known to over-accumulate in iron-limited conditions over their iron-containing counterpart ferredoxin (La Roche et al. 1996). This SNV was almost fixed at the surface of stations TARA_201 and TARA_205 (Supplementary Table S6E). A variant within an iron-sulfur cluster associated domain (PF02657) involved in redox and regulation of gene expression processes (Johnson 1998) was also found almost fixed in TARA_173 (DCM) and TARA_201 (SUR) SOC_37 populations (Supplementary Table S6F).

Figure 5.

Correlation between environmental parameters variation and Chaetoceros genomic differentiation. Barplots of variance partitioning analysis results for (A) ARC_116, (B) ARC_217 and (C) SOC_37. Ammo.: ammonium; Phos.: phosphate; Silic.: silicate; Temp.: temperature.

Figure 6.

Selection of variants in Arctic Chaetoceros populations. (A) and (C) represent Manhattan plots in a 10 kb window around the variants of interest, shown for ARC_217 (ISIP) and SOC_37 (spermidine/spermine synthase). Red dots correspond to SNVs considered under selection (q-value < 0.15). (B) and (D) represent the barplots of the B-allele frequency (BAF) for the respective loci of interest depending on the population considered. (E) Range-transformed heatmap of abiotic parameters for the Tara Oceans stations in the Arctic Ocean where the Chaetoceros MAGs are present (see Supplementary Table S5 for raw values). The brightest yellow colour represents the most elevated values for any given parameter in the dataset, while the darkest purple indicates the lowest. D: deep-chlorophyll maximum; S: surface.

Most notably, one variant of ARC_217 appeared to be located in a gene encoding a low iron-inducible periplasmic domain (PF07692). This SNV corresponded to a synonymous mutation located in the coding region of gene TARA_ARC_108_MAG_00217_000000001530.105.1. An examination of the Manhattan plot of this SNV revealed few drafted variants around the locus under selection, indicative of a potential soft selective sweep, and the BAF showed that selection of this variant was occurring on Chaetoceros populations from stations with higher iron concentrations (TARA_173 and TARA_201; both SUR) (Fig. 6E, Supplementary Table S5), while it was completely absent from TARA_205 (SUR) (Fig. 6A-B). Next, we searched for potential homologue candidates in Phaeodactylum tricornutum by aligning the corresponding ARC_217 protein on the Phatr3 proteome (Rastogi et al. 2018) using BLASTp. We identified protein B7FYL2 (https://bioinformatics.psb.ugent.be/plaza/versions/plaza_diatoms_01/genes/view/ptri151180), which was previously identified as an iron starvation-induced protein (ISIP), ISIP2a (Bowler et al. 2008), and showed an e-value of 1e-13. The corresponding gene is located within a genomic region marked by the histone post-translational modification (PTM) H3K4me2, a mark suggested to associate with expressed genes in P. tricornutum (Veluchamy et al. 2015). The mark was reduced in nitrate-limited conditions compared to repletion. We further compared homologue sequences among the 8 Chaetoceros transcriptomes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) that were used for the phylogeny reconstruction (see Results) and retrieved 16 candidate sequences from 7 transcriptomes (see Supplementary data file S1, Supplementary material online). Aligning these sequences to the reference gene allowed us to identify significant identity at the SNV position for 7 (44%) sequences, including 2 sequences from C. dichaeta that displayed the same SNV (C>T) as the locus under selection (Supplementary Figure S20, Supplementary data file S2, Supplementary material online). Although the SNV observed in the ARC_217 gene sequence was predicted to induce a synonymous mutation, we examined whether the nucleotide substitution modified the predicted RNA secondary structure through an analysis conducted on the LinearFold and RNAfold web servers (Gruber et al. 2008; Huang et al. 2019), as silent mutations may still have functional effects, for instance through changes in RNA secondary structure (Sauna et al. 2007). No clear change of RNA secondary structure was predicted by LinearFold (Supplementary Figure S21A-B). RNAfold outputs showed that the reference sequence displayed an ensemble diversity (i.e., an average base-pair distance between all the structures in the thermodynamic ensemble) of 1514.62, while the mutated sequence showed a value of 1486.24. Slight differences in the minimum free energy and centroid (structure with the minimal total base-pair distance to all structures in the ensemble) structures (Supplementary Figure S21C-D) were observed. These results suggested a possible minor impact of the mutation on the RNA folding structure.

Next, an examination of SOC_37 loci under selection revealed genes encoding functions associated with phosphate metabolism, such as a putative cytosolic domain of 10TM phosphate transporter (PF14703), protein and histidine phosphatase domains (PF00481 and PF00300) (Supplementary Table S6F). In accordance with its enrichment in RNA polymerase Rpb1 C-terminal domain (see Results), many genes under selection harboured PFAM domains associated with RNA, as for instance an elongation factor of RNA pol II, RNA recognition motifs and binding domains, in addition to a reverse transcriptase involved in transposable element activity, most of them almost fixed in station TARA_201 (DCM) while absent from station TARA_173 (SUR) and under selection at the DCM of stations TARA_173, TARA_180 and TARA_194 (Supplementary Table S6F). Other functions notably included domains potentially involved in methyl transfers and epigenetic mechanisms regulating gene expression (e.g., PF00850, PF00856 and PF08123), all encoded by genes under selection in TARA_173 and TARA_201 (DCM). Most remarkably, loci under selection included two variants potentially involved in polyamine biosynthetic processes. One SNV was located in the UTR of a gene harbouring carbamoyl phosphate synthase domains (PF00988 and PF02786) and the other in the coding region of a spermine/spermidine synthase domain (PF01564) in gene TARA_SOC_28_MAG_00037_000000001696.30.1, causing a non-synonymous mutation (p.Met1063Leu). Polyamines such as spermine and spermidine are involved in frustule formation through the production of long-chain polyamines (Kröger et al. 2000) using carbamoyl phosphate synthase (Armbrust et al. 2004). Inspection of the Manhattan plot of the polyamine synthase variant suggested a potential hard selective sweep signature, and the BAF showed that it was almost fixed at the DCM of stations TARA_173 and TARA_201 (Fig. 6C-D), which were characterised by the lowest oxygen concentrations in our analysis (Fig. 6E, Supplementary Table S5). The carbamoyl phosphate synthase variant showed similar patterns and appeared more frequent in stations with less nitrate and phosphate (Supplementary Tables S5 and S6F). A sequence similarity search for a polyamine synthase homologue in the Phatr3 proteome allowed us to identify the protein B7FPJ4 (https://bioinformatics.psb.ugent.be/plaza/versions/plaza_diatoms_01/genes/view/ptri221670) involved in spermidine biosynthesis (Bowler et al. 2008), with an e-value of 3e-163. The gene coding this protein exhibited significant changes of expression in nitrate depletion compared to repletion (approximately -0.5 fold change; Levitan et al. 2015) and in phosphate depletion compared to repletion (~2.9 fold change; Cruz de Carvalho et al. 2016). Moreover, this gene is located in a region marked by H3K9me2 and H3K4me2 PTMs in P. tricornutum (Veluchamy et al. 2015). Searching among the 8 Chaetoceros transcriptomes from MMETSP yielded 17 candidate sequences from 3 transcriptomes (see Supplementary data file S3, Supplementary material online). Among them, 16 (~94%) sequences showed significant identity at the SNV position, with 1 sequence from C. debilis displaying the same SNV (A>C) as the locus under selection (Supplementary Figure S22, Supplementary data file S4, Supplementary material online). Predictions using LinearFold indicated clear changes of RNA secondary structure between the reference and mutated gene (Supplementary Figure S23A-B). Moreover, RNAfold predictions agreed with this pattern as they indicated an ensemble diversity of 957.66 and 1039.87 for the reference and mutated sequences, respectively, along with significant modifications of the minimum free energy and centroid structures (Supplementary Figure S23C-D). Overall, these findings illustrate how a single mutation may have a direct effect on the electrochemical properties of RNA structures, as well as potentially impacting the biochemical kinetics of the protein.

Discussion

Our understanding of both the ecology and biological functions of marine algae has progressed considerably with the help of molecular-based methods, generating an increasing number of genomes, now numbering several hundred (Hanschen and Starkenburg 2020; Grigoriev et al. 2020). Nonetheless, one drawback of laboratory-generated genomes is their potential lack of representativeness of species that thrive in the environment. As an example, it has been demonstrated that P. tricornutum cultures artificially selected individuals in the population, thus reducing the overall diversity of their genetic pool and leading to genetic convergence of the strains, with nutrient-replete conditions favouring somatic mutations leading to the loss of function of some genes that are potentially important in highly fluctuating environments (Helliwell et al. 2015; Rastogi et al. 2020). Collectively, these factors may constitute a limit to the understanding of metabolism and locally relevant genomic functions. Moreover, interrogating Tara Oceans metagenomes with the Fragilariopsis cylindrus CCMP 1102 genome showed enough read coverage for only one sampling station, further exemplifying the divergence between laboratory strains and natural populations (Bulankova et al. 2021). While cultivating strains in the lab is a necessary first step to gaining insights into their fundamental ecology, accessing the genomes of organisms directly from their environment is key to fully understanding their role within natural communities and their responses to environmental fluctuations. In this context, genome-resolved metagenomics, which consists of mapping environmental metagenomic reads onto a reference genome, represents a powerful tool to access the diversity and distribution of organisms in their native environment without relying on taxonomic markers that may be too conserved to reveal the full extent of diversity, which appears to be the case in particular for unicellular organisms (Piganeau et al. 2011). However, as this field is in its infancy, specific attention must be paid during the binning of metagenomes to prevent chimeric assembly (Nelson et al. 2020). Metagenome-assembled genomes must furthermore include information about assembly quality, level of contamination and completeness, to enable robust comparisons between studies (Bowers et al. 2017). The exploration of marine phytoplankton population genomics using metagenomes has only just begun, with a few pioneering studies focused on the Mamiellales genera Bathycoccus, Micromonas and Ostreococcus (Leconte et al. 2020; Leconte et al. 2021). In line with these, the present study aimed to generate a portrait of the diversity landscape among natural Chaetoceros populations and to bridge the gap between diatom genomes, physiological responses and population genomics by teasing apart the correlation of geographical distance and environmental factors with population structure. Going from not only one but eleven metagenomes to gene selection using genome-resolved metagenomics, this work is to our knowledge the first to assess population-scale diversity among Chaetoceros genomes directly reconstructed from the environment.

Chaetoceros metagenome-assembled genomes from Tara Oceans

The genomes we considered for the present study were generated from the Tara Oceans metagenomic reads and displayed the same order of magnitude in size, number of protein-coding genes and G+C content as the newly published Chaetoceros tenuissimus genome (Hongo et al. 2021), supporting the accuracy of the MAG reconstruction methods. Several studies have investigated the link between genome size and cell morphology and metabolism and have found that both cell size and growth rate are, respectively, proportional and inversely proportional to genome size (Williams 1964; Holm-Hansen 1969; Shuter et al. 1983; Veldhuis et al. 1997; Cavalier-Smith 2005; Von Dassow et al. 2008). We observed marked differences in genome sizes for the MAGs ARC_232, ARC_267, MED_399 and PSW_256, which all exhibited the same level of G+C% (the lowest being around 39%), displayed an enrichment of the level of D, E, K and N amino-acids and a depletion in C, D and A residues, and seemed to belong to the C. affinis subclade, indicating that they potentially belong to the same species. These genome size discrepancies suggest potential contrasted growth rates and cell sizes for closely related Chaetoceros species. Koester et al. (2010) noted a two-fold genome size difference between cryptic but geographically separated populations of the diatom Ditylum brightwellii, accompanied by a difference in growth rate, suggesting that whole-genome duplication events may constitute important drivers of genetic diversification in diatoms. Here, both ARC_232 and ARC_267 populations were identified in the Arctic but did not seem to co-occur in our analyses, while MED_399 and PSW_256 were found to be restricted to the Mediterranean Sea and Pacific Ocean, respectively. It is possible that duplication and/or transposition events potentially linked to stress, as was observed in P. tricornutum (Maumus et al. 2009), gave rise to diverged subpopulations that dispersed and were finally genetically separated following allopatric speciation.

Insights into biogeographical patterns of the genus Chaetoceros

We identified each of the MAGs in a relatively small number of samples with an uneven distribution, suggesting potential habitat specialists, with the notable exception of ARC_116, the only MAG that was distributed in samples spanning the whole Arctic Ocean, indicating a potential pan-Arctic species. Other studies, based on the Tara Oceans and Ocean Sampling Day sample datasets, have already provided a thorough pattern of Chaetoceros distribution at global scale in the oceans (Malviya et al. 2016; De Luca, Kooistra, et al. 2019). These have shown a prevalence of Chaetoceros in the Arctic Ocean, with discrepancies depending on the species considered. For instance, metabarcoding analyses showed that Chaetoceros neogracile was restricted to the northern hemisphere (De Luca, Kooistra, et al. 2019), which consistently matches the distribution of ARC_116 and its taxonomic closeness to C. neogracile RCC1993. C. dichaeta has been retrieved near Alaska and the Antarctic peninsula, a distribution that appears in line with that of ARC_189 and with the geographic closeness of PSE_253 in South America. The same study indicated that C. affinis was present in the Mediterranean as well as in the Atlantic Ocean and North Sea. We found 4 MAGs (ARC_232, ARC_267, MED_399 and PSW_256) that exhibited taxonomic closeness to C. affinis. Only one of them was found in the Mediterranean Sea while the three others were retrieved from the Pacific and Arctic Oceans, suggesting potential new niches for this species. C. debilis, to which our MAG SOC_60 was also found to be very close, was retrieved in different localities: in European coastal waters and in the Arctic Ocean for the northern hemisphere, as well as in the Indian and Southern Oceans, in agreement with the distribution of this MAG. Chaetoceros has been viewed as a local opportunistic genus (Barton et al. 2010; Smodlaka Tanković et al. 2018) but the various species we describe here appeared to evolve in sympatry with up to three MAG populations in the same sampling stations. A temporal survey of these sampling points could help reveal whether the populations from different species are sympatric on a regular basis or if some competition mechanisms or niche exclusions are observable. Of note is the observation that none of the MAGs belonging to the same subclade were found at the same locations, with the exception of ARC_217 and SOC_37. In general, we found most of our Chaetoceros populations in the Arctic Ocean, which may be due to their lower sequence coverage in tropical and subtropical waters, as these species are likely to be among dominant phytoplankton in the Arctic (Sommeria-Klein et al. 2021). It is moreover evident that other localisations harbour Chaetoceros populations, such as the Southern Ocean where diatoms dominate photosynthetic protist assemblages (Malviya et al. 2016; Sommeria-Klein et al. 2021). In the same vein, associations between Chaetoceros and tintinnid ciliates have been observed in the Pacific Ocean and Caribbean Sea (Gómez 2007; Gómez 2020), which were only partially sampled during the Tara Oceans expeditions (Malviya et al. 2016). Future oceanographic campaigns should help reveal the distribution of Chaetoceros populations and the extent of their genomic variability.

Genetic differentiation among closely related Chaetoceros populations is correlated with different environmental variables

We investigated global patterns of population structure and genetic differentiation in Chaetoceros, a cosmopolitan diatom found in every major oceanic province, and one of the most diverse. By leveraging metagenomes reconstructed through the Tara Oceans expeditions, we drew a comprehensive landscape of the genetic diversity among different populations of this genus and were able to address their level of gene flow through an analysis of their population structure. The levels of micro-diversity observed here, ranging from 0.63% to a maximum of 2.34%, are in line with previous analyses conducted on natural populations of the diatom Fragilariopsis cylindrus from Tara Oceans station 86, which displayed ~2% SNV density (Bulankova et al. 2021). The observation of elevated genetic differences among populations from the same species means that despite their high potential dispersal, Chaetoceros diatoms can exhibit significant levels of divergence. Moreover, our variance partitioning analyses revealed that the genetic differentiation between Chaetoceros populations was correlated with a combination of different abiotic factors, with only a minor correlation with geographic distance. This is in agreement with previous studies analysing population diversity using microsatellite markers, such as Härnström et al. (2011) on Skeletonema marinoi and Whittaker and Rynearson (2017) on Thalassiosira rotula, and contradicts the results found by Casteleyn et al. (2010) for Pseudo-nitzschia pungens. It must be noted that the former two are homothallic centric diatoms while P. pungens is a heterothallic pennate diatom. Therefore, the historical assumption that geographic distance is the parameter conditioning most microbial genetic diversity does not appear to hold consistently in diatoms.

Among the most notable nutrients that regulate diatom populations are nitrate (Moore et al. 2004), iron (Boyd et al. 2007; Caputi et al. 2019), phosphate (Egge 1998; Cruz de Carvalho and Bowler 2020), silicon (Martin-Jezequel et al. 2000; Yool and Tyrrell 2003) and cobalamin (i.e., vitamin B12) (Bertrand et al. 2012; Ellis et al. 2017), although environmental controls of diatom populations vary locally due to their cosmopolitan nature. To our knowledge, the closest study to the present one is that of Whittaker and Rynearson (2017), where the authors investigated the correlation of abiotic parameters and geographic distance with Thalassiosira rotula population structure and revealed a correlation with temperature. By contrast, in the present analysis we rather show a correlation of genetic differentiation with phosphate, silicate and iron in Chaetoceros species, with only a minor (~10%) correlation between temperature and genetic differentiation in SOC_37 populations. It should nonetheless be noted that the MAGs were found in samples displaying a narrow range of temperature between 1 and 4 °C. Therefore, the patterns of environmental control on global genomic diversity are consistent with expectations from the literature.

Significant population structure was observed among the Chaetoceros MAGs, but with relatively moderate between-region differences in the Arctic, as was observed for zooplankton (Laso-Jadart et al. 2021), with F_ST levels reaching up to ≥ 0.2. These high levels of genetic differentiation appear close to those described in different P. tricornutum accessions (pairwise F_ST ~0.2-0.4), represented for the most part by strains that have been maintained in culture collections for decades (Rastogi et al. 2020). In particular, ongoing speciation of the ARC_116 population located at station TARA_194, particularly at the DCM, might explain why we observed a dramatic number of SNVs, leading us to exclude this station in order to perform a more conservative study when identifying genes under selection. Indeed, this indicates unequal gene flow among the populations and suggests a metapopulation structure consisting of populations of populations, as has been described for the diatom D. brightwellii (Rynearson et al. 2009). This difference in genetic structure of the ARC_116 population at station TARA_194 was not observed in other populations present at this particular station, such as for SOC_37. Overall, all three MAGs ARC_116, ARC_217 and SOC_37 appeared closely related in our phylogenetic analyses but nonetheless showed elevated numbers of MAG-specific orthogroups, and their genetic differentiation was correlated with different sets of abiotic parameters. Taken together, these results show that even with the same local environmental conditions, populations of closely related diatoms from the same genus do not display identical gene flow patterns, emphasising their enormous genetic diversity as well as significant adaptive potential.

Functional overview of natural selection among Chaetoceros populations

We were able to identify genes under selection between the different Chaetoceros populations and tried to assess their respective functions. Previous studies have tried to investigate the gene functions that are essential for diatom survival. Among these are functions associated with light perception and energy dissipation, such as phytochromes involved in red/far-red light sensing (Fortunato et al. 2016) and light-harvesting complex stress-related proteins (LHCX1) that modulate light responses (Bailleul et al. 2010). Other important functions include metabolic plasticity and response to nutrient fluctuations. As an example, ornithine-urea cycle proteins mediate rapid responses to nitrogen variability (Allen et al. 2011). Another remarkable characteristic of diatoms is their ability to respond to iron fluctuations, as they are among the communities most strongly linked to the concentration patterns of this micronutrient (Caputi et al. 2019). Indeed, diatoms exhibit a diverse range of iron uptake mechanisms, involving siderophores (Kazamia et al. 2018), phytotransferrins (Allen et al. 2008; Morrissey et al. 2015) and ferric reductases (Gao et al. 2021). They also exhibit differential mechanisms of iron storage, with Pseudo-nitzschia using ferritin, whereas members of the Chaetoceros and Thalassiosira genera are believed to be able to store iron in their vacuole (Lampe et al. 2018). While we did not observe selection of functions related to light acquisition in our analyses, clear positive selection patterns of genes involved in iron response were retrieved, for the most part in ARC_217 populations. These included a gene encoding an iron starvation-induced protein (ISIP), which exhibited contrasting frequencies that followed iron concentration patterns, the variant being more frequent in stations with more elevated iron concentrations. We found this gene to be a homologue of P. tricornutum ISIP2a, which encodes a protein involved in concentrating ferric iron at the cell surface (Morrissey et al. 2015), with a function equivalent to human transferrin, hence its name “phytotransferrin” (McQuaid et al. 2018). This protein has been proposed to constitute an ecological marker of iron starvation in diatoms (Marchetti et al. 2017) as it is strongly up-regulated by this condition. Previous analyses in P. tricornutum cells deficient in ISIP2a showed reduced iron uptake capabilities (Morrissey et al. 2015; Kazamia et al. 2018). While our results suggested a slight effect of the mutation on its RNA secondary structure, we identified the same SNV in sequences homologous to this gene in Chaetoceros transcriptomes. It therefore appears plausible that relaxed selective pressure on the gene in a more iron-replete environment could have led to the observed mutation.

Additionally, we noted the positive selection of genes encoding a carbamoyl phosphate synthase and a spermine/spermidine synthase in SOC_37 populations, with the latter showing a potential impact on RNA secondary structure, along with other genes linked with phosphate metabolism. Carbamoyl phosphate synthase is thought to catalyse the first step of the urea cycle, a process that generates polyamine precursors (Armbrust et al. 2004). Polyamines such as spermine and spermidine are nitrogenous compounds involved in frustule formation through their interaction with the heavily phosphorylated silaffin phosphoproteins (Kröger and Sumper 1998; Poulsen et al. 2003). Consequently, diatom frustule formation relies on both nitrogen and phosphorus. We identified a polyamine synthase homologue in P. tricornutum whose expression is significantly modulated in response to nitrate and phosphate availability, as well as a homologue bearing the same mutation in a Chaetoceros transcriptome. Polyamine biosynthetic processes have been linked to diatom physiological responses to nitrogen, salinity and temperature (Scoccianti et al. 1995; Liu et al. 2016; Gleich et al. 2020), and many polar diatoms show increased silicate content under iron-limited conditions, which can result from either increased silicate accumulation or lower accumulation of nitrate depending on the species considered (Timmermans et al. 2004; Hoffmann et al. 2007). Despite this, the putative link between variations in abiotic parameters across stations and the observed variant frequency patterns remains unclear. In this study we noted that several genes harbouring SNVs had homologues associated with PTMs in P. tricornutum (Veluchamy et al. 2015). Changes in the number of cytosine residues in genes (e.g., A>C) could potentially affect gene expression by increasing or decreasing the number of available methylation sites, especially in a CpG island context.
Overall, future studies involving engineered knock-out mutants of the genes containing the SNVs and including different sets of abiotic parameters should help gain insights into their respective impact on gene expression patterns and whole-cell fitness.

Conclusion

Chaetoceros is the most widespread and connected diatom genus, making it a key component of plankton communities, and has been identified as a genus vulnerable to projected climate change. Here, we have shown significant correlations between nutrient availability and genetic differentiation among Chaetoceros populations, with a potential impact on their growth strategies. As climate change is expected to influence water stratification, acidification and nutrient availability, it appears very likely that predicted environmental changes in the Arctic will influence Chaetoceros distributions and gene pool. The present study extends previous work on plankton population genomics and is, to our knowledge, the first to address micro-diversification patterns of multiple metagenome-assembled genomes from a non-model diatom genus. Finally, this work highlights the need for repeated sampling over time to test whether separated Chaetoceros populations can evolve in sympatry, and to assess the effect of seasonality and local environmental fluctuations on gene selection.

Materials and Methods

Genomic resources

Eleven reconstructed and manually curated metagenome-assembled genomes (MAGs) belonging to the genus Chaetoceros, generated from Tara Oceans metagenomic reads (Delmont et al. 2022; available at https://www.genoscope.cns.fr/tara/#SMAGs), were considered in the present study. For clarity and readability, the original MAG IDs were shortened; the correspondence is given in Supplementary Table S1. Genome size and number of genes were extracted from Delmont et al. (2022). All these genomes had previously received a geographical assignment based on their read recruitment in the Tara Oceans sampling stations after mapping each MAG onto the Tara Oceans metagenomic dataset. Genome completion was estimated by retrieving the Benchmarking Universal Single-Copy Orthologs (BUSCOs) (Simão et al. 2015) using the DNA contigs of the MAGs and control genomes as input for BUSCO v2.0.0 with the eukaryota_odb10 library. Gene length was estimated using SAMtools (Li et al. 2009) ‘faidx’ on the genome FASTA files. The percentage of G+C in the genome codons was calculated with the COUSIN online tool (Bourret et al. 2019). Reference diatom genomes for Phaeodactylum tricornutum CCAP 1055 and Thalassiosira pseudonana CCMP 1335 were also considered to provide a comparison with the Chaetoceros MAGs and were both retrieved from the Joint Genome Institute (Armbrust et al. 2004; Bowler et al. 2008). Their completion level, gene length and percentage of GC were obtained with the aforementioned methods.

Average nucleotide identity and average amino-acid identity

Average nucleotide identity (ANI) between the MAGs was calculated using FastANI (Jain et al. 2018) on the whole genomes with the --ql and --rl parameters and --minFraction 0.05 to retrieve all the identity percentages even for highly divergent MAGs. In a similar manner, average amino-acid identity (AAI) was estimated on the whole predicted proteomes using the online AAI calculator provided by the Konstantinidis lab (http://enve-omics.ce.gatech.edu/aai/) (Rodriguez-R and Konstantinidis 2014).

Phylogeny and identification of orthogroups

To investigate the taxonomic relatedness of the MAGs, we considered the BUSCO genes identified during the estimation of genome completion (see Genomic resources). Using the BUSCO IDs for which at least 8 out of the 11 MAGs (~70%) presented a sequence (83 out of 255 eukaryotic BUSCOs), the BUSCO gene clusters translated into proteins were aligned with MAFFT (Katoh et al. 2002) in automatic mode, followed by manual cleaning of each alignment by removing N-terminal and C-terminal residues displaying less than 70% conservation. The alignments were then trimmed using trimAl (Capella-Gutierrez et al. 2009) with the parameter -gt 0.5, followed by another alignment step with MAFFT. A total of 83 gene clusters were retained. To identify potential contaminants, consensus guide trees were generated for each of the approved alignments using RAxML (Stamatakis et al. 2005) with the parameters -m PROTGAMMAJTT for the substitution model, -N 100 bootstrap replicates and randomly defined numbers between 1 and 99,999 for the parameters -x and -p. To evaluate the relatedness of the MAGs with respect to other taxa, the same approach was applied to build a concatenation tree with 23 supplementary taxa sampled across the eukaryotic tree of life (34 total taxa), including 8 Chaetoceros transcriptomes from MMETSP deposited in the European Nucleotide Archive and converted into protein sequences, for the same 83 single-copy nuclear genes. The cleaned alignments were then concatenated, a final ML tree was built with RAxML 8.2.12 (100 bootstraps) and the final figure was exported from iTOL (Letunic and Bork 2021). Protein sequences from the MAGs (available at https://www.genoscope.cns.fr/tara/#SMAGs) were used as input for OrthoFinder (Emms and Kelly 2015) to identify the orthogroups.
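As an illustration of the occupancy filter described above (keeping the BUSCO IDs for which at least 8 of the 11 MAGs presented a sequence), the following minimal Python sketch selects shared markers from a presence table. The input structure, the function name `select_shared_buscos` and the toy identifiers are hypothetical; they do not reproduce the BUSCO output format or the scripts used in this study.

```python
from collections import defaultdict


def select_shared_buscos(presence, min_mags=8):
    """Return the BUSCO IDs found in at least `min_mags` MAGs (sorted).

    `presence` maps a MAG name to the set of BUSCO IDs recovered in it.
    This is a hypothetical input structure chosen for the example.
    """
    counts = defaultdict(int)
    for busco_ids in presence.values():
        for busco_id in set(busco_ids):
            counts[busco_id] += 1
    return sorted(b for b, n in counts.items() if n >= min_mags)


# Toy example with 11 hypothetical MAGs.
mags = [f"MAG_{i}" for i in range(1, 12)]
presence = {mag: set() for mag in mags}
for mag in mags:          # marker present in all 11 MAGs
    presence[mag].add("busco_A")
for mag in mags[:8]:      # marker present in exactly 8 MAGs
    presence[mag].add("busco_B")
for mag in mags[:7]:      # marker present in only 7 MAGs
    presence[mag].add("busco_C")

shared = select_shared_buscos(presence, min_mags=8)
print(shared)
```

With the threshold set to 8, the marker present in only 7 MAGs is excluded, which mirrors the roughly 70% occupancy criterion.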

Comparative analysis of the amino acid composition and PFAMs of the MAGs

The amino acid composition of the genomes was computed by analysing the FASTA files with the ‘protr’ package in R (Xiao et al. 2015) and plotting the corresponding frequencies. The amino acid composition of each MAG was then normalised by the respective amino acid global mean. Protein family (PFAM) domains were inferred using a local installation of InterProScan (Jones et al. 2014). Raw values are available in Supplementary Table S2.
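The normalisation step can be sketched as follows. The original analysis relied on the ‘protr’ R package; this Python version is only an illustrative equivalent, assuming that frequencies are computed over the 20 standard amino acids and that each frequency is divided by its mean across MAGs. The function names and the toy sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(proteins):
    """Frequency of each standard amino acid over a list of protein sequences."""
    counts = Counter()
    for seq in proteins:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def normalise_by_global_mean(freqs_by_mag):
    """Divide each amino acid frequency by its mean across all MAGs."""
    n = len(freqs_by_mag)
    normalised = {mag: {} for mag in freqs_by_mag}
    for aa in AMINO_ACIDS:
        mean = sum(freqs[aa] for freqs in freqs_by_mag.values()) / n
        for mag, freqs in freqs_by_mag.items():
            # Amino acids absent from every MAG are reported as 0.
            normalised[mag][aa] = freqs[aa] / mean if mean > 0 else 0.0
    return normalised


# Toy proteomes (hypothetical sequences, one protein per MAG).
proteomes = {"MAG_A": ["AAAL"], "MAG_B": ["ALLL"]}
freqs = {mag: aa_frequencies(seqs) for mag, seqs in proteomes.items()}
norm = normalise_by_global_mean(freqs)
print(norm["MAG_A"]["A"], norm["MAG_B"]["A"])
```

Values above 1 indicate an amino acid that is enriched in a MAG relative to the mean of all MAGs, and values below 1 indicate depletion.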

Genome-resolved metagenomics of Chaetoceros MAGs

To estimate the abundance of the Chaetoceros MAGs, we mapped the Tara Oceans metagenomic dataset onto their contigs using BWA-MEM (Li and Durbin 2009) with default parameters, applying an 80% identity filter and requiring at least 4x mean vertical coverage. Duplicate reads were removed and the recruited reads were stored as BAM files with SAMtools. The resulting mapping files were sorted using SAMtools ‘sort’ and the different size fractions were merged to increase the coverage, keeping the surface (SUR) and deep-chlorophyll maximum (DCM) depths separate in order to compare their corresponding local populations when possible. Read identity was extracted using a custom Perl script relying on SAMtools ‘view’, and plotted in RStudio. As a final filtration step, we extracted the names of the reads with at least 97% identity in RStudio and retrieved them from the indexed BAM files using the Picard toolkit (https://broadinstitute.github.io/picard/) with the option ‘FilterSamReads’ in ‘lenient’ mode. To ensure that the recruited reads belonged to our genomes of interest, we generated plots of the genomic coverage of the reads with Bedtools ‘genomecov’ (Quinlan and Hall 2010) and visually inspected them to detect bimodal trends. We excluded the reads from sampling stations that presented an insufficient or bimodal coverage distribution (see Supplementary Fig. S3, which shows an example of samples discarded based on read coverage), as well as those with a coverage breadth, estimated with SAMtools ‘depth’, below 80. These read filtration and sample inspection steps were critical as some non-specific read recruitment may occur (due for instance to the stability of the 18S rRNA gene and to hypervariable genomic regions). This resulted in a total of 20 Tara Oceans stations and/or depths investigated.
The relative abundance of the MAGs in each sample was estimated as the number of mapped reads, obtained with SAMtools ‘flagstat’, normalised by the total number of reads per sample (available in Supplementary Table S4 of Delmont et al. (2022)). We then computed the relative proportion of Chaetoceros reads per station and generated the corresponding maps with RStudio 4.0.1.
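The identity filter and the abundance normalisation described above can be sketched in a few lines of Python. This is a simplified illustration, not the custom script used in the study: it assumes that read identity is approximated from the aligned length and the edit distance (as reported, for example, in the NM tag of a BAM record), and all function names and toy values are hypothetical.

```python
def percent_identity(aligned_length, edit_distance):
    """Approximate identity as the share of aligned positions that match."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * (aligned_length - edit_distance) / aligned_length


def keep_high_identity(reads, min_identity=97.0):
    """Return the names of reads whose identity reaches `min_identity` (%)."""
    return [
        name
        for name, (length, nm) in reads.items()
        if percent_identity(length, nm) >= min_identity
    ]


def relative_abundance(mapped_reads, total_reads):
    """Number of mapped reads normalised by the total reads of the sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


# Toy reads: name -> (aligned length, edit distance); values are invented.
reads = {"read_1": (100, 1), "read_2": (100, 5), "read_3": (150, 3)}
kept = keep_high_identity(reads)
abundance = relative_abundance(mapped_reads=2500, total_reads=1_000_000)
print(kept, abundance)
```

In this toy case the read at 95% identity is discarded, while those at 99% and 98% are retained.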

Population genomics analyses

Genomic variants of the Chaetoceros populations associated with different stations and/or depths were called with BCFtools 1.13.25 (Li 2011) to generate a VCF file (mpileup of the files with at least 97% identity). Variant annotation of the VCF files was conducted using SnpEff (Cingolani et al. 2012), after converting the GFF files into GTF 2.2 format with AGAT (Dainat et al. 2022). The number, position and type of variants as well as their respective effects were plotted in RStudio.

To calculate the genetic distance between the MAG populations, the genetic variants were identified using SAMtools mpileup -B (multiple BAM files per MAG) and merged into one file. The PoPoolation2 tool (Kofler et al. 2011) was then applied to the MAGs present in at least two stations (ARC_116, ARC_217, ARC_232, SOC_37) to generate a synchronised (.sync) file from the merged mpileup with the following parameters: --fastq-type sanger --min-qual 20. The .fst files were subsequently computed with the parameters --suppress-noninformative --min-count 2 --min-coverage 4 --max-coverage 200 --min-covered-fraction 1 --window-size 1 --step-size 1 --pool-size 500. The F_ST metrics were computed from the allele frequencies (not the allele counts) using the equation in Hartl and Clark (2007). Allelic frequencies were computed with PoPoolation2 with the parameters --min-count 2 --min-coverage 4 --max-coverage 200. To ensure that these alleles belonged to our query genomes, the global population-wide F-statistic was computed for each MAG of interest and its distribution plotted and inspected for unimodality. Median pairwise F_ST values were considered as a proxy for genomic differentiation between the respective MAG populations. In addition, the LK statistics (Lewontin and Krakauer 1973) were computed and compared with the expected chi-squared distribution with df = n-1, with n being the number of populations. Under a neutral evolution model, the LK values of the loci are expected to follow this chi-squared distribution if the populations belong to a single species.
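To make the two statistics concrete, the sketch below computes a frequency-based F_ST for a biallelic locus and the corresponding LK values. The exact equation taken from Hartl and Clark (2007) is not reproduced in this section, so the heterozygosity-based form F_ST = (H_T − H_S) / H_T is an assumption, as is the usual Lewontin-Krakauer scaling LK = (n − 1) F_ST / mean(F_ST). The function names and allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """F_ST = (H_T - H_S) / H_T from per-population allele frequencies."""
    n = len(freqs)
    p_bar = sum(freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)                   # total expected heterozygosity
    if h_t == 0.0:
        return 0.0                                      # locus fixed everywhere
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / n   # mean within-population value
    return (h_t - h_s) / h_t


def lk_statistics(fst_values, n_pops):
    """LK = (n - 1) * F_ST / mean(F_ST), computed for each locus."""
    mean_fst = sum(fst_values) / len(fst_values)
    if mean_fst == 0.0:
        raise ValueError("mean F_ST is zero; LK is undefined")
    return [(n_pops - 1) * f / mean_fst for f in fst_values]


# Toy data: three loci observed in two populations (invented frequencies).
loci = [[0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
fst = [fst_biallelic(freqs) for freqs in loci]
lk = lk_statistics(fst, n_pops=2)
print(fst, lk)
```

By construction the LK values average to n − 1, the mean of a chi-squared distribution with n − 1 degrees of freedom, which is why their distribution can be compared against that expectation.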

We further investigated the global connectivity level between the MAG populations in the different Arctic Ocean regions. For this, the Arctic stations were divided based on their geographic location, as previously done by Royo-Llonch et al. (2021). Five regions were identified: Pacific-Arctic, Kara-Laptev, Atlantic-Arctic, Arctic archipelago and Davis-Baffin (see Supplementary Figure S12 for the polar view of the Arctic Tara Oceans stations). To compare the level of genomic differentiation between these regions, median between-region pairwise F_ST values from all the Chaetoceros populations were extracted and their respective means compared.

Estimation of variance partitioning

In a second step, we estimated the relative effects of abiotic parameters and geographic distance on the genomic differentiation of the MAGs. For this, a linear mixed model (LMM) from the R package ‘MM4LMM’ (Laporte and Mary-Huard 2021) was used, as previously applied by Laso-Jadart et al. (2021) to marine plankton. As input for the abiotic parameters, median values of different environmental parameters were extracted from the PANGAEA database (Ardyna et al. 2017; Guidi, Morin, et al. 2017; Guidi, Picheral, et al. 2017) for each sampling site, namely oxygen, salinity, temperature, and concentrations of ammonium, iron, nitrate, nitrite, phosphate and silicate. Euclidean distances were then computed between the stations for all these parameters as well as for the station coordinates as a proxy for geographic distance. Finally, median pairwise F_ST values together with the different abiotic parameters and distances were used as input for the LMM, allowing us to estimate the relative proportion of the genomic variance explained by each parameter as well as an unexplained proportion. Mantel tests from the ‘vegan’ R package (Oksanen et al. 2020) were applied to verify the results.
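The distance computation and the Mantel verification can be illustrated with a short, self-contained Python sketch. It is a simplified permutation test written for this example, not the ‘vegan’ implementation used in the study, and it does not cover the linear mixed model. The station coordinates, the environmental values and all function names are hypothetical.

```python
import math
import random


def euclidean_matrix(points):
    """Pairwise Euclidean distances between the rows of `points`."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix, order):
    """Flatten the upper triangle after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant distances")
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: correlation of two distance matrices and its p-value."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute the stations of the second matrix
        if pearson(x, upper_triangle(d2, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy stations: coordinates and an environmental profile that scales with them.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (5.0, 5.0)]
env = [(2.0 * x, 2.0 * y) for x, y in coords]
geo_dist = euclidean_matrix(coords)
env_dist = euclidean_matrix(env)
r_obs, p_value = mantel(geo_dist, env_dist)
print(r_obs, p_value)
```

Because the toy environmental profile is an exact rescaling of the coordinates, the two distance matrices are perfectly correlated. With real data, the same call would be applied to a matrix of pairwise F_ST values and to each environmental or geographic distance matrix.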

Identification of genes under selection

The ‘pcadapt’ R package v4.0.2 (Luu et al. 2017) was applied to detect selection among populations using the B-allele frequency (BAF) matrix, in ‘pool-seq’ mode with a minimal allele frequency of 0.05 within the populations, as done in Laso-Jadart et al. (2021). Based on the PCA results, two samples from one station (194 SUR and DCM) of the MAG ARC_116 exhibited very distinct variation patterns (Supplementary Figure S24A), leading to a very high number of outliers (12,954), and were therefore removed from the analysis (Supplementary Figure S24B) to avoid false positive inflation. We computed q-values using the R package ‘qvalue’ with false discovery rate (FDR) correction (Storey et al. 2022). For the three MAGs ARC_116, ARC_217 and SOC_37, loci with a q-value < 0.15 were considered to be under selection. The functions of the genes harbouring loci under selection were investigated with the PFAM domains generated as described above (see Comparative analysis of the amino acid composition and PFAMs of the MAGs). Homologues of the genes of interest were subsequently searched for by reverse genetics in P. tricornutum (Phatr3) using BLASTp. Manhattan plots of the contigs harbouring these loci were then built for each MAG, as well as bar plots representing their B-allele frequencies (BAF). To investigate the frequency of selected genes displaying loci under selection in other Chaetoceros species, homologues were searched for among Chaetoceros spp. transcriptomes (derived from MMETSP; see the phylogeny section in Materials and Methods and Supplementary Figures S20 and S22) by aligning target protein sequences on the 8 transcriptomes using tBLASTn. Output sequences were considered as homologue candidates when their scores were at least 200 and their e-values at most 1e-50. Alignments were conducted using the online version of MAFFT v7.463 (https://mafft.cbrc.jp/alignment/software/) (Katoh and Standley 2013). The sequences and alignments are available in the Supplementary Material online.
Predictions of RNA secondary structures were conducted using RNAfold 2.4.18 (http://rna.tbi.univie.ac.at/) (Gruber et al. 2008) and Linearfold (beta) (http://linearfold.org/) (Huang et al. 2019) web servers.
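The selection of homologue candidates from the tBLASTn output can be sketched as a simple filter. The example assumes that hits are available in the standard 12-column BLAST tabular format (outfmt 6), that the reported score corresponds to the bit score, and that the e-value threshold of 1e-50 acts as an upper bound. The query name, subject names and hit values are invented for illustration.

```python
import csv
import io

# Column names of the default BLAST tabular output (outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_candidates(handle, min_bitscore=200.0, max_evalue=1e-50):
    """Return subject IDs whose hits pass both the score and e-value thresholds."""
    reader = csv.DictReader(handle, fieldnames=OUTFMT6_FIELDS, delimiter="\t")
    kept = []
    for row in reader:
        if float(row["bitscore"]) >= min_bitscore and float(row["evalue"]) <= max_evalue:
            kept.append(row["sseqid"])
    return kept


# Toy hits (invented values) for a hypothetical query protein.
rows = [
    ["query_1", "transcript_1", "62.1", "410", "150", "3", "1", "405", "10", "1230", "2e-80", "265"],
    ["query_1", "transcript_2", "35.0", "200", "120", "6", "5", "200", "40", "640", "1e-30", "120"],
    ["query_1", "transcript_3", "48.7", "380", "180", "5", "1", "378", "22", "1160", "3e-45", "210"],
]
toy_output = "\n".join("\t".join(row) for row in rows) + "\n"

candidates = filter_candidates(io.StringIO(toy_output))
print(candidates)
```

Only the first hit passes both thresholds; the third has a sufficient score but an e-value above the cutoff, which shows why the two criteria are applied jointly.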

Author Contributions

C.N., A.M. and C.B. designed the study. E.P. retrieved and pre-processed the metagenomic data. C.N. performed the comparative genomics, phylogenetic analyses and genome-resolved metagenomics, interpreted the data, wrote the first manuscript draft and conceived the figures. C.N. performed the population genomic analyses with the support of A.M. C.N., A.M. and E.P. interpreted the data. All authors discussed the results and commented on the manuscript.

Acknowledgments

We would like to thank all colleagues from the Tara Oceans consortium as well as the Tara Ocean Foundation for their inspirational vision. This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Diatomic; grant agreement No. 835067). Additional funding is acknowledged from the French Government “Investissements d’Avenir” Programmes MEMO LIFE (Grant ANR-10-LABX-54), Université de Recherche Paris Sciences et Lettres (PSL) (Grant ANR-125311-IDEX-0001-02), France Génomique (ANR-10-INBS-09), and OCEANOMICS (Grant ANR-11-BTBR-0008). This article is contribution number *** of Tara Oceans.

References

Allen AE, Dupont CL, Oborník M, Horák A, Nunes-Nesi A, McCrow JP, Zheng H, Johnson DA, Hu H, Fernie AR, Bowler C. 2011. Evolution and metabolic significance of the urea cycle in photosynthetic diatoms. Nature. 473(7346):203–207. https://doi.org/10.1038/nature10074
Allen AE, LaRoche J, Maheswari U, Lommer M, Schauer N, Lopez PJ, Finazzi G, Fernie AR, Bowler C. 2008. Whole-cell response of the pennate diatom Phaeodactylum tricornutum to iron starvation. Proc Natl Acad Sci. 105(30):10438–10443. https://doi.org/10.1073/pnas.0711370105
Ardyna M, d’Ovidio F, Speich S, Leconte J, Chaffron S, Audic S, Garczarek L, Pesant S, Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Environmental context of all samples from the Tara Oceans Expedition (2009-2013), about mesoscale features at the sampling location. In: Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Registry of all samples from the Tara Oceans Expedition (2009-2013). PANGAEA [Internet]. [accessed 2022 Feb 22]. https://doi.org/10.1594/PANGAEA.875577
Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AE, Apt KE, Bechner M, et al. 2004. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science. 306(5693):79–86. https://doi.org/10.1126/science.1101156
Bailleul B, Rogato A, Martino A de, Coesel S, Cardol P, Bowler C, Falciatore A, Finazzi G. 2010. An atypical member of the light-harvesting complex stress-related protein family modulates diatom responses to light. Proc Natl Acad Sci. 107(42):18214–18219. https://doi.org/10.1073/pnas.1007703107
Baker LJ, Alegado RA, Kemp PF. 2016. Response of diatom-associated bacteria to host growth state, nutrient concentrations, and viral host infection in a model system. Environ Microbiol Rep. 8(5):917–927. https://doi.org/10.1111/1758-2229.12456
Barton AD, Dutkiewicz S, Flierl G, Bragg J, Follows MJ. 2010. Patterns of Diversity in Marine Phytoplankton. Science. 327(5972):1509–1511. https://doi.org/10.1126/science.1184961
Behrenfeld MJ, Milligan AJ. 2013. Photophysiological expressions of iron stress in phytoplankton. Annu Rev Mar Sci. 5:217–246. https://doi.org/10.1146/annurev-marine-121211-172356
Bertrand EM, Allen AE, Dupont CL, Norden-Krichmar TM, Bai J, Valas RE, Saito MA. 2012. Influence of cobalamin scarcity on diatom molecular physiology and identification of a cobalamin acquisition protein. Proc Natl Acad Sci. 109(26):E1762–E1771.
Bopp L, Resplandy L, Orr JC, Doney SC, Dunne JP, Gehlen M, Halloran P, Heinze C, Ilyina T, Séférian R, et al. 2013. Multiple stressors of ocean ecosystems in the 21st century: projections with CMIP5 models. Biogeosciences. 10(10):6225–6245. https://doi.org/10.5194/bg-10-6225-2013
Bourret J, Alizon S, Bravo IG. 2019. COUSIN (COdon Usage Similarity INdex): A Normalized Measure of Codon Usage Preferences. Genome Biol Evol. 11(12):3523–3528. https://doi.org/10.1093/gbe/evz262
Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, et al. 2017. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 35(8):725–731. https://doi.org/10.1038/nbt.3893
Bowler C, Allen AE, Badger JH, Grimwood J, Jabbari K, Kuo A, Maheswari U, Martens C, Maumus F, Otillar RP, et al. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature. 456(7219):239–244. https://doi.org/10.1038/nature07410
Boyd PW, Jickells T, Law CS, Blain S, Boyle EA, Buesseler KO, Coale KH, Cullen JJ, de Baar HJW, Follows M, et al. 2007. Mesoscale Iron Enrichment Experiments 1993-2005: Synthesis and Future Directions. Science. 315(5812):612–617. https://doi.org/10.1126/science.1131669
Bulankova P, Sekulić M, Jallet D, Nef C, Oosterhout C van, Delmont TO, Vercauteren I, Osuna-Cruz CM, Vancaester E, Mock T, et al. 2021. Mitotic recombination between homologous chromosomes drives genomic diversity in diatoms. Curr Biol [Internet]. [accessed 2021 Jun 9] 0(0). https://doi.org/10.1016/j.cub.2021.05.013
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25(15):1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Caputi L, Carradec Q, Eveillard D, Kirilovsky A, Pelletier E, Pierella Karlusich JJ, Rocha Jimenez Vieira F, Villar E, Chaffron S, Malviya S, et al. 2019. Community-Level Responses to Iron Availability in Open Ocean Plankton Ecosystems. Glob Biogeochem Cycles. 33(3):391–419. https://doi.org/10.1029/2018GB006022
Casteleyn G, Leliaert F, Backeljau T, Debeer A-E, Kotaki Y, Rhodes L, Lundholm N, Sabbe K, Vyverman W. 2010. Limits to gene flow in a cosmopolitan marine planktonic diatom. Proc Natl Acad Sci. 107(29):12952–12957. https://doi.org/10.1073/pnas.1001380107
Cavalier-Smith T. 2005. Economy, Speed and Size Matter: Evolutionary Forces Driving Nuclear Genome Miniaturization and Expansion. Ann Bot. 95(1):147–175. https://doi.org/10.1093/aob/mci010
Cermeño P, Falkowski PG. 2009. Controls on diatom biogeography in the ocean. Science. 325(5947):1539–1541. https://doi.org/10.1126/science.1174159
Chaffron S, Delage E, Budinich M, Vintache D, Henry N, Nef C, Ardyna M, Zayed AA, Junger PC, Galand PE, et al. 2021. Environmental vulnerability of the global ocean epipelagic plankton community interactome. Sci Adv. 7(35):eabg1921. https://doi.org/10.1126/sciadv.abg1921
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w ¹¹¹⁸; iso-2; iso-3. Fly (Austin). 6(2):80–92. https://doi.org/10.4161/fly.19695
Crenn K, Duffieux D, Jeanthon C. 2018. Bacterial Epibiotic Communities of Ubiquitous and Abundant Marine Diatoms Are Distinct in Short-and Long-Term Associations. Front Microbiol [Internet]. [accessed 2022 Feb 11] 9. https://doi.org/10.3389/fmicb.2018.02879
Cruz de Carvalho MH, Bowler C. 2020. Global identification of a marine diatom long noncoding natural antisense transcripts (NATs) and their response to phosphate fluctuations. Sci Rep. 10(1):14110. https://doi.org/10.1038/s41598-020-71002-0
Cruz de Carvalho MH, Sun H, Bowler C, Chua N. 2016. Noncoding and coding transcriptome responses of a marine diatom to phosphate fluctuations. New Phytol. 210(2):497–510. https://doi.org/10.1111/nph.13787
Dainat J, Hereñú D, LucileSol, Pascal-Git. 2022. NBISweden/AGAT: AGAT-v0.8.1 [Internet]. [place unknown]: Zenodo; [accessed 2022 Feb 11]. https://doi.org/10.5281/ZENODO.3552717
De Luca D, Kooistra WHCF, Sarno D, Gaonkar CC, Piredda R. 2019. Global distribution and diversity of Chaetoceros (Bacillariophyta, Mediophyceae): integration of classical and novel strategies. PeerJ. 7:e7410. https://doi.org/10.7717/peerj.7410
De Luca D, Sarno D, Piredda R, Kooistra WHCF. 2019. A multigene phylogeny to infer the evolutionary history of Chaetocerotaceae (Bacillariophyta). Mol Phylogenet Evol. 140:106575. https://doi.org/10.1016/j.ympev.2019.106575
Delmont TO, Gaia M, Hinsinger DD, Frémont P, Vanni C, Fernandez-Guerra A, Eren AM, Kourlaiev A, d’Agata L, Clayssen Q, et al. 2022. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genomics. 2(5):100123. https://doi.org/10.1016/j.xgen.2022.100123
Diniz-Filho JAF, Soares TN, Lima JS, Dobrovolski R, Landeiro VL, de Campos Telles MP, Rangel TF, Bini LM. 2013. Mantel test in population genetics. Genet Mol Biol. 36(4):475–485. https://doi.org/10.1590/S1415-47572013000400002
Dorrell RG, Gile G, McCallum G, Méheust R, Bapteste EP, Klinger CM, Brillet-Guéguen L, Freeman KD, Richter DJ, Bowler C. 2017. Chimeric origins of ochrophytes and haptophytes revealed through an ancient plastid proteome. Bhattacharya D, editor. eLife. 6:e23717. https://doi.org/10.7554/eLife.23717
Dorrell RG, Smith AG. 2011. Do Red and Green Make Brown?: Perspectives on Plastid Acquisitions within Chromalveolates. Eukaryot Cell [Internet]. [accessed 2022 Feb 11]. https://journals.asm.org/doi/abs/10.1128/EC.00326-10
Egge JK. 1998. Are diatoms poor competitors at low phosphate concentrations? J Mar Syst. 16(3):191–198. https://doi.org/10.1016/S0924-7963(97)00113-9
Ellis KA, Cohen NR, Moreno C, Marchetti A. 2017. Cobalamin-independent Methionine Synthase Distribution and Influence on Vitamin B12 Growth Requirements in Marine Diatoms. Protist. 168(1):32–47. https://doi.org/10.1016/j.protis.2016.10.007
Emms DM, Kelly S. 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16(1):157. https://doi.org/10.1186/s13059-015-0721-2
Falkowski PG, Katz ME, Knoll AH, Quigg A, Raven JA, Schofield O, Taylor FJR. 2004. The evolution of modern eukaryotic phytoplankton. Science. 305(5682):354–360. https://doi.org/10.1126/science.1095964
Field CB, Behrenfeld MJ, Randerson JT, Falkowski P. 1998. Primary Production of the Biosphere: Integrating Terrestrial and Oceanic Components. Science. 281(5374):237–240. https://doi.org/10.1126/science.281.5374.237
Finlay BJ. 2002. Global dispersal of free-living microbial eukaryote species. Science. 296(5570):1061–1063. https://doi.org/10.1126/science.1070710
Fortunato AE, Jaubert M, Enomoto G, Bouly J-P, Raniello R, Thaler M, Malviya S, Bernardes JS, Rappaport F, Gentili B, et al. 2016. Diatom Phytochromes Reveal the Existence of Far-Red-Light-Based Sensing in the Ocean. Plant Cell. 28(3):616–628. https://doi.org/10.1105/tpc.15.00928
Foster RA, Kuypers MMM, Vagner T, Paerl RW, Musat N, Zehr JP. 2011. Nitrogen fixation and transfer in open ocean diatom–cyanobacterial symbioses. ISME J. 5(9):1484–1493. https://doi.org/10.1038/ismej.2011.26
Gao X, Bowler C, Kazamia E. 2021. Iron metabolism strategies in diatoms. J Exp Bot. 72(6):2165–2180. https://doi.org/10.1093/jxb/eraa575
Gleich SJ, Plough LV, Glibert PM. 2020. Photosynthetic efficiency and nutrient physiology of the diatom Thalassiosira pseudonana at three growth temperatures. Mar Biol. 167(9):124. https://doi.org/10.1007/s00227-020-03741-7
Gómez F. 2007. On the consortium of the tintinnid Eutintinnus and the diatom Chaetoceros in the Pacific Ocean. Mar Biol. 151(5):1899–1906. https://doi.org/10.1007/s00227-007-0625-0
Gómez F. 2020. Symbioses of Ciliates (Ciliophora) and Diatoms (Bacillariophyceae): Taxonomy and Host–Symbiont Interactions. Oceans. 1(3):133–155. https://doi.org/10.3390/oceans1030010
Grigoriev IV, Hayes RD, Calhoun S, Kamel B, Wang A, Ahrendt S, Dusheyko S, Nikitin R, Mondo SJ, Salamov A, et al. 2020. PhycoCosm, a comparative algal genomics resource. Nucleic Acids Res. 49(D1):D1004–D1011. https://doi.org/10.1093/nar/gkaa898
Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL. 2008. The Vienna RNA Websuite. Nucleic Acids Res. 36(suppl_2):W70–W74. https://doi.org/10.1093/nar/gkn188
Guidi L, Chaffron S, Bittner L, Eveillard D, Larhlimi A, Roux S, Darzi Y, Audic S, Berline L, Brum JR, et al. 2016. Plankton networks driving carbon export in the oligotrophic ocean. Nature. 532(7600):465–470. https://doi.org/10.1038/nature16942
Guidi L, Morin P, Coppola L, Tremblay J-É, Pesant S, Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Environmental context of all samples from the Tara Oceans Expedition (2009-2013), about nutrients in the targeted environmental feature. In: Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Registry of all samples from the Tara Oceans Expedition (2009-2013). PANGAEA [Internet]. [accessed 2022 Feb 22]. https://doi.org/10.1594/PANGAEA.875575
Guidi L, Picheral M, Pesant S, Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Environmental context of all samples from the Tara Oceans Expedition (2009-2013), about sensor data in the targeted environmental feature. In: Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Registry of all samples from the Tara Oceans Expedition (2009-2013). PANGAEA, https://doi.org/10.1594/PANGAEA.875582 [Internet]. [accessed 2022 Feb 22]. https://doi.org/10.1594/PANGAEA.875576
Guillot G, Rousset F. 2013. Dismantling the Mantel tests. Methods Ecol Evol. 4(4):336–344. https://doi.org/10.1111/2041-210x.12018
Haas CE, Rodionov DA, Kropat J, Malasarn D, Merchant SS, de Crécy-Lagard V. 2009. A subset of the diverse COG0523 family of putative metal chaperones is linked to zinc homeostasis in all kingdoms of life. BMC Genomics. 10:470. https://doi.org/10.1186/1471-2164-10-470
Hanschen ER, Starkenburg SR. 2020. The state of algal genome quality and diversity. Algal Res. 50:101968. https://doi.org/10.1016/j.algal.2020.101968
Härnström K, Ellegaard M, Andersen TJ, Godhe A. 2011. Hundred years of genetic structure in a sediment revived diatom population. Proc Natl Acad Sci. 108(10):4252–4257. https://doi.org/10.1073/pnas.1013528108
Hartl DL, Clark AG. 2007. Principles of Population Genetics. Écoscience. 14(4):544–545.
Hays GC, Richardson AJ, Robinson C. 2005. Climate change and marine plankton. Trends Ecol Evol. 20(6):337–344. https://doi.org/10.1016/j.tree.2005.03.004
Helliwell KE, Collins S, Kazamia E, Wheeler GL, Smith AG. 2015. Fundamental shift in vitamin B12 eco-physiology of a model alga demonstrated by experimental evolution. ISME J. 9:1446–1455.
Hoffmann LJ, Peeken I, Lochte K. 2007. Effects of iron on the elemental stoichiometry during EIFEX and in the diatoms Fragilariopsis kerguelensis and Chaetoceros dichaeta. Biogeosciences. 4(4):569–579. https://doi.org/10.5194/bg-4-569-2007
OpenUrl
↵
Holm-Hansen O. 1969. Algae: amounts of DNA and organic carbon in single cells. Science. 163(3862):87–88. https://doi.org/10.1126/science.163.3862.87
OpenUrl Abstract/FREE Full Text
↵
Hongo Y, Kimura K, Takaki Y, Yoshida Y, Baba S, Kobayashi G, Nagasaki K, Hano T, Tomaru Y. 2021. The genome of the diatom Chaetoceros tenuissimus carries an ancient integrated fragment of an extant virus. Sci Rep. 11(1):22877. https://doi.org/10.1038/s41598-021-00565-3
OpenUrl
↵
Hsieh SI, Castruita M, Malasarn D, Urzica E, Erde J, Page MD, Yamasaki H, Casero D, Pellegrini M, Merchant SS, Loo JA. 2013. The proteome of copper, iron, zinc, and manganese micronutrient deficiency in Chlamydomonas reinhardtii. Mol Cell Proteomics MCP. 12(1):65–86. https://doi.org/10.1074/mcp.M112.021840
OpenUrl
↵
Huang L, Zhang H, Deng D, Zhao K, Liu K, Hendrix DA, Mathews DH. 2019. LinearFold: linear-time approximate RNA folding by 5’-to-3’ dynamic programming and beam search. Bioinformatics. 35(14):i295–i304. https://doi.org/10.1093/bioinformatics/btz375
OpenUrl
↵
Iverson V, Morris RM, Frazar CD, Berthiaume CT, Morales RL, Armbrust EV. 2012. Untangling Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota. Science. 335(6068):587–590. https://doi.org/10.1126/science.1212665
OpenUrl Abstract/FREE Full Text
↵
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 9(1):5114. https://doi.org/10.1038/s41467-018-07641-9
OpenUrl CrossRef PubMed
↵
Johnson MK. 1998. Iron—sulfur proteins: new roles for old clusters. Curr Opin Chem Biol. 2(2):173–181. https://doi.org/10.1016/S1367-5931(98)80058-6
OpenUrl CrossRef PubMed Web of Science
↵
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinforma Oxf Engl. 30(9):1236–1240. https://doi.org/10.1093/bioinformatics/btu031
OpenUrl
↵
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30(14):3059–3066. https://doi.org/10.1093/nar/gkf436
OpenUrl CrossRef PubMed Web of Science
↵
Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol. 30(4):772–780. https://doi.org/10.1093/molbev/mst010
OpenUrl CrossRef PubMed Web of Science
↵
Kazamia E, Sutak R, Paz-Yepes J, Dorrell RG, Vieira FRJ, Mach J, Morrissey J, Leon S, Lam F, Pelletier E, et al. 2018. Endocytosis-mediated siderophore uptake as a strategy for Fe acquisition in diatoms. Sci Adv. 4(5):eaar4536. https://doi.org/10.1126/sciadv.aar4536
OpenUrl FREE Full Text
↵
Kemp AES, Villareal TA. 2018. The case of the diatoms and the muddled mandalas: Time to recognize diatom adaptations to stratified waters. Prog Oceanogr. 167:138–149. https://doi.org/10.1016/j.pocean.2018.08.002
OpenUrl CrossRef
↵
Kim E, Harrison JW, Sudek S, Jones MDM, Wilcox HM, Richards TA, Worden AZ, Archibald JM. 2011. Newly identified and diverse plastid-bearing branch on the eukaryotic tree of life. Proc Natl Acad Sci U S A. 108(4):1496–1500. https://doi.org/10.1073/pnas.1013337108
OpenUrl Abstract/FREE Full Text
↵
Kimura K, Tomaru Y. 2014. Coculture with marine bacteria confers resistance to complete viral lysis of diatom cultures. Aquat Microb Ecol. 73(1):69–80. https://doi.org/10.3354/ame01705
OpenUrl
↵
Koester JA, Swalwell JE, von Dassow P, Armbrust EV. 2010. Genome size differentiates co-occurring populations of the planktonic diatom Ditylum brightwellii(Bacillariophyta). BMC Evol Biol. 10(1):1. https://doi.org/10.1186/1471-2148-10-1
OpenUrl CrossRef PubMed
↵
Kofler R, Pandey RV, Schlotterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics. 27(24):3435–3436. https://doi.org/10.1093/bioinformatics/btr589
OpenUrl CrossRef PubMed Web of Science
↵
Kotabova E, Malych R, Pierella Karlusich JJ, Kazamia E, Eichner M, Mach J, Lesuisse E, Bowler C, Prášil O, Sutak R. 2021. Complex Response of the Chlorarachniophyte Bigelowiella natans to Iron Availability. mSystems. 6(1):e00738-20. https://doi.org/10.1128/mSystems.00738-20
OpenUrl Abstract/FREE Full Text
↵
Kröger N, Deutzmann R, Bergsdorf C, Sumper M. 2000. Species-specific polyamines from diatoms control silica morphology. Proc Natl Acad Sci. 97(26):14133–14138. https://doi.org/10.1073/pnas.260496497
OpenUrl Abstract/FREE Full Text
↵
Kröger N, Sumper M. 1998. Diatom Cell Wall Proteins and the Cell Biology of Silica Biomineralization. Protist. 149(3):213–219. https://doi.org/10.1016/S1434-4610(98)70029-X
OpenUrl CrossRef PubMed
↵
La Roche J, Boyd PW, McKay RML, Geider RJ. 1996. Flavodoxin as an in situ marker for iron stress in phytoplankton. Nature. 382(6594):802–805. https://doi.org/10.1038/382802a0
OpenUrl CrossRef Web of Science
↵
Lampe RH, Mann EL, Cohen NR, Till CP, Thamatrakoln K, Brzezinski MA, Bruland KW, Twining BS, Marchetti A. 2018. Different iron storage strategies among bloom-forming diatoms. Proc Natl Acad Sci. 115(52):E12275–E12284. https://doi.org/10.1073/pnas.1805243115
OpenUrl Abstract/FREE Full Text
↵
Laporte F, Mary-Huard T. 2021. MM4LMM: Inference of Linear Mixed Models Through MM Algorithm [Internet]. [place unknown]; [accessed 2022 Feb 21]. https://CRAN.R-project.org/package=MM4LMM
↵
Laso-Jadart R, O’Malley M, Sykulski AM, Ambroise C, Madoui M-A. 2021. How marine currents and environment shape plankton genomic differentiation: a mosaic view from Tara Oceans metagenomic data [Internet]. [place unknown]; [accessed 2022 Feb 16]. https://doi.org/10.1101/2021.04.29.441957
↵
Leblanc K, Quéguiner B, Diaz F, Cornet V, Michel-Rodriguez M, Durrieu de Madron X, Bowler C, Malviya S, Thyssen M, Grégori G, et al. 2018. Nanoplanktonic diatoms are globally overlooked but play a role in spring blooms and carbon export. Nat Commun. 9(1):953. https://doi.org/10.1038/s41467-018-03376-9
OpenUrl
↵
Leconte J, Benites LF, Vannier T, Wincker P, Piganeau G, Jaillon O. 2020. Genome Resolved Biogeography of Mamiellales. Genes. 11(1):66. https://doi.org/10.3390/genes11010066
OpenUrl
↵
Leconte J, Timsit Y, Delmont TO, Lescot M, Piganeau G, Wincker P, Jaillon O. 2021. Equatorial to Polar genomic variability of the microalgae Bathycoccus prasinos [Internet]. [place unknown]: Genomics; [accessed 2022 Apr 19]. https://doi.org/10.1101/2021.07.13.452163
↵
Letunic I, Bork P. 2021. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49(W1):W293–W296. https://doi.org/10.1093/nar/gkab301
OpenUrl CrossRef
↵
Levitan O, Dinamarca J, Zelzion E, Lun DS, Guerra LT, Kim MK, Kim J, Van Mooy BAS, Bhattacharya D, Falkowski PG. 2015. Remodeling of intermediate metabolism in the diatom Phaeodactylum tricornutum under nitrogen stress. Proc Natl Acad Sci. 112(2):412–417. https://doi.org/10.1073/pnas.1419818112
OpenUrl Abstract/FREE Full Text
↵
Lewin JC. 1961. The dissolution of silica from diatom walls. Geochim Cosmochim Acta. 21(3):182–198. https://doi.org/10.1016/S0016-7037(61)80054-9
OpenUrl CrossRef Web of Science
↵
Lewontin RC, Krakauer J. 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 74(1):175–195. https://doi.org/10.1093/genetics/74.1.175
OpenUrl Abstract/FREE Full Text
↵
Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 27(21):2987–2993. https://doi.org/10.1093/bioinformatics/btr509
OpenUrl CrossRef PubMed Web of Science
↵
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25(14):1754–1760. https://doi.org/10.1093/bioinformatics/btp324
OpenUrl CrossRef PubMed Web of Science
↵
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25(16):2078–2079. https://doi.org/10.1093/bioinformatics/btp352
OpenUrl CrossRef PubMed Web of Science
↵
Liu Z, Campbell V, Heidelberg KB, Caron DA. 2016. Gene expression characterizes different nutritional strategies among three mixotrophic protists.Olson J, editor. FEMS Microbiol Ecol. 92(7):fiw106. https://doi.org/10.1093/femsec/fiw106
OpenUrl CrossRef PubMed
↵
Llopis Monferrer N, Leynaert A, Tréguer P, Gutiérrez-Rodríguez A, Moriceau B, Gallinari M, Latasa M, L’Helguen S, Maguer J-F, Safi K, et al. 2021. Role of small Rhizaria and diatoms in the pelagic silica production of the Southern Ocean. Limnol Oceanogr. 66(6):2187–2202. https://doi.org/10.1002/lno.11743
OpenUrl
↵
Luu K, Bazin E, Blum MGB. 2017. pcadapt: an R package to perform genome scans for selection based on principal component analysis. Mol Ecol Resour. 17(1):67–77. https://doi.org/10.1111/1755-0998.12592
OpenUrl CrossRef
↵
Malviya S, Scalco E, Audic S, Vincent F, Veluchamy A, Poulain J, Wincker P, Iudicone D, de Vargas C, Bittner L, et al. 2016. Insights into global diatom distribution and diversity in the world’s ocean. Proc Natl Acad Sci. 113(11):E1516–E1525. https://doi.org/10.1073/pnas.1509523113
OpenUrl Abstract/FREE Full Text
↵
Mangot J-F, Logares R, Sánchez P, Latorre F, Seeleuthner Y, Mondy S, Sieracki ME, Jaillon O, Wincker P, Vargas C de, Massana R. 2017. Accessing the genomic information of unculturable oceanic picoeukaryotes by combining multiple single cells. Sci Rep. 7(1):41498. https://doi.org/10.1038/srep41498
OpenUrl
↵
Marchetti A, Moreno CM, Cohen NR, Oleinikov I, deLong K, Twining BS, Armbrust EV, Lampe RH. 2017. Development of a molecular-based index for assessing iron status in bloom-forming pennate diatoms. J Phycol. 53(4):820–832. https://doi.org/10.1111/jpy.12539
OpenUrl
↵
Martin-Jezequel V, Hildebrand M, Brzezinski MA. 2000. Silicon metabolism in diatoms: implications for growth. J Phycol. 36(5):821–840. https://doi.org/10.1046/j.1529-8817.2000.00019.x
OpenUrl CrossRef Web of Science
↵
Massana R, del Campo J, Sieracki ME, Audic S, Logares R. 2014. Exploring the uncultured microeukaryote majority in the oceans: reevaluation of ribogroups within stramenopiles. ISME J. 8(4):854–866. https://doi.org/10.1038/ismej.2013.204
OpenUrl CrossRef PubMed Web of Science
↵
Maumus F, Allen AE, Mhiri C, Hu H, Jabbari K, Vardi A, Grandbastien M-A, Bowler C. 2009. Potential impact of stress activated retrotransposons on genome evolution in a marine diatom. BMC Genomics. 10:624. https://doi.org/10.1186/1471-2164-10-624
OpenUrl CrossRef PubMed
↵
McQuaid JB, Kustka AB, Oborník M, Horák A, McCrow JP, Karas BJ, Zheng H, Kindeberg T, Andersson AJ, Barbeau KA, Allen AE. 2018. Carbonate-sensitive phytotransferrin controls high-affinity iron uptake in diatoms. Nature. 555(7697):534–537. https://doi.org/10.1038/nature25982
OpenUrl PubMed
↵
Moore JK, Doney SC, Lindsay K. 2004. Upper ocean ecosystem dynamics and iron cycling in a global three-dimensional model. Glob Biogeochem Cycles [Internet]. [accessed 2022 Apr 26] 18(4). https://doi.org/10.1029/2004GB002220
↵
Morrissey J, Sutak R, Paz-Yepes J, Tanaka A, Moustafa A, Veluchamy A, Thomas Y, Botebol H, Bouget F-Y, McQuaid JB, et al. 2015. A novel protein, ubiquitous in marine phytoplankton, concentrates iron at the cell surface and facilitates uptake. Curr Biol CB. 25(3):364–371. https://doi.org/10.1016/j.cub.2014.12.004
OpenUrl
↵
Moustafa A, Beszteri B, Maier UG, Bowler C, Valentin K, Bhattacharya D. 2009. Genomic Footprints of a Cryptic Plastid Endosymbiosis in Diatoms. Science. 324(5935):1724–1726. https://doi.org/10.1126/science.1172983
OpenUrl Abstract/FREE Full Text
↵
Nelson DM, Tréguer P, Brzezinski MA, Leynaert A, Quéguiner B. 1995. Production and dissolution of biogenic silica in the ocean: Revised global estimates, comparison with regional data and relationship to biogenic sedimentation. Glob Biogeochem Cycles. 9(3):359–372. https://doi.org/10.1029/95GB01070
OpenUrl CrossRef GeoRef Web of Science
↵
Nelson DR, Hazzouri KM, Lauersen KJ, Jaiswal A, Chaiboonchoe A, Mystikou A, Fu W, Daakour S, Dohai B, Alzahmi A, et al. 2021. Large-scale genome sequencing reveals the driving forces of viruses in microalgal evolution. Cell Host Microbe. 29(2):250–266.e8. https://doi.org/10.1016/j.chom.2020.12.005
OpenUrl
↵
Nelson WC, Tully BJ, Mobberley JM. 2020. Biases in genome reconstruction from metagenomic data. PeerJ. 8:e10119. https://doi.org/10.7717/peerj.10119
OpenUrl CrossRef
↵
Norris RD. 2000. Pelagic Species Diversity, Biogeography, and Evolution. Paleobiology. 26(4):236–258.
OpenUrl CrossRef GeoRef Web of Science
↵
Not F, Valentin K, Romari K, Lovejoy C, Massana R, Toebe K, Vaulot D, Medlin L. 2007. Picobiliphytes: A Marine Picoplanktonic Algal Group with Unknown Affinities to Other Eukaryotes. Science. https://doi.org/10.1126/SCIENCE.1136264
↵
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O’Hara RB, Simpson GL, Solymos P, et al. 2020. vegan: Community Ecology Package [Internet]. [place unknown]; [accessed 2022 Feb 21]. https://CRAN.R-project.org/package=vegan
↵
Pierella Karlusich JJ, Bowler C, Biswas H. 2021. Carbon Dioxide Concentration Mechanisms in Natural Populations of Marine Diatoms: Insights From Tara Oceans. Front Plant Sci [Internet]. [accessed 2022 Feb 11] 12. https://www.frontiersin.org/article/10.3389/fpls.2021.657821
↵
Pierella Karlusich JJ, Pelletier E, Lombard F, Carsique M, Dvorak E, Colin S, Picheral M, Cornejo-Castillo FM, Acinas SG, Pepperkok R, et al. 2021. Global distribution patterns of marine nitrogen-fixers by imaging and molecular methods. Nat Commun. 12(1):4160. https://doi.org/10.1038/s41467-021-24299-y
OpenUrl
↵
Piganeau G, Eyre-Walker A, Grimsley N, Moreau H. 2011. How and Why DNA Barcodes Underestimate the Diversity of Microbial Eukaryotes.Lopez-Garcia P, editor. PLoS ONE. 6(2):e16342. https://doi.org/10.1371/journal.pone.0016342
OpenUrl CrossRef PubMed
↵
Poulsen N, Sumper M, Kröger N. 2003. Biosilica formation in diatoms: Characterization of native silaffin-2 and its role in silica morphogenesis. Proc Natl Acad Sci. 100(21):12075–12080. https://doi.org/10.1073/pnas.2035131100
OpenUrl Abstract/FREE Full Text
↵
Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26(6):841–842. https://doi.org/10.1093/bioinformatics/btq033
OpenUrl CrossRef PubMed Web of Science
↵
Rastogi A, Maheswari U, Dorrell RG, Vieira FRJ, Maumus F, Kustka A, McCarthy J, Allen AE, Kersey P, Bowler C, Tirichine L. 2018. Integrative analysis of large scale transcriptome data draws a comprehensive landscape of Phaeodactylum tricornutum genome and evolutionary origin of diatoms. Sci Rep. 8(1):4834. https://doi.org/10.1038/s41598-018-23106-x
OpenUrl CrossRef
↵
Rastogi A, Vieira FRJ, Deton-Cabanillas A-F, Veluchamy A, Cantrel C, Wang G, Vanormelingen P, Bowler C, Piganeau G, Hu H, Tirichine L. 2020. A genomics approach reveals the global genetic polymorphism, structure, and functional diversity of ten accessions of the marine model diatom Phaeodactylum tricornutum. ISME J. 14(2):347–363. https://doi.org/10.1038/s41396-019-0528-3
OpenUrl
↵
Rodriguez-R LM, Konstantinidis KT. 2014. Bypassing Cultivation To Identify Bacterial Species: Culture-independent genomic approaches identify credibly distinct clusters, avoid cultivation bias, and provide true insights into microbial species. Microbe Mag. 9(3):111–118. https://doi.org/10.1128/microbe.9.111.1
OpenUrl CrossRef
↵
Royo-Llonch M, Sánchez P, Ruiz-González C, Salazar G, Pedrós-Alió C, Sebastián M, Labadie K, Paoli L, Ibarbalz F, Zinger L, et al. 2021. Compendium of 530 metagenome-assembled bacterial and archaeal genomes from the polar Arctic Ocean. Nat Microbiol. 6:1–14. https://doi.org/10.1038/s41564-021-00979-9
OpenUrl
↵
Rynearson TA, Lin EO, Armbrust EV. 2009. Metapopulation Structure in the Planktonic Diatom Ditylum brightwellii (Bacillariophyceae). Protist. 160(1):111–121. https://doi.org/10.1016/j.protis.2008.10.003
OpenUrl CrossRef PubMed Web of Science
↵
Sauna ZE, Kimchi-Sarfaty C, Ambudkar SV, Gottesman MM. 2007. The sounds of silence: synonymous mutations affect function. Pharmacogenomics. 8(6):527–532. https://doi.org/10.2217/14622416.8.6.527
OpenUrl CrossRef PubMed Web of Science
↵
Schiffrine N, Tremblay J-É, Babin M. 2020. Growth and Elemental Stoichiometry of the Ecologically-Relevant Arctic Diatom Chaetoceros gelidus: A Mix of Polar and Temperate. Front Mar Sci [Internet]. [accessed 2022 Feb 11] 6. https://www.frontiersin.org/article/10.3389/fmars.2019.00790
↵
Scoccianti V, Penna A, Penna N, Magnani M. 1995. Effect of heat stress on polyamine content and protein pattern in Skeletonema costatum. Mar Biol. 121(3):549–554. https://doi.org/10.1007/BF00349465
OpenUrl CrossRef Web of Science
↵
Seenivasan R, Sausen N, Medlin LK, Melkonian M. 2013. Picomonas judraskeda Gen. Et Sp. Nov.: The First Identified Member of the Picozoa Phylum Nov., a Widespread Group of Picoeukaryotes, Formerly Known as ‘Picobiliphytes.’ PLOS ONE. 8(3):e59565. https://doi.org/10.1371/journal.pone.0059565
OpenUrl
↵
Shuter BJ, Thomas JE, Taylor WD, Zimmerman AM. 1983. Phenotypic Correlates of Genomic DNA Content in Unicellular Eukaryotes and Other Cells. Am Nat. 122(1):26–44.
OpenUrl CrossRef Web of Science
↵
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19):3210–3212. https://doi.org/10.1093/bioinformatics/btv351
OpenUrl CrossRef PubMed
↵
Smetacek V. 1999. Diatoms and the ocean carbon cycle. Protist. 150(1):25–32. https://doi.org/10.1016/S1434-4610(99)70006-4
OpenUrl CrossRef PubMed Web of Science
↵
Smodlaka Tanković M, Baričević A, Ivančić I, Kužat N, Medić N, Pustijanac E, Novak T, Gašparović B, Marić Pfannkuchen D, Pfannkuchen M. 2018. Insights into the life strategy of the common marine diatom Chaetoceros peruvianus Brightwell.Ianora A, editor. PLOS ONE. 13(9):e0203634. https://doi.org/10.1371/journal.pone.0203634
OpenUrl
↵
Sommeria-Klein G, Watteaux R, Ibarbalz FM, Pierella Karlusich JJ, Iudicone D, Bowler C, Morlon H. 2021. Global drivers of eukaryotic plankton biogeography in the sunlit ocean. Science. 374(6567):594–599. https://doi.org/10.1126/science.abb3717
OpenUrl CrossRef
↵
Stamatakis A, Ludwig T, Meier H. 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 21(4):456–463. https://doi.org/10.1093/bioinformatics/bti191
OpenUrl CrossRef PubMed Web of Science
↵
Storey JD, Bass AJ, Dabney A, Robinson D, Warnes G. 2022. qvalue: Q-value estimation for false discovery rate control [Internet]. [place unknown]: Bioconductor version: Release (3.14); [accessed 2022 Feb 21]. https://doi.org/10.18129/B9.bioc.qvalue
↵
Timmermans KR, van der Wagt B, de Baar HJW. 2004. Growth rates, half-saturation constants, and silicate, nitrate, and phosphate depletion in relation to iron availability of four large, open-ocean diatoms from the Southern Ocean. Limnol Oceanogr. 49(6):2141–2151. https://doi.org/10.4319/lo.2004.49.6.2141
OpenUrl
↵
Tréguer P, Bowler C, Moriceau B, Dutkiewicz S, Gehlen M, Aumont O, Bittner L, Dugdale R, Finkel Z, Iudicone D, et al. 2018. Influence of diatom diversity on the ocean biological carbon pump. Nat Geosci. 11(1):27–37. https://doi.org/10.1038/s41561-017-0028-x
OpenUrl CrossRef
↵
Tréguer PJ, de La Rocha CL. 2013. The World Ocean Silica Cycle. Annu Rev Mar Sci. 5(1):477–501. https://doi.org/10.1146/annurev-marine-121211-172346
OpenUrl
↵
de Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, Lara E, Berney C, Le Bescot N, Probert I, et al. 2015. Eukaryotic plankton diversity in the sunlit ocean. Science. 348(6237):1261605. https://doi.org/10.1126/science.1261605
OpenUrl Abstract/FREE Full Text
↵
Veldhuis MJW, Cucci TL, Sieracki ME. 1997. Cellular Dna Content of Marine Phytoplankton Using Two New Fluorochromes: Taxonomic and Ecological Implications1. J Phycol. 33(3):527–541. https://doi.org/10.1111/j.0022-3646.1997.00527.x
OpenUrl CrossRef Web of Science
↵
Veluchamy A, Rastogi A, Lin X, Lombard B, Murik O, Thomas Y, Dingli F, Rivarola M, Ott S, Liu X, et al. 2015. An integrative analysis of post-translational histone modifications in the marine diatom Phaeodactylum tricornutum. Genome Biol. 16(1):102. https://doi.org/10.1186/s13059-015-0671-8
OpenUrl CrossRef PubMed
↵
1. Petersen J
Vincent F, Bowler C. 2020. Diatoms Are Selective Segregators in Global Ocean Planktonic Communities. Petersen J, editor. mSystems. 5(1):e00444-19, /msystems/5/1/msys.00444-19.atom. https://doi.org/10.1128/mSystems.00444-19
OpenUrl Abstract/FREE Full Text
↵
Von Dassow P, Petersen TW, Chepurnov VA, Virginia Armbrust E. 2008. Inter-and Intraspecific Relationships Between Nuclear Dna Content and Cell Size in Selected Members of the Centric Diatom Genus Thalassiosira (bacillariophyceae)1. J Phycol. 44(2):335–349. https://doi.org/10.1111/j.1529-8817.2008.00476.x
OpenUrl CrossRef PubMed Web of Science
↵
Whittaker KA, Rynearson TA. 2017. Evidence for environmental and ecological selection in a microbe with no geographic limits to gene flow. Proc Natl Acad Sci. 114(10):2651–2656. https://doi.org/10.1073/pnas.1612346114
OpenUrl Abstract/FREE Full Text
↵
Williams RB. 1964. Division Rates of Salt Marsh Diatoms in Relation to Salinity and Cell Size. Ecology. 45(4):877–880. https://doi.org/10.2307/1934940
OpenUrl CrossRef
↵
Wright S. 1965. The Interpretation of Population Structure by F-Statistics with Special Regard to Systems of Mating. Evolution. 19(3):395–420. https://doi.org/10.2307/2406450
OpenUrl CrossRef Web of Science
↵
Wright S. 1984. Evolution and the Genetics of Populations, Volume 4: Variability Within and Among Natural Populations [Internet]. Chicago, IL: University of Chicago Press; [accessed 2022 Feb 16]. https://press.uchicago.edu/ucp/books/book/chicago/E/bo3642015.html
↵
Xiao N, Cao D-S, Zhu M-F, Xu Q-S. 2015. protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequence. Bioinformatics. 31(11):1857–1859.
OpenUrl CrossRef PubMed
↵
Yool A, Tyrrell T. 2003. Role of diatoms in regulating the ocean’s silicon cycle. Glob Biogeochem Cycles [Internet]. [accessed 2022 Apr 26] 17(4). https://doi.org/10.1029/2002GB002018


Posted May 20, 2022.

[1] ↵
Allen AE, Dupont CL, Oborník M, Horák A, Nunes-Nesi A, McCrow JP, Zheng H, Johnson DA, Hu H, Fernie AR, Bowler C. 2011. Evolution and metabolic significance of the urea cycle in photosynthetic diatoms. Nature. 473(7346):203–207. https://doi.org/10.1038/nature10074
OpenUrl CrossRef PubMed Web of Science

[2] ↵
Allen AE, LaRoche J, Maheswari U, Lommer M, Schauer N, Lopez PJ, Finazzi G, Fernie AR, Bowler C. 2008. Whole-cell response of the pennate diatom Phaeodactylum tricornutum to iron starvation. Proc Natl Acad Sci. 105(30):10438–10443. https://doi.org/10.1073/pnas.0711370105
OpenUrl Abstract/FREE Full Text

[3] ↵
Ardyna M, d’Ovidio F, Speich S, Leconte J, Chaffron S, Audic S, Garczarek L, Pesant S, Tara Oceans Consortium C, Tara Oceans Expedition P. 2017. Environmental context of all samples from the Tara Oceans Expedition (2009-2013), about mesoscale features at the sampling location. Tara Oceans Consort Coord Tara Oceans Exped Particip 2017 Regist Samples Tara Oceans Exped 2009-2013 PANGAEA [Internet]. [accessed 2022 Feb 22]. https://doi.org/10.1594/PANGAEA.875577

[4] ↵
Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AE, Apt KE, Bechner M, et al. 2004. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science. 306(5693):79–86. https://doi.org/10.1126/science.1101156
OpenUrl Abstract/FREE Full Text

[5] ↵
Bailleul B, Rogato A, Martino A de, Coesel S, Cardol P, Bowler C, Falciatore A, Finazzi G. 2010. An atypical member of the light-harvesting complex stress-related protein family modulates diatom responses to light. Proc Natl Acad Sci. 107(42):18214–18219. https://doi.org/10.1073/pnas.1007703107
OpenUrl Abstract/FREE Full Text

[6] ↵
Baker LJ, Alegado RA, Kemp PF. 2016. Response of diatom-associated bacteria to host growth state, nutrient concentrations, and viral host infection in a model system. Environ Microbiol Rep. 8(5):917–927. https://doi.org/10.1111/1758-2229.12456
OpenUrl

[7] ↵
Barton AD, Dutkiewicz S, Flierl G, Bragg J, Follows MJ. 2010. Patterns of Diversity in Marine Phytoplankton. Science. 327(5972):1509–1511. https://doi.org/10.1126/science.1184961
OpenUrl Abstract/FREE Full Text

[8] ↵
Behrenfeld MJ, Milligan AJ. 2013. Photophysiological expressions of iron stress in phytoplankton. Annu Rev Mar Sci. 5:217–246. https://doi.org/10.1146/annurev-marine-121211-172356
OpenUrl

[9] ↵
Bertrand EM, Allen AE, Dupont CL, Norden-Krichmar TM, Bai J, Valas RE, Saito MA. 2012. Influence of cobalamin scarcity on diatom molecular physiology and identification of a cobalamin acquisition protein. Proc Natl Acad Sci.:1762–1771.

[10] ↵
Bopp L, Resplandy L, Orr JC, Doney SC, Dunne JP, Gehlen M, Halloran P, Heinze C, Ilyina T, Séférian R, et al. 2013. Multiple stressors of ocean ecosystems in the 21st century: projections with CMIP5 models. Biogeosciences. 10(10):6225–6245. https://doi.org/10.5194/bg-10-6225-2013
OpenUrl

[11] ↵
Bourret J, Alizon S, Bravo IG. 2019. COUSIN (COdon Usage Similarity INdex): A Normalized Measure of Codon Usage Preferences. Genome Biol Evol. 11(12):3523–3528. https://doi.org/10.1093/gbe/evz262
OpenUrl

[12] ↵
Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, et al. 2017. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 35(8):725–731. https://doi.org/10.1038/nbt.3893
OpenUrl CrossRef PubMed

[13] ↵
Bowler C, Allen AE, Badger JH, Grimwood J, Jabbari K, Kuo A, Maheswari U, Martens C, Maumus F, Otillar RP, et al. 2008. The Phaeodactylumgenome reveals the evolutionary history of diatom genomes. Nature. 456(7219):239–244. https://doi.org/10.1038/nature07410
OpenUrl CrossRef PubMed Web of Science

[14] ↵
Boyd PW, Jickells T, Law CS, Blain S, Boyle EA, Buesseler KO, Coale KH, Cullen JJ, de Baar HJW, Follows M, et al. 2007. Mesoscale Iron Enrichment Experiments 1993-2005: Synthesis and Future Directions. Science. 315(5812):612–617. https://doi.org/10.1126/science.1131669
OpenUrl Abstract/FREE Full Text

[15] ↵
Bulankova P, Sekulić M, Jallet D, Nef C, Oosterhout C van, Delmont TO, Vercauteren I, Osuna-Cruz CM, Vancaester E, Mock T, et al. 2021. Mitotic recombination between homologous chromosomes drives genomic diversity in diatoms. Curr Biol [Internet]. [accessed 2021 Jun 9] 0(0). https://doi.org/10.1016/j.cub.2021.05.013

[16] ↵
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25(15):1972–1973. https://doi.org/10.1093/bioinformatics/btp348
OpenUrl CrossRef PubMed Web of Science

[17] ↵
Caputi L, Carradec Q, Eveillard D, Kirilovsky A, Pelletier E, Pierella Karlusich JJ, Rocha Jimenez Vieira F, Villar E, Chaffron S, Malviya S, et al. 2019. Community-Level Responses to Iron Availability in Open Ocean Plankton Ecosystems. Glob Biogeochem Cycles. 33(3):391–419. https://doi.org/10.1029/2018GB006022
OpenUrl CrossRef

[18] ↵
Casteleyn G, Leliaert F, Backeljau T, Debeer A-E, Kotaki Y, Rhodes L, Lundholm N, Sabbe K, Vyverman W. 2010. Limits to gene flow in a cosmopolitan marine planktonic diatom. Proc Natl Acad Sci. 107(29):12952–12957. https://doi.org/10.1073/pnas.1001380107
OpenUrl Abstract/FREE Full Text

[19] ↵
Cavalier-Smith T. 2005. Economy, Speed and Size Matter: Evolutionary Forces Driving Nuclear Genome Miniaturization and Expansion. Ann Bot. 95(1):147–175. https://doi.org/10.1093/aob/mci010
OpenUrl CrossRef PubMed

[20] ↵
Cermeño P, Falkowski PG. 2009. Controls on diatom biogeography in the ocean. Science. 325(5947):1539–1541. https://doi.org/10.1126/science.1174159
OpenUrl Abstract/FREE Full Text

[21] ↵
Chaffron S, Delage E, Budinich M, Vintache D, Henry N, Nef C, Ardyna M, Zayed AA, Junger PC, Galand PE, et al. 2021. Environmental vulnerability of the global ocean epipelagic plankton community interactome. Sci Adv. 7(35):eabg1921. https://doi.org/10.1126/sciadv.abg1921
OpenUrl FREE Full Text

[22] ↵
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w ¹¹¹⁸; iso-2; iso-3. Fly (Austin). 6(2):80–92. https://doi.org/10.4161/fly.19695
OpenUrl CrossRef

[23] ↵
Crenn K, Duffieux D, Jeanthon C. 2018. Bacterial Epibiotic Communities of Ubiquitous and Abundant Marine Diatoms Are Distinct in Short-and Long-Term Associations. Front Microbiol [Internet]. [accessed 2022 Feb 11] 9. https://doi.org/10.3389/fmicb.2018.02879

[24] ↵
Cruz de Carvalho MH, Bowler C. 2020. Global identification of a marine diatom long noncoding natural antisense transcripts (NATs) and their response to phosphate fluctuations. Sci Rep. 10(1):14110. https://doi.org/10.1038/s41598-020-71002-0
OpenUrl

[25] ↵
Cruz de Carvalho MH, Sun H, Bowler C, Chua N. 2016. Noncoding and coding transcriptome responses of a marine diatom to phosphate fluctuations. New Phytol. 210(2):497–510. https://doi.org/10.1111/nph.13787
OpenUrl CrossRef

[26] ↵
Dainat J, Hereñú D, LucileSol, Pascal-Git. 2022. NBISweden/AGAT: AGAT-v0.8.1 [Internet]. [place unknown]: Zenodo; [accessed 2022 Feb 11]. https://doi.org/10.5281/ZENODO.3552717

[27] ↵
De Luca D, Kooistra WHCF, Sarno D, Gaonkar CC, Piredda R. 2019. Global distribution and diversity of Chaetoceros (Bacillariophyta, Mediophyceae): integration of classical and novel strategies. PeerJ. 7:e7410. https://doi.org/10.7717/peerj.7410
OpenUrl

[28] ↵
De Luca D, Sarno D, Piredda R, Kooistra WHCF. 2019. A multigene phylogeny to infer the evolutionary history of Chaetocerotaceae (Bacillariophyta). Mol Phylogenet Evol. 140:106575. https://doi.org/10.1016/j.ympev.2019.106575
OpenUrl

[29] ↵
Delmont TO, Gaia M, Hinsinger DD, Frémont P, Vanni C, Fernandez-Guerra A, Eren AM, Kourlaiev A, d’Agata L, Clayssen Q, et al. 2022. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genomics. 2(5):100123. https://doi.org/10.1016/j.xgen.2022.100123
OpenUrl

[30] ↵
Diniz-Filho JAF, Soares TN, Lima JS, Dobrovolski R, Landeiro VL, de Campos Telles MP, Rangel TF, Bini LM. 2013. Mantel test in population genetics. Genet Mol Biol. 36(4):475–485. https://doi.org/10.1590/S1415-47572013000400002
OpenUrl

[31] ↵
Dorrell RG, Gile G, McCallum G, Méheust R, Bapteste EP, Klinger CM, Brillet-Guéguen L, Freeman KD, Richter DJ, Bowler C. 2017. Chimeric origins of ochrophytes and haptophytes revealed through an ancient plastid proteome. Bhattacharya D, editor. eLife. 6:e23717. https://doi.org/10.7554/eLife.23717

[32] ↵
Dorrell RG, Smith AG. 2011. Do Red and Green Make Brown?: Perspectives on Plastid Acquisitions within Chromalveolates. Eukaryot Cell [Internet]. [accessed 2022 Feb 11]. https://journals.asm.org/doi/abs/10.1128/EC.00326-10

[33] ↵
Egge JK. 1998. Are diatoms poor competitors at low phosphate concentrations? J Mar Syst. 16(3):191–198. https://doi.org/10.1016/S0924-7963(97)00113-9

[34] ↵
Ellis KA, Cohen NR, Moreno C, Marchetti A. 2017. Cobalamin-independent Methionine Synthase Distribution and Influence on Vitamin B12 Growth Requirements in Marine Diatoms. Protist. 168(1):32–47. https://doi.org/10.1016/j.protis.2016.10.007

[35] ↵
Emms DM, Kelly S. 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16(1):157. https://doi.org/10.1186/s13059-015-0721-2

[36] ↵
Falkowski PG, Katz ME, Knoll AH, Quigg A, Raven JA, Schofield O, Taylor FJR. 2004. The evolution of modern eukaryotic phytoplankton. Science. 305(5682):354–360. https://doi.org/10.1126/science.1095964

[37] ↵
Field CB, Behrenfeld MJ, Randerson JT, Falkowski P. 1998. Primary Production of the Biosphere: Integrating Terrestrial and Oceanic Components. Science. 281(5374):237–240. https://doi.org/10.1126/science.281.5374.237

[38] ↵
Finlay BJ. 2002. Global dispersal of free-living microbial eukaryote species. Science. 296(5570):1061–1063. https://doi.org/10.1126/science.1070710

[39] ↵
Fortunato AE, Jaubert M, Enomoto G, Bouly J-P, Raniello R, Thaler M, Malviya S, Bernardes JS, Rappaport F, Gentili B, et al. 2016. Diatom Phytochromes Reveal the Existence of Far-Red-Light-Based Sensing in the Ocean. Plant Cell. 28(3):616–628. https://doi.org/10.1105/tpc.15.00928

[40] ↵
Foster RA, Kuypers MMM, Vagner T, Paerl RW, Musat N, Zehr JP. 2011. Nitrogen fixation and transfer in open ocean diatom–cyanobacterial symbioses. ISME J. 5(9):1484–1493. https://doi.org/10.1038/ismej.2011.26

[41] ↵
Gao X, Bowler C, Kazamia E. 2021. Iron metabolism strategies in diatoms. J Exp Bot. 72(6):2165–2180. https://doi.org/10.1093/jxb/eraa575

[42] ↵
Gleich SJ, Plough LV, Glibert PM. 2020. Photosynthetic efficiency and nutrient physiology of the diatom Thalassiosira pseudonana at three growth temperatures. Mar Biol. 167(9):124. https://doi.org/10.1007/s00227-020-03741-7

[43] ↵
Gómez F. 2007. On the consortium of the tintinnid Eutintinnus and the diatom Chaetoceros in the Pacific Ocean. Mar Biol. 151(5):1899–1906. https://doi.org/10.1007/s00227-007-0625-0

[44] ↵
Gómez F. 2020. Symbioses of Ciliates (Ciliophora) and Diatoms (Bacillariophyceae): Taxonomy and Host–Symbiont Interactions. Oceans. 1(3):133–155. https://doi.org/10.3390/oceans1030010

[45] ↵
Grigoriev IV, Hayes RD, Calhoun S, Kamel B, Wang A, Ahrendt S, Dusheyko S, Nikitin R, Mondo SJ, Salamov A, et al. 2020. PhycoCosm, a comparative algal genomics resource. Nucleic Acids Res. 49(D1):D1004–D1011. https://doi.org/10.1093/nar/gkaa898

[46] ↵
Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL. 2008. The Vienna RNA Websuite. Nucleic Acids Res. 36(suppl_2):W70–W74. https://doi.org/10.1093/nar/gkn188

[47] ↵
Guidi L, Chaffron S, Bittner L, Eveillard D, Larhlimi A, Roux S, Darzi Y, Audic S, Berline L, Brum JR, et al. 2016. Plankton networks driving carbon export in the oligotrophic ocean. Nature. 532(7600):465–470. https://doi.org/10.1038/nature16942

[48] ↵
Guidi L, Morin P, Coppola L, Tremblay J-É, Pesant S, Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Environmental context of all samples from the Tara Oceans Expedition (2009-2013), about nutrients in the targeted environmental feature. In: Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Registry of all samples from the Tara Oceans Expedition (2009-2013). PANGAEA [Internet]. [accessed 2022 Feb 22]. https://doi.org/10.1594/PANGAEA.875575

[49] ↵
Guidi L, Picheral M, Pesant S, Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Environmental context of all samples from the Tara Oceans Expedition (2009-2013), about sensor data in the targeted environmental feature. In: Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants. 2017. Registry of all samples from the Tara Oceans Expedition (2009-2013). PANGAEA [Internet]. [accessed 2022 Feb 22]. https://doi.org/10.1594/PANGAEA.875576

[50] ↵
Guillot G, Rousset F. 2013. Dismantling the Mantel tests. Methods Ecol Evol. 4(4):336–344. https://doi.org/10.1111/2041-210x.12018

[51] ↵
Haas CE, Rodionov DA, Kropat J, Malasarn D, Merchant SS, de Crécy-Lagard V. 2009. A subset of the diverse COG0523 family of putative metal chaperones is linked to zinc homeostasis in all kingdoms of life. BMC Genomics. 10:470. https://doi.org/10.1186/1471-2164-10-470

[52] ↵
Hanschen ER, Starkenburg SR. 2020. The state of algal genome quality and diversity. Algal Res. 50:101968. https://doi.org/10.1016/j.algal.2020.101968

[53] ↵
Härnström K, Ellegaard M, Andersen TJ, Godhe A. 2011. Hundred years of genetic structure in a sediment revived diatom population. Proc Natl Acad Sci. 108(10):4252–4257. https://doi.org/10.1073/pnas.1013528108

[54] ↵
Hartl DL, Clark AG. 2007. Principles of Population Genetics. Écoscience. 14(4):544–545.

[55] ↵
Hays GC, Richardson AJ, Robinson C. 2005. Climate change and marine plankton. Trends Ecol Evol. 20(6):337–344. https://doi.org/10.1016/j.tree.2005.03.004

[56] ↵
Helliwell KE, Collins S, Kazamia E, Wheeler GL, Smith AG. 2015. Fundamental shift in vitamin B12 eco-physiology of a model alga demonstrated by experimental evolution. ISME J. 9:1446–1455.

[57] ↵
Hoffmann LJ, Peeken I, Lochte K. 2007. Effects of iron on the elemental stoichiometry during EIFEX and in the diatoms Fragilariopsis kerguelensis and Chaetoceros dichaeta. Biogeosciences. 4(4):569–579. https://doi.org/10.5194/bg-4-569-2007

[58] ↵
Holm-Hansen O. 1969. Algae: amounts of DNA and organic carbon in single cells. Science. 163(3862):87–88. https://doi.org/10.1126/science.163.3862.87

[59] ↵
Hongo Y, Kimura K, Takaki Y, Yoshida Y, Baba S, Kobayashi G, Nagasaki K, Hano T, Tomaru Y. 2021. The genome of the diatom Chaetoceros tenuissimus carries an ancient integrated fragment of an extant virus. Sci Rep. 11(1):22877. https://doi.org/10.1038/s41598-021-00565-3

[60] ↵
Hsieh SI, Castruita M, Malasarn D, Urzica E, Erde J, Page MD, Yamasaki H, Casero D, Pellegrini M, Merchant SS, Loo JA. 2013. The proteome of copper, iron, zinc, and manganese micronutrient deficiency in Chlamydomonas reinhardtii. Mol Cell Proteomics. 12(1):65–86. https://doi.org/10.1074/mcp.M112.021840

[61] ↵
Huang L, Zhang H, Deng D, Zhao K, Liu K, Hendrix DA, Mathews DH. 2019. LinearFold: linear-time approximate RNA folding by 5’-to-3’ dynamic programming and beam search. Bioinformatics. 35(14):i295–i304. https://doi.org/10.1093/bioinformatics/btz375

[62] ↵
Iverson V, Morris RM, Frazar CD, Berthiaume CT, Morales RL, Armbrust EV. 2012. Untangling Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota. Science. 335(6068):587–590. https://doi.org/10.1126/science.1212665

[63] ↵
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 9(1):5114. https://doi.org/10.1038/s41467-018-07641-9

[64] ↵
Johnson MK. 1998. Iron—sulfur proteins: new roles for old clusters. Curr Opin Chem Biol. 2(2):173–181. https://doi.org/10.1016/S1367-5931(98)80058-6

[65] ↵
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics. 30(9):1236–1240. https://doi.org/10.1093/bioinformatics/btu031

[66] ↵
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30(14):3059–3066. https://doi.org/10.1093/nar/gkf436

[67] ↵
Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol. 30(4):772–780. https://doi.org/10.1093/molbev/mst010

[68] ↵
Kazamia E, Sutak R, Paz-Yepes J, Dorrell RG, Vieira FRJ, Mach J, Morrissey J, Leon S, Lam F, Pelletier E, et al. 2018. Endocytosis-mediated siderophore uptake as a strategy for Fe acquisition in diatoms. Sci Adv. 4(5):eaar4536. https://doi.org/10.1126/sciadv.aar4536

[69] ↵
Kemp AES, Villareal TA. 2018. The case of the diatoms and the muddled mandalas: Time to recognize diatom adaptations to stratified waters. Prog Oceanogr. 167:138–149. https://doi.org/10.1016/j.pocean.2018.08.002

[70] ↵
Kim E, Harrison JW, Sudek S, Jones MDM, Wilcox HM, Richards TA, Worden AZ, Archibald JM. 2011. Newly identified and diverse plastid-bearing branch on the eukaryotic tree of life. Proc Natl Acad Sci U S A. 108(4):1496–1500. https://doi.org/10.1073/pnas.1013337108

[71] ↵
Kimura K, Tomaru Y. 2014. Coculture with marine bacteria confers resistance to complete viral lysis of diatom cultures. Aquat Microb Ecol. 73(1):69–80. https://doi.org/10.3354/ame01705

[72] ↵
Koester JA, Swalwell JE, von Dassow P, Armbrust EV. 2010. Genome size differentiates co-occurring populations of the planktonic diatom Ditylum brightwellii (Bacillariophyta). BMC Evol Biol. 10(1):1. https://doi.org/10.1186/1471-2148-10-1

[73] ↵
Kofler R, Pandey RV, Schlotterer C. 2011. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics. 27(24):3435–3436. https://doi.org/10.1093/bioinformatics/btr589

[74] ↵
Kotabova E, Malych R, Pierella Karlusich JJ, Kazamia E, Eichner M, Mach J, Lesuisse E, Bowler C, Prášil O, Sutak R. 2021. Complex Response of the Chlorarachniophyte Bigelowiella natans to Iron Availability. mSystems. 6(1):e00738-20. https://doi.org/10.1128/mSystems.00738-20

[75] ↵
Kröger N, Deutzmann R, Bergsdorf C, Sumper M. 2000. Species-specific polyamines from diatoms control silica morphology. Proc Natl Acad Sci. 97(26):14133–14138. https://doi.org/10.1073/pnas.260496497

[76] ↵
Kröger N, Sumper M. 1998. Diatom Cell Wall Proteins and the Cell Biology of Silica Biomineralization. Protist. 149(3):213–219. https://doi.org/10.1016/S1434-4610(98)70029-X

[77] ↵
La Roche J, Boyd PW, McKay RML, Geider RJ. 1996. Flavodoxin as an in situ marker for iron stress in phytoplankton. Nature. 382(6594):802–805. https://doi.org/10.1038/382802a0

[78] ↵
Lampe RH, Mann EL, Cohen NR, Till CP, Thamatrakoln K, Brzezinski MA, Bruland KW, Twining BS, Marchetti A. 2018. Different iron storage strategies among bloom-forming diatoms. Proc Natl Acad Sci. 115(52):E12275–E12284. https://doi.org/10.1073/pnas.1805243115

[79] ↵
Laporte F, Mary-Huard T. 2021. MM4LMM: Inference of Linear Mixed Models Through MM Algorithm [Internet]. [place unknown]; [accessed 2022 Feb 21]. https://CRAN.R-project.org/package=MM4LMM

[80] ↵
Laso-Jadart R, O’Malley M, Sykulski AM, Ambroise C, Madoui M-A. 2021. How marine currents and environment shape plankton genomic differentiation: a mosaic view from Tara Oceans metagenomic data [Internet]. [place unknown]; [accessed 2022 Feb 16]. https://doi.org/10.1101/2021.04.29.441957

[81] ↵
Leblanc K, Quéguiner B, Diaz F, Cornet V, Michel-Rodriguez M, Durrieu de Madron X, Bowler C, Malviya S, Thyssen M, Grégori G, et al. 2018. Nanoplanktonic diatoms are globally overlooked but play a role in spring blooms and carbon export. Nat Commun. 9(1):953. https://doi.org/10.1038/s41467-018-03376-9

[82] ↵
Leconte J, Benites LF, Vannier T, Wincker P, Piganeau G, Jaillon O. 2020. Genome Resolved Biogeography of Mamiellales. Genes. 11(1):66. https://doi.org/10.3390/genes11010066

[83] ↵
Leconte J, Timsit Y, Delmont TO, Lescot M, Piganeau G, Wincker P, Jaillon O. 2021. Equatorial to Polar genomic variability of the microalgae Bathycoccus prasinos [Internet]. [place unknown]: Genomics; [accessed 2022 Apr 19]. https://doi.org/10.1101/2021.07.13.452163

[84] ↵
Letunic I, Bork P. 2021. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49(W1):W293–W296. https://doi.org/10.1093/nar/gkab301

[85] ↵
Levitan O, Dinamarca J, Zelzion E, Lun DS, Guerra LT, Kim MK, Kim J, Van Mooy BAS, Bhattacharya D, Falkowski PG. 2015. Remodeling of intermediate metabolism in the diatom Phaeodactylum tricornutum under nitrogen stress. Proc Natl Acad Sci. 112(2):412–417. https://doi.org/10.1073/pnas.1419818112

[86] ↵
Lewin JC. 1961. The dissolution of silica from diatom walls. Geochim Cosmochim Acta. 21(3):182–198. https://doi.org/10.1016/S0016-7037(61)80054-9

[87] ↵
Lewontin RC, Krakauer J. 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 74(1):175–195. https://doi.org/10.1093/genetics/74.1.175

[88] ↵
Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 27(21):2987–2993. https://doi.org/10.1093/bioinformatics/btr509

[89] ↵
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25(14):1754–1760. https://doi.org/10.1093/bioinformatics/btp324

[90] ↵
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25(16):2078–2079. https://doi.org/10.1093/bioinformatics/btp352

[91] ↵
Liu Z, Campbell V, Heidelberg KB, Caron DA. 2016. Gene expression characterizes different nutritional strategies among three mixotrophic protists. Olson J, editor. FEMS Microbiol Ecol. 92(7):fiw106. https://doi.org/10.1093/femsec/fiw106

[92] ↵
Llopis Monferrer N, Leynaert A, Tréguer P, Gutiérrez-Rodríguez A, Moriceau B, Gallinari M, Latasa M, L’Helguen S, Maguer J-F, Safi K, et al. 2021. Role of small Rhizaria and diatoms in the pelagic silica production of the Southern Ocean. Limnol Oceanogr. 66(6):2187–2202. https://doi.org/10.1002/lno.11743

[93] ↵
Luu K, Bazin E, Blum MGB. 2017. pcadapt: an R package to perform genome scans for selection based on principal component analysis. Mol Ecol Resour. 17(1):67–77. https://doi.org/10.1111/1755-0998.12592

[94] ↵
Malviya S, Scalco E, Audic S, Vincent F, Veluchamy A, Poulain J, Wincker P, Iudicone D, de Vargas C, Bittner L, et al. 2016. Insights into global diatom distribution and diversity in the world’s ocean. Proc Natl Acad Sci. 113(11):E1516–E1525. https://doi.org/10.1073/pnas.1509523113

[95] ↵
Mangot J-F, Logares R, Sánchez P, Latorre F, Seeleuthner Y, Mondy S, Sieracki ME, Jaillon O, Wincker P, de Vargas C, Massana R. 2017. Accessing the genomic information of unculturable oceanic picoeukaryotes by combining multiple single cells. Sci Rep. 7(1):41498. https://doi.org/10.1038/srep41498

[96] ↵
Marchetti A, Moreno CM, Cohen NR, Oleinikov I, deLong K, Twining BS, Armbrust EV, Lampe RH. 2017. Development of a molecular-based index for assessing iron status in bloom-forming pennate diatoms. J Phycol. 53(4):820–832. https://doi.org/10.1111/jpy.12539

[97] ↵
Martin-Jezequel V, Hildebrand M, Brzezinski MA. 2000. Silicon metabolism in diatoms: implications for growth. J Phycol. 36(5):821–840. https://doi.org/10.1046/j.1529-8817.2000.00019.x

[98] ↵
Massana R, del Campo J, Sieracki ME, Audic S, Logares R. 2014. Exploring the uncultured microeukaryote majority in the oceans: reevaluation of ribogroups within stramenopiles. ISME J. 8(4):854–866. https://doi.org/10.1038/ismej.2013.204

[99] ↵
Maumus F, Allen AE, Mhiri C, Hu H, Jabbari K, Vardi A, Grandbastien M-A, Bowler C. 2009. Potential impact of stress activated retrotransposons on genome evolution in a marine diatom. BMC Genomics. 10:624. https://doi.org/10.1186/1471-2164-10-624

[100] ↵
McQuaid JB, Kustka AB, Oborník M, Horák A, McCrow JP, Karas BJ, Zheng H, Kindeberg T, Andersson AJ, Barbeau KA, Allen AE. 2018. Carbonate-sensitive phytotransferrin controls high-affinity iron uptake in diatoms. Nature. 555(7697):534–537. https://doi.org/10.1038/nature25982

[101] ↵
Moore JK, Doney SC, Lindsay K. 2004. Upper ocean ecosystem dynamics and iron cycling in a global three-dimensional model. Glob Biogeochem Cycles [Internet]. [accessed 2022 Apr 26] 18(4). https://doi.org/10.1029/2004GB002220

[102] ↵
Morrissey J, Sutak R, Paz-Yepes J, Tanaka A, Moustafa A, Veluchamy A, Thomas Y, Botebol H, Bouget F-Y, McQuaid JB, et al. 2015. A novel protein, ubiquitous in marine phytoplankton, concentrates iron at the cell surface and facilitates uptake. Curr Biol. 25(3):364–371. https://doi.org/10.1016/j.cub.2014.12.004

[103] ↵
Moustafa A, Beszteri B, Maier UG, Bowler C, Valentin K, Bhattacharya D. 2009. Genomic Footprints of a Cryptic Plastid Endosymbiosis in Diatoms. Science. 324(5935):1724–1726. https://doi.org/10.1126/science.1172983

[104] ↵
Nelson DM, Tréguer P, Brzezinski MA, Leynaert A, Quéguiner B. 1995. Production and dissolution of biogenic silica in the ocean: Revised global estimates, comparison with regional data and relationship to biogenic sedimentation. Glob Biogeochem Cycles. 9(3):359–372. https://doi.org/10.1029/95GB01070

[105] ↵
Nelson DR, Hazzouri KM, Lauersen KJ, Jaiswal A, Chaiboonchoe A, Mystikou A, Fu W, Daakour S, Dohai B, Alzahmi A, et al. 2021. Large-scale genome sequencing reveals the driving forces of viruses in microalgal evolution. Cell Host Microbe. 29(2):250–266.e8. https://doi.org/10.1016/j.chom.2020.12.005

[106] ↵
Nelson WC, Tully BJ, Mobberley JM. 2020. Biases in genome reconstruction from metagenomic data. PeerJ. 8:e10119. https://doi.org/10.7717/peerj.10119

[107] ↵
Norris RD. 2000. Pelagic Species Diversity, Biogeography, and Evolution. Paleobiology. 26(4):236–258.

[108] ↵
Not F, Valentin K, Romari K, Lovejoy C, Massana R, Toebe K, Vaulot D, Medlin L. 2007. Picobiliphytes: A Marine Picoplanktonic Algal Group with Unknown Affinities to Other Eukaryotes. Science. https://doi.org/10.1126/SCIENCE.1136264

[109] ↵
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O’Hara RB, Simpson GL, Solymos P, et al. 2020. vegan: Community Ecology Package [Internet]. [place unknown]; [accessed 2022 Feb 21]. https://CRAN.R-project.org/package=vegan

[110] ↵
Pierella Karlusich JJ, Bowler C, Biswas H. 2021. Carbon Dioxide Concentration Mechanisms in Natural Populations of Marine Diatoms: Insights From Tara Oceans. Front Plant Sci [Internet]. [accessed 2022 Feb 11] 12. https://www.frontiersin.org/article/10.3389/fpls.2021.657821

[111] ↵
Pierella Karlusich JJ, Pelletier E, Lombard F, Carsique M, Dvorak E, Colin S, Picheral M, Cornejo-Castillo FM, Acinas SG, Pepperkok R, et al. 2021. Global distribution patterns of marine nitrogen-fixers by imaging and molecular methods. Nat Commun. 12(1):4160. https://doi.org/10.1038/s41467-021-24299-y

[112] ↵
Piganeau G, Eyre-Walker A, Grimsley N, Moreau H. 2011. How and Why DNA Barcodes Underestimate the Diversity of Microbial Eukaryotes. Lopez-Garcia P, editor. PLoS ONE. 6(2):e16342. https://doi.org/10.1371/journal.pone.0016342

[113] ↵
Poulsen N, Sumper M, Kröger N. 2003. Biosilica formation in diatoms: Characterization of native silaffin-2 and its role in silica morphogenesis. Proc Natl Acad Sci. 100(21):12075–12080. https://doi.org/10.1073/pnas.2035131100

[114] ↵
Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26(6):841–842. https://doi.org/10.1093/bioinformatics/btq033

[115] ↵
Rastogi A, Maheswari U, Dorrell RG, Vieira FRJ, Maumus F, Kustka A, McCarthy J, Allen AE, Kersey P, Bowler C, Tirichine L. 2018. Integrative analysis of large scale transcriptome data draws a comprehensive landscape of Phaeodactylum tricornutum genome and evolutionary origin of diatoms. Sci Rep. 8(1):4834. https://doi.org/10.1038/s41598-018-23106-x

[116] ↵
Rastogi A, Vieira FRJ, Deton-Cabanillas A-F, Veluchamy A, Cantrel C, Wang G, Vanormelingen P, Bowler C, Piganeau G, Hu H, Tirichine L. 2020. A genomics approach reveals the global genetic polymorphism, structure, and functional diversity of ten accessions of the marine model diatom Phaeodactylum tricornutum. ISME J. 14(2):347–363. https://doi.org/10.1038/s41396-019-0528-3

[117] ↵
Rodriguez-R LM, Konstantinidis KT. 2014. Bypassing Cultivation To Identify Bacterial Species: Culture-independent genomic approaches identify credibly distinct clusters, avoid cultivation bias, and provide true insights into microbial species. Microbe Mag. 9(3):111–118. https://doi.org/10.1128/microbe.9.111.1

[118] ↵
Royo-Llonch M, Sánchez P, Ruiz-González C, Salazar G, Pedrós-Alió C, Sebastián M, Labadie K, Paoli L, Ibarbalz F, Zinger L, et al. 2021. Compendium of 530 metagenome-assembled bacterial and archaeal genomes from the polar Arctic Ocean. Nat Microbiol. 6:1–14. https://doi.org/10.1038/s41564-021-00979-9

[119] ↵
Rynearson TA, Lin EO, Armbrust EV. 2009. Metapopulation Structure in the Planktonic Diatom Ditylum brightwellii (Bacillariophyceae). Protist. 160(1):111–121. https://doi.org/10.1016/j.protis.2008.10.003

[120] ↵
Sauna ZE, Kimchi-Sarfaty C, Ambudkar SV, Gottesman MM. 2007. The sounds of silence: synonymous mutations affect function. Pharmacogenomics. 8(6):527–532. https://doi.org/10.2217/14622416.8.6.527

[121] ↵
Schiffrine N, Tremblay J-É, Babin M. 2020. Growth and Elemental Stoichiometry of the Ecologically-Relevant Arctic Diatom Chaetoceros gelidus: A Mix of Polar and Temperate. Front Mar Sci [Internet]. [accessed 2022 Feb 11] 6. https://www.frontiersin.org/article/10.3389/fmars.2019.00790

[122] ↵
Scoccianti V, Penna A, Penna N, Magnani M. 1995. Effect of heat stress on polyamine content and protein pattern in Skeletonema costatum. Mar Biol. 121(3):549–554. https://doi.org/10.1007/BF00349465

[123] ↵
Seenivasan R, Sausen N, Medlin LK, Melkonian M. 2013. Picomonas judraskeda Gen. Et Sp. Nov.: The First Identified Member of the Picozoa Phylum Nov., a Widespread Group of Picoeukaryotes, Formerly Known as ‘Picobiliphytes.’ PLOS ONE. 8(3):e59565. https://doi.org/10.1371/journal.pone.0059565

[124] ↵
Shuter BJ, Thomas JE, Taylor WD, Zimmerman AM. 1983. Phenotypic Correlates of Genomic DNA Content in Unicellular Eukaryotes and Other Cells. Am Nat. 122(1):26–44.

[125] ↵
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19):3210–3212. https://doi.org/10.1093/bioinformatics/btv351

[126] ↵
Smetacek V. 1999. Diatoms and the ocean carbon cycle. Protist. 150(1):25–32. https://doi.org/10.1016/S1434-4610(99)70006-4

[127] ↵
Smodlaka Tanković M, Baričević A, Ivančić I, Kužat N, Medić N, Pustijanac E, Novak T, Gašparović B, Marić Pfannkuchen D, Pfannkuchen M. 2018. Insights into the life strategy of the common marine diatom Chaetoceros peruvianus Brightwell. Ianora A, editor. PLOS ONE. 13(9):e0203634. https://doi.org/10.1371/journal.pone.0203634

[128] ↵
Sommeria-Klein G, Watteaux R, Ibarbalz FM, Pierella Karlusich JJ, Iudicone D, Bowler C, Morlon H. 2021. Global drivers of eukaryotic plankton biogeography in the sunlit ocean. Science. 374(6567):594–599. https://doi.org/10.1126/science.abb3717

[129] ↵
Stamatakis A, Ludwig T, Meier H. 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics. 21(4):456–463. https://doi.org/10.1093/bioinformatics/bti191

[130] ↵
Storey JD, Bass AJ, Dabney A, Robinson D, Warnes G. 2022. qvalue: Q-value estimation for false discovery rate control [Internet]. [place unknown]: Bioconductor version: Release (3.14); [accessed 2022 Feb 21]. https://doi.org/10.18129/B9.bioc.qvalue

[131] ↵
Timmermans KR, van der Wagt B, de Baar HJW. 2004. Growth rates, half-saturation constants, and silicate, nitrate, and phosphate depletion in relation to iron availability of four large, open-ocean diatoms from the Southern Ocean. Limnol Oceanogr. 49(6):2141–2151. https://doi.org/10.4319/lo.2004.49.6.2141

[132] ↵
Tréguer P, Bowler C, Moriceau B, Dutkiewicz S, Gehlen M, Aumont O, Bittner L, Dugdale R, Finkel Z, Iudicone D, et al. 2018. Influence of diatom diversity on the ocean biological carbon pump. Nat Geosci. 11(1):27–37. https://doi.org/10.1038/s41561-017-0028-x

[133] ↵
Tréguer PJ, de La Rocha CL. 2013. The World Ocean Silica Cycle. Annu Rev Mar Sci. 5(1):477–501. https://doi.org/10.1146/annurev-marine-121211-172346

[134] ↵
de Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, Lara E, Berney C, Le Bescot N, Probert I, et al. 2015. Eukaryotic plankton diversity in the sunlit ocean. Science. 348(6237):1261605. https://doi.org/10.1126/science.1261605

[135] ↵
Veldhuis MJW, Cucci TL, Sieracki ME. 1997. Cellular DNA Content of Marine Phytoplankton Using Two New Fluorochromes: Taxonomic and Ecological Implications. J Phycol. 33(3):527–541. https://doi.org/10.1111/j.0022-3646.1997.00527.x

[136] ↵
Veluchamy A, Rastogi A, Lin X, Lombard B, Murik O, Thomas Y, Dingli F, Rivarola M, Ott S, Liu X, et al. 2015. An integrative analysis of post-translational histone modifications in the marine diatom Phaeodactylum tricornutum. Genome Biol. 16(1):102. https://doi.org/10.1186/s13059-015-0671-8

[137] ↵
Vincent F, Bowler C. 2020. Diatoms Are Selective Segregators in Global Ocean Planktonic Communities. Petersen J, editor. mSystems. 5(1):e00444-19. https://doi.org/10.1128/mSystems.00444-19

[139] ↵
Von Dassow P, Petersen TW, Chepurnov VA, Armbrust EV. 2008. Inter- and Intraspecific Relationships Between Nuclear DNA Content and Cell Size in Selected Members of the Centric Diatom Genus Thalassiosira (Bacillariophyceae). J Phycol. 44(2):335–349. https://doi.org/10.1111/j.1529-8817.2008.00476.x

[140] ↵
Whittaker KA, Rynearson TA. 2017. Evidence for environmental and ecological selection in a microbe with no geographic limits to gene flow. Proc Natl Acad Sci. 114(10):2651–2656. https://doi.org/10.1073/pnas.1612346114

[141] ↵
Williams RB. 1964. Division Rates of Salt Marsh Diatoms in Relation to Salinity and Cell Size. Ecology. 45(4):877–880. https://doi.org/10.2307/1934940

[142] ↵
Wright S. 1965. The Interpretation of Population Structure by F-Statistics with Special Regard to Systems of Mating. Evolution. 19(3):395–420. https://doi.org/10.2307/2406450

[143] ↵
Wright S. 1984. Evolution and the Genetics of Populations, Volume 4: Variability Within and Among Natural Populations [Internet]. Chicago, IL: University of Chicago Press; [accessed 2022 Feb 16]. https://press.uchicago.edu/ucp/books/book/chicago/E/bo3642015.html

[144] ↵
Xiao N, Cao D-S, Zhu M-F, Xu Q-S. 2015. protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequence. Bioinformatics. 31(11):1857–1859.

[145] ↵
Yool A, Tyrrell T. 2003. Role of diatoms in regulating the ocean’s silicon cycle. Glob Biogeochem Cycles [Internet]. [accessed 2022 Apr 26] 17(4). https://doi.org/10.1029/2002GB002018