Metabarcoding reveals different zooplankton communities in northern and southern areas of the North Sea

Zooplankton are key players in marine ecosystems, linking primary production to higher trophic levels. The high abundance and high taxonomic diversity renders zooplankton ideal for biodiversity monitoring. However, taxonomic identification of the zooplankton assemblage is challenging due to its high diversity, subtle morphological differences and the presence of many meroplanktonic species, especially in coastal seas. Molecular techniques such as metabarcoding can help with rapid processing and identification of taxa in complex samples, and are therefore promising tools for identifying zooplankton communities. In this study, we applied metabarcoding of the mitochondrial cytochrome c oxidase I gene to zooplankton samples collected along a latitudinal transect in the North Sea, a shelf sea of the Atlantic Ocean. Northern regions of the North Sea are influenced by inflow of oceanic Atlantic waters, whereas the southern parts are characterised by more coastal waters. Our metabarcoding results indicated strong differences in zooplankton community composition between northern and southern areas of the North Sea, particularly in the classes Copepoda, Actinopterygii (ray-finned fishes) and Polychaeta. We compared these results to the known distributions of species reported in previous studies, and by comparing the abundance of copepods to data obtained from the Continuous Plankton Recorder (CPR). We found that our metabarcoding results are mostly congruent with the reported distribution and abundance patterns of zooplankton species in the North Sea. Our results highlight the power of metabarcoding to rapidly assess complex zooplankton samples, and we suggest that the technique could be used in future monitoring campaigns and biodiversity assessments. Highlights Zooplankton communities are different in northern and southern areas of the North Sea Metabarcoding results are consistent with known species distributions and abundance Metabarcoding allows for fast identification of meroplanktonic species


46
Zooplankton are key players in marine ecosystems, linking primary production to higher 47 pairs to remove reads that contain gaps or indels, which can be present due to amplification 157 of non-eukaryotic taxa (Wangensteen et al., 2018;Macher et al., 2018;Collins et al., 2019). 158 UNOISE (Edgar, 2016) was used for clustering of Operational Taxonomic Units (OTUs). We 159 chose thresholds of alpha = 4 and a minimum number of 10 reads for the denoising 160 approach, which is similar to settings reported in previous studies that found an alpha of 5 to 161 give reliable results (Elbrecht et al., 2018;Turon et al., 2019). We chose an alpha of 4 to be 162 slightly more restrictive and remove more potentially wrong sequence variants from the 163 7 dataset, although this approach might also increase the loss of genuine variants. To further 164 reduce the risk of analysing spurious OTUs, only those OTUs with >0.002% relative 165 abundance per sample were retained, which corresponds to >1 read in the sample with the 166 lowest read count. Further, we only retained OTUs that were present in both extraction 167 replicates per sample. Such an approach, i.e. filtering out low abundant OTUs based on 168 relative abundance, is commonly used in metabarcoding studies 169 Pereira-da-Conceicoa et al., 2019; Theissinger et al., 2019). After this quality filtering step, 170 the reads of the two technical replicates per sample were summed up to build the final 171 dataset. Quality filtered reads were assigned to species using the BOLD database 172 (Ratnasingham & Hebert, 2007) with the BOLDigger tool (Buchner & Leese, 2020). The 173 following identity thresholds were used for assigning taxonomic ranks: species 98%; genus 174 95%; family 93%; order 90%; class 85%. OTUs that were assigned to the same taxonomic 175 name were subsequently lumped by summing up reads to prevent analyses of intraspecific 176 variability as provided by the UNOISE pipeline. We focussed our analyses on planktonic 177 metazoans (animal zooplankton). 178 179

Community analyses 180
We tested which of the parameters: salinity, temperature, bottom depth or latitude, best 181 explained the community composition of zooplankton in the North Sea during the NICO 10 182 expedition. Analyses of community composition were conducted using the R package vegan 183 (Oksanen et al. 2019, https://cran.r-project.org/package=vegan). Averages of the abiotic 184 variables 'salinity' and 'temperature' across the water column were obtained from the CTD 185 data (Supplementary Table 1). The variables 'bottom depth at sampling site' and 'latitude of 186 sampling site' were extracted from the ship logbook. The variables were categorized into two 187 classes (<50th percentile, ≥ 50th percentile of the variable range), and sampling sites were 188 assigned to these classes accordingly. The four southernmost sampling sites (south of 56°N) 189 were also the shallowest (shallower than 75m), while the five northern sampling sites (north 190 of 56°N) were all deeper than 75m. Sites were therefore categorized as 'southern/shallow' and 'northern/deep', respectively (Fig. 1) to the area covered during the NICO leg 10 expedition. We compared the metabarcoding 226 data (relative read abundance) with data from the CPR (abundance/m 3 ). For the ray-finned 227 fishes and polychaetes using CPR data was not possible, as these taxa are recorded as 228 larvae or eggs without further taxonomic identification. Oithona similis in the metabarcoding 229 dataset was compared to the Oithona spp. data from the CPR, as Oithona similis is not 230 specifically recorded by the CPR, but is by far the most common Oithona species in the 231 North Sea (Fransz et al., 1991). 232

234
Zooplankton communities from nine sampling sites across the North Sea were analyzed, 235 and 42,798,930 raw reads were obtained. The two negative controls contained a total of 236 1204 reads (0.0028% of all reads). As Illumina platforms commonly show a low percentage 237 of tag switching during sequencing (Schnell, Bohmann & Gilbert, 2015) and no DNA was 238 observed in the negative controls during library preparation, no contamination was 239 suspected. After merging of forward and reverse reads and quality filtering, 18,904,404 240 sequences were retained. Bray-Curtis dissimilarity between extraction replicates of the 241 same sample was low (mean 0.018, standard error of the mean 0.002), and therefore no 242 systematic problem with extraction or laboratory processing was suspected. 243 244

Community composition 245
A total of 3315 OTUs were obtained. These belonged to 33 taxonomic classes, to which 246 96.1% of quality filtered reads could be assigned. Of the 33 classes, 26 were identified as 247 animals, while 7 classes (with 0.97% of all reads) were identified as plants and bacteria, and 248 were removed prior to further analysis. The zooplankton classes with the highest abundance 249 (based on read counts) were Copepoda (30.2% of reads, 28 identified species), 250 Actinopterygii (ray-finned fishes; 26.3% of reads, 16 identified species), Sagittoidea (arrow 251 worms, 19.9% of reads, 3 identified species), Branchiopoda (10.1% of reads, 3 identified 252 species), Polychaeta (5.8% of reads, 29 identified species), and Echinoidea (5.7% of reads, 253 5 identified species). All other classes were present with less than 1% of reads. The 26 254 classes were assigned to 59 orders, 103 families, 119 genera, and 127 species, to which 255 75% of all reads could be assigned (see supplementary table S2 for a species list). 256 Community composition of the entire zooplankton assemblage differed significantly and 257 strongly between 'northern/deep' and 'southern/shallow' sampling sites (R2 = 0.35, p = 258 0.004) as well as between northern and southern sites categorized based on salinity (R2 = 259 0.31, p = 0.018, Table 1). Water temperature did not explain overall community composition. 260 11 Similar results were obtained when focussing only on the copepods and polychaetes. For 261 the ray-finned fishes, 'salinity/latitude' best explained community composition (Table 1). 262 Copepods, ray-finned fishes and polychaetes as the most abundant and species-rich groups 263 are discussed in more detail below. As latitude/depth best explained community 264 composition, we further focus on this factor. 265 266  However, high read abundances were found in central North Sea sampling sites (Site S109, 295 18.5% of copepod reads; Site S22, 45.4%), whereas abundance of the species did not 296 exceed 6% of reads in all other sampling sites. Metabarcoding and CPR data could not be 297 compared, as Microcalanus pusillus is absent from the CPR data due to its distribution in 298 deeper water and small size (Fransz 1991). The CPR samples only the top water layer and 299 the used mesh size does not reliably retain very small organisms. For Calanus finmarchicus, 300 a strong, significant increase in read abundance with latitude was observed in the 301 metabarcoding dataset (Figure 3c). Equally, the CPR dataset showed that abundance of the species significantly increased with latitude ( Figure 3f). Temora longicornis showed a 303 negative, but non-significant trend in read abundance from southern to northern sampling 304 sites in the metabarcoding data ( Figure 3d). This trend was also found in the CPR data and 305 was significant (Figure 3g Out of 16 identified ray-finned fish species, 3 (18.8%) were found exclusively in the 314 'northern/deep' sampling sites, while 6 (37.5%) were only found in the 'southern/shallow' 315 sites, and 7 (43.6%) were found in both areas (Figure 2c). The most abundant species were 316 the common mackerel (Scomber scombrus, 48.6% of ray-finned fish reads), common dab 317 (Limanda limanda, 19%), common ling (Molva molva, 11.8%) and the scaldfish Arnoglossus 318 laterna (8.3%). As the CPR assesses fish on the level of eggs and larvae without species 319 identification, no comparison of metabarcoding and CPR data was possible. 320 High read abundances of the common mackerel (Scomber scombrus) were found in the 321 central and northern part of the North Sea (>90% of ray-finned fish reads; sites S109, 56.06 322 °N; S22, 56.59 °N; S93, 57.36 °N)(figure 4a). The common dab (Limanda limanda) was 323 found in high abundance (>90% of fish reads) at sampling sites S176 (54.41°N) and S156 324 Of the 29 identified polychaete species, 11 (37.9%) were exclusively found in the 336 'northern/deep' sampling sites, 9 (31%) exclusively in the 'southern/shallow' sampling sites, 337 and 9 (31%) were found in both areas (Figure 2 d). The most abundant species were 338 Paramphinome jeffreysii (28.3% of polychaete reads), Glycera lapidum sp. 1 (20.9%), 339 Pectinaria koreni (18.4%) and Magelona johnstoni (11.8%). As the CPR registers all 340 polychaetes apart from the holoplanktonic Tomopteris, as (unidentified) larvae, no 341 comparison of metabarcoding data and CPR data was possible. 342 Paramphinome jeffreysii showed a strong positive correlation of read abundance and latitude 343 (figure 5 a), with read abundances reaching >60% of polychaete reads in all 'northern/deep' 344 sampling sites, while it was only found in one site in the southern North Sea, with a read 345 abundance of <1%. Pectinaria koreni did not show a significant correlation of read 346 abundance and latitude, but was found in high abundance in two sampling sites in the 347 'southern/shallow' area of the North Sea (S156, 47%; S130, 29%), and 20% read abundance 348 in the northernmost sampling site (figure 5b). Glycera lapidum sp. 1 was commonly found 349 with less than 1% of reads, except in the two southern sampling sites S176 (54.41°N) and 350 S130 (55.62°N), where the species occurred with >50% read abundance. No significant 351 correlation of read abundance and latitude was found (figure 5c). For Magelona johnstoni, 352 the highest relative abundances were found in 'southern/shallow' sites (S203, 63%; 353 respectively 32% in S156). Read abundance at all other sampling sites was <1% (figure 5d) 354 (All statistical results: Table S5 numbers. However, our results show that the abundance data inferred through 387 metabarcoding mostly matches with the known distribution of species described in previous 388 studies, thus underlining that metabarcoding is a potentially powerful tool for monitoring of 389 zooplankton communities. The highly degenerate Leray XT primers used in this study 390 amplified a wide range of planktonic taxa. A high percentage (96%) of reads could be 391 assigned to a taxonomic class, and only 1% of these reads were assigned to non-metazoan 392 taxa. This is in contrast to previous metabarcoding studies targeting highly diverse 393 communities, which highlighted that degenerate primers commonly co-amplify a high 394 number of non-metazoan taxa (Weigand & Macher, 2018;Wangensteen et al., 2018). We 395 assume that only little non-metazoan biomass was present in our samples, leading to 396 successful amplification of the target planktonic animals. However, only 75% of reads could 397 be assigned to species level, which highlights the lack of genetic reference data for marine 398 planktonic organisms. This can render molecular identification of species impossible (see 399 e.g. (Porter & Hajibabaei, 2018;Weigand et al., 2019)) and underlines the need for improved 400 reference databases, which can only be otained with the help of taxonomic experts. We 401 discuss the findings on copepods, ray-finned fishes and polychaetes in more detail below. was Oithona atlantica, which was found in low read abundances (1% of copepod reads). We 416 therefore see the comparison as legitimate. Microcalanus pusillus was the second most 417 abundant species in the metabarcoding dataset, but we could not compare our data with 418 CPR data. The species is likely overlooked by CPR sampling due to its distribution in deeper 419 water layers and its small size, which are not sampled by the CPR (Hays & Warner, 1993;420 Hays, 1994). Previous work identified M. pusillus as a mostly Atlantic species (Fransz et  AAO2968; AAJ1005). We further detected Pseudocalanus mimus, which is generally 440 considered a North Pacific species (Frost, 1989;Questel et al., 2016), and that is also where 441 the BOLD references (BIN AAH8134) stem from. However, a few records of the species 442 from between Canada and Greenland exist (http://www.iobis.org/) (Nelson, 2014). The species 443 has not previously been recorded from the North East Atlantic, which could mean that the 444 species was either overlooked, or that its distribution range has recently expanded. Of the 445 copepod species that were exclusively found in the southern sampling sites, all except 446 Fish larvae and eggs are part of the zooplankton for a limited time and their occurrence is 453 20 mostly influenced by spawning and nursery areas to which eggs and larvae drift (Knijn, 454 1993;Gibson, 2001;Gibson et al., 2015). We found that the inferred community composition 455 of ray-finned fishes strongly differed between northern and southern sampling sites in the 456 North Sea, and that our data corresponds to known distribution patterns of fish species and 457 their spawning areas. The three species exclusively found in the 'northern/deep' sampling 458 sites were the grey gurnard (Eutrigla gurnadus), the argentine (Argentina sphyraena) and 459 the slender snipe eel (Nemichthys scolopaceus). The grey gurnard and the argentine are 460 known to occur mostly in deeper, northern waters (Knijn, 1993 also known to spawn in these areas (Knijn, 1993). Further research will show if 481 22 Meißner & Darr, 2009). We consider it possible that the high abundance of these species in 510 a few sampling sites can be explained by local spawning events. Overall, our results show 511 the power of metabarcoding to assess the meroplanktonic polychaete community, but we 512 conclude that more combined molecular and morphological work is required to fully 513 understand distribution patterns of polychaete larvae. 514 515

Conclusion 516
We showed that metabarcoding of zooplankton samples from the North Sea, using highly 517 degenerate COI primers, can give valuable insights into the diversity and distribution of 518 planktonic animals. We found clear differences in the overall zooplankton assemblages 519 between northern and southern areas of the North Sea, as well as more specifically for 520 copepods, ray-finned fishes and polychaetes. Our results were largely congruent with 521 previous studies based on morphological identifications, which indicates the robustness of 522 our molecular approach. Nevertheless, we highlight the need for more complete reference 523 databases to be able to make full use of the information gained through metabarcoding. We 524 suggest that metabarcoding should be considered for implementation into future biodiversity 525 assessments, as the ability to quickly assess whole zooplankton samples is valuable for 526 biodiversity studies in times of rapid ocean changes. 527 528 Acknowledgements 529 We thank the Pelagia crew for help during the NICO leg 10 cruise and Rob Witbaard for 530 organising and leading the NICO leg 10. We thank Elza Duijm for support in the lab.