Abstract
Biodiversity monitoring in aquatic ecosystems is challenging as it requires taxonomic expertise and is difficult to automate. One expedient approach is the use of environmental DNA (eDNA) surveys to infer species incidence from nucleic acids found in environmental samples. An advantage of eDNA surveys is that they allow rapid detection of non-indigenous species (NIS), lessening the time from introduction to discovery and increasing the likelihood of successful control and / or eradication. Despite this, relatively few studies have compared eDNA and traditional surveys for the study of NIS, or examined the differences between metabarcoding of different environmental sample types.
We evaluated the ability of eDNA to detect a broad range of native and NIS in urban coastal environments, and compared the results with previously published traditional surveys. We collected water and sediment samples from the same sites and then performed eDNA metabarcoding of 18S rRNA and COI genes.
We found very different patterns of biodiversity in water and sediment samples, with sediment containing more than twice as many operational taxonomic units (OTUs) as seawater in some cases. Assemblage diversity varied as much among environmental sample types as among geographically segregated sampling sites. Additionally, species detection within phyla was not consistent in water or sediment samples, indicating that at a broad scale sample type does not perfectly predict taxa detected.
We found almost perfect agreement in species detection from eDNA and traditional surveys. Additionally, eDNA metabarcoding detected three previously undocumented species introductions. Finally, our work provided a novel high-resolution biodiversity dataset for urban marine environments.
Synthesis and applications. Our study showed that the type of eDNA sample dramatically affects the detected biodiversity and that eDNA metabarcoding is accurate for the detection of notorious NIS. Natural resource managers, ecological practitioners and researchers should consider the benefits of integrating molecular tools such as eDNA into applied ecology.
Introduction
Anthropogenic activities are causing a global decrease in biodiversity (Sala & Knowlton 2006; Butchart et al. 2010) that negatively affects ecosystem services and function (Worm et al. 2006). Such impacts create an urgent need for tools that rapidly and accurately monitor species diversity. Traditional biodiversity surveys have been criticised for their lack of standardisation and taxonomic resolution (Oliver & Beattie 1993; Fitzpatrick et al. 2009). One approach that has the potential to overcome limitations of traditional surveys is the use of nucleic acids found in environmental samples, such as water or sediment, to infer presence or absence of living organisms in the local environment. Such genetic material, known as environmental DNA (hereafter eDNA), is a poly-disperse mixture of tissue, cells, subcellular fragments and extracellular DNA lost to the environment by organisms (Turner et al. 2014; Sassoubre et al. 2016). Studies using eDNA focus on detection from water samples using targeted (single species) methods such as qPCR (Dougherty et al. 2016; Simpson et al. 2017; Wood et al. 2017; Kim et al. 2018), or community (multi-species) methods such as metabarcoding (Borrell et al. 2017; Grey et al. 2018; Lacoursière-Roussel et al. 2018). Samples can be collected with minimal training and, once the methodology is optimised, surveys are highly amenable to automation (McQuillan & Robidart 2017). Thus, eDNA surveys are highly informative (but see specific considerations to ensure validity, Goldberg et al. 2016) and can complement traditional methods (Deiner et al. 2017). Recent work has identified a vast range of viable protocols for the collection, extraction and detection of target nucleic acids from different environmental samples (Deiner et al. 2018; Spens et al. 2016; Sellers et al. 2018).
Although sediment has been shown to harbour 8-1800 times more eDNA than water samples in freshwater ecosystems (Turner, Uy & Everhart 2015), relatively few eDNA studies incorporate multiple environmental sample types (Shaw et al. 2016). Furthermore, no work to date has compared species richness estimates derived from metazoan-targeted eDNA metabarcoding between seawater and marine sediment samples.
Non-indigenous species (NIS) are those that have been transported from their native range through human action into a novel geographic location. The impacts of NIS are well documented and they can pose a severe threat to natural ecological systems, agriculture, biodiversity and human health (Bax et al. 2003; Lovell, Stone & Fernandez 2006; Ricciardi et al. 2013; Mazza et al. 2014). Most marine NIS have spread globally via vectors such as transoceanic shipping, aquaculture, the construction of canals interconnecting large water bodies and capture aquarium industry (Molnar et al. 2008; Nunes et al. 2014). At fine (10s of km) geographical scales, other vectors such as recreational boating significantly enhance the spread and subsequent impact of NIS (Clarke Murray et al. 2011). Along coastal areas, NIS studies have highlighted the importance of monitoring marinas and harbours (Ashton et al. 2006), as these are hotspots of NIS. In these habitats, sessile marine NIS often outcompete native species and dominate artificial hard substrata (Glasby et al. 2007; Dafforn et al. 2009). Marinas and harbours have distinct ecological and physico-chemical conditions compared to the surrounding natural environment (Rivero et al. 2013; Foster et al. 2016). Consequently, there is a need for specific sampling and surveying protocols to study both native and NIS in these ecologically distinct environments.
The detection of NIS has been performed traditionally by surveys standardised by time or by reaching a species discovery asymptote (Ashton et al. 2006; Campbell, Gould & Hewitt 2007; Bishop et al. 2015). Moreover, traditional NIS surveys require expert taxonomic skills that are inherently time consuming and are not typically amenable to automation (Darling et al. 2017). In addition, the results of such surveys reflect only species that are being targeted at the time, with no ability to retrospectively separate erroneously grouped species in light of new discoveries.
This is particularly important considering that between 9,000 and 35,000 marine species (2.7% of the total number of estimated marine species) are cryptic (i.e., morphologically similar but genetically distinct) (Appeltans et al. 2012). Indeed, many common global sessile marine invaders are morphologically indistinguishable and contain cryptic lineages as revealed by genetic studies, highlighting the need for the use of genetic tools to accurately assess species invasions (Pérez-Portela et al. 2013; Rius et al. 2017). The use of eDNA as a monitoring tool for NIS has great potential, mainly because it tackles limitations of the existing biodiversity monitoring tools.
Here we tested the efficacy of eDNA metabarcoding for exploring the biodiversity of several distinct marinas. We first documented the differences in both alpha and beta diversity of different eDNA sample types (seawater and sediment). We then compared the eDNA metabarcoding results with traditional methods and identified a number of key NIS that are introduced in the study region and / or elsewhere. Subsequently, we identified a number of previously unrecorded NIS in the study region. Finally, we discuss the strengths and weaknesses of eDNA metabarcoding for detecting marine NIS.
Methods
Study sites
Four marinas were selected from around the United Kingdom to represent variation in modelled invasion potential (Pearce, Peeler & Stebbing 2012), known NIS present (Bishop et al. 2015) and surrounding benthic habitat type (Calewaert et al. 2016). Importantly, all chosen marinas have been surveyed previously for NIS and so there is a good understanding of the expected NIS in these areas (Wood, Bishop & Yunnie 2015a; b; Wood et al. 2016). Marina access was contingent on anonymity and so marina names and exact locations are not presented; Fig. 1a shows approximate locations. Marina TQ is an open marina subject to tides and varying salinity located in Southampton Water on the north coast of the English Channel. Marina PQ is a loch marina open during high tide to the Bristol Channel and the Celtic Sea. Marina TB is located at the mouth of the River Blackwater and is open to the North Sea. Marina HH is located on the Isle of Anglesey and is open to the Celtic Sea.
Environmental DNA sampling
Surveys were conducted during May 2017. A total of 24 sampling points were randomly selected within each site. At each point 50ml of water was collected from 10cm below the surface using a sterile 60ml luer lock syringe and filtered through a 0.22μm polyethersulfone Sterivex filter (Merck Millipore, Massachusetts USA). After collecting seawater from eight locations (400ml total volume) the filter was changed, resulting in a total of three filters per site. To test the effect of different sample preservation methods, sampling was performed in duplicate. One set of three filters had ~1.5ml sterile Longmire’s solution (100mM Tris, 10mM EDTA, 10mM NaCl, 0.5% SDS) applied in the inlet valve (Renshaw et al. 2015). The second set of three filters was kept on ice for no longer than eight hours before being frozen at −20°C. During the surveys, a sediment sample was collected at the start and then after every 3rd water sample, for a total of 9 per site. We used a UWITEC Corer (UWITEC, Mondsee, Austria) to collect a sediment core (600mm tall × 60mm diameter). Using a sterile disposable spatula, a subsample of 10-20g of sediment was taken from the top 2cm of the core, taking care to avoid sampling the sides of the core. The subsample was stored in a sterile plastic zip container and kept on ice for no longer than eight hours before being frozen at −80°C. Due to equipment malfunction no sediment sample could be taken for Site HH. Disposable gloves were changed after collection of each sample. All reused equipment was washed thoroughly and soaked in 10% bleach between sites, before rinsing in DNAse-free sterile water.
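The per-site sampling arithmetic above can be checked with a short sketch. This is an illustrative Python example, not part of the authors' workflow; the constant and function names are hypothetical, and the values are taken directly from the description (24 points, 50ml per point, a filter change every eight points, and a sediment core at the start and after every 3rd water sample).

```python
# Illustrative sketch of the per-site sampling schedule (names are hypothetical).
N_POINTS = 24           # water sampling points per site
ML_PER_POINT = 50       # ml of seawater filtered at each point
POINTS_PER_FILTER = 8   # the filter is changed after every eight points
SEDIMENT_INTERVAL = 3   # a sediment core follows every 3rd water sample


def sampling_schedule():
    """Return (volume filtered per filter in ml, water points followed by a core)."""
    filter_volumes = {}
    sediment_events = [0]  # 0 marks the core taken at the start of the survey
    for point in range(1, N_POINTS + 1):
        filter_id = (point - 1) // POINTS_PER_FILTER + 1
        filter_volumes[filter_id] = filter_volumes.get(filter_id, 0) + ML_PER_POINT
        if point % SEDIMENT_INTERVAL == 0:
            sediment_events.append(point)
    return filter_volumes, sediment_events


filter_volumes, sediment_events = sampling_schedule()
print(filter_volumes)        # {1: 400, 2: 400, 3: 400}
print(len(sediment_events))  # 9
```

The sketch reproduces the three 400ml filters and nine sediment samples per site stated in the text.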
eDNA extraction
DNA extractions were performed in a PCR-free cleanroom, separate from main laboratory facilities. No high copy templates, cultures or amplicons were permitted in this sterile laboratory. DNA extractions from water samples followed Spens et al. (2016) using the SXCAPSULE method. Briefly, preservative solution was removed from the outlet and filters were dried at room temperature for two hours. Then 720μl Qiagen buffer ATL (Qiagen, Hilden, Germany) and 80μl Proteinase K were added to the filter and all samples were digested overnight at 56°C. After digestion, samples were processed using the Qiagen DNeasy Blood and Tissue Kit as per manufacturer instructions. The final elution was 200μl PCR grade water.
Sediment extractions were performed using the Qiagen DNeasy Powermax Soil Kit using the manufacturer recommended protocol. For each site the nine samples were mixed to form three pooled samples; for each extraction, 10g of pooled sample was processed. A total of ten samples were processed as per manufacturer’s instructions: three per site and a single blank.
Inhibition testing
To ensure extracted DNA was free of PCR inhibitors a Primer Design Real-Time PCR Internal Control Kit (PrimerDesign, Southampton, UK) was used. qPCR reactions were performed following the manufacturer’s protocol. Inhibition due to co-purified compounds from DNA extraction protocols would produce an increase in cycle threshold number in comparison to no template controls. All samples were successfully processed and no samples showed indication of PCR inhibition.
Primer selection & library preparation
Two sets of primers were chosen for metabarcoding the environmental samples: a 313bp section of the standard DNA barcoding region of the cytochrome c oxidase subunit I gene (COI) using primers described in Leray et al. (2013); and a variable length target of the hypervariable v4 region of the nuclear small subunit ribosomal DNA (18S) using primers from Zhan et al. (2013). Sequencing libraries were prepared using a 2-step PCR approach as detailed in Bista et al. (2017). This method amplifies the target region in PCR 1, annealing universal adapters onto which sample specific indices and sequencing primers are annealed in PCR 2. In contrast to Bista et al. (2017) we used unique dual-matched indexes for PCR 2 to avoid index crosstalk associated with combinatorial indexing (MacConaill et al. 2018). PCR 1 was prepared in a PCR-free room, separate from main laboratory facilities. PCR conditions and reaction volumes are detailed in Supplementary Information 1. Blank filters, DNA extraction kits and positive controls were collected, extracted and sequenced in the same way as the experimental treatments (detailed in Supplementary Information 1). Samples were pooled at an equimolar ratio and sequenced using the Illumina MiSeq instrument (Illumina, San Diego, USA) with a V3 2 × 300bp kit.
Bioinformatic analyses
Samples were demultiplexed using the Illumina MiSeq control software (v 2.6.2.1). The demultiplexed data was analysed using a custom pipeline written in the R programming language (R Core Team 2018) hosted at https://github.com/leholman/metabarTOAD. The steps are as follows. Forward and reverse paired end reads were merged using the -fastq_mergepairs option of USEARCH v.10.0.240 (Edgar 2013) with maximum difference of 15, percent identity of 80% and quality filter set at maximum expected errors of 1. Both the forward and reverse primer sequences were matched using Cutadapt v.1.16 (Martin 2011) and only sequences containing both primer regions were retained. Sequences were discarded if they were outside of a defined length boundary (303-323bp for COI, 375-450bp for 18S) using Cutadapt. Sequences were then pooled, singletons were discarded and sequences were quality filtered with a maximum expected error of 1 using the -fastq_filter option of vsearch v.2.4.3 (Rognes et al. 2016). Sequences were then denoised and chimeras filtered using the unoise3 algorithm implemented in USEARCH. The resultant OTUs were curated using the LULU package v.0.1.0 in R (Frøslev et al. 2017). An OTU by sample table was produced by mapping the merged and trimmed reads against the curated OTUs, reporting a single best hit within 97% of the query sequence. The OTU × sample table was filtered in R as follows. To minimise the chance of spurious OTUs being included in the final dataset, any record with fewer than 3 raw reads was changed to zero and any OTU that did not appear in more than one sample was removed from the analysis. OTUs found in negative controls were removed from the analysis.
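The three OTU table filtering rules described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R code; the function name, the toy table and the sample names are hypothetical. It also makes two assumptions the text does not specify: the read threshold is applied before negative controls are checked, and "more than one sample" counts experimental samples only.

```python
MIN_READS = 3    # records with fewer raw reads are set to zero
MIN_SAMPLES = 2  # an OTU must appear in more than one sample


def filter_otu_table(table, negative_controls):
    """Filter a table given as {OTU id: {sample name: raw read count}}."""
    filtered = {}
    for otu, counts in table.items():
        # Rule 1: zero out low-abundance records.
        cleaned = {s: (n if n >= MIN_READS else 0) for s, n in counts.items()}
        # Rule 2 (assumed order): drop OTUs still present in any negative control.
        if any(cleaned.get(c, 0) > 0 for c in negative_controls):
            continue
        # Rule 3: keep OTUs present in more than one experimental sample.
        samples = {s: n for s, n in cleaned.items() if s not in negative_controls}
        if sum(n > 0 for n in samples.values()) >= MIN_SAMPLES:
            filtered[otu] = samples
    return filtered


# Hypothetical toy data, not taken from the study.
toy = {
    "OTU_1": {"water_1": 120, "water_2": 45, "sed_1": 2, "blank": 0},
    "OTU_2": {"water_1": 5, "water_2": 0, "sed_1": 0, "blank": 0},
    "OTU_3": {"water_1": 30, "water_2": 12, "sed_1": 9, "blank": 15},
}
result = filter_otu_table(toy, negative_controls={"blank"})
print(result)
```

In this toy table only OTU_1 survives: OTU_2 occurs in a single sample and OTU_3 occurs in the blank. An OTU with only a few reads in one replicate would be removed in the same way as OTU_2, which shows how conservative filtering can reduce sensitivity to rare taxa.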
Taxonomic assignment
Assigning correct taxonomy to an unknown set of marine sequences can be challenging as large databases require vast computational resources for query matching; many databases contain errors and the taxonomy of some marine groups is uncertain. With such limitations in mind, we assigned taxonomy using a BLAST v.2.6.0+ search (Camacho et al. 2009) returning the single best hit from databases within 97% of the query. The MIDORI database (UNIQUE_20180221) (Machida et al. 2017) was used for the COI data and the SILVA database (SSU r132) (Quast et al. 2013) was used for the 18S rRNA data. The match taxa tool from the World Register of Marine Species (WoRMS) (WoRMS Editorial Board 2018) was used to filter the data for marine species and check the classification. Remaining annotations were checked against the World Register of Introduced Marine Species (WRIMS) (Ahyong et al. 2018) to determine non-indigenous status.
Statistical analyses
All statistical analyses were conducted in R v3.5.0. The Vegan R package v.2.5.2 (Oksanen et al. 2011) was used to rarefy samples to the minimum sample read depth for each amplicon. The number of OTUs per site/condition was calculated as the number of OTUs with a non-zero number of normalized reads after summing the reads across all three site level replicates. To test if there was a significant difference between the number of OTUs generated by sediment and water eDNA, individual non-summed replicate sample data was used to build a linear regression model following the formula number_of_OTUs~sedimentorwater*site implemented in R using the function lm(). Non-metric multidimensional scaling ordination plots were generated from Bray-Curtis dissimilarity values derived using Vegan. A Permutational Multivariate Analysis of Variance (PERMANOVA) (Balakrishnan et al. 2014) was performed using the Bray-Curtis dissimilarity following the model dissimilarity_matrix~sedimentorwater*site implemented in R using the function adonis from the vegan package. OTUs with taxonomic assignment were separated into those found in sediment, water or both media and the OTUs were then collapsed at the Phylum level to explore taxonomic patterns of detection in water or sediment. Phyla with fewer than eight OTUs were combined. To test for non-random counts of species detection between water and sediment within taxa an exact binomial test was performed between counts of species detected in water and sediment. Half the number of counts for species detected in both water and sediment were added to water and sediment, with non-integer values conservatively rounded up to the nearest whole number. A Bonferroni correction for multiple comparisons was applied across the p values from the exact binomial tests. Records from manual surveys previously conducted for non-native invertebrates at the sample sites (Wood, Bishop & Yunnie 2015a; b; Wood et al. 2016) were compared with the detected species from metabarcoding data.
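The exact binomial test with shared-count splitting and Bonferroni correction can be sketched as follows. This is a Python illustration rather than the authors' R code. The function names and example counts are hypothetical, and the null proportion of 0.5 and the two-sided p-value definition (summing the probabilities of all outcomes no more likely than the observed one) are assumptions.

```python
from math import ceil, comb


def exact_binomial_p(k, n, p=0.5):
    """Two-sided exact binomial p-value for k successes in n trials."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum outcomes no more likely than the observed one (small tolerance for ties).
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def detection_test(water_only, sediment_only, both):
    """Split shared detections between media, rounding halves up, then test."""
    shared = ceil(both / 2)
    water = water_only + shared
    sediment = sediment_only + shared
    return exact_binomial_p(water, water + sediment)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts per phylum: (water only, sediment only, both media).
counts = {"phylum_a": (2, 14, 3), "phylum_b": (5, 5, 4)}
raw = {name: detection_test(*c) for name, c in counts.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
print(adjusted)
```

Rounding the shared half upwards adds the same number of detections to both media, which pulls the observed proportion towards 0.5 and therefore makes the test conservative.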
Results
Raw sequencing results & OTU generation
Sequencing produced a total of 17.8 million paired end reads, with 15.2 million sequences remaining after paired end read merging and quality filtering. The average number of sequences per sample after filtering (excluding control samples) was 200,185 ± 64,019 (s.d.). No template control samples contained an average of 811 ± 3,402 (s.d.) sequences. One control sample contained ~15,000 sequences that mapped to an operational taxonomic unit (OTU) that had 100% identity match to a sequence of a terrestrial fungus (Genbank: FJ804151.1); excluding this OTU gives an average of 51 ± 94 (s.d.) sequences per no-template control sample. Denoising produced 8,069 OTUs for COI and 2,433 for 18S with 6,435 and 1,679 remaining respectively after LULU curation. Taxonomic annotation identified 395 OTUs from the 18S rRNA dataset against the SILVA database and 219 OTUs from the COI dataset against the MIDORI database. Taxonomic data from WoRMS could be retrieved for 204 of the annotated COI OTUs and 138 of the 18S OTUs.
Biodiversity detection
The effect of different water eDNA sample preservation techniques differed between the target amplicons. The 18S rRNA amplicon produced significantly more OTUs in samples preserved by freezing compared to Longmire’s preservation method, while the COI amplicon showed no significant difference between preservation treatments (see Supplementary Information 2 for details). As a conservative approach all subsequent analyses used sample data from frozen samples. The minimum number of reads per sample was 137,624 for the COI dataset and 117,915 for the 18S dataset and so samples in each dataset were rarefied to the corresponding number of reads. More OTUs in total were detected in the sediment samples compared to the water samples across all sites and both markers as shown in Figure 1b,d. In all cases, both water and sediment samples detected unique OTUs but the mean proportion of unique OTUs detected in water was lower (49.2%) in comparison to sediment (73.8%). A 2-way ANOVA testing the effect of environmental sample type on the number of OTUs generated indicated a significant effect (p<0.001) of sample type for both 18S rRNA and COI (see Supplementary Information 3 for full model output). Ordination plots of Bray-Curtis dissimilarity (Fig. 1c,e) showed that communities detected in sediment and water differ from one another as much as communities differ among sites in ordination space. Additionally, the PERMANOVA model indicated highly significant differences (p<0.001) between sites and eDNA medium in both the 18S rRNA and COI datasets. Furthermore, eDNA detection medium in the PERMANOVA model explained 23.2% and 32.5% of the variation in the 18S and COI data respectively, while the site explained 34.2% and 30.5% in the COI and 18S rRNA data respectively (see Supplementary Information 4 for full model output).
At phylum level (Figure 2), taxonomy does not perfectly predict medium of detection; however, a binomial goodness of fit test shows non-random detection proportions in the Nematoda (Bonferroni corrected p=0.005), with eDNA detections mostly in sediment.
Detection of non-indigenous species
In total, 16 species that are NIS in the study region and 30 species documented as NIS in other countries were detected across the four sites (see Supplementary Table 1 for full list). Of the detected NIS, seven were present in the list of 21 NIS previously detected in manual non-native invertebrate surveys at the sites. As shown in Fig. 3 the results of the eDNA surveys closely matched those of the manual survey results. Only a single detection differed from the manual surveys, an eDNA false negative for Bugula neritina at Site HH. Remapping of the cleaned reads to the Bugula neritina COI region (Genbank Accession: KY235450.1) indicated that 5 reads from a single replicate corresponded with Bugula neritina. These reads were lost during data filtering and so did not feature in the final dataset. A detection of note was 199 reads from the sediment of Site TQ mapping to an OTU corresponding with Arcuatula senhousia (Asian Date Mussel), a previously undocumented NIS for the UK. Targeted visual surveys on tidal mudflats within two kilometres of Marina TQ confirmed the presence of this species in proximity to the sampling site.
Furthermore, COI sequences generated from these physical samples (Genbank Accession: MH924820 and MH924821) matched to known A. senhousia sequences, confirming the eDNA detection (see Supplementary Information 5 for details of DNA barcoding). Additionally, the nemertean Cephalothrix simula and the oligochaete Paranais frici were also detected using eDNA at site TQ. Both are new species introductions to the United Kingdom, previously undocumented in academic literature.
Discussion
We demonstrated that very different community composition can be detected in seawater and sediment samples, affirming the importance of integrating multiple sample types for obtaining a full assessment of community composition. We also found that eDNA metabarcoding shows excellent concordance with traditional methods for the detection of NIS. Finally, we demonstrated that eDNA metabarcoding can detect novel species introductions. This study emphasises the need to have a thorough understanding of the effects of environmental sample types both at the level of whole community and for specific species of concern. eDNA metabarcoding continues to be used in both regional and global biodiversity surveys.
The majority of research using eDNA to detect aquatic macrofauna is based on the collection of water samples, while sediment samples have received comparatively less attention. We found dramatic differences in species richness in sediment and water samples, observing a consistently greater number of OTUs detected in sediment compared to water. Shaw et al. (2016) found that sediment 12S rRNA metabarcoding detected fewer fish compared to water in a freshwater lotic environment. Our results indicated a similar trend when considering only fish species, with more fish being detected in seawater samples compared to sediment samples (5 in water, 2 in sediment and water), but the opposite when considering all OTUs. More broadly, taxonomy at the level of phylum did not predict if a species will be detected in water, sediment or both sample types (except the Nematoda, whose members are predominantly benthic inhabitants). It is likely, as seen in the case of fish above, that at lower taxonomic levels the species-specific ecology of eDNA (sensu Barnes & Turner 2015) will result in convergent eDNA occupancy in different sample types. However, further work is needed to clarify how eDNA partitions into adjacent environmental samples across the tree of life. Our study showed that at the level of phyla detection was not significantly different between sediment and water for most taxa. Similarly, we showed that for most NIS both water and sediment samples served as excellent media for detection.
Current eDNA metabarcoding research has identified large variation in the detected diversity across small spatial scales in both sediment (Nascimento et al. 2018) and seawater (O’Donnell et al. 2017). Additionally, fractionation of environmental samples (i.e. sorting samples by particle size class) can produce significant differences in the metabarcoding results between fractions (Wangensteen et al. 2018a; b), indicating that reliable variation can be discovered at the scale of a site. Here we found similar patterns, with site and eDNA sample type containing approximately equivalent OTU biodiversity. Future research should explore how eDNA extracted using different sample types and extraction methods affects the detection of biodiversity, especially as eDNA metabarcoding moves from an experimental technique to a routine monitoring tool (Pawlowski et al. 2018; Aylagas et al. 2018).
We found that eDNA metabarcoding of water samples accurately detects many NIS, as seen in previous work (Borrell et al. 2017; Grey et al. 2018; Lacoursière-Roussel et al. 2018). Additionally, we identified the presence of NIS in sediment samples for the first time. In comparing these data to those collected using traditional methods we found almost perfect agreement in NIS presence. The single false-negative was found to be a result of bioinformatic parameters, showing that choices made during sequence processing can affect the detectability of species in eDNA samples. Indeed, this has previously been shown in metabarcoding of bulk samples (Scott et al. 2018) and work is urgently needed to determine optimal bioinformatic parameters, the effects of primer binding sites and the role of DNA barcodes in reference databases for the detection of NIS in eDNA samples. It is therefore important to combine eDNA metabarcoding with traditional surveys where possible, as both methods provide reciprocal validation data (e.g. important NIS may be missed using molecular techniques, while eDNA metabarcoding may detect rare species that are often missed by traditional surveys). We identified several NIS currently unrecorded in the UK and confirmed the presence of one of them, A. senhousia, with targeted local surveys. The case of A. senhousia is particularly relevant when evaluating eDNA metabarcoding for NIS detection as the species is spreading globally (Bachelet et al. 2009), has the potential to dramatically alter benthic biodiversity (Crooks 2001; Mistri 2003) and may have long remained undetected. Future research should evaluate the sensitivity of eDNA metabarcoding for the detection of novel species introductions, which may allow an unprecedented level of accuracy during early stages of invasion.
The incorporation of autonomous sampling (McQuillan & Robidart 2017) and eDNA biobanking (Jarman, Berry & Bunce 2018) could also help the rapid detection of NIS by improving DNA reference databases for specific geographical regions that have high biosecurity risk, providing an invaluable resource for biodiversity managers and researchers alike.
Author Contributions
L.E.H. and M.R. designed the experiment, L.E.H. collected samples, generated and analysed the data, designed all figures and wrote the first draft of the paper. All authors contributed critically to the drafts and gave final approval for publication.
Data accessibility
Raw Illumina sequencing data is available from the European Nucleotide Archive under accession number (pending).
Associated metadata, script and intermediate files can be found on GitHub with the following DOI: 10.5281/zenodo.1453959.
Acknowledgements
We are grateful to the Bishop Lab (Marine Biological Association) and affiliates for excellent NIS survey data and for sharing information on marinas. We acknowledge staff in the Environmental Sequencing Facility at the National Oceanography Centre Southampton for advice and assistance during library preparation. We acknowledge the Department of Geography & Environment from the University of Southampton for access to coring equipment and lab space. LH was supported by the Natural Environment Research Council (grant number NE/L002531/1).