Abstract
Background In light of the biodiversity crisis and our limited ability to explain variation in biodiversity, it is time to rethink the way we study biodiversity and its causes. Inspired by the recently published ecospace framework, we developed a protocol for environmental and biotic mapping that is scalable to habitats, ecosystems and biomes. We applied our protocol as part of a comprehensive biodiversity study in Denmark. We selected study sites (40 × 40 m) using stratified random sampling along the major environmental gradients underlying biotic variation. Using standard methods, we collected vascular plant, bryophyte, macrofungi, lichen, gastropod and arthropod species lists for each site. To evaluate sampling efficiency, we calculated regional coverage (the number of species found relative to the number of species known from Denmark per taxonomic group) and project-scale coverage (the sample coverage per taxonomic group). To cover eukaryotic organisms that are less easily targeted by classical inventories (e.g., nematodes, “non-fruiting” fungi), we collected soil samples for environmental DNA analyses. Finally, to assess site conditions, we conducted a comprehensive mapping of abiotic conditions (position), biotic resources (expansion of organic carbon) and habitat continuity (spatial and temporal).
Results The 130 study sites covered 0.0005% of the Danish terrestrial area (~42,500 km²). We found 2040 species of macrofungi (62% of the Danish fungal pool), 663 vascular plant species (42%), 254 bryophyte species (41%) and 202 lichen species (20%). For invertebrates, we observed 334 spider species (59%), 126 carabid beetle species (38%) and 105 hoverfly species (36%). Overall, sample coverage was high across taxonomic groups, indicating that 130 sites were sufficient to represent the variation in biodiversity across Denmark. This inventory is unprecedented in detail and resulted in the discovery of 150 species with no previous record for Denmark. Correlations between soil eDNA and observed plant data were strong, confirming that soil-derived DNA recovers the plant biota.
Conclusions We successfully covered the majority of targeted biodiversity across Denmark using an approach that includes habitat coverage, multi-taxon biodiversity assessment, and ecospace mapping. Our approach can be readily applied to assess biodiversity for other ecoregions.
Background
The vast majority of species on Earth have yet to be described, challenging our understanding of biodiversity [1]. For a deeper understanding of what determines the distribution of species across the planet, comprehensive data on species occurrence and environmental conditions are required. While some progress has been made in understanding the distribution of biodiversity at coarse spatial resolution, our knowledge of biodiversity at high spatial resolution is deficient [2]. In this study we consider biodiversity as the richness and turnover of taxonomic units, whether species or operational taxonomic units (OTUs) derived by metabarcoding. Further, models of biodiversity for less well-known but mega-diverse groups such as fungi and insects are almost non-existent across spatial scales [1]. Compared to targeted and systematic monitoring based on well-defined a priori hypotheses, surveillance data are often biased [e.g. temporal, spatial, taxonomic bias, 3] and therefore less appropriate and efficient for conservation management [4].
Recent developments in molecular techniques, in particular the extraction and sequencing of environmental DNA (eDNA), hold the promise of more time-efficient sampling and identification of species [5, 6]. Further, eDNA enables the exploration of communities and organisms not easily recorded by traditional biodiversity assessment, such as soil-dwelling nematodes [7]. In fact, PCR-based methods combined with DNA sequencing have already provided valuable insight into the taxonomic diversity within complex environmental samples, such as soil [8–10] and water [e.g. 11, 12]. Owing to the rapid ongoing development of DNA sequencing technology, including the emergence of next generation sequencing (NGS) techniques that generate billions of DNA sequences [13], an environmental sample could now potentially be analyzed to a depth that gives an almost exhaustive picture of the species composition at the site of collection. However promising, assessment of entire organismal communities from eDNA samples is still in its early stages [6, 12, 14]. To assess the suitability and potential of eDNA data in complementing, or even replacing, traditional field survey data, tests on comprehensive data sets are needed.
Our sampling design followed the ecospace framework [15]. Thus, we aimed at a systematic sampling of the major aspects of environmental conditions (position), biotic resources (expansion) and spatio-temporal extent of biotopes (continuity). Environmental conditions and local processes may be considered a template shaping local biodiversity (e.g. through environmental filtering) [16, 17]. This aspect is reflected in the ecospace position of sampled biotopes in abiotic environmental space. In addition to the physico-chemical conditions shaping abiotic gradients, which are particularly important to autotrophic organisms, we considered the presence and abundance of specific carbon resources, which are crucial to heterotrophic organisms such as specialist herbivores, detritivores and saproxylic species [18]. The recording of organic carbon resources and structures, e.g. dead wood, dung and carcasses, is not often included in community studies, although the limited knowledge in this area [15, 19] calls for further study. Spatial and temporal processes at regional extent, such as extinction, speciation and migration, shape species pools and thereby set the limits to local richness and species composition [16, 17, 20]. In order to improve our understanding of biodiversity patterns, local and regional factors should be considered concurrently [17, 21]. Thus, in our study, we recorded spatial and temporal continuity of the local biotopes.
The aim of this study was to describe a comprehensive biodiversity monitoring protocol and evaluate its efficacy in describing environmental gradient variation and biodiversity across a large region. Here, we present the protocols we used in the project called Biowide to systematically and comprehensively map regional biodiversity and environmental heterogeneity across Denmark - an area of limited geographical extent and with a relatively homogeneous climate. We aimed to cover all of the major environmental gradients, including natural variation in moisture, soil fertility and succession, as well as habitats under cultivation. Within this environmental space, we performed a systematic and comprehensive sampling of the environment and biodiversity. We combined traditional species observation and identification with modern methods of biodiversity mapping in the form of sequencing of eDNA soil samples.
Methods
Study area and site selection
We aimed to characterize biodiversity across Denmark (Fig. 1a), a lowland area of 42,500 km² with an elevational range below 200 m. While there are some limestone and chalk outcrops, there is no exposed bedrock in the investigated area. Soil texture ranges from coarse sands to heavy clay and organic soils of various origins [22]. Land use is dominated by arable cultivation (61 %), most of it in annual rotation, while forest (most of it plantations established in the 19th and 20th centuries) and scrub cover approx. 17 %, natural and semi-natural terrestrial habitats some 10 %, and freshwater lakes and streams 2 %. The remaining 10 % is made up of urban areas and infrastructure [23, 24].
When selecting sites, we considered major environmental gradients, the area we would use as a sampling unit, as well as practicalities of sampling. The selected observational unit was 40 × 40 m, which was a compromise between homogeneity and representativeness. We stratified site selection according to the identified major environmental gradients, including the intensity of human land use. We allocated 30 sites to cultivated habitats and 100 sites to natural and semi-natural habitats. The cultivated subset represented major land-use categories and the natural subset was stratified across natural gradients in soil fertility, soil moisture, and successional stage from sparsely vegetated to closed canopy forest (Appendix A). We deliberately excluded linear features, such as hedgerows and road verges, urban areas with predominantly exotic plants, and saline and aquatic habitats, but included temporarily inundated heath and dune depressions as well as wet mires.
The final set of 24 sampling strata consisted of six cultivated habitat types; three types of fields (rotational, grass leys, set aside) and three types of plantations (beech, oak, spruce). The remaining 18 natural strata constituted all factorial combinations of natural soil fertility (fertile and infertile), moisture (dry, moist and wet), and successional stage (low vegetation with bare soil, closed herb/scrub and forest) (Appendix A). These 24 strata were replicated in each of five geographical regions within Denmark (Fig. 1a). Finally, we included a subset of 10 sites placed within perceived hotspots for biodiversity in Denmark, selected subjectively by public polling among active natural history volunteers in the Danish nature conservation and nature management societies, but restricted so that each region held two hotspots. The result was 130 sites within 18 natural and 6 cultivated strata evenly distributed over the five geographic regions of Denmark (Table 1).
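The arithmetic of the design can be checked with a short sketch. The Python below enumerates the 18 natural strata as factorial combinations and tallies the number of sites. The label strings and variable names are illustrative and are not taken from the project's own code.

```python
from itertools import product

# Stratum labels as described in the text (label strings are illustrative).
FERTILITY = ["fertile", "infertile"]
MOISTURE = ["dry", "moist", "wet"]
SUCCESSION = ["early", "mid", "late"]
CULTIVATED = [
    "rotational field", "grass ley", "set aside",
    "beech plantation", "oak plantation", "spruce plantation",
]

N_REGIONS = 5
HOTSPOTS_PER_REGION = 2

# All factorial combinations of the three natural gradients.
natural_strata = list(product(FERTILITY, MOISTURE, SUCCESSION))

# Every stratum is replicated once per region; hotspots are added on top.
n_strata = len(natural_strata) + len(CULTIVATED)
n_sites = N_REGIONS * (n_strata + HOTSPOTS_PER_REGION)
n_cultivated_sites = N_REGIONS * len(CULTIVATED)

print(len(natural_strata), n_strata, n_sites, n_cultivated_sites)
# prints: 18 24 130 30
```

The totals agree with the 30 cultivated and 100 natural or semi-natural sites (90 stratified sites plus 10 hotspots) given in the text.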
For the 18 natural habitat strata, site selection through stratified random sampling was guided by a large nation-wide data set of vegetation plots (n = 96,400 quadrats of 78.5 m2, www.naturdata.dk) from a national monitoring and mapping project [25] and in accordance with the EU Habitats Directive [26]. We used plant indicator values to identify environmental conditions to select potential site candidates for the targeted strata. First, we calculated plot mean values for Ellenberg indicator values based on vascular plants species lists [27] and Grime CSR-strategy allocations of recorded plants [28], the latter recoded into numeric values following Ejrnæs & Bruun [29]. We initially excluded saline and artificially fertilized habitats by excluding plots with Ellenberg S > 1 or Ellenberg N > 6. We then defined stratification categories as: fertile (Ellenberg N 3.5-6.0), infertile (Ellenberg N < 3.5), dry (Ellenberg F < 5.5), moist (Ellenberg F 5.5-7.0), wet (Ellenberg F > 7.0), early succession (Grime R > 4 and Ellenberg L > 7 or > 10 % of annual plants), late succession (mapped as forest), mid succession (remaining sites).
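The classification rules above can be written as a small decision function. The following Python is a minimal sketch, not the code used in the project. The function and parameter names are hypothetical, and two readings of the text are assumptions: the early-succession rule is parsed as "(Grime R > 4 and Ellenberg L > 7) or more than 10 % annual plants", and that rule is tested before the forest rule.

```python
from typing import Optional, Tuple


def classify_plot(
    ellenberg_n: float,
    ellenberg_f: float,
    ellenberg_l: float,
    ellenberg_s: float,
    grime_r: float,
    annual_pct: float,
    is_forest: bool,
) -> Optional[Tuple[str, str, str]]:
    """Return (fertility, moisture, succession), or None if the plot is excluded."""
    # Saline or artificially fertilized plots are excluded up front.
    if ellenberg_s > 1 or ellenberg_n > 6:
        return None

    # Fertility: infertile below 3.5, fertile from 3.5 to 6.0.
    fertility = "infertile" if ellenberg_n < 3.5 else "fertile"

    # Moisture: dry below 5.5, moist from 5.5 to 7.0, wet above 7.0.
    if ellenberg_f < 5.5:
        moisture = "dry"
    elif ellenberg_f <= 7.0:
        moisture = "moist"
    else:
        moisture = "wet"

    # Assumed precedence: early succession is tested first, then forest.
    if (grime_r > 4 and ellenberg_l > 7) or annual_pct > 10:
        succession = "early"
    elif is_forest:
        succession = "late"
    else:
        succession = "mid"

    return fertility, moisture, succession


print(classify_plot(2.0, 8.0, 7.5, 0.0, 5.0, 0.0, False))
# prints: ('infertile', 'wet', 'early')
```

Returning None for excluded plots keeps the exclusion step separate from the three-way stratification, so that candidate plots can be filtered before strata are counted.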
To reduce transport time and costs, all 26 sites within each region were grouped into three geographic clusters (Fig. 1a). The nested sampling design was also considered an opportunity to take spatially structured species distributions into account [30].
The procedure for site selection involved the following steps:
1) Designation of three geographic clusters within each region with the aim to cover all natural strata while a) keeping the cluster area below 200 km2 and b) ensuring high between-cluster dispersion in order to represent the geographic range of the region. In practice, hotspots were chosen first, then clusters were placed with reference to the highest ranking hotspots and in areas with a wide range of strata represented in our national monitoring plot data [NOVANA, 31].
2) Representing 24 strata in each region by selecting 8-9 potential sites in each cluster. Natural strata were selected from classified field-plot data whereas cultivated strata were assumed omnipresent and used as buffers in the process of completing the non-trivial task of finding all strata within three restricted cluster areas in each region.
3) Negotiating with land owners and, in case of disagreement, replacing the preferred site with an alternative site from the same stratum.
After each of the 130 sites was selected using available data, we established each 40 × 40 m field site in a subjectively selected homogeneous area that accounted for topography and vegetation structure. Each site was divided into four 20 × 20 m quadrants, and from the center of each quadrant a 5 m radius circle (called a plot) was used as a sub-unit for data collection to supplement the data collected at site level (40 × 40 m) (Fig. 1b).
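The site geometry implies a few simple quantities that are worth making explicit. The sketch below computes the area of a site, the total area sampled, its share of the Danish land area, and the area of a 5 m radius plot, using only figures given in the text. The variable names are illustrative.

```python
import math

SITE_SIDE_M = 40.0           # side length of a site
N_SITES = 130                # number of sites
PLOT_RADIUS_M = 5.0          # radius of a circular plot
PLOTS_PER_SITE = 4           # one plot per 20 x 20 m quadrant
DENMARK_AREA_KM2 = 42_500.0  # terrestrial area as stated in the text

site_area_m2 = SITE_SIDE_M ** 2
total_area_km2 = N_SITES * site_area_m2 / 1e6
pct_of_denmark = 100.0 * total_area_km2 / DENMARK_AREA_KM2
plot_area_m2 = math.pi * PLOT_RADIUS_M ** 2
plots_share_of_site = PLOTS_PER_SITE * plot_area_m2 / site_area_m2

print(f"site area: {site_area_m2:.0f} m2")
print(f"total sampled area: {total_area_km2:.3f} km2")
print(f"share of Denmark: {pct_of_denmark:.5f} %")
print(f"plot area: {plot_area_m2:.1f} m2")
print(f"plots as share of site: {plots_share_of_site:.3f}")
```

The plot area of about 78.5 m² equals the quadrat size of the national reference data set, and the share of about 0.0005 % matches the figure reported for the 130 sites.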
Collection of biodiversity data
For each of the 130 field inventory sites, we aimed to make an unbiased and representative assessment of the multi-taxon species richness. Data on vascular plants, bryophytes, lichens, macrofungi, arthropods and gastropods were collected using standard field inventory methods (Appendix B). For vascular plants, bryophytes and gastropods, we collected exhaustive species lists. For the remaining taxonomic groups that are more demanding to find, catch, and identify, we aimed to collect a reproducible and un-biased sample through a standardized level of effort (typically one hour). Each site was carefully examined for lichens and macrofungi assessing various substrates (soil, herbaceous debris, wood, stone surfaces and bark of trees up to 2 m). For fungi, we visited each site twice during the main fruiting season in 2014 – in August and early November - and once during the main fruiting season in 2015 – from late August to early October. Specimens that were not possible to identify with certainty in the field were sampled and, when possible, identified in the laboratory. For arthropod sampling, a standard set of pitfall traps (including meat-baited and dung-baited traps), yellow Möricke pan traps and Malaise traps were operated during a fixed period of the year. In addition, we used active search and collection methods, including sweep netting and beating as well as expert searches for plant gallers, miners and gastropods. Finally, we heat-extracted collembolas and oribatid mites from soil cores. Due to the limited size of the sites relative to the mobility of mammals, birds, reptiles and amphibians, data on these groups were not recorded. Taxonomic data will be transferred to the Global Biodiversity Information Facility (GBIF, http://www.gbif.org/) and specimens to the Natural History Museums, when the project ends in 2017. For further details on methods for collection of biodiversity data see Appendix B.
Collection of eDNA data
We used soil samples collected from all 130 sites for the eDNA inventory. At each site, we sampled soil cores in a 9 × 9 grid (81 soil cores per site) and pooled the collected samples after removal of coarse litter. We homogenized the soil by mixing with a drilling machine mounted with a mixing paddle. A subsample was taken from the homogenized soil and DNA was extracted for marker gene amplification and sequencing [14]. We chose the MiSeq platform by Illumina for DNA sequencing because, relative to other platforms (e.g., 454 by Roche), it produces 15 times the sequence output (approx. 15 000 000 reads) and is adapted to amplicon sequencing [32]. To our knowledge, such a comprehensive regional inventory of soil communities has not been carried out before.
For further details on methods for eDNA data and considerations on eDNA species richness and community composition measures see Appendix B.
Site environmental data
We have followed the suggestion in Brunbjerg et al. [15] to describe the fundamental requirements for biodiversity in terms of the ecospace (position, expansion and spatio-temporal continuity of the biotope).
Position
To assess the environmental variation across the 130 sites, we measured a core set of site factors that described the abiotic conditions at each site. Environmental recordings and estimates included soil pH, total soil carbon (C, g/m2), total soil nitrogen (N, g/m2) and total soil phosphorus (P, g/m2), soil moisture (% volumetric water content), leaf CNP (%), soil surface temperature (°C) and humidity (vapour pressure deficit), air temperature (°C) and light intensity (Lux). For further details on methods for collection of the abiotic data see Appendix B.
Expansion
We collected measurements that represented the organic C resources species consume as well as organic C structures that species can use as habitat. While many invertebrates are associated with other animals, in order to accomplish accurate sampling of the focal species we restricted our mapping of carbon space to the variation in live and dead plant tissue, including dung. We measured litter mass (g/m2), plant species richness, vegetation height (of herb layer, cm), cover of bare soil (%), bryophyte cover (%) and lichen cover (%), dead wood volume (m3/site), dominant herbs, the abundance of woody species, the number of woody plant individuals, flower density (basic distance abundance estimate, [33]), density of dung (basic distance abundance estimate), number of carcasses, fine woody debris density (basic distance abundance estimate), ant nest density (basic distance abundance estimate), boulders density and water puddle density (basic distance abundance estimate). For further details on methods for collection of expansion data see Appendix B.
Mapping of temporal and spatial continuity
For each site, we inspected a temporal sequence of aerial photos (from 1945 to 2014) and historical maps (1842-1945) starting with the most recent photo taken. We defined temporal continuity as the number of years since the most recent major and documented land use change. The year in which a change was identified was accepted as time for ‘break in continuity’. To estimate spatial continuity, we used ArcGIS to construct four buffers for each site (500 m, 1000 m, 2000 m, 5000 m). Within each buffer we estimated the amount of habitat similar to the site focal habitat by visual inspection of aerial photos with overlays representing nation-wide mapping of semi-natural habitat. For further details on methods for collection of continuity data see Appendix B.
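The definition of temporal continuity can be expressed as a one-line calculation. The sketch below is illustrative only. The function name is hypothetical, the reference year of 2014 (the date of the most recent aerial photo) is an assumption, and returning None when no change is documented is a design choice made here, not a rule stated in the protocol.

```python
from typing import Iterable, Optional


def temporal_continuity(
    change_years: Iterable[int], reference_year: int = 2014
) -> Optional[int]:
    """Years since the most recent documented major land use change.

    Returns None when no change is documented up to the reference year,
    i.e. continuity exceeds the period covered by photos and maps.
    """
    years = [y for y in change_years if y <= reference_year]
    if not years:
        return None
    return reference_year - max(years)


# Hypothetical site with documented changes in 1890, 1954 and 1992.
print(temporal_continuity([1890, 1954, 1992]))
# prints: 22
```

Because the sources only reach back to 1842, a site with no documented change has a continuity of at least the length of the record, which is why the sketch does not return a number in that case.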
Analyses
To illustrate the coverage of the three main gradients (moisture, fertility, and successional stage) spanned by the 130 sites, Ellenberg mean site values (mean of mean Ellenberg values for the four 5 m radius quadrats within each site) for soil moisture (Ellenberg F), soil nutrients (Ellenberg N) and light conditions (Ellenberg L) were plotted relative to Ellenberg F, N and L values for a reference data set of 5 m radius vegetation quadrats (47,202 from agricultural, semi-natural and natural open vegetation and 12,014 from forests; www.naturdata.dk) [25]. Mean Ellenberg values were only calculated for quadrats with more than five species, and 95th percentile convex hull polygons were drawn for the reference data set as well as the Biowide data set. We assessed the regional coverage of species in the project with reference to the number of species known from Denmark according to the taxonomic database Allearter (www.allearter.dk). This portal represents the most up-to-date list of species known from Denmark. Coverage (or sample completeness) was estimated for each taxonomic group across sites (Biowide coverage) as well as for each site individually (site coverage) for species groups with abundance data (Diptera, Coleoptera and Araneae) by comparing the number of species found to the estimated species richness of the sample using the iNEXT R-package [34].
To further evaluate how well our inventory covered the environmental gradients, we related community composition to the measured environmental variables (abiotic and biotic) based on Nonmetric Multidimensional Scaling (NMDS) analysis in R v. 3.2.3 [35] using the vegan R-package [36], applied to the plant species × site matrix as well as the macrofungi species × site matrix. Abiotic and biotic variables were correlated with ordination axes to facilitate interpretation.
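NMDS operates on a matrix of pairwise dissimilarities between sites. The sketch below shows how such a matrix can be built from a species × site table. It assumes the Bray-Curtis dissimilarity, a common choice for community data, because the dissimilarity measure is not stated in this section. The function names and the example data are hypothetical.

```python
from typing import List, Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two sites' species vectors."""
    if len(a) != len(b):
        raise ValueError("site vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # assumption: two empty sites are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def dissimilarity_matrix(sites: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise dissimilarities, one row per site."""
    return [[bray_curtis(r, s) for s in sites] for r in sites]


# Hypothetical presence/absence data: three sites, four species.
sites = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
for row in dissimilarity_matrix(sites):
    print(row)
# prints:
# [0.0, 0.5, 1.0]
# [0.5, 0.0, 0.5]
# [1.0, 0.5, 0.0]
```

Sites that share no species have a dissimilarity of 1, and the ordination then places sites so that the rank order of their distances follows the rank order of these dissimilarities as closely as possible.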
To illustrate and substantiate the adequacy of the eDNA sampling protocol and subsequent laboratory protocols, we correlated basic biodiversity measures of community composition (NMDS axes) and richness for plant eDNA (ITS2 marker region) with the same measures for our observed plant data (see Appendix B for detailed methods).
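The comparison between observed and eDNA-based measures rests on rank correlation. As an illustration, the sketch below computes Spearman's rho with average ranks for ties. The function names and the example richness values are hypothetical and do not come from the Biowide data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-site richness: observed plants vs. plant eDNA OTUs.
observed = [12, 30, 30, 55]
edna_otus = [8, 21, 25, 40]
print(round(spearman_rho(observed, edna_otus), 3))
# prints: 0.949
```

A rank-based measure is a natural choice here because OTU counts and species counts are not on the same scale, so only their ordering across sites is compared.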
Results
The 130 sites were distributed in 15 clusters nested within five regions across Denmark (Fig. 1a). The measured variables differed according to the initial stratification of sites based on simple indicators (Table 1, Fig. 2a, b, ranges of measured variables in Appendix C). Managed sites (plantations and agricultural fields) revealed little variation in soil moisture (Fig. 2b). The hotspots spanned the full variation of natural sites regarding fertility, moisture and successional stage (Fig. 2b).
The selected 130 sites covered the main gradients reflected by a huge reference dataset from a national monitoring program (Fig. 3) as judged from a vegetation-based calibration of site conditions regarding moisture, fertility and succession (light intensity). Biowide data seemed to increase the upper range of the fertility gradient, which can be explained by the inclusion in Biowide of rotational fields that were not included in reference data (Fig. 2b, 3).
The environmental expansion of ecospace, which was measured as the amount and differentiation of organic carbon sources, varied among habitat types with high litter mass in plantations and late successional habitats, high plant species richness in early and mid-successional habitats, high dung density in open habitats (early successional and fields) and high amounts of dead wood in late successional habitats (Fig. 4). Spatial and temporal continuity varied for the 130 sites with less spatial continuity at larger buffer sizes (Fig. 5). The number of species found per site differed with taxonomic group with the highest number for macrofungi and lowest for bryophytes and lichens (Fig. 6).
We collected 2040 species of macrofungi (corresponding to 62 % of the number of macrofungi recorded in Denmark), 202 lichens (20 %), 663 vascular plants (42 %) and 254 bryophytes (41 %) during the monitoring period. We collected 75 species of gastropods (75 %), 334 spiders (59 %), 105 hoverflies (36 %), 126 carabid beetles (38 %) and 203 galler and miner species (21 %). For all groups except macrofungi, the number of species found was highest in natural sites (90 sites of 130), but across taxonomic groups, plantations and agricultural fields harbored unique species; plantations were particularly important in harboring unique species of macrofungi (Table 2). The taxonomic sample coverage calculated by rarefaction within the 130 sites was high overall (range: 0.86-0.99), highest for gastropods and spiders and lowest for gallers and miners (Table 2).
The inventory was unprecedented in detail and resulted in a total of 118 new macrofungi, 1 new lichen and 32 new invertebrate species (of which 12 were gallers and miners and 3 spiders) that had not previously been documented in Denmark (Table 2).
The NMDS ordination (3-dimensional, final stress = 0.102) accounted for 81 % of the variation in plant species composition and 72 % of the variation in macrofungal species composition (3-dimensional, final stress = 0.146). The major gradients in plant species composition of the 130 sites correlated strongly with soil fertility (NMDS axis 1 strong correlation with soil N, P and pH), successional stage (NMDS axis 2 strong correlation with light intensity and opposite correlation with litter mass and number of large trees) and soil moisture (NMDS axis 3 strong correlation with measured soil moisture), reflecting the gradients that the sites were selected to cover (Fig. 7, see correlation matrix for the rest of the environmental variables in Appendix D). Macrofungal species composition showed the same gradients, however succession and fertility swapped with succession as primary gradient (NMDS1) and fertility as secondary gradient (NMDS2). NMDS axis 3 reproduced a strong correlation with soil moisture.
Spearman rho correlations between observed plant species richness and eDNA OTU ‘richness’, and between observed plant community composition (as represented by NMDS axes 1-3) and eDNA OTU composition, were strong and confirmed the recovery of plant diversity by soil-derived DNA (R² for richness = 0.652; R² for composition = 0.577-0.697; Fig. 8).
Discussion
Using ecospace as a conceptual framework [15], we developed a protocol for mapping terrestrial biodiversity at a regional level and covering numerous, mega-diverse taxa. Across the 130 surveyed sites, covering a tiny fraction (0.0005 %) of the total area of Denmark, we observed a total of ~5 700 species, of which 150 represented new species records for the country. Depending on taxonomic group, this corresponded to 20-75 % of the species known from the region. Our data indicated that the sampling at 130 sites sufficiently covered the known local and regional environmental variation of Denmark and also delivered a good coverage of biodiversity at the spatial scale of sites, even for diverse groups of invertebrates and fungi. Finally, the study demonstrates that eDNA data, once properly curated (Frøslev et al. submitted), may be used as an important supplement to classical biodiversity monitoring.
Environmental filtering is an important process in community assembly, reflecting the prominent role of niche-differentiation in evolution [37]. The most obvious design principle for a biodiversity inventory is, therefore, to stratify sampling according to major abiotic and biotic environmental gradients [e.g. 38]. We found a close correspondence between the variation in average Ellenberg values at our sites and those extracted from a very large vegetation database comprising vascular plant species lists from a national monitoring program. This indicates an almost complete gradient coverage in our study and allows us to generalize relationships between environment and biodiversity derived from local measurements across gradients to a large spatial extent. Although the use of stratified random sampling implies a biased representation of rare and common environmental conditions, complete random sampling would have led to limited representation of natural biotopes and their disproportionate contribution to the total biodiversity may have been missed.
While the ecospace framework helped structure our sampling, it also proved challenging with respect to decisions about site area and homogeneity (related to ecospace position), recording of carbon resources (assessing ecospace expansion) and definition of temporal and spatial continuity. Ideally, abiotic and biotic conditions should be homogeneous across a site in order to ensure that site measurements reflect the abiotic position and biotic expansion [15]. Thus, site area was a trade-off between homogeneity (small area) and representativeness (large area), and across long environmental gradients, homogeneity and representativeness may vary among, for example, grassland, heathland, and forest. Similarly, while counting the number of different plant species is easy, accounting for the relative contribution of each species to total biomass and measuring the availability of different pools of wood, woody debris, litter, dung, flowers and seeds is much harder. Finally, spatial and temporal continuity is hard to quantify due to data limitations and because past soil tillage, fertilization, or other land management or disturbance regimes have not been recorded and must be inferred. In addition, an unambiguous definition of continuity breaks is impossible given that most land use changes and derived community turnover occur gradually over time. We estimated spatial continuity using broad habitat classes at a range of scales (500 m, 1000 m, 2000 m, 5000 m), acknowledging that the dependency on spatio-temporal continuity depends on the mobility, life history and habitat specificity of different species. Our estimates of temporal continuity were also limited by the availability of aerial photographs and maps, which, while not perfect, is good relative to other parts of the world. Despite these constraints, our estimates of spatial and temporal continuity varied among sites and were uncorrelated, which allowed us to statistically test for their relative roles.
We aimed for equal sampling effort per site in terms of trapping and searching time. However, this was challenged by an array of practicalities. The preferred species sampling methods varied among taxonomic groups [39, 40] and despite our application of a suite of methods, including passive sampling in pitfall traps and Malaise traps, baited traps, soil core sampling and active search, our taxonomic coverage is still incomplete (e.g. aphids, Phorid flies and other species-rich groups living in the canopy are inevitably under-sampled). Our budget also forced us to be selective with the identification of the most difficult species groups, in particular within Hymenoptera and Diptera. Among identified groups, species coverage ranged between 20 and 75 % of all species known to Denmark, which is quite satisfactory. Invertebrate sampling and identification is extremely time consuming and relies on rare taxonomic expertise. We spent more than half of the inventory budget on invertebrate sampling and identification, and yet the site coverage remained modest in some sites across all invertebrate groups. Invertebrates constitute by far the largest fraction of the total biota and, for many species, the adult life stage is short-lived, highly mobile, and the range of active species varies with season [41, 42]. Trapping also implies a certain risk of suboptimal placement or vandalism by visiting humans, domestic livestock or wild scavengers. The resulting number of invertebrate species per site is relatively high and revealed a considerable variation, which gives ample opportunity for comparative analyses. Furthermore, the high coverage of invertebrates across the full 130 sites indicated that the variation in site conditions and biota was adequately sampled. 
The high number of new species for Denmark, particularly macrofungi, can most likely be attributed to the effort, but also to the inclusion of habitat types that would otherwise have been avoided or overlooked during voluntary monitoring. This underpins the qualitative differences between surveillance data and targeted monitoring [3].
Although methods for DNA extraction, amplification, sequencing and bioinformatic processing are continuously improved and may lead to better biodiversity metrics from environmental samples, collecting representative samples from larger areas with unevenly distributed species remains a challenge. We pooled and homogenized large amounts of soil, followed by extraction of intracellular as well as extracellular DNA from a large subsample, to maximize diversity coverage within a manageable manual workload. Biodiversity metrics based on plant DNA were correlated with the same metrics for observational plant data. This indicates that the procedure for sampling, DNA extraction and amplification can be assumed to be adequate for obtaining amplicon data that quantify variation in biodiversity across wide ecological and environmental gradients for plants, and most likely also for other organisms present in the soil. These methods are promising for biodiversity studies of many organism groups that are otherwise difficult to sample and identify (e.g. nematodes, fungi, protists, and arthropods). High throughput sequencing (HTS) methods produce numerous errors [e.g., 43, 44] and it has been suggested that richness measures should be avoided altogether for HTS studies [45]. Despite the remaining challenge of relating genetic units to well-known taxonomic entities, our results along with those presented in Frøslev et al. (in review) indicate that reliable metrics of α-diversity and community composition are achievable. With respect to taxonomic annotation, reference databases are far from complete and the taxonomic annotation of reference sequences is often erroneous. Furthermore, for many groups of organisms, we have still only described and named a fraction of the actual species diversity, and the underlying genetic diversity within and between species is largely unknown for most taxa, leading to uncertainties in OTU (or species) delimitation and taxonomic assignment of sequence data.
This also means that ecological interpretation of OTU/species assemblages assessed by eDNA is largely impossible, as there is little ecological knowledge that can be linked to OTUs. Thus, for eDNA-based biodiversity assessment to mature further, molecular biologists, ecologists and taxonomists need to work closely together to produce well-annotated reference databases and to relate unnamed OTUs to well-described ecospaces. Our environmental samples for eDNA, including soil and litter samples as well as extracted DNA, will be preserved for the future. This material represents a unique resource for the further development of methods within ecology and eDNA. As more efficient technologies become available, it will be possible to process this material at an affordable cost and derive further insights into the relationship between traditional species occurrence, OTU data and environmental variation.
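The comparison of DNA-based and observational plant metrics described above can be illustrated with a minimal sketch. It is not the analysis used in this study: the site names, OTU and species labels, counts, and the choice of Spearman rank correlation are all assumptions made for illustration only. The sketch computes per-site richness from a site-by-taxon table and then correlates OTU richness with observed species richness.

```python
from statistics import mean

def richness(table):
    """Number of taxa with a non-zero count at each site."""
    return {site: sum(1 for n in counts.values() if n > 0)
            for site, counts in table.items()}

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data (not from the study): read counts per OTU
# and presence/absence of observed plant species at three sites.
otu_reads = {
    "site_a": {"otu1": 12, "otu2": 0, "otu3": 0},
    "site_b": {"otu1": 3, "otu2": 7, "otu3": 0},
    "site_c": {"otu1": 5, "otu2": 1, "otu3": 9},
}
observed = {
    "site_a": {"sp1": 1, "sp2": 1, "sp3": 0, "sp4": 0},
    "site_b": {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 0},
    "site_c": {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1},
}

sites = sorted(otu_reads)
otu_richness = richness(otu_reads)
obs_richness = richness(observed)
rho = spearman([otu_richness[s] for s in sites],
               [obs_richness[s] for s in sites])
print(f"Spearman correlation of OTU and observed richness: {rho:.2f}")
```

A rank-based coefficient is used in this sketch because OTU counts and species counts need not be on the same scale; only the ordering of sites along the richness gradient is compared.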
Conclusion
We have presented a generic protocol to obtain a representative, unbiased sample of multi-taxon biodiversity stratified with respect to the major abiotic gradients. By testing and evaluating the protocol, we conclude that it is operational and that observed biodiversity variation may be accounted for by the measured abiotic and biotic variables. We believe that the ecospace concept, on which this protocol is based, can be successfully up-scaled and applied to biodiversity studies at regional to continental scale. Despite the obvious economic and logistic advantages of eDNA data, barcode reference libraries are as yet far too incomplete. Thus, combining classical taxonomic identification with eDNA sampling is a promising approach for biodiversity science.
Additional files
Additional file 1: Appendix A: Site characteristics for each of the 130 sites (40 × 40 m).
Additional file 2: Appendix B: Protocols for data collection.
Additional file 3: Appendix C: Ranges of environmental (abiotic and biotic) variables measured within the 130 sites as well as species richness of various taxonomic groups.
Additional file 4: Appendix D: Correlation matrix for NMDS axes 1, 2 and 3 and environmental variables.
Authors’ contributions
AKB, HHB, AC, TGF, TL, MDDH and RE conceived and designed the study. AKB, HHB, LB, KF, TGF, IG, TL, GN, LS, US and RE conducted field work. AKB, RE and TGF analyzed the data and prepared the figures. LB, KF, IG, MDDH, TL, LS, US and HHB sorted and identified specimens. AKB, HHB, LB, ATC, KF, IG, MDDH, TTH, TGF, TL, GSN, LS, US and RE wrote the manuscript. All authors have read and approved the final version of the manuscript.
Acknowledgements
We thank Ako O. Mirza for plant and soil lab work, Vagn Alstrup and Roar Skovlund Poulsen for lichen monitoring, volunteers that have helped in data collection, land owners, Karl-Henrik Larsson for aid in identifying critical corticioid fungi, and Leif Örstadius for identifying Psathyrella collections. In regard to identification of invertebrate specimens we would like to thank Henning Petersen for identifying Springtails (Collembola), Hjalte Kjærby for identifying Grasshoppers (Orthoptera) and Harvestmen (Opiliones), Kåre Fog for identifying snails (Gastropoda), Kåre Würtz Sørensen for identifying various Wasps (Symphyta, Sphecidae, Crabronidae, and Vespidae), Lars Dyhrberg Bruun for identifying spiders (Araneae), Lars Brøndum for identifying Hoverflies (Syrphidae), Carrion Beetles (Silphidae) as well as Scarabs (Scarabaeidae), Lars Skipper for identifying True bugs (Heteroptera), Maja Møholt for identifying Dung beetles (Aphodius, Onthophagus and Geotrupidae) and Cantharidae, Marianne Graversen for identifying Longhorn beetles (Cerambycidae) and Ladybugs (Coccinellidae), Peter Wiberg-Larsen for identifying Caddisflies (Trichoptera), Mathias Holm for identifying True weevils and Seed weevils (Curculionidae and Apionidae), Monica Aimeé Harlund Oyre for identifying various Dipterans (Syrphidae, Tachinidae, Stratiomyidae, Acroceridae, Rhagionidae, Tephritidae, Platystomatidae, Asilidae) as well as Strepsipterans (Strepsiptera) and Book- and Barklice (Psocoptera), Morten D. D. Hansen for identifying Bees (Apoidea), Carrion beetles (Silphidae), Click beetles (Elateridae), Scarabs (Scarabaeidae) and Dung beetles (Aphodius, Onthophagus and Geotrupidae), Ole Fogh Nielsen for identifying net-winged insects (Neuroptera) and Strepsipterans (Strepsiptera), Oskar Liset Pryds Hansen and Emil Skovgaard Brandtoft for identifying Ground beetles (Carabidae), Sofie Amund Kjeldgaard and Steffen Kjeldgaard for identifying Owlet moths (Noctuidae), Mathias G. Skytte for identifying Rove beetles (Staphylinidae), Simon Haarder for identifying galling and mining arthropods, and Ulrik Hasle Nielsen for identifying Cicadas (Cicadoidea) as well as numerous other volunteers. For carrying out the eDNA lab work we would like to thank Anne Aagaard Lauridsen, Sarah Mak, Stine Raith Richter, Carlotta Pietroni and Ida Broman Nielsen.