Effects of species traits and ecosystem characteristics on species detection by eDNA metabarcoding in lake fish communities

Although environmental DNA (eDNA) metabarcoding is acknowledged to be an exceptionally useful and powerful tool for monitoring surveys, it has limited applicability, particularly for nationwide surveys. To evaluate the performance of eDNA metabarcoding in broad-scale monitoring, we examined the effects of species ecological/biological traits and ecosystem characteristics on species detection rates and the consequences for community analysis. We conducted eDNA metabarcoding on fish communities in 18 Japanese lakes on a country-wide scale. By comparing species records, we found that certain species traits, including body size, body shape, saltwater tolerance, and habitat preferences, influenced eDNA detection. We also found that the proportion of species detected decreased significantly with an increase in lake surface area, owing to an ecosystem-size effect on species detection. We conclude that species traits, including habitat preferences and body size, and ecosystem size should be taken into consideration when assessing the performance of eDNA metabarcoding in broad-scale monitoring.

Japan. 13 Department of Ecology and Environmental Sciences, Natural History Museum 25 and Institute, Aoba-cho, Chuo-ku, Chiba, Japan 26 27 † These authors contributed equally to this study. 28 29 Introduction 51 An ongoing global decline in biodiversity is beyond doubt and is particularly 52 pronounced in freshwater ecosystems [1][2][3] . Nevertheless, despite concerns regarding the 53 loss of local/global freshwater fish diversity 1,2,4 , there has been a recent tendency to 54 scale back on investments in monitoring and museum collections 5,6 . An understanding 55 of the prevailing status and trends of biodiversity provides a bedrock for ecological 56 research and conservation biology 1,2 , and in this regard, Matsuzaki et al. 7 highlighted 57 that long-term trends in local-scale fish diversity have not been comprehensively 58 monitored at broader scales (i.e., national or continental scales) and that performing 59 broad-scale surveys of fish communities is essential for planning fish management and 60 conservation measures 8 . However, the number of lake fish surveys conducted by 61 national and local governments have recently declined 5,6,9 . Given this undesirable trend, 62 it would be prudent to develop broad-scale survey methods for fish monitoring, which 63 could be employed in conducting national-level surveys and thereby provide a basis for 64 formulating conservation policies. 65 Recent advances in molecular ecology have seen the emergence of 66 environmental DNA (eDNA) analysis as a useful approach for investigating the 67 distribution and richness of aquatic and terrestrial organisms 10-16 , and high-throughput 68 parallel DNA sequencing has recently been applied to eDNA methods for simultaneous 69 detection of multiple taxa, known as eDNA metabarcoding [16][17][18][19][20][21]  In total, 119 fish taxa were detected based on eDNA metabarcoding, and all detected 125 taxa were identified to the species or genus level (Table S4, Figs. 1 and S3). We found 126 significant differences among the different sampling methods with respect to the 127 number of fish taxa detected using eDNA metabarcoding (nested ANOVA with LMM, 128 P < 0.001, Table S5 and Fig. S4). However, we also detected six species in 23 of the 129 negative and field controls. We obtained sequence reads from six species (Carassius 130 auratus, Chelon haematocheilus, Micropterus salmoides, Pagrus major, Rhinogobius 131 spp., and Tribolodon hakonensis, Table S4). Furthermore, five species found in the fish 132 records had lower sequence reads (sequences < 276) in the controls than in the samples. 133 In contrast, Pagrus major (Red seabream) was not found in the records and does not 134 inhabit the surveyed lakes. The number of sequence reads obtained for P. major 135 (sequences = 29 and 452) were found to be higher than those species in the samples 136 (sequences= 20 and 21). Thus, we assumed that the species DNA was contaminated and 137 excluded the species from further analyses. 138 We found differences in the number of fish taxa in the different lakes (Figs. 1 139 and S5), and the number of fish taxa detected using the Mix_cool and Mix_freeze 140 methods was significantly higher than those detected based on individual sampling (Fig.  141 2a, nested ANOVA with LMM, P < 0.001 and Tukey's multiple comparison, Table S5). 142 Nevertheless, we found that some species detected in the individual samples were not 143 detected in the Mix_cool and Mix_freeze samples (Table S4). 144

Percentage record covered 146
To evaluate the species detection performance of eDNA metabarcoding, we used 147 "percentage record covered (%)" as a measure to indicate the proportion of eDNA-148 detected species (number of species) matching those in the species records. We 149 accordingly found that the percentage record covered did not differ significantly among 150 the three sampling methods (Fig. 2b, nested ANOVA with LMM, P = 0.249, Table S5), 151 although values were found to differ among the surveyed lakes (Fig. S6). 152 We also compared the ecological/biological traits of species groups frequently 153 detected using eDNA metabarcoding or in the species records, which was evaluated 154 based indicator taxa analysis (Fig. 3). All the nested ANOVAs with linear mixed 155 models (LMMs) for the fish traits revealed significant differences between eDNA 156 metabarcoding and the species records, whereas no significant differences were detected 157 among the different sampling methods (Table S6). We found that species detected at a 158 higher frequency using eDNA metabarcoding tended to have shorter body lengths (Fig.  159 3a), mainly inhabit the benthopelagic zone (Fig. 3b), and are saltwater tolerant (Fig. 3c). 160 With respect to lateral body shape types, smaller proportions of eel-like species, either 161 short, deep bodied, or both, were identified among those species detected with a higher 162 frequency using eDNA metabarcoding than in the species records (Fig. 3d). 163 To evaluate the effect of lake limnological features on the percentage record 164 covered, we used generalized linear models (GLMs), which revealed a significant 165 negative effect of surface area on the percentage record covered (P < 0.018, Table S7, 166 Fig. 4), whereas for all sampling methods used, there were no significant differences 167 with respect to latitude, mean water depth, trophic state, or water type (P > 0.05, Table  168 S7). The relationship between the percentage record covered and the non-included 169 factors to the final GLMs are shown in Fig. S7. 170 171

Community analysis 172
Using indicator taxa analysis, we established that certain species were 173 detected at a significantly higher frequency using eDNA metabarcoding than indicated 174 in the species records for the 18 surveyed lakes (please refer to Table 1 for species with 175 significantly higher frequencies determined by eDNA metabarcoding and all results in 176 Table S8). For example, C. auratus was significantly more frequently detected using all 177 three sampling methods, whereas Zacco platypus was significantly more frequently 178 detected in the individual and Mix_freeze samples, Hemibarbus labeo and 179 Hypophthalmichthys nobilis were more frequently detected in individual samples, and 180 Rhinogobius spp. were significantly more frequently detected in Mix_freeze samples. In this study, we observed a difference in the ecological/biological traits of fish species 191 and lake limnological features relating to the eDNA-evaluated community structure and 192 species detection based on broad-scale eDNA and direct surveys. 193 We initially compared the efficacy of three different sampling methods 194 (individual, Mix_cool, and Mix_freeze sampling) for water collection, and accordingly 195 found that the number of fish taxa detected using Mix_cool and Mix_freeze methods 196 was significantly higher than that detected using individual sampling. In contrast, we 197 detected no significant differences among these sampling methods based on the 198 percentage of record covered, NMDS, or indicator taxa analyses. In this regard, Sato et 199 al. 20 suggested that mixed samples can be used for fish community comparisons despite 200 a somewhat lower detection rate for certain rare species, and this indeed appeared to 201 hold true with respect to the lakes we sampled in the present study. 202 Our findings tended to indicate that the water transportation method (cool vs. In order to evaluate the performance of eDNA metabarcoding, we examined 215 the percentage record covered for all three sampling methods. We accordingly found 216 that the values obtained for species records were approximately 30%, thereby indicating 217 that many of the species inhabiting the surveyed lakes, particularly pelagic species and 218 those with lower occurrence, would not be detected using eDNA metabarcoding. 219 Below, we consider the factors (limnological features and species traits) that might 220 contribute to the lower detection rates obtained using eDNA metabarcoding. 221 As factors that could potentially influence the efficacy of eDNA 222 metabarcoding, we examined the selected ecological/biological traits of fish species, 223 and accordingly found that species with shorter body length and normal body shape, 224 those inhabiting benthopelagic habitats, and those showing tolerance to saltwater were 225 frequently detected using eDNA metabarcoding. In previous eDNA studies, certain 226 biological traits of fish, notably body size, have been considered to influence eDNA 227 detection 30,31 . However, whereas the findings of several experimental studies have 228 indicated that fish of larger body size release larger quantities of eDNA 30 , we found that 229 species with shorter body length, as an index of body size, were detected more 230 frequently using eDNA metabarcoding than their occurrence in the species records. We 231 suspect that these differences are probably attributable to the size-abundance 232 distribution of fish communities, which indicates that species with smaller body sizes 233 are more abundant in lakes than are those of larger body size 34 . Lakes, particularly those 234 characterized by brackish waters, harbor numerous marine-migratory fish species that 235 show tolerance to saltwater, and eDNA metabarcoding surveys of lakes have detected 236 many such migratory species 16,27 . Given that the timing of occurrence of species 237 influences the detection of their eDNA 35 , the timing of migration would influence the 238 detection of migratory species in lakes using eDNA metabarcoding, and consequently, 239 may contribute to the infrequent detection of saltwater-tolerant species by eDNA 240 metabarcoding in eDNA sampling surveys. Spatial heterogeneity in fish eDNA 241 concentrations/detection in aquatic ecosystems has previously been observed 10,15,25,26, 242 and Pont et al. 31 found that benthic species were detected at a higher rate using eDNA 243 metabarcoding than by using traditional methods. This phenomenon would appear to 244 suggest that the habitat preferences of fish might influence their detectability using 245 eDNA metabarcoding. In this regard, the lakeshore water sampling conducted in the 246 present study revealed the presence of numerous benthopelagic species based on eDNA 247 metabarcoding, but relatively few pelagic species. Accordingly, in order to detect a 248 larger number of pelagic species, we would need to perform offshore sampling, for 249 example, using drone-assisted water collection for eDNA surveys 36 . The lateral body 250 shape traits of species may also be associated with their micro-habitat use; for example, 251 species with eel-like and short and/or deep body shapes appear to prefer sheltered 252 micro-habitats (i.e., beneath stones) and pelagic zones. Consequently, the preference for 253 macrohabitats, such as the benthic and pelagic zones, and their microhabitat use with 254 respect to body shape types could serve as useful indices for evaluating eDNA 255 detectability. 256 Evaluation of the effects of lake characteristics on the percentage record 257 covered based on our modeling results revealed significant negative effects of lake 258 surface area. Lakes with a larger ecosystem size are characterized by a broad diversity 259 of habitats types, and correspondingly tend to support a higher diversity of fish 260 species 37,38 and an increased proportion of pelagic zones 39 . Consequently, the size of 261 lake ecosystems will tend to influence the rate of detection using eDNA metabarcoding. 262 With respect to rivers, Bylemans et al. 17 found that river morphology influences the 263 optimal sampling strategy for eDNA metabarcoding in these habitats, and in the present 264 study, we identified a similar effect of ecosystem morphology on eDNA sampling 265 strategy in lakes. Taking into consideration the factors of sampling effort and 266 expenditure, broad-scale surveys, such as nationwide surveys, tend to be constrained 267 with respect to limited sample sizes, and given these circumstances, it may thus be 268 important to assess the performance of eDNA metabarcoding-based limited sampling 269 effort. In this regard, our GLM analysis predicted that the percentage record covered 270 would decline to almost half (for example, from 30% to 15%) with an increase in lake 271 surface area from 0.1 to 100 km 2 . 272 The utility of eDNA metabarcoding in evaluating fish community structure in 273 lakes has been examined in previous studies that have compared eDNA metabarcoding 274 and traditional sampling methods 16,26 . Similar to the previous results, we observed 275 significant differences in the fish communities detected using eDNA metabarcoding and 276 those in the species records. As discussed above, lake ecosystem size and the traits of 277 fish species can influence the results obtained using eDNA metabarcoding, and 278 consequently, we would expect these effects to be reflected in differences between fish 279 community structures determined based on eDNA metabarcoding and the species 280 record. We indeed found that similar to that shown in the species records, the fish 281 community structure determined by eDNA metabarcoding showed significant 282 differences among the surveyed lakes, thereby indicating that community analysis using 283 this technique could compare the communities of different lakes. Although eDNA 284 metabarcoding would be useful for broad-scale community surveys at low survey costs, 285 we should carefully consider the influence of ecosystem characteristics and the traits of 286 species within fish communities in such surveys. 287 We acknowledge that the present study does have certain limitations, and thus 288 the conclusions we draw should be viewed as provisional and in need of further 289 verification. Notably, as we performed only single samplings for the eDNA survey, we 290 were unable to evaluate the false negative/positive detection rates for fish species, 291 which is a vital consideration in applying eDNA metabarcoding to biomonitoring 40 . 292 eDNA sampling strategies are also prone to false-positive errors, owing to 293 contamination and/or errors in PCR or sequencing, which may result in the spurious 294 detection of species 41,42 . Accordingly, further studies are needed to confirm false 295 negative/positive detection rates in lake eDNA surveys. 296 In conclusion, on the basis of a comparison with existing fish records, we 297 were able to establish that certain traits of fish species and characteristics of ecosystems, 298 particularly body size/shape, species habitat preferences, and ecosystem size, would be 299 useful indices for optimizing eDNA metabarcoding sampling strategies, as well as for 300 assessing the rates of species detection using eDNA methods. From the perspective of 301 designing eDNA metabarcoding-based surveys to monitor aquatic communities across 302 wide geographical areas, we should consider the ecological/biological traits of species 303 and the characteristics of an ecosystem that can potentially influence species 304 detectability. 305

Study lakes and sampling sites 308
For the survey described herein, we selected 18 lakes distributed throughout Japan (Fig.  309 1), the locations of which are listed in Table S1. The lakes differed with respect to the 310 surface area, water depth, volume, trophic state, and water type (brackish and 311 freshwater), as shown in Table S1. Along the shores of each lake, we established six 312 sampling sites separated by approximately equal distances (Fig. S1), none of which 313 were located in the vicinity of river inflows/outflows. 314 As species-area accumulation curves in community surveys 43,44 , increasing 315 the sample size would increase the detectability of species by eDNA metabarcoding 20,45 . 316 Given that one of our aims in this study was to evaluate the effect of lake size on the 317 detectability of species using eDNA metabarcoding, we conducted surveys with an 318 equal number of sampling sites regardless of lake size. 319 320 Collection water samples for eDNA survey 321 Between 14 July and 4 November 2015, we collected samples of water at each of the six 322 sampling sites located along the shores of the surveyed lakes, using three sampling 323 methods, namely, "Individual," "Mix_cool," and "Mix_freeze" (Fig. S1) (Table S1). 324 For convenience, to evaluate the monitoring methods, we collected water samples from 325 the lakeshore and thereby avoided contamination via floating gears 14 . Initially, for the 326 individual samples, we collected 1 L of surface water in a bottle. To avoid DNA 327 contamination, we sterilized the bottles and all other equipment used in sampling, 328 including filtering apparatus, using 10% commercial bleach (ca. 0.6% hypochlorous 329 acid) followed by washing with DNA-free distilled water. For the Mix_cool and 330 Mix_freeze samples, we collected two 150-mL samples of surface water at each of the 331 six sampling sites, which were mixed in two sterilized bottles to give two composite 332 water samples with final volumes of 900 mL. The "field blank" sample contained 1 L of 333 DNA-free water, which we brought to the field and treated identically to the other water 334 samples, with the exception that it was not exposed to the external environment at the 335 field sites. The individual, Mix_cool, and field blank water samples were stored on-site 336 in a cooler with ice packs and transported to the laboratory in a 4°C refrigerator within 2 337 days. The Mix_freeze samples were stored on-site in a cooler with ice packs, transferred 338 to a -18°C freezer within 12 h, and transported frozen at -18°C within 2 days. The detailed procedures used for bioinformatics analyses have been described 410 previously by Fujii et al. 16 . Initially, low-quality tails were trimmed from each read, and 411 paired-end reads were then assembled. For the 5,225,947 reads thus obtained, primer 412 sequences were removed, and identical sequences (i.e., 100% sequence similarity) were 413 merged using UCLUST (usearch 7.0.1001) 48 . Sequences with ten or more identical 414 reads were subjected to downstream processing. For taxonomic annotation, we 415 conducted a local BLASTN search using BLAST 2.2.29 based on a previously 416 established reference database of fish species for processed reads 22 . For each assessed 417 sequence, the top BLAST hit with a sequence identity ≥98.6% was used for species 418 detection. Note that a majority of the species were identified with a match of ≥99%. 419 Based on the BLAST results, we identified the species and genus using previously 420 described methods 16,20 . The sequence reads in the pipeline processes are listed in Table  421 S2. non-governmental organizations (NGOs) and universities. In the present study, using 434 the same sources, we further added the distributions of secondary freshwater fish, which 435 have a certain degree of salt tolerance and are occasionally able to cross narrow sea 436 barriers, and peripheral freshwater fish, which are derived from marine ancestors and 437 include diadromous fish that migrate between fresh and marine environments. Having 438 compiled the distribution data, we generated a comprehensive dataset of fish fauna 439 consisting of native and exotic fishes after extirpations and introductions. Finally, we 440 obtained records for a total of 242 fish taxa (Table S3). To evaluate the performance of 441 our eDNA metabarcoding survey, we assumed that the records obtained provided a 442 reliable indication of the fish communities inhabiting the selected lakes. 443 We acknowledge the potential limitations associated with analyzing such 444 compiled data, given that these data were not systematically collected. Moreover, fish 445 were collected using a variety of apparatus and techniques (such as electrofishing, bag 446 seines, trap nets, minnow traps, gill nets, hand nets, and fyke nets), by different entities, 447 and within different studies with differing objectives. However, the utilization of 448 integrated multiple sources can serve to minimize bias and summarize fish fauna in 449 terms of temporal and spatial replication 7,48 . 450 , 451

Fish trait data 452
We obtained details on the ecological and biological traits of the detected and recorded 453 fish species from FishBase 50 , which was searched using species names, and obtained the 454 related data using the "species" and "ecology" functions of the "rfishbase" package 455 (ver. 3.1.0) on March 4, 2020. We were able to obtain sufficient data on body length 456 (cm, including both total and standard lengths, but mainly standard length), recorded 457 longevity in the wild (years), lateral body shape types, habitat types, and saltwater 458 tolerance, but were unable to obtain sufficient data (for less than 12% of taxa) for 459 certain traits, such as trophic position, and body weight, for further analysis. 460 461 Lake morphology and characteristic data 462 We obtained data on lake morphology [surface area (km 2 ), maximum water depth (m), 463 mean water depth (m), and volume (km 3 )], and lake types [trophic state and water types 464 (brackish or freshwater)] from Tanaka 51 (Table S1). 465 466

Statistical analyses 467
To evaluate the performance of eDNA metabarcoding with respect to detecting species, 468 we used "percentage record covered (%)" as a measure of the proportion of eDNA-469 detected species matching species recorded in the target lakes, which was calculated 470 from the number of eDNA-detected species matched to the fish records per total number 471 of recorded species in the dataset. 472 All statistical analyses and graphic preparations were performed using R ver. 473 3.6.0 52 , and statistical values were evaluated at a significance level of α = 0.05. 474 To compare eDNA metabarcoding and species record data, we compared 475 taxonomic levels in the species list compiled based on visual surveys with those in the 476 lists compiled using eDNA metabarcoding data (Table S1, S2) with reference to 477 previous studies that have used MiFish primers 16,20 . To confirm that the sequencing 478 depth was sufficient to detect fish diversity in the samples, we used the "rarefy" and 479 "rarecurve" functions of the "vegan" package (ver. 2.5.6) (Fig. S2). Thus, we used raw 480 data for further analyses without rarefying the data. 481 We examined differences in the number of detected fish taxa and percentage 482 records covered evaluated by eDNA metabarcoding among the sampling methods 483 (individual, Mix_cool, Mix_freeze, and field and negative controls) using nested 484 analysis of variance (nested ANOVA) based on LMMs with "lake" as the nested 485 (random) factor, using the "anova.lme" and "lme" functions of the "nlme" package (ver. GLMs. We found that the maximum VIF value was 1.23 and that all VIF values were 496 less than 5. 497 We also performed GLM analysis to evaluate the relationships between the 498 percentage record covered of the species record and the explanatory factors of lake 499 ecosystems, including lake latitude and altitude, morphology (surface area, maximum 500 water depth, mean water depth, and volume), trophic state (eutrophic vs. meso-and 501 oligotrophic), and water types (freshwater vs. brackish) using the "glm" function. We 502 set "binomial" as the error distribution of the GLMs and added the total number of 503 species records as the offset factor. In this case, we found that the maximum VIF value 504 was 49.7, indicating that collinearity among the factors could potentially influence the 505 parameter estimations in the GLMs. After removing the explanatory factors with a VIF 506 value >5 to reduce the collinearity effect on the GLMs, we finally performed the GLMs 507 using latitude, surface area, mean water depth, trophic state, and water types for all 508 sampling methods. 509 We performed indicator taxa analysis 53 to determine those taxa showing 510 significantly different frequencies between the eDNA metabarcoding and species record 511 species lists. The analysis was performed using the "signassoc" function of the 512 "indicspecies" package (ver. 1.7.6) based on presence/absence data 53 . We used mode = 513 1 (group-based) and calculated the P values with 999 permutations after applying 514 Sidak's correction for multiple testing. 515 Differences in community compositions were visualized using non-metric 516 multidimensional scaling (NMDS) with 500 separate runs of real data. The community 517 dissimilarity for NMDS was calculated using incidence-based Jaccard indices. We also 518 calculated NMDS stress to confirm the representation of the NDMS ordination and 519 evaluated differences in community structures between sampling methods and sites 520 using PERMANOVA, for which we used an incidence-based Jaccard similarity matrix 521 and calculated the statistical values with 999 permutations. For NMDS and 522 PERMANOVA analyses, we used the "metaMDS" and "Adonis" functions of the 523 "vegan" package (ver. 2.5.6), respectively.