An integrative assessment of aquatic ecosystem services based on guideline thresholds

Ecologists typically associate water quality with trophic status where oligotrophic ecosystems have excellent water quality and presumably provide more aquatic ecosystem services. However water quality is perceived differently among worldviews. Aquatic ecosystem service provisioning to the public health and agriculture sectors is determined using specific guidelines. But are these guidelines related to trophic status? Here, we developed an integrative ecosystem service framework using guideline thresholds for drinking, swimming, irrigation, suitability for livestock and aquatic wildlife in canadian rivers of varying trophic status. Drinkability was the most sensitive ecosystem service, met in 37% of cases, whereas livestock was the least, provided in 99%. Trophic status is a fair proxy for ecosystem services limited by fecal contamination as nutrients are related to human and animal populations, but not to those limited by metals. Using quantitative thresholds to assess the safe provisioning of multiple ecosystem services provides clear guidance for supporting resource management. In a nutshell Water quality is a commonly used term in management, but the metrics that determine whether a river can safely provide various aquatic ecosystem services differ among worldviews. We propose an integrative approach based on guideline thresholds to evaluate the frequency with which rivers are drinkable, swimmable, suitable for irrigation, livestock, and aquatic wildlife and compared this suitability with trophic status. Trophic status is a fair proxy for ecosystem services limited by fecal contamination, but not for those limited by metals. Using and developing more guideline thresholds provides a concrete way to assess ecosystem service provisioning that could help serve management.


Challenge
Water quality means different things to different people; but regardless of your worldview, an aquatic ecosystem with good water quality supposes that it is capable of providing multiple aquatic ecosystem services (UNEP 2016). Guideline thresholds of specific water usages provide a means to evaluate whether multiple aquatic ecosystem services can be delivered simultaneously or not. This could serve as a novel and valuable approach for management, but has never been applied in an ecosystem service context to our knowledge. From an ecological worldview, trophic status is known to directly and indirectly affect the delivery of multiple ecosystem services. For example, excess nutrients, as a function of human activities on the landscape, can result in toxic cyanobacterial blooms, anoxic conditions, and biodiversity loss resulting in significant environmental and economic damages (Orth et al. 2006, Dodds et al. 2006, Carpenter et al. 2011. But how well does trophic status serve as a proxy to the delivery of multiple aquatic ecosystem services that are typically assessed using guideline thresholds? Using data from Canadian rivers, we propose a novel integrative aquatic ecosystem service framework that tests the hypothesis of whether trophic status is an overarching indicator of water quality.

Water quality across worldviews
Water quality can be defined in multiple ways depending on one's worldview. In order to explore these potential differences, we compared the use of the term water quality throughout the literature across the following research fields: ecological, public health, agricultural, and biodiversity sciences. The use of the term water quality has gained traction within the scientific literature across these four fields since the late eighties ( Figure 1). This acceleration may be expected given the major amendments to water acts around that period in many different countries including the United States, Canada and the United Kingdom (Canada Fisheries Act 1985, US water quality act 1987, CEPA 1988, UK Water act 1989. But are the reasons why we observe this trend similar among fields? In order to evaluate this, the next step was to compare the most commonly used words around water quality with respect to each worldview. Figure 1: The temporal evolution (from 1980 to 2017) of the relative proportion (Z-score) of scientific papers with the term water quality in the title or abstract in the scientific literature in the fields of ecology, biodiversity, agriculture and public health, based on Web Of Knowledge Core Database. See Table S1 and WebPanel 1 for details.
We extracted the words most often associated with water quality in abstracts from over 600 articles published in 2017 in the four fields (Table S2). We then measured the cosine distance between each field (WebPanel1 , Table S3) to compare the co-occurrence of words. We found that the language used around public health was the most distinctive of the worldviews compared. Ecology was more similar to both agriculture and biodiversity, but since there was little overlap between the latter two, the reasons for association with ecology were different. In particular, we observed that in ecology 45% of abstracts included at least one of the following words: nutrient(s) , phosphorus, or nitrogen . This is consistent with the notion ecologists have of trophic status providing an indication of aquatic ecosystem health through low algal biomass and water clarity (Anders and Ashley 2007). By comparison, in public health, within the top 10 words were drinking, risk, contamination, and E. coli (Table S2). This emphasizes the worldview around the main water use of public concern, clean drinking water, and the metric of contamination most often used to evaluate potability is the presence of Escherichia coli (E. coli, Table S2). For agriculture, water is withdrawn for crop irrigation or livestock production which both have different criteria to assess suitability for each aquatic ecosystem services. However, agriculture itself impacts water quality and is globally recognized as the main source of nutrient pollution to aquatic ecosystems (Gordon et al. 2010, Mateo-Sagasta et al. 2017. This is reflected in the keywords which include components of the different ecosystem services for agriculture (e.g. crop yield, irrigation) and words associated with nutrients including nitrogen , phosphorus , and nitrate , which were present in 51% of the abstracts evaluated. In the biodiversity literature, the words associated with water quality were more similar to what we observed in ecology, although nutrients did not emerge while the more frequently used terms were related to diversity metrics (e.g. abundance , index , indicators ).
Overall there is surprisingly little overlap in the terms associated with water quality among worldviews. One possible explanation for this divergence is that every worldview focuses on different aquatic ecosystem services each with their own threat to provisioning that serves as the guideline threshold. One of the few exceptions was the term management , which was used across all fields . This emphasizes the need to manage ecosystems to maintain water quality for multiple ecosystem services, which integrates different worldviews.

Threshold guidelines of different aquatic ecosystem services
To assess the provisioning of multiple ecosystem services at once while evaluating their potential tradeoffs, an ecosystem service bundle approach has been proposed (Raudsepp-Hearne 2010). This, however, is typically done using continuous variables whereas much of water quality management often relies on guidelines using thresholds (Hart 1993). This offers an easy to implement, binary decision-making tool on whether an ecosystem service can be delivered or not, bridging management with ecosystem service science (Daily et al. 2009, Guerry et al. 2015. Given that many aquatic ecosystem services have their own set of biological and/or chemical guidelines, we assessed the ability of an aquatic ecosystem to safely provide conditions for drinking, irrigation, swimming, livestock, and biodiversity maintenance simultaneously. Thus, we were able to test several provisioning ecosystem services, a cultural one as well as biodiversity maintenance which is considered by some a regulatory ecosystem service and by others a regulator of ecosystem processes (Mace et al. 2012). In fact it can even be considered more broadly within the context of recreation and provisioning with regards to fisheries for example (Ziegler et al. 2017).
Guidelines for water quality assessments across worldviews are typically divided into five different classes that include biological indicators, inorganic chemicals, heavy metals, pesticides, and other organic pollutants (Health Canada 2017). Although water quality guidelines are regionally distinctive (Boyd 2006), for this exercise we used a series of Canadian guidelines. They were selected for their measurement frequency in publicly available databases as well as for their use in assessing multiple ecosystem services (Table  S4). For livestock and irrigation, guidelines were derived from the Canadian Council of Ministers of the Environment (CCME 1999), whereas for drinking and recreation, these came from Health Canada (Health Canada 2012. Finally, for biodiversity, we used the CCME guidelines for the protection of aquatic life (CCME 1999), which we considered as a proxy to the threat to biodiversity maintenance in a given aquatic ecosystem. For the ecological worldview, we used the trophic level criteria established by Dodds (1998) for rivers that are based on specific nutrients and chlorophyll a concentrations.

Overlap between public health, agriculture, and biodiversity guidelines; ecology stands alone
We observed three distinctive ecosystem service clusters (Table S5), which were largely based on the commonality of the metrics used for the guideline thresholds across fields. Drinking, livestock, irrigation, and the protection of aquatic wildlife all cluster quite strongly since they are all based primarily on the same, multiple toxicological guidelines (metals, pesticides, and other organic chemicals; Table S4). Although drinkability and irrigation cluster with recreation, the result is much weaker. This is likely due to the fact that they all have guidelines associated to fecal contamination, but contrary to drinkability and irrigation, which also have toxicological guidelines, recreation is based on fecal contamination alone.
The ecological worldview is the most distinctive as total phosphorus and total nitrogen are not used as guidelines to evaluate any of the other ecosystem services. While there is little debate among ecologists that ecosystems tend to degrade as they become more eutrophic (Smith 2003), how this degradation impinges on the safe delivery of aquatic ecosystems services based on guideline thresholds remains unknown. Therefore, we aimed to empirically test the widely assumed, but often implicit notion that ecosystems with lower trophic status have better water quality in terms of their capacity to deliver multiple aquatic ecosystem services.

A multiple ecosystem service framework based on guideline thresholds
We aimed to test the assumption that the frequency with which ecosystem services are provided decreases from oligotrophic to mesotrophic to eutrophic rivers. As a first step, we quantified the overall frequency with which Canadian rivers were able to deliver multiple ecosystem services. Secondly, we evaluated how well trophic status allowed us to predict the capacity of rivers to deliver these ecosystem services. To do so, we used open water quality data from 46 datasets on rivers across Canada (see WebPanel 2). We grouped data by sampling event, which we defined as water collection at a given location in a given month for a total of over 60 thousand observations. However, there were many gaps in this collated dataset. Although over 96% had available information on trophic status based on total phosphorus concentrations, information was more sparsely available for variables used as guidelines. There were 56 % on average for different metals, 1% on average for organic pollutants, 55% for inorganic pollutants and only 11% for E. coli (Table S6). The ability to deliver an ecosystem service was evaluated on the basis of each available guideline for a given sampling event. We focused on the 11% of observations that measured E. coli because it is a major limiting factor for drinkability, and irrigation, and the only guideline for swimmability (Table S6) .

What limits the delivery of different ecosystems services across rivers
We used a radar plot to visualize the total frequency with which rivers could provide different ecosystem services based on thresholds ( Figure 2a). Overall, the suitability of a river to serve as a source of drinking water for livestock was by far the least sensitive ecosystem service evaluated, where guidelines were not met in only 3 events (Figure 2a). In the rare cases where water wasn't suitable for livestock, it was limited by high concentrations of heavy metals such as cadmium and lead. Aquatic wildlife was also mostly limited by cadmium and lead, however, the suitability of water to protect biodiversity was much lower at 42% of sampling events. This result should be interpreted with caution because although we assumed the guidelines for the protection of aquatic wildlife could serve as a proxy for biodiversity maintenance, the thresholds are based on the response of the most sensitive species of a predefined multiple-proxy toxicological dataset (CCME 2003, Table S7). Therefore suitability for aquatic wildlife should be placed in a site-specific context to correct for natural background concentrations and species composition.
By comparison the suitability for drinking, irrigation, and recreation was almost always limited by E. coli (Table S6). Recreation was always limited by E. coli as it was the only metric of evaluation, whereas heavy metals limited drinking and irrigation in 2 % of events; neither of these services were impinged by pesticide concentrations. Rivers were swimmable in 84% of sampling events whereas irrigation and drinking were safe in 70% and 37% of cases, respectively. Drinking is clearly the most sensitive ecosystem service, since drinking water should be exempt of any fecal coliform.

Trophic status as a proxy of aquatic ecosystem service provisioning
We evaluated the adequacy of trophic status as a proxy for ecosystem services by comparing the proportion of oligotrophic, mesotrophic, and eutrophic ecosystems that met Canadian guidelines (Figure 2b). Overall a larger proportion of ecosystem services were rendered in more oligotrophic ecosystems (mean of 82% across services), whereas fewer were rendered in mesotrophic (mean of 62% across services) and eutrophic ones (mean of 46% across services), confirming the hypothesis that trophic status can inform on ecosystem service provisioning.
Not all ecosystem services were equally compromised by trophic status. Drinking, swimmability, and irrigation were the most responsive to trophy, as they tended to be limited by E. coli, which is generally well correlated with total phosphorus concentrations (rho= 0.40). Drinking water for livestock was unaffected by trophic status, as was aquatic wildlife. The latter is not surprising given that the metric for their limitation is based on metals and the correlation between lead and total phosphorus , for example, was rather weak (rho=0.22).
Although trophic status appeared to qualify the relative ability of a river to provide these specific aquatic ecosystem services, we were surprised, as ecologists, that close to 50% of oligotrophic rivers were considered non-drinkable. Our a priori assumption was that oligotrophic ecosystems would provide all ecosystem services and that this would decline precipitously with increased eutrophication. Therefore, we tested how this assumption was supported by the data. To do so, we created an expected distribution of whether an ecosystem service was delivered or not by sequentially reattributing their expression per sampling event based on increasing trophic status. For example, in the case of irrigation, from the 6866 sampling events assessed, water was considered usable for irrigation 4663 times. These 4663 events were distributed among 2714 sites were oligotrophic, 2591 were mesotrophic and 1425 were eutrophic. Therefore 100% of the events in oligotrophic rivers are expected to be suitable for irrigation, based on our initial assumption, leaving 1949 drinkable events or 75% in the mesotrophic category, whereas none remained for the eutrophic category (expected percentage is 0%) (Figure 2c).
Using a Chi-squared test, we compared the observed versus the expected number of events for which each individual service could be provided (WebPanel 2). If trophic status is a perfect proxy of drinkability, as described above, we would expect that 90% of oligotrophic systems would be drinkable (Table S9); however, only 49% were considered drinkable (Table S10). In contrast, 0% of eutrophic events were expected to be drinkable, but we found that 20% were. Similar patterns were observed for recreation, irrigation, and the protection of aquatic wildlife, where fewer services were provided in oligotrophic ecosystems than expected whereas more were provided in eutrophic ones.
Therefore, trophic status is a good indicator of the general ability of a river to deliver ecosystem services because it serves as an indicator of a longer-term condition, but it is incomplete from a sampling event perspective, which may represent a more punctual situation. It is clear that trophic status co-varies much more with E. coli as compared to metals, but the incidence of E. coli compromising ecosystem services may be more sporadic in both space and time.

Conclusions
Here we show that using guideline thresholds for specific water usages is a novel integrative approach to characterize the capacity of a river to deliver on multiple aquatic ecosystem services. We found that classic ecological trophic status is a fair proxy to evaluate if water is safe for drinking, swimming, and irrigation. This is perhaps not too surprising since these ecosystem services are largely comprised by the concentration of E. coli where the land use changes that result in excess nutrient runoff from waste due to high human and/or animal populations are also subject to increasing incidence of E. coli. However, the frequency with which an ecosystem cannot provide an ecosystem service based on E. coli is sporadic. We suggest that trophic status represents the more overarching condition of an aquatic ecosystem, but that the inability to deliver on any ecosystem service based on specific guidelines is much more variable through time. For example, in terms of a more emergent contaminant of concern, the frequency that concentrations of microcystin, a cyanobacterial toxin, will limit the provisioning of different ecosystem services will be higher in more eutrophic waters, but this occurrence is never chronic (Taranu et al. 2017). The phenomenon with E. coli is similar in that concentrations are much more variable over time as compared to nutrients (Levy et al. 2009), but that the probability E. coli will exceed threshold guidelines is indeed higher in high-nutrient ecosystems. Therefore limiting excess nutrient runoff to surface water should remain a water quality maintenance priority.
Trophic status, however, did not serve as a good proxy for the suitability for livestock or the protection of aquatic wildlife given that their thresholds are limited by metal concentrations. Given the broad spatial extent of this study, elevated metal concentrations are most likely a function of natural geology or other types of anthropogenic activities (e.g. mining). However, the thresholds for the protection of wildlife are not a good indicator of whether an aquatic ecosystem is capable of maintaining its natural biodiversity. It is difficult to believe that 38% of all of oligotrophic events assessed were compromising biodiversity maintenance. Furthermore trophic status may be too simplistic as other anthropogenic pressures such as damming, over exploitation, and climate change are all contributing to habitat loss and aquatic biodiversity corrosion (Vorosmarty et al. 2010, Tonkin et al. 2019). Indeed given the global aquatic biodiversity crisis (IPBES 2018), there is an urgent need to manage aquatic ecosystems integratively and adaptively (Chan 2006). In order to meet the UN sustainable development goals of improving water quality and protecting life below water by 2030 (UN SDG 6 and 14 respectively), we believe that the novel use of thresholds to assess whether multiple ecosystems services will be of great use to managers and should be developed further. Indeed given the plethora of data collected routinely by government water quality surveys and different environmental programs, this information should be opened to help identify which ecosystems are most at risk.

Acknowledgements
Financing for this project comes from NSERC Discovery to RM and JFL, OURANOS-Mitacs and a FQRNT fellowship to NFSG. This is contribution to the Strategic Network Lake Pulse and the GRIL.

Temporal trends in the use of term water quality between 1980 and 2017 in ecology, biodiversity, agriculture, and public health.
Using the Web of Knowledge Core Database, we downloaded metadata on all the scientific publications published between 1980 and 2017 with the term water quality in the title or abstract. In order to compare how the use of the term changed in ecological, public health, agricultural, and biodiversity sciences, we extracted metadata by field based on the Web of knowledge categories : agronomy, agricultural eng., agriculture multi for agriculture, ecology for ecology, public, Environmental & occupational health for public health and biodiversity conservation for biodiversity. To compare temporal trends in the use of the term water quality relative to the total number of publications in each field, we used a z-score approach. For each field, we fitted both a linear and segmented regression ( segmented R package) model, and selected the best model fit to the data using the Akaike information criterion (AIC). The relative use of the term water quality increased linearly in the overall literature (Table S1). However the segmented model often better fit the observed data when we considered each field separately. Indeed, for agriculture and ecology, we observed a sharp increase starting in the late eighties followed by a stabilization in the use of the term around 2010. In biodiversity science, we observed a similar increase in the late eighties followed by an acceleration that started in 2001 that continues to increase until 2017. For public health, we observed a break point around 1985. However, this breakpoint was not considered significant because it only captured the high initial variability caused by the very small number of studies that included the term water quality , so the linear model was selected.

Comparing the words most often associated with water quality among fields in 2017.
To further explore how the focus in the use of the term water quality differs among fields we extracted the words most often associated with water quality in the title and abstracts. The titles and abstracts from manuscripts published in 2017 only in each of the four fields were first downloaded from Web of Knowledge. After removing words that were not meaningful, we calculated the relative frequency of each word. To quantitatively compare how the words associated with water quality differed between the different fields, we measured the cosine distance (from the lsaR R package) to assess how different the top 100 words were (Table S3).

WebPanel2. Ecosystem services in Canadian Rivers
N. F. St-Gelais et al.

Aggregated guideline database
We created an aggregated guideline database using Canadian guidelines published by the Canadian Council of Ministers of the Environment (CCME) and Health Canada for the following ecosystem services: drinking, swimming, irrigation, livestock, and the protection of aquatic wildlife. We only included guidelines with a quantitative numeric threshold and as such, narrative guidelines were excluded. Furthermore, using guidelines that are under the detection limit (e.g. dicamba for irrigation) would result in artificially limiting the service each time it was measured. In those rare cases, we used the detection limit multiplied by 2 as a modified guideline which we specified in the database. We also included trophic status guidelines from Dodds 1998 classification for rivers.
We consolidated river water quality data from the two following openly available data sources: the National Long-term Water Quality Monitoring Data and the Provincial (stream) water quality monitoring network (PWQMN) from Ontario, for a total of 46 databases.

Aggregated limnological database
We developed the dbExtract R package and ETL (extract, transform and load) package to 1) combine and normalize the databases, 2) evaluate which ecosystem service(s) can potentially by provided at each sampling event, and 3) evaluate trophic status based on the Dodds 1998 classification. In the combined database, we had a total of 660 000 unique observations from rivers across Canada over 22 years (from 1995Canada over 22 years (from to 2017. We grouped observations by station and by month to create an aggregated limnological database of observations for a total of 65935 sampling events. As there are multiple guidelines for each ecosystem service (e.g. close to 75 for drinking water), for this study we selected a subset of guidelines in each group (Table S4) based on their measurement frequency in publicly available databases as well as for their use in assessing multiple ecosystem services. We developed the load_ES function to evaluate, for each sampling event in the aggregated limnological DB, which ecosystem services could potentially be provided or not based on the guideline threshold. Because there was considerable variability in which variables were measured at each sampling event, this evaluation was based on the guideline for which specific information was available. In the case when no variable was available to evaluate whether a service could be provided or not, the service wasn't evaluated for this specific sampling event.

Calculating limiting frequency for each guidelines
For each of the selected guidelines (see section 2.1), we assessed the frequency with which each was over the threshold as a function to the total times it was measured. As such, for each sampling event multiple guidelines could be limiting an ecosystem service, with the exception of swimming which only had E. coli concentration as a guideline metric (see Table S4).
We observed that in average E. coli was often limiting the following ecosystem services: swimming, irrigation, and drinking (see section 2.3 and Table S6), but was only measured in 11% of sampling events (see Table S6). Hence, because our approach is based on the variables measured during each sampling event, this could lead to an overestimation of how often an ES is provided, particularly for ecosystem services that had more than one guideline such as irrigation and drinking. To avoid this bias, for the analyses we used a subset of sampling events for which E. coli was measured.

Testing if oligotrophic rivers are providing more ecosystem services: observed versus expected
In order to verify our hypothesis that more ecosystem services are provided in oligotrophic than in mesotrophic and eutrophic rivers, we statistically tested, using a chi-squared test, if the distribution of events by trophic state is in agreement with the hypothesis that trophic status is a perfect proxy for whether or not an ecosystem service can be provided or not. To test this, we did the following: 1. For each ecosystem service, we selected sites for which the ecosystem service and trophic state could be evaluated: at least one guideline was measured as well as total phosphorus or total nitrogen.
2. We created the observed values table (Table S9) by calculating the number of sampling events for which an ecosystem service could be provided as a function of true trophic status.
3. To create the expected table (Table S10), we attributed all the sampling events for which an ecosystem service could be provided in order of oligotrophic, then mesotrophic and finally eutrophic sampling events. In other words, by default we attributed all the events for which a service could be provided to oligotrophic systems, and subsequently to mesotrophic systems and in the rare case to eutrophic rivers. Then we plotted the observed and expected in a radar plot based frequency of delivery as a function of trophic status.
For clarity we repeat the example as stated in the paper: in the case of irrigation, from the 6866 sampling events assessed, water was considered usable for irrigation 4663 times. These 4663 events were distributed among 2714 sites were oligotrophic, 2591 were mesotrophic and 1425 were eutrophic. Therefore 100% of the events in oligotrophic rivers are expected to be suitable for irrigation, based on our initial assumption, leaving 1949 drinkable events or 75% in the mesotrophic category, whereas none remained for the eutrophic category.