Global gut content data synthesis and phylogeny delineate reef fish trophic guilds

The diversity of life on our planet has produced a remarkable variety of biological traits that characterize different species. Such traits are widely employed instead of taxonomy to increase our understanding of biodiversity and ecosystem functioning. However, for species’ trophic niches, one of the most critical aspects of organismal ecology, a paucity of empirical information has led to inconsistent definitions of trophic guilds based on expert opinion. Using coral reef fishes as a model, we show that experts often disagree on the assignment of trophic guilds for the same species. Even when broad categories are assigned, 60% of the evaluated trait schemes disagree on the attribution of trophic categories for at least 20% of the species. This disagreement greatly hampers comparability across studies. Here, we introduce a quantitative, unbiased, and fully reproducible framework to define species’ trophic guilds based on empirical data. First, we synthesize data from community-wide visual gut content analyses of tropical coral reef fishes, yielding trophic information from 13,961 individuals belonging to 615 reef fish species across all ocean basins. We then use network analysis to cluster the resulting global bipartite food web into eight distinct trophic guilds and employ a Bayesian phylogenetic model to predict trophic guilds based on phylogeny and maximum body size. Our model achieved a misclassification error of 5%, indicating that our approach results in a quantitative and reproducible trophic categorization scheme that can be updated as new information becomes available. Although our case study is of reef fishes, the most diverse vertebrate consumer group, our approach can be applied to other organismal groups to advance reproducibility in trait-based ecology.
As such, our work provides an empirical and conceptual advancement for trait-based ecology and a viable approach to monitor ecosystem functioning in our changing world.


Introduction
A fundamental goal in ecology is to understand the mechanisms behind the maintenance of biodiversity and ecosystem functioning [1,2]. Understanding the ecological niches of species is central to this endeavor [3,4]. In fact, the degree of niche overlap among species can be a major determinant of the positive relationship among species richness [5], ecosystem productivity [6-8], and ecosystem vulnerability [9], since limited functional redundancy can make ecosystems more prone to losing entire energetic pathways [10-12]. With growing threats to biodiversity, the need to quantify the impact of biodiversity loss has amplified the use of functional groups, which are groups of species that share common ecological characteristics and are often defined with coarse, categorical descriptors of species traits [13-16].
Delineating the ecological niche with discrete categories has several operational advantages. First, grouping species into categories helps decompose highly complex ecosystems into comprehensible units, whereas traditional taxonomic analyses may be difficult to interpret in highly diverse ecosystems. Second, ecological predictions tied to individual species are restricted to the geographic range of the species, whereas predictions for functional groups can be globally comparable. Third, the use of functional groups enables the quantification of functional metrics (e.g. functional richness and functional redundancy) from a standard community data matrix without complex experiments [17-19]. The promise of developing "user-friendly" metrics for functional ecology has motivated the use of trait-based data in community ecology; even with a paucity of empirical information, it is often assumed that experts can accurately describe the ecological niche of species [17,20,21].
Coral reefs, one of the most diverse marine ecosystems on Earth, have inspired a plethora of trait-based ecological studies, with significant recent efforts to compile trait-based datasets for two major components of this ecosystem: corals and fishes [22,23]. For some traits, such as maximum body size in fishes, the compilation process is simple because unidimensional, quantitative data (e.g. maximum total length) are compiled in publicly accessible databases; however, when it comes to species' diet or behavior, obtaining consensus data is much more difficult. For example, dietary data are multidimensional (i.e. various prey items can be recorded across individuals), ontogenetically variable (i.e. diet differs between juveniles and adults), spatially variable (i.e. species may show dietary plasticity across locations), and prone to methodological differences and observer bias. Therefore, researchers who employ traits to delineate trophic groups or behavioral characteristics commonly rely on expert opinion [19]. While there is some agreement among experts on which traits are relevant (e.g. diet, mobility, body size, diel activity), there is often implicit disagreement on the categories needed to describe these traits. For example, across the coral reef literature, the number and resolution of reef fish trophic guilds differ substantially. Studies commonly define between three [24] and eight [25] trophic guilds, with particular ambivalence about the resolution at which to define herbivores and invertivores [26-29].
Among all trait classification schemes for reef fishes, only a few are openly accessible. Consequently, different research groups tend to employ proprietary functional classifications, with little possibility to cross-check and compare assigned traits with previous classifications.
The classification of species into functional groups has advantages for our understanding of ecological patterns [30,31]. However, the lack of agreement and the limited transparency of trait-based datasets can fuel skepticism and inhibit the emergence of general patterns.
Here, we quantify expert disagreement in the definition of reef fish trophic guilds and propose a novel, transparent, and quantitative framework to delineate trophic guilds. Using coral reef fishes as a case study, we compiled all quantitative, community-wide dietary analyses from several locations across the Pacific and Caribbean and used network analysis to define eight modules that correspond to trophic guilds. We then examined phylogenetic niche conservatism with a phylogenetic Bayesian multinomial model, which we used to predict trophic guilds, including measures of uncertainty, for the global pool of coral reef fishes. Our framework is fully reproducible and can be extended and updated as new data become available.

Assessment of expert agreement
We systematically searched Google Scholar for papers published since 2000, using the following keywords: "coral reefs" AND "reef fish" AND ("fish community" OR "fish assemblage") AND "diet" AND ("functional group" OR "functional trait" OR "functional entity" OR "trophic guild" OR "trophic group"). The results were individually assessed to find data on trophic guilds. We only considered studies performed at the community level that targeted all trophic levels. Most studies were excluded because they only included specific families or groups, or because the data were not provided with the publication. We often found redundant results, with groups publishing several papers using the same classification scheme; in those cases, only the first reference was retained. We contacted authors when trophic classifications were widely used across the literature but data were not provided with the publications.
Our search yielded a total of eight independent trophic classifications, including Mouillot et al. … [29] with 3189 species. The classifications were not uniform in terms of the number and nature of trophic guilds. To achieve comparability, we converted the original classifications to five broad trophic guilds: herbivores and detritivores, invertivores, omnivores, planktivores, and piscivores. All of the classifications could be reassigned to these categories with the exception of Graham et al. [34], which did not include the category omnivores; in this case, the comparison was made only across the four comparable guilds.
To assess expert agreement, we compared each possible pair of classifications that shared at least 30 species, generated a confusion matrix, and measured agreement as the proportion of species with matching trophic guild assignments. We then calculated the average agreement between classification pairs for each trophic guild.
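The pairwise comparison described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes each scheme has already been converted to the five broad guilds and is stored as a hypothetical {species: guild} dictionary, and it leaves the restriction to four guilds for the scheme without omnivores to a preprocessing step.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(schemes, min_shared=30):
    """Compare every pair of expert classification schemes.

    schemes: dict mapping scheme name -> {species: broad trophic guild}.
    Returns {(a, b): (agreement, confusion)} for pairs sharing at least
    min_shared species. agreement is the proportion of shared species given
    the same guild; confusion counts (guild in a, guild in b) pairs.
    """
    results = {}
    for a, b in combinations(sorted(schemes), 2):
        shared = schemes[a].keys() & schemes[b].keys()
        if len(shared) < min_shared:
            continue  # too few species in common to compare this pair
        confusion = Counter((schemes[a][sp], schemes[b][sp]) for sp in shared)
        matches = sum(n for (ga, gb), n in confusion.items() if ga == gb)
        results[(a, b)] = (matches / len(shared), confusion)
    return results


# Hypothetical toy schemes (real comparisons require >= 30 shared species).
schemes = {
    "expert_A": {"sp1": "herbivores", "sp2": "invertivores",
                 "sp3": "omnivores", "sp4": "piscivores"},
    "expert_B": {"sp1": "herbivores", "sp2": "omnivores",
                 "sp3": "omnivores", "sp4": "piscivores"},
}
res = pairwise_agreement(schemes, min_shared=3)
agreement, confusion = res[("expert_A", "expert_B")]
```

Off-diagonal cells of each confusion matrix show which guilds a pair of experts tends to swap (here, one species is an invertivore for one expert and an omnivore for the other).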
Data collection on fish gut contents
To provide a quantitative definition of trophic guilds for reef fishes, we collected gut content … was quantified as volumetric percentage or item frequency. The data were standardized and analyzed as proportions. To our knowledge, the compiled dataset represents the first compilation of detailed coral reef food webs across ocean basins. A total of 13,961 non-empty fish guts belonging to 615 species were analyzed, and more than 1,200 different prey items were described in the original datasets.
First, fish species and family names were taxonomically verified and corrected with the R package rfishbase [43]. Only species with at least ten non-empty guts were kept for further analysis. The taxonomic classification of each prey item was then obtained, and all poorly informative items (e.g. unidentified fragments, unknown species) and redundant items (e.g. "crustacea fragments" when co-occurring with an item already identified to a lower taxonomic level, such as "shrimp") were discarded. Prey identification was highly heterogeneous across the six datasets, differing in taxonomic level and in the use of common or scientific names (e.g. crabs versus Brachyura). To make the six datasets comparable, prey items were grouped into 38 ecologically informative prey groups (Table S1). Items were generally assigned to groups corresponding to their phylum or class; because of the high diversity and detailed descriptions of crustaceans, these were assigned at the level of order or superorder. Most groups follow official taxonomic classifications, except for "detritus," "inorganic," and "zooplankton." In the West Indies dataset [39], items labelled as "Algae & Detritus" were assigned to both the "detritus" and "benthic autotroph" categories, and their percentage was divided equally between the two. The category "zooplankton" includes all eggs and larvae regardless of taxonomy.
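A minimal sketch of the standardization step, under stated assumptions: records are hypothetical (species, gut_id, prey_group, amount) tuples already mapped to the 38 prey groups; each gut is converted to proportions before averaging across guts, so that volumetric percentages and item counts become comparable and every individual is weighted equally (our assumption, since the text does not specify the weighting); and the "Algae & Detritus" split follows the rule described for the West Indies dataset.

```python
from collections import defaultdict

# Mixed items whose amount is split equally among several prey groups
# (as described for "Algae & Detritus" in the West Indies dataset).
SPLIT_ITEMS = {"Algae & Detritus": ("detritus", "benthic autotroph")}


def species_diets(records, min_guts=10, split_items=SPLIT_ITEMS):
    """Per-species diet proportions from per-gut prey records.

    records: iterable of (species, gut_id, prey_group, amount) tuples, where
    amount is a volumetric percentage or an item count.
    Species with fewer than min_guts non-empty guts are dropped.
    """
    per_gut = defaultdict(lambda: defaultdict(float))
    for species, gut_id, prey, amount in records:
        if amount <= 0:
            continue  # ignore empty entries
        targets = split_items.get(prey, (prey,))
        for group in targets:
            per_gut[(species, gut_id)][group] += amount / len(targets)

    # Convert each gut to proportions so that volumes and counts are comparable.
    by_species = defaultdict(list)
    for (species, _gut), amounts in per_gut.items():
        total = sum(amounts.values())
        by_species[species].append({g: a / total for g, a in amounts.items()})

    diets = {}
    for species, guts in by_species.items():
        if len(guts) < min_guts:
            continue  # keep only species with >= min_guts non-empty guts
        mean = defaultdict(float)
        for props in guts:
            for group, p in props.items():
                mean[group] += p / len(guts)  # each individual weighted equally
        diets[species] = dict(mean)
    return diets
```

Weighting each gut equally prevents a few very full stomachs from dominating a species' diet; pooling raw amounts across guts would be an equally simple alternative if the original datasets were already comparable in units.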

Definition of trophic guilds
After data cleaning, we compiled dietary information for 615 species. Of those species, 516 were present in only one location, 66 were collected in two locations, 25 in three locations, 7 in four locations, and only 1 across five locations. Before running an analysis at the species level, we tested whether there was a strong regional signal for species present in more than one location. We created a quantitative bipartite network in which fish species at each location were linked to the 38 prey groups. This network was weighted so that edge weights represent the proportional contribution of each prey group to the diet of a species at a given location.
To identify network modules that correspond to reef fish trophic guilds and their preferred prey, we maximized the weighted modularity of the bipartite network [44]. Because the modularity maximization algorithm has an initial random step, it may converge to different (although similar) suboptimal solutions each time the analysis is performed, which is common across several optimization algorithms, such as simulated annealing [45]. To guarantee reproducibility and reduce the risk of basing our analysis on an outlier, we performed the modularity maximization 500 times and retained the medoid solution, i.e. the solution with the highest similarity to (the lowest distance from) the other 499 solutions. Similarity was assessed with the variation of information [46], a distance between partitions. Overall, 68% of the site × location combinations for the same species belonged to the same module. Therefore, we considered the regional effect to be minor and performed the analysis on the global network, ignoring regional variability and increasing the number of individuals per species. We quantified the phylogenetic signal by calculating the phylogenetic statistic δ, which uses a Bayesian approach for discrete variables [49].
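Two computational ingredients of the module-detection step, the objective being maximized and the selection of the medoid run, can be sketched in Python. This is an illustration under assumptions: the modularity function is the weighted form of Barber's bipartite modularity, which we assume matches the objective used in [44]; the stochastic search that maximizes it is not shown; and each run's partition is represented as a hypothetical list of module labels over the same ordered set of nodes. The variation of information is a distance (0 for identical partitions, regardless of label names), so the medoid is the run with the smallest summed distance to all other runs.

```python
from collections import Counter
from math import log

import numpy as np


def bipartite_modularity(A, fish_modules, prey_modules):
    """Weighted bipartite modularity (Barber's Q) of a given partition.

    A: (n_fish, n_prey) array of edge weights (diet proportions).
    fish_modules, prey_modules: module label of each fish node / prey node.
    """
    A = np.asarray(A, dtype=float)
    m = A.sum()        # total edge weight
    k = A.sum(axis=1)  # weighted degree of each fish node
    d = A.sum(axis=0)  # weighted degree of each prey node
    same = np.asarray(fish_modules)[:, None] == np.asarray(prey_modules)[None, :]
    return float(((A - np.outer(k, d) / m) * same).sum() / m)


def variation_of_information(x, y):
    """VI = H(X) + H(Y) - 2 I(X;Y) between two partitions, in nats."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    vi = 0.0
    for (a, b), n_ab in pxy.items():
        r = n_ab / n
        vi -= r * (log(r / (px[a] / n)) + log(r / (py[b] / n)))
    return vi


def medoid_index(partitions):
    """Index of the run whose partition has the smallest summed VI to all runs."""
    totals = [sum(variation_of_information(p, q) for q in partitions)
              for p in partitions]
    return int(np.argmin(totals))


# Toy example: 2 fish x 2 prey network, one module per fish-prey pair.
A = [[0.9, 0.1], [0.2, 0.8]]
print(bipartite_modularity(A, [0, 1], [0, 1]))
# Labels for 4 nodes in 3 runs; runs 0 and 1 are the same partition, relabelled.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(medoid_index(runs))
```

Summing the distance over all runs (rather than comparing to a single reference run) is what makes the retained solution a medoid: it is the most "central" of the 500 outcomes.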
The δ statistic can be arbitrarily large and highly variable, depending on the number of species and trait levels. To evaluate the significance of the δ statistic, we applied a bootstrapping approach in which we quantified δ one hundred times after randomly shuffling the trait values.
We then fitted a multinomial phylogenetic regression to predict fish trophic guild according to phylogeny and body size with the R package brms [50], using a multinomial logit link function. As such, the probability that species $i$ belongs to a particular trophic guild $k$ (out of $K$ guilds) is computed as follows:
$$P(y_i = k) = \frac{\exp(\mu_{ik})}{\sum_{j=1}^{K} \exp(\mu_{ij})}$$
with $\mu_{ik}$ defined as:
$$\mu_{ik} = \beta_{0k} + \beta_{1k} \ln(L_{\max,i}) + u_{\mathrm{phy},ik}$$
where $\beta_{0k}$ is the category-specific fixed-effect intercept, $\beta_{1k}$ is the slope for the natural log-transformed maximum body size $L_{\max,i}$ for each category $k$, and $\mathbf{u}_{\mathrm{phy}}$ (with elements $u_{\mathrm{phy},ik}$) is the matrix of random effect coefficients that account for intercept variation based on relatedness, as described by the phylogeny, for each diet category $k$. We used uninformative priors and ran the model for three chains, each with 6,000 iterations and a warm-up of 1,000 iterations. We visualized the fitted probabilities for each trophic guild on a phylogenetic tree, including the 535 species with verified phylogenetic positions, using the R package ggtree [49]. Next, we used our model to predict the most likely trophic guild for the global pool of reef fish species. For the extrapolation, we selected all species within reef fish families with more than one representative species (but we also included Zanclus cornutus, the only species in the family Zanclidae), which resulted in 50 families. Further, we only selected species with a maximum length greater than 3 cm, which was the maximum size of the smallest fish in our compiled database. This selection process resulted in a list of 4,554 reef fish species.
Currently, no streamlined method exists to predict traits for new species from a phylogenetic regression model.
We circumvented this issue by extracting draws of the phylogenetic effect ($\mathbf{u}_{\mathrm{phy}}$) for each species included in the model. We subsequently predicted the phylogenetic effects for missing species with the function phyEstimate from the R package picante [51]. This function uses phylogenetic ancestral state estimation to infer trait values for new species on a phylogenetic tree by re-rooting the tree to the parent edge to predict the node. We repeated this inference across 2,000 draws; per draw, we randomly sampled one of the one hundred trees. Then, we predicted the probability of each species being assigned to each diet category by combining the predicted phylogenetic effects with the global intercepts and slopes for maximum body size for each draw. Finally, we summarized all diet category probabilities per species by taking the mean and standard deviation across all 2,000 draws.
We quantified the total standard deviation (i.e. the square root of the sum of the squared standard deviations of all categories) and the negentropy value, a measure of certainty calculated by subtracting the normalized entropy value (i.e. uncertainty) from one. Thus, the negentropy value lies between 0 and 1, and the higher the value, the higher the certainty of trophic guild assignment (i.e. if a given species has a high probability of assignment to one dietary category, its negentropy value will be high).
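The per-draw prediction and the per-species summaries described above can be sketched with NumPy. This is a minimal sketch, not the authors' pipeline: it assumes posterior draws of the intercepts, body-size slopes, and (already extrapolated) phylogenetic effects are available as arrays; the function names are hypothetical; and entropy is computed on the mean probabilities and normalized by the logarithm of the number of guilds, one common way to bound negentropy between 0 and 1. In brms-style categorical models one reference category usually has its linear predictor fixed at zero; passing zeros in that column reproduces this.

```python
import numpy as np


def guild_probabilities(beta0, beta1, u_phy, max_length_cm):
    """Multinomial-logit (softmax) guild probabilities for one posterior draw.

    beta0, beta1: length-K category-specific intercepts and body-size slopes
    u_phy: (n_species, K) phylogenetic random intercepts for this draw
    max_length_cm: length-n maximum body sizes (natural log taken here)
    """
    mu = (np.asarray(beta0, float)
          + np.outer(np.log(max_length_cm), np.asarray(beta1, float))
          + np.asarray(u_phy, float))
    mu = mu - mu.max(axis=1, keepdims=True)  # stabilizes exp(); softmax unchanged
    e = np.exp(mu)
    return e / e.sum(axis=1, keepdims=True)


def summarize_draws(prob_draws):
    """Summaries across draws for each species.

    prob_draws: (n_draws, n_species, K) array of guild probabilities.
    Returns per-guild mean and SD, total SD, and negentropy per species.
    """
    draws = np.asarray(prob_draws, float)
    mean = draws.mean(axis=0)
    sd = draws.std(axis=0, ddof=1)
    total_sd = np.sqrt((sd ** 2).sum(axis=1))  # sqrt of the quadratic sum of SDs
    K = mean.shape[1]
    # Entropy of the mean probabilities, normalized by log(K) to lie in [0, 1];
    # categories with p = 0 contribute 0 (limit of p * log p).
    safe = np.where(mean > 0, mean, 1.0)
    plogp = np.where(mean > 0, mean * np.log(safe), 0.0)
    entropy = -plogp.sum(axis=1) / np.log(K)
    negentropy = 1.0 - entropy
    return mean, sd, total_sd, negentropy


# Usage with placeholder random values (not fitted estimates): 3 species, 8 guilds.
rng = np.random.default_rng(1)
draws = [guild_probabilities(rng.normal(size=8), rng.normal(size=8),
                             rng.normal(size=(3, 8)), [12.0, 30.0, 85.0])
         for _ in range(2000)]
mean, sd, total_sd, negentropy = summarize_draws(draws)
best_guild = mean.argmax(axis=1)  # index of the most likely guild per species
```

Subtracting the row maximum before exponentiating leaves the softmax probabilities unchanged but avoids overflow when linear predictors are large.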

Assessment of expert agreement
We evaluated the agreement among eight distinct and independent trophic guild classifications by comparing the classification schemes in pairs. Considering the broadness of the expert-assigned categories, we found remarkably low agreement. The median agreement between pairs, expressed as the proportion of species with matching trophic group assignments, was 77% (Fig. 1). For 50% of the pairwise comparisons, at least a quarter of the species were attributed to different trophic groups. In the most severe disagreement, the proportion of mismatched assignments reached 39%. In addition, expert agreement differed depending on the trophic group. Despite a few peaks of disagreement for herbivores and detritivores (~20%), overall there was high agreement among experts for this trophic guild, with an average agreement of 94% (Fig. 1b). In contrast, omnivores showed the highest mismatch, with experts disagreeing on an average of 30% of the species and peaks of disagreement higher than 60% (Fig. 1b).
Expert agreement was variable and often homogeneously distributed around the mean for all the trophic categories. Therefore, high agreement between a few combinations of experts did not necessarily exclude peaks of disagreement (Fig. 1b). The analysis of individual confusion matrices between pairs of experts revealed high heterogeneity (Fig. 2). Surprisingly, there was also high heterogeneity in groups with high disagreement (i.e. multiple alternative assignments for species not assigned to the same trophic group). Species classified as invertivores according to one expert were considered omnivores, piscivores, or planktivores according to other classification schemes (Fig. 2). Similarly, species considered omnivores by one expert were alternatively considered invertivores, herbivores and detritivores, or planktivores by another expert.

Definition of trophic guilds
We defined trophic guilds by identifying modules (i.e. combinations of predators and prey) that maximize the weighted modularity of the global network. Our analysis robustly identified eight distinct modules that correspond to different trophic guilds (Fig. 3). We identified these trophic guilds as:
(1) Sessile invertivores: species predominantly feeding on Asteroidea, Bryozoa, Cirripedia, Porifera, and Tunicata;
(2) Herbivores, microvores, and detritivores (HMD): species primarily feeding on autotrophs, detritus, inorganic material, foraminifera, and phytoplankton;
(3) Corallivores: species primarily feeding on Anthozoa and Hydrozoa;
(4) Piscivores: species primarily feeding on Actinopterygii and Cephalopoda;
(5) Microinvertivores: species primarily feeding on Annelida, Arachnida, Hemichordata, Nematoda, Peracarida, and Nemertea;
(6) Macroinvertivores: species primarily feeding on Mollusca and Echinodermata;
(7) Crustacivores: species primarily feeding on Decapoda and Stomatopoda;
To evaluate the significance of the phylogenetic statistic value (δ = 9.37), we applied a bootstrapping approach and quantified δ after randomly shuffling the trait values 100 times. The median δ of these null models was 0.000199 (95% confidence interval [0.000196, 0.000204]), indicating a strong phylogenetic signal associated with the eight trophic guilds.
Phylogeny and maximum body size were sufficient to correctly predict the trophic guild of 97% of the species in our dataset. For most families, there was strong phylogenetic conservatism, which resulted in high confidence in these predictions (Fig. 4). Within some families, however, closely related species displayed distinct dietary preferences. The uncertainty around these family-level predictions was higher, as shown by low negentropy values for families such as Balistidae, Diodontidae, and Labridae.
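The logic of this significance test, comparing an observed statistic with values recomputed after shuffling trait values among the tips, can be sketched generically. Computing δ itself requires the Bayesian method of [49] and is not reproduced here; the sister-pair concordance below is a hypothetical stand-in statistic used only to make the sketch runnable, and the one-sided permutation p-value is our addition for illustration.

```python
import numpy as np


def permutation_test(statistic, traits, n_perm=100, seed=0):
    """Compare an observed phylogenetic-signal statistic with a null
    distribution built by randomly shuffling trait values among the tips.

    statistic: callable mapping an array of tip traits (in tip order) to a float
    Returns (observed value, median of null values, one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    traits = np.asarray(traits)
    observed = statistic(traits)
    null = np.array([statistic(rng.permutation(traits)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(np.median(null)), float(p_value)


# Hypothetical stand-in for delta: share of sister-tip pairs with the same guild.
SISTER_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def sister_concordance(traits):
    return float(np.mean([traits[i] == traits[j] for i, j in SISTER_PAIRS]))


guilds = ["HMD", "HMD", "Piscivores", "Piscivores",
          "Corallivores", "Corallivores", "Crustacivores", "Crustacivores"]
observed, null_median, p_value = permutation_test(sister_concordance, guilds)
```

Shuffling keeps the number of species per guild fixed and only breaks the association between guilds and tree positions, which is exactly the null hypothesis of no phylogenetic signal.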
Given the high predictive performance of our Bayesian phylogenetic model, we used it to extrapolate the probability of each reef fish species belonging to each of the eight trophic guilds and assigned the trophic guild with the highest probability. Using leave-one-out cross-validation, the final accuracy of this approach was 65%, which is comparable to other applications of phylogenetically extrapolated traits, such as those involving microbial traits [52].
By inspecting the confusion matrix of the leave-one-out cross-validation, we obtained more detailed information on the accuracy of the trophic guild predictions (Fig. S1). Most categories were well predicted with our extrapolation approach. In particular, the sessile invertivores, HMD, and piscivores trophic guilds were predicted with high accuracy (77%, 75%, and 73% correct predictions, respectively). The confusion matrix also provided information on incorrectly assigned categories. For example, when piscivores were incorrectly assigned, they were mostly classified as crustacivores. However, the network plot revealed that the fishes classified as piscivores also fed on crustaceans (mostly decapods), so this "incorrect assignment" was grounded in ecological reality and reflected uncertainty within the model. Additionally, the microinvertivores trophic guild had the highest proportion of inaccurate predictions (52% correct predictions); here, species were often misclassified as crustacivores or planktivores.
Given the growing number of trait-based studies that assign trophic guilds to understand and monitor ecosystem functioning in our changing world, it is imperative that we establish comparable and reproducible trophic classification frameworks.
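Per-guild accuracies of this kind are the diagonal of the leave-one-out confusion matrix divided by the number of species observed in each guild. A minimal sketch, assuming the leave-one-out predictions have already been obtained by refitting the model without each species (that refitting is not shown); the function name and toy labels are hypothetical:

```python
from collections import Counter


def loo_accuracy(observed, predicted):
    """Overall and per-guild accuracy from leave-one-out predictions.

    observed, predicted: equal-length lists of guild labels, where
    predicted[i] is the guild assigned to species i by a model fitted
    without species i.
    Returns (overall accuracy, {guild: accuracy}, confusion counts).
    """
    confusion = Counter(zip(observed, predicted))
    correct = sum(n for (obs, pred), n in confusion.items() if obs == pred)
    overall = correct / len(observed)
    per_guild = {guild: confusion[(guild, guild)] / n_obs
                 for guild, n_obs in Counter(observed).items()}
    return overall, per_guild, confusion


observed = ["Piscivores"] * 4 + ["Crustacivores"] * 2
predicted = (["Piscivores"] * 3 + ["Crustacivores"]
             + ["Crustacivores", "Piscivores"])
overall, per_guild, confusion = loo_accuracy(observed, predicted)
```

The off-diagonal counts, such as confusion[("Piscivores", "Crustacivores")], show which guilds absorb the misclassified species.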
Our findings highlight the discordance of expert opinion in the assignment of trophic guilds and the need to develop a quantifiable, reproducible classification scheme that is accessible to the wider scientific community (cf. [56]). To address this issue, the framework proposed herein represents the first implementation of a quantifiable classification scheme for coral reef fishes, including measures of uncertainty around trophic guild assignments and providing a new path forward to standardize the definition of traits. Despite broad similarities between the trophic guilds reported in the literature and the groups identified by our analysis, our classification scheme reveals a higher level of partitioning among invertebrate-feeding fishes than previously proposed trophic guilds. In the past, invertebrate-feeding fishes were generally considered sessile invertivores, mobile invertivores, or omnivores (e.g. [27,28,36]), but we identify five distinct invertebrate-feeding groups: corallivores, sessile invertivores, microinvertivores, macroinvertivores, and crustacivores. Given the extreme numerical dominance of invertebrates in coral reef environments [57], the collapse of all invertebrate feeders into two or three trophic groups was possibly an artefact of expert oversight, and the expansion of invertebrate-feeding trophic guilds to five groups stands to improve the ecological resolution of fishes feeding on invertebrate prey.
In contrast to the high resolution achieved within invertebrate-feeding groups, our classification achieved limited resolution among the nominally herbivorous species: herbivores, microvores, and detritivores (HMD). Across the literature, past classification schemes often separate macroalgal feeders, turf algae croppers, and detritivores (e.g. [26,27]).
The lack of precision in our framework is rooted in the difficulty of distinguishing algae, microbes, and detritus within the alimentary tract of fishes, which results in the pooling of these ingested items during the visual assessment of fish gut contents. Consequently, species classified as HMD may have fundamentally different foraging strategies, dietary preferences, and evolutionary histories [58], which can greatly affect their functional role on coral reefs (e.g. [59]). Thus, while our trophic guilds promise increased resolution for fishes that consume animal prey, they may not adequately capture consumer-producer dynamics on coral reefs. Emerging techniques, such as gut content metabarcoding, may provide the additional resolution needed to further discriminate prey items in this group [60,61]. Alternatively, coupling diet categorization with other traits, such as feeding behavior, may help to pinpoint the variety of feeding modes that exist within the HMD trophic guild.
Our results also highlight the necessity of integrating evolutionary history (i.e. phylogenetics) into trait-based ecology (cf. [62]). Recently, taxonomy and body size have been revealed as important predictors of fish diet composition and size structure [63,64], and in the highest-resolution analyses of coral reef fish diet, taxonomic family was a better predictor of fish diet than broad trophic guilds [60]. Given the exceedingly low misclassification error of our predictions of fish trophic guilds, we posit that phylogeny is a critical variable that should be consistently considered in the assignment of trophic guilds. Across a plethora of organismal …
Especially in complex, hyperdiverse environments such as coral reefs, it is imperative to standardize how we measure and report these traits to prevent idiosyncratic results based on subjective trait assignments [19,74].
Trophic guilds are among the most commonly applied traits to assess ecosystem functioning because they directly relate to energy and nutrient fluxes across trophic levels. Thus, our standardized framework to quantitatively assign trophic guilds across coral reef fishes represents a major step forward for coral reef functional ecology, while heeding the call for openly accessible, reproducible trait databases [22,55,75]. As trait-based ecology continues to be used to examine disturbances and implement management strategies, our cohesive and accessible framework to predict reef fish trophic guilds can provide key insights into the trajectory of coral reef communities. Coupling our trophic guild assignment framework with predictive models could spur the emergence of an early detection system to forecast shifts in ecosystem functioning [30].
… extrapolations to closely related species are more likely to be assigned erroneous trophic guilds. Consequently, an ongoing, extensive compilation of dietary traits across coral reef fishes will continuously improve our predicted trophic guild assignments.
Finally, our proposed framework is not limited to coral reef fishes; indeed, trophic guild assignments can be quantifiable, reproducible, and transparent, with the inclusion of uncertainty metrics, across many organismal groups. However, the standardization of trophic guilds is sorely

Figure (caption partially recovered; legend: Sessile invertivores, Microinvertivores, Macroinvertivores, Crustacivores, Piscivores; Negentropy; proportion per trophic guild): … a higher probability of assignment. In the outer black ring, each distinct segment represents a fish family (with silhouettes included for the most speciose families). Uncertainty of overarching trophic guild assignment for each fish family is visualized with negentropy values (i.e. reverse entropy); thus, darker shades indicate a higher degree of certainty of trophic guild assignment.