1 Abstract
Modeling species interactions in diverse communities traditionally requires a prohibitively large number of species-interaction coefficients, especially when considering environmental dependence of parameters. We implemented Bayesian variable selection via sparsity-inducing priors on non-linear species abundance models to determine which species-interactions should be retained and which can be represented as an average heterospecific interaction term, reducing the number of model parameters. We evaluated model performance using simulated communities, computing out-of-sample predictive accuracy and parameter recovery across different input sample sizes. We applied our method to a diverse empirical community, allowing us to disentangle the direct role of environmental gradients on species’ intrinsic growth rates from indirect effects via competitive interactions. We also identified a few neighboring species from the diverse community that had non-generic interactions with our focal species. This sparse modeling approach facilitates exploration of species-interactions in diverse communities while maintaining a manageable number of parameters.
2 Introduction
Understanding what maintains the diversity of life—where and how species abundances change through time—has long fascinated and challenged ecologists. It is widely accepted that community composition in any given time and place is driven by the interplay of species interactions, responses to environmental conditions, and feedbacks between local and regional dynamics (Chesson, 2000; HilleRisLambers et al., 2012; Vellend, 2020). However, given the myriad of biotic interactions that may, themselves, be mediated by underlying environmental conditions (Bulleri et al., 2016; Germain et al., 2018; Letten et al., 2018), feasibility and model overfitting concerns quickly arise when trying to incorporate observed levels of diversity. Arguably, the magnitude of this methodological limitation has even shaped our historical theoretical frameworks and empirical tests. For example, classic species trait trade-offs, such as the competition-colonization trade-off, apply for species pairs (Levins & Culver, 1971; Tilman, 1982). Similarly, while modern coexistence theory (Chesson, 2000) can be applied to any level of species richness (Spaak & De Laender, 2020), the vast majority of empirical studies focus on pairwise species comparisons (e.g. Kraft et al. 2015; Wainwright et al. 2019) and the effect of environmental variation on these comparisons (Bimler et al., 2018; Lanuza et al., 2018). Yet nonlinearity, higher-order interactions, and intransitivity in diverse systems may yield complex dynamics that dramatically alter population growth and coexistence dynamics (Allesina & Levine, 2011; Li et al., 2021; May & Leonard, 1975; Mayfield & Stouffer, 2017). The further development and empirical testing of these theories thus requires a statistical approach that is applicable in diverse communities and is capable of identifying and incorporating key species interactions and environmental covariates.
To date, empirical studies of population dynamics and species coexistence frequently take one of two approaches for dealing with parameterization limitations that arise in diverse communities and varied environments. In the first approach, experimental studies focus on a few focal species. For example, Wainwright et al. (2019) examined coexistence based on pairwise interaction coefficients between four annual forbs in two locations and across two water availability treatments. This is already a large number of species interaction coefficients to estimate, yet it covers only a small subset of the community’s full diversity (10–14 species in 0.09 m²; Dwyer et al. 2015). Finer-scale environmental variation can further limit the number of species that can be feasibly incorporated: in a study of grass and forb coexistence under variable rainfall regimes, Hallett et al. (2019) considered four rainfall conditions, requiring estimates of eight distinct species interaction coefficients even with only two species. Isolating species interactions across environmental conditions is a high barrier in species-rich communities, even in laboratory and microcosm-based studies (Letten et al., 2018).
In the second approach, often used to interpret observational data, species are grouped into broad categories. At the most extreme, a single interaction coefficient is then calculated between the focal species and all heterospecific individuals—regardless of their identity (Clark et al., 2020a; Uriarte et al., 2004). Heterospecifics may also be grouped more finely, for example, according to their taxonomic relationship (Uriarte et al., 2004) or their origin status and life form (e.g. native versus exotic and grasses versus forbs) (Martyn et al., 2020). Alternatively, functional groups can be created by grouping species according to their traits (e.g. specific leaf area, canopy height, seed number) (Kühner & Kleyer, 2008; Uriarte et al., 2004). However, this methodological approach often necessitates a priori knowledge of the system and makes an underlying assumption that species grouped together will interact similarly with each other and with the focal species. These assumptions are often not met (Mayfield & Levine, 2010), suggesting a need for a more parsimonious and robust methodology that would allow the data to inform species groupings.
Various alternative statistical approaches have been proposed to assess species interactions using observational data. For example, joint species distribution modeling has become a common approach to infer species interactions from co-occurrence patterns (Legendre & Gauthier, 2014; Ovaskainen et al., 2019, 2017b). However, in addition to species interactions, patterns of co-occurrence may result from environmental sorting (Barner et al., 2018) or dispersal patterns (Schamp et al., 2015). Further, co-occurrence patterns are scale dependent and regional analyses are not suited to assessing local-scale species interactions (König et al., 2021). Recognizing a need to directly estimate species interaction coefficients, recent work has expanded multivariate autoregressive models for use in more diverse communities (Picoche & Barraquand, 2020), including examining which linear combinations of species abundances best predict future growth rates (Ovaskainen et al., 2017a). This approach is effective for binning species based on their competitive effects, but does not account for variation in the environment. Clark et al. (2020b) recently developed a state-space hierarchical Bayesian model to assess the effect of environmental gradients on nonlinear species abundance patterns, incorporating environment responses in species’ density-independent growth rates, but not in species interactions. Lastly, García-Callejas et al. (2020) developed a method to incorporate environment responses in species’ density-dependent growth rates but without the flexibility of Bayesian approaches. Independently, these different methodological developments each address one of the two largest hurdles for modeling species abundances in diverse communities: (1) identifying important species interactions and (2) accounting for the mediating effect of the environment (here referred to as species-environment interactions). Addressing these two aspects simultaneously would solidify a path forward for characterizing species interactions in diverse communities and across environmental gradients.
Here, we present an approach for modeling dynamics in diverse communities and across environmental gradients. The approach balances realism and complexity without extensive experimental manipulation or a priori assumptions regarding species groupings. Our method is based on two innovations to standard population and community ecology models. First, we define heterospecific species interaction coefficients as linear combinations of the average interaction strength and species-specific deviations from this average. In parallel, we allow environmental covariates to modify species intrinsic growth rates and the strength of biotic interactions—both the average and species deviation terms. We implement this approach using a Beverton-Holt model of community dynamics (Beverton & Holt, 1957) within a single growing season, although the method can easily be adapted to other models of population abundance (e.g. Mayfield & Stouffer 2017; Ricker 1954) or incorporate additional dynamics such as seed banks or dispersal (Levine & HilleRisLambers, 2009; Thompson et al., 2020). Second, we extend Bayesian statistical methods for variable selection via sparsity-inducing priors in linear models (such as Lasso and Ridge regression; Hastie et al. 2015; Piironen et al. 2017) to our non-linear abundance model, thereby reducing the number of terms included in the final model fit, yielding a ‘sparse model.’ By coupling these two modeling approaches, we can identify heterospecific species that deviate in their interaction strength, and how environmental gradients alter species’ density-independent growth rates and biotic interactions. We explore model effectiveness using simulated data and apply the model to empirical data from a highly diverse (45 species) annual plant community.
3 Methods
3.1 Deconstructing species interaction coefficients and fecundity
Models of community dynamics incorporate species-specific interaction coefficients for each species pair, commonly denoted as αi,j, the effect of species j on species i. Modeling diverse communities or environmental relationships therefore requires a large number of parameters. To reduce this number while incorporating environmental variation, we start with a partitioning approach. We first define these interaction terms as

$$\alpha_{e,i,j} = \exp\left(\bar{a}_{0,i} + \hat{a}_{0,i,j} + \left(\bar{a}_{e,i} + \hat{a}_{e,i,j}\right) X_e\right) \quad (1)$$

where αe,i,j is the effect of species j on species i in environment e with i ≠ j. In Eqn. 1, ā0,i is the effect of an average heterospecific individual on individuals of species i, â0,i,j is the deviation from this average effect associated with species j, āe,i is the average slope of species i’s interaction coefficients with environmental covariate Xe, and âe,i,j is the deviation from this slope associated with species j. At first glance, using Eqn. 1 may seem counterproductive as it increases the number of parameters compared to traditional interaction coefficients. However, in the next section we describe how coupling this approach with sparsity-inducing priors in a Bayesian context can dramatically reduce the number of required parameters by identifying only the necessary species-specific terms (â0,i,j and âe,i,j) for accurately modeling population dynamics of species i.
While intraspecific competition (αe,i,i) could in principle be modeled according to Eqn. 1, we instead define it separately as:

$$\alpha_{e,i,i} = \exp\left(a_{0,i,i} + a_{e,i,i} X_e\right) \quad (2)$$

where a0,i,i and ae,i,i are the intercept and slope for the effect of conspecific individuals. As both theoretical expectations (Chesson, 2000) and empirical results (Adler et al., 2018) point to the importance of intraspecific competition, we use Eqn. 2 to explicitly exclude the intraspecific terms from the sparsity-inducing process defined in the next section. These terms, therefore, will always be included in the final model fit.
Interaction coefficients (Eqns. 1 & 2) can be incorporated in many different models of community dynamics. We use the Beverton-Holt model due to its legacy in studies of annual plant communities and coexistence theory (e.g. Godoy & Levine 2014; Kraft et al. 2015). We emphasize, however, that our general statistical approach can be adapted to other population models. In the Beverton-Holt model, the fecundity Fe,i of a focal species i in environment e is modeled as:

$$F_{e,i} = \frac{\lambda_{e,i}}{1 + \sum_{j=1}^{S} \alpha_{e,i,j} N_{e,j}} \quad (3)$$
Fecundity depends on a species’ intrinsic growth rate (i.e. density-independent seed production; λe,i) and the competitive effects of all S species in the community (αe,i,j terms as defined by Eqns. 1 & 2) scaled by each species’ abundance (Ne,j) (Levine & HilleRisLambers, 2009; Pérez-Ramos et al., 2019; Shoemaker & Melbourne, 2016). To incorporate environmental variation in intrinsic growth rates, we model λe,i as:

$$\lambda_{e,i} = \exp\left(b_{0,i} + b_{e,i} X_e\right) \quad (4)$$

where b0,i is the intercept of the intrinsic growth rate and be,i its slope with environmental covariate Xe. We use Eqn. 3 to model observed fecundity within a single growing season of both simulated and empirical data.
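The partitioning in Eqns. 1–4 can be illustrated with a minimal Python sketch for a single focal species. It assumes a log link for the interaction coefficients and the intrinsic growth rate (an assumption suggested by the Normal(−6, 3) priors on the intercepts). All function names and numeric values are hypothetical and are not taken from the authors' code.

```python
import math

def alpha_hetero(a_bar_0, a_hat_0, a_bar_e, a_hat_e, x_e):
    """Heterospecific coefficient: generic terms plus species-specific
    deviations, on an assumed log scale."""
    return math.exp(a_bar_0 + a_hat_0 + (a_bar_e + a_hat_e) * x_e)

def alpha_intra(a_0, a_e, x_e):
    """Intraspecific coefficient, modeled separately from the generic terms."""
    return math.exp(a_0 + a_e * x_e)

def intrinsic_growth(b_0, b_e, x_e):
    """Density-independent growth rate with an environmental slope."""
    return math.exp(b_0 + b_e * x_e)

def fecundity(lam, alphas, abundances):
    """Beverton-Holt fecundity: lambda / (1 + sum_j alpha_j * N_j)."""
    competition = sum(a * n for a, n in zip(alphas, abundances))
    return lam / (1.0 + competition)

# Hypothetical parameter values for one focal species and three neighbour types.
x_e = 0.5
lam = intrinsic_growth(b_0=3.0, b_e=-0.2, x_e=x_e)
alphas = [
    alpha_intra(a_0=-4.0, a_e=0.1, x_e=x_e),   # conspecifics
    alpha_hetero(-6.0, 0.0, 0.2, 0.0, x_e),    # neighbour with the generic effect
    alpha_hetero(-6.0, 1.5, 0.2, 0.0, x_e),    # neighbour with a positive deviation
]
abundances = [5, 10, 10]
f = fecundity(lam, alphas, abundances)
```

A neighbour whose deviation terms are both zero contributes only through the generic intercept and slope, which is the case the sparsity-inducing priors favour unless the data indicate otherwise.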
3.2 Incorporating sparsity-inducing priors
By deconstructing interaction coefficients into a combination of species-specific and generic terms, we can determine which, if any, species-specific terms are necessary for the final model. Allowing only a subset of parameters to take non-zero values is referred to as ‘sparse modeling,’ and various techniques exist to induce sparsity in linear models (Hastie et al., 2015; O’Hara et al., 2009).
To extend a sparse modeling approach to our non-linear model of fecundity (Eqn. 3), we employ sparsity-inducing priors, which act to shrink all but a subset of parameters to 0, thus producing a sparsely parameterized model. Specifically, we model â0,i,j and âe,i,j, the species-specific intercepts and slopes of the interspecific interaction coefficients (Eqn. 1), with regularized horseshoe priors, which more accurately estimate large parameter values compared to other sparsity-inducing priors (Bhadra et al., 2019; Carvalho et al., 2009; Piironen et al., 2017; Van Erp et al., 2019). Parameters â0,i,j and âe,i,j are given zero-centered normal priors with standard deviations τλ̃0,j and τλ̃e,j, respectively. (Note that since we fit the model for a single focal species, we drop the i subscript from the priors for simplicity.) In these priors, τ defines the global tendency towards sparsity through its effect on the priors’ standard deviations. In other words, with smaller values of τ, the priors for all â0,i,j and âe,i,j parameters become more tightly centered on 0. Conversely, the λ̃ terms allow specific parameters to escape this global trend towards sparsity. As an individual λ̃ term becomes large, its associated prior becomes wider, and that species-specific term is more likely to be included in the final model. In the regularized horseshoe prior, these terms are defined as:

$$\tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}, \qquad \lambda_j \sim \text{half-Cauchy}(0, 1), \qquad c^2 \sim \text{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu s^2}{2}\right)$$

Here the local scale λj is distinct from the intrinsic growth rate λe,i. Defining λ̃ as the combination of a half-Cauchy and inverse-gamma distribution causes large coefficients to be shrunk towards 0 by a Student’s t distribution with ν degrees of freedom and a scale of s² (Piironen et al., 2017; Van Erp et al., 2019). Following the recommendations of Piironen and Vehtari (2017), we set ν to 4 and s² to 2. Rather than setting the global shrinkage parameter τ to a fixed value, we give it a half-Cauchy prior with scale parameter equal to 1 (τ ~ half-Cauchy(0, 1)) and allow the data to inform the posterior distribution of τ (Piironen et al., 2017; Van Erp et al., 2019).
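The behaviour of the regularized horseshoe prior can be sketched numerically. The example below follows the standard formulation in Piironen and Vehtari (2017), in which the regularized local scale is λ̃² = c²λ² / (c² + τ²λ²) with λ ~ half-Cauchy(0, 1) and c² ~ Inv-Gamma(ν/2, νs²/2). The function names are hypothetical, and this is an illustration of the prior only, not the authors' Stan implementation.

```python
import math
import random

def regularized_local_scale(lam, tau, c2):
    """Regularized local scale: sqrt(c2 * lam^2 / (c2 + tau^2 * lam^2))."""
    return math.sqrt(c2 * lam ** 2 / (c2 + tau ** 2 * lam ** 2))

def half_cauchy(rng, scale=1.0):
    """Half-Cauchy draw via the absolute value of an inverse-CDF Cauchy draw."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))

def draw_prior_sd(rng, nu=4.0, s2=2.0):
    """Draw one prior standard deviation, tau * lambda-tilde, for a
    species-specific term (nu = 4 and s2 = 2 as stated in the text)."""
    tau = half_cauchy(rng)   # global shrinkage
    lam = half_cauchy(rng)   # local shrinkage for one term
    # Inverse-gamma(nu/2, nu*s2/2) as the reciprocal of a gamma draw;
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    c2 = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * s2))
    return tau * regularized_local_scale(lam, tau, c2)

rng = random.Random(1)
prior_sds = [draw_prior_sd(rng) for _ in range(1000)]
```

For small local scales the regularized scale is essentially unchanged, so the prior stays tightly centered on 0. For very large local scales, the product τλ̃ approaches c, so even terms that escape shrinkage keep a finite prior width.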
We employ a hybrid approach in which we first fit the full model with regularized horseshoe priors to induce sparsity in the species-specific terms; we subsequently fit a final model using traditional, non-sparse methods. From the preliminary model fit, we identify which species-specific terms have sufficient evidence to be included in the final model fit. We calculate credible intervals (CIs) for each species-specific term in the preliminary model and include in the final model only those terms whose intervals do not overlap 0. By using this approach, we can directly adjust how conservative we wish to be in including model parameters, balancing model prediction, the proportion of variance explained, and simplicity depending on modeling goals (Tredennick et al., 2021); for example, using a 50% CI will lead to models including more parameters than using a 95% CI. Then, for the final model fit, the included species-specific terms (â0,i,j and âe,i,j) are given standard normal priors (i.e. Normal(0, 1)). In both preliminary and final model fits, the terms defining λe,i (b0,i and be,i; Eqn. 4) are also given standard normal priors. The intercept and slope terms defining intraspecific competition (a0,i,i and ae,i,i; Eqn. 2) and the generic intercept and slope defining interspecific competition (ā0,i and āe,i; Eqn. 1) are given weakly informative priors in each model fit, matching the expected scale of these interaction coefficients: a0,i,i ~ Normal(−6, 3), ā0,i ~ Normal(−6, 3), ae,i,i ~ Normal(0, 0.5), and āe,i ~ Normal(0, 0.5). All models were fit using the Stan language with the rstan package (version 2.18.2; Stan Development Team 2018) in R (version 3.5.3; R Core Team 2019). All code for the analyses and simulations presented here can be found at https://github.com/tpweiss06/SparseInteractions.
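The selection step between the preliminary and final fits can be sketched as follows. This minimal Python example assumes equal-tailed credible intervals computed from posterior draws with nearest-rank quantiles. The term names and the evenly spaced stand-in "draws" are hypothetical and serve only to make the example deterministic.

```python
import math

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws (nearest-rank quantiles)."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper

def select_terms(posterior, level=0.95):
    """Keep only species-specific terms whose interval does not overlap 0."""
    keep = []
    for name, draws in posterior.items():
        lower, upper = credible_interval(draws, level)
        if lower > 0.0 or upper < 0.0:
            keep.append(name)
    return keep

def grid_draws(center, half_width, n=1001):
    """Evenly spaced stand-in for posterior draws (hypothetical, deterministic)."""
    return [center - half_width + 2.0 * half_width * k / (n - 1) for k in range(n)]

posterior = {
    "a_hat_0[3]": grid_draws(1.5, 0.6),   # clearly non-zero deviation
    "a_hat_0[7]": grid_draws(0.0, 0.5),   # centered on zero
    "a_hat_e[3]": grid_draws(0.3, 0.5),   # weak evidence for a deviation
}
strict = select_terms(posterior, level=0.95)
lenient = select_terms(posterior, level=0.50)
```

With these stand-in draws, the 95% interval retains only the first term, while the 50% interval also retains the weakly supported slope deviation, mirroring the trade-off between conservatism and model size described above.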
3.3 Simulation tests of model performance
To test our ability to predict changes in population size and recover true parameter values, we first paired our Bayesian sparse modeling approach with simulated Beverton-Holt data using Eqns. 1–4. For the simulations, we generated communities of 15 species in different plots, where each plot was a unique run of the simulation for a given community with a given environmental condition Xe. We aimed to generate population growth rates comparable to those found in a community adapted to its environment. Each species was assigned an intrinsic growth rate λe,i following Eqn. 4 (Table 1). Pairwise species competitive interactions αe,i,j were composed of the generic competition term ā0,i with small amounts of variation and a generic environmental response āe,i. Seven randomly selected species also had a non-generic competition term through species-specific deviations from ā0,i (â0,i,j). Seven separately selected species had a non-generic environmental response through species-specific deviations âe,i,j. Intraspecific competition a0,i,i was set as a fixed value higher than interspecific competition to minimize extinction in the simulations (Table 1). Each plot simulation was run deterministically for 20 time steps, with each time step computed as Nt+1 = FtNt using Ft from Eqn. 3. This resulted in some subset of the 15 species remaining with populations greater than zero in each plot. Then each population was perturbed by drawing from a normal distribution with mean and standard deviation equal to the previous population size, truncated at 0 to prevent negative population sizes. This perturbed state and the following time step generated our simulated ‘full-community’ data. In addition to 500 full-community plots, we simulated 500 ‘no-competition’ treatments with a single phytometer individual of the focal species per plot, running the Beverton-Holt function for one time step. This simulated treatment matches methods commonly used in experimental studies to parse intrinsic growth rates from competition parameters (Hallett et al., 2019; Wainwright et al., 2019). Simulation details are included in Supplement 1.
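The structure of a single full-community plot simulation can be sketched in Python as below. This is a simplified illustration under stated assumptions: environmental dependence and species-specific deviations are omitted, truncation at 0 is implemented by rejection sampling, and all parameter values and function names are hypothetical. The actual settings are those in Table 1 and Supplement 1.

```python
import math
import random

def bh_step(n, lam, alpha):
    """One Beverton-Holt time step, N_{t+1} = F_t * N_t, for every species."""
    new = []
    for i in range(len(n)):
        competition = sum(alpha[i][j] * n[j] for j in range(len(n)))
        new.append(lam[i] / (1.0 + competition) * n[i])
    return new

def perturb(n, rng):
    """Redraw each abundance from Normal(mean=N, sd=N), truncated at 0
    (assumed here to mean rejecting negative draws)."""
    out = []
    for x in n:
        if x <= 0.0:
            out.append(0.0)
            continue
        draw = rng.gauss(x, x)
        while draw < 0.0:
            draw = rng.gauss(x, x)
        out.append(draw)
    return out

def simulate_plot(rng, n_species=15, steps=20):
    """Run to a quasi-equilibrium, perturb, then record one further step."""
    lam = [math.exp(rng.gauss(2.0, 0.3)) for _ in range(n_species)]
    # Intraspecific competition fixed above the generic interspecific level.
    alpha = [[math.exp(-3.0) if i == j else math.exp(-4.0 + rng.gauss(0.0, 0.1))
              for j in range(n_species)] for i in range(n_species)]
    n = [10.0] * n_species
    for _ in range(steps):
        n = bh_step(n, lam, alpha)
    start = perturb(n, rng)            # perturbed state
    end = bh_step(start, lam, alpha)   # following time step
    return start, end

rng = random.Random(42)
start, end = simulate_plot(rng)
```

A no-competition plot corresponds to calling `bh_step` once with a single focal individual and no neighbours, so that the observed fecundity informs the intrinsic growth rate directly.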
We used these simulations to measure our sparse modeling approach’s ability to predict population growth in diverse communities and recover underlying parameters. We selected one focal species and tested our model’s performance using varying numbers of full-community and no-competition plots. We tested out-of-sample predictions on 200 full-community plots not used to fit the model. We then calculated the posterior distribution of the root-mean-square error (RMSE) of model predictions compared to true values for each model fit. This allowed us to quantify the gain in predictive accuracy resulting from including more data.
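The posterior distribution of the RMSE can be obtained by computing one RMSE value per posterior draw. The short Python sketch below assumes that predictions are stored as one list per draw, with one predicted value per held-out plot. The names and toy numbers are hypothetical.

```python
import math

def rmse(predicted, truth):
    """Root-mean-square error between predicted and true values."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, truth)]
    return math.sqrt(sum(squared) / len(truth))

def rmse_posterior(pred_draws, truth):
    """One RMSE per posterior draw; pred_draws[d][k] is the prediction
    for held-out plot k under posterior draw d."""
    return [rmse(draw, truth) for draw in pred_draws]

# Toy example: three posterior draws, three held-out plots.
truth = [1.0, 2.0, 3.0]
pred_draws = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [1.0, 2.0, 5.0],
]
errors = rmse_posterior(pred_draws, truth)
```

Summarizing `errors` with a mean and credible interval gives values of the kind reported in the Results for each sample size.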
3.4 Empirical application
We additionally applied our model to species interactions and their environmental dependencies in the annual plant understory of the York gum (Eucalyptus loxophleba Benth) - jam (Acacia acuminata) woodlands of southwestern Western Australia. This community is highly diverse and heterogeneous, with local composition of annual forbs and grasses influenced by gradients in soil nutrients and shade from York gum and jam trees (Dwyer et al., 2015; Lai et al., 2015). We focused on two York gum-jam woodland remnants: West Perenjori Nature Reserve (29°47’S, 116°20’E) and Bendering Nature Reserve (32°23’S, 118°22’E). Both sites experience a Mediterranean climate with mild winters and long, dry summers (Suppiah et al., 2007) and have high overlap in annual species composition, sharing several dominant species. Data used for this study were originally collected as part of a larger experiment described in full in Wainwright et al. (2019). We focus on two species used as focal species in the original study and common to both reserves: Waitzia acuminata, an abundant native annual forb, and Arctotheca calendula, a prevalent exotic annual forb.
We used data from 11 experimental blocks in Bendering Nature Reserve and 18 blocks in West Perenjori Nature Reserve. Each block was ≈ 15 × 15 m, a size selected to account for previously identified soil-nutrient turn-over rates (Dwyer et al., 2015). Each block was split into 50 × 50 cm plots and each plot was further subdivided into four 25 × 25 cm quadrats. One individual of either focal species near the center of each quadrat was assigned as the focal individual for that quadrat. Which focal species were in a given quadrat depended on the natural distribution of individuals. This experiment employed five thinning treatments at the plot level to manipulate local community compositions (individual focal individuals with no competitors, native dominated competitors, exotic dominated competitors, monocultures with only conspecific competitors, and unmanipulated plots) (Wainwright et al., 2019). This ensured a range of observed densities of both species and the background communities to inform model estimates of competition coefficients and intrinsic growth rates. Across both reserves we used data from 129 focal individuals in 69 plots interacting with 45 neighbouring species for W. acuminata and 95 focal individuals in 54 plots interacting with 40 species for A. calendula.
We applied our sparse modeling framework to quantify the effect of the competitive environment on fecundity in W. acuminata and A. calendula under different environmental conditions. Fecundity Fe,i was measured as the number of flowers produced by each focal individual. The competitive environment was characterized as the number of individuals of each interacting species in the quadrat after the experimental treatment had been applied (Ne,j). We considered two aspects of the physical environment Xe: percent overhead tree canopy cover, measured at the plot scale, and soil Colwell P (mg/kg), measured at the block scale. Both environmental covariates were standardized for inclusion in the model. We ran a separate model for each focal species and environmental covariate, for a total of four model fits. To account for regional differences between the Bendering and Perenjori reserves, we incorporated a fixed effect for the two different reserves into our sparse modeling approach by allowing the intrinsic growth rate terms (λi and λe,i) and the interaction-coefficient terms to differ between reserves. Using this approach, we quantified λe,i and αe,i,j for both species across both environmental gradients in the York gum-jam woodland communities.
4 Results
4.1 Simulations
Our model accurately predicted growth rates for simulated communities even with relatively low sample sizes (Fig. 1) and across different model formalizations (Box 1). With only 10 full-community and 10 no-competition plots, the model predicted growth rates with a root-mean-square error (RMSE) of 0.495 (credible interval, CI: 0.353-0.665). While increasing sample size further increased model accuracy (RMSE of 0.315 (CI: 0.211-0.520) for 50 plots and 0.227 (CI 0.195-0.288) for 200 plots), these results indicate the model can accurately predict species’ realized growth rates using limited data. Furthermore, species’ growth rates can be accurately predicted using observed competitive communities paired with no-competition plots, rather than necessitating common manipulative experimental designs where each possible species combination is paired across a gradient of densities (Hallett et al., 2019; Kraft et al., 2015).
Box 1: Adapting the sparse modeling method to different ecological questions
This sparse modeling method is generalizable to a variety of underlying ecological models. The method’s flexibility allows researchers to pick and choose which parameters to include and how to specify them as best fits with their study system and questions of interest.
For example, the relationship between species’ growth rates and the environment can be modeled in multiple ways. A monotonic relationship would be appropriate for a study concentrated within a small spatial scale, while a hump-shaped relationship would match expectations for a study over a broad environmental gradient. To demonstrate how our method can be modified for different underlying ecological models, we simulated environmental responses in growth rate two ways: with a monotonic relationship between species’ intrinsic growth rates and the environmental conditions (λe,i), and with a curved environmental optimum with a defined niche breadth for each species. This resulted in two model formulations:

$$\lambda_{e,i} = \exp\left(b_{0,i} + b_{e,i} X_e\right)$$

in which b0,i is the mean intrinsic growth rate and be,i is the slope of the environmental response, and

$$\lambda_{e,i} = b_{max,i} \exp\left(-\left(\frac{z_i - X_e}{2\sigma_i}\right)^2\right)$$

in which bmax,i is the maximum intrinsic growth rate, zi is the environmental optimum, and σi is the environmental niche breadth (following the parameterization in Thompson et al. (2020)). We tested these models using samples of 50 full-community plots and 50 no-competition plots.
All growth rate parameters fell within the 95% credible intervals for parameters in both models. In the monotonic λe,i model, both the intercept b0,i and the slope be,i deviated from the true values by 3%. In the optimum model, the maximum bmax,i deviated from the true value by 6%, the niche breadth σi deviated by 1%, and the location of the environmental optimum zi deviated by 13%, which is an absolute difference of 0.04 (Fig. 2a and b).
As a further example, the species interaction components of the model can be adjusted depending on the main research questions of interest. For questions focused on species interactions, modeling interaction coefficients independent of environmental conditions would optimize the number of non-generic species pairs identified from a given sample size of data. We compared a simple simulation with competitive interactions independent of environmental conditions αi,j to a more complex model with competitive interactions dependent on the environmental conditions αe,i,j, and modeled each accordingly. We tested these models using different sample sizes of full-community data points and focal individuals. Both models predicted true out-of-sample growth rates with average RMSE of ≈ 0.4 with only 10 full-community and no-competition plots. With 80 or more full-community and no-competition plots, the simpler model had an RMSE of ≈ 0.2 and the model of the more complex simulation had an RMSE of ≈ 0.25 (Figure 2c). The simpler model without species-environment interactions highlighted five non-generic â0,i,j species pair interactions with just 50 full-community and no-competition plots. The more complex model highlighted only two non-generic terms with 50 full-community and no-competition plots. At higher sample sizes, the number of non-generic terms was constrained by the global shrinkage parameter τ (Figure 2d).
We present these options as launching-off points for researchers to adapt the sparse modeling approach to their study systems and questions. Even more extensive modifications are possible; for example, replacing the Beverton-Holt community framework with a different underlying ecological model.
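The two growth-rate formulations discussed in Box 1 can be written as interchangeable functions. The Python sketch below assumes an exponential (log-link) form for the monotonic response and a Gaussian-shaped optimum of the kind used by Thompson et al. (2020). Both forms, along with the function names and values, are illustrative assumptions rather than the authors' code.

```python
import math

def lambda_monotonic(b_0, b_e, x_e):
    """Monotonic response: growth rate changes steadily along the gradient."""
    return math.exp(b_0 + b_e * x_e)

def lambda_optimum(b_max, z, sigma, x_e):
    """Hump-shaped response: maximum b_max at optimum z, niche breadth sigma."""
    return b_max * math.exp(-((z - x_e) / (2.0 * sigma)) ** 2)

def fecundity(lam, alphas, abundances):
    """Beverton-Holt fecundity; either growth-rate form can supply lam."""
    return lam / (1.0 + sum(a * n for a, n in zip(alphas, abundances)))

# Hypothetical comparison at one environmental value.
x_e = 0.1
f_mono = fecundity(lambda_monotonic(2.0, 0.5, x_e), [0.01, 0.002], [5, 20])
f_opt = fecundity(lambda_optimum(10.0, 0.3, 0.5, x_e), [0.01, 0.002], [5, 20])
```

Because only the expression for the intrinsic growth rate changes, the interaction terms and their sparsity-inducing priors can be left untouched when switching between the two formulations.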
Our model was also able to accurately predict individual parameter estimates for simulated communities (Fig. 1). In particular, estimates of the intercept and slope parameters for intrinsic growth rate (b0, i and be, i respectively) dramatically increased in accuracy from 10 to 200 data points (Fig. 1a,e). The accuracy of the estimates for the slope and intercept of intraspecific competition (a0,i,i and ae,i,i respectively) also increased with more data, but less dramatically than the terms defining intrinsic growth rate. Parameters associated with interspecific competition (ā0,i representing the intercept and āe,i for the slope) also increased in accuracy with increasing data, although there was more variance in this relationship. This is likely because the model correctly identified a larger number of species-specific terms with more data, which decreased the total number of species contributing to the estimation of ā0,i and āe,i. When fit to only 10 simulated plots, the model did not identify any species-specific terms (â0,i,j or âe,i,j) and only used ā0,i and āe,i. The model identified two species-specific terms within a single species when fit to 50 plots and eight species-specific terms across six species when fit to 200 plots. In general, the estimates of species-specific terms were highly accurate; only two out of the eight estimated species-specific interaction terms did not include the true value in their 95% credible intervals.
4.2 Empirical application
Our method identified environmental dependencies in intrinsic growth rates (Figs. 3a,b and 4a,b), relative strengths of intraspecific competition and average interspecific competition, along with competition-environment interactions (Figs. 3c,d and 4c,d), all of which differed between our two focal species. Additionally, our model highlighted three species with deviations from the average interspecific effects on native W. acuminata, but no such species when fit to data on exotic A. calendula.
W. acuminata and A. calendula’s intrinsic growth rates differed in their relationship with the environmental gradients and reserves. The intrinsic growth rate of W. acuminata across both environmental gradients varied between the Bendering and Perenjori reserves (Fig. 3a,b). In contrast, λe,i for A. calendula was quite similar between the two reserves as it varied with both phosphorus and canopy cover (Fig. 4a,b). This could reflect local adaptation in regional populations of the native W. acuminata but not in the newly introduced A. calendula. Importantly, the intrinsic growth rate of W. acuminata declined with high phosphorus (marginally in Bendering, but substantially in Perenjori) while A. calendula’s intrinsic growth rate increased with phosphorus, potentially explaining the high prevalence of invasive species in areas with increased phosphorus (Dwyer et al., 2015).
Relative effects of competition between conspecifics versus heterospecifics also differed between the two focal species. For W. acuminata, the relationship between intraspecific competition and average interspecific competition varied with the underlying environmental gradients. At low levels of phosphorus and high levels of canopy cover, intraspecific competition in W. acuminata was greater than average interspecific competition (Fig. 3c,d). However, at high levels of phosphorus and low levels of canopy cover, intra- and interspecific competition converged to similar values. On the other hand, intraspecific competition for A. calendula was similar to or lower than generic interspecific competition across both environmental gradients (Fig. 4c,d). This likely contributes to the invasive status of A. calendula in this ecosystem, whereas W. acuminata populations self-regulate under certain environmental conditions, a necessary component of stable coexistence.
Our model highlighted multiple species with competitive effects on W. acuminata that differed from the generic interaction term. Across the observed gradient in phosphorus, Hyalosperma glutinosum had a higher than average effect on W. acuminata in the Perenjori reserve while Schoenus nanus had a lower than average effect in the Bendering reserve (Fig. 3c). Across the observed gradient in canopy cover, Hypochaeris glabra had a much higher than average effect on W. acuminata in Bendering (Fig. 3d). In contrast, all heterospecific interactive effects on A. calendula remained grouped in the generic competition term. The lack of species with unique effects on A. calendula (Fig. 4c,d) could be due to its exotic status (Lai et al., 2015). With no shared evolutionary history with any other community members, A. calendula could be experiencing a form of competitive release, wherein the identity of competitor species matters less than simply the presence of additional individuals.
5 Discussion
Given the inherent complexity of ecological communities, ecologists are often forced to rely on simplifying assumptions in order to perform tractable analyses, such as limiting the number of species considered or ignoring environmental variation. The sparse modeling approach presented here provides an alternative method to analyze community data without requiring extensive additional data or sacrificing complexity. This approach enabled us to accurately predict population growth rates with limited data and identify how species’ demographic rates and competitive interactions depend on the environment. Our results identify environment-by-species interactions that deviate from the species-averaged community effects without making a priori assumptions about species groupings (Fig. 3c,d). This information and the output from the sparse modeling approach generate concrete, testable hypotheses about species interactions and environmental conditions. We see broad potential for this method’s implementation in community ecology, from theory development to management applications.
The flexibility of the sparse modeling approach in modeling populations and communities allows easy adjustments to best match the underlying model structure to the given study system and research questions. As we show in Box 1 and Fig. 2, these models can successfully be applied to different forms of species-environment interactions and be modified to be more or less complex based on underlying ecological questions and data availability. For example, the functional form of the relationship between intrinsic growth rate and the environment likely depends on a study’s spatial scale. For localized studies, a simple monotonic relationship (Fig. 2a) might be appropriate to capture species’ expected responses across a small range of environmental variation. However, studies over larger spatial scales might require a functional form with optimal intrinsic growth reached at an intermediate environmental value and declining away from that value (Fig. 2b), mimicking expected patterns of adaptation across species’ ranges (Angert et al., 2020). Additionally, while we used a Beverton-Holt framework in our examples (Beverton & Holt, 1957), the sparse approach is agnostic to the underlying ecological model. Thus, it could be used with different functional forms of competition (García-Callejas et al., 2020), with models incorporating both competitive and facilitative interactions (Stachowicz, 2001), or with different underlying demography such as seed banks.
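The two functional forms and the Beverton-Holt structure discussed above can be illustrated with a minimal Python sketch. It is not the model code used in this study: the function names, the log-linear and Gaussian-shaped forms, and the way species-specific deviations multiply a generic competition coefficient are all assumptions chosen for illustration.

```python
import math

def intrinsic_growth_monotonic(env, log_lambda0, slope):
    """Monotonic response: log intrinsic growth changes linearly with the environment (assumed form)."""
    return math.exp(log_lambda0 + slope * env)

def intrinsic_growth_optimum(env, lambda_max, env_opt, width):
    """Unimodal response: growth peaks at env_opt and declines away from it (assumed Gaussian shape)."""
    return lambda_max * math.exp(-((env - env_opt) / width) ** 2)

def beverton_holt_fecundity(lam, alpha_intra, n_con, alpha_generic, deviations, neighbors):
    """Beverton-Holt style fecundity: lam / (1 + total competition).

    neighbors:  dict mapping heterospecific name -> abundance
    deviations: dict mapping name -> log-scale deviation from the generic
                coefficient; species absent from the dict use the generic term.
    """
    competition = alpha_intra * n_con
    for species, abundance in neighbors.items():
        alpha = alpha_generic * math.exp(deviations.get(species, 0.0))
        competition += alpha * abundance
    return lam / (1.0 + competition)

# Hypothetical values: one neighbor retains a species-specific deviation,
# the other is absorbed into the generic heterospecific term.
lam = intrinsic_growth_monotonic(env=0.5, log_lambda0=2.0, slope=-0.4)
fecundity = beverton_holt_fecundity(
    lam, alpha_intra=0.1, n_con=5, alpha_generic=0.05,
    deviations={"species_a": 0.8},
    neighbors={"species_a": 4, "species_b": 6},
)
```

Swapping `intrinsic_growth_monotonic` for `intrinsic_growth_optimum` changes only how `lam` is computed, which is the sense in which the competition structure is independent of the chosen environmental response.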
With this flexibility, sparse modeling has the potential to be a powerful tool to accelerate the development of community ecology theory and practice. It can provide important insights into the covariation of environmental conditions, species’ demographic rates, and competitive effects, all of which are critical aspects of modern coexistence theory (Chesson, 2000). This includes quantifying the relative strengths of intra- versus interspecific competition, which is a key condition for stable coexistence (Adler et al., 2018; Chesson, 2000). Furthermore, the approach elucidates the effect of environmental conditions on species’ density-independent growth rates versus competitive interactions, potentially allowing for quantification of variation-dependent coexistence mechanisms, such as the storage effect, in diverse communities (Chesson, 2000). Similarly, output from our sparse modeling approach across environmental gradients can be used to quantify the relative importance of environmental (abiotic) filtering, biotic interactions, and their joint effect on species occurrence (Cadotte & Tucker, 2017). Applying such an approach is especially exciting for linking community theory to global change predictions, depending on the underlying environmental gradient of interest.
In addition to expanding theory, we see exciting potential for sparse modeling to address questions in applied contexts and generate new hypotheses from existing datasets that inform management strategies. This includes quantifying how environmental modifications can be used in conjunction with community manipulations to control invasive species or promote native species. For example, our results from the York gum-jam woodlands of Western Australia suggest the native W. acuminata experiences declining fitness with increasing levels of phosphorus, particularly in the Perenjori reserve (Fig. 3a). At the same time, the model identified H. glutinosum as having a stronger than average competitive impact on W. acuminata in Perenjori (Fig. 3c). Taken together, these results suggest that reserve managers could help maintain or expand populations of W. acuminata by mitigating phosphorus runoff while simultaneously removing H. glutinosum in key locations. In contrast, our results for the invasive A. calendula suggest that neighbor species identity is unimportant (Fig. 4c,d) and management strategies focusing solely on environmental factors would be most impactful.
Beyond the implementation of the sparse modeling approach presented here, the underlying model structure can be further adjusted to align the model with focal management questions. For example, if a management goal only requires knowledge of species interactions within a community, the model could be simplified to remove environmental covariates (Box 1). Alternatively, the global shrinkage parameter τ could be set at a fixed value to induce more or less sparsity in the final model results. Such a change could allow users to manually explore the trade-off between inclusion of species-specific terms and precision of parameter estimates, finding the balance that best suits their particular goals. For example, fixing τ to a higher value would yield more estimates of species-specific parameters, which could help to inform future research priorities, but those estimates would likely be less precise, limiting their utility in predicting community dynamics. Small adjustments such as these empower ecologists and managers to match the tool to their questions and aims.
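The effect of fixing τ can be sketched numerically. The example below assumes a horseshoe-type prior in which each species-specific term has a half-Cauchy local scale and a shrinkage factor κ = 1 / (1 + τ²λ²); this simplified form treats the data-dependent scaling as 1 and is an illustrative assumption, not the prior specification used in our model code. All names and the 0.5 cutoff are hypothetical.

```python
import math
import random

def half_cauchy(rng):
    """Draw from a standard half-Cauchy via the inverse-CDF of a Cauchy."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))

def shrinkage_factor(tau, local_scale):
    """kappa near 1: term shrunk toward the generic effect; near 0: term retained.

    Simplified horseshoe shrinkage factor (data scaling assumed equal to 1).
    """
    return 1.0 / (1.0 + (tau * local_scale) ** 2)

def count_retained(tau, local_scales, threshold=0.5):
    """Count terms whose shrinkage factor falls below an arbitrary cutoff."""
    return sum(1 for s in local_scales if shrinkage_factor(tau, s) < threshold)

# The same local scales are reused so that only tau differs between runs.
rng = random.Random(1)
local_scales = [half_cauchy(rng) for _ in range(200)]
retained_small_tau = count_retained(0.05, local_scales)
retained_large_tau = count_retained(1.0, local_scales)
```

Because κ decreases as τ increases for any fixed local scale, `retained_large_tau` can never be smaller than `retained_small_tau`, which mirrors the statement that a higher fixed τ yields more species-specific estimates.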
In its current structure, the sparse modeling framework is most useful when applied to high-diversity communities with limited available data. As we observed when analyzing simulated data, the number of non-generic terms does not necessarily increase with sample size, and is limited by τ at higher sample sizes (Fig. 2d). As described above, there may be cases where manual adjustments to τ would be beneficial depending on the available data and questions of interest. However, a traditional, non-sparse model in which every interaction term is included may still be preferable in situations with abundant data, lower-diversity communities, or when answering questions requiring individual estimates of all potential species interactions. In contrast, the sparse approach is particularly helpful with limited data and in cases where traditional models often struggle to converge or provide overly broad parameter estimates.
The model we present here currently applies to a single growing season and is designed for use with traditional population dynamic models (e.g. Beverton-Holt or Lotka-Volterra models). Given the importance of temporal stochasticity to community dynamics (Shoemaker et al., 2020) and the need to predict community responses to changing anthropogenic pressures (Ma et al., 2017), sparse modeling with time series could provide invaluable insight into the importance of species-specific interactions through time as well as space. Further, extending the approach to a wider range of input data beyond individual counts (e.g. percent cover or biomass) would allow for future uses across observational datasets and especially for perennial-dominated systems.
Sparse modeling approaches have proved immensely valuable in fields as diverse as genomics (Gianola & Fernando, 2020) and economics (Fan et al., 2011). By dramatically reducing the parameter load required to model diverse communities across environmental gradients, we show these sparse modeling approaches can provide both theoretical and applied insights in community ecology as well. We demonstrate the flexibility of this approach across different ecological models and underlying biological assumptions, and are excited to see it expanded and applied to a variety of ecological questions and applications. Although the implementation of the sparse method requires an initial conceptual investment, the output results are easily interpretable, a quality that is particularly important for linking models to practice. The sparse modeling approach eliminates the need for a priori assumptions regarding species’ groupings or the exclusion of all but a handful of focal species, providing a critical method and step forward in expanding ecological theory and linking models to observational and experimental datasets of diverse communities.
6 Acknowledgements
This paper is a joint effort of the working group sToration kindly supported by sDiv, the Synthesis Centre of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, funded by the German Research Foundation (FZT 118, 02548816). CWL, CMW, and LGS were supported by Modelscapes, NSF award #EPS-2019528. OG was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) and by the European Social Fund through the Ramón y Cajal Program (RYC-2017-23666). GB was supported by the Swedish Research Council (Vetenskapsrådet), grant 2017-05245. MMM was supported by the Australian Research Council (DP140100574).
Footnotes
Data accessibility statement: Upon acceptance, all data will be archived on Dryad and the data DOI will be included at the end of the article. Model code is available on GitHub, with the URL included in the manuscript. Upon acceptance, model code will be archived on Zenodo and the URL will be updated with the Zenodo link.