Copulas reveal complex and informative dependencies propagating throughout ecology

Abstract All branches of ecology study relationships among environmental and biological variables. However, ubiquitously used approaches to studying such relationships, based on correlation and regression, provide only a small slice of the complex information contained in the relationships. Other statistical approaches exist that provide a complete description of relationships between variables, based on the concept of the copula ; they are applied in finance, neuroscience and other fields, but not in ecology. We here explore the concepts that underpin copulas and examine the potential for those concepts to improve our understanding of ecology. We find that informative copula structure in dependencies between variables is exceedingly common across all the environmental, species-trait, phenological, population, community, and ecosystem functioning datasets we considered. Many datasets exhibited asymmetric tail dependence, whereby two variables are more strongly related in their left compared to right tails, or vice versa. We describe mechanisms by which observed copula structure and tail dependence can arise in ecological data, including a Moran-like effect whereby dependence structures between environmental variables are inherited by ecological variables; and asymmetric or nonlinear influences of environments on ecological variables, such as under Liebig’s law of the minimum. Copula structure and tail dependence can also reveal causal relationships between variables. We describe consequences of copula structure, including impacts on extinction risk and the stability through time of ecosystem services. By documenting the importance of a complete description of dependence, advancing conceptual frameworks, and demonstrating a powerful approach, we aim to elevate copula reasoning to widespread use in ecology.


46
All branches of ecology study relationships among environmental and biological variables. However, commonly used correlation and regression approaches to studying such relationships are limited, and provide only a partial description i (i = 1, 2) at time t. Assume the dynamics E i (t + 1) = bE i (t) + g( i (t)) + aδ i (t), where the δ i (t) are standard-normally 294 distributed and independent across time and locations, a = 0.2, b = 0.1, −0.1, 0.5, −0.5 in different simulations, and 295 the ( 1 (t), 2 (t)) were drawn, independently through time, from a bivariate normal distribution with var( i ) = 1 and 296 cov( 1 , 2 ) = 0.8. The function g describes how the environment i (t) influences E i (t). We used g equal to g 1 or g 2 in 297 different simulations: g 1 ( ) equals if < 0 and 0 otherwise; g 2 ( ) equals if > 0 and 0 otherwise. Thus g 1 represents 298 environmental effects that occur only for negative values and g 2 only for positive values of . Negative values of b correspond 299 to overcompensating dynamics and positive values to undercompensating dynamics; larger |b| means slower return to 300 equilibrium after a disturbance. Because ( 1 , 2 ) and (δ 1 , δ 2 ) have N copula structure, the Moran mechanism analyzed 301 below (Fig. 2, B) does not operate here. 302 For each b and g i , we simulated the model for 25000 time steps and retained the E i (t) for the final 2500 time steps. We 303 applied our model selection algorithms and non-parametric statistics (methods for the observational hypothesis) to these 304 outputs to discover if the model could produce non-N copula structure and asymmetric tail dependence in the E i (t).

305
Hypothesis B is an extension of the well known Moran effect, and was summarized conceptually in the Introduction. Let 306 E i (t) again be an ecological variable, i = 1, 2. We initially assume the AR(1) dynamics E i (t + 1) = βE i (t) + 1 − β 2 i (t), 307 with β = 0.5. The environmental noises i (t) are standard-normal random variables that are independent for distinct 308 times, t, but exhibited different kinds of dependence across locations in different simulations (see below). The variable E is 309 very general, and could represent deviations of a population density from a carrying capacity, deviations of total plant 310 community biomass from an average value, flux of a biogeochemical variable such as methane, or other quantities.

311
To explore the importance of nonlinear dynamics for hypothesis B, we also considered nonlinear population models 312 based on P i (t + 1) = P i (t) exp (r (1 − P i (t)/K)) + σ i (t)), using r = 0.5, K = 100, and σ = 0.1 or σ = 1. When σ = 0.1, 313 population dynamics stay close to the carrying capacity, K, and the nonlinearities of the model have limited influence on 314 dynamics. When σ = 1, model dynamics are strongly nonlinear. We refer to these as the weak-noise and strong-noise cases, 315 though the importance of the noise here is that, when it is strong, it brings the nonlinearities of the model into play. 316 For each of the model setups above, for each of several copula families, for each τ = 0.1, 0.2, . . . , 0.9, and for each of 50 317 replicate simulations, we generated 5000 noise pairs ( 1 (t), 2 (t)) from the bivariate random variable with the given copula 318 family and the given τ . We then used this noise to drive the model, and retained both the noise and population values 319 for the final 500 time steps. For each simulation, the following statistics were computed for both noises and populations: 320 Pearson, Spearman and Kendall correlations, cor l , cor u , P l , P u , D 2 l , D 2 u , cor l − cor u , P l − P u , and D 2 u − D 2 l . Values were typical ecological dynamics, as long as noise is small enough that dynamics stay relatively close to the model equilibrium.
Results that hold for "weak noise" in this sense are common in ecology. We repeated the analysis with r = 1.3, with similar 357 results. The deterministic one-habitat-patch Ricker model exhibits a monotonic approach to a stable equilibrium at K 358 when r < 1 (undercompensating dynamics, e.g., the value r = 0.5 used initially), but exhibits an oscillatory approach to K 359 when r > 1 (overcompensating dynamics, e.g., r = 1.3).

360
For strong noise and using our nonlinear model, copula statistics, generally speaking, were approximately similar 361 between noise and model outputs; however, similarity was reduced relative to previous results, and for a few simulations, 362 tail dependence asymmetry statistics even had opposite signs for noise and model outputs . For instance, 363 using a C copula with a large Kendall τ , cor l − cor u was slightly positive for noise, but slightly negative for population 364 model outputs (Fig. S43); and likewise for SG and SJ copulas (Figs S49,S53). Overall, similarities between noise and 365 model-output copula structure were more noticable than discrepancies, in spite of nonlinearities. We repeated the analysis 366 with r = 1.3. Discrepancies between noise and population copula structure in this case were often glaring. Apparently noise 367 of standard deviation 1 interacted strongly with model nonlinearities when the model was in its overcompensatory regime.

368
For our model, hypothesis D ( Fig. 2) was correct that asymmetric tail dependence in evolutionary changes can produce 369 similarly asymmetric tail dependence in character distributions across phylogeny tips (Appendix S9, Fig. S54 for details).

370
This topic is revisited in the Discussion.

371
Background, concepts and methods for testing the consequential hypotheses 372 The hypothesis was presented that the distribution (through time) of a spatially averaged quantity should be influenced 373 by dependencies between the local quantities, including their copula structure and tail dependence (Fig. 2, X). To refine 374 this hypothesis, suppose an ecological variable E i (t) is measured at locations i = 1, . . . , N and times t = 1, . . . , T , and the 375 spatial mean i E i (t)/N is of interest. The E i (t) could be, for instance, local abundances of a pest or exploited species, or 376 local fluxes of a greenhouse gas. If E i and E j are right-tail dependent for most location pairs i and j, then exceptionally 377 large values tend to occur at the same time in most locations. We hypothesize that this will tend to increase the skewness 378 of the distribution of the spatial mean. Similarly, left-tail dependence between local variables should tend to decrease 379 skewness. Strong positive skewness of the spatial mean corresponds to "spikiness" of the spatial-mean time series. The 380 spatial-mean time series and its skewness may be quantities of principal importance for pest or resource abundance, for 381 which extreme values (spikes) in the spatial mean have large effects.
randomizing the empirical data in a special way to have the copula structure of a multivariate-normal distribution, but to retain exactly the same distributions of values for each sampling location as the original data (Appendix S10). Surrogates 387 also had very similar Spearman correlations between pairs of sampling locations as the data. Our comparisons therefore 388 tested the null hypothesis that the skewness of the spatial mean took values on the empirical data no different from 389 what one would expect if the copula structure of the data were multivariate N, but the data were otherwise statistically 390 unchanged. Significant differences indicate that non-N copula structure in the data contributed to the skewness of the 391 spatial mean, i.e., to its "spikiness".

392
For green spruce aphid abundance data, C. furca abundance data, and methane-flux data, because these datasets 393 exhibited lower-tail dependence (  (Table 3), we did the analogous one-tailed test in the right tail. The test examines whether upper-tail dependence caused 398 the spatial average to have significantly higher skewness than expected without tail dependence. 399 We also examined hypothesis Y (Fig. 2), i.e., that spatial tail dependence of an environmental variable can influence 400 the extinction risk of a metapopulation. We hypothesized that environmental noises exhibiting greater left-tail dependence 401 across different habitat patches would cause higher metapopulation extinction risks because then very bad years for the 402 component populations would occur simultaneously in many patches, reducing rescue effects. Here we assume, for simplicity, 403 that low values of the environmental variable are "bad" for the populations and high values are "good". We tested the 404 reasonableness of Y using a metapopulation extension of the Lewontin-Cohen model, P (t + 1) = Dλ(t) P (t), where the 405 i th component of the length-N vector P (t) represents population density in the i th habitat patch at time t. The N × N quantities. These first two parts of A could be tested for numerous datasets by examining copulas for environmental and ecological variables. See below discussion about the green spruce aphid, and Appendix S14, which illustrate how to do this.

445
Causal mechanism C, which we also did not test, could be tested by analyzing copulas of abundances of competing species, 446 sampled across space or time.

447
There are also reasons to expect our Moran mechanism (Fig. 2 invest the resources not invested in maturing quickly into either fecundity or juvenile survival. These ideas suggest that 478 copulas will be useful for studying multi-dimensional trade-offs. But applications will require careful attention to the 479 possible consequences of biased sampling: if the degree of completeness of a dataset is associated with one or more of the 480 characters, then statistical artefacts can bias conclusions (Appendix S15, Fig. S57).

481
Other mechanisms probably also operate, and should be considered. For instance, measurement error may modify copula 482 structure. Our models were intentionally simple in other respects, too, and did not include delayed density dependence, 483 dispersal, population stage structure, trophic interactions, etc. We hope by enumerating a few potential mechanisms of 484 copula structure, we inspire additional research on the potentially numerous mechanisms that may operate in diverse 485 datasets.

486
Our hypotheses cover propagation of copula structure from causes, to data, to consequences. We here take a closer look 487 at the green spruce aphid, which simultaneously illustrates all these stages. Green spruce aphid abundance, as measured in 6). For each of our 10 sampling locations, we therefore examined (Appendix S14, Fig. S58, Table S49) the copula of winter 490 temperature and aphid abundance time series for the location, finding left-tail dependence: apparently winter temperature 491 has an asymmetric influence on abundance in that cold winters produce low abundances but warm winters often do not 492 yield higher abundances than moderate winters. Although we did not systematically test the first two parts of hypothesis 493 A due to much existing evidence, this result illustrates how those parts can be tested using copula techniques. The third 494 part of hypothesis A, which our modelling results supported, therefore suggests spatial dependence between green spruce 495 aphid counts in different locations should show left-tail dependence -exactly as observed (Table 3). And the consequences 496 of such tail dependence for the skewness of spatially averaged aphid counts was described previously (Fig. 6A). Thus the 497 asymmetric influence of winter temperature ultimately causes spatially averaged aphid abundances to have lower skew 498 than they would otherwise, because negative impacts of cold winters are spatially synchronous.

499
Our results showed that the skewness of the spatial average of local time series is influenced by their tail dependence.

500
But the same logic should also apply to any collection of time series, whether associated with locations in space or not.

501
Another potential application is time series of all species from a single community, e.g., all plants in a quadrat surveyed 502 repeatedly over time. A huge literature has focused on synchrony versus compensatory dynamics between such time series, 503 and the influence of interspecific relationships on the variability of community properties such as total biomass. Typically, 504 variability of community biomass is measured with the coefficient of variation, but skewness may also be of interest 505 because it can help characterize "spikiness" through time. Future work on copula structure of interspecific relationships in 506 communities and its implications for community variability is likely to be valuable.

507
17 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Although we demonstrated that tail dependence between environmental variables can influence extinction risk, substantial 508 work remains to determine the importance of this effect. First, we used a non-density-dependent model. Do similar 509 results pertain when density dependence acts? Second, we considered metapopulation extinction risk, but the huge   certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.      Labels are defined in the text to refer to parts of the figure. "Causal hypotheses" (i.e., hypotheses about causes of copula structure) are labeled with letters A-D. "Consequential hypotheses" (i.e., hypotheses about consequences of copula structure) are labeled with X-Z. The "observational hypothesis" (i.e., our hypothesis about copula structure of data) is labeled O.
23 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted May 27, 2019. . https://doi.org/10.1101/650838 doi: bioRxiv preprint 28 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprint (which was not this version posted May 27, 2019. . https://doi.org/10.1101/650838 doi: bioRxiv preprint 29 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.