Copulas and their potential for ecology

All branches of ecology study relationships among and between environmental and biological variables. However, standard approaches to studying such relationships, based on correlation and regression, provide only a small slice of the complex information contained in the relationships. Other statistical approaches exist that provide a complete description of relationships between variables, based on the concept of the copula; they are applied in finance, neuroscience and other fields, but rarely in ecology. We here explore the concepts that underpin copulas and examine the potential for those concepts to improve our understanding of ecology. We find that informative copula structure in dependencies between variables is common across all the environmental, species-trait, phenological, population, community, and ecosystem functioning datasets we considered. Many datasets exhibited asymmetric tail associations, whereby two variables were more strongly related in their left compared to right tails, or vice versa. We describe mechanisms by which observed copula structure and asymmetric tail associations can arise in ecological data, including a Moran-like effect whereby dependence structures between environmental variables are inherited by ecological variables; and asymmetric or nonlinear influences of environments on ecological variables, such as under Liebig’s law of the minimum. We also describe consequences of copula structure for ecological phenomena, including impacts on extinction risk, Taylor’s law, and the stability through time of ecosystem services. By documenting the importance of a complete description of dependence between variables, advancing conceptual frameworks, and demonstrating a powerful approach, we aim to encourage widespread use of copulas in ecology, which we believe can benefit the discipline.


Introduction
is the portion of the Spearman correlation of u i and v i that is attributable to the points 439 between the bounds. Here sample means and variances are computed using all n data points, cor l = cor 0,0.5 (l is for "lower") and cor u = cor 0.5,1 (u is for "upper"). Likewise P l = P 0,.5 , 457 P u = P 0.5,1 , D 2 l = D 2 0,0.5 , D 2 u = D 2 0.5,1 . To test for asymmetry of association in upper and 458 lower portions of distributions, we used differences cor l − cor u , P l − P u and D 2 u − D 2 l for 459 smaller datasets (note the opposite order in the last of these); and cor 0,u b − cor 1−u b ,1 , 460 P 0,u b − P 1−u b ,1 and D 2 1−u b ,1 − D 2 0,u b with u b close to 0 for large datasets. We tested our 461 statistics in Appendix S5 (see also Figs S19-S20).

462
For bivariate datasets, we compared values of the statistics cor l b ,u b , P l b ,u b , D 2 l b ,u b , 463 cor 0,u b − cor 1−u b ,1 , P 0,u b − P 1−u b ,1 and D 2 1−u b ,1 − D 2 0,u b to distributions of values of the same 464 statistics computed on surrogate datasets that were produced from the empirical data by 465 randomizing it in a special way to have no tail dependence (Appendix S6). The associations. Thus this comparison helps address the third part of Q1 for bivariate datasets.

484
For multivariate datasets, we calculated Spearman and Kendall correlations and the 485 statistics cor l , cor u , P l , P u , D 2 l , D 2 u , cor l − cor u , P l − P u , and D 2 u − D 2 l for all pairs of confidence intervals excluded 0 (Table 5). Mean values of the statistics cor l − cor u , P l − P u , represented in Fig. 2 as solid outlines around the names of those datasets, in the middle row 621 of boxes. We also carried out the same analyses for abundance and first-flight data for the 18 622 aphid species for which we had data other than the green spruce and leaf-curling plum 623 aphids, as well as for the 21 plankton taxa for which we had abundance data other than 624 Ceratium furca (results not shown). Results supported the conclusion that non-normal 625 copula structure and tail dependence, and asymmetry of tail associations, are common. influenced by a "regional" environmental factor which is correlated across locations ( ) and 644 by an entirely local factor (δ). The function g describes how the environmental factor i (t) 645 25 influences E i (t). We used g equal to g 1 or g 2 in different simulations: g 1 ( ) equals if < 0 and 0 otherwise; g 2 ( ) equals if > 0 and 0 otherwise. Thus g 1 represents environmental 647 effects that negatively impact populations, and that occur only below a threshold of = 0;

654
For each b and g i , we simulated the model for 25000 time steps and retained the E i (t) 655 for the final 2500 time steps. We applied our nonparametric statistic cor 0,0.1 − cor 0.9,1 and 656 our Spearman-or Kendall-preserving normal-copula surrogate comparison methods (see 657 section 4) to these outputs to discover if the model could produce asymmetric tail 658 associations (and therefore non-normal copula structure) between the E i (t). Because ( 1 , 2 ) 659 and (δ 1 , δ 2 ) have normal copula structure (they were drawn from bivariate normal 660 distributions), the Moran mechanism analyzed below does not operate here.

661
Our second mechanism is an extension of the well known Moran effect, and was 662 summarized conceptually in the Introduction. We consider a linear model, as well as two 663 parameterizations of a nonlinear population model which includes density dependence of 664 population growth. For the linear model, let E i (t) again be an ecological variable, i = 1, 2. 665 We use AR(1) dynamics, E i (t + 1) = βE i (t) + √ 1 − β 2 i (t), with β = 0.5. The K. We refer to these as the weak-noise and strong-noise cases, though the importance of the 679 noise here is that, when it is strong, it brings the nonlinearities of the model into play. cor l − cor u , P l − P u , and D 2 u − D 2 l . Values were plotted against τ for noises and populations. 689 If the hypothesis from the Introduction was reasonable that characteristics of the copula 690 structure of spatial dependence in an ecological variable may be inherited from 691 characteristics of spatial dependence in an environmental variable through a Moran-like 692 effect, then plots should be similar for populations and noises.

693
The next mechanism we investigated is evolutionary, and pertains to bivariate trait data 694 across species, e.g., our bird and mammal data. This mechanism is a hypothetical 695 explanation for the bias toward right tail association observed in those data (Fig. 8F,G). 696 The hypothesis is that asymmetric tail association occurs in evolutionary changes in for both g 1 and g 2 , asymmetry of tail association was strong; for b = ±0.5, asymmetry was 718 weaker but still apparent ( Fig. S24C-F). Lower-tail (respectively, upper-tail) spatial 719 associations in the effects of noise (g 1 , respectively, g 2 ) produced lower-tail (respectively, 720 upper-tail) spatial associations in the ecological variable, E i . Results using our statistic 721 cor 0,0.1 − cor 0.9,1 strongly reflected the visually apparent asymmetry (Table 7). Thus 722 asymmetry of environmental effects is a mechanism that may be partly responsible for 723 non-normal copula structure and asymmetric tail associations across space in ecological variables. This result is represented in Fig. 2  copula with a large Kendall τ , cor l − cor u was slightly positive for noise, but slightly negative 752 for population model outputs (Fig. S28). 753 We repeated our analyses using the nonlinear model with r = 1.3. The deterministic 754 one-habitat-patch Ricker model exhibits a monotonic approach to a stable equilibrium at K 755 when r < 1 (undercompensating dynamics, e.g., the value r = 0.5 used initially), but 756 exhibits an oscillatory approach to K when r > 1 (overcompensating dynamics, e.g., 757 r = 1.3). For weak noise, similarities were again dominant between values of our statistics on 758 noise and population time series. For strong noise, however, discrepancies were often glaring.

759
Apparently noise of standard deviation 1 interacted especially strongly with model 760 nonlinearities when the model was in its overcompensatory regime. We repeated all analyses 761 using the Gumbel, survival Gumbel, Joe, and survival Joe copulas, with substantially similar 762 main conclusions (not shown).

763
Thus it is reasonable to hypothesize that a Moran-effect-like mechanism may produce 764 non-normal copula structure and asymmetric tail associations across space in ecological 765 variables. This result is represented in Fig. 2 as the dashed box around "Moran effects" and 766 the arrows labelled "B".

767
For our evolutionary model, the hypothesis was correct, for the models we employed, 768 that asymmetric tail association in evolutionary changes can produce similarly asymmetric 769 tail association between characters across phylogeny tips. Once character evolution was 770 simulated 100 times for each of the dependence structures we considered for evolutionary 771 changes, we had 817 bivariate characters for each of 500 simulations (see section 6). We 772 computed our nonparametric asymmetry statistics cor 0,0.2 − cor 0.8,1 , P 0,0.2 − P 0.8,1 , and 773 D 2 0.8,1 − D 2 0,0.2 for each simulation output and produced a histogram for each statistic and for 774 each dependence structure (Fig. 12). Results showed that asymmetries of tail associations in 775 evolutionary changes were associated with similar asymmetries of tail associations in extant 776 characters. Thus, based on our simulations, we cannot reject the possibility that this is a proximate mechanism behind observed asymmetric tail associations in our bird and mammal 778 data. This result is represented in Fig. 2 as the dashed box around "If character evolution 779 has copula structure" and the arrow labelled "D". This topic is revisited in the Discussion, as 780 is the box "Asymmetric species interactions" and the arrow labelled "C" in Fig. 2 were multivariate normal (i.e., the same copula as a multivariate normal distribution), but 814 the data were otherwise statistically unchanged. Significant differences indicate that 815 non-normal copula structure in the data contributed to the skewness of the spatial mean 816 time series, i.e., to its instability and "spikiness" through time.

817
For green spruce aphid abundance data, C. furca abundance data, and methane-flux 818 data, because these datasets exhibited stronger lower-than upper-tail associations ( significance.

866
To explore the influence of copula structure and tail associations on spatial Taylor's law, 867 we carried out a series of simulations that generated sets of population matrices that had 868 distinct copula structure between locations but that were otherwise statistically similar. For t,j 2 was from the family X. We used N = 50 and n = 25. Details of 878 how these matrices were generated are in Appendices S11 and S12. Thus for each k, the 879 poulation matrices m (X,k) for X taking the values Clayton, normal, Frank, and survival 880 Clayton were statistically similar except for the copula structure between locations, which 881 was X. Thus comparing how the log(v g ) versus log(m g ) relationship may manifest differently 882 for m (X,k) for different values of X constitutes a test of the influence of copula structure on copulas to explore a range of tail association patterns.

885
A variety of Taylor's law statistics were computed for each populaton matrix m (X,k) .

886
First, the N = 50 spatial means, m g , and variances, v g , were computed, and log(v g ) vs.
887 log(m g ) plots were considered. Linearity of the log(v g ) vs. log(m g ) relationship was tested Empirical results were consistent with the hypothesis that the skewness of a spatial-average 908 time series is influenced by tail associations between the local quantities being averaged. For 909 datasets that exhibited highly asymmetric tail associations in earlier analyses (green spruce 910 aphid abundance data had stronger lower-tail association, and, respectively, leaf-curling 911 plum aphid first flight data had stronger upper-tail association, Table 6), skewness of the 912 spatial average was less than (respectively, greater than) a significant fraction of surrogate 913 skewness values (Fig. 13A,B). For datasets with moderately stronger lower than upper-tail 914 dependence (Ceratium furca abundance and methane data), skewness of the spatial average 915 showed a non-significant or marginally significant tendency toward being less than surrogate  Also consistent with hypothesis, copula structure had a substantial effect on Taylor's 926 law for the models we considered. Taylor's law was strongly influenced, and was often even 927 invalidated in its assumptions of linearity and homoskedasticity, by non-normal copula 928 structure. For normal copula structure (i.e., for simulations that gave normal-copula were also strongly affected by copula structure (Fig. 15H,I), though some of the effect here 939 was because linear regressions do not always adequately represent the log(v g ) versus log(m g ) 940 relationship when copula structure was not normal. Thus our results substantiated the 941 hypothesis, at least for the models we used, that Taylor's law can be influenced by copula 942 structure and asymmetric tail associations. This is represented in Fig. 2  open-source computer implementations exist (e.g., the copula and VineCopula packages in 964 the R programming language). Ecologists can apply these tools immediately. We created 965 several interrelated randomization procedures (Appendices S6, S9, S12) that have built upon 966 existing copula methods.

967
The approaches we have demonstrated should apply equally well to data from tropical 968 or temperate ecosystems. There is no reason to expect that non-normal copula structure and 969 asymmetric tail associations should be special properties of datasets from temperate regions.

970
The mechanisms we have proposed of non-normal copula structure and asymmetric tail 971 associations seem equally likely to apply anywhere. For instance, the Moran effect, which 972 underpins one of our proposed mechanisms (Fig. 2, B), is a standard mechanism that occurs 973 whenever environmental variables influence populations. And Liebig's law and nonlinear 974 environmental influences on ecosystems, which underpin another one of our proposed 975 mechanisms (Fig. 2, A), are widely demonstrated phenomena that apply in all biomes.

976
Our first proposed causal mechanism (Fig. 2 which are the essential ingredients of the mechanism, are so common, it is reasonable to 982 hypothesize that the mechanism may operate commonly. It may be a dominant cause of 983 asymmetric tail associations and non-normal copula struture of ecological dependencies 984 across space. We provide some further support for the mechanism in our discussion of green 985 spruce aphids and winter temperature below.

986
There are also reasons to hypothesize that our Moran mechanism (Fig. 2 analysis was also used in forecasting the co-occurrence of extreme events (flood or drought) 1000 over the North Sikkim Himalayas using spatial datasets (Goswami et al. 2018). 1001 We suggested in the Introduction that asymmetric competitive relationships between 1002 species could yield asymmetric tail associations between abundance measurements for the 1003 species. This is another theoretical mechanism for non-normal copula structure and 1004 asymmetric tail associations, represented in Fig. 2 by the box "Asymmetric species 1005 interactions" and the arrow labelled "C". It could be tested by analyzing copulas of 1006 abundances of competing species, sampled across space or time. We note that, whereas all the datasets we studied here have been positively associated when they were significantly 1008 associated, for negatively associated variables such as abundances of competing species, the definitions of left-and right-tail association no longer apply, strictly speaking: the left tail of 1010 one distribution corresponds to the right tail of the other. One must be careful with 1011 terminology, but it is still possible to study asymmetries of association.

1012
Our simulations of character evolution suggest the hypothesis that changes through when we lack defensible estimates of branch lengths. We note that our phylogenetic analyses 1049 here consisted merely of simulations to assess whether interesting copula structure in a 1050 simple evolutionary process could leave a detectable signal on the trait data for extant 1051 species. Substantial work remains to be done before we have a copula-based method for 1052 analyzing data on a phylogenetic tree. But the nonparametric nature of approaches based 1053 solely on data ranks, as many of our copula methods are, may be a promising avenue for 1054 avoiding inaccuracies that arise via the highly structured model assumptions implicit in the 1055 method of phylogenetically independent contrasts and related methods.

1056
Additionally, character evolution was simulated using one random draw from the it is well known that energy allocation to a life function, F (e.g., reproduction) will reduce 1069 the energy that can be allocated to other functions, G 1 , G 2 , G 3 (e.g., growth, predation 1070 avoidance). This is the principle of allocation. But F can trade off against any or all of the 1071 G i . Therefore, for large F, approaching absolute limitations, there may be a strong 1072 association between F and G 1 , for instance. For small F, there may be little association 1073 because resources not allocated to F can instead be allocated to any combination of the G i .

1074
This constitutes asymmetric tail association between F and G 1 . Winemiller & Rose (1992) 1075 described a three-way trade-off in fishes between age of reproductive maturity, juvenile 1076 survivorship, and fecundity. The trade-off should, in theory, produce a tight association 1077 between age of maturity and fecundity for fishes with low age of maturity, but little such 1078 association for later-maturing fishes because those fishes may invest the resources not 1079 invested in maturing quickly into either fecundity or juvenile survival. These ideas suggest 1080 that copulas may be useful for studying multi-dimensional trade-offs. But applications will 1081 require careful attention to the possible consequences of biased sampling: if the degree of 1082 completeness of a dataset is associated with one or more of the characters, then statistical 1083 artefacts can bias conclusions (Appendix S13, Fig. S30). Multi-dimensional copulas may also 1084 be the appropriate copula approach to studying multi-dimensional trade-offs. We used only 1085 bivariate analyses in this study solely for simplicity. But statistical theory on 1086 multi-dimensional copulas is also well developed (Nelsen 2006; Joe 2014; Mai & Scherer copula methods (e.g., the VineCopula package for the R programming language). Such approaches may be a useful next step in life-history theory and other areas.
Additional mechanisms of non-normal copula structure and asymmetric tail associations 1091 probably also operate, and should be considered. For instance, measurement error may 1092 modify copula structure. Our models investigating potential causes of copula structure were 1093 intentionally simple in other respects, too, and did not include factors such as delayed 1094 density dependence, dispersal, population stage structure, trophic interactions, etc.; and we 1095 did not comprehensively explore parameter space for our models. We re-emphasize that 1096 fuller explorations, in future work, of some of our models and of variant model may be 1097 informative. We hope by enumerating a few potential mechanisms of copula structure we 1098 will inspire additional research on the potentially numerous mechanisms that may operate in 1099 diverse datasets, and their relative importance under different circumstances. 1100 We also elaborated potential consequences of copula structure for ecological phenomena 1101 and understanding. Our results showed that the skewness of the spatial average of local time  (2009)).

1109
Typically, variability of community biomass is measured with the coefficient of variation, but 1110 skewness may also be of interest because it can help characterize "spikiness" through time.

1111
Future work on copula structure of interspecific relationships in communities and its 1112 implications for community variability is likely to be valuable, in our opinion.

1113
Although we demonstrated that tail associations between environmental variables can 1114 influence extinction risk, substantial work remains to determine the importance of this effect.

43
First, we used a non-density-dependent model. Do similar results pertain when density dependence is involved? Second, we considered metapopulation extinction risk, but the large life-stage-specific fecundity and survival rates) are considered to vary stochastically through hence may alter spatial tail associations for weather variables.

1126
Our hypotheses and results cover the presence, causes, and consequences of non-normal 1127 copula structure and asymmetric tail associations in ecological systems (Fig. 2), but this is 1128 done using a variety of datasets and models. We here take a closer look at the green spruce 1129 aphid, which simultaneously illustrates causes and consequences with one system. Green  (Table 8). Apparently winter 1139 temperature has an asymmetric influence on aphid abundance in that cold winters generally 1140 produce low abundances but warm winters often do not yield higher abundances than 1141 moderate winters. One of our hypothesized mechanisms (Fig. 2, A), which our modelling results supported (section 7), therefore suggests that spatial dependence between green spruce aphid counts in different locations should show stronger left-than right-tail 1144 associations. This is exactly what was observed (Table 6), providing empirical evidence 1145 supporting the mechanism (this is why the box around "Nonlinear environmental effects,

1146
Liebig's law" and some of the arrows labelled "A" are solid instead of dashed in Fig. 2). The 1147 consequences of such tail dependence for the skewness of spatially averaged aphid counts was 1148 described previously (Fig. 13A). Thus the asymmetric influence of winter temperature 1149 ultimately causes spatially averaged aphid abundance time series to have lower skewness (i.e., 1150 less spikiness, and greater stability through time) than they would otherwise. It seems likely 1151 that asymmetric influence of winter termperature on populations may be a common 1152 phenomenon, so effects such as we have documented for green spruce aphid may be common.

1153
We carried out most of our analyses on data ranks, u i and v i , for good reasons 1154 mentioned in sections 1, 2 and 4 and reviewed here; but under some circumstances it may 1155 also be appropriate to use techniques which parallel our nonparametric statistics but that     Figure 2: Summary and guide for analyses presented in the text. Middle boxes correspond to ecological datasets for which Q1 was examined (in sections 3-5, see also Table 1). Upper boxes correspond to potential causes of non-normal (non-N) copula structure that were examined (in sections 6-7 of the text). Lower boxes correspond to potential consequences we examined of non-normal copula structure for ecological understanding and for applications (in sections 8-9 of the text). Arrow labels (A-D for analyses pertaining to causes, Q2; and X-Z for analyses pertaining to consequences, Q3) link to locations in the text. distributions. Note that a normal copula is distinct from a normal marginal distribution: a normal copula is the copula of a bivariate normal distribution. See section 2 for more information on normal, Clayton, and other copulas. Each copula was used with both sets of marginals and each set of marginals was used with both copulas, to demonstrate that: both copula and marginals contribute fundamentally to the resulting distribution, the copula contributing the information on the dependence between the variables; and that one can select the copula and marginals independently. Bivariate distributions, including the copulas, are depicted via their log-scale probability density functions (pdfs), and marginals are depicted via their linear-scale pdfs. Grey dots are 50 random samples from each distribution. The parameter 0.7 was used for the normal copula, the parameter 2 for the Clayton copula, and shape and scale parameters 5 and 1 for the gamma marginals.
Variables u and v were used for copulas and x and y for distributions created by pairing a copula with (non-uniform) marginals.  Figure 6: Log-transformed pdfs for example BB1 copulas. K is Kendall correlation; p 1 and p 2 denote the two parameters of the family (it is a two-parameter family); and LT and UT are our measures of lower-and upper-tail dependence, respectively (for the definitions of these, see the section on Background on copulas, section 2). The parameter ranges for the family are p 1 ∈ (0, ∞) and p 2 ∈ [1, ∞), lower-tail dependence is 2 −1/(p 1 p 2 ) and upper-tail dependence is 2 − 2 1/p 2 .    Figure 9: Nonparametric tests for tail association and other deviations from normal copula structure, for soil C and N data. As described in the main text, the values of the statistics cor l b ,u b , P l b ,u b and D 2 l b ,u b for real data (red crosses) were compared to distributions of their values on 1000 Kendall-or Spearman-preserving normal surrogates of the data (blue and green x's show 0.025 and 0.975 quantiles), separately in two comparisons for each of the ranges (l b , u b ) = (0, 0.1), . . . , (0.9, 1). Whenever the red cross was outside the range given by the x's, text at the top of panels indicates the number of surrogate values the real-data value was greater than or less than. For instance, a value > N (respectively, < N ) means the value of the statistic on real data was greater than (respectively, less than) its value on N surrogates. When the statistic was greater than 975 or less than 975 surrogate values, it indicates significance (95% confidence level). When cor or P values (respectively, D 2 values) were greater than surrogates, it means association in that part of the distributions was stronger than (respectively, weaker than) expected from a normal-copula null hypothesis.  Table 7. Results from all 1000 simulations were plotted on the same axes for each panel, but two example simulations are shown in red and green to help assess to what extent variation on the plots was between simulations or within simulations. For each simulation, we tested the linearity (E) and homoskedasticity (F) of the log(variance) vs. log(mean) plot for that simulation, and quantified the root mean squared error of points from the linear regression line (G). We also calculated the linear regression intercept (H) and slope (I), the quadratic term of a quadratic regression through the points on the log(variance) vs. log(mean) plot (J), and the curvature of that quadratic regression (K). Distributions of values across all 1000 simulations are displayed. Spearman-preserving surrogates were used, though results using Kendall-preserving surrogates were similar. See section 8 and Appendices S11 and S12 for details.     Table 3: Soil C and N results for nonparametric tests of whether asymmetry of tail association was significant, compared to a normal-copula null hypothesis. The values of the listed statistics (cor F S -cor LS = cor 0,0.1 − cor 0.9,1 , P F S -P LS = P 0,0.1 − P 0.9,1 , D 2 LS -D 2 F S = D 2 0.9,1 − D 2 0,0.1 ; here FS stands for 'first slice' and LS for 'last slice') for real data were compared to their values for 1000 Kendall-or Spearman-preserving normal surrogates, in seperate comparisons. A table entry < X indicates the value of the given statistic on the data was less than its value on X of the surrogates, so entries of the form < X for X equal to 975 or above indicate that upper-tail dependence was significantly stronger than lower-tail dependence. Results were significant only for the P and D 2 statistics.

Statistic
Kendall Spearman cor F S -cor LS <695 <696 P F S -P LS <1000 <1000 D 2 LS -D 2 F S <1000 <1000 Table 4: Summary of Q1 results for bivariate datasets. The p-values (rows 5-6) are for the minimum-AIC copula. Model averaging used for rows 7-9 was based on AIC weights. Row 10 shows values as in Table 3, upper right table entry. Although the result shown in row 10 for the soil C and N data is non-significant, see Table 3 for significant results using the P and D 2 statistics. The first 3 datasets use (l b , u b ) = (0, 0.1) for the first slice (FS) and (0.9, 1) for the last slice (LS), whereas (0, 0.25) and (0.75, 1) were used for Cedar Creek due to fewer data for that system. Results for Cedar Creek for the year 2000 are shown.