Abstract
We do not yet have a solid empirical understanding of the processes that produce biogeographic patterns of species richness. This is partly due to a lack of knowledge about corresponding spatial patterns of genome-wide diversity, which will be inextricably linked to species richness. We use estimates of gene diversity calculated from open data to show that genetic diversity and species richness share spatial structure. Species richness hotspots tend to harbor low levels of within-species genetic variation. Fitting multiple response and predictor variables within a hypothesis network showed that a single model encompassing eco-evolutionary processes related to environmental energy availability, niche availability, and proximity to humans explains 75% of the variation in the gene diversity gradient and 90% of the variation in species-level diversity. This advances our understanding of the patterns and joint causes of variation in the two most fundamental products of evolution.
Introduction
Biogeographical patterns in species richness are particularly well described and studied. Identifying how eco-evolutionary processes shape species richness gradients matters both for our basic understanding of the products of evolution and for societal wellbeing, but an empirical understanding of the processes causing these gradients has remained elusive (Etienne et al. 2019). Genetic diversity is the most fundamental level of biodiversity, yet until recently, comparable multi-species, continent-wide data describing genome-wide variation were not available. The two levels of diversity are so entangled that they are nearly inseparable; by not incorporating genetic diversity into analyses of species richness, we have been missing a critical piece of the picture.
Species richness gradients
Relationships between species richness and environments suggest how we can integrate genetic diversity into a joint analysis of both levels of variation. Though the pattern is described as the latitudinal species gradient, latitude can only be a correlate of the gradient's causes. Indeed, species richness patterns sometimes deviate markedly from latitudinal trends, and these deviations point to the gradient's underlying causes. In North America (Simpson 1964; Fig. 1), the latitudinal trend is pronounced between the Arctic and the Canada-USA border (~50°N), then disappears across the continental USA and reappears near the border with Mexico (~30°N). Species richness in the continental USA instead varies longitudinally, peaking in the west. These deviations suggest that species richness is correlated with environmental energy availability and habitat heterogeneity (Simpson 1964; Kerr and Packer 1997), both of which are in turn correlated with latitude in some areas.
Maps depicting spatial patterns of biodiversity and environmental factors. (Top row) Points are the locations of 801 North American mammal populations for which raw microsatellite data were available in public repositories. Point color indicates predicted values of genetic diversity and species richness based on spatial patterns detected in the data. (Bottom row) Maps of the three environmental variables that we tested for simultaneous effects on genetic diversity and species richness.
This pattern suggests that environments play a prominent role in determining species richness and genetic diversity in at least two important ways. First, energy availability can impose an upper limit on both the number of individuals and the number of species a given area can support (the more-individuals hypothesis) (Storch et al. 2018). Diversity tends to increase with the number of individuals in an assemblage, both in terms of the number of species in a community (Hubbell 2001), and genetic diversity within populations (Kimura 1983). Second, more complex environments have more niches. This can limit dispersal and support a greater diversity of species, albeit at smaller population sizes, if those species come to specialize on different resources (heterogeneity hypotheses) (Kadmon and Allouche 2007). As specialized populations diverge, genetic variation will be divided among species that no longer interbreed. These smaller populations will also lose genetic diversity due to genetic drift faster than large populations.
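The more-individuals argument can be made concrete with the neutral expectation from the Ewens sampling formula: in a sample of n individuals, the expected number of distinct types (alleles within a population, or species in a neutral community in Hubbell's framework) is the sum of θ/(θ + i) for i = 0 to n − 1. The Python sketch below assumes that idealized neutral model and a hypothetical diversity parameter θ; it illustrates why diversity rises with the number of individuals and is not part of the authors' analysis, which was run in R.

```python
def expected_types(n: int, theta: float) -> float:
    """Expected number of distinct types in a sample of n individuals under the
    Ewens sampling formula with diversity parameter theta (neutral model)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Each new individual is a novel type with probability theta / (theta + i).
    return sum(theta / (theta + i) for i in range(n))


if __name__ == "__main__":
    theta = 10.0  # hypothetical diversity parameter
    for n in (10, 100, 1000, 10000):
        print(n, round(expected_types(n, theta), 2))
```

The expected count grows roughly logarithmically with n, so assemblages that can hold more individuals are expected to hold more types, all else being equal.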
These two processes suggest that spatial variation at the species and genetic levels is partly set by environmental carrying capacities that limit the number of individuals and species a given area can support. The effects of limited carrying capacity on genetic diversity at the population level are well understood: smaller populations should be less diverse than larger populations because of drift. Recent modelling suggests that carrying capacity is the strongest and most stable contributor to species richness relative to other processes that also produce gradients (Etienne et al. 2019; Brodie 2019).
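The population-level expectation can be written down directly. Under an idealized diploid Wright-Fisher model, heterozygosity declines by a factor of 1 − 1/(2Ne) per generation when there is no mutation or migration, and under the infinite-alleles model it settles at 4Neμ/(1 + 4Neμ) at mutation-drift equilibrium. The sketch below assumes those textbook models; the values of Ne, μ, and the starting heterozygosity are hypothetical.

```python
def heterozygosity_after(h0: float, ne: float, generations: int) -> float:
    # Expected heterozygosity after `generations` of pure drift (no mutation,
    # no migration) in a diploid Wright-Fisher population of effective size ne.
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


def equilibrium_heterozygosity(ne: float, mu: float) -> float:
    # Mutation-drift equilibrium under the infinite-alleles model, theta = 4*Ne*mu.
    theta = 4.0 * ne * mu
    return theta / (1.0 + theta)


if __name__ == "__main__":
    # Hypothetical values: same starting diversity, small vs. large population.
    for ne in (50, 5000):
        print(ne,
              round(heterozygosity_after(0.8, ne, 100), 3),
              round(equilibrium_heterozygosity(ne, 1e-4), 3))
```

Both quantities are lower for the small population, which is the drift-based expectation the paragraph above relies on.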
Shared spatial variation in species and nuclear genetic gradients
To test for shared spatial variation, we quantified spatial patterns of nuclear genetic diversity for North American mammals at a continental scale, the scale at which biogeographical diversity patterns occur. To do this, we repurposed raw microsatellite genotypes posted in public data repositories (Schmidt et al. 2020b). Microsatellite markers are good estimators of genome-wide diversity, the quantity we are interested in (Mittell et al. 2015). They were also the most commonly archived marker type, which allowed us to maximize sample size. Our final data set consisted of 34,841 raw genotypes from 38 mammalian species sampled across 801 sites. For each site, we calculated gene diversity as our measure of genetic diversity (Charlesworth and Charlesworth 2010); gene diversity reflects the richness and evenness of alleles and is not particularly sensitive to sample size (Fig. S1). We then estimated species richness at each site by counting the number of native mammal species whose ranges overlapped the site (IUCN 2019), giving directly comparable estimates of diversity at the genetic and species levels. We summarized spatial patterns in genetic diversity and species richness using distance-based Moran's eigenvector maps (MEMs) (Dray et al. 2006). Sixty-five percent of the variation in species richness and 24% of the variation in genetic diversity were spatial (Fig. S2). Variance partitioning suggested that spatial patterns shared by both levels of diversity accounted for 85% of the total spatial variation in genetic diversity and 32% of the spatial variation in species richness.
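Gene diversity at a locus is the probability that two gene copies drawn from the sample carry different alleles. A common sample-size-corrected form (Nei's unbiased estimator) is H = n/(n − 1) · (1 − Σ p_k²), where n is the number of sampled gene copies and p_k are allele frequencies; site values are then averaged across loci. The sketch below assumes diploid genotypes coded as allele pairs with None for missing data. The database the authors drew on may apply a slightly different correction (for example, one that also uses observed heterozygosity), so treat this as an illustration of the metric rather than their exact computation.

```python
from collections import Counter


def locus_gene_diversity(genotypes):
    """Unbiased gene diversity at one locus from diploid genotypes.
    genotypes: iterable of (allele_a, allele_b) tuples; None marks missing data."""
    alleles = [a for g in genotypes if g is not None for a in g]
    n = len(alleles)  # number of gene copies sampled
    if n < 2:
        return float("nan")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def site_gene_diversity(loci):
    """Mean gene diversity across loci; loci is a list of genotype lists."""
    values = [locus_gene_diversity(g) for g in loci]
    values = [v for v in values if v == v]  # drop NaN (loci with < 2 copies)
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    # Hypothetical site: two microsatellite loci, three individuals each.
    site = [[(101, 103), (101, 101), (103, 105)],
            [(210, 210), (212, 210), None]]
    print(round(site_gene_diversity(site), 3))
```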
We used linear regression predicted values from the spatial vectors (MEMs) calculated above that had Moran's I values > 0.25 (indicating broad spatial patterns) to produce maps of continental-scale spatial variation in both levels of diversity (Fig. 1). Consistent with previously described patterns of species richness in North America, species richness hotspots at our genetic sample sites appear to be related to topography, a measure of environmental heterogeneity, and to potential evapotranspiration, a measure of energy availability (Fig. 1). There was no obvious relationship between latitude and nuclear genetic diversity. As with species richness, a longitudinal gradient is the dominant pattern in genetic diversity for North American mammals; however, the gradients at the two levels trend in opposite directions. Genome-wide genetic diversity appears markedly lower in regions with high species richness, such as the west and mid-Atlantic coasts, where energy availability and topographic relief are high. This pattern is consistent with the prediction that these areas support more species with smaller, less genetically diverse populations. Our map of neutral nuclear genetic diversity contrasts with the relatively consistent latitudinal gradients in mitochondrial (mtDNA) genetic diversity, the only other genetic marker explored at a similar spatial scale (Miraldo et al. 2016; Theodoridis et al. 2020). This is perhaps not surprising, because mtDNA diversity is not systematically related to population size (Bazin et al. 2006), species ecology, or life history (Nabholz et al. 2008), all key features of the processes we explore here.
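Moran's I, used above to separate broad from fine spatial patterns, compares each value with its neighbours' values: I = (n / W) Σ_ij w_ij z_i z_j / Σ_i z_i², where z are deviations from the mean and W is the sum of all weights. The sketch below computes it for a given weight matrix and keeps only vectors above the 0.25 cut-off. The neighbour weights and toy vectors are hypothetical; in the study, weights came from the distance-based MEM construction.

```python
from typing import Dict, List, Sequence


def morans_i(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    # Moran's I for values x and a spatial weight matrix w (w[i][i] assumed 0).
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # total weight W
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * (num / den)


def broad_scale(vectors: Dict[str, Sequence[float]], w,
                cutoff: float = 0.25) -> List[str]:
    # Names of spatial vectors whose Moran's I exceeds the cut-off.
    return [name for name, vec in vectors.items() if morans_i(vec, w) > cutoff]


if __name__ == "__main__":
    # Hypothetical: four sites on a line; adjacent sites are neighbours.
    w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    vecs = {"smooth": [1, 2, 3, 4], "alternating": [1, -1, 1, -1]}
    print(broad_scale(vecs, w))  # ['smooth']
```

The smooth gradient has I = 1/3 and passes the cut-off; the alternating pattern has I = −1 (negative autocorrelation) and is dropped.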
Common causes of species and nuclear genetic gradients
To explore the common causes of genetic and species-level diversity, we built a conceptual model based on the idea that carrying capacity is a strong driver of diversity. We fit this conceptual model to data using structural equation modelling, an approach for examining cause-effect relationships within hypothesis networks that can accommodate multiple predictor and response variables. Structural equation modelling is an extension of multivariate multiple regression in which variables can be thought of as nodes in a network, and directional paths connecting nodes represent causal relationships. The strength of each path equals its regression coefficient (Shipley 2016). In addition to direct effects, indirect effects between variables can be quantified by multiplying the direct effects along a path. Using standardized coefficients, we can compare the strength of relationships both within and across levels of biodiversity. The appropriateness of links in the hypothesis network can be evaluated with tests of directed separation (Shipley 2016), where the null hypothesis is that two variables are independent, conditional on the other predictors of either variable. This means that although we start with a focus on carrying capacity, the data can suggest the addition or removal of links representing alternative hypotheses.
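The path-tracing arithmetic described above (an indirect effect is the product of the standardized coefficients along a path; the total effect is the direct effect plus all indirect effects) can be sketched as follows. The variable names and coefficient values are hypothetical, chosen only to show the calculation; they are not the estimates reported in Table S1.

```python
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], float]


def path_effects(edges: Edges, source: str, target: str) -> Tuple[float, float, float]:
    """Return (direct, indirect, total) effects of source on target in an
    acyclic path diagram, multiplying coefficients along every directed path."""
    children: Dict[str, list] = {}
    for (frm, to), coef in edges.items():
        children.setdefault(frm, []).append((to, coef))

    direct = edges.get((source, target), 0.0)
    indirect = 0.0

    def walk(node: str, product: float, length: int) -> None:
        nonlocal indirect
        for nxt, coef in children.get(node, []):
            p = product * coef
            if nxt == target:
                if length + 1 > 1:  # paths with more than one arrow are indirect
                    indirect += p
            else:
                walk(nxt, p, length + 1)

    walk(source, 1.0, 0)
    return direct, indirect, direct + indirect


# Hypothetical standardized coefficients, for illustration only.
EXAMPLE_EDGES: Edges = {
    ("energy", "mass"): -0.4,
    ("mass", "gene_div"): -0.3,
    ("energy", "richness"): 0.4,
    ("mass", "richness"): -0.1,
    ("gene_div", "richness"): 0.2,
}

if __name__ == "__main__":
    print(path_effects(EXAMPLE_EDGES, "energy", "gene_div"))
    print(path_effects(EXAMPLE_EDGES, "energy", "richness"))
```

With these values, energy acts on gene diversity only through mass ((−0.4) × (−0.3) = 0.12), while its effect on richness combines the direct path with two indirect ones.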
Our conceptual model was built around the predicted effects of carrying capacity related to the more-individuals and environmental heterogeneity hypotheses (Fig. 3a). The more-individuals hypothesis predicts that increasing energy availability should act through population size to increase both species and genetic diversity. We used body size, which is inversely related to population size (Damuth 1981), as a proxy for species-level population size. We measured energy availability as the mean annual potential evapotranspiration across a species' range. Heterogeneity, measured at the range level as the area-corrected range in elevation across a species' range, should reduce population sizes but increase species richness by increasing niche diversity. Heterogeneity and energy availability were measured at the range level because genetic sample sites are not evenly distributed in space: had we used population-level environmental variation, species with many sampled populations could have been overrepresented relative to species with fewer sampled populations. Finally, we included human population density in our model, predicting negative relationships with species richness, genetic diversity, and mass (Merckx et al. 2018). Contemporary rapid environmental change is rarely considered alongside long-term processes, but humans are known to influence both levels of biodiversity and so should be modelled.
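Damuth's (1981) allometry is usually written as population density D ∝ M^b, with b close to −0.75 in his mammal data, which is why (log) body mass can stand in, inversely, for population size. The sketch below assumes exactly b = −0.75; real exponents vary among taxa and datasets, and the masses in the example are hypothetical.

```python
import math

DAMUTH_EXPONENT = -0.75  # assumed scaling exponent of density on body mass


def relative_density(mass_g: float, reference_mass_g: float,
                     exponent: float = DAMUTH_EXPONENT) -> float:
    # Expected population density of a species relative to a reference species.
    return (mass_g / reference_mass_g) ** exponent


def log_density_shift(mass_g: float, reference_mass_g: float,
                      exponent: float = DAMUTH_EXPONENT) -> float:
    # On the log scale the relationship is linear: log D = a + b * log M,
    # so a difference in log mass maps to a proportional shift in log density.
    return exponent * (math.log(mass_g) - math.log(reference_mass_g))


if __name__ == "__main__":
    # Hypothetical: a species ten times heavier than the reference species.
    print(round(relative_density(10.0, 1.0), 3))
```

The log-linear form is also one reason mass enters the models log-transformed.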
Our final model fit the data well (SEM p = 0.23, Fisher's C = 2.92; Fig. 3b, Table S1). Energy availability, niche heterogeneity, and human population density, acting both directly and indirectly through species population size, explained 32% of the variation in genetic diversity. Including the species-level variation captured by the random effect for species brought the total variation in genetic diversity explained by our model to 75%. The same model explained 90% of the variation in species richness. There was no strong spatial autocorrelation in the model residuals, suggesting that the model covariates captured the spatial structure of the diversity data well (Fig. S3).
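Fisher's C combines the p-values of the k independence claims tested for the model, C = −2 Σ ln p_i, and is compared with a χ² distribution on 2k degrees of freedom. Because 2k is always even, the upper-tail probability has a closed form, sketched below. Plugging in the reported C = 2.92 on 2 degrees of freedom (Table S1) reproduces p ≈ 0.23.

```python
import math
from typing import Iterable


def fishers_c(p_values: Iterable[float]) -> float:
    # Fisher's C = -2 * sum(ln p) over the independence claims of the basis set.
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x: float, df: int) -> float:
    # Upper-tail chi-square probability for an even number of degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    print(round(chi2_sf_even_df(2.92, 2), 3))  # 0.232
```

A large p-value here means the model's implied independencies are not contradicted by the data, which is why p > 0.05 indicates acceptable fit.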
All links in our conceptual model were supported, and tests of directed separation suggested additional direct links from energy availability to species richness, from genetic diversity to species richness, and from heterogeneity to genetic diversity (Fig. 3b). Mammals conformed to the zero-sum, carrying-capacity expectations of the more-individuals hypothesis at both the genetic and species levels of biodiversity. Our data indicated that where resources are limited, environmental carrying capacity is low: these environments supported fewer but larger-bodied species with smaller population sizes and lower genetic diversity. In resource-rich areas, organisms are generally smaller and populations and communities larger, harboring greater genetic diversity and species richness. Effects related to the more-individuals hypothesis were strongest at the genetic level: the indirect effect of energy on genetic diversity acting via population size was 0.13, compared with 0.02 for species richness. We also detected a direct effect of energy on species richness (path coefficient = 0.44 ± 0.01 SE; Fig. 3b, Table S1), even after accounting for population size and topographic heterogeneity. This relationship has been noted elsewhere and has sometimes been interpreted as refuting the more-individuals hypothesis (Storch et al. 2018). Vegetation structure may drive the link between species richness and temperature (Pautasso and Gaston 2005; Jiménez-Alfaro et al. 2016), because complex, vegetation-rich habitats in warmer environments also offer more niches. Because both paths are retained in our model, this additional link does not appear to negate the more-individuals hypothesis; rather, its effect is additive and, for species richness, stronger than the more-individuals effect.
Heterogeneity was the strongest single predictor of species richness (path coefficient = 0.70 ± 0.01) and a good predictor of genetic diversity (path coefficient = −0.30 ± 0.07). The directions of these effects were as expected if niche diversity reduces population sizes, carving existing variation into multiple species and increasing drift (Fig. 2). Because gene diversity is not a measure of divergence, we also tested whether environmental heterogeneity predicted evolutionary divergence at the population level. To do this, we calculated a population-specific FST (Weir and Goudet 2017) from raw genotypes and related it to environmental heterogeneity. Population-specific FST can be interpreted as a relative estimate of the time since a population diverged from a common ancestor. A linear mixed model controlling for spatial structure, with species as a random effect, showed that heterogeneity indeed increased population divergence (β = 0.13 ± 0.06 SE, n = 785 sites), suggesting that genetic drift is strong and gene flow limited in these areas. Heterogeneous environments impose more spatially varying selection, and, coupled with low gene flow, this creates ideal conditions for local adaptation, which can occur even under relatively high levels of genetic drift (Hämälä et al. 2018). This supports the idea that diversification rates are higher in more complex environments because they offer more opportunities for speciation.
(a) Our conceptual hypothesis network combining the more-individuals hypothesis (solid lines) with the effects of environmental heterogeneity (dashed lines) and human presence (dotted lines). Arrows represent unidirectional relationships between variables. (b) Structural equation model results. Green and black lines represent positive and negative relationships, respectively. Line widths reflect coefficient estimates, which are listed above each path with standard errors. R² values are the proportion of variation explained for each response variable. Mass and species richness were measured at the species level, and genetic diversity was measured at the population level and fit with a random effect for species: R²m is the variation explained by fixed effects only, and R²c is the variation explained by fixed and random effects.
Recent human-caused environmental disturbance affected both species and genetic diversity, directly and indirectly via body mass/population size (Fig. 3b). Notably, although human presence and heterogeneity both appear to reduce genetic diversity and limit dispersal, human-dominated environments do not yet appear to create opportunities for coexistence through niche packing. This may change with time, but such a scenario is unlikely if energetic requirements and niche variability are not met. Although a subset of species do well in cities, the broader effects of habitat loss and homogenization make cities, as they are currently built, inhospitable substitutes for the variety of natural niches they replace.
Discussion
The latitudinal species richness gradient has been recognized since the 1800s (Willig et al. 2003). The gradient's consistency has generated more than 30 hypotheses seeking to explain its relationship with environments (Pontarp et al. 2019). These hypotheses fall into three broad categories (evolutionary time, diversification rates, and ecological limits, i.e. differential carrying capacities), which are often, at least implicitly, treated as competing ideas. Yet speciation cannot occur without both ecology and evolution. Pontarp and Wiens (2017) advocate a more interconnected view, reporting that the effect of time for speciation on diversity should be strongest at short time scales, especially during the initial colonization of new environments. Once all locales are colonized, habitats that provide more opportunities for speciation should, over time, become the most diverse. All the while, differential carrying capacities determine demographic parameters such as colonization success, population viability, and risk of local extinction, as well as the efficiency of selection. Etienne et al. (2019) used simulations to show that ecological limits on carrying capacity are the most parsimonious explanation for the latitudinal diversity gradient, though all categories of hypothesis produced gradients. Worm and Tittensor (2018) recreated diversity gradients in simulations using only two parameters: temperature and community size. Also using simulations, Vellend (2005) found that increasing environmental heterogeneity raised the abundance of rare species so much that the population sizes and genetic diversity of other species in the community decreased.
Our findings, explaining 75 and 90% of the variation in genetic diversity and species richness respectively, provide strong empirical support for the theory-based inferences described above and extend them to the genetic level—multiple eco-evolutionary processes simultaneously produce the gradients in genetic and species level biodiversity we see in nature.
Communities with diverse species provide important ecosystem functions and services that contribute to human physical and psychological well-being. Ecosystem sustainability in the face of environmental perturbations, which are occurring more frequently because of human activity, depends on the resiliency of landscapes, communities, and populations (Oliver et al. 2015). The intimate connections between the environment, species richness, and genetic diversity we find here suggest that changes at one level can cascade throughout the system and profoundly reshape broad patterns of global biodiversity across multiple biological levels in ways we do not yet fully grasp. Tradeoffs between genetic and species-level biodiversity present a conundrum for management practitioners, because complex environments that are hotspots of species richness are more likely to harbor relatively low intraspecific genetic variation and, consequently, populations that may be less resilient to environmental change. Thus, designating conservation areas with the aim of preserving both species and genetic diversity at once does not appear to be an advisable strategy. Instead, programs focused on conserving environments and native biological communities should be separate from those aiming to preserve population size and standing genetic variation.
Author contributions
C.J.G. and C.S. conceptualized the study. C.S., S.D. and C.J.G. designed the study and C.S. conducted the statistical analysis with input from S.D. and C.J.G. All authors contributed to data interpretation. C.S. wrote the first draft of the manuscript and all authors participated in editing subsequent manuscript drafts.
Supplementary Information
Methods
Data assembly
Genetic diversity and body size
We obtained estimates of neutral genetic diversity at 801 sites across North America from the database compiled by Schmidt et al. (2020a, 2020b), in which diversity metrics for each site were computed from raw georeferenced microsatellite data. We chose gene diversity at each site as our metric of genetic diversity. Gene diversity estimates the richness and evenness of alleles in a population and is minimally affected by sample size (Charlesworth and Charlesworth 2010). Next, we recorded mean adult body mass (g) for each species using data from the PanTHERIA database (Jones et al. 2009). Mass was log-transformed in our models.
Species richness
We downloaded range maps for terrestrial mammals native to North America from the IUCN Red List database (IUCN 2019). Ranges for all species were mapped in a single Esri shapefile layer with polygon features. We filtered these maps to retain ranges for extant, native, resident, mainland species in ArcMap Desktop 10.3.1 (ESRI, Redlands, CA). We estimated species richness at each of our genetic diversity sample sites to describe spatial variation at both levels of biodiversity across North America, using a spatial join to count the number of species ranges that overlapped each site as a site-level index of species richness. For a broader-scale measure, we also measured species richness at the level of each species' range. We did this to avoid biasing continental-scale biodiversity patterns through the spread of our genetic diversity sites, which did not consistently sample a species' entire range. We computed species-level species richness by counting the number of intersecting species ranges using a spatial join in ArcMap. To correct for potential biases due to differences in range size, we divided the number of overlapping ranges by the species' range area (km²).
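The site-level spatial join amounts to counting how many range polygons contain each site. The Python sketch below assumes single-part polygons without holes in a planar, projected coordinate system, and uses hypothetical species names and coordinates; real IUCN ranges are multipart and were handled in ArcMap. The range-level count of intersecting ranges is taken as given, since polygon-polygon intersection is beyond a short example; only the area correction is shown.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    # Even-odd ray casting: count edge crossings of a ray from (x, y) towards +x.
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_area(poly: List[Point]) -> float:
    # Shoelace formula; assumes planar (projected) coordinates, e.g. km.
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def site_richness(site: Point, ranges: Dict[str, List[Point]]) -> int:
    # Number of species ranges containing the site.
    return sum(point_in_polygon(site[0], site[1], poly) for poly in ranges.values())


def area_corrected(n_overlapping: int, range_poly: List[Point]) -> float:
    # Count of overlapping ranges divided by the focal species' range area.
    return n_overlapping / polygon_area(range_poly)
```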
Environmental variables
Potential evapotranspiration measures the atmosphere’s ability to remove water from the Earth’s surface, and is an indicator of atmospheric energy availability. Potential evapotranspiration is one of the strongest environmental correlates of species richness in mammals (Currie 1991; Kreft and Jetz 2007; Fisher et al. 2011; Jiménez-Alfaro et al. 2016). We estimated mean potential evapotranspiration (mm/yr) within each species’ range using annual potential evapotranspiration data from 1970-2000 available via the CGIAR Consortium for Spatial Information (Trabucco and Zomer 2019). We used a global topography map (NOAA and U.S. National Geophysical Data Center) to record the range in elevation across focal species ranges to quantify environmental heterogeneity. We also corrected elevation range for potential biases introduced by species range area, because larger ranges tended to encompass greater topographical heterogeneity. Human population sizes were recorded for each site in the aforementioned genetic diversity database (Schmidt et al. 2020a).
Analysis
Spatial patterns in genetic diversity and species richness
All analyses were conducted in R (version 3.6.1, R Core Team 2013). Our first step was to identify spatial patterns in genetic diversity. We accomplished this by adopting a method used in landscape genetics to control for unmeasured environmental variables when investigating adaptive variation associated with environmental factors (Manel et al. 2012). Distance-based Moran's eigenvector maps (MEMs) detect spatial patterns in data from a modified matrix of distances between sites (a 'neighbor' matrix) whose eigenvalues are proportional to Moran's I index of spatial autocorrelation (Borcard and Legendre 2002; Borcard et al. 2004; Dray et al. 2006). MEMs are vectors that represent spatial relationships between sites at all scales detectable by the sampling scheme, and they can be included in linear models to account for the effects of unknown spatial processes. Our next step was to determine which MEMs reflected important spatial patterns in genetic diversity and site-level species richness, using a forward selection procedure (Blanchet et al. 2008). This gave us two sets of MEMs describing the spatial patterns present in genetic diversity and species richness. To produce continental maps of genetic and species levels of biodiversity, we selected MEMs that modeled broad-scale spatial patterns based on Moran's I (MEMs with Moran's I > 0.25). We then fit separate linear regression models for species richness and genetic diversity with their corresponding MEMs, and plotted the predicted values.
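The MEM construction can be sketched as follows, under stated assumptions: Euclidean distances between projected coordinates, a truncation threshold t equal to the longest edge of the minimum spanning tree, and neighbour weights 1 − (d/4t)² for pairs within the threshold (the weighting commonly used for distance-based MEMs). Eigenvectors of the doubly centred weight matrix with positive eigenvalues are the MEMs for positive spatial autocorrelation, and each eigenvalue multiplied by n divided by the sum of weights gives that MEM's Moran's I. This is a NumPy illustration, not the authors' R code, and the forward-selection step (Blanchet et al. 2008) is not shown.

```python
import numpy as np


def mst_longest_edge(d: np.ndarray) -> float:
    # Prim's algorithm on a dense distance matrix; returns the longest MST edge.
    n = d.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()  # shortest distance from the current tree to each site
    longest = 0.0
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best)
        j = int(np.argmin(cand))
        longest = max(longest, float(cand[j]))
        in_tree[j] = True
        best = np.minimum(best, d[j])
    return longest


def dbmem(coords):
    # Returns (MEMs with positive Moran's I, their Moran's I, weight matrix).
    xy = np.asarray(coords, dtype=float)
    n = len(xy)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    t = mst_longest_edge(d)
    w = np.where(d <= t, 1.0 - (d / (4.0 * t)) ** 2, 0.0)
    np.fill_diagonal(w, 0.0)
    centre = np.eye(n) - np.full((n, n), 1.0 / n)
    vals, vecs = np.linalg.eigh(centre @ w @ centre)
    order = np.argsort(vals)[::-1]  # broadest patterns (largest I) first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * np.abs(vals).max()
    moran = n / w.sum() * vals[keep]
    return vecs[:, keep], moran, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sites = rng.uniform(0, 10, size=(30, 2))  # hypothetical site coordinates
    mems, moran, _ = dbmem(sites)
    print(mems.shape, np.round(moran[:5], 3))
```

Selecting broad-scale MEMs then reduces to keeping columns whose Moran's I exceeds 0.25.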
Variation partitioning
We then examined the extent to which genetic diversity and species richness covary spatially. Because MEMs for species richness and genetic diversity were computed from the same set of coordinates, they were directly comparable, which allowed us to identify shared spatial patterns that might have a common environmental cause. We used linear regressions and variance partitioning to determine what fraction of the total variation in species richness and genetic diversity could be attributed to: (1) non-spatial variation, (2) non-shared spatial variation, and (3) shared spatial variation.
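A minimal sketch of this three-way split, assuming the definitions given in the supplementary figure caption (shared spatial variation is the variation explained by MEMs selected for both diversity levels; non-shared spatial variation is the remainder of the spatial variation). It uses unadjusted R² from ordinary least squares for simplicity, whereas adjusted R² is common in practice; the data are synthetic stand-ins and the column indices are hypothetical.

```python
import numpy as np


def r_squared(y: np.ndarray, x: np.ndarray) -> float:
    # Ordinary least-squares R^2 of y on the columns of x (intercept added).
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    return float(1.0 - (resid @ resid) / (centred @ centred))


def partition(y: np.ndarray, mems: np.ndarray, shared_cols) -> dict:
    # `mems` holds all MEMs selected for y; `shared_cols` indexes those also
    # selected for the other diversity level.
    r2_all = r_squared(y, mems)
    r2_shared = r_squared(y, mems[:, shared_cols])
    return {
        "non_spatial": 1.0 - r2_all,
        "non_shared_spatial": r2_all - r2_shared,
        "shared_spatial": r2_shared,
        "shared_share_of_spatial": r2_shared / r2_all,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mems = rng.normal(size=(100, 5))  # stand-in for selected MEMs
    y = 2.0 * mems[:, 0] + mems[:, 3] + rng.normal(scale=0.5, size=100)
    print(partition(y, mems, shared_cols=[0, 1]))
```

The last entry corresponds to statements such as "shared patterns accounted for X% of the spatial variation".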
Joint analysis of causes of genetic diversity and species richness
Next, we tested the hypothesis that differential carrying capacities are important drivers of biodiversity, specifically the more-individuals hypothesis and the effects of environmental heterogeneity and human activity. We developed a causal framework based on these hypotheses (Fig. 3a), which we tested using structural equation modeling (SEM). SEM is an extension of multivariate regression in which a series of regressions representing causal relationships between variables are assessed as components of a hypothesis network (Shipley 2016). We implemented SEMs in the piecewiseSEM package (Lefcheck 2016; Lefcheck et al. 2019). PiecewiseSEM offers greater flexibility than other SEM software because it uses a local estimation approach in which each component model is assessed individually (Lefcheck 2016). This relaxes the assumptions of multivariate normality and linear relationships between variables that apply when SEM fit is evaluated by global estimation. All variables were scaled and centered prior to analysis.
We translated our conceptual diagram (Fig. 3a) into a network of three linear models, one for each response variable: gene diversity, body size, and species richness. Because gene diversity was measured for populations nested within species, we used a hierarchical model within the SEM framework, fitting species as a random effect to control for species differences.
Goodness-of-fit in SEM is determined by evaluating whether there are any missing links in the causal structure, i.e. whether adding paths between pairs of variables would be more consistent with the data. In piecewiseSEM, missing links are evaluated using tests of directed separation (Shipley 2016), where the null hypothesis is that the two variables are independent, conditional on the other predictors of either variable. Starting with our conceptual model (Fig. 3a), we iteratively updated models by adding links according to tests of directed separation until no further biologically sensible links were suggested. A non-sensible link, for example, would be genetic diversity causing energy availability. We assessed model fit using the p-value for the model network, where the null hypothesis is that the model is consistent with the data. Thus, models with p > 0.05 are considered acceptable – we fail to reject our causal structure. We also assessed fit using R² values for each response variable in the model network. For genetic diversity, we used marginal (R²m) and conditional (R²c) values, which respectively measure the variation explained by fixed effects alone and by both fixed and random effects. We then tested the residuals from component models for spatial autocorrelation using Moran's tests and spatial correlograms. The body size model residuals were not spatially autocorrelated, but the genetic diversity and species richness models had statistically significant spatially autocorrelated residuals at very local scales (genetic diversity Moran's I = 0.025, species richness Moran's I = 0.029). These Moran's I values do not indicate strong spatial structure, so we did not incorporate residual spatial autocorrelation into our model. The positive spatial autocorrelation at such short distances is likely an artifact of irregular site locations and the hierarchical nature of the data.
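The directed-separation logic can be made concrete by listing a model's basis set: one independence claim for each pair of variables not joined by an arrow, conditional on the parents of both. The sketch below uses a hypothetical version of the conceptual network (arrows from energy, heterogeneity, and human density to mass; from mass and human density to gene diversity; and from mass, heterogeneity, and human density to species richness), not the exact published diagram. Following a common convention, it does not test independence between pairs of exogenous variables. Each claim would then be tested with a regression, and the resulting p-values combined into Fisher's C.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Claim = Tuple[str, str, FrozenSet[str]]


def basis_set(parents: Dict[str, Set[str]]) -> List[Claim]:
    """Independence claims (x, y, conditioning set) for a DAG given as a
    mapping node -> set of parents; every node must appear as a key."""
    claims = []
    for x, y in combinations(sorted(parents), 2):
        adjacent = x in parents[y] or y in parents[x]
        both_exogenous = not parents[x] and not parents[y]
        if adjacent or both_exogenous:
            continue
        cond = (parents[x] | parents[y]) - {x, y}
        claims.append((x, y, frozenset(cond)))
    return claims


# Hypothetical network for illustration only.
EXAMPLE_DAG: Dict[str, Set[str]] = {
    "energy": set(),
    "heterogeneity": set(),
    "human": set(),
    "mass": {"energy", "heterogeneity", "human"},
    "gene_div": {"mass", "human"},
    "richness": {"mass", "heterogeneity", "human"},
}

if __name__ == "__main__":
    for x, y, cond in basis_set(EXAMPLE_DAG):
        print(f"{x} _||_ {y} | {sorted(cond)}")
```

A claim with a small p-value (for example, energy and richness not being independent given mass, heterogeneity, and human density) is the signal that a missing link should be considered.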
Effect of heterogeneity on population divergence
We tested whether topographic heterogeneity caused greater population differentiation using population-specific FST (Weir and Goudet 2017), a measure of genetic divergence. We controlled for isolation-by-distance by including dbMEMs significantly related to FST to account for spatial structure. We scaled and centered all variables, then used a linear mixed model of the form: FST ~ heterogeneity + MEMs + (1|species), controlling for species differences by including it as a random effect.
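Population-specific FST in the sense of Weir and Goudet (2017) compares allele matching within each population with the average matching between pairs of populations: β_i = (M_W,i − M_B) / (1 − M_B). The sketch below is a minimal allele-count version, assuming each population's sampled gene copies are listed explicitly at each locus, at least two gene copies per population and locus, and matching proportions averaged over loci before the ratio is taken. It illustrates the estimator rather than reproducing the authors' code, and it omits the genotype-based refinements available in dedicated software.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def _freqs(copies: Sequence) -> Tuple[Dict, int]:
    # Allele frequencies and number of gene copies in one population sample.
    n = len(copies)
    return {a: c / n for a, c in Counter(copies).items()}, n


def population_specific_fst(loci: List[List[Sequence]]) -> List[float]:
    # loci[l][i] holds the gene copies sampled at locus l in population i.
    n_pops = len(loci[0])
    mw = [0.0] * n_pops
    mb = 0.0
    for pops in loci:
        freqs = [_freqs(copies) for copies in pops]
        for i, (p, n) in enumerate(freqs):
            # Unbiased within-population matching: P(two distinct copies match).
            mw[i] += (n * sum(v * v for v in p.values()) - 1.0) / (n - 1.0)
        # Matching between each pair of populations, averaged over pairs.
        between = [
            sum(pi * freqs[j][0].get(a, 0.0) for a, pi in freqs[i][0].items())
            for i, j in combinations(range(n_pops), 2)
        ]
        mb += sum(between) / len(between)
    n_loci = len(loci)
    mw = [m / n_loci for m in mw]
    mb /= n_loci
    return [(m - mb) / (1.0 - mb) for m in mw]


if __name__ == "__main__":
    # Hypothetical data: one locus, three populations.
    print(population_specific_fst([[[1, 1, 2, 2], [1, 1, 1, 2], [3, 3, 3, 2]]]))
```

The resulting per-population values would then enter the mixed model as the response, with heterogeneity and spatial MEMs as fixed effects and species as a random effect.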
Supplementary Results
Spatial patterns in genetic diversity and species richness
We detected spatial patterns in genetic diversity at several scales. A total of 199 MEMs corresponded to positive spatial autocorrelation. Of these, 13 explained important spatial variation in gene diversity. In order of increasingly fine spatial scales, significant patterns were MEMs 2, 3, 4, 5, 22, 27, 30, 31, 47, 49, 101, 145, 152. Forty-three MEMs were important predictors of species richness, and 8 of these patterns were shared by genetic diversity (Fig. S1). The cut-off for broad-scale MEMs—defined as those having Moran’s I values > 0.25—was MEM 5 for genetic diversity and MEM 11 for species richness.
Plot of gene diversity vs. sample size. Gene diversity as a metric of genetic diversity depends on allele frequencies and is minimally affected by sample size. Larger populations have more rare alleles, which contribute little to gene diversity.
Variation partitioning results. This graph shows the proportion of variation in genetic diversity and species richness which can be explained by spatial factors, determined using Moran’s eigenvector maps (MEMs). Spatial variation is further broken down into shared and non-shared spatial variation. Shared spatial variation is variation in genetic diversity and species richness explained by shared MEMs; non-shared variation is the remaining fraction of spatial variation not accounted for by shared MEMs.
Correlation coefficients for spatial patterns (MEMs) and environmental variables measured at the site level: potential evapotranspiration (PET), elevation, and human population density. MEMs describe spatial patterns in genetic diversity, species richness, or both (shared spatial patterns). MEMs are ordered from broad (MEM1) to fine scale (MEM194) patterns. Strong correlations indicate that environmental variables included in structural equation models account for broad scale spatial patterns present in genetic diversity and species richness.
Path coefficients and standard errors for SEM model (Fisher’s C = 2.92, p = 0.23, 2 degrees of freedom).
Acknowledgements
We would like to thank the Population Ecology and Evolutionary Genetics group for their feedback on this manuscript. We are also grateful to the authors whose work provided the raw data for this synthesis. C.S. and C.J.G. were supported by a Natural Sciences and Engineering Research Council of Canada Discovery Grant to C.J.G. C.S. was also supported by a U. Manitoba Graduate Fellowship, and a U. Manitoba Graduate Enhancement of Tri-council funding grant to C.J.G.