Abstract
The historical course of evolutionary diversification shapes the current distribution of biodiversity, but the main forces constraining diversification are unclear. We unveil the evolutionary structure of tree species diversity across the Americas to assess whether an inability to move (dispersal limitation) or to evolve (niche conservatism) is the predominant constraint on plant diversification and biogeography. We find a fundamental divide in tree lineage composition between tropical and extratropical environments, defined by the absence versus presence of freezing temperatures, respectively. Within the Neotropics, we uncover a further evolutionary split between moist and dry forests. Our results demonstrate that American tree lineages, though broadly distributed geographically, tend to retain their ancestral environmental relationships and that phylogenetic niche conservatism is the primary force structuring the distribution of tree biodiversity.
Main text
A central challenge in biogeography and macroevolution is to understand the primary forces that drove the diversification of life. Was diversification confined within continents, and characterized by adaptation of lineages to different major environments (i.e., biome switching), or did lineages tend to disperse across great distances, but retain their ancestral environmental niche (i.e., phylogenetic niche conservatism)? Classically, the attempts to define biogeographic regions based on shared plant and animal distributions lend support to the first hypothesis, that large-scale patterns may be explained by regionally confined evolutionary diversification, rather than long-distance dispersal (1-3). Alternatively, recent studies of the distribution of plant lineages at global scales have documented high levels of inter-continental dispersal (e.g., 4-8), and revealed that lineages tend to retain their ancestral biomes when dispersing (9,10). These latter findings suggest that dispersal is not limited in plants and that strong environmental associations of lineages may be the primary force organizing the course of diversification. However, there remain relatively few studies comparing the degree of evolutionary similarity between species assemblages across biomes at broad scales to elucidate the relative importance of phylogenetic niche conservatism versus dispersal limitation in structuring the distribution of biodiversity.
With high mountain chains running north to south across a mosaic of contrasting environments, the Americas represent a natural laboratory to investigate how the distribution of biodiversity has been shaped by evolution. Although different lines of evidence suggest that plant diversity in the Americas presents a latitudinal structure (11-17), the evolutionary forces driving this pattern remain largely unexamined. Within the Neotropics, the evidence of past processes of diversification shaping the current distribution of plant diversity is contradictory. While some studies show phylogenetic niche conservatism in lineages from moist tropical forests (18) and tropical dry forest (19-21), most of the plant lineages present today in tropical savannas seem to have originated in other forested biomes and made their evolutionary shift to savannas within the last ten million years (22-23). Thus, there is a need to define a general pattern of the distribution of evolutionary diversity to understand the forces that drove this diversification.
Here, we examine the phylogenetic composition of angiosperm tree assemblages across the Americas as a means to determine whether dispersal limitation or phylogenetic niche conservatism had greater impact on the present-day evolutionary structure of biodiversity. If lineages tend to retain their environmental niche as they diversify across space, we would expect major evolutionary groups to be restricted to specific biomes, and for their distributions to mirror that of their preferred environmental regime. This leads to the prediction that lineage composition of assemblages from extratropical regions in both hemispheres should be more similar to each other than to assemblages occurring in intervening tropical regions. In addition, we would predict that assemblages from arid tropical environments across the Neotropics should show greater similarity in tree lineage composition to each other than to assemblages from moist environments with which they may be spatially contiguous or interdigitated (19). Alternatively, if diversification is spatially restricted and biome switching is common, the major evolutionary grouping of assemblages should be segregated geographically, irrespective of environmental conditions, and we might expect, for example, because of the physical isolation of South America through the Cenozoic, that its assemblages constitute one group and North and Central American assemblages another.
To test the contrasting scenarios of phylogenetic niche conservatism and biome switching, we analyzed data on ∼10,000 tree assemblages, largely compiled from vegetation inventories (see Materials and Methods), from locations spanning extensive geographic and environmental gradients in the Americas. We constructed a temporally-calibrated, genus-level phylogeny that includes as many of the inventoried angiosperm tree genera as possible (1,358 total; an average of ∼90% of the genera sampled per assemblage). We assessed similarity in lineage composition among assemblages using clustering analyses and ordinations based on shared evolutionary history, quantified as shared phylogenetic branch length. Next, we identified the indicator lineages for each major group in the clustering analysis. Finally, we explored the geographic and environmental correlates of the distribution of the main evolutionary clusters, and estimated their unique versus shared evolutionary diversity. The former indicates the total amount of diversification, or phylogenetic branching, that has occurred within lineages that are largely restricted to individual evolutionary groups, while the latter represents diversification in lineages that span evolutionary groups, including that shared across all evolutionary groups.
Our results suggest that the evolutionary lineage composition of American tree assemblages is structured primarily by phylogenetic niche conservatism. The two principal clusters of tree assemblages defined by similarity in evolutionary lineage composition have a tropics-extratropics structure (Fig. 1, Fig. S4). Moreover, the extratropical group is not geographically segregated, because it includes temperate tree assemblages from forests of North America and southern South America connected by a corridor of high-elevation forests via mountain chains across the Andes and Central America (Fig. 1 a,b). To test the correspondence of these two main clusters with environmental or geographical variables, we compared them with the eight data layers proposed by Feeley and Stroud (24) to separate the extratropics from the tropics. We found the strongest correspondence (97% match, Fig. S1) with the occurrence, or absence, of freezing temperatures within a typical year (see Fig. 1 c,d). In assessing evolutionary diversity, measured as summed phylogenetic branch length, either restricted to or shared between these two groups, we observe that most evolutionary diversity occurs within the tropics, but that there is unique evolutionary diversity restricted to the extratropics (∼10% of the total, Fig. 2b, S3a). Ordination and indicator clade analyses revealed that the tropics-extratropics segregation is associated with the distribution of specific clades, especially the Fagales, which includes the oaks (Quercus), beeches (Fagus), coihues (Nothofagus) and their relatives (Fig. 3, Table S1, S2).
Our clustering analyses identified that K=3 and K=4 groups are also supported as additional informative splits, with subsequent partitions of the data resulting in little additional information explained (Fig. S2). Each of the major groups in K=3 and K=4 captures substantial unique evolutionary diversity (Fig. 2 b, Fig. S3, Table S2). In K=3, the main extratropical cluster grouped assemblages from North America and extreme southern South America, while the remaining assemblages from temperate southern South America and the Andean tropics grouped with assemblages from the arid or semiarid tropics and subtropics; the moist tropics formed a third group (Fig. S5). For K=4, the extratropics were split into a largely temperate North American group and a group that includes subtropical sites in South and Central America, the Andes and southern temperate forests. In the tropics there is one group including assemblages found in ever-moist and warm conditions, and a second one of assemblages that extend into drier areas (Fig. 2 c), including most tropical dry forest (Fig. 2 a; Fig. S6; Table S3). Hereafter, we refer to the four clusters of assemblages in K=4 as the Northern Extratropical, Southern Extratropical, Tropical Moist and Tropical Dry groups.
Tropical and Extratropical conservatism
Phylogenetic niche conservatism drives two key processes structuring the distribution of tree diversity in the Americas. First, it constrains diversification within the tropics or extratropics and, second, it organizes the recent migrations of extratropical lineages tracking their preferred environments into low latitudes. Our results demonstrate that the tropics-extratropics evolutionary structure of tree diversity is principally associated with the environmental threshold of the presence or absence of freezing temperatures in a typical year. This pattern is consistent with evidence documenting that only angiosperm lineages that were able to evolve traits to avoid freezing-induced embolism radiated into high latitudes (25). In addition, we found that a unique, sizeable portion of the total evolutionary diversity of angiosperm trees is restricted to extratropical environments, as the fossil record corroborates (26,27). Collectively, this evidence suggests that the phylogenetic conservatism of lineages from the extratropics is highly relevant to the diversification of angiosperm trees in the Americas. Kerkhoff et al. (2014) estimated that extratropical angiosperm ancestors (defined as those distributed north of 23°N or south of 23°S) produced extratropical descendants at least 90% of the time. Considering that some areas subjected to regular freezing at high elevations in equatorial latitudes may be better classified as part of the extratropics, as demonstrated here by our results, the extratropical phylogenetic conservatism could be even greater (16).
While the effect of tropical phylogenetic niche conservatism on patterns of biodiversity distribution has been broadly discussed (e.g., see references in (28)), the role of extratropical conservatism has received less attention. However, some studies illustrate that lineages tracking extratropical environments in high tropical mountains can shape patterns in the distribution of phylogenetic diversity across these elevation gradients (29). In the Americas, the relatively recent uplift of the Andes (30) would have created novel, extratropical environments (i.e., with regular freezing temperatures) at low latitudes, allowing lineages previously diversified at high latitudes to move from both north and south to equatorial latitudes (31). Fossil pollen demonstrates the arrival in the northern Andes of tree genera from temperate forests in the northern hemisphere, including Juglans (Juglandaceae), Alnus (Betulaceae) and Quercus (Fagaceae), at about 2.2 Ma, 1.0 Ma and 300 Ka respectively, and the arrival of southern genera, including Weinmannia (Cunoniaceae) and Drimys (Winteraceae), during the late Pliocene and Pleistocene (1.5–3.2 Ma) (31,32). Likewise, phylogenetic evidence shows recent diversification in the Andes of lineages that seem to have originated in the extratropics, including Lupinus (Fabaceae) (33), Adoxaceae/Valerianaceae (34, 35) and Gunnera (Gunneraceae) (36).
Pattern within the Neotropics
Our results also point to a moist versus dry evolutionary divide within the Neotropics. The Tropical Moist Group holds the greatest amount of evolutionary diversity, both overall and unique to it, despite occupying the most restricted extent of climatic space of any of the K=4 groups (Fig. 2 b,c). The Tropical Dry Group, in contrast, extends across a broader climatic space, but holds less evolutionary diversity (Fig. 2 b,c). This asymmetry in the accumulation of diversity may reflect phylogenetic conservatism for a putatively moist and hot ancestral angiosperm niche (28), or could result from a favorable environment that can be occupied by any angiosperm lineage, even those that also occur in cooler or drier conditions (37,38). Regardless, the similarity in the lineage composition of the extensive but discontinuously distributed tropical dry forests (19) indicates their separate evolutionary history. Although tropical dry forest-inhabiting taxa have often been described as more dispersal-limited than those from rain forests (e.g., 19), dispersal over evolutionary time-scales seems to have been sufficient to maintain this floristic cohesion. Such evolutionary isolation of the dry forest flora has previously been suggested by studies in Fabaceae (19,39), and is shown here to be evident at the evolutionary scale of all angiosperm tree species.
Our results also help to clarify the contentious evolutionary status of savanna and Chaco regions in the Neotropics. On one hand, we find that the southern savannas (the Cerrado region of Brazil) are more evolutionarily related to tropical moist forests than to dry forests (Fig. 2 a, Fig. S5). This finding agrees with previously suggested evolutionary links between the tropical savanna and moist forest biomes (39), and more specifically with evolutionary biome switching from moist forests to Cerrado savannas (22). However, northern tropical savannas (i.e., Llanos of Venezuela and Colombia and those in Central America) are split in their evolutionary affiliation between the Tropical Moist and Tropical Dry groups, indicating linkages to moist and dry tropical forests (Fig. 3, Table S1). Accordingly, this may reflect the distinct ecology of many northern savannas (e.g., the Llanos are hydrological savannas; 40) and suggests a divergent evolutionary history for northern and southern savannas. On the other hand, our results help to resolve the debates around the status of the Chaco, which has been suggested to be a distinct biome with temperate evolutionary affinities or part of a wider dry forest biome (e.g., 41-43). Our results show that this geographically defined region houses a mix of extratropical and tropical lineages. Indeed, our analyses consistently point to evolutionary links between assemblages in seasonally dry and seasonally cold areas (Fig. 2, S5, S6). For example, when we consider K=3 evolutionary groups, a single ‘dry and cool’ group coalesces, with the other two groups being the tropical moist forest group and a largely northern, extratropical group (Fig. S5).
We show that the evolutionary structure of tree diversity in the Americas is determined primarily by the presence or absence of freezing temperatures, dividing tropical from extratropical regions. Within the tropics we find further subdivision among lineages experiencing moist versus seasonally-dry conditions. These findings strongly demonstrate that phylogenetic niche conservatism is the primary force organizing the diversification and, therefore, the biogeography of angiosperm trees. Tree species that can inhabit areas experiencing freezing temperatures and/or environments subjected to seasonal water stress belong to a restricted set of phylogenetic lineages, which gives a unique evolutionary identity to extratropical forests and tropical dry forests in the Americas. While our study is restricted to the New World, we suggest that plant biodiversity globally may be evolutionarily structured following a tropics-extratropics pattern, while diversity within the tropics may be structured primarily around a moist-dry pattern. These findings advocate strongly for integrating the concept of extratropical conservatism and tropical-dry conservatism into our understanding of macroevolutionary trends and biogeographic patterns at intercontinental scales.
Materials and Methods
Tree assemblage dataset
Our tree assemblage dataset was derived by combining the NeoTropTree (NTT) database (44) with selected plots from the Forest Inventory and Analysis (FIA) Program of the U.S. Forest Service (45), accessed on July 17th, 2018 via the BIEN package (46). We excluded from the latter any sites that had fewer than five angiosperm genera. Sites in the NTT database are defined by a single vegetation type within a circular area of 5-km radius and contain records of tree and tree-like species, i.e., freestanding plants with stems that can reach over 3 m in height (see www.neotroptree.info and (47) for details). Each FIA plot samples trees that are ≥ 12.7 cm diameter at breast height (dbh) in four subplots (each being 168.3 m²) that are 36.6 m apart. We aggregated plots from the FIA dataset within 10 km diameter areas, to parallel the spatial structure of the NTT database. This procedure produced a total dataset of 9937 tree assemblages distributed across major environmental and geographic gradients in the Americas.
Genera phylogenetic tree
We obtained sequences of the rbcL and matK plastid genes for 1358 angiosperm tree genera from Genbank (www.ncbi.nlm.nih.gov/genbank/), building on previous large-scale phylogenetic efforts for angiosperm trees in the Neotropics (48,49). Sequences were aligned using the MAFFT software (50). ‘Ragged ends’ of sequences that were missing data for most genera were manually deleted from the alignment.
We estimated a maximum likelihood phylogeny for the genera in the RAxML v8.0.0 software (51), on the CIPRES web server (www.phylo.org). We constrained the tree to follow the order-level phylogeny in Gastauer et al. (2017) (52), which is based on the topology proposed by the Angiosperm Phylogeny Group IV. We concatenated the two chloroplast markers following a General Time Reversible (GTR) + Gamma (G) model of sequence evolution. We included sequences of Nymphaea alba (Nymphaeaceae) as an outgroup.
We temporally calibrated the maximum likelihood phylogeny using the software treePL (53). We implemented age constraints for 320 internal nodes (family-level or higher, from (54)) and for 123 genera stem nodes (based on ages from a literature survey, Table S4). The rate smoothing parameter (lambda) was set to 10 based on a cross-validation procedure. The final dated tree can be found in Supplementary Information.
Phylogenetic distance analysis and clustering
We used the one-complement of the Phylo-Sorensen Index (i.e., 1 – Phylo-Sorensen) to build a matrix of phylogenetic dissimilarities between plots based on genera presence-absence data. The Phylo-Sorensen index sums the total branch length of shared clades between sites (55) relative to the sum of branch lengths of both sites:

$$\text{Phylo-Sorensen}_{ij} = \frac{2\,BL_{ij}}{BL_i + BL_j}$$

where $BL_{ij}$ is the sum of branch lengths shared between plots i and j, and $BL_i$ and $BL_j$ are the sums of branch lengths of tips within plots i and j, respectively. Thus, if all branches are shared between two plots, the dissimilarity measure takes on a value of 0. If no branches are shared between plots (i.e. the plots comprise two reciprocally monophyletic clades), the dissimilarity measure will take on a value of 1. This metric was estimated using the phylosor.query() function in the PhyloMeasures (56) package for R.
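As an illustration of the dissimilarity measure described above, the following minimal Python sketch computes 1 – Phylo-Sorensen on a toy tree. It assumes the standard Phylo-Sorensen form (twice the shared branch length divided by the summed branch lengths of both plots) and measures each plot's branch length along the paths from its genera to the root. The tree, genus labels and branch lengths are hypothetical, and the sketch is not the PhyloMeasures implementation used in the analysis.

```python
# Toy tree (hypothetical): each branch is keyed by its child node and
# stores (parent node, branch length).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("n2", 1.5),
    "D": ("n2", 1.5),
    "n2": ("root", 1.5),
}


def branches_for(genera, tree=TREE):
    """Return the set of branches on the paths from each genus to the root."""
    branches = set()
    for node in genera:
        while node in tree:
            branches.add(node)
            node = tree[node][0]
    return branches


def total_length(branches, tree=TREE):
    """Sum the lengths of a set of branches."""
    return sum(tree[b][1] for b in branches)


def phylosor_dissimilarity(plot_i, plot_j, tree=TREE):
    """1 - Phylo-Sorensen between two plots given as lists of genera."""
    b_i = branches_for(plot_i, tree)
    b_j = branches_for(plot_j, tree)
    bl_i = total_length(b_i, tree)
    bl_j = total_length(b_j, tree)
    bl_ij = total_length(b_i & b_j, tree)  # branch length shared by both plots
    return 1.0 - 2.0 * bl_ij / (bl_i + bl_j)


same = phylosor_dissimilarity(["A", "B"], ["A", "B"])      # all branches shared
disjoint = phylosor_dissimilarity(["A", "B"], ["C", "D"])  # no branches shared
partial = phylosor_dissimilarity(["A"], ["B"])             # only branch n1 shared
```

In this toy case, identical plots give a dissimilarity of 0 and plots drawn from two separate clades give 1, matching the two limiting cases described in the text.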
We used K-means clustering to explore the main groups, in terms of (dis)similarity in the tree assemblage dataset, according to the Phylo-Sorensen dissimilarity measures. The K-means clustering algorithm requires the number of clusters (K) to be specified in advance. In order to estimate the best value for K, the optimal number of clusters to parsimoniously explain the variance in the dataset, we used the Elbow Method and an approach based on the average Silhouette width (Fig. S2). Based on these results, we selected K=2 (Fig. 1), K=3 (Fig. S5) and K=4 (Fig. 2) for further analysis and interpretation. No geographic or environmental data were used to inform the clustering analyses. The K-means clustering was carried out with the kmeans() function in base R (R Core Development Team, 2016). We assessed the robustness of the K-means clustering results using a silhouette analysis with functions in the “cluster” package (57). In order to assess variation in group fidelity, we classified individual sites as to whether the silhouette widths were larger or smaller than 0.2. In this way, we could detect areas of geographic, environmental and compositional space where clustering results were strongly or weakly supported.
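The silhouette-based check of group fidelity can be sketched as follows. This is a minimal Python illustration, assuming a precomputed dissimilarity matrix, cluster labels that have already been assigned, and at least two clusters. The four-site matrix and labels are hypothetical; the analysis itself used functions from the R “cluster” package.

```python
def silhouette_widths(dist, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for each site.

    a is the mean dissimilarity of site i to the other members of its own
    cluster; b is the smallest mean dissimilarity to any other cluster.
    Assumes at least two clusters are present.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:  # singleton cluster: width set to 0 by convention
            widths.append(0.0)
            continue
        a = mean_to[own]
        b = min(v for c, v in mean_to.items() if c != own)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Hypothetical dissimilarities: sites 0-1 are close, as are sites 2-3.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
labels = [0, 0, 1, 1]

widths = silhouette_widths(dist, labels)
# Classify sites by the 0.2 threshold used in the text.
support = ["strong" if w > 0.2 else "weak" for w in widths]

# The same sites with site 3 placed in the wrong cluster.
mislabelled_widths = silhouette_widths(dist, [0, 0, 1, 0])
```

A site placed in the wrong cluster receives a negative width, so it would fall in the weakly supported class under the 0.2 threshold.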
In addition, we performed an evolutionary ordination of tree assemblages based on their phylogenetic lineage composition, following protocols developed by Pavoine (2016) (58). We specifically used an evolutionary PCA, implemented with the evopca() function in the “adiv” package, with a Hellinger transformation of the genus by site matrix, as this is a powerful approach to detect phylogenetic patterns along gradients, while also allowing positioning of sites and clades in an ordination space (58). The first two axes explained 9.6% and 6.7% of the variation in the data, with subsequent axes each explaining <5.5%.
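For readers unfamiliar with the Hellinger transformation mentioned above, the following minimal Python sketch shows its usual row-wise form (the square root of each value divided by its row total) on a hypothetical presence-absence matrix. It illustrates only the transformation step, not the evopca() procedure, which also incorporates the phylogeny.

```python
from math import sqrt


def hellinger(matrix):
    """Row-wise Hellinger transform: sqrt(value / row total).

    Rows are sites and columns are genera; empty rows are returned as zeros.
    """
    out = []
    for row in matrix:
        total = sum(row)
        out.append([sqrt(v / total) if total > 0 else 0.0 for v in row])
    return out


# Hypothetical site-by-genus presence-absence matrix.
sites = [
    [1, 1, 0, 0],  # site with two genera
    [1, 1, 1, 1],  # site with four genera
    [0, 0, 0, 0],  # empty site
]
transformed = hellinger(sites)
```

After the transformation each non-empty row has unit Euclidean length, so genus-rich and genus-poor sites contribute on a comparable scale to the ordination.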
Correspondence between clustering results and environmental variables
We tested the correlation between our K=2 clustering result and eight different delimitations of the tropics, as per Feeley and Stroud (2018) (24). These delimitations were: C1) all areas between 23.4°S and 23.4°N; C2) all areas with a net positive energy balance; C3) all areas where mean annual temperature does not co-vary with latitude; C4) all areas where temperatures do not go below freezing in a typical year; C5) all areas where the mean monthly temperature is never less than 18°C; C6) all areas where the mean annual “biotemperature” ≥ 24 °C; C7) all areas where the annual range of temperature is less than the average daily temperature range; C8) all areas where precipitation seasonality exceeds temperature seasonality. We calculated the correspondence between our binary clustering (i.e. tropical vs. extratropical) and each of these delimitations as the proportion of sites where the delimitations matched.
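The correspondence measure described above reduces to a simple proportion of agreeing sites. A minimal Python sketch, using hypothetical site classifications (True for tropical, False for extratropical), is:

```python
def correspondence(cluster_is_tropical, delimitation_is_tropical):
    """Proportion of sites where the clustering and a delimitation agree."""
    if len(cluster_is_tropical) != len(delimitation_is_tropical):
        raise ValueError("both classifications must cover the same sites")
    if not cluster_is_tropical:
        raise ValueError("at least one site is required")
    matches = sum(
        c == d for c, d in zip(cluster_is_tropical, delimitation_is_tropical)
    )
    return matches / len(cluster_is_tropical)


# Hypothetical classifications for four sites.
clustering = [True, True, False, False]
freezing_free = [True, False, False, False]  # e.g. a delimitation such as C4

match = correspondence(clustering, freezing_free)
```

Repeating this calculation for each of the eight delimitations gives one match proportion per delimitation, which can then be compared directly.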
To assess the environmental space occupied by different groups from our clustering analyses, we obtained estimates of mean annual temperature, mean annual precipitation and minimum temperature of the coldest month from the Worldclim dataset (59) and Maximum Climatological Water Deficit (CWD) from Chave et al. (2014) (60). We estimated the density of the distribution of sites in the environmental space using ellipses containing 95% of the sites with the kde() function from “ks” package (61).
Shared versus Unique “Phylogenetic Diversity” (PD)
As the Phylo-Sorensen estimation of evolutionary (dis)similarity cannot distinguish variation associated with differences in total phylogenetic diversity (PD), or phylogenetic richness, from variation associated with phylogenetic turnover per se, we measured the shared and unique PD associated with each group for the K=2, K=3 and K=4 clustering analyses. First, we estimated the association of genera with each group by an indicator species analysis following de Caceres et al. (2009) (62). Specifically, we used the multipatt() function in the R package indicspecies (63) to allow genera to be associated with more than one group (when K > 2). The output of the multipatt function includes the stat index, which is a function of the specificity (the probability that a surveyed site belongs to the target site group given the fact that the genus has been found) and fidelity (the probability of finding the genus in sites belonging to the given site group). We constructed pruned phylogenies including those genera with specificity greater than 0.6 to a group, or combination of groups, to estimate the total PD found in each group or combination of groups. Then, we subtracted these totals from the total for the complete, unpruned phylogeny to determine the amount of phylogenetic diversity restricted to each group, or combination of groups. Finally, we estimated the PD shared across all groups as that which was not restricted to any particular group or any combination of groups. We fit these different PD totals as areas in a Euler diagram with the euler() function in the “eulerr” package (64) for the K=2 and K=3 clustering, and with the Venn() function in the “venn” package (65) for the K=4 clustering.
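The specificity and fidelity components described above can be illustrated with a minimal, count-based Python sketch for presence-absence data. The combined statistic is taken here as the square root of specificity times fidelity, which is an assumption about the form of the stat index; the indicspecies package also offers group-equalized variants, so its output may differ. The site data and group labels are hypothetical.

```python
from math import sqrt


def indicator_components(presence, groups, target):
    """Return (specificity, fidelity, stat) for one genus.

    presence: 0/1 occurrence of the genus at each site.
    groups:   cluster label of each site.
    target:   set of cluster labels forming the target group (or combination).
    """
    in_target = [g in target for g in groups]
    n_occurrences = sum(presence)
    n_target_sites = sum(in_target)
    n_occ_in_target = sum(p for p, t in zip(presence, in_target) if t)
    # Specificity: share of the genus's occurrences that fall in the target.
    specificity = n_occ_in_target / n_occurrences if n_occurrences else 0.0
    # Fidelity: share of target sites in which the genus occurs.
    fidelity = n_occ_in_target / n_target_sites if n_target_sites else 0.0
    # Assumed combination: square root of the product.
    return specificity, fidelity, sqrt(specificity * fidelity)


# Hypothetical genus occurrences across six sites in two groups.
presence = [1, 1, 1, 0, 0, 1]
groups = ["T", "T", "T", "T", "E", "E"]

specificity, fidelity, stat = indicator_components(presence, groups, {"T"})
# Genera above the 0.6 specificity threshold are retained for the group.
retained = specificity > 0.6
```

Passing a set with more than one label as the target covers the case of genera associated with a combination of groups.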
Indicator lineages for clusters
To further characterise the composition of the evolutionary groups, we conducted an indicator analysis to determine the clades most strongly associated with each group. We created a site × node matrix (see function used in Appendix 1), which consists of a presence/absence matrix for each internal node in the phylogeny, and ran an indicator analysis for the nodes. We selected the highest-level, independent (i.e. non-nested) nodes with the highest stat values to present in Tables S1 and S2. The indicator node analysis was carried out with the multipatt() function in the R package indicspecies (63).
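The idea behind the site × node matrix can be sketched in a few lines of Python. This is a minimal illustration and not the function in Appendix 1: it assumes the phylogeny is given as a child-to-parent mapping and marks an internal node as present at a site when at least one descendant genus occurs there. The tree below is a simplified, hypothetical toy, and the site names are invented.

```python
# Simplified toy phylogeny (hypothetical): child -> parent.
PARENT = {
    "Quercus": "Fagales",
    "Fagus": "Fagales",
    "Inga": "Fabales",
    "Fagales": "root",
    "Fabales": "root",
}


def site_by_node(sites, parent):
    """Build a presence/absence matrix of internal nodes per site.

    sites:  mapping of site name -> list of genera present.
    parent: mapping of child node -> parent node.
    Returns the ordered list of internal nodes and a mapping of
    site name -> list of 0/1 values in that node order.
    """
    internal = sorted(set(parent.values()))
    matrix = {}
    for site, genera in sites.items():
        present = set()
        for genus in genera:
            node = parent.get(genus)
            while node is not None:  # walk up to the root
                present.add(node)
                node = parent.get(node)
        matrix[site] = [1 if n in present else 0 for n in internal]
    return internal, matrix


# Hypothetical sites.
sites = {
    "temperate_site": ["Quercus", "Fagus"],
    "tropical_site": ["Inga"],
}
nodes, matrix = site_by_node(sites, PARENT)
```

Each column of the resulting matrix can then be treated like a taxon in an indicator analysis, which is how clades rather than genera are associated with groups.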
Supplementary Materials
1) Table S1 Indicator clades for K=2 groups.
2) Table S2 Indicator clades for K=4 groups.
3) Table S3 Affiliation of principal vegetation formations in the tropics with the two main tropical groups from the K=4 clustering analysis.
4) Table S4 Stem ages for genera nodes used to calibrate the phylogenetic tree.
5) Figure S1 Match between tropics vs. extratropics groups from K=2 clustering and eight delimitations of the tropics.
6) Figure S2 Selection of number of clusters.
7) Figure S3 Shared versus unique Phylogenetic Diversity for K=2 and K=3 clustering analyses.
8) Figure S4 Clustering K = 2.
9) Figure S5 Clustering K = 3.
10) Figure S6 Clustering K = 4.
11) References Supplementary Material