gen3sis: the general engine for eco-evolutionary simulations on the origins of biodiversity

Understanding the origins of biodiversity has been an aspiration since the days of early naturalists. The immense complexity of ecological, evolutionary and spatial processes, however, has made this goal elusive to this day. Computer models serve progress in many scientific fields, but in the fields of macroecology and macroevolution, eco-evolutionary models are comparatively less developed. We present a general, spatially-explicit, eco-evolutionary engine with a modular implementation that enables the modelling of multiple macroecological and macroevolutionary processes and feedbacks across representative spatio-temporally dynamic landscapes. Modelled processes can include environmental filtering, biotic interactions, dispersal, speciation and evolution of ecological traits. Commonly observed biodiversity patterns, such as α, β and γ diversity, species ranges, ecological traits and phylogenies, emerge as simulations proceed. As a case study, we examined alternative hypotheses expected to have shaped the latitudinal diversity gradient (LDG) during the Earth’s Cenozoic era. We found that a carrying capacity linked with energy was the only model variant that could simultaneously produce a realistic LDG, species range size frequencies, and phylogenetic tree balance. The model engine is open source and available as an R-package, enabling future exploration of various landscapes and biological processes, while outputs can be linked with a variety of empirical biodiversity patterns. This work represents a step towards a numeric and mechanistic understanding of the physical and biological processes that shape Earth’s biodiversity.


Introduction
Divergence between geographically isolated clusters of populations increases over time while (re-)connected clusters decrease down to zero; speciation happens when the divergence between two clusters is above the speciation threshold, but can also consider trait differences. geographic occupancy (species range), and determines when geographic isolation between 242 population clusters is sufficient to trigger a lineage-splitting event of cladogenesis. A species' 243 range can be segregated into spatially discontinuous geographic clusters of sites and is 244 determined by multiple other processes. The clustering of occupied sites is based on the 245 species' dispersal capacity and the landscape connection costs. Over time, disconnected 246 clusters gradually accumulate incompatibility (divergence), analogous to genetic 247 differentiation. Disconnected species population clusters that maintain geographic isolation for 248 a prolonged period of time will result in different species after the differentiation threshold Ϟ is 249 The computer model delivers a wide range of outputs that can be compared with empirical 305 data ( Figure 1, Table 2). Gen3sis is therefore suitable for analysing the links between 306 interacting processes and their multidimensional emergent patterns. By recording the time and 307 origin of all speciation events, as well as trait distributions and abundance throughout 308 evolutionary history, the simulation model records the information required to track the 309 dynamics of diversity and the shaping of phylogenetic relationships. The most common 310 patterns observed and studied by ecologists and evolutionary biologists, including species 311 ranges, abundances and richness, are emergent properties of the modelled processes (Table  312 2). All internal objects are accessible to the observer function, which is configurable and 313 executed during simulation runs. This provides direct simulation outputs in a format ready to 314 be stored, analysed and compared with empirical data. Given the flexibility of gen3sis, it is 315 possible to explore not only parameter ranges guided by prior knowledge available for a given 316 taxonomic group, but also variations in landscape scenarios and mechanisms ( Figure 3) We implemented one model for each of these hypotheses and simulated the spread, 348 speciation, dispersal and extinction of terrestrial organisms over the Cenozoic. We evaluated 349 whether the emerging patterns from these simulated mechanisms correspond to the empirical 350 LDG, phylogenetic tree imbalance and range size frequencies computed from data of major 351 tetrapod groups, including mammals, birds, amphibians and reptiles ( Figure 3).  Table  356 S1), and model evaluation and test, based on multiple patterns including: LDG, range size 357 distributions and phylogenetic balance. Selection criteria were based on empirical data from 358 major tetrapod groups, i.e. mammals, birds, amphibians and reptiles (Table 3). 359

360
The Cenozoic (i.e. 65 Ma until the present) is considered key for the diversification of the 361 current biota [113] and is the period during which the modern LDG is expected to have been 362 formed [114]. In the Cenozoic, the continents assumed their modern geographic configuration 363 [24]. Climatically, this period was characterized by a general cooling, especially in the 364 Miocene, and ended with the climatic oscillations of the Quaternary [115]. We compiled two 365 global paleoenvironmental landscapes (i.e. L1 and L2) for the Cenozoic at 1° and ~170 kyr of 366 spatial and temporal resolution, respectively (Note S1, Animations S1 and S2). To account for 367 uncertainties on paleo-reconstructions on the emerging large-scale biodiversity patterns, we used two paleo-elevation reconstructions [116,117]  Hypothesis implementation 376 We implemented three hypotheses explaining the emergence of the LDG as different gen3sis 377 models. The models (i.e. M0, M1 and M2) had distinct speciation and ecological processes 378 ( Figure 3, Note S1, Table S1). All simulations were initiated with one single ancestor species 379 spread over the entire terrestrial surface of the Earth at 65 Ma, where the temperature optimum 380 of each population matched local site conditions. Since we focused on terrestrial organisms, 381 aquatic sites were considered inhabitable and twice as difficult to cross as terrestrial sites. 382 This approximates the different dispersal limitation imposed by aquatic and terrestrial sites. 383 The spherical shape of the Earth was accounted for in distance calculations by using 384 haversine geodesic distances. Species disperse following a Weibull distribution with shape 2 385 or 5 and a scale of 550, 650, 750 or 850, resulting in most values being around 500-1500 km, 386 with rare large dispersal events above 2000 km. The evolution function defines the 387 temperature niche optimum to evolve following Brownian motion. Temperature niche optima 388 are homogenized per geographic cluster by an abundance-weighted mean after ecological 389 processes happen. We explored three rates of niche evolution, with a standard deviation 390 equivalent to ±0.1°C, ±0.5°C and ±1°C. 391 the species population abundance, where the abundance increases proportionally to the 394 distance between the population temperature niche optimum and the site temperature (Note 395 S1). Clusters of populations that accumulated differentiation over Ϟ = 12, 24, 36, 48 and 60 will 396 speciate, corresponding to events occurring after 2, 4, 6, 8 and 10 myr of isolation, 397 respectively. The divergence rate between isolated clusters was kept constant (i.e. +1 for 398 every 170 kyr of isolation). Model M0, assuming time for species accumulation, acted as a 399 baseline model. This means that all mechanisms present in this model were the same for M1 400 and M2 if not specified otherwise. 401

M1.
In the implementation of the diversification rates, the speciation function applies a 402 temperature-dependent divergence between population clusters [61, 62]. Species in warmer 403 environments accumulate divergence between disconnected clusters of populations at a 404 higher rate (Note S1). The rate of differentiation increase was the average site temperature of 405 the species clusters to the power of 2, 4 or 6 plus a constant. This created a differentiation 406 increase of +1.5 for isolated clusters of a species at the warmest range and +0.5 at the coldest 407 range for every 170 kyr of isolation (Note S1, Figure S1). Using Ϟ = 12, 24, 36, 48 and 60, this 408 corresponds to a speciation event after 1.3, 2.7, 4.0, 5.3, 6.7 myr and after 4, 8, 12, 16, 20 409 myr for the warmest and coldest species, respectively. 3). In addition, we explored dispersal distributions and parameters ranging in realized mean 421 and 95% quantiles between less than a single cell, i.e. ~50 km for a landscape at 4°, and more 422 than the Earth's diameter, i.e. ~12'742 km ( Figure S2). Trait evolution frequency and intensity 423 ranged from zero to one. We ran a full factorial exploration of these parameter ranges at a 424 coarse resolution of 4° (i.e. M0 n=480, M1 n=720, M2 n=480) and compared these to empirical 425 data. Simulations considered further: (i) had at least one speciation event; (ii) did not have all 426 species becoming extinct; (iii) had fewer than 50'000 species; or (iv) had fewer than 10'000 427 species cohabiting the same site at any point in time (Note S1). After parameter range 428 exploration, we identified realistic parameters and ran a subset at 1° for high-resolution outputs 429 ( Figure 4). 430 Correspondence with empirical data 431 In order to explore the parameters of all three models and compare their ability to produce the 432 observed biodiversity patterns, we used a pattern-oriented modelling (POM) approach [23, 433 86]. POM compares the predictions of each model and parameter combination with a number 434 of diagnostic patterns from empirical observations. In our case, we used the LDG slope, tree 435 imbalance and range size frequencies as diagnostics patterns (Figure 3, Note S1  Simulations results and synthesis 458 We found that model M2 was the best match for all the empirical patterns individually, and the 459 only model able to pass all acceptance criteria (Table 3). Although all three models were able 460 to reproduce the LDG, M2 was superior in explaining the LDG, phylogenetic tree imbalance 461 and species range size frequencies simultaneously (Table 3). Most simulations of model M2 462 (67%) resulted in a decrease in species richness at higher latitudes, indicating that the LDG 463 emerged systematically under M2 mechanisms ( Figure S3, Tables S2, S3 and S4). Increasing 464 the spatial resolution of the simulations (n=12) resulted in an increase in  richness and 465 computation time and a slight decrease of the LDG (Figure S5), which was associated with a disproportionally larger number of sites towards higher latitudes, which also affects population 467 connectivity and therefore speciation rates [137]. We then selected the best matching 468 simulation of M2 in L1 at 1° (n=12) that predicted realistic biodiversity patterns (Figure 4, 469 Animation S4), The emerging LDG (i.e. 4.6% of species loss per latitudinal degree) closely 470 matched empirical curves, with good agreement for mammals (Pearson r=0.6), birds (r=0.57), 471 amphibians (r=0.57) and reptiles (r=0.38) (Note S1, Figure 4C, Figure S6). Finally, we found 472 that the support for M2 over M0 and M1 was consistent across the two alternative landscapes 473 L1 and L2 ( Figure S3, Table S4). 474 Our sensitivity analyses of parameters further provided information about the role of dispersal 475 and ecological processes in shaping the LDG (Note S1,

509
Understanding the emergence of biodiversity patterns requires the consideration of multiple 510 biological processes and abiotic forces that potentially underpin them [20,26,35,36]. We 511 have introduced gen3sis, a modular, spatially-explicit, eco-evolutionary simulation engine 512 implemented as an R-package, which offers the possibility to explore ecological and 513 macroevolutionary dynamics over changing landscapes. Gen3sis generates commonly 514 observed diversity patterns and, thanks to its flexibility, enables the testing of a broad range 515 of hypotheses (Table 4) Using a case study, we have illustrated the flexibility and utility of gen3sis in modelling multiple 522 eco-evolutionary hypotheses in global paleo-environmental reconstructions (Figures 3 and 4). 523 Our findings suggest that global biodiversity patterns can be modelled realistically by 524 combining paleo-environmental reconstructions with eco-evolutionary processes, thus moving 525 beyond pattern description to pattern reproduction [35]. Nevertheless, in our case study we 526 only implemented a few of the standing LDG hypotheses [20,34]. Multiple macroecological 527 and macroevolutionary hypotheses still have to be tested, including the role of stronger biotic 528 interactions in the tropics than in other regions [142], and compared with more biodiversity 529 patterns [20]. Considering multiple additional biodiversity patterns will allow a more robust 530 selection of models. Apart from the global LDG case study, we propose an additional case 531 study (Note S2, Figure S7) illustrating how gen3sis can be used for regional and theoretical 532 studies, such as investigations of the effect of island ontology on the temporal dynamics of 533 biodiversity [39,143]. Further, illustrations associated with the programming code are offered as a vignette of the R-package, which will support broad application of gen3sis. Altogether, 535 our examples illustrate the great potential for exploration provided by gen3sis, promising future 536 advances in our understanding of empirical biodiversity patterns. 537 Verbal explanations of the main principles underlying the emergence of biodiversity are 538 frequently proposed but are rarely quantified or readily generalized across study systems [20]. 539 We anticipate that gen3sis will be particularly useful for exploring the consequences of 540 mechanisms that so far have mostly been verbally defined. For example, the origins of 541 biodiversity gradients have been associated with a variety of mechanisms [7], but these 542 represent verbal abstractions of biological processes that are difficult to evaluate [20]. 543 Whereas simulation models can always be improved, their formulation implies formalizing 544 process-based abstractions via mechanisms expected to shape the emergent properties of a 545 system [144]. Specifically, when conveying models with gen3sis, decisions regarding the 546 biological processes and landscapes must be formalized in a reproducible fashion. By 547 introducing gen3sis, we encourage a standardization of configuration and landscape objects, 548 which will facilitate future model comparisons. This standardization offers a robust framework 549 for developing, testing, comparing, and applying the mechanisms relevant to biodiversity 550 research. 551 Studying multiple patterns is a promising approach in disentangling competing hypotheses 552 [20,86]. A wide range of biodiversity dimensions can be simulated with gen3sis (Table 2) to explore the formation of the LDG and account for uncertainties and limitations. For instance, 578 we represented Quaternary climatic oscillation using ~170 kyr time-steps, which correspond 579 to a coarser temporal scale compared with the frequency of oscillations, and thus do not 580 account for shorter climatic variation effects on diversity patterns [25, 26, 44]. We also did not 581 consider ice cover, that can mask species' habitable sites, which probably explains the the 582 mismatch between simulated and empirical LDG patterns below 50° ( Figure 4C). Moreover, 583 paleo indicators of climate from Köppen bands have major limitations, and the temperature 584 estimation derived in our case study can suffer from large inaccuracies. Lastly, extrapolation 585 of the current temperature lapse rate along elevation might lead to erroneous estimates, 586 especially in terms of the interaction with air moisture [155], which was not further investigated 587 here. Hence, the presented case study represents a preliminary attempt for illustrative 588 purposes. Further research is required to generate more accurate paleolandscapes, and 589 research in biology should improve empirical evidence and our understanding of mechanisms. 590 Table 4. A non-exhaustive list of expected applications of gen3sis. Given the flexibility and the 593 range of outputs produced by the engine, we expect that gen3sis will serve a large range of 594 purposes, from testing a variety of theories and hypotheses to evaluating phylogenetic 595 diversification methods. 596

Use
Examples from Figure 1 Testing phylogenetic inference methods, including diversification rates in phylogeographic reconstructions.
Infer diversification rate in gen3sis simulated phylogenies (E) and compare with a known diversification in gen3sis (A, B & G).
Providing biotic scenarios for past responses to geodynamics.
Based on model outputs (C-F) and comparisons with empirical data (H), select plausible models (B).
Testing paleo-climatic and paleotopographic reconstructions using biodiversity data.
Based on model outputs (C-F) and comparisons with empirical data (H), select plausible landscape(s) (A).
Comparing expectations of different processes relating to the origin of biodiversity; generating and testing hypotheses.
Compare models (A, B & G) with outputs (C-F) and possibly how well outputs match empirical data (H).
Comparing simulated intra-specific population structure with empirical genetic data.
Compare simulated divergence matrices with population genetic data.
Forecasting the response of biodiversity to global changes (e.g. climate or fragmentation).

Extrapolate plausible and validated models (A, B & G) on landscapes under climate change scenarios (A).
Investigating trait evolution through space and time.
Combine past and present simulated species traits (F) and distributions (C, D) with fossil and trait data (H).
Modelling complex systems in space and time in unconventional biological contexts in order to investigate ecoevolutionary processes in fields traditionally not relying on biological principles.
Model eco-evolutionary mechanisms (A, B & G) in an unconventional eco-evolutionary context.

598
Here we have introduced gen3sis, a modular simulation engine that enables exploration of the 599 consequences of ecological and evolutionary processes and feedbacks on the emergence of 600