Fossils know it best: using a new set of fossil calibrations to improve the temporal 2 phylogenetic framework of murid rodents (Rodentia: Myomorpha: Muroidea: 3 Muridae)

26 Murid rodents (Rodentia: Myomorpha: Muroidea: Muridae) represent the most diverse and 27 abundant mammalian group. In this study, we reconstruct a dated phylogeny of the family using 28 a multilocus dataset (six nuclear and nine mitochondrial gene fragments) encompassing 160 29 species representing 82 distinct murid genera from four extant subfamilies (Deomyinae, 30 Gerbillinae, Lophiomyinae, and Murinae). In comparison with previous studies on murid or 31 muroid rodents, our work stands out for the implementation of multiple fossil constraints within 32 the Muridae thanks to a thorough review of the fossil record. Before being assigned to specific 33 nodes of the phylogeny, all potential fossil constraints were carefully assessed; they were also 34 subjected to several cross-validation analyses. The resulting phylogeny is consistent with 35 previous phylogenetic studies on murids, and recovers the monophyly of all sampled murid 36 subfamilies and tribes. Based on nine controlled fossil calibrations, our inferred temporal 37 timeframe indicates that the murid family likely originated in the course of the Early Miocene, 38 23.0-16.0 million years ago (Ma), and that most major lineages (i.e. tribes) have started 39 diversifying ca. 10 Ma. Historical biogeography analyses support the Paleotropical origin for 40 the family, with an initial internal split (vicariance event) followed by subsequent migrations 41 between Afrotropical and Indomalayan lineages. During the course of their diversification, the 42 biogeographic pattern of murids is marked by several dispersal events toward the Australasian 43 and the Palearctic regions, mostly from the Indomalaya. The Afrotropical region was also 44 secondarily colonized at least three times from the Indomalaya, indicating that the latter region 45 has acted as a major centre of diversification for the family. Afrotropical region (Deomyinae, Gerbillinae and Lophiomyinae) and 541 one in the Indomalaya (Murinae). Our study also supports a dynamic biogeographic scenario in which repeated colonisation events occurred in the Australasian (Hydromyini, Rattini), Afrotropical (Malacomyini, Praomyini, Arvicanthini, Otomyini) and Palearctic (Apodemini) regions. One of the strong aspects of this study lies in the assessment and treatment of fossil


Introduction 51
With about 150 genera and more than 730 recognized species, Muridae is the most diverse 52 family of mammals (Musser and Carleton, 2005). Collectively murids have colonized highly 53 distinct ecological niches, adapting to a wide array of environments ranging from warm (deserts 54 or tropical forests) to cold habitats (high altitude mountain ranges, tundra; Vaughan et al., Murinae is the genus †Antemus Jacobs (Jacobs and Downs, 1994;Jacobs and Flynn, 2005;127 Kimura et al., 2015). †Antemus possesses a new cusp (anterostyle, also known as t1), which is 128 a synapomorphy of Murinae. The earliest record of †Antemus chinjiensis is dated at 13.8 Ma 129 (Jacobs et al., 1990) based on specimens from the locality YGSP 491, Chinji Formation in the 130 Potwar Plateau, Pakistan (Jacobs, 1977). In the fossil record of the Potwar Plateau, two more 131 derived fossil genera are of particular interest: †Karnimata Jacobs and †Progonomys Schaub. 132 Based on the relative position of the anterostyle to the lingual anterocone on M1, Jacobs (1978) 133 hypothesized that †Karnimata is related to Rattus and that †Progonomys is a member of the 134 lineage including Mus. Hence, their first stratigraphic occurrence has been used to define the 135 widely used Mus/Rattus calibration (ca. 12 My; Jacobs and Downs, 1994). However, in 2015, 136 Kimura et al. revisited these fossils from a paleontological perspective and proved this 137 calibration point to be controversial. They showed that †Karnimata is a member of the 138 Arvicanthini-Millardini-Otomyini clade rather than a member of the lineage encompassing the 139 genus Rattus and its relatives (i.e. tribe Rattini). Therefore, they demonstrated that the 140 continuous fossil record of the murine rodents from the Potwar Plateau actually provides a 141 minimum age for the most recent common ancestor of the lineages leading to Arvicanthis 142 Lesson and Mus (= Mus/Arvicanthis split). 143 Recent progresses in divergence dating analyses lead us to revisit results previously 144 obtained by favouring the implementation of a new set of well-justified primary fossil 145 calibrations within a Bayesian framework. In comparison to previous studies (listed above), our 146 study can be considered as medium-sized in terms of taxonomic sampling, and essentially 147 focused on the family Muridae. But our study stands out for rigorous evaluation of the fossil 148 data for this highly diverse mammalian family. The present study has four main objectives: (i) 149 7 to design a comprehensive multi-marker molecular dataset for the family Muridae, (ii) to review 150 the murid fossil record in order to identify reliable and suitable primary fossil calibrations, (iii) 151 to provide a reliable estimate of the timing of diversification of the family using multiple fossil 152 calibrations, and (iv) to lean on the resulting dated phylogeny to reconstruct the biogeographic 153 history of the family using up-to-date analytical approaches. The newly generated sequences were further combined with data from GenBank. 10 and one partition is associated with a Hasegawa-Kishino-Yano (HKY +Γ +I) model (see Table  224 2). 225 Bayesian inference analyses were carried out using MrBayes v3.2.6 (Ronquist et al., 226 2012b). Two independent runs with four MCMC (one cold and three incrementally heated 227 chains) were conducted: they ran for 50 million generations, with trees sampled every 1,000 228 generations. A conservative 25% burn-in was applied after checking for stability on the log- Maximum likelihood analyses were performed using RAxML v8.2.8 (Stamatakis, 2014). 234 Because this software does not allow simpler substitution models, we used three partitions with 235 a General Time Reversible (GTR +Γ +I) model (see Table 2). The best ML tree was obtained 236 using heuristic searches with 100 random addition replicates. Clade support was then assessed 237 using a non-parametric bootstrap procedure with 1,000 replicates. Following Hillis and Bull 238  Table 3). The candidate fossils possess the 244 information for the collection site, unique identification number, and the state of preservation 245 along with justification for the age of the fossil (i.e., age of fossil-bearing formation and 246 stratigraphic level, preferably with an absolute age by radiometric dating and/or reliable relative 247 age estimates, for example, by magnetostratigraphy). 248 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017. ; https://doi.org/10.1101/180398 doi: bioRxiv preprint In the next step, diagnostic morphological characters were reassessed to determine 249 whether they could be reliably used as minimum age constraints in our phylogeny, either as 250 crown or stem calibrations. Seven fossils were discarded following this step (see the 'Results' 251 section). 252 For the remaining 11 fossils, we used the cross-validation procedure developed by Near 253 and Sanderson (2004) and Near et al. (2005). The following approach was used: (i) we 254 identified potential inconsistencies within the 11 remaining fossil calibrations, and (ii) we 255 explored the impact of the inclusion of each of these fossils on our divergence time estimates. 256 Each of the 11 fossil constraints was enforced at a time in a specific Bayesian relaxed-clock 257 (BRC) analysis to estimate the ages of the remaining nodes (see also section 2.5). First, the sum 258 of the squared differences between the molecular and fossil age estimates (SS) was calculated 259 (for more details see Near and Sanderson, 2004). All calibration points were then ranked based 260 on the magnitude of its SS score; here the fossil with the greatest SS score is assumed to be the 261 most inconsistent with respect to all other fossils in the analysis (Near and Sanderson, 2004). Ancestral biogeography was reconstructed using the R package 'BioGeoBEARS' 320 (Matzke, 2013). Data for species' ranges were obtained from the International Union for 321 Conservation of Nature website (https://www.iucn.org/). Five major biogeographic areas were 322 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017.

Evaluation of suitable fossil calibrations 360
We summarized all fossils considered in this study in Table 3 and Appendix B regarding 361 taxonomic information and specification for prior settings (see also Figure 2 for their respective 362 positions within the tree). Five out of 18 preselected fossils (i.e. †Parapelomys robertsi Jacobs, 363 †Potwarmus primitivus, †Preacomys kikiae Geraads, †cf. Progonomys sp. Schaub and †aff. 364 Stenocephalomys Frick) were excluded from further analyses because the scarcity of 365 paleontological interpretation about their phylogenetic relationships impeded assigning them to 366 specific nodes of the phylogeny (Appendix B). We also excluded fossils of Acomys and 367 Lemniscomys Trouessart from the Lemudong'O locality, Kenya (Manthi, 2007), because first 368 upper molars, which possess the most diagnostic characters in the murine dentition, are not 369 described from the locality (Table 3; see more details also in Appendix B). The two-step cross-370 validation procedure resulted in a further reduction of the fossil set of possible calibration 371 points. Specifically, we excluded two fossils: one is a 2.4 Ma fossil identified as the genus 372 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017. ; https://doi.org/10.1101/180398 doi: bioRxiv preprint Gerbillus Desmarest, while the other one corresponds to a 6.1 Ma fossil identified as the genus 373 Mastomys Thomas (Appendix C). The rationale is that (i) these two fossil calibrations exhibited 374 the largest magnitude of SS, and that (ii) their removal also resulted in a very high (fivefold) 375 decrease in s (see Appendix C for more details). As a result of the latter series of selection steps, 376 nine fossils were finally retained for divergence dating (see Figure 2 for their position on the 377 tree, Table 4 for specification of priors, and Appendix B for more details on all considered 378 fossils). 379 380

Historical biogeography and divergence dating 381
Among the six models of geographic-range evolution compared in a likelihood 382 framework in BioGeoBEARS, the Dispersal-Extinction Cladogenesis model with founder-383 event speciation (DEC +J) was chosen because of its best likelihood and AICc associated scores 384 (lnL=-117.1, AICc=240.4; Table 5). 385 The dated tree resulting from the BRC analyses is shown in Figure 2 while dating 386 estimates for all internal nodes are provided in Table 6 within Arvicanthini is not resolved (Fig. 1); the dispersal to Indomalaya of the lineage leading 426 to the extant Golunda species at ca. 8.5 Ma (95% HPD: 7.33-9.57 Ma; as suggested in Fig. 3) 427 should therefore be taken with caution. 428

Selection of taxa and molecular markers 430
Our sampling of 160 species from 82 genera represents 22% of known murid species 431 diversity and more than half of the generic diversity of the family Muridae. When one compares 432 our sampling effort to previous studies (Table 7) (Table 6). 445 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017. ; https://doi.org/10.1101/180398 doi: bioRxiv preprint Other fossils frequently used for molecular clock calibration are from the genus 446 †Parapodemus. Recent studies used these fossils in two ways: the first occurrence of 447 †Parapodemus sp. (Martín-Suáres and Mein, 1998)  The last faunal interchange of murid taxa between Africa, Asia and Western Palearctic 519 (Benammi et al., 1996;Sabatier, 1982;Sen, 1977Sen, , 1983Winkler, 2002) is coincident with 520 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017. ; https://doi.org/10.1101/180398 doi: bioRxiv preprint

Conclusion and perspectives 537
In this study, we provided an improved multilocus dated phylogeny for the highly 538 speciose family Muridae. Both our dating and historical biogeography analyses suggest that the 539 family originated during the Early Miocene, and subsequently gave rise to four extant 540 subfamilies: three in the Afrotropical region (Deomyinae, Gerbillinae and Lophiomyinae) and 541 one in the Indomalaya (Murinae). Our study also supports a dynamic biogeographic scenario 542 in which repeated colonisation events occurred in the Australasian (Hydromyini, Rattini), 543 Afrotropical (Malacomyini, Praomyini, Arvicanthini, Otomyini) and Palearctic (Apodemini) 544 regions. One of the strong aspects of this study lies in the assessment and treatment of fossil 545 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017.  Table 4 for more details). 932 was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017. ; https://doi.org/10.1101/180398 doi: bioRxiv preprint 39 Tables: 940   Table 1: Brief summary of taxonomic changes in the family Muridae. The taxonomic position 941 of particular lineages has changed significantly and they were either considered as 942 separate families, or included in other families outside Muridae (names of families 943 are in bold). 944   and lognormal prior distribution. 951 Table 5: Comparison of models used for BioGeoBEARS; likelihood scores (LnL), number of 952 parameters (numparams), dispersal rate (d), extinction rate (e), free parameter controlling 953 the relative probability of founder-event speciation events at cladogenesis (j), corrected 954 Akaike Information Criterion (AICc), and AICc model weights. 955  . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted September 10, 2017. ; https://doi.org/10.1101/180398 doi: bioRxiv preprint