Global biogeographical regions of freshwater fish species

To define the major biogeographical regions and transition zones for freshwater fish species.


| INTRODUCTION
For almost two centuries, biogeographers have classified continental areas of the world into distinct biogeographical regions on the basis of organism distributions across the Earth (Cox, 2001;Holt et al., 2013;Wallace, 1876). Indeed, early biogeographers observed that many organisms share constellated distributions of endemics in particular regions. Furthermore, they observed that these patterns of endemism are often similar for distinct groups of organisms, resulting in very similar biogeographical regions. This marked similarity has led to the hypothesis that these regions reflect a shared history of diversification among taxa and have been conditioned by geography, geology and climate (Lomolino, Riddle, & Whittaker, 2016;Morrone, 2015).
The earliest classifications outlined six major biogeographical regions for birds (Sclater, 1858) and non-flying mammals (Wallace, 1876): Nearctic, Neotropical, Palearctic, Ethiopian, Oriental and Australian. During recent years, these major regions have been confirmed by an upsurge in bioregionalization studies. This upsurge was facilitated by the increase in quality and quantity of large-scale datasets, as well as the development of new analytical tools (Edler, Guedes, Zizka, Rosvall, & Antonelli, 2016; Kreft & Jetz, 2010; Vilhena & Antonelli, 2015). Consequently, multiple studies have tried to identify the major biogeographical regions for birds (Holt et al., 2013; Procheş & Ramdhani, 2012; Rueda, Rodríguez, & Hawkins, 2013), mammals (Holt et al., 2013; Kreft & Jetz, 2010; Procheş & Ramdhani, 2012; Rueda et al., 2013), amphibians (Edler et al., 2016; Holt et al., 2013; Procheş & Ramdhani, 2012; Rueda et al., 2013; Vilhena & Antonelli, 2015) and reptiles (Procheş & Ramdhani, 2012). The result of this upsurge was a debate on the precise limits of biogeographical regions. Indeed, some studies explicitly defined transition zones as distinct regions (e.g. Holt et al., 2013), whereas others included transition zones in major regions (Kreft & Jetz, 2010). This question of transition zones was settled to some extent in the major synthesis of Morrone (2015), who proposed that transition zones should not be considered as distinct regions, but rather as transitional boundaries between major regions. Indeed, some regions share sharp boundaries, reflecting a long history of isolation by tectonics (Ficetola, Mazel, & Thuiller, 2017), whereas others share diffuse boundaries, reflecting recent interchanges, generally limited by mountain or climatic barriers (Ficetola et al., 2017; Morrone, 2015). Morrone (2015) proposed that five major transition zones emerged from earlier works, which could be explained by a vicariance-dispersal model based on tectonic history.
This synthetic model can be considered as a general framework to test for biogeographical regions.
However, the recent upsurge in continental bioregionalization studies has concentrated exclusively on terrestrial vertebrates, which represent but a fraction of the continental organisms. There are other continental organisms with constraints to their dispersal and ecology that are fundamentally distinct from terrestrial vertebrates and for which one might expect distinct biogeographical regions. For example, terrestrial plants are generally characterized by higher degrees of endemism than animals, because they are more constrained than animals in terms of dispersal and tolerance to surmount climatic and other physical barriers (Lomolino et al., 2016). Hence, major phytogeographical regions were described as manifold small regions (De Candolle, 1820, 1855; Takhtajan, 1986).
However, Cox (2001) later proposed a handful of large floral regions comparable to biogeographical regions, thus suggesting that the major biogeographical regions are universal across the tree of life. A second example concerns human microbial diseases whose biogeography has also been shown recently to match terrestrial vertebrate biogeography (Murray et al., 2015). Another possibility concerns strictly freshwater organisms (i.e. organisms that live and disperse exclusively in freshwaters) as they have lower dispersal abilities than terrestrial vertebrates, and are geographically isolated in drainage basins usually flowing to the oceans. Terrestrial boundaries and salt waters represent strong barriers to dispersal, hence drainage basins have been considered as "island-like" systems for strictly freshwater organisms (Dias et al., 2014;Hugueny, Oberdorff, & Tedesco, 2010;Rahel, 2007;Tedesco et al., 2012). Dispersal can occur actively or passively via underground waters, stream captures, exceptional floods, glacier melting causing stream overflow, confluence during sea-level lowering and displacement by other organisms or typhoons (see also discussion in Capobianco & Friedman, 2018).
However, such dispersal events are rare; therefore, immigration and speciation presumably occur on similar time-scales. Consequently, one might expect that, because of peculiarities of riverscape changes through geological times, strictly freshwater organisms have been subject to different histories of diversification from those of terrestrial vertebrates (Rahel, 2007) and thus have original biogeographical boundaries. Because dispersal is physically constrained, a higher degree of provincialism and endemism could be anticipated for such organisms, resulting potentially in smaller and more numerous biogeographical regions.
In this paper, we focussed on the global biogeography of strictly freshwater actinopterygian fishes (i.e. excluding marine and amphidromous families of fish), hereafter called freshwater fishes. Several studies delineated biogeographical regions of freshwater fishes at regional to continental scales (e.g. Oikonomou, Leprieur, & Leonardos, 2014;Unmack, 2001), and studies conducted at the global scale also focussed on subregional provinces (ecoregions) based on a combination of data and expert decisions (Abell et al., 2008;Lévêque, Oberdorff, Paugy, Stiassny, & Tedesco, 2008). Only one work hinted at nine potential freshwater fish biogeographical regions that covered the same biogeographical regions as terrestrial vertebrates (Matthews, 1998), but this work was based on a coarse geographic scale (52 approximate drainage basins for the whole world) and a low taxonomic resolution (family level). In addition, Matthews (1998) included marine and diadromous fish families, which could conceal the effect of long-term isolation on freshwater fish endemicity patterns.
Consequently, it remains unresolved whether Sclater-Wallace's biogeographical regions are also applicable to freshwater fishes and to other freshwater organisms; therefore, a global-scale quantitative bioregionalization would represent an important step forward.
In this study, we aimed to define the major biogeographical regions for strictly freshwater fish species at the global scale. To delineate biogeographical regions, we capitalized on the recent development of a comprehensive dataset on freshwater fish distributions in drainage basins covering more than 80% of the Earth surface (Tedesco et al., 2017). First, we identified the large biogeographical regions of freshwater fishes using a recently developed hierarchical approach based on networks (Vilhena & Antonelli, 2015), recommended for bioregionalization studies (Bloomfield, Knerr, & Encinas-Viso, 2017;Edler et al., 2016;Rojas, Patarroyo, Mao, Bengtson, & Kowalewski, 2017). Then, we mapped the transition zones between regions and investigated species distributed across region boundaries. Finally, we compared biogeographical regions with terrestrial vertebrate biogeographical regions and discussed our findings in the light of the synthetic biogeographical model proposed by Morrone (2015).

| Distribution data
We based our bioregionalization on the most comprehensive global database on freshwater fish species occurrence in drainage basins (Tedesco et al., 2017). This database comprises 110,331 occurrence records for 14,953 species in 3,119 drainage basins of the world. Species names in the database were validated according to FishBase (Froese & Pauly, 2017) and the Catalogue of Fishes (Fricke, Eschmeyer, & van der Laan, 2017), and occurrence records were screened by the team developing the database (see details in Tedesco et al., 2017). We applied additional filters and corrections to the database. Since our aim was to describe the natural biogeographical regions resulting from long-term isolation of freshwater ichthyofaunas, we excluded documented records of introduced species, but included species considered to be recently extinct in their historical river basins. Additionally, to exclude most species that could disperse through marine waters, we retained only families having fewer than 10% of their species occurring in marine waters. This filter retained all "primary" and almost all "secondary" families of fishes (only Pseudomugilidae and Fundulidae were excluded), that is, families with respectively no or limited salt-tolerant species, as well as 22 families that had never been classified (based on Table 2 of Berra, 2007). It also included eight families with marine ancestors, seven of which had no species classified as tolerating salt water.
Finally, we removed all diadromous species, according to FishBase.
Additionally, we detected a few errors that were corrected in the database, mostly related to the native/introduced status for some species. The resulting dataset included 59,373 records of 11,295 species in 2,581 basins (Figure 1, Appendix S1).
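The filtering steps described above can be sketched as a short routine. This is only an illustration under stated assumptions: the record layout (`Record`), the status labels, the per-family marine fraction table and all toy values are hypothetical and are not taken from the database of Tedesco et al. (2017).

```python
from collections import namedtuple

# Hypothetical record layout; the real database schema may differ.
Record = namedtuple("Record", "species family basin status")

def filter_records(records, marine_fraction, diadromous, max_marine=0.10):
    """Keep native or recently extinct records of non-diadromous species
    whose family has fewer than `max_marine` of its species in marine waters."""
    kept = []
    for r in records:
        if r.status == "introduced":          # drop documented introductions
            continue
        if marine_fraction[r.family] >= max_marine:  # drop salt-tolerant families
            continue
        if r.species in diadromous:           # drop diadromous species
            continue
        kept.append(r)
    return kept

# Toy data (invented for illustration only).
records = [
    Record("sp_a", "fam_fresh", "basin_1", "native"),
    Record("sp_b", "fam_fresh", "basin_1", "introduced"),
    Record("sp_c", "fam_marine", "basin_2", "native"),
    Record("sp_d", "fam_fresh", "basin_2", "native"),
    Record("sp_e", "fam_fresh", "basin_3", "extinct"),
]
marine_fraction = {"fam_fresh": 0.0, "fam_marine": 0.6}
diadromous = {"sp_d"}

kept = filter_records(records, marine_fraction, diadromous)
```

In this toy case only `sp_a` and `sp_e` survive: the introduced record, the record from the mostly marine family and the diadromous species are all removed, while the recently extinct species is retained.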
To define our bioregions, we worked at the species level and used drainage basins as geographical units. Indeed, (a) in the absence of a unified phylogeny for actinopterygian fishes, species is the most standard unit available and (b) contrary to terrestrial vertebrates (for which gridded distribution data of reliable quality are available), the most precise distribution data available for actinopterygian fishes is at the drainage basin unit. However, it is important to note that even if drainage basins are uneven in size, they are biogeographically meaningful for freshwater organisms because water bodies are generally connected within basins but not between basins (Hugueny et al., 2010).
FIGURE 1 Global distribution of freshwater fish species richness per drainage basin based on the global database on freshwater fish species occurrence in drainage basins (Tedesco et al., 2017). Grey-shaded areas correspond to basins without records of native strictly freshwater species

| Delineation of biogeographical regions
Until recently, the prevailing procedure for bioregionalization has been based on hierarchical clustering methods applied to compositional dissimilarity (Holt et al., 2013; Kreft & Jetz, 2010; Procheş & Ramdhani, 2012). Since then, an approach based on biogeographical networks was introduced by Vilhena and Antonelli (2015), and has been recommended for delineating biogeographical regions (Bloomfield et al., 2017; Edler et al., 2016; Rojas et al., 2017). A network is composed of a series of nodes which can be connected to each other by links (or edges). In bioregionalization, the network is composed of both sites (i.e. drainage basins here) and species, which constitutes a bipartite network. When a taxon is known to occur at a particular site, a link is drawn between the taxon and the site. By definition, site-site and species-species links are not allowed in this type of network. Our final network had 13,876 nodes (11,295 species and 2,581 basins) and 59,373 links. We visualized the network in Gephi 0.9.2 (Bastian, Heymann, & Jacomy, 2009) with the ForceAtlas2 layout algorithm. This layout draws tightly interconnected nodes close together (such as groups of sites and species from the same biogeographical region) and separates groups of nodes that are not interconnected (distinct biogeographical regions). Such a graphical representation is useful for analysing and exploring the network.
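As a minimal sketch of the bipartite structure described here, the following builds the species-basin network from a list of occurrence records using plain dictionaries. The function name and the toy occurrences are hypothetical; the point is only that every link joins one species to one basin, and that the node and link counts follow directly from the occurrence table.

```python
def build_bipartite(occurrences):
    """Build a species-basin bipartite network from (species, basin) pairs.

    Links only ever join a species to a basin, so species-species and
    basin-basin links cannot arise. Duplicate records collapse to one link.
    """
    species_to_basins = {}
    basin_to_species = {}
    for species, basin in occurrences:
        species_to_basins.setdefault(species, set()).add(basin)
        basin_to_species.setdefault(basin, set()).add(species)
    return species_to_basins, basin_to_species

# Toy occurrences (invented); the last record duplicates the first.
occurrences = [("sp1", "A"), ("sp1", "B"), ("sp2", "B"), ("sp3", "C"), ("sp1", "A")]

species_to_basins, basin_to_species = build_bipartite(occurrences)
n_nodes = len(species_to_basins) + len(basin_to_species)
n_links = sum(len(basins) for basins in species_to_basins.values())
```

Applied to the full dataset, the same bookkeeping would be expected to give the figures reported in the text: 11,295 species plus 2,581 basins, that is 13,876 nodes, joined by 59,373 links.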
We applied a community-detection algorithm to the entire network in order to group nodes into clusters (i.e. biogeographical regions). We applied the Map Equation algorithm (www.mapequation.org, Rosvall & Bergstrom, 2008) because it has been tested and recommended to identify biogeographical regions (Edler et al., 2016; Rojas et al., 2017; Vilhena & Antonelli, 2015) and it features hierarchical clustering. Clusters are identified by the algorithm as having high intra-group but low inter-group connectivity, which corresponds well to the definition of biogeographical regions, that is, regions of distinct assemblages of endemic taxa. We ran Map Equation (software version Sat Oct 28 2017, Rosvall & Bergstrom, 2008) with 100 trials to find the optimal clustering. We ran the hierarchical clustering (i.e. multilevel) in order to test whether larger regions have a nested hierarchy of subregions. It is important to note that a hierarchy of regions identified at the species level illustrates how biogeographical regions (i.e. distinct assemblages of endemic taxa) are currently spatially nested, but does not represent a historical (i.e. evolutionary) hierarchy of how these regions emerged.
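To make the clustering criterion more concrete, the sketch below evaluates the two-level map equation of Rosvall and Bergstrom (2008) for a given partition of an undirected, unweighted network. It is not the search procedure used in this study (which relies on the Map Equation software with 100 trials and multilevel clustering); it only computes the description length that such a search tries to minimize. Treating the bipartite network as an ordinary undirected network, the function names and the toy network are assumptions made for illustration.

```python
from math import log2

def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * log2(x) if x > 0 else 0.0

def map_equation(links, module_of):
    """Two-level description length (bits per step) of a random walk on an
    undirected, unweighted network, for the partition given by `module_of`."""
    two_m = 2 * len(links)
    degree, exit_links = {}, {}
    for u, v in links:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if module_of[u] != module_of[v]:      # link crossing a module boundary
            for node in (u, v):
                m = module_of[node]
                exit_links[m] = exit_links.get(m, 0) + 1
    p = {n: d / two_m for n, d in degree.items()}   # node visit rates
    modules = {module_of[n] for n in degree}
    q = {m: exit_links.get(m, 0) / two_m for m in modules}  # module exit rates
    p_mod = {m: 0.0 for m in modules}
    for n, pn in p.items():
        p_mod[module_of[n]] += pn
    return (plogp(sum(q.values()))
            - 2 * sum(plogp(qm) for qm in q.values())
            - sum(plogp(pn) for pn in p.values())
            + sum(plogp(q[m] + p_mod[m]) for m in modules))

# Toy network (invented): two faunas sharing a single occurrence (A3 in b1).
links = ([(s, b) for s in ("A1", "A2", "A3") for b in ("a1", "a2")]
         + [(s, b) for s in ("B1", "B2", "B3") for b in ("b1", "b2")]
         + [("A3", "b1")])
two_regions = {n: ("A" if n[0] in "Aa" else "B") for link in links for n in link}
one_region = {n: "all" for n in two_regions}

length_two = map_equation(links, two_regions)
length_one = map_equation(links, one_region)
```

In this toy case the two-region partition yields a shorter description length than lumping all nodes into one region, which is the sense in which clusters with high intra-group and low inter-group connectivity are preferred.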
The biogeographical network approach presents several advantages over distance-based approaches that were instrumental in our choice. Foremost, species identities are not lost, that is, they are not abstracted into dissimilarity matrices between sites. Consequently, the network approach allows one to map how sites are connected by individual species, which is a clear asset for investigating between- and within-region structures, such as potential dispersal pathways or barriers. A second practical novelty is that the algorithm assigns each species to a specific bioregion, which enables species-level descriptions (e.g. for online databases such as FishBase) and analyses. Lastly, the Map Equation algorithm is robust to differences in sampling intensities, making the removal of basins with low species richness unnecessary. On the other hand, distance-based approaches have limitations and can produce inconsistent results when transforming such large occurrence datasets into a single dimension during the clustering procedure (see Appendix S2).
However, we provide clustering results using two additional methods in Appendix S2 for comparison: another network-based algorithm (Simulated Annealing, Bloomfield et al., 2017) and a distance-based method (following the framework of Kreft & Jetz, 2010).

| Sensitivity analysis
We analysed the robustness of the identified regions by randomly removing a percentage of species (random value between 0.01% and 10.00% of the total number of species in the database) and rerunning the whole bioregionalization process. This process was repeated 200 times. Then, for each region, we quantified the percentage of its initial area that was retrieved in each simulation (Appendix S3).
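The resampling procedure can be sketched as follows. Several details are assumptions rather than the authors' implementation (which is described in Appendix S3): the clustering step is passed in as a placeholder function `cluster_fn`, and each reference region is matched to the simulated cluster that recovers the largest share of its area. All names and toy values are hypothetical.

```python
import random

def retrieved_area(reference, simulated, area):
    """Percentage of each reference region's area recovered by its
    best-matching simulated cluster (assumed matching rule)."""
    result = {}
    for region in set(reference.values()):
        basins = [b for b, r in reference.items() if r == region]
        total = sum(area[b] for b in basins)
        overlap = {}
        for b in basins:
            cluster = simulated.get(b)        # None if the basin lost all species
            if cluster is not None:
                overlap[cluster] = overlap.get(cluster, 0.0) + area[b]
        result[region] = 100.0 * max(overlap.values(), default=0.0) / total
    return result

def sensitivity(occurrences, reference, area, cluster_fn, n_runs=200, seed=0):
    """Remove a random 0.01%-10% of species, recluster, and score each run."""
    rng = random.Random(seed)
    species = sorted({s for s, _ in occurrences})
    runs = []
    for _ in range(n_runs):
        pct = rng.uniform(0.01, 10.0)
        dropped = set(rng.sample(species, round(len(species) * pct / 100.0)))
        kept = [(s, b) for s, b in occurrences if s not in dropped]
        runs.append(retrieved_area(reference, cluster_fn(kept), area))
    return runs

# Toy check of the scoring step (invented basins, areas and clusters).
reference = {"b1": "R1", "b2": "R1", "b3": "R2"}
simulated = {"b1": "x", "b2": "y", "b3": "y"}
area = {"b1": 30.0, "b2": 10.0, "b3": 5.0}
scores = retrieved_area(reference, simulated, area)
```

Here region R1 is split between two simulated clusters, so only 75% of its area is retrieved by the best match, whereas R2 is fully retrieved.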

| Transition zones and species shared between regions
We calculated the participation coefficient (Bloomfield et al., 2017; Guimerà & Amaral, 2005) for each node of the biogeographical network. The participation coefficient indicates the degree to which a node is connected to different regions. A high participation coefficient for a given basin indicates that it contains species from different regions and can be regarded as a transition zone between regions. A low participation coefficient indicates that all species in the basin belong to the same region. The participation coefficient of a node is calculated as follows:

$$P_i = 1 - \sum_{s=1}^{N_m} \left( \frac{k_{is}}{k_i} \right)^2$$

where $P_i$ is the participation coefficient of node $i$, $k_{is}$ is the number of links of node $i$ to region $s$, $k_i$ is the total number of links of node $i$ and $N_m$ is the total number of regions. We calculated the participation coefficient at each level of the biogeographical structure identified by Map Equation.
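As a small worked illustration, the participation coefficient of Guimerà and Amaral (2005), that is, one minus the sum over regions of the squared share of a node's links going to each region, can be computed for a basin from the regions assigned to its species. The function name and toy inputs are hypothetical.

```python
def participation_coefficient(link_regions):
    """Participation coefficient of one node.

    `link_regions` lists, for every link of the node, the region of the
    neighbour at the other end (for a basin: the region of each species).
    """
    k_i = len(link_regions)
    counts = {}
    for region in link_regions:
        counts[region] = counts.get(region, 0) + 1
    return 1.0 - sum((k_is / k_i) ** 2 for k_is in counts.values())

# Toy basins (invented region labels).
single_region = participation_coefficient(["Neotropical"] * 3)
even_split = participation_coefficient(["Nearctic", "Nearctic", "Palearctic", "Palearctic"])
mostly_one = participation_coefficient(["Nearctic"] * 3 + ["Neotropical"])
```

A basin whose species all belong to one region scores 0, and a basin split evenly between two regions scores 0.5, which is the maximum attainable when only two regions exist.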
We also summarized the list of species that were shared between the major regions (i.e. excluding tiny clusters) and their distribution characteristics.

| RESULTS
The Map Equation algorithm identified a hierarchy of biogeographical regions with up to six nested levels. For this global-scale study, we investigated the first three levels, termed as supercontinental regions, regions and subregions.

| Supercontinental regions
At the first level, we found that the world of freshwater fishes was divided into two supercontinental regions that we named New World (Americas) and Old World (Eurasia, Africa and Australia; Figure 2a).
Each supercontinental region contained nearly half the world's 11,295 species, virtually 100% of which were endemic (Table 1). Only two species occurred in both supercontinental regions (Figure 3). In each of these two supercontinental regions, 99% of genera and around 80% of families were endemic. At this first level, we also found 14 tiny clusters of 49 basins (exclusively located in the Old World) without endemic families and genera but with endemic species (40 species in total). These tiny clusters were most often composed of species-poor basins located on remote islands (e.g. Madagascar) or in isolated arid areas (e.g. Arabian Peninsula). Therefore, we post hoc assigned these clusters to the Old World supercontinental region (see details in Appendix S5).
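The endemism percentages reported here can be illustrated with a short computation, assuming that a species counts as endemic to a region when all of its occurrences fall in basins of that region. This definition, the function name and the toy data are assumptions made for the sketch.

```python
def endemism(occurrences, region_of_basin):
    """Percentage of endemic species per region.

    A species is counted in every region where it occurs, and as endemic
    when it occurs in exactly one region (assumed definition).
    """
    regions_of_species = {}
    for species, basin in occurrences:
        regions_of_species.setdefault(species, set()).add(region_of_basin[basin])
    totals, endemics = {}, {}
    for regions in regions_of_species.values():
        for region in regions:
            totals[region] = totals.get(region, 0) + 1
            if len(regions) == 1:
                endemics[region] = endemics.get(region, 0) + 1
    return {r: 100.0 * endemics.get(r, 0) / totals[r] for r in totals}

# Toy data (invented): sp2 is shared between the two regions.
region_of_basin = {"a": "R1", "b": "R2", "c": "R2"}
occurrences = [("sp1", "a"), ("sp2", "a"), ("sp2", "b"), ("sp3", "b"), ("sp4", "c")]
rates = endemism(occurrences, region_of_basin)
```

In the toy example one of the two species of R1 and two of the three species of R2 are endemic.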

| Regions
At the second level, we found six major regions spatially nested within the two supercontinental regions (Figure 2b), that we named following Morrone (2015). In the Old World supercontinental region, we found four regions and a minor cluster (Figure 2b). The richest one (Table 1) was the Ethiopian region with nearly 50% of Old World species, covering the entire African continent and including areas north of the Sahara and a few basins in the Arabian Peninsula. The second richest one was the Sino-Oriental region which included south-eastern Asia from India to Borneo, most of China and Mongolia, Korea and Japan. The third one was the Palearctic region with fewer than 10% of Old World species, covering Europe, Central Asia (up to Pakistan and Kazakhstan) and Siberia. The fourth one, the poorest in species, was the Australian region, with only 80 species in total, covering Australia, Tasmania and Papua-New Guinea. Lastly, we identified Madagascan as a distinct minor cluster of the Old World, with 100% of endemic species and genera. Within the New World supercontinental region, we found two major regions. The first one was the Neotropical region, containing 85% of New World species and 42% of the world's known freshwater fish species (Table 1). Most regions had very high degrees of endemism (Table 1b) the Sino-Oriental cluster (Figure 3). We also observed other minor changes, such as some clusters of basins appearing as small distinct regions, for example, in Central Asia.

| Subregions
At the third level, we observed different patterns among regions.
Three regions (Sino-Oriental, Nearctic and Australian) had only two to three main subregions (Appendix S7) that were spatially coherent and had high degrees of endemism (68.7%-91.5% of endemic species, Appendix S7). The other three regions (Ethiopian, Palearctic and Neotropical) were characterized by a high number of subregions, which were also generally spatially coherent (Appendix S7).
The Ethiopian and Neotropical subregions were generally characterized by a high number of species and endemics (Figures S7.9 and S7.14). The Palearctic subregions were characterized by a low number of species and generally low endemicity (Figure S7.11).

| Transition zones and species shared between regions
At the supercontinental region level, we obtained participation coefficients of basins between 0.0 and 0.5, with transition zones (i.e. basins with high participation coefficients) located in north-eastern Siberia as well as in northern North America. At the regional level, we observed participation coefficients also ranging from 0.0 to 0.5 (Figure 4). Unsurprisingly, we found the same transition zones between the Nearctic and the Palearctic as for supercontinental regions. However, the major transition zones were located at the

Overall, the species that were distributed across boundaries could be separated into two broad categories (Appendix S4). First, we found that the majority of shared species had restricted distributions close to regional boundaries, with occasional occurrences beyond. For example, the two-spot livebearer Heterandria bimaculata is endemic to Central America, at the northernmost part of the Neotropical region, and extends into two basins of the Nearctic. Second, we found a limited number of species with large spatial distributions that extended into multiple basins of other regions. The best example is the Northern pike Esox lucius, which is one of the two species distributed across both supercontinental regions. Another example is the Eurasian minnow Phoxinus phoxinus, which is widespread in the Palearctic with multiple occurrences in the Sino-Oriental region.
On the other hand, we found almost no transition zones at the boundaries of the Ethiopian or the Australian regions. For the Ethiopian region, only a few basins in northern Africa and the Middle East shared species between the Ethiopian and Palearctic regions.

FIGURE 3 Global biogeographical network of freshwater fishes. In this network, both species and drainage basins are represented as nodes. When a species is known to occur in a drainage basin, a link between the species and the basin is drawn. The network is very complex because of the high number of nodes (13,876 nodes corresponding to 11,295 species and 2,581 basins) and links (59,373 occurrences). We spatialized the network in Gephi with the ForceAtlas2 algorithm in order to group nodes that are strongly interconnected (i.e. basins that share species in common) and spread away from all other nodes that are not interconnected (i.e. basins that have few or no species in common).

| DISCUSSION
Here, we provide the first global bioregionalization of freshwater fishes based on a quantitative analysis of species distributions. We found that the freshwater fish world is first divided into two supercontinental regions.

Our results clearly contradict our initial hypothesis that freshwater fish may have biogeographical regions different from the terrestrial vertebrate scheme of Sclater-Wallace because of their restricted dispersal abilities and the specific spatial-temporal dynamics of riverscapes. Indeed, we found a total of six major biogeographical regions that were similar in size and location to the six biogeographical regions identified by Wallace (1876). The only differences were the locations of several boundaries. More surprisingly, our regions were also similar to the coarse biogeographical regions identified by Matthews (1998), which were based only on freshwater fish families and also included diadromous fish species. Our Neotropical region is the only major difference from results obtained by Matthews (1998), who identified a distinct cluster south of the Andes. This difference may be explained by the fact that this area was mostly colonized by the family Galaxiidae, which migrates between freshwater and oceans and consequently was not included in our analysis.
In addition, we also identified transition zones broadly following the bioregionalization model of Morrone (2015), suggesting that freshwater fish regions were shaped by vicariance and geodispersal events similar to other groups. However, we observed extremely high rates of endemism for each region; in each region, more than 96% of species were endemic except the Palearctic (89%). These endemism rates far exceed the endemism rates for other continental vertebrates such as birds (11%-84% endemism in major regions, see Appendix S8), mammals (31%-90%), herptiles (amphibians and reptiles: 46%-95%; amphibians only: 66%-98%), as calculated for regions of Procheş and Ramdhani (2012). These rates also exceed rates reported in marine biogeographical realms (17%-84%, Costello et al., 2017). Therefore, we conclude here that freshwater fishes are likely to have among the highest rates of species endemism for major biogeographical regions.
The two major supercontinental regions we identified might seem to contradict Morrone's biogeographical kingdoms (2015). Indeed, Morrone (2015) hypothesized that three major kingdoms could be derived from formerly disconnected land masses (i.e. Holarctic, Holotropical and Austral). This apparent contradiction is explained by our analyses at the species level which was not designed to reflect ancient biogeographical kingdoms resulting from the Gondwana split.
Rather, the two supercontinental regions described here suggest that of the world's 11,295 species can be found across supercontinental or regional boundaries. Therefore, 99.24% of the world's freshwater fishes occur in a single supercontinental region and a single region, which attests to the biological reality of our delineated clusters.

| Regions and transition zones
The Neotropical and Nearctic regions of freshwater fishes are very similar to the Neotropical and Nearctic regions highlighted for terrestrial mammals, amphibians and bird species (Holt et al., 2013;Kreft & Jetz, 2010;Procheş & Ramdhani, 2012). These two regions are, therefore, in agreement with the synthetic biogeographical regionalization of Morrone (2015). As for other groups of organisms, we identified a Mexican transition zone between Nearctic and Neotropical regions, suggesting that these organisms were affected by biotic interchange between the Americas, but to a much lesser extent than terrestrial vertebrates (Bussing, 1985;Smith & Bermingham, 2005). Indeed, we found that only 25 species with restricted distributions were shared between Nearctic and Neotropical regions, as illustrated by the restricted area of the transition zone (Figure 4). These species belong to both primary and secondary freshwater families, which colonized Mesoamerica separately, as suggested by molecular analyses (Smith & Bermingham, 2005). Secondary freshwater fishes probably dispersed through Mesoamerica before the formation of the Panama isthmus, during periods of high runoff leading to temporary freshwater or brackishwater bridges in marine waters coupled with northward discharge of the proto-Amazon during the Miocene ~18-15 Ma (Hoorn et al., 2010;Smith & Bermingham, 2005). Later, primary freshwater families were dispersed through Mesoamerica during the Isthmus formation via landscape diffusion (Smith & Bermingham, 2005).
Subsequent changes in the landscape led to increasing isolation of basins within Mesoamerica as well as occasional connectivity events, which in turn shaped the regional patterns of dispersal and diversification (Dias et al., 2014;Smith & Bermingham, 2005) that probably drove the restricted extent of the Mexican transition zone for freshwater fishes.
The Ethiopian region of freshwater fish resembles the Ethiopian regions of terrestrial vertebrates except for its northern limit. We found that this region expanded beyond the limits of the Sahara up to the Mediterranean Sea, similarly to flightless terrestrial mammals (Kreft & Jetz, 2010). However, flightless terrestrial mammals were also found to expand their Ethiopian boundary beyond the Arabian Peninsula into Central Asia, which was not the case here.
Moreover, most studies for terrestrial vertebrates located the northern boundary of the Ethiopian region south of the Sahara (Holt et al., 2013;Procheş & Ramdhani, 2012;Rueda et al., 2013;Vilhena & Antonelli, 2015). This discrepancy may be explained by several factors. First, most studies on vertebrates identified the Sahara and the northern coast of Africa as transition zones between the Palearctic and Ethiopian regions (Holt et al., 2013;Morrone, 2015).
This transition zone was not identified in our analyses: Palearctic fishes only anecdotally crossed the Mediterranean Sea, while their African counterparts merely ventured into the Middle East.
Therefore, we can hypothesize that the Mediterranean Sea was an insurmountable barrier for freshwater fishes, contrary to terrestrial vertebrates that could disperse across Straits of Gibraltar.
Dispersal could have occurred during the Messinian salinity crisis about 6 Ma because of the desiccation of the Mediterranean Sea, according to the Lago Mare hypothesis (Bianco, 1990). This hypothesis states that, during the refilling of the Mediterranean Sea, a freshwater or brackish phase occurred which would have permitted large-scale dispersal of freshwater fishes across the Mediterranean basin.
We can speculate that this characteristic Sino-Oriental region for fishes arose from several factors. Firstly, the entire region has been prone to fish speciation since the Eocene (55 Ma), probably because of the very high diversity of aquatic habitats combined with the repeated rearrangement of rivers through capture and glacial melting (Dias et al., 2014; Kang et al., 2014; Kang, Huang, & Wu, 2017; Xing, Zhang, Fan, & Zhao, 2016). For example, the Cyprinidae family, which accounts for 45% of Sino-Oriental species, originated from the Indo-Malaysian tropical region and has likely radiated into Asia since the Eocene (Gaubert et al., 2009). Secondly, the northern boundary is located farther north than for other groups, suggesting that mountain barriers were more important in defining boundaries for fishes than for other groups, whose boundaries appeared to be rather defined by a combination of tectonics and climate (Ficetola et al., 2017). Consequently, the fish transition zone between Sino-Oriental and Palearctic is not located near areas of recent tectonic merging as reported for other groups (Morrone, 2015). In addition, this transition zone is asymmetrically distributed towards the Palearctic (Figure 4), possibly exceeding the asymmetry reported for other groups (Sanmartín, Enghoff, & Ronquist, 2001), probably because of the extreme differences in fish richness between both regions. Given the mountainous nature of the boundary, dispersal pathways probably emerged at river confluences when the sea level dropped (Dias et al., 2014), both at the north-eastern and south-western parts of the Sino-Oriental boundaries (Gaubert et al., 2009).
The Australian region is the most depauperate of all fish regions, probably owing to the combination of the complete isolation of this region for the last 60 million years (Scotese, 2016)

| Subregions
At subregional and finer levels, we found multiple clusters with varying degrees of endemism, with a substantial number of species distributed between clusters (see Figure 3). While it indicates that strictly freshwater fish species display strong endemism patterns at subregional spatial scales, it also suggests a reticulated history of river basins. The differences in number and endemism of subregions among major regions may be explained by the combination of habitat size and diversity, past climate change and paleoconnectivity during the Last Glacial Maximum (LGM, see Dias et al., 2014; Leprieur et al., 2011; Tedesco et al., 2012). Past climate change had an enormous impact on high-latitude regions, such as the Nearctic and northern parts of the Palearctic. Most of these areas were covered by ice sheets during the LGM. These northern areas were colonized by species after the LGM (e.g. Rempel & Smith, 1998) from refuges located in the southernmost parts of these regions (Mississippi basin for the Nearctic, Danube basin for the Palearctic). Consequently, the relatively recent recolonization explains these large species-poor subregions.
On the other hand, such climate events were less extreme in tropical regions thereby allowing lineages to thrive for a long period (Tedesco, Oberdorff, Lasso, Zapata, & Hugueny, 2005). The combination of this prosperity with the high diversity and size of habitats in tropical regions, the long-term isolation of drainage basins during the Pleistocene (Dias et al., 2014) as well as stable climatic history and favourable climatic conditions (Wright, Ross, Keeling, McBride, & Gillman, 2011) probably generated conditions favourable to divergence and radiation processes in tropical subregions. This last hypothesis may explain the numerous tropical subregions with high diversity and endemism we found. The only apparent contradiction could arise from the two large subregions with high endemism of the Sino-Oriental. However, these two subregions probably reflect the uplift of the Tibetan plateau that led to their isolation (Kang et al., 2014). In turn, these two subregions included numerous smaller ones with high diversity and endemism (see Figure S7.15) similar to the other tropical regions.

| Robustness of findings
This first global quantitative analysis of the biogeography of freshwater fishes is based on a large-scale database compiling occurrence data from thousands of sources and is thus inevitably subject to errors and incomplete data (Tedesco et al., 2017).
To minimize errors, a careful screening and correction procedure has been implemented for this database (see Tedesco et al., 2017).
Reassuringly, the results obtained from other clustering methods as well as our sensitivity analysis suggested that the regions we identified are robust. Furthermore, regions were all spatially coherent (even though no spatial information was provided at any stage of the process) for the first two levels, with high degrees of endemism, indicating the quality of both the dataset and the bioregionalization approach.
The network method assigns clusters to species (as explained in the methods), which is an advantage over distance-based clustering.
However, one major caveat needs to be acknowledged. Species are assigned to the region where their present-day distribution is largest; this region is not necessarily the region where they originated. A clear example is the Characidae family at the transition between the Nearctic and Neotropical regions: Astyanax mexicanus was assigned to the Nearctic, although its lineage is assumed to have colonized Mesoamerica during the Panama Isthmus formation (Smith & Bermingham, 2005).
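The caveat above can be illustrated with a minimal sketch. It is not the Map Equation itself: it assumes, purely for illustration, that "largest present-day distribution" can be approximated by the number of occupied basins per region. The function name and the toy data are hypothetical.

```python
from collections import Counter

def assign_species_to_region(species_basins, basin_region):
    """Return {species: region}, picking the region that holds most of its basins.

    Assumption: distribution size is approximated by basin counts, which is
    a simplification and not the clustering procedure used in the study.
    """
    assignment = {}
    for species, basins in species_basins.items():
        counts = Counter(basin_region[b] for b in basins)
        # most_common(1) yields the region with the highest basin count;
        # ties are resolved in first-encountered order.
        assignment[species] = counts.most_common(1)[0][0]
    return assignment

# Hypothetical toy data: sp_a occupies two Nearctic basins and one
# Neotropical basin, so it is assigned to the Nearctic regardless of
# where its lineage originated.
basin_region = {"b1": "Nearctic", "b2": "Nearctic", "b3": "Neotropical"}
species_basins = {"sp_a": ["b1", "b2", "b3"], "sp_b": ["b3"]}
assignment = assign_species_to_region(species_basins, basin_region)
```

The sketch makes the limitation explicit: the assignment uses only present-day occurrences and carries no information on lineage history.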
Our results at subregional scales have several limitations. First, while species introductions are relatively well documented between major continents or regions, we can expect that some human-assisted translocations of species among basins have not been documented at smaller spatial scales, thereby blurring subregional patterns of endemism. Second, heterogeneity in land topology led to vast differences in size and number of drainage basins within different geographical areas of the world. Furthermore, drainage basins (especially large ones) may have a reticulated history, challenging their validity as biogeographical units at subregional levels (e.g. see Dagosta & Pinna, 2017 and references therein). Third, for areas with numerous small basins, species lists were not necessarily available for all of them, and thus identification of provinces beyond the subregional scale would be speculative. Likewise, fine-scale data were not available in similar quantity or quality within different regions of the world (e.g. remote areas of Africa or Papuasia remain poorly sampled compared to other regions). Finally, the Map Equation is expected to identify transition zones as distinct clusters (Bloomfield et al., 2017; Vilhena & Antonelli, 2015). We did not observe this pattern at large scales, except for a few basins at the transitions between Nearctic and Neotropical regions or between Sino-Oriental and Palearctic or Australian regions. However, at subregional and finer scales, transitions are likely to stand out as separate zones (Bloomfield et al., 2017), which is not necessarily a desirable property, since the participation coefficient is informative enough to describe transition zones. For all these reasons, we deemed it preferable not to investigate our results below the third level, as such fine-scale provinces would be better studied in regional studies (e.g. Kang et al., 2014; Smith & Bermingham, 2005).
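To make the role of the participation coefficient concrete, the following minimal sketch computes it for a single basin. It assumes the usual network-science definition, P = 1 - Σ_r (n_r / n)², where n_r is the number of the basin's species assigned to region r and n is the basin's total species count. Whether this matches the exact formulation used in the methods is an assumption, and the function name and data are hypothetical.

```python
from collections import Counter

def participation_coefficient(species_in_basin, species_region):
    """Participation coefficient of one basin.

    P = 1 - sum_r (n_r / n)^2, where n_r counts the basin's species that
    are assigned to region r. P is 0 when all species belong to a single
    region and increases as species are spread across more regions.
    """
    counts = Counter(species_region[s] for s in species_in_basin)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # convention chosen for this sketch: empty basin
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical toy data
species_region = {"a": "Nearctic", "b": "Nearctic",
                  "c": "Neotropical", "d": "Neotropical"}
core_basin = participation_coefficient(["a", "b"], species_region)
transition_basin = participation_coefficient(["a", "b", "c", "d"], species_region)
```

In this toy case the basin holding only Nearctic species scores 0, whereas the basin shared equally between two regions scores 0.5, which is how a transition zone can be described without treating it as a separate cluster.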

| Concluding remarks
This first quantitative study of freshwater fish bioregions revealed that their biogeography was probably shaped by the same major drivers as other continental groups of organisms, with peculiar exceptions such as the Sino-Oriental region. These regions, identified from species distributions, probably reflect relatively recent processes of dispersal and isolation. Ancient processes will be explored in future studies, thanks to the newly available dated phylogenies of actinopterygian fishes (Rabosky et al., 2018).
We found that freshwater fishes, in addition to being the most diverse group of continental vertebrates, have extremely high rates of endemism, above 96% for all regions except the Palearctic.
Furthermore, we found that tropical regions have a myriad of subregions with very high endemism and richness. These figures make a compelling case that freshwater fishes ought to be considered in hotspot analyses, and they raise many questions about the biogeographical consequences of the current high rates of freshwater fish introductions and extirpations (Villéger, Blanchet, Beauchard, Oberdorff, & Brosse, 2015).
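As an illustration of how endemism rates such as those quoted above can be derived, the sketch below assumes that a region's endemism rate is the percentage of its species that occur in no other region. This definition, the function name and the toy data are assumptions made for the example.

```python
def endemism_rates(species_regions):
    """Return {region: percentage of its species found in that region only}.

    species_regions maps each species to the set of regions where it occurs.
    """
    richness = {}   # number of species occurring in each region
    endemics = {}   # number of species restricted to each region
    for regions in species_regions.values():
        for region in regions:
            richness[region] = richness.get(region, 0) + 1
        if len(regions) == 1:
            (only_region,) = regions
            endemics[only_region] = endemics.get(only_region, 0) + 1
    return {r: 100.0 * endemics.get(r, 0) / richness[r] for r in richness}

# Hypothetical toy data: one species is shared between regions A and B.
species_regions = {
    "sp1": {"A"},
    "sp2": {"A"},
    "sp3": {"A", "B"},
    "sp4": {"B"},
}
rates = endemism_rates(species_regions)
```

With these toy data, region A has two endemics out of three species and region B has one out of two.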

ACKNOWLEDGEMENTS
We thank Céline Bellard, Marine Robuchon and Philippe Keith for useful discussions, and Aldyth Nyth and Lissette Victorero for English editing. We thank Syd Ramdhani, Serban Procheş, Ben Holt and Jean-Philippe Lessard for sharing their data for endemism calculations. We thank François-Henri Dupuich from derniercri.io for

DATA AVAILABILITY STATEMENT
The database used in this publication is available in Appendix S1.