Spatial Congruence Analysis (SCAN): An objective method for detecting biogeographical patterns based on species’ range congruences

Similar species ranges may represent outcomes of common biological processes and so form the basis for biogeographical concepts such as areas of endemism and ecoregions. Nevertheless, spatial range congruence is rarely quantified, much less incorporated in bioregionalization methods as an explicit parameter. Furthermore, most available methods suffer from limitations related to the loss or excess of range information, scale bias associated with the use of grids, and the inability to recognize independent overlapping patterns or gradients of range distributions. Here, we propose an analytical method, Spatial Congruence Analysis (SCAN), to identify biogeographically meaningful groups of species, called biogeographic elements. Such elements are based on direct and indirect spatial relationships among species’ ranges and vary depending on an explicit measure of range congruence controlled as a numerical parameter in the analysis. A one-layered network connects species (vertices) using pairwise spatial congruence estimates (edges). This network is then analyzed for each species, separately, by an algorithm that accesses the entire web of spatial relationships to the reference species. The method was applied to two datasets: a simulated gradient of ranges and real distributions of birds. The gradient results showed that SCAN can describe gradients of distribution with a high level of detail, without confounding transition zones with true biogeographical units, a frequent pitfall of other methods. The bird dataset showed that only a small portion of range overlaps is biogeographically meaningful, and that there is a large variation in the types of patterns that can be found with real distributions. Distinct reference species may converge on similar or identical groups of spatially related species, may lead to recognition of nested species groups, or may even generate similar spatial patterns with no species in common. The biological significance or causal processes of these patterns should be investigated a posteriori. Patterns can vary from simple ones, composed of a few highly congruent species, to complex ones, with numerous alternative component species and spatial configurations, depending on particular parameter settings as determined by the investigator. This approach eliminates or reduces limitations of other methods and permits pattern description without hidden assumptions about processes, and so should make a valuable contribution to the biogeographer’s toolbox.
“If there is any basic unit of biogeography, it is the geographic range of a species.” - Brown, Stevens & Kaufman [1].
“[spatial] congruence […] should be optimized, while realizing that this criterion will most likely never be fully met” - HP Linder [2].

Determining spatial congruence is not a trivial problem because ranges may differ in position, area, and shape. Two ranges of equal area and shape may vary in the amount of overlap, just as two ranges of equal shape and central position may differ in size, and yet areas of the same size and position may overlap only slightly if their shapes are very different (Fig 1A). To distill these differences into a single index, spatial congruence between two species can be calculated as the product of the area of overlap weighted by the relative area of each range (Eq 1). This generic spatial index was proposed in the "Goodness of Fit" method of Hargrove et al. [32] for comparing maps and is hereafter referred to as the "Spatial Congruence Index", C_S.
Spatial congruences, here, are always evaluated with respect to a specific congruence threshold (C_T). Two species are directly congruent when their calculated C_S is greater than or equal to a particular C_T (Fig 1B). Indirectly related species are those linked through a chain of direct connections using a third linking species. For example, if species ranges A and B are directly congruent at a given C_T, and B and C are also directly congruent, then A and C are indirectly congruent at this C_T, using B as a link (A↔B↔C). These direct and indirect relationships are recovered and organized by the algorithm at every referential C_T value. Thus, the method begins by setting the current C_T and comparing the spatial distribution of a reference species to the ranges of all other species. At each C_T analyzed, any species with C_S ≥ C_T is directly congruent. The next pass involves comparing each of these directly congruent species to all other species. Any additional species added in this second pass (and all subsequent passes) are indirectly congruent to the reference species at the same C_T. The number of passes required for the species group to close (i.e., no additional congruent species are included) is equal to the number of links in the longest indirect chain and is called the "depth" of that species group (Fig 1B-C). Depth is a metric that emerges from the analyses; its exact biological significance and computational utility are not yet clear and may be the subject of future study.
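The pass-by-pass procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in S1 Script). It assumes C_S is read as (overlap / area of range 1) × (overlap / area of range 2), following the verbal definition above. Ranges are simplified to 1-D intervals of positive length so that no GIS library is needed. The function names `interval_overlap`, `congruence`, and `scan_element` are hypothetical, and the "spatial overlap criterion" mentioned in the Results is not implemented.

```python
def interval_overlap(a, b):
    """Length shared by two 1-D intervals given as (low, high) tuples."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def congruence(a, b):
    """Assumed reading of C_S: overlap weighted by the relative size of each range."""
    overlap = interval_overlap(a, b)
    return (overlap / (a[1] - a[0])) * (overlap / (b[1] - b[0]))


def scan_element(ranges, reference, c_t, max_depth=None):
    """Expand the group around `reference` at threshold c_t, one pass at a time.

    Returns (joined, depth): `joined` maps each species to the pass in which
    it entered the group (0 for the reference); `depth` is the number of
    passes that added at least one species.
    """
    joined = {reference: 0}
    frontier = [reference]
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = []
        for sp in frontier:
            for other in ranges:
                if other in joined:
                    continue
                if congruence(ranges[sp], ranges[other]) >= c_t:
                    joined[other] = depth + 1
                    next_frontier.append(other)
        if not next_frontier:
            break  # the group has closed: no additional congruent species
        depth += 1
        frontier = next_frontier
    return joined, depth


# Toy ranges (hypothetical): A, B, and C form a gradient; D is far away.
ranges = {"A": (0, 10), "B": (1, 11), "C": (2, 12), "D": (30, 40)}

# A-B and B-C have C_S of about 0.81, A-C about 0.64, so at C_T = 0.8
# C is reached from A only indirectly, through B.
joined, depth = scan_element(ranges, "A", c_t=0.8)
print(joined, depth)  # {'A': 0, 'B': 1, 'C': 2} 2
```

With `max_depth=1` the same call stops after the first pass and returns only the directly congruent species A and B, which mirrors the role of the maximum depth setting discussed in the text. With real polygons, the two helper functions would be replaced by area and intersection-area calculations.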
Including indirect congruences in the recognition of these biogeographic elements has two distinct advantages. First, it allows the recognition of syndromes of shared distributions (Fig 1C). The set of species in a closed group, at any given C_T, will be very similar to that formed by starting the analysis with other members of the group. Thus, this is a "natural" grouping. Second, indirect congruences allow the recognition of gradual relationships among species ranges (Fig 1C). Although it is conceivable that virtually all species be related to one another through such gradual indirect congruence, this is controlled by congruence threshold requirements. In practice, groups of species sharing biogeographical properties may close even at fairly low C_T (see Results).
(A) The C_S index (y-axis) compares position, extent, and shape between two species ranges. For […]
SCAN differentiates those reference species forming biogeographic elements, "informative species", from those not leading to closed groups at any congruence threshold, "non-informative species". For any informative reference species, the set of all biogeographic elements over the range of congruence thresholds constitutes a "biogeographic complex". "Synonymous" elements or complexes (or simply "synonyms") are those derived from distinct reference species that converge on the same group of taxa. Nested patterns are those composed of subsets of larger patterns. Parameters such as maximum and minimum C_T, depth, and derived metrics, such as the number of species, can be used to identify synonyms and compare elements or complexes derived from distinct species or regions (S1 and S2 Tables). Although synonyms and nested complexes overlap by definition, partial spatial overlaps may also occur between independent elements with no species in common (see Results). Synonymous complexes are necessarily generated by different reference taxa; here, they are represented by and named after the reference taxon with the highest mean congruence (C_S) with all other species in the complex (S3 Table).
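The bookkeeping of synonyms, nested patterns, and representative taxa can be illustrated as follows. This is a hedged sketch, not the procedure used to build S1-S3 Tables. It assumes each element is stored as a set of member species keyed by its reference species, and that pairwise C_S values are available in a lookup table. The names `classify_elements` and `representative` and the toy values are hypothetical.

```python
def classify_elements(elements):
    """Find synonyms and nested patterns among elements.

    `elements` maps each reference species to the set of species in its element.
    Returns (synonyms, nested): `synonyms` lists groups of reference species
    that converge on the same members; `nested` lists (inner, outer) pairs of
    reference species where the inner element is a strict subset of the outer.
    """
    by_members = {}
    for ref, members in elements.items():
        by_members.setdefault(frozenset(members), []).append(ref)
    synonyms = [sorted(refs) for refs in by_members.values() if len(refs) > 1]
    nested = sorted(
        (a, b)
        for a, members_a in elements.items()
        for b, members_b in elements.items()
        if set(members_a) < set(members_b)
    )
    return synonyms, nested


def representative(refs, members, cs):
    """Pick the reference species with the highest mean C_S to the other members.

    `cs` maps frozenset({x, y}) to the pairwise C_S; missing pairs count as 0.
    """
    def mean_cs(ref):
        others = [m for m in members if m != ref]
        if not others:
            return 0.0
        return sum(cs.get(frozenset((ref, m)), 0.0) for m in others) / len(others)

    return max(refs, key=mean_cs)


# Toy example (hypothetical values).
elements = {"A": {"A", "B", "C"}, "B": {"A", "B", "C"}, "C": {"B", "C"}}
cs = {frozenset("AB"): 0.81, frozenset("BC"): 0.81, frozenset("AC"): 0.64}

synonyms, nested = classify_elements(elements)
print(synonyms)  # [['A', 'B']]
print(nested)    # [('C', 'A'), ('C', 'B')]
print(representative(["A", "B"], {"A", "B", "C"}, cs))  # B
```

In this toy case the elements derived from A and B are synonyms, the element derived from C is nested within both, and B would name the synonymous group because its mean C_S with the other members (0.81) exceeds that of A (0.725).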
Applications of the method
To explore and illustrate this protocol we use two datasets, one hypothetical and the other of real bird species distributions. First, a hypothetical set of species ranges was used to evaluate the capacity of SCAN to detect gradients. In addition, the effects of maximum depth settings on pattern detection were explored (see S1 Script). The problem, proposed by Kreft and Jetz [34], arises when two seemingly distinct sets of species (for example, a northern and a southern group) each contain a succession of species with ranges each slightly greater than the previous, such that those with the most extensive ranges in each group actually partially overlap the widest-ranging species of the other group (Fig 2). Any biogeographical method would be seen to fail if it did not recover the two groups as separate congruent units or, worse, if it identified the transition zone between them as an independent biogeographical unit, which is a false conclusion from a biogeographical perspective.
[…] (Fig 3). SCAN always correctly recognized north and south as distinct groups. The number of informative species varied according to max-depth settings (S2 Table). At the shallowest depth limit (maximum depth=3), approximately half of the community gave rise to biogeographic elements. At the extremely relaxed max-depth=10, all species expanded their indirect congruences to encompass either the entire southern or northern groups, but never included species from the opposite group if the "spatial overlap criterion" (see Methods) was implemented. Intermediate scenarios with alternative and plausible classification schemes were achieved at depth limits of 5 and 7, but always nested within the overall north-south distinction (Fig 3). These alternative configurations are ideal to show how C_T relaxation allows the incorporation of slightly less congruent species at each lower C_T round (direct) or extended depth (indirect).
[…] (Fig 4), and a genus of hummingbirds (the "brilliants", Heliodoxa spp.), with a large variety of habitat associations and peculiar ranges (Fig 5), […]
Reference species with larger areas had higher C_S means (i.e., the average C_S between one species and all its overlapping ranges). Range areas and C_S means had no effect on the number of species (richness) in patterns detected, but relaxed congruences and depths led to patterns with greater species richness (S3 Fig). The full set of graphic preliminary explorations is presented as Supplementary Material (S2 and S3 Figs).
Biogeographical complexes derived from this real-life bird community were highly diverse in many aspects (S1 Table; Figs 4 and 5). The simplest patterns had only one element with a fixed group of species, usually in a very limited range of threshold values, as shown by the lowland hummingbird Heliodoxa aurescens (Fig 5A). Alternatively, "shallow" patterns added species across a range of congruence thresholds but showed no indirect connections, grouping species only at the first depth level (Figs 4B and 5D-F). The depth 'dimension', as predicted, revealed gradients through indirectly related ranges. This potential was elegantly illustrated by the Icterus nigrogularis complex, which gradually, range by range, extended its total area through the Guianan coast and subsequently reached the central Amazon via the Amazon river channel (Fig 4A).
SCAN can recognize more than one biogeographic element centered on essentially the same general area, but with considerably different spatial limits and containing mutually exclusive sets of species, as demonstrated by I. nigrogularis and A. tobaci (Fig 4). Similarly, the H. xanthogonys pattern grouped highland birds of the Tepuis region that are completely overlapped (at this geographic macro-scale) with birds typically associated with lowland patterns (Fig 5C).
[…] 6C and S4 Fig G-I).
The characteristic relational web of ranges based on grid-cells presented by Infomap resembles, and is probably analogous to, the biogeographic elements identified by SCAN (Fig 6E). Specialized birds clearly belonging to distinct biogeographic niches, such as highland, riverine specialists, or lowland terra firme birds, are mixed as indicative species in these cells, while adjacent ones often give very distinct results (Fig 6D and S5 Fig). Unfortunately, these cell-based patterns are totally idiosyncratic, i.e., each cell has its unique pattern, which hampers the direct interpretation of these relationships in biological or geographical terms (Fig 6E and S5 Fig).
[…] This classification matches previous species turnover assessments [38]. D) Infomap uses patterns […]
Adding or removing species will readily influence detection of connections or breaks in congruence through indirect links. Thus, it is important that the species pool analyzed be made explicit.
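The sensitivity to the species pool can be made concrete with a small example. This is an illustrative sketch under assumptions: pairwise C_S values are taken as already computed, the values are hypothetical, and the helper name `reachable` is not part of SCAN. It shows only that dropping a single linking species can break an indirect connection.

```python
def reachable(cs, pool, reference, c_t):
    """Species in `pool` linked to `reference` by chains of pairs with C_S >= c_t.

    `cs` maps frozenset({x, y}) to the pairwise C_S; missing pairs count as 0.
    """
    group = {reference}
    stack = [reference]
    while stack:
        sp = stack.pop()
        for other in pool:
            if other not in group and cs.get(frozenset((sp, other)), 0.0) >= c_t:
                group.add(other)
                stack.append(other)
    return group


# Hypothetical pairwise C_S values: B links A and C at C_T = 0.8.
cs = {frozenset("AB"): 0.81, frozenset("BC"): 0.81, frozenset("AC"): 0.64}

full = reachable(cs, {"A", "B", "C"}, "A", 0.8)
reduced = reachable(cs, {"A", "C"}, "A", 0.8)
print(sorted(full))     # ['A', 'B', 'C']
print(sorted(reduced))  # ['A']
```

With B in the pool, C joins the group of A through the chain A↔B↔C. Without B, A and C are no longer connected at this threshold, because their direct C_S (0.64) is below 0.8.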
Perhaps the most important take-home message of this paper is a cautionary 'step-back' regarding the generic use of bioregionalization methods for spatial classification, recognition of biogeographical patterns, and assessment of their historical and ecological drivers. In some cases, spatialized biological information may be discarded. Conversely, an excess of idiosyncratic range information may bias the classification schemes of standard methods.
[…] (Fig 3). For larger ones, the amount of complementary information may constitute a real challenge (e.g., Figs 4 and 5). SCAN asks the biogeographer to interpret individual complexes as a suite of plausible biogeographical scenarios. Highly congruent elements highlight potential barriers or sharp ecological limits. Relaxed patterns may reveal taxonomic or environmental correlates of selective permeability of barriers, and potential dispersal routes. Trade-offs and criteria of spatial coherence and comprehensiveness may also be complemented by biological or environmental information to support both spatial