ABSTRACT
Genetic diversity is a key component of population persistence. However, most genetic investigations of natural populations focus on a single species, overlooking opportunities for multispecies conservation plans that benefit entire communities within an ecosystem. We developed a framework to evaluate genomic diversity within and among many species and demonstrate how this riverscape community genomics approach can identify common drivers of genetic structure. Our study evaluated genomic diversity in 31 co-distributed native stream fishes sampled from 75 sites across the White River Basin (Ozark Mountains, USA) using SNP genotyping (ddRAD). Despite variation in the magnitude of genetic divergence among species, general spatial patterns emerged that corresponded to river network architecture. Most species (N=24) were partitioned into discrete sub-populations (K=2–7). We used partial redundancy analysis to compare species-specific genomic diversity across four models of genetic structure: isolation by distance (IBD), isolation by barrier (IBB), isolation by stream hierarchy (IBH), and isolation by environment (IBE). A significant proportion of intraspecific genetic variation was explained by IBH (x̄ = 62%), with the remaining models largely redundant. Our results indicate that gene flow is higher within than between hierarchical units (i.e., catchments, watersheds, basins), supporting the Stream Hierarchy Model and its generality. We discuss the implications for conservation and management and identify the 8-digit Hydrologic Unit (HUC) as the most relevant spatial scale for managing genetic diversity across riverine networks.
1 INTRODUCTION
Genetic diversity is a quantitative metric that can be applied across spatial and temporal scales (Huber et al., 2010; Leonard et al., 2017) and is tied to the evolutionary trajectories of species (Shelley et al., 2021). It also serves as a barometer of population-level persistence, accurately reflecting demographic history, connectivity, and adaptive potential (Davis et al., 2018; DeWoody et al., 2021; Paz-Vinas et al., 2018). Despite these strengths, genetic diversity is often underutilized in conservation planning (Laikre, 2010; Paz-Vinas et al., 2018), in part because it requires specialized equipment, technical expertise, and external infrastructure such as genomics centers, all of which increase its cost (Blanchet et al., 2020). Moreover, when assessments do occur, they are most often limited to populations of a single species or a small cadre of taxa within a species group, which limits the potential for much-needed generalizations (Anthonysamy et al., 2018).
When genetic diversity is compared across co-distributed species, it provides a solid framework from which community-wide management and policy can be defined. For example, multispecies assessments can reveal common dispersal barriers (Pilger et al., 2017; Roberts et al., 2013), congruent distributions of genetic diversity (Hotaling et al., 2019; Ruzich et al., 2019), relevant spatial scales for management (Blanchet et al., 2020), and associations between species characteristics and genetic diversity (Bohonak, 1999; Pearson et al., 2014). Despite the added complexity, such assessments can yield a comprehensive management strategy, one aligned toward managing numerous species, with long-term conservation goals that benefit an entire community (Blanchet et al., 2017). A plan that represents a consensus across multiple species and ecosystems may also more readily garner stakeholder support (Douglas et al., 2020).
The spatial structure of genetic variation within a species is primarily dictated by gene flow and genetic drift (Holderegger et al., 2006), with a uniform distribution (i.e., panmixia; Rosenberg et al., 2005) serving as an implicit null hypothesis. The de facto alternative is that genetic variation is spatially autocorrelated (i.e., isolation by distance, IBD; Wright, 1943). For most species, a significant relationship between genetic dissimilarity and geographic distance is expected (Meirmans, 2012), yet the strength of this association varies (Bohonak, 1999; Singhal et al., 2018). Beyond distance, genetic divergence may also be promoted by environmental dissimilarity (i.e., isolation by environment, IBE; Wang & Bradburd, 2014) or by physical barriers to dispersal (i.e., isolation by barrier, IBB; Cushman et al., 2006; Ruiz-Gonzalez et al., 2015).
For aquatic biodiversity, patterns of genetic divergence are also governed by the structure and architecture of the riverine network, in both its contemporary and historic configurations. Organisms within such dendritic networks are demonstrably affected by the physical structure of the habitat (Peterson et al., 2013; White et al., 2020), with genetic relatedness serving as a surrogate for the underlying structural hierarchy (Hughes et al., 2009). While this is most apparent within the contemporary structure of river networks, their historic structure (i.e., paleohydrology) also leaves a lasting imprint on genetic diversity (Mayden, 1988; Strange & Burr, 1997). Moreover, the hierarchical complexity of these networks likewise dictates population processes, as reflected in genetic diversity and divergence (Chiu et al., 2020; Hopken et al., 2013; Thomaz et al., 2016). Thus, spatial genetic structure in riverine taxa should reflect isolation by stream hierarchy (IBH), as predicted by the Stream Hierarchy Model (SHM; Meffe & Vrijenhoek, 1988). The SHM was originally formulated narrowly for desert stream fishes of the American West (Meffe & Vrijenhoek, 1988), so an assessment of its generality relative to alternative isolating regimes was imperative (Brauer et al., 2018; Hopken et al., 2013).
The factors that cause genetic structure can be confounded on the one hand (Perez et al., 2018) and correlated on the other (Meirmans, 2012; Wang & Bradburd, 2014). In single-species assessments, ancillary mechanisms can mask the signature of major drivers. The result can be erroneous conclusions, which in turn beget ineffective management strategies. These issues can be mitigated by replicated multispecies assessments, which allow major processes to surface and separate ‘signal’ from ‘noise’, with the former driving patterns of regional biodiversity (Roberts et al., 2013).
Hypotheses about genetic structure are best contrasted by partitioning genetic variation via partial redundancy analysis (Borcard et al., 1992; Chan & Brown, 2020), which allows multiple alternative models to be compared simultaneously. The best-performing model may be substantially correlated with other (more redundant) models, but it should also best explain the residual variation that remains once competing models have been accounted for (Cushman et al., 2006). If alternative models explain significant amounts of genetic variation, the null hypothesis of panmixia is rejected. The main drivers of genetic diversity should then emerge as comparisons are made across the community’s many species. This approach also allows the appropriate scale to be defined at which genetic and conservation perspectives can be integrated to optimize benefits across species.
Our objective was to establish a framework for testing the generality of the SHM across members of a riverscape fish community. This framework would identify key drivers, with the expectation that common processes would emerge repeatedly across species within these ecological networks. We accomplished this by comparing patterns of genetic diversity across 31 fish species within the White River Basin of the Ozark Mountains (AR/MO, USA). For each species, we contrasted four alternative models (Cushman et al., 2013) representing major drivers of genetic structure: isolation by distance (IBD), isolation by stream hierarchy (IBH), isolation by barrier (IBB), and isolation by environment (IBE). Our data comprise thousands of SNPs (single nucleotide polymorphisms) derived via recent advances in high-throughput sequencing (Peterson et al., 2012), which have made genotyping thousands of individuals across multiple non-model species financially and logistically practical (da Fonseca et al., 2016). We offer our approach as a potential blueprint for developing more comprehensive genetic management plans at the community level.
2 MATERIALS AND METHODS
2.1 Study system
Our study system, the White River Basin, lies within the Western Interior Highlands of North America. This region was formerly part of the more extensive pre-Pleistocene Central Highlands, which extended north and east but were subsequently divided by repeated glacial advances into two disjunct components: the Western Interior Highlands (i.e., Ozark Plateau, Ouachita Mountains) and the Eastern Highlands (i.e., Appalachian Plateau, Blue Ridge, Appalachian Highlands) (Mayden, 1985). The Ozark Plateau remained an unglaciated refugium with elevated endemism and diversity (Warren et al., 2000). The White River Basin was established by at least the Late Pliocene (>3 MYA; Jorgensen, 1993), but its eastern tributaries were captured by the Mississippi River when it bisected the basin during the Pleistocene (Mayden, 1988; Strange & Burr, 1997). This paleohydrologic signature may persist in contemporary patterns of population divergence in the White River Basin, manifested as replicated patterns of genetic structure between eastern and western populations.
2.2 Sampling
The sampling region comprises the White River and St. Francis River basins (AR/MO) (Figure 1). Both are tributaries of the Mississippi River, draining 71,911 km² and 19,600 km², respectively. The region contains five sub-basins: the St. Francis, Upper White, Black, Lower White, and Little Red rivers (Figure 1). These are further subdivided into hierarchical Hydrologic Units (HUCs) (USGS & USDA-NRCS, 2013; USGS, 2021) representing different spatial scales: HUC-4 Subregions (N=2); HUC-6 Basins (N=3); HUC-8 Subbasins (N=19); HUC-10 Watersheds (N=129) (Figure 1).
Sampling was approved by the University of Arkansas Institutional Animal Care and Use Committee (IACUC: #17077), with collecting permits as follows: Arkansas Game & Fish Commission (#020120191); Missouri Department of Wildlife Conservation (#18136); US National Park Service (Buffalo River Permit; BUFF-2017-SCI-0013). Fishes were sampled with seine nets in wadable streams during low flow between June 2017 and September 2018. Sampling time per site ranged from 30–60 min, with a target of 5–10 individuals per species encountered. Individuals were euthanized by immersion in tricaine methanesulfonate (MS-222; 500 mg/L, buffered to pH 7) and subsequently preserved in 95% ethanol. Species were formally identified in the laboratory, and the right pectoral fin was removed from each specimen and stored in 95% ethanol at −20 °C prior to DNA extraction. Specimens are housed at the Arkansas Conservation and Molecular Ecology Lab, University of Arkansas, Fayetteville.
2.3 Genomic data collection and filtering
Genomic DNA was isolated (Qiagen Fast kits; Qiagen Inc.) and quantified by fluorometry (Qubit; Thermo-Fisher Scientific). Individuals were genotyped using double-digest restriction site-associated DNA (ddRAD) sequencing (Peterson et al., 2012), with appropriately modified procedures (Chafin et al., 2019). Standardized DNA amounts (1,000 ng) were digested at 37 °C with the high-fidelity restriction enzymes MspI (5’-CCGG-3’) and PstI (5’-CTGCAG-3’) (New England Biolabs), bead-purified (Ampure XP; Beckman-Coulter Inc.), standardized to 100 ng, and ligated with custom adapters containing in-line identifying barcodes (T4 Ligase; New England Biolabs). Samples were pooled in sets of 48 and size-selected from 326–426 bp, including adapter length (Pippin Prep; Sage Sciences). Illumina adapters and i7 indices were added via 12-cycle PCR with Phusion high-fidelity DNA polymerase (New England Biolabs). Three libraries (3 × 48 = 144 individuals/lane) were pooled per lane and single-end sequenced on the Illumina HiSeq 4000 platform (1 × 100 bp; Genomics & Cell Characterization Core Facility; University of Oregon, Eugene). Quality control checks, including fragment analysis and quantitative real-time PCR, were performed at the core facility before sequencing.
Raw Illumina reads were demultiplexed, clustered, filtered, and aligned in IPYRAD v.0.9.62 (Eaton & Overcast, 2020). Reads were first demultiplexed, allowing up to one barcode mismatch, yielding individual FASTQ files of raw reads (N=3,060 individual files). Individuals averaged >2 million reads; those with extremely low read counts (< x̄ − 2s) were removed to reduce errors from poor-quality sequencing. Individuals were screened for putative hybrids (Zbinden, Douglas, et al., 2022), and those with admixed ancestry were removed. Raw reads were partitioned by species (N=31) and assembled de novo in IPYRAD (Eaton & Overcast, 2020). Adapters/primers were removed, and reads with >5 bases of Phred quality <20 or with length <35 bases (after trimming) were discarded. Clusters of homologous loci were assembled using an 85% identity threshold. Putative homologs were removed if any of the following criteria were met: coverage <20x or >500x per individual; >5% of consensus nucleotides ambiguous; >20% of nucleotides polymorphic; >8 indels present; or presence in <15% of individuals. Paralogs were identified (and subsequently removed) as clusters exhibiting either >2 alleles per site in the consensus sequence or excessive heterozygosity (>5% of consensus bases or >50% heterozygosity per site among individuals).
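The low-read filter can be sketched as follows. This is a minimal illustration, assuming read counts are held in a dictionary keyed by hypothetical sample IDs and that s is the sample standard deviation (n − 1 denominator); it is not the pipeline actually used.

```python
import statistics


def drop_low_read_individuals(read_counts: dict[str, int]) -> dict[str, int]:
    """Keep individuals whose raw read count is at least mean - 2 * SD.

    `read_counts` maps a (hypothetical) sample ID to its number of raw reads.
    SD is the sample standard deviation (n - 1 denominator), an assumption.
    """
    counts = list(read_counts.values())
    cutoff = statistics.mean(counts) - 2 * statistics.stdev(counts)
    return {ind: n for ind, n in read_counts.items() if n >= cutoff}


# Example: 20 typical libraries plus one failed library (made-up numbers).
reads = {f"ind{i:02d}": 2_000_000 + 10_000 * i for i in range(20)}
reads["failed"] = 50_000
kept = drop_low_read_individuals(reads)
```

Because a single failed library inflates the standard deviation, this cutoff is deliberately lenient: it removes only gross outliers rather than moderately shallow libraries.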
Biallelic SNP panels for each species were then visualized and filtered with the R package RADIATOR (Gosselin, 2020). To ensure high data quality, loci were removed if they were monomorphic, had a minor allele frequency <3%, had mean coverage <20 or >200, had >30% missing data, had the SNP at read position >91, or deviated from Hardy–Weinberg equilibrium (HWE) at one or more sampling sites (α = 0.0001). To reduce linkage disequilibrium, only one SNP per locus was retained (the one maximizing the minor allele count). Finally, individuals that were the only representative of their species at a sampling site, and those with >75% missing data in the filtered panel, were removed.
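A minimal sketch of two of these filters (minor allele frequency and per-SNP missingness) together with the one-SNP-per-locus rule, assuming genotypes are coded as alternate-allele counts (0, 1, 2) with `np.nan` for missing calls. The coverage, read-position, and HWE filters are omitted, and the function is a hypothetical illustration, not RADIATOR.

```python
import numpy as np


def filter_snps(genotypes, locus_ids, min_maf=0.03, max_missing=0.30):
    """Return column indices of SNPs passing MAF/missingness filters,
    keeping one SNP per locus (the one with the largest minor allele count).

    genotypes: (n_individuals, n_snps) float array of alternate-allele counts
               (0, 1, 2) with np.nan for missing calls.
    locus_ids: sequence of length n_snps giving the RAD locus of each SNP.
    """
    best = {}  # locus id -> (minor allele count, column index)
    for j in range(genotypes.shape[1]):
        g = genotypes[:, j]
        called = g[~np.isnan(g)]
        if called.size == 0 or 1 - called.size / g.size > max_missing:
            continue  # too much missing data
        alt = called.sum()
        mac = min(alt, 2 * called.size - alt)  # minor allele count
        if mac / (2 * called.size) < min_maf:
            continue  # rare or monomorphic (mac == 0) SNP
        locus = locus_ids[j]
        if locus not in best or mac > best[locus][0]:
            best[locus] = (mac, j)
    return sorted(j for _, j in best.values())


# Example: two loci ("A", "B"), two SNPs each, ten individuals.
G = np.zeros((10, 4))
G[0, 1] = 1    # SNP 1: one copy of the minor allele (MAF = 0.05)
G[:3, 2] = 1   # SNP 2: three copies
G[:2, 3] = 2   # SNP 3: four copies here...
G[2, 3] = 1    # ...plus one more, five in total
kept = filter_snps(G, ["A", "A", "B", "B"])
```

SNP 0 is monomorphic and dropped; within each locus the SNP with the most minor-allele copies survives, so `kept` holds columns 1 and 3.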
2.4 Genetic structure
Genetic structure was assessed using the resulting SNP genotypes. For each species (N=31), pairwise FST (Weir & Cockerham, 1984) was calculated among sites (HIERFSTAT; Goudet et al., 2017). Jost’s D was also quantified among sites and globally, as it is based on the effective number of alleles rather than heterozygosity and hence less biased by sampling differences (Jost, 2008). Additional global intraspecific FST analogs were quantified for comparison: multi-allelic GST (Nei, 1973) and unbiased G”ST (Meirmans & Hedrick, 2011) (MMOD; Winter, 2012). We tested for isolation by distance (IBD) using both linearized FST, i.e., FST/(1 − FST), and Jost’s D. Their relationships with log-transformed river distance were assessed using Mantel tests (Mantel & Valand, 1970) (ECODIST; Goslee & Urban, 2020) and visualized using linear regression (Rousset, 1997).
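To make the IBD test concrete, the sketch below computes Jost’s D at a single locus, linearizes FST, and runs a permutation Mantel test. It is an illustration under simplifying assumptions (uncorrected D without sample-size corrections, Pearson correlation, one-tailed test for a positive association, made-up site positions), not the HIERFSTAT, MMOD, or ECODIST implementations.

```python
import numpy as np


def jost_d(freqs):
    """Jost's D at one locus from allele frequencies (rows = sites, columns = alleles).

    Uncorrected form D = (HT - HS) / (1 - HS) * n / (n - 1), without the
    sample-size corrections applied by dedicated estimators.
    """
    p = np.asarray(freqs, dtype=float)
    n = p.shape[0]
    hs = np.mean(1 - (p**2).sum(axis=1))   # mean within-site heterozygosity
    ht = 1 - (p.mean(axis=0) ** 2).sum()   # heterozygosity of pooled frequencies
    return (ht - hs) / (1 - hs) * n / (n - 1)


def linearize_fst(fst):
    """Rousset's linearized FST: FST / (1 - FST)."""
    fst = np.asarray(fst, dtype=float)
    return fst / (1 - fst)


def mantel(a, b, permutations=999, seed=0):
    """One-tailed Mantel test (Pearson r) between two symmetric distance matrices."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)
    x = a[upper]
    r_obs = np.corrcoef(x, b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = rng.permutation(n)  # relabel sites in one matrix
        r = np.corrcoef(x, b[np.ix_(perm, perm)][upper])[0, 1]
        if r >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Example: eight hypothetical sites along a single reach.
pos = np.arange(8.0)
dist = np.abs(pos[:, None] - pos[None, :])
fst = 0.01 * dist
log_dist = np.log(dist, out=np.zeros_like(dist), where=dist > 0)  # diagonal left at 0
r, p = mantel(linearize_fst(fst), log_dist, permutations=199)
```

Permuting whole rows and columns together (rather than individual cells) preserves the dependence among pairwise distances that share a site, which is why the Mantel test is used instead of an ordinary correlation test.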
Population structure was assessed, and ancestry coefficients estimated, using sparse non-negative matrix factorization (sNMF; Frichot et al., 2014). We ran sNMF for each species with 20 repetitions per value of K (from 1 to the number of sites or 20, whichever was smaller) and a regularization parameter α = 100 (LEA; Frichot & François, 2015). For each species, the best K (i.e., number of distinct gene pools) was the value that minimized the cross-entropy criterion (Alexander & Lange, 2011). The best K was then used to impute missing data (impute function with method = ‘mode’ in LEA), and sNMF was rerun (as above) on the imputed genotypes. The resulting Q-matrices of ancestry coefficients were used to map population structure and served as the “IBB” (isolation by barrier) model below.
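The K-selection and imputation steps can be sketched as follows. `impute_mode` reflects our reading of LEA’s `method = 'mode'` (assign the genotype with the highest probability given the ancestry coefficients and the ancestral genotype frequencies); it is a hypothetical re-implementation rather than LEA’s code, choosing K by the mean cross-entropy across repetitions is an assumption, and all input values are made up.

```python
import numpy as np


def best_k(cross_entropy):
    """Pick K with the lowest mean cross-entropy across repetitions.

    cross_entropy: dict mapping K -> list of cross-entropy values (one per run).
    Using the mean over runs is an assumption; the minimum is another common choice.
    """
    return min(cross_entropy, key=lambda k: np.mean(cross_entropy[k]))


def impute_mode(genotypes, q, g):
    """Fill missing genotypes with the most probable genotype.

    genotypes: (n_ind, n_loci) array of 0/1/2 with np.nan for missing.
    q: (n_ind, K) ancestry coefficients.
    g: (K, n_loci, 3) genotype frequencies in each ancestral population.
    """
    probs = np.einsum("ik,klg->ilg", q, g)  # P(genotype) per individual and locus
    most_likely = probs.argmax(axis=2).astype(float)
    out = np.array(genotypes, dtype=float)
    missing = np.isnan(out)
    out[missing] = most_likely[missing]
    return out


# Example: K = 2 minimizes mean cross-entropy; two "pure" individuals, one locus.
k = best_k({1: [0.60, 0.62], 2: [0.50, 0.52], 3: [0.55, 0.51]})
q = np.array([[1.0, 0.0], [0.0, 1.0]])
g = np.array([[[0.8, 0.1, 0.1]], [[0.1, 0.1, 0.8]]])  # shape (K=2, loci=1, 3)
imputed = impute_mode(np.array([[np.nan], [np.nan]]), q, g)
```

Each missing call is filled from the individual's own ancestry mixture, so individuals drawn from different gene pools can receive different imputed genotypes at the same locus.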
We further assessed among-site genetic variation between Hydrologic Units (HUCs) and between discrete population clusters (determined via sNMF) using analysis of molecular variance (AMOVA; Excoffier et al., 1992). AMOVA was performed for each species at four HUC levels (4-, 6-, 8-, and 10-digit) to compare the amount of genetic variation among HUCs, among all sites, and among sites within HUCs. HUC classifications were assigned to each site using the Watershed Boundary Dataset (USGS, 2021). AMOVA was then performed for each species with K>1 genetic clusters to compare the amount of genetic variation among populations, among all sites, and among sites within populations. The variance components were used to estimate Φ-statistics (analogous to F-statistics): ΦCT, the genetic variation among groups (either HUCs or discrete populations); ΦST, the genetic variation among sites across all groups; and ΦSC, the genetic variation among sites within groups. The PEGAS version of AMOVA (Paradis, 2010) was run with default settings through the wrapper R package POPPR (Kamvar et al., 2015).
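For reference, the Φ-statistics above can be written in terms of the AMOVA variance components of the standard three-level design (Excoffier et al., 1992). The notation here is ours: σ²_a is the among-group component, σ²_b the among-sites-within-groups component, and σ²_c the within-site component.

```latex
\begin{aligned}
\sigma^2_T &= \sigma^2_a + \sigma^2_b + \sigma^2_c \\
\Phi_{CT} &= \frac{\sigma^2_a}{\sigma^2_T}, \quad
\Phi_{SC} = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_c}, \quad
\Phi_{ST} = \frac{\sigma^2_a + \sigma^2_b}{\sigma^2_T} \\
1 - \Phi_{ST} &= \frac{\sigma^2_c}{\sigma^2_T}
  = \frac{\sigma^2_c}{\sigma^2_b + \sigma^2_c}\cdot\frac{\sigma^2_b + \sigma^2_c}{\sigma^2_T}
  = (1 - \Phi_{SC})(1 - \Phi_{CT})
\end{aligned}
```

With non-negative variance components and σ²_c > 0, the last identity implies ΦSC < ΦST exactly when ΦCT > 0, i.e., whenever the grouping captures some among-group variance.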
2.5 Modelling genetic structure
We employed a variation partitioning framework (Capblancq & Forester, 2021; Chan & Brown, 2020) to compare four models of genetic structure for each species: IBD, IBB, IBH, and IBE. Individual genetic variation within each species was reduced to major axes of variation using principal components analysis (PCA) on each SNP panel. The number of PCs retained for each species was determined by comparing observed eigenvalues against those from randomized data (the Rnd-Lambda criterion; Peres-Neto et al., 2005), as implemented in the R package PCDimension (Coombes & Wang, 2019). Individual scores on the retained PCs represented individual genetic variation.
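The PC-retention idea can be sketched as follows: eigenvalues of the observed (centered) matrix are compared with eigenvalues from matrices in which each column is permuted independently, and leading PCs are kept while they exceed the 95th percentile of that null. This is an approximation in the spirit of Rnd-Lambda, assuming complete data; it is not the PCDimension implementation, and the permutation count, quantile, and simulated data are illustrative.

```python
import numpy as np


def eigenvalues(x):
    """PCA eigenvalues (variances of the PCs) of a column-centered matrix."""
    xc = x - x.mean(axis=0)
    s = np.linalg.svd(xc, compute_uv=False)  # singular values, descending
    return s**2 / (x.shape[0] - 1)


def n_pcs_by_permutation(x, permutations=99, quantile=0.95, seed=0):
    """Count leading PCs whose eigenvalue beats a column-permutation null."""
    rng = np.random.default_rng(seed)
    observed = eigenvalues(x)
    null = np.empty((permutations, observed.size))
    for b in range(permutations):
        # Permuting each column independently destroys covariance between
        # columns while keeping each column's distribution.
        shuffled = np.column_stack([rng.permutation(col) for col in x.T])
        null[b] = eigenvalues(shuffled)
    threshold = np.quantile(null, quantile, axis=0)
    keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        keep += 1
    return keep


# Example: 100 individuals, 30 variables, two planted axes of structure.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
loadings = np.vstack([np.ones(30), np.tile([1.0, -1.0], 15)])
x = 3 * latent @ loadings + rng.normal(size=(100, 30))
n_keep = n_pcs_by_permutation(x)
```

The retained PC scores (individuals × PCs) then act as the multivariate response in the redundancy analyses that follow.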
The first model (IBD) relied on river network distances between individuals (RIVERDIST; Tyers, 2017). The resulting distance matrix was decomposed into spatial eigenvectors representing positive spatial autocorrelation using distance-based Moran’s eigenvector maps (dbMEM; Chan & Brown, 2020) in the R package ADESPATIAL (Dray et al., 2020).
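A minimal sketch of the classic dbMEM construction: distances above a truncation threshold (here the longest minimum-spanning-tree edge) are replaced by four times that threshold, the matrix is double-centered as in principal coordinates analysis, and eigenvectors with positive eigenvalues are kept as a proxy for positive Moran’s I (an assumption of this sketch). It assumes all locations are distinct, since zero off-diagonal distances would be dropped from the spanning tree; in practice sites could be decomposed first and their eigenvectors assigned to individuals. It is not ADESPATIAL’s `dbmem` function.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree


def dbmem(dist, tol=1e-8):
    """Distance-based Moran's eigenvector maps from a distance matrix.

    Returns eigenvectors (columns) with positive eigenvalues, and those eigenvalues.
    Assumes distinct locations: zero off-diagonal distances are treated as
    missing edges by the spanning-tree step.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    threshold = minimum_spanning_tree(d).max()  # longest edge keeping all points connected
    truncated = np.where(d > threshold, 4 * threshold, d)
    np.fill_diagonal(truncated, 0.0)
    a = -0.5 * truncated**2
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = centering @ a @ centering  # Gower double-centering (PCoA)
    values, vectors = np.linalg.eigh(gower)
    order = np.argsort(values)[::-1]  # largest eigenvalues first
    values, vectors = values[order], vectors[:, order]
    keep = values > tol * values.max()
    return vectors[:, keep], values[keep]


# Example: ten equally spaced points along a single reach.
pos = np.arange(10.0)
mems, eigvals = dbmem(np.abs(pos[:, None] - pos[None, :]))
```

The first columns of `mems` describe broad-scale gradients along the network and later columns describe progressively finer-scale patterns, which lets a regression on them capture spatial structure at several scales at once.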
The second model (IBB) was based on individual ancestry coefficients (i.e., population structure) from the sNMF Q-matrix generated above. The assumption was that population structure indicates reduced gene flow between discrete populations due to a barrier (or high resistance) to dispersal. This model could not be incorporated for species without apparent population structure (K=1); these species were therefore tested using only three models.
The third model (IBH) was constructed from four levels of HUCs (4-, 6-, 8-, and 10-digit) that characterized each individual’s position within the stream hierarchy, i.e., its hydrologic units (USGS, 2021). We transformed the individual-by-HUC data matrix so that each unique HUC at each level was represented as a binary ‘dummy’ variable.
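Because Watershed Boundary Dataset codes are nested (the first 8 digits of a HUC-10 code give its HUC-8, the first 6 its HUC-6, and so on), the IBH design matrix can be built from each individual’s HUC-10 code alone. The sketch below assumes codes are stored as 10-character strings; the function name and example codes are hypothetical.

```python
import numpy as np

HUC_LEVELS = (4, 6, 8, 10)


def huc_dummies(huc10_codes):
    """Binary indicator matrix (individuals x HUC units) across four HUC levels."""
    columns, labels = [], []
    for level in HUC_LEVELS:
        prefixes = [code[:level] for code in huc10_codes]  # truncate to this level
        for unit in sorted(set(prefixes)):
            columns.append([1 if p == unit else 0 for p in prefixes])
            labels.append(f"HUC{level}_{unit}")
    return np.array(columns, dtype=int).T, labels


# Hypothetical HUC-10 codes for four individuals.
codes = ["1101000101", "1101000102", "1101000201", "0802030101"]
design, names = huc_dummies(codes)
```

Each row contains exactly one 1 per level, so within a level the indicators sum to one and one column per level is redundant with a model intercept.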
The fourth model (IBE) captured environmental variation among the sites that harbored individuals. Environmental variables were taken from a compendium of 281 factors in five major classes: hydrology/physiography, climate, land cover, geology/soil composition, and anthropogenic impact (HydroRIVERS v.1.0; Linke et al., 2019). Variables were extracted for each site and separated into the five classes; invariant factors were then removed, and collinear factors were removed stepwise (USDM; Naimi, 2013) until all remaining factors had a variance inflation factor (VIF) <10. Variables were standardized by subtracting means and dividing by standard deviations. Variables within each class were then chosen for subsequent analyses by forward selection (Blanchet et al., 2008).
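The collinearity screen can be sketched as a stepwise variance-inflation-factor filter: invariant columns are dropped, the column with the highest VIF is removed until every VIF falls below 10, and the survivors are standardized. This follows the spirit of USDM’s stepwise procedure but is not its code; the function names and simulated data are hypothetical.

```python
import numpy as np


def vif(x):
    """Variance inflation factor of each column: 1 / (1 - R^2) on the other columns."""
    n, p = x.shape
    out = np.empty(p)
    for j in range(p):
        y = x[:, j]
        others = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        r2 = 1 - np.var(y - others @ beta) / np.var(y)
        out[j] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return out


def vif_step(x, names, threshold=10.0):
    """Drop invariant columns, then repeatedly drop the highest-VIF column
    until all VIFs are below `threshold`; return standardized survivors."""
    x = np.asarray(x, dtype=float)
    keep = [j for j in range(x.shape[1]) if x[:, j].std() > 0]
    while len(keep) > 1:
        v = vif(x[:, keep])
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break
        del keep[worst]
    kept = x[:, keep]
    z = (kept - kept.mean(axis=0)) / kept.std(axis=0, ddof=1)  # z-scores
    return z, [names[j] for j in keep]


# Example: c is nearly a + b, and d is constant.
rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)
c = a + b + 1e-3 * rng.normal(size=50)
d = np.full(50, 3.0)
z, kept_names = vif_step(np.column_stack([a, b, c, d]), ["a", "b", "c", "d"])
```

Removing one variable at a time matters: dropping every variable with a high VIF at once would discard all three of a, b, and c, even though any two of them carry nearly all the information.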
In brief, variables were first tested for a relationship with the response data (individual genetic variation) using redundancy analysis (RDA). If the relationship was significant (P < 0.05), a stepwise forward procedure was carried out in which variables were selected if they significantly increased the adjusted R² of the model (P < 0.05) and the adjusted R² did not exceed that of the overall model. This procedure was implemented using the ordiR2step function in the R package VEGAN (Oksanen et al., 2020). The selected variables from each of the five classes were combined into a single matrix and then reduced to a set of PCs using robust principal components analysis (ROBPCA; Hubert et al., 2005). The number of PCs retained for each category was determined following Hubert et al. (2005), as implemented in the R package ROSPCA (Hubert et al., 2016).
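The selection rule can be sketched as greedy forward selection on the adjusted R² of an RDA (a multivariate linear regression), with the global-model ceiling described above. This is a simplified illustration, not VEGAN’s ordiR2step: the permutation tests are replaced by a requirement that the adjusted R² increase, and the data are simulated.

```python
import numpy as np


def rda_adj_r2(y, x):
    """Adjusted R^2 of a redundancy analysis (multivariate regression) of y on x."""
    y = y - y.mean(axis=0)
    x = x - x.mean(axis=0)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    r2 = ((x @ beta) ** 2).sum() / (y**2).sum()  # share of total variance fitted
    n, m = x.shape[0], np.linalg.matrix_rank(x)
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)


def forward_select(y, x, names, tol=1e-12):
    """Greedy forward selection on adjusted R^2 with the global-model stopping rule.

    Simplification: a variable is accepted if it raises adjusted R^2; the
    permutation tests used by ordiR2step are omitted.
    """
    global_adj = rda_adj_r2(y, x)
    selected, current = [], 0.0
    remaining = list(range(x.shape[1]))
    while remaining:
        scores = [rda_adj_r2(y, x[:, selected + [j]]) for j in remaining]
        best = int(np.argmax(scores))
        if scores[best] <= current or scores[best] > global_adj + tol:
            break  # no improvement, or would exceed the global model
        current = scores[best]
        selected.append(remaining.pop(best))
    return [names[j] for j in selected]


# Example: simulated genetic PCs driven strongly by x1 and x2, weakly by x3.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
Y = np.column_stack([X[:, 0], X[:, 1], X[:, 0] - X[:, 1] + 0.5 * X[:, 2]])
Y = Y + 0.1 * rng.normal(size=(60, 3))
chosen = forward_select(Y, X, ["x1", "x2", "x3"])
```

The global-model ceiling guards against the known tendency of forward selection to admit too many variables.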
Individual genetic variation (a matrix of PCs for each species) was then partitioned among the four explanatory models of genetic structure using partial redundancy analysis (Anderson & Legendre, 1999; Capblancq & Forester, 2021). This allowed estimation of the individual genetic variation explained by each model, by all models combined, and by each “pure” model after partitioning out the variation explained by the other three. It also allowed the correlation structure among competing models to be visualized as redundant relationships.
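The logic of the “pure” fractions can be sketched as below: each model’s unique contribution is the adjusted R² of the full model minus that of the model lacking it. The sketch mirrors the idea behind variation partitioning but is not VEGAN’s implementation, omits significance testing, and uses simulated matrices (hypothetical models A, B, and C) in place of the IBD, IBB, IBH, and IBE predictors; it needs at least two models.

```python
import numpy as np


def rda_adj_r2(y, x):
    """Adjusted R^2 of a redundancy analysis (multivariate regression) of y on x."""
    y = y - y.mean(axis=0)
    x = x - x.mean(axis=0)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    r2 = ((x @ beta) ** 2).sum() / (y**2).sum()
    n, m = x.shape[0], np.linalg.matrix_rank(x)
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)


def partition(y, models):
    """Total, marginal, and 'pure' adjusted R^2 for each named model.

    pure(k) = adjR2(all models) - adjR2(all models except k).
    """
    names = list(models)
    total = rda_adj_r2(y, np.column_stack([models[k] for k in names]))
    result = {"total": total}
    for k in names:
        others = np.column_stack([models[o] for o in names if o != k])
        result[k] = {
            "marginal": rda_adj_r2(y, models[k]),
            "pure": total - rda_adj_r2(y, others),
        }
    return result


# Example: model B nearly duplicates model A; model C is noise.
rng = np.random.default_rng(3)
a = rng.normal(size=(80, 1))
models = {
    "A": a,
    "B": a + 0.01 * rng.normal(size=(80, 1)),
    "C": rng.normal(size=(80, 1)),
}
y = np.column_stack([a[:, 0], 2 * a[:, 0]]) + 0.1 * rng.normal(size=(80, 2))
parts = partition(y, models)
```

Here A and B each have a large marginal fraction but a near-zero pure fraction, the same signature of redundancy described for IBD and IBE relative to IBH. Pure fractions can be slightly negative when a model adds nothing.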
3 RESULTS
3.1 Sampling and data recovery summarized
Collections (N=75; Figure 1) yielded N=72 species and N=3,605 individuals. On average, we collected ∼11 species/site, which is typical for streams sampled with seine nets in North America (Matthews, 1998) and for similar highland streams within the Mississippi Basin (Zbinden, Geheber, Lehrter, & Matthews, 2022; Zbinden, Geheber, Matthews, & Marsh-Matthews, 2022).
We genotyped N=3,060 individuals across N=31 species, each with at least two individuals collected at ≥5 sampled sites. Simulations and empirical evaluations underscore the accuracy of FST estimates when large numbers of SNPs (≥1,500) are employed, even with as few as two individuals (Nazareno et al., 2017; Willing et al., 2012). After removing individuals with >75% missing data and singletons of their species at a site, the remaining N=2,861 individuals were analyzed for genetic structure (Table 1). The number of individuals analyzed per species ranged from 15–358 (x̄=92.3; s=80.8), and the number of sites at which each species was collected ranged from 5–50 (x̄=16.8; s=11.2). The number of individuals/species/site ranged from 2–15 (x̄=5.1; s=1.5). The mean number of raw reads/individual/species ranged from 1.65 million to 3.22 million (x̄=2,289,230.0; s=341,159.5). The mean number of loci/species recovered by IPYRAD ranged from 14,599–30,509 (x̄=20,081.7; s=4,697.6), with a mean sequencing depth/locus of 73.6x (s=12.0x). After filtering loci and retaining one SNP per locus, the panels for each species contained 2,168–10,033 polymorphic sites (x̄=4,486.7; s=1,931.1), with mean missing data/species of 12% (s=2%).
3.2 Genetic structure
3.2.1 Among-site genetic divergence
Distributions of among-site FST and D varied widely among species (Figure 2), as did global indices of genetic divergence (Table 2). All three global indices of fixation or genetic divergence (GST, G”ST, D) were negatively correlated with within-site heterozygosity (HS), positively correlated with total heterozygosity (HT), and strongly positively correlated with each other (Table 3).
For IBD, a significant relationship between linearized among-site FST and log-transformed among-site river network distance was found for 23 (74%) of the N=31 species (Figure 3). Mantel coefficients ranged from 0.11–0.88 (x̄=0.51; s=0.19). The slope of the linear relationship between FST and distance for each species ranged from 0.003–2.62 (x̄=0.46; s=0.76). Results were largely similar when IBD was tested with Jost’s D: the same 23 species showed a significant relationship, along with two additional taxa, Smallmouth Bass (Micropterus dolomieu; Lacepède, 1802) and Largemouth Bass [Micropterus salmoides; (Lacepède, 1802)]. Mantel correlation coefficients ranged from 0.15–0.92 (x̄=0.51; s=0.19). The slope of the linear relationship between Jost’s D and log river network distance for each species ranged from 0.0007–0.28 (x̄=0.04; s=0.06).
3.2.2 Population structure
Seven species lacked apparent discrete genetic structure, suggesting continuous structuring at the spatial scale of our study (Figure 4). For the remaining 24 species, between two and seven discrete sub-populations were identified (Figure 5). At the broadest hierarchical level, this structure corresponded to the two major northern sub-basins (Upper White and Black rivers) for all species sampled in both (N=22). There was also evidence of fine-scale structure for five species within the Little Red River Basin. Smaller catchments with distinct gene pools across multiple species included the North Fork (4 spp.), Buffalo (3 spp.), Upper Black (4 spp.), Current (3 spp.), and Spring rivers (4 spp.).
3.2.3 AMOVA
AMOVA also supported discrete genetic structuring. Genetic variation among HUCs was significant for 24 species (Table 4), for which HUCs explained 1–70% of the genetic variance (x̄=25.0%; s=20.7%). For the other seven species, variation among HUCs was ≤1%, except for Ozark Sculpin (Cottus hypselurus; Robins & Robison, 1985) and Creek Chub [Semotilus atromaculatus; (Mitchill, 1818)], for which HUC differences accounted for >80% of the genetic variance but were non-significant due to a lack of power. Southern Redbelly Dace [Chrosomus erythrogaster; (Rafinesque, 1820)] could not be tested due to a lack of replicate samples within HUC levels. Further evidence of genetic structure among HUCs came from the pattern ΦSC (genetic divergence among sites within HUCs) < ΦST (divergence among all sites), found across 26 species. The 8-digit HUC level explained the greatest genetic variance in 21 species (Table 4).
Genetic variation among discrete population clusters (based on sNMF) was significant for 21 of the N=31 species (Table 4). Seven species were best described as single populations (K=1) and were therefore not tested further. For species exhibiting structure, the genetic variance among clusters ranged from 5–95% (x̄=38.0%; s=26.5%). The lack of significant structure in the three remaining species, despite K>1 via sNMF, is likely explained by low power resulting from a small number of sample sites. As with HUCs, ΦSC < ΦST was observed; here, however, all tested species showed this pattern (i.e., sites within the same population were less differentiated than sites across all populations).
3.3 Models of genetic structure
Variability in genetic diversity was partitioned across four models of genetic structure for the N=31 species. Principal components of SNP panel variation served as representatives of genetic variation. Across species, the number of genetic PCs ranged from 2–93 (x̄=20.0; s=20.1; Table 1). Cumulative genomic variance explained ranged from 24.7–88.7% (x̄=46.2%; s=14.3%; Table 1).
Combined, the four models (IBD, IBB, IBH, IBE) accounted for 3–100% of the genomic diversity across species (x̄=63.0%; s=35.3%; Figure 6). Isolation by stream hierarchy (IBH; x̄=62.0%; s=34.7%) and isolation by barrier (IBB; x̄=49.3%; s=30.0%) contributed most to the total variation explained, while distance (IBD; x̄=32.1%; s=25.1%) and environment (IBE; x̄=33.0%; s=21.4%) explained less (Figure 6). Variation explained by “pure” models, after accounting for that explained by the other three, was >0 only for stream hierarchy and barrier (Figure 6), suggesting that the effects of distance and environment are encapsulated by stream hierarchy and barrier. Indeed, the correlative structure among models revealed that most genetic variance was explained by stream hierarchy, with the other models largely redundant (Figure 7).
4 DISCUSSION
Genetic diversity is an essential metric for inferring evolutionary processes and guiding conservation. Single-species estimates of genetic diversity are standard given practical constraints, e.g., funding mandates for species of conservation concern. However, adopting a multispecies framework for analyzing genetic diversity could allow more comprehensive management plans to be developed by focusing on commonalities (rather than differences) among species. The Stream Hierarchy Model (Meffe & Vrijenhoek, 1988) posits that the dispersal of stream-dwelling organisms is more limited between hierarchical units (basins, sub-basins, watersheds) than within them. If this model were generalizable, it could help determine relevant scales and regions for managing genetic diversity.
Our multispecies approach yielded two salient points: 1) at a macro scale, river network topology and complexity are manifested in common patterns of genetic structure across species; and 2) at a finer scale, the degree of intraspecific genetic divergence varies widely among co-distributed species. Most species showed significant IBD patterns but also discrete population sub-structure, reflected most strongly by sub-basin delineations (e.g., HUC-8). These patterns were corroborated by AMOVA and variance partitioning and generalize across species. Overall, stream fish genetic structure indicated that dispersal is limited primarily among rather than within river catchments.
4.1 Drivers of isolation at the basin-wide scale
4.1.1 Isolation by Distance and river networks
IBD is expected when a genetic study’s spatial extent exceeds the average dispersal distance of individuals, i.e., the distance moved from natal to breeding habitat. Indeed, significant IBD patterns were detected in 81% of the species in our study. However, the relationship was generally weak (Mantel r = 0.47 and 0.51 for linearized FST and D, respectively).
While IBD may explain genetic variation along a single stream or river (i.e., at a linear scale), it fails to incorporate the spatial structure of riverine networks (Thomaz et al., 2016). IBD may therefore not be an appropriate general model for fish genetic structure at the network scale (Hopken et al., 2013). IBD plots for many species (Figure 3) showed high genetic divergence even among relatively proximate localities, with apparent clusters indicating discrete rather than continuous structure (Guillot et al., 2009). This suggests that a more nuanced pattern occurs at the network scale, leaving high residual variation. The failure of IBD to account for much of the variation in genetic divergence reflects additional resistance to dispersal caused by longitudinal changes in habitat characteristics such as slope, depth, volume, and predator composition. For example, two river reaches of equal length can have very different habitat matrices, and these can influence gene flow more than distance alone (Guillot et al., 2009; Lowe et al., 2006; Ruiz-Gonzalez et al., 2015).
4.1.2 Stream Hierarchy Model
Our results show that individual genetic variation is best explained by the Stream Hierarchy Model (Brauer et al., 2018; Hopken et al., 2013; Meffe & Vrijenhoek, 1988). In other words, most of the variation explained by IBD, IBE, and IBB could be accounted for by IBH alone. This was corroborated by variation partitioning, in which the IBD, IBE, and IBB models were redundant with IBH. Consistent with the concordance between population structure and stream hierarchy, among-HUC and among-population groupings explained similar percentages of among-site genetic variation. In short, the variance explained by distance and environment was due to differences among HUC drainages. These results highlight the necessity of accounting for population structure before exploring relationships between genotypes and environmental heterogeneity, e.g., within genotype-by-environment frameworks (Lawson et al., 2020).
4.1.3 Disentangling cumulative effects
Our analyses also revealed complex spatial patterns of genetic diversity. We evaluated competing isolation models using a framework that identified both distance and barriers as putative drivers, with strong genetic divergence detected even across short geographic distances (Chan & Brown, 2020; Ruiz-Gonzalez et al., 2015). This interaction can confound analyses that incorporate either factor alone. For example, if sampling is clustered, discrete genetic groups can be spuriously inferred along an otherwise continuous gradient of genetic variation (Frantz et al., 2009). Conversely, a continuous pattern can be erroneously inferred when the underlying reality consists of distinct clusters separated by geographic distance (Meirmans, 2012). We therefore echo the importance of testing multiple hypotheses concerning genetic structure (Perez et al., 2018); idiosyncrasies and complex interactions cannot be discerned by testing single models (e.g., discrete structure or IBD) in isolation.
4.2 Drivers of variation within and among species
The species assayed herein differ markedly in dispersal capability (Shelley et al., 2021). Given this, we expected genetic structure to vary widely among species across our study region (Comte & Olden, 2018; Husemann et al., 2012; Pilger et al., 2017). Dispersal-related traits drive gene flow among localities and determine the spatial scale at which patterns of genetic structure emerge (Bohonak, 1999; Riginos et al., 2014). The physical structure of the river network further modulates these patterns by dictating the dispersal pathways of metapopulations and their colonization and extinction probabilities (Falke et al., 2012; Labonne et al., 2008; Fagan, 2002). These superimposed processes promote genetic divergence among distant populations (Thomaz et al., 2016; Chiu et al., 2020). Similar patterns emerge when community diversity is analyzed via species composition: headwater streams tend to support very different communities owing to dispersal limitations (Finn et al., 2011; Zbinden & Matthews, 2017; Zbinden, Geheber, Lehrter, & Matthews, 2022). Hence, the interaction between traits and environment is an overarching influence that unites ecology and evolution.
Many species studied herein are small-bodied, with distributions concentrated in upland and headwater streams (Robison & Buchanan, 2020). Thus, species-specific dispersal limitations imposed by unsuitable riverine habitats (Radinger & Wolter, 2015; Schmidt & Schaefer, 2018) explain considerable variation in genetic structure within the White River. Large rivers are hypothesized to be inhospitable to upland fishes (e.g., in resources, depth, turbidity, and substrates) and to impose resistance to successful migration (e.g., through higher discharge and a greater density of large-bodied predators). These characteristics constrain migration and limit gene flow among basins that drain into large rivers (Fluker et al., 2014; Schmidt & Schaefer, 2018; Turner & Robison, 2006). The result is asymmetric gene flow and source-sink metapopulation dynamics, with the most susceptible species (i.e., those that are smaller and less tolerant) diverging most rapidly (Campbell Grant et al., 2007).
Other life-history traits may also play a role. For example, traits that directly influence effective population size (Nei & Tajima, 1981) may generate differences among species in the rate at which genetic differences arise (Blanchet et al., 2020). Species with ‘slow’ life histories, characterized by longer generations and delayed maturity, show an increased probability of local extirpation, which inflates genetic drift and is concomitant with global extinction risk (Hutchings et al., 2012; Pearson et al., 2014; Chafin et al., 2019). Similar contingencies exist for other ecological traits, such as highly specialized trophic adaptations and narrow environmental tolerances, which act through the same general mechanism by predisposing species to fragmented population structure (Olden et al., 2008). Ecological traits are also mirrored by morphology (Douglas & Matthews, 1992), so the effects of individual traits interact and are difficult to disentangle. Ultimately, intraspecific genetic divergence is driven by a combination of factors that influence population size, demographic history, and connectivity. Because these complex interactions among drivers shape genetic diversity and structure within and among species (microevolutionary scale) and thus ultimately lead to speciation and extinction (macroevolutionary scale), they warrant more comparative multispecies assessments. The analytical framework outlined herein provides a template for such community-genomics studies.
4.3 Disentangling historic and contemporary drivers
4.3.1 Paleohydrology in the White River system
In this study, discrete population structure coincides with major topological divides within the White River stream network, such as a consistent east/west divide between the Upper White and Black rivers, mirroring prior studies of community composition (Matthews & Robison, 1988; 1998). Similar patterns were observed at smaller scales among drainages within the study region, as reported for White River crayfish (Fetzner & DiStefano, 2008). While the Lower White and Black rivers are contemporary large-river habitats, both would have been much larger pre-Pleistocene, when together they formed the main channel of the Old Mississippi River (Mayden, 1988; Strange & Burr, 1997). This large river would have separated the eastern and western highland tributaries with habitat inhospitable to upland species. Pronounced historical dispersal limitations imposed by the Old Mississippi could explain the greater isolation of Little Red River and Black River tributary populations relative to those in the Upper White River. Future work should incorporate coalescent perspectives (e.g., Oaks, 2019) to test the role of past geomorphic events in driving co-divergence and co-demographic patterns, such as the Pleistocene incursion of the Old Mississippi into the modern Black River channel.
4.3.2 Contemporary drivers
Spatial discontinuities in genetic structure can also reveal contemporary barriers to migration and gene flow (Lee et al., 2018; Ruiz-Gonzalez et al., 2015). The Upper White River dams (e.g., Norfork, Bull Shoals, Table Rock, and Beaver dams) represent the most apparent anthropogenic barriers to gene flow. Elsewhere, dams have produced discrete population structure above and below the impoundment (Roberts et al., 2013). However, their impacts may be limited given the relatively short time these dams have been in place (Ruzich et al., 2019). Those on the White River were constructed between 1912 (Taneycomo Powersite Dam) and 1966 (Beaver Dam).
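The "short period" argument can be made explicit with a back-of-the-envelope drift calculation. The sketch below converts years since dam construction into generations and applies the textbook approximation for divergence among fully isolated subpopulations under pure drift (no migration or mutation), F_ST ≈ 1 − (1 − 1/(2N_e))^t. Only the 1966 construction year comes from the text; the reference year, generation time, and N_e values are hypothetical assumptions, and real populations with residual gene flow would diverge more slowly than this.

```python
def generations_elapsed(construction_year, reference_year, generation_time_years):
    """Generations since a barrier was built (all inputs are user-supplied assumptions)."""
    return (reference_year - construction_year) / generation_time_years


def expected_fst_drift(ne, generations):
    """Approximate F_ST among fully isolated subpopulations under pure drift:
    1 - (1 - 1/(2*Ne))**t, assuming no migration and no mutation."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** generations


# Beaver Dam (1966, from the text); the reference year 2020 and a 2-year
# generation time are hypothetical values chosen only for illustration.
t = generations_elapsed(1966, 2020, 2.0)
for ne in (50, 1000):  # hypothetical effective population sizes
    print(f"Ne={ne:>4}, t={t:.0f} generations: F_ST ~ {expected_fst_drift(ne, t):.3f}")
```

Under these assumptions only a few dozen generations have elapsed, so appreciable divergence would be expected mainly in species with small effective sizes, consistent with the life-history arguments in Section 4.2.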
Our study was not explicitly designed to assess the effects of impoundments on diversity, nor did we test them directly. Nevertheless, discrete population structure emerged that corresponds to the locations of such dams. Four species showed discrete populations within the North Fork River above Norfork Dam: Southern Redbelly Dace [Chrosomus erythrogaster; (Rafinesque, 1820)]; Yoke Darter (Etheostoma juliae; Meek, 1891); Northern Studfish [Fundulus catenatus; (Storer, 1846)]; and Blackspotted Topminnow [Fundulus olivaceus; (Storer, 1845)] (sites colored magenta; Figure 5). One species, Orangethroat Darter [Etheostoma spectabile; (Agassiz, 1854)], showed a distinct population in the James River above Table Rock Dam (sites colored gold; Figure 5). However, the North Fork and James rivers each drain an eight-digit HUC watershed, and HUC membership explains high amounts of genetic variation across the study region regardless of dams. This highlights the importance of understanding ‘natural’ network-wide patterns of genetic structure before drawing conclusions about anthropogenic barriers, particularly when those barriers coincide with stream hierarchy. Distinguishing the effects of dams from those of stream hierarchy could be accomplished through divergence time estimates (Hansen et al., 2014), but such analyses are beyond the scope of the present study.
5 CONCLUSIONS
The multispecies comparative approach employed here revealed general patterns that could not have been discerned from a study of any single species. Additionally, the variability in intraspecific genetic structure among species adds a comparative dimension that single-species studies cannot provide. While meta-analytic frameworks have some potential, they are limited by confounding effects that stem from differences among studies, such as in markers, sample sizes, environmental conditions, and historical context. This necessitates a community-level approach within a study region. Further work that models the variables driving genetic structure can yield greater insight, ultimately improving our hypotheses regarding genetic diversity in species and regions for which contemporary genetic data are unavailable.
Importantly, our comparative framework supports the Stream Hierarchy Model as a general model for the genetic structure of lotic fish species and suggests that hydrologic units characterize regional genetic diversity well. This result suggests that HUCs could serve as a ‘rule of thumb’ for riverine biodiversity conservation. None of the species evaluated herein were panmictic, and genetic variation among HUCs was apparent even in species with limited evidence of discrete population structure or continuous structure. Across a suite of commonly occurring fishes representing seven families, we found greater intraspecific gene flow within basins and sub-basins than among them. Therefore, fish populations in separate HUCs at the 8-digit level or coarser (e.g., HUC8, HUC6, HUC4, HUC2) should be considered isolated until proven otherwise (Shelley et al., 2021).
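Because HUC codes are nested by digit prefix (2-digit region, 4-digit subregion, 6-digit basin, 8-digit subbasin, 10-digit watershed, 12-digit subwatershed), the HUC8 rule of thumb can be applied to any pair of sampling sites by comparing code prefixes. The sketch below is a hypothetical helper, not part of the authors' workflow; the example codes are illustrative and are not tied to specific sampling sites in this study.

```python
# Digit lengths of nested hydrologic unit levels: region, subregion, basin,
# subbasin, watershed, subwatershed.
HUC_LEVELS = (2, 4, 6, 8, 10, 12)


def finest_shared_huc_level(huc_a: str, huc_b: str) -> int:
    """Return the digit length (2, 4, ..., 12) of the finest hydrologic unit
    shared by two HUC codes, or 0 if they share no 2-digit region."""
    shared = 0
    for level in HUC_LEVELS:
        if len(huc_a) >= level and len(huc_b) >= level and huc_a[:level] == huc_b[:level]:
            shared = level
        else:
            break
    return shared


def presumed_isolated(huc_a: str, huc_b: str, level: int = 8) -> bool:
    """Rule of thumb: treat populations in different HUCs at `level` digits as
    isolated until proven otherwise."""
    return finest_shared_huc_level(huc_a, huc_b) < level


# Illustrative (hypothetical) codes: two subbasins within the same 6-digit basin.
site_a, site_b = "11010001", "11010002"
print(finest_shared_huc_level(site_a, site_b))  # shared basin level
print(presumed_isolated(site_a, site_b))        # different HUC8, so presumed isolated
```

Comparing prefixes rather than storing an explicit hierarchy works because each finer HUC code extends its parent's code by two digits.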
As previously recognized, independent populations warrant independent management (Hopken et al., 2013). When migration is low or non-existent, management of one population is unlikely to affect another. Genetic variation unique to hydrologic units could allow for adaptation to future environmental change, whereas population isolation could elevate extirpation risk (Harrisson et al., 2014). Furthermore, the genetic landscape of a species should be carefully assessed before populations are propagated via stocking or translocation, particularly before diversity from different sub-basins is co-mingled (Meffe & Vrijenhoek, 1988). Uninformed mixing of genetic stocks could promote outbreeding depression and the erosion of unique genetic diversity within river catchments. However, these risks must be weighed against the risk of local extirpation (Pavlova et al., 2017).
Given this study’s general and comparative nature, we refrain from designating populations within species as potential management units (MUs). However, species showing high levels of genetic structure (Table 2) should be assessed individually for such designation, which may require finer-scale, targeted sampling. Additional river- or sub-basin-specific management efforts could also be justified, given the presence of unique populations across multiple species (Hopken et al., 2013), specifically in the Little Red, North Fork, Buffalo, Upper Black, Current, and Spring rivers. These may represent evolutionarily significant catchments, underscoring the potential for community-level genetic assessment to elevate management to the ecosystem scale.
CONFLICT OF INTEREST
The authors declare that they have no competing interests.
AUTHOR CONTRIBUTIONS
ZDZ conceived the research with input from all authors. Specimen collection was done by ZDZ & TKC. ZDZ did laboratory work, bioinformatics, data analysis, and manuscript drafting. All authors contributed to interpretation of results, formulating conclusions, and critically revising the manuscript. MRD and MED administered funding through their University of Arkansas Endowments.
ACKNOWLEDGEMENTS
We thank M. Flurry, M. George, T. Goodhart, K. Hollar, and M. Reed, who assisted with DNA extractions. The Arkansas High-Performance Computing Center provided analytical resources. Funding was provided by the University of Arkansas Distinguished Doctoral Fellowship and Harry and Jo Leggett Chancellor’s Fellowship (ZDZ), the Bruker Professorship in Life Sciences (MRD), the Twenty-First Century Chair in Global Change Biology (MED), and an NSF Postdoctoral Research Fellowship in Biology (TKC) [DBI: 2010774]. The findings, conclusions, and opinions expressed in this article are those of the authors and do not necessarily reflect the views of the NSF or other affiliated or contributing organizations.