Gene flow between two thick-billed grasswren subspecies with low dispersal creates a genomic pattern of isolation-by-distance

Context In the era of the Anthropocene, habitat loss and environmental change threaten the persistence of many species. Genotyping-By-Sequencing (GBS) is a useful molecular tool for understanding how patterns of gene flow are associated with contemporary habitat distributions that may be affected by environmental change. Two parapatric subspecies of the threatened thick-billed grasswren (TBGW; Amytornis modestus) more frequently occur in different plant communities. As such, a preference for plant community type could reduce subspecific introgression and increase genetic diversity at the parapatric boundary. Aims We aimed to measure gene flow within and among two TBGW subspecies and tested whether divergent genomic markers were associated with plant community type. Methods We sequenced 118 individuals from either of the two TBGW subspecies or in the region of parapatry and identified 7583 SNPs through ddRADseq. Key results We found evidence of asymmetric gene flow and a genomic pattern of isolation-by-distance. There were sixteen genomic outliers correlated with plant community type (regardless of location). Conclusions These findings show that plant community type does not prevent introgression in one subspecies (A. m. raglessi), but low dispersal and habitat heterogeneity could contribute to the maintenance of distinct subspecific morphotypes. Local adaptation in different plant community types could also provide a mechanism for future divergence. Implications We suggest subspecific introgression could increase genetic variation and the adaptive potential of the species, facilitating species persistence under conditions of climate change. Introgression between grasswren subspecies Characterising gene flow facilitates conservation management. This study used genomic markers to measure gene flow between thick-billed grasswren subspecies and found results that support taxonomic identification of the two subspecies and suggests grasswrens have low dispersal and may benefit from increased genetic diversity. Recognition of models of divergence with gene flow will be necessary for future conservation management.

Finally, we used spatial autocorrelation (Smouse and Peakall 1999) in GenAlEx v6.5 307 to further evaluate spatial structure in the genetic data at an individual level. A 308 pairwise matrix with Roussets's a genetic distance (Rousset 1997;Rousset 2000) 309 between all individuals with the n-SNP dataset was calculated using SPAGeDi v1.4b 310 (Hardy and Vekemans 2002). Geographic distances between individuals were 311 calculated in GenAlEx using the same method to create the geographic distance 312 matrix for the mantel tests. Distance classes were sufficiently small enough to 313 evaluate any non-linear correlations with the autocorrelation coefficient (r) where the 314 sample size within each distance class was relatively even. We looked for the 315 presence of IBD within each distance class as well as the detectability of IBD across 316 multiple distance classes (Diniz-Filho and Pires de Campos Telles 2002). Significance 317 was assessed for both tests using 95% confidence intervals for the null hypothesis of 318 no spatial structure using 999 random permutations, and for estimates of r by 319 bootstrapping 999 pairwise comparisons for each distance class. 320 was <0.9 or >0.1 in any individual. For the STRUCTURE analysis, three replicate 335 runs for each K were analysed (as Standard Deviation of LnP(K) was small) using 336 default settings, unless stated. We used the admixture model with correlated allele 337 frequencies and an MCMC chain of 1,000,000 iterations with a burnin of 10,000 338 iterations to test K between 1 and 5. To estimate the probability of mixed ancestry for 339 each individual, the option ANCESTDIST was used. Admixture was inferred if the 340 confidence intervals of the individual population assignment did not include 1 or 0 in 341 all three replicate runs. STRUCTURE HARVESTER (Earl and vonHoldt 2011) was 342 used to estimate the best fitting value for K. When the highest LnP(K) was not K = 1, 343 then the most likely K was determined using Delta K (Evanno et al. 2005 We used samples from across three zones: zone A (n = 44), zone B (n = 61), and zone 370 AB (n = 13) to assess gene flow between the two parapatric TBGW subspecies. 371 Following DNA extractions, samples stored on FTA® cards produced considerably 372 lower quantities of DNA (<500 ng) compared to blood stored in salt solution (>1,000 373 ng). Following illumina sequencing, the average number of reads per sample (before 374 filtering) was 2,539,005 with a coefficient of variation of 24.6%. The average 375 between run reproducibility, calculated by determining when the genotype was the 376 same in duplicates on different plates, was 95.7% (n = 12,192 loci, excluding missing 377 genotypes). The average within run reproducibility, calculated by determining when 378 the genotype was the same in duplicates on the same plate, was 90.5% (n = 146,304 379 loci, excluding missing genotypes). The average genotyping error rate, calculated 380 from the number of allelic mismatches across duplicates, was 0.31% (n = 316,992 381 loci). There were no individuals that exceeded > 30% missing data; overall the dataset 382 contained 5.56% missing data. Following SNP calling in STACKS, we removed 5 383 loci that appeared in the negative control and 2929 loci that had low coverage across 384 samples. An initial PCA showed two putative genetic clusters with individuals from 385 zone A forming one cluster, individuals from zone B forming the second cluster and 386 19 potentially admixed individuals from zone AB and zone B ( Figure S1). We 387 observed similar amounts of missing data between the clusters, excluding potentially 388 admixed individuals (cluster 1 [zone A]: 6.41%, cluster 2 [zone B]: 6.58%). We 389 removed a further 625 loci from further analysis that did not conform to HWE and 390 5428 loci that could potentially introduce linkage disequilibrium. 391

Subspecies variation 392
The proportion of heterozygous SNPs per sample varied from 0.234 in a sample from 393 zone A to 0.301 in samples from both zone A and zone B. Mean ± SE estimates of 394 heterozygosity (He) were slightly higher for zone A and zone B compared to zone AB 395 (zone A = 0.303 ± 0.002, zone B = 0.304 ± 0.001; zone AB = 0.288 ± 0.002). The 1 4 number of private alleles within zone B was greater (n = 16) than for either zone A (n 397 = 1) or zone AB (n = 0). We identified 39 loci as potential F ST outliers under selection 398 which left 7543 loci that were treated as neutral loci not under selection. Therefore, 399 the dataset o-SNP contained 39 loci and the dataset n-SNP contained 7543 loci. Of the 400 39 outlier loci, nine loci were monomorphic in zone A; three loci were monomorphic 401 in zone B and four were monomorphic in zone AB. Of the four monomorphic loci in 402 zone AB, three were shared with the monomorphic outliers of zone A and one was 403 shared with the monomorphic outliers of zone B. Four outliers had hits to nucleotide 404 sequences from the zebra finch GenBank assembly (Table S2) but there were no 405 matches to protein sequences from the refseq assembly. These blast hits did not reveal 406 why there could be associations between outlier loci and plant community type. 407 Zone B had slightly more polymorphic loci in the n-SNP dataset (99.9%) compared to 408 zone A (99.2%). Using the n-SNP dataset, the proportion of total genetic variance was 409 shared among individuals and populations similarly when the region of parapatry 410 (zone AB) was combined with either zone A or zone B, or even when it was excluded 411 (Table 1). The proportion of variance in the case of n-SNP was greater among 412 individuals (mean 0.080%, p < 0.001) than among populations ( mean 0.008%, p < 413 0.001; (Table 1). Using the n+o-SNP dataset, the proportion of total genetic variance 414 explained by population was greater than that explained among individuals for all 415 three tests (A+AB v B, B+AB v A, A v B). This difference was greatest when zone 416 AB was excluded. When zone AB was not excluded the difference was greater when 417 combined with zone B (among individuals = 0.185%, p < 0.001; among individuals = 418 0.094%, p < 0.001). Using the n-SNP dataset, there was no difference in pairwise 419 estimates of F ST when the region of parapatry was combined with either zone A or 420 zone B (Table 2). The pairwise estimates of F ST using n+o-SNP was higher when 421 zone AB was combined with zone B (0.202, p < 0.001) compared to when zone AB 422 was combined with zone A (0.165, p < 0.001; Table 2). 423 where R 2 was also the highest and where ΔAIC C < 2 was for 3 sub-populations 443 (OOW, MTL, and MUR) to be potential outliers however this was not significant 444 (Table 3) Acacia spp and Rhagodia spinescens and was predominant in Zone B (Table S4). 490

Isolation-By-Distance
Using K = 2 output from structure, the LFMM analysis identified 328, 333 and 419 491 loci associated with PC1, PC2 and PC3 respectively. Of the 39 F ST outliers, there 1 7 were 12 loci that correlated with PC2 (two of these also correlated with PC3) and six 493 loci that correlated with PC3. No loci were found to correlate to PC1. The results 494 from BAYESASS suggested that zone AB received more migrants per generation 495 than zone A or zone B (Figure 6). Zone AB received more migrants per generation 496 from zone A than zone B; the mean ± SD migration from zone A = 21.0 ± 4.7% and 497 from zone B = 10.3 ± 4.6%. Zone B received some migration per generation from 498 zone A (4.5 ± 1.6 %), but zone A received < 1% migration per generation from either 499 zone B or zone AB. preference, landscape ecological productivity and stability, and mito-nuclear 620 incompatibilities. 621 This study shows that two parapatric TBGW subspecies introgressed and that gene 622 flow is asymmetric towards A. m. raglessi. Gene flow between the subspecies is 623 limited by distance probably due to the low dispersing ability of the species as well as 624 landscape heterogeneity. We suggest that these subspecies should be taxonomically 625 (and administratively) managed as distinct units despite considerable introgression. 626 Plant community type does not appear to limit geneflow nor does it provide a 627 mechanism for increased genetic diversity at the parapatric margin as was predicted. 628 However, adaptation to different plant community types suggests divergence with 629 gene flow could be a pathway towards increased genomic variation in the future. This 630 study provides an Australian arid zone example to show that gene flow between 631 subspecies can increase genetic variation within a species. Increased gene flow is 632 expected to facilitate persistence of the species through enhanced adaptive capacity. 633 Populations that contain distinct genomic variation should be managed separately 634 particularly in environments that are likely to be affected by future climate change. 635

Data availability statement 636
The data that support this study will be shared upon reasonable request to the 637 corresponding author. 638

Conflicts of Interest 639
The authors declare no conflicts of interest 640  Table S1. Numbers indicate sample size at that locality.     Table S1. The proportion of each colour shows the posterior mean proportion of ancestry from the subspecies A. m. indulkanna or western haplotype (dark grey) and A. m. raglessi or eastern haplotype (light grey).

Declaration of Funding
Individuals marked with an asterisk were identified as admixed.     Table 3. Fit of alternative isolation by distance models with and without putative outlier populations (see Figure 1 for population codes) n shows the number of populations in the model, AIC C shows the corrected Akaike's information criteria, ΔAIC C shows the difference in AIC C between alternative models. These values are used to assess the most likely model. Models are ranked from highest to lowest.