Whole-genome sequences suggest long term declines of spotted owl (Strix occidentalis) (Aves: Strigiformes: Strigidae) populations in California

We analyzed whole-genome data of four spotted owls (Strix occidentalis) to provide a broad-scale assessment of the genome-wide nucleotide diversity across S. occidentalis populations in California. We assumed that each of the four samples was representative of its population and we estimated effective population sizes through time for each corresponding population. Our estimates provided evidence of long-term population declines in all California S. occidentalis populations. We found no evidence of genetic differentiation between northern spotted owl (S. o. caurina) populations in the counties of Marin and Humboldt in California. We estimated greater differentiation between populations at the northern and southern extremes of the range of the California spotted owl (S. o. occidentalis) than between populations of S. o. occidentalis and S. o. caurina in northern California. The San Diego County S. o. occidentalis population was substantially diverged from the other three S. occidentalis populations. These whole-genome data support a pattern of isolation-by-distance across spotted owl populations in California, rather than elevated differentiation between currently recognized subspecies.


Introduction
Spotted owls (Strix occidentalis) in California are an important protected species that is declining throughout most of its range. Their status has enormous economic implications (Montgomery et al. 1994 (Barrowclough et al. 2011;Tempel et al. 2017). 40 Strix occidentalis occidentalis occurs from this line south to the southern Sierra Nevada and in mountainous areas of southern California, including the coastal ranges west of the Central Valley as far north as the Santa Lucia Mountains and Monterey Bay (Barrowclough et al. 1999(Barrowclough et al. , 2011(Barrowclough et al. p. 2017; Davis and Gould 2008;Gutiérrez et al. 2017). Strix occidentalis caurina currently has a threatened listing status under the United States of America Endangered Species Act as well as 45 the California Endangered Species Act. Strix occidentalis occidentalis is in decline throughout much of its range (Tempel et al. 2017). Although not presently listed under either endangered species act, S. o. occidentalis is currently designated by the California Department of Fish and Wildlife as a Species of Special Concern (Davis and Gould 2008). There have been multiple excellent studies of the population genetics of Strix occidentalis 50 caurina and S. o. occidentalis. Most of these relied on mitochondrial DNA (Barrowclough et al. 1999(Barrowclough et al. , 2005(Barrowclough et al. , 2011Haig et al. 2004), which comprises a single genetic locus that does not always accurately represent the history of the entire nuclear genome. Additional studies analyzed microsatellites (Funk et al. 2008), RAPD markers (Haig et al. 2001), or a combination of mitochondrial DNA and microsatellites (Miller et al. 2017), but these mostly focused on S. o. 55 caurina and northern populations of S. o. occidentalis. A whole genome analysis would test whether earlier conclusions based on mitochondrial genes and several nuclear markers are generalizable. By sequencing whole genomes, we can acquire data for thousands of independent loci and, even with one individual per population, we can obtain information for thousands of additional samples of the evolutionary history of a population as compared with previous S. 60 occidentalis genetic studies.
The barred owl (Strix varia) has invaded the entire range of Strix occidentalis caurina and is in contact with S. o. occidentalis from the Lassen area to the southern Sierra Nevada (Keane 2017). Strix occidentalis and S. varia hybridize, but the frequency and primary directionality of introgression is unknown. In order to detect introgression on the leading edge of 65 the S. varia expansion in California, we need a better understanding of the genetic variation present in pure S. occidentalis populations. The recently assembled genome of a Strix occidentalis caurina individual from Marin County, California (Hanna et al. 2017d(Hanna et al. , 2017c provides a useful resource for characterizing the genomic variation in S. occidentalis throughout California. Here we report data from high-70 coverage whole-genome sequences of that individual and new sequences of three additional S. occidentalis. Our sampling spans the northern and southern geographic extremes of S. o. caurina and S. o. occidentalis within California (Figure 1). We use these data to provide a broad-scale assessment of the genomic divergence of S. occidentalis in California and estimate long-term trends in population size. 75

Results
The nuclear reference genome contained 1,113,365,877 non-N nucleotides after masking. Our final set of variants included 486,582 biallelic sites. Across individuals, the mean coverage at the final set of variant sites ranged from approximately 33.2 to 56.7X (Table S1). Grouping the results by subspecies, we recovered a higher nucleotide diversity within Strix occidentalis 80 occidentalis (1.676 × 10 -4 ) than S. o. caurina (1.454 × 10 -4 ) in California (Table 1). We calculated an FST of 0.1727 for the two subspecies. When we treated each sample as a separate population, additional patterns emerged. We found no divergence (FST = 0) between northern spotted owl populations in California. The Marin County population had a lower nucleotide diversity (1.374 × 10 -4 ) than the Humboldt 85 County population (1.591 × 10 -4 ); between them, these two populations had the lowest nucleotide diversity (1.399 × 10 -4 ) of any population pair (Table 1) Across all individuals, the proportion of the genome comprised by runs of homozygosity (ROH) ranged from approximately 6.08-36.23%, with the lowest proportion found in the Humboldt County population and the highest in the San Diego County population (Table S2). We found the two longest tracts in the Marin and San Diego County populations (7.5 and 8.9 Mb, respectively) ( Figure 2). As the majority of the tracts across all individuals were less than 4 100 Mb long and none of the ROH exceeded 10 Mb (Figure 2), we found no evidence for very recent inbreeding (e.g. within the last three generations). Rather, the frequency and length of the ROH suggest low historic population sizes, especially in the Marin and San Diego County populations. Our estimates of the effective population size (Ne) of each population indicate that all four populations have undergone long-term population declines over the past approximately 70,000 105 generations. Of note, the Ne of the Strix occidentalis caurina populations grew about an order of magnitude larger approximately 3,000 to 700 generations ago before declining again ( Figure 3). Taken with the caveat that recent population size changes are difficult to estimate accurately using the SMC++ method or other similar methods, especially with low sample sizes (Terhorst et al. 2017

Discussion
Early genetic studies of spotted owls (Strix occidentalis) utilizing mitochondrial DNA (mtDNA) sequence data concluded that S. o. caurina and S. o. occidentalis are largely reciprocally monophyletic, but that there are migrants across the subspecies' boundaries and evidence of incomplete lineage sorting (Barrowclough et al. 1999(Barrowclough et al. , 2005Haig et al. 2004 (Gutiérrez et al. 1995), five (Barrowclough and Coats 1985;Barrowclough 160 et al. 1999), or ten years (Noon and Biles 1990; USDA Forest Service 1992) as the S. occidentalis generation time, the S. o. caurina effective population size trough and subsequent growth 3,000 generations ago could coincide roughly with the beginning of the retreat of glaciers in North America approximately 19,000-20,000 years ago after the last glacial maximum (Clark et al. 2009). However, it is important to consider that recent population crashes in some or all of 165 these populations (Gutiérrez 1994;Davis and Gould 2008;Dugger et al. 2015;Tempel et al. 2017) may be affecting our ability to estimate historic population sizes accurately (Terhorst et al. 2017 sequencing. Our whole-genome data show the San Diego County population being clearly diverged from the other three populations. The genetic diversity of the San Diego population was the lowest of the four. Previous investigators have reported low to no genetic diversity in the mitochondrial control region of Strix occidentalis occidentalis populations in southern California 185 (Barrowclough et al. 1999(Barrowclough et al. , 2005Haig et al. 2004). Haig et al. (2004) (Chi 2006) and 0.090 (Henke 2005) for mtDNA and microsatellite markers, respectively. A mtDNA study by Barrowclough et al. (2005)

Sequence data
We utilized whole genome sequencing data generated by a previous study (Hanna et al. 2017c 50 km northwest of the California and northern spotted owl contact zone identified by Miller et al. (2017).

Alignment and filtering
We used Trimmomatic version 0.36 (Bolger et al. 2014) along with relevant adapter files (Bolger et al. 2014;Hanna 2018a) to remove adapter sequences and low quality leading and 270 trailing bases. For all four samples we used BWA-MEM version 0.7.17-r1188 (Li 2013) to align the processed sequences to the S. o. caurina nuclear genome "StrOccCau_1.0_nuc.fa" (Hanna et al. 2017d(Hanna et al. , 2017c with the mitochondrial genome sequence from Hanna et al. (2017a) added to the nuclear genome reference sequence. We merged and sorted the alignments and then marked duplicate sequences using Picard version 2.17.6 (http://broadinstitute.github.io/picard; Accessed 275 2018 Mar 22).

SNP calling and filtering
We called variants using the Genome Analysis Toolkit (GATK) version 3.8.0 HaplotypeCaller tool (McKenna et al. 2010;DePristo et al. 2011;Van der Auwera et al. 2013) to produce a genomic variant call format (gVCF) file for each sample. We then used the GATK 280 version 3.8.0 GenotypeGVCFs tool to produce a VCF file containing the variant information for all samples. We used the GATK version 3.8.0 SelectVariants and VariantFiltration tools to filter the variants and remove those of suspect quality. We then followed an adapted version of the GATK guidelines for base quality score recalibration BQSR (https://gatkforums.broadinstitute.org/gatk/discussion/2801/howto-recalibrate-base-quality-285 scores-run-bqsr; Accessed 2018 Mar 15) and used the GATK version 3.8.0 BaseRecalibrator and PrintReads tools to recalibrate all of the alignment files based on our set of high-quality variants. We performed a new round of variant calling using the GATK version 3.8.0 HaplotypeCaller and GenotypeGVCFs tools. We filtered the variant file to retain only high-quality, biallelic single nucleotide polymorphisms (SNPs) that were not on the mitochondrial genome using the GATK 290 version 3.8.0 SelectVariants and VariantFiltration tools as well as GNU Awk (GAWK) version 4.2.0 (Free Software Foundation 2017). In order to utilize up-to-date repetitive sequence libraries to identify and remove variants in low-complexity and repetitive regions, we performed a new masking of repetitive sequences in the reference Strix occidentalis caurina genome StrOccCau_1.0_nuc.fa (Hanna et al. 2017d, 295 2017c) with the mitochondrial genome from Hanna et al. (2017a). We created a de novo model of the repeats in the reference genome using RepeatModeler version 1.0.8 , which utilized RepeatMasker version open-4.0.7 (Smit et al. 2013) with the Repbase RepeatMasker libraries version 20170127 (Jurka 1998(Jurka , 2000Jurka et al. 2005;Bao et al. 2015), RMBlast version 2.2.28 , tandem repeats finder (TRF) version 4.09 (Benson 300 1999), RECON version 1.08 (Bao and Eddy 2002), and RepeatScout version 1.0.5 (Price et al. 2005). We first performed a homology-based masking of repeats and low-complexity regions and then implemented a second round of masking using the de novo repeat model we had generated  (Eddy 1998; http://hmmer.org). We created a browser extensible data (BED) formatted file of the repeat and low-complexity regions using seqtk version 1.2-r94 , sorted them using sort (GNU coreutils) version 8.25 (Haertel and Eggert 2016), and then used BEDTools version 2.25.0 (Quinlan and Hall 2010) to remove variants from our VCF 310 file that fell in those regions. Finally, we removed variants at sites where the unfiltered read depth exceeded the mean plus five times the standard deviation, as suggested by the GATK documentation (https://software.broadinstitute.org/gatk/documentation/article.php?id=3225; Accessed 2018 Mar 16), using dp_cov_script.sh from SPOW-BDOW-introgression-scripts version 1.1.1 (Hanna et al. 2017b). We calculated the mean and standard deviation of the 315 coverage depth across variant sites using DP_sample_calc.sh from genetics-tools version 1.0.0 (Hanna 2018b).

Population size and diversity statistics
After compressing and indexing our variant and repetitive region files using the bgzip and tabix tools from HTSlib version 1.7 (Li 2011;, we treated each of the 320 four samples as its own population and estimated the effective population size of each through time using SMC++ version 1.12.2.dev10+gaf60af6 (Terhorst et al. 2017) with the per-generation rate of collared flycatcher (Ficedula albicollis) (Smeds et al. 2016). We filtered the SMC++ files using GAWK version 4.2.0 and the GNU core utilities find version 8.25 (Decker et al. 2016), zcat version 8.25 (Gailly et al. 2016), and rm version 8.25 (Rubin et al. 2016). We converted our 325 variant file format using BCFtools version 1.6 (Li et al. 2017) with HTSlib version 1.6 (Davies et al. 2017) and GAWK version 4.2.0. To estimate nucleotide diversity and population differentiation, we calculated πBet ween (Nei and Li 1979) and the genome-wide fixation index (FST) (Hudson et al. 1992) between Strix occidentalis caurina and S. o. occidentalis as well as between each of the population pairs using the tableFstPi script from diversity-divergence-stats version 330 1.0.0 (Wall and Hanna 2018) with cat (GNU coreutils) version 8.25 (Granlund and Stallman 2017). In addition, we calculated πWit hin (Nei and Li 1979) for each of the subspecies and populations using the tablePiTheta script from diversity-divergence-stats version 1.0.0 with cat version 8.25. In order to check for evidence of recent inbreeding that could serve as an

Additional information and declarations
345

Competing interests
The authors declare that no competing interests exist.  We list here the taxonomic identification of each sample, the county and state of origin, and the sample collection date.

Figures
385 Figure 1. California map of the collection locations of the Strix occidentalis specimens from which we obtained whole-genome sequences for this study. The yellow shading illustrates the distribution of Strix occidentalis in California. We have indicated the locations of the Pitt River and Lassen Peak. Owls to the north of the region between these two geographic features are considered northern spotted owls (S. o. caurina) while those to the south of this region are 390 California spotted owls (S. o. occidentalis) (Barrowclough et al. 2011;Tempel et al. 2017).

Supporting information
575 Table S1. Variant site coverage values for each sample. The site coverage is the average sequence coverage across all variant sites in our final variant set after filtering. "SD" stands for "standard deviation". Table S2. Homozygous fraction of genome. The total length of all runs of homozygosity 580 (ROH) expressed as a fraction of the length of the reference genome for each population. Table S3. Specimen institution data. We here provide information regarding the collections that archive specimens of the Strix occidentalis individuals we utilized in this study. Table S4. Genomic sequence details. We here provide sample identifier information, including 585 the NCBI Sequence Read Archive (SRA) run accessions of the raw genomic sequences we analyzed in our study.