Abstract
The vaquita is the most critically endangered marine mammal, with fewer than 19 remaining in the wild. First described in 1958, the vaquita has been in rapid decline resulting from inadvertent deaths due to the increasing use of large-mesh gillnets for more than 20 years. To understand the evolutionary and demographic history of the vaquita, we used combined long-read sequencing and long-range scaffolding methods with long- and short-read RNA sequencing to generate a near error-free annotated reference genome assembly from cell lines derived from a female individual. The genome assembly consists of 99.92% of the assembled sequence contained in 21 nearly gapless chromosome-length autosome scaffolds and the X-chromosome scaffold, with a scaffold N50 of 115 Mb. Genome-wide heterozygosity is the lowest (0.01%) of any mammalian species analyzed to date, but heterozygosity is evenly distributed across the chromosomes, consistent with long-term small population size at genetic equilibrium, rather than low diversity resulting from a recent population bottleneck or inbreeding. Historical demography of the vaquita indicates long-term population stability at less than 5000 (Ne) for over 200,000 years. Together, these analyses indicate that the vaquita genome has had ample opportunity to purge highly deleterious alleles and potentially maintain diversity necessary for population health.
Introduction
In the afternoon of November 4, 2017, an adult female vaquita porpoise (Phocoena sinus), the smallest and rarest cetacean in the world, was captured in a massive effort to save the species by bringing into captivity as many as possible of the estimated maximum of 30 remaining individuals at the time (Thomas et al., 2017). This represented only the second live capture of a vaquita ever, the first of which, just a few weeks earlier, resulted in release of the animal after only hours when it showed signs of continuing stress. Despite the efforts of an international team of scientists and experts in porpoise capture and care, the second captured vaquita (V02F) suffered stress-induced cardiac failure and died approximately seven hours after initial capture (Rojas-Bracho et al., 2019). That death ended the effort by the Vaquita Conservation, Protection, and Recovery (VaquitaCPR) project to temporarily protect vaquita near their native habitat in the northern Gulf of California, near San Felipe, Mexico. However, the careful planning and presence of veterinary experts in marine mammal stranding response allowed for an immediate necropsy that went through the night, with harvest and storage of ovaries and other tissues for delivery to facilities 260 miles north near San Diego, California for tissue culture and cryopreservation. By eight p.m. the next day, within 24 hours of the animal’s cardiac arrest, the tissues were delivered to the Institute for Conservation Research, San Diego Zoo Global, for the culture of cells from as many tissues as possible. After weeks of tissue culture, cells were harvested and banked for future research, and frozen samples were sent to the Vertebrate Genome Lab at The Rockefeller University to extract ultra-high molecular weight DNA and RNA for genome sequencing, assembly and transcriptome annotation.
This extraordinary effort to extract as much information as possible from the VaquitaCPR project reflects the broad scientific value placed on biodiversity and conservation. Sequencing of reference genomes is increasingly recognized as an important contribution to identify, characterize and conserve biodiversity (Garner et al., 2016; Harrisson, Pavlova, Telonis-Scott, & Sunnucks, 2014; He, Johansson, & Heath, 2016; Kraus et al., submitted; Morin et al., in revision; Supple & Shapiro, 2018), especially for species that are naturally rare and difficult to study. Reference genomes provide primary data to understand evolutionary relationships (Arnason, Lammers, Kumar, Nilsson, & Janke, 2018; Zhou et al., 2018), historical demography (Armstrong et al., 2019; Andrew D Foote et al., 2016; Morin et al., 2018a; Robinson et al., 2016; Westbury, Petersen, Garde, Heide-Jorgensen, & Lorenzen, 2019), evolution of genes and traits (Autenrieth et al., 2018; Fan et al., 2019; A. D. Foote et al., 2015; Morin et al., in revision; Springer et al., 2016a; Springer, Starrett, Morin, Hayashi, & Gatesy, 2016b; Yim et al., 2014) and susceptibility to inbreeding and outbreeding depression (Chattopadhyay et al., 2019; Hedrick, Robinson, Peterson, & Vucetich, 2019; Robinson, Brown, Kim, Lohmueller, & Wayne, 2018; Tunstall et al., 2018). Genomic resources also provide the tools for broader studies of population structure, relatedness and potential for recovery (e.g., Garner et al., 2016; Morin et al., 2018b; Tunstall et al., 2018).
The vaquita was described for the first time in 1958 (Norris & McFarland, 1958) and has been characterized as a naturally rare endemic species, limited to shallow, turbid and highly productive habitat in the upper Gulf of California between Baja California and mainland Mexico (Rodriguez-Perez, Aurioles-Gamboa, Sanchez-Velasco, Lavin, & Newsome, 2018). The vaquita’s closest relatives are the congeneric Burmeister’s (P. spinipinnis) and spectacled (P. dioptrica) porpoises, which are found only in temperate and cold waters in the Southern Hemisphere, separated by at least 5000 km of ocean and two million years of divergence (Ben Chehida et al., in revision; McGowen, Spaulding, & Gatesy, 2009; Rosel, Haygood, & Perrin, 1995). Similar to other porpoises, vaquitas become entangled and die in gillnets set for finfish and shrimp (Rojas-Bracho & Reeves, 2013). The mortality rate was known to be unsustainable when studies on the bycatch rate (D’Agrosa, Lennert-Cody, & Vidal, 2000) and life history (Hohn, Read, Fernandez, Vidal, & Findley, 1996) were combined with the first abundance estimate of N=567 individuals (95% C.I. = 177-1073) in 1997 (Armando M. Jaramillo-Legorreta, Rojas-Bracho, & Gerrodette, 1999). The rate of decline has increased since approximately 2011 due to entanglement in illegal gillnets targeting totoaba (Totoaba macdonaldi), a large fish approximately the same size as the vaquita, captured for the black market trade of their swim bladders in China (Rojas-Bracho et al., 2019). The most recent estimates from 2018 indicate that fewer than 19 vaquita survive (A. M. Jaramillo-Legorreta et al., 2019). Initial genetics studies found no variation in mitochondrial DNA (mtDNA; Rosel & Rojas-Bracho, 1999) and low variation in the MHC DRB locus (Munguia-Vega et al., 2007). 
These authors have suggested that the low genetic diversity is due to long-term low effective population size (Ne) rather than to a recent bottleneck or the current rapid population decline (Munguia-Vega et al., 2007; Rojas-Bracho & Taylor, 1999; B. L. Taylor & Rojas-Bracho, 1999), but these data from few loci provide limited power to estimate timing or duration of demographic changes.
As part of the effort to prevent extinction of the vaquita and to further develop genomic resources to facilitate conservation and management planning for this and other endangered species, we used the Vertebrate Genomes Project (VGP) pipeline to generate a chromosomal-level, haplotype-phased reference vaquita genome assembly that exceeds the “platinum-quality” reference standards established by the VGP (Rhie et al., 2020a). The VGP standards are guidelines to ensure minimum error rates (QV40 or higher, or no more than 1 nucleotide error per 10,000 bp), highly contiguous and complete assemblies (contig N50 ≥ 1 Mb; chromosomal scaffold N50 ≥ 10 Mb), phasing of paternal and maternal haplotypes to reduce false gene duplication errors and manual curation to reduce errors and improve genome assembly quality. Based on the reference-quality assembly, we analyzed genomic diversity and historical demography to infer the cause of current low genomic diversity and whether genetic factors should be considered to be of concern for recovery if the immediate reason for decline, incidental bycatch in gillnets, can be halted in time to prevent extinction.
Materials and Methods
Genome data generation
Skin, mesovarium, kidney, trachea, and liver tissues were obtained during necropsy of the adult female vaquita that died during an attempt to begin ex-situ protection from illegal fishing operations (Rojas-Bracho et al., 2019). Cells were harvested and cultured at the Institute for Conservation Research, San Diego Zoo Global (Frozen Zoo®). From these cells, we generated a reference quality genome using the VGP pipeline 1.5 (Rhie et al., 2020a). In particular, we collected four genomic data types: Pacific Biosciences (Menlo Park, CA, USA) continuous long reads (CLR), 10X Genomics (Pleasanton, CA, USA) linked-reads, Bionano Genomics, Inc. (San Diego, CA, USA) DLS optical maps, and Arima Genomics, Inc. (San Diego, CA, USA) v1 Hi-C data. From one tube containing ~4 million cells in XPBS buffer with 10% DMSO and 10% glycerol, ultra-high molecular weight DNA (uHMW DNA) was extracted using the agarose plug Bionano Genomics protocol for Cell Culture DNA Isolation (Bionano Genomics, document No. 30026F). uHMW DNA quality was assessed by a Pulsed Field Gel assay and quantified with a Qubit 2 Fluorometer. From these extractions, 10 μg of uHMW DNA was sheared using a 26G blunt end needle (PacBio protocol PN 101-181-000 Version 05). A large-insert PacBio library was prepared using the Pacific Biosciences Express Template Prep Kit v1.0 (PN 101-357-000) following the manufacturer protocol. The library was then size selected (>20 kb) using the Sage Science BluePippin Size-Selection System and sequenced on 30 PacBio 1M v3 SMRT cells on the Sequel I instrument with the sequencing kit 3.0 (PN 101-597-800) and 10 hours movie. We used the same unfragmented DNA to generate a linked-reads library on the 10X Genomics Chromium system (Genome Library Kit & Gel Bead Kit v2, PN 120258, Genome HT Library Kit & Gel Bead Kit v2, PN 120261, Genome Chip Kit v2, PN 120257, i7 Multiplex Kit, PN 120262). We sequenced this 10X Genomics library on an Illumina NovaSeq S4 150 bp PE lane.
An aliquot of the same DNA was labeled for Bionano Genomics optical mapping using the Bionano Prep Direct Label and Stain (DLS) Protocol (document No. 30206E) and run on one Saphyr instrument chip flowcell. Hi-C reactions were performed by Arima Genomics according to the protocols described in the Arima-HiC kit (PN A510008). After the Arima-HiC protocol, Illumina-compatible sequencing libraries were prepared by first shearing purified Arima-HiC proximally-ligated DNA and then size-selecting DNA fragments from ~200-600 bp using SPRI beads. The size-selected fragments were then enriched for biotin and converted into Illumina-compatible sequencing libraries using the KAPA Hyper Prep kit (PN KK8504). After adapter ligation, DNA was PCR amplified and purified using SPRI beads. The purified DNA underwent standard QC (qPCR and Bioanalyzer (Agilent)) and was sequenced on the Illumina HiSeq X to ~60X coverage following the manufacturer’s protocols.
Transcriptome data generation
Total RNA extraction and purification was conducted with the QIAGEN RNeasy kit (PN 74104). The quality and quantity of all RNAs were measured using a Fragment Analyzer (Agilent Technologies, Santa Clara, CA) and a Qubit 2.0 (Invitrogen). PacBio Iso-seq libraries were prepared according to the ‘Procedure & Checklist - Iso-Seq™ Template Preparation for Sequel® Systems’ (PN 101-763-800 Version 01). Briefly, cDNA was reverse transcribed using the NEBNext® Single Cell/Low Input cDNA Synthesis & Amplification Module (NEB E6421S) from 238 ng total RNA. Amplified cDNA was cleaned with 86 μl ProNex beads. The PacBio Iso-seq library was sequenced on one PacBio 8M (PN 101-389-001) SMRT Cell on the Sequel II instrument with sequencing kit 1.0 (PN 101-746-800) using the Sequel II Binding Kit 1.0 (PN 101-726-700) and 30 hours movie with two hours pre-extension.
The same RNA was used for mRNA-seq. The RNA-Seq library was prepared with 100 ng total RNA using the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB, PN E7490S) followed by NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (PN E7760S). The library was then amplified over 14 cycles. Library quantification and qualification were performed with the Invitrogen Qubit dsDNA HS Assay Kit (PN Q32854). Libraries were sequenced on the Illumina NextSeq 500 in 150PE mid-output mode (Rockefeller Genomics Center). Data quality control was done using fastQC (v0.11.5; https://qubeshub.org/resources/fastqc).
Genome assembly and annotation
We assembled the vaquita genome using the VGP 1.5 pipeline on the DNAnexus cloud computing system (https://platform.dnanexus.com/). Briefly, this pipeline is composed of an assembly step, scaffolding step and final polishing step. First, we assembled raw PacBio data with Falcon 2.0.0/Falcon-unzip 1.1.0 (Chin et al., 2016). Then, we polished the primary and alternate contigs using the same PacBio reads with arrow (PacBio smrtanalysis 6.0.0.47841). Prior to scaffolding, we detected and reassigned haplotype duplicated contigs in the primary contig set using purge_haplotig 1.0.4 (Roach, Schmidt, & Borneman, 2018) and we also extracted the mitochondrial reads to assemble the mitochondrial sequence (Formenti et al., in prep). From this step, we only scaffolded the primary contigs using 10X Genomics data with scaff10x 4.1 (https://github.com/wtsi-hpag/Scaff10X), Bionano CMAP with Bionano Hybrid Solve 3.3_10252018 (Bionano Genomics) and Hi-C data with Salsa 2.2 (Ghurye, Pop, Koren, Bickhart, & Chin, 2017). Finally, the resulting primary scaffolds and alternate contigs were processed together through three polishing rounds: one additional round of arrow polishing and two rounds of polishing using 10X Illumina data mapped with Long Ranger 2.2.2 (https://github.com/10XGenomics/longranger) and base calling with FreeBayes 1.2.0 (Garrison & Marth, 2012). Primary scaffolds and alternate contigs were contamination checked and curated manually using gEVAL (Chow et al., 2016). For the primary assembly, this resulted in a further reduction of scaffold numbers by 11% and an increase of the scaffold N50 by 12% to 115 Mb. The primary and associated alternate assemblies were submitted to NCBI (accession GCA_008692025.1), and annotation was performed through their standard pipeline incorporating our RNA-seq and Iso-seq data (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/).
The primary assembly was screened for repetitive elements using RepeatMasker v4.0.5 (Smit, Hubley, & Green, 2013-2015) and the RepeatMasker combined database Dfam_Consensus-20181026. Base accuracy (QV) was measured using k=21 with Merqury (Rhie, Walenz, Koren, & Phillippy, 2020b). Gene content of the primary scaffolds was assessed using BUSCO v3.1.0 (Waterhouse et al., 2017) searches of the Laurasiatheria and mammalian gene set databases.
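The quality value (QV) is Phred-scaled, so it maps to a per-base error probability as QV = −10 log10(error). The short Python sketch below shows only this generic conversion, which links the VGP threshold of QV40 to one error per 10,000 bp; it does not reproduce Merqury's k-mer-based estimator, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


# QV40 is the VGP minimum; QV40.88 is the value reported for this assembly.
for qv in (40.0, 40.88):
    errors_per_10kb = qv_to_error_rate(qv) * 10_000
    print(f"QV{qv}: {errors_per_10kb:.2f} errors per 10,000 bp")
```

Under this relationship, QV40 corresponds to 1 error per 10,000 bp and QV40.88 to about 0.82.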
Historical demography
To conduct analysis of historical demography using pairwise sequentially Markovian coalescent (PSMC; Li & Durbin, 2011), we first generated a diploid consensus genome from the 10X Genomics paired-end reads aligned to the primary haplotype assembly (Armstrong et al., 2019). The reads were trimmed with the BBduk function of BBTools (sourceforge.net/projects/bbmap/), removing the first 22 nucleotides of the R1 reads introduced during the Chromium library preparation (https://support.10xgenomics.com/genome-exome/library-prep/doc/technical-note-assay-scheme-and-configuration-of-chromium-genome-v2-libraries) and trimming all reads for average quality (q≥20), 3’ ends trimmed to q≥15 and minimum length (≥40 nucleotides). Unpaired reads were removed from the trimmed fastq files using the BBTools repair.sh function. Trimmed reads were aligned to the vaquita mitogenome (accession CM018178.1) using BWA mem (Li & Durbin, 2009), and the unmapped reads exported as reads representing only the nuclear genome. Nuclear reads were aligned to the primary haplotype assembly (accession GCA_008692025.1), and duplicate reads removed using Picard-Tools (http://broadinstitute.github.io/picard/). The resulting genome alignments from four 10X Genomics libraries were assessed for average depth of coverage using ANGSD (Korneliussen, Albrechtsen, & Nielsen, 2014), and combined for 47.8X average depth of coverage. From this coverage pile-up, the diploid consensus genome was extracted (Li & Durbin, 2011) and used as input for PSMC with generation time of 11.9 years based on the estimated generation time of harbor porpoise (Barbara L Taylor, Chivers, Larese, & Perrin, 2007), and an autosomal mutation rate (μA) of 1.08 × 10⁻⁸ substitutions per nucleotide per generation (Dornburg, Brandley, McGowen, & Near, 2012).
PSMC atomic time intervals were combined as suggested by the authors (https://github.com/lh3/psmc) such that after 20 rounds of iterations, at least ~10 recombinations are inferred to occur in the intervals each parameter spans: p = (8+23*2+9+1). The remaining parameters were left as the default values used for humans (Li & Durbin, 2011), and we performed 100 bootstrap resamplings on all PSMC analyses to assess variance of the model.
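To illustrate how the interval pattern is read, the following Python sketch expands the string 8+23*2+9+1 into the number of atomic intervals spanned by each free parameter, then applies the scaling commonly used to convert PSMC output to years and Ne. The function names and the example theta0, t_k and lambda_k values are hypothetical, and the scaling assumes the default 100 bp bin size of the PSMC input format; it is a sketch, not the code used for the analysis.

```python
def parse_psmc_pattern(pattern: str) -> list[int]:
    """Expand a PSMC -p pattern into atomic intervals spanned per parameter.

    A term "n*w" means n parameters that each span w atomic intervals;
    a plain term "w" means one parameter spanning w intervals.
    """
    spans = []
    for term in pattern.split("+"):
        if "*" in term:
            repeats, width = (int(x) for x in term.split("*"))
            spans.extend([width] * repeats)
        else:
            spans.append(int(term))
    return spans


def scale_psmc(theta0, t_k, lambda_k, mu=1.08e-8, generation=11.9, bin_size=100):
    """Scale one PSMC output interval to (years before present, Ne).

    Assumes N0 = theta0 / (4 * mu * bin_size), time = 2 * N0 * t_k generations
    and Ne = N0 * lambda_k.
    """
    n0 = theta0 / (4 * mu * bin_size)
    years = 2 * n0 * t_k * generation
    ne = n0 * lambda_k
    return years, ne


spans = parse_psmc_pattern("8+23*2+9+1")
print(len(spans), "free parameters over", sum(spans), "atomic intervals")

# Hypothetical values chosen so that N0 = 1000; not taken from the analysis.
years, ne = scale_psmc(theta0=4.32e-3, t_k=0.5, lambda_k=2.0)
print(round(years), round(ne))
```

Read this way, the pattern defines 26 free parameters across 64 atomic intervals, with the oldest and most recent intervals merged more coarsely than the middle of the time range.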
Genome-wide heterozygosity
The distribution of heterozygosity across the genome was determined using previously described analysis pipelines (Robinson et al., 2019). Briefly, we used HaplotypeCaller in the Genome Analysis Toolkit (GATK; McKenna et al., 2010) to call genotypes from the short-read pile-up (above), filtering out sites with <1/3X or >2X the average depth of coverage. Heterozygosity was calculated as the number of heterozygous sites divided by the total number of called genotypes in nonoverlapping 1Mb windows across each scaffold.
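The windowed calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published pipeline: genotype calls are assumed to have been reduced to hypothetical (scaffold, position, depth, is_het) tuples, and the depth filter keeps sites between one third and twice the average depth.

```python
from collections import defaultdict

WINDOW = 1_000_000  # nonoverlapping 1 Mb windows


def passes_depth_filter(depth: float, mean_depth: float) -> bool:
    """Keep sites with depth between 1/3X and 2X the average coverage."""
    return mean_depth / 3 <= depth <= 2 * mean_depth


def windowed_heterozygosity(calls, mean_depth, window=WINDOW):
    """Return {(scaffold, window_index): heterozygous sites / called sites}.

    `calls` is an iterable of (scaffold, pos, depth, is_het) tuples with
    1-based positions (a hypothetical input format).
    """
    het = defaultdict(int)
    called = defaultdict(int)
    for scaffold, pos, depth, is_het in calls:
        if not passes_depth_filter(depth, mean_depth):
            continue
        key = (scaffold, (pos - 1) // window)
        called[key] += 1
        het[key] += int(is_het)
    return {key: het[key] / called[key] for key in called}


# Toy calls; the last two fail the depth filter at 47.8X average coverage.
toy_calls = [
    ("chr1", 1, 48, False),
    ("chr1", 500, 50, True),
    ("chr1", 1_000_000, 40, False),
    ("chr1", 1_000_001, 45, True),
    ("chr1", 10, 5, True),
    ("chr1", 20, 200, True),
]
result = windowed_heterozygosity(toy_calls, mean_depth=47.8)
print(result)
```

Dividing by the number of called genotypes rather than the window length keeps windows with many filtered sites from appearing artificially depleted in heterozygosity.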
Modeling demographic effects on heterozygosity
A coalescent simulation was constructed to estimate recent effective population size (rNe), historical effective population size (hNe) and time since a bottleneck (b) in which the population reduced in size from hNe to rNe. The analysis computed the likelihood of the empirical distribution of the number of heterozygous sites per kb (Hkb) observed in 2244 1 Mb windows in the vaquita genome (from above) given similar distributions drawn from an equivalent genome arising from random draws of each of these parameters, which were sampled as:
We initially drew 50,000 random values from these distributions. We then randomly selected 20,000 of these values where average growth rates ((rNe / hNe) / b) were less than 1.06, as values above this were considered to be biologically improbable (B. L. Taylor et al., 2019).
For each of the 20,000 scenarios, we generated one million independent SNPs for a single individual with a mutation rate of 1.08 × 10⁻⁸ substitutions/site/generation and a generation time of 11.9 years. To capture variability in the coalescent, we ran 4488 replicates of each scenario, which was twice the number of ~1 Mb windows in the empirical vaquita genome. This ensured that we could produce enough random sets of 2244 1 Mb windows from which to compute the scenario likelihoods as described below. The simulations were run with fastsimcoal v2.6.2 (Excoffier, Dupanloup, Huerta-Sanchez, Sousa, & Foll, 2013) through the R package strataG (v4.9.05).
For each of the 4488 replicates of one million SNPs in a scenario, we calculated the number of heterozygous SNPs per kb (H’kb). We then randomly drew 2244 values of H’kb without replacement to represent one simulated genome for this scenario. We fit a gamma distribution to these values, which was used to compute the negative sum of log-likelihoods (−logL) of the empirical Hkb from the vaquita genome. For each scenario, we repeated this random draw of 2244 values of H’kb and computation of −logL 100 times and recorded the mean and standard deviation of −logL. Likelihoods were plotted as heatmaps of the LOESS smoothed fit of −logL across pairs of simulation parameters. LOESS models were fit to each pair of parameters separately, and the surfaces represent the predicted −logL of 100,000 (10,000 x 10,000) evenly spaced points across each plot.
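The scoring of a single scenario can be illustrated with the Python sketch below. It rests on assumptions that are not in the text above: the gamma distribution is fit by the method of moments (the fitting method is not stated), windows with zero heterozygous sites are floored at a small positive value so that the log density is defined, and the data are toy values drawn from known gamma distributions rather than simulation output. All function names are hypothetical.

```python
import math
import random
import statistics


def fit_gamma_moments(values):
    """Method-of-moments gamma fit; returns (shape, scale). An assumed method."""
    mean = statistics.fmean(values)
    var = statistics.variance(values)
    return mean * mean / var, var / mean


def gamma_logpdf(x, shape, scale):
    """Log density of a gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - shape * math.log(scale) - math.lgamma(shape))


def neg_log_likelihood(observed, shape, scale, floor=1e-6):
    """-logL of the observed values; zeros are floored (a hypothetical choice)."""
    return -sum(gamma_logpdf(max(x, floor), shape, scale) for x in observed)


def scenario_score(simulated, observed, n_windows, n_draws, rng):
    """Mean and SD of -logL over repeated draws of one simulated genome."""
    scores = []
    for _ in range(n_draws):
        genome = rng.sample(simulated, n_windows)  # draw without replacement
        shape, scale = fit_gamma_moments(genome)
        scores.append(neg_log_likelihood(observed, shape, scale))
    return statistics.fmean(scores), statistics.stdev(scores)


# Toy stand-ins: 400 replicates, 200 windows, 20 draws (the study used
# 4488 replicates, 2244 windows and 100 draws).
rng = random.Random(1)
observed = [rng.gammavariate(2.0, 0.05) for _ in range(200)]
good = [rng.gammavariate(2.0, 0.05) for _ in range(400)]
poor = [rng.gammavariate(2.0, 0.5) for _ in range(400)]
good_mean, good_sd = scenario_score(good, observed, 200, 20, rng)
poor_mean, poor_sd = scenario_score(poor, observed, 200, 20, rng)
print(good_mean < poor_mean)
```

A scenario whose simulated windows resemble the empirical distribution yields a lower mean −logL than a mismatched one, which is the quantity the heatmaps summarize across parameter pairs.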
Results
A highly contiguous assembly of the vaquita genome
We assembled a 2.37 Gb genome (Table 1) in only 64 scaffolds, of which 21 represented arm-to-arm autosomes, named according to synteny with the blue whale (Balaenoptera musculus) and the X chromosome, in agreement with the 22-chromosome karyotype. The remaining 42 unplaced scaffolds consisted of only 1.98 Mb combined (0.08% of the total length), meaning that 99.92% of the assembled sequence has been assigned to chromosomes. Consistent with this mostly complete assembly, the N50 contig value was 20.22 Mb (273 contigs), N50 scaffold was 115.47 Mb, and base call accuracy was QV40.88 (0.82 errors per 10,000 bp). There were only 208 gaps, of which the annotated chromosomes had 3-17 gaps each. The Hi-C heat-map showing genomic interactions (Figure 1) indicates strong agreement between the close interactions and chromosome-length scaffolds. The alternate haplotype contigs are made up of 1 Gb of the genome, indicating low heterozygosity. Depth of coverage for each data type is presented in Table 2. Assemblies of both primary and alternate haplotypes have been deposited at DDBJ/ENA/GenBank under the accessions VOSU00000000 (principal haplotype) and VOSV00000000 (alternate haplotype) in BioProjects PRJNA557831 and PRJNA557832, respectively.
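For readers unfamiliar with the contiguity statistic, N50 is the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. The Python sketch below computes it for a list of lengths; the function name and the toy values are illustrative and are not the vaquita scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the sum to half the total.
        if running * 2 >= total:
            return length


# Toy lengths (arbitrary units): total is 32, so half is 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because the statistic depends only on the longest sequences, a few chromosome-length scaffolds dominate it even when many short unplaced scaffolds remain.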
BUSCO analysis showed 89.9% and 91.6% gene content identification from the primary haplotype when compared to the Laurasiatheria and mammalian data sets, respectively, with only 1.0 and 1.1% of the complete genes duplicated, respectively, and 4.3 and 4.6% fragmented (Supplemental Table S1). Genome annotation identified 26,497 genes and pseudogenes, 19,069 of which are protein coding (Table S2). The cumulative number of genes with alignment to the UniProtKB/Swiss-Prot curated proteins was 18,748 (89%) at ≥90% coverage of the target protein. This coverage was 5-48% higher than the number of genes aligned from other annotated cetacean genomes (Table S2). Similar to other cetacean genomes (e.g., Fan et al., 2019; Keane et al., 2015; Tollis et al., 2019), the vaquita genome consisted of about 46% repeats (Table 3) based on RepeatMasker.
Low heterozygosity of the vaquita genome
Genome-wide heterozygosity was 0.0105% overall, with even distribution of heterozygosity across the genome (Figure 2A). Heterozygosity per 1 Mb window ranged from 0 to 1.2/kb, but only two (noncontiguous) windows out of 2247 had no heterozygotes, and the standard deviation of heterozygosity across the windows was very low (SD = 0.0000767). None of the 1 Mb windows had heterozygosity of >1.3/kb, and 94% of the windows had heterozygosity of <0.2/kb (Figure 2B). In comparison to other mammals, the vaquita genome exhibits the lowest heterozygosity yet detected in an outbreeding mammalian species (Figure 3), with the exception of the San Nicolas Island fox (Urocyon littoralis), an endemic subspecies found only on a 58 km² island approximately 100 km off the coast of California, with an estimated population size of about 500 individuals (Robinson et al., 2016). However, unlike the vaquita, heterozygosity is not evenly distributed across the genome in the San Nicolas Island fox and other small inbred populations of canids, due to the effects of recent inbreeding in addition to long-term small population sizes (Robinson et al., 2019).
Vaquita population size over time
This low, relatively even heterozygosity across the vaquita genome could be indicative of a long-term small, outbred population (Robinson et al., 2019; Westbury et al., 2019). To test this hypothesis, we performed PSMC analysis. The results indicate that the vaquita effective population size has been small, ranging from about 1,400 to 3,200 for most of the last ~300,000 years (Figure 4A). This finding corroborates previous conclusions based on single-locus analyses (Munguia-Vega et al., 2007; B. L. Taylor & Rojas-Bracho, 1999) but extends the duration of persistence of the species at low Ne to the mid Pleistocene, prior to the penultimate glacial period, the Saalian, which lasted from approximately 300,000 to 130,000 years ago.
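As a back-of-envelope check that was not part of the analyses above, the standard neutral expectation at mutation-drift equilibrium, H ≈ θ = 4Neμ, can be inverted to ask what Ne the observed heterozygosity implies. The Python sketch below uses the genome-wide heterozygosity, mutation rate and generation time given in this paper; the function name is illustrative and the calculation assumes neutrality and a constant population size.

```python
def equilibrium_ne(heterozygosity: float, mutation_rate: float) -> float:
    """Invert H ~ theta = 4 * Ne * mu for a diploid population at equilibrium."""
    return heterozygosity / (4 * mutation_rate)


H = 0.000105          # genome-wide heterozygosity (0.0105%)
MU = 1.08e-8          # substitutions per nucleotide per generation
GENERATION = 11.9     # years per generation

ne = equilibrium_ne(H, MU)
generations = 300_000 / GENERATION

print(f"Implied equilibrium Ne: {ne:.0f}")
print(f"Generations in ~300,000 years: {generations:.0f}")
```

The implied value, roughly 2,400, falls inside the range of about 1,400 to 3,200 inferred by PSMC, which is what would be expected if the current heterozygosity reflects long-term small size rather than a recent collapse.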
Discussion
We have assembled the most complete cetacean genome to date, as measured by the low number of scaffolds, small number of gaps per chromosome scaffold, high percentage of scaffolds assigned to 22 chromosomes, cumulative number of genes with an alignment to the UniProtKB/Swiss-Prot curated proteins and small amount of missing data. Identification of gene content was also in the expected range for a high-quality mammalian genome at 90.5% of complete single-copy genes from the BUSCO mammalian gene set, with a low level of false duplicates and low levels of fragmented genes.
The PSMC analysis indicates that the vaquita population declined during the late Pleistocene, most likely due to climate change and the associated habitat changes in the eastern North Pacific coastal regions of North and Central America, and that it remained small over the last approximately 300,000 years. PSMC results can be affected by population structure, inbreeding, changes in connectivity among populations and stochastic variation in coalescent events when diversity is low (Beichman, Phung, & Lohmueller, 2017; Li & Durbin, 2011; Mazet, Rodriguez, Grusea, Boitard, & Chikhi, 2016; Orozco-terWengel, 2016). The coalescent results are consistent with the PSMC-inferred historical demography being the most likely cause of current heterozygosity levels rather than a recent severe bottleneck or inbreeding. Importantly, the duration of the small population size indicates that the observed level of heterozygosity is the result of a population at genetic equilibrium, where mutations are balanced by drift and selection, and that highly deleterious mutations are likely to have been purged from the population (Day, Bryant, & Meffert, 2003; Dussex et al., in revision; Robinson et al., 2018; Westbury et al., 2018; Westbury et al., 2019).
Examples of species with low diversity but long-term viability and potential for adaptability are becoming more common (Dussex et al., in revision; Andrew D Foote et al., 2019; Robinson et al., 2018; Westbury et al., 2018; Westbury et al., 2019; Xue et al., 2015). Among odontocetes (toothed whales, dolphins and porpoises), in particular, there are examples of species with nearly as low diversity as the vaquita that exhibit strong evidence of the influence of demographic factors influencing genome-wide diversity over tens to hundreds of thousands of years of diversification and adaptation (Andrew D Foote et al., 2019; Andrew D Foote et al., 2016; Van Cise et al., 2019; Westbury et al., 2019). In several of these cases where it has been examined, genome-wide heterozygosity patterns do not indicate that low diversity was caused by rapid bottlenecks or inbreeding; instead, these patterns indicate that low diversity has been present for extended periods while species persist and diversify (e.g., narwhal (Westbury et al., 2019), orca (Andrew D Foote et al., 2019)). These examples and others (Robinson et al., 2018; Robinson et al., 2016; Westbury et al., 2018) indicate that, contrary to the paradigm of an “extinction vortex” (Gilpin & Soulé, 1986) that may doom species with low diversity, some species have persisted with low genomic diversity and small population size. Long-term small population size enables the purging of recessive deleterious alleles, thereby reducing the risk of inbreeding depression, perhaps allowing for continued future persistence with relatively small population sizes and an increased tolerance to the genetic consequences of bottlenecks.
The vaquita’s current habitat in the upper Gulf of California was likely diminished or absent due to low sea levels several times through the last 350,000 years (Siddall et al., 2003), with the lowest sea level occurring at the end of the Saalian complex and the Last Glacial Maximum (LGM) (Figure 2) followed by a rapid rise of 120-140 m (similar to the present level) during the Eemian warm period between 115,000 and 130,000 years ago and after the LGM (Figure 5). Over much of the last 100,000 years, sea level has been intermediate between the high points (present and Eemian warm period) and lows (end of Saalian and the LGM) (Rohling et al., 2017). There is no fossil record or other indication that vaquita have ever inhabited colder parts of the eastern North Pacific along the west coast of Baja California, Mexico, or farther north off California at the southern end of the current range of the congeneric harbor porpoise (Phocoena phocoena) (Brownell Jr., 1983). The closest relative of the vaquita is either Burmeister’s porpoise or the ancestor of the two sister species, Burmeister’s and spectacled porpoises (Ben Chehida et al., in revision), both of which are found only in temperate and cold waters of the Southern Hemisphere. Based on the closer relationship to Southern Hemisphere species and on the similar timing of rapid climate warming and vaquita population decline, it appears that climate change at the end of the Saalian ice age caused a northward shift of the species range, resulting in a remnant population being isolated in the Gulf of California, where it has persisted in the newly expanded and shallow, highly productive upper Gulf region.
The reference genome presented here has provided important insight into the demographic history of the critically endangered vaquita, reinforcing a previous hypothesis (B. L. Taylor & Rojas-Bracho, 1999) that the low genetic diversity of the vaquita is not due to a recent extreme bottleneck or current inbreeding. These results taken together with recent evidence of healthy looking vaquitas, often with robust calves (B. L. Taylor et al., 2019), suggest that population recovery may not be hindered because of genetic issues. Analysis of re-sequenced genomes from multiple individuals sampled over the previous few decades will shed light on changes in inbreeding as the population has declined due to bycatch in gillnets, and whether deleterious mutations are likely to have been purged from the genome as a result of the long-term persistence at a small population size, as has been suggested for some other species and populations (e.g., Dussex et al., in revision; Robinson et al., 2018; Westbury et al., 2018; Westbury et al., 2019).
Finally, this genome assembly is the highest quality, most complete genome in the odontocete lineage that consists of all dolphins, porpoises and toothed whales. As such, it provides a genomic resource for better reference-guided assemblies and scaffolding of other cetacean genomes (Alonge et al., 2019; Lischer & Shimizu, 2017; Morin et al., in revision) and for comparative genomics, especially for variation in genome structure. We expect that the vaquita genome, along with expected assembly of reference genomes for other endangered species, will continue to contribute to both understanding and conservation of global biodiversity (Kraus et al., submitted).
Data Availability
The vaquita reference genome and all sequence data are available via the Vertebrate Genome Project GenomeArk website (https://vgp.github.io/genomeark/Phocoena_sinus/) and NCBI Genome database (Bioprojects PRJNA557831 and PRJNA557832). Annotation is available at NCBI (http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Phocoena_sinus/100/). Ensembl annotation for the vaquita is available via the VGP pre-release data portal (projects.ensembl.org/vgp) and will be fully integrated into the Ensembl genome browser (ensembl.org), including comparative data, in release 101, due to go live by August 2020.
Author Contributions
PAM, EDJ and OAR initiated the project, and PAM, EDJ and OF designed and led research and analyses and co-wrote the manuscript. FIA, BH, JRB, JM, and OF generated data. JM and SPaez initiated the project for the VGP. AP, AR, BH, AF, GF, KH, JR, JTorrence, MJPC, WC, SPalen and YVB contributed to data processing and genome assembly. MLH, ACM, JAF and CDA cultured cell lines, and AW, BLT, CRS, FMDG, JTeilmann, LR-B, MPH-J, RSW, SS, TR and WM conducted the field work to obtain and process the tissue samples. All authors contributed to interpretation of results and preparation of the manuscript.
Acknowledgements
The planning and diligence of the VaquitaCPR team were critical in collection, preservation and rapid delivery of tissue samples for tissue culture that made this project possible. We are grateful to all people involved in obtaining and culturing the tissue samples. Kelly Robertson was instrumental in ensuring rapid import of the samples under CITES permit (permit No. 17US774233/9; Mexican export permit MX89760). Tissue samples are stored in the SWFSC Marine Mammal and Sea Turtle Research (MMASTR) collection under MMPA permit 19091. We thank Annabel Beichman for help with PSMC analysis and Prof. Eelco Rohling for assistance with interpretation of sea level data. We are grateful to Cisco Werner, Director of Scientific Programs and Chief Science Advisor for NOAA Fisheries, for funding sequencing of the vaquita genome. Earlier versions of the manuscript have been improved thanks to careful review by Love Dalén and Mark Chaisson.
Footnotes
DISCLAIMER: This work has not yet been peer reviewed. The scientific results and conclusions, as well as any views or opinions expressed herein, are those of the author(s) and do not necessarily reflect the views of NOAA or the Department of Commerce.