Ancient genomes redate the extinction of Sussemionus, a subgenus of Equus, to late Holocene

The exceptionally-rich fossil record available for the equid family has provided textbook examples of macroevolutionary changes. Horses, asses and zebras represent three extant subgenus of Equus lineage, while the Sussemionus subgenus is another remarkable Equus lineage ranging from North America to Ethiopia in Pleistocene. We sequenced 26 archaeological specimens from northern China in Holocene showing morphological features reminiscent of Equus ovodovi, a species representative of Sussemionus, and further confirmed them as this species by genetic analyses. Thus, we present the first high-quality complete genome of the Sussemionus that we sequenced to 12.0× depth-of-coverage and demonstrate that it survived until ∼3,500 years ago, despite the continued demographic collapse during the Last Glacial Maximum and the great human expansion in East Asia. We also confirmed the Equus phylogenetic tree, and found Sussemionus diverged from the ancestor of non-caballine equids ∼2.3-2.7 Million years ago and admixture events could have taken place between them. Our works suggest the small genetic diversity but not the enhanced inbreeding mainly limited the chances of survival of the species, and illustrates how ancient DNA can inform on extinction dynamics and the long-term resilience of species surviving in cryptic population pockets.

(HH06D) is highlighted with a square. The black circles indicate sites that provided 113 complete mitochondrial genome sequences in previous studies (Druzhkova et al.,114 2017; L. Orlando et al., 2009;Vilstrup et al., 2013;Yuan et al., 2019). The temporal 115 range covered by the different samples analyzed is given in years before present (YBP) 116 and follows the name of each site. Numbers between parentheses indicate the number   the extinct quagga zebra (Huang et al., 2015;Jonsson et al., 2014;Kalbfleisch et al., 133 2018; L. Renaud et al., 2018) (Table S3). All the Chinese 134 8 specimens analyzed in this study were found to cluster together along the first two 135 PCA components, in a group that was distinct from all other equine species ( Figure   136 2A and Figure 2-figure supplement 3) but closer to non-caballine equine species 137 than to the horse (Figure 2A). This suggested that they were all members of a unique 138 taxonomic group, most related to non-caballine equids. 139 Maximum likelihood (ML) phylogenetic analyses including the nearly-complete 17 140 mitochondrial genomes reported in this study (Table S1,  provided specimens that belonged to the E. (Sussemionus) ovodovi species. 147 We also inferred maternal lineages of E. ovodovi using complete mitogenomes. 148 Phylogenetic analyses showed that the species were mainly divided into two major 149 haplogroups (A and B), of which most of the specimens were characterised by the 150 haplogroup B, including all previously reported sequences, whereas 2 Muzhuzhuliang 151 and 3 Honghe individuals belonged to the new haplogroup A ( Figure 2B). Further 152 distinction could also be carried out in the two haplogroups ( Figure 2B). 153 To further assess phylogenetic affinities, we used the two genomes characterized to 154 at least 3× average depth-of-coverage (HH04D and HH06D) to replace Sussemiones 155 within the equine phylogenetic tree. To achieve this, we used ML phylogenetic  Figure 2C). Similar tree topologies were recovered using whole-genome SNPs 160 by TreeMix (Pickrell & Pritchard, 2012) (Figure 2-figure supplements 8 and 9). 161 Combined with the analysis of the occlusal surface of the molars, in particular the 162 absence of the caballine notch, the shape of metacones and protocones, and the 163 reduced tooth size ( Figure 1B), our analyses allowed us to conclude that the material 164 analyzed represented small specimens of the extinct Equus (Sussemionus) ovodovi. 165 This lineage, thus, survived in China during the Holocene, and until cal 3,477-4,481 166 cal BP, which is approximately ~8,300-9,300 years after the latest known specimen to 167 date (Druzhkova et al., 2017;L. Orlando et al., 2009;Vilstrup et al., 2013;Yuan et al., 168 2019).   (Jonsson et al., 2014). We thus 208 next assessed whether the genomic data showed evidence for gene flow between 209 Sussemiones and other non-caballine equids. To achieve this, we first applied 210 D-statistics (Soraggi, Wiuf, & Albrechtsen, 2018) to the genome sequence underlying 2). This suggested that at least one admixture event could have taken place between 214 Sussemiones and the ancestor of asses after their divergence from zebras. 215 We next leveraged the ancient genome characterized to high depth-of-coverage 216 (HH06D) to reconstruct the equine demographic history using G-PhoCS (Gronau,217 Hubisz, Gulko, Danko, & Siepel, 2011). More specifically, we first selected members 218 of each equine lineage representing a total number of 10 genomes, and assumed that 219 the genus Equus emerged some 4.0-4.5 Mya, following previous estimates (L.  (Eisenmann, 2010).

225
Allowing for migrations provided support for gene flow between Sussemiones and the 226 13 ancestor of asses and zebras ( Figure 3). However, weak to no migrations were 227 detected between Sussemiones and extant equids (Table S6). Importantly, the 228 admixture between Sussemiones and the ancestor of asses seems to have been 229 stronger than that between Sussemiones and the ancestor of zebras, in line with the 230 results of D-statistics. G-PhoCS also supported the presence of significant 231 unidirectional gene-flow prior to ~2.3-2.7 Mya, from the horse branch into the 232 ancestral branch to all non-caballine equids, including Sussemiones (total migration 233 rate 2.2-9.2%, Table S7). This is consistent with previous HMMCoal analyses applied 234 to whole genome sequences of all extant equine species, which indicated significant   Dynamic demographic profiles, heterozygosity and inbreeding levels 260 We next leveraged the high-coverage Sussemiones genome characterized here to 261 explore further the demographic dynamics of this lineage until its extinction. When Burchell's zebra (Table S8). Pairwise Sequential Markovian Coalescent (PSMC) 265 analyses, however, provided us evidence for population size variation through time. 266 First, the Sussemiones demographic trajectory was found to diverge from that of other 267 non-caballine equids (specifically, E. hemionus) after ~2.0 Mya, confirming the 268 divergence date estimate retrieved by G-PhoCS (Table S8). Second, we found that the 269 Sussemiones demographic trajectory constantly increased during the last million year 270 but stay at a level which was lower than that of other lineages for a long time, until it 271 reached a peak between 74-84 kya. It was, then, followed by an approximately    Holocene with the most recent specimens analyzed dating to ~3,477-3,637 cal BP.

315
This is almost 9,000 years after the latest specimens previously documented in the 316 fossil record (Druzhkova et al., 2017;Vilstrup et al., 2013;Yuan et al., 2019). Our 317 work, thus, shows that Sussemionus represents the last currently known Equus 318 subgenus to become extinct. Our work also adds to the list of recently identified 319 members of the horse family that were still alive at the time horses and donkeys were 320 first domesticated, some ~5,500 years ago (Fages et al., 2019;Gaunitz et al., 2018;321 Rossel et al., 2008). In contrast to those divergent members that were identified in    Limited genetic diversity before extinction 374 The demographic profile of Sussemiones shows that after the peak of population size 375 culminating some ~74 kya, Sussemiones went through a slow and continuous decline was previously described for the woolly mammoth (Palkopoulou et al., 2015).

385
In conclusion, our study clarifies the phylogenetic placement, speciation timing and The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

539
Convergence was assessed visually using Tracer v1.6 and posterior date estimates 540 were retrieved using 10% as burn-in. The final consensus tree was produced using 541 TreeAnnotator 2.5.1.0 (Drummond & Rambaut, 2007) and plotted using ITOL 542 (Letunic & Bork, 2016) (Figure 2-figure supplement 6). higher than 3 were considered to be statistically significant. To eliminate the bias of 579 the reference genome, we also rerun the same analysis using sequence alignments 580 against the donkey reference genome (Figure 3-figure supplement 2). for G-PhoCS (Gronau et al., 2011). Genotypes were called by GATK and candidate 587 'neutral' loci were identified by applying the following filters:

588
(1) The simple repeats track available for the reference genome was obtained from 589 Ensembl v99 release; corresponding regions were masked.

590
(2) All exons of protein-coding genes were discarded together with their 10 kb 591 flanking regions; this was done based on the GTF format annotation file of the 592 reference genome available from Ensembl v99 Genome Browser.

593
(3) We identified conserved noncoding elements (CNEs) using phastCons scores 594 (based on the 20-way Conservation track provided for the mammal clade according to 595 the genomic coordinates of the human reference) downloaded from the Table Browser   596 of UCSC. All CNEs and their 100 bp flanking regions were masked, using liftOver to 597 convert human genome coordinates into EquCab3.0 horse genome coordinates.

598
(4) Exons of noncoding RNA genes together with their one kilobase flanking 599 regions were removed, based on the annotations available for the reference genome.

600
(5) Gaps in the reference genome were disregarded.

601
Besides the various hard filters described above, regions/sites likely to (1) be 602 enriched for misaligned bases, and to (2)

607
We selected 1 kb loci located with minimum inter-locus distance of 30 kb from the 608 intervals that pass all the criteria described above. Then consensus sequences were 609 generated for each individual from the vcf file generated in Section 5 using bcftools by calibrating θ and τ parameter using g and μ (Table S6), given by: Ne = θ/(4μg) and

636
T =τ/μ (Gronau et al., 2011).  (Table S6). A significant migration band was considered 651 supported if both the 95% Bayesian credible interval of total migration rate (M) did 652 not include 0 and the mean value of M was estimated to be greater than 0.03.

653
Settings for the migration bands between extant caballines are based on previous 654 research (Jonsson et al., 2014). The significant migration band from horse to the 655 non-caballine ancestor were identified (Table S6), in line with previous work (Jonsson 656 et al., 2014). However, no other non-negligible (M > 3%) migration bands was found 657 in our analyses (Table S6). 658 We then tried to estimate the migration events between E. ovodovi and other  (Table S6).

665
Finally, the total four migration bands were combined into one demographic model 666 (Table S7) and compared the estimates to the one including no migration (Table S8). In order to reconstruct the past demographic dynamics of the E. ovodovi lineage, we 671 applied the PSMC algorithm (version 0.6.5-r67) (Li & Durbin, 2011)  After filtering the bases with Phred quality scores strictly lower than 35, we ran 681 PSMC with following command: 'psmc -N25 -t15 -r5 -p "4+25*2+4+6" '. Calibration 682 was carried out using a generation of 8 years and mutation rate of 7.242×10 -9 per 683 generation per site, following previous work (Jonsson et al., 2014). However, as for supplements 1 and 2), we also performed analyses without transitions using mutation 686 rates of 2.3728×10 -9 that was obtained assuming that the most recent common 687 ancestor of living equine species emerged 4 Mya (L. . 688 We found a great expansion of HH06D in the past 50,000 years when retaining   (Table S5).  (Table S5).