Genomic analyses of the extinct Sardinian dhole (Cynotherium sardous) reveal its evolutionary history

The Sardinian dhole (Cynotherium sardous)1 was an iconic and unique canid species of canid that was endemic of Sardinia and Corsica until it became extinct at the end of the Late Pleistocene2–5. Given its peculiar dental morphology, small body size and high level of endemism, several canids have been proposed as possible ancestors of the Sardinian dhole, including the Asian dhole and African hunting dog ancestor 3,6–9. Morphometric analyses3,6,8–12 have failed to clarify the evolutionary relationship with other canids. We sequenced the genome of a ca 21,100 year old Sardinian dhole in order to understand its genomic history and clarify its phylogenetic position. We found it represents a separate taxon from all other living canids from Eurasia, Africa and North America, and that the Sardinian and Asian dhole lineages diverged ca 885 ka. We additionally detected historical gene flow between the Sardinian and Asian dhole lineages, that ended approximately 500-300 ka, when the landbridge between Sardinia and mainland Italy was broken, severing their population connectivity. Our sample showed low genome-wide diversity compared to other extant canids - probably a result of the long-term isolation - that could have contributed to the subsequent extinction of the Sardinian dhole.


Summary
The Sardinian dhole (Cynotherium sardous) 1 was an iconic and unique canid species of canid that was endemic of Sardinia and Corsica until it became extinct at the end of the Late Pleistocene [2][3][4][5] . Given its peculiar dental morphology, small body size and high level of endemism, several canids have been proposed as possible ancestors of the Sardinian dhole, including the Asian dhole and African hunting dog ancestor 3,6-9 . Morphometric analyses 3,6,[8][9][10][11][12] have failed to clarify the evolutionary relationship with other canids.
We sequenced the genome of a ca 21,100 year old Sardinian dhole in order to understand its genomic history and clarify its phylogenetic position. We found it represents a separate taxon from all other living canids from Eurasia, Africa and North America, and that the Sardinian and Asian dhole lineages diverged ca 885 ka. We additionally detected historical gene flow between the Sardinian and Asian dhole lineages, that ended approximately 500-300 ka, when the landbridge between Sardinia and mainland Italy was broken, severing their population connectivity. Our sample showed low genome-wide diversity compared to other extant canids -probably a result of the long-term isolation -that could have contributed to the subsequent extinction of the Sardinian dhole.

Results & Discussion
We successfully resequenced the genome of a Sardinian dhole (SD) specimen from Corbeddu Cave (Sardinia) (Fig 1A-B) to an average coverage of ca. 5x (Table S2). This sample has been radiocarbon dated to ca 21,137 calibrated years before present ( Fig 1C) and shows high DNA damage levels (Fig S1A-C). Analysis of the reads coverage at each chromosome revealed that the Sardinian dhole specimen was female (Fig S1-E).
To place the Sardinian dhole in an evolutionary context along other canids, we analysed it together with 46 previously published canid genomes (Table S3). The samples used as reference dataset span from 4.4X to 28.2X genome wide coverage and cover the species individuals (excluding the outgroup). Consistent with previous studies 13 , we found that along PC1 (55.9%) and PC2 (8.45%), the African hunting dogs (AHDs) cluster together and are differentiated from the other canids included in this study. At the same time, the Sardinian dhole holds a distinct placement, whereby, along the first two principal components, it is placed near the two representative modern Asian dholes (Fig 2A). The second principal component separates the genus Canis from the dhole samples. Similarly, when we performed PCA excluding the AHDs, the first component placed the Sardinian dhole between the Asian dhole and the Canis group (Fig S2-A). To further investigate the relationships between the Sardinian dhole (Cynotherium sardous) and other canids, we performed an admixture analysis based on genotype likelihoods for all the samples excluding the Andean fox ( Fig. 2B and Fig S2-B). Clearly, when only two ancestral clusters (K=2) were estimated, the most basal canids -AHDs -separated from the Canis genus, with the two dhole species being represented as mixtures of these two clusters.
Upon increasing the number of estimated ancestry clusters to four (K=4), we observed a division between the AHDs, the dholes, wolves/dogs and the rest of the canids. Specifically, the AHD maintains the same structure while the Sardinian dhole and the Asian dholes are grouped into a cluster of their own. Upon increasing the number of estimated ancestry clusters, the samples fall in clusters according to species and/or populations, with the Sardinian dhole clustering with the two Asian dholes.
To explore the phylogenomic placement of the Sardinian dhole among other canids, and especially understand its relationship with the Asian dhole, we used ASTRAL-III 14 to estimate the species tree of the canids included in this study by combining 1000 gene trees estimated from randomly chosen 5kb regions across the nuclear genome. The estimated species tree was rooted using the Andean fox as the outgroup (Fig S2-C). In the multispecies coalescent tree estimated by ASTRAL-III, the Sardinian dhole forms a distinct clade, inside the Asian dhole and sister to Canis. We subsequently tested the discordance between the species tree and the gene trees to quantify the uncertainty of the branch that split the Sardinian dhole, Asian dhole, Canis and the basal canids. The frequencies of the three bipartitions induced by the aforementioned branch (identified as branch 16 in Fig 3A) are shown in Fig 3B, along with similar measures for all the internal branches of the species tree.
Two of the three possible bipartitions induced by branch 16 have a frequency greater than 33% -the cutoff previously shown to be required for identifying the true topology 15 .
Essentially, although gene trees clustering the Sardinian dhole the Asian dholes are more likely, both topologies, i.e. Sardinian Dhole clustering with Asian dholes or Canis are observed in more than 33% of all gene trees, implying that both these topologies could represent the true phylogeny. A) Species tree phylogeny generated by Astral-III estimated for 1000 genomic regions. The tree was rooted on the Andean fox (Lycalopex culpaeus). Monophyletic clusters were collapsed into the same leaf node. B) DiscoVista relative frequency analysis. Each box title indicates the corresponding branch on the tree in panel A and shows the frequency of three topologies around the focal internal branch of ASTRAL species tree. The first topology represented in red is the main topology followed by the other 2 alternatives in blue. On the y-axes the relative frequency is indicated and the dashed lines represent the ⅓ threshold. On the x-axis each quartet topology is shown using the neighboring branch labels. C-D) In these panels the gene flow among different canids is shown using ABBA-BABA test in ANGSD. On the y-axes of panel C) different combinations of Asian dholes (AD) and AHD individuals are considered (e.g. from the top to the bottom dhole (Berlin zoo)-AHD (Kenya), dhole (Beijing zoo)-AHD (Kenya) and so on). In the panel C) there is significant gene flow between the Asian dhole (AD) and the African hunting dog (AHD) showing a higher degree of genetic affinity between these two groups compared to the Sardinian dhole (SD) and AHD. In panel D) allele sharing between the AD and SD is higher than when considering AD with any of the other canid lineages in the study. E) Model of the phylogenetic relationships among canids augmented with admixture events. The qpGraph shown here was estimated considering the pairwise D-statistics. Dotted lines represent admixture events and the estimated mixture proportion is shown along them (%).
Genetic drift (expressed in drift units per 1,000) is shown along solid lines. This admixture graph represents the best fitting graph (-3 > Z < +3) to model African hunting dog-like ancestry into the Asian and Sardinian dhole and Ethiopian wolf.
The uncertain placement of the Sardinian dhole on the phylogenetic tree led us to investigate historical gene flow events between this species and the other canids. D-statistics, which uses quartets of populations/samples, were used to identify gene flow between canid lineages. We computed the D-statistics using all possible triplets of samples with the Andean fox as the outgroup (Table S4). Assuming that the ancestor of the Cynotherium arrived in Sardinia-Corsica through a terrestrial connection between the islands and mainland Europe, this sets the possible colonization of Sardinia-Corsica at ca 5 Ma, during the Messinian salinity crisis 16 , or around 3 Ma -close to the Plio-Pleistocene boundary 12,17 . However, since then, there is no proof of a landbridge that would allow movements of species between the two islands and the continent.
Nevertheless, Pleistocene mammals of Sardinia are currently divided into two major faunal complexes, an older one (Nesogoral Faunal Complex) and a younger one (Tyrrhenicola Faunal Complex) divided by the end of the Early Pleistocene 18,19 . Given this complex scenario, a number of earlier studies considered the Sardinian dhole as a subgenus of Xenocyon, Cuon, or a derived form of Canis 3,10-12 that might have reached Sardinia and Corsica by sweepstake or passive dispersal at the transition between Early and Middle Pleistocene 4,20 . This phenomenon is especially feasible during periods of fluctuation of the sea level and have been known to contribute to the faunal turnover in Sardinia 4,18,20 .
Therefore, to further investigate the demographic history of dholes, we used admixture graphs to test whether the diffuse ancestries can be explained by AHD-like admixture in Asian dhole or Canis-like admixture in Sardinian dhole. First, we estimated the proportion of ancestry derived from the ancestor of the Asian dhole into the Sardinian dhole using AHD, dholes, Ethiopian wolf and Andean fox as outgroup. We find that the node representing the ancestral population of the Asian and Sardinian dholes derives 60% ancestry from the node that diverged from the ancestral AHD population in the past ( Fig 3E). A second admixture event brings 25% of AHD ancestry into only the ancestor of the Asian dholes. Subsequently, when including the Portuguese wolf in the previous graph we found that it was necessary to model one more admixture event (see Fig S4) between the ancestors of the Ethiopian wolf and the dholes. The Ethiopian wolf is best modelled as a mixture of 4% from an ancestral population related to the dholes and 96% from the Canis lineage. This last admixture event confirms the results obtained in admixture using four ancestry components, (Fig 2B) in which the Ethiopian wolf shares a proportion of its ancestry with the dholes.
Given the geological history of Sardinia and Corsica, along with cycles of long-term isolation and past colonization, we explored the effects of isolation on genetic diversity in the Sardinian dhole using estimates of genome-wide heterozygosity and effective population size. We thus first inferred the heterozygosity in sliding windows for all the representative canid species in our dataset and we found that the Sardinian dhole shows remarkably low levels of heterozygosity across the whole genome, comparable to other isolated canids with small population sizes, such as the Ethiopian wolf and the AHD (Fig. 4A). We then calculated the divergence time between the Sardinian dhole and the contemporary Asian dhole using the statistic F(A|B), which estimates the probability of an individual A (Sardinian dhole) carrying the derived allele at sites that are heterozygous in individual B (Asian dhole). The assumption behind this approach is that when two populations start to diverge, they will also accumulate mutations that -due to isolation -will not be shared with other populations. We estimated that the Sardinian dhole carried the derived allele at ~2% of sites that were heterozygous in the Asian dhole. The proportion of derived alleles was then used as a summary statistic to estimate the divergence time of the Sardinian dhole and the Asian dhole from simulations of multiple divergence times using the population history of the Asian dhole inferred from PSMC. The simulations were calibrated using the wolf mutation rate (μ = 4 x 10 -9 /bp/generation) 30  not confined to their present-day ranges. For instance, the first fossil of a Lycaon dating back to the Middle Pleistocene was found in Israel, suggesting that the ecological boundaries that separate these two species were not in place yet 35 . This could explain the 25% ancestry derived from the AHD lineage observed only in the Asian dhole (Fig 3E), which probably occurred after the Sardinian dhole lineage became isolated in Sardinia and Corsica.
In conclusion, by generating the first whole-genome of a Sardinian dhole, we clarified the phylogenomic placement of this species among the genomic diversity of living canids from Eurasia, Africa and North-America. We detected historical gene flow between the Sardinian dhole and Asian dhole lineages, and evidence of past admixture from the ancestor of the AHD into the Asian dhole lineage. We found that the Sardinian dhole lineage diverged from the Asian dholes around ca 885 ka, followed by post-divergence gene flow ceasing later, between 560 and 310 ka, meaning that Sardinia could have been colonized several times during the Pleistocene. However, from these results it can be inferred that, probably sometime during the Middle Pleistocene, the physical barrier created by the sea separating Sardinia from the continent effectively ended further canid migrations. We also found that our sample shows an overall reduced genome wide diversity that -together with the longterm isolation on the island -could have contributed to its extinction. In this complex and delicate situation, the hypothesis that humans played a crucial role in the extinction of the last Pleistocene mammalian fauna cannot be excluded 36,37 . However, the direct or indirect anthropogenic pressure exerted on the Sardinian dhole population could have been a concause -together with the long-term isolation, loss of genetic diversity and reduction of Ne -leading to the extinction of this species. The results presented here highlight also the importance of sequencing ancient genomes from island ecosystems to improve our knowledge of the past evolution, colonization and migration events of extinct species.

Samples Information
A Cynotherium sardous petrous bone (CB83-D1101) from Corbeddu cave was selected at the collection of the National Archaeological Museum of Nuoro. Corbeddu cave is an archeopaleontological site located in the Lanaittu Valley (Oliena, Nuoro Province, Northeastern Sardinia, 40.254921°, 9.485078°) famous for its lithic and osteological evidence of H. sapiens 38,39 , disputedly referred to the Late Pleistocene or Early Holocene 40,41 . Cynotherium sardous remains, among which a nearly-complete skeleton, come mainly from the upper level of layer 3 of hall II 3,42 . Such specimens represent the last occurrence of the species, recently re-dated by Palombo and colleagues 41 to 12,945 ± 75 years.

Data generation
The sample was processed under strict clean laboratory conditions at the GLOBE Institute, University of Copenhagen. Small petrous bone chunks of around 350 mg, were divided into 3 eppendorf tubes -A, B and C (ca 120 mg each) -and washed with diluted bleach, ethanol and ddH2O, following Boessenkool et al. 43 .  Table S1 for further details). From the extract C two aliquots of 16 l were used to build BGIseq compatible libraries following Mak et al. 46 and Carøe et al 45

Radiocarbon Dating
Fragments of the petrous bone of the Sardinian dhole were submitted for AMS 14

Quality control and alignment
Short reads obtained from BGI and Illumina sequencing platforms were processed using PALEOMIX v1.2.13 58 pipeline. The same pipeline was run for each sample in the study. In the first step, the adapters were trimmed with AdapterRemoval2 59 with default settings and the alignments were performed against the wolf reference genome 60 and the dog reference genome (CanFam3.1) 61 using BWA v0.7.12 backtrack algorithm 62 with minimum base mapping quality set to 0 to ensure that all the reads were retained in this process. Mapping quality and base quality filters will be applied in the later steps of the analysis. PCR duplicates were filtered out using Picard MarkDuplicates v2.9.1 63  Post mortem DNA damage profiles and nucleotide misincorporation patterns (sub C>T and G>A) were computed using mapDamage2.0 66 .

Sex determination
No Y-chromosome reference sequence is available for CanFam3.1 therefore to infer the biological sex of the Sardinian dhole we used the statistics generated from mapping to the dog reference genome. Rstudio v1.2.5003 67,68 and ggplot2 69 were used to visualize the depth of coverage for all the chromosomes of the Sardinian dhole. We then identified as female the individual if the depth of coverage for the X chromosome was similar or exceeded the mean coverage of all the autosomal chromosomes.

Genotype likelihoods
The samples analysed in the present study span from 4.4X to 28.2X coverage, including in this range also the ancient Sardinian dhole. In order to avoid biases, when possible, we use genotype likelihood over genotype calling. SAMtools model (-GL 1) in ANGSD 70 was used to estimate the genotype likelihood at variant sites for all the scaffolds above 1 MB. Bases with base quality lower than 20 and reads with mapping quality lower than 20 were discarded (-minQ 20 -minmapq 20). We retained sites with the default minimum depth for a minimum of 95% of the individuals in our dataset (-minInd 44) and the following parameters: -remove_bads 1 -baq 1 -C 50 -uniqueOnly 1. The resulting output, generated in beagle format, was then used for the PCA and Admixture analysis.

Principal components analysis
To explore the genetic affinities in our data we performed the principal component analysis (PCA) using PCAngsd v0.98 71 on the 46 individuals' genotype likelihood panels obtained using ANGSD. A covariance matrix was created and we then used Rstudio to calculate the eigenvectors and eigenvalues on the covariance matrix file and used ggplot2 to plot the PCA.

Admixture
The genotype likelihood file generated with ANGSD was used as input for NGSadmix v32 72 to generate the ancestry cluster and proportion of admixture for 46 samples in the dataset using 3008607 SNPs. The outgroup, the Andean fox, was excluded from this analysis and we retained sites with the default minimum depth for a minimum of 95% of the individuals in our dataset (-minInd 44). The structure in the dataset was computed using 2 to 15 clusters (K) and for every cluster the analysis was repeated 100 times to ensure convergence to the global maximum. For each K the replicates with the best likelihood scores were chosen and used as input file list (option -i) in pong 73 to visualize the ancestry clusters. The options -n and -l were used to assign the samples' order and a color for each cluster respectively.

Heterozygosity in sliding windows
The heterozygosity per sample was estimated using ANGSD, by calculating the per sample folded site frequency spectrum (SFS). We generated a saf.idx file based on individual genotype likelihoods using GATK (-GL 2) from the scaffolds larger than 1Mb (704 scaffolds) for each bam file (doSaf 1 -fold 1) and we excluded transitions (rmtrans 1) and reads with quality score and bases with mapping quality lower than 20 (-minQ 20 -minmapq 20). Since we chose the option -fold 1 to estimate the folded SFS, the wolf reference genome was used both as reference and as ancestral (-ref and -anc options). The repeat regions were masked using a repeat mask of the wolf reference genome 60 The scaffolds that were longer than 1 Mb were partitioned into overlapping windows of size 1 Mb with a step size of 500 kb using the bedtools windows tool. Windows shorter than 1Mb at the end of the scaffolds were discarded. The SFS for each window was estimated using the realSFS utility tool provided in ANGSD and subsequently the ratio heterozygous sites/total sites was calculated to provide the final heterozygosity per window. R studio 67 and ggplot2 and dplyr 74 were used to visualize the heterozygosity level at each window and to create a violin plot for a subset of samples.

Heterozygosity in sliding windows -reference bias and downsampling
To ensure the robustness of our results to the choice of reference genome, we performed the heterozygosity in sliding windows using the same samples mapped to the dog reference genome. Similarly, in order to ensure that the different depths of coverage across the different samples did not lead to significant differences in the heterozygosity estimates, we downsampled 2 individuals, one Italian wolf (ItalianWolf_W050) and one Syrian wolf, of 50% of their original coverage using SAMtools view and the option -s 0.5. Heterozygosity in sliding windows was computed on the subsampled data following the same procedure as for the full data. The newly generated bam files were then sorted and indexed using SAMtools sort and index and went through the same steps used for the original bam files.

Nuclear genome phylogeny
All the individuals in the dataset, including the outgroup (Andean fox) were used to construct the nuclear genome phylogeny. First, ANGSD was used to generate a consensus sequence for each genome in our dataset using the wolf genome as reference. Each base was sampled based on the consensus base (-dofasta 2). Bases with base quality lower than 20 and reads with mapping quality lower than 20 were discarded (-minQ 20 -minmapq 20). The minimum coverage for each individual was set to 3x (-setminDepthInd 3 ) and the following additional filters were used: -doCounts 1 -remove_bads 1 -uniqueOnly 1 -baq 1 -C 50. We then selected 1000 random regions, each 5000bp long, from the wolf reference genome using bedtools random 75 with the following parameters: -l 5000 -n 1000. Samtools 76 was used to generate a fasta file for each region using the consensus sequence generated by ANGSD. For each region, the consensus sequences across all samples were combined into a single multi-sequence fasta file. Subsequently, for each region, RAxML-ng 77 was used to reconstruct the phylogeny using the evolutionary model GTR+G. The regionwise gene trees were concatenated together and a species tree was estimated using Astral-III 14 with the default parameters, retaining all the branches in the gene trees, i.e. the different individuals of the same species were not collapsed into a single group. Interactive Tree Of Life (iTOL) v4 78 online tool was used to visualise the species tree estimated by Astral-III.
Further, we used DiscoVista 79 to visualize the discordance between the 1000 gene trees and the species tree generated with ASTRAL-III. For this step, the samples belonging to the same species were then collapsed together using an annotation file (option -a) and the Andean fox was specified as outgroup to root the tree by using the option -g. The option -o was used to create an output folder with the resulting DiscoVista tree and the plot showing the relative frequency analysis which were used to evaluate the three topologies for each internal branch of the tree.
Gene flow between the Sardinian dhole, AHD and Asian dholes.

D-statistics (ABBA-BABA)
The program ANGSD was used to investigate the presence and extent of gene flow between the Sardinian dhole and the other canids included in this study. We restricted our analyses to only the scaffolds over 1 MB. To compute the D-statistics using ANGSD, sites with base quality and mapping quality lower than 20 (-minQ 20 -minMapQ 20) were discarded, and at each site a single allele was randomly sampled (-doAbbababa 1). Transitions were also removed (-rmTrans 1) from the analysis to avoid biases introduced by ancient DNA damage and the following options were used: -doCounts 1 -useLast 1 -blockSize 1000000. We tested all triplets of samples, using the Andean fox as the outgroup. The subset of triplets representing the correct tree topology, as estimated by ASTRAL-III and DiscoVista, were considered for testing gene flow hypothesis between the Sardinian dhole and the other species. Using a similar approach, we also investigated gene flow between Asian dholes and AHDs. Finally, D-statistics with a Z-score between 3 and -3 were not considered significant.

Calling of polymorphism and filtering
We used GATK v3.4.0 with the option -T Haplotype caller 80 tool to call variants for all the samples in our dataset, using the wolf genome as reference (option -R). The option -L was used to specify the regions of our interest (scaffold above 1 Mb) and sites with base quality and mapping quality lower than 20 (-mmq 20 -mbq 20) were discarded. The resulting vcf files were then compressed and validated using VCFtools v0.1.8 81 option VCF-validator.
VCFtools was subsequently used for further filtering the VCF files by excluding indels, sites with minimum depth below 5 (-minDP 5). All the sites with more than 90% of missing genotypes (--max-missing-count) over all individuals were excluded and the option --plink was used to generate the plink files in MAP and PED format. The plink files were then merged using Plink v1.9 82,83 and used to generate the files BED BIM and FAM. The following options were applied: --merge-list, --allow-extra-chr and --keep-allele-order. qpGraph qpGraph is part of ADMIXTOOLS software 84 and we used it to reconstruct the different relationships across the species in the study by comparing the various f statistics (f2, f3, f4) and generating an admixture graph with the best fitting admixture proportions and branch length (in unit of genetic drift). In particular, the graph generated using this tool is the representation of the relationship between 6 main lineages: Sardinian dhole, Asian dhole, AHDs, Ethiopian wolf and grey wolves.
In order to run qpGraph the PLINK files (BED BIM and FAM) were converted into EIGENSTRAT format using the package ConvertF implemented in ADMIXTOOLS. Then a qpGraph par file was created to specify working directory, the genotypename, snpname and indivname files. The options hires: YES, lsqmode: YES, blgsize: 0.005 and diag: .0001 were applied.
The samples were clustered into different groups representing the main lineages: AHDs, Asian dholes, Sardinian dhole, Ethiopian wolf and grey wolf. The best strategy was to start with a small graph and later add the other populations. Therefore, we built a graph topology including only the outgroup (the Andean fox), 6 AHDs, 2 Asian dholes, and the Sardinian dhole. When constructing the graph, it is important to do so considering the tree topology already available (see ASTRAL and DiscoVista trees) in order to specify the root, the outgroup and every other lineage representing a leaf of the graph and connected through nodes. A graph with a Z-value within 3 and -3 was considered significant, meaning that the topology of the graph fits with the combination of the f-statistics. Instead, when the Z-value is higher than 3 or lower than -3 the topology does not fully represent the relationship between the samples in the study and changes are necessary. Therefore when constructing the graph, the Sardinian dhole was modelled as sister clade to all possible internal and external nodes and as admixed from different node pairs and the Z values were then evaluated.
Subsequently, once confirmed that the graph had a good fit (-3 > Z < +3) we added the Ethiopian wolf and later a grey wolf (Portuguese wolf) and evaluated their fit in the phylogeny as explained above.

Split time analysis -F(A|B)
We also investigated the divergence time between the two dhole lineages by computing the probability F(A|B) that an (ancient) individual A (in this case the Sardinian dhole) carries a derived heterozygous allele in an individual B (Asian dhole). We estimated the standard error using a block jackknife estimate of the statistic, using a block size of 1 MB to partition the genome into non-overlapping regions. The assumption behind this approach is that when two populations start to diverge, they will also accumulate mutations that -due to isolation -will not be shared with other populations. Therefore, we first polarized the alleles to ancestral and derived using the Andean fox genome and we then called haplotypes with minimum base quality of 25 for the population B that we used to select only the heterozygous sites. We used these sites to compute the probability that the individual A would carry the derived allele, by randomly sampling each allele. To calibrate the probability F(A|B) to the population size history of the Asian dhole, we computed PSMC on the higher coverage genome Asian dhole in our dataset (Beijing Zoo dhole). The PSMC inference was based on the parameters, -N25 -t15 -r5 -p "4+25*2+4+6", and the output was used to simulate 900 mb through msHOT 85,86 .
Different divergence times were computed, every 10 ka, spanning from 30 ka to 400 ka (expressed in generation time). We found the divergence time range by identifying the intersection between the empirical value line and the expected decay of F(A|B) as a function of split time.

hPSMC
To estimate the end of gene flow between the Sardinian dhole and the Asian dhole we used hPSMC 32 . We first used ANGSD to generate haploid consensus sequences mapped to the dog reference genome and considering only autosomes. Bases with base quality lower than 30 and reads with mapping quality lower than 30 were discarded (-minQ 30 -minmapq 30).
The minimum depth was set to 2x (-setminDepth 2) and the following quality filters: -remove_bads 1, -uniqueOnly 1, -baq 1 and -C 50. The two fasta files generated were combined into a diploid sequence using the hPSMC tool psmcfa_from_2_fastas.py.
Subsequently, we ran the psmcfa output through PSMC with the parameters (-p) "4+25*2+4+6", number of iterations = 25 (-N25), maximum 2N0 coalescent time = 15 (-t15), initial theta/rho ratio = 5 (-r5). We used psmc_plot.pl to translate into a plot this information and we assumed a mutation rate of 4 x 10 -9 per base pair per generation 30 and 3 years generation time 30,31 . The pre-divergence effective population size (Ne) estimated from the output was used to run simulations using hPSMC_quantify_split_time.py script from hPSMC tool with different divergence times, between 100,000 and 700 ka in 50 ka intervals using ms 85 .