Abstract
Background One difficulty in testing the hypothesis that the Australasian dingo is a functional intermediate between wild wolves and domesticated breed dogs is that there is no reference specimen. Here we link a high-quality de novo long read chromosomal assembly with epigenetic footprints and morphology to describe the Alpine dingo female named Cooinda. It was critical to establish an Alpine dingo reference because this ecotype occurs throughout coastal eastern Australia where the first drawings and descriptions were completed.
Findings We generated a high-quality chromosome-level reference genome assembly (Canfam_ADS) using a combination of Pacific Bioscience, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. Compared to the previously published Desert dingo assembly, there are large structural rearrangements on Chromosomes 11, 16, 25 and 26. Phylogenetic analyses of chromosomal data from Cooinda the Alpine dingo and nine previously published de novo canine assemblies show dingoes are monophyletic and basal to domestic dogs. Network analyses show that the mtDNA genome clusters within the southeastern lineage, as expected for an Alpine dingo. Comparison of regulatory regions identified two differentially methylated regions within glucagon receptor GCGR and histone deacetylase HDAC4 genes that are unmethylated in the Alpine dingo genome but hypermethylated in the Desert dingo. Morphological data, comprising geometric morphometric assessment of cranial morphology place dingo Cooinda within population-level variation for Alpine dingoes. Magnetic resonance imaging of brain tissue show she had a larger cranial capacity than a similar-sized domestic dog.
Conclusions These combined data support the hypothesis that the dingo Cooinda fits the spectrum of genetic and morphological characteristics typical of the Alpine ecotype. We propose that she be considered the archetype specimen for future research investigating the evolutionary history, morphology, physiology, and ecology of dingoes. The female has been taxidermically prepared and is now at the Australian Museum, Sydney.
Introduction
The most influential book on evolution, Darwin’s 1859 On the origin of species [1], starts with a chapter on domestication to reverse engineer natural selection. Some nine years later Darwin [2] expanded his initial thinking into the book The variation of animals and plants under domestication. He hypothesized that the process of domestication proceeded in a stepwise manner first by unconscious selection (wild ®tamed) followed by what we now call artificial selection (tamed ®domesticated), with the key distinction between these processes being the involvement of humans on mating and reproduction. A gap in our ability to test Darwin’s hypothesis has been the identification of a model system with an extant plant or animal that is intermediate between the wild ancestor and the domesticate. Here we explore the overarching hypothesis that the Australasian dingo (Canis (familiaris) dingo) is evolutionarily intermediate between the wild wolf (Canis lupus) and domestic dogs (Canis familiaris) [3]. One alternate hypothesis is that the process of domestication is continual and does not proceed in a stepwise manner [4], instead representing a series of phases reflecting an intensification of the relationship between a wild animal (or plant) and human societies [5].
The taxonomic name of the dingo remains unstable, however, it is now clear the Australasian dingo is a distinct evolutionary lineage closely related to domestic dogs [6]. The first European drawing of an animal referred to as a “dingo” appears in White 1790 [7] with a more complete anatomical description appearing in Meyer 1793 [8]. A “large dog” from coastal eastern Australia near Sydney was earlier illustrated by George Stubbs in 1772, based on a recorded description by Joseph Banks from 1770; it is now clear that this animal was a dingo, but the name had not yet been learned from the local Aboriginal people. We follow the precedent that when zoologists disagree over whether a certain population is a subspecies or a full species, the species name may be written in parentheses. Scientists advocating a General Lineage Species Concept consider dingoes to be distinct species (Canis dingo) or a subspecies of domestic dog (Canis familiaris dingo) [9-11]. Others advocating a Biological Species Concept [12] consider the dingo to be a breed of dog (Canis familiaris breed dingo) due to the interfertility between dingo and domestic dog [11, 13, 14].
Corbett [15] mentioned the possibility of three different dingo ecotypes existing in north, central and southeastern Australia. These are now referred to Tropical, Desert, and Alpine dingoes [16]. Subsequently, Corbett [17] noted that dingo skulls from southeastern Australia (Alpine dingoes) were genuinely different from those of the rest of the country, but posited the differences may be due to hybridization with domestic dogs rather than independent lineages. Jones [18] agreed that the southeastern dingoes, were distinct and suggested a revaluation of ecotype morphologies to resolve the conundrum.
Analyses of mitochondrial variation in canids from Southeast Asia supports the hypothesis that there are distinct dingo lineages [19-22]. Zhang et al. [19] found a strong Bayesian posterior value supporting the separation of Australian dingoes into two groups. One is a northwestern group, whereas the other is a southeastern group that clusters with New Guinea Singing dogs (Canis (familiaris) hallstromi). Support for two, or perhaps three, distinct lineages of dingoes has also come from Y-chromosome and SNP-chip data [23, 24].
The dog is the first species to be domesticated [25]. They are likely the most frequently kept domestic animal, exhibit exceptional levels of morphological variation, and many breeds have been developed by strong artificial selection in the past 200 years [26-28]. The Australasian dingo has been proposed to be a functional [29] and evolutionary [6] intermediate between wild wolves and domesticated dogs. Unfortunately, the absence of a dingo holotype reference specimen impedes our ability to definitively determine whether dingoes are a tamed intermediate or a feral canid because we do not have a single reference point that links the scientific name to a specific specimen [30].
This study aims to link high resolution long-read de novo chromosomal assembly, mitochondrial DNA sequence and the DNA methylome with morphological descriptions of head shape and computed tomography data of brain data to describe the ‘archetype’ dingo (Fig. 1). This designation will support future comparisons with a reference enabling further characterization of the evolutionary history of the dingo. In this case we do not propose any formal taxonomic name for the specimen as it is a regional morphotype that is being characterized however we suggest the principle of having a ‘type’ specimen makes biological sense and will enable the focusing of future research.
Results
Chromosome-level genome assembly
Workflow
The genome was assembled following a similar pipeline to Field et al. [28] (Supplementary Fig. 1). Briefly, 1722 contigs were assembled from SMRT and ONT sequence data with a total length of 2.38 Gb and N50 length of 12.4 Mb [31]. The contig assembly was then polished for two rounds with SMRT reads, correcting ∼5 million bases in the first round and ∼15 thousand in the second [32, 33]. The assembled sequence contigs were scaffolded sequentially using 10X linked-reads and polished with 10X linked-reads [33]. The scaffolded assembly was then super scaffolded with Bionano and Hi-C proximity ligation.
Supplementary Fig. 2 shows the contact matrices generated by aligning the Hi-C data set to the genome assembly after Hi-C scaffolding [34, 35]. To increase the contiguity of the assembly we used the SMRT and ONT reads to fill gaps, which was then followed by a final round of SMRT read polishing. The gap filling successfully closed 282 gaps increasing contig N50 to the final figure of 23.1 Mb. A final round of polishing was performed with 10X linked reads. The resulting chromosome-length genome assembly and its gene annotation was deposited to NCBI with accession number GCA_012295265.2.
Assembly statistics and completeness
The final assembly had a total length of 2,398,209,015 bp in 477 scaffolds with a scaffold and contig N50 of 64.8 Mb and 23.1 Mb, respectively (Table 1). Chromosome-level scaffolds accounted for 98.4 % of the assembly with only 0.9 % (21.1 Mb) of all sequences not aligning to a CanFam4.1 chromosome [36].
Evaluation by Benchmarking Universal Single-Copy Orthologs (BUSCO v5.2.2 [37]) against Carnivora_odb10 data set (n=14,502) indicated that 95.1 % of the conserved single-copy genes were complete (Table 1, Supplementary Fig. 3A). Only 3 of 13,791 complete (single-copy or duplicated) BUSCO genes were not on the 39 nuclear chromosome scaffolds.
Next, we compared single-copy “Complete” BUSCO genes in Alpine dingo Cooinda and nine canid genomes [6, 27, 28, 36, 38-41]). Of the 13,722 genes, 13,711 were found in the assembly using BUSCOMP v1.0.1. Only Sandy the Desert Dingo v2.2 (13,715 genes) and China the Basenji v1.2 (13,712 genes) had more.
Additional kmer analysis of the final assembly [42] yielded 97.32 % (97.2% in chromosomes) and an overall Q-score estimate of 37.5 (38.4 for chromosomes). No sign of retained haplotigs was evident (Supplementary Fig. 3B).
Comparison of dingo genomes
We generated a Circos plot [43] to represent the single-nucleotide variants (SNV) and small indel variation between the Alpine and Desert dingo (Fig. 2) using MUMmer4 [44], and sniffles v1.0.11 [45]. In comparison to the autosomes, these plots show low variation on the X chromosome (Fig. 2). To further investigate the low variation, we compared each of the dingoes to CanFam4 (Supplementary Fig. 4, Supplementary Table 1). We then generated a conservative consensus set of structural variants (SV) by merging PacBio, and Nanopore SV calls generated with sniffles [45, 46]. Overall, we found ∼half the number of SV and small variants calls relative to Desert dingo than to CanFam4 (32798 v 62524 and 1729790 v 3839712, respectively).
We generated synteny plots using MUMmer plot and GenomeSym [47]. Synteny plots between the dingo genomes show several large-scale chromosomal events. On chromosome 16 there is a 3.45Mb inverted region and a 0.9Mb complex rearrangement (Supplementary Fig. 5). This 3.45Mb inversion does not appear in the wolf or domestic dogs, so we speculate it is unique to the Desert Dingo assembly [6]. The inversion overlaps 60 unique ENSEMBL transcripts and was enriched for gene ontology terms of cellular metabolic processes, including glycolysis and glucose metabolism [6]. Also, on Chromosome 16, the 0.9Mb complex rearrangement occurs between 55 – 57 Mb downstream (Supplementary Fig. 5). Additional structural events include small inversions on Chromosome 11 and on Chromosome 25 (Supplementary Fig. 5). On the X chromosome, there appear to be multiple small nonsyntenic regions (Supplementary Fig. 5); however, further examination of these apparent differences is required to establish whether they are true biological differences or assembly artifacts.
In parallel, we used GeMoMa gene predictions [48] to investigate chromosomal level events. Like the synteny analyses, this approach revealed a large inversion and a disordered region on chromosome 16 as well as smaller inversions on Chromosomes 11 and 25. We also found two structural events on chromosome 26 (Supplementary Fig. 6) containing mostly short genes that are not perfectly conserved (Supplementary Fig. 5F). A MUMmer4 nucmer alignment plot [44] for chromosome 26 corroborated these events (Supplementary Fig. 6)
The Alpine and Desert dingo both have a single copy pancreatic amylase gene (AMY2B) on Chromosome 6. The Alpine dingo assembly does not include a 6.4kb long LINE that was previously reported in the Desert dingo [6].
Phylogenetic analyses
All 39 full-length chromosomes in the final assembly were aligned to the corresponding chromosomes in nine published canine de novo genome assemblies [6, 27, 28, 36, 38-41]). SNVs and small indels (deletions and insertions <50bp) were called using MUMmer4 call-SNPs module for all possible pairings (Supplementary Table 2). Distance matrices were generated from the inter-canid differences in SNVs and indels and then transformed to WA distance [6, 49]. Fig. 3AC show the phylogenetic tree from SNVs and indels respectively. Both figures show strong support for monophyly of dingoes and dogs relative to the wolf.
These figures also strongly support the hypothesis that dingoes are the sister group to domestic dogs. Fig. 3BD show the ordination analyses from SNVs and indels, respectively. Scores for the taxa calculated from the largest two axes (Axis 1 and Axis 2) describe 75.6% of the variance in SNV’s and 73.2% of the variance in indels (Fig. 3BD).
Mitochondrial genome
Genome assembly workflow
A 46,192 bp contig from the assembly mapped onto the CanFam reference mtDNA. It constituted a repeat of approximately 2.76 copies of the mtDNA. Following additional polishing and circularization, a final 16,719 bp mtDNA genome was extracted and has been uploaded to GenBank (OP476512).
Comparison of dingo mtDNA genomes
When the mtDNA genome of Alpine dingo Cooinda is compared with that of Desert dingo there is a single 10bp SV in the control region that highlights the repeat number difference. In the former, there are 28 repeats (RCGTACACGT) ACGTACGCGCGT, while in the latter, there are 29. Potentially the R(G or A) could represent heteroplasmy [50] that may be further studied with single cell sequencing approaches [51]. Folding this region [52] shows that increasing repeat number increases stem length and overall stability (Supplementary Fig. 7).
Next, we conducted a network analysis in Popart [53] to determine whether the mtDNA of dingo Cooinda fell within the previously described dingo southeastern or northwest clade (Fig. 4) [19, 22]. We included dingo mtDNA from four previous studies, a New Guinea Singing Dog, and an ancient Iron Age dog from Taiwan [6, 22, 54-56]. There were 89 segregating sites and 32 parsimony informative sites in the dataset. Predictably, there were no differences between the mtDNA genome of Cooinda and that previously published from her brother Typia [54]. Further, as expected, Cooinda and Typia mtDNA clustered with samples that had previously been collected from the Alpine region (Fig. 4). Somewhat unexpectedly, the mtDNA from Sandy the dingo found in the desert [6] did not cluster with dingoes from the northwest clade but was closer to canids in the southeastern clade (Fig. 4). This relationship could imply the introgression of Alpine alleles into the Sandy genome however further work would be needed to confirm this.
DNA methylome
To explore the regulatory landscape of dingo Cooinda, we performed whole genome bisulfite sequencing [58] on genomic DNA extracted from whole blood. In concordance with other adult vertebrates [59, 60], the Cooinda genome displays a typical bimodal DNA methylation pattern. Over 70% of CpG dinucleotides are hypermethylated (levels higher than 80%), and 5% of CpG dinucleotides hypomethylated (methylated at 20% or lower) (Supplementary Fig. 8A).
Next, to determine the number and genomic distribution of putative regulatory regions, we segmented the methylome into unmethylated regions (UMRs) and low-methylated regions (LMRs) using MethylSeekR [61]. UMRs are fully unmethylated and largely coincide with CpG island promoters, whereas LMRs display partial DNA methylation, characteristic of distal regulatory elements such as enhancers in other mammalian models [62]. MethylSeekR analysis identified ∼ 19,000 UMRs and ∼44,000 LMRs in line with previously reported numbers of promoters and enhancers (e.g., human: ∼18,000-20,000 UMRs and 40,000-70,000 LMRs; mouse: ∼17,000-19,000 UMRs and 55,000-90,000 LMRs) [61, 63] (Supplementary Fig. 8BC).
To establish whether proximal gene regulatory regions in the dingo Cooinda genome display different methylation states in the Desert dingo, we converted Cooinda UMR coordinates from Cooinda to the Desert dingo genome assembly using LiftOver (see Methods). Next, we calculated average DNA methylation at Cooinda UMRs and their corresponding lifted-over regions in the Desert dingo genome. We found two UMRs in the Cooinda dingo were hypermethylated in the Desert dingo. These regions overlapped gene bodies of glucagon receptor gene GCGR and histone deacetylase HDAC (Supplementary Fig. 8DE). GCGR is on chromosome 9 and has a single transcript. This transcript is 99.8% identical at the amino acid level between the dingoes. HDAC4 occurs on chromosome 25 and has 12 transcripts with all 12 transcripts being 100% identical at the amino acid level. Further studies are needed to determine the functional significance of the observed differences in DNA methylation.
Altogether, this data provides a genome-wide resource for the putative gene regulatory regions in the Alpine dingo genome, which will be instrumental for future studies.
Morphology
Skull Morphometrics
Cranial morphology (Supplementary Fig. 9A), quantified using 3D geometric morphometric landmarks, is that of a typical adult female Alpine dingo (Fig. 5). Within the morphospace defined by the principal components explaining the greatest variation between specimens (PC1, PC2), dingo Cooinda’s position is clearly within the Alpine cluster (Fig. 5A). Alpine and Desert dingoes are most clearly differentiated from one another along PC1 (15.70%), for which increasing values describe crania with relatively shorter and broader rostra, shallower orbitals with broader zygomatic arches at the glenoid fossa, prominent and anteriorly-positioned frontals, a higher cranial vault, and prominent sagittal cresting tending to terminate in a high, posteriorly-positioned occiput (inion). Positive values along PC2 (10.60%) mainly denote relatively gracile crania with posteriorly-angled frontals, poorly-developed sagittal cresting, downward-sloping posterior calvarium and a low occipital termination. The sampled Alpine and Desert groups exhibit a near-identical range of PC2 values. As the development of the sagittal cresting, calvarium shape and occipital prominence are related to age and sex, with these traits tending to be more robust and well-developed in males and older dingoes [64], the shared PC2 values across Alpine and Desert groups likely reflect related demographic variation within the respective populations. Within each population (Alpine, Central Desert, Western Desert), males and females overlapped in in their position along PC2 (Supplementary Fig. 9), indicating an absence of strong dimorphism associated with the major axes of shape variance. Despite considerable overlap, PC2 scores tended to be lower in females compared to males in the Alpine and Western Desert populations (see Supplementary Fig. 9, Supplementary Table 3).
The regression of cranial shape (Procrustes shape variables) on log centroid size (Procrustes shape variables ∼ log(centroid size)) revealed that size contributed significantly to shape variance in the sample (3.91% variance, p <0.001). Size was found to have a non-significant effect on the morphological trajectory described by PC1, which separates Alpine and Desert dingo populations (Fig. 1C), with only 1.23% of related shape-change predicted by centroid size (p = 0.124). Conversely, size predicted 19.88% of shape-change associated with PC2 (p < 0.0001). Alpine and Desert dingo populations share overlapping scores along PC2, and variation along this axis reflects intra-population variability in demographic makeup (age, sex) that should be expected within a natural population. As such, size differences play very little to no role in determining Cooinda’s morphological relationship to Desert dingoes but are important to her position in the Alpine group (Supplementary Fig. 10BC). The low proportion of variation captured in each principal component is a previously-noted feature of the dingo cranial landmark dataset [65] and is unrelated to allometry.
Brain imaging
To supplement the morphological data, we quantified brain size. Using a thresholding approach, we used the software 3D Slicer [66] to segment the whole brain as the region of interest. Despite the canids being of very similar size the dingo brain (75.25cm3) was 20% larger than the dog brain (59.53 cm3) (Fig. 5B).
Discussion
Domestication has received much attention from diverse fields, reflecting the complexity of the process and variation in its duration and intensity [5]. A notable gap in our understanding of the principles of domestication has been the identification of a model system to test Darwin’s two-step predictions [2]. Here we provide the necessary groundwork to explore the potential for dingoes to be a functional and evolutionary intermediate between wild wolves and domestic dogs. One alternate hypothesis is that the process of domestication does not proceed in a stepwise manner [4], but is continual process that represents an intensification of the relationship between a wild species and humans [5].
In this study we compare our high-quality chromosome-level de novo assembly of the dingo Cooinda genome with that of the Desert dingo [6], seven domestic dogs [27, 28, 36, 38-40] and the Greenland Wolf [41]. Relative to the wolf and the domestic breeds the Australasian dingo ecotypes are monophyletic. Future studies may include ancient dingo and south east Asian specimens [3], the New Guinea Singing dog [4] and Chinese indigenous dogs [4].
Ancient specimens have potential to give insight into the evolutionary history of dingoes [3] and further instruct the influence of domestic dog admixture [17]. New Guinea Singing Dog may be the sister group to a monophyletic dingo lineage or perhaps more closely related to the Alpine ecotype as suggested by the mtDNA network analyses [19] and cranial shape studies [65]). Inclusion of Chinese indigenous dogs will facilitate determination of the relationships among crown domestic dog breeds [4] and thereby facilitate determination of the divergence date of dingoes and modern dogs.
Multiple large scale chromosomal inversions occur between the two dingo assemblies. There are two large rearrangements on chromosome 16 and likely structural events on Chromosomes 11, 25 and 26 (Supplementary Figs 7, 8). It is also possible that there are multiple small inversions on the X chromosome. It is important to determine the frequency of these events and whether breakpoints affect any regulatory regions or protein coding genes.
Inversions may maintain locally adapted ecotypes, while breakpoints may disrupt regulatory regions or protein coding genes. Hager et al. [67] discovered a 41-megabase chromosomal inversion that characterized defining traits of deer mice (Peromyscus maniculatus) and implicated divergent selection in maintaining distinct ecotypes in the wild despite high levels of gene flow. An inversion disrupting FAM134b has been associated with sensory neuropathy in Border Collie dogs [68].
There is a single copy of AMY2B in both dingo genomes; however, they differ by a 6.4 kb retrotransposon insertion present in the Desert dingo. As the retrotransposon is absent in the Greenland wolf and Alpine dingo it would seem likely that the retrotransposon has inserted into the Desert dingo and domestic dog lineages independently. LINE elements can generate duplications through an RNA intermediate and have been associated with amylase expansions in a range of species from humans to mice and rats to dogs [69, 70]. A 1.3kb canid-specific LINE element in domestic dogs is associated with each amylase copy [70].
This expansion is predicted to increase the ability to digest starch [6, 71]. Field et al. [28] compared the influence of AMY2B copy number on the microbiomes of dingoes and German Shepherd dogs. They observed distinct and reproducible differences that they hypothesized may influence feeding behaviors. Further studies on AMY2B may be fruitful as copy number may be an ecologically relevant mechanism to establish the role of a canid in the ecosystem.
Both dingo ecotypes exhibited low variation on the X chromosome, although it could be argued that variation along the chromosome is not uniform (Fig. 2). Theoretical models predict that genes on the X chromosome can have unusual patterns of evolution due to hemizygosity in males. Sex chromosomes are predicted to exhibit reduced diversity and greater divergence between species and populations compared to autosomes due to differences in the efficacy of selection and drift in these regions [72, 73]. In canids, Plassais et al. [74] show genetic variation in three genes on the X chromosome is strongly associated with body size. Further studies of genetic variation of genes on the X chromosome within and between ecotypes are likely informative.
We integrate the mtDNA genome assembly data with that previously collected from 29 canids in Australasia [6, 22, 54-56]. The mitochondrial genome has been used to infer historical events in various species including canids, but the D-loop region has been difficult to align. Here we show that the region can be folded to increase structural stability with repeat number (Supplementary Fig. 8AB). We found 28, 10-bp repeats in dingo Cooinda compared to 29 in the Desert dingo. The function of the proposed structures is unknown.
Still, folding the region into an extended repeat-dependent stem is expected to decrease the time the DNA in the D-loop is single-stranded during replication. More speculatively, the structure may have a regulatory function that influences mitochondrial bioenergetics and the evolution of mtDNA [75]. Björnerfeldt et al. [76], found that domestic dogs have accumulated nonsynonymous changes in mitochondrial genes at a rate faster than wolves implying a relaxation of selective constraint during domestication.
Phylogenetic and network analyses show that dingo Cooinda has the dingo southeastern Australian mtDNA type of the canine A1b4 subhaplogroup. This southeastern type has been proposed to originate in southern China and includes dogs from Papua New Guinea [19, 22]. Based on mtDNA data, Zhang et al. [19] propose that the TMRCA for most dingoes dates to 6,844 years ago (8,048–5,609 years ago). This estimate is about 3,000 years older than the first known fossil record [77] suggesting that at least two dingo mtDNA haplotypes colonized Australia or older fossil records of dingoes in Australia have yet to be found.
Next, we compare the regulatory landscape of Cooinda dingo with that previously published for the Desert dingo. In comparison to the Alpine dingo, the glucagon receptor gene GCGR and HDAC4 are hypermethylated in the Desert dingo suggesting the potential for dietary or immune differences between ecotypes. Highly methylated gene promoters often indicate a transcriptionally repressed state, while unmethylated gene promoters specify a permissive state [78]. Field et al. [6] previously proposed differences in the feeding behavior of dingoes and wild dogs linked to their AMY2B copy number. GCGR is activated by glucagon and initiates a signal transduction pathway that begins with the activation of adenylate cyclase, which in turn produces cyclic AMP. Glucagon is considered the main catabolic hormone of the body and is central to regulating blood glucose and glucose homeostasis [79]. In mice, glucagon has anti-inflammatory properties [80]. HDAC4 is a member of the ubiquitously important family of epigenetic modifier enzymes and has been implicated in processes related to the formation and function of the central nervous system and metabolism. HDAC4 acts as a regulator of pattern-recognition receptor signaling and is involved in regulating innate immune response [81]. In humans, mutations in HDAC4 have been linked with eating disorders [82]. Overlapping conserved Nanopore/PacBio structural variants with these genes identified no variants within GCGR and a single 35bp intronic insertion in HDAC4. The functional impact (if any) of this insertion is unknown.
Dingo Cooinda’s cranial morphology is consistent with the Alpine ecotype from the 20th century. As the first cranial morphological assessment of an Alpine dingo considered to be “pure” by genomic verification, this result is significant in that it suggests that the phenotypic distinctiveness of Alpine dingoes from Desert dingoes is not exclusively the result of recent domestic dog ancestry. Dog admixture has been the predominant explanation given [83] primarily based on the fact that such ancestry is relatively enriched in the southeast region of Australia compared to the north and west [84, 85]. An alternative explanation is that the Alpine and Desert dingoes represent distinct evolutionary lineages. Koungoulos [65] suggested that the cranial shape of Alpine and other southeastern dingoes shares broad similarities with that of New Guinea Singing Dogs and is distinct from the more widespread northwestern lineage [22]. However, these two scenarios are not mutually exclusive. Most introgression likely occurs when a female dingo mates with a male domestic dog. In such cases, extensive backcrossing will not exclude the domestic dog Y. Therefore, examining the Y chromosome of males shown to be pure with the current battery of nuclear-encoded microsatellites will illuminate genetic history. A combination of direct radiocarbon dating, genetic sequencing and morphometric assessment for subfossil material will provide a more confident picture of the nature of change or continuity between ancient and modern Alpine dingoes.
Finally, we supplement our morphological data with magnetic resonance and computed tomography data of Alpine dingo Cooinda’s brain. Her brain was 20% larger than the similarly sized domestic dog, which is consistent with the hypothesis that she was tamed but not domesticated [3] (Fig. 1C). Our brain imaging data are also compatible with prior comparisons that have used endocranial volume as a proxy for brain size, examining a small sample of dingoes (see Geiger et al. [86]) compared to wolves, domestic, basal and archaeological dogs [3]. Endocranial volume in a mixed sample of domestic dogs was shown to be around 30 cm3 smaller than in wolves and jackals [87, 88], which is greater than the 15.7 cm3 difference between the brains of Cooinda and the domestic dog sampled here.
Similarly, brain mass has been shown to be 28.8% smaller in a broad sample (>400) of domestic dogs as compared to wolves [87, 89], which also places the 20% difference between Cooinda and the domestic dog as less pronounced than is seen for comparisons with the wild counterpart (wolf). Brain size reductions are common among domesticated animals compared to their wild counterparts, having been observed across many species, including sheep, pigs, cats, and dogs [87, 90]. Smaller-sized brains, especially size reductions in regions of the forebrain involved in the fight-or-flight response, have been associated with tameness and reductions in fear-based response among domestic animals compared to wild animals [91].
These changes have also been linked to potential reductions in cognitive processing requirements associated with inhabiting anthropogenic environments with lower complexity [92, 93]. Moreover, brain size reductions appear to persist where domestic animals have re-entered a wild environment and exist as feralized animals, at least under certain circumstances [94-96], suggesting that prolonged past exposure to the human niche may be detectable in brain traits. An alternative hypothesis is that differences in brain size is due to environmental adaptation or perhaps Cooinda was an anomaly. Examination of brain size may represent a fruitful pathway for further investigation determining the status of the dingo as a potential feralized animal.
There are at least three possible explanations supporting the existence of two dingo ecotypes (Alpine and Desert). The first is they are ancient Asian lineages that have come into sympatry in Australia. One alternate hypothesis is that a single lineage spread through southeast Asia and then diverged in Australia. There are no major geographical divides in continental Australia, suggesting any differences may reside at the level of biological interactions or they are influenced by climate. In the former case, one possibility is that one or more inversions may maintain the ecotypes [67]. An intriguing alternate hypothesis is that responses to parasites or venomous animals may occur if there are genetic differences in the responses of the ecotypes. In Nigeria, population genomic analyses of 19 indigenous dogs identified 50 positively selected genes including those linked immunity that likely involve adaptations to local conditions [97]. Experimentally it has been shown that adaptation to different parasites or snakes can influence the invasion success of three-spined sticklebacks (Gasterosteus aculeatus) and may represent a barrier to gene flow, even between closely related connected populations [98]. In Australia, various parasites and venomous animals have broadly similar distributions to the Alpine ecotype, such as the paralysis tick (Ixodes holocyclus) and the red-bellied black snake (Pseudechis porphyriacus) [99].
Conclusions
Here we characterize dingo Cooinda and propose that she be considered the archetype for Australasian dingoes. Characterizing an archetype opens potential for testing Darwin’s [2] two-step model of domestication as an alternative to the hypothesis that domestication represents a continuum [5]. Under the scenario that the dingo has been unconsciously selected, we predict genomic signatures of tameness, as an outcome of unconscious selection [100-102]. Morphologically, we predict lowest shape variation in the rostrum and facial skeleton in the wolf (natural selection), intermediate in the dingo (unconscious selection) and highest in domestic breeds (artificial selection) (i.e., rank order wolf< dingo < modern breeds). Wild populations are more likely to show a narrow range of shape variation about a fitness optimum, whereas changed environmental conditions could support and promote the survival of forms that are farther from the adaptive peak. This is evidenced by earlier research that has shown cranial morphological variation in domestic dogs exceeds that exhibited by the Order Carnivora [26]. In terms of brain size, we predict the magnitude of relative brain size difference will be greater between dingoes and modern breeds than between wolves and dingoes (i.e., rank order wolf> dingo >> modern breeds). Brain size reduction is pronounced in artificial selection and associated with the lack of fear avoidance behavior in domesticates [103]. Dingoes do not show domesticate level reductions in ‘fight or flight’ response [29], and our initial data appear to be at least consistent with this based on the relative brain volume we report.
Methods
Sampling: Cooinda the dingo
In selecting an animal for the project, it was considered essential to select an individual that represented the Alpine ecotype, which is found around Sydney, New South Wales (NSW). The individual selected was bred at the Dingo Sanctuary Bargo, NSW, approximately 100km west of Sydney, and has been included in multiple previous studies [6, 29]. Cooinda is the litter sister to Typia from whom short read data had previously been obtained [54]. Cooinda’s parents (Mirri Mirri and Maka), her brothers Typia and Gunya and her were all ginger in color and determined to be pure by microsatellite testing [104]. Mirri Mirri and Maka were independently found in the Alpine region of New South Wales.
An aim of the study is to link genetic and morphological variation, so we provide a brief description of her here. As is typical of Alpine dingoes Cooinda was stocky in appearance with a brad skull and prominent eyes. She was light ginger in color, with dark brown eyes with white paws and chest (Fig. 1AB). Her double coat was not oily like many modern breed dogs and did not have a dog-like odor when wet. She had a pointed muzzle with a broad skull and hooded erect ears. She could turn her neck 180 degrees in any direction. She had lean muscular legs with a long bottle-shaped bushy tail. She weighed 22kg and stood 46cm at the withers. She did not have dewclaws and came into estrus annually. Dingo Cooinda had a loud and clear howl and did not have a modern-dog bark [105]. Cooinda died in 2019 at 10 years of age.
Chromosome-level genome assembly
DNA extraction and sequencing
Genomic DNA for the Pacific Bioscience Single Molecule Real-Time (SMRT) sequencing was prepared from 2 mL of fresh blood using the genomic-tip 100/G kit (Qiagen, Hilden, Germany). This was performed with additional RNase (Astral Scientific, Taren Point, Australia) and proteinase K (NEB, Ipswich, MA, USA) treatment following manufacturer’s instructions. Isolated gDNA was further purified using AMPure XP beads (Beckman Coulter, Brea, CA, USA) to eliminate sequencing inhibitors. DNA purity was calculated using a Nanodrop spectrophotometer (Thermo Fisher Scientific). Molecular integrity was assessed by pulse-field gel-electrophoresis using the PippinPulse (Sage Science) with a 0.75% KBB gel, Invitrogen 1kb Extension DNA ladder and 150 ng of DNA on the 9hr 10-48kb (80V) program. SMRTbell libraries with 20kb insert size were CLR sequenced on Sequel I machines with 2.0 chemistry. Sequencing included 18 SMRT cells with a total polymerase read length 94.25 Gb.
DNA for the Oxford Nanopore (ONT) PromethION sequencing DNA (1 µg) was prepared for ONT sequencing using the 1D genomic DNA ligation kit (SQK-LSK109, ONT) according to the standard protocol. Long fragment buffer was used for the final elution to exclude fragments shorter than 1000 bp. In total, 119 ng of adapted DNA was loaded onto a FLO-PRO002 PromethION flow cell and run on an ONT PromethION sequencing device (PromethION, RRID:SCR_017987) using MinKNOW (18.08.2) with MinKNOW core (v1. 14.2). Base-calling was performed after sequencing with the GPU-enabled guppy basecaller (v3.0.3) using the PromethION high accuracy flip-flop model with config ‘dna_r9.4.1_450bps_hac.cfg’.
For the 10X Genomics Chromium sequencing, DNA was prepared following the protocol described above for SMRT sequencing. A 10X GEM library was barcoded from high-molecular-weight DNA according to the manufacturers recommended protocols. The protocol used was the Chromium Genome Reagent Kits v2 (Document # CG00043 revision B). QC was performed using LabChip GX (PerkinElmer, MA, USA) and Qubit 2.0 Flurometer (Life Technologies, CA, USA). The library was run on a single lane of a v2 patterned flowcell. Sequencing was performed in 150bp paired-end sequencing mode on a single lane on the Illumina HiSeq X Ten platform with a version 2 patterned flowcell.
For the Bionano optical mapping high molecular weight (HMW) DNA was isolated from fresh blood (stored at 4°C) using the Bionano Prep Blood DNA Isolation Protocol following [28]. HMW DNA (∼190 ng/µL) was labelled (BNG, Part #20351) at DLE-1 recognition sites, following the Bionano PrepTM Direct Label and Stain Protocol (BNG, Document #30206 revision C). Labelled DNA was loaded directly onto Bionano Saphyr Chips (BNG, Part #20319), without further fragmentation or amplification, and imaged using a Saphyr instrument to generate single-molecule optical maps. Multiple cycles were performed to reach an average raw genome depth of coverage of 180X.
For the Hi-C sequencing the assembly was scaffolded to chromosome-length by the DNA Zoo following the methodology described here: www.dnazoo.org/methods. Briefly, an in situ Hi-C library was prepared [106] from a blood sample of the same female and sequenced to 29X coverage (assuming 2.6 Gb genome size).
Workflow
For the initial assembly, The SMRT and ONT reads were corrected and assembled with the Canu assembler (Canu, RRID:SCR_015880; v1.8.0) [31] with the command “canu correctedErrorRate=0.105 corMhapSensitivity=normal corOutCoverage=100 -p Cooinda -d assembly genomesize=2.3g -pacbio-raw Cooinda_SMRT_ONT_combined.fasta. The resulting contigs were polished with two rounds of the Arrow pipeline, each consisting of aligning the raw SMRT reads to the assembly with pbmm2 (https://github.com/PacificBiosciences/pbmm2) and correcting the sequencing errors using gcpp [32].
The Arrow-polished SMRT/ONT assembly was scaffolded using Alpine dingo 10X linked-reads as in ARCS [107]. The 10X data was aligned using the linked-read analysis software provided by 10X Genomics, Long Ranger, v2.1.6 [108]. Misaligned reads and reads not mapping to contig ends were removed, and all possible connections between contigs were computed keeping best reciprocal connections. Finally, contig sequences were joined, spaced by 10kb with stretches of N’s, and if required reverse complemented.
To further improve the assembly, another round of polishing was performed by aligning the Illumina short reads from the 10X Chromium sequencing to the assembly using minimap2 [109] (v2.16) and correcting the sequencing errors using Racon (Racon, RRID:SCR_017642; v1.3.3) [110].
The Hi-C data was processed using Juicer (Juicer, RRID:SCR_017226) [111], and used as input into the 3D-DNA pipeline [112] to produce a candidate chromosome-length genome assembly. We performed additional curation of the scaffolds using Juicebox Assembly Tools [113].
After scaffolding and correction, all raw SMRT and ONT reads were separately aligned to the assembly with Minimap2 (v2.16) (-ax map-pb/map-ont) [109]. The combined alignments were used by PBJelly (pbsuite v.15.8.24) [114] for one round of gap filling.
Following scaffolding, another round of polishing was done to further improve the assembly. Polishing was performed by aligning the Illumina short reads from the Chromium sequencing to the assembly using Long Ranger v2.2.2 and correcting the SNVs and indels using Pilon (Pilon, RRID:SCR_014731) [33].
The Pilon-polished genome underwent a final scaffold clean-up using Diploidocus as described in Edwards et al. [27] to generate a high-quality core assembly, remove low-coverage artefacts and haplotig sequences, and filter any remaining vector/adapter contamination. This reduced the final number of scaffolds to 632 (780 contigs), including the mtDNA.
Assembly completeness was evaluated using BUSCO v5.2.2 [37] short mode against the Carnivora_ob10 data set (n=14,502) implementing BLAST+ v2.11.0 [115], HMMer v3.3 [116], and Metaeuk v20200908 [117]. “Complete” BUSCO genes with available sequences were compiled across Alpine dingo Cooinda and nine canid genomes (Desert dingo [6], two Basenji’s (China and Wags) [27], two German shepherd dogs (Nala and Mischa) [28, 36], Great Dane [38], Labrador [39], Dog10K Boxer [40], and Greenland Wolf [41]) using BUSCOMP v1.0.1. Additional kmer-based assembly completeness and quality evaluations were performed using Merqury v21.3 [42] from the 10x reads.
Chromosome mapping and variation
Chromosome mapping was completed in 2019 using the CanFam v3.1 reference genome downloaded from Ensembl (GCF_000002285.3 [118]). Full length chromosomes were renamed with a CANFAMCHR prefix and used for reference mapping. The final Cooinda Alpine dingo genome assembly was mapped onto the CanFam3.1 reference genome using Minimap2 v2.16 [109] (-x asm5 --secondary=no --cs) to generate PAF output. Scaffolds were assigned to CanFam3.1 chromosomes using PAFScaff v0.2.0 [119] based on Minimap2-aligned assembly scaffold coverage against the reference chromosomes. Scaffolds were assigned to the chromosome with highest total coverage. Scaffolds failing to map onto a chromosome were rated as “Unplaced”.
Comparison of Alpine and Desert dingo genomes
To investigate the variation between the dingo ecotypes we used Circos [43]. Circos uses a circular ideogram layout to facilitate the display of relationships between the genomes using ribbons, which encode the position and number of SNV’s, small indels and large indels for each of the 38 autosomes and the X chromosome. SNV and indel numbers were calculated using MUMmer4 ‘show-snp’ script following pairwise alignments [44] (v4.0.0 beta 2).
Synteny plot between the Alpine and published Desert dingo assembly [6] was conducted using GenomeSyn [47]. With GenomeSyn the position of the genome is indicated by a black horizontal ruler with tick marks. Syntenic blocks between the genomes are displayed as light grey regions with white illustrating non-syntenic regions. Inversions are represented by red-brown curves.
We used GeMoMa v1.6.2beta [48] to further investigate whole chromosomal events. Here we mapped genes onto the Alpine Dingo assembly following previously described protocols [28]. Subsequently, we checked the synteny of the genes in the reference genome and the target genome using the module GeMoMa module SynthenyChecker. This module uses the GeMoMa annotation with information for reference gene and alternative to determine the best homolog of each transcript. Comparing the order of genes in the reference and the target genome, it allows to determine breakpoints of chromosomal events.
Phylogenetic analyses
All 39 full-length chromosomes in the final assembly were aligned to the corresponding chromosomes in nine published canine de novo genome assemblies (Desert dingo [6], two basenjis (China and Wags) [27], two German shepherd dogs (Nala and Mischa) [28, 36], Great Dane [38], Labrador [39], Dog10K Boxer [40], and Greenland Wolf [41]) using MUMmer4 [44]. SNVs and small indels (deletions and insertions <50bp) were called using MUMmer4 call-SNPs module for all possible pairings (Supplementary Table 2). Copy number (CNV) and SVs were also called using svmu (v0.2) [120] however these were not included in the phylogeny. SNV’s and indels were analyzed separately. Distance matrices were generated from the inter-canid differences in SNV’s and indels and then transformed to WA distance [49]. Glazko et al. [49] report WA has better phylogenetic properties against normalization of genome sizes than other coefficients.
Phylogenetic analyses using maximum parsimony were generated from the R-package ‘phangorn’ version 2.8.1 [121]. The analyses were run as unrooted networks to test the hypothesis that the wolf was the outgroup. To test the stability of the nodes, a Bayesian bootstrap was applied to the original distance matrix using the program bayesian_bootstrap (github.com/lmc2179/bayesian_bootstrap) and the phylogenetic analysis was re-calculated. This process was iterated 500,000 times. The consensus phylogenetic trees were rooted on the branch leading to wolf, the values indicate the percentage of times that a node occurred. The Y-axis and branch lengths were rescaled to the original number of differences in SNV’s and indels among the taxa. The retention index that measures the fit of the network to the distance matrix exceeded 94% for all 500,000 trees of SNVs and indels.
Non-metric multidimensional scaling (NMDS) was calculated from the distance matrices and scores for the taxa calculated from the largest two axes. Minimum spanning trees were calculated among the scores in NMDS space. NMDS and minimum spanning trees were calculated in Past 4.04 [122].
Mitochondrial genome
Genome assembly workflow
A 46,192 bp contig from the assembly mapped onto the CanFam reference mtDNA (NC_002008.4), constituting a repeat of approx. 2.76 copies of the mtDNA. The CanFam mtDNA was mapped onto this contig using GABLAM v2.30 [123] and full-length mtDNA copy with highest similarity to CanFam mtDNA was extracted along with 8 kb each side. PacBio reads were mapped onto this mtDNA contig using minimap2 v2.22 [109] and 10x linked reads mapped using BWA v0.7.17 [124] for polishing with HyPo v1.0.3 [125] (32.7 kb assembly size at 673X coverage). The CanFam mtDNA was re-mapped onto the polished assembly using GABLAM v2.30.5 [123] and a 16,719 bp sequence extracted, starting at position 1 of the CanFam sequence. The mtDNA was annotated with the MITOS2 server [126] for submission to NCBI GenBank (accession: OP476512).
Comparison of dingo mtDNA genomes
The mtDNA genome of Alpine dingo Cooinda was compared with the Desert dingo [6]. Direct observation of the D-loop region in the two dingoes suggested there was a 10bp repeat and the canids differed in the number of repeats. Imperfect tandem repeats have previously been reported in canids [50]. The D-loop region in Alpine dingo Cooinda was folded using the program mfold [52] to determine ay underlying structures.
To test whether the mtDNA from dingo Cooinda fell within the previously described SE clade we compared the assembly with 33 other canids, including dogs from New Guinea and Taiwan [6, 22, 54, 55]. In this case multiple large gaps were in some of the ancient samples, so the initial assembly was modified based on the predicted secondary structure folding. A inter neighbor-joining network analysis with α = 0.5 was completed in POPART [53]. A limitation of this analyses is that large sections of multiple mtDNA’s were unknown, so it was not possible to distinguish deletions from missing data. Understanding these differences may be biologically important, particularly if the predicted folding of the D-loop region is biologically significant.
DNA methylome
MethylC-seq library preparation
Genomic DNA was extracted from whole blood using DNeasy Blood & Tissue kit (Qiagen, USA). MethylC-seq library preparation was performed as described previously [127].
Briefly, 1 ug of genomic DNA was sonicated to an average size of 300 bp using a Covaris sonicator. Sonicated DNA was then purified, end-repaired and 3’-adenylated followed by the ligation of methylated Illumina TruSeq sequencing adapters. Library amplification was performed with KAPA HiFi HotStart Uracil+ DNA polymerase (Millenium Science Pty Ltd).
MethylC-seq data analysis
The methylome library was sequenced on the Illumina HiSeq X platform (150 bp, PE), generating 377M reads. Sequenced reads in fastq format were trimmed using the Trimmomatic software (ILLUMINACLIP:adapter.fa:2:30:10 SLIDINGWINDOW:5:20 LEADING:3 TRAILING:3 MINLEN:50). Trimmed reads were mapped (GCA_012295265.2_UNSW_AlpineDingo_1.0_genomic.fna genome reference, containing the lambda genome as chrLambda) using WALT with the following settings: -m 10 -t 24 -N 10000000 -L 2000. Mapped reads in SAM format were converted to BAM format; BAM files were sorted and indexed using SAMtools. Duplicate reads were removed using Picard Tools v2.3.0. Genotype and methylation bias correction were performed using MethylDackel (MethylDackel extract dingo_lambda.fasta $input_bam -o $output --mergeContext -- minOppositeDepth 5 --maxVariantFrac 0.5 --OT 10,140,10,140 --OB 10,140,10,140). The numbers of methylated and unmethylated calls at each genomic CpG position were determined using MethylDackel (MethylDackel extract dingo_lambda.fasta $input_bam -o output –mergeContext). Segmentation of hypomethylated regions into CpG-rich unmethylated regions (UMRs) and CpG-poor low-methylated regions (LMRs) was performed using MethylSeekR (segmentUMRsLMRs(m=meth, meth.cutoff=0.5, nCpG.cutoff=5, PMDs = NA, num.cores=num.cores, myGenomeSeq=build, seqLengths=seqlengths(build), nCpG.smoothing = 3, minCover = 5).
Cooinda UMR coordinates were converted to the Desert dingo genome assembly using LiftOver following genomewiki.ucsc.edu pipeline (http://genomewiki.ucsc.edu/index.php?title=Minimal_Steps_For_LiftOver). Briefly, the query (Desert dingo) genome build was split into individual scaffolds using faSplit (i). The we performed pairwise sequence alignment of query sequences from (i) against the Cooinda genome build using BLAT, Then, coordinates of .psl files were changed to parent coordinate system using liftUp and alignments were chained together using axtChain. Chain files were combined and sorted using chainMergeSort; alignment nets were made using chainNet.
Finally, liftOver chain file was created using netChainSubset. Cooinda UMRs in .bed format were lifted over to Desert dingo genome assembly using created liftOver chain file. Average methylation was calculated for Cooinda UMRs and compared to that of corresponding lifted-over regions in the Desert dingo genome. Cooinda UMRs with >50% methylation increase in Desert dingo genome were considered as hypermethylated in the Desert dingo.
Morphology
Skull Morphometrics
To examine cranial morphology, we obtained a 3D model of Cooinda’s cranium using an Artis Pheno Computed Tomography (CT) Scanner. The skull was damaged slightly when the brain was extracted, so the damaged region (dorsal part of the calvarium) was reconstructed using Blender to reassemble the separated fragment following guidelines for digital specimen reconstruction outlined by Lautenschlager [128] (Supplementary Fig. 10A). Geometric morphometric landmarks (n=45) were collected on the 3D cranial model using Stratovan Checkpoint (Stratovan Corporation, Davis, CA version 2018.08.07) and analyzed with MorphoJ [129], following the landmarking protocol used for dingo crania by Koungoulos [65]. This approach uses 45 landmarks along the left side of the cranium, covering all major anatomical features and regions, excepting a few fragile processes which are frequently lost in prepared specimens (Supplementary Fig. 11; Supplementary Table 4). The cranial landmarks collected on the Cooinda cranium were incorporated into an existing data set comprising 91 Alpine dingoes and 101 Desert dingoes [65] and subject to Procrustes superimposition to remove all non-shape differences, due to translation, rotation and scaling [130]. The resultant Procrustes shape variables were ordinated using Principal Component Analysis (PCA) to assess the cranial morphology of Cooinda in relation to other dingoes. To assess the impact of allometry on cranial shape variation in the sample, a regression of Procrustes shape variables against log centroid size was performed using MorphoJ [129].Residuals were extracted from this regression and ordinated using PCA (see Supplementary Material).
Brain imaging
Cooinda’s brain and that of a domestic dog (Kelpie) of the same body size were extracted. Brains of these animals, which died within 2 weeks of each other, were fixed in Sigma-Aldrich 10% Neutral Buffered Formalin (NBF) after extraction and were washed with Gd DTPA (gadolinium-diethylenetriamine pentaacetic acid) solution prior to imaging. Brains were scanned using high-resolution magnetic resonance imaging (MRI). A Bruker Biospec 94/20 9.4T high field pre-clinical MRI system was used to acquire MRI data of a fixed dingo and domestic dog brain. The system was equipped with microimaging gradients with a maximum gradient strength of 660mT/m and a 72mm Quadrature volume coil. Images were acquired in transverse and coronal orientation using optimized 2D and 3D Fast Spin Echo (FSE) and Gradient Echo (MGE) methods. Image resolution was 200×200×500 and 300×300 microns isotropic for type 3D and 2D pulse sequences, respectively. To quantify brain size, we used the open-source software 3D Slicer “Segment Statistics” module [66]. The software considers the pixel spacing and slice thickness set to calculate the volume accurately. The threshold was empirically set to the grayscale intensity 1495, where everything below that is background, and ventricles and everything above that is the brain.
Availability of supporting data and materials
The chromosomal assembly is available at GCA_012295265.
The mtDNA and has been submitted to NCBI GenBank (accession: OP476512). The methylation data is available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE212509. The 3D Cranial landmark data are available on Figshare at https://figshare.com/articles/dataset/Cooinda_Alpine_Dingo_3D_Cranial_Landmarks/205230
The raw Dicom data for the magnetic resonance imaging (MRI) of the Alpine dingo and domestic dog brain are available on Figshare at https://figshare.com/articles/dataset/Dicom_data_MRI_Alpine_dingo_and_domestic_dog_bra in/20514693.
Ethics approval and consent to participate
All experimentation was performed under the approval of the University of New South Wales Ethics Committee (ACEC ID: 16/77B).
Competing interests
The authors declare that they have no competing interests.
Funding
This work was supported by an Australian Research Council Discovery award to J.W.O.B. (DP150102038). M.A.F. is funded by NHMRC APP5121190. M.A.F. is supported by a National Health and Medical Research Council fellowship (APP5121190). L.A.B.W. is supported by an Australian Research Council Future Fellowship (FT200100822). E.L.A. was supported by the Welch Foundation (Q-1866), a McNair Medical Institute Scholar Award, an NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375), a US-Israel Binational Science Foundation Award (2019276), the Behavioral Plasticity Research Institute (NSF DBI-2021795), NSF Physics Frontiers Center Award (NSF PHY-2019745), and an NIH CEGS (RM1HG011016-01A1). Hi-C data were created by the DNA Zoo Consortium (www.dnazoo.org). DNA Zoo is supported by Illumina, Inc.; IBM; and the Pawsey Supercomputing Center. The Ramaciotti Centre for Genomics acknowledge infrastructure funding from the Australian Research Council (LE150100031), the Australian Government NCRIS scheme administered by Bioplatforms Australia, and the New South Wales Government RAAP scheme.
Author contributions
JWOB coordinated the project and wrote the initial draft. MAF performed variation analyses. BDR and RJE performed and assisted with the genome assembly, polishing and KAT analysis. LABW and LGK undertook cranial imaging and LGK collected cranial morphometric data. The DNA Zoo initiative, including OD, AO, EA performed and funded the Hi-C experiment. OD and ELA conducted the Hi-C analyses. BC performed the phylogenomic analyses. JK performed the GeMoMa analyses including gene order predictions. OB and KS performed and funded the whole genome bisulphite sequencing and analysis. Eva Chan and Vanessa Hayes collected the Bionano data and performed the analyses. Rob Zammit obtained the initial blood samples and extracted the brain. All authors edited and approved the final manuscript. All authors edited and approved the final manuscript.
Acknowledgements
Comments from four reviewers improved the manuscript. We would like to thank Luci Ellem, and Dingo Sanctuary Bargo for providing frequent access to Cooinda. Picture of Cooinda was taken by Luci Ellem. Staff at the Vineyard Veterinary Hospital provided constant encouragement. Mike Archer suggested the usage of the term “archetype” and we thank him for valuable taxonomic discussions. Richard Melvin conformed the purity of Cooinda using microsatellites. We thank Shyam Gopalakrishnan and Simon Ho for discussions and Hauke Koch for assistance with translation. SMRT sequencing was conducted at the Ramaciotti Center for Comparative Genomics at University of New South Wales (UNSW). The ONT, 10X Chromium and Bionano genomics data were collected within the Kinghorn Centre for Clinical Genomics at the Garvan Institute of Medical Research, Sydney, Australia and the Hi-C data at Baylor College of Medicine. The high field pre-clinical MRI system was located at the Biological Resources imaging Laboratory at UNSW. Thanks to Jiaming Song for the GenomeSyn analyses, Mihwa Lee for help with DNA folding and Tim Smith for synteny plots. Bootstrapping was on the Wesleyan computing cluster. Thanks go to the facilities of Sydney Imaging at the University of Sydney, and the expertise of Pranish Kolakshyapati in generating the Artis Pheno CT scans of Cooinda’s cranium. Finally, we thank Sandy Ingelby and Harry Parnaby of the Australian Museum for their assistance in facilitating scans of Cooinda’s cranium.
Footnotes
matt.field{at}jcu.edu.au
Richard.edwards{at}unsw.edu.au
Laura.Wilson{at}anu.edu.au
lkou2342{at}uni.sydney.edu.au
ben.rosen{at}usda.gov
B.chernoff{at}wesleyan.edu.
Olga.Dudchenko{at}bcm.edu, erez{at}erez.com, arinaomer{at}gmail.com,
Jens.keilwagen{at}julius-kuehn.de
k.Skvortsova{at}garvan.org.au, o.bogdanovic{at}gmail.com, eva.chan{at}health.nsw.gov.au, vanessa.hayes{at}sydney.edu.au,
razammit{at}me.com
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE212509
https://figshare.com/articles/dataset/Dicom_data_MRI_Alpine_dingo_and_domestic_dog_brain/20514693
Abbreviations
- BLAST
- Basic Local Alignment Search Tool
- BMG
- Bionano Genomics
- bp
- base pairs
- BUSCO
- Benchmarking Universal Single-Copy Orthologs
- CHD
- Canine hip dysplasia
- d.p
- decimal point
- CNV
- Copy number variant
- gDNA
- genomic DNA
- GSD
- German Shepherd Dog
- HMM
- hidden Markov model
- HME
- High Molecular Weight
- ONT
- Oxford Nanopore Technologies
- ORF
- open reading frame
- PacBio
- Pacific Biosciences
- PCR
- polymerase chain reaction
- qPCR
- quantitative polymerase chain reaction
- RNA-seq
- RNA sequencing
- s.f.
- significant figure
- SMRT
- single-molecule real time
- SNV
- single-nucleotide variant
- SV
- structural variant
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.
- 21.
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.↵
- 92.↵
- 93.↵
- 94.↵
- 95.
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.
- 102.↵
- 103.↵
- 104.↵
- 105.↵
- 106.↵
- 107.↵
- 108.↵
- 109.↵
- 110.↵
- 111.↵
- 112.↵
- 113.↵
- 114.↵
- 115.↵
- 116.↵
- 117.↵
- 118.↵
- 119.↵
- 120.↵
- 121.↵
- 122.↵
- 123.↵
- 124.↵
- 125.↵
- 126.↵
- 127.↵
- 128.↵
- 129.↵
- 130.↵