Abstract
Background The time required for sequencing and de novo assembly of genomes is highly dependent on the interaction between laboratory work, sequencing capacity, and the bioinformatics workflow. As a result, genome projects are often not only limited by financial, computational, and sequencing platform resources, but also delayed by external sequencing service providers. By bringing together academic biodiversity institutes and a medical diagnostics company with extensive sequencing capabilities and know-how, we aimed to generate a high-quality mammalian de novo genome assembly in the shortest possible time. We therefore streamlined all processes involved and chose a very fast dog as a model: the Whippet.
Findings We present the first chromosome-level genome assembly of the Whippet. We used PacBio long-read HiFi sequencing and reference-guided scaffolding to generate a high-quality genome assembly. The final assembly has a contig N50 of 55 Mbp and a scaffold N50 of 65.7 Mbp. The total assembly length is 2.47 Gbp, of which 2.43 Gbp were scaffolded into 39 chromosome-length scaffolds. In addition, we used available mammalian genomes and transcriptome data to annotate the genome assembly. The annotation yielded 28,383 transcripts with a BUSCO completeness of 90.9% and identified a repeat content of 36.5%.
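The summary statistics above (N50, scaffolded fraction, repeat content) follow standard definitions. The sketch below illustrates those definitions only; it is not the authors' pipeline. The function names `n50` and `repeat_fraction` are hypothetical, the toy lengths and intervals are invented for illustration, and `repeat_fraction` assumes half-open `(start, end)` repeat intervals on a single sequence.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that contigs (or scaffolds) of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def repeat_fraction(intervals, genome_length):
    """Fraction of a sequence covered by repeat annotations.

    Assumes half-open (start, end) intervals on ONE sequence; overlapping
    or touching intervals are merged so that no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it, start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


if __name__ == "__main__":
    # Hypothetical toy contig lengths: total 300, half is 150 -> N50 is 80.
    print(n50([100, 80, 60, 40, 20]))  # 80
    # Hypothetical repeats: (0,10) and (5,20) merge to 20 bp, plus 10 bp.
    print(repeat_fraction([(0, 10), (5, 20), (30, 40)], 100))  # 0.3
    # Share of the 2.47 Gbp assembly placed in chromosome-length scaffolds.
    print(round(2.43 / 2.47 * 100, 1))  # 98.4
```

Under these definitions, the reported figures mean that scaffolds of at least 65.7 Mbp together span half of the assembly, and that roughly 98.4% of the assembled sequence lies in the 39 chromosome-length scaffolds.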
Conclusions Sequencing, assembling, and scaffolding the chromosome-level genome of the Whippet took less than a week and adds a high-quality reference genome to the list of domestic dog breeds sequenced to date.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Christian Betz christian.betz{at}bioscientia.de
Alexander Ben Hamadou alexanderben.hamadou{at}senckenberg.de
Carola Greve carola.greve{at}senckenberg.de
Axel Janke axel.janke{at}senckenberg.de
Charlotte Gerheim charlotte.gerheim{at}senckenberg.de
List of abbreviations
- BUSCO: Benchmarking Universal Single-Copy Orthologs
- CCS: Circular Consensus Sequencing
- HiFi: High Fidelity
- LINEs: Long Interspersed Nuclear Elements
- LR-WGS: Long-Read Whole-Genome Sequencing
- LTR: Long Terminal Repeat
- SFS: Site Frequency Spectrum