Abstract
Background The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013 and was generated by whole-genome shotgun sequencing with short-read data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and continuity.
Findings Here, we report an improved, chromosome-scale assembly of the M. auratus genome (BCM_Maur_2.0) generated with Oxford Nanopore Technologies long-read sequencing. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of the previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0, compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. The new assembly also reduces the proportion of unresolved sequence, as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, compared to 3.00% in BCM_Maur_2.0.
Conclusions Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research in a wide variety of future studies using Syrian hamsters as models.
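The two headline metrics in the Findings, scaffold N50 and the percentage of unresolved bases, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `n50` and `unresolved_fraction` and the toy inputs are hypothetical, and the sketch assumes that only the character `N` marks an unresolved base.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that pushes coverage to >= 50%.
        if 2 * running >= total:
            return length
    return 0  # empty input


def unresolved_fraction(sequences: Iterable[str]) -> float:
    """Return the fraction of bases that are 'N' (assumed ambiguity code)."""
    total = 0
    ambiguous = 0
    for seq in sequences:
        total += len(seq)
        ambiguous += seq.upper().count("N")
    return ambiguous / total if total else 0.0


if __name__ == "__main__":
    # Toy data only; these are not values from either hamster assembly.
    toy_lengths = [80, 70, 50, 40, 30, 20, 10]
    toy_sequences = ["ACGTNNAC", "NNGT"]
    print("N50:", n50(toy_lengths))
    print("Unresolved fraction:", unresolved_fraction(toy_sequences))
```

On a real assembly, the lengths and sequences would come from the scaffold FASTA file; a larger N50 and a smaller unresolved fraction correspond to the improvements in continuity and completeness reported for BCM_Maur_2.0.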
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
rharris1{at}bcm.edu
raveendr{at}bcm.edu
lyfoung{at}wisc.edu
fritz.sedlazeck{at}bcm.edu
helmy.medhat{at}gmail.com
prall{at}wisc.edu
jakarl{at}wisc.edu
doddapan{at}bcm.edu
qingchang.meng{at}bcm.edu
yhan{at}bcm.edu
donnam{at}bcm.edu
rwwiseman{at}wisc.edu
dhoconno{at}wisc.edu
jr13{at}bcm.edu
Abbreviations
- ACE2: angiotensin-converting enzyme 2
- BCM: Baylor College of Medicine
- bp: base pairs
- BUSCO: Benchmarking Universal Single-Copy Orthologs
- BWA: Burrows-Wheeler Aligner
- COVID-19: coronavirus disease 2019
- EST: expressed sequence tag
- FFPE: formalin-fixed, paraffin-embedded
- Gbp: gigabase pairs
- GC: guanine-cytosine
- IFN: interferon
- kbp: kilobase pairs
- Mbp: megabase pairs
- MQR: Molecule Quality Report
- NCBI: National Center for Biotechnology Information
- NEB: New England BioLabs
- ng: nanogram
- ONT: Oxford Nanopore Technologies
- PCR: polymerase chain reaction
- RBD: receptor-binding domain
- RNA-Seq: RNA-sequencing
- SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
- STAT2: signal transducer and activator of transcription factor 2
- TMPRSS2: transmembrane protease serine 2