Abstract
As the pandemic SARS-CoV-2 virus has spread globally, its genome has diversified to the extent that distinct clones can now be recognized, tracked, and traced. Identifying clonal groups allows assessment of geographic spread and transmission events, and identification of new or emerging strains that may be more virulent or more transmissible. Here we present a rapid, whole-genome, allele-based method (GNUVID) for assigning sequence types to sequenced isolates of SARS-CoV-2. This sequence typing scheme can be updated with new genomic information extremely rapidly, making our technique continually adaptable as databases grow. We show that our method is consistent with phylogeny and recovers waves of expansion and replacement of sequence types/clonal complexes in different geographical locations.
GNUVID is available as a command line application (https://github.com/ahmedmagds/GNUVID).
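To make the idea of allele-based sequence typing concrete, the following is a minimal sketch of the general concept, not GNUVID's actual implementation. It assumes exact-match allele lookup per locus; the class name `AlleleTyper`, the locus names, and the short sequences are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple


class AlleleTyper:
    """Toy allele-based typing scheme (illustrative only, not GNUVID's code)."""

    def __init__(self, loci: List[str]) -> None:
        self.loci = loci
        # Per-locus lookup table: exact sequence -> allele number.
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        # Allelic profile (one allele number per locus) -> sequence type (ST).
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def _allele(self, locus: str, seq: str) -> int:
        table = self.alleles[locus]
        if seq not in table:
            # A previously unseen sequence receives the next allele number.
            table[seq] = len(table) + 1
        return table[seq]

    def assign(self, genome: Dict[str, str]) -> int:
        """Return the ST for a genome given as {locus: sequence}."""
        profile = tuple(self._allele(locus, genome[locus].upper())
                        for locus in self.loci)
        if profile not in self.profiles:
            # A previously unseen profile receives the next ST number.
            self.profiles[profile] = len(self.profiles) + 1
        return self.profiles[profile]


# Hypothetical loci and toy sequences, for illustration only.
typer = AlleleTyper(["orf1ab", "S", "N"])
st_a = typer.assign({"orf1ab": "ATGGAG", "S": "ATGTTT", "N": "ATGTCT"})
st_b = typer.assign({"orf1ab": "ATGGAG", "S": "ATGTTC", "N": "ATGTCT"})
st_c = typer.assign({"orf1ab": "atggag", "S": "ATGTTT", "N": "ATGTCT"})
print(st_a, st_b, st_c)
```

In this sketch, adding a new genome only appends entries to dictionaries, which illustrates why a scheme of this kind can be updated quickly as databases grow. Genomes with identical allelic profiles receive the same ST, and a single differing allele yields a new one.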
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Emails AMM: moustafaam{at}email.chop.edu
List of abbreviations
- WhatsGNU: What is Gene Novelty Unit
- GNUVID: Gene Novelty Unit-based Virus Identification
- ST: Sequence Type
- CC: Clonal Complex
- SARS-CoV-2: Severe Acute Respiratory Syndrome Coronavirus 2
- COVID-19: Coronavirus Disease 2019
- MLST: Multilocus Sequence Typing
- cgMLST: core genome MLST
- wgMLST: whole genome MLST