ABSTRACT
The COVID-19 pandemic has ignited broad scientific interest in coronavirus research. Identifying coronaviral species in natural reservoirs typically involves de novo assembly. However, existing genome, metagenome and transcriptome assemblers are often unable to assemble coronaviruses into a single contig. Coverage variation between and within datasets, the presence of closely related strains, and contamination set a high bar for assemblers, which must process datasets with diverse properties. We developed coronaSPAdes, a new module of the SPAdes assembler for the recovery of RNA viral species in general and coronaviruses in particular. coronaSPAdes leverages knowledge of viral genome structure to improve assembly. We show that coronaSPAdes outperforms existing SPAdes modes and other popular short-read and viral assemblers in the recovery of full-length RNA viral genomes.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Added RNA viral assemblies (HIV, Influenza)