PT - JOURNAL ARTICLE
AU - Tyler J. Dougan
AU - Stephen R. Quake
TI - Viral Taxonomy Derived From Evolutionary Genome Relationships
AID - 10.1101/322511
DP - 2018 Jan 01
TA - bioRxiv
PG - 322511
4099 - http://biorxiv.org/content/early/2018/05/15/322511.short
4100 - http://biorxiv.org/content/early/2018/05/15/322511.full
AB - We describe a new genome alignment-based model for classifying viruses by their evolutionary genetic relationships. The approach uses information theory and a physical model to determine the information shared by the genes in two genomes. Genes from each pair of viruses are aligned with NCBI BLAST, and the match scores are combined into a metric between genomes, which is in turn used to build a global classification of the 5,817 viruses on RefSeq. Where no genes show a measurable alignment, the method falls back to a coarser measure of genome relationship: the mutual information of k-mer frequency. The result is a principled model that depends only on genome sequence, captures many interesting relationships between viral families, and produces clusters that correlate well with both the Baltimore and ICTV classifications. Because the incremental computational cost of classifying a novel virus is low, newly discovered viruses can be quickly identified and classified.
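
The abstract does not say how the mutual information of k-mer frequency is estimated, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each of the 4^k possible k-mers contributes one paired observation (its count in genome A, its count in genome B) and computes the plug-in mutual information of those pairs. The function names, the default k=3, and the use of raw counts as discrete symbols are all illustrative assumptions; real genomes would likely need the counts binned or normalized before this estimate is meaningful.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_counts(seq, k):
    """Count overlapping k-mers that consist only of A, C, G, T."""
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
    return counts


def kmer_mutual_information(seq_a, seq_b, k=3):
    """Plug-in mutual information (in bits) between two k-mer count profiles.

    Assumption: every possible k-mer yields one sample (count_a, count_b),
    and the raw counts are treated as discrete symbols.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    pairs = [(counts_a[m], counts_b[m]) for m in kmers]

    n = len(pairs)
    joint = Counter(pairs)                      # empirical joint distribution
    marg_a = Counter(a for a, _ in pairs)       # marginal over genome A counts
    marg_b = Counter(b for _, b in pairs)       # marginal over genome B counts

    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written in terms of counts
        mi += (c / n) * log2(c * n / (marg_a[a] * marg_b[b]))
    return mi
```

Under these assumptions, a sequence compared with itself yields the entropy of its own count profile, while a profile that is constant across k-mers shares no information with any other: `kmer_mutual_information("AAAACCG", "AAAACCG", k=1)` is 2 bits and `kmer_mutual_information("AAAACCG", "ACGT", k=1)` is 0.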