RT Journal Article
SR Electronic
T1 Analysis of genomic-length HBV sequences to determine genotype and subgenotype reference sequences
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 831891
DO 10.1101/831891
A1 Anna L McNaughton
A1 Peter Revill
A1 Margaret Littlejohn
A1 Philippa C Matthews
A1 M Azim Ansari
YR 2019
UL http://biorxiv.org/content/early/2019/11/08/831891.abstract
AB Hepatitis B virus (HBV) is a diverse, partially double-stranded DNA virus, with 9 genotypes (A-I), and a putative 10th genotype (J), thus far characterised. Given the broadening interest in HBV sequencing, there is an increasing requirement for a consistent, unified approach to HBV genotype and subgenotype classification. We set out to generate an updated resource of reference sequences using the diversity of all genomic-length HBV sequences available in public databases. We collated and aligned genomic-length HBV sequences from public databases and used maximum-likelihood phylogenetic analysis to identify genotype clusters. Within each genotype, we examined the phylogenetic support for currently defined subgenotypes, as well as identifying well-supported clades and deriving reference sequences for them. An alignment of these reference sequences and maximum-likelihood phylogenetic trees of the sequences are provided to simplify classification. Based on the phylogenies generated, we present a comprehensive set of HBV reference sequences at the genotype and subgenotype level.