Abstract
Precise estimation of genetic substitution patterns is critical for accurate reconstruction of pathogen phylogenies. Few studies of viral evolution account for mutational rate variation across a single gene. This is especially true for segmented viruses, whose individual segments are short and encode few proteins. However, the structural and functional partitioning of these proteins could provide valuable information for more accurate inference of viral evolution, owing to the intense immune selection pressure on different functional domains. In this study, we developed and evaluated a structurally informed partitioning scheme, combined with an approximate codon substitution model, that accounts for rate variation between the immunogenic head and stalk domains of the surface protein hemagglutinin (HA) of influenza viruses. We assessed model fit with Bayes factors, using path sampling (PS) and stepping-stone sampling (SS) to estimate the marginal likelihoods. We compared four models (HKY85, the SRD06 codon model, HKY85 plus functional partitioning, and SRD06 plus functional partitioning) on pandemic H1N1/2009, seasonal H3N2, the influenza B Yamagata and Victoria lineages, and two highly pathogenic avian influenza A viruses, H5Nx and H7N9. The Bayes factor tests showed that structurally informed partitioning with SRD06 performed better than the other models for all datasets, with decisive support. Significantly faster nucleotide substitution rates were observed for the head domain than for the stalk domain, which may provide insight for stalk-derived universal vaccine design. In summary, we show that integrating a functionally conserved partitioning scheme based on the protein structures of immune targets significantly improves phylogenetic analysis and provides important biological insights.