Abstract
HIV-1 samples a large number of sequence variants that help the virus evade the immune system and escape from antiviral drugs. However, it is unclear how amino acid changes that require multiple nucleotide mutations contribute to the evolution of HIV-1. In a large database of HIV-1 protease sequences from circulating viruses, we find that the most frequently observed amino acids are disproportionately accessible by single-base mutations from the consensus amino acid. Amino acid changes that require multiple-base mutations at a codon may be underrepresented because they are rarely sampled, because they carry fitness defects, or both. To investigate the impact of fitness effects on mutant frequency, we experimentally quantified the impact of every individual amino acid change in protease on the expansion of HIV-1 in tissue culture. Many amino acid changes that require multiple nucleotide mutations efficiently supported viral replication yet were observed rarely, or not at all, in the sequence database. Frequently observed amino acid changes that require multiple nucleotide mutations were accessible through single-nucleotide mutational intermediates that had, on average, higher fitness than the intermediates for unobserved amino acid changes. An additive quasispecies model that combined different mutational barriers for single- and multiple-base mutations with the experimental fitness effects corresponded better with clinically observed mutant frequencies than fitness alone did. Our results indicate that mutational sampling provides a larger barrier than fitness effects to amino acid changes requiring multiple base mutations in HIV-1 protease.
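The distinction between single-base and multiple-base accessibility can be made concrete with a short calculation. The following is a minimal sketch, not the analysis pipeline used in this work: it assumes the standard genetic code, and the function names `min_base_changes` and `single_step_amino_acids` are hypothetical helpers introduced here for illustration. Given a starting codon, it reports the smallest number of nucleotide substitutions needed to encode a target amino acid.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_base_changes(codon: str, target_aa: str) -> int:
    """Fewest nucleotide substitutions turning `codon` into any codon
    that encodes `target_aa` (hypothetical helper, for illustration)."""
    return min(
        hamming(codon, other)
        for other, aa in CODON_TABLE.items()
        if aa == target_aa
    )


def single_step_amino_acids(codon: str) -> set:
    """Amino acids other than the current one that are reachable by exactly
    one nucleotide substitution, excluding stop codons."""
    current = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbor = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[neighbor]
            if aa != "*" and aa != current:
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    # Example with an arbitrary codon (ATG, methionine), not taken from protease.
    for target in "LWD":
        print(target, min_base_changes("ATG", target))
```

Under this definition, an amino acid change counts as a multiple-base mutation when `min_base_changes` returns 2 or 3 for the consensus codon at that position. The result depends on which synonymous codon is taken as the starting point, so the choice of consensus codon is itself an assumption of the sketch.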