PT - JOURNAL ARTICLE
AU - Jianhao Cao
AU - Shuhong Luo
AU - Yuanyan Xiong
TI - The variability of amino acids sequences in hepatitis B virus
AID - 10.1101/326959
DP - 2018 Jan 01
TA - bioRxiv
PG - 326959
4099 - http://biorxiv.org/content/early/2018/05/26/326959.short
4100 - http://biorxiv.org/content/early/2018/05/26/326959.full
AB - Hepatitis B virus (HBV) is an important human pathogen belonging to the Hepadnaviridae family, Orthohepadnavirus genus. It infects over 240 million people globally. The reverse transcription step in its genome replication leads to low-fidelity DNA synthesis, which is the source of variability in the viral proteins. To investigate this variability quantitatively, we retrieved amino acid sequences from 5167 records covering all available HBV genotypes (A-J) in the GenBank database. The amino acid sequences encoded by the open reading frames (ORFs) S, C, P, and X in the HBV genome were extracted and aligned separately. We analyzed the variability of the lengths and sequences of the proteins as well as the frequencies of amino acids. Our study comprehensively characterized the variability and conservation of HBV at the amino acid level, especially for the structural proteins, the hepatitis B surface antigens (HBsAg), to identify potential sites critical for virus assembly and immune recognition. Interestingly, the preS1/S2 domains in HBsAg were variable at some amino acid positions, which provides a potential mechanism of immune escape for HBV, while the preS2 and S domains were conserved in protein sequence length. In the S domain, the cysteine residues and the alpha-helix and beta-sheet secondary structures were likely critical for the stable folding of the protein. The preC domain and C-terminal domain (CTD) of the core protein were highly conserved. The polymerase (HBpol) and HBx were highly variable at the amino acid level.