Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity

Eric A. Stone (1, 2) and Arend Sidow (2, 3, 4)

  1. Department of Statistics, Stanford University, Stanford, California 94305-5324, USA
  2. Department of Pathology, Stanford University, Stanford, California 94305-5324, USA
  3. Department of Genetics, Stanford University, Stanford, California 94305-5324, USA

Abstract

We find that the degree to which a missense variant impairs protein function can be predicted by comparative sequence analysis alone. Predictions are not confined to a binary classification of variants as normal or deleterious; they extend continuously from mild to severe effects. Prediction accuracy depends strongly on the available sequence variation and is highest when diverse orthologs are available. High predictive accuracy is achieved by quantifying, from observed evolutionary variation, the physicochemical characteristics at each position of the protein. The strong relationship between the physicochemical characteristics of a missense variant and the impairment of protein function extends to human disease. Using four diverse proteins for which sufficient comparative sequence data are available, we show that disease grade, or likelihood of developing cancer, correlates strongly with physicochemical constraint violation by the causative amino acid variants.
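The abstract's central idea, characterizing the physicochemical range tolerated at each aligned position and scoring a variant by how far it falls outside that range, can be illustrated with a deliberately simplified sketch. This is not the MAPP algorithm: it ignores phylogenetic weighting of sequences, treats properties as independent, and uses only two properties (Kyte-Doolittle hydropathy and an approximate residue volume) instead of the tool's own property set. The function `violation_score`, its `sd_floor` parameter, and the property table are hypothetical names and values introduced only for illustration.

```python
import math
import statistics

# Illustrative property table (an assumption, not MAPP's): per residue,
# (Kyte-Doolittle hydropathy, approximate residue volume in cubic angstroms).
RAW_PROPERTIES = {
    "A": (1.8, 88.6), "R": (-4.5, 173.4), "N": (-3.5, 114.1), "D": (-3.5, 111.1),
    "C": (2.5, 108.5), "Q": (-3.5, 143.8), "E": (-3.5, 138.4), "G": (-0.4, 60.1),
    "H": (-3.2, 153.2), "I": (4.5, 166.7), "L": (3.8, 166.7), "K": (-3.9, 168.6),
    "M": (1.9, 162.9), "F": (2.8, 189.9), "P": (-1.6, 112.7), "S": (-0.8, 89.0),
    "T": (-0.7, 116.1), "W": (-0.9, 227.8), "Y": (-1.3, 193.6), "V": (4.2, 140.0),
}


def _standardize(table):
    """Rescale each property to mean 0, SD 1 across the 20 residues so units are comparable."""
    columns = list(zip(*table.values()))
    centers = [statistics.mean(c) for c in columns]
    scales = [statistics.pstdev(c) for c in columns]
    return {
        aa: tuple((v - m) / s for v, m, s in zip(vals, centers, scales))
        for aa, vals in table.items()
    }


PROPERTIES = _standardize(RAW_PROPERTIES)


def violation_score(column, variant, sd_floor=0.1):
    """Hypothetical constraint-violation score for `variant` at one alignment column.

    Simplification: unweighted sequences and independent properties. Each property
    contributes a squared z-score relative to the residues observed in the column;
    the column SD is floored so fully conserved columns give finite scores.
    """
    observed = [PROPERTIES[aa] for aa in column if aa in PROPERTIES]  # skips gaps
    if not observed:
        raise ValueError("column contains no standard amino acids")
    total = 0.0
    for k, value in enumerate(PROPERTIES[variant]):
        xs = [p[k] for p in observed]
        z = (value - statistics.mean(xs)) / max(statistics.pstdev(xs), sd_floor)
        total += z * z
    return math.sqrt(total)


if __name__ == "__main__":
    hydrophobic = ["L", "I", "V", "L", "M"]  # hypothetical aligned column
    print(violation_score(hydrophobic, "I"))  # conservative substitution: small score
    print(violation_score(hydrophobic, "D"))  # charged, smaller residue: much larger score
```

Under these assumptions, a substitution consistent with the hydrophobic, bulky residues seen in the column scores low, while an aspartate scores several times higher; a graded score like this, rather than a yes/no call, is what allows comparison against graded disease severity.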

Footnotes

  • [Supplemental material is available online at www.genome.org. A Java executable of MAPP and documentation are freely available for download at http://mendel.stanford.edu/supplementarydata/stone_MAPP_2005.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3804205. Article published online before print in June 2005.

  • 4 Corresponding author. E-mail arend@stanford.edu; fax (650) 725-4905.

    • Accepted April 21, 2005.
    • Received February 7, 2005.