Detecting groups of coevolving positions in a molecule: a clustering approach

BMC Evol Biol. 2007 Nov 30:7:242. doi: 10.1186/1471-2148-7-242.

Abstract

Background: Although the patterns of co-substitutions in RNA are now well characterized, detection of coevolving positions in proteins remains a difficult task. It has been recognized that the signal is typically weak, because (i) amino acids are characterized by various biochemical properties, so that distinct amino-acid changes are not functionally equivalent, and (ii) a given mutation can be compensated by more than one mutation, at more than one position.

Results: We present a new method based on phylogenetic substitution mapping. The two above-mentioned problems are addressed by (i) the introduction of a weighted mapping, which accounts for the biochemical effects (volume, polarity, charge) of amino-acid changes, (ii) the use of a clustering approach to detect groups of coevolving sites of virtually any size, and (iii) the distinction between biochemical compensation and other coevolutionary mechanisms. We apply this methodology to a previously studied data set of bacterial ribosomal RNA, and to three protein data sets (myoglobin of vertebrates, S-locus Receptor Kinase and Methionine Amino-Peptidase).
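The weighted mapping and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that substitutions have already been mapped onto the branches of a phylogeny, uses side-chain charge as the only weight, and replaces the paper's clustering with a simple single-linkage grouping on absolute Pearson correlation. The names `weighted_vector`, `pearson`, `cluster_sites`, the 0.8 threshold and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt

# Net side-chain charge at neutral pH; residues not listed are treated as 0.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

def weighted_vector(changes):
    """Turn per-branch (ancestral, derived) pairs into signed charge changes.

    `changes` holds one entry per branch: a (from, to) tuple, or None when
    no substitution was mapped onto that branch.
    """
    return [0 if c is None else CHARGE.get(c[1], 0) - CHARGE.get(c[0], 0)
            for c in changes]

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)

def cluster_sites(vectors, threshold=0.8):
    """Single-linkage grouping: two sites join when |correlation| >= threshold."""
    groups = {site: {site} for site in vectors}
    for a, b in combinations(vectors, 2):
        linked = abs(pearson(vectors[a], vectors[b])) >= threshold
        if linked and groups[a] is not groups[b]:
            merged = groups[a] | groups[b]
            for site in merged:
                groups[site] = merged
    return sorted({tuple(sorted(g)) for g in groups.values()})

# Toy mapping over five branches (hypothetical site numbers and changes).
toy = {
    10: weighted_vector([("K", "E"), None, ("E", "K"), None, None]),
    42: weighted_vector([("E", "K"), None, ("K", "E"), None, None]),
    77: weighted_vector([None, ("A", "D"), None, None, None]),
}
print(cluster_sites(toy))
```

In this toy data, sites 10 and 42 change charge in opposite directions on the same branches, so their correlation is strongly negative, the pattern one would expect under charge compensation. A positive correlation of similar magnitude would instead point to a coevolutionary mechanism other than compensation.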

Conclusion: We succeed in detecting groups of sites that depart significantly from the null hypothesis of independence. Group sizes range from pairs to groups of approximately 10 sites, depending on the substitution weights used. The structural and functional relevance of these groups of sites is assessed, and the various evolutionary processes potentially generating correlated substitution patterns are discussed.
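What departing from a null hypothesis of independence means for a pair of sites can be sketched with a simple permutation test. This is a swapped-in illustration, not the testing procedure used in the paper: shuffling a vector across branches ignores branch lengths and tree structure, which a model-based null would account for. The function `permutation_p_value`, its parameters and the example vectors are hypothetical.

```python
import random
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)

def permutation_p_value(u, v, n_perm=2000, seed=0):
    """Estimate how often shuffled data reach the observed |correlation|.

    Shuffling v across branches breaks any association with u while keeping
    the values of each vector unchanged.
    """
    rng = random.Random(seed)
    observed = abs(pearson(u, v))
    shuffled = list(v)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(u, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical weighted substitution vectors for two sites over five branches.
site_a = [-2, 0, 2, 0, 0]
site_b = [2, 0, -2, 0, 0]
p = permutation_p_value(site_a, site_b)
print(p)
```

With only five branches the test has little power, which mirrors the weak signal noted in the Background: a perfect correlation between two sites can still arise by chance fairly often when few substitutions are mapped.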

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution / genetics*
  • Aminopeptidases / genetics*
  • Animals
  • Cluster Analysis
  • Evolution, Molecular*
  • Markov Chains
  • Methionyl Aminopeptidases
  • Models, Genetic*
  • Myoglobin / genetics
  • Phylogeny*
  • Plant Proteins / genetics
  • Protein Kinases / genetics
  • RNA, Bacterial / genetics
  • RNA, Ribosomal / genetics
  • Sequence Alignment

Substances

  • Myoglobin
  • Plant Proteins
  • RNA, Bacterial
  • RNA, Ribosomal
  • Protein Kinases
  • S-receptor kinase
  • Aminopeptidases
  • Methionyl Aminopeptidases