Protein structure comparison by alignment of distance matrices

J Mol Biol. 1993 Sep 5;233(1):123-38. doi: 10.1006/jmbi.1993.1489.

Abstract

With a rapidly growing pool of known tertiary structures, the importance of protein structure comparison parallels that of sequence alignment. We have developed a novel algorithm (DALI) for optimal pairwise alignment of protein structures. The three-dimensional co-ordinates of each protein are used to calculate residue-residue (C alpha-C alpha) distance matrices. The distance matrices are first decomposed into elementary contact patterns, e.g. hexapeptide-hexapeptide submatrices. Then, similar contact patterns in the two matrices are paired and combined into larger consistent sets of pairs. A Monte Carlo procedure is used to optimize a similarity score defined in terms of equivalent intramolecular distances. Several alignments are optimized in parallel, leading to simultaneous detection of the best, second-best and further solutions. The method allows sequence gaps of any length, reversal of chain direction and free topological connectivity of aligned segments. Sequential connectivity can be imposed as an option. The method is fully automatic and identifies structural resemblances and common structural cores accurately and sensitively, even in the presence of geometrical distortions. An all-against-all alignment of over 200 representative protein structures results in an objective classification of known three-dimensional folds in agreement with visual classifications. Unexpected topological similarities of biological interest have been detected, e.g. between the bacterial toxin colicin A and globins, and between the eukaryotic POU-specific DNA-binding domain and the bacterial lambda repressor.
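The core data structures described above can be illustrated with a minimal Python sketch. This is not the authors' code. It builds C alpha distance matrices, cuts out a hexapeptide-hexapeptide contact pattern, and evaluates a similarity score over equivalent intramolecular distances for a given alignment. The functional form of the score and the constants THETA_E = 0.20 and ALPHA = 20 Å follow the commonly cited form of the DALI elastic score, but they are not stated in this abstract and should be treated as assumptions. The helper names (distance_matrix, contact_pattern, elastic_score) are hypothetical. The pairing of contact patterns and the Monte Carlo search are not shown.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Matrix = List[List[float]]

# Assumed constants (not given in the abstract): relative tolerance for
# distance deviations, and the width (in Angstroms) of an envelope that
# down-weights long-range distances.
THETA_E = 0.20
ALPHA = 20.0


def distance_matrix(ca: Sequence[Coord]) -> Matrix:
    """C alpha-C alpha distance matrix of one chain (unchanged by rotation/translation)."""
    return [[math.dist(p, q) for q in ca] for p in ca]


def contact_pattern(d: Matrix, i: int, j: int, size: int = 6) -> Matrix:
    """Hexapeptide-hexapeptide submatrix: distances from segment i..i+5 to j..j+5."""
    n = len(d)
    if not (0 <= i <= n - size and 0 <= j <= n - size):
        raise ValueError("segment runs past the end of the chain")
    return [row[j:j + size] for row in d[i:i + size]]


def elastic_score(da: Matrix, db: Matrix, pairs: Sequence[Tuple[int, int]]) -> float:
    """Hypothetical elastic similarity of an alignment.

    `pairs` lists (residue in A, residue in B). Each combination of two aligned
    pairs compares an intramolecular distance in A with the equivalent one in B.
    The pairs need not be in sequence order.
    """
    total = 0.0
    for p, (i, k) in enumerate(pairs):
        for q, (j, l) in enumerate(pairs):
            if p == q:
                total += THETA_E  # diagonal term: constant reward per aligned residue
                continue
            a, b = da[i][j], db[k][l]
            mean = (a + b) / 2.0
            # relative deviation between the two equivalent distances
            deviation = abs(a - b) / mean if mean > 0 else 0.0
            envelope = math.exp(-((mean / ALPHA) ** 2))
            total += (THETA_E - deviation) * envelope
    return total
```

Because only intramolecular distances are compared, no superposition is needed: translating or rotating either structure leaves the score unchanged. Terms whose relative deviation exceeds the tolerance turn negative, so an optimizer (a Monte Carlo procedure in the paper) is pushed toward sets of residue pairs whose mutual distances agree.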

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Actins / chemistry
  • Algorithms*
  • Amino Acid Sequence
  • Cluster Analysis
  • Colicins / chemistry
  • DNA-Binding Proteins / chemistry
  • Globins / chemistry
  • Heat-Shock Proteins / chemistry
  • Hexokinase / chemistry
  • Models, Molecular
  • Molecular Sequence Data
  • Monte Carlo Method
  • Muramidase / chemistry
  • Protein Structure, Secondary
  • Protein Structure, Tertiary*
  • Reproducibility of Results
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Actins
  • Colicins
  • DNA-Binding Proteins
  • Heat-Shock Proteins
  • Globins
  • Hexokinase
  • Muramidase