Detecting coevolving positions in a molecule: why and how to account for phylogeny

Brief Bioinform. 2012 Mar;13(2):228-43. doi: 10.1093/bib/bbr048. Epub 2011 Sep 24.

Abstract

Positions in a molecule that share a common constraint do not evolve independently, and therefore leave a signature in the patterns of homologous sequences. Identifying such coevolving positions from a sequence alignment has great potential for predicting functional and structural properties of molecules through comparative analysis. This task is complicated by additional sources of correlation, which lead to false predictions. The nature of the data is itself a major source of spurious correlation: sequences are sampled from individuals with different degrees of relatedness and are therefore intrinsically correlated. This has led to several method developments in different fields, which can be confusing for non-expert users interested in these methodologies. It also explains why coevolution detection methods remain largely underused despite the importance of the biological questions they address. In this article, I focus on the role of shared ancestry in understanding molecular coevolution patterns. I review and classify existing coevolution detection methods according to their ability to handle shared ancestry. Using a ribosomal RNA benchmark data set, for which detailed knowledge of the structure and coevolution patterns is available, I demonstrate and explain why taking the underlying evolutionary history of sequences into account is the only way to extract the full coevolution signal in the data. I also evaluate, using rigorous statistical procedures, the best approaches to do so, and discuss several important biological aspects to consider when performing coevolution analyses.
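
As an illustration of what a method that ignores shared ancestry looks like, the sketch below scores a pair of alignment columns with mutual information, a widely used covariation measure. It is not taken from the article: the choice of mutual information, the function name `mutual_information`, and the toy alignment are all assumptions made for illustration. The score treats every sequence as an independent observation, which is exactly the assumption the abstract identifies as a source of false predictions when sequences are related.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Illustrative, phylogeny-naive score: each sequence is counted as an
    independent observation, so shared ancestry is NOT accounted for.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    freq_a = Counter(col_a)              # single-site counts, column A
    freq_b = Counter(col_b)              # single-site counts, column B
    freq_ab = Counter(zip(col_a, col_b)) # joint counts of paired states
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Hypothetical toy alignment: one string per sequence, one character per site.
alignment = ["GACC", "GAUC", "CAGG", "CAAG"]
columns = list(zip(*alignment))  # transpose rows (sequences) into columns (sites)

mi_covarying = mutual_information(columns[0], columns[3])  # G/C paired with C/G
mi_invariant = mutual_information(columns[0], columns[1])  # second site never varies
```

In this toy case the first and last sites covary perfectly, while a pair that includes an invariant site carries no signal. If the two G-starting sequences were close relatives, a single ancestral substitution pair would yield the same high score, which is why methods that model the phylogeny are needed to separate true coevolution from inherited similarity.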

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Computer Simulation*
  • Evolution, Molecular*
  • Phylogeny*
  • RNA, Ribosomal / genetics
  • Sequence Alignment

Substances

  • RNA, Ribosomal