The CaspBase: a curated database for evolutionary biochemical studies of caspase functional divergence and ancestral sequence inference

Protein Sci. 2018 Oct;27(10):1857-1870. doi: 10.1002/pro.3494.

Abstract

Sequence databases are powerful components of the contemporary scientist's toolkit. However, most functional annotations in public databases are assigned computationally and have not been verified by a human expert. Although hypotheses generated from computational studies are now amenable to experimental testing, the quality of the results depends on the quality of the input data. We developed the CaspBase to expedite the compilation of high-quality datasets of annotated caspase sequences, to maximize phylogenetic signal, and to reduce the noise contributed by public databanks. We describe our curation methods for the CaspBase and how researchers can acquire sequences from CaspBase.org. Our immediate goal in developing the CaspBase was to optimize the ancestral protein reconstruction (APR) of caspases, and we demonstrate the utility of the CaspBase in APR studies. We also developed the Common Position (CP) system for comparing human caspase family paralogs and propose it as an update to current methods of reporting caspase amino acid positions. We present a standardized multiple sequence alignment (MSA) for the CP system and show the advantage of using large databases such as the CaspBase to define structural positions in proteins. Although the results described here pertain to caspase evolution and structure-function studies, the methods can be adapted to any gene family.
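The abstract does not specify how CP numbers are assigned. As a minimal sketch, assuming only that a common position corresponds to a column of a shared MSA, residue numbers in two paralogs can be compared through the alignment column they occupy. The function names and the toy alignment are hypothetical illustrations, not the CaspBase MSA or the authors' CP definitions.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical sketch: relate residue numbers of aligned paralogs through shared
# alignment columns. This assumes a "common position" is an MSA column index;
# the actual CaspBase CP system may be defined differently.


def residue_to_column(aligned_seq: str, gap: str = "-") -> dict[int, int]:
    """Return {1-based residue number: 1-based alignment column} for one aligned sequence."""
    mapping: dict[int, int] = {}
    residue = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char != gap:  # gap characters occupy a column but are not residues
            residue += 1
            mapping[residue] = column
    return mapping


def column_to_residue(aligned_seq: str, gap: str = "-") -> dict[int, int]:
    """Inverse map {alignment column: residue number}; gap columns are absent."""
    return {col: res for res, col in residue_to_column(aligned_seq, gap).items()}


def equivalent_position(
    alignment: dict[str, str], source: str, residue: int, target: str
) -> Optional[int]:
    """Translate a residue number in `source` to the residue of `target` in the same column.

    Returns None when `target` has a gap in that column. Raises KeyError if
    `residue` exceeds the length of the ungapped `source` sequence.
    """
    column = residue_to_column(alignment[source])[residue]
    return column_to_residue(alignment[target]).get(column)


if __name__ == "__main__":
    # Invented toy alignment of two "paralogs" (not real caspase sequences).
    toy = {"paralogA": "MKA-GC", "paralogB": "--SHGC"}
    print(equivalent_position(toy, "paralogA", 5, "paralogB"))  # → 4
    print(equivalent_position(toy, "paralogA", 2, "paralogB"))  # → None
```

In this toy case the terminal cysteine is residue 5 in one sequence and residue 4 in the other, yet both sit in column 6. Reporting the shared column rather than each paralog's own residue number is the kind of comparison a common numbering scheme is meant to support.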

Keywords: ancestral protein reconstruction; caspase; computational biology; database curation; protein evolution; sequence analysis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Caspases / chemistry*
  • Caspases / genetics
  • Caspases / metabolism
  • Computational Biology
  • Databases, Protein
  • Humans
  • Models, Molecular
  • Sequence Alignment
  • Sequence Analysis, Protein

Substances

  • Caspases