The CaspBase: a curated database for evolutionary biochemical studies of caspase functional divergence and ancestral sequence inference

Robert D Grinshpon; Anna Williford; James Titus-McQuillan; A Clay Clark

doi:10.1002/pro.3494

Protein Sci. 2018 Oct;27(10):1857-1870. doi: 10.1002/pro.3494.

Authors

Robert D Grinshpon¹, Anna Williford², James Titus-McQuillan², A Clay Clark²

Affiliations

¹ Department of Molecular and Structural Biochemistry, NC State University, Raleigh, North Carolina, 27608.
² Department of Biology, University of Texas at Arlington, Arlington, Texas, 76019.

Abstract

Sequence databases are powerful tools in the contemporary scientist's toolkit. However, most functional annotations in public databases are determined computationally and are not verified by a human expert. While hypotheses generated from computational studies are now amenable to experimentation, the quality of the results relies on the quality of input data. We developed the CaspBase to expedite high-quality dataset compilation of annotated caspase sequences, to maximize phylogenetic signal, and to reduce the noise contributed by public databanks. We describe our methods of curation for the CaspBase and how researchers can acquire sequences from CaspBase.org. Our immediate goal in developing the CaspBase was to optimize the ancestral protein reconstruction (APR) of caspases, and we demonstrate the utility of the CaspBase in APR studies. We also developed the Common Position (CP) system for comparing human caspase family paralogs and suggest the CP system as an update to current reporting methods of caspase amino acid positions. We present a standardized multiple sequence alignment (MSA) for the CP system and show the advantage of using large databases such as the CaspBase in defining structural positions in proteins. Although the results described here pertain to caspase evolution and structure-function studies, the methods can be adapted to any gene family.
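
The general idea of numbering positions by a shared alignment can be illustrated with a minimal sketch. It assumes that a common position corresponds to a column of a multiple sequence alignment; the abstract does not spell out the CP definition, so this is not the authors' procedure. The function name `common_position_map` and the three toy sequences are hypothetical and are not real caspase sequences.

```python
def common_position_map(aligned_seqs, gap="-"):
    """Map every alignment column to each sequence's own residue number.

    aligned_seqs: dict of name -> aligned sequence (equal lengths, gaps as `gap`).
    Returns a dict of name -> list where entry i is the 1-based residue number
    found in alignment column i (0-based index), or None if that column is a gap.
    """
    lengths = {len(seq) for seq in aligned_seqs.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")

    mapping = {}
    for name, seq in aligned_seqs.items():
        residue_number = 0
        per_column = []
        for char in seq:
            if char == gap:
                per_column.append(None)  # no residue in this column
            else:
                residue_number += 1      # native numbering advances on residues only
                per_column.append(residue_number)
        mapping[name] = per_column
    return mapping


# Toy alignment (invented for illustration, not caspase data).
toy_alignment = {
    "paralog_A": "MK-QACRG",
    "paralog_B": "M-LQACRG",
    "paralog_C": "MKLQAC-G",
}

cp_map = common_position_map(toy_alignment)

# The shared cysteine sits in alignment column 6 (index 5) for every paralog,
# yet its native residue number differs between sequences.
for name, columns in cp_map.items():
    print(name, "column 6 -> native residue", columns[5])
```

In this toy case a single column index refers to the equivalent residue in all three sequences even though their native numbering differs, which is the kind of cross-paralog comparison a shared numbering scheme is meant to simplify.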

Keywords: ancestral protein reconstruction; caspase; computational biology; database curation; protein evolution; sequence analysis.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Caspases / chemistry*
Caspases / genetics
Caspases / metabolism
Computational Biology
Databases, Protein
Humans
Models, Molecular
Sequence Alignment
Sequence Analysis, Protein

Substances

Caspases