Abstract
Ancestral sequence reconstruction is a technique that is gaining widespread use in molecular evolution studies and protein engineering. Here we present Graphical Representation of Ancestral Sequence Predictions (GRASP), which can be used to infer and explore ancestral variants of protein families with more than 10,000 members. GRASP uses partial order graphs to represent homology in very large data sets, which are intractable with current inference tools; it may, for example, be used to engineer proteins by identifying ancient variants of enzymes. We demonstrate that (1) across three distinct enzyme families, GRASP predicts ancestor sequences, all of which demonstrate enzymatic activity; (2) within-family insertions and deletions can be used as building blocks to support the engineering of biologically active ancestors via a new source of ancestral variation; and (3) generous inclusion of sequence data encompassing great diversity leads to less variance in ancestor sequence.
Footnotes
Minor edits, including updated figures, typo fixes, and corrections to Eq. 2.