Abstract
In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network, and a protein language model. Focusing on two enzyme families, we expressed and purified over 440 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved experimental success rates by 44-100%. Surprisingly, neither sequence identity to natural sequences nor AlphaFold2 residue-confidence scores were predictive of enzyme activity. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants to test experimentally.
Competing Interest Statement
The authors have declared no competing interest.