Abstract
The design and natural evolution of protein sequences can be profoundly shaped by the extent of epistasis between mutations. For most proteins and sets of residues, however, it is unclear how much epistasis there is. Here, we measure the effects of combinatorial variants at ten positions in the antitoxin ParD3 on its ability to neutralize its cognate toxin. Using this and two additional datasets, we show that a site-wise independent model without epistasis can explain virtually all of the combinatorial mutation effects. This model can be trained on a small number of randomly sampled observations and still predict the effects of combinatorial variants not observed during training. We then develop an unsupervised strategy to design functional and diverse protein sequences without experimental variant effect measurements, using a site-wise independent model trained on structural databases. Such independent approaches could enable the combinatorial design of therapeutically relevant binding proteins with desired binding properties from few or no observations.
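The core idea of a site-wise independent model is that each position contributes to the score independently of the residues at other positions. A variant's score is therefore an intercept plus one learned term per (position, residue) pair, so a few random variants can in principle determine every term and predict combinations never seen during training. The sketch below assumes the simplest such form, an additive model y ≈ b + Σ_i h_i(a_i) fit by ridge-regularized least squares on one-hot features. It runs on simulated data that contain no epistasis by construction. The function names (`fit_sitewise`, `predict_sitewise`, `simulate`) and the setup (ten sites, four allowed residues per site, 200 random training variants, Gaussian noise) are hypothetical choices for illustration, not the authors' model, parameterization, or data.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(seqs):
    """Encode equal-length sequences as flat (length * 20) one-hot vectors."""
    n_aa = len(ALPHABET)
    X = np.zeros((len(seqs), len(seqs[0]) * n_aa))
    for row, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            X[row, pos * n_aa + AA_INDEX[aa]] = 1.0
    return X


def fit_sitewise(seqs, y, ridge=1e-3):
    """Fit y ~ b + sum_i h_i(a_i); the small ridge term resolves the
    redundancy between the intercept and the per-site one-hot columns."""
    X = np.hstack([np.ones((len(seqs), 1)), one_hot(seqs)])
    penalty = ridge * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ np.asarray(y, dtype=float))


def predict_sitewise(weights, seqs):
    """Score sequences as intercept plus the sum of per-site terms."""
    return weights[0] + one_hot(seqs) @ weights[1:]


def simulate(n_sites=10, n_options=4, n_train=200, n_test=500, seed=0):
    """Hypothetical test: train on random variants of a purely additive
    landscape, then return the Pearson r on variants unseen in training."""
    rng = np.random.default_rng(seed)
    # Allowed residues per site and their (hidden) additive effects.
    options = [[str(a) for a in rng.choice(list(ALPHABET), size=n_options, replace=False)]
               for _ in range(n_sites)]
    effects = [dict(zip(opts, rng.normal(size=n_options))) for opts in options]

    def random_variant():
        return "".join(opts[rng.integers(n_options)] for opts in options)

    def true_score(seq):
        return sum(effects[pos][aa] for pos, aa in enumerate(seq))

    train = [random_variant() for _ in range(n_train)]
    y_train = [true_score(s) + rng.normal(scale=0.1) for s in train]  # noisy labels

    seen = set(train)
    test = []
    while len(test) < n_test:  # keep only combinations absent from training
        v = random_variant()
        if v not in seen:
            test.append(v)

    weights = fit_sitewise(train, y_train)
    pred = predict_sitewise(weights, test)
    truth = np.array([true_score(s) for s in test])
    return float(np.corrcoef(pred, truth)[0, 1])


if __name__ == "__main__":
    print(f"Pearson r on unseen variants: {simulate():.4f}")
```

In this toy setting the model has only one free term per allowed residue per site (40 terms plus an intercept), while the combinatorial space holds 4^10 variants. A few hundred random observations are therefore enough to pin it down, which mirrors the intuition behind the abstract's claim. Because the simulated landscape is additive by construction, the example illustrates the mechanics of the model and does not test for epistasis.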
Competing Interest Statement
The authors have declared no competing interest.