Abstract
Large-scale mutagenesis experiments are becoming possible owing to advances in sequencing technologies and high-throughput screening. Deep mutational scans perform exhaustive single-point mutations on a protein and probe their phenotypic effects. Performing a full scan with site-directed mutations of all the amino acid residues in a protein may not be practical, and may not even be required, especially if predictive computational models can be developed. Computational models are, however, naive to the cellular response under the myriad assay conditions. To develop a realistic paradigm of assay-context-aware predictive hybrid models, we combine minimal deep mutational studies with computational models and quantitatively predict the phenotypic outcomes. Structural, sequence and co-evolutionary information, along with partial deep mutational scan data, was included to capture the phenotypic relevance of the mutations to the specific screening criterion. The model reliably predicts the fitness outcomes of hundreds of randomly selected amino acid mutations in β-lactamase when phenotypic fitness data from as little as 15% of the full mutational scan are available. Interestingly, the predictive capabilities are better with a random set of mutations than with a systematic substitution of all amino acids to alanine, asparagine and histidine (ANH). The model can potentially be extended to predict the phenotypic outcomes at other concentrations of the stressor by carefully analyzing the dose-response curves of a representative set of mutations.
Author Summary
Mutations are minor changes in protein sequences that can have disproportionately large consequences for protein function. Many severe diseases can arise from simple single-point mutations. An interesting way of studying these mutations is not to isolate the protein from its natural conditions, but rather to study how the fitness of the cell improves or decreases in response to them. Whether for understanding disease biology or for bioengineering applications, it is important to quantify the impact of mutations on cellular fitness. An experimental paradigm has evolved that can sample several hundred thousand mutation-fitness relations using high-throughput screening. However, since these are very specialized experiments, the question is whether the number of experiments required can be minimized by using computer models to predict the remaining fitness values. In this work we introduce a new paradigm that uses a computer model trained on partial deep mutational scan data to predict the fitness variations in a full mutational scan, which could also be repeated under multiple experimental conditions such as different drug concentrations.