Abstract
Protein design requires searching exponentially large spaces of possible sequences to find candidates with desired properties. Language models (LMs) pretrained on universal protein datasets have shown potential to help make this search tractable. However, LMs trained on natural sequences alone are limited in their ability to design proteins with novel functions, a capability that is especially important for many pharmaceutical applications. In this work, we used a combination of methods to finetune pretrained LMs on laboratory data collected in an anti-CD40L antibody library campaign, yielding an ensemble scoring function that models the fitness landscape and guides the design of new antibodies. Laboratory testing showed that the designed antibodies had improved affinity to CD40L. Notably, the designs improved the affinities of four antibodies from initial values between 1 nanomolar and 100 picomolar to below 25 picomolar in every case, approaching the limit of detection. This work is a promising step towards realizing the potential of LMs to leverage laboratory data to develop improved treatments for diseases.
Competing Interest Statement
The authors have declared no competing interest.