Abstract
Successful development of monoclonal antibodies (mAbs) for therapeutic applications is hindered by developability issues such as low solubility, low thermal stability, high aggregation, and high immunogenicity. The discovery of more developable mAb candidates relies on high-quality antibody libraries for isolating candidates with desirable properties. We present Immunoglobulin Language Model (IgLM), a deep generative language model for generating synthetic libraries by re-designing variable-length spans of antibody sequences. IgLM formulates anti-body design as an autoregressive sequence generation task based on text-infilling in natural language. We trained IgLM on approximately 558M antibody heavy- and light-chain variable sequences, conditioning on each sequence’s chain type and species-of-origin. We demonstrate that IgLM can be applied to generate synthetic libraries that may accelerate the discovery of therapeutic antibody candidates.
Competing Interest Statement
The Johns Hopkins University has filed one or more patent application(s) related to this technology. R.W.S., J.A.R., and J.J.G. are named as inventors on these application(s).
Footnotes
richardshuai{at}berkeley.edu
jruffolo{at}jhu.edu