Abstract
Computational and deep learning-based methods for generating functional proteins address the urgent demand for novel biocatalysts, allowing functionalities to be tailored precisely to specific requirements. These methods enable the creation of highly efficient and specialized proteins with wide-ranging applications in scientific, technological, and biomedical domains. This study establishes a conditional protein diffusion model, CPDiffusion, to deliver diverse protein sequences with desired functions. Although the model does not require extensive training data and its sampling process involves little guidance on the type of amino acids generated, CPDiffusion preserves the highly conserved residues that are essential for protein function. We used CPDiffusion to generate 27 artificially designed Argonaute proteins, programmable endonucleases applied for easy-to-implement and high-throughput screenings in gene editing and molecular diagnostics. These proteins differ from their natural counterparts by approximately 200 to 400 amino acids and share 40% sequence identity with them. Experimental tests demonstrate that all 27 artificially designed proteins (APs) are soluble and that 24 of them display DNA cleavage activity. Of the active APs, 74% exhibited higher activity than the template protein, and the most effective one showed a nearly nine-fold enhancement of enzymatic activity. Moreover, 37% of APs exhibited enhanced thermostability. These findings emphasize CPDiffusion’s capability to generate long-sequence proteins in a single step while retaining or enhancing intricate functionality. This approach facilitates the design of complex enzymes with multi-domain molecular structures through high-throughput in silico generation, without the need for supervision from labeled data.
Competing Interest Statement
The authors have declared no competing interest.