TY  - JOUR
T1  - Deep protein representations enable recombinant protein expression prediction
JF  - bioRxiv
DO  - 10.1101/2021.05.13.443426
SP  - 2021.05.13.443426
AU  - Hannah-Marie Martiny
AU  - Jose Juan Almagro Armenteros
AU  - Alexander Rosenberg Johansen
AU  - Jesper Salomon
AU  - Henrik Nielsen
Y1  - 2021/01/01
UR  - http://biorxiv.org/content/early/2021/05/14/2021.05.13.443426.abstract
N2  - A crucial process in the production of industrial enzymes is recombinant gene expression, which aims to induce enzyme overexpression of the genes in a host microbe. Current approaches for securing overexpression rely on molecular tools such as adjusting the recombinant expression vector, adjusting cultivation conditions, or performing codon optimizations. However, such strategies are time-consuming, and an alternative strategy would be to select genes for better compatibility with the recombinant host. Several methods for predicting expressibility and solubility are available; however, they are all optimized for the expression host Escherichia coli. We show that these tools are not suited for predicting expression potential in the industrially important host Bacillus subtilis. Instead, we build a B. subtilis-specific machine learning model for expressibility prediction. Given millions of unlabelled proteins, and a small labelled dataset, we can successfully train such a predictive model. The unlabelled proteins provide a performance boost relative to using amino acid frequencies of the labelled proteins as input. On average, we obtain a modest performance of 0.64 area-under-the-curve (AUC) and 0.2 Matthews correlation coefficient (MCC). However, we find that this is sufficient to be useful for prioritization of expression candidates. Moreover, the predicted class probabilities are correlated with expression levels. A number of features related to protein expression, including base frequencies and solubility, are captured by the model. Competing Interest Statement: Jesper Salomon is employed by Novozymes A/S. The remaining authors have declared no competing interest.
ER  - 