RT Journal Article SR Electronic T1 Predicting Protein Producibility in Filamentous Fungi JF bioRxiv FD Cold Spring Harbor Laboratory SP 138560 DO 10.1101/138560 A1 Karmen L Dykstra A1 Juho Rousu A1 Mikko Arvas YR 2017 UL http://biorxiv.org/content/early/2017/05/16/138560.abstract AB In this paper we study the problem of predicting the producibility of recombinant proteins in filamentous fungi, especially T. reesei, using machine learning methods. We train supervised and semi-supervised support vector machines with protein sequences, represented by their amino acid composition as well as protein family and domain information. Our results indicate, somewhat surprisingly, that quite modest amount of proteins with experimental data are required to build a state-of-the-art classifier and that additional unlabeled sequences in semi-supervised models do not bring increased predictive performance. Our experiments in cross-species prediction show that models trained for the filamentous fungus A. niger protein dataset can be generalized to predict protein producibility in T. reesei, and vice versa, without sacrificing too much accuracy, regardless of their approximately 500 millions years of divergence. However, predictors trained on E. coli and S. cerevisiae datasets gave variable performance when applied to the filamentous fungi datasets, indicating that while protein producibility prediction can be generalized accross related species, fully generic prediction tools applicable to any protein production host may not be realistic to achieve.