RT Journal Article
SR Electronic
T1 Structure-aware Protein Solubility Prediction From Sequence Through Graph Convolutional Network And Predicted Contact Map
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.06.24.169011
DO 10.1101/2020.06.24.169011
A1 Chen, Jianwen
A1 Zheng, Shuangjia
A1 Zhao, Huiying
A1 Yang, Yuedong
YR 2020
UL http://biorxiv.org/content/early/2020/06/25/2020.06.24.169011.abstract
AB Motivation: Protein solubility is important for producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. A computational model that accurately predicts protein solubility from the amino acid sequence is therefore highly desirable. Many methods have been developed, but most rely on one-dimensional embeddings of amino acids, which are limited in their ability to capture spatial structural information. Results: In this study, we have developed a new structure-aware method that predicts protein solubility with an attentive graph convolutional network (GCN), where the protein topology attribute graph is constructed from contact maps predicted from the sequence. The method, GraphSol, was shown to substantially outperform other sequence-based methods. The model was shown to be stable, with a consistent R2 of 0.48 in both cross-validation and the independent test on the eSOL dataset. To the best of our knowledge, this is the first study to use a GCN for sequence-based predictions. More importantly, this architecture could be extended to other protein prediction tasks. Availability: The package is available at http://biomed.nscc-gz.cn Contact: yangyd25{at}mail.sysu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. Competing Interest Statement: The authors have declared no competing interest.