RT Journal Article
SR Electronic
T1 Efficient and accurate prediction of transmembrane topology from amino acid sequence only
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 627307
DO 10.1101/627307
A1 Qing Wang
A1 Chong-ming Ni
A1 Zhen Li
A1 Xiu-feng Li
A1 Ren-min Han
A1 Feng Zhao
A1 Jinbo Xu
A1 Xin Gao
A1 Sheng Wang
YR 2019
UL http://biorxiv.org/content/early/2019/05/05/627307.abstract
AB Motivation: Fast and accurate identification of transmembrane (TM) topology is well suited for annotating the whole membrane proteome, and is in turn the initial step in predicting the structure and function of membrane proteins. However, methods that use only the amino acid sequence (pureseq) have so far suffered from low prediction accuracy, whereas methods that exploit sequence profiles or consensus require too much computing time. Method: This article employs a deep learning framework, DeepCNF, that predicts TM topology from amino acid sequence only. Compared to previous pureseq approaches based on Hidden Markov Models (HMM) or Dynamic Bayesian Networks (DBN), DeepCNF can accommodate much more context information through a hierarchical deep neural network, while simultaneously modeling the interdependency between adjacent topology labels. Result: Experimental results show that our TM prediction method, PureseqTM, not only outperforms existing pureseq methods but also matches or even surpasses the profile/consensus methods. On the 39 newly released membrane proteins, our approach successfully identifies the correct TM segments and boundaries for at least 3 cases where the other approaches failed to do so. When applied to the entire human proteome, our method can identify incorrect annotations of TM regions in UniProt, as well as discover membrane-related proteins that are not manually curated as membrane proteins. Availability: http://pureseqtm.predmp.com/