RT Journal Article
SR Electronic
T1 Tuning intrinsic disorder predictors for virus proteins
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.10.27.357954
DO 10.1101/2020.10.27.357954
A1 Gal Almog
A1 Abayomi S Olabode
A1 Art FY Poon
YR 2020
UL http://biorxiv.org/content/early/2020/10/27/2020.10.27.357954.abstract
AB Many virus-encoded proteins have intrinsically disordered regions that lack a stable folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36% gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to SARS-CoV-2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. Competing Interest Statement: The authors have declared no competing interest.