RT Journal Article
SR Electronic
T1 Tuning intrinsic disorder predictors for virus proteins
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.10.27.357954
DO 10.1101/2020.10.27.357954
A1 Gal Almog
A1 Abayomi S Olabode
A1 Art FY Poon
YR 2020
UL http://biorxiv.org/content/early/2020/10/27/2020.10.27.357954.abstract
AB Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36% gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to SARS-CoV-2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. Competing Interest Statement: The authors have declared no competing interest.