RT Journal Article
SR Electronic
T1 DHS-Crystallize: Deep-Hybrid-Sequence based method for predicting protein Crystallization
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.11.13.381301
DO 10.1101/2020.11.13.381301
A1 Azadeh Alavi
A1 David B. Ascher
YR 2020
UL http://biorxiv.org/content/early/2020/11/13/2020.11.13.381301.abstract
AB To date, the key method for determining the structure of a protein is X-ray crystallography, a very expensive technique that suffers from a high attrition rate. A sequence-based predictor capable of accurately determining a protein's crystallization property would not only help overcome these limitations but also reduce the trial and error required to perform crystallization. In this work, to predict protein crystallizability, we have developed a novel sequence-based hybrid method that employs two separate, yet fully automated, approaches for extracting features from protein sequences. Specifically, we use a deep convolutional neural network (DCNN) on a publicly available dataset to extract descriptive features directly from the sequences, then fuse these features with structural and physicochemical features (such as amino-acid composition or AAIndex-based physicochemical properties). Dimensionality reduction is then performed on the fused features, and the resulting vectors are used to train an optimized gradient boosting machine (XGBoost). We evaluate our method on three publicly available test sets and show that the proposed DHS-Crystallize algorithm outperforms state-of-the-art methods and achieves higher performance than using DCNN-derived features or structural and physicochemical features alone. Competing Interest Statement: The authors have declared no competing interest.
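
The feature-fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `amino_acid_composition` and `fuse_features` are hypothetical, the example sequence and the three "learned" values are made up, and fusion by simple concatenation is an assumption, since the abstract does not say how the DCNN-derived and physicochemical features are combined. The dimensionality reduction and XGBoost training stages are left out.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every sequence
# maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = Counter(sequence.upper())
    # Count only standard residues, so the fractions sum to 1 even when
    # the sequence contains ambiguity codes such as 'X'.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def fuse_features(learned_features, handcrafted_features):
    """Fuse two feature vectors by concatenation (an assumption here)."""
    return list(learned_features) + list(handcrafted_features)


if __name__ == "__main__":
    sequence = "MKTAYIAKQR"      # made-up sequence, for illustration only
    learned = [0.1, -0.3, 0.7]   # placeholder for DCNN-derived features
    fused = fuse_features(learned, amino_acid_composition(sequence))
    print(len(fused))            # 3 placeholder values + 20 composition values
```

In a fuller pipeline, the fused vector for each sequence would be passed to a dimensionality-reduction step and then to a gradient boosting classifier, as the abstract outlines.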