RT Journal Article
SR Electronic
T1 Complete fold annotation of the human proteome using a novel structural feature space
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 092379
DO 10.1101/092379
A1 Sarah A. Middleton
A1 Joseph Illuminati
A1 Junhyong Kim
YR 2016
UL http://biorxiv.org/content/early/2016/12/07/092379.abstract
AB Recognition of protein structural fold is the starting point for many structure prediction tools and for protein function inference. Fold prediction is computationally demanding, and recognizing novel folds is difficult, so the majority of proteins have not been annotated with a fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 96% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into biological function, including the prediction of functional motifs relevant to human diseases. Our method can be applied to de novo fold prediction of entire proteomes and can identify candidate novel fold families.