Abstract
The methodology of Specific Peptides (SPs) was introduced in the context of enzymes. It is based on an unsupervised machine learning (ML) tool for motif extraction, followed by supervised annotation of the extracted motifs. For enzymes, the class labels are Enzyme Commission (EC) numbers. Here we demonstrate that this method reaches a precision of 96.5% and a recall of 89.1% on currently available protein sequences. We also apply the method to two other protein families, G protein-coupled receptors (GPCRs) and zinc fingers (ZFs), identify their corresponding SPs, and provide code that searches any protein sequence to classify it within any such family.
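As a rough illustration of the annotation and search step, the sketch below assigns a class to a protein sequence by looking for exact occurrences of annotated SPs and then scores the predictions. It is not the authors' released code. The peptide table is a hypothetical placeholder, and the voting rule (majority label, abstain on ties or on no hits) and the precision/recall definitions (precision over sequences that received a prediction, recall over all sequences) are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical SP table: peptide -> class label (e.g. an EC number).
# These peptides are illustrative placeholders, not real SPs.
SP_TABLE: Dict[str, str] = {
    "AAKLMW": "1.1.1",
    "PPQRST": "2.7.11",
    "WWYHGK": "2.7.11",
}


def classify(seq: str, table: Dict[str, str]) -> Optional[str]:
    """Return the majority label of SPs found in seq, or None.

    Assumed rule: abstain when no SP matches or when the top labels tie.
    """
    hits = [label for pep, label in table.items() if pep in seq]
    if not hits:
        return None
    ranked = Counter(hits).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between labels: abstain
    return ranked[0][0]


def precision_recall(preds: List[Optional[str]], truths: List[str]) -> Tuple[float, float]:
    """Assumed definitions: precision = correct / predicted, recall = correct / total."""
    predicted = [(p, t) for p, t in zip(preds, truths) if p is not None]
    correct = sum(p == t for p, t in predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truths) if truths else 0.0
    return precision, recall


# Toy sequences with made-up true labels.
seqs = ["MAAKLMWEE", "PPQRSTGGWWYHGK", "MMMMMM", "AAKLMWPPQRST", "GGAAKLMWGG"]
truths = ["1.1.1", "2.7.11", "3.4.21", "2.7.11", "2.7.11"]
preds = [classify(s, SP_TABLE) for s in seqs]
precision, recall = precision_recall(preds, truths)
print(preds, precision, recall)
```

On the toy data, two of the three sequences that receive a label are labeled correctly (precision 2/3), and two of five sequences overall are correct (recall 0.4). Abstaining on ties trades recall for precision, which is one plausible way a high-precision, lower-recall profile like the one reported could arise; the actual SP decision rule may differ.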
Competing Interest Statement
The authors have declared no competing interest.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.