Protein fingerprinting with a binary alphabet

G. Sampath

doi:10.1101/119313

Abstract

Proteins can be identified by partitioning a proteome into eight mutually exclusive sets of peptides, recoding the peptides with a binary alphabet obtained by dividing the 20 amino acids into two ordered sets based on amino acid volume, and searching for the recoded peptides in a protein sequence database. With this approach, over 89.7% of the protein sequences in the human proteome (http://www.uniprot.org; database id UP000005640, 20207 curated sequences) can be uniquely identified. Implementation issues are briefly discussed. In particular, nanopore-based sequencing of partitioned peptides becomes less difficult because the signal processing involved is largely a matter of thresholding the current blockade signal produced by a translocating peptide and generating a binary sequence from it.
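As an illustration of the recode-and-search idea, the sketch below recodes sequences over {0, 1}, converts a per-residue blockade signal to bits by thresholding, and looks up the resulting binary peptide in a recoded database. It is a minimal sketch under stated assumptions, not the author's method. The volume split used here (the ten smallest residues map to 0, the ten largest to 1), the one-sample-per-residue signal model, and the function names `to_binary`, `threshold_blockade` and `identify` are all hypothetical. The partitioning of the proteome into eight peptide sets is not modeled.

```python
# HYPOTHETICAL split of the 20 standard amino acids into two ordered sets by
# residue volume: the ten smallest map to "0", the ten largest to "1".
# The actual split used in the paper may differ.
SMALL = set("GASCDPNTEV")
LARGE = set("QHMILKRFYW")


def to_binary(seq: str) -> str:
    """Recode an amino acid sequence over the binary alphabet {0, 1}."""
    bits = []
    for aa in seq.upper():
        if aa in SMALL:
            bits.append("0")
        elif aa in LARGE:
            bits.append("1")
        else:
            raise ValueError(f"non-standard residue: {aa!r}")
    return "".join(bits)


def threshold_blockade(samples, level):
    """Convert blockade depths to bits: deeper than `level` -> '1'.

    Assumes (hypothetically) one sample per residue and that larger
    residues block more current, i.e. give a deeper blockade.
    """
    return "".join("1" if s > level else "0" for s in samples)


def identify(peptide_bits, database):
    """Return ids of proteins whose recoded sequence contains peptide_bits."""
    recoded = {pid: to_binary(seq) for pid, seq in database.items()}
    return [pid for pid, bits in recoded.items() if peptide_bits in bits]


if __name__ == "__main__":
    # Toy two-protein "database"; sequences are illustrative only.
    db = {"P1": "MKTAYIAKQR", "P2": "GASGDPNTEV"}
    read = threshold_blockade([0.9, 0.8, 0.1, 0.7, 0.95], level=0.5)
    hits = identify(read, db)
    print(read, hits, "unique" if len(hits) == 1 else "ambiguous")
```

In this toy case, `P1` recodes to `1100110111`, so the thresholded read `11011` matches only `P1`. A protein counts as uniquely identified when exactly one database entry contains the recoded peptide.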

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.