Abstract
Proteins can be identified by partitioning a proteome into eight mutually exclusive sets of peptides, recoding the peptides with a binary alphabet obtained by dividing the 20 amino acids into two ordered sets based on amino acid volume, and searching for the recoded peptides in a protein sequence database. With this approach, more than 89.7% of the protein sequences in the human proteome (http://www.uniprot.org; database id UP000005640, 20207 curated sequences) can be uniquely identified. Implementation issues are briefly discussed. In particular, nanopore-based sequencing of the partitioned peptides becomes less difficult, because the signal processing involved is largely a matter of thresholding the current blockade signal produced by a translocating peptide and generating a binary sequence from it.
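
The recoding and search steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. Several details are assumptions made only for illustration: the residue volumes are commonly tabulated approximate values in cubic angstroms, the split into the 10 smallest and 10 largest residues is a hypothetical choice of the two sets, the search is a plain substring match, and the thresholding function assumes an idealized signal with exactly one sample per residue. All function names are hypothetical, and the partitioning into eight peptide sets is not modeled.

```python
# Illustrative sketch only. The volume table, the 10/10 split, and the
# one-sample-per-residue signal model are assumptions, not the paper's method.

# Approximate residue volumes in cubic angstroms (commonly tabulated values).
VOLUME_A3 = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}


def binary_alphabet(volumes, n_small=10):
    """Map the n_small smallest residues to '0' and the rest to '1'."""
    ordered = sorted(volumes, key=volumes.get)
    return {aa: ("0" if i < n_small else "1") for i, aa in enumerate(ordered)}


def recode(sequence, code):
    """Recode an amino acid sequence as a string of binary symbols."""
    return "".join(code[aa] for aa in sequence)


def threshold_signal(samples, threshold):
    """Turn an idealized blockade signal (one sample per residue) into bits."""
    return "".join("1" if s > threshold else "0" for s in samples)


def matching_proteins(peptide_bits, database, code):
    """Return names of proteins whose recoded sequence contains peptide_bits."""
    return [name for name, seq in database.items()
            if peptide_bits in recode(seq, code)]


if __name__ == "__main__":
    code = binary_alphabet(VOLUME_A3)
    # Toy database with made-up sequences, for illustration only.
    toy_db = {"P1": "GAWKGA", "P2": "WWGGAA"}
    print(recode("AWK", code))                      # 011
    print(matching_proteins("011", toy_db, code))   # ['P1']
    print(matching_proteins("00", toy_db, code))    # ['P1', 'P2']
    print(threshold_signal([0.2, 0.9, 0.8, 0.1], 0.5))  # 0110
```

In this toy example the recoded peptide 011 occurs in only one database entry and therefore identifies it uniquely, whereas the shorter peptide 00 matches both entries and identifies neither.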