TY  - JOUR
T1  - Protein identification with a nanopore and a binary alphabet
JF  - bioRxiv
DO  - 10.1101/119313
SP  - 119313
AU  - G. Sampath
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/07/10/119313.abstract
N2  - Protein sequences are recoded with a binary alphabet obtained by dividing the 20 amino acids into two subsets based on volume. A protein is identified from subsequences by database search. Computations on the Helicobacter pylori proteome show that over 93% of binary subsequences of length 20 are correct at a confidence level exceeding 90%. Over 98% of the proteins can be identified; most have multiple identifiers, so the false detection rate is low. Binary sequences of unbroken protein molecules can be obtained with a nanopore from current blockade levels proportional to residue volume; only two levels, rather than 20, need be measured to determine a residue’s subset. This procedure can be translated into practice with a sub-nanopore that can measure residue volumes with ∼0.07 nm³ resolution, as shown in a recent publication. The high detector bandwidth required by the high speed of a translocating molecule can be reduced more than tenfold with an averaging technique; the resulting decrease in the identification rate is only 10%. Averaging also mitigates the homopolymer problem due to identical successive blockade levels. The proposed method is a proteolysis-free single-molecule method that can identify arbitrary proteins in a proteome rather than specific ones.
ER  - 
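
The recoding and lookup steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the residue volumes are approximate textbook values in cubic angstroms, the median split used as the subset boundary is a guess (the abstract does not state the threshold), and the names `RESIDUE_VOLUME`, `binary_partition`, `recode`, `build_index`, and `identify` are hypothetical.

```python
# Approximate residue volumes in cubic angstroms (illustrative values,
# assumed here; the abstract does not list the volumes it uses).
RESIDUE_VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}


def binary_partition(volumes, threshold=None):
    """Map each residue to '0' (small) or '1' (large) by volume.

    If no threshold is given, a median split is used. That choice is an
    assumption made for this sketch, not a value taken from the source.
    """
    if threshold is None:
        ordered = sorted(volumes.values())
        mid = len(ordered) // 2
        threshold = (ordered[mid - 1] + ordered[mid]) / 2
    return {aa: ("1" if v > threshold else "0") for aa, v in volumes.items()}


def recode(sequence, code):
    """Recode an amino-acid sequence as a string over {'0', '1'}."""
    return "".join(code[aa] for aa in sequence)


def build_index(proteome, code, k=20):
    """Map every binary subsequence of length k to the proteins containing it."""
    index = {}
    for protein_id, sequence in proteome.items():
        binary = recode(sequence, code)
        for i in range(len(binary) - k + 1):
            index.setdefault(binary[i:i + k], set()).add(protein_id)
    return index


def identify(binary_read, index, k=20):
    """Return proteins for which some length-k window of the read is unique.

    A window counts as an identifier only if exactly one protein in the
    index contains it.
    """
    hits = set()
    for i in range(len(binary_read) - k + 1):
        owners = index.get(binary_read[i:i + k], set())
        if len(owners) == 1:
            hits |= owners
    return hits
```

In use, the index would be built once from the proteome's sequences, and each binary string read from the nanopore would be passed to `identify`. The sketch assumes error-free reads; it does not model the confidence levels, measurement noise, or averaging discussed in the abstract.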