TY  - JOUR
T1  - Protein identification with a binary code, a nanopore, and no proteolysis
JF  - bioRxiv
DO  - 10.1101/119313
SP  - 119313
AU  - G. Sampath
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/04/27/119313.abstract
N2  - If protein sequences are recoded with a binary alphabet derived from a division of the 20 amino acids into two subsets, a protein can be identified from its subsequences by searching through a recoded sequence database. A binary-coded primary sequence can be obtained for an unbroken protein molecule from current blockades in a nanopore. Only two (instead of 20) blockade levels need to be recognized to identify a residue’s subset; a hard or soft detector can do this with two current thresholds. Computations were done on the complete proteome of Helicobacter pylori (http://www.uniprot.org; database id UP000000210, 1553 sequences) using a binary alphabet based on published data for residue volumes in the range ∼0.06 nm³ to ∼0.225 nm³. Assuming normally distributed volumes, more than 93% of binary subsequences of length 20 from the primary sequences of H. pylori are correct with a confidence level of 90-95%; they can uniquely identify over 98% of the proteins. Recently published work shows that a 0.7 nm diameter nanopore can measure residue volume with a resolution of ∼0.07 nm³; this makes the procedure described here both feasible and practical. This is a non-destructive single-molecule method without the vagaries of proteolysis.
ER  - 
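
The recoding and lookup described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SMALL` partition is a hypothetical split and not the volume-based alphabet used in the paper, the helper names `recode`, `build_index`, and `identify` are invented, and the two toy sequences are made up. The sketch does not model measurement noise or the confidence estimates reported in the abstract.

```python
from collections import defaultdict

# Hypothetical partition (assumption): residues in SMALL map to 0, all others to 1.
# The paper derives its split from published residue volumes; this set is illustrative only.
SMALL = frozenset("GASCDPNTEV")


def recode(seq):
    """Recode an amino-acid sequence into a binary string."""
    return "".join("0" if aa in SMALL else "1" for aa in seq)


def build_index(proteome, k):
    """Map every binary subsequence of length k to the ids of proteins containing it."""
    index = defaultdict(set)
    for pid, seq in proteome.items():
        code = recode(seq)
        for i in range(len(code) - k + 1):
            index[code[i:i + k]].add(pid)
    return index


def identify(index, binary_subseq):
    """Return the protein id if the subsequence occurs in exactly one protein, else None."""
    hits = index.get(binary_subseq, set())
    return next(iter(hits)) if len(hits) == 1 else None


# Toy "proteome" with made-up sequences (not taken from H. pylori).
toy = {"P1": "MKTAYIAKQR", "P2": "GASGASGWWF"}
index = build_index(toy, k=5)
print(identify(index, recode("AYIAK")))  # prints P1
```

With a real proteome, the same index could be built for subsequences of length 20 to count how many proteins have at least one binary subsequence that occurs in no other protein.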