Abstract
Computations on proteome sequence databases show that most proteins can be identified from two quantities alone: the isoelectric point (IEP) and the digitized linear sequence volume (the total volume of the protein's residues). This is illustrated with four proteomes: H. pylori (1553 proteins), E. coli (4306 proteins), S. cerevisiae (6721 proteins), and H. sapiens (20207 proteins); the identification rate exceeds 90% in all four cases for appropriate parameter values. The IEP can be obtained with 1-d gel electrophoresis (GE), whose accuracy is better than 0.01 pH units. Linear sequence volumes of unbroken proteins can be obtained with a sub-nanometer-diameter nanopore that can measure residue volume with a resolution of 0.07-0.1 nm³ (Kennedy et al., Nature Nanotech., 2016, 11, 968-976; Dong et al., ACS Nano, 2017, doi: 10.1021/acsnano.6b08452); the blockade current due to a translocating protein is roughly proportional to the volume it excludes in the pore. There is no need to identify any of the residues. More than 90% of all the proteins have estimated translocation times longer than 1 μs, which is within the time resolution of available detectors. This minimalist, proteolysis-free, GE- and nanopore-based single-molecule approach requires very small samples, is non-destructive (the sample can be recovered for reuse), and can be translated with currently available technology into a portable device for possible use in the field, in an academic lab, or as a pre-screening step preceding conventional mass spectrometry.
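The identification criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the digitization scheme, so the sketch assumes that the total residue volume is binned at a fixed step and the IEP is rounded to a fixed precision, and that a protein counts as identified when its (IEP, volume) pair is unique in the proteome. The residue volumes are approximate, commonly tabulated values included for illustration only; the function names, the default step sizes, and the three-entry toy proteome are hypothetical. IEP values are taken as given inputs rather than computed from the sequence.

```python
from collections import Counter

# Approximate residue volumes in nm^3 (illustrative values, an assumption of this sketch).
RESIDUE_VOLUME_NM3 = {
    "G": 0.0601, "A": 0.0886, "S": 0.0890, "C": 0.1085, "D": 0.1111,
    "P": 0.1127, "N": 0.1141, "T": 0.1161, "E": 0.1384, "V": 0.1400,
    "Q": 0.1438, "H": 0.1532, "M": 0.1629, "I": 0.1667, "L": 0.1667,
    "K": 0.1686, "R": 0.1734, "F": 0.1899, "Y": 0.1936, "W": 0.2278,
}


def sequence_volume(seq):
    """Total volume of a sequence: the sum of its residue volumes (nm^3)."""
    return sum(RESIDUE_VOLUME_NM3[residue] for residue in seq)


def signature(iep, seq, iep_step=0.01, volume_step=0.1):
    """Digitized (IEP, volume) pair; both step sizes are assumed parameters."""
    iep_bin = round(iep / iep_step)
    volume_bin = int(sequence_volume(seq) // volume_step)
    return (iep_bin, volume_bin)


def identification_rate(proteins, iep_step=0.01, volume_step=0.1):
    """Fraction of proteins whose digitized (IEP, volume) pair is unique.

    `proteins` is a list of (name, iep, sequence) tuples.
    """
    sigs = [signature(iep, seq, iep_step, volume_step) for _, iep, seq in proteins]
    counts = Counter(sigs)
    unique = sum(1 for s in sigs if counts[s] == 1)
    return unique / len(proteins)


# Hypothetical toy proteome: P1 and P2 share IEP and residue composition,
# so they collide; P3 has a distinct signature.
toy = [
    ("P1", 5.20, "GA"),
    ("P2", 5.20, "AG"),
    ("P3", 7.10, "WWWW"),
]
rate = identification_rate(toy)
```

In this toy case only one of the three entries has a unique pair. Because the total volume depends only on residue composition, sequences that are permutations of each other can be separated only by their IEP, which is one reason the two measurements are combined.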