Review
Affinity purification of protein complexes for analysis by multidimensional protein identification technology
Highlights
► Analysis of affinity purified protein complexes by proteomics approaches has facilitated biochemical discovery.
► Quantitative proteomics approaches are being coupled to affinity purified complexes.
► The basics of the mass spectrometry analysis of affinity purified complexes are provided.
Introduction
MudPIT-based proteomic analyses of protein complexes have been used to elucidate the architecture of cellular machinery involved in many fundamental processes from gene transcription [1] to mitosis [2], and have enabled the isolation of protein components specific to diseases including pancreatic cancer [3] and heart disease [4]. Various techniques using affinity purification of tagged bait proteins have been used to isolate a given protein together with other proteins that associate and copurify with the bait [5]. Analysis of these purified samples by MudPIT mass spectrometry can then be used to determine the protein subunit composition of purified complexes [6]. Quantitative analysis of MudPIT data can provide accurate information about which proteins associate closely with the bait proteins and might be regarded as part of a protein complex, and which proteins might associate more loosely [7]. Network analysis strategies have been used to identify novel components of complexes, dissect which proteins might be members of more than one complex, and provide information about the overall architecture of complexes, including which components might be spatially close to each other and which components might comprise submodules capable of dissociating from a complex [8], [9]. Recent advances in the tools available for purification together with those available for downstream network analysis have enabled a more detailed understanding of the architecture of cellular machinery.
Protein–protein interactions play several roles within the cell. For example, they provide structural architecture; regulate gene expression as activators or repressors; and modulate substrate specificity of enzymes. Associated proteins are traditionally separated from cell lysates via multiple purification steps based on biochemical properties (e.g., molecular mass, surface charge and hydrophobicity, Stokes radius) [10], [11], [12]. As it is now easy to clone genes of interest and express engineered proteins, affinity purification has become a common approach to isolate complexes containing a specific protein of choice.
Affinity purification relies on two things: (1) the ability for a protein to bind to a substrate that is attached to a solid support such as agarose or resin and (2) the ability to remove the protein from the solid support; elution is accomplished either with a soluble substrate which competes for binding sites or by cleavage between the protein and support with a specific protease [13], [14]. The basis of affinity purification is exemplified by the isolation of maltose binding protein (MBP) as described by Ferenci and Klotz [15]; similar methods are used to this day for MBP-chimera proteins [16]. Ferenci and Klotz passed Escherichia coli extract over a column containing a support resin cross-linked to potato-derived amylose, to which MBP bound efficiently [15]. A buffer containing 10 mM maltose could elute the protein with few or no contaminants [15]. By contrast, elution with 10 mM glucose produced no yield, a finding consistent with the binding properties of MBP [15]. Now, biochemists can exploit the specific binding characteristics of peptide tags such as MBP and other proteins/protein domains to isolate their protein of choice to high purity and with great ease [17].
Peptide tags that bind selectively with high affinity to immobilized substrates allow for relatively rapid isolation of protein complexes from moderate amounts of starting material in a single chromatographic step. As one must genetically engineer a construct for the expression of one’s protein of choice fused to the protein or epitope tag, the use of affinity purification is typically limited to cells whose genomes can be manipulated or which allow protein expression from an exogenous plasmid or viral DNA. For cases that preclude this, such as for the isolation of protein complexes from human or large animal tissue samples, one must turn to other techniques such as affinity purification using an antibody specific to the protein of interest or traditional multi-step chromatography [18], [19], [20]. Although affinity-based purification provides clean and relatively fast yields and allows more flexibility in the number and range of experiments that can be done, affinity purification by native protein-specific antibodies or traditional “bucket biochemistry” allows the investigation of a system that is under normal gene expression conditions. Furthermore, these latter methods avoid artifacts that may be introduced by a protein tag and which will be discussed later [18].
A wide variety of affinity tags are available, the more widely used ones listed in Fig. 1 and the main ones discussed in more detail below. Choice of tag can be important; they may affect native protein interactions, post-translational modifications, solubility and/or cellular localization of the recombinant protein [21], [22]. Some tags can tolerate and function under a large range of conditions (salt, reducing agents, detergent) and hence give the scientists flexibility to change components – and their concentrations – of lysis and elution buffers to suit their needs. For example, 6xHis and biotin tags function and bind to their substrates under fully denaturing conditions (8 M urea) [23]. This property has proven useful to those isolating proteins that are insoluble under native conditions as is commonly experienced with highly over-expressed proteins that have been sequestered into inclusion bodies; proteins involved in aggregate-related diseases such as sickle cell anemia, Huntington’s, and Alzheimer’s; and proteins from cells with an inhibited proteasome [24], [25]. For the proteomic pursuits of identifying participants of protein interactions, especially those that are weak or transient, one may apply similar strategies using 6xHis- and biotin-based purifications from in vivo-crosslinked cell extracts [23]. Some protein-fusion partners like MBP and glutathione S-transferase (GST) can enhance solubility of proteins [26]; chimeras of these may be employed when investigating partial complexes where the protein-of-interest normally resides in a position unexposed to solvent (i.e., a protein that is normally buried within a complex’s quaternary structure) [26].
The use of the FLAG affinity tag was first described by Hopp and co-workers [27]. The FLAG epitope is small (peptide sequence DYKDDDDK) and hydrophilic, properties that both increase exposure of the tag for interaction with anti-FLAG antibody and reduce interference between the tag and the tagged protein. Choice of antibodies for FLAG purification is important. Of the three available antibodies, M1, M2 and M5, the M1 requires exposure of the N-terminal aspartic acid for efficient purification; the M2 and M5 antibodies both allow purification of either N-terminally Met-FLAG tagged or C terminally FLAG tagged proteins (reviewed in [28]). As M1 anti-FLAG antibody is calcium dependent and the M5 unable to bind proteins expressed in E. coli, the M2 antibody is commonly used [29], [30]. For protein interaction studies using MudPIT, proteins have typically been eluted from anti-FLAG agarose resin by competition with soluble FLAG peptide [1], [31]. Competitive elution with peptide is a gentle strategy and results in a relatively low number of contaminants that bind to the agarose resin nonspecifically; however, control experiments using lysates without FLAG-tagged proteins yield a consistent set of proteins, which possibly occur through binding to the immobilized antibody via domains similar to the FLAG epitope [1], [32]. FLAG purification strategies have been used to characterize several protein complexes in conjunction with mass spectrometry including the mammalian Mediator complex [1] and the TRRAP/TIP60 complex [7], [33] as well as for a large scale identification of yeast protein complexes [34] and a large scale analysis of the human protein interactome [35].
The ∼35 kDa Halo tag is one of several tags that have been described that are based on the affinity of a modified enzyme to an immobilized ligand [36], [37], [38]. During Halo affinity capture the tag becomes covalently coupled to the immobilized support allowing for overexpressed bait protein to be retained on the support during SDS or urea elution [37]. Alternatively the bait can be released by elution with TEV protease, which digests a cleavage site immediately downstream of the tag [37]. A recent study analyzed proteins co-purifying with Halo tagged RNA polymerase subunits using MudPIT mass spectrometry and identified several novel RNA polymerase associated proteins [39]. The authors suggest that fewer non-specific contaminants copurify with Halo tagged proteins than with immunoaffinity purified proteins [39]. Other advantages of the Halo tagging system include the availability of vectors allowing different levels of expression of the recombinant protein and the ability to use the tag for in vivo dye labeling of the protein for live cell imaging [37].
A number of affinity tags have been developed that use various peptide motifs that bind streptavidin and can be subsequently eluted by competition with biotin. The 9 aa Strep-tag (AWRHPQFGG) was first identified in a screen for peptides that would bind streptavidin using a random peptide library [40], [41]. As the Strep-tag can only function when it is at the C-terminus of a protein, Strep II tag (WSHPQFEK) was designed to allow the tagging on either N- or C-terminal ends. Due to its low affinity to streptavidin, Strep II is used in conjunction with a reengineered variant of streptavidin, StrepTactin [42]. The Strep II and its modification Strep III, which includes a linker region, have been applied to proteomic studies aimed at purifying proteins associating with the multi-functional protein phosphatase PP2 for analysis by LC-MS/MS [43]. The larger 38 aa streptavidin binding peptide (SBP) was later identified in an attempt to find a streptavidin binding peptide sequence with a higher affinity than its predecessors [44]. SBP binds to streptavidin with a dissociation constant of 2.5 nM, which allows for extensive washing and gentle elution with biotin and results in higher purity eluates than the 6 aa His tag or the 394 aa maltose binding protein (MBP) [44]. In addition, the high capacity of streptavidin resins enables the purification of relatively large amounts of protein (0.5 mg protein/ml streptavidin resin) compared to other antibody-based purifications. Another screen for a smaller peptide than SBP with a high affinity for streptavidin resulted in the Nano-tags (15 aa KD 4 nM and 9 aa KD 17 nM) [45].
Maltose binding protein (MBP) is a 43 kDa, highly soluble protein that can be useful for expression of some recombinant proteins where the native protein is insoluble and tends to form aggregates in solution [26], [46]. As mentioned before, MBP tagged complexes can be immobilized by binding to amylose resin and can be eluted from the immobilized phase with maltose [47]. An MBP purification strategy was used to purify spliceosome complexes and led to the identification of 58 new components of the spliceosome [48]. Zhou and coworkers used an elegant strategy to purify components of the spliceosome associated with mRNA using the MS2 bacteriophage coat protein, which binds to a specific RNA hairpin structure [48]. They first purified an MBP/MS2 fusion protein expressed in E. coli using amylose beads and then bound the purified protein to model pre-mRNA which contained MS2 binding sites [48]. They then assembled spliceosomes on the pre-mRNA and further purified the spliceosomes using gel filtration and a second amylose bead affinity step (method outlined in [49]).
The c-myc epitope tag binds with strong affinity to its antibody [50] and has been used to investigate protein interaction networks centered on the yeast ubiquitin ligase subunit Skp1 [51]. In this study, proteins were eluted either with SDS or with TEV protease, and then separated by SDS PAGE prior to mass spectrometry [51]. This example, though, highlights one of the drawbacks of using the c-myc epitope tag. Because it binds so tightly to its antibody, soluble peptide cannot effectively compete off the isolated c-myc fusion protein. Hence, one must resort to harsher means of eluting proteins that may also increase levels of contaminants in the final product.
Proteins bearing the polyhistidine tag, which normally contains 5–10 repeats of histidine, will bind to immobilized metals (Ni, Co, Zn) under higher pH through sharing the free electrons of the R-groups with the empty electron clouds of the transition elements [52], [53], [54]. Due to the characteristics of this interaction, isolation by metal columns must be done at a pH above the pKa of histidine and in the absence of reducing agents and metal chelators [55]. Analogously, elution may be performed either by competition with imidazole, a reduction of pH, or by chelation of the metal by EDTA [55]. Since the R-groups of other amino acids (including tryptophan, tyrosine and phenylalanine) can also bind to transition metals, purification by metal-chelating resin may not be particularly clean [55]. It is possible to attenuate contaminant levels by binding and washing with buffers containing low concentrations of imidazole and eluting by means of an imidazole gradient [55].
Many alternative single-stage purification strategies exist, each with their individual benefits and caveats. To aid researchers in the optimal choice of system for their particular needs, Lichty and coworkers compared the efficiency of a number of affinity tags, including His, CBP, Strep II, FLAG, GST and MBP [17]. They used these tags to purify complexes from a variety of organisms and looked at protein purity and yield as well as cost [17]. Firstly, they suggested that the Strep II tag might be a good choice where a compromise between purity, yield and cost needs to be considered. In addition their side by side comparison of tags provides useful data for researchers for whom one particular property of a purification system might be more important (for example they concluded that the His tag, although only offering moderate purity, was inexpensive and gave high protein yields) [17].
As well as the choice of tag, the conditions used during purification may also affect the purity of complexes. A recent study by Rees and coworkers investigated pre-equilibration of affinity surfaces with thiocyanate ions for the suppression of non-specific interactions [56]. The large thiocyanate anions are thought to disrupt the nonpolar forces largely responsible for nonspecific protein–protein interactions by altering the structure of water near interacting protein surfaces [57].
The use of two affinity tags in tandem to purify yeast protein complexes to near homogeneity for analysis by mass spectrometry was first described by Rigaut and co-workers [58], [59]. They first tested a number of tags to determine their efficiency in purifying a low abundance yeast protein (the FLAG, Strep and His tags, the calmodulin-binding peptide (CBP), the chitin-binding domain (CBD) and two repeats of the Protein A IgG-binding unit) [58]. They found that only the Protein A and CBP tags enabled recovery of more than 50% of the fusion protein, so they then used a sequence encoding CBP, a TEV protease site and the Protein A IgG-binding domains in tandem (TAP tag) to purify yeast SmX4p associated complexes in two steps [58]. After binding complexes to IgG beads via the Protein A tag and eluting them with TEV protease, they isolated the complexes from non-target proteins and TEV protease via calmodulin beads in the presence of calcium [58]. Removal of TAP-tagged proteins and associated proteins occurs in the presence of EDTA (Note: EDTA might interfere with complex stability and so TAP purification may not be suitable for some downstream applications) [58]. Improvements to the TAP purification have been suggested, including the removal (by mutation) of a nuclear localization sequence within the CBP domain and the addition of a formaldehyde crosslinking step to increase the recovery of weakly interacting proteins [60].
Following on from the work of Rigaut et al., several groups have tested other combinations of tandem affinity tags. In 2004, Graumann et al. used a His9 – PreScission-Myc9 tag (HPM) to purify yeast protein complexes for MudPIT analysis [61]. They purified proteins associating with the yeast histone acetyltransferase Gcn5p; they then compared the set of HPM-Gcn5p associated proteins that they identified with two sets of TAP-Gcn5p associated proteins identified in previously published studies [61]. This analysis suggested that purifications using the HPM and TAP tags generated similar sets of Gcn5p associated proteins and validated the HPM tag as an alternative to the TAP tag [61]. Work by Bürckstümmer et al. in 2006 aimed at improving the TAP method for use with mammalian cells [62]. Their GS-tag consisted of two repeats of a ProteinG IgG-binding peptide followed by a TEV protease cleavage sequence and a streptavidin binding peptide sequence; hence, the first purification step used TEV protease for elution and the subsequent step used biotin [62]. With cells stably expressing tagged proteins under the control of the CMV promoter as starting material (∼5 × 10⁷ cells), they demonstrated that a sufficient quantity of protein complexes could be purified for mass spectrometry analysis from relatively small-scale purifications [62]. They reported a tenfold increase in protein-complex yield using a GS-TAP tag compared with the TAP tag when purifying complexes from HEK293 cells [62]. More recent studies have identified other combinations of tags for purifying complexes from mammalian cells that give higher protein yields. These include the SF-TAP (FLAG-2x Strep II, up to 40% recovery [63]), and the SH tag (Strep-HA, up to 40% recovery estimated from Western blots [64]).
Additionally, the InterPlay mammalian TAP system, based on a CBP-SBP tag, has been developed commercially by Stratagene (Agilent Technologies) and has been used to purify FANCA interacting proteins for analysis by nano-flow LC-MS/MS [65]. Li and coworkers modified this system to add an additional C-terminal His tag to the fusion protein to overcome problems with one fusion protein, which did not bind to the calmodulin phase efficiently [66].
Another purification strategy was developed to try to capture transiently or weakly interacting proteins by crosslinking proteins with formaldehyde and then purifying the covalently bound complexes under denaturing conditions [23]. For this technique, proteins are fused to the HB tag, a His tag in tandem with BIO, a signal peptide for in vivo biotinylation [23], [67], since neither of these tags require folded proteins during the purification step. Purifying complexes under denaturing conditions has the additional advantage that changes to post-translational modifications established in vivo are minimized (in particular deubiquitination can occur after lysis during the purification [68]).
Strategies that allow purification of sufficient quantities of protein while keeping expression levels regulated to avoid unnatural interactions between proteins have also been developed. Zeghouf and co-workers used a sequential peptide affinity (SPA) system consisting of calmodulin binding protein fused to a 3x FLAG tag under the control of a number of different promoters to achieve regulated expression of recombinant proteins in both E. coli and HEK293 cells [69], [70]. In addition to the strong CMV promoter, they used the mouse Rpb1 promoter for relatively weak, ubiquitous expression and the ecdysone inducible promoter for inducible expression; all of their systems shielded the transcription units from other genomic regulatory elements with insulator sequences [69]. A later study developed a series of vectors for tagging proteins with either EGFP-TEV-S-peptide (the localization and affinity purification or LAP tag [71]) or FLAG-TEV- S peptide2 under the control of different strength TET inducible promoters for tetracycline regulatable expression (T-Rex) [72]. These vectors have been made available to the research community through Addgene (www.addgene.org).
The goal of MudPIT mass spectrometry is to identify many different proteins in a sample with vastly differing abundances with a series of measurements of molecular mass. Only a limited number of peptide identifications can be performed in a given time as the peptide mixture enters the mass spectrometer, with identification biased towards more abundant species. Long (∼20 h) run times are used to increase the total number of scans per sample analysis and triple phase chromatography is used to resolve peptides and reduce the range of peptide species flying into a detector at any one time [74]. Separating peptides by chromatography over a significant time thus aims to maximize the number of unique peptides analyzed by the mass spectrometer and hence maximize the number of unique protein identifications.
In the MudPIT workflow (Fig. 2), proteins are first digested into peptides, which can be separated by multidimensional chromatography and eluted directly into a mass spectrometer [74]. Here peptides of a unique mass can be isolated, fragmented and the mass/charge values of the fragments measured to generate an MS/MS spectrum. This spectrum provides an empirically generated “fingerprint” of the peptide, which can be compared with the predicted pattern of fragment ions for a particular peptide to achieve a protein identification [75], [76]. A 20-h MudPIT run typically generates hundreds of thousands of MS/MS spectra from which hundreds to thousands of unique proteins in a complex sample can be identified. Quantitative information about the relative abundances of proteins present in the sample can be estimated in a number of ways [77]. One of these, spectral counting, uses the total number of MS/MS spectra that match a particular protein to calculate a value, dNSAF, which allows an estimation of the relative abundances of different proteins (described in more detail below) [78].
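As a rough illustration of spectral counting, the sketch below computes a plain normalized spectral abundance factor (NSAF): each protein's spectral count is divided by its length, and the result is normalized to the sum over all proteins in the run. This is a simplification and an assumption on our part: dNSAF additionally distributes spectra from peptides shared between proteins, which is not modeled here. The protein names, counts and lengths are invented for illustration.

```python
def nsaf(spectral_counts, lengths):
    """Return {protein: NSAF} from spectral counts and protein lengths (aa).

    Simplified sketch: shared peptides are ignored, so this is NSAF,
    not the distributed variant (dNSAF) mentioned in the text.
    """
    # Spectral abundance factor: spectral count normalized by protein length
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so that all values in one run sum to 1
    return {p: value / total for p, value in saf.items()}


# Hypothetical data for one purification
counts = {"bait": 120, "subunit_A": 60, "contaminant": 5}
lengths = {"bait": 600, "subunit_A": 300, "contaminant": 500}

values = nsaf(counts, lengths)
for protein, value in values.items():
    print(f"{protein}: {value:.3f}")
```

Dividing by length matters because a longer protein yields more peptides, and therefore more spectra, than a shorter protein present at the same molar amount; in this example the bait and subunit_A receive the same value despite a twofold difference in raw counts.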
In order to obtain a complex peptide mixture suitable for analysis, a protein sample (between 10 ng and 10 μg, the amount depending on its protein complexity) is typically prepared as follows. First, disulfide bonds are reduced with Tris(2-carboxyethyl)phosphine hydrochloride (TCEP), and the resulting thiol groups are alkylated with chloroacetamide (CAM) to prevent bond reformation [79]. Commonly, interior regions of the proteins are exposed by denaturing them in urea before digestion with the endoproteinase Lys-C, which cleaves at the carboxyl side of lysine residues; after reducing the concentration of urea by dilution, trypsin, which cleaves at the carboxyl side of both arginine and lysine, can be used to achieve a more complete digestion. Although this procedure is suitable for many applications, a proportion of the tryptic products are either too small or too large for analysis (smaller products (<0.8 kDa) often do not have unique sequences, and larger (>3 kDa) peptides are unsuitable for a number of reasons outlined by Tran and co-workers [80]). If desired, one can generate a greater number of unique peptides for each protein by splitting a sample and digesting each aliquot with a different sequence specific or non-specific proteinase [79]. This increases the percentage of protein sequences covered by identified peptides and can be useful, for example, when trying to identify post-translational modifications for a particular protein [79].
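The cleavage rules described above can be sketched as an in silico digest. The example below is a minimal illustration under several assumptions: cleavage is complete (no missed cleavages), no exception is made for residues following the cleavage site, modifications such as alkylation are ignored, and the protein sequence is invented. The 0.8–3 kDa window is taken from the text.

```python
# Monoisotopic residue masses (Da); one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence, cleave_after):
    """Cleave on the carboxyl side of every residue in `cleave_after`."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a cleavage residue
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def in_mass_window(peptides, low=800.0, high=3000.0):
    """Keep peptides between 0.8 and 3 kDa, the range named in the text."""
    return [p for p in peptides if low <= peptide_mass(p) <= high]


# Hypothetical protein sequence, for illustration only
protein = "MASEQLKDYKDDDDKGSTPLVNAERWLLHGFIVSAYTQK"

lysc_peptides = digest(protein, "K")      # Lys-C: after lysine
tryptic_peptides = digest(protein, "KR")  # trypsin: after lysine or arginine
usable = in_mass_window(tryptic_peptides)

print("Lys-C:  ", lysc_peptides)
print("Trypsin:", tryptic_peptides)
print("Usable: ", usable)
```

Passing a different `cleave_after` set to `digest` mimics splitting the sample between proteinases of different specificity, which produces a different, partly complementary set of peptides for the same protein.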
Advantageously, MudPIT analysis allows identification of many low abundance proteins in a complex mixture [81]. To achieve this, components of the peptide mixture from a digested sample are separated chromatographically over a long period of time so that the more abundant peptides do not continually elute with the lower abundance peptides and dominate detector time.
• Firstly, the protein fragments are loaded under high pressure onto a fused silica microcapillary column containing three phases in tandem: a reverse phase, a strong cation exchange (SCX) resin, and a second reverse phase (Fig. 3a). The presence of the reverse phase (RP) resin upstream of the ion exchange phase allows the sample to be loaded onto the column directly after digestion in the presence of salt/urea without moving past the ion exchange resin. The sample can thus be desalted during loading/washing.
• The triphasic microcapillary column is then mounted between an HPLC and a tandem-mass spectrometer detector so that the peptides can be eluted directly into the mass spectrometer.
• The peptides in the digest mixture are resolved for detection using a series of 10 ∼2 h chromatography steps. An initial acetonitrile gradient removes bound peptides from the upstream RP phase onto the SCX resin. In each subsequent ∼2 h step, peptides are transferred from the SCX to the downstream RP with a salt bump and are then eluted into the mass spectrometer with a gradual acetonitrile gradient (Fig. 3b and c). The salt bump concentration is increased each step for a further 8 steps allowing the peptides to be separated and introduced to the detector gradually over a ∼20 h period.
Although this method of presenting sample for analysis has a 20-fold longer run-time than more traditional methods, it also increases sampling time, decreases noise, and increases detection sensitivity, especially for peptides that are in lower abundance.
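The stepwise elution described above can be laid out as a simple schedule. The sketch below is illustrative only: the step count and ∼2 h duration come from the text, whereas the salt bump values (expressed as a percentage of an unspecified salt buffer and increased linearly) are hypothetical placeholders, not published settings.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int
    salt_bump_percent: float  # hypothetical value; 0 means no salt bump
    duration_h: float


def mudpit_schedule(n_steps=10, step_h=2.0, max_salt_percent=100.0):
    """Build a schedule: one acetonitrile-only step, then rising salt bumps."""
    # Step 1: acetonitrile gradient only (upstream RP -> SCX)
    steps = [Step(1, 0.0, step_h)]
    n_bumps = n_steps - 1
    for i in range(1, n_bumps + 1):
        # Each later step starts with a larger salt bump (SCX -> downstream RP),
        # followed by an acetonitrile gradient into the mass spectrometer
        salt = max_salt_percent * i / n_bumps
        steps.append(Step(i + 1, salt, step_h))
    return steps


schedule = mudpit_schedule()
total_h = sum(step.duration_h for step in schedule)

for step in schedule:
    print(f"step {step.number:2d}: salt bump {step.salt_bump_percent:5.1f}%, "
          f"{step.duration_h} h")
print(f"total run time: {total_h} h")
```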
The several millimeter interface between the tip of the column and the entrance to the mass spectrometer allows peptides to be ionized and isolated from the liquid phase by electrospray ionization (ESI) [82], [83], [84].
A potential difference of about 2.5 kV is applied between the buffer in the column and the entrance to the MS detector (Fig. 4a) [79]. This induces a net positive charge in the buffer at the tip of the column and the resulting electrostatic forces cause the meniscus of the liquid to become destabilized and form a cone (Fig. 4b) [85]. As the force on the liquid at the tip of the cone becomes greater than the liquid’s surface tension, positively charged droplets are emitted from the tip of the cone [85]. The volume of these droplets is reduced by evaporation as they traverse the electric field between the column tip and the entrance to the mass spectrometer (Fig. 4b). The parental droplet becomes destabilized as liquid evaporates and charges in the droplet are brought closer together. The increasing electrostatic repulsive forces are eventually sufficient to overcome the cohesive force provided by the surface tension of the parental droplet; this results in the fission of the parental droplet and the emission of a stream of smaller droplets in a series of “coulombic explosions” [83]. This process continues as buffer evaporates from the smaller droplets until positively charged peptide ions in the gas phase are formed. The buffers used for the chromatography are optimized for this process and need to be relatively volatile and to have a sufficient electrical conductivity to allow movement of charges [86]. Interestingly, this process was initially used by the automobile industry for spray painting cars [83].
A variety of technologies exist for analyzing peptides and generating MS/MS spectra; the following description outlines the basic principles involved in mass analysis using a Thermo Finnigan Linear Trap Quadrupole (LTQ) mass spectrometer as an example [87], [88]. Before mass analysis can take place, the charged peptides need to be isolated from atmospheric molecules and are transported from the entrance of the mass spectrometer (a pressure of 760 Torr) through a series of chambers held at progressively lower pressures to a mass analyzer (at a pressure of ∼20 μTorr in the case of the Thermo Finnigan LTQ) (Fig. 5A) [87], [88]. Ions can be stored for a brief period in the mass analyzer and are then selectively ejected from the mass analyzer according to their mass/charge values and detected by a dynode/electron multiplier detection system.
The mass analyzer in the LTQ consists of an array of four ∼6 cm long hyperbolic rods, with each rod cut into three sections that are electrically insulated from each other (Fig. 5b); this creates two smaller quadrupoles, the end sections, and a central quadrupole where ions can be trapped [87], [88]. A DC voltage is applied to each section with the central section at −14 V and the two end sections at + 20 V, creating a potential well where ions can be trapped [87], [88]. The positively charged ions are permitted to enter the trap as DC voltages applied to the end sections are adjusted to −12 V, reducing the depth of the potential well; once ions are present in the center section of the mass analyzer, the DC voltages across the end sections can be increased, trapping a selection of positively charged peptide ions in the potential well [87], [88]. Additional AC voltages are applied across the rods in the center section to hold appropriately sized/charged ions in stable trajectories [87], [88]. By adjusting these voltages, ions with a given mass/charge (m/z) ratio become destabilized and are ejected through a pair of slits in two of the center section rods [87], [88]. The selective ejection and detection of ions can be used to generate a “full-scan MS” spectrum which provides information about the relative abundances of different peptides with various values of mass/charge. Higher mass accuracy can be achieved with spectrometers with more sophisticated detection systems such as the orbitrap [89], [90], [91]; this is useful when a more complete determination of post-translational modifications to proteins is desired.
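Because ions are ejected according to mass/charge rather than mass, the same peptide appears at different m/z values depending on how many protons it carries. The following sketch shows that relationship and the parts-per-million error commonly used to express mass accuracy. The peptide mass and the observed value are hypothetical, and the function names are ours.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass, charge):
    """m/z of a peptide of given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass(mz_value, charge):
    """Invert mz(): recover the neutral mass from an observed m/z."""
    return charge * (mz_value - PROTON_MASS)


def ppm_error(observed, theoretical):
    """Mass measurement error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


peptide_mass = 1661.8929  # hypothetical neutral monoisotopic mass, Da
for z in (1, 2, 3):
    print(f"z = {z}: m/z = {mz(peptide_mass, z):.4f}")

# Hypothetical observation of the doubly charged ion
observed = 831.9600
print(f"error: {ppm_error(observed, mz(peptide_mass, 2)):.1f} ppm")
```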
The full-scan MS profiles the m/z ratio composition of peptides entering the mass analyzer; a number of tandem MS (MS/MS or MS2) analyses follow to give more detailed information about the more abundant moieties identified in the initial full-scan MS spectrum. During a typical MudPIT analysis a number of MS/MS spectra, each characteristic of a single type of peptide, are generated over a short time interval (tens of milliseconds) as follows (Fig. 6).
• Initial full-scan MS. A selection of the ions being eluted from the column are trapped and then gradually ejected for detection, starting with ions with the smallest values of m/z. This generates a full-scan MS spectrum, which will be used to select the types of ions that should be analyzed further by MS/MS. Usually, the most abundant species that have not been analyzed in recent scans are chosen (see Data dependent acquisition of MS/MS spectra) [88].
• MS/MS step 1: ion isolation. Another sample of charged peptide ions is trapped in the central section of the mass analyzer. This time, instead of ejecting ions for detection, peptides of a single entity and unique mass/charge are isolated by applying AC voltages that destabilize and eject all peptides of other mass/charge values [88].
• MS/MS step 2: collision induced dissociation. Once a homogeneous population of ions has been isolated, the kinetic energy of the ions is increased through the application of an appropriate radio frequency (RF) voltage. This causes collisions between peptide ions and helium gas in the mass analyzer cavity and results in absorption of energy and dissociation of ions into smaller product ions [88]. Dissociation occurs most frequently by bond cleavage between amino acids [92].
• MS/MS step 3: scan. The fragment ions are then sequentially ejected for detection in the same way that ions were analyzed during the full-scan MS. This generates an MS/MS spectrum characteristic of a particular parental peptide ion. For each full-scan MS, a number of subsequent MS/MS scans can be performed (typically 5). After the successive MS/MS scans, all three steps are reiterated to analyze another group of peptides [88].
Data-dependent acquisition of spectra further increases the detection of lower abundance proteins [93]. Typically an initial full-scan MS identifies the m/z values of many ions being eluted and detected at any one time; the series of five subsequent MS/MS scans systematically isolates, fragments and characterizes the five most abundant peaks identified by the full-scan MS (Fig. 7A) [73], [93]. To avoid repeated characterization of the same abundant peaks, an ion can be automatically added to an “exclusion list” once it has been selected for characterization by an MS/MS scan a set number of times (for example more than twice in 30 s). The ion is then barred from repeated selection for analysis in subsequent scans for a chosen time (typically about 2 min) [93]. This dynamic exclusion (DE) allows the identification of lower abundance peptides that would not have otherwise been selected for an MS/MS scan (Fig. 7B) [93]. The choice of dynamic exclusion time has been shown to affect the number of unique peptide spectral counts and the optimal DE time has been shown to be a function of chromatographic peak width [93].
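The selection logic described above can be illustrated with a small simulation. The following is a minimal sketch rather than instrument control code: the class name `DynamicExclusion`, its parameters, and the matching of ions by rounded m/z are all assumptions made for illustration, with defaults taken from the example values in the text (five MS/MS scans per full scan, exclusion after more than two selections within 30 s, exclusion lasting about 2 min).

```python
from collections import defaultdict, deque


class DynamicExclusion:
    """Toy model of data-dependent precursor selection with dynamic exclusion."""

    def __init__(self, top_n=5, max_repeats=2, repeat_window=30.0,
                 exclusion_time=120.0, mz_decimals=1):
        self.top_n = top_n                    # MS/MS scans per full-scan MS
        self.max_repeats = max_repeats        # selections tolerated within the window
        self.repeat_window = repeat_window    # seconds
        self.exclusion_time = exclusion_time  # seconds an ion stays excluded
        self.mz_decimals = mz_decimals        # hypothetical m/z matching tolerance
        self.history = defaultdict(deque)     # m/z key -> recent selection times
        self.excluded_until = {}              # m/z key -> time the exclusion expires

    def select(self, peaks, now):
        """Pick precursors for MS/MS from one full-scan MS.

        peaks: iterable of (mz, intensity) pairs; now: scan time in seconds.
        """
        chosen = []
        # Consider peaks from most to least intense.
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            key = round(mz, self.mz_decimals)
            if self.excluded_until.get(key, float("-inf")) > now:
                continue  # still on the exclusion list
            times = self.history[key]
            times.append(now)
            # Forget selections that fall outside the repeat window.
            while times and now - times[0] > self.repeat_window:
                times.popleft()
            # Selected too often recently: bar the ion from further selection.
            if len(times) > self.max_repeats:
                self.excluded_until[key] = now + self.exclusion_time
                times.clear()
            chosen.append(mz)
            if len(chosen) == self.top_n:
                break
        return chosen


# One MS/MS per cycle makes the effect easy to see: after three selections
# the abundant ion is excluded and the weaker ion is analyzed instead.
dx = DynamicExclusion(top_n=1)
peaks = [(500.0, 100.0), (600.0, 50.0)]
picks = [dx.select(peaks, t)[0] for t in (0, 1, 2, 3)]
print(picks)  # [500.0, 500.0, 500.0, 600.0]
```

In this toy run the lower abundance ion at m/z 600 is only selected once the dominant ion has been placed on the exclusion list, which is the behavior dynamic exclusion is intended to produce.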
Data from MudPIT runs consist of a series of hundreds of thousands of MS/MS spectra that identify mass/charge ratios of parental peptide ions and some of their component fragment ions. Each of these potentially describes a fragmented peptide resulting from the digestion of a particular protein. Once each experimental MS/MS spectrum has been matched to the predicted pattern of fragment ions for a particular peptide, a list of the proteins present in the original sample can be assembled [94].
Each experimentally derived MS/MS spectrum potentially provides a “fingerprint” which is characteristic of a particular peptide. Using a computational algorithm, these “fingerprints” are correlated with peptides derived from an in silico digest of the relevant protein database by specified endoproteinases [75], [95], [96], [97]. So how can the expected fragmentation pattern of an MS/MS spectrum be predicted from a given peptide sequence (Fig. 8)? The major fragments resulting from collision induced dissociation of a particular peptide during MS/MS can be deduced from its sequence as bonds are preferentially broken along the backbone of the amino acid chain. The positive charge can be retained either on the N-terminal fragment (a “b” ion) or on the C-terminal fragment (a “y” ion) (Fig. 8) [98], [99]. Values of mass/charge for these more abundant b and y fragments can be calculated and an MS/MS spectrum predicted for every possible peptide. As matches between experimental MS/MS spectra and the predicted fragmentation patterns are made, cross correlation scores for each match give an indication of how good any match is; a match that is not sufficiently accurate or is ambiguous (a spectrum may be a close match to more than one peptide) can be filtered out before final assembly of a protein list [100].
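The prediction of fragment ions from a sequence can be sketched in a few lines. This is an illustrative example under stated assumptions, not the algorithm used by any particular search engine: it assumes trypsin as the endoproteinase (cleavage after K or R, not before P, no missed cleavages), considers only singly charged b and y ions, and ignores modifications such as carbamidomethylation of cysteine. The function names are hypothetical; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to C-terminal (y) fragments
PROTON = 1.007276   # charge carrier for singly protonated ions


def tryptic_digest(protein):
    """In silico digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def fragment_ions(peptide):
    """Return (b, y) lists of singly charged m/z values, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


for pep in tryptic_digest("AKRPGKDE"):
    b, y = fragment_ions(pep)
    print(pep, [round(m, 4) for m in b], [round(m, 4) for m in y])
```

A predicted spectrum built this way for every peptide in the digested database is what each experimental MS/MS spectrum is compared against; real search tools also account for multiple charge states, neutral losses, and modifications.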
Various label free methods exist to estimate relative amounts of the proteins identified in a MudPIT analysis [6], [101], [102]. One approach uses the normalized spectral abundance factor, NSAF, for quantitative proteomic analysis [78]. A recent comparison of methods suggested that the NSAF approach outperforms other methods [77], and we shall now focus on the use of spectral counting and the calculation of NSAF (or more recently dNSAF) values as a way to gauge relative protein abundances.
Processed data yield the values of several variables that are used for quantitation; these include the total number of proteins identified (N) and the total number of MS/MS spectra matched to a particular protein, also called the spectral count (SpC). The spectral abundance factor (SAF) allows ranking of proteins in a given run by approximate abundance and is calculated for a particular protein k as SAF_k = (SpC/L)_k, where L is the length of the protein (longer proteins have more associated peptides that might be observed) [78], [103]. To compare relative abundances of a protein in different mass spectrometry runs, the normalized spectral abundance factor (NSAF) can be calculated:

$$\mathrm{NSAF}_k = \frac{(\mathrm{SpC}/L)_k}{\sum_{i=1}^{N} (\mathrm{SpC}/L)_i}$$
The denominator in this equation reconciles run-to-run variances in total spectral count values. Hence, the relative abundance of a set of proteins under different conditions may be evaluated by comparing their respective NSAF values [78].
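As a worked illustration of the calculation, the sketch below computes NSAF values from spectral counts and protein lengths, dividing each protein's SAF by the sum of the SAFs of all N proteins in the run. The function name `nsaf` and the protein names and numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factors for one run.

    spectral_counts: protein -> spectral count (SpC)
    lengths: protein -> protein length L (amino acids)
    """
    # SAF_k = (SpC / L)_k
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    # NSAF_k = SAF_k / sum of SAF over all N proteins in the run
    return {p: value / total for p, value in saf.items()}


# Hypothetical run with three proteins.
counts = {"bait": 30, "preyA": 10, "preyB": 10}
lengths = {"bait": 300, "preyA": 100, "preyB": 200}
for protein, value in nsaf(counts, lengths).items():
    print(f"{protein}: {value:.2f}")
```

Here the bait has three times the spectral count of preyA but is also three times as long, so the two proteins receive the same NSAF (0.40), while preyB, with the same count as preyA over twice the length, receives half that value (0.20). Because the values within a run sum to 1, they can be compared between runs with different total spectral counts.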
The application of the NSAF was first illustrated in an analysis of the yeast membrane proteome under varying growth conditions [78]. The authors used NSAF values to evaluate the abundance of proteome components during culture in rich and minimal medium. They then measured relative protein levels under the two growth conditions using a second method that consisted of metabolically 15N labeling proteins under one growth condition; mixing the resulting protein sample with another derived from cells grown under the opposing condition with only a 14N source; and comparing the number of light and heavy spectra of specific peptides [78]. These studies confirmed that the NSAF correlated with relative protein abundance and validated the use of the NSAF as a means by which data sets can be compared.
Other work demonstrates the versatility of the NSAF approach. Analysis using the NSAF may be applied to the comparison of identified peptides irrespective of whether digestion of proteins is performed with a sequence-specific or a non-sequence-specific protease such as proteinase K [104]. In addition, a modification to the NSAF algorithm, dNSAF (distributed NSAF), takes into consideration peptides that are common to multiple proteins (either different isoforms or closely related proteins) [105]. This is of particular significance in the analysis of the proteomes of higher eukaryotes. In brief, spectral counts for such peptides are distributed between different possible proteins in proportion to the numbers of unique spectra identified for each protein.
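The distribution of shared spectral counts can be sketched as follows. This is a simplified illustration based only on the description above (shared counts are split in proportion to each protein's unique spectral counts), not a reproduction of the published dNSAF implementation [105]; the function name `dnsaf`, the input layout, and the handling of peptides whose proteins have no unique spectra are assumptions.

```python
def dnsaf(unique_counts, shared_peptides, lengths):
    """Distributed NSAF sketch.

    unique_counts: protein -> spectra from peptides unique to that protein
    shared_peptides: list of (spectral_count, [proteins sharing the peptide])
    lengths: protein -> protein length (amino acids)
    """
    distributed = {p: float(c) for p, c in unique_counts.items()}
    for count, proteins in shared_peptides:
        total_unique = sum(unique_counts[p] for p in proteins)
        if total_unique == 0:
            continue  # assumption: no unique evidence, so the counts stay unassigned
        for p in proteins:
            # Each protein's share is proportional to its unique spectral count.
            distributed[p] += count * unique_counts[p] / total_unique
    dsaf = {p: distributed[p] / lengths[p] for p in distributed}
    total = sum(dsaf.values())
    return {p: value / total for p, value in dsaf.items()}


# Hypothetical example: two isoforms share a peptide observed in 8 spectra.
unique = {"isoform1": 30, "isoform2": 10, "other": 20}
shared = [(8, ["isoform1", "isoform2"])]
lengths = {"isoform1": 100, "isoform2": 100, "other": 100}
print(dnsaf(unique, shared, lengths))
```

In this example the 8 shared spectra are split 6:2 between the isoforms, mirroring their 30:10 ratio of unique spectra, before length correction and normalization are applied.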
Different strategies can be used to assess the significance of two proteins copurifying. Bait associated proteins identified through MudPIT analysis might copurify via direct interaction with the bait or indirectly via other molecules. Putative physical interactions between individual proteins suggested by MudPIT data can be tested using standard biochemical assays [7], [31]. Another approach would be to ask whether perturbation of the bait protein might lead to a change in the function or localization of bait associated proteins.
To test for a direct physical interaction between bait and prey, the two proteins that might interact are typically overexpressed, and the interaction is tested by the propensity of the two proteins to copurify chromatographically in the absence of other proteins. For example, in one study, an AP-MudPIT analysis using FLAG-tagged S. pombe strains demonstrated that two transcription elongation factors, SpELL and SpEAF, copurified using either protein as the bait [31]. A direct interaction between the two factors was confirmed by coimmunoprecipitation of SpELL and SpEAF proteins overexpressed in Sf21 insect cells, analyzed by staining SDS–PAGE gels for total protein [31]. Thus the two factors were shown to copurify in the absence of other S. pombe factors and in the absence of significant amounts of other insect cell proteins [31]. Copurification of the two proteins in ion exchange chromatography fractions provided further evidence for a direct interaction between SpELL and SpEAF [31]. Such biochemical validations of putative interactions can generate evidence complementary to the clustering approaches discussed below in support of models of protein complex structure and organization.
An analysis of the large protein complex Mediator shows how the NSAF approach may be used to elucidate biological mechanisms [1], [6]. Mediator consists of over 26 subunits and acts as an essential bridge between RNA polymerase II and its regulators (activators and repressors) [106]. MudPIT analyses of Mediator purified from nuclear extracts of four different stable HeLa cell lines carrying different epitope-tagged subunits (n = 4 for each subunit bait) demonstrate the existence of distinct complex subspecies [6]. Most strikingly, Mediator complexes purified via the Med10 and Med26 subunits have different kinase module content yet roughly equivalent RNA polymerase II content [6]. As Mediator affinity-purified through its Med26 subunit contained very little kinase module, which represses transcriptional activity, but significant levels of RNA polymerase II, it was expected that these fractions would be more able to synthesize RNA [6]. Indeed, functional assays validated the hypothesis [6].
NSAF values have also been used in computational modeling of a protein complex’s structural organization. In this novel application of NSAFs, probabilistic local protein interaction networks are assembled using vector algebra and statistical methods [7]. This approach was used to construct the human Tip49a/Tip49b protein interaction network, demonstrating that Tip49a/Tip49b are components of four different protein complexes: URI/Prefoldin, hINO80, SRCAP, and TRRAP/TIP60 [7]. In addition to identifying new components of all these complexes, quantitative proteomics data can also be used to calculate probabilities of interactions within and between complexes. Using NSAFs derived from arrays of MudPIT analyses, Sardiu and coworkers used a Bayesian approach to assemble a probabilistic protein interaction network [7]. A number of the predicted interactions were then tested using co-immunoprecipitation assays; interactions with high calculated probabilities corresponded to positive co-immunoprecipitations while those with low calculated probabilities corresponded to negative co-immunoprecipitations [7]. Later work used the Tip49a/Tip49b protein interaction network as a foundation for evaluating different clustering algorithms [107].
The end goal of using affinity-tag purification in conjunction with MudPIT mass spectrometry is to identify components that associate with a target protein in a non-biased fashion. Unlike probing a gel-resolved sample via Western blot using specific antibodies, MudPIT analysis does not rely on a priori knowledge about a complex; may detect proteins that do not resolve well on a gel, such as those too large to enter the resolving phase or too small to remain on the gel; and provides an indication of stoichiometry. In order to take full advantage of what MudPIT may offer, the biochemist must optimize samples to avoid artifacts, technical problems, and misleading data.
Although it may seem obvious, one must choose lysis and elution buffers that aid in the purification of the correct cellular component, are compatible with the affinity elution method, and do not interfere with downstream MS processes and analysis. For example, analysis of mitochondrial proteins may require lysis of the organelles by a detergent such as NP-40 or Triton X-100. If purification of FLAG-tagged mitochondrial proteins follows, one may opt for a lysis buffer containing Triton X-100, as FLAG-antibody beads can tolerate higher concentrations of this detergent than of NP-40 (amounts of NP-40 above 1% are detrimental to the FLAG antibody). Care needs to be taken to make sure detergent is removed before the proteolytic steps of MudPIT sample preparation; this is usually accomplished during trichloroacetic acid (TCA) precipitation (see below).
Sometimes it may be useful to include the modified sequence of an affinity-tagged protein in the protein database used for identifying spectra. In this case care needs to be taken if competitive elution with a peptide is used so that, for example, FLAG peptide used for elution is not misidentified as FLAG-tagged bait protein. For strategies where elution is achieved with TEV protease digestion, it may be prudent to take steps to minimize the presence of the protease in the sample so that minimal detector time is used analyzing the “contaminating” protease. This can be achieved either by digesting with smaller quantities of TEV protease and extending digestion times, or by using affinity-tagged TEV protease, which can be removed after digestion.
Besides keratins, which come from dust and handling, other proteins and peptides introduced during purification can prevent the procurement of useful data. Bovine serum albumin (BSA) is commonly added to stabilize protein complexes in dilute solutions. BSA, however, is processed in the same fashion as the other sample proteins and, at the concentrations typically used during purifications, can carry over into elutions even if it is included only during early affinity steps. Peptides of BSA origin will then dominate the spectra, and little is gleaned from the experiment. Although gentle elution with competitive soluble peptides also introduces peptides that can be detected by mass spectrometry, this has not been problematic in the authors’ experience. If desired, these small peptides can be removed before protease digestion by size exclusion chromatography (e.g., via G-25 Sephadex).
Strategies aimed at subcellular fractionation can help to identify distinct regions of a cell involved in protein interactions [108]. Membrane proteins present an interesting challenge for affinity-tag purification and identification by MudPIT. Due to their nature, they normally rely on the presence of lipids for solubility, contain many hydrophobic groups within transmembrane domains, and have various post-translational modifications. To overcome some of these difficulties, Wu and co-workers developed a method for analyzing membrane proteins using MudPIT [109].
The small diameter (100 μm) of the chromatography column coupled to the mass spectrometer in MudPIT needs consideration during sample preparation. Residual resin and/or magnetic beads block the fused silica and prevent proper loading. Their removal may be easily accomplished by passing affinity purification eluates through an empty Micro Bio-Spin® column (Bio-Rad). Insoluble protein or high concentrations of protein may also hinder loading; hence, 200–300 μg is recommended for 100 μm diameter, three-phase chromatography columns [73]. Isolation of proteins that bind to nucleic acids will probably copurify associated DNA and/or RNA, which can make samples viscous enough to clog the MS sample column. Protein eluates can therefore be digested with DNAse, RNAse or benzonase, which themselves may require specific salts and cofactors [110]. With these conditions in mind, the final buffer used for affinity purification should accommodate a subsequent nuclease digestion, and the amount of enzyme added for that step should not cause the nuclease to be an overly abundant protein in the sample.
Non-volatile substances in the elution buffer that cannot be removed by buffer exchange during the loading and washing of the MudPIT sample column will cause problems during the mass spectrometry run. These can suppress electrospray ionization and prevent peptides from entering the ion chamber [86], [111]. Of these, sulfates and phosphates may be found in sample buffers and should be avoided. Alternatively, sample proteins may be precipitated in TCA and washed with acetone prior to resuspension in 8 M urea, TCEP and CAM treatment, and protease digestion [79]. Lastly, our laboratory has experienced problems with samples containing large amounts of glycerol and recommends that glycerol be kept below 10% v/v in samples prior to TCA precipitation.
Typical compounds that cause artifacts during mass spectrometry are polyethylene glycol (PEG) and detergents, which some use as stabilizing or crowding agents during purification and which can also be introduced through glassware washing [86]. Careless handling of glassware and samples with ungloved hands may also transfer not only protein contaminants, but also PEG and detergents from hand cream and soap. To reduce these problems, avoid the inclusion of PEG and detergents in final elution buffers and use careful laboratory handling technique. Glassware should be cleaned under soap-free conditions, and reagents and their containers should be handled with care to avoid introducing contaminants, which would otherwise appear in the spectra detected by mass spectrometry.
Through understanding processes downstream from protein affinity-purification, the biochemist may adjust methods to optimize sample quality. Clean, compatible samples that contain the target proteins as the predominant species will result in the most informative MudPIT data.
Concluding remarks
A variety of affinity-based approaches to protein complex purification have been coupled to mass spectrometry based protein identification; the suitability of particular strategies depends on the demands of the biological question being addressed. Additional affinity based strategies exist for investigating modified proteins – in particular, methods for enriching for phosphorylated, glycosylated or ubiquitinated proteins have been reviewed by Azarkan and coworkers [112]. MudPIT-based analysis
Acknowledgments
This work was supported by the Stowers Institute for Medical Research.
References (115)
- et al. A set of consensus mammalian mediator subunits identified by multidimensional protein identification technology. Mol. Cell (2004)
- et al. Multidimensional protein identification technology (MudPIT): technical overview of a profiling method optimized for the comprehensive proteomic investigation of normal and diseased heart tissue. J. Am. Soc. Mass Spectrom. (2005)
- et al. Building protein–protein interaction networks with proteomic and informatics tools. J. Biol. Chem. (2011)
- et al. The purification of coenzyme A by ion exchange chromatography. J. Biol. Chem. (1953)
- et al. A critical review of the methods for cleavage of fusion proteins with thrombin and factor Xa. Protein Expr. Purif. (2003)
- et al. Affinity chromatographic isolation of the periplasmic maltose binding protein of Escherichia coli. FEBS Lett. (1978)
- et al. Comparison of affinity tags for protein purification. Protein Expr. Purif. (2005)
- et al. A tandem affinity tag for two-step purification under fully denaturing conditions: application in ubiquitin profiling and protein complex identification combined with in vivo cross-linking. Mol. Cell. Proteomics (2006)
- et al. The FLAG peptide, a versatile fusion tag for the purification of recombinant proteins. J. Biochem. Biophys. Methods (2001)
- et al. Identification and characterization of a Schizosaccharomyces pombe RNA polymerase II elongation factor with similarity to the metazoan transcription factor ELL. J. Biol. Chem. (2007)
- The mammalian YL1 protein is a shared subunit of the TRRAP/TIP60 histone acetyltransferase and SRCAP complexes. J. Biol. Chem.
- Use of the Strep-tag and streptavidin for detection and purification of recombinant proteins. Methods Enzymol.
- Molecular interaction between the Strep-tag affinity peptide and its cognate target, streptavidin. J. Mol. Biol.
- One-step purification of recombinant proteins using a nanomolar-affinity streptavidin-binding peptide, the SBP-Tag. Protein Expr. Purif.
- The Nano-tag, a streptavidin-binding peptide for the purification and detection of recombinant proteins. Protein Expr. Purif.
- Solubility of proteins isolated from inclusion bodies is enhanced by fusion to maltose-binding protein or thioredoxin. Protein Expr. Purif.
- Functional association of U2 snRNP with the ATP-independent spliceosomal complex E. Mol. Cell
- Secretion in yeast. Purification and in vitro translocation of chemical amounts of prepro-alpha-factor. J. Biol. Chem.
- Immobilized-metal affinity chromatography (IMAC): a review. Methods Enzymol.
- Method for suppressing non-specific protein interactions observed with affinity resins. Methods
- The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods
- Applicability of tandem affinity purification MudPIT to pathway proteomics in yeast. Mol. Cell. Proteomics
- An integrated mass spectrometry-based proteomic approach: quantitative analysis of tandem affinity-purified in vivo cross-linked protein complexes (QTAX) to decipher the 26 S proteasome-interacting network. Mol. Cell. Proteomics
- An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom.
- Mechanistic investigation of ionization suppression in electrospray ionization. J. Am. Soc. Mass Spectrom.
- A two-dimensional quadrupole ion trap mass spectrometer. J. Am. Soc. Mass Spectrom.
- A dual pressure linear ion trap Orbitrap instrument with very high sequencing speed. Mol. Cell. Proteomics
- Mitotic spindle proteomics in Chinese hamster ovary cells. PLoS One
- Identification of proteins released by pancreatic cancer cells by multidimensional protein identification technology: a strategy for identification of novel cancer markers. FASEB J.
- Mass spectrometry-based proteomic analysis of the epitope-tag affinity purified protein complexes in eukaryotes. Proteomics
- Quantitative proteomic analysis of distinct mammalian mediator complexes using normalized spectral abundance factors. Proc. Natl. Acad. Sci. USA
- Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics. Proc. Natl. Acad. Sci. USA
- Proteome survey reveals modularity of the yeast cell machinery. Nature
- Hydrophobic chromatography: use for purification of glycogen synthetase. Proc. Natl. Acad. Sci. USA
- Gel filtration: a method for desalting and group separation. Nature
- Streamlined analysis schema for high-throughput identification of endogenous protein complexes. Proc. Natl. Acad. Sci. USA
- Isolation of proteins and protein complexes by immunoprecipitation. Methods Mol. Biol.
- The dark side of EGFP: defective polyubiquitination. PLoS One
- Unexpected effects of epitope and chimeric tags on gonadotropin-releasing hormone receptors: implications for understanding the molecular etiology of hypogonadotropic hypogonadism. J. Clin. Endocrinol. Metab.
- Isolation, renaturation, and formation of disulfide bonds of eukaryotic proteins expressed in Escherichia coli as inclusion bodies. Biotechnol. Bioeng.
- IKK phosphorylates Huntingtin and targets it for degradation by the proteasome and lysosome. J. Cell Biol.
- Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci.
- A short polypeptide marker sequence useful for recombinant protein identification and purification. Nat. Biotechnol.
- Vectors for expression and secretion of FLAG epitope-tagged proteins in mammalian cells. BioTechniques
- A cost-benefit analysis of multidimensional fractionation of affinity purification-mass spectrometry samples. Proteomics
- Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature
1 Abbreviations used: MudPIT, multidimensional protein identification technology; MBP, maltose binding protein; GST, glutathione S-transferase; SBP, streptavidin binding peptide; TAP, tandem affinity purification; CBP, calmodulin-binding peptide; CBD, chitin-binding domain; SPA, sequential peptide affinity; CAM, chloroacetamide; SCX, strong cation exchange; RP, reverse phase; ESI, electrospray ionization; LTQ, Linear Trap Quadrupole.