Abstract
The membrane compartments of eukaryotic cells organize the proteome into dynamic reaction spaces that control protein activity. This ‘spatial proteome’ and its changes can be captured systematically by our previously established Dynamic Organellar Maps (DOMs) approach, which combines cell fractionation and shotgun-proteomics into a profiling analysis of subcellular localization. Our original method relied on data dependent acquisition (DDA), which is inherently stochastic, and thus offers limited depth of analysis across replicates. Here we adapt DOMs to data independent acquisition (DIA), in a label-free format, and establish an automated data quality control tool to benchmark performance. Matched for mass spectrometry (MS) runtime, DIA-DOMs provide double the depth relative to DDA-DOMs, with substantially improved precision and localization prediction performance. Matched for depth, DIA-DOMs provide organellar maps in a third of the runtime. To test the DIA-DOMs performance for comparative applications, we mapped subcellular localization changes in response to starvation/disruption of lysosomal pH in HeLa cells, revealing a subset of Golgi proteins that cycle through endosomes. DIA-DOMs offer a superior workflow for label-free spatial proteomics, with a broad application spectrum in cell and biomedical research.
Introduction
The subcellular organization of the proteome is essential to the function of all living cells. In eukaryotic cells, membrane-bound organelles provide unique physicochemical reaction environments and allow for the selective concentration of protein interaction partners, including other proteins and small molecule substrates. Therefore, protein localization must be tightly regulated to ensure correct protein function and, conversely, a large variety of human diseases have been linked to disrupted protein transport (reviewed in [1,2]). Our understanding of protein function is thus incomplete without a precise view of the dynamics of protein movement within the cell, fueling interest in the study of the spatial proteome [3–9].
Our lab previously developed the dynamic organellar maps (DOMs) method for systems-level organellar mapping of the proteome [10]. To achieve resolution of organelles, cells are mechanically lysed and the released organelles are separated by differential centrifugation [10,11]. The pelleted proteins are then identified and quantified by bottom-up mass-spectrometry (MS). The resulting protein profiles are characteristic for the harboring organelles and can be used to predict protein localization by a machine-learning algorithm. Furthermore, our robust experimental setup is not only suitable for static mapping of protein localization, but also for dynamic comparative experiments [10,12]. Comparative DOMs have been successfully applied to solve diverse biological problems, including to investigate protein missorting in childhood neurological disorders [12,13], the mechanism of drug action of cancer therapies [14] and the composition of extracellular vesicles during HIV infection [15].
The original DOM method utilized SILAC (stable isotope labelling by amino acids in cell culture) [16] for accurate protein quantification across subcellular fractions [10]. To expand the use of the method beyond cell lines that can be metabolically labelled, we adapted it to different quantification strategies, namely label-free quantification (LFQ) [17] and the peptide-level labelling methods TMT [18] and EASI-tag [4,19]. SILAC-based maps yield the most precise protein profiles but are of limited depth due to increased MS1 spectral complexity. On the other hand, LFQ maps achieve greater depth but suffer from noisier profiles due to quantification across different samples, while TMT and EASI-tag maps have intermediate quality [4].
All aforementioned DOM methods utilized a data-dependent acquisition (DDA) approach, which is currently the most widely used data-acquisition strategy for peptide identification by MS [20,21]. In each DDA measuring cycle, a defined number of the most abundant peptides (precursors) from the MS1 scan are individually fragmented for identification, thereby prioritizing the precursors that are most likely to generate high quality MS2 spectra. The disadvantage of this approach is that it introduces a stochastic element to precursor selection, resulting in the so-called ‘missing values problem’ - the occurrence of missing MS2 identifications for proteins across different samples run as part of a larger dataset. In the context of DOMs, the missing values problem imposes major limitations on the downstream analysis, since our profiling approach relies on quantification of the same protein across the majority of measured subcellular fractions and replicates. To counteract this problem we previously employed off-line fractionation of peptide samples [10,22]. This resulted in greater map depth with fewer missing values, but came at the cost of increased MS time per map.
Data-independent acquisition (DIA) is an increasingly popular alternative for MS data acquisition, which has been enabled by recent advances in MS instrumentation and data analysis software (summarized in [23]). Unlike in DDA where individual precursors are selected for fragmentation, in DIA the peptide mass range of the MS1 scan is partitioned into windows for fragmentation, each of which contains multiple peptides. Co-eluting peptides within each window are fragmented and measured simultaneously, resulting in convoluted MS2 spectra that collectively cover all injected peptides. In theory, all peptides present in a sample can thus be identified regardless of their abundance, leading to maximal data coverage. Moreover, the MS2 fragment ions can be used for quantification in addition to the MS1 peptide intensities, increasing the precision of quantification [24]. For these reasons, DIA is becoming the strategy of choice for extensive profiling-based approaches such as SEC-MS [25] and has recently been applied to high-throughput subcellular phosphoproteomics [8].
In this study, we harness the power of DIA in a new workflow for improved label-free DOMs. We show that the proteomic depth, precision and reproducibility of DIA-LFQ organellar maps improve significantly compared to DDA-LFQ maps. To demonstrate this, we developed a quality control tool that enables fast and standardized assessment of organellar maps, published alongside this work. In addition, we describe optimized formats for short gradients suitable for high throughput experiments and long gradients providing maximum sensitivity. Finally, we demonstrate that the DIA-LFQ workflow is also suitable for comparative DOMs by assessing subcellular rearrangements upon inhibition of lysosomal acidification in HeLa cells.
Results
Establishing and optimizing a new DIA workflow for DOMs requires a fast and objective assessment of map quality and depth. We hence implemented an easy-to-use Python-based quality control tool that provides multiple metric calculations and informative plots in a browser-based graphical user interface (available at https://domqc.bornerlab.org, and source code at https://github.com/valbrecht/SpatialProteomicsQC). As input data, the QC tool handles raw output files from MaxQuant and Spectronaut, or any pre-processed profiling data. Multiple maps can be uploaded together, and directly compared within and across experiments, ensuring consistent quality and aiding method optimization. Proteomic depth is calculated before and after filtering for usable profiles and quantification across replicates. Principal component analysis (PCA) plots provide a visual overview of map topology. To assess the quality of profile quantifications objectively, we established and/or automated three metrics: (1) Organellar prediction performance (supervised machine learning by support vector machines (SVMs)), quantified by the F1 score of the individual organelles. (2) Scatter of protein profiles that are part of the same stable protein complex, quantified by the Manhattan distance to the median complex profile in each map. This metric is based on the assumption that tightly bound proteins fractionate as a complex and should thus have identical abundance profiles; profile differences should largely reflect measurement noise. (3) Reproducibility of individual protein profiles across map replicates, quantified by the Manhattan distance to the average profile of the protein.
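The two distance-based metrics can be illustrated with a minimal sketch. It assumes profiles are plain lists of normalized abundances over six fractions; the function names and example values are hypothetical and are not taken from the QC tool's source code.

```python
from statistics import mean, median


def manhattan(profile_a, profile_b):
    """Sum of absolute differences between two equally long profiles."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def complex_scatter(member_profiles):
    """Distance of each complex member to the fraction-wise median profile.

    member_profiles: dict mapping a protein name to its profile in one map.
    """
    reference = [median(column) for column in zip(*member_profiles.values())]
    return {name: manhattan(p, reference) for name, p in member_profiles.items()}


def replicate_scatter(replicate_profiles):
    """Distance of each replicate profile of one protein to its average profile."""
    reference = [mean(column) for column in zip(*replicate_profiles)]
    return [manhattan(p, reference) for p in replicate_profiles]


# Hypothetical profiles of three subunits of one stable complex (illustrative values only).
example_complex = {
    "subunit_A": [0.05, 0.10, 0.40, 0.30, 0.10, 0.05],
    "subunit_B": [0.05, 0.10, 0.40, 0.30, 0.10, 0.05],
    "subunit_C": [0.10, 0.10, 0.35, 0.30, 0.10, 0.05],
}
scatter = complex_scatter(example_complex)
```

In this toy example, subunits A and B coincide with the median profile and score zero, whereas subunit C deviates in two fractions and receives a positive distance. Lower values indicate tighter co-fractionation or better reproducibility.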
To compare DIA- and DDA-based DOMs, we prepared three independent subcellular fractionations from HeLa cells, each with six fractions, as described [17]. These were then analyzed by mass spectrometry, using different LC-MS setups and data acquisition strategies. For DIA acquisition, gradient length, cycle time and window shape were each optimized (see Methods for details). All protein identifications and quantifications shown here were generated with MaxQuant 2.0 [26], which contains the new MaxDIA algorithm and is compatible with fractionated DIA samples.
DIA maps outperform DDA maps across all metrics
To establish how DIA-LFQ maps compare to previous acquisition strategies, we measured our benchmark samples in single shot on 100 min gradients in DDA and DIA. DIA data were processed either with a DDA measurement-based spectral library (ca. 159,000 peptides), or in the in-silico library-based ‘Discovery Mode’. PCA plots of the DDA-, library DIA- and discovery DIA-based maps looked topologically similar (Fig 1A), and average organelle profile shapes were nearly identical (data not shown). While the unfiltered proteome depth was only slightly increased by using DIA (7K DDA, 7.8K DIA), the number of proteins profiled across replicates increased from ∼2800 protein groups (PGs) in DDA by 89% with discovery DIA (5241 PGs) and by 142% with library DIA (6791 PGs; Fig 1B). This was largely due to quantifying proteins more consistently across samples, as demonstrated by the increase of the data completeness from 69% to 96% (data not shown). The overlap between these reproducibly profiled proteins (Fig 1B) shows that the DIA datasets overlap greatly and contain almost all protein groups quantified by DDA. Using a set of 844 established organellar marker proteins common to all three maps, we performed SVM-based organellar classification (Fig 1C). Prediction performance was substantially improved by using DIA. Overall recall was increased from 93% to 95% (data not shown), and the unweighted average F1 scores increased from 0.87 to 0.90. Remarkably, all organelle classes were better separated by at least one of the DIA analysis modes than by DDA maps. In particular, the gains in the highly dynamic membrane compartments (Endoplasmic reticulum (ER), Plasma membrane, Endosome) are important for biological applications. 
In combination with the increased depth, the number of high confidence organelle predictions for non-marker proteins greatly increased from 1099 with DDA to 1850 and 2308 with discovery and library DIA, respectively, with many more medium and low confidence predictions on top (Fig 1D). For all three datasets the concordance with our previously published predictions made with SILAC DOMs [10] was 98-100% for the top prediction category (data not shown). Finally, we evaluated quantification precision, which is key for DOMs [4]. Inter-protein profile scatter within complexes was markedly reduced with DIA (Fig 1E). Global profile scatter between replicates was also substantially lower with DIA than with DDA (25% reduction in median scatter with discovery DIA and 33% reduction with library DIA, Fig 1F). Taken together, these data show that highly reproducible organellar maps can now be acquired at significantly increased depth and sensitivity with DIA compared to DDA on single shot 100 min gradients. While acquisition of a deeply measured DDA library can further increase depth and reproducibility of DIA maps, the performance of DIA maps based on an in-silico library is already a drastic improvement and on par with measured library DIA in terms of biological accuracy.
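The unweighted average F1 score reported above gives every organelle class equal weight, regardless of how many marker proteins it contains. The following minimal sketch shows only the score calculation from true and predicted marker labels; it does not reproduce the SVM training or cross-validation, and the labels are hypothetical.

```python
def f1_per_class(true_labels, predicted_labels):
    """Per-class F1 score, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denominator = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denominator if denominator else 0.0
    return scores


def unweighted_average_f1(true_labels, predicted_labels):
    """Mean of the per-class F1 scores (often called macro F1)."""
    scores = f1_per_class(true_labels, predicted_labels)
    return sum(scores.values()) / len(scores)


# Hypothetical marker annotations and predictions (illustrative only).
true = ["ER", "ER", "ER", "Golgi", "Golgi", "Lysosome", "Lysosome", "Lysosome"]
pred = ["ER", "ER", "Golgi", "Golgi", "Golgi", "Lysosome", "Lysosome", "ER"]
per_class = f1_per_class(true, pred)
macro = unweighted_average_f1(true, pred)
```

Because small classes count as much as large ones, a drop in a compartment with few markers, such as peroxisomes, lowers this average noticeably even when overall recall stays high.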
Faster and deeper formats for DIA maps using a high throughput LC system
As DIA offers superior quality and depth relative to DDA with identical MS time investment, we evaluated different LC formats that either reduced the MS time or redistributed it over fractionated samples. For this we used the Evosep One LC system [27], which works with higher flow, premixed gradients (standard lengths are 21 and 44 minutes) and reduces overhead time between runs to a few minutes. Comparing single injections on the 100 min nanoLC and 44/21 minutes Evosep shows that there is almost a linear relationship between depth and time investment (Fig 2A) and the 44 min DIA gradient has a similar depth to the 100 min DDA format (Fig 1B). The SVM performance dropped from 0.88 for the 100 min nanoLC to 0.72 for the Evosep One (F1 scores, Fig 2B), regardless of whether 44 or 21 minutes were used, mostly due to drops in the classification of Golgi, ERGIC/cisGolgi and peroxisomal proteins. Additionally, the shorter gradients displayed increasingly higher inter-protein scatter (Fig 2C), and lower reproducibility (Fig 2D). Therefore, the shortest gradients are best suited for fast screens at lower resolution and the optimization of other experimental parameters, such as biochemical fractionation. Furthermore, we optimized the machine-time to gradient-time ratio by peptide fractionation, as 1×(100 min + 35 min overhead) on the nanoLC roughly equals 3×(44 min + 5 min overhead) on the Evosep One. We also tested 3×(21 min + 5 min overhead). For both short gradient lengths triple fractionation more than doubled the number of consistently profiled protein groups (Fig 2A) and particularly improved inter-protein scatter within stable complexes (Fig 2C). For similar machine time, 3×44 min Evosep compared to 100 min nanoLC yields around 1000 more profiled protein groups (Fig 2A) and improves both the SVM performance (Fig 2B) and the reproducibility of profiles (Fig 2D).
The 3×21 minute format provides a good option for cutting the measurement time roughly in half at a rather small cost in data quality. Overall this means that, regardless of the LC system employed, in-depth measurements are possible either using a single long gradient or multiple shorter gradients, and the Evosep One allows for easy measurement time reduction.
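The machine-time comparison between formats is simple arithmetic and can be sketched as follows. The gradient lengths and overhead times are those quoted above; the default of six fractions per map follows the benchmark description and is an assumption for any other experimental design. The function and dictionary names are hypothetical.

```python
def ms_hours_per_map(gradient_min, overhead_min, injections_per_fraction,
                     fractions_per_map=6):
    """Total instrument time for one map, in hours, including overhead per run."""
    minutes = (gradient_min + overhead_min) * injections_per_fraction * fractions_per_map
    return minutes / 60


# (gradient length, overhead per run, injections per subcellular fraction)
formats = {
    "nanoLC 1x100 min": (100, 35, 1),
    "Evosep 3x44 min": (44, 5, 3),
    "Evosep 3x21 min": (21, 5, 3),
    "Evosep 1x21 min": (21, 5, 1),
}

hours = {name: ms_hours_per_map(*settings) for name, settings in formats.items()}
for name, value in hours.items():
    print(f"{name}: {value:.1f} h per map")
```

Under these assumptions the single-shot nanoLC and the 3×44 min Evosep formats require a similar investment (13.5 h and 14.7 h per map), the 3×21 min format requires 7.8 h, and single-shot 21 min gradients require 2.6 h.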
Dynamic application of DIA maps reveals the effect of bafilomycin A1 treatment
To test whether DIA-based maps are suited for comparative experiments, we created organellar maps of untreated HeLa cells and cells that were treated with bafilomycin A1 to block lysosomal acidification [28]. This is known, among other effects, to lead to accumulation of Golgi proteins in the endolysosomal compartment [29,30]. To enhance this effect, the cells were additionally nutrient starved during the bafilomycin A1 treatment. Applying the 100 min single shot format with library DIA, we profiled 6389 proteins across all six maps, of which 118 underwent a significant alteration in subcellular localization upon treatment (Fig 3A). These 118 hits can be separated into three groups: 46 structural constituents of the ribosome or ribosome-associated proteins, 15 Golgi proteins, and 57 other proteins (mostly enzymes and structural proteins) (Fig 3B). We then used PCA to visualize the shifts in the context of subcellular localizations (Fig 3C). This revealed that the ribosome shifts as a whole complex, pelleting later in the differential gradient (data not shown), indicating lower molecular weight assemblies. This is consistent with reduced translation due to starvation. Importantly, the moving Golgi proteins have very similar movement trajectories (GLG1, TM9SF2, TM9SF4 and TGN46 are highlighted) and move from the Golgi to the endosome/lysosome, as expected. To validate this result orthogonally, we employed fluorescence imaging of TGN46 and could see that the crisp staining of the Golgi stacks disperses into many small puncta throughout the cytosol in the bafilomycin A1 treated cells (Fig 3D), matching the results from the spatial proteomics experiment. These findings demonstrate that DIA-LFQ based DOMs can identify proteins that cycle between the Golgi apparatus and endosomes.
Discussion
The dynamic organellar maps method provides a systems-level view of protein localization and has been successfully applied to diverse biological problems. We have now developed a DIA-based label-free workflow for DOMs that significantly enhances the performance of the method, opening up new possibilities for the exploration of protein subcellular localization. Applying our newly developed automated workflow for objectively evaluating the performance of organellar maps, we identified optimized measurement formats for DIA maps suitable for different applications. Our DIA method provides up to 2.5x greater depth than DDA on standard long gradients, while at the same time improving SVM performance, reproducibility and the spread of stable protein complexes. In addition, DIA in combination with high-throughput liquid chromatography allows organellar maps to be generated with 21 min gradients, although, as expected, quality measures and depth correlate with the amount of measurement time invested. Based on our extensive optimization of LC-MS formats, we now recommend these two formats for organellar maps: 1) Deep measurement for biological comparisons with 12 hours MS time per map, either by single shot injections on a 100 min nanoLC gradient or triple shot injections on a 44 min Evosep gradient; 2) High throughput maps for pilots and method optimization with 2.5 hours per map on Evosep 21 min gradients. The latter format has recently been used by the Olsen lab [8] for their chemical fractionation-based spatial proteomics method and achieved a similar depth to our study, suggesting that this is a robust LC-MS approach with consistent depth across machines and even sites.
We created an open source quality control tool for the objective assessment of map quality (https://domqc.bornerlab.org), which played a pivotal role in our extensive comparison of different LC-MS formats, and will enable other scientists to select the optimal format for their purposes. Our QC tool has a web-based graphical user interface, making it easily accessible, transparent and robust. MetaMass [31] and Qsep [32] are two existing quality assessment tools for spatial proteomics, which can be used to determine how well organelles are separated by a given method. This is an important aspect covered in our QC tool in the form of misclassification matrices inserted from external tools, such as Perseus or MetaMass. While accurate organelle separation is the most important goal in static experiments, the most important aspect for a successful comparative experiment is the reproducibility of profiles. Therefore, we introduced two new quality control metrics, intra-protein-complex scatter and profile reproducibility. Additionally, our standardized data processing workflow ensures that only high quality data are used for downstream analysis and enables reproducible analysis of proteomic depth and overlap between experiments. With this automated QC tool that assesses depth, organelle separation, accuracy and reproducibility, researchers are now able to evaluate all aspects of their profiling data to ensure quality and to guide them through method optimization. This comprises: experimental procedures in the wet lab (e.g. organelle separation technique), MS data acquisition (e.g. gradient lengths and MS parameters), as well as raw data processing (e.g. DIA analysis approach and stringency).
Originally, DOMs were established in our lab using SILAC-based quantification [10], as this provides the highest measurement precision and reproducibility, which is critical for comparative experiments. However, this comes at the cost of relatively low depth per MS run time. Conversely, DDA-LFQ maps were so far the best option for deep, but less precise maps. Both were usually measured from triple fractionated samples to increase proteomic depth [17]. Based on our results, DDA-SILAC maps still outperform label-free maps in terms of precision (data not shown), but the deep DIA-LFQ format strongly ameliorates the tradeoff between depth and precision by outperforming DDA-LFQ in both. Therefore, DIA-LFQ maps should replace any DDA-LFQ measurements to improve overall map performance significantly. DDA-SILAC is still recommended to detect very small protein movements, in systems that allow metabolic labelling. This raises the question of whether DIA-SILAC maps could give a further performance boost over DIA-LFQ maps.
In addition to providing static spatial proteomes, DOMs can be used as an unbiased systematic discovery tool to detect protein localization changes under different physiological conditions [10,12–14]. Using the DDA-SILAC approach a comparative experiment took roughly 11.5 days of MS measurement time [12]. Here, we measured a comparative experiment in only 3.5 days and achieved 50% higher depth. We treated HeLa cells with bafilomycin A1 to inhibit lysosomal acidification, leading to disruptions in protein transport in the endomembrane system [28–30]. Our analysis identified a large number of Golgi proteins that shift away from the Golgi in bafilomycin A1-treated cells, suggesting they may cycle between the Golgi and endolysosomal compartments, and for many of them this effect has not been previously described. This experiment illustrates the power of DIA-DOMs for systematic phenotype discovery in significantly reduced measurement time.
In conclusion, this study provides a comprehensive quality control tool for spatial proteomics experiments and DIA-based workflows for label-free organellar maps with unprecedented precision and depth.
Funding sources
This study was supported by The Max-Planck Society for Advancement of Science. AD received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no. 896725 and a Humboldt Research Fellowship.
Author Contributions
GB, JS and VA devised the study; VA and JS implemented the quality control tool; JS, VA and GB analyzed data; VA and AD performed organellar mapping experiments; AD performed fluorescence imaging; VA ran MS acquisitions and raw data analysis; PS contributed to raw data processing in MaxQuant; JS wrote the initial draft of the manuscript; all authors contributed to manuscript editing; GB supervised the project.
Materials and Methods
Experimental Protocols
Antibodies
The following antibodies were used in this study: sheep anti-TGN46 1:200 for IF (Bio-Rad Cat# AHP500) and Alexa680-labelled donkey anti-sheep IgG 1:500 for IF (Thermo Fisher Scientific Cat# A-21102).
Cell culture
HeLa cells were cultured in Dulbecco’s Modified Eagle’s Medium (DMEM; Gibco Cat# 31966-021), supplemented with 10% (v/v) foetal bovine serum (FBS; Gibco Cat# 10270106) and 1% penicillin-streptomycin (P/S, Gibco Cat# 15140122). Cells were maintained at 37 °C in a humidified atmosphere of 5% CO2.
Autophagy Induction
For starvation, HeLa cells were washed three times with Dulbecco’s Phosphate Buffered Saline (PBS) (Gibco Cat# 14190-094) and then incubated for 1 h in Earle’s Balanced Salt Solution (EBSS; Sigma-Aldrich Cat# E2888) plus 100 nM bafilomycin A1 (Merck, Cat# 19-148).
Immunofluorescence Microscopy
For widefield microscopy, HeLa cells were grown onto 13 mm coverslips and fixed in 3% (v/v) formaldehyde in PBS for 20 min at room temperature. Residual aldehyde groups were quenched with 0.02 M glycine in PBS for 5 min. Formaldehyde fixed cells were permeabilized with 0.1% (w/v) saponin in PBS for 10 min and blocked in 1% (w/v) BSA/0.01% (w/v) saponin in PBS (BSA block) for 10 min. Primary antibody (diluted in BSA block) was added for 1 h at room temperature. Coverslips were washed three times in BSA block and then fluorophore-conjugated secondary antibody (diluted in BSA block) was added for 30 min at room temperature. Coverslips were then washed three times in PBS. Nuclei were stained with DAPI (300 nM in PBS; Thermo Scientific Cat# 62248) for 5 min. Coverslips were washed in PBS, followed by a final wash in ddH2O, before being mounted in ProLong™ Glass Antifade Mountant (Invitrogen Cat# P36980).
Microscopy was performed at the Imaging Facility of Max Planck Institute of Biochemistry, Martinsried, using a Leica DMi8 inverted microscope (Leica Thunder) equipped with a Leica DFC9000 GTC Camera, a 63x/1.47 oil objective (HC PL APO 63x/1.47 OIL), and an iTK LMT200 motorized stage. Leica Application Software X (LAS X) was used to acquire images, and ImageJ was used for cropping and global brightness/contrast adjustments, which were performed uniformly across all images.
Generation of label-free organellar maps
Scale
To avoid as much sample-related variation as possible during method optimization, we generated three replicate maps from HeLa cells on a single day and digested the protein samples on a large enough scale to do any peptide clean up from the same stock of tryptic peptides. The established workflow was then applied to analyze subcellular rearrangements that occur when lysosomal pH is disrupted. To this end, organellar maps were prepared from control HeLa cells (untreated) and HeLa cells starved for 1 h in the presence of bafilomycin A1, in triplicate (six maps in total). Replicate maps were generated on the same day.
Dynamic Organellar Maps Workflow
Cell lysis and subcellular fractionation were performed as reported previously [10,33]. All steps were performed at 4 °C with pre-chilled ice-cold buffers. HeLa cells were washed in PBS (without CaCl2 and MgCl2), incubated in PBS for 5 min, rinsed with hypotonic buffer (25 mM Tris HCl, pH 7.5, 50 mM sucrose, 0.5 mM MgCl2, 0.2 mM EGTA), and immediately incubated in hypotonic buffer for 5 min. Cells were scraped in a total volume of 4 mL of fresh hypotonic lysis buffer and mechanically lysed with 15 strokes of a pre-chilled Dounce homogenizer (7 mL, tight pestle, Kontes Glass Co.). Osmolarity was restored to 250 mM with hypertonic sucrose buffer (25 mM Tris base, pH 7.5, 2.5 M sucrose, 0.5 mM MgCl2, 0.2 mM EGTA).
All centrifugation steps were performed at 4 °C with the fastest acceleration and deceleration settings. Crude cell lysates were centrifuged at 1000×g for 10 min (Multifuge 1 L, Heraeus) to pellet nuclear material and unbroken cells (‘1 K’ fraction). Post-nuclear supernatants were transferred to new tubes and centrifuged at 3,000×g for 10 min (‘3 K’ fraction). Post-3000×g supernatants were transferred to ultracentrifuge tubes and further subfractionated using the Optima™ MAX Ultracentrifuge (Beckman Coulter) with a pre-chilled TLA 110 rotor (Beckman Coulter) by sequential centrifugation steps, each time collecting a protein pellet and transferring the entire volume of supernatant to a new ultracentrifuge tube: 10,000 rpm (5,400×g) for 15 min (‘6 K’ fraction), 15,000 rpm (12,200×g) for 20 min (‘12 K’ fraction), 21,000 rpm (24,000×g) for 20 min (‘24 K’ fraction), and 38,000 rpm (78,400×g) for 30 min (‘80 K’ fraction). All pellets were resuspended in 1×SDS buffer (2.5 % SDS, 50 mM Tris HCl, pH 8.1). The supernatant obtained after the final centrifugation step (cytosolic fraction) was mixed at a 4:1 ratio with 5×SDS buffer (12.5 % SDS, 50 mM Tris HCl, pH 8.1). Samples were heated at 72 °C for 5 min and sonicated using a Bioruptor (Diagenode Inc) with fifteen 30 s on/off cycles at maximum intensity. Fully solubilized samples were stored at -80 °C. Protein concentrations were determined using the Thermo Scientific™ Pierce™ BCA (bicinchoninic acid) Protein Assay Kit (Thermo Scientific™ Cat# 23225). Following concentration determination, DTT (Sigma-Aldrich Cat# D0632-25G) was added to a final concentration of 1 mM before preparing the samples for mass spectrometry.
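The relation between the rotor speeds and the relative centrifugal forces quoted for the ultracentrifugation steps follows the standard formula RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The sketch below is a plausibility check only: the effective radius of 4.85 cm is an assumption chosen because it reproduces the quoted values to within about 1%, it is not stated in the text, and the rotor documentation should be consulted for exact figures.

```python
# Assumed effective radius in cm (not given in the text; see lead-in).
ASSUMED_RADIUS_CM = 4.85


def rcf(rpm, radius_cm=ASSUMED_RADIUS_CM):
    """Relative centrifugal force (x g) from rotor speed and radius in cm."""
    return 1.118e-5 * radius_cm * rpm ** 2


# Rotor speeds and the corresponding g-forces as quoted in the protocol.
quoted_steps = {10_000: 5_400, 15_000: 12_200, 21_000: 24_000, 38_000: 78_400}

for rpm, quoted_g in quoted_steps.items():
    computed = rcf(rpm)
    deviation = 100 * (computed - quoted_g) / quoted_g
    print(f"{rpm} rpm: computed {computed:.0f} x g, quoted {quoted_g} x g ({deviation:+.1f}%)")
```

The same function can be used to translate the protocol to a different rotor by substituting its radius and solving for the speed that gives the desired g-force.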
Sample preparation for mass spectrometry
In-solution digestion of proteins
Protein was precipitated by the addition of five volumes of ice-cold acetone, incubated at -20°C overnight and pelleted by centrifugation at 10,000×g (Centrifuge 5418R, Eppendorf) for 5 min at 4°C. All subsequent steps were performed at room temperature. Precipitated protein pellets were air-dried for 5 min, resuspended thoroughly in urea buffer (8 M urea, 50 mM Tris HCl, pH 8.1, freshly added 1 mM DTT), and incubated for 15 min. Sulfhydryl groups were alkylated by the addition of 5 mM iodoacetamide for 1 h in the dark. Proteins were enzymatically predigested by the addition of LysC (1 μg per 50 μg of protein; Wako Cat# 129-02541) for overnight incubation. Predigests were then diluted four-fold with 50 mM Tris, pH 8.1 (final urea concentration < 2 M) before addition of trypsin (1 μg per 50 μg of protein; Sigma-Aldrich Cat# T6567) for a 3 h incubation. Reaction was stopped by the addition of 1 % trifluoroacetic acid (TFA, final pH < 3). Samples were incubated on ice for 10 min and spun at 10,000×g for 5 min at 4 °C. Supernatant was transferred to a new tube for peptide storage at -20 °C.
Purification and fractionation of peptides
Peptides were purified (and fractionated) either by solid-phase extraction with poly(styrenedivinylbenzene) reverse-phase sulfonate (SDB-RPS), as previously described [22], or by LC trapping using commercially available C18 StageTips (EvoTips Cat# EV2001) of the Evosep System according to the manufacturer’s instructions. In brief, EvoTips were activated by wetting the C18 material in 1-propanol, washed with Evosep buffer B (0.1 % [v/v] FA in ACN), and wetted in 1-propanol again for 5 min. Soaked tips were washed with Evosep buffer A (0.1 % [v/v] FA), loaded first with 0.2 % FA and then 200 ng sample. EvoTips were washed with Evosep buffer A, finally loaded with Evosep buffer A and stored at 4 °C until analysis by mass spectrometry. Peptides purified via the SDB-RPS approach were dried at 45 °C in a centrifugal vacuum concentrator (Concentrator 5301, Eppendorf), resuspended in buffer A* (0.1 % [v/v] TFA, 2 % [v/v] ACN), and stored at -20 °C until analysis by mass spectrometry.
Mass spectrometry analysis
All measurements were done on Thermo Exploris mass spectrometers, with fewest possible chromatography column changes. Several MS setups and strategies were tested, most importantly data independent vs data dependent acquisition. The effect of gradient length on map quality was evaluated for 21, 44 and 100 min gradients, as well as fractionated vs unfractionated samples.
Liquid Chromatography
Nanoflow reversed-phase chromatography was performed using either the Evosep One (Evosep Biosystems) or the EASY-nLC 1200 ultra-high-pressure system coupled online to an Orbitrap Exploris 480 instrument via a nano-electrospray ion source (all Thermo Fisher Scientific). On the EASY-nLC 1200 system a binary buffer system with the mobile phases A (0.1 % [v/v] FA) and B (80 % ACN, 0.1 % [v/v] FA) was employed. Peptides were separated in 100 min at a constant flow rate of 300 nL/min on a 50 cm×75 μm (i.d.) column with a laser-pulled emitter tip, packed in-house with ReproSil-Pur C18-AQ 1.9 μm silica beads (Dr. Maisch GmbH). The column was operated at 60 °C using an in-house manufactured oven. In total, 300 ng of purified peptides in Buffer A* were loaded onto the column in Buffer A and eluted using a linear 84 min gradient of Buffer B from 5 % to 30 %, followed by an increase to 60 % B in 8 min, a further increase to 95 % B in 4 min, a constant phase at 95 % B for 4 min, followed by washout (a decrease to 5 % B in 5 min and a constant phase at 5 % B for 5 min) before re-equilibration. On the Evosep One LC system a binary buffer system with the mobile phases A (0.1 % [v/v] FA) and B (0.1 % [v/v] FA in ACN) was used. Peptides were separated in 21 min at a flow rate of 1.0 μL/min on an 8 cm column (with a throughput of 60 samples per day [SPD]) or 44 min at a flow rate of 0.5 μL/min on a 15 cm column (with a throughput of 30 SPD), using in-house packed columns and standard pre-programmed gradients. The 15 cm in-house packed column was operated at 60 °C using an in-house manufactured oven.
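The nanoLC gradient described above can be written as a list of linear segments, which makes it easy to check that the separation part adds up to 100 min and to look up the nominal percentage of Buffer B at any time point. This is a minimal sketch of the programme as stated in the text; the names are hypothetical and instrument delay volumes are ignored.

```python
START_PERCENT_B = 5.0

# (duration in min, % B at the end of the segment), in the order given in the text.
GRADIENT_STEPS = [
    (84, 30.0),  # linear separation gradient from 5 % to 30 % B
    (8, 60.0),   # increase to 60 % B
    (4, 95.0),   # increase to 95 % B
    (4, 95.0),   # constant phase at 95 % B
    (5, 5.0),    # washout: decrease to 5 % B
    (5, 5.0),    # washout: constant phase at 5 % B
]


def percent_b(time_min):
    """Nominal % B at a given time, by linear interpolation within each segment."""
    if time_min < 0:
        raise ValueError("time must not be negative")
    segment_start, start_b = 0.0, START_PERCENT_B
    for duration, end_b in GRADIENT_STEPS:
        segment_end = segment_start + duration
        if time_min <= segment_end:
            fraction = (time_min - segment_start) / duration
            return start_b + fraction * (end_b - start_b)
        segment_start, start_b = segment_end, end_b
    raise ValueError("time lies beyond the end of the programme")


separation_min = sum(duration for duration, _ in GRADIENT_STEPS[:4])
total_min = sum(duration for duration, _ in GRADIENT_STEPS)
```

The first four segments sum to the 100 min stated for the separation, and the two washout segments add a further 10 min before re-equilibration.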
Mass spectrometry
The Orbitrap Exploris 480 mass spectrometer, run by Xcalibur (v.4.4, Thermo Fisher), was operated in data-dependent top 15 scan mode (DDA) with a full scan range of 300 – 1,650 Th when coupled to the EASY-nLC 1200 system (100 min gradient). Survey scans were acquired at 60,000 resolution with an automatic gain control (AGC) target of 3 × 10⁶ charges and a maximum ion injection time of 25 ms. The selected precursor ions were isolated in a window of 1.4 Th and fragmented by higher-energy collisional dissociation (HCD) with a normalized collision energy of 30. Fragment scans were performed at 15,000 resolution, with a maximum injection time of 28 ms, an AGC target of 1 × 10⁵ charges, and a precursor dynamic exclusion of 30 s. Acquisition schemes for the data-independent acquisition (DIA) scan mode used herein were described previously [34,35], but were optimized and tailored for the Dynamic Organellar Maps approach. In brief, the DIA method for the 100 min gradient consisted of one survey scan followed by 33 variably sized MS2 windows in one cycle, resulting in a cycle time of 2.5 s. Survey scans were acquired at 120,000 resolution with an AGC target of 3 × 10⁶ charges and a maximum injection time of 60 ms, covering an m/z range of 350 – 1,400. MS2 scans were acquired at 30,000 resolution with an Xcalibur-automated maximum injection time, covering an m/z range of 332 (lower boundary of the first window) to 1,570 (upper boundary of the 33rd window). The DIA method for the 44 min and 21 min gradients consisted of one survey scan followed by 35 equally sized MS2 windows (19.2 Th with 1 Th overlap) in one cycle, resulting in a cycle time of 1.5 s. Survey scans were acquired at 120,000 resolution with an AGC target of 3 × 10⁶ charges and a maximum injection time of 45 ms, covering an m/z range of 350 – 1,400. MS2 scans were acquired at 15,000 resolution with a maximum injection time of 22 ms, covering an m/z range of 361 – 1,033.
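The fixed-window scheme of the short-gradient methods can be checked arithmetically: 35 windows of 19.2 Th span 672 Th, which matches the stated MS2 range of 361 to 1,033. The following Python sketch lays out such a window list. How the 1 Th overlap is realized is not specified above; the sketch assumes that each window is widened by 0.5 Th on either side of its nominal boundaries, and the function name dia_windows is purely illustrative.

```python
# Nominal DIA window layout for the 21 and 44 min methods (illustrative only).
N_WINDOWS = 35     # number of MS2 windows, from the text
STEP_TH = 19.2     # nominal window size in Th, from the text
OVERLAP_TH = 1.0   # overlap between neighbouring windows in Th, from the text
MZ_START = 361.0   # lower boundary of the MS2 range, from the text


def dia_windows(n=N_WINDOWS, step=STEP_TH, overlap=OVERLAP_TH, start=MZ_START):
    """Return (lower, upper, center) tuples for equally sized DIA windows.

    Assumption: the overlap is split evenly, i.e. each window is widened by
    overlap / 2 on either side of its nominal boundaries.
    """
    windows = []
    for i in range(n):
        lower = start + i * step
        upper = lower + step
        center = (lower + upper) / 2
        windows.append((lower - overlap / 2, upper + overlap / 2, center))
    return windows


windows = dia_windows()
nominal_upper = MZ_START + N_WINDOWS * STEP_TH  # should reproduce 1,033
print(len(windows), round(nominal_upper, 1))
```

Under this assumption, neighbouring windows share exactly 1 Th, and the nominal boundaries tile the stated range without gaps.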
Bioinformatic analysis
Raw data analysis
For peptide and protein identification, resultant MS data were imported into MaxQuant version 1.6.7 or 2.0.0.0 (see below). Unless otherwise stated, default parameters were used for all settings. The MS2 spectra were searched against the SwissProt entries contained in the human UniProt FASTA database (UP000005640_9606, 42,418 entries). Spectral libraries were constructed using DDA raw data of fractionated subcellular samples of the same organellar maps that were used for the data acquired in DIA mode.
Spectral library generation and DDA analysis
For spectral libraries and the DDA benchmark, DDA raw files were processed in MaxQuant (v.1.6.7) [36,37] employing the Andromeda search engine [38]. For accurate label-free quantification, the MaxLFQ algorithm [39] was enabled with an LFQ minimum ratio count of 1, and the match-between-runs feature was used to match between equivalent and adjacent fractions of replicates.
DIA analysis
DIA raw files were processed via MaxDIA, which is embedded into the MaxQuant software environment (v.2.0.0.0) [26], using default settings with the same exceptions as for DDA raw files. For both the discovery and the library DIA approach, spectral libraries of peptides were provided in the form of ‘peptides’, ‘evidence’, and ‘msms’ files. For the library approach these files were obtained from the MaxQuant DDA searches, whereas for the discovery approach an in silico predicted library for all peptides with up to 1 missed cleavage was used. The prediction had previously been generated using the DeepMass:Prism tool [40]. The provided library was filtered to only contain Swiss-Prot entries using a Python script (github.com/cox-labs/DIAtools/tree/main/Misc/FilterAdditional).
Downstream data quality analysis
The intra- and inter-experimental quality of the dynamic organellar maps was evaluated to assess the performance of different combinations of MS methods and LC-MS setups. To enable the visual exploration and quality assessment of spatial proteomics data, we developed a web-based quality control (QC) tool. The workflow is implemented entirely in Python. The QC tool employs different bioinformatic analysis approaches that allow in-depth quality assessment of individual experiments and enable their comparison.
Data filtering
The primary output from MaxQuant or Spectronaut, or preprocessed data with protein quantification across the subcellular fractions, can be loaded. For the MaxQuant output, reverse hits, contaminants and proteins only identified by modifications are removed, otherwise all data is used. Further filtering is then performed at the level of individual maps and tailored to each quantification strategy, to obtain datasets with high-quality measurements.
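As an illustration of the first filtering step for MaxQuant output, the sketch below removes flagged protein groups from a list of rows. It assumes the flag columns are named as in recent MaxQuant proteinGroups.txt files (‘Reverse’, ‘Potential contaminant’, ‘Only identified by site’) and that flagged rows carry a ‘+’; the helper drop_flagged and the toy rows are hypothetical and not part of the QC tool.

```python
# Flag columns assumed to follow the MaxQuant proteinGroups.txt convention,
# where a "+" marks a flagged protein group.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def drop_flagged(rows, flag_columns=FLAG_COLUMNS):
    """Keep only rows in which none of the flag columns is marked with '+'."""
    return [
        row for row in rows
        if not any(row.get(col, "").strip() == "+" for col in flag_columns)
    ]


# Toy rows standing in for a parsed proteinGroups.txt
# (e.g. read with csv.DictReader and delimiter="\t").
rows = [
    {"Protein IDs": "P1", "Reverse": "", "Potential contaminant": "",
     "Only identified by site": ""},
    {"Protein IDs": "REV__P2", "Reverse": "+", "Potential contaminant": "",
     "Only identified by site": ""},
    {"Protein IDs": "CON__P3", "Reverse": "", "Potential contaminant": "+",
     "Only identified by site": ""},
    {"Protein IDs": "P4", "Reverse": "", "Potential contaminant": "",
     "Only identified by site": "+"},
]

kept = drop_flagged(rows)
print([row["Protein IDs"] for row in kept])
```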
For SILAC maps, proteins are retained if they have more than two quantification events, or two quantification events where the ratio variability was below 30 %. For each fraction, SILAC ratios are normalized by dividing by the median ratio for the fraction. Only proteins with complete profiles are retained. SILAC ratios are inverted and profiles for each protein are (0-1) normalized or log-transformed. For LFQ maps, intensities are already globally normalized, hence no further normalization is required. Two stringency filters are applied: First, only profiles with LFQ intensities in at least 4 consecutive fractions are considered. Second, profiles are rejected if their average MS/MS count per subcellular fraction is less than two. Then, (0-1) normalization of each profile is performed. Filtered datasets were annotated based on a predefined set of approximately 1076 bona fide organellar marker proteins covering 12 subcellular localizations/organelles [10]. These default settings were used for all datasets in this study.
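The filter criteria above can be expressed compactly. The following sketch is not the code of the QC tool; all function names are hypothetical. It assumes that a missing LFQ intensity is encoded as 0, that the average MS/MS count is taken over all fractions of a profile, and that (0-1) normalization means scaling a profile so that its values sum to 1.

```python
def longest_run(values):
    """Length of the longest run of consecutive non-zero (quantified) values."""
    best = current = 0
    for v in values:
        current = current + 1 if v > 0 else 0
        best = max(best, current)
    return best


def keep_lfq_profile(intensities, msms_counts, min_consecutive=4, min_avg_msms=2):
    """Apply the two LFQ stringency filters to one profile.

    Assumptions: 0 encodes a missing intensity, and the MS/MS count is
    averaged over all fractions of the profile.
    """
    enough_fractions = longest_run(intensities) >= min_consecutive
    enough_msms = sum(msms_counts) / len(msms_counts) >= min_avg_msms
    return enough_fractions and enough_msms


def keep_silac_protein(n_quant_events, ratio_variability_pct):
    """More than two quantification events, or two with variability below 30 %."""
    return n_quant_events > 2 or (n_quant_events == 2 and ratio_variability_pct < 30)


def normalize_0_1(profile):
    """Assumption: scale the profile so that its values sum to 1."""
    total = sum(profile)
    return [v / total for v in profile]


# Toy profile across six fractions (values invented for illustration).
intensities = [0, 5e6, 6e6, 7e6, 8e6, 0]
msms_counts = [0, 2, 3, 3, 4, 0]
if keep_lfq_profile(intensities, msms_counts):
    print(normalize_0_1(intensities))
```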
Proteomic depth
Proteomic depth is assessed by counting protein groups that are profiled (i.e. passing the quality filter) in one or all replicates. For comparison, the same numbers are also calculated for all identified protein groups, including the ones that fail the quality cutoffs. Additionally, data completeness, number of profiles and the depth of each individual fraction are available in the tool.
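Counting profiled protein groups across replicates reduces to set operations. The sketch below is a minimal illustration with invented identifiers; it reads ‘in one replicate’ as ‘in at least one replicate’, which is an assumption, and the function name proteomic_depth is hypothetical.

```python
def proteomic_depth(replicates):
    """Count protein groups profiled in at least one and in all replicates.

    replicates: non-empty list of sets, each holding the protein group IDs
    that passed the quality filter in one replicate.
    """
    in_any = set().union(*replicates)
    in_all = set.intersection(*replicates)
    return {"any replicate": len(in_any), "all replicates": len(in_all)}


# Toy identifiers, invented for illustration.
replicates = [{"A", "B", "C"}, {"B", "C", "D"}, {"B", "C"}]
depth = proteomic_depth(replicates)
print(depth)
```

The same function can be applied to the sets of all identified protein groups to obtain the comparison numbers.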
Principal component analysis
For graphical map representation, filtered and (0-1) normalized data from all experiments compared were z-scored in each fraction and jointly subjected to principal component analysis (PCA) to achieve dimensionality reduction. For each map, the first three principal components were calculated via Python’s scikit-learn library (v.0.23.2).
Interprotein scatter within stable complexes
The profile scatter of well-characterized protein clusters, for example the 20S core proteasome (14 subunits, PSMA1-7, PSMB1-7), can be analyzed to quantify how well the assumption holds that proteins with similar localizations have similar profiles. Proceeding with the filtered and (0-1) normalized data, profiles that belong to a specified protein cluster are extracted and filtered to leave only proteins that were measured with full coverage across all compared maps and experiments. By default, only complexes with full coverage data for at least five proteins are analyzed. Subsequently, within each replicate, the Manhattan distances are calculated between each protein of a specified cluster and the corresponding median cluster profile.
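The distance calculation for one complex in one replicate can be sketched as follows. The sketch assumes that the median cluster profile is the fraction-wise median over all member proteins; the profile values are invented and the function names are hypothetical.

```python
from statistics import median


def manhattan(p, q):
    """Manhattan (L1) distance between two profiles of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def complex_scatter(profiles):
    """Distance of each member protein to the median profile of its complex.

    profiles: dict mapping protein name to its (0-1) normalized profile
    within one replicate. Assumption: the median cluster profile is the
    fraction-wise median over all member proteins.
    """
    n_fractions = len(next(iter(profiles.values())))
    median_profile = [
        median(p[i] for p in profiles.values()) for i in range(n_fractions)
    ]
    return {name: manhattan(p, median_profile) for name, p in profiles.items()}


# Toy three-fraction profiles (values invented for illustration).
profiles = {
    "PSMA1": [0.1, 0.2, 0.7],
    "PSMA2": [0.2, 0.2, 0.6],
    "PSMA3": [0.3, 0.2, 0.5],
}
scatter = complex_scatter(profiles)
print(scatter)
```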
Profile reproducibility
To evaluate the reproducibility of (0-1) normalized profiles, the inter-profile scatter per protein across replicates is calculated in a similar fashion: for each individual protein the Manhattan distances to the average protein profile across replicates are calculated and averaged. This yields the distribution plot of the profile scatter in each experiment. All proteins quantified with full coverage across all compared maps and experiments are analyzed.
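For a single protein, this reproducibility measure can be sketched as the mean Manhattan distance of its replicate profiles to their average profile. The function name and the toy values are illustrative assumptions.

```python
def profile_reproducibility(replicate_profiles):
    """Mean Manhattan distance of replicate profiles to their average profile.

    replicate_profiles: list of (0-1) normalized profiles of one protein,
    one profile per replicate, all of equal length.
    """
    n = len(replicate_profiles)
    # Fraction-wise average across replicates.
    mean_profile = [sum(col) / n for col in zip(*replicate_profiles)]
    distances = [
        sum(abs(a - b) for a, b in zip(profile, mean_profile))
        for profile in replicate_profiles
    ]
    return sum(distances) / n


# Toy two-fraction profiles from two replicates (values invented).
print(profile_reproducibility([[0.2, 0.8], [0.4, 0.6]]))
```

Collecting this value for every protein with full coverage gives the distribution that is plotted per experiment; lower values indicate tighter replicates.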
Support vector machine analysis
To further evaluate the performance of organellar maps, their power to predict protein localization was assessed using quality-filtered, (0-1) normalized data with full replicate coverage. For supervised classification the set of bona fide marker proteins covering 11 subcellular localizations was used as a means to assign all other proteins to organellar clusters by SVMs in Perseus (v.1.6.2.3) [41] (ER_high_curvature was removed in this study due to low number of marker proteins in the depth limited datasets). As far as possible (see figure legends) only markers present in all compared datasets were used and identical SVM parameters were used.
First, the SVM algorithm was trained on the marker proteins, to determine optimal classification parameters and thus define optimal boundaries between the organellar clusters. The kernel was set to a radial basis function (RBF) and 8-fold cross-validation was used to ensure that models are not overfit. Both parameters, Sigma and C, were optimized via one-dimensional scans to achieve the minimal overall classification error. Second, optimized parameters were used to classify non-marker proteins applying leave-one-out cross-validation. The misclassification matrix from Perseus is then used by the quality control tool to calculate the global marker prediction accuracy (ratio of correctly predicted markers to the total number of markers), the organelle-specific recall (proportion of markers of a cluster that are correctly assigned to it), and the organelle-specific precision (ratio of markers correctly assigned to a cluster to the number of all markers assigned to that cluster). The harmonic mean of recall and precision, the F1 score, was used as the primary readout for SVM performance.
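The performance metrics follow directly from the misclassification matrix. The sketch below assumes the matrix is available as a nested dictionary indexed as confusion[true class][predicted class]; this layout, the function name and the toy counts are illustrative and do not reproduce the Perseus export format.

```python
def classification_metrics(confusion):
    """Accuracy plus per-class recall, precision and F1 from a confusion matrix.

    confusion[true_class][predicted_class] = number of marker proteins.
    """
    classes = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c].get(c, 0) for c in classes)
    metrics = {"accuracy": correct / total, "per_class": {}}
    for c in classes:
        tp = confusion[c].get(c, 0)
        n_true = sum(confusion[c].values())                     # markers of class c
        n_pred = sum(confusion[t].get(c, 0) for t in classes)   # markers assigned to c
        recall = tp / n_true if n_true else 0.0
        precision = tp / n_pred if n_pred else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics["per_class"][c] = {"recall": recall, "precision": precision, "f1": f1}
    return metrics


# Toy two-class matrix (counts invented for illustration).
confusion = {
    "ER": {"ER": 8, "Golgi": 2},
    "Golgi": {"ER": 1, "Golgi": 9},
}
metrics = classification_metrics(confusion)
print(metrics["accuracy"], metrics["per_class"]["ER"])
```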
Protein movement analysis
The previously established MR plot analysis was used to identify protein profiles that are statistically significantly different between two conditions [10,33]. This analysis was performed in the Perseus environment (v.1.6.2.3). First, ‘delta profiles’ were calculated within each cognate pair of untreated and treated maps, by subtracting the profiles obtained from each treated map from the profiles of its control map. Second, for each of the three sets of delta profiles, a multidimensional outlier test was performed, providing three p-values for movement for each protein. The median of the three p-values was selected for further analysis, raised to the power of three and adjusted for multiple hypothesis testing using the Benjamini-Hochberg approach. The obtained p-value was -log10 transformed to obtain the Movement score (M score). Third, the Reproducibility score (R score) was calculated: the Pearson correlation of all pairs of delta profiles was calculated (Rep 1 vs 2, 1 vs 3, 2 vs 3) and the median of these was set as the R score. For FDR calculation, the three untreated maps were compared in all combinations (1 vs 2, 2 vs 3, 1 vs 3) to obtain largely static delta profiles as a decoy distribution. To get a stringent FDR, the R scores of the decoys were calculated less conservatively, using the maximum correlation. This yielded a 32% FDR for the reported hits, which is probably an overestimate.
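The score calculation can be sketched in a few lines. The analysis itself was run in Perseus; the Python below only illustrates the arithmetic, takes the p-values of the multidimensional outlier test as given (assumed to be greater than 0), and uses hypothetical function names throughout.

```python
import math
from itertools import combinations
from statistics import median


def delta_profile(control, treated):
    """Control profile minus treated profile, fraction by fraction."""
    return [c - t for c, t in zip(control, treated)]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest to the smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def movement_scores(replicate_pvalues):
    """M scores: median p-value per protein, cubed, BH-adjusted, -log10.

    replicate_pvalues: one list of three p-values per protein (all > 0).
    """
    cubed = [median(ps) ** 3 for ps in replicate_pvalues]
    return [-math.log10(q) for q in benjamini_hochberg(cubed)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reproducibility_score(delta_profiles):
    """R score: median Pearson correlation over all pairs of delta profiles."""
    return median(pearson(a, b) for a, b in combinations(delta_profiles, 2))


# Toy delta profiles of one protein from three replicates (values invented).
deltas = [[0.1, -0.2, 0.1], [0.2, -0.4, 0.2], [0.1, -0.3, 0.2]]
print(reproducibility_score(deltas))
```

For the decoy comparison described above, the median in reproducibility_score would be replaced by the maximum of the pairwise correlations.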
Acknowledgements
We wish to thank Matthias Mann for his continued generous support. We also want to thank members of the department for Proteomics and Signal Transduction for fruitful discussions and providing starting points for the DIA method optimization: Isabell Bludau, Sophia Steigerwald, Maximilian Zwiebel, Marvin Thielert, Patricia Skowronek, Jakob Bader, Florian Meier. We also thank Jürgen Cox for providing us with MaxQuant 2.0 prior to its release. We thank the MPIB Imaging Facility for their excellent technical support. We are very grateful to Igor Paron and the column team for outstanding technical support.