Ultra-deep proteomics by Thin-diaPASEF with a 60-cm long column system

Recent advances have allowed for the detection of 10,000 proteins from cultured human cell samples, such as HeLa and HEK293T cells, in a single-shot proteome analysis. However, deeper analysis remains challenging. Therefore, in this study, we aimed to perform a deep proteomic single-shot analysis using timsTOF HT. To achieve deep proteomics, we developed Thin-diaPASEF, a data-independent acquisition method based on parallel accumulation-serial fragmentation (PASEF) technology that features a thinly divided m/z axis only in regions of high ion density. Furthermore, using a 60-cm long C18 column with a particle size of 1.7 µm, an average of 11,698, 11,615 and 11,019 unique proteins were successfully detected from 500 ng of HEK293T, HeLa and K562 cell digests, respectively, with a 100 min active gradient. The same system was used to analyze Lycopersicon esculentum lectin (LEL)-enriched plasma and serum. The LEL method identified an average of 8,613 and 4,078 unique proteins in plasma and serum, respectively. Our ultra-deep proteomic analysis system will be helpful for the in-depth comparison of proteins in medical and biological research because it enables high proteome coverage in a single-shot analysis.


Introduction
Recent advances in mass spectrometry and improvements in data-independent acquisition (DIA) have allowed for the identification of more than 10,000 proteins in typical cultured human cell samples, such as HeLa and HEK293T cells, in a single-shot analysis (1-3). However, since typical human cells contain approximately 12,000 mRNAs, detecting 10K proteins in a single-shot proteome analysis is not yet sufficient (4,5). Guzman et al. used Thermo Fisher Scientific's Orbitrap Astral mass spectrometer and DIA analysis to detect 10K proteins in HEK293T cells, and identified approximately 12,000 proteins by fractionation of peptides using high-pH reverse phase chromatography (1). The identified ultra-low-expression proteins include important signaling proteins such as transmembrane receptors and transcription factors. It is therefore of great significance to be able to observe ultra-low-expression proteins in a simple single-shot analysis.
Currently, two mass spectrometers, Thermo Fisher Scientific's Orbitrap Astral and Bruker Daltonics' timsTOF HT, are expected to be capable of identifying more than 12,000 proteins in a single-shot analysis. Owing to its high acquisition speed, resolution, sensitivity, and automatic gain control ability, the Astral analyzer achieves a wide dynamic range (6). The main features of the timsTOF series are a high ion mobility resolution and the ability to trap and enrich ions separated by ion mobility (7,8). In addition, timsTOF HT has improved the dynamic range and depth of analysis with the 4th generation TIMS-XR and advanced digitizer technology (9). Furthermore, in the timsTOF series, a DIA method using parallel accumulation-serial fragmentation (PASEF) technology (diaPASEF) was established (10), and more recently, slice-PASEF (11) and new diaPASEF methods, such as midia-PASEF (12) and synchro-PASEF (13), have been developed. However, to date, there have been a limited number of reports on deep proteomic challenges in the timsTOF series, including timsTOF HT, and there are no reports of a single-shot 10K proteome approach using the timsTOF series in typical human cells.
The resolution of liquid chromatography (LC) is key to improving protein identification in single-shot analysis (14-16). The complexity of simultaneously ionized peptides is reduced by the high chromatographic resolution of LC, which increases the number of detected peptides. A simple way to increase chromatographic resolution is to use long columns packed with sub-2 µm particles. Although such columns have the disadvantage of higher LC pressure, they are an important factor that allows for an additional step in the improvement of protein identification.
In this study, we attempted single-shot proteomics exceeding 12K proteins using timsTOF HT, which has high instrumental potential but has not been fully investigated in deep proteomics. First, we examined the parameters of diaPASEF for deep proteomics, and established Thin-diaPASEF, a diaPASEF with a thinly divided m/z axis only in regions of high ion density. Next, to further improve the depth of analysis, a 60-cm long C18 column with a particle size of 1.7 µm was used, and nearly 12,000 unique proteins (without grouped proteins) were successfully detected in HEK293T and HeLa cells. Furthermore, plasma and serum treated with the enrichment method using Lycopersicon esculentum lectin (LEL) (17) were analyzed in the same system and an average of 8,649 and 4,093 unique proteins were identified, respectively.
Proteins in HEK293T cells were extracted using protein extraction buffer (100 mM Tris-HCl, pH 8.0, 20 mM NaCl, and 10% ACN containing 4% sodium dodecyl sulfate) by sonication in a Bioruptor II (CosmoBio, Tokyo, Japan) for 15 min. The protein concentration in the extract was determined using a BCA protein assay kit (Thermo Fisher Scientific), according to the manufacturer's instructions, and adjusted to 100 ng/µL using protein extraction buffer.

Preparation of Plasma and Serum
Serum and plasma were treated using the Lycopersicon esculentum lectin (LEL) method, as previously described (17), with modifications. Initially, 25 µL of streptavidin bead suspension (Cytiva, Marlborough, MA, USA) was added to 600 µL of protein-free blocking buffer (Setsuyaku-Kun Supporter, DRC, Tokyo, Japan) diluted 10-fold with Tris buffered saline (TBS, 25 mM Tris-HCl pH 7.4, 1.37 mM NaCl and 2.68 mM KCl). Next, 10 µL of 1 µg/µL LEL (Vector Laboratories) was added to the solution. The solution was mixed gently for 30 min and the beads were washed once with 1.2 mL of dilution/wash buffer (TBS with 0.0005% Tween 20). Next, 25 µL of serum or plasma, diluted in 475 µL of dilution/wash buffer, was added to the beads and mixed for 60 min. Subsequently, the beads were washed twice with 1.2 mL of dilution/wash buffer and mixed in 200 µL of the protein extraction buffer for 15 min. The eluted proteins were then collected. The LEL method was automatically performed using a Maelstrom 9610 instrument (Taiwan Advanced Nanotech, Taoyuan, Taiwan).
The antibody column depletion method used Top14 Abundant Protein Depletion Mini Spin Columns (Thermo Fisher Scientific), following the manufacturer's instructions.

Protein Digestion
HEK293T lysate (20 μg) and 200 µL of serum and plasma treated by the LEL method were subjected to cleaning and digestion with the SP3-LASP method (18), with minor modifications, utilizing the Maelstrom 9610 instrument. Briefly, two types of SeraMag SpeedBead carboxylate-modified magnetic particles (hydrophilic particles, CAT# 45152105050250, and hydrophobic particles, CAT# 65152105050250; Cytiva) were used. These beads were combined in a 1:1 (v/v) ratio, washed twice with distilled water, and reconstituted in distilled water to achieve a concentration of 10 μg solids/μL. Then, 20 μL of the reconstituted beads (SP3-beads) was added to the protein sample, followed by 1-propanol to a final concentration of 75% (v/v), and mixed for 20 min. The beads were collected and washed twice with 80% 1-propanol and once with ethanol.

Dissolution of commercial human cell digests
HeLa cell tryptic digest (Thermo Fisher Scientific; 20 μg) and 100 μg of K562 cell tryptic digest (Promega) were added to 100 μL and 500 μL of 0.02% DMNG containing 0.1% TFA, respectively, and then mixed for 10 min. The concentration of each digest was 200 ng/μL.
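The stated concentration follows directly from the digest masses and solvent volumes given above. The short sketch below checks that arithmetic; the function name is hypothetical, and the injection volume for a 500 ng load is a derived illustration rather than a value reported in the text.

```python
def concentration_ng_per_ul(mass_ug: float, volume_ul: float) -> float:
    """Return the concentration in ng/µL of a digest mass (µg) dissolved in a volume (µL)."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return mass_ug * 1000.0 / volume_ul  # 1 µg = 1,000 ng


# Values taken from the text: 20 µg HeLa in 100 µL, 100 µg K562 in 500 µL.
hela_conc = concentration_ng_per_ul(20, 100)
k562_conc = concentration_ng_per_ul(100, 500)

# Illustration only (assumption): volume needed to load 500 ng at this concentration.
injection_volume_ul = 500 / hela_conc

print(hela_conc, k562_conc, injection_volume_ul)
```

Both digests come out at the same concentration, so equal injection volumes give equal peptide loads across the commercial samples.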
The separation process was conducted using a NanoElute2 system (Bruker Daltonics, Bremen, Germany) with solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile). The gradient conditions varied depending on the column length and duration. The peptides eluted from the column were analyzed on a timsTOF HT (Bruker Daltonics) via Captive Spray II (Bruker Daltonics) and measured in diaPASEF mode using timsControl 5.0. The source parameters were as follows: capillary voltage, 1,600 V; dry gas, 3.0 L/min; and dry temperature, 180 °C. The MS1 and MS2 spectra were collected in the m/z range of 100-1,700.
The accumulation and ramp times were set to 75 ms. The collision energy was set by linear interpolation between 59 eV at an inverse reduced mobility (1/K0) of 1.60 Vs/cm² and 20 eV at 0.6 Vs/cm². To calibrate the ion mobility dimension, two ions of the Agilent ESI-Low Tuning Mix were selected (m/z [Th], 1/K0 [Vs/cm²]: 622.0289, 0.9848; 922.0097, 1.1895). The diaPASEF window scheme ranged from m/z 340 to 1,100 and from 1/K0 0.75 to 1.25 Vs/cm². The Thin-diaPASEF method was set to a narrow polygon centered around the density region of doubly charged precursors within the m/z 340-1,100 range, adjusted to a cycle time of 1 s with a DIA window width of 15 Th using the diaPASEF editor available in timsControl 5.0. The other diaPASEF method was generated based on the Thin-diaPASEF method using the Python package for DIA with the automated isolation design (py-diAID) software (10). The window schemes for the diaPASEF methods are shown in Fig. 1A. The denoising mode was set to 'Low Sample Amount (Sensitive)' in diaPASEF data reduction within the Tune section of timsControl 5.0.

Data Analysis
DIA-MS files were searched against an in silico human spectral library using PaSER (v2023b, Bruker Daltonics) with TIMS DIA-NN. First, a spectral library was generated from the UniProt human protein sequence database (downloaded March 2023; 20,575 entries; UP0000056) using DIA-NN (version 1.8.1, https://github.com/vdemichev/DiaNN) (19). The parameters for generating the spectral library were as follows: digestion enzyme, trypsin; missed cleavage, 2; peptide length range, 7-45; precursor charge range, 2-4; precursor m/z range, 350-1,250; and fragment ion m/z range, 200-1,800. "FASTA digest for library-free search/library generation;" "deep learning-based spectra, RTs, and IMs prediction;" "n-term M excision;" and "C carbamidomethylation" were enabled. The search parameters for PaSER (v2023b) with TIMS DIA-NN were as follows: mass accuracy, 15 ppm; MS1 accuracy, 15 ppm; and protein inference, genes, off. Match-between-runs (MBR) was turned on for quantitative analyses of the proteins and precursors. The protein identification threshold was set at 1% or less for both precursor and protein FDRs.
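The collision-energy setting given in the acquisition parameters above (linear interpolation between 20 eV at 1/K0 = 0.6 and 59 eV at 1/K0 = 1.60) can be written as a one-line function. This is a minimal sketch of the arithmetic only; the function name is hypothetical, and how the instrument software treats mobilities outside the two anchor points is not stated in the text, so no clamping is assumed here.

```python
def collision_energy_ev(inv_k0: float,
                        low=(0.60, 20.0),
                        high=(1.60, 59.0)) -> float:
    """Linearly interpolate collision energy (eV) as a function of 1/K0 (Vs/cm^2).

    `low` and `high` are (1/K0, eV) anchor points taken from the text.
    Values outside the anchors are extrapolated (assumption, not from the source).
    """
    x0, y0 = low
    x1, y1 = high
    slope = (y1 - y0) / (x1 - x0)  # eV per unit of 1/K0
    return y0 + slope * (inv_k0 - x0)


# Collision energy at the limits of the diaPASEF mobility range (0.75-1.25).
ce_low = collision_energy_ev(0.75)
ce_high = collision_energy_ev(1.25)
print(round(ce_low, 2), round(ce_high, 2))
```

With these anchors the ramp has a slope of 39 eV per unit of 1/K0, so the diaPASEF mobility range of 0.75-1.25 spans roughly 26 to 45 eV.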
Proteins containing a unique peptide were selected. Subsequently, proteins were retained if valid values were detected in at least 70% of the samples within at least one experimental group. The coefficient of variation values were calculated using Perseus v1.6.15.0 (https://maxquant.net/perseus/). For UniProt keyword enrichment analysis, proteins were sorted by intensity and those ranked below 10,000 were extracted. Biological keyword enrichment analysis was performed using DAVID (https://david.ncifcrf.gov/). For the scatterplot, the protein intensities were subjected to Log2 transformation. The protein list of Food and Drug Administration (FDA) approved drug targets was downloaded from the Human Protein Atlas website (https://www.proteinatlas.org/; downloaded on April 3, 2024).
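The valid-value filter and the coefficient of variation described above can be sketched in a few lines. This is an illustrative reimplementation, not the Perseus code: the function names, the use of `None`/NaN to mark missing values, and the choice of the sample standard deviation are all assumptions.

```python
import math
from statistics import mean, stdev


def _valid(values):
    """Return only the quantified (non-missing) intensities."""
    return [v for v in values if v is not None and not math.isnan(v)]


def passes_valid_value_filter(intensities_by_group, min_fraction=0.70):
    """True if a protein is quantified in >= min_fraction of samples in at least one group."""
    for values in intensities_by_group.values():
        if values and len(_valid(values)) / len(values) >= min_fraction:
            return True
    return False


def coefficient_of_variation(values):
    """Percent CV (sample SD / mean * 100) over the valid intensities."""
    valid = _valid(values)
    if len(valid) < 2:
        return float("nan")
    return stdev(valid) / mean(valid) * 100.0


# Toy intensities (made up for illustration, not data from the study).
protein_a = {"HEK293T": [100.0, 104.0, None, 96.0]}  # 3 of 4 valid -> kept
protein_b = {"HEK293T": [100.0, None, None, 96.0]}   # 2 of 4 valid -> dropped

keep_a = passes_valid_value_filter(protein_a)
keep_b = passes_valid_value_filter(protein_b)
cv_a = coefficient_of_variation(protein_a["HEK293T"])
print(keep_a, keep_b, cv_a)
```

The median of such per-protein CVs across all retained proteins is the summary statistic reported for reproducibility.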

Experimental Design and Statistical Rationale
In this study, the comparison of Thin-diaPASEF with py-diAID PASEF, the optimization of Thin-diaPASEF, the reproducibility assessment, and the plasma and serum proteome analyses using the LEL method were performed with three (Fig. 1), three (Fig. 2AB), ten (Fig. 2C), and four (Fig. 3) technical replicates, respectively. These experiments were performed using human-derived samples (HEK293T, HeLa, and K562 cells for the optimization of DIA; plasma and serum to determine the significance of the LEL method).

Improved diaPASEF for deep proteomics
First, the diaPASEF window setting was examined for deep proteome analysis. The basic diaPASEF uses windows generated with py-diAID software (py-diAID PASEF), which cover a wide region of detected ions (Fig. 1A). In contrast, we tried Thin-diaPASEF, a diaPASEF with a thinly divided m/z axis only in regions of high ion density (Fig. 1B). Compared with py-diAID PASEF, Thin-diaPASEF increased the number of identified proteins by 10%, 11%, and 10% in the HEK293T, HeLa, and K562 digests, respectively, and the number of precursors by 7.6%, 12.6%, and 11% in the HEK293T, HeLa, and K562 digests, respectively. Because Thin-diaPASEF scans a smaller ion area, we expected it to yield fewer precursors than py-diAID PASEF; however, it yielded more. This indicates that by focusing on a small region, Thin-diaPASEF identified trace precursors that could not be identified by py-diAID PASEF. Therefore, Thin-diaPASEF, which thinly delimits the m/z range in regions of high ion density, was found to reduce the complexity of MS2, allow for the identification of trace precursors, and improve protein identification.
In recent years, diaPASEF methods that utilize the features of TIMS-PASEF, such as midia-PASEF and synchro-PASEF, which finely delimit the ion mobility axis, have been developed. However, MS data measured using these methods are not supported by protein identification software, such as Bruker's PaSER, Spectronaut, and DIA-NN, and therefore cannot be used universally or compared with our methods. Thin-diaPASEF also offers the advantage that DIA windows can be easily created using the LC-MS operation software. In addition, protein identification and quantification analysis can be performed using various software packages such as PaSER, Spectronaut, and DIA-NN.

Impact of 60-cm long column
We used a 60-cm long C18 column, with a particle size of 1.7 µm, to establish a deeper proteome analysis system (Fig. 2A). When comparing the 25-cm and 60-cm columns with the 50 min active gradient, the number of proteins and precursors identified was slightly higher on the 60-cm column in the HEK293T, HeLa and K562 digests. With the 100 min active gradient, the number of identified proteins and precursors was greater on the 60-cm column for all samples. Therefore, the 60-cm column was more effective with the longer gradient. Although the 60-cm column did not have a major impact, we consider small improvements important to establish the world's leading single-shot proteome analysis system. Next, we evaluated suitable peptide loadings for the 100 min active gradient using the 60-cm column (Fig. 2B). The number of identified proteins and precursors increased when the loading amount of HEK293T, HeLa, and K562 digests increased from 200 to 500 ng. However, increasing the loading amount from 500 to 1,000 ng did not change the number of proteins and precursors. Therefore, the optimum peptide loading amount for this system was 500 ng. As shown in Fig. 2C, the reproducibility of this system was assessed by analyzing 500 ng of HEK293T cell digest 10 times. Calculation of the coefficient of variation (CV) of the protein intensities revealed that the median CV of the observed proteins was 4%, indicating high reproducibility.
Using our system, an average of 11,698, 11,615, and 11,019 unique proteins were identified in 500 ng of HEK293T, HeLa, and K562 digests, respectively (Fig. 2B). MS data analyses were performed in PaSER with TIMS DIA-NN, and the PaSER software displayed an average of 13,583, 13,254, and 12,583 identified protein groups in the HEK293T, HeLa and K562 digests, respectively. However, because the grouped proteins contained duplicates, we determined the number of unique proteins with unique peptides. These are the largest numbers of proteins identified in single-shot proteome analyses of these three cell lines, confirming the superiority of the ultra-deep proteome analysis system, which combines Thin-diaPASEF on timsTOF HT with a 100 min active gradient using a 60-cm column. For each cell type, UniProt keyword enrichment analysis was performed for proteins with intensity rankings below 10,000 (Fig. 2D). We found that these low-intensity proteins were enriched in transcription factors, which are often master regulators of various biological phenomena, underscoring the importance of analyzing ultra-trace proteins.
For the LC-MS measurements of the ultra-deep proteome analysis system, the active gradient time plus the overhead time was 120 min, giving an analytical throughput of 12 samples per day (SPD). Guzman et al. detected 12,179 proteins by analyzing 46 fractions of HEK293T digests using Astral, with a throughput of four SPD for the system (1). Our system was able to detect 11,721 proteins in HEK293T cells at a throughput of 12 SPD, which we consider a high throughput given the depth of the analysis. HEK293T, HeLa, and K562 cell digests are commonly used as standards for proteomic analysis. In this study, HeLa and K562 digests were obtained from commercial sources. Therefore, our results serve as a useful benchmark for single-shot proteomic analyses.
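The throughput figure above is the number of 120 min runs that fit into 24 h. The sketch below makes that arithmetic explicit; the function name is hypothetical, and the "active fraction" is a derived illustration based on the 100 min active gradient reported for this system.

```python
MINUTES_PER_DAY = 24 * 60


def samples_per_day(minutes_per_run: float) -> float:
    """Number of back-to-back LC-MS runs of a given total length that fit into one day."""
    if minutes_per_run <= 0:
        raise ValueError("run time must be positive")
    return MINUTES_PER_DAY / minutes_per_run


# 120 min per run = active gradient plus overhead, as stated in the text.
spd = samples_per_day(120)

# Share of each run spent on the 100 min active gradient (derived, not reported).
active_fraction = 100 / 120

print(spd, round(active_fraction, 3))
```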

Ultra-deep proteome analysis for plasma and serum
The established ultra-deep proteome analysis system was applied to plasma and serum samples, which are known to have a wide dynamic range of protein concentrations. Depletion of 14 highly abundant proteins using an antibody column (TOP14D method) and the LEL method were tested as pretreatment methods for plasma and serum. The TOP14D method identified 2,930 and 1,933 unique proteins and 16,122 and 12,051 precursors in plasma and serum, respectively, whereas the LEL method identified 8,613 and 4,078 unique proteins and 77,204 and 25,057 precursors in plasma and serum, respectively (Fig. 3A). More than twice as many proteins were identified by the LEL method as by the TOP14D method in both plasma and serum, confirming the advantage of the LEL method in deep plasma and serum proteomic analyses. In addition, considerably more proteins were identified in plasma than in serum using the LEL method, indicating that the LEL method is more suitable for plasma samples.
In our previous study, we reported that the combination of Solanum tuberosum lectin (STL) and LEL was the best way to enrich low-abundance proteins in serum. However, high-quality biotinylated STL from Vector Laboratories has become difficult to obtain. Therefore, in this study, LEL, which can enrich trace proteins on its own, was used alone, and the dilution and wash buffers of the conventional STL/LEL method were optimized. We previously reported that the STL/LEL method identified approximately 1.5-fold more proteins than the TOP14D method, whereas in this study LEL alone identified more than 2-fold more proteins than the TOP14D method. This improvement was mainly due to the optimization of the dilution and wash buffers, which gave excellent results using LEL alone.
Of particular interest, approximately 8,500 plasma proteins were identified in a single-shot analysis. In the run with the most identified proteins in each group, 215 proteins were identified by the TOP14D method and 432 by the LEL method among the 854 FDA-approved drug target proteins listed in the Human Protein Atlas (Fig. 3B). The FDA-approved drug target proteins showed a broad range of protein intensities. Compared to the TOP14D method, protein identification in the LEL method was approximately 3.4-fold higher, and FDA target protein detection was approximately 2-fold higher. The LEL method thus exhibited an advantage in plasma proteome analysis.
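The drug-target comparison above can be restated as coverage of the 854-protein list and as a fold change between the two pretreatment methods. The sketch below uses only the counts quoted in the text; the variable names are hypothetical.

```python
# Counts quoted in the text for the best run of each method.
FDA_TARGETS_TOTAL = 854
detected = {"TOP14D": 215, "LEL": 432}

# Percentage of the FDA-approved drug target list detected by each method.
coverage_pct = {
    method: hits / FDA_TARGETS_TOTAL * 100.0
    for method, hits in detected.items()
}

# Fold change in detected drug targets, LEL relative to TOP14D.
fold_change = detected["LEL"] / detected["TOP14D"]

for method, pct in coverage_pct.items():
    print(f"{method}: {pct:.1f}% of FDA-approved drug targets")
print(f"LEL vs TOP14D: {fold_change:.2f}-fold")
```

On these counts, the LEL method covers about half of the listed drug targets, compared with about a quarter for the TOP14D method.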
Current plasma and serum proteome analyses focus on throughput rather than depth of analysis (6,9,20-22). Throughput is important in large cohort studies and in major diseases with large numbers of patients, but for diseases with low morbidity, it is difficult to collect many samples; therefore, depth of analysis is considered more important than throughput. We often study rare and intractable diseases in children (23-26), for which it is not possible to collect a large number of specimens; therefore, a deeper analysis is required. Depending on the purpose and target disease, the combination of the LEL method and the ultra-deep proteome analysis system will be a useful method for biomarker discovery.

Conclusion
We established the Thin-diaPASEF, a diaPASEF with a thinly divided m/z axis on only regions of high ion density, with the aim of detecting 12,000 proteins from typical cultured cells by single-shot proteome analysis. Furthermore, using a 60-cm long column, 11,698, 11,615, and 11,019 unique proteins were detected from 500 ng of HEK293T, HeLa, and K562 digests, respectively, in a 100 min active gradient. In addition, this system was combined with the LEL method to perform deep proteomic analyses of the plasma and serum. As a result, 8,613 and 4,078 proteins were successfully detected in plasma and serum, respectively.

Fig. 2. Evaluation of acquisition system by the Thin-diaPASEF
B) Evaluation of the amount of tryptic peptide injected. To assess the analytical depth of Thin-diaPASEF, digested peptides derived from K562, HEK293T, and HeLa cells were analyzed at 200, 500, and 1,000 ng. The analysis was conducted with a 100 min gradient on a 60-cm column.
C) Reproducibility of Thin-diaPASEF with 100 min gradient using a 60-cm column. To confirm the reproducibility of Thin-diaPASEF with a 100-minute gradient using a 60-cm column, 500 ng of HEK293T-digested peptides were measured 10 times. Proteins were selected if they were detected in at least 70% of the samples within at least one experimental group, and coefficient variations (CVs) were calculated using Perseus v1.6.15.0. The red line represents the median CVs.
D) Components of low-intensity proteins in HEK293T, HeLa, and K562 cells. In each cell, proteins identified and quantified using 500 ng of tryptic peptide were ranked by intensity, and those with an intensity below 10,000 were extracted. Subsequently, the extracted proteins were subjected to UniProt keyword enrichment analysis using DAVID, and the top five biological keywords were selected for each cell type.

Fig. 3. Deep plasma and serum proteome using the LEL method
A) Comparison of plasma and serum protein using the LEL method. Plasma and serum proteins were pretreated using TOP14D and LEL methods. Digested peptides from plasma and serum, equivalent to 1 µL for the TOP14D method and 10 µL for the LEL method, were analyzed using Thin-diaPASEF.
B) Detection of FDA-approved drug target proteins from plasma proteins. In the run with most of the proteins identified in the group, the plasma-identified proteins were compared with the protein list of FDA-approved drug targets downloaded from the Human Protein Atlas.