Matrix-matched calibration curves for assessing analytical figures of merit in quantitative proteomics

Mass spectrometry is a powerful tool for quantifying protein abundance in complex samples. Advances in sample preparation and the development of data independent acquisition (DIA) mass spectrometry approaches have increased the number of peptides and proteins measured per sample. Here we present a series of experiments demonstrating how to assess whether a peptide measurement is quantitative by mass spectrometry. Our results demonstrate that increasing the number of detected peptides in a proteomics experiment does not necessarily result in increased numbers of peptides that can be measured quantitatively.

Mass spectrometry-based proteomics has made great progress and is being used to address essential questions in basic biology and of biomedical significance. Of particular interest, the development of data independent acquisition mass spectrometry (DIA-MS) has made it possible to measure tens of thousands of peptides in a protein digest in 1-2 hours of instrument time. The sampling of tandem mass spectra in DIA-MS is unbiased [1] and systematic [2], in principle making it an appealing compromise between a narrowly focused targeted data acquisition strategy [3] and an irregularly sampled discovery method. Although fully targeted proteomics assays often include validation experiments to assess whether the change in measured signal is reflective of the actual change in peptide abundance, proteomics assays measuring thousands of analytes in an unbiased fashion rarely assess which peptide measurements are truly quantitative.

A measurement is quantitative when the change in measured signal reflects a change in the quantity of the analyte [4]. Specifically in mass spectrometry proteomics, for a method to be considered quantitative the relationship between the measured signal and the peptide quantity must be assessed. This assessment uses a calibration curve, where the analyte is diluted systematically to demonstrate that the measured signal is precise and above the lower limit of quantitation (LLOQ), the quantity below which a change in signal no longer reflects a change in quantity. Because liquid chromatography-tandem mass spectrometry is subject to matrix effects, calibration curves must be constructed in a relevant sample matrix. For endogenous compounds like peptides that are present in the sample matrix, assessment is frequently performed with reverse calibration curves, where a heavy isotope-labeled synthetic version of the analyte is diluted in the sample matrix [5, 6].

Although a signal measured below the LLOQ may still be used to assess a difference between two conditions, when compared to a signal above the LLOQ, the magnitude of the difference in signal is not reflective of the true difference in analyte quantity. In some papers, this phenomenon has been referred to as ratio compression [7]. Thus, unless the relationship between the quantity and signal for each analyte is documented, mass spectrometry measurements should be considered only differential rather than quantitative. In targeted proteomics studies, reverse calibration curves [...] we propose a framework for discriminating between peptides that are only detectable and those which are both detectable and quantitative in a mass spectrometry experiment. We introduce an alternative to reverse calibration curves called matrix-matched calibration curves.
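The ratio compression described above can be illustrated with a small numerical sketch. It assumes a hypothetical response model in which the measured signal is a linear response plus a constant background; the function names and all parameter values are illustrative assumptions and are not taken from this study.

```python
def measured_signal(quantity, slope=1000.0, background=500.0):
    """Hypothetical response: linear in quantity plus a constant background."""
    return slope * quantity + background


def observed_ratio(q_high, q_low):
    """Ratio of the measured signals for two analyte quantities."""
    return measured_signal(q_high) / measured_signal(q_low)


# Both comparisons have the same true 10-fold difference in quantity.
well_above_background = observed_ratio(100.0, 10.0)
near_background = observed_ratio(0.1, 0.01)

print(f"true ratio 10, observed well above background: {well_above_background:.2f}")
print(f"true ratio 10, observed near background:       {near_background:.2f}")
```

Under these assumptions, the comparison made near the background still shows a difference in signal, but the observed ratio is far smaller than the true 10-fold change, which is why such a measurement is differential rather than quantitative.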

Our goal was to construct calibration curves and determine the LLOQ for every detectable peptide in a given complex protein mixture of interest using one dilution series and without predetermining targets. We propose matrix-matched calibration curves, in which a complex protein sample of interest (a reference material [9]) is diluted with a matrix-matched material. A matrix-matched material may be any sample of equivalent biochemical complexity, but should not share any endogenous analytes with the reference material. For example, a matrix-matched material could be a stable isotope-labeled version of the reference material, which preserves the matrix complexity but shifts the peptide masses, or an equivalent biosample from an evolutionarily diverged species (Supplemental Fig 1, 2). Each point in the dilution series has the same total protein concentration, composed of some ratio of the reference and matrix-matched material (Fig 1a) spanning several orders of magnitude (see Supplementary Table 1). A strength of this approach is that every peptide (or other type of analyte) in the reference material is diluted through the curve, meaning that calibration curves are constructed for all peptides detected in the reference material. To fit calibration curves to this novel data, we developed a computational model (Fig 1b) which extends the work [...] Methods).

Figure 1: Constructing reference material calibration curves using a matrix-matched diluent. (a) A reference material is diluted into a matrix-matched material of similar matrix complexity but with no shared endogenous analytes, for example by stable isotope labeling the matrix or using a diverged species. The curve is made from dilutions spanning several orders of magnitude plus a blank with only the matrix-matched proteome. (b) The model for assessing the lower limit of quantification (LLOQ) using the sparse matrix-matched calibration curve data. We assess the LLOQ (cyan line) as the first point that is statistically different from the background (pink line) and has a CV ≤ 20% using bootstrapping (red line). (c) The sequence of plasma membrane ATPase (Pma1) is represented as a black line. The transmembrane domains along the sequence are depicted in grey. Each peptide detected by DIA-MS is represented by a colored box placed along the sequence. The color of the box ranks the peptide LLOQs. Three of the peptide calibration curves are shown above the sequence. Yellow shading indicates two standard deviations above and below the median for the bootstrapped data.

We apply the matrix-matched calibration curve framework first in yeast, and find that it highlights the divide between detection and quantification, especially at low protein abundances. In particular, highly abundant proteins often contain peptides that are detected at 1% FDR but are not quantifiable because the observed abundance in the reference material is below the LLOQ. Using the highly abundant yeast plasma membrane ATPase protein (Pma1) as an example, [...]

Figure 2: [...] [10]. The wide-window DIA using a chromatogram-library approach [11] detects, at 1% protein-level FDR, 74% of these proteins (blue, 2,870 proteins). The number of proteins quantifiable by DIA-MS (proteins with at least one peptide with a defined LLOQ) encompasses 52% of the detected proteins, or 39% of the expressed proteins (green, 1,511 proteins). (c) Peptides detected in the yeast lysate narrow-window library are ranked by intensity, and the wide-window detected and quantitative peptides are shown for each decile. (d) Cerebrospinal fluid peptides detected in the narrow-window library (8,698 total peptides, 2,994 protein groups) are ranked by intensity, and the wide-window detected and quantitative (3,183 peptides; 1,303 protein groups) peptides are shown for each decile.
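The LLOQ criterion stated in the Figure 1 caption (the first point that is statistically different from the background and has a CV ≤ 20% by bootstrapping) can be sketched as follows. This is a simplified stand-in and not the computational model used in this work. It makes three assumptions: "different from the background" is approximated as a mean signal more than two standard deviations above the blank, precision is the CV of the bootstrapped mean of the replicates at each dilution, and the function names and example signals are hypothetical.

```python
import numpy as np


def bootstrap_cv(values, n_boot=1000, rng=None):
    """CV of the bootstrapped mean of replicate signals (simplified stand-in)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    # Resample the replicates with replacement, one row per bootstrap draw.
    resamples = rng.choice(values, size=(n_boot, values.size), replace=True)
    boot_means = resamples.mean(axis=1)
    center = boot_means.mean()
    if center <= 0:
        return float("inf")  # no usable signal at this dilution
    return boot_means.std(ddof=1) / center


def estimate_lloq(curve, blank_level=0.0, n_sd=2.0, max_cv=0.20, seed=0):
    """Return the lowest dilution level passing both criteria, or None."""
    rng = np.random.default_rng(seed)
    blank = np.asarray(curve[blank_level], dtype=float)
    # Assumed background test: mean signal above blank mean + n_sd * blank SD.
    threshold = blank.mean() + n_sd * blank.std(ddof=1)
    for level in sorted(lv for lv in curve if lv != blank_level):
        signals = np.asarray(curve[level], dtype=float)
        above_background = signals.mean() > threshold
        if above_background and bootstrap_cv(signals, rng=rng) <= max_cv:
            return level
    return None


# Hypothetical curve for one peptide: fraction of reference -> replicate signals.
example_curve = {
    0.0: [95, 110, 102],       # blank: matrix-matched material only
    0.01: [100, 130, 90],      # indistinguishable from the blank
    0.1: [120, 400, 220],      # above the blank but imprecise
    0.3: [700, 720, 690],      # above the blank and precise
    1.0: [2300, 2350, 2280],   # undiluted reference material
}
lloq = estimate_lloq(example_curve)
print("estimated LLOQ (fraction of reference):", lloq)
```

In this hypothetical example the 0.1 dilution is distinguishable from the blank but fails the precision requirement, so the peptide would be detectable at that level without being quantitative.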
The yeast proteome has the advantage of an established reference quantity for each protein, allowing us to contextualize our results. Ghaemmaghami et al. affinity-tagged the protein coding regions in yeast and reported the protein abundances in molecules-per-cell for 4,102 proteins, 3,869 of which could be quantified above 50 molecules/cell [10]. Using data independent acquisition mass spectrometry (DIA-MS) [11], we detected peptides from 2,870 of the proteins they quantified in the reference yeast proteome (Fig 2a, b). Using matrix-matched calibration curves to assess the quantitative accuracy of the detected peptides, we found that half of the detected proteins had at least one quantitative peptide (1,511 proteins). The proteins with validated peptides are primarily high quantity proteins, particularly those above 10,000 molecules/cell. As the reported quantity [10] decreases, fewer detected proteins have at least one quantitative peptide (Fig 2a, b). We compared the peptides determined to be quantitative by matrix-matched calibration curves with the peptides determined to be quantitative by a more conventional synthetic peptide approach [12]. Overall, the proposed framework assessed 6x more candidate peptides and defined 4.7x more peptides as quantitative (Supplemental Fig 3), demonstrating the higher throughput of the proposed framework compared to conventional approaches.
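The proportions implied by these protein counts can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
expressed = 3869     # proteins quantified above 50 molecules/cell [10]
detected = 2870      # of those, proteins with peptides detected by DIA-MS
quantifiable = 1511  # detected proteins with at least one quantitative peptide

detected_of_expressed = detected / expressed
quantifiable_of_detected = quantifiable / detected
quantifiable_of_expressed = quantifiable / expressed

print(f"detected / expressed:     {detected_of_expressed:.1%}")
print(f"quantifiable / detected:  {quantifiable_of_detected:.1%}")
print(f"quantifiable / expressed: {quantifiable_of_expressed:.1%}")
```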

The matrix-matched calibration curve approach is generalizable beyond cell culture. To illus[...] calibration curve framework, we found that 36% of peptides detected in the cerebrospinal fluid (CSF) reference material library (8,698 peptides; 2,994 protein groups) have a defined LLOQ (3,183 peptides; 1,303 protein groups) (Fig 2d). In both the yeast (Fig 2c) and CSF (Fig 2d) references, the most intense peptides in the reference are more likely to be detected and quantified. We also applied the matrix-matched calibration curve approach to an FFPE sample (Supplemental Fig 4) and acquired the data by another form of mass spectrometry (selected reaction monitoring). To construct the FFPE matrix-matched calibration curve, we spiked human plasma into homogenized chicken liver as a reference and used the unspiked homogenized chicken liver for the background proteome. We targeted 84 peptides (18 proteins) and found that 27 of the targeted peptides were quantitative (14 proteins). This demonstrates that the matrix-matched calibration curve approach is generalizable broadly across not only sample types but also mass spectrometry acquisition approaches.

A limitation of the approach is that the maximum possible peptide quantification is limited by the endogenous abundance of the peptide in the reference, which for low abundance peptides results in a stunted linear range. Another consequence of the endogenous abundance limitation is that matrix-matched calibration curve data is extremely sparse compared to conventional calibration curves, because low abundance reference peptides produce low signal, which reduces to zero signal as the reference is diluted. Additionally, while the quantitative peptides reported here may serve as a starting point for future assay development, we emphasize that these LLOQs are specific to these exact conditions. Matrix-matched calibration curves, like all calibration curves, are only reflective of the peptide measured on a given platform. While most quantitative methods report precision, precision alone does not assess whether a change in signal reflects the change in quantity. Therefore, matrix-matched calibration curves should be constructed for all proteomics experiments that require an assessment of which peptides reflect the change in quantity versus those that are just differential.