Cell size contributes to single-cell proteome variation

Accurate measurements of the molecular composition of single cells will be necessary for understanding the relationship between gene expression and function in diverse cell types. One of the most important phenotypes that differs between cells is their size, which was recently shown to be an important determinant of proteome composition in populations of similarly sized cells. We therefore sought to test whether the effects of cell size on protein concentrations were also evident in single cell proteomics data. Using the relative concentrations of a set of reference proteins to estimate a cell's DNA-to-cell volume ratio, we found that differences in cell size explain a significant amount of cell-to-cell variance in two published single cell proteome datasets.


Individual cells are the basis of life. It is therefore important to develop techniques that accurately quantify the molecular composition of single cells. Extensive progress examining mRNA composition has been achieved at single cell resolution, helping to catalog diverse cell types in multicellular organisms (1-3). Yet, mRNA sequencing gives an incomplete measurement of the state of the cell because diverse post-transcriptional mechanisms also impact gene expression. For example, the correlation between mRNA and protein amounts is complicated by differing translation and degradation rates (4). Moreover, transcriptomic methods are blind to the diverse set of protein modifications that are often key to activity and function. To address the limitations inherent to measuring only mRNA transcripts, single cell proteomic methods have emerged.

Advances in single cell proteomics are driven by increases in measurement sensitivity from a new generation of mass spectrometers (5). In addition to this increased sensitivity, multiplexed peptide labeling approaches enable the measurement of hundreds and sometimes thousands of proteins from single mammalian cells (6-9). Initial experiments have revealed that the proteomes of single cells are influenced by cell cycle phase (5, 10), though it is unclear which other physiological features underlie cell-to-cell proteome heterogeneity. It is important to measure these and other quantifiable sources of proteome variation to better characterize features that are specific to particular cell types and states.

We recently showed that cell size (more precisely, the DNA-to-cell volume ratio) is an important determinant of proteome content (11). Contrary to the assumption that most cellular components would remain at constant concentration in cells of different sizes, we found widespread, size-dependent changes in the concentrations of individual proteins (Figure 1A). These changes in protein concentration likely reflect, to a large extent, the size-dependent changes in the cellular growth rate (12-15). Importantly, a recent proteome analysis of the NCI60 cancer cell lines revealed a similar pattern of size-dependent changes to the proteome (16). Thus, regardless of cell type, cell size has an important influence on proteome composition and therefore should contribute to the cell-to-cell heterogeneity in the proteomes of single cells.
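To make the Protein Slope concrete, the sketch below treats it as the exponent in concentration ∝ volume^slope and fits it as the least-squares slope of log concentration against log volume. This definition is an assumption chosen to reproduce the slope values described for Figure 1A (0 for constant concentration, 1 for concentration proportional to volume, −1 for dilution). It is not the published fitting procedure of Lanz et al. (2022), and the helper name `protein_slope`, the volumes, and the concentrations are hypothetical.

```python
import math


def protein_slope(volumes, concentrations):
    """Least-squares slope of log(concentration) against log(volume).

    Assumed definition: concentration ~ volume**slope, so
    0 = constant concentration ("scaling"),
    1 = concentration proportional to volume ("super-scaling"),
    -1 = dilution, concentration ~ 1/volume ("sub-scaling").
    """
    xs = [math.log(v) for v in volumes]
    ys = [math.log(c) for c in concentrations]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical cells of increasing volume (arbitrary units).
volumes = [1.0, 2.0, 4.0, 8.0]
# Histone-like protein: fixed amount per genome, so concentration = amount / volume.
histone_like = [10.0 / v for v in volumes]
# Protein held at a constant concentration.
constant = [5.0] * len(volumes)
# Protein whose concentration rises in proportion to volume.
super_scaling = [0.5 * v for v in volumes]

for name, conc in [("histone_like", histone_like), ("constant", constant),
                   ("super_scaling", super_scaling)]:
    print(name, round(protein_slope(volumes, conc), 3))
```

The histone-like protein illustrates why histones are useful size proxies: if its amount tracks the genome rather than the volume, its concentration falls as 1/volume, giving a slope near −1.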

Data curation
For Brunner et al., protein intensities for the individual G1 cells were obtained from PRIDE (ID: PXD024043). G1-labeled columns were extracted from the file named "20210919_DIANN_SingleCellOutput.pg_matrix.tsv" (DIANN1.8 cell cycle folder). G1 cells without Histone H4 intensity were excluded from the analysis. In addition, the G1 cells with the fewest protein identifications were excluded until a shared set of ~300 proteins was detected in every remaining cell. This resulted in the reanalysis of 70 of the 93 G1 cell proteomes (Table S1).

Figure 1 - Cell size contributes to variation in the proteomes of single cells.
A) Proteomes vary with cell size. For example, the amount of histone proteins is maintained in proportion to the genome, so histone concentrations are inversely proportional to cell size. The Protein Slope describes how the concentration of an individual protein scales with cell size (Lanz et al., 2022). Proteins with a slope of 0 maintain a constant cellular concentration regardless of cell volume ("scaling"). A slope of 1 corresponds to an increase in concentration that is proportional to the increase in volume ("super-scaling"), and a slope of −1 corresponds to dilution (concentration ~ 1/volume; "sub-scaling").
B) Schematic illustrating how relative histone protein concentrations can be used as a proxy for cell size in single cell proteomics datasets in which cell size was not measured.
C) Quantile-quantile plot between the distribution of eigenvalues of the empirical covariance matrix and a sample of the Marchenko-Pastur distribution, which is the distribution expected from uncorrelated, normally distributed random variables. Eigenvalues above the grey identity line indicate the presence of an underlying signal.
… previously measured Protein Slope value (11). Histone H4 was excluded from the plot. Error bars represent the 99% confidence interval. The plot in (H) was filtered to display the most abundant proteins.
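The curation steps described in the Data curation paragraph above can be sketched as a greedy filter. This is a minimal illustration that assumes each cell is represented as a set of detected protein identifiers. The helper name `filter_cells`, the placeholder identifier `"H4"`, the tie-breaking rule (fewest identifications, then cell id), and the in-memory format are all hypothetical. Reading the DIA-NN `pg_matrix.tsv` file is not shown.

```python
def filter_cells(cells, target_shared=300, marker="H4"):
    """Greedy cell filter (hypothetical helper).

    cells: dict mapping a cell id to the set of protein ids detected in it.
    `marker` is a placeholder id standing in for histone H4.
    Returns (sorted ids of kept cells, set of proteins shared by all of them).
    """
    # Step 1: drop cells in which the marker (histone H4) was not detected.
    kept = {cid: prots for cid, prots in cells.items() if marker in prots}

    def shared(cs):
        # Proteins detected in every remaining cell.
        return set.intersection(*cs.values()) if cs else set()

    # Step 2: repeatedly drop the cell with the fewest identifications
    # until at least `target_shared` proteins are detected in every cell.
    while kept and len(shared(kept)) < target_shared:
        worst = min(kept, key=lambda cid: (len(kept[cid]), cid))
        del kept[worst]
    return sorted(kept), shared(kept)


cells = {
    "c1": {"H4", "A", "B", "C"},
    "c2": {"H4", "A", "B"},
    "c3": {"A", "B", "C"},  # no H4: removed in step 1
    "c4": {"H4", "A"},
}
kept, shared_set = filter_cells(cells, target_shared=3)
print(kept, sorted(shared_set))  # ['c1', 'c2'] ['A', 'B', 'H4']
```

Removing the sparsest cell first is the cheapest way to grow the intersection. It trades cell count for a complete protein-by-cell matrix, which the covariance and regression analyses need.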

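The eigenvalue comparison in Figure 1C can be approximated by pairing, in sorted order, the eigenvalues of the data's covariance matrix with those of a same-shaped matrix of independent standard normal values. The latter are a finite-sample draw from the Marchenko-Pastur distribution. This sketch makes three assumptions: each protein is z-scored first (so the covariance is a correlation matrix), a plain-Python Jacobi eigenvalue routine stands in for a linear-algebra library, and the 60-cell by 8-protein data set is hypothetical. The preprocessing used for the published figure may differ.

```python
import math
import random


def zscore_columns(x):
    """Center each column (protein) and scale it to unit variance."""
    n, p = len(x), len(x[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in x]
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - m) / sd
    return out


def covariance(z):
    """Empirical covariance (divisor n) of already-centred columns."""
    n, p = len(z), len(z[0])
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / n for b in range(p)]
            for a in range(p)]


def jacobi_eigenvalues(a, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (ascending)."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def mp_qq_pairs(x, seed=0):
    """Pair sorted data eigenvalues with those of an i.i.d. Gaussian matrix
    of the same shape (a finite-sample Marchenko-Pastur reference)."""
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    null = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    emp = jacobi_eigenvalues(covariance(zscore_columns(x)))
    ref = jacobi_eigenvalues(covariance(zscore_columns(null)))
    return list(zip(emp, ref))


# Hypothetical data: 60 cells x 8 proteins sharing one size-like factor.
rng = random.Random(1)
slopes = [-1.0, -0.8, 0.5, 0.7, 1.0, -0.6, 0.9, 0.4]
log_sizes = [rng.gauss(0.0, 1.0) for _ in range(60)]
data = [[s * u + rng.gauss(0.0, 0.3) for s in slopes] for u in log_sizes]

for empirical, null in mp_qq_pairs(data):
    print(f"{null:6.3f} {empirical:6.3f}")
```

Plotting the null column on the x-axis against the empirical column on the y-axis gives the quantile-quantile plot. In this synthetic example all eight proteins share one factor, so the largest empirical eigenvalue should sit far above the identity line.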
A) Schematic illustrating the methodology to estimate cell size. We first selected a small subset of reference proteins, like histone H4, whose concentrations were shown to be strongly size-dependent (Lanz et al., 2022). Using these reference proteins and their corresponding size slope values, we performed a least-squares regression to estimate the size of each cell.
B) Having estimated the size of each cell, we then calculated the size slope for each protein in our single cell proteomics datasets. Comparison of slopes estimated via the approach described in this paper and those measured previously (11). Orange dots denote reference proteins and blue dots with error bars denote binned values.
C and D) Comparison of protein concentration covariance and correlation in the initial dataset (C) and after removing the estimated effect of cell size (D) for a set of 5 proteins with large absolute measured slopes.
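One way to implement the regression in panels A and B is a two-step least-squares fit on relative (mean-centred) log concentrations. The sketch assumes this model: log concentration of protein i in cell c ≈ a_i + s_i · log V_c, so centring each protein across cells removes a_i. Step one estimates each cell's relative log volume from the reference proteins and their known slopes. Step two regresses every protein on those estimates. The function names, protein labels, and reference slope values are hypothetical placeholders, not the published values or the authors' code.

```python
import math


def center_rows(log_conc):
    """Relative log concentrations: subtract each protein's mean across cells."""
    rel = {}
    for prot, vals in log_conc.items():
        mean = sum(vals) / len(vals)
        rel[prot] = [v - mean for v in vals]
    return rel


def estimate_log_sizes(rel, ref_slopes):
    """Per-cell least-squares estimate of the relative log volume u_c.

    Assumed model for reference proteins i: rel[i][c] ~= s_i * u_c.
    Minimising sum_i (rel[i][c] - s_i * u_c)**2 over u_c gives
    u_c = sum_i s_i * rel[i][c] / sum_i s_i**2.
    """
    n_cells = len(next(iter(rel.values())))
    denom = sum(s * s for s in ref_slopes.values())
    return [sum(s * rel[p][c] for p, s in ref_slopes.items()) / denom
            for c in range(n_cells)]


def estimate_slopes(rel, log_sizes):
    """Slope of each protein against the estimated sizes.

    rel and log_sizes are both centred, so the regression goes through the
    origin: s_j = sum_c u_c * rel[j][c] / sum_c u_c**2.
    """
    denom = sum(u * u for u in log_sizes)
    return {p: sum(u * y for u, y in zip(log_sizes, vals)) / denom
            for p, vals in rel.items()}


# Hypothetical noise-free example: four cells, natural-log volumes.
log_volumes = [math.log(v) for v in [1.0, 1.5, 2.0, 3.0]]
true_slopes = {"H4": -1.0, "H3": -0.9, "P1": 0.4, "P2": 0.0}
log_conc = {p: [1.0 + s * lv for lv in log_volumes] for p, s in true_slopes.items()}

rel = center_rows(log_conc)
# Reference slopes would come from a prior measurement; these are placeholders.
u = estimate_log_sizes(rel, {"H4": -1.0, "H3": -0.9})
slopes = estimate_slopes(rel, u)
print({p: round(s, 3) for p, s in slopes.items()})
```

Because everything is centred, cell sizes are recovered only relative to the average cell (up to an additive constant in log space, i.e. a multiplicative constant in volume). That is all the slope estimation needs. Using the same logarithm base for concentrations and volumes keeps the slopes on the scale described for Figure 1A.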

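Panels E and F compare the fitted size effect with the first principal component. This can be sketched by computing the fraction of variance left after subtracting slope_j · u_c and after removing PC1. Because slope_j · u_c is a rank-one approximation of the centred data matrix, and PC1 gives the best rank-one least-squares approximation (Eckart-Young theorem), the size model can never remove more variance than PC1. That is the bound stated in panel F. In this sketch the data, function names, and power-iteration settings are hypothetical, and for simplicity the sizes are taken as known rather than estimated from reference proteins.

```python
import math
import random


def residual_fraction_size(rel, slopes, log_sizes):
    """Fraction of total variance left after subtracting slope_j * u_c."""
    total = sum(y * y for vals in rel.values() for y in vals)
    left = sum((y - slopes[p] * u) ** 2
               for p, vals in rel.items() for y, u in zip(vals, log_sizes))
    return left / total


def first_pc(rel, iters=500):
    """Leading eigenpair of the protein covariance matrix by power iteration.

    Assumes each protein's values are centred. Returns
    (eigenvalue, unit eigenvector ordered like rel's keys, trace).
    """
    prots = list(rel)
    n = len(rel[prots[0]])
    cov = [[sum(a * b for a, b in zip(rel[p], rel[q])) / n for q in prots]
           for p in prots]
    m = len(prots)
    v = [1.0 + 0.1 * k for k in range(m)]  # arbitrary non-degenerate start
    for _ in range(iters):
        w = [sum(cov[i][k] * v[k] for k in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * cov[i][k] * v[k] for i in range(m) for k in range(m))
    trace = sum(cov[i][i] for i in range(m))
    return lam, v, trace


# Hypothetical data: centred log sizes and four proteins with known slopes.
rng = random.Random(0)
raw = [rng.gauss(0.0, 0.3) for _ in range(40)]
mean_u = sum(raw) / len(raw)
u = [x - mean_u for x in raw]
true = {"H4": -1.0, "P1": 0.6, "P2": -0.4, "P3": 0.2}
rel = {}
for p, s in true.items():
    vals = [s * x + rng.gauss(0.0, 0.1) for x in u]
    m = sum(vals) / len(vals)
    rel[p] = [v - m for v in vals]

# Slopes fitted against the (here known) sizes, through the origin.
uu = sum(x * x for x in u)
slopes = {p: sum(a * b for a, b in zip(u, vals)) / uu for p, vals in rel.items()}

size_left = residual_fraction_size(rel, slopes, u)
lam, pc1, trace = first_pc(rel)
pc1_left = 1.0 - lam / trace
# Normalized inner product between the slopes and the PC1 coefficients.
s_vec = [slopes[p] for p in rel]
s_norm = math.sqrt(sum(x * x for x in s_vec))
alignment = abs(sum(a * b for a, b in zip(s_vec, pc1))) / s_norm
print(f"left after size: {size_left:.3f}, after PC1: {pc1_left:.3f}, |cos|: {alignment:.3f}")
```

When noise is roughly independent across proteins, PC1 points almost along the slope vector, so the normalized inner product should be close to 1. This mirrors the agreement described for panel E.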
Removing the estimated effect of cell size reduces the covariance and the correlation coefficient between protein pairs. We illustrate this effect with one example protein pair.
E) Relationship between estimated slopes and the coefficients of the first principal component. Both quantities are very close to each other, indicating that the estimated slopes approximate the direction of maximum variance in the dataset.
F) Amount of variance left over after removing the first principal component (blue), the estimated effect of cell size from measured slopes (orange), the effect of H4 only (green), and the estimated effect of cell size from random slopes (red). The number of proteins included in the analysis (x-axis) was gradually increased based on protein absolute slopes. For example, if 50 proteins were included in the analysis, this set contains the 50 proteins with the highest absolute slopes. The maximum amount of variance removed by cell size is bounded by that removed by the first principal component (PC1, blue).

Supplementary information

Figure S1 - Multiple core histones can be used to estimate cell size.
The concentration of Histone H4 correlated with the concentration of other core histones in single cells. A regression line is plotted in dark blue with 95% confidence intervals. Pearson r value and its associated p-value are shown.

… tandem mass tags, so cell size was estimated using the relative concentration of MS2-level reporter ions for Histone H4. Each dot represents a proteome and its color indicates the relative H4 concentration. Correlation between the relative histone concentrations and PC1 and PC2. A regression line is plotted in dark blue with 95% confidence intervals. Pearson r value and its associated p-value are shown.
B) A Pearson correlation coefficient was calculated by correlating the relative concentration of each individual protein with a proxy for each cell's size (histone H4 concentration). The r value for each protein in the Specht et al. dataset is plotted against the Protein Slope value (Lanz et al., 2022). Histone H4 was excluded from the plot. Error bars for the binned data represent the 99% confidence interval of the mean. In Figure 1H, only the most abundant proteins are depicted; here, all proteins identified by Specht et al. are included.

Figure S4 - Metrics used to determine the number of reference proteins
The orange dot represents the number of reference proteins used in the main text. Relative change denotes the value for n reference proteins minus the value for n−1 reference proteins, divided by the value for n reference proteins.
A) Relative change in the estimated cell size distribution as more reference proteins are added.
B) Relative change in the estimated slope distribution as more reference proteins are added.
C) Normalized inner product between the slopes and the PC1 coefficients as a function of the number of reference proteins. Reference proteins were discarded from the dataset before computing this metric.
We chose 5 reference proteins because beyond this number, adding more produced only marginal differences.

Figure S5 - Impact of increasing number of reference proteins on single cell slopes
Each panel reports the slopes estimated with a given number of reference proteins. Orange dots denote the proteins used as reference and blue dots denote binned data and associated confidence intervals.

… Ilerten, I., Zhang, S., You, D. S., Marinov, G., McAlpine, P., Elias, J. E., and Skotheim, J. M. (2022) Increasing cell size remodels the proteome and promotes senescence. Mol Cell 82, 3255-3269 e3258.
12. Zatulovskiy, E., Lanz, M. C., Zhang, S., McCarthy, F., Elias, J. E., and Skotheim, J. M. (2022) Delineation of proteome changes driven by cell size and growth rate. Front Cell Dev Biol 10, 980721.