Proteomic and functional comparison between human induced and embryonic stem cells

Human induced pluripotent stem cells (hiPSCs) have great potential to be used as alternatives to embryonic stem cells (hESCs) in regenerative medicine and disease modelling, thereby avoiding many of the ethical issues arising from the use of embryo-derived cells. However, despite clear similarities between the two cell types, it is likely they are not identical. In this study, we characterise the proteomes of multiple hiPSC and hESC lines derived from independent donors. We find that while hESCs and hiPSCs express a near identical set of proteins, they show consistent quantitative differences in the expression levels of a wide subset of proteins. hiPSCs have increased total protein content, while maintaining a comparable cell cycle profile to hESCs. The proteomic data show hiPSCs have significantly increased abundance of vital cytoplasmic and mitochondrial proteins required to sustain high growth rates, including nutrient transporters and metabolic proteins, which correlated with phenotypic differences between hiPSCs and hESCs. Thus, higher levels of glutamine transporters correlated with increased glutamine uptake, while higher levels of proteins involved in lipid synthesis correlated with increased lipid droplet formation. Some of the biggest metabolic changes were seen in proteins involved in mitochondrial metabolism, with corresponding enhanced mitochondrial potential, shown experimentally using high-resolution respirometry. hiPSCs also produced higher levels of secreted proteins, including ECM components and growth factors, some with known tumorigenic properties, as well as proteins involved in the inhibition of the immune system. Our data indicate that reprogramming of human fibroblasts to iPSCs effectively restores protein expression in cell nuclei to a state comparable to hESCs, but does not similarly restore the profile of cytoplasmic and mitochondrial proteins, with consequences for cell phenotypes affecting growth and metabolism. 
The data improve understanding of the molecular differences between induced and embryonic stem cells, with implications for potential risks and benefits for their use in future disease modelling and therapeutic applications.

A mass spectrometry-based strategy, involving MS3-based, synchronous precursor selection 16 (SPS) tandem mass tagging (TMT) 17, was used to characterise the proteomes of independent sets of hESC and hiPSC lines derived from different donors within a single 10-plex. To optimise quantification accuracy, each sample was allocated to a specific isobaric tag in a way that minimised cross-population reporter ion interference (Fig. 1a), as previously described 18. In total, 8,491 protein groups (henceforth referred to as 'proteins') were detected at 1% FDR, with >99% overlap between the proteins detected from the hESC and hiPSC lines (Fig. 1b).
bioRxiv preprint; this version posted October 20, 2021; doi: https://doi.org/10.1101/2021.10.20.464767. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
To provide a quantitative comparison of the respective hESC and hiPSC proteomes, we focussed on analysing the 7,878 proteins that were detected with at least 2 unique and razor peptides. Protein copy numbers were estimated via the "proteomic ruler" 19, which revealed that both the hESC (Fig. 1c) and hiPSC (Fig. 1d) proteomes display a similar wide dynamic range, with estimated protein copy numbers extending from a median of less than 100 copies, to over 100 million copies per cell.
Furthermore, the composition of the respective hiPSC and hESC proteomes also appears very similar. Both populations display high expression levels of ribosomal proteins, protein chaperones and glycolytic enzymes (Fig. 1e&f), consistent with both being primed pluripotent stem cells, which are heavily dependent on glycolysis for energy generation 20.
It is only when the quantitative data are examined in more detail that differences between the cell types become apparent (Fig. 1g). A principal component analysis (PCA), based on the protein copy numbers, revealed a clear separation between the two stem cell populations within the main component of variation, which accounted for 69% of variance. The PCA clearly showed that the independent hiPSC lines were more similar to each other than to any of the hESC lines, and vice versa.

Protein content differences masked by data normalisation
To assess potential population-scale effects, we next compared the hESC and hiPSC proteomes using two different normalisation approaches, along with differential expression analysis. First, a concentration-based approach was used, as typically applied to proteomics datasets (see methods). It should be noted that, unlike protein copy number estimates, this normalisation strategy does not account for potential changes in either cell size, or total protein content, between the populations being compared. Using this concentration-based methodology (Fig. 2a), no major differences in protein expression were detected between the hESC and hiPSC lines, i.e. no significant changes were seen for ~95% of all proteins (see methods), consistent with previous reports 13. However, when the differential expression analysis was based on protein copy numbers instead (Fig. 2b), systematic differences between the two populations were distinguished. Thus, 20% (1,587/7,878) of all proteins detected showed over two-fold higher expression in hiPSCs than in hESCs (Fig. 2e, p-value < 0.01).
In contrast, only 22 proteins (0.3%) showed significantly lower expression levels in hiPSCs.
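The masking effect of concentration-based normalisation can be illustrated with a toy calculation. The sketch below is not the analysis pipeline used in this study; the protein names and copy numbers are made-up values, and the uniform 1.6-fold scaling is an assumption chosen only to show why a global increase disappears after median normalisation.

```python
from statistics import median

# Hypothetical copy numbers per cell for four proteins (illustrative values only).
hesc = {"P1": 1_000.0, "P2": 20_000.0, "P3": 500_000.0, "P4": 4_000_000.0}
# Assumption: every protein is 1.6-fold more abundant per cell in the hiPSC sample.
hipsc = {name: 1.6 * copies for name, copies in hesc.items()}


def median_normalise(sample):
    """Divide each value by the sample median (a concentration-like scale)."""
    centre = median(sample.values())
    return {name: value / centre for name, value in sample.items()}


def fold_changes(numerator, denominator):
    """Per-protein ratio of numerator to denominator."""
    return {name: numerator[name] / denominator[name] for name in numerator}


# Copy numbers keep the global scaling; median-normalised values lose it.
copy_number_fc = fold_changes(hipsc, hesc)
concentration_fc = fold_changes(median_normalise(hipsc), median_normalise(hesc))

for name in hesc:
    print(name, round(copy_number_fc[name], 3), round(concentration_fc[name], 3))
```

In this toy case every copy-number fold change is 1.6, while every median-normalised fold change is 1, because dividing by the sample median removes any factor shared by all proteins.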
Estimations of the total protein content for both populations, based upon the MS data, indicated that hiPSCs have a median increase of 57% in total protein content, compared to hESCs (Fig. 2c). To validate this observation, an independent assay (EZQ™ assay; see methods), was used to measure the total protein yield from similar numbers of freshly grown hiPSC and hESC cells. From these experiments, the calculated protein amount per million cells was 74% higher (Fig. 2d; p-value=0.0018) in hiPSC cells, relative to hESCs. Next, to check if these differences in total protein content reflected differences in cell cycle distributions between the two populations, FACS analyses were performed.
This showed no significant differences in the percentage of cells at each cell cycle stage between the hiPSC and hESC lines (Fig. EV1). We conclude that there is a consistent difference in total protein expression between hiPSCs and hESCs, independent of cell cycle effects.
To explore further the similarities and differences between the respective hiPSC and hESC proteomes, we next compared Spearman and Lin's concordance correlation coefficients. The Spearman rank correlation was used to compare protein ranking based on the estimated copy numbers in both populations (Fig. 2e). This showed a correlation coefficient of 0.99, demonstrating that the protein expression profiles are nearly identical between the two populations. However, Lin's correlation coefficient, which measures the degree of agreement between two populations, essentially evaluating if the results are identical 21, was notably lower, at 0.91 (Fig. 2e).
These data indicate that hiPSCs and hESCs have very similar rank profiles, i.e., the most abundant proteins are essentially the same in both populations. However, there are nonetheless quantitative differences between the hiPSC and hESC proteomes.
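The contrast between the two coefficients can be reproduced on toy data. The following is a minimal sketch in plain Python, not the code used for Fig. 2e; the log-scale values and the constant offset of 0.7 are invented for illustration. A uniform shift preserves the ranking perfectly, so the Spearman coefficient stays at 1, whereas Lin's concordance coefficient is penalised by the difference in means.

```python
from statistics import fmean


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = fmean(x), fmean(y)
    n = len(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical log10 copy numbers; the second sample is shifted by a constant.
hesc_log = [2.0, 3.5, 4.1, 5.8, 7.2]
hipsc_log = [v + 0.7 for v in hesc_log]

rho = spearman(hesc_log, hipsc_log)
ccc = lin_ccc(hesc_log, hipsc_log)
print(round(rho, 3), round(ccc, 3))
```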

Subcellular Proteome scaling
Having detected many proteins that were significantly increased in expression in hiPSCs compared with hESCs, we next checked specifically whether this included key primed pluripotency markers (Fig. 3a). The data showed no significant differences (at a p-value threshold of 0.01) in expression of SOX2, NANOG (detected as NANOGP8), and OCT4 (detected as POU5F1), across the independent hiPSC and hESC lines (Fig. 3a).
To test whether the protein abundance difference was related to phenotypic variations between hiPSCs and hESCs in one or more specific subcellular compartments, we performed an overrepresentation analysis (ORA) of cellular compartments using WebGestalt 22. The analysis focussed on all the proteins showing significantly increased expression in hiPSCs, compared to hESCs (see methods), and it showed the highest enrichment for organelle and plasma membrane related localisations, proteins localised to the Golgi apparatus and proteins in preribosomes (Fig. 3b).
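For readers unfamiliar with ORA, the underlying idea can be sketched with a one-sided hypergeometric test, which is a common formulation of overrepresentation. This is an illustration only: the exact statistics and settings applied by WebGestalt in this study are not described here, and the counts in the example are hypothetical.

```python
from math import comb


def ora_p_value(background, in_category, selected, overlap):
    """P(X >= overlap) when `selected` proteins are drawn without replacement
    from `background` proteins, of which `in_category` carry the annotation."""
    upper = min(in_category, selected)
    total = comb(background, selected)
    tail = sum(
        comb(in_category, k) * comb(background - in_category, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrichment_ratio(background, in_category, selected, overlap):
    """Observed overlap divided by the overlap expected by chance."""
    expected = selected * in_category / background
    return overlap / expected


# Hypothetical counts: 100 quantified proteins, 10 annotated to a compartment,
# 20 proteins increased in expression, 6 of which carry the annotation.
p = ora_p_value(100, 10, 20, 6)
ratio = enrichment_ratio(100, 10, 20, 6)
print(p, ratio)
```

In practice the p-values from many annotation terms would also need multiple-testing correction, which this sketch omits.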
To characterise the subcellular compartment protein scaling, total copy numbers for all proteins in each compartment were compared between the hiPSC and hESC lines. Interestingly, the data showed that nuclear proteins had the highest similarities between the cell types, with chromatin-associated proteins being virtually unchanged between hiPSCs and hESCs (Fig. 3c). However, the data showed higher than median fold changes for mitochondrial, secreted, plasma membrane, Golgi, lysosome and endosome associated proteins.
We next focussed on the only non-membrane related 'compartment' that was enriched in the ORA, the pre-ribosome. We therefore examined proteins directly related to ribosome subunit biogenesis and associated processes, such as pre-rRNA processing (Fig. 3d). Proteins linked with ribosome subunit biogenesis, as defined by KEGG 23, showed considerable increases in expression in hiPSC, as compared with hESC lines.
For example, RNA Polymerase I subunits (responsible for transcription of rRNA genes), showed the highest increase in hiPSCs. In contrast, subunits unique to RNA Polymerase II, which is responsible for transcription of mRNA, snRNA and microRNA genes, showed a more modest increase (Fig. 3e). And finally, the end point of ribosome biogenesis, the ribosomal proteins themselves, also had significantly higher expression in hiPSCs compared to hESCs (Fig. 3f). These specific observations can account, at least in part, for the higher protein content detected in hiPSCs.

Upscaling of mitochondrial translation and ribosomes
Next, another ORA was performed, this time focused on biological processes, rather than cellular compartments. This revealed a clear enrichment in specific terms relating predominantly to mitochondrial translation, transmembrane transport, extracellular structure and rRNA metabolic process (Fig. 4a). The highest enrichment score was seen for proteins involved in mitochondrial translation, with nearly all proteins in the pathway showing significantly increased expression (p-value <0.01) in hiPSCs. Specifically, all proteins involved in mitochondrial pre-rRNA processing, translation initiation and translation elongation, together with 66% of the proteins involved in translation termination, had higher copy numbers per cell in hiPSCs (Fig. 4b). Mitochondrial ribosome proteins were also increased in expression by 74% (Fig. 4c), with a significantly altered ratio of small to large subunit proteins (Fig. 4d).
Although the members of the mitochondrial family of SLC transporters vary considerably in total abundance (Fig. 5b), all 26 showed significantly increased expression in hiPSCs. This included SLC25A6, which is the most abundant member of the family and represents the main ATP/ADP transporter for both hiPSCs and hESCs. Notably, SLC25A6 was present in almost 11 million copies per cell in hiPSCs, representing an increase in expression of ~83% over hESCs.
Furthermore, 11/12 of the main cellular amino acid transporters located in the plasma membrane also showed significantly increased expression in hiPSCs. This included the most abundant transporter SLC3A2 (Fig. 5c), which is a subunit of several heterodimeric amino acid transporter complexes, whose substrates vary according to the specific subunits within the complex. We note that the 3 main glutamine transporters 25,26, i.e., SLC38A1, SLC38A2 and SLC1A5, were all increased by >2-fold in hiPSCs (Fig. 5d). This suggested that hiPSCs have higher potential capacity for glutamine transport, compared to hESCs.
To test this hypothesis experimentally, the uptake of radiolabelled glutamine was measured for both hiPSCs and hESCs (see methods). These data showed that hiPSCs had a median 93% higher uptake of glutamine, compared to hESCs (Fig. 5e). This experiment provides independent confirmation of the functional significance of the quantitative protein expression data determined by MS analysis and supports one of the key conclusions concerning phenotypic and functional differences between hiPSC and hESC lines.
To explore the potential consequences of the increased glutamine uptake in hiPSCs, we focused on the glutamine catabolism pathway. Both the GLS and GLUD1 proteins show over two-fold higher expression in hiPSCs, compared to hESCs, with a p-value <0.01. Glutaminolysis has been shown to be vital for human PSCs as it can provide ATP via the TCA 27, as well as the aforementioned biosynthetic precursors required to sustain growth and proliferation. Our data show most TCA-related enzymes are increased in expression >1.5-fold in hiPSCs, compared with hESCs, with the biggest change seen for the isocitrate dehydrogenase 3 and succinate dehydrogenase complexes (Fig. 5f).

Secreted proteins and extracellular matrix
Amongst secreted proteins and proteins related to extracellular matrix organisation, multiple growth factors of relevance to primed pluripotent stem cells showed significantly increased expression in hiPSCs, including FGF2, FGF1, TGFB1 and NODAL. FGF and Activin/Nodal, which have important roles in differentiating cells, are also vital components of signalling pathways that maintain pluripotency within human primed stem cells 28-30. Both show >3-fold higher expression in hiPSCs, compared to hESCs (Fig. 6a).
A further 8 growth factors were identified as having significantly increased expression in hiPSCs, including MYDGF, which is reported to promote cardiac protection 31, and MDK, which has been reported to promote inflammation by recruiting macrophages and neutrophils 32. The data also highlighted 6 protease inhibitors, some linked to thrombosis, like TFPI 33, along with 6 serine proteases and 7 metalloproteinases (Fig. 6b).
Focussing then on the extracellular matrix (ECM) proteins and those known to interact with its components, hiPSCs showed the highest increase for type IV collagen (COL4A1 and COL4A2), along with alpha integrins (ITGA2 and ITGAV) (Fig. 6c). Increased expression was also seen for the most abundant laminins (LAMA1,

Histone variants
Very few proteins (<1%; 52/7,878) showed decreased expression (p-value<0.01) within hiPSCs, in comparison with hESCs. An ORA showed that the proteins decreased in abundance were enriched for GO terms related to DNA recombination, nucleosome positioning and chromatin silencing (Fig. 7a). Notably, this included four Histone H1 proteins. Histone H1 proteins are linker histones, which do not form part of the core histone octamer, but instead sit on top of the nucleosome and bind DNA entry and exit sites (Fig. 7b). They have been linked with influencing nucleosomal repeat length 36 and stabilising chromatin structures 37 .
Our data show that the most abundant variant in hESCs, HIST1H1E, which is present at almost 25 million copies per cell, is significantly decreased in expression in hiPSCs to just over 7.5 million copies per cell. A similar case was also seen for HIST1H1C, HIST1H1D and H1FX, which were all decreased in expression within hiPSCs compared to hESCs (Fig. 7c).
These specific histone variants have high sequence similarity; hence some peptide sequences will be unique to each protein, but some will be shared by many.
HIST1H1E, HIST1H1D and HIST1H1C have 75% sequence identity. Thus, while unique peptides were identified for all 3 proteins, a pool of shared peptides was also identified that could belong to all 3 of them (Fig. EV2). Due to limitations in how the Andromeda algorithm 38 assigns peptides to a protein, all of the shared peptides were assigned to HIST1H1E, potentially distorting the abundance of the 3 histones by overestimating HIST1H1E and underestimating HIST1H1D and HIST1H1C. Despite the potential issues in reliably estimating total abundance, the data for both the unique and shared peptides are consistent and support the conclusion that all three of these H1 variant proteins were reduced in expression in hiPSCs.
[Figure: H1 variant peptides (H1FX, HIST1H1C, HIST1H1D, HIST1H1E), showing tryptic unique peptides within SwissProt, unique peptides within this dataset, and shared peptides assigned to HIST1H1E.]
This reduced expression seen in H1 histones was not seen for protein members of the other histone families. Evaluating the estimated protein copy numbers across all histones, for example, showed no significant differences between hiPSCs and hESCs (Fig. 7d). However, there was a difference in expression for H2 histones, with hiPSCs showing significantly (p-value=0.003) higher expression compared to hESCs (Fig. 7e). Moreover, it is particularly interesting that the core H2 histones, HIST1H2BK, HIST1H2AC and HIST1H2BJ, were unchanged in expression between the hiPSC and hESC populations.
Rather, altered expression was seen for the H2 variants, H2AFV, H2AFY, H2AFY2, which were all significantly (p-value<0.01) increased in expression within hiPSCs compared to hESCs (Fig. 7f). As with H1 histones, these H2 variants share similar sequences. For example, H2AFY and H2AFY2 have 68% sequence identity. Nonetheless, both the shared and unique peptides display congruent behaviour, with all showing increased expression in hiPSCs (Fig. EV3).

Discussion
Induced pluripotent stem cells can provide vital models for clinical research and future therapies, which makes understanding their similarities and any specific differences with embryo-derived human stem cells all the more important. Our data have highlighted that while multiple, independent hESC and hiPSC lines express a near identical set of proteins, with similar abundance ranks, they also display important quantitative differences in copy numbers, with over 20% of all quantified proteins significantly increased (fold change>2 and p-value <0.01) in expression within hiPSCs, compared to hESCs. Consequently, estimation of the total protein content per cell, as calculated from the MS analysis, showed that hiPSCs had a median increase of ~60% when compared to the hESCs. The conclusion that hiPSCs have a higher protein expression level than hESCs was subsequently confirmed using an orthogonal EZQ assay, independent of the MS data, which indicated ~75% higher total protein levels in hiPSCs.
An important technical point that emerges from this study is that the normalisation approach used to analyse the data has to be carefully considered. When a standard median normalisation (concentration-based approach) is used instead of the proteomic ruler 19, the difference in total protein content between the cell types, involving the increased expression of thousands of proteins, is not apparent. This results in an erroneous conclusion that there is little to no change in protein expression between hiPSCs and hESCs, while the MS data analysed for protein copy number and validated via independent, non-MS methods show that protein levels are significantly higher across all of the independent hiPSC lines.
The copy number data showed that the increase in total protein levels in hiPSCs resulted specifically from enhanced expression of a subset of protein families. One of the most prominent protein families showing increased expression in hiPSCs was the amino acid transporters, which are known to play important roles in fuelling cell growth and protein synthesis 39. The largest increase in transporter expression was detected in the 3 main glutamine transporters, i.e., SLC1A5, SLC38A1 and SLC38A2. The functional significance of this increased expression was shown by performing a radiolabelled glutamine uptake assay, which revealed that hiPSCs had ~94% higher glutamine uptake. Furthermore, there was also increased expression of proteins involved in the downstream glutaminolysis pathway, including GLS and GLUD1. These two proteins are involved in the conversion of glutamine to glutamate and subsequently to alpha-ketoglutarate, an important intermediate for the TCA pathway, which in turn can provide additional energy required to fuel high protein synthesis rates in hiPSCs. It has been reported that when cells preferentially use the glycolytic pathway, as seen in both primed pluripotent stem cells and many transformed tumour cells, there is increased demand for biosynthetic precursors and NADPH, which can be supplied by glutaminolysis 40-42. The higher uptake of glutamine is potentially fuelling the increased protein mass seen in hiPSCs.
The data also showed that specific subsets of mitochondrial proteins were increased in hiPSCs. Virtually all proteins involved in mitochondrial rRNA processing, translation initiation, elongation and termination of the mitochondrial genome-encoded proteins, were significantly increased in expression. We also detected increased expression of proteins encoded in the mitochondrial genome, which are all hydrophobic membrane proteins that are components of the electron transport chain (ETC) 43. Similarly, all ETC complexes were significantly increased in expression in hiPSCs, with the highest fold change seen for complex II, which is also a part of the TCA. The mitochondrial protein differences extended also to mitochondrial transporter proteins, all of which were significantly increased in expression, including SLC25A6, the main ATP/ADP transporter within both cell types.
Differences between the mitochondria in hiPSCs and hESCs have been previously reported. For example, it has been shown that hESCs have globular mitochondria with few cristae, while hiPSCs show a mixture of the phenotypes with some globular and some elongated mitochondria, similar to somatic cells 44 . Furthermore, it has been shown that hiPSCs have higher oxygen consumption rate and reserve capacity 44 , which is congruent with our proteomic data, as we also see increased expression of the OXPHOS related machinery and transporters.
The other major class of proteins showing increased expression in hiPSCs relative to hESCs was secreted and ECM-related proteins. These proteins can exert a wide range of effects beyond the cell that produced them. For example, MDK has been reported to promote inflammation by recruiting macrophages and neutrophils 32, while increased expression of SERPINE1 (PAI-1) is linked to a higher risk of deep vein thrombosis 45 and is a strong risk factor for stroke in the elderly 46. Furthermore, FGF2 overexpression has been linked with breast cancer 47, gastric cancer 48 and gliomas 49.
FGF2 is of relevance to pluripotent cells as it has been shown that sustained increased exposure to FGF2 better promotes ERK activation in human primed pluripotent cells 50 . ERK2, which also is significantly increased in expression in hiPSCs, has been shown to promote protein synthesis via multiple mechanisms, including through mTORC1 activation 51 , eIF4E phosphorylation 52 and PDCD4 inhibition via RSK1/2 53 . The increased expression of FGF2 could be a feedforward loop driving/sustaining growth in hiPSCs.
The only family of proteins which were expressed at significantly lower levels in hiPSCs, as compared with hESCs, were H1 histone variants. H1 histones are often referred to as 'linker histones'. They do not form nucleosomes directly, but bind to nucleosomes and are reported to compact chromatin 54. It has been reported that upon differentiation of hESCs, the expression levels of the H1 histone variants are increased 55. In other cell types it has also been shown that changes in H1 histone variants are linked with modified differentiation potential 56. It therefore would be of interest in future to study if the variations in expression of these histones affect the differentiation potential of hiPSCs into different lineages, compared to hESCs.
In summary, our data show that hiPSCs and hESCs, despite their clear similarities, are not identical. These data help define the specific differences between these cells at the protein level and will assist researchers in developing strategies to mitigate for these differences as hiPSCs continue to be used in clinical applications and as disease models.
interpret the data. A.I.L and D.A.C supervised the project and helped to interpret the data. The paper was written by A.J.B and A.I.L and edited by all authors.

Declaration of interests
E.G now works for Boehringer Ingelheim.
Cells were routinely passaged twice a week as single cells using TrypLE select (Life Technologies) and replated in TESR medium that was further supplemented with the Rho kinase inhibitor Y27632 (Tocris, 10 μM) to enhance single cell survival. Twenty-four hours after replating, Y27632 was removed from the culture medium. For proteomic analyses, cells were plated in 100 mm geltrex coated dishes at a density of 5x10^4 cells cm^-2 and allowed to grow for 3 days until confluent, with daily medium changes.

Protein extraction
Cell pellets were resuspended in 300 µL extraction buffer (4% SDS in 100 mM triethylammonium bicarbonate (TEAB), phosphatase inhibitors (PhosSTOP™, Roche)). Samples were boiled (15 min, 95 °C, 350 rpm) and sonicated for 30 cycles

High pH reversed phase fractionation
TMT labelled samples were fractionated using off-line high pH reversed phase chromatography. Dried samples were resuspended in 5% formic acid and loaded onto a 4.6 x 250 mm XBridge BEH130 C18 column (3.5 µm, 130 Å; Waters).
Samples were separated on a Dionex Ultimate 3000 HPLC system with a flow rate of 1 mL/min. Solvents used were water (A), ACN (B) and 100 mM ammonium formate pH 9 (C). While solvent C was kept constant at 10%, solvent B started at 5% for 3 min, increased to 21.5% in 2 min, 48.8% in 11 min and 90% in 1 min, was kept at 90% for a further 5 min, followed by returning to starting conditions and re-equilibration for 8 min. Peptides were separated into 48 fractions, which were concatenated into

LC-MS analysis
with a mass resolution of 120,000, an AGC target of 4x10^5 ions and a maximum injection time of 50 ms. Precursor ions with charges between 2 and 7 and a minimum intensity of 5x10^3 were selected with an isolation window of m/z 1.2 for fragmentation using collision-induced dissociation in the ion trap with 35% collision energy. The ion trap scan rate was set to "rapid". The AGC target was set to 1x10^4 ions with a maximum injection time of 50 ms and a dynamic exclusion of 60 s. During the MS3 analysis, for more accurate TMT quantification, 5 fragment ions were co-isolated using synchronous precursor selection in a window of m/z 2 and further fragmented with an HCD collision energy of 65%. The fragments were then analysed in the orbitrap with a resolution of 50,000. The AGC target was set to 5x10^4 ions and the maximum injection time was 105 ms.

Radiolabelled glutamine uptake (protocol was adapted from 60)
Two hiPSC lines (wibj_2 and oaqd_3) were compared to two hESC lines (SA121 and SA181), with 3 technical replicates per line.
Both hiPSCs and hESCs were plated in 6-well plates 2 days before the transport assay (5 × 10⁴ cells/cm²; this gives 1 × 10⁶ cells/well on "uptake day"). The cell growth media was carefully aspirated so as not to disturb the adherent monolayer of cells.

Data filtering
All protein groups identified with fewer than 2 razor or unique peptides, or labelled as 'Contaminant', 'Reverse' or 'Only identified by site', were removed from the analysis.
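The filtering rule can be illustrated with a minimal Python sketch. The dictionary keys (`razor_unique_peptides`, `contaminant`, `reverse`, `only_identified_by_site`) and the example protein groups are hypothetical and chosen for illustration only; they are not the column names of the search engine output used in this study.

```python
# Hypothetical flag names; real search-engine output uses its own column names.
FLAGS = ("contaminant", "reverse", "only_identified_by_site")

def keep_protein_group(group):
    """Return True if a protein group passes both filters described in the text."""
    enough_peptides = group["razor_unique_peptides"] >= 2
    flagged = any(group.get(flag, False) for flag in FLAGS)
    return enough_peptides and not flagged

# Invented example data.
protein_groups = [
    {"id": "A", "razor_unique_peptides": 5},
    {"id": "B", "razor_unique_peptides": 1},
    {"id": "C", "razor_unique_peptides": 7, "reverse": True},
    {"id": "D", "razor_unique_peptides": 2, "contaminant": True},
    {"id": "E", "razor_unique_peptides": 2},
]

filtered = [g for g in protein_groups if keep_protein_group(g)]
print([g["id"] for g in filtered])  # ['A', 'E']
```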

Copy number calculations
Protein copy numbers were estimated following the "proteomic ruler" method 19 but adapted to work with TMT MS3 data. The summed MS1 intensities were allocated to the different experimental conditions according to their fractional MS3 reporter intensities.
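As an illustration of the allocation step, the following Python sketch splits a summed MS1 intensity across TMT channels in proportion to their MS3 reporter intensities, and then applies a proteomic-ruler-style scaling to the histone signal. This is a sketch under stated assumptions, not the pipeline used here: the function names and channel labels are hypothetical, and the DNA mass per cell (about 6.5 pg for a diploid human cell) is an assumed value that is not given in the text.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
DNA_MASS_PER_CELL_G = 6.5e-12   # ASSUMPTION: ~6.5 pg DNA per diploid human cell

def allocate_ms1(ms1_intensity, reporter_intensities):
    """Split a summed MS1 intensity across channels by fractional reporter intensity."""
    total = sum(reporter_intensities.values())
    if total == 0:
        return {channel: 0.0 for channel in reporter_intensities}
    return {channel: ms1_intensity * value / total
            for channel, value in reporter_intensities.items()}

def ruler_copy_number(protein_intensity, histone_intensity, mw_da):
    """Proteomic-ruler-style estimate of copies per cell.

    Both intensities must come from the same channel; histone_intensity is the
    summed allocated intensity of all histones in that channel.
    """
    return (protein_intensity / histone_intensity) * DNA_MASS_PER_CELL_G * AVOGADRO / mw_da

# Invented example: one protein quantified in two hypothetical channels.
allocated = allocate_ms1(1.0e9, {"hESC_1": 100.0, "hiPSC_1": 300.0})
print(allocated)

copies = ruler_copy_number(allocated["hiPSC_1"], histone_intensity=1.5e10, mw_da=50_000)
print(f"{copies:.3g} copies per cell")
```

Because the allocation is proportional, the per-channel intensities always sum back to the original MS1 intensity.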

Protein content estimations
The protein content was estimated as CN × MW, converted from daltons to picograms, where CN is the protein copy number and MW is the protein molecular weight (in Da).
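A short Python sketch of this conversion follows. It uses the fact that 1 Da is approximately 1.66054 × 10⁻²⁴ g, that is 1.66054 × 10⁻¹² pg. The function names and the example copy numbers are hypothetical.

```python
DA_TO_PG = 1.66053906660e-12  # 1 Da = 1.66053906660e-24 g = 1.66053906660e-12 pg

def protein_mass_pg(copy_number, mw_da):
    """Mass in picograms contributed by one protein: CN x MW, converted from Da to pg."""
    return copy_number * mw_da * DA_TO_PG

def total_protein_content_pg(proteins):
    """Sum the per-protein masses over (copy_number, mw_da) pairs."""
    return sum(protein_mass_pg(cn, mw) for cn, mw in proteins)

# Invented example: 1e6 copies of a 50 kDa protein contribute roughly 0.083 pg.
print(round(protein_mass_pg(1e6, 50_000), 4))
print(round(total_protein_content_pg([(1e6, 50_000), (2e6, 25_000)]), 4))
```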

28S to 39S ratios
For each hiPSC and hESC line, the sum of the estimated copy numbers of all subunits of the 28S complex was divided by the sum of the estimated copy numbers of all 39S subunits.
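The ratio for a single cell line can be sketched in Python as follows. The function name, the short subunit lists and the copy numbers are hypothetical placeholders; a real analysis would use the complete subunit lists of both complexes.

```python
def subunit_ratio(copy_numbers, small_subunits, large_subunits):
    """Summed 28S copy numbers divided by summed 39S copy numbers for one cell line.

    Subunits absent from copy_numbers (not quantified) are skipped.
    """
    small = sum(copy_numbers[p] for p in small_subunits if p in copy_numbers)
    large = sum(copy_numbers[p] for p in large_subunits if p in copy_numbers)
    return small / large

# Invented copy numbers for a handful of subunits in one line.
copy_numbers = {"MRPS2": 4.0e4, "MRPS5": 6.0e4, "MRPL1": 5.0e4, "MRPL3": 7.5e4}
ratio = subunit_ratio(copy_numbers,
                      small_subunits=["MRPS2", "MRPS5"],
                      large_subunits=["MRPL1", "MRPL3"])
print(ratio)  # 0.8
```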

Differential expression analysis
Fold changes and p-values were calculated in R. For individual proteins, the p-values were calculated with the Bioconductor package LIMMA 63 version 3.7. The q-values provided were generated in R using the "qvalue" package version 2.10.0. P-values for protein families and protein complexes were calculated using Welch's t-test.

Subcellular localisation data
The subcellular localisation analysis was performed using the subcellular location dataset, version 20.1, obtained from the Human Protein Atlas 64.

hiPSC vs hESC overrepresentation analysis
All overrepresentation analyses were performed using WebGestalt. The first analysis selected proteins with a log2 fold change greater than 1 and a p-value lower than 0.01. The second analysis selected proteins whose fold change was lower than the median minus one standard deviation (0.195) and whose p-value was lower than 0.01. Both analyses used all identified proteins with 2 or more razor and unique peptides as a background and required an FDR lower than 0.05.
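The two selection rules can be sketched in Python as below. This is an illustration under assumptions rather than the procedure actually run: the field names and example values are hypothetical, the cutoff for the second list is recomputed from the example data on the log2 scale, and the sample standard deviation is used, none of which is specified in the text. The enrichment testing itself, performed in WebGestalt, is not reproduced.

```python
from statistics import median, stdev

def select_increased(proteins, min_log2_fc=1.0, max_p=0.01):
    """First list: log2 fold change above the threshold and p-value below max_p."""
    return [p["id"] for p in proteins
            if p["log2_fc"] > min_log2_fc and p["p_value"] < max_p]

def select_decreased(proteins, max_p=0.01):
    """Second list: fold change below (median - 1 SD) and p-value below max_p.

    ASSUMPTION: cutoff computed on the log2 scale with the sample standard deviation.
    """
    values = [p["log2_fc"] for p in proteins]
    cutoff = median(values) - stdev(values)
    return [p["id"] for p in proteins
            if p["log2_fc"] < cutoff and p["p_value"] < max_p]

# Invented example data.
proteins = [
    {"id": "P1", "log2_fc": 2.0, "p_value": 0.001},
    {"id": "P2", "log2_fc": 1.5, "p_value": 0.05},
    {"id": "P3", "log2_fc": 0.3, "p_value": 0.2},
    {"id": "P4", "log2_fc": 0.4, "p_value": 0.5},
    {"id": "P5", "log2_fc": -1.0, "p_value": 0.002},
    {"id": "P6", "log2_fc": 0.5, "p_value": 0.3},
    {"id": "P7", "log2_fc": -0.9, "p_value": 0.03},
]

print(select_increased(proteins))  # ['P1']
print(select_decreased(proteins))  # ['P5']
```

In this example P2 and P7 pass the fold-change cutoffs but are excluded by the p-value threshold.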