Screening for variable drug responses using human iPSC cohorts

ABSTRACT
We have used a cohort of human induced pluripotent stem cell (hiPSC) lines to develop a laboratory-based drug screening platform to predict variable drug responses of potential clinical relevance. Our approach is based on the findings that hiPSC lines reflect the genetic identity of the donor and that pluripotent hiPSC lines express a broad repertoire of gene transcripts and proteins. We demonstrate that a cohort of hiPSC lines from different donors can be screened efficiently in their pluripotent state using high-throughput cell painting assays, allowing detection of variable phenotypic responses to a wide range of clinically approved drugs, across multiple disease areas. Furthermore, we provide information on mechanisms of drug-cell interactions underlying the observed variable responses by using quantitative proteomic analysis to compare sets of hiPSC lines that had been stratified objectively using cell painting data. We propose that information derived from comparative drug screening using curated libraries of hiPSC lines can help to increase the success rate of drug development pipelines and improve the delivery of safe new drugs suitable for a broad range of genetic backgrounds and gender diversity within human populations.

INTRODUCTION
Most current drug development pipelines involve an initial laboratory-based research phase, in which disease-relevant molecular targets (usually proteins) are identified and then either small molecules, or biological effectors (e.g. antibodies), are characterised that specifically bind to, or otherwise modulate, the targets. Promising drug candidates are then progressed to clinical trials, where they must be evaluated for efficacy and possible toxicity in human patients before they can be certified for use by regulatory authorities. Unfortunately, these pipelines often have very high failure rates, with 80% or more of drug candidates in major disease areas failing to pass clinical trials (1). The high failure rates at late stages of drug development are a major factor in the high cost of bringing safe and effective new drugs to market.
A significant limitation with drug development pipelines is that they typically do not take into account effects that may disproportionately affect the suitability of drugs for use with patient groups from different sexes and/or genetic backgrounds. Variable drug responses resulting from natural variations in human biology have been cited as one of the major reasons for high failure rates of drug candidates in clinical trials (2).
Fundamentally, drug development pipelines that map drugs to targets in the laboratory do not account for normal population-level differences between individuals.
There are now many examples of drugs used in oncology (3), anti-psychotics (4), cholesterol-lowering medications (5,6) and others (7) where variable efficacy and toxicity are well-documented and included in standard dosing guidelines. In many cases, differential responses result from epistatic interactions and have been mapped to genetic variants affecting the expression levels and/or activity of, for example, metabolic enzymes, other proteins acting in target pathways, or membrane transporters. A key gap in the drug development pipeline, therefore, is the lack of assays that report on the impact of population-level phenotypic variation on drug-cell interactions during the pre-clinical phases of drug development.
Several projects have now established cohorts of human induced pluripotent stem cell (hiPSC) lines from multiple donors using standardised protocols (8,9). In particular, the Human Induced Pluripotent Stem Cell Initiative (HipSci, https://www.hipsci.org) has established a library of >700 cell lines generated from over 300 individuals, including healthy male and female donors of varied ages, as well as patient cohorts with known genetic disorders (10). The HipSci cell lines were all reprogrammed from human skin fibroblasts, following transduction of the pluripotency transcription factors using a Sendai virus vector (10). Many of these cell lines have been subjected to mRNA and protein expression analysis and all these data are publicly available. Interestingly, comparison of mRNA and protein expression patterns in the HipSci lines showed that mRNA and protein levels in separate hiPSC lines from the same individual are more similar to each other than to lines derived from different donors, suggesting that the biomolecular states of the HipSci lines are strongly defined by the genetic identity of the donor (10,11). The HipSci cohort therefore provides a tractable system for measuring how human genetic variation affects cellular phenotypes, potentially including variable drug responses.
The ability of hiPSCs to be converted into defined cell types through standardised differentiation protocols is frequently used for experimental manipulation (12). However, the pluripotent state of hiPSCs provides a significant advantage for drug screening.
Several studies have measured the range of gene transcript and protein expression in hiPSCs and embryonic stem cells (ESCs) (11,13). In general, human ESCs and iPSCs express a broader repertoire of gene transcripts and proteins than terminally differentiated primary cells and tissues, or most tumour-derived transformed cell lines.
For example, iPSC lines express ~2.5-fold more receptor tyrosine kinases than many primary cells or tumour cell lines. For the majority of chromosomes, >50% of their known protein coding genes are expressed in iPSCs ((11) and references therein). This broad molecular expression profile, including many pathways targeted by clinical therapeutics, makes the pluripotent state of hiPSCs an attractive foundation for compound and drug screening.
In this report, we describe the establishment and characterisation of a laboratory-based drug screening platform that compares drug responses between multiple hiPSC lines in the pluripotent state. The platform combines a cell painting assay, advanced analytics and mass-spectrometry (MS)-based proteomic profiling to detect and explain variable drug responses. We validate the use of this screening platform using a wide variety of clinically-approved drugs that target diverse physiological pathways and disease areas.

Establishing Pluripotency in hiPSC Cohorts in High-Throughput Format
To test whether we could detect and characterize variable drug responses across a cohort of hiPSC lines from different donors, the high-throughput cell painting assay (14) was first adapted to pluripotent hiPSC lines. We reasoned that we could use an image-based cell painting assay to detect variable responses and then employ quantitative MS-based proteomics to identify potential mechanisms causing variable response (see Figure 1A). We first established whether multiple hiPSC lines could be cultured in the pluripotent state in 384 multi-well plates, suitable for high content imaging and analysis.
Twenty-eight independent hiPSC lines derived from different donors (Suppl Table 1) were propagated in mTeSR medium, plated at 3 × 10⁴ cells/cm² in 384 multi-well plates for 72 hrs and then fixed and analysed by immunofluorescence to detect the established markers of pluripotency, i.e., Oct-4, Sox-2, Nanog and TRA-1-81 (Supp Figure 1A). All four markers showed strong, homogeneous staining.
We conclude, first, that multiple hiPSC lines can be cultured successfully in multi-well format suitable for large-scale, compound profiling by image analysis and second, that the panels of hiPSCs can be maintained in a pluripotent state throughout the 48-72 hours required for conducting the screening protocol.

Using Cell Painting to Profile Variable Drug Responses
Having characterised the growth conditions in 384 multi-well plates, the same set of 28 independent hiPSC lines from different donors (hereafter, "donors") were grown in 384 multi-well plates for 24 hrs (see Methods for details), then treated either with a library of FDA-approved drugs, or with DMSO carrier, each at a single concentration (Suppl Table 2). After incubation for a further 24 hrs in the presence of the drugs, the cells were fixed and stained with cell painting markers (see Materials and Methods). Images were recorded in an automated, plate-based imager, then imported into OMERO (9). Images were segmented with respect to the 'nuclei', 'cytoplasm', 'Golgi' and 'protrusions', using a custom CellProfiler pipeline (15). For each drug-donor combination, image features were calculated, using 871 separate feature measurements from a total of 6,000-8,000 cells per donor-drug pair. After Robust Z' normalisation, all measured features were scaled as the number of standard deviations away from DMSO for each of the hiPSC lines (14). Figure 1B shows a heatmap of the measured features of 28 donors, comparing treatment with 4 different drugs, i.e., atorvastatin, simvastatin, rapamycin and afatinib. The heatmap shows that each drug has characteristic feature patterns, but that different donors show varied magnitudes of response, visible as either darker, or lighter horizontal stripes, respectively (Figure 1B). The feature patterns for atorvastatin and simvastatin are similar, consistent with them having the same protein target, i.e., HMG CoA reductase. Interestingly, atorvastatin, which has a higher target affinity, also shows a stronger response than simvastatin in this assay.
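As an illustration of the normalisation step described above, the following Python sketch scores one feature relative to the DMSO controls of the same line. It assumes that the robust Z score is built from the median and the median absolute deviation (MAD) of the DMSO wells, with the conventional 1.4826 scaling factor; the function name and the example values are hypothetical and are not taken from the study's pipeline.

```python
from statistics import median

# Scales the MAD so that it matches the standard deviation of a normal distribution.
MAD_TO_SD = 1.4826


def robust_z(values, dmso_values):
    """Score each value as the number of robust SDs away from the DMSO median."""
    centre = median(dmso_values)
    mad = median(abs(v - centre) for v in dmso_values)
    if mad == 0:
        raise ValueError("DMSO controls have zero spread for this feature")
    scale = MAD_TO_SD * mad
    return [(v - centre) / scale for v in values]


# Hypothetical measurements of one feature for one hiPSC line (illustrative only).
dmso_wells = [10.0, 11.0, 9.0, 10.5, 9.5]
drug_wells = [13.0, 10.0]
scores = robust_z(drug_wells, dmso_wells)
```

A value equal to the DMSO median scores zero, so every feature is expressed on a common scale of deviation from the untreated state of the same line.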
These data demonstrate that the cell painting assay, which was originally developed for analysis of immortalised cancer lines (14), also generates characteristic phenotypic fingerprints in hiPSC lines. Moreover, the data show that cell painting fingerprints from hiPSC cohorts reveal variable responses between the lines from different donors, with the major variation appearing as differences in the magnitude of features.
Next, the assay was expanded to profile the hiPSC cohort responses to 52 different FDA-approved drugs, which are used to treat a range of different diseases. To visualise this larger dataset, two different data reduction techniques were used because no single visualisation sufficed to reveal all relevant detail in this dataset. First, features were removed that had a Spearman correlation coefficient >0.98, which left 442 features for further analysis. The filtered features from all drug-donor pairs were then subjected to embedding with UMAP, which fits a manifold to a high-dimensional dataset and then projects this to a 2D plane (16) (Figure 2A). This showed that compound responses clustered according to their known mechanism of action. For example, statins, everolimus and rapamycin, topoisomerase inhibitors, cytotoxics and microtubule depolymerizers all formed separate, individual clusters. Interestingly, fluphenazine, an antipsychotic that acts by blocking specific dopamine receptors in the brain and that is used to treat schizophrenia and other psychotic disorders, clustered near the statins. This is consistent with reports that fluphenazine, like statins, affects lipid metabolism in human patients (17).
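The correlation filter described above can be sketched in dependency-free Python as follows. This is a minimal illustration, not the study's code: it assumes that, for each pair of features with |Spearman coefficient| > 0.98, the feature encountered first is kept and the later one is dropped (the text does not state which member of a pair was removed), and that no feature is constant. Feature names and values are hypothetical.

```python
def ranks(xs):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def drop_redundant(features, threshold=0.98):
    """Keep a feature only if |rho| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns (one value per drug-donor pair).
features = {
    "nuclei_area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "nuclei_perimeter": [2.0, 4.0, 6.0, 9.0, 11.0],  # monotone with nuclei_area
    "golgi_intensity": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_redundant(features)
```

Because Spearman correlation depends only on rank order, a feature that is any monotone function of another is treated as redundant, which is why the second column is dropped here.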
Besides showing that many drugs with similar modes of action cluster in this assay, the data also reveal cell lines that show variable responses to the same drugs. For example, when treated with rotenone, which disrupts microtubules, most hiPSC lines cluster together ("MT Modulators", Figure 2A). An exception is the nufh4 cell line, which clusters instead with hiPSC lines treated with fenbendazole. This is a drug which also causes microtubule depolymerisation, but by a different mechanism to rotenone, which mostly inhibits microtubule assembly dynamics (18,19) (filled circles with trefoil, Figure 2A).
To test whether these results were reproducible, a selection of the previously analysed hiPSC lines were regrown from frozen stocks and analysed in cell painting assays after treatment with the same set of FDA-approved drugs, but this time using a different HCS imaging system (data collected on a Yokogawa CV7000 instead of InCell 2200). Using UMAP for visualisation, we observed that the cell painting features generated in this repeat set of assays were equivalent to those collected from the original experiment that was performed 6 months previously, using independent cultures of each hiPSC line and with data acquired using a different type of HCS imaging system (Suppl Figure 1B and erlotinib showed more dispersed clustering of cell line responses. We hypothesised that this behaviour reflected an increased degree of variable response to these drugs between the respective cell lines. To investigate potential mechanisms of variable response underlying clustering differences observed in UMAP plots, we next used linear approaches for measuring compound responses, aiming to avoid effects specific to clustering in high-dimensional spaces (20). Figure 2B shows a heatmap of the cell painting induction, calculated as the fraction of filtered features > 2σ from DMSO (and thus significant with >95% certainty) (21). This has the effect of reducing the filtered features for each drug-donor pair to a single statistic and thus likely underestimates the complexity of phenotypes that give rise to clustering in UMAP. Nonetheless, this visualisation shows clearly that, first, hiPSC lines derived from different donors exhibit different levels of induction when treated with the same drugs and second, that examples of such differential response are observed between cell lines for all of the different drugs tested.
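The induction statistic used for Figure 2B can be sketched as below. The sketch assumes that "> 2σ from DMSO" refers to the absolute value of each normalised feature, so that both increases and decreases count; the function name and the example profile are hypothetical.

```python
def induction(z_scores, threshold=2.0):
    """Fraction of features lying more than `threshold` SDs away from DMSO."""
    if not z_scores:
        raise ValueError("no features supplied")
    hits = sum(1 for z in z_scores if abs(z) > threshold)
    return hits / len(z_scores)


# Hypothetical normalised feature profile for one drug-donor pair.
profile = [0.3, -2.5, 4.1, 1.9, -0.2, 2.0, 3.3, -1.0]
fraction = induction(profile)
```

Collapsing a profile to one number discards which features changed and in which direction, which is the loss of detail the text notes when comparing this statistic with the UMAP view.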
In all but one case, the individual cell lines show varied responses and appear to be either hyper- or hyposensitive to specific drugs. The exception was the cell line voce2, which consistently showed a higher induction level (i.e., it is hypersensitive to most of the drugs used in this assay) than the other cell lines. Nonetheless, combining the results shown in Figure 1B (feature heatmap) and Figure 2B (induction heatmap), we conclude that a significant source of variable responses between donor cell lines seen in the cell painting assay results from differences in the magnitude of response shown by each line.

Linking Variable Responses to Protein Expression & Stoichiometry
Our previous studies on this HipSci cohort have demonstrated that variations in protein expression levels between lines derived from different donors can be controlled by specific genomic loci, i.e., protein Quantitative Trait Loci (pQTLs) (11). This suggested the hypothesis that a potential source of differential drug responses between hiPSC lines might be due to epistatic interactions, i.e., resulting from genomic variation between donors causing differences in protein expression that can modulate drug-induced phenotypes, for example by affecting the stoichiometry of components of drug response pathways (22). Importantly, this hypothesis takes into account drug-cell interactions outside of the direct interaction of a drug with its protein target. Furthermore, it can potentially explain why patterns of drug response features measured by cell painting appear similar across cell lines, while the magnitude of responses between lines differ, as seen in the feature and induction heatmaps (Figures 1B and 2B).
To test this hypothesis, we first used the induction plot (Figure 2B) and a graph of induction values (Supp Figure 1C), to classify cell lines as, respectively, "low" and "high" responders, for two well-characterised drugs, i.e., atorvastatin and simvastatin, that share the same target. Cell lines denw6 and zaie1 ("low responders") and hayt1 and tuju1 ("high responders"), were each treated separately with either DMSO (control), or with either atorvastatin or simvastatin, each at 5 µM for 24 hrs, as in the cell painting assay. Following treatment, cell extracts were prepared and proteomes analysed by LC-MS/MS (see Methods). From the resulting data, differences in protein expression levels between DMSO- and drug-treated lines were visualised in 'volcano plots' (Figure 3).
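One simple way to turn induction values into "low" and "high" responder groups is to rank the lines and take the extremes. The sketch below is an assumption made for illustration only: the text states that the induction plot and a graph of induction values were used for classification, but does not give a rule. The line labels and values are hypothetical placeholders, not measurements from the study.

```python
def stratify(induction_by_line, n=2):
    """Return the n lowest and n highest responders, ranked by induction."""
    if len(induction_by_line) < 2 * n:
        raise ValueError("too few lines to form two non-overlapping groups")
    ranked = sorted(induction_by_line, key=induction_by_line.get)
    return ranked[:n], ranked[-n:]


# Hypothetical induction values for one drug across five lines.
induction_values = {
    "line_a": 0.12,
    "line_b": 0.55,
    "line_c": 0.08,
    "line_d": 0.61,
    "line_e": 0.30,
}
low, high = stratify(induction_values)
```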
Combining data for treatment of all four cell lines with either atorvastatin (Figure 3A), or simvastatin (Figure 3B), showed that many of the proteins whose expression increased after drug treatment are involved in lipid metabolism (red dots in Figure 3A, B). Further, GO term enrichment analysis also showed a significant enrichment for expression of protein factors involved in lipid metabolism (Suppl Table 3  While a common type of response to inhibition of cholesterol synthesis is seen across the hiPSC lines, differences were evident in the magnitude of responses between individual cell lines to atorvastatin, or simvastatin, as detected by cell painting induction (Figure 2B). We therefore tested whether such variation in response magnitude might be linked to differences in protein expression between the respective high and low response lines. Volcano plots comparing protein expression in the high and low response lines after treatment with either atorvastatin (Figure 4A), or simvastatin (Supp Figure.  In addition to identifying well-established targets of statins, using GO enrichment and STRING analysis (https://string-db.org/) to characterise the proteins whose expression increases in the high response lines identified factors involved in Rab protein signal transduction and in the metabolism of RNA (Figure 4C and Supp Figure 2C; Suppl Table 5 and 7), all of which are known to change expression upon statin treatment (33)(34)(35)(36)(37). In contrast, the low response lines showed increased expression of proteins linked to the actin cytoskeleton along with several ubiquitin E2 and E3 enzymes, suggesting that ubiquitin-conjugation levels may affect the degree of statin response (Figure 4D and Suppl Figure 2D; Suppl Table 6 and 8). These data support the view that there is a diversity of factors and pathways that can affect the cellular response to well-defined drug perturbations and these differences are due in part to their proteomic landscapes.
In summary, these data are consistent with the hypothesis that variable levels of response to statins, and potentially also to other classes of drugs, reflect population-level variations in gene and protein expression between the respective lines.

DISCUSSION
We have established the first laboratory pipeline for the systematic analysis of population-level variable drug responses, using a cohort of hiPSC lines derived from multiple healthy donors (10). This pipeline provides a 'genetically-informed', high-throughput assay for identifying and characterising variable drug responses in human cells, using cell painting and quantitative proteomics.
The choice of screening hiPSC lines specifically in their undifferentiated, pluripotent state leverages the recent discovery that, at least in part, gene expression in hiPSC lines reflects the genetic identity of the donor at both the RNA and protein levels (10,11).
Thus, the undifferentiated hiPSC lines can behave as avatars of the donors, with respect to their proteomes and cellular phenotypes. The corollary is that by screening how panels of hiPSC lines from different donors respond to drug treatment in a standardised assay format, these lines can be stratified objectively based upon measurements of their varied phenotypic responses. We hypothesise that these data may reveal important information about variable human drug responses that is germane to clinically relevant differential drug responses between patients.
The data presented in this study show that features can be extracted from the analysis of high-throughput, cell painting, fluorescence microscopy images, and used to stratify differences in the magnitude of drug responses between different hiPSC lines (illustrated in Figures 1 and 2). The data also show that the stratification of cell lines based upon their degree of drug response is reproducible in this assay format. Further, by using quantitative proteomic analysis to compare the stratified sets of hiPSC lines, we show that additional information relevant to understanding the mechanisms underlying variable drug response phenotypes can be identified. We have focussed on proteome level analysis here because proteins are the direct targets of most drugs in clinical use and because proteins are also the primary mediators of most disease processes and mechanisms of drug action.
We propose a hypothesis to explain the reproducible, differential response to drug treatment that is detected between different hiPSC lines in the screening assay. This hypothesis postulates that variations between the cell lines, at the genomic and/or epigenomic level, in turn determine the respective proteomes and thereby lead to differences in drug-cell interactions. We envisage that drug-cell response phenotypes can reflect multiple epistatic interactions that are predominantly mediated at the proteome level. This hypothesis is supported by using mass spectrometry to compare the proteomes of hiPSC lines that were stratified by cell painting as either 'high' or 'low' responders to treatment with either atorvastatin or simvastatin. While we have concentrated here on comparing protein expression levels between cell lines, we note that other comparative analyses, for example comparing protein phosphorylation and other protein post-translational modifications (PTMs) and protein-protein interactions, can also provide valuable insights into the mechanisms causing variable drug responses between the different cell lines.
In this study we have investigated potential mechanisms involved in the variable responses seen to statins in this screening assay. Statins are drugs developed to reduce the level of cholesterol in blood and thus lower the risk of developing cardiac and circulatory diseases, including angina, heart attack and stroke (38). Separate, but

Cell Culture
Human iPSC lines used in this study were from the HipSci cohort as previously described (4). Feeder-free human iPSC lines were cultured in Essential 8 (E8) medium (E8 complete medium supplemented with (50x) E8 supplement ThermoFisher-A1517001) on tissue-culture dishes coated with 10 µg/cm² of reduced Growth Factor Basement Membrane matrix (Geltrex, ThermoFisher A1413202 resuspended in basal medium DMEM/F12 Thermo Fisher 21331020). Medium was changed daily.
To passage feeder-free hiPSC lines, cells were washed with PBS and incubated briefly  Table 2 lists the compounds and their final concentrations used in this study.
For proteomics analysis, drug treatments with simvastatin (Tocris 1965) and atorvastatin (Selleckchem S5715) were performed in triplicate in 6-well plates at a final concentration of 5 µM for 24 hrs.

Cell staining and analysis of pluripotency markers
All cell lines used in this study were quality controlled for pluripotency prior to and during the HCS assays. All fixation, permeabilisation and immunostaining analyses were performed at room temperature, apart from primary antibody incubation, which was

Data Processing, Analysis and Visualization of Multiple Cell Lines
Raw images were imported into OMERO Plus (Glencoe Software, Inc., (39)) and then processed using a custom pipeline in CellProfiler (14), which segmented nuclei, cytoplasm, Golgi and cortical protrusions and calculated a range of defined features. All further steps were executed using the Pandas Python library (40). Features for each plate were normalised by the median of features in the DMSO control for that plate.
Features with a coefficient of variation >0.50 were removed from further analysis. We further removed all features with |Spearman coefficient| > 0.98. Z-normalised data were then visualised using Uniform Manifold Approximation and Projection (UMAP (16)).
UMAP parameters were n_neighbours = 15, mindist = 0.01 and a cosine metric was used for scaling distance between points. GO enrichment analysis was carried out using WebGestalt (WEB-based Gene SeT AnaLysis Toolkit) (41,42). Functional protein-protein interaction analysis was carried out using STRING-DB (43).
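The per-plate normalisation and the coefficient-of-variation filter described above can be illustrated with the short sketch below. It is written in dependency-free Python rather than Pandas and rests on two stated assumptions: that "normalised by the median" means division by the plate's DMSO median, and that the coefficient of variation is evaluated across the DMSO control wells. The text does not specify either point, and the well records and treatment labels are hypothetical.

```python
from statistics import mean, median, stdev


def normalise_plate(wells):
    """Divide every well's feature value by the plate's DMSO median."""
    dmso = [w["value"] for w in wells if w["treatment"] == "DMSO"]
    if not dmso:
        raise ValueError("plate has no DMSO control wells")
    centre = median(dmso)
    return [{**w, "value": w["value"] / centre} for w in wells]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    return stdev(values) / abs(mean(values))


# Hypothetical measurements of one feature on one plate.
plate = [
    {"treatment": "DMSO", "value": 100.0},
    {"treatment": "DMSO", "value": 110.0},
    {"treatment": "DMSO", "value": 90.0},
    {"treatment": "drug_x", "value": 150.0},
]
normalised = normalise_plate(plate)
dmso_cv = coefficient_of_variation(
    [w["value"] for w in normalised if w["treatment"] == "DMSO"]
)
keep_feature = dmso_cv <= 0.50  # features with CV > 0.50 are discarded
```

Normalising within each plate places plates on a common scale before features are pooled, so that plate-to-plate differences in staining or imaging are less likely to be mistaken for drug effects.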

Proteomics sample preparation
To support parallel MS-based proteomics analysis in this screening platform in a 6-well plate format, we also tested whether the expression of pluripotency markers was maintained when hiPSC lines were plated at 5 × 10⁴ cells/cm² prior to any drug addition and cultured as previously described. This showed that strong, homogeneous expression of the four pluripotency markers was maintained at this higher density (not shown).
All hiPSC samples were washed twice with 1x PBS on ice, then centrifuged at 300 g to collect the cell pellets. 4% SDS lysis buffer (10 mM TCEP, 100 mM Tris-HCl, pH 7.4) was used to lyse the cells and extract proteins, with 1x Protease and Phosphatase Inhibitor (A32961, Thermo Fisher) added. 1 mL lysis buffer was added to the cell pellets. 500 µg of protein from each sample was further processed using the SP3 protocol, as described (44). In brief, protein samples were mixed with SP3 beads (1:10, protein:beads) and digested with LysC/trypsin mixture (1:50, enzyme:protein). Peptides were eluted from the SP3 beads with 2% DMSO. The peptide concentration was measured using the Pierce™ Quantitative Fluorometric Peptide Assay following the manufacturer's instructions. Peptide samples were stored at -20 °C before LC-MS/MS analysis.
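As a worked example of the ratios quoted above, the sketch below computes the bead and enzyme amounts for a given protein input. It assumes that both ratios are mass ratios (w/w), which the text does not state explicitly; the function name is hypothetical.

```python
def sp3_amounts(protein_ug, bead_ratio=10, enzyme_ratio=50):
    """Bead and enzyme masses (µg) for 1:bead_ratio protein:beads
    and 1:enzyme_ratio enzyme:protein, both taken as mass ratios."""
    return {
        "beads_ug": protein_ug * bead_ratio,
        "enzyme_ug": protein_ug / enzyme_ratio,
    }


amounts = sp3_amounts(500)
```

Under these assumptions, 500 µg of protein corresponds to 5,000 µg (5 mg) of beads and 10 µg of LysC/trypsin mixture.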

MS data analysis
For DIA MS analysis, the data from different samples were analyzed using DIA-NN (45).
The in silico spectral library was generated by DIA-NN from the Homo sapiens database from UniProt (SwissProt October 2021). The FDR threshold was set to 1% for each respective Peptide Spectrum Match (PSM). The data were searched with the following parameters: fixed modification, carbamidomethyl (C); variable modification, acetylation (protein N terminus); a maximum of 1 missed tryptic cleavage; and match-between-runs (MBR) enabled. The MaxLFQ-based protein quantification was performed using an R package, as described previously (46).

Supplemental Tables
Supplemental Table 1: List of hiPSC lines used in this study Supplemental