Learning chemical sensitivity reveals mechanisms of cellular response

Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to observable traits. However, comprehensive chemical screens in diverse biological models are impractical. To address this challenge, we developed ChemProbe, a model that predicts cellular sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and chemical structures. Using ChemProbe, we inferred the chemical sensitivity of cancer cell lines and tumor samples and analyzed how the model makes predictions. We retrospectively evaluated drug response predictions for precision breast cancer treatment and prospectively validated chemical sensitivity predictions in new cellular models, including a genetically modified cell line. Our model interpretation analysis identified transcriptome features reflecting compound targets and protein network modules, identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that allows researchers to measure cellular response to diverse compounds, facilitating research into molecular mechanisms of chemical sensitivity.


Introduction
Chemical probes are highly potent small molecules that selectively target known mechanism-of-action proteins 1 .These tools are crucial for understanding the role of specific proteins in biological processes and diseases, and have been instrumental in investigating a range of functions such as those related to the cell cytoskeleton, immunosuppression, mTOR signaling, protein kinase dynamics, and have often served as the starting point for drug development [1][2][3] .In addition to their primary use as therapeutic agents, drugs can serve as chemical probes in complex diseases like cancer.Addressing cancer heterogeneity necessitates precision clinical treatment strategies and research into the mechanisms that control disease resistance and sensitivity 4,5 .By improving our understanding of gene expression patterns contributing to variance in drug response, we can develop better solutions for cancer patients exploiting specific tumor vulnerabilities.
Ideally, we could test large libraries of chemicals on disease models, engineered cell lines, and patient samples to probe disease mechanisms.However, screening biological samples against a large library of chemical probes is resource-prohibitive.To overcome this problem, a variety of traditional machine learning methods have been applied to predict drug response, including support vector machines (SVMs), random forests (RFs), and multi-layer perceptrons (MLPs) 6 .Early approaches often relied on a single cellular feature set, such as mutation status or gene expression profile 7 .However, significant improvements have been achieved by incorporating multimodal information, such as chemical structure and pharmacological features 8,9 .These advancements have demonstrated the value of integrating diverse types of data to enhance drug response prediction.
Deep learning has become a way to effectively represent and integrate diverse feature sets.These methods commonly employ separate feature encoders that learn rich representations prior to integration 10,11 .In one example, variational auto-encoders (VAEs) can also leverage pretraining for transfer learning [12][13][14][15] .More broadly, neural networks are adaptable to novel inputs, such as graph representations for chemical structures [16][17][18] and their composability provides exciting opportunities for feature integration such as cross-attention 18,19 .
Interpreting the computational structure of predictive models themselves can inform on the underlying biology of compound response.On one hand, ensemble models offer confidence scores, and direct interpretation of model coefficients (e.g., attention matrices) reveals feature relationships [19][20][21][22] .Gradient-based attribution methods can also help identify features driving a particular prediction 15 .On the other hand, integrating biological priors into neural networks effectively reduces the feature space and incorporates interpretable features, such as gene ontologies and pathway annotations [23][24][25] .By imposing constraints through priors, learning operates in the context of existing biological knowledge.However, this approach can limit a model's capacity to learn novel gene combinations and systems mechanisms that are not well understood.
We developed a deep learning model that predicts the sensitivity of new cellular samples to a panel of chemicals along with a framework for understanding the gene features implicated in the response.ChemProbe learns to combine cellular transcriptomes and chemical structures to predict sensitivity.The model can be applied to new biological samples and leverages interpretable learned gene features relevant to known compound mechanisms.ChemProbe accurately models chemical response without biological priors, enabling in silico chemical screening of biological models and mechanistic interpretation of learned gene dependencies.

Conditional modeling enhances cellular drug sensitivity prediction
We hypothesized that a deep neural network model could learn to combine gene expression with chemical structure to predict cellular sensitivity (Fig 1a).We leveraged publicly available datasets to match cancer cell line basal transcriptomes with a large-scale chemical screen.The Cancer Therapeutics Response Portal (CTRP) reports the viability of 842 cancer cell lines in response to 545 compounds and compound pairs across a range of concentrations 26 .
These compounds span cell circuitry targets, offering a nuanced view of cellular response to pathway perturbations across a broad range of cellular components.The Cancer Cell Line Encyclopedia (CCLE) provides basal transcriptomic characterizations of all 842 CTRP cell lines 27 .We combined compound structures and concentrations from the CTRP with proteincoding gene transcriptomes from the CCLE to create a dataset of compound-cell line pairs consisting of approximately 5.8 million labeled examples (Methods).
We formulated the cellular drug sensitivity prediction task as a conditional model  = (|), where  is cellular viability,  is a matrix of standardized RNA abundance values,  is a matrix of chemical features, and  is parameterized by a neural network (Methods).Thus the model's prediction of cellular viability depends on a cell's transcriptomic profile in the context of a chemical structure and concentration.ChemProbe predicts viability by learning to use chemical features to modulate gene expression through linear transformations of internal gene expression representations (Fig 1b).This enables a logic akin to chemical substructures modulating gene products (proteins).We tested several ways to combine cellular features and chemical information within a single model, as assessed by the average maximum coefficient of determination (R 2 ).Accounting for comparable model sizes, we trained, validated, and hyperparameter-optimized different model architectures across five data folds stratified by cell line (five-fold cross-validation, Methods).We compared three methods of learned feature conditioning against a baseline feature concatenation approach 28 .All conditioning approaches outperformed feature concatenation by a notable margin (Table 1).Among the conditioning models, scaling, shifting, and linearly modulating gene expression by chemical features Cellular response commonly follows a sigmoidal relationship to drug concentration.To quantify whether compound dosage alone was driving drug sensitivity predictions, we performed a feature ablation experiment, wherein we purposefully removed crucial data from the model's training and compared it to the actual model.For the "straw model," 29,30 we replaced chemical fingerprints with unique but structurally uninformative and randomized numerical values.The "straw model" trained on ablated features failed catastrophically, underscoring the importance of compound structural features in the modeling task (Table 1) 30 .Explicitly modeling chemical information as conditioning provides a valuable inductive bias for chemical sensitivity prediction and gives insights into the predictive mechanisms of the model.We hyperparameter optimized 5 FiLM models across cell-line stratified data folds and used this ensemble (0.7173 ± 0.0052 R 2 ) of models in subsequent experiments (Methods).

ChemProbe predicts breast cancer patient response
We next asked whether learned transcriptional patterns would generalize to an in vivo cellular context.We measured how well ChemProbe, trained solely on cell line expression profiles, could predict drug response in clinical tumor samples.We used gene expression and patient drug response data from the I-SPY2 adaptive, randomized, phase II clinical trial of neoadjuvant therapies for early-stage breast cancer (NCT01042379) 31,32 .I-SPY2 assigned patients to treatment arms based on biomarkers such as hormone receptor status, human epidermal growth factor receptor-2 expression, and MammaPrint status.Absence of invasive cancer in the breast and regional lymph nodes at the time of surgery defined the endpoint of pathological complete response (pCR) (nonresponse, pCR=0; response, pCR=1).We assessed whether ChemProbe could retrospectively stratify I-SPY2 responders and non-responders despite these differences in datasets.The molecular characteristics of their tumors determined the drugs administered to the participants of the I-SPY2 trial.Five drugs from the I-SPY2 trial were in ChemProbe's panel.We first compared the magnitudes of ChemProbe's sensitivity predictions between responders and non-responders.For four out of five drugs, ChemProbe predicted lower scaled-AUC values for the responder group (Fig 2a).Next, we generated receiver operating characteristic (ROC) curves to compare the drug response predictions of I-SPY2 and ChemProbe with the trial outcomes.We used the treatment designations from I-SPY2 as a proxy for drug response prediction.ChemProbe's area under the ROC curve for each drug ranged from 0.60 (paclitaxel and neratinib) to 0.73 (veliparib), with a macro-average auROC of 0.65 (Fig 2b).
Despite being trained only on isogenic cell lines, these results support ChemProbe's use with heterogeneous tumors from clinical patient samples.

ChemProbe predicts cellular drug sensitivity
We conducted a prospective evaluation of ChemProbe's ability to differentiate drug sensitivity between two primary breast cancer cell lines, HCC1806-Par and MDA-MB-231-Par [34][35][36][37] .We compared the gene expression profiles of the two cell lines to their CCLE counterparts by analyzing the top two gene-expression principal components.Our analysis showed significant disparities in the gene expression patterns of the two cell lines compared to the training data, highlighting the challenges of maintaining consistency across cellular models (Extended Data Fig 3a).The observed differences, which make the prospective test more difficult but particularly informative, may be attributed to variations in cell culture protocols, reagents, and genetic drift commonly found between experimental settings 38 .

Gene expression attribution vectors pass interpretability soundness checks
To assess whether ChemProbe learned biologically relevant patterns, we investigated whether the model's gene expression saliency reflected known compound pharmacology and network biology.Saliency mapping methods evaluate which gene features a model puts the most "weight" on when making its predictions and how these choices change for different cell lines or compounds.We experimentally characterized seven new cell line transcriptomes, including primary tissue models and metastatic derivatives [39][40][41] .Using integrated gradients saliency mapping, 42 we determined IC50 values for each compound-cell line pair with ChemProbe and computed its gene attribution vectors.However, we acknowledge that integrated gradient attribution vectors may correlate with input feature magnitudes, potentially undermining their usefulness in quantifying feature importances. 43We used principal component analysis to determine the first two principal components of attribution vectors by cell line to check this.We found that attribution and transcriptome vectors correlated (Extended Data Fig 5a,c).To address this confounder, we normalized attribution vectors by line, which decreased cell-line-specific effects in the principal component analysis and decoupled the correlation between attribution and transcriptome vectors (Extended Data Fig 5b,c).
To ensure model interpretation accurately reflected learned feature transformations, we performed two tests (Methods) 43 .The first test randomly initialized the model parameters and compared the outcomes with the true model's attribution vectors.We trained the model using scrambled labels in the second test and compared its attribution vectors with the true model's.
We conducted these tests using both uncorrected (raw) and cell line-effect corrected (adjusted) attribution vectors.The raw attribution vectors were highly correlated with transcriptome profiles, random-model, and permuted-model attribution vectors, failing the tests of independence from learned parameters and the training data.However, the adjusted attribution vectors were not correlated with those derived from the control models, indicating that adjusted attribution vectors do not simply reflect data or parameter artifacts (Extended Data Fig 5c).

Learned transcriptomic features reflect compound pharmacology and network biology
Neural networks are notoriously difficult to interpret, but we hypothesized that ChemProbe's highly attributed gene features may reflect causative mechanisms or correlative biomarkers of drug sensitivity.First, we investigated whether the model relied on similar gene features for compounds with the same known protein targets.We created a control compound set (CCS) based on nominal target classes, each with at least two compounds successfully predicted in all seven cell lines.We applied K-means clustering to the CCS attribution vectors and computed the adjusted mutual information (AMI) between clusters and target class labels to determine whether transcriptomic attribution vector similarity corresponded to known compound mechanisms of action (MOA) (Fig 5a).We also examined the AMI between structural clusters and target classes, as chemical structure similarity alone may at times reflect target profile similarity, albeit imperfectly. 44e attribution vector clusters AMI was significantly greater than that of structural We next examined the network topology of nominal target classes using the STRING database of high-confidence protein-protein interactions 45 to interrogate biological relevance.We clustered attribution vectors, gathered target annotations within each cluster, and queried STRING for the respective target interactome (Methods).Target modules had significantly greater connectivity than modules generated from randomly sampled target protein sets or randomly sampled protein sets (Fig 5d).Finally, we tested whether attribution-defined target modules of action (ModOA) also showed protein interaction enrichment.On analysis, 50% of ModOA reflected significant network interaction enrichment and a variety of functional enrichments from gene ontologies, KEGG pathways, and Reactome pathways (Fig 5f).These findings suggest that highly attributed transcriptome features reflect systems biology and potential mechanisms of drug response.

Screening genetic dependencies for mechanisms of ferroptosis
We further hypothesized that ChemProbe's highly attributed gene features would relate to compound MOA.To test this, we used linear regression for differences in gene attribution between groups.This "differential attribution analysis" (DAA; see Methods) generates ranked gene lists, which we use as marker genes to arrange attribution clusters hierarchically (Fig 6a).Changes in compound sensitivity following gene knockout (KO) or overexpression can inform on mechanisms of gene-dependent protection or resistance.Accordingly, we assessed ChemProbe's utility for screening gene-dependent ferroptosis resistance in silico.Lipoprotein receptor LRP8 has recently been shown to act as a ferroptosis resistance factor by maintaining cellular selenium levels and appropriate translation of GPX4.Selenium uptake is reduced in LRP8 KO models, leading to ribosome stalling and early translation termination of GPX4, which sensitizes cells to ferroptosis 48 .We tested if ChemProbe correctly predicted that a LRP8 KO cell line would have reduced sensitivity to ferroptosis-inducing compounds than a wild-type.
Consistent with previous research, ChemProbe predicted LRP8 KO cells were more sensitive than wild-type to known ferroptosis-inducing compounds ML210, 1S,3R-RSL-3, ML162, and We noticed several correlations between cellular response and the expression of highly attributed genes for compounds that induce ferroptosis (Extended Data Fig 6b,c).We wondered if highly attributed genes played functional roles related to ferroptosis.We extracted the ten highest differentially-attributed genes and applied a functional enrichment analysis (Extended Data Fig 6d).We observed enrichment of terms related to lipid transport and fatty acid metabolic

ChemProbe architecture, training, and evaluation
The study focused on predicting drug sensitivity in the context of pharmacological intervention by integrating cell state features with compound features.To achieve this, we formulated a conditional model where cellular viability is predicted based on a vector of standardized protein-coding RNA abundance values and a vector of chemical features, including structure and concentration.We explored two methods of integrating gene expression and small molecule feature representations: simple concatenation and hierarchical integration using feature-wise linear modulation (FiLM).
ChemProbe includes a conditional encoder that embeds compound features into a vector of length  and an inputs encoder that embeds gene expression features into a vector of length .
We used a FiLM generator to predict  and  parameters of length  based on compound embeddings.The FiLM layer then applies an affine transformation of gene expression embeddings by  and  parameters.This process repeats across  FiLM layers, and the modulated gene expression embeddings pass through a linear block consisting of a linear layer, ReLU activation, batch normalization, and dropout.The final linear block compresses feature maps to a vector of length 1, and the mean-squared error is calculated between predicted cellular viability and true cellular viability.
To evaluate the performance of our model, we used cross-validation and split cell linecompound pairs into five groups of approximately equal size by cell line to avoid data leakage and performance inflation.We trained five individual models in a leave-one-out cross-validation scenario and applied 20 rounds of hyperparameter optimization to all five individually trained models.We implemented the ChemProbe model in PyTorch and applied hyperparameter optimization with Optuna.

Predictive modeling baselines
We compared different models that modify gene expression features by compound structure and concentration using various transformations.Our baseline model, "concatenation" architecture, simply combined gene expression and compound features into a single vector, which was fed into a multi-layer perceptron.We independently evaluated the isolated effects of learning transformation types using the "scale" and "shift" variants of the ChemProbe model.
The "scale" model held shift parameters constant (=0) and learned only the scale parameters (), whereas the "shift" model held scale parameters constant (=1) and learned only the shift parameters ().We assessed ChemProbe's dependence on compound concentration by creating a "permuted" model that used random binary fingerprints for each compound, ablating structural information.We trained and evaluated all models using 5-fold cross-validation on the originally defined dataset splits for three rounds of hyperparameter optimization.

Dose-response modeling
To generate predicted dose-response curves, log-logistic functions were fit to each set of cell line-compound predictions obtained from the five individually trained ChemProbe models.
A sequence of quality control conditions was defined to ensure the reliability of each doseresponse relationship.Firstly, cellular viability at any of the four largest compound concentrations was checked for increases of 20% or more from the fifth largest compound concentration.If this condition was met, the viability prediction at the largest concentration was dropped.This process was repeated recursively, and a minimum of 16 data points was required for fitting a dose-response curve.If the minimum predicted cellular viability was greater than 0.4, no dose-response curve was fit.For cell line-compound pairs that passed quality control, a 4parameter log-logistic function was fit.If the optimization failed, a 3-parameter log-logistic function was fit.If this optimization also failed, a 2-parameter log-logistic function was fit.
Additional quality control was performed during the analysis of predicted dose-response curves by filtering out log-logistic functions with undetermined parameters and with predicted EC50<1e-3 or EC50>300.Scipy was used to fit parameters of log-logistic functions to doseresponse relationships.
For relative potency comparisons, the drc package in R was employed to fit doseresponse models with a four-parameter log-logistic model.We focused on the median effective dose (ED50) as an indicator of relative potency, calculating it with the EDcomp function.
Significant differences in compound effects between cell lines were assessed using t-values and p-values obtained from EDcomp.

Retrospective I-SPY2 analysis
We obtained I-SPY2 clinical trial metadata and microarray characterizations of 988 patient transcriptomes from the Gene Expression Omnibus (GEO) (GSE194040).We matched 90% of the recorded genes to our training dataset, mean-imputed the remaining 10% of genes, and standardized the data using Z-score transformation.We then evaluated the alignment of I-SPY2 patient data with CCLE cell line training data across the first two principal components.
Next, we predicted drug sensitivity for each patient across 32 concentrations (1E-3 uM -300 uM) in response to all 545 compounds and compound pairs in the CTRP.We generated patientdrug response predictions using independent models and computed the area under the curve (AUC) of each predicted dose-response assay.We scaled the AUC of each patient-drug prediction between 0-1 based on the drug's minimum and maximum predicted AUC across all I-SPY2 patients.
Participants in the I-SPY2 trial were placed in treatment arms based on analyses of clinical and molecular information analyses, including clinical characteristics, gene expression patterns measured via microarray, and protein abundance as measured by reverse phase protein array (RPPA).The trial assessed the efficacy of various combination therapies relative to paclitaxel treatment, the clinical standard of care.We identified drugs matched between I-SPY2 treatment arms and the CTRP, including paclitaxel, neratinib, MK2206, veliparib, and carboplatin.In the I-SPY2 experimental arms, patients were treated with a combination of paclitaxel and an additional drug(s) to assess response relative to paclitaxel treatment only.As the predictive ability of ChemProbe was only evaluated with respect to the available compounds and compound pairs in the CTRP, the ChemProbe predictions for I-SPY2 patients reflected predicted patient response to a single compound rather than a combination therapy.

Prospective differential potency predictions
To identify differentially potent compounds between HCC1806-Par and MDA-MB-231-Par cell lines, we computed the difference in predicted IC50 values for compounds that passed dose-response modeling.We visually examined dose-response curves of the top 50 differentially sensitive compounds and selected candidates for in vitro testing.We based selection criteria on the completeness of dose-response curves in each cell line, including adequate Emax and Emin boundaries within the predicted concentration range.
We conducted a preliminary dose-response experiment to determine appropriate concentration points for the subsequent dose-response experiments across a broader range of concentrations than our predictions (300-1.7e-3uM, 12 points) (Extended Data Fig 4b,c).We narrowed the concentration range for the following experiment to capture response granularity (100-2.1e-6uM, 12 points) (Fig 4).

Integrated gradients
We employed integrated gradients, a path-based model attribution technique, to determine the extent to which feature gradients changed compared to a baseline feature vector.
The method involves linearly interpolating  feature vectors between a designated baseline and the query feature vector.We used zero-vector baselines for compound and gene expression features and set  = 50 as the step size.At each interpolated feature vector step, gradients of the inputs are calculated with respect to the corresponding prediction.Finally, the integral of each feature along the path of feature gradients between the baseline vector and the query vector is computed.The python package captum was used to compute integrated gradients.
To account for potential differences in cellular responses, we used the predicted compound IC50 for each cell line-compound pair to calculate integrated gradients and obtain an attribution vector at the predicted IC50.We then extracted the cell line feature attribution vector for each pair to investigate the influence of conditional compound information on gradient changes in the input gene expression features.To address cell line-specific effects, we standardized the transcriptome attribution vectors of each cell line separately using a Z-score transformation, resulting in adjusted attribution vectors (Extended Data Fig 5b,c).

Attribution method soundness checks
To evaluate the sensitivity of the attribution method to learned parameters and data features, we conducted soundness checks.First, we assessed model-dependent attribution method invariance by comparing the attribution vectors of randomly initialized parameters of architecturally identical models with those of the trained models.We applied integrated gradients to the trained and randomly initialized models and compared the attribution vectors.Second, we evaluated data-dependent attribution method invariance by permuting the data labels, training architecturally identical models, and applying integrated gradients to compare the true-model and permuted-model attribution vectors (Extended Data Fig 5c).We used the correlation between the attribution vectors of the true and alternative models to assess the attribution method's sensitivity to learned parameters and dependence between data features and labels.

Attribution similarity analysis
We investigated the relationship between compound MOAs and learned gene expression feature dependence by examining attribution vector similarity.First, we filtered attribution vectors by considering compound MOA classes with at least two compounds successfully attributed in all seven cell lines to obtain MOA classes with sufficient samples for analysis (control compound set).This resulted in 28 MOA classes, which served as a true label baseline.
We then compared these true labels to unsupervised labels generated by K-means clustering of attribution vectors from a trained model, a randomly initialized model, compound fingerprints, and a label-permutation baseline (Fig 5b).We applied K-means clustering on five independent trials.
We analyzed gene target attributions to further investigate the model dependence on individual nominal targets within each MOA class.Specifically, we applied a two-sided Wilcoxon-rank sum test to group attributions for each nominal target in the MOA class of interest and adjusted for false discovery rate (FDR) using the Benjamini-Hochberg (BH) procedure.We visualized the nominal target with the largest average attribution difference between groups for each MOA class (Fig 5c).

Attribution network analysis
We extended our analysis to include all attribution vectors generated from the 7-cell line test set.We randomly selected a single nominal target from each compound MOA class to avoid bias towards closely associated targets.This is because the nominal targets of a single compound likely fall in close network proximity, and downstream network analysis of target sets would reflect artificial over-connectivity.For example, the MOA class of neratinib includes nominal targets EGFR and HER2, which are involved in the same pathway.Therefore, we randomly chose one target from this set.
We applied Leiden clustering unsupervised discovery of attribution clusters.As described above, we defined attribution cluster MOA classes by random target selection from each compound MOA class.We filtered the STRING database to consider only high-confidence protein-protein associations (combined score > 0.7).We queried STRING for attribution cluster nominal targets and computed the connectivity of the resulting subgraph.To account for random subgraph connectivity due to target biases in STRING, we randomly sampled from available targets, queried the filtered STRING database, and computed connectivity.We repeated this procedure with randomly sampled protein-coding genes to account for random protein associations (Fig 5f).We used the networkx library for analysis.
To test for protein interaction enrichment, we defined attribution cluster nominal targets by random target selection from each compound MOA class, as described above.Next, we queried the STRING API for protein-protein interaction enrichment in the network of highconfidence protein-protein associations (combined score > 0.7).We computed statistical enrichment using the hypergeometric test, which tests if a query set of proteins has more interactions than expected relative to the background proteome-wide interaction distribution.We also applied the hypergeometric test for functional enrichment of GO terms, KEGG pathways, and Reactome pathways.We used the stringdb python package to access the STRING API.To infer potential modules of action for compounds, we selected the unique set of all nominal targets associated with an attribution cluster.
Using RNA that was rRNA-depleted with Ribo-Zero Gold (Illumina), libraries were prepared with SciptSeq-v2 (Illumina) and sequenced on an Illumina HiSeq4000 at UCSF Center for Advanced Technologies.Transcript abundances were quantified using Salmon, and tximport was utilized to summarize transcript-level measurements.We employed DESeq2 to identify differentially expressed genes.

Differential attribution analysis
To assess model dependence on individual genes within attribution clusters, we conducted an unbiased analysis.We applied a two-sided Wilcoxon-rank sum test to each gene to analyze gene attributions within a cluster relative to all remaining samples.We adjusted for FDR using BH to account for multiple testing.We utilized scanpy to apply tests across genes in each cluster relative to all other samples.Attributions were standard scaled and each cluster's top 10 most significant genes were plotted.Leiden groups were hierarchically clustered (complete linkage) by Pearson correlation.Scanpy was used for computation and visualization.We obtained gene expression-sensitivity Pearson correlation z-scores and corresponding visualizations from the Cancer Therapeutics Response Portal v2 feature correlation analysis (Extended Data Figure 6b,c).

Dose-response assay and cell culture
MDA-MB-231-Par and HCC1806-Par cells were seeded at 1,000 cells per well in quadruplicate per condition in a white opaque 96-well plate (catalog no.3917, Corning).

Extended Data Figures
Extended Data Fig performed similarly.A t-distributed stochastic neighbor embedding (t-SNE) decomposition of learned parameters demonstrated that scaling and shifting operations encoded distinct chemical features (Fig 1c).Hierarchical clustering of scaling () parameters grouped compounds by identity (Extended Data Fig 1a), whereas compound concentration correlated with the first principal component of shifting () parameters (p=1.72e-55,Extended Data Fig 1b).Thus the learned conditioning parameters interpretably reflected compound structure and concentration in the drug-response modeling task as an emergent property of model learning.
The I-SPY2 dataset introduced a significant change in the input data modality and allowed us to assess the robustness of ChemProbe.Unlike the CCLE training data, which quantified gene expression through high-throughput RNA sequencing, I-SPY2 collected pre-treatment patient gene expression by microarray.Microarrays have lower overall specificity and sensitivity and capture a smaller dynamic range of gene abundance 33 .Neural networks often fail on "out of distribution" samples whose features (gene abundance values) come from different assays than their training data.To determine the extent to which the I-SPY2 data was outside ChemProbe's training distribution, we compared dataset expression profiles across the top two principal components.A subset of the I-SPY2 data fell outside the training data distribution, consistent with the expectation that assay types introduce systematic measurement effects (Methods, Extended Data Fig 2a).
response curve (Fig 3a).ChemProbe predicted that HCC1806-Par would be more sensitive than MDA-MB-231-Par to 88.16% (201/228) of the compounds with fitted curves (Fig 3b).We focused on compounds with the largest differences in IC50 between the cell lines, selecting four compounds predicted to have strong IC50s against HCC1806-Par (neratinib, ceranib-2, CAY10618, and AZD7762) and two compounds with IC50s favoring MDA-MB-231-Par (ML162 and 1S,3R-RSL-3) (Fig 3c,d).In vitro prospective testing confirmed ChemProbe's predictions for all six compounds.Predicted differences in IC50s between the two cell lines significantly correlated with observed differences (Fig 4a-f; p=0.035,Fig 4g).We compared the relative potency of each compound at the median effective dose (ED50) between cell lines, finding significant differences in compound cellular viability as predicted (Table 2).Additionally, predicted IC50s correlated highly with measured IC50s for individual cell lines after correcting for an experimental outlier (p=0.096,Extended Data Fig 4a).These results were consistent with initial concentration range-finding experiments, where five out of six compounds had shown predicted differences in IC50s and relevant IC50s specific to individual lines (Extended Data Fig 4b,c).ChemProbe accurately predicted the sensitivities of independently obtained and characterized cell line samples, despite their transcriptomic profiles differing from the training dataset.
We noticed clusters 26 and 28 showed different prediction sensitivity to ferroptosis-inducing compounds(Fig 6b).Ferroptosis is a type of cell death implicated in multiple biological contexts, with therapeutic applications in cancer, immunity, development, and aging46,47 .These attribution clusters included compounds ML162 and 1S,3R-RSL-3, which had shown differential cellular sensitivity in the prospective in vitro experiments (Fig 4e-f).Additional compounds with ferroptosis-inducing mechanisms of action in these clusters included ML210, erastin, CIL56, and CIL70.Digging down, we investigated the attribution vectors of ferroptosis-inducing compounds to assess the alignment of model interpretations with established ferroptosis biology.We observed a clear distinction in predicted sensitivity between two groups of cell lines exposed to the same compounds (Fig 6b).To further analyze this, we merged clusters 26 and 28 into a combined cluster representing ferroptosis-inducing compounds and applied DAA.Since multiple mechanisms induce ferroptosis, we queried differential attributions of multiple ferroptosisassociated genes, including GPX4, SCD, SLC7A11, FSP1, and LRP8 46 .All ferroptosisassociated genes were within the most highly attributed in the ferroptosis-inducing compound cluster (Fig 6c).To verify that these results were not artifacts of the transcriptomes or relative gene expression differences, we also performed differential expression analysis (DEA) between MDA-MB-231-Par and HCC1806-Par.Besides GPX4, a key ferroptosis regulator, no ferroptosis-associated genes rose to significance (p<5e-2, Extended Data Fig 6a).
processes, pathways adjacent to lipid peroxidation and ferroptosis (Fig 6e).These results indicate that transcriptomic attributions align with ferroptosis biology, underscoring the potential of ChemProbe in screening genetic dependencies and identifying novel biological mechanisms.Discussion Using a conditional deep learning approach, ChemProbe evaluates cellular transcriptomic signatures against bioactive molecular structures to predict cellular responses to chemical perturbations.In experiments on cellular models and clinical tumor samples, this tool accurately predicts cellular viabilities prospectively.ChemProbe complements more clinically-oriented approaches with its ability to directly screen engineered cell lines and interrogate potential molecular mechanisms.Engineered cell lines, which possess specific genetic modifications or alterations, physically model disease conditions or the results of targeting pathways of interest.By leveraging ChemProbe, researchers can evaluate the sensitivity of known and newly engineered lines to a panel of chemical probes to assess how specific genetic modifications or alterations influence compound response.Intriguingly, deep learning model interpretation reflects compound mechanisms of action (Fig 5).The differential attribution analysis (DAA) method we introduce surfaces potential gene patterns driving responses and new disease-gene relationships.In one example, we identified genes linked to ferroptosis resistance in an LRP8 knockout cell line.ChemProbe's calculations were not specific to this biology; it may find similar use in screens for resistance mechanisms and target discovery across diverse cellular models (Fig 6).In cancer research, the tool rapidly evaluates the influence of specific oncogenic mutations or alterations in tumor suppressor genes on chemical sensitivity.When applied to engineered cell lines representing different genetic backgrounds, ChemProbe can highlight vulnerabilities and potential mechanisms of drug resistance associated with particular genetic alterations.Extending to clinical samples, ChemProbe becomes a tool for targeted therapy and precision medicine.We found that it predicts drug sensitivity in breast cancer patients across heterogeneous clinical tumor samples (Fig2).Likewise, ChemProbe suggests which drugs may be ineffective for a given patient; if borne out in clinical studies, it or similar methods could meaningfully reduce therapeutic trial and error49,50 .The ability to expedite treatment at earlier disease stages and target cellular vulnerabilities would be particularly impactful for tumors whose resistance mechanisms rapidly evolve.Nonetheless, several caveats merit mention.The experiments were constrained to a limited set of cell lines and compounds, and model interpretation reflects a limited set of biological factors.Without further study, gene features achieving high model attribution may reflect useful but obtuse patterns in the datasets rather than biological causality.Similarly, without a more diverse training set of chemical structures, the model may not be leveraging generalizable structural features.As always, deep learning model attribution methods are a moving target, so the potential compound mechanisms of action they reveal ultimately necessitate prospective biological testing51,52 .ChemProbe screens cell lines against half a thousand chemical probes and drug-like compounds.However, expanding its predictions to a larger subset of chemical space would require collecting biological screening training data at a commensurate scale, which is currently impractical.Deep learning models like AlphaFold and ESM have leveraged self-supervised learning to extract emergent properties from extensive unlabeled protein sequence data53,54 .Similarly, integrating ChemProbe with pretrained cellular transcriptomic or small-molecule structure foundation models may be a means to expand into broader biological and chemical space.When used to screen disease models, engineered cell lines, and clinical samples, ChemProbe is a powerful tool to assess how cells respond to various compounds.It supports exploring new therapeutic targets, suggests disease mechanisms, and can help researchers develop more precise and effective treatments.In this study, ChemProbe predicted drug sensitivities in multiple contexts, from cancer cells and tumor samples to data on breast cancer patient responses.Looking forward, we hope that ChemProbe's availability as an open-source tool will contribute to a range of research efforts in precision medicine and beyond.
Drug sensitivity data was obtained from the Cancer Therapeutic Response Portal v1 and v2 (CTRP v1/2).These datasets comprise 864 cell line responses to 481 individual compounds and 64 compound pairs across a range of concentrations.Response phenotypes were quantified by cellular viability, a normalized measure characterizing complete cell killing to cell stasis (0-1) and cell growth (>1).We utilized predicted cellular viability derived from fitted dose-response curves of each experimental set, in which replicate cell line-compound experiments were fit with a log-logistic function and predicted cellular viability was derived at the original experimental concentrations (Seashore-Ludlow).Compound structure was represented as 512-bit Morgan fingerprints (radius = 2) converted by RDKit from SMILES provided by the CTRP.Experimental micromolar compound concentrations were concatenated with Morgan fingerprints, resulting in 513-length compound feature vectors.We matched CTRP cell lines with the Cancer Cell Line Encyclopedia (CCLE) molecular characterizations and extracted protein-coding gene expression measurements, resulting in 19144-length cell line feature vectors.In total, 545 total compounds or compound pairs and 860 cell lines comprised 366,710 unique pairs and 5,849,340 total individual examples of compound response at various concentrations.

( a )
Approach to model training and dose-response modeling.We trained individual models on held-out cell line dataset splits by 5-fold cross-validation.We then fit log-logistic models to cross-validated model predictions and derived pharmacodynamic features.(b) Expected cumulative distribution plot of predicted compound IC50 differences between HCC1806 and MDAMB231 cell lines.Compounds selected for in vitro dose-response testing highlighted.(c) Predicted dose-response relationships of HCC1806 and MDAMB231 response to neratinib and (d) 1S,3R-RSL-3.95% confidence intervals.

. 1 |Extended Data Fig. 3 |Extended Data Fig. 4 |
Conditional parameter analysis.(a) Hierarchical clustering of learned gamma parameters.Color bars indicated compound identity and squared compound concentration.(b) Relationship between principal component 1 (PC1) of learned beta parameters and input concentration; Pearson correlation coefficient.Extended Data Fig. 2 | I-SPY2 analysis.(a) I-SPY2 participant tumor gene expression profiles projected on the CCLE training data distribution.VC, veliparib/carboplatin; N, neratinib; Ctr, control; gray = kernel density estimate (KDE) of training data distribution.(b) I-SPY2 confusion matrix of predicted response and observed response.(c) ChemProbe confusion matrix of predicted response and observed response.Prospective cell line expression similarity analysis.(a) PCA of gene expression similarity between prospectively tested cell lines and matched counterparts in the CTRP training data.gray = KDE of training data distribution.Prospective dose-response analysis.

( a )Extended Data Fig. 6 |
Relationship between predicted and observed IC50s of prospectively tested compounds in HCC1806-Par and MDA-MB-231-Par cell lines.Two-sided t-test.(b) Calibration experiment; relationship between predicted difference in IC50 and observed difference in IC50 between HCC1806-Par and MDA-MB-213-Par across tested compounds.Two-sided t-test.(c) Calibration Extended Data Fig. 5 | Attribution analysis.PCA decompositions of (a) raw attribution vectors versus (b) adjusted attribution vectors.(c) Pearson correlation between raw attributions and adjusted attributions versus transcriptome inputs, randomly initialized model attributions, and attributions of a model trained on permuted labels.(d-g) UMAP of cell line-compound attribution samples colored by compound targets with specific attributions.(h-k) UMAP of cell line-compound attribution samples colored by compound targets with diffuse attributions.Ferroptosis analysis.(a) DEA between MDA-MB-231-Par and HCC1806-Par with ferroptosis-associated genes in orange.(b) Pearson correlation z-score between highly-attributed LONRF3 expression and compound sensitivity.(c) Pearson correlation z-score between highly-attributed SLC27A5 expression and compound sensitivity.(d) Differential attribution scores of top 10 genes from ferroptosis-sensitive cell line-compound pairs relative to all others.

Table 1 |
Predictive performance.Average performance and standard error of 5 models trained across identical data folds.

Table 2 | Relative potency at median effective dose.
Relative potency of each compound at the median effective dose (ED50) between cell lines.