A scalable, data analytics workflow for image-based morphological profiles

Cell Painting is an established community-based, microscopy-assay platform that provides high-throughput, high-content data for biological readouts. In November 2022, the JUMP-Cell Painting Consortium released the largest annotated, publicly available dataset, comprising more than 2 billion cell images. This dataset is designed for predicting the activity and toxicity of 100k drug compounds, with the aim to make cell images as computable as genomes and transcriptomes. In this paper, we have developed a data analytics workflow that is both scalable and computationally efficient, while providing significant, biologically relevant insights for biologists estimating and comparing the effects of different drug treatments. The two main objectives proposed include: 1) a simple, yet sophisticated, scalable data analytics metric that utilizes negative controls for comparing morphological cell profiles. We call this metric the equivalence score (Eq. score). 2) A workflow to identify and amplify subtle morphological image profile changes caused by drug treatments, compared to the negative controls. In summary, we provide a data analytics workflow to assist biologists in interpreting high-dimensional image features, not necessarily limited to morphological ones. This enhances the efficiency of drug candidate screening, thereby streamlining the drug development process. By increasing our understanding of using complex image-based data, we can decrease the cost and time to develop new, life-saving treatments. Author summary Microscopy-assays are often used to study cell responses to treatments in the search for new drugs. In this paper, we present a method that simplifies the understanding of the data generated from such assays. The data in this study consists of 750 morphological features, which describe the traits and characteristics of the cells, extracted from the images. By using untreated cells as a biological baseline, we’re able to detect subtle changes that occur in the treated cells. These changes are then transformed into an equivalence score (Eq. score), a metric that lets us compare the similarities among different treatments relative to our baseline of untreated cells. Our Eq. score approach transforms complex, high-dimensional data about cell morphology into something more interpretable and understandable. It reduces the “noise” in the features and highlights important changes, the “signal”. Our method can be integrated into existing workflows, aiding researchers in understanding and interpreting complex morphological data derived from cell images more easily. Understanding cell morphology is crucial to deepening our knowledge of biological systems. Ultimately, this could contribute to the faster and more cost-effective development of new, life-saving treatments.

The process of drug development is a costly and time-consuming endeavor, prompting 2 researchers to explore new, more efficient methods of drug candidate screening. One 3 emerging approach is image-based profiling [1], which utilizes fluorescent markers with 4 techniques such as Cell Painting and CellProfiler, as well as Artificial Intelligence (AI), 5 to extract morphological features quickly and cost-effectively. CellProfiler is an 6 open-source image analysis software tailored for high-throughput biological experiments. 7 Developed by the Broad Institute, it enables customizable analysis pipelines and 8 incorporates various image processing algorithms. Its flexibility allows for diverse 9 applications in biological and biomedical research. By automating complex image 10 analysis tasks, CellProfiler facilitates the extraction of meaningful, reproducible 11 quantitative data, advancing high-content screening and biological image analysis [2]. 12 The affordability and high-throughput nature of image-based profiling, in combination 13 with providing both temporal and spatial information, make it an increasingly common 14 choice for drug screening purposes. However, while these techniques have evolved 15 greatly in recent years, connecting the thousands of images and their extracted features 16 back to biology in understandable metrics remains a significant challenge. 17 To address this issue, many researchers are turning to AI to find a solution. AI has 18 shown remarkable success in various applications, such as segmenting nuclei [3][4][5] and 19 cell bodies [6,7], image restoration [8,9], image super-resolution [10,11], and speeding up 20 fluorescent 3D sample imaging [12]. These examples demonstrate how AI-based methods 21 have excelled over traditional ones. Despite this success, the lack of explainability and 22 interpretability remains a major concern, particularly when classifying and quantifying 23 treatment data. This is partly due to the structure of the data, as most datasets have 24 numerous controls but a limited number of replicates for the actual treatments.

25
Combining this with a high-dimensional feature vector creates a skewed relationship 26 between observations and features, known as "the curse of dimensionality" [13].

27
In addition to AI-based solutions, traditional correlation-based metrics such as 28 cosine similarity, Pearson's correlation, Spearman's rank correlation, and Kendall's rank 29 are often used today to examine the similarity between treatments [14,15]. While these 30 metrics provide an overview of the data, they consider all features as equally important, 31 making it difficult to capture and identify the unique, subtle morphological changes that 32 separate different treatments from each other.

33
The skewed relationship between observations and features has been the standard in 34 chemometric problems for decades, where multivariate analysis (MVA) has shown great 35 success in capturing subtle differences between groups in high-dimensional data [16][17][18]. 36 Therefore, we have developed a data analytics workflow that offers valuable insights for 37 biologists in estimating and comparing the effect of different treatments. In this 38 method, the negative control group serves as a biological baseline for predictive models. 39 The models focus on the differences in the morphological profiles between treatments 40 and the negative controls to identify the unique biological impact of a specific treatment. 41 The two main objectives of this method are: 1) to create a simple, yet sophisticated, 42 scalable metric for comparing morphological profiles, which we call the equivalence score 43 (Eq. score). The Eq. score is a multivariate prediction of a model that has been trained 44 to identify the relationship between a reference treatment and negative controls; 2) to 45 identify and amplify the subtle morphological profile changes caused by a treatment 46 compared to the negative controls.  principal components, consisting of scores T and loadings P , are good "summaries" of 78 X such as: Where E is the residual and is "small" if enough principal components are used and/or 80 X contains mainly systematic variation and low levels of noise.

88
In PLS/OPLS, the predictor matrix X is decomposed into a set of scores T and 89 loadings P , as where E is the X-residual. In addition to this, the response matrix Y is also 91 decomposed into a set of scores C and loadings T: where C is the weights associated with Y and F is the Y-residuals. F represents the 93 difference between the observed Y and the modeledŶ . As with PCA, the weights, 94 scores and loadings can be calculated using various methods, with the NIPALS 95 algorithm [19] being one of the most commonly used.

96
Workflow 97 The premise of our approach is to compare a set of reference treatments to a control 98 group, which serves as a baseline during a calibration phase (Fig. 2a). During the 99 calibration phase, a scale between 0 and 1 is defined. The new scale, which we call the 100 Eq. score, represents the proportion of equivalence of a new treatment compared to the 101 reference treatment. PLS/OPLS regression is used to build the Eq. scores. The features 102 corresponding to the controls and a given reference treatment (X) are regressed against 103 an arbitrary vector of 0 and 1 (Y), corresponding to controls and the reference 104 treatment, respectively (Fig. 2a). To ensure that the predictions of the reference group 105 are fair, a leave-one-out cross-validation approach (LOOCV) is used for each replicate in 106 the reference group. New treatments can then be measured in the new referential built In (a) is the first step of the method, fitting a PLS/OPLS model on a reference treatment vs the control group. We create two Y-vectors where the reference observations have 1 and the control group 0. In (b) the fitted model is used on other treatments to model their Y in the same space, we call this value the Eq. score. In (c) we iterate through this process to create a new feature space consisting of the Eq. scores which can then be visualized for interpretation.

107
with PLS/OPLS models from the known treatments (Fig. 2b). They can then be 108 characterized by a single, or a spectrum, of Eq. scores and compared using multivariate 109 data analysis.

110
In this prediction step, the Sum Squared Error statistic (SSE) is used to evaluate the 111 feature space (X) and the prediction error of (Y). The former indicates if the features 112 presented by the predicted treatment lie within the training set feature space; the latter 113 shows how well the prediction is. The SSE for observation i is calculated as  amplified and noise is reduced. This feature space, which is fully based on CellProfiler 120 features, can then be used to look for clustering and similarities with methods such as 121 PCA as well.
Exemplifying with toxicity 123 The Eq. score method allows for the comparison of treatment effects relative to a 124 reference treatment based on the similarity or dissimilarity of their feature profiles. One 125 possible application of this method is in the assessment of different toxicities with 126 different cell death mechanisms. In this section, we will use a toy dataset to 127 demonstrate how Eq. scores can be utilized to better understand and compare the toxic 128 effects of various treatments.

129
Assume we have two known toxicities, Toxicity 1 and Toxicity 2, where each serves 130 as a reference treatment in two separate PLS/OPLS models. We can then predict the 131 Eq. scores of other treatments i.e. Treatment 1, 2 and 3 with the two models. The Eq. 132 scores can then be used as axes in a scatter plot to visualize the relationships among 133 different treatments and their toxicities. Fig. 3 provide insights into the extent to which 134 each treatment's effects resemble the reference toxicities. From Fig. 3a, we can infer  that Treatment 1 exhibits a stronger effect related to Toxicity 1, while Treatment 2 has 136 a weaker effect related to Toxicity 2. It is possible to speculate that both treatments are 137 the same as Toxicity 1 and 2, respectively, but at different concentrations. Interestingly, 138 Treatment 3 appears to exhibit a combined effect of both Toxicity 1 and Toxicity 2.

139
The same thing can be observed in Fig. 3b and c but in a bar plot for the equivalence of 140 Toxicity 1 and 2 respectively. This type of plot is particularly useful if we only have one 141 reference treatment we want insights into. control group and predicting the other compounds with it, we get Eq. scores. We can 150 use the Eq. scores as axes in a scatter plot to see relations between treatments, such as 151 in Fig. 4. In Fig. 4a we see distinct clustering of treatment groups. Noticeably,

158
Expanding further, we can incorporate the SSE of each model into the interpretation 159 of the results. The SSE can also be used as a scaling factor to adjust and correct the 160 predictions if they are not in a known feature space of the PLS model. Fig. 5 presents 161 the results from models fitted on SB-202190, purvalanol-a and PD-98059 in bar plots. 162 In Fig.5b, the Eq. scores from the SB-202190 model are displayed and we can see that  SB-203580, which shares the same target, has the highest Eq. score. Furthermore, in 164 Fig.5c it also has the lowest SSE, meaning that the prediction of the Eq. score is within 165 the confidence space of the model. Based on these two scores, we calculate a weighted 166 score in Fig.5a.

167
Similar results can be observed in the second row for the purvalanol-a model's 168 prediction. However, since aminopurvalanol-a has a high SSE, the weighted score in Fig 169  .5d is considerably lower than the Eq. score.

170
In the third row, we present the results from the PD-98059 model, and we observe a 171 different trend. The treatment with the same target, ZM-336372, has a negative Eq. By combining the Eq. scores of the six compounds, we create a new feature space with 176 the same number of features as compounds. By applying PCA to these new features we 177 get principal components that capture the directions with the most systematic variation. 178 A plot of the first two principal components is displayed in Fig. 6. The groups are  Fig. 6 summarizes the similarities of the compounds in a two-dimensional plane. It 187 is important to note the difference between Fig. 6 and 5a. Fig. 5a uses the Eq. scores 188 of two compounds as its axes while Fig. 6 uses the Eq. scores of all six compounds and 189 projects into two dimensions.
Benchmarking CellProfiler features vs Eq. Scores 191 So far, we have only looked at a portion of the full JUMP-CP1 dataset to 192 comprehensibly illustrate the methods and its result. Now, we will apply it on a larger 193 scale on all 303 compound treatments, cell lines and time points as in Fig. 2c. We use a 194 benchmarking metric developed by Srinivas Niranj Chandrasekara at Broad Institute 195 called the fraction of positive (fp) of the mean Average Precision (mAP) [14,20]. This 196 metric is presented in Fig. 7, for a comparison of replicability in terms of the fp between 197 the predicted Eq. Scores, where the prediction is based on CellProfiler features, and the 198 CellProfiler features themselves. Although the Eq. scores are based on the CellProfiler In Fig. 8 we look at the mAP on which the fp in Fig. 7 are based, for the Eq. scores 203 and CellProfiler features. Although the difference between the mAP of the Eq. scores Feature extraction from cell images and the post-processing of these are hot topics 207 today, particularly within the biopharmaceutical industry. There are great tools 208 available to extract these features but the tools to explore, interpret and discover 209 similarities between different treatments, and especially between modalities, are falling 210 behind. This is a challenging task since the number of features is generally high in 211 comparison to the number of replicates. Luckily, this skewed relationship between 212 observations and features has been the standard in chemometric problems for decades 213 where the family of PLS methods has shown great success. 214 We have demonstrated that our proposed PLS/OPLS-based method can be useful  The Eq. scores perform consistently better than the CellProfiler features in terms of 222 the benchmarking metric defined by Niranj [20]. PLS/OPLS models focus on amplifying 223 the features that are structurally important to distinguish a treatment from the 224 negative controls. Thus, we should expect the Eq. scores to perform better than the 225 original features as the models reconstruct the features resulting in a stronger, less noisy 226 signal. Compounds typically have a clear effect in which the PLS/OPLS models can 227 easily find the important variables and amplify the signal accordingly.

228
In drug screening and development, it is common that we know the gene we want to 229 target to treat a disease but not which compound candidates target the correct genes. 230 With this approach, we can more easily than before create an Eq. score of the gene or 231 genes we want to target and then compare the equivalence of compounds and 232 CRISPR-treated observations. Creating PLS/OPLS models based on the compound 233 target and looking at the Eq. score of the CRISPR treatment will also give us more 234 insight if the compound has other effects as well. This is a clear difference compared to 235 correlation since correlations are symmetric and the PLS/OPLS models will be different. 236 We can also modify the method to suit different datasets. One example is where the 237 concentrations of the compounds differ. In that case, the intensities set as the target for 238 the PLS/OPLS models could be set as 1 and 2 for two different concentrations 239 representing the different levels of effects. This, of course, is if the treatments are 240 expected to behave in such a way. 241 We are looking forward to exploring the combination of chemometric approaches 242 with the strengths of modern deep-learning approaches. We believe that these