LEA: Latent Eigenvalue Analysis in application to high-throughput phenotypic profiling

Understanding the phenotypic characteristics of cells in culture and detecting perturbations introduced by drug stimulation is of great importance for biomedical research. However, a thorough and comprehensive analysis of phenotypic heterogeneity is challenged by the complex nature of cell-level data. Here, we propose a novel Latent Eigenvalue Analysis (LEA) framework and apply it to high-throughput phenotypic profiling with single-cell and single-organelle granularity. Using the publicly available SARS-CoV-2 datasets stained with the multiplexed fluorescent cell-painting protocol, we demonstrate the power of the LEA approach in the investigation of phenotypic changes induced by more than 1800 drug compounds. As a result, LEA achieves a robust quantification of phenotypic changes introduced by drug treatment. Moreover, this quantification can be biologically supported by simulating clearly observable phenotypic transitions in a broad spectrum of use cases. Lastly, we describe the LEA pipeline in detail and illustrate the application to further use cases in the biomedical image domain, underlining the domain-agnostic facet of our methodological approach. In conclusion, LEA represents a new and broadly applicable approach for quantitative and interpretable analysis of biomedical image data. Our code and video demos are publicly available via https://github.com/CTPLab/LEA.


Introduction
Phenomics [1], the systematic study of traits that make up a phenotype, has been a main driver of novel scientific insights into the pathogenesis of human diseases. This rapid progress is supported by the increasing availability of phenomic datasets in the biomedical domain [2,3]. However, a thorough and fine-grained analysis of phenotypic heterogeneity is often challenged by the volume and high-dimensional nature of biomedical datasets. In cell-based drug screening, the emergence of novel fluorescent imaging protocols allows relevant cellular components and organelles (e.g., Nucleus (DNA), Endoplasmic reticulum (ER), Actin (Actin), Nucleolus and cytoplasmic RNA (RNA), Golgi and plasma membrane (Golgi)) to be revealed in a highly multiplexed, high-content manner. Following the cell-painting protocol [4], two large-scale drug screening datasets, RxRx19 (a, b) [5], have recently been released, which include more than 1800 drug compounds with up to 8 different concentrations that are tested on 3 different cell-lines. These high-throughput phenomic libraries have stimulated the development of novel approaches for analyzing phenotypic effects introduced by drug treatments [5]. However, current methods fail to fully utilize the single-cell and highly multiplexed nature of these datasets, leaving much to be discovered.
In existing studies [5,6], researchers usually start the analysis by downsizing the raw image read-outs or by excluding some fluorescent channels [6]. In a supervised manner, classification features of the entire image can then be learned to determine the phenotypic changes of a cell population induced by different drug compounds, and these can be linked to treatment efficacy [5] (Fig. 1 (a)). However, how to conduct a more in-depth analysis at the single-cell and/or single-organelle level, so as to understand drug effects in a concentration-specific manner, remains an open and essential question. From a technical perspective, handling such massive datasets poses further statistical challenges [7]. New approaches for the comprehensive quantification of data heterogeneity are needed for the analysis of high-dimensional datasets [8] in different domains.
In the biomedical domain, statistical tests such as the F-test [9] and Student's t-test [10] are commonly used to examine the statistical discrepancy between two collections of heterogeneous tabular data. Despite demonstrable success in theoretical studies [11][12][13], it is far from straightforward to apply them to real-world multi-dimensional cases. For instance, we found [14] that the p-values computed with these statistical tests [9,10,15,16] are not robust when applied to real-world clinical data, and consequently undermine the accuracy of identifying causal associations between diagnostic features and patient outcome.
In the deep learning domain, the importance of measuring the difference (heterogeneity) between fake and real data has been recognized in parallel to the development of generative adversarial nets (GAN) [17][18][19]. To measure the quality of GAN reconstructions, researchers have proposed a variety of evaluation methods such as Fréchet Inception Distance ($d_{\mathrm{FID}}$) [18], Inception Score [20] and Kernel Inception Distance ($d_{\mathrm{KID}}$) [21]. These approaches could be used to differentiate image data distributions for the analysis of biomedical datasets. Nevertheless, it is not trivial to derive a multi-dimensional quantitative understanding with these scores, nor can we directly support them with plausible visual explanations. Therefore, they are less satisfactory for critical biomedical applications. Motivated by the emergence of (unsorted) eigenvalues in the improved implementation of $d_{\mathrm{FID}}$, we recently [22] suggested comparing sorted eigenvalues ($d_{\mathrm{Eig}}$) as a simple alternative to $d_{\mathrm{FID}}$. For $i = 1, 2$, let $Z_i := (z^i_1, \dots, z^i_{n_i})$ be a collection of $n_i$ $p$-dimensional vectors, and let $S_i$ be the sample covariance matrix (SCM) of $Z_i$. This leads to the definition of $d_{\mathrm{Eig}}(S_1, S_2)$ (Definition 1), which compares the sorted eigenvalues $\lambda^i_j$, where $\lambda^i_j$ is the $j$-th largest eigenvalue of $S_i$.
By quantifying the multi-dimensional eigenvalue difference, $d_{\mathrm{Eig}}$ can provide informative measurements along principal axes and facilitate a more complete analysis of data heterogeneity.
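
As a rough illustration of comparing sorted eigenvalues, the sketch below computes the sample covariance matrices of two collections and a scalar distance between their sorted spectra. The exact form of the distance is not reproduced in this text, so the root-mean-square log-ratio used in `eig_distance`, together with all function names, is an assumption made purely for this example.

```python
import numpy as np

def sample_covariance(z: np.ndarray) -> np.ndarray:
    """Sample covariance matrix of an (n, p) array whose rows are p-dimensional vectors."""
    return np.cov(z, rowvar=False)

def sorted_eigenvalues(scm: np.ndarray) -> np.ndarray:
    """Eigenvalues of a symmetric matrix, largest first."""
    return np.linalg.eigvalsh(scm)[::-1]

def eig_distance(scm_1: np.ndarray, scm_2: np.ndarray, eps: float = 1e-12) -> float:
    """ASSUMED form: RMS of the log-ratios between sorted eigenvalues."""
    lam_1 = np.clip(sorted_eigenvalues(scm_1), eps, None)
    lam_2 = np.clip(sorted_eigenvalues(scm_2), eps, None)
    return float(np.sqrt(np.mean((np.log(lam_1) - np.log(lam_2)) ** 2)))

rng = np.random.default_rng(0)
z_1 = rng.normal(size=(2000, 8))
z_2 = 2.0 * rng.normal(size=(2000, 8))  # every axis has four times the variance

s_1, s_2 = sample_covariance(z_1), sample_covariance(z_2)
d_same = eig_distance(s_1, s_1)   # identical spectra
d_diff = eig_distance(s_1, s_2)   # roughly log(4), up to sampling noise
print(d_same, d_diff)
```

Because the comparison is made eigenvalue by eigenvalue, the per-axis log-ratios can also be inspected individually instead of being averaged into one number.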

Quantification of phenotypic heterogeneity
Based on the theoretical foundation behind $d_{\mathrm{Eig}}$ [22], we propose a novel latent eigenvalue analysis (LEA) for high-throughput phenotypic profiling (Fig. 1). In the study of $d_{\mathrm{Eig}}$, $Z_i$ is usually the collection of features obtained from the penultimate layer (pool3) of an Inception V3 model [23] that is trained for an ImageNet classification task. However, such an Inception model trained with ImageNet is not suitable for deriving meaningful features of multiplexed single-cell images. Instead, we utilize GAN inversion [24] and propose to learn the latent representations $Z_{i,c}$ on the $c$-th fluorescent channel of center-cropped single-cell images (Fig. 1). Let $(S_{i,k})$ be the collection of sample covariance matrices (SCMs) of $Z_{i,k}$; then we define $d_{\mathrm{LEA}}$ (Eq. 2) on the $p_0$ largest eigenvalues, where $p_0 \ll p$ and $S_1$ is the reference SCM.
Similar to Principal Component Analysis (PCA) [25], we only utilize the $p_0 \ll p$ largest eigenvalues, which reflect the largest variances and the most critical information. As the 5 largest eigenvalues account for more than 95% of the overall eigenvalue sum in the experiments, we set $p_0 = 5$ throughout the article (please see 'Results' and 'Methods' for more detail). The reference $S_1$ can be concretely determined in a given dataset, e.g., the SCM of mock cell read-outs in the drug screening study. We thus propose a novel quantitative method for phenotypic profiling of cells at single-organelle resolution.
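
The truncation to the largest eigenvalues can be sketched as follows. This is an illustration under stated assumptions rather than the LEA implementation: the synthetic latent vectors, the function names, and the log-ratio distance in `truncated_distance` are all hypothetical, and "share of the overall values" is interpreted here as the share of the eigenvalue sum.

```python
import numpy as np

def top_eigenvalues(z: np.ndarray, p0: int = 5):
    """Return the p0 largest SCM eigenvalues and their share of the eigenvalue sum."""
    scm = np.cov(z, rowvar=False)
    lam = np.linalg.eigvalsh(scm)[::-1]          # largest first
    return lam[:p0], float(lam[:p0].sum() / lam.sum())

def truncated_distance(z_ref: np.ndarray, z_cmp: np.ndarray, p0: int = 5) -> float:
    """ASSUMED distance: RMS log-ratio restricted to the p0 largest eigenvalues."""
    lam_ref, _ = top_eigenvalues(z_ref, p0)
    lam_cmp, _ = top_eigenvalues(z_cmp, p0)
    return float(np.sqrt(np.mean(np.log(lam_cmp / lam_ref) ** 2)))

rng = np.random.default_rng(1)
# Synthetic 64-dimensional latents: five dominant axes, 59 low-variance axes.
scales = np.concatenate([[10.0, 8.0, 6.0, 5.0, 4.0], np.full(59, 0.1)])
z_mock = rng.normal(size=(5000, 64)) * scales

lam_top, share = top_eigenvalues(z_mock, p0=5)
print(lam_top, share)
print(truncated_distance(z_mock, z_mock))
```

Checking `share` on real latent representations is a quick way to judge whether a chosen `p0` is large enough for a given dataset.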

Visualization of phenotypic transitions
Complementary to $d_{\mathrm{LEA}}$, which measures the eigenvalue difference along each principal axis, we simulate observable phenotypic transitions by manipulating the principal component(s). This enables direct linkage of the observed phenotypic heterogeneity in the given datasets with human-interpretable biological information. Notably, there have been previous studies on understanding latent semantic transitions for natural images [26][27][28][29]. To probe the latent semantics of generative models, many of these investigations conducted image manipulations on fake images, where the manipulations are either unrelated or only loosely related to a quantitative measurement. For example, Härkönen et al. [27] proposed to edit fake images by adding weighted eigenvectors to their latent representations.
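
A minimal sketch of manipulating a latent representation along its leading principal axes is given below. It assumes that a latent vector is shifted along the leading eigenvectors of the covariance of the latent collection, with steps scaled by the square root of each eigenvalue; the scaling, the function names, and the synthetic data are assumptions for illustration, and decoding the edited latents back into images with a generator is omitted.

```python
import numpy as np

def leading_axes(z: np.ndarray, k: int = 1):
    """Return the k largest eigenvalues and matching eigenvectors of the SCM of z."""
    scm = np.cov(z, rowvar=False)
    lam, vec = np.linalg.eigh(scm)               # eigh returns ascending order
    order = np.argsort(lam)[::-1][:k]
    return lam[order], vec[:, order]             # shapes (k,), (p, k)

def shift_latent(z_single, lam, axes, alphas):
    """One edited latent per alpha: z + alpha * sum_j sqrt(lam_j) * v_j (assumed scaling)."""
    direction = axes @ np.sqrt(lam)              # shape (p,)
    return np.stack([z_single + a * direction for a in alphas])

rng = np.random.default_rng(2)
z = rng.normal(size=(1000, 16)) * np.linspace(3.0, 0.5, 16)   # synthetic latents

lam, axes = leading_axes(z, k=1)
alphas = np.linspace(-2.0, 2.0, 5)               # alpha = 0 leaves the latent unchanged
edited = shift_latent(z[0], lam, axes, alphas)

# The coordinate along the largest principal axis grows from left to right.
projection = edited @ axes[:, 0]
print(projection)
```

Passing each row of `edited` through a trained decoder would yield an image sequence ordered by increasing displacement along the chosen axes.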

Contributions
By combining the quantification and visualization components, we developed the proposed Latent Eigenvalue Analysis (LEA) pipeline. Our contributions can be summarized as follows:
• By comparing the largest eigenvalues, we propose the numerically robust quantification $d_{\mathrm{LEA}}$ of phenotypic heterogeneity in multiplexed fluorescent image datasets. As a direct application of $d_{\mathrm{LEA}}$, we refine the high-throughput cell-based drug analysis to single-cell and single-organelle granularity.
• By manipulating the largest principal components, we provide phenotypically plausible visual explanations of $d_{\mathrm{LEA}}$. In the context of domain knowledge, these transitions can support novel interpretations of drug effects and drug response heterogeneity.
• Consequently, we demonstrate the strength of LEA with two large-scale SARS-CoV-2 datasets for high-throughput cell-based drug screening. To conclude, we illustrate the domain-agnostic facet of LEA with further use cases and confirm its wide applicability in biomedical research.

Results
Driven by the need to find effective antiviral treatments against the SARS-CoV-2 virus, the development of computational methods to support high-content screening for drug repurposing has progressed rapidly in the last two years [31]. Here, we describe the LEA approach and report the application of LEA to two large-scale phenomic libraries RxRx19 (a,b) released by Recursion [5].

Definition 4. Following the specification of the SCMs $S_{\mathrm{Mock}}$, $S_{\mathrm{Infected}}$, $S_{\mathrm{Drug}}$ as in Eq. 2, we define [...]

Importantly, the proposed LEA is well calibrated by the small distance between mock control and irradiated control cells (Fig. 2 (b,d,g,i)), which is consistent with the expected similar phenotypic and biological characteristics shared by the control conditions 'mock' (cells in culture medium without viral stimulation) and 'irradiated' (cells in culture medium incubated with the inactivated virus). In contrast, such a verifiable calibration cannot be reproduced with the ensemble approach. For the HRCE experiment, Fig. 6 (c, d) in the Appendix shows an inexplicably small difference between mock and infected cell populations as compared to the control conditions, which contradicts the expected phenoprint heterogeneity induced by the virus. In terms of the largest eigenvalues, $p_0 = 5$ robustifies the drug effect quantification of $d_{\mathrm{LEA}}$ and shows an improved consistency with the hit score [5], which clearly differs from $p_0 = 1$, a balanced alternative of $d_{\mathrm{Eig}}$ (see Fig. 7 (VERO) and Fig. 8 (HRCE) in the Appendix for more detail).

Overall comparison
For a thorough comparison, we screened all drug compounds tested on VERO and HRCE cell-lines (with the exclusion of three drugs that have duplicated or ambiguous names). Despite fundamentally different model designs (Fig. 1 (a,b)), our quantitative score $d_{\mathrm{LEA}}$ demonstrates an overall consistent correlation with the baseline hit score [5]: the lower $d_{\mathrm{LEA}}$ is, the higher the baseline hit score is. Importantly, the effect estimation for a given drug compound can be directly derived from $d_{\mathrm{LEA}}$, while a manual threshold determination is required for the hit score (baseline) [5]. As displayed in Fig. 2 (b, c) for the VERO experiments, $d_{\mathrm{LEA}}$ shows a mild yet meaningful decreasing trend with growing hit scores and identifies Remdesivir and its parent nucleoside GS-441524 as efficacious compounds when using all their latent representations, independent of concentration.
Further, LEA allows the optimal drug concentration to be taken into consideration, and thus achieves a superior resolution in identifying effective drug candidates, as illustrated in Fig. 2 (d).

Fine-grained quantification and visual interpretation
Motivated by these findings and following the overall comparisons to the baseline hit score [5], we report the fine-grained quantification on both VERO and HRCE experiments for individual fluorescent channels and drug concentration levels. As in the overall analysis, the small difference between mock and irradiated control is correctly captured in individual fluorescent channels for both experiments, which serves as an important sanity check for the stratified quantification. Regarding the drugs of interest presented in Fig. 3 [...] clinical utility of Remdesivir approved by the U.S. Food and Drug Administration (FDA; see footnote 1). Meanwhile, the heterogeneous effects of Chloroquine and Hydroxychloroquine are also revealed in our refined analysis. For instance, the negative hit results in the Actin, RNA and Golgi channels eventually undermine the overall performance of both candidates and agree with the negative evidence for both drugs derived from real-world clinical studies [35]. Furthermore, the PCA plots displayed in (b) of Fig. 3 and 4 clearly support the efficacious hit results achieved by Remdesivir. With k-means clustering on the largest principal component(s), we observe meaningful groups based on the nucleus morphology (DNA) for both VERO and HRCE. In addition, striking phenotypic transitions arise in other understudied channels. Taking the RNA channel as a concrete example, the largest eigenvalues ($\times 10^5$) of mock versus infected cells are 1.75 versus 2.20 for VERO and 1.23 versus 1.28 for HRCE. As shown in (b) of Fig. 3 and 4, the cellular sequences presented from left to right, obtained by enlarging the largest principal component(s), imply increased RNA production in the cytoplasm. This observation is biologically plausible, as SARS-CoV-2 expresses RNA-dependent RNA polymerase as well as a large number of supporting factors to transcribe and replicate the viral genome in infected cells [36]. Viral infection of host cells thus leads to massive upregulation of the production of viral RNA in the cytoplasm, which is correctly identified by the $d_{\mathrm{LEA}}$ analysis at subcellular resolution. Importantly, Remdesivir acts as a nucleoside analog and stalls the RNA-dependent RNA polymerase of coronaviruses [37]. Our analysis identifies this effect at the phenotypic level, as Remdesivir treatment calms the hyper state, reflected by a shift of the largest eigenvalue from 2.20 to 1.79 (VERO) and from 1.28 to 1.26 (HRCE). Taking the 5 largest eigenvalues as a whole, we further robustify the quantification and differentiate the positive drugs from the negative ones, while demonstrating persistent transitions for all fluorescent channels.

From in vitro to in vivo studies
In the cell-based in vitro studies, we have shown the refinement and improvement achieved by LEA on the animal (VERO) and human (HRCE) cell-lines. Importantly, the key takeaways for the drugs of interest are well supported by relevant clinical studies. Nevertheless, we acknowledge the limitation of LEA on the human umbilical vein endothelial (HUVEC) cell-line (RxRx19b), which models cytokine storm conditions in severe COVID-19 [38]. As displayed in Appendix Fig. 9, the overall drug identifications achieved by LEA under these conditions are less consistent with the baseline results. Although the least effective drug candidates identified by the baseline [5] are likely to be assigned a high $d_{\mathrm{LEA}}$ score, the association is less clear for drug candidates with a positive baseline hit score (e.g., c-MET inhibitors in Fig. 9 (b)). Such inconsistency between LEA and the baseline approach [5] suggests that the current level of evidence remains inconclusive. Whether this observation is due to a high level of variance or a true lack of efficacy of these drug candidates requires follow-up in vivo studies.

Discussion
In this study, we proposed a novel latent eigenvalue analysis (LEA) approach for high-throughput phenotypic profiling of biomedical image datasets. We demonstrated the practical application of LEA on two large-scale drug screening datasets at single-cell and single-organelle granularity. Through application to well-defined biological settings, we achieve a refinement of high-throughput drug screening under the cell-painting protocol. Importantly, we verify the LEA results in direct comparison to the baseline method [5], which is carried out with a fundamentally different approach. As such, LEA can pave a promising path toward real-world applications in routine drug screening practice. Further, our LEA approach is not limited to multiplexed fluorescent image data as the input modality. In a different biomedical use case, we apply LEA to the skin lesion dataset HAM10000 [39], quantifying feature heterogeneity across different lesion categories and supporting clinical diagnosis with human-interpretable phenotypic transitions. From a domain perspective, our approach thus opens up new possibilities for the high-throughput analysis of biomedical datasets, and for the re-interpretation of feature heterogeneity in a biomedical context. In conclusion, the proposed LEA can serve as a useful analytical tool that is widely applicable in biomedical research.

Methods
In reference to Fig. 1 (b), here we describe the architecture of LEA in detail and provide insights into the evaluation of model performance. This is done by conducting a separate series of experiments on the HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions [39]. Complementary to the above drug screening studies, these experiments support the technical robustness of the LEA pipeline to various types of input data and showcase its domain-agnostic facet. As HAM10000 includes clinical images of human skin lesions, it is easily interpretable by domain experts and allows conceptual validation of the LEA results in the context of well-established disease categories. Concretely, the HAM10000 [39] dataset has 7 categories of skin lesion images: actinic keratoses (akiec), basal cell carcinoma (bcc), benign keratosis (bkl), dermatofibroma (df), melanocytic nevi (nv), melanoma (mel) and vascular skin lesion (vasc). As the RGB channels of skin lesion images jointly inform the clinical presentation, we train the LEA pipeline on images with all RGB channels simultaneously. To probe $d_{\mathrm{LEA}}$ within such a distinct domain, we design simple interpolation experiments among different categories as follows. Considering the benign nevus ('nv') collection of images as the reference and comparing it to a malignant counterpart (e.g., malignant melanoma, 'mel'), we randomly mix the images of 'nv' and 'mel' (e.g., $x_{\mathrm{mel},1}, \dots, x_{\mathrm{mel},n_{\mathrm{mel}}}, x_{\mathrm{nv},1}, \dots, x_{\mathrm{nv},n_{\mathrm{nv}}}$) according to the interpolation weight $w = \frac{n_{\mathrm{nv}}}{n_{\mathrm{nv}} + n_{\mathrm{mel}}}$. Then, we measure the distribution difference between nv and the mixed collection by $d_{\mathrm{LEA}}$. Ideally, we should observe that $d_{\mathrm{LEA}}$ converges to 0 when $w$ is shifted from 0 to 1 with the increasing inclusion of 'nv' images ($n_{\mathrm{nv}} \uparrow$) and exclusion of 'mel' images ($n_{\mathrm{mel}} \downarrow$) (Fig. 5 (b)). To avoid sample imbalance between the reference (nv, 6705 images) and the compared categories, we take mel (1113), bkl (1099), and bcc (512) for comparison.

Footnote 1: https://www.fda.gov/drugs/emergency-preparedness-drugs/coronavirus-covid-19-drugs
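
The interpolation experiment can be sketched on synthetic data as follows. Everything here is a stand-in: the Gaussian "latents", the function names, the mixture size, and the log-ratio distance over the 5 largest eigenvalues are assumptions for illustration and do not reproduce the HAM10000 results.

```python
import numpy as np

def top_eigs(z: np.ndarray, p0: int) -> np.ndarray:
    """The p0 largest eigenvalues of the sample covariance matrix of z."""
    return np.linalg.eigvalsh(np.cov(z, rowvar=False))[::-1][:p0]

def distance(z_ref: np.ndarray, z_cmp: np.ndarray, p0: int = 5) -> float:
    """ASSUMED distance: RMS log-ratio of the p0 largest eigenvalues."""
    return float(np.sqrt(np.mean(np.log(top_eigs(z_cmp, p0) / top_eigs(z_ref, p0)) ** 2)))

def mix(z_nv, z_other, w, n, rng):
    """Draw n samples with weight w = n_nv / (n_nv + n_other) taken from z_nv."""
    n_nv = int(round(w * n))
    picked_nv = z_nv[rng.choice(len(z_nv), n_nv, replace=False)]
    picked_other = z_other[rng.choice(len(z_other), n - n_nv, replace=False)]
    return np.concatenate([picked_nv, picked_other])

rng = np.random.default_rng(3)
z_nv = rng.normal(size=(3000, 8))            # stand-in for the reference category
z_mel = 1.5 * rng.normal(size=(3000, 8))     # stand-in with larger variance

weights = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = [distance(z_nv, mix(z_nv, z_mel, w, 1000, rng)) for w in weights]
for w, s in zip(weights, scores):
    print(f"w={w:.2f}  distance={s:.3f}")
```

With a well-behaved measure, the printed distance should shrink toward a small residual value as `w` approaches 1; the residual reflects sampling noise from drawing a finite mixture.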

Model architecture
Motivated by the impressive achievements of GAN inversion [40], we instantiate LEA with a state-of-the-art GAN inversion model (Fig. 1 (b)). Firstly, we learn the decoder (generator) under the StyleGAN [41,42] framework, which has proved to be successful in synthesizing high-quality natural images. Based on the training protocols suggested in the widely-used repositories 23 , we pre-train StyleGAN2 [41] and StyleGAN3 [42] on HAM10000 to obtain a decoder (generator) that can synthesize faithful skin lesion images (please see Appendix Fig. 10). Next, we launch the encoder training to learn robust latent representations for image reconstruction. Specifically, we apply two practical architectures, 'encoder for editing' [43] (e4e) and 'pixel2style2pixel' [44] (pSp), for comparison, both of which start with a ResNet backbone followed by a feature pyramid network [45]. Similar to the loss design of these studies, which enables high-fidelity image reconstruction, we set our objective to $\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{moco}} + \lambda_2 \mathcal{L}_2$, where $\mathcal{L}_{\mathrm{moco}}$ is the contrastive loss that is superior in visual representation learning [46], and $\mathcal{L}_2$ is the $\ell_2$ reconstruction loss. Finally, we report the quantitative reconstruction results for the compared architectures in Tab. 2.

pSp VS e4e
In terms of quantitative scores such as PSNR and SSIM, the pSp encoder in combination with either the StyleGAN2 or StyleGAN3 decoder outperforms e4e by a clear margin, suggesting superior image reconstruction quality. This can also be verified by the image samples presented in Fig. 5 (a): the images reconstructed from the representations of the pSp encoder preserve finer detail of lesion demarcation and skin pigmentation, while the e4e encoder tends to produce blurrier reconstructions. As a result, we take the pSp architecture as the default encoder.
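
For reference, PSNR can be computed from the mean squared error between an image and its reconstruction. The sketch below uses the standard definition; the 8-bit peak value of 255 and the function name are assumptions about the image range and are not taken from the LEA code.

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                      # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

image = np.zeros((64, 64, 3), dtype=np.uint8)
off_by_one = np.ones((64, 64, 3), dtype=np.uint8)   # every pixel differs by 1

print(psnr(image, image))
print(psnr(image, off_by_one))
```

SSIM additionally compares local luminance, contrast, and structure, so the two scores are usually reported together.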

StyleGAN2 VS StyleGAN3
With regard to nv, $d_{\mathrm{LEA}}$ of StyleGAN3_pSp surprisingly suggests a larger data difference for bkl (bcc) than for mel. This is also counter-intuitive, as malignant neoplasms (mel) are well known to present distinct appearances in lesion size and pigmentation heterogeneity, allowing these lesions to be clearly differentiated from a benign mole (nv). In addition, we notice that the learned representations of StyleGAN3 tend to be more convoluted and are thus less suited to support a clear biological interpretation (see, for example, Appendix Fig. 11). StyleGAN3 is motivated by the texture-sticking drawback occurring in natural images and imposes translation and rotation equivariance on the learned representations [42], which may explain why it does not adapt well to biomedical images from substantially distinct modalities. This is also reflected by the drawbacks identified by Alaluf et al. [47] for natural images.
Based on these results, we set StyleGAN2_pSp as the default architecture for conducting the drug screening and skin lesion experiments throughout this article.

Further comparisons
Next, we investigate the performance of $d_{\mathrm{LEA}}$ with regard to the number $p_0$ of largest eigenvalues utilized in Eq. 2 (Fig. 5 (b)).

Clinical Interpretations
As shown in Fig.

Figure 1. Model illustrations for the baseline method (Cuccarese et al. [5]) (a) and the proposed single-cell LEA approach (b). a, In the baseline method, a variant of DenseNet-161 is trained on the downsized multiplexed fluorescent images for classifying the drug compounds (Remdesivir, Tofacitinib, Bortezomib shown as exemplars). Then, the learned features are utilized to analyze the effectiveness of different drugs. b, In LEA, we start the pipeline by pre-training the decoder on center-cropped single-cell images in an unsupervised manner for each fluorescent channel. Then, we learn robust latent representations with a residual-based encoder [30] for reconstructing these single-cell images. For quantifying the drug effects, we compute the eigenvalues with the learned single-cell representations and support our quantitative results with clearly observable phenotypic transitions.

The RxRx19 (a, b) libraries released by Recursion [5] document the effects of more than 1800 drug candidates on Severe Acute Respiratory Syndrome Coronavirus Type 2 (SARS-CoV-2) infection and associated systemic inflammation, using the multiplexed fluorescent cell painting protocol on human and animal cell-lines. We set the drug hit score proposed by Cuccarese et al. [5] as the baseline and demonstrate the performance of $d_{\mathrm{LEA}}$. To investigate the phenotypic effects of drug candidates at single-cell resolution, we carried out cell segmentation using the DNA channel (please see the Mahotas documentation [32] or our code repository). Accordingly, we derive 23 million $64 \times 64$ single-cell training images from 0.37 million raw images. This allows us to analyze drug effects on individual cell organelle components, greatly extending the range of detectable phenotypic perturbations. As shown in Fig. 1 (a,b), our approach, which uses unsupervised training on the center-cropped cell images for the individual fluorescent channels, differs greatly from the published baseline, which uses supervised training on the entire image read-outs. As the phenomic library (RxRx19a) has four cell conditions, namely mock control (Mock), irradiated control (Irradiated), infected without drug treatment (Infected), and infected with different drug treatments (Drug), we set the reference $S_1 = S_{\mathrm{Mock}}$ (Eq. 2), corresponding to the 'Mock' latent representations, and then determine the effect of a drug based on whether it reverses the phenoprint of infected cells. $d_{\mathrm{LEA}}$ is provided as a percentage ($\times 100$) in the following plots for clearer visualization.
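
The center-cropping step can be sketched as follows, assuming that nucleus centroids have already been obtained from a segmentation of the DNA channel (not shown). The function name and the choice to skip cells whose window would leave the image are assumptions for this illustration, not a description of the released pipeline.

```python
import numpy as np

def center_crops(image: np.ndarray, centroids, size: int = 64) -> np.ndarray:
    """Cut size x size windows centred on (row, col) centroids of an (H, W, C) image."""
    half = size // 2
    height, width = image.shape[:2]
    crops = []
    for row, col in centroids:
        top, left = int(round(row)) - half, int(round(col)) - half
        # Assumption: cells too close to the border are skipped, not padded.
        if top < 0 or left < 0 or top + size > height or left + size > width:
            continue
        crops.append(image[top:top + size, left:left + size])
    if not crops:
        return np.empty((0, size, size) + image.shape[2:], dtype=image.dtype)
    return np.stack(crops)

rng = np.random.default_rng(4)
raw = rng.integers(0, 256, size=(256, 256, 5), dtype=np.uint8)   # 5 fluorescent channels
centroids = [(128, 128), (10, 10), (200.4, 100.6)]               # second one is near the border

cells = center_crops(raw, centroids)
print(cells.shape)   # two crops survive the border check
```

Each channel of a crop, e.g. `cells[:, :, :, 0]`, can then be fed to the channel-specific model.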

Figure 2. Reconstruction visualization of LEA and quantitative comparison of drug responses between the baseline (Cuccarese et al. [5]) and $d_{\mathrm{LEA}}$ (proposed). a (VERO) and f (HRCE): The reconstructed samples obtained by LEA. b (VERO) and g (HRCE): The quantitative comparison between the hit score [5] and $d_{\mathrm{LEA}}$ with the latent representations of all concentrations. c (VERO) and h (HRCE): The violin plot of the overall comparison between the hit score [5] and $d_{\mathrm{LEA}}$. d (VERO) and i (HRCE): The quantitative comparison between the hit score [5] and $d_{\mathrm{LEA}}$ with the latent representations of the optimal drug concentration. e (VERO) and j (HRCE): The hierarchical clustering of the top 50 drug compounds (where available) w.r.t. the 5 largest eigenvalues of the latent representations of the optimal drug concentration.

Importantly, Remdesivir and GS-441524 remain the top effective candidates among all the drug compounds (Fig. 2 (d)). The superior effectiveness of Remdesivir and GS-441524 can also be differentiated from other drugs by examining the hierarchical clustering (Fig. 2 (e)). For example, we observe distinct patterns of latent representations of the ER, RNA and Golgi channels for Remdesivir and GS-441524. This corresponds to the successfully reversed phenoprint reflected by the largest eigenvalues ($\times 10^5$), e.g., ER: 1.33 for Mock, 1.33 for GS-441524, 1.36 for Remdesivir, and 1.66 for Infected cells; RNA: 1.75 for Mock, 1.76 for GS-441524, 1.79 for Remdesivir, and 2.20 for Infected cells; Golgi: 3.23 for Mock, 3.20 for GS-441524, 3.23 for Remdesivir, and 3.69 for Infected cells. Similar to the VERO experiment, Fig. 2 (g) and (h) show that $d_{\mathrm{LEA}}$ remains well correlated with the baseline in the HRCE experiment. Strikingly, Remdesivir and GS-441524 are identified as strongly efficacious when computing the eigenvalues on the latent representations of the optimal drug concentration (Fig. 2 (i)), indicating verifiable positive drug effects achieved by both candidates. On the other hand, Chloroquine and Hydroxychloroquine demonstrate contradictory effects, both on the latent representations of the cells treated with different drug concentrations and on those of the optimal drug concentration; in both cases they are identified as negative by the $d_{\mathrm{LEA}}(S_{\mathrm{Mock}}, S_{\mathrm{Infected}})$ threshold (Fig. 2 (g) and (i)). Such inconsistency between the ineffective identification on the human cell-line and the effective identification on the animal cell-line undermines their fidelity in clinical treatment, which is in line with the fact that neither of them is recommended for treating hospitalized COVID-19 patients according to clinical studies [33,34]. If we examine the latent representations in more detail, Fig. 2 (j) highlights the unique patterns presented in the ER, Actin, and RNA channels for Remdesivir and GS-441524, which reveal novel and subtle phenotypic changes that were previously unidentified. Cell-level phenotype analysis by $d_{\mathrm{LEA}}$ can therefore provide novel insights and patterns in high-throughput drug-screening experiments as candidates for subsequent biological exploration.

Figure 3. Identification of drug-concentration dependent effects and visual interpretation for key drugs of interest in the VERO cell-line. a, The proposed $d_{\mathrm{LEA}}$ of different drug concentrations for individual and all fluorescent channels. Here, we report the mean $d_{\mathrm{LEA}}$ (with standard deviation) averaged on 4 randomly sampled cell collections. b, The PCA plots and phenotypic transitions driven by manipulating the largest (top) and 5 largest (bottom) principal component(s). The bounding box indicates the reconstructed image.

Figure 4. Identification of drug-concentration dependent effects and visual interpretation for key drugs of interest in the HRCE cell-line. a, The proposed $d_{\mathrm{LEA}}$ of different drug concentrations for individual and all fluorescent channels. Here, we report the mean $d_{\mathrm{LEA}}$ (with standard deviation) averaged on 4 randomly sampled cell collections. b, The PCA plots and phenotypic transitions driven by manipulating the largest (top) and 5 largest (bottom) principal component(s). The bounding box indicates the reconstructed image.

Figure 5. The data interpolation quantification and visual interpretation on the HAM10000 experiments. a, The reconstructed samples obtained by 4 different architectures. b, The $d_{\mathrm{LEA}}$ comparison of data interpolation regarding different architectures, number of the largest eigenvalues, and existing measurements. Here, we report the mean $d_{\mathrm{LEA}}$ and comparative measurements (with standard deviation) averaged on 4 randomly sampled data mixtures for a given interpolation weight. c, The PCA plots and morphological transitions driven by manipulating the largest principal components. The bounding box indicates the reconstructed image in each case.

Furthermore, when examining the behavior of $d_{\mathrm{LEA}}$ with increasing interpolation weight, we found notable differences between the StyleGAN2 and StyleGAN3 architectures. Compared to StyleGAN2, Fig. 5 (b) shows that $d_{\mathrm{LEA}}$ computed with StyleGAN3_e4e increases unexpectedly from $w = 0.25$ to $w = 0.5$ for the mixture data collection of nv and mel images, which conflicts with the fact that including more benign mole images should reduce the distance to the nv reference category.

As shown in Fig. 5 (c), with the largest principal component enlarged from left to right, the 'nv' images begin to show poor lesion demarcation and a further increase in lesion size and pigmentation heterogeneity, with the lesions displayed towards the right showing clear pathological changes towards a clinically suggestive appearance of malignancy. Considering the largest eigenvalue ($\times 10^5$) of 4.41 for nv versus 6.04 for mel, the appearance shift towards malignancy obtained by enlarging the principal component of the nv representations can indeed explain the eigenvalue difference 4.41 < 6.04. Comparable observations can also be made when investigating the 5 largest principal components. Apart from similar lesion size patterns arising from the k-means clustering, we show distinct samples clustered in the two groups (bottom rows of Fig. 5 (c)). Accordingly, the PCA plots regarding the 5 largest eigenvalues verify the distinguished yet consistent heterogeneity quantification among mel, bcc, bkl and nv.

Figure 8. Quantification results of $d_{\mathrm{LEA}}$ for HRCE w.r.t. different numbers of the largest eigenvalues. a: $d_{\mathrm{LEA}}$ computed with the latent representations of all drug concentrations. b: $d_{\mathrm{LEA}}$ computed with the latent representations of the optimal drug concentration.

Figure 9. The quantification results of the HUVEC experiment. a, The quantitative comparison between the hit score [5] and $d_{\mathrm{LEA}}$ with the latent representations of all drug concentrations (left) and the optimal drug concentration (right). Our drug effects (positive/negative) are thresholded by the $d_{\mathrm{LEA}}$ of storm-severe cells without drug treatment. b, The proposed $d_{\mathrm{LEA}}$ of different drug concentrations for individual and all fluorescent channels. Here, we report the mean $d_{\mathrm{LEA}}$ (with standard deviation) averaged on 4 randomly sampled cell collections.

Figure 10.

Figure 11. The phenotypic transitions driven by manipulating the largest principal component, which are derived from the latent representations of StyleGAN3_pSp.

Table 1. The numerical comparison of reconstruction results between the proposed and ensemble LEA on VERO and HRCE.

Table 2. The numerical comparison of reconstruction results among different model architectures on HAM10000.

Following the encoder architecture comparison, we investigate the variants of the StyleGAN decoder. While StyleGAN3_pSp achieves better PSNR and SSIM scores than StyleGAN2_pSp, the qualitative difference in image reconstruction appears marginal (Fig. 5 (a)).

As shown in Fig. 5 (b), $d_{\mathrm{LEA}}$ shows comparable decreasing scores as the interpolation weight increases, i.e., as more nv images are included, for $p_0 = 1, \dots, 5$. Such results demonstrate the feasibility and robustness of $d_{\mathrm{LEA}}$ computed with the largest eigenvalues for RGB imaging. For both the HAM10000 and RxRx19 datasets, we narrow down the number $p_0$ of largest eigenvalues to 5. In addition, we compare $d_{\mathrm{LEA}}$ against well-established statistical tests and widely used scalar-valued scores. For the former, we report the (average) p-values computed with two collections of eigenvalues. We further compute $d_{\mathrm{KID}}$ with multiple subsets of randomly sampled latent representations and $d_{\mathrm{Eig}}$ with two SCMs (Def. 1). Regarding the statistical tests, we observed either contradictory behavior from the F_test or uninformative results from the Levene_test and Wilcoxon_test, which confirms how challenging it is to adapt standard statistical tests to high-dimensional use cases. Although meaningful curves illustrating decreasing distances between nv and the compared categories can be obtained with $d_{\mathrm{KID}}$, it shows clear fluctuations with large standard deviations. This is mainly due to the additional randomness that comes from subset selection, which is not present in the other measurements. Without the imbalance issue regarding different channels introduced in the skin lesion dataset, $d_{\mathrm{Eig}}$ shows plausible decreasing trends similar to $d_{\mathrm{LEA}}$.

Table 3. The numerical comparison of reconstruction results among different model architectures on HUVEC.