A genome-wide atlas of human cell morphology

A key challenge of the modern genomics era is developing data-driven representations of gene function. Here, we present the first unbiased morphology-based genome-wide perturbation atlas in human cells, containing three genome-scale genotype-phenotype maps comprising >20,000 single-gene CRISPR-Cas9-based knockout experiments in >30 million cells. Our optical pooled cell profiling approach (PERISCOPE) combines a de-stainable high-dimensional phenotyping panel (based on Cell Painting 1,2) with optical sequencing of molecular barcodes and a scalable open-source analysis pipeline to facilitate massively parallel screening of pooled perturbation libraries. This approach provides high-dimensional phenotypic profiles of individual cells, while simultaneously enabling interrogation of subcellular processes. Our atlas reconstructs known pathways and protein-protein interaction networks, identifies culture media-specific responses to gene knockout, and clusters thousands of human genes by phenotypic similarity. Using this atlas, we identify the poorly characterized disease-associated transmembrane protein TMEM251/LYSET as a Golgi-resident protein essential for mannose-6-phosphate-dependent trafficking of lysosomal enzymes, showing the power of these representations. In sum, our atlas and screening technology represent a rich and accessible resource for connecting genes to cellular functions at scale.

Schematics created with Biorender.

A morphology-based genome-wide perturbation map in human lung cancer cells
We first aimed to demonstrate the scalability and robustness of the PERISCOPE pipeline by executing a whole genome pooled optical CRISPR screen in human lung cancer cells (A549).

Importantly, we also observed that knocking out genes known to act in well-defined cell compartment-specific roles produced strong morphological phenotypes in those compartments. Specifically, we selected genes encoding five compartment-associated protein complex members and grouped their morphological profiles by complex. For each of these complexes, we observed an enrichment in phenotypic features extracted from the expected cellular compartment (Fig. 2c). For example, while perturbations targeting outer mitochondrial membrane (OMM) proteins produce morphological phenotypes throughout the cell, a plurality (32%) of the overall signal is concentrated in the mitochondria. Likewise, sgRNAs targeting genes involved in protein mannosylation display an enrichment in phenotypic features from the endoplasmic reticulum (ER), where synthesis of mannosyl donor substrates and mannosyltransfer to proteins take place 28.

We next benchmarked image-based gene knockout profiles against existing databases of gene function. First, using profile correlation between gene knockouts as a proxy for functional similarity between genes, we compared our screen data to the protein-protein interaction databases CORUM 29 and STRING 30. Of 3,659 total hits, we identified 1,271 genes belonging to 501 unique complexes present in the CORUM 4.0 database. Profiles from hit gene pairs within a cluster showed higher correlation values than the background distribution of all possible hit gene pairs (Fig. 2d). Additionally, morphological profile pairs with higher correlations demonstrated higher protein-protein interaction confidence scores from the STRING database (Fig. 2e). We subsequently evaluated the extent to which image-based gene knockout profiles were correlated with gene knockout fitness effects using the Broad Institute's Dependency Map (DepMap) database 31. While essential genes were more likely on average to produce a high signal score (see Methods), the majority of screen hits (72%) were nonessential genes, consistent with most gene knockouts producing optical phenotypes beyond simple cell toxicity (Fig. 2g and Extended Data Fig. 4a). Further, morphological signal score was not well correlated with baseline gene expression, with many genes expressed at low levels still producing significant morphological signal when perturbed, demonstrating orthogonality of optical phenotypes (Extended Data Fig. 4d).
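The within-complex versus background comparison described above can be illustrated with a minimal sketch. It assumes each gene has a numeric profile vector and that complex membership is given as a name-to-members mapping; the gene names, profile values, and helper functions below are hypothetical and are not taken from the PERISCOPE pipeline.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_pair_correlations(profiles, complexes):
    """Return (within_complex, background) lists of pairwise correlations.

    The background is every possible gene pair, including within-complex
    pairs, mirroring "all possible hit gene pairs" in the text.
    """
    in_complex = {
        frozenset(pair)
        for members in complexes.values()
        for pair in combinations(sorted(members), 2)
    }
    within, background = [], []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        background.append(r)
        if frozenset((g1, g2)) in in_complex:
            within.append(r)
    return within, background


# Hypothetical toy profiles (four features per gene), for illustration only.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.1, 2.1, 2.9, 4.2],
    "GENE_C": [4.0, 1.0, 3.5, 0.5],
    "GENE_D": [3.8, 1.2, 3.6, 0.4],
}
complexes = {
    "complex_1": {"GENE_A", "GENE_B"},
    "complex_2": {"GENE_C", "GENE_D"},
}

within, background = split_pair_correlations(profiles, complexes)
print(sum(within) / len(within), sum(background) / len(background))
```

In this toy setting the mean within-complex correlation exceeds the background mean, which is the qualitative pattern the benchmark looks for; real profiles would have far more features and genes.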
We performed unbiased clustering of screen hits based on morphological similarity and visualized high-level similarity between morphological profiles via 2-dimensional UMAP embedding (Fig. 2f). We observed logical clustering by biological function across an array of processes, such as translational initiation, lysosome acidification, autophagy, proteasomal protein catabolic processes, mRNA processing, rRNA metabolic process, glycosylation, regulation of GTPase activity and others. Hierarchical clustering based on high dimensional profiles also revealed biologically coherent clustering of perturbations targeting related genes. For example, we identified a cluster of molecular chaperones (Fig. 2h) displaying high similarity between sgRNAs targeting CCT genes that form the chaperonin-containing TCP1 complex, which is essential for producing native actin, tubulin, and other proteins involved in cell cycle progression 32.

We also observed biologically coherent similarity within the family of genes encoding proteins that form the proteasome (Fig. 2i). The mammalian proteasome is a large protein degradation complex that exists in multiple configurations: the canonical proteasome, which consists of the 20S catalytic core particle capped on either end by the 19S regulatory particle, and the interferon-inducible immunoproteasome, which replaces several catalytic subunits in the 20S and can be alternatively capped by the 11S particle 33. We observed a highly correlated subcluster of 30 screen hits, representing mostly essential genes in the A549 cell line (average -1.63 Chronos gene effect score 34). The alpha and beta subunits of the catalytic core particle, as well as the ATPase subunits of the regulatory 19S particle, showed the strongest levels of correlation in our data, whereas the non-ATPase subunits of the regulatory particle exhibited lower signal. A notable exception is PSMD14/Rpn11, which de-ubiquitylates substrates as they enter the proteasome and clustered with the catalytic core. Blocking de-ubiquitylation stalls substrate entry, thus impairing overall proteasome function, consistent with the morphological similarity observed for catalytic subunits responsible for substrate translocation and degradation 35. As expected, the interferon gamma-inducible subunits of both the catalytic core particle (PSMB8, PSMB9, and PSMB10) and 11S particle (PSME1, PSME2) displayed weak signal in the absence of interferon stimulus and did not correlate with core proteasome components.
screen include some single-compartment and some impacting multiple compartments and features across the cell. Green represents hit genes called based on a subset of cell compartments (ER, mitochondria, actin, DNA and Golgi/membrane) and blue represents hit genes called based on overall gene profile. Detailed

We visualized the two HeLa screens by performing 2D embedding followed by manual annotation of a subset of morphologically similar gene clusters. These clusters recapitulate known biological relationships within complexes and processes such as lysosome acidification, DNA replication, mannosylation, protein N-linked glycosylation, aerobic respiration, mitotic cell cycle, ribosome biogenesis, Golgi vesicle transport, and ARP2/3 protein complex (Fig. 3e,f). Despite differing media conditions, we found that the majority of hit genes in both screens were shared. To further visualize similarities between screens, we generated comparative diagonally-merged heatmaps

bioRxiv preprint doi: https://doi.org/10.1101/2023.08.06.552164; this version posted August 7, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Identifying media-specific perturbation signatures from optical profiles
We also identified media-specific signatures in these data, representing unique gene-by-environment interactions. To investigate these differences, we performed preranked gene set enrichment analysis (GSEA 40,41) on both HeLa screens based on a list of morphological profiles. We quantified the strength of each profile compared to control profiles using a parameter called the "morphological signal score" (see Methods), visualizing the results in a gene enrichment map for both screens (Extended Data Fig. 8a).
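To make the idea of a preranked enrichment analysis concrete, the sketch below ranks genes by a score and computes an unweighted running-sum enrichment statistic for one gene set. This is a deliberate simplification: standard GSEA uses a weighted statistic and permutation-based significance testing, neither of which is reproduced here. The gene names, scores, and function name are hypothetical, and the scores merely stand in for the morphological signal score.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for a preranked gene list.

    Walks down the ranking, stepping up by 1/n_hits for genes in the set
    and down by 1/n_misses otherwise, and returns the running-sum value
    with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, extreme = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical per-gene scores standing in for the morphological signal score.
signal_score = {
    "GENE_1": 9.0, "GENE_2": 8.0, "GENE_3": 7.0,
    "GENE_4": 2.0, "GENE_5": 1.0, "GENE_6": 0.5,
}
ranked = sorted(signal_score, key=signal_score.get, reverse=True)

top_set = {"GENE_1", "GENE_2"}      # members sit at the top of the ranking
bottom_set = {"GENE_5", "GENE_6"}   # members sit at the bottom of the ranking
print(enrichment_score(ranked, top_set), enrichment_score(ranked, bottom_set))
```

A set concentrated at the top of the ranking yields a score near +1, and a set concentrated at the bottom yields a score near -1.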

Based on GSEA, 391 gene sets were enriched as hits in the DMEM screen and 321 were enriched in the HPLM screen (Supplementary Table 3). Of these, 275 were common between the two screens, 116 were specific to the DMEM screen and 46 were specific to the HPLM screen.
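The gene set counts above are internally consistent, as the following short check shows. Only the three reported totals are taken from the text; the variable names and the derived overlap fraction are illustrative additions.

```python
# Counts reported in the text.
dmem_enriched = 391   # gene sets enriched in the DMEM screen
hplm_enriched = 321   # gene sets enriched in the HPLM screen
shared = 275          # gene sets enriched in both screens

# Screen-specific counts follow by subtraction.
dmem_specific = dmem_enriched - shared
hplm_specific = hplm_enriched - shared

# Derived here for illustration (not reported in the text): the size of
# the union and the fraction of all enriched gene sets that are shared.
union = dmem_specific + hplm_specific + shared
shared_fraction = shared / union

print(dmem_specific, hplm_specific, union, round(shared_fraction, 2))
```

The subtraction reproduces the 116 DMEM-specific and 46 HPLM-specific gene sets stated in the text.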

From these, we were able to identify a subset of processes selectively enriched in only a single screen. In the DMEM screen, for example, we observed selective enrichment of processes associated with central carbon metabolism, such as NADH regeneration (a metabolic process that generates a pool of NADH by the reduction of NAD+) and glucose catabolism/glycolysis. To further investigate the enrichment of these processes in the DMEM screen, we again used comparative diagonally-merged heatmaps, observing higher levels of signal and correlation within the profiles from the DMEM screen (Fig. 3i and Extended Data Fig. 8c). Here, we also observed that iron-sulfur cluster assembly, which is required for mitochondrial respiration 42, and mitochondrion transcription processes were selectively enriched in the DMEM screen. Taken together, the overall enrichment of hits associated with central carbon metabolism in the DMEM screen may be reflective of metabolic differences induced by high (>25 mM) glucose levels present in DMEM 21.

Conversely, we also observed selective enrichment of processes in the HPLM screen related to DNA damage repair, such as "cellular response to gamma radiation" or "positive regulation of DNA recombination" (Fig. 3i and Extended Data Fig. 8c). This process enrichment is also likely linked to metabolic rewiring induced by significant decreases in glucose and glutamine upon culture in HPLM, as HeLa cells have been previously shown to exhibit hallmarks of DNA damage when cultured with reduced concentrations of these nutrients 43.

Annotating protein complexes and cellular signaling pathways from imaging data
We next sought insights on specific structures and processes. Genes encoding various types of ribosomal proteins largely grouped into three distinct clusters (Fig. 4a). The largest cluster is enriched for genes encoding the large and the small subunits of the mitochondrial ribosome, which is essential in the translation of mitochondrial genes 44, while two other clusters show enrichment for components of the large 60S subunit and the small 40S subunit of the mature 80S eukaryotic ribosome 45. This example highlights the ability of optical pooled screens to capture structural information, as recently demonstrated 12. We also found that signaling pathways were often well captured: as an example, perturbations targeting the phosphatidylinositol 3-kinase/AKT serine-

morphological profiles to distinguish the directionality of these signaling factors is a useful tool in understanding the underlying biology.

Figure 4. Clustering by optical profiles captures physical interactions and signaling pathway
Genome-wide screens for subcellular phenotypes of interest
To support the validity of the dataset's single feature screens, we looked at groups of genes whose protein products are known to function in the compartments that we labeled in PERISCOPE and determined which features had hit lists that were enriched for those groups. In Figure 2c we showed that perturbing these groups of genes produces signal across the channels, whereas we show in Figure 5c that there is specific enrichment in our hit lists for features in expected categories for protein mannosylation, vacuolar-type ATPase, cortical cytoskeleton, and outer mitochondrial membrane protein complex, though unsurprisingly perturbation of DNA polymerase generated a more pleiotropic phenotype.

As a complementary way to assess specificity, we focused on perturbations that altered granularity features 48, a measure of the signal present within differently-sized intracellular structures, from small cellular details to increasingly large scale structures, relative to the total signal. We found that disruption of the vacuolar ATPase (either V0 or V1 subunit), but not genes involved in its assembly,

A primary advantage of image-based profiling over traditional microscopy is that the former allows quantitative and automated assessment of phenotypic features, overcoming the subjectivity of analyzing images by eye. Nonetheless, our atlas contains over 30 million individual cell images that can be evaluated for phenotypes of interest by a trained eye. Thus, to enhance the usefulness of these datasets, we have developed an atlas cell retrieval tool (see Methods) enabling the retrieval of individual images of cells containing perturbations of interest (Extended Data Fig. 5a-f).

Using this tool, we show that it is possible to find examples of readily interpretable image-based phenotypes, such as the depletion of TOMM20 signal in cells containing sgRNAs targeting TOMM20 (Extended Data Fig. 5e), but also that most single gene knockout phenotypes are more subtle (Extended Data Fig. 5b-d,f), demonstrating the usefulness of computational feature extraction and profiling beyond simple visual inspection.

Identification of TMEM251/LYSET as a Golgi-resident protein essential for mannose-

How could a Golgi-resident protein influence glycan storage in the lysosome? We postulated that the lysosomal WGA phenotype was due to impaired biogenesis of lysosomal proteins in the Golgi.

Notably, GNPTAB/GNPTG showed strong phenotypic similarity to TMEM251 in PERISCOPE and human LOF of TMEM251 results in a clinical presentation similar to that of human LOF in GNPTAB/GNPTG 49. We therefore hypothesized that TMEM251 may participate in the mannose-6-

single knockdowns were indistinguishable from wildtype, consistent with the primary screen (

Because of the strong morphological similarity between TMEM251 and V-ATPase subunits, we examined the effect of TMEM251 KD on lysosomal pH using a fluorescence lifetime sensor 51.

Whereas treatment with Bafilomycin A1 or ATP6V1E1 KD robustly alkalinized lysosomes, neither GNPTAB nor TMEM251 KD significantly changed lysosomal pH (Fig. 6g and Extended Data Fig. 10c). We therefore reasoned that acidic lysosomal pH might be required for proper trafficking and

In addition to being practical, PERISCOPE generates rich, data-driven representations of gene function. A central goal of massively parallel genetic screens is to understand how genes coordinate to produce complex cell phenotypes, and in this regard PERISCOPE is valuable both as a profiling technology, generating high-dimensional representations of cell state, and as highly parallelized screens of subcellular biological parameters (e.g. cell size, organelle size/shape/number). We demonstrate that whole-cell optical profiles can be used to reconstruct relationships between genes in biological pathways and proteins in complexes. We further demonstrate that spatially restricted subcellular phenotypes in these data can be used to gain mechanistic insight into gene function, as in the case of Golgi-specific phenotypes affected by perturbation of TMEM251. Further, we found that individual morphological features (e.g. regional granularity) could also be used to classify genes by function, such as with genes involved in V-ATPase assembly or N-glycan synthesis.

Massively parallel CRISPR modifier screens have proven very useful for mapping gene-by-environment interactions at scale. By enabling facile, cost-effective genome-scale screening with high-dimensional cell profiling, we demonstrate that genetic perturbations can be readily combined with environmental perturbations to produce rich, high-resolution maps to systematically interrogate gene-by-environment interactions at genome scale. As an example, we show how such maps can uncover media-specific effects on cellular programs, but we additionally envision using this platform to execute genome-wide screens for modifiers of therapeutic compound-induced phenotypes, or to carry out genetically anchored CRISPR screens 55 to elucidate genetic interaction networks.

Beyond the current scope, there are several improvements that could be built upon the foundation of the work presented here. In its current form, the PERISCOPE platform could be deployed to explore the effects of other CRISPR-based perturbations such as CRISPR-a 56,57, CRISPR-i 58,59, or base editing 60-62, where sgRNAs can be expressed as an RNA Pol II transcript (as in CROP-seq). In this study, we profile two cancer cell lines, HeLa and A549, but our pipelines are amenable to screening a wide variety of 2D cell models, including cell lines and primary cells, though assay scale and data quality are cell density-dependent. Our screens demonstrate that significant signal is present in every measured cell compartment, and highly multiplexed imaging technologies such as CODEX 63 and CyCIF 64 could improve the sensitivity and robustness of PERISCOPE by capturing a wider range of perturbation effects or enabling the inclusion of ground truth epitopes to anchor biological interpretation. Extracting biological signals from fluorescence multicolor images is a compelling machine learning problem which will likely be improved using various forms of deep learning, such as self-supervised learning, to extract features. Though such features lack inherent interpretability, they are more powerful for capturing similarities and can be useful for many applications 65,66.

In sum, this study lays the groundwork for building high-dimensional morphology-based perturbation maps at scale and presents the first genome scale atlas of human cell morphology.

Containing more than 30 million perturbation-assigned cell images, this atlas is a useful resource

The whole genome library was designed to target 20,393 genes with ~4 sgRNAs per gene for a total of 80,408 sgRNAs. 47,792 sgRNAs were selected from the Brunello CRISPR library (Addgene #73179), 20,520 sgRNAs were selected from the TKO V3 CRISPR library (Addgene #90294), and 12,096 sgRNAs were selected from the extended CRISPR library published by the Broad Institute's Genetic Perturbation Platform (Addgene #73178). Additionally, 601 non-targeting sgRNAs were included as negative controls. All sgRNA sequences were selected/designed to maintain a balanced nucleotide distribution at each base position, which facilitates optical barcode calling. The CRISPR library was designed for complete library deconvolution with eleven bases and for Levenshtein error correction with twelve bases.

Library cloning
In order to prepare pooled plasmid libraries, targeting and non-targeting guide subpools were first individually amplified by dialout PCR using orthogonal primer pairs 67. PCR products were purified using the QIAquick PCR Purification Kit (Qiagen LLC #28104). The amplified libraries were cloned into the CROPseq vector (Addgene #86708) via Golden Gate assembly using BsmBI restriction sites as previously described 15. To prevent self ligation events in Golden Gate reactions, the CROPseq vector was pre-digested and purified via gel extraction using the QIAquick Gel Extraction Kit (Qiagen LLC #28706) in order to remove the filler sequence.
The resulting plasmid libraries were purified and concentrated via SPRI bead cleanup before being transformed into electrocompetent cells (Lucigen Endura, VWR International LLC #71003-038) for plasmid library amplification. Following transformation, bacterial cells were grown in liquid cultures for 18 hours at 30°C before extracting the plasmid DNA. The plasmid library was validated via Next Generation Sequencing as described in NGS methods below.

Tissue culture
A549 cells were cultured in high-glucose DMEM (VWR International LLC #45000-304) supplemented with 2 mM L-glutamine (Life Technologies Corporation #25030081), 100 U/mL penicillin-streptomycin (Life Technologies Corporation #15140163), and 10% heat-inactivated fetal bovine serum (Sigma-Aldrich Inc #F4135-500ML). HEK293FT cells were cultured in DMEM-GlutaMax, pyruvate (Thermo Fisher Scientific #10569010) supplemented with 10% heat-inactivated fetal bovine serum, 100 U/mL penicillin-streptomycin, and 2 mM L-glutamine. HEK293FT cells were also cultured without antibiotics 24 hours prior to lentiviral packaging. HeLa cells in the conventional media screen were cultured in DMEM (VWR International LLC #45000-304) supplemented with 10% dialyzed FBS (Thermo Fisher Scientific #26400044). HeLa cells in the physiological media screen were cultured in HPLM (Thermo Fisher Scientific #A4899101) supplemented with 10% dialyzed FBS.

Lentivirus production
Prior to lentivirus production, the plasmid pools for targeting and nontargeting sgRNAs were combined, resulting in a mixture of 10% (m/m) nontargeting sgRNAs and 90% (m/m) targeting sgRNAs. 24 hours before transfection, HEK293FT cells were seeded on 10 cm² dishes at a density of 100,000 cells/cm² using antibiotic-free media.
Lentivirus was generated using the Lipofectamine 3000 (Thermo Fisher Scientific L3000015) transfection kit and packaging plasmids pMD2.G (Addgene #12259) and psPAX2 (Addgene #12260). HEK293FT cells were transfected with a plasmid ratio of 2:3:4 (by mass) of pMD2.G, psPAX2, and plasmid library, respectively. Media was exchanged four hours after transfection. Lentivirus was harvested 48 hours after media exchange and filtered through a 0.45 μm cellulose acetate filter (Corning 431220). The viral supernatant was incubated on dry ice until frozen and stored at -80°C.

Lentivirus titering
A viral titer was individually determined for A549 and HeLa cells. A549 cells were seeded at a density of 100,000 cells/cm² while HeLa cells were seeded at a density of 150,000 cells/cm² in a 6-well format. The seeded cells were transduced with the viral library by supplementing their media with 8 μg/mL of polybrene (Sigma-Aldrich Inc #TR-1003) and adding a variety of viral volumes ranging from 0 μL to 50 μL prior to centrifugation at 1000 g for 2 hours at 33°C. After centrifugation, the cells were incubated at 37°C for 4 hours followed by a media exchange. At 24 hours post-infection, cells were divided into media containing either 0 μg/mL or 2 μg/mL of puromycin (Life Technologies #A1113803). Cells in both media conditions were incubated at 37°C for 72 hours. Following incubation, cells were counted and multiplicity of infection (MOI) was estimated by the ratio of surviving cells in the 2 μg/mL puromycin conditions over puromycin-free conditions. Infectious units per microliter (ifu/μL) were then calculated by multiplying the MOI by the original cell seeding density and dividing by the viral volume added. The ifu/μL for each viral volume were averaged and used to estimate the viral volume required to achieve an MOI between 0.1 and 0.3.
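The titering arithmetic described above can be sketched as follows. The three formulas follow the text (surviving-cell ratio as the MOI estimate, ifu/μL as MOI times cells seeded divided by viral volume, and averaging across volumes); the function names, the use of cells seeded per well as the seeding quantity, and all cell counts are hypothetical values chosen for illustration.

```python
def estimate_moi(cells_with_puromycin, cells_without_puromycin):
    """MOI estimate as the ratio of surviving cells with vs. without puromycin."""
    return cells_with_puromycin / cells_without_puromycin


def ifu_per_ul(moi, cells_seeded, viral_volume_ul):
    """Infectious units per microliter for one (nonzero) viral volume."""
    return moi * cells_seeded / viral_volume_ul


def volume_for_target_moi(target_moi, cells_seeded, titer_ifu_per_ul):
    """Viral volume (uL) expected to give the target MOI at a given titer."""
    return target_moi * cells_seeded / titer_ifu_per_ul


# Hypothetical titration: cells seeded per well and survivor counts per volume.
# The 0 uL control is excluded because it carries no titer information.
cells_seeded = 1_000_000
titration = [
    # (viral volume in uL, survivors with puromycin, survivors without)
    (10.0, 100_000, 1_000_000),
    (20.0, 200_000, 1_000_000),
]

titers = [
    ifu_per_ul(estimate_moi(with_puro, without_puro), cells_seeded, volume)
    for volume, with_puro, without_puro in titration
]
mean_titer = sum(titers) / len(titers)
volume_needed = volume_for_target_moi(0.2, cells_seeded, mean_titer)
print(mean_titer, volume_needed)
```

With these made-up counts both volumes give the same titer, so the averaged titer directly yields the volume for an MOI of 0.2; with real data the per-volume titers would differ and the average smooths over that variation.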
Lentivirus transduction
For screens, cells were transduced with the genome-wide viral library in a 6-well format by adding 8 μg/mL of polybrene and the volume of viral supernatant calculated for an MOI of 0.2 as well as a non-infection control with 0 μL of viral supernatant. Cells were centrifuged at 1000 g for 2 hours at 33°C. At 4 hours post-infection, media was exchanged. At 24 hours post-infection, the infected cells were passaged into T-225 flasks (VWR International LLC #47743-882) containing media supplemented with 2 μg/mL puromycin. A fixed number of cells (~300,000) for the infected and uninfected conditions were set aside and seeded in a 6-well plate format under media containing either 0 μg/mL or 2 μg/mL of puromycin. All cells were incubated at 37°C for 72 hours. Following the 72 hours of selection, the cells seeded in the 6-well plate were counted and the MOI was calculated as described above.

the expectation that cell populations will double at least once before fixation. The remainder of the cells were kept in T-225 flasks and cultured until day 7 of Cas9 expression, when a sample of 13,500,000 cells from each biological replicate was lysed and prepared for NGS analysis. This analysis was then used to determine sgRNA dropout rates due to lethal CRISPR events. 48 hours after being seeded in optical plates, the cells were fixed with 4% paraformaldehyde in 1X PBS for 30 minutes, followed by in situ sequencing (ISS) as described below.
After RCA amplification in ISS, the cells were stained with cell compartment-specific probes as described in Cell Staining and phenotypic images were acquired. The disulfide-linked probes were de-stained by cleaving the disulfide bridge between the probe and its fluorophore with 50 mM TCEP (Thermo Fisher Scientific #363830100) in 2X saline-sodium citrate (SSC) for 45 minutes at room temperature. After destaining phenotypic probes, the cells were washed three times with 1X PBS-T (1X PBS + 0.05% Tween-20) before performing 12 cycles of in situ sequencing by synthesis (ISS).

HeLa screens
HeLa-TetR-Cas9 cells were transduced with the genome-wide viral library in three biological replicates by seeding cells at a density of 210,000 cells/cm² in a 6-well format and performing lentiviral transduction as described above. A total of 240,000,000 cells were transduced at an MOI of 0.2 for a cell library representation of 300 cells/sgRNA post transduction. After antibiotic selection, the transduced cells were cultured in conventional DMEM media until a representation of 600 cells/sgRNA was achieved. In order to confirm the target representation, a sample of 20,000,000 cells from each biological replicate was lysed and prepared for Next Generation Sequencing (NGS) as described below. The cell library was then divided into two culturing conditions, conventional DMEM media and physiological HPLM media (media formulations are described above). Simultaneously with the addition of these two media conditions, Cas9 expression was induced with 2 μg/mL doxycycline (reagent reference) for 7 days. Throughout Cas9 expression, cells for each condition were cultured in T-225 flasks and passaged once the flasks reached 70% confluency. Between passages, a minimum of 24,000,000 cells were re-seeded per biological replicate, thus maintaining a representation of 300 cells/sgRNA for each media condition.
The cells were supplemented with 2 μg/mL of doxycycline every two days by exchanging the culturing media. On day 5 of Cas9 expression, the cell libraries under both media conditions were seeded into five 6-well glass-bottom plates (Cellvis #P06-1.5H-N) at a density of 42,000 cells/cm². A total of 14,000,000 cells across the three biological replicates were seeded in optical plates for each media condition with the expectation that cell populations will double at least once before fixation. The remainder of the cells were kept in T-225 flasks and cultured until day 7 of Cas9 expression, when a sample of 20,000,000 cells from each biological replicate was lysed and prepared for NGS analysis. This analysis was then used to determine sgRNA dropout rates due to lethal CRISPR events. 48 hours after being seeded in optical plates, the cells were fixed with 4% paraformaldehyde in 1X PBS for 30 minutes, followed by in situ sequencing (ISS) as described below. After RCA amplification in ISS, the cells were stained with cell compartment-specific probes as described in Cell Staining and phenotypic images were acquired. The disulfide-linked phenotypic probes were destained by cleaving the disulfide bridge between the probe and its fluorophore with 50 mM TCEP (Thermo Fisher Scientific #363830100) in 2X saline-sodium citrate (SSC) for 45 minutes at room temperature. After probe destaining, the cells were washed three times with 1X PBS-T (1X PBS + 0.05% Tween-20) before performing 12 cycles of ISS.
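The re-seeding threshold used during passaging in the HeLa screens can be related to library representation with a short calculation. This sketch assumes representation is simply the cell count divided by the number of targeting sgRNAs (80,408, from the library design); whether the 601 non-targeting guides should be included in the denominator is not specified in the text, and the function names are hypothetical.

```python
from math import ceil

# Number of targeting sgRNAs stated in the library design.
TARGETING_SGRNAS = 80_408


def representation(n_cells, n_sgrnas=TARGETING_SGRNAS):
    """Average number of cells per sgRNA for a given cell count."""
    return n_cells / n_sgrnas


def cells_for_representation(cells_per_sgrna, n_sgrnas=TARGETING_SGRNAS):
    """Minimum whole number of cells needed for a target representation."""
    return ceil(cells_per_sgrna * n_sgrnas)


# Re-seeding minimum per biological replicate reported for the HeLa screens.
reseeded = 24_000_000
print(round(representation(reseeded), 1))   # close to the stated 300 cells/sgRNA
print(cells_for_representation(300))        # cells needed for exactly 300 cells/sgRNA
```

Under this assumption, 24,000,000 re-seeded cells correspond to slightly under 300 cells per sgRNA, in line with the approximate representation quoted for each media condition.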

Synthesis of de-stainable phenotyping probes
Due to the spectral overlap between the fluorescent dNTPs required for ISS and the available fluorophores for phenotypic markers, the probes used to label the mitochondria and the endoplasmic reticulum (ER) were synthesized in house to include a disulfide bridge between the probe and its fluorophore, which allows cleavage of the fluorophore after imaging. For mitochondria labeling, the secondary anti-TOMM20 antibody, F(ab')2-goat-anti-rabbit IgG (H+L) (Thermo Fisher #31239), was conjugated to Alexa Fluor 594-Azide (Thermo Fisher #A10270). For ER labeling, the protein Concanavalin A (ConA) (Sigma Aldrich #C2010) was conjugated to Cyanine 5-Azide (Lumiprobe #B3030). In the synthesis of these probes, we leveraged the thermal stability and high specificity of the click chemistry reaction between dibenzocyclooctyne (DBCO) and azide groups. Hence, the anti-TOMM20 antibody and the ConA protein were functionalized for click chemistry with the addition of an NHS-SS-DBCO (DBCO) (Sigma Aldrich #761532) molecule that subsequently reacted with the azide groups linked to their respective fluorophores. Prior to functionalizing the probes, the anti-TOMM20 antibody and ConA protein were diluted to 1.1 mg/mL and 2 mg/mL in freshly prepared 0.1 M sodium phosphate solutions at pH 8.5 and 6.8, respectively. The DBCO was freshly dissolved to 10 mg/mL in anhydrous DMSO (Sigma Aldrich #227056). The diluted proteins and DBCO were then incubated for 2 hours at 4 °C while shaking. The molar ratios and buffers for this reaction are listed in Table 1. Following incubation, the reaction was quenched with 2 M Tris HCl (pH 7.4) at 10% of the reaction volume. The resulting product was purified using Zeba columns (Thermo Fisher #89883). Product retention after column purification was ~90%.
The azide-linked fluorophores were diluted to 10 mg/mL in anhydrous DMSO and reacted with their respective functionalized probes at a 3:1 molar ratio. This reaction proceeded for 20 hours at 4 °C while shaking; reaction vials were protected from light during this incubation. The final product was purified by running each reaction through three Zeba columns for a final buffer exchange into 1X PBS. After synthesis, the destainable probes were stored at -20 °C.

The A549-TetR-Cas9 cell line was created by simultaneously transfecting A549 cells with piggyBac transposase and a piggyBac cargo plasmid containing TetR-inducible Cas9 (Addgene #134247), and selecting for 7 days with 500 μg/mL G418. Single cells were sorted into 96-well plates (Sony SH800) and expanded into colonies. An optimal clone was selected based on Cas9 activity, aiming for high and low activity in the presence and absence of doxycycline, respectively. Cas9 activity was evaluated using the fluorescence-based reporter pXPR011 (Addgene #59702), which expresses GFP and a cognate sgRNA to assess GFP knockdown upon successful CRISPR activity. Fluorescence readouts of Cas9 activity were detected via FACS and Indel-Sequencing. The HeLa-TetR-Cas9 cell line was a gift from Iain Cheeseman; this cell line is a single cell clone selected for high Cas9 activity by transducing with the eGFP reporter mentioned above (pXPR011) and using FACS to read out efficiency of protein knockdown.

Image processing 788
We used CellProfiler bioimage analysis software (version 4.1.3) 27 to process the images using classical algorithms and Fiji (with openjdk-8) 68 for image stitching 69 and cropping. For the ISS images, we corrected for variations in background intensity, aligned channels within cycles, and performed channel compensation. For the phenotypic images, we corrected for variations in background intensity. We then stitched the ISS and Cell Painting images independently into a full-well view and cropped them into corresponding pseudo-sites to account for the fact that they were imaged at different magnifications. Corrected pseudo-site images from both ISS and phenotypic imaging entered our final analysis pipeline, where they were aligned to each other, nuclei and cells were segmented using phenotypic images, ISS foci [...] https://github.com/broadinstitute/pooled-cell-painting-image-processing.

Image-based profiling 806
We processed outputs of CellProfiler into image-based profiles using scripts available at https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe. The recipe is highly configurable beyond the configurations used for this report. The first step generates summaries of a variety of quality control metrics about the image acquisition, modified Cell Painting, and in situ sequencing. The second step uses pycytominer workflows to process the single cell features extracted using CellProfiler. We median-aggregated the single cell profiles by guide for each plate independently. Next, we defined the center and scale parameters as the median and median absolute deviation of feature values from non-targeting control perturbations, and then normalized the aggregated profiles by subtracting the center value and scaling to the standard deviation, for each plate independently.
We further processed the per-plate guide-level profiles to create the per-screen profiles we use in our analyses. We performed feature selection independently for each screen to eliminate noisy features and retain the most informative features by filtering out redundant features (all features that have Pearson correlation greater than 0.9 to a given feature), features with low variance, and features with missing values across all the plates as previously described 72. Then we median-aggregated each experiment's feature-selected per-plate profiles to obtain a unique profile per guide for each experiment. For perturbation-level (gene-level) profiles, each experiment's guide-level profiles were median-aggregated.

Each dataset is independently welded to the recipe, effectively versioning the recipe, using a Template, available at https://github.com/broadinstitute/pooled-cell-painting-profiling-template. Our A549 screen data with versioned recipe is available at https://github.com/broadinstitute/CP186-A549-WG.
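The per-plate aggregation, control-based normalization, and redundancy filter described in this section can be illustrated with a small pandas sketch. This is not the pycytominer implementation used by the recipe. The column name `guide`, the function names, the toy data, the use of absolute correlation, and the 1.4826 factor that rescales the median absolute deviation to approximate a standard deviation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def aggregate_by_guide(cells, features, guide_col="guide"):
    """Median-aggregate single-cell rows into one profile per guide."""
    return cells.groupby(guide_col)[features].median()


def normalize_to_controls(profiles, control_guides, mad_to_sd=1.4826):
    """Center on the control median and scale by the control MAD.
    The 1.4826 factor (MAD to approximate SD) is an assumption."""
    ctrl = profiles.loc[control_guides]
    center = ctrl.median()
    mad = (ctrl - center).abs().median()
    return (profiles - center) / (mad * mad_to_sd)


def drop_correlated(profiles, threshold=0.9):
    """Drop the later feature of any pair whose absolute Pearson
    correlation exceeds the threshold (absolute value is an assumption)."""
    corr = profiles.corr(method="pearson").abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return profiles.drop(columns=to_drop)


# Toy single-cell table: 10 control guides and 2 targeting guides, 30 cells each.
rng = np.random.default_rng(0)
controls = [f"ctrl_{i}" for i in range(10)]
guides = controls + ["geneA_1", "geneA_2"]
cells = pd.DataFrame({"guide": np.repeat(guides, 30)})
cells["f1"] = rng.normal(size=len(cells))
cells["f2"] = 2 * cells["f1"] + 0.01 * rng.normal(size=len(cells))  # redundant with f1
cells["f3"] = rng.normal(size=len(cells))
cells.loc[cells["guide"].str.startswith("geneA"), "f3"] += 5       # simulated phenotype

profiles = aggregate_by_guide(cells, ["f1", "f2", "f3"])
normalized = normalize_to_controls(profiles, controls)
selected = drop_correlated(normalized)
print(selected.round(2))
```

After normalization the control guides are centered on zero for every feature, so a knockout with a real effect stands out as a large deviation in the affected feature.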
Our HeLa screens data with versioned recipe is available at https://github.com/broadinstitute/CP257-HeLa-WG. Code used for further profile processing is in this paper repository at https://github.com/broadinstitute/2022_PERISCOPE.

Hit-calling, statistical analysis and distribution of hits
To determine the genes with significant signal above the noise (hit-calling), we developed an algorithm that compares, for each feature, the distribution of values for all the guides targeting the same gene to a set of non-targeting control guides using the Mann-Whitney U-test. The number of features significantly different from the non-targeting controls based on the statistical test (p-value 0.001) was summed to give the profile score for each perturbation. Then, to ensure that the perturbations called significant are truly not null, we defined a control group called zero-TPM genes. Zero-TPM genes are the genes without significant expression in a given cell line (with zero transcripts per million (TPM)) and were determined based on the RNA expression levels reported by the Broad Institute Dependency Map portal 31. To obtain a false discovery rate (FDR) of 5%, perturbations with profile scores above 95% of zero-TPM genes were determined to have significant signal above the noise. The terms "whole-cell hits" and "compartment hits" were used to distinguish between perturbations with significant signal in overall profile features and perturbations with targeted signal in features from a specific cell compartment (based on one of the five fluorescent markers).
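The hit-calling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the paper repository: the function names and input layout (rows are observations, columns are features) are hypothetical, and a feature is counted as significant when its two-sided Mann-Whitney p-value falls below 0.001.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def profile_score(gene_values, control_values, alpha=0.001):
    """Count features whose distribution differs from non-targeting controls.
    Both inputs are 2-D arrays: rows are observations, columns are features."""
    result = mannwhitneyu(gene_values, control_values, axis=0,
                          alternative="two-sided")
    return int(np.sum(result.pvalue < alpha))


def hit_threshold(zero_tpm_scores, quantile=0.95):
    """Profile score exceeded by only 5% of the zero-TPM (null) genes."""
    return float(np.quantile(zero_tpm_scores, quantile))


def call_hits(scores, zero_tpm_scores):
    """Map each gene to True if its score is above the null threshold."""
    threshold = hit_threshold(zero_tpm_scores)
    return {gene: score > threshold for gene, score in scores.items()}


# Deterministic toy data: 50 control observations and 3 features.
control = np.tile(np.arange(50.0)[:, None], (1, 3))
mixed = np.array([5.5, 15.5, 25.5, 35.5, 45.5, 10.5, 20.5, 30.5])
# Feature 0 overlaps the controls; features 1 and 2 are fully separated.
gene = np.column_stack([mixed, mixed + 100.0, mixed - 100.0])

score = profile_score(gene, control)
zero_tpm_scores = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1]
hits = call_hits({"geneA": score, "geneB": 40}, zero_tpm_scores)
print(score, hit_threshold(zero_tpm_scores), hits)
```

Restricting the feature columns to those measured in one compartment before scoring would give the compartment-level variant of the same score.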
For whole-cell hits, all of the collected features were used in the hit-calling process explained above, but for the compartment hits a subset of features from one cell compartment was used (including texture, intensity, correlation, radial distribution, and granularity measures from that compartment). It is important to note that a single perturbation can be a compartment hit, targeting simultaneously two or, rarely, even three compartments, but still not be a whole-cell hit (see Extended Data Figure [...]

Comparison between pair-wise correlation of perturbations to other databases
To assess the ability of phenotypic profiles to recall known biological relationships, we calculated the correlation between profiles as a measure of similarity and used it to perform two global assessments. Considering the large number of features in each profile (1520 in A549, 1597 in HeLa DMEM and 1709 in HeLa HPLM datasets) and to improve the signal-to-noise ratio, principal component analysis (PCA) was performed on the datasets to capture at least 70% of the variation. The resulting profiles were then used to calculate the Pearson correlation coefficient between all hit perturbation profiles (gene-level). First, annotated protein clusters were obtained from the 28.11.2022 CORUM4.0 database 29. Clusters with at least 66% of the hit genes were identified using the gene symbols from both datasets (501 clusters in A549, 799 clusters in HeLa HPLM and 871 clusters in HeLa DMEM). Then all the correlations between each pair of genes in a cluster were calculated. The distribution of all the correlations between profiles within clusters versus the distribution of all the correlations between profiles from all hit genes was plotted in the figure. Second, we performed a similar analysis based on the protein link scores as predicted by the STRING database (v11.5, "9606.protein.links.v11.5.txt.gz") 30.
To start, protein IDs from STRING were mapped to gene symbols using preferred_name extracted from the "9606.protein.info.v11.5.txt.gz" file. All the possible pairwise correlations between hit gene profiles with a reported link score in the STRING database were calculated. Next, the correlations were binned into eight equally spaced bins and the distribution of the STRING link scores for each bin was plotted using seaborn.boxenplot 73 in Python.

Comparison to Cancer Dependency Map Data
From DepMap data, we divided genes expressed in A549 cells into essential and nonessential categories based on Chronos gene effect scores 34 using a threshold score of -0.5 for gene essentiality and plotted the distributions of essential and nonessential genes versus their morphological signal score.

UMAP clustering of the hit perturbation profiles
To evaluate and demonstrate the ability of morphological profiles to uncover biologically relevant interactions and structures, the UMAP (Uniform Manifold Approximation and Projection) algorithm was used to project the hit gene profiles into a 2-dimensional plane. PCA was performed on the datasets to capture at least 70% of the variation as described above before the application of the UMAP algorithm. The Python library UMAP was used to apply the UMAP algorithm using "cosine" for parameter "metric". The details of the parameters used are available on the GitHub repository. Some of the resulting clusters were manually labeled to highlight the underlying biology using gene ontology terms (biological processes and cellular components) as listed on the GSEA-MSigDB web portal (http://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp#C5).

Hierarchical clustering of hit perturbation profiles and representative heatmaps
Correlation between morphological profiles is a powerful tool to extract biological insights from data sets. For example, similarity (or dissimilarity) contains information regarding functional clusters, protein structure, signaling pathways and their directionality. To this end, PCA was first performed on the datasets to capture at least 70% of the variation as described above, followed by the selection of a subset of perturbations associated with a functional gene set as specified in each instance. Then the corr function from the pandas library in Python was used to calculate the pairwise Pearson correlation coefficient of the perturbation profiles for each dataset. The hierarchical clustering of the correlations and the plotting of the heatmaps were performed using seaborn's clustermap function in Python. Ward variance minimization was used as the clustering algorithm ('method') with 'euclidean' as the distance metric.

For [...] The p-values were calculated as described in the hit calling section and n refers to features significantly different from the non-targeting controls (p-value 0.001). The code used to calculate the morphological signal score as well as the list of perturbation scores for each dataset is available on the GitHub repository. The EnrichmentMap application based on the Cytoscape v3.9.1 software platform was used to visualize the enrichment maps (node cutoff Q-value 0.05).

Preranked GSEA analysis was performed to determine enrichment for biological terms based on morphological profile similarity to a query gene of interest.
Genes were ranked based on cosine similarity to the profile of the query gene, then GO term enrichment was performed using the GSEApy package and the "GO_Cellular_Component_2021" database.
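The ranking step that feeds the preranked GSEA can be sketched with NumPy and pandas. The function name, gene labels, and toy profiles below are hypothetical; the enrichment step itself (GSEApy) is not reproduced here, and the implementation actually used is in the paper repository.

```python
import numpy as np
import pandas as pd


def rank_by_cosine_similarity(profiles: pd.DataFrame, query: str) -> pd.Series:
    """Rank all other genes by cosine similarity to the query gene's profile.
    Rows are gene-level profiles; all-zero rows are assumed absent."""
    values = profiles.to_numpy(dtype=float)
    unit = values / np.linalg.norm(values, axis=1, keepdims=True)
    similarity = unit @ unit[profiles.index.get_loc(query)]
    ranked = pd.Series(similarity, index=profiles.index).drop(query)
    return ranked.sort_values(ascending=False)


# Toy gene-level profiles (hypothetical values).
profiles = pd.DataFrame(
    [[1.0, 0.0, 2.0],     # QUERY
     [2.0, 0.1, 4.0],     # nearly parallel to QUERY
     [-1.0, 0.0, -2.0],   # opposite direction
     [0.0, 3.0, 0.0]],    # orthogonal
    index=["QUERY", "GENE_A", "GENE_B", "GENE_C"],
    columns=["x1", "x2", "x3"],
)

ranking = rank_by_cosine_similarity(profiles, "QUERY")
print(ranking)
```

The resulting Series, with gene names as the index and similarity as the ranking metric, has the shape of a preranked gene list; genes with opposite phenotypes land at the bottom of the ranking rather than being discarded.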

Single Feature Screen Analysis 928
For each feature in the feature-selected dataset, genes were sorted by p-value (as generated during hit calling) and a Top 20+ list was created for each feature that contained all genes with a p-value less than or equal to that of the 20th gene. The Top 20+ list was assessed for GO term enrichment using the Python GOATOOLS library 74 with the default Benjamini-Hochberg FDR correction. GO terms were considered enriched if they had a p-value < 0.05 after an additional Bonferroni correction. Compartment-specific gene lists were assayed for enrichment in the Top 20+ lists using a Fisher exact test with a Benjamini-Hochberg FDR correction from the Python SciPy library 75. Plots were made with the Python library Matplotlib 76. For exploration of Granularity features, guide-normalized but not feature-selected datasets were aggregated with pycytominer and plotted with Seaborn 73. Gene lists were taken from the Metabolic Atlas 77. Granularity features were visualized with Python SciPy and scikit-image 78 libraries as implemented in CellProfiler.
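Because the Top 20+ rule keeps every gene tied with the 20th, a list can contain more than 20 genes. A minimal sketch of that rule, assuming the per-feature p-values are held in a pandas Series indexed by gene (the function name `top_n_plus` and the toy values are hypothetical):

```python
import pandas as pd


def top_n_plus(pvalues: pd.Series, n: int = 20) -> list:
    """Return all genes whose p-value is <= that of the n-th ranked gene."""
    ordered = pvalues.sort_values(kind="stable")  # stable sort keeps tie order
    if len(ordered) <= n:
        return ordered.index.tolist()
    cutoff = ordered.iloc[n - 1]
    return ordered[ordered <= cutoff].index.tolist()


# Toy example with n=2: three genes tie at the cutoff, so four are kept.
pvals = pd.Series({"G1": 0.001, "G2": 0.02, "G3": 0.02, "G4": 0.02, "G5": 0.5})
top = top_n_plus(pvals, n=2)
print(top)
```

Including ties avoids an arbitrary choice among genes with identical p-values, at the cost of lists of slightly different lengths per feature.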

Atlas cell retrieval tool 939
Example single-cell image crops can be retrieved from any of the screens using a retrieval script included in our paper repository at https://github.com/broadinstitute/2022_PERISCOPE. Images are retrievable by gene name or sgRNA barcode sequence, and example images can be chosen randomly or set to the most representative cells for that barcode as determined by closest k-means clustering using scikit-learn 79. Individual channel crops are from corrected images on which the final analysis measurements are made.

Comparison of the relative abundance of barcodes as quantified by NGS or in situ sequencing (R² = 0.92 [...]

Main figures & figure legends
Use this link to access the high-resolution version of the figures: https://github.com/broadinstitute/2022_PERISCOPE/tree/main/High_Resolution_Figures
bioRxiv preprint doi: https://doi.org/10.1101/2023.08.06.552164; this version posted August 7, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Figure 5. Identifying biological pathways using individual subcellular image features in HeLa datasets. (a) GO enrichment is found in many individual features in a manner that is fairly evenly distributed across the cellular structures (i.e. channels) imaged in PERISCOPE. Outer ring is the total number of features in our feature-selected dataset. Inner ring is the number of features that show GO enrichment. (b) GO enrichment in individual features is not distributed evenly across classes of features. Outer ring is the total number of features in our feature-selected dataset. Inner ring is the number of features that show GO enrichment. (c) Gene groups whose protein products are expected to function specifically in a cellular structure imaged in PERISCOPE are specifically enriched in hit lists for features in those compartments. Outer ring indicates the channel in which enrichment is expected. Inner ring is the breakdown of actual channels that show enrichment for the gene group. (d) Disruption of the Vacuolar ATPase (either V0 or V1 subunit), but not of genes involved in its assembly, causes a decrease in WGA signal in small structures, as seen specifically with screen feature WGA_Granularity_1 but not larger granularities. Each trace is a single gene; genes that are not hits in the screen are dashed. Bold lines are the mean of all genes in the group. (e) Loss of function in genes involved in N-Glycan synthesis in the Endoplasmic Reticulum, but not in other organelles or in the GPI synthesis pathway, causes an increase of ConA signal in small structures, as seen specifically with screen feature ConA_Granularity_1 but not larger granularities.
Extended Data Figure 9. Identifying biological pathways using individual subcellular image features in A549 dataset. The A549 dataset shows minimal GO enrichment in individual features, making the distribution of enrichment across channels (a) and feature categories (b) difficult to interpret. Outer ring is the total number of features in our feature-selected dataset and inner ring is the number of features that showed GO enrichment for (a) and (b). (c) Vacuolar ATPase protein products are expected to function specifically in the WGA channel and vATPase genes are specifically enriched in hit lists for features in those compartments. N.S. indicates no enrichment in that gene list. Outer ring indicates the channel in which enrichment is expected. Inner ring is the breakdown of actual channels that show enrichment for the gene group. (d) Disruption of the Vacuolar ATPase (either V0 or V1 subunit), but not of genes involved in its assembly, causes a decrease in Golgi/Membrane signal in small structures, as seen specifically with screen feature WGA_Granularity_1 but not larger granularities. Each trace is a single gene; genes that are not hits in the screen are dashed. Bold lines are the mean of all genes in the group.
(e) Specific signal in granularity features is not observed for a loss of function in genes involved in N-Glycan synthesis in the Endoplasmic Reticulum. Visualization of the signal measured at each granularity is shown for Golgi/Membrane (f) and Endoplasmic Reticulum (g).