Subcellular mapping of the protein landscape of SARS-CoV-2 infected cells for target-centric drug repurposing

The COVID-19 pandemic has resulted in millions of deaths and disrupted socioeconomic structures worldwide, and the search for new antivirals and treatments is still ongoing. In the search for new drug targets and to increase our understanding of the disease, we used large-scale immunofluorescence to explore the host cell response to SARS-CoV-2 infection. Among the 602 host proteins studied in this host response screen, changes in abundance and/or subcellular localization were observed for 97 proteins: 45 proteins showed increased abundance and 10 showed reduced abundance, 20 proteins displayed changed localization upon infection, and an additional 22 proteins displayed both altered abundance and altered localization, together contributing to a diverse reshuffling of the host cell protein landscape. We then selected existing and approved small-molecule drugs (n = 123) against the identified host response proteins and identified three compounds, elesclomol, crizotinib and rimcazole, that significantly reduced viral infection. Our study introduces a novel, targeted and systematic approach based on host protein profiling to identify new targets for drug repurposing. The dataset of ∼75,000 immunofluorescence images from this study is published as a resource for further studies.


Introduction
The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causing COVID-19 has led to more than 5 million deaths (https://covid19.who.int/) and initiated an unforeseen health and socioeconomic crisis (1). Such continued threats by emerging pathogens emphasize the need for new approaches to comprehensively identify therapeutic targets and drug candidates.
Upon infection, viruses commonly hijack the host cell machinery to enable replication. This leads to re-organization of the cellular proteome (2,3), such as up- or downregulation of signaling pathways (4-6), which can be quantified by large-scale omics methods. Additionally, host cells may alter the abundance of proteins related to cellular defense and homeostasis. Accumulating evidence highlights the importance of host protein translocations from one organelle to another during viral infection, contributing either to host protection or to viral replication (7). While bulk omics analyses capture systematic changes, they lack spatial resolution at the single-cell level and thereby miss infection-induced phenotypic changes of cellular components and translocations of specific host proteins within the cell. This information can provide deeper insight into the host cell response to viral infection, since protein location is often correlated with function (8)(9)(10)(11).
Employing systematic in situ methodologies for studying virus-host interactions can also reveal host protein targets required by the virus for further replication and thus have implications for antiviral drug target identification, as well as drug repurposing and discovery efforts.
Publications during the first year of the pandemic have reported hundreds of host cell proteins directly interacting with at least one of the 31 SARS-CoV-2 viral proteins (12)(13)(14)(15)(16)(17)(18). These interactions were mainly studied by affinity capture methods and protein tagging of bait proteins.
In this study we investigated host cell responses upon infection using immunofluorescence and antibodies from the Human Protein Atlas (HPA) (8,9,19,20) to map the changes of host protein abundance levels and subcellular localization upon infection with SARS-CoV-2.
Additionally, we selected existing and approved small-molecule drugs against our identified altered host proteins and identified compounds with antiviral activity.
Our study introduces a novel systematic approach based on spatial protein profiling to identify new host targets for drug repurposing, demonstrated here for SARS-CoV-2.

SARS-CoV-2 infection affects diverse cellular functions and pathways
In order to better understand the interplay between SARS-CoV-2 and the host cell machinery, we developed an image-based assay to detect SARS-CoV-2 infection in Vero E6 cells, followed by high-resolution immunofluorescence microscopy to investigate changes in the subcellular localization and abundance of host cell proteins upon infection (Figure 1a). Host cell proteins were selected for the host-response screen based on literature mining of previous reports identifying cellular proteins that interact with SARS-CoV-2 proteins (14)(15)(16)(17)(18). Building on the unique antibody resources generated within the HPA project (www.proteinatlas.org) and our established workflow for systematic subcellular mapping of proteins (8,19), 602 antibodies targeting proteins encoded by 662 genes (25 of the antibodies are multi-targeting, recognizing highly similar proteins within the same family) were used to immunostain mixed populations of infected and non-infected Vero E6 cells 24 hours after introduction of SARS-CoV-2 to the cell culture (Supplementary table 1). For the multi-targeting antibodies, at least one of the targets had previously been identified to interact with SARS-CoV-2. After confocal microscopy, images were uploaded into an in-house developed Covid Image Annotator tool in the ImJoy platform (21). Using a DPNUnet model, individual cells were segmented and labeled as infected or non-infected based on the staining intensity of the SARS-CoV-2 nucleocapsid protein. Host protein staining intensities were then quantified and compared between infected and non-infected cells to identify altered protein abundance.
Staining patterns were manually annotated to assess changes in subcellular location between the populations (Figure 1a, Supplementary Figure 1). The complete image dataset of the host response screen is available on the Figshare data portal (22).
By mapping the host response of 602 proteins, we identified 97 proteins with changed subcellular location (spatial redistribution) and/or altered abundance (defined as a significant difference in staining intensity) between infected and non-infected cells. By combining our observations for these 97 proteins with published data on interactions between host cell proteins and specific viral proteins (18), we generated a network map of host cell-virus interactions using Cytoscape (23) (Figure 1B).
Among the 97 proteins, most connections link to the SARS-CoV-2 components ORF3, ORF7B and the membrane (M) protein. The M protein is one of the structural components common to all coronaviruses. ORF3 and ORF7B belong to the accessory factors, which tend to be non-essential for viral replication but important for pathogenesis and virus-host interactions (24,25), and both ORF3 and ORF7B have been shown to modulate host immune responses (26,27). The host responses to SARS-CoV-2 involve multiple cellular components and diverse cellular functions (Figure 1B). Grouping and functional enrichment analysis of the responding proteins according to the Gene Ontology (GO), Reactome and KEGG databases revealed proteins localizing to endosomes and mitochondria, as well as factors involved in Golgi ribbon formation, Ras signaling, TLR4 signaling, the heat shock response, the lipopolysaccharide response, heparin metabolism and chloride membrane transport. This is in agreement with the known ability of coronaviruses to manipulate the host cell at various levels (28,29).
For example, the proteins encoded by STX6, VAMP4, VPS35, RAB5A, RAB5B, CLIP1 and GJA1, which are known to be involved in endosomal functions, displayed increased abundance as well as subcellular relocation following SARS-CoV-2 infection (Supplementary Table 1, Figure 1B). This observation agrees with the known importance of endosome formation for host cell entry and systemic infection by coronaviruses (30,31). Furthermore, SARS-CoV-2 infection increased the abundance of the proteins encoded by EXOSC3 and EXOSC5, core components of the RNA exosome complex, which plays a major role in RNA homeostasis in the cytosol and nucleus, including quality control, degradation and processing of different RNA species (32). Both EXOSC3 and EXOSC5 interact with the SARS-CoV-2 NSP8 protein, a cofactor of the viral RNA-dependent RNA polymerase (33), suggesting that these host proteins might play a role in viral RNA replication. Other recent studies have also reported direct interactions between host proteins and viral RNA and shown that SARS-CoV-2 infection profoundly remodels the cellular RNA-bound proteome (34).
We also identified three proteins from the toll-like receptor 4 (TLR4) signaling pathway, IRAK3, RIPK1 and NFKBIA, showing increased levels upon SARS-CoV-2 infection. TLR4 recognizes pathogen-associated molecular patterns (PAMPs) and activates the innate immune system by triggering the release of proinflammatory cytokines through a signaling cascade (35,36). One of the major transcriptional responses to cellular stress is the heat shock response, and we detected increased staining intensity, and thus higher abundance, of the chaperones HSPA1A, HSPA9 and HSBP1 upon infection, suggesting activation of the stress response in infected cells, as reported by previous studies (37)(38)(39)(40)(41).
We observed increased abundance of HS2ST1 and GLCE. Gordon et al. and Stukalov et al. showed that the SARS-CoV-2 proteins NSP7 and ORF7B interact with HS2ST1 and GLCE, respectively (15,18). These enzymes are involved in heparin metabolism, heparin being an anticoagulant and anti-inflammatory glycosaminoglycan (42). Our data showing their increased abundance in infected cells support a role for heparin metabolism in the response to viral infection (43). An interleukin-1 receptor-associated kinase (IRAK), a marker of inflammation and a regulator of the innate immune system, displayed higher abundance in the plasma membrane, Golgi apparatus and ER following infection (Figure 2C). Another target displaying significantly increased abundance was SRP72, with increased staining in both the cytosol and ER (Figure 2A).
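The functional enrichment of responding proteins against GO, Reactome and KEGG terms described earlier in this section is commonly assessed with a one-sided hypergeometric test. The sketch below is not the authors' pipeline (the enrichment tool is not stated here); it assumes that the 602 screened proteins form the background, and the function name and toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n_hits: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n_hits).

    N      -- size of the background (e.g. all proteins in the screen)
    K      -- background proteins carrying the annotation term
    n_hits -- number of responding (hit) proteins
    k      -- hit proteins carrying the annotation term
    """
    if not (0 <= k <= n_hits <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n_hits)
    # Sum the probability of observing k or more annotated hits by chance;
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n_hits - i) for i in range(k, upper + 1))
    return tail / comb(N, n_hits)


if __name__ == "__main__":
    # Toy counts for illustration only (not results from this study):
    # 8 of 97 hits vs. 20 of 602 screened proteins carry a hypothetical term.
    print(hypergeom_enrichment_p(k=8, n_hits=97, K=20, N=602))
```

Because many terms are tested at once, the resulting p-values would normally be corrected for multiple testing before a term is called enriched.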

The majority of host proteins show increased abundance and diverse spatial reorganization upon infection
Taken together, our data reveal altered abundance for many host proteins, with the majority showing higher abundance upon infection. Other recently published proteomics studies also report a large number of significantly altered proteins upon SARS-CoV-2 infection, albeit with varying fractions of up- versus downregulated proteins depending on the time point of measurement (6,14).
Furthermore, we identified 42 proteins undergoing spatial reorganization upon infection, 22 of which also showed increased abundance. A circos plot representing the reorganization of host cell proteins upon infection is shown in Figure 2D. Upon infection, extensive reorganization occurs, with a large number of proteins relocating to the Golgi apparatus, ER and cytosol. For example, HSPA9, a mitochondrial heat shock protein, partially translocates to the cytosol upon infection (Figure 2E). Furthermore, cytosolic levels vary within the population of infected cells, which could potentially be linked to the stage of the viral replication cycle. A second example is NUP98, a nuclear pore complex protein, which undergoes spatial reorganization from the nucleus and vesicles in non-infected cells to the nucleus, vesicles and Golgi apparatus in SARS-CoV-2 infected cells (Figure 2F). Similarly, for the ER-resident protein GANAB, the staining indicates a translocation to the Golgi apparatus (Figure 2H). However, this may instead result from altered ER morphology upon infection, as the target protein staining overlaps with the ER marker used in the screen. A third example is CMPK1, a nuclear resident protein that also localizes to the Golgi apparatus in non-infected cells and additionally to vesicles in infected cells. Among the proteins with both spatial reorganization and altered abundance is the chaperone HSPA1A, which displays increased abundance as well as redistribution from vesicles in non-infected cells to the cytosol and plasma membrane in infected cells (Figure 2G). Altogether, our host-response screen of 602 host cell proteins identified 97 proteins with altered protein abundance and/or subcellular distribution 24 hours after SARS-CoV-2 infection.
As mentioned above, instead of carrying all the elements necessary for replication and spread, viruses hijack the host cell machinery. We therefore hypothesize that the identified host cell proteins with altered spatial or expression profiles are putative targets for modulation to limit viral infection and spread.

Drug repurposing based on host-virus interplay mapping reveals antiviral activity of rimcazole, elesclomol and crizotinib.
To identify available drugs designed to target the putative host proteins, we selected the SPECS repurposing library as a collection of annotated drugs. The library was assembled based on the design criteria of the Broad Repurposing collection (44) (Table 2).
To test the antiviral activity of the 123 drug repurposing candidates against SARS-CoV-2, our host-response image-based assay was transferred from 96- to 384-well plates and supplemented with a compound treatment step (Figure 3A). Vero E6 cells were infected with SARS-CoV-2 in suspension and seeded in duplicate onto pre-spotted compounds for 24 hours. Infected cells treated with DMSO served as the infection baseline control. Cells were immunostained for the SARS-CoV-2 Spike protein and for Calreticulin as an ER marker, and counterstained with Hoechst to identify cell nuclei. Images were subsequently acquired by high-content confocal microscopy (Figure … Table 1). The remaining ten hit compounds showed varying activity, reducing the infection rate to values between 65% and 39% (Figure 3B). Cell viability was reduced to 59-65% during treatment with crizotinib, epalrestat, ranirestat and SMI-4a, but not with the other hit compounds, highlighting the need for dose-response experiments to determine therapeutic windows (Figure 3C, Supplementary Figure 4). Altogether, these data present a target-centric workflow and identify 13 compounds as repurposing candidates against COVID-19.

Discussion
Diverse families of viral pathogens are well known to alter host proteome organization as part of their replication cycle (3,28), and accumulating evidence highlights the importance of protein translocations during viral infection (2). To better understand the host response to infection, the gold standard is to investigate virus and host proteins using bulk systematic methods such as quantitative proteomics (47,48), or sophisticated but low-throughput microscopy to unravel the structure of and interactions between specific viral and host cell structures or proteins (49,50). Some studies have explored the subcellular localization of individual viral proteins using tagged versions of the proteins and immunofluorescence (47). While this adds important information about the viral proteins and complements the affinity capture methods used for studying interactions, it does not reveal changes in the subcellular distribution of host cell proteins. Most studies focusing on identifying antivirals and treatments neglect spatial, single-cell and subcellular information on a systematic scale.
In this work, we leveraged spatial information at subcellular resolution during infection to build an approach for systematically shortlisting host proteins as potential antiviral targets.
Mapping subcellular changes during infection provides a view into the finely balanced cellular homeostasis and its disturbances at the organelle, biomolecule or protein level, which can guide therapeutic target or drug discovery (7). We present a novel systematic spatial profiling approach for SARS-CoV-2 infected cells to map the in situ landscape of host responses, and subsequently demonstrate its potential for target-specific drug repurposing. Our study thereby provides several potential antiviral targets for follow-up studies and drug discovery. Our data-driven approach also differs from conventional hypothesis-driven drug discovery, where a single specific target is often chosen for drug screening.
By utilizing the antibody resources and expertise gathered within the HPA project and selecting antibodies specific to host proteins with previously validated interactions with SARS-CoV-2, we mapped changes in abundance and localization of host proteins in response to SARS-CoV-2 infection. Of the 602 proteins we studied, 97 changed in abundance and/or spatial localization, illustrating the multifaceted nature of these responses. Most proteins responded to SARS-CoV-2 with increased rather than reduced abundance, which could reflect either activated cellular defense mechanisms or virus-orchestrated support of its replication machinery. We then considered the host responses as potentially druggable phenotypes, matched the proteins with existing drugs from the SPECS repurposing library and identified elesclomol, rimcazole and crizotinib as drug repurposing candidates. Importantly, the approach with HPA antibodies covering most of the relevant proteome enables larger target-focused systematic screening campaigns for the discovery of new host-virus biology.
Thorough validation of a target's role in a disease state is undoubtedly a critical step in classical drug discovery; however, studying solely the change in protein abundance and/or spatial
Rimcazole is a carbazole derivative that acts partially as a SIGMAR1 antagonist, with additional affinity for dopamine transporters, and was discontinued as an anti-schizophrenia drug in the early 1980s due to lack of efficacy (53). Notably, other SIGMAR1 inhibitors, but not rimcazole, have previously been identified as antivirally active in in vitro repurposing screens against HCV, Ebola virus (EBOV) and coronaviruses, including SARS-CoV-2 (54-58). However, concerns about SIGMAR1 inhibitors were recently raised when induction of phospholipidosis was shown to underlie the antiviral activity of most of the proposed repurposing candidates (59). Similar to crizotinib, rimcazole was the only one of 22 SIGMAR1 inhibitors that showed antiviral activity.
Elesclomol is mainly known as a pre-clinical anti-cancer drug candidate that induces oxidative stress in mitochondria (60), triggering apoptosis in cancer cells and activating heat shock proteins and related signaling pathways (61). The increased abundance of the HSPA1A-encoded protein, the major cytosolic HSP70, upon SARS-CoV-2 infection, together with the antiviral activity of elesclomol, suggests a putative role for HSP70 in the viral infection cycle. We speculate that the elevated levels of heat shock proteins are part of the cellular defense mechanism, aiding in targeting viral proteins for degradation rather than assisting in folding of the viral proteins.
A limitation of this study is that the results are based on data from the non-human cell line Vero E6, which originates from the African green monkey. This cell line is known for its dampened innate immune response and its permissiveness to SARS-CoV-2, which makes it a practical but limited virology model (62). When comparing the subcellular localization of the hits from the host-response screen in non-infected Vero E6 cells with immunofluorescence data on human cell lines previously generated within the HPA, 80% of the patterns overlap between the species (data publicly available at www.proteinatlas.org). Across all proteins (n=546) stained in both Vero E6 and human cell lines within the HPA, 75% show overlapping subcellular localization. Given these inter-species similarities, we speculate that the host proteome landscape responds to SARS-CoV-2 infection in a partially similar manner in human cell lines.
In this work, we performed targeted phenotyping of disease-relevant proteins as a funnel to guide target selection for drug repurposing or discovery. Further, we suggest that "targeted phenotyping" can be used to prioritize host targets for novel drugs, in this case for the treatment of COVID-19.
Our approach can be applied as a stand-alone filter or as an integrated layer in multi-omics studies for the selection of relevant host responses in infectious diseases. Given that the approach is easily scalable and transferable to other infectious agents or to diseases beyond SARS-CoV-2, we anticipate that the spatial dimension will provide a crucial piece of the puzzle for various diseases.

Cell culture and virus infection
The Vero E6 cell line was grown at 37°C in a 5% CO₂

Antibody selection
The antigen sequences of validated antibodies from the HPA project were compared by BLAST against the Chlorocebus sabaeus (African green monkey) sequences from Ensembl. Proteins with more than 60% identity across the whole length of the antigen sequence used to generate the HPA antibody were selected.
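The selection rule above can be written as a percent-identity filter over a pairwise alignment. This is a minimal sketch, assuming the antigen and its Chlorocebus sabaeus counterpart have already been aligned (for example, parsed from BLAST output) into two equal-length strings with `-` for gaps; antigen residues outside the local alignment are assumed to be padded as gaps so that identity is measured over the whole antigen length. Function names are hypothetical.

```python
def antigen_identity(aligned_antigen: str, aligned_ortholog: str) -> float:
    """Percent identity over the full (ungapped) antigen length.

    Both strings are assumed to come from the same pairwise alignment
    (equal length, '-' for gaps). Positions where the ortholog has a gap
    or a different residue count as mismatches.
    """
    if len(aligned_antigen) != len(aligned_ortholog):
        raise ValueError("aligned sequences must have equal length")
    antigen_len = sum(1 for a in aligned_antigen if a != "-")
    if antigen_len == 0:
        raise ValueError("antigen sequence is empty")
    # Count identical residues at antigen (non-gap) positions only.
    matches = sum(
        1 for a, b in zip(aligned_antigen, aligned_ortholog) if a != "-" and a == b
    )
    return 100.0 * matches / antigen_len


def keep_antibody(aligned_antigen: str, aligned_ortholog: str,
                  min_identity: float = 60.0) -> bool:
    """Selection rule from the text: keep if identity is more than 60%."""
    return antigen_identity(aligned_antigen, aligned_ortholog) > min_identity


if __name__ == "__main__":
    # Toy 10-residue antigen; only the first six residues are conserved.
    print(antigen_identity("ACDEFGHIKL", "ACDEFG----"))
```

Normalizing by the antigen length rather than by the aligned length keeps a short, highly identical local hit from passing the filter on its own.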

Host response screen
Cells were fixed, permeabilized and stained as previously described (19). Briefly, fixed cells were washed with PBS and permeabilized using 0.1% Triton X-100 (Sigma Aldrich) in PBS for a total of 15 min, with fresh Triton solution added every 5 minutes (3 x 5 minutes).

Image acquisition
Immunostained cells were imaged in PBS using a laser-based Opera Phenix high-content microscope (PerkinElmer) in confocal mode with a 63X water-immersion objective. Nine to twelve fields of view were imaged per well (corresponding to a few hundred to a thousand individual cells), at three different z-planes to ensure proper focus throughout automated image acquisition across the plates. Raw 16-bit images were exported as TIFF files, and the z-planes were combined into maximum-intensity projections prior to analysis. All raw data images including all For the drug repurposing screen, images were additionally acquired with a 10X air objective and four fields of view per well to include the entire cell population in each well.
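The maximum-intensity projection step can be sketched with NumPy. This assumes the three z-planes of one field of view have already been loaded into a `(z, y, x)` array (for instance with the `tifffile` package); the function name is illustrative, not the authors' code.

```python
import numpy as np


def max_projection(z_stack: np.ndarray) -> np.ndarray:
    """Collapse a (z, y, x) stack into one (y, x) image.

    Each output pixel is the brightest value found at that position across
    the z-planes; the input dtype (e.g. uint16 for raw 16-bit TIFFs) is kept.
    """
    if z_stack.ndim != 3:
        raise ValueError("expected a (z, y, x) array")
    return z_stack.max(axis=0)


if __name__ == "__main__":
    # Three toy 2x2 z-planes standing in for one field of view.
    stack = np.array(
        [[[1, 5], [3, 0]],
         [[4, 2], [0, 7]],
         [[2, 2], [9, 1]]],
        dtype=np.uint16,
    )
    print(max_projection(stack))
```

Taking the per-pixel maximum keeps in-focus signal from whichever plane is sharpest, which is why acquiring several planes compensates for small focus drifts across a plate.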

Cell segmentation and image quantification in the host protein interaction screen
The acquired images were transferred to an application built in the ImJoy platform (21) (a server-based web application) for manual annotation of the subcellular locations of each protein under investigation.

For quantification of relative protein expression in infected and non-infected cells, images were segmented to identify individual cells, as well as the nuclear and cytoplasmic regions. We used a DPNUnet model trained on manually segmented HPA images to generate binary cell masks for each image. Segmentation masks were generated for the nucleus using the DAPI channel and for the whole cell using the ER channel as input. The cytoplasm was defined by subtracting the nuclear region from the whole cell. Intensities for the target protein, ER and SARS-CoV-2 channels were then quantified separately for these regions, both as mean and integrated values. To define infected and non-infected cells, we calculated for each cell the mean of the virus-channel pixel values in the cytoplasm that fall within the third quartile (that is, the pixels ranked between the 51st and 75th percentile, above the median).
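The per-cell infection score can be sketched as follows. This is a minimal illustration under stated assumptions: a labeled whole-cell mask (0 = background, one integer per cell), a boolean nuclear mask, and a virus-channel image of the same shape. The exact rank boundaries of the "third quartile" are interpreted here as the sorted pixels from the median up to the 75th percentile, and the threshold value is assumed to be chosen by the user; all names are hypothetical.

```python
import numpy as np


def third_quartile_mean(values) -> float:
    """Mean of the ranked values between the median and the 75th percentile."""
    v = np.sort(np.asarray(values, dtype=float).ravel())
    n = v.size
    if n == 0:
        return float("nan")
    lo = n // 2                                  # first rank above the median
    hi = max(lo + 1, int(np.ceil(0.75 * n)))     # up to the 75th percentile
    return float(v[lo:hi].mean())


def cytoplasm_infection_scores(virus, cell_labels, nucleus_mask) -> dict:
    """Score each labeled cell from its cytoplasmic virus-channel pixels."""
    nucleus_mask = np.asarray(nucleus_mask, dtype=bool)
    scores = {}
    for lab in np.unique(cell_labels):
        if lab == 0:                             # skip background
            continue
        # Cytoplasm = whole-cell region minus the nuclear region.
        cyto = (cell_labels == lab) & ~nucleus_mask
        scores[int(lab)] = third_quartile_mean(virus[cyto])
    return scores


def classify_infected(scores: dict, threshold: float) -> dict:
    """Call a cell infected when its score exceeds the chosen threshold."""
    return {lab: s > threshold for lab, s in scores.items()}
```

Using a trimmed quantile band rather than the plain mean makes the score less sensitive to a few very bright or very dim cytoplasmic pixels.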
A threshold on this cytoplasmic virus-channel score was then set to classify cells as infected or non-infected. The segmentation panel marks infected and non-infected cells differently, allowing verification of the segmentation model and further fine-tuning of the threshold if needed. For each well, protein staining intensity was quantified for the infected and non-infected cell populations, and the fold change between the populations was calculated. A t-test was performed to assess statistical significance between the infected and non-infected populations, and p-values were adjusted for false discovery rate using the Benjamini-Hochberg procedure. Proteins with an adjusted p-value < 0.01 and log2 fold change > 1 were considered upregulated; proteins with an adjusted p-value < 0.01 and log2 fold change < -1 were considered downregulated. Violin plots were generated, and the overall distribution of fold changes for the analysed proteins was displayed in a volcano plot.
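The fold-change, false-discovery-rate and calling steps can be sketched as below. This is a minimal sketch, not the authors' code: it assumes the raw t-test p-values were already computed (for example with `scipy.stats.ttest_ind`), and it implements the Benjamini-Hochberg adjustment directly in NumPy (equivalent in intent to `method="fdr_bh"` in `statsmodels.stats.multitest.multipletests`). Function names are hypothetical.

```python
import numpy as np


def log2_fold_change(infected, non_infected) -> float:
    """log2 of (mean infected intensity / mean non-infected intensity)."""
    return float(np.log2(np.mean(infected) / np.mean(non_infected)))


def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each value becomes the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def call_abundance_change(log2_fc: float, p_adj: float,
                          alpha: float = 0.01, min_lfc: float = 1.0) -> str:
    """Thresholds from the text: adjusted p < 0.01 and |log2 FC| > 1."""
    if p_adj < alpha and log2_fc > min_lfc:
        return "up"
    if p_adj < alpha and log2_fc < -min_lfc:
        return "down"
    return "unchanged"
```

The symmetric thresholds (> 1 and < -1 on the log2 scale) correspond to at least a two-fold increase or decrease in mean staining intensity.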
During manual annotation of protein subcellular localization, annotators assigned the staining in the infected and non-infected cell populations to one or more subcellular locations, which included nucleoplasm, nuclear membrane, nucleoli, nucleoli fibrillar center, nuclear speckles, nuclear bodies, kinetochore, mitotic chromosome, endoplasmic reticulum, Golgi apparatus, vesicles, peroxisomes, endosomes, lysosomes, intermediate filaments, actin filaments, focal adhesion sites, microtubules, microtubule ends, cytokinetic bridge, midbody, midbody ring, cleavage furrow, mitotic spindle, primary cilia, centriolar satellites, centrosome, lipid droplets, plasma membrane, aggresome, cytosol, mitochondria, cytosolic bodies and rods and rings. Proteins were categorized as spatial hits if their subcellular locations were annotated differently between the non-infected and infected cell populations.
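The spatial-hit rule amounts to comparing two sets of annotated locations. The sketch below assumes annotations are stored as sets of location names per population; the function names are hypothetical, and the example uses the HSPA1A relocation described in the results (vesicles in non-infected cells, cytosol and plasma membrane in infected cells).

```python
def is_spatial_hit(non_infected, infected) -> bool:
    """A protein is a spatial hit if its annotated location sets differ."""
    return set(non_infected) != set(infected)


def relocation_summary(non_infected, infected) -> dict:
    """Locations gained and lost upon infection, sorted for stable output."""
    before, after = set(non_infected), set(infected)
    return {"gained": sorted(after - before), "lost": sorted(before - after)}


if __name__ == "__main__":
    # HSPA1A as described in the results section.
    ni = {"vesicles"}
    inf = {"cytosol", "plasma membrane"}
    print(is_spatial_hit(ni, inf), relocation_summary(ni, inf))
```

Comparing sets makes the rule independent of the order in which annotators listed locations, while the gained/lost summary is the kind of input a circos-style relocation plot needs.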

Quantification of SARS-CoV-2 infection upon compound treatment
Image analysis was performed using Harmony software (PerkinElmer). Cell nuclei were identified in the Hoechst 33342 channel using the "Find nuclei" algorithm. Cell boundaries were identified in the Alexa 647 channel detecting the Calreticulin signal using the "Find cytoplasm" algorithm. To distinguish infected and non-infected cells, the average intensity of the Alexa 555 channel detecting the SARS-CoV-2 Spike protein was calculated in a perinuclear region defined by expanding the nuclear area by 60%, and a threshold was applied. After image analysis, per-well data were used to calculate cell viability and infection rate. Cell viability was calculated as the number of cells in each treated well as a percentage of the number of cells in the DMSO-treated control, and the infection rate as the percentage of SARS-CoV-2-positive cells among all cells in the corresponding treatment. Data were plotted using GraphPad Prism software.
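The two per-well readouts follow directly from cell counts. This is a minimal sketch, assuming per-cell mean perinuclear Spike intensities have been exported from the image analysis and that the DMSO control cell count is a single number (for example, averaged over control wells); the threshold and all function names are hypothetical.

```python
def cell_viability_pct(n_cells_treated: int, n_cells_dmso: float) -> float:
    """Cells in a treated well as a percentage of the DMSO-treated control."""
    return 100.0 * n_cells_treated / n_cells_dmso


def infection_rate_pct(n_infected: int, n_cells: int) -> float:
    """SARS-CoV-2-positive cells as a percentage of all cells in the well."""
    return 100.0 * n_infected / n_cells


def well_summary(spike_means, threshold: float, n_cells_dmso: float) -> dict:
    """Summarize one well from per-cell perinuclear Spike intensities.

    spike_means  -- one mean Spike-channel intensity per segmented cell
    threshold    -- intensity above which a cell is called infected (assumed)
    n_cells_dmso -- cell count of the DMSO-treated control (assumed input)
    """
    n_cells = len(spike_means)
    n_infected = sum(1 for s in spike_means if s > threshold)
    return {
        "cells": n_cells,
        "viability_pct": cell_viability_pct(n_cells, n_cells_dmso),
        "infection_rate_pct": (
            infection_rate_pct(n_infected, n_cells) if n_cells else float("nan")
        ),
    }
```

Reporting viability against the DMSO control separately from the infection rate is what allows cytotoxic compounds to be distinguished from compounds that lower infection without killing cells.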

Drug library annotation
The SPECS repurposing library was obtained from Chemical Biology Consortium Sweden.
The 5291 available drugs were annotated using the CLUE API service.
infected (I) cells. P-value more than 0.05 = ns, less than or equal to 0.05 = *, less than or equal to 0.01 = **, less than or equal to 0.001 = *** and less than or equal to 0.0001 = ****.
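Once each library compound carries target annotations, target-centric selection reduces to intersecting those targets with the hit proteins. The sketch below assumes the annotations have already been retrieved and stored as a mapping from compound name to gene symbols; it does not call the CLUE API, and the compound names and their targets are hypothetical toy data (the hit genes are taken from the results).

```python
from typing import Dict, Iterable, Set


def match_drugs_to_hits(drug_targets: Dict[str, Iterable[str]],
                        hit_genes: Iterable[str]) -> Dict[str, Set[str]]:
    """Return each compound whose annotated targets overlap the hit genes."""
    # Normalize gene symbols so that case or stray whitespace do not hide matches.
    hits = {g.strip().upper() for g in hit_genes}
    matched = {}
    for drug, targets in drug_targets.items():
        overlap = {t.strip().upper() for t in targets} & hits
        if overlap:
            matched[drug] = overlap
    return matched


if __name__ == "__main__":
    hits = ["HSPA1A", "NUP98", "CMPK1"]
    # Hypothetical annotations for illustration only.
    annotations = {
        "compound_x": {"HSPA1A"},
        "compound_y": {"EGFR"},
        "compound_z": {"CMPK1", "TYMS"},
    }
    print(match_drugs_to_hits(annotations, hits))
```

Keeping the matched targets alongside each compound preserves the link from a screening hit back to the host protein that motivated its selection.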