The proximal proteome of 17 SARS-CoV-2 proteins links to disrupted antiviral signaling and host translation

Viral proteins localize within subcellular compartments to subvert host machinery and promote pathogenesis. To study SARS-CoV-2 biology, we generated an atlas of 2422 human proteins vicinal to 17 SARS-CoV-2 viral proteins using proximity proteomics. This identified viral proteins at specific intracellular locations, such as the association of accessory proteins with intracellular membranes, and projected SARS-CoV-2 impacts on innate immune signaling, ER-Golgi transport, and protein translation. It identified viral protein adjacency to specific host proteins whose regulatory variants are linked to COVID-19 severity, including the TRIM4 interferon signaling regulator, which was found proximal to the SARS-CoV-2 M protein. Viral NSP1 protein adjacency to the EIF3 complex was associated with inhibited host protein translation, whereas ORF6 localization with MAVS was associated with inhibited RIG-I 2CARD-mediated IFNB1 promoter activation. Quantitative proteomics identified candidate host targets for the NSP5 protease, with specific functional cleavage sequences in host proteins CWC22 and FANCD2. This data resource identifies host factors proximal to viral proteins in living human cells and nominates pathogenic mechanisms employed by SARS-CoV-2.

bioRxiv preprint doi: https://doi.org/10.1101/2021.02.23.432450; this version posted February 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

SARS-CoV-2 proteins and provide insights into the pathogenicity of new or emerging coronaviruses.

is either unknown or highly variable across differing coronaviruses, underscoring the need to begin mapping their putative localizations and functions.

Proximity proteomics (BioID) uses enzymes, such as the modified bacterial biotin ligase BirA, to biotinylate lysine residues on nearby proteins within a radius of 10-20 nm (4). When fused to a protein of interest, it labels not only proteins that directly bind the fused protein but also those adjacent to it, enabling rapid isolation of biotinylated proteins whose identity can provide clues about the localization and function of the protein studied. When coupled to mass spectrometry, it provides an alternative to traditional tandem affinity purification and mass spectrometry (TAP-MS) (5). Whereas TAP-MS can isolate protein complexes that stably bind the protein of interest in a manner robust enough to survive protein extraction, BioID-MS labels both transient and stable interactors in living cells, particularly those stabilized by cellular membranes that can be destroyed in traditional TAP-MS experiments. In this way, BioID may localize the cellular "neighborhoods" of a given fusion protein. We recently generated a biotin ligase derived from Bacillus subtilis, which has 50 times greater activity than the original E. coli BirA (4, 6), allowing decreased labeling times and increased signal-to-noise ratios. Applying proximity proteomics to SARS-CoV-2 viral proteins in human cells may facilitate insight into their localization and putative functions.

The actions of specific SARS-CoV-2-encoded proteins are only partially understood at present. The replication transcription complex, which includes the RNA-dependent RNA polymerase and other factors, and the structural proteins, which are necessary for protecting the newly synthesized genomes and assembling the viral particles, comprise the core viral replication machinery. Other viral gene products, generally termed accessory factors, are believed to be dedicated to manipulating the host environment to foster viral replication (7). One of the main functions of accessory factors is to block the host antiviral response (8). Non-SARS-CoV-2 coronaviruses have also been shown to block host translation (9, 10), inhibit interferon signaling (11, 12), antagonize viral RNA sensing (13, 14), and degrade host mRNAs (15). The degree of homology between SARS-CoV-2 and other coronaviruses suggests the existence of both shared and divergent host protein interactions between its viral proteins and those of the other members of the coronavirus family.

Here we used proximity proteomics to identify the human proteins vicinal to 17 major SARS-CoV-2 proteins and, from that data and validation studies, to predict their likely location and function. We examined the intersection of the resulting atlas of human factors adjacent to SARS-CoV-2 viral proteins with risk loci associated with severe COVID-19 by genome-wide association studies (GWAS). This nominated specific, viral protein-adjacent host candidates whose natural variation in expression may contribute to differences in COVID-19 susceptibility in the population.
We also demonstrated that multiple SARS-CoV-2 products can affect host translation and host innate immune signaling and defined a list of potential host targets and pathways for the NSP5 protease. Taken together, these resource data plot the location of the 17 major SARS-CoV-2 proteins within the cell, define an atlas of human host proteins adjacent to them, and offer insight into potential pathogenic mechanisms engaged by SARS-CoV-2.

Host proteins proximal to viral proteins and their subcellular localization

To identify the human host proteins vicinal to the 17 major SARS-CoV-2 encoded viral proteins, HA epitope tagged fusions of BASU-BirA (6) were generated with each of these 17 viral ORFs (Fig. 1A). BASU was introduced at the N and C terminus to minimize disruption as previously described (16). Samples were prepared from plasmid-transfected 293T cells after 2 hours of biotin labeling, and the biotinylated proteins were then isolated using streptavidin. Samples were divided for LC-MS/MS and immunoblotting (Fig. S1). An MS data search was performed, and protein lists were analyzed and scored using the Significance Analysis of Interactome (SAINT) method (17). Using a SAINT score cutoff of 0.9 generated a list of 2422 host proteins (Fig. 1B, Fig. Table S1) across the 17 viral proteins studied, 514 of which were unique to a specific viral protein. These data (Table S2) comprise a compendium of candidate human proteins adjacent to SARS-CoV-2-encoded proteins.

The identity of these 2422 human proteins provided clues to SARS-CoV-2 biology. Molecular function analysis (Fig. 1B-D) identified processes associated with SARS-CoV-2 viral protein impacts. These included translation initiation, RNA binding, the 26S proteasome, signaling, and SNARE-associated intracellular transport. The analysis also identified adjacencies to major histocompatibility (MHC) proteins and components of the nuclear pore complex (NPC). A number of these processes, such as protein translation, are known to be affected by coronaviruses, while others, such as RNA binding, are less well characterized.
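
The filtering step described above (keep host proteins at or above the SAINT cutoff, then count those seen with only one viral protein) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the prey names (`HOST_A` and so on), the scores, and the function names are all hypothetical, and treating the cutoff as inclusive (`>=`) is an assumption, since the text does not say whether 0.9 itself passes.

```python
from collections import defaultdict

# Hypothetical (viral bait, host prey, SAINT score) rows. Prey names and
# scores are invented placeholders, not values from the study.
rows = [
    ("NSP1", "HOST_A", 0.98),
    ("NSP1", "HOST_B", 0.95),
    ("ORF6", "HOST_C", 0.97),
    ("M", "HOST_D", 0.93),
    ("M", "HOST_A", 0.91),
    ("ORF6", "HOST_B", 0.42),
]

SAINT_CUTOFF = 0.9  # cutoff stated in the text; inclusive comparison is assumed


def high_confidence_preys(rows, cutoff=SAINT_CUTOFF):
    """Map each host protein to the set of viral baits it passes the cutoff with."""
    baits_by_prey = defaultdict(set)
    for bait, prey, score in rows:
        if score >= cutoff:
            baits_by_prey[prey].add(bait)
    return dict(baits_by_prey)


def unique_preys(baits_by_prey):
    """Host proteins that pass the cutoff with exactly one viral bait."""
    return {
        prey: next(iter(baits))
        for prey, baits in baits_by_prey.items()
        if len(baits) == 1
    }


atlas = high_confidence_preys(rows)
unique = unique_preys(atlas)
print(len(atlas), "host proteins pass the cutoff")
print(len(unique), "are unique to a single viral protein")
```

In this toy input, `HOST_A` passes with two baits and is therefore in the atlas but not in the unique set, while the low-scoring ORF6/`HOST_B` pair is dropped, leaving `HOST_B` unique to NSP1.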

To begin to map putative localizations for the 17 studied SARS-CoV-2 proteins within the cell, cellular component GO-term enrichment analysis was performed (Fig. 2A), which pointed to possible intracellular localizations for each viral protein based on curated knowledge of the host proteins identified adjacent to each viral protein. To validate and extend this, protein fractions were prepared from cells expressing each SARS-CoV-2 protein studied. These included four overlapping fractions: a) cytoplasm, b) cytoplasm/membrane, c) nucleus/membrane, and d) nucleus (Fig. 2B). Integrating GO-term analysis with immunoblotting of these fractions enabled predictions of the likely intracellular localization of each viral protein (Fig. 2C). We further confirmed diffuse NSP5 expression and ORF3a membrane localization through immunostaining. Many SARS-CoV-2 accessory proteins concentrate in the ER or in ER-proximal membranes (M, ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, and ORF10). A number, however, appear to be predominantly cytoplasmic (NSP1, NSP2, NSP5, NSP9, NSP15, ORF9b) and, interestingly, several appear to localize in part to the nucleus (NSP14, ORF6, ORF9c). The localization predicted from these data is consistent with observations from other recent work (16, 18). Among the membrane-localized proteins, subtle differences in location could be inferred. In the case of M protein, association with membranes in the endocytic pathway as well as lysosomal membranes was predicted. ORF8 and ORF10 clustered similarly, with enrichment for ER interactions in the lumen.
These data indicate that specific SARS-CoV-2 proteins may display increased localization to a variety of intracellular sites, including the cytoplasm, nucleus, and distinct endomembranes.

Viral proximal interactors include drug targetable host genes.

There is a lack of antiviral therapies specific to SARS-CoV-2 or effective against coronaviruses generally. Many current and experimental therapeutics were developed for activity against other viruses and are being tested for cross efficacy against SARS-CoV-2. Others are therapies known to have broad antiviral effects. There is significant interest in developing drugs that directly target SARS-CoV-2 viral proteins, but research and development may take years before use in patients. Another approach is using drugs against host genes critical to virus infection and replication. For example, drugs targeting ACE-2, the main receptor for SARS-CoV-2, or ACE-2 expression and function have been pursued. To expand the list of possible drugs beyond entry inhibitors, we compared the viral proximal proteome generated in this study against the "druggable" genome, which includes databases of the gene targets of available drugs. This generated a list of 47 host genes (Fig. S3, Table S3) and highlights, as previously reported (16)

The resulting disease risk-linked variants were further distilled to those identified as expression quantitative trait loci (eQTLs) for specific putative eGene targets (Fig. 3A). These eGenes, which represent a set of genes whose expression may be controlled by natural variants in the human population linked to COVID-19 risk, were then intersected with the atlas of host factors identified as adjacent to SARS-CoV-2 viral proteins by proximity proteomics. Publicly available protein interaction data were then integrated to project the connectedness of the resulting gene set (Fig. 3B). The resulting network was notable for host proteins implicated in cytokine signaling, cell cycle control, transcription, and translation, suggesting that genetic susceptibility to COVID-19 may link to variations in the expression of proteins that mediate these processes.

Among the proteins identified by this analysis was TRIM4, a RING E3 ligase that activates type I interferon signaling through activation of the cytosolic RNA sensor RIG-I. TRIM4 was significantly associated with SARS-CoV-2 M protein in proximity proteomics data (Table S1) and, using

proximal proteome (Fig S2C). To nominate possible host targets of NSP5 whose levels are decreased upon protease expression, we performed SILAC mass spectrometry comparing wild-type SARS-CoV-2 NSP5 to the catalytically inactive NSP5 C145A mutant (16, 46). Residue 145 is the critical catalytic cysteine, and mutation to alanine prevents protease activity (47). A number of host proteins showed significant depletion in cells expressing wild-type NSP5, but not protease-inactive NSP5 C145A (Fig. 5A). Combining both datasets identified an additional 26 candidates, resulting in a pool of 60 potential host protein targets for NSP5 (Fig 5B).
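
The logic of the SILAC comparison above can be sketched as a ratio filter followed by a set union with the proximity-derived candidates. This is a minimal sketch under stated assumptions: the protein names, intensities, and the 2-fold threshold are invented for illustration, and a plain fold-change cutoff stands in for whatever significance test the authors applied, which the text does not specify.

```python
import math

# Hypothetical SILAC intensities per host protein:
# (wild-type NSP5 channel, NSP5 C145A channel). Values are invented.
silac = {
    "HOST_A": (250.0, 1000.0),
    "HOST_B": (980.0, 1000.0),
    "HOST_C": (400.0, 900.0),
    "HOST_D": (1200.0, 1000.0),
}

# Hypothetical candidates taken from the NSP5 proximal proteome.
bioid_candidates = {"HOST_A", "HOST_E"}

# Assumed threshold: at least 2-fold lower with wild-type NSP5 than with C145A.
LOG2_CUTOFF = -1.0


def log2_ratio(wild_type, mutant):
    """log2 of the wild-type / catalytically inactive intensity ratio."""
    return math.log2(wild_type / mutant)


def depleted_with_wild_type(silac, cutoff=LOG2_CUTOFF):
    """Proteins whose level drops with active protease relative to C145A."""
    return {
        protein
        for protein, (wild_type, mutant) in silac.items()
        if log2_ratio(wild_type, mutant) <= cutoff
    }


silac_hits = depleted_with_wild_type(silac)
additional = silac_hits - bioid_candidates   # new candidates contributed by SILAC
pool = bioid_candidates | silac_hits         # combined candidate pool

print(sorted(silac_hits), sorted(additional), sorted(pool))
```

Comparing against the C145A mutant rather than an empty vector means that effects of merely expressing the protein cancel out, so the ratio isolates changes that depend on catalytic activity.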
To begin to examine potential cleavage of these candidate proteins by NSP5, we searched their peptide sequences for potential cleavage sites using a published cleavage prediction algorithm (48). We then took these peptide sequences and tested them for cleavage by NSP5 using a loss of fluorescence resonance energy transfer (FRET) assay. In brief, potential cleavage sites were inserted between a FRET pair, and this construct was then co-transfected along with plasmids expressing either NSP5 or NSP5 C145A, with loss of FRET signal only after wild-type NSP5 expression taken as indicative of cleavage. Four sequences taken from the SARS-CoV-2 ORF1AB polyprotein, which is normally cleaved by NSP5, were cleaved as expected, as demonstrated by loss of FRET signal (Fig. 5C-D). Testing of sequences from human CDKN2AIP, CWC22, FANCD2, and P53 proteins indicated NSP5 cleavage of one CWC22 and two FANCD2 peptide sequences (Fig. 5C-D). Neither the CDKN2AIP nor the P53 sequences tested were cleavable by NSP5 in our assay, and their depletion in the SILAC data may represent indirect effects of

proteins, namely CWC22 and FANCD2, that are involved in these processes.

Here we present a compendium of human host proteins adjacent to 17 SARS-CoV-2 viral proteins, with a goal to offer insight into potential mechanisms that these viral proteins may engage during pathogenesis. These data encompass the less well understood SARS-CoV-2 accessory factors and predict the localization of each of these viral proteins as well as identify significant adjacencies to proteins that mediate core cellular processes, including translation, signaling, RNA interactions, and intracellular transport.
For translation, SARS-CoV-2 NSP1 was found to be adjacent to subunits of the EIF3 translation initiation complex and proved a broad inhibitor of translation. For innate immune signaling, viral ORF6 was found proximal to the RLR

proteins whose levels were decreased by this viral protease and nominated cleavage sequences in human CWC22 and FANCD2, implicating specific candidates for viral disruption of normal pre-mRNA splicing and DNA damage pathways, respectively. We also observed a number of SARS-

classified as the main protease. They are both necessary for the processing of the ORF1ab polyprotein containing the viral replicase proteins. NSP5 shows similarity to proteases found in picornaviruses and noroviruses (76). Beyond their importance in viral replication, these viral proteases can target host proteins containing their target residues (77). NSP5 recognizes certain glutamine-serine/alanine/glycine residues, with added specificity being determined by two to three flanking residues (48). Picornavirus virulence has been shown to be mediated in part by 3C protease cleavage of host proteins (44). Using both BioID and SILAC metabolic labeling followed by mass spectrometry, we sought to identify candidate host proteins and used a modified FRET-based cleavage assay to determine if these candidates contained sequences cleavable by NSP5.
We identified human CWC22 and FANCD2 as candidates; both proteins contained sequences that could be cleaved by NSP5 in the assay used here, which can be used to rapidly assess other

For NSP1 translation assays, in vitro transcribed transcripts were generated by first PCR amplifying DNA containing a T7 promoter followed by UTR or IRES elements and firefly or renilla luciferase. Second, capped and polyadenylated transcripts were synthesized using the HiScribe™ T7 ARCA mRNA Kit (with tailing) (NEB). 5 x 10^5 293T cells were transfected with 2 µg of

VATLQAENV, was found to be shared in the SARS-CoV-2 protein sequence and was also used as a control.
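
The NSP5 substrate preference described above (a glutamine followed by serine, alanine, or glycine) can be illustrated with a simple motif scan. This is a deliberately reduced sketch and not the published prediction algorithm (48): it ignores the flanking-residue specificity mentioned in the text, and the function name and the `flank` parameter are hypothetical. The control peptide VATLQAENV from the text serves as the example input.

```python
import re

# Assumed simplification: glutamine (Q) immediately followed by S, A, or G.
# The lookahead leaves the following residue unconsumed, so adjacent
# candidates such as "QQA" are not skipped.
MOTIF = re.compile(r"Q(?=[SAG])")


def candidate_sites(sequence, flank=4):
    """Return (1-based position of Q, surrounding window) for each motif hit.

    `flank` is the number of residues shown on each side of the glutamine.
    """
    seq = sequence.upper()
    sites = []
    for match in MOTIF.finditer(seq):
        q_index = match.start()
        window = seq[max(0, q_index - flank): q_index + 1 + flank]
        sites.append((q_index + 1, window))
    return sites


print(candidate_sites("VATLQAENV"))  # [(5, 'VATLQAENV')]
```

A hit from such a scan only marks a sequence as worth testing; in the text, actual cleavage was assessed experimentally by the FRET assay, and several predicted sequences were not cleaved.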
