An Autoantigen-ome from HS-Sultan B-Lymphoblasts Offers a Molecular Map for Investigating Autoimmune Sequelae of COVID-19

To understand how COVID-19 may induce autoimmune diseases, we have been compiling an atlas of COVID-autoantigens (autoAgs). Using dermatan sulfate (DS) affinity enrichment of autoantigenic proteins extracted from HS-Sultan lymphoblasts, we identified 362 DS-affinity proteins, of which at least 201 (56%) are confirmed autoAgs. Comparison with available multi-omic COVID data shows that 315 (87%) of the 362 proteins are affected in SARS-CoV-2 infection via altered expression, interaction with viral components, or modification by phosphorylation or ubiquitination, at least 186 (59%) of which are known autoAgs. These proteins are associated with gene expression, mRNA processing, mRNA splicing, translation, protein folding, vesicles, and chromosome organization. Numerous nuclear autoAgs were identified, including both classical ANAs and ENAs of systemic autoimmune diseases and unique autoAgs involved in the DNA replication fork, mitotic cell cycle, or telomerase maintenance. We also identified many uncommon autoAgs involved in nucleic acid and peptide biosynthesis and nucleocytoplasmic transport, such as aminoacyl-tRNA synthetases. In addition, this study found autoAgs that potentially interact with multiple SARS-CoV-2 Nsp and Orf components, including CCT/TriC chaperonin, insulin degrading enzyme, platelet-activating factor acetylhydrolase, and the ezrin-moesin-radixin family. Furthermore, a B-cell-specific IgM-associated ER complex (including MZB1, BiP, heat shock proteins, and protein disulfide-isomerases) is enriched by DS-affinity and up-regulated in B-cells of COVID-19 patients, and a similar IgH-associated ER complex was also identified in autoreactive pre-B1 cells in our previous study, which suggests a role of autoreactive B1 cells in COVID-19 that merits further investigation.
In summary, this study demonstrates that virally infected cells are characterized by alterations of proteins with propensity to become autoAgs, thereby providing a possible explanation for infection-induced autoimmunity. The COVID autoantigen-ome provides a valuable molecular resource and map for investigation of COVID-related autoimmune sequelae and considerations for vaccine design.


Introduction
and 6 14-3-3 proteins. The majority of the proteins in these families have been reported as autoAgs (Table 1). For example, all hnRNP and snRNP proteins identified by DS-affinity in this study are among the best-known nuclear autoAgs. Interestingly, autoAgs included in clinical diagnostic autoimmune disease ANA screening panels, such as SSB (lupus La), SNRPD1 (Sm D1), SNRPD3 (Sm D3), histones, and TOP1, are all identified in this study by DS-affinity enrichment from HS-Sultan cells.
In addition to proteins, such as ribosomal and ribonucleoproteins, that can be consistently identified from a variety of cell types, HS-Sultan B lymphoblast cells give rise to a large number of unique DS-affinity protein categories. In particular, many proteins associated with biomolecule biosynthesis are identified. Overall, HS-Sultan cells appear to be especially rich in biosynthetic protein machinery, which may explain the rapid proliferation of these cells in Burkitt lymphoma.
Thirteen aminoacyl-tRNA synthetases were identified by DS-affinity from HS-Sultan cells, including AARS, DARS, EPRS, FARSB, GARS, HARS, KARS, NARS, PUS1, SARS, VARS, WARS, and YARS. Ten of these are already known autoAgs (Table 1), and we suspect that the remainder are also likely to be autoAgs. These proteins are the autoantibody targets in antisynthetase syndrome, an autoimmune disease characterized by autoantibodies against one or multiple tRNA synthetases. Antisynthetase syndrome is a chronic disorder that can affect many parts of the body, with common symptoms including myositis, interstitial lung disease, polyarthritis, skin thickening and cracking of fingers and toes, and Raynaud disease. Antisynthetase syndrome has been described in a case report of COVID-19 interstitial lung disease [39].
Of these 315 proteins, 209 are up-regulated and 248 are down-regulated at protein and/or mRNA levels, and 95 are in the interactomes of individual SARS-CoV-2 viral proteins. Because the COVID-19 multi-omics data have been obtained with various methods from different infected cells or patients, some proteins are found up-regulated in some studies but down-regulated in others. Nevertheless, these proteins are affected by the infection and are thus considered COVID-affected in our analysis (Supplemental Table 1).
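The counts above imply that many proteins must appear in both the up-regulated and the down-regulated sets. A minimal arithmetic sketch, assuming only the three counts stated in the text (the variable names are illustrative), gives a lower bound on that overlap by inclusion-exclusion. The true overlap may be larger, because some of the 315 proteins may be COVID-affected only through interaction or modification data.

```python
# Counts reported in the text.
total_affected = 315   # COVID-affected DS-affinity proteins
up_regulated = 209     # up-regulated at protein and/or mRNA level
down_regulated = 248   # down-regulated at protein and/or mRNA level

# Inclusion-exclusion: |U and D| = |U| + |D| - |U or D|.
# Since |U or D| cannot exceed total_affected, the overlap is at least
# |U| + |D| - total_affected (and never below zero).
min_both_directions = max(0, up_regulated + down_regulated - total_affected)

print(min_both_directions)
```

Under these assumptions, at least 142 proteins are reported as up-regulated in one dataset and down-regulated in another.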
The COVID-affected DS-affinity proteins are highly connected (Fig. 3). By STRING analysis, these 315 proteins exhibit 2,507 interactions at high confidence level, which is significantly higher than randomly expected (1,002 interactions) with PPI enrichment p-value <1.0E-16. The proteins are primarily associated with RNA and mRNA processing, translation, vesicles, and vesicle-mediated transport (Fig. 3), which is consistent with our findings in other cell types [1,2,8]. In addition, these proteins are enriched in protein folding, peptide biosynthesis, granulocyte activation, emerin complex, IL-12 mediated signaling pathway, CDC5L complex, and metabolic reprogramming (Fig. 2B).

AutoAgs that interact with SARS-CoV-2 components
There are 95 DS-affinity proteins found in the interactomes of various SARS-CoV-2 proteins (Fig. 5), meaning that these proteins can interact directly or indirectly with viral components. They appear to be intimately involved in protein metabolism, including 17 proteins related to peptide biosynthesis, 25 related to protein folding, 29 related to protein localization, and 22 related to proteolysis. In addition, these proteins are associated with the symbiont viral process, translational initiation, protein deubiquitination, protein stabilization, and posttranslational protein modification.
The replication machinery of SARS-CoV-2 interacts with 41 different DS-affinity proteins. Nsp12, an RNA-dependent RNA polymerase and the central component of the replication machinery, interacts with the largest number (i.e., 22) of DS-affinity proteins (Fig. 5). Its cofactor Nsp7 interacts with 12 proteins and Nsp8 interacts with only one. The replication machinery also includes a helicase (Nsp13), 2 ribonucleases (Nsp14 and Nsp15), 2 RNA-cap methyltransferases (Nsp14, Nsp16), and cofactor Nsp10. Nsp15 interacts with 10 DS-affinity proteins, Nsp16 interacts with 8 proteins, Nsp13 interacts with SRP14 and RDX, Nsp14 interacts with IDE and CCT8, and Nsp10 interacts with PSMA3. Nsp12-interacting DS-affinity proteins are strongly associated with protein folding, particularly prefoldin-mediated transfer of substrates to the CCT complex and cooperation of prefoldin and CCT in protein folding (Fig. 5). Nsp15-interacting proteins are also associated with prefoldin-mediated substrate transfer to CCT. DS-affinity proteins interacting with other individual viral replication components have no clear biological associations.
Orf3b of SARS-CoV-2 interacts with 12 DS-affinity proteins, including 6 proteasomal proteins, 3 protein disulfide-isomerases, IDE, ST13, and PAFAH1B3 (Fig. 5). Orf3a interacts with 7 proteins, including STIP1 (stress-induced-phosphoprotein 1) and 6 ER proteins (HSPA5, HSP90B1, CNPY2, ERO1L, PRKCSH, and PDIA3). CNPY2 prevents MIR-mediated MRLC ubiquitination and its subsequent proteasomal degradation. ERO1L (or ERO1A) is an oxidoreductase in disulfide bond formation in the ER. PRKCSH (glucosidase II subunit beta) sequentially cleaves the 2 innermost glucose residues from the Glc2Man9GlcNAc2 oligosaccharide precursor of immature glycoproteins. Based on the normal functions of their interacting proteins, Orf3a and Orf3b appear to affect host stress response and protein processing in the ER.
The S protein of SARS-CoV-2 is found to interact with HSPA5 (GRP78/BiP), PRKCSH, RPS27A (ubiquitin-40S ribosomal protein), MSN, and EZR. EZR and MSN are members of the ezrin-moesin-radixin (ERM) family, and its third member RDX is found to interact with Nsp13 of the virus. Moesin is localized to filopodia and other membranous protrusions that are important for cell-cell recognition, and ERM proteins connect the plasma membranes to the actin-based cytoskeleton. Actin and cytoskeleton proteins have been consistently found to be altered in SARS-CoV-2 infection in our previous studies [1,2], and this finding suggests that ERM proteins facilitate the viral trafficking from host cell membrane to the cytoskeleton. All three ERM proteins are confirmed autoAgs.
Nsp1 is a major virulence factor of coronavirus. COVID-19 patients with autoantibodies are found to have higher levels of antibodies against SARS-CoV-2 Nsp1 protein [9]. Nsp1 has been reported to hijack the host 40S ribosome by inserting its C terminus into the mRNA entry tunnel, which effectively blocks RIG-I-dependent innate immune responses [40]. In this study, we found that Nsp1 interacts with 7 subunits of the translation initiation factor 3 complex (EIF3 A, B, C, E, F, I, L). The EIF3 complex binds the 40S ribosome and serves as a scaffold for other initiation factors, auxiliary factors, and mRNA. Hence, our study extends previously reported activities of Nsp1 and suggests that Nsp1 engages both the 40S ribosome and EIF3 to manipulate host protein translation.
A few interesting SARS-CoV-2-interacting DS-affinity proteins may provide clues to potential COVID-19 symptoms. PAFAH1B3 is a catalytic unit of the platelet-activating factor acetylhydrolase complex and plays important roles in platelet activation regulation and brain development, and it interacts with Nsp12, Nsp5, and Orf3b. Another subunit, PAFAH1B2, is altered in SARS-CoV-2 infection. Both this and our previous study [2] have identified PAFAH1B2 and B3 as potential COVID-altered autoAgs, and their roles in COVID coagulopathy merit further investigation. IDE (insulin degrading enzyme) is a ubiquitously expressed metalloprotease that degrades insulin, beta amyloid, and others. IDE interacts with 6 SARS-CoV-2 proteins (Nsp4, Nsp12, Nsp14, Nsp15, Nsp16, and Orf3b). Although its role in COVID remains unknown, IDE has been partially characterized in other viral infections. It is one of the host factors of hepatitis C virus [41], and it degrades HIV-1 p6 Gag protein and regulates virus replication in an Env-dependent manner [42]. In varicella zoster virus infection, the viral gE protein precursor associates with IDE, HSPA5, HSPA8, HSPD1, and PPIA in the ER of infected cells [43].
Interestingly, this group of ER proteins is also identified in this study, although we identified PPIB instead of PPIA. Although IDE has not yet formally been described as an autoAg, we have identified IDE in this and another study [2], and its importance for COVID-19 and autoimmunity merits further investigation.

DS-affinity and B-cell-specific IgH-ER complex
Because HS-Sultan cells are derived from B lymphoblasts infected by Epstein-Barr virus, we compared the DS-affinity autoantigen-ome with single-cell mRNA expression profiles of B-cells from 7 patients hospitalized with COVID-19 [23]. We identified 39 DS-affinity proteins that are up-regulated at mRNA level in COVID B-cells, which include 7 heat shock proteins, 6 proteasomal proteins, 4 protein disulfide-isomerases, HDGF (heparin binding growth factor), CLIC1, CPNE3, SND1, TALDO1, TCL1A, and others (Fig. 6). These up-regulated proteins are primarily associated with protein processing in the ER and the proteasome. We also identified 21 DS-affinity proteins that are down-regulated in COVID B-cells, including 4 translation elongation factors, 2 translation initiation factors, 2 hnRNPs, 2 aminoacyl-tRNA synthetases, NACA, NAP1L1, and PABPC1. These down-regulated proteins are primarily associated with gene expression (Fig. 6).
In particular, MZB1 (marginal zone B- and B1-cell-specific protein) is found up-regulated in B-cells from 5 of the 7 COVID-19 patients. MZB1 is a B-cell-specific ER-localized protein that is most abundantly expressed in marginal zone B-cells and B1-cells [44]. These cells are also termed innate-like B cells. They differ from follicular B-cells by their attenuated Ca2+ mobilization, fast antibody secretion, and increased cell adhesion. MZB1 plays important roles in humoral immunity and helps diversify peripheral B-cell functions by regulating calcium stores, antibody secretion, and integrin activation. MZB1 mRNA expression was found increased by >2-fold in B-cells of SLE patients with active disease [45]. High MZB1 gene expression predicted adverse prognosis in chronic lymphocytic leukemia, follicular lymphoma, and diffuse large B-cell lymphoma [46]. High prevalence of MZB1-positive plasma B-cells in tissue fibrosis was found in human lung and skin fibrosis, and MZB1 levels correlated positively with tissue IgG and negatively with diffusion capacity of the lung [47].
Interestingly, in our study of murine pre-B1 lymphoblasts, we also found that DS interacts with the same IgH-associated multiprotein complex in the ER [5]. In addition, we had observed that DS interacts directly with GTF2I in murine pre-B1 cells, and GTF2I is also identified by DS-affinity in human B lymphoblast HS-Sultan cells in this study. GTF2I is a required gene transcription factor at the IgH gene locus. Pre-B1 cells, which express precursor B-cell receptors (preBCRs) that are polyreactive and autoreactive, are a critical checkpoint in the development of mature autoreactive B cells. The Ig heavy chain (IgH) repertoire of autoantibodies is determined at the pre-B stage. Our previous findings from pre-B1 cells suggested that DS is a potential master regulator of IgH at both the gene and protein expression levels, i.e., DS recruits GTF2I for IgH gene expression and engages the IgH-associated ER complex for autoantibody production. The findings from this study provide further support for a key role of DS in regulating autoantibody production and autoreactive B1-cell development. Furthermore, the findings from B-cells of COVID-19 patients point to a potential significance of autoreactive B1 cells in COVID-induced autoimmunity.

Conclusion
Exploiting the affinity between autoAgs and DS glycosaminoglycan, we identified 362 DS-affinity proteins from EBV-immortalized HS-Sultan cells. Of these, 201 are already known autoAgs in a wide variety of autoimmune diseases and cancer. Of the 362, 315 DS-affinity proteins are affected by SARS-CoV-2 infection, and 186 COVID-affected DS-affinity proteins are known autoAgs. These COVID-altered proteins are largely affected by phosphorylation, ubiquitination, or interaction with viral protein components. They are associated with gene expression, mRNA processing, mRNA splicing, translation, protein folding, DNA replication fork, telomerase maintenance, chromosome organization, biosynthesis and catabolism of nucleobase-containing molecules and proteins, vesicles, and nucleocytoplasmic transport. CCT/TriC chaperonin, insulin degrading enzyme, and platelet-activating factor acetylhydrolase are found in the interactomes of multiple viral Nsp and Orf proteins. By integrating DS-affinity autoAgs with multi-omic data from COVID, our study suggests that viral infections can cause significant proteomic alterations, give rise to a diverse pool of autoAgs, and may lead to infection-induced autoimmune diseases. The COVID autoantigen-ome provided in this paper may serve as a molecular map and resource for investigating autoimmune phenomena of SARS-CoV-2 infection and its long-term sequelae. Understanding immunogenic proteins of COVID may also enhance vaccine target design.

HS-Sultan cell culture
The human B lymphoblast HS-Sultan cell line was obtained from the ATCC (Manassas, VA) and cultured in complete RPMI-1640 medium. The growth medium was supplemented with 10% fetal bovine serum and a penicillin-streptomycin-glutamine mixture (Thermo Fisher). The cells were grown at 37 °C in a CO2 incubator, and about 300 million cells were harvested for the study.

Protein extraction
Protein extraction was performed as previously described [4]. In brief, HS-Sultan cells were lysed with 50 mM phosphate buffer (pH 7.4) containing the Roche Complete Mini protease inhibitor cocktail and then homogenized on ice with a microprobe sonicator until the turbid mixture turned nearly clear with no visible cells left. The homogenate was centrifuged at 10,000 g at 4 °C for 20 min, and the total protein extract in the supernatant was collected. Protein concentration was measured by absorbance at 280 nm using a NanoDrop UV-Vis spectrometer (Thermo Fisher).

DS-affinity fractionation
The total proteomes extracted from HS-Sultan cells were fractionated in a DS-Sepharose column in a manner similar to that previously described [4]. About 40 mg of proteins in 40 ml of 10 mM phosphate buffer (pH 7.4; buffer A) were loaded onto the DS-affinity column at a rate of 1 ml/min. Unbound and weakly bound proteins were removed with 60 ml of buffer A and then 40 ml of 0.2 M NaCl in buffer A. The remaining bound proteins were eluted in steps with 40 ml of 0.5 M NaCl and then with 40 ml of 1.0 M NaCl in buffer A. Fractions were desalted and concentrated with 5-kDa cut-off Vivaspin centrifugal filters (Sartorius). Fractionated proteins were separated in 1-D SDS-PAGE in 4-12% Bis-Tris gels, and the gel lanes were divided into two or three sections for mass spectrometric sequencing.
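The stepwise salt schedule described above can be written down as a small table, which makes the loading concentration, loading time, and total buffer volume easy to check. The following is a minimal sketch under stated assumptions: the `Step` class and its field names are illustrative, the 1 ml/min rate is applied to the loading step only (the text gives no rate for washes or elutions), and the `collected` flag assumes that only the 0.5 M and 1.0 M eluates were carried forward, which the text implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    nacl_molar: float  # NaCl added to buffer A (10 mM phosphate, pH 7.4)
    volume_ml: float
    collected: bool    # assumption: True if the fraction was kept for analysis


PROTEIN_LOADED_MG = 40.0    # "about 40 mg" in the text
LOAD_RATE_ML_PER_MIN = 1.0  # stated for loading only

SCHEDULE = [
    Step("load", 0.0, 40.0, False),
    Step("wash, buffer A", 0.0, 60.0, False),
    Step("wash, 0.2 M NaCl", 0.2, 40.0, False),
    Step("elute, 0.5 M NaCl", 0.5, 40.0, True),
    Step("elute, 1.0 M NaCl", 1.0, 40.0, True),
]

load_step = SCHEDULE[0]
load_conc_mg_per_ml = PROTEIN_LOADED_MG / load_step.volume_ml
load_minutes = load_step.volume_ml / LOAD_RATE_ML_PER_MIN
total_volume_ml = sum(step.volume_ml for step in SCHEDULE)
collected_fractions = [step.name for step in SCHEDULE if step.collected]

print(load_conc_mg_per_ml, load_minutes, total_volume_ml, collected_fractions)
```

With the stated volumes, the sample is loaded at about 1 mg/ml over about 40 min, and the whole run passes 220 ml through the column.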

Mass spectrometry sequencing
Protein sequencing was performed at the Taplin Biological Mass Spectrometry Facility at Harvard Medical School. Proteins in gels were digested with sequencing-grade trypsin (Promega) at 4 °C for 45 min. Tryptic peptides were separated in a nano-scale C18 HPLC capillary column and analyzed in an LTQ linear ion-trap mass spectrometer (Thermo Fisher). Peptide sequences and protein identities were assigned by matching the measured fragmentation pattern with proteins or translated nucleotide databases using Sequest. All data were manually inspected. Proteins with ≥2 peptide matches were considered positively identified.
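The identification threshold above can be illustrated with a short filter. This is a sketch rather than the facility's actual pipeline: it assumes that "peptide matches" means distinct peptide sequences per protein, and the input format, function name, and placeholder protein and peptide labels are hypothetical.

```python
from collections import Counter

MIN_PEPTIDES = 2  # threshold stated in the text


def positively_identified(peptide_matches):
    """Return proteins supported by at least MIN_PEPTIDES distinct peptides.

    peptide_matches: iterable of (protein, peptide_sequence) pairs.
    Repeated observations of the same peptide are counted once (assumption).
    """
    unique_pairs = set(peptide_matches)
    counts = Counter(protein for protein, _ in unique_pairs)
    return sorted(p for p, n in counts.items() if n >= MIN_PEPTIDES)


# Placeholder data, not taken from the study.
example_matches = [
    ("PROT_A", "PEPTIDE_1"),
    ("PROT_A", "PEPTIDE_2"),
    ("PROT_B", "PEPTIDE_3"),
    ("PROT_B", "PEPTIDE_3"),  # same peptide seen twice: still one match
    ("PROT_C", "PEPTIDE_4"),
]
identified = positively_identified(example_matches)
print(identified)
```

In this toy input only `PROT_A` passes, because `PROT_B` is supported by a single distinct peptide observed twice.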

COVID data comparison
DS-affinity proteins were compared with currently available COVID-19 multi-omic data compiled in the Coronascape database (as of 02/22/2021). These data have been obtained with proteomics, phosphoproteomics, interactome, ubiquitome, and RNA-seq techniques. Up- and down-regulated proteins or genes were identified by comparing cells infected vs. uninfected by SARS-CoV-2 or COVID-19 patients vs. healthy controls. Similarity searches were conducted to identify DS-affinity proteins that are up- and/or down-regulated in viral infection at any omic level.
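The comparison described above amounts to intersecting the DS-affinity gene list with the up- and down-regulated gene lists from each dataset. The sketch below illustrates that logic only; it does not reproduce the Coronascape interface, and the function name, input format, and placeholder gene labels are assumptions made for illustration.

```python
def classify_covid_affected(ds_affinity, up_lists, down_lists):
    """Label each DS-affinity gene by the direction(s) in which it is altered.

    ds_affinity: set of gene symbols identified by DS-affinity.
    up_lists / down_lists: one collection of gene symbols per dataset.
    A gene is labelled "up and down" when different datasets disagree.
    Genes found in no dataset are omitted from the result.
    """
    up = set().union(*up_lists)
    down = set().union(*down_lists)
    labels = {}
    for gene in sorted(ds_affinity):
        if gene in up and gene in down:
            labels[gene] = "up and down"
        elif gene in up:
            labels[gene] = "up"
        elif gene in down:
            labels[gene] = "down"
    return labels


# Placeholder gene labels and datasets, not taken from the study.
ds_affinity = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
up_lists = [{"GENE_A", "GENE_B"}, {"GENE_C"}]
down_lists = [{"GENE_B"}, {"GENE_E"}]

labels = classify_covid_affected(ds_affinity, up_lists, down_lists)
print(labels)
```

Pooling across datasets in this way is what allows a single protein to count as both up- and down-regulated, as noted for the 315 COVID-affected proteins.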

Protein network analysis
Protein-protein interactions were analyzed by STRING [49]. Interactions include both direct physical interactions and indirect functional associations, which are derived from genomic context predictions, high-throughput lab experiments, co-expression, automated text mining, and previous knowledge in databases. Each interaction is annotated with a confidence score from 0 to 1, with 1 being the highest, indicating the likelihood of an interaction to be true. Pathway and process enrichment was analyzed with Metascape [17], which utilizes various ontology sources such as KEGG Pathway, GO Biological Process, Reactome Gene Sets, Canonical Pathways, CORUM, TRRUST, and DiGenBase. All genes in the genome were used as the enrichment background. Terms with a p-value <0.01, a minimum count of 3, and an enrichment factor (ratio between the observed counts and the counts expected by chance) >1.5 were collected and grouped into clusters based on their membership similarities. The most statistically significant term within a cluster was chosen to represent the cluster.
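The three term-selection criteria above can be expressed as a simple filter. The sketch below is not Metascape's implementation: the `Term` class, its field names, and the toy values are hypothetical, and only the cutoffs and the definition of the enrichment factor are taken from the text.

```python
from dataclasses import dataclass

P_VALUE_CUTOFF = 0.01  # p-value must be below this
MIN_COUNT = 3          # minimum number of input genes in the term
MIN_ENRICHMENT = 1.5   # enrichment factor must exceed this


@dataclass(frozen=True)
class Term:
    name: str
    p_value: float
    observed: int    # input genes annotated with the term
    expected: float  # count expected by chance

    @property
    def enrichment_factor(self) -> float:
        # Ratio between observed and expected counts, as defined in the text.
        return self.observed / self.expected


def passes_filter(term: Term) -> bool:
    return (
        term.p_value < P_VALUE_CUTOFF
        and term.observed >= MIN_COUNT
        and term.enrichment_factor > MIN_ENRICHMENT
    )


# Toy terms, not taken from the study.
toy_terms = [
    Term("term_1", 1e-6, 12, 2.0),  # passes all three criteria
    Term("term_2", 0.005, 2, 0.5),  # fails: fewer than 3 genes
    Term("term_3", 0.02, 10, 2.0),  # fails: p-value too high
    Term("term_4", 0.001, 6, 5.0),  # fails: enrichment factor 1.2
]
kept = [t.name for t in toy_terms if passes_filter(t)]
print(kept)
```

The subsequent clustering of terms by membership similarity is not sketched here, because the text does not specify the similarity measure or threshold.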

Autoantigen literature text mining
Every DS-affinity protein identified in this study was searched for specific autoantibodies reported in the PubMed literature. Search keywords included the MeSH keyword "autoantibodies", the protein name or its gene symbol, or alternative names and symbols. Only proteins for which specific autoantibodies are reported in PubMed-listed journal articles were considered "confirmed" autoAgs in this study.
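A search of this kind can be assembled programmatically for each protein. The sketch below is one plausible way to build such a query string; the exact queries used in this study are not given, so the combination of field tags (`[MeSH Terms]`, `[Title/Abstract]`) and the function name are assumptions. Each hit would still need manual reading to confirm that a specific autoantibody is reported.

```python
def build_autoantibody_query(names):
    """Build a PubMed-style query for autoantibody reports on one protein.

    names: the protein name, gene symbol, and any alternative names.
    The MeSH term is combined with any of the names (assumed query shape).
    """
    if not names:
        raise ValueError("at least one protein name or gene symbol is required")
    name_clause = " OR ".join(f'"{name}"[Title/Abstract]' for name in names)
    return f'"autoantibodies"[MeSH Terms] AND ({name_clause})'


query = build_autoantibody_query(["SSB", "La autoantigen"])
print(query)
```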
This version posted April 6, 2021. bioRxiv preprint doi: https://doi.org/10.1101/2021.04.05.438500. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Acknowledgements
We thank Jung-hyun Rho for technical assistance with experiments. We thank Ross Tomaino and the Taplin Biological Mass Spectrometry facility of Harvard Medical School for expert service with protein sequencing.

Funding Statement
This work was partially supported by Curandis, the US NIH, and a Cycle for Survival Innovation Grant (to MHR). MHR acknowledges NIH/NCI R21 CA251992 and MSKCC Cancer Center Support Grant P30 CA008748. The funding bodies were not involved in the design of the study and the collection, analysis, and interpretation of data.

Competing interest statement
JYW is the founder and Chief Scientific Officer of Curandis. WZ was supported by the NIH and declares no competing interests. MWR and VBR are volunteers of Curandis. MHR is a member of the Scientific Advisory Boards of Trans-Hit, Proscia, and Universal DX, but these companies have no relation to the study.

Authors' contributions
JYW directed the study and wrote the manuscript. WZ performed part of the experiments. MWR and VBR assisted with data analysis and manuscript preparation. MHR consulted on the study and edited the manuscript. All authors have approved the manuscript.
Fig. 5. DS-affinity proteins in the SARS-CoV-2 interactomes. Total: marked proteins are involved in protein folding (25 proteins, red), peptide biosynthetic process (17 proteins, green), protein localization (29 proteins, blue), or proteolysis (22 proteins, pink). Orf3b: proteolysis (pink). Orf3a: endoplasmic reticulum (dark purple). Orf9b: nuclear function of prefoldin (amber), AAA+ ATPase domain or P-loop containing nucleoside triphosphate hydrolase (dark green). Nsp15: prefoldin-mediated transfer of substrate to CCT/TriC (yellow), nucleotide binding (dark green). Orf10: protein folding (red), CCT chaperonin (yellow). Orf8: protein folding (red), SRP-dependent cotranslational protein targeting to membrane (blue), CCT chaperonin (yellow). Nsp1: translation initiation (green), CCT chaperonin (yellow). Nsp12: protein folding (red), multi-organism process (aqua), CCT chaperonin (yellow).