Data-driven analysis of COVID-19 reveals specific severity patterns distinct from the temporal immune response

Key immune signatures of SARS-CoV-2 infection may associate with either adverse immune reactions (severity) or simply an ongoing anti-viral response (temporality); how immune signatures contribute to severe manifestations and/or temporal progression of disease and whether longer disease duration correlates with severity remain unknown. Patient blood was comprehensively immunophenotyped via mass cytometry and multiplex cytokine arrays, leading to the identification of 327 basic subsets that were further stratified into more than 5000 immunotypes and correlated with 28 plasma cytokines. Low-density neutrophil abundance was closely correlated with hepatocyte growth factor levels, which in turn correlated with disease severity. Deep analysis also revealed additional players, namely conventional type 2 dendritic cells, natural killer T cells, plasmablasts and CD16+ monocytes, that can influence COVID-19 severity independent of temporal progression. Herein, we provide interactive network analysis and data visualization tools to facilitate data mining and hypothesis generation for elucidating COVID-19 pathogenesis.


Introduction 70
The pandemic coronavirus disease 2019 (  is caused by the severe acute 71 respiratory syndrome coronavirus 2 (SARS- . As of 25 th January 2021, there are 100 72 million cases with more than 2 million deaths worldwide . While the 73 immunogenic responses in  vaccinated individuals remain largely unknown, the 74 majority of the  patients are asymptomatic or have mild flu-like symptoms, some 75 develop severe diseases, which may lead to death from respiratory failure or even multi- Although many studies have independently identified different immunotypes and 85 cytokines associated with SARS-CoV-2 infection and severe , their interplay is still 86 largely unknown for distinguishing active infection, severity and post-infection aberrations. In 87 this work, we analyzed the blood samples of 77 hospitalized patients and 10 healthy controls. 88 Also, to obtain a more full-bodied approach, we have compiled the data and provided a 89 comprehensive COVID-19 resource, which integrates high dimensional mass cytometry and 90 bead-based cytokine/chemokine arrays. Network analyses between the key immune subsets 91 and the associated cytokine levels in the plasma were used to correlate clinical states and 92 disease trajectory of . One of the key findings is the identification of a link between 93 hepatocyte growth factor (HGF) and low-density neutrophils (LD Neu). While both factors are 94 closely associated with disease severity, it represents only the tip of the iceberg; the network 95 of underlying pathways is complex and involves many players. By sharing these data 96 together with easy-to-use visualization tools, we provide a rich resource to study the 97 manifestation of  as well as the particulars of the SARS-CoV-2 induced immune 98 response. 99 by plasmablasts, C./Int. Mono and LD Neu. Further, there was also a sharp decrease in 153 numbers, which may indicate the mobilization of subsets such as NC. Mono, T regulatory 154 (Treg), T follicular helper (T FH ) and CD4 + T cells to the lung mucosa as front-line responses. 155 Their frequencies increased later during convalescence ( Figures 2B and 3A). We also 156 observed a loss in blood MAIT cell frequency during active infection that was restored to 157 healthy levels during convalescence ( Figure 3A), which is in line with a previous study 158 suggesting active recruitment of MAIT to the lung mucosa (Parrot et al., 2020). Also the 159 robust increase in the frequency of circulating plasmablasts during the active phase is 160 consistent with previous single-cell analysis of patients with severe COVID-19 (Wilk et al., 161 2020). In contrast, non-class-switched memory (NSM), class-switched memory (CSM) and 162 IgM + memory B cells showed decreased frequencies, which increased during convalescence 163 but did not recover to healthy levels. 164

Activation and differentiation of immune cells during the course of the SARS-CoV-2 infection 165
were indicated by changes in the expression levels of surface markers ( Figure 3A, right). 166 The profiles of more than 5000 immunotypes derived from the 38 basic immune subsets can 167 be assessed by an interactive online viewer (see Materials and Methods: Data and Code 168 Availability and Table S2). Although expression of functional markers was highly 169 heterogeneous across immune subsets and disease phases, we observed some notable 170 trends. CD38 was upregulated across diverse immune subsets during active infection, 171 indicative of widespread cell activation in response to viral infection. CD169, an adhesin that 172 binds sialic acid, was also selectively upregulated in all monocyte subsets during the early 173 but not late active phase. Many T and B cell subsets -Treg TEM, NSM and CSM cells -174 downregulated the lymph node homing molecule CCR6 over the course of disease, primarily 175 during active infection, which may impede homing to inflamed tissues and development of 176 germinal center responses ( Figure S1). A range of cell subsets exhibited elevated levels of 177 CD57 from early active infection to early convalescence, indicative of immune senescence 178 ( Figure 3A). Finally, CD16 (FcRIII) was upregulated on MAIT cells in late active infection 179 and on LD Neu, all monocyte subsets and CD16 + NK cells during convalescence ( Figure 3A). 180 Based on the observed phenotype variations, we defined and curated 327 partly 181 overlapping immunotypes and compared the frequencies of these subsets in each phase of 182 COVID-19 ( Figure 3B and Table S3). High levels of CD169 + C. Mono were observed 183 exclusively during early active infection. The same also applied for CD38 high plasmacytoid 184 dendritic cell (pDC) subsets and IgA + plasmablasts, followed by IgAplasmablasts. Notably, 185 four subsets of monocytes, namely CD141 + HLADR -C. Mono, CD56 + C. Mono, CD45RA - Int. 186 Mono and CD56 + Int. Mono peaked during late active SARS-CoV-2 infection and may 187 portend recovery ( Figure 3B). The increase in the LD Neu population from early to late active 188 infection is reflected in both its PD-L1 + and CD5 + subsets. The active-phase lymphopenia 189 affected virtually all CD8 + central (CD8 TCM) and effector memory T subsets (CD8 TEM) 190 ( Figure 3B). Significant losses of CD56 + CD8 + MAIT, CCR6 + CSM, HLA-DR + CD38 -CD4 + T, 191 CD57 -CD8 + and NKT subsets were also observed during active infection ( Figure 3B). Next,192 in convalescence, a frequency increase persisting into late phase was observed for T cells 193 (such as CD57 + CD4 + T, HLA-DR + CD38 + CD8 + T, CD38 + /ICOS + T FH defined as circulatory 194 CXCR5 + , Treg TCM/TEM), NC. Mono (IgA + , CD45ROand CD16 + CD169 + ) and CCR6 - CSM 195 above healthy levels. Also in this phase, the loss of CD5conventional type 2 DC (cDC2), 196 V2 TCM, V2 TEM, B cells memory CXCR5 + , CCR6 + CSM and CD5transitional B cells 197 persisted into late convalescence below healthy levels ( Figure 3B). These late- convalescent 198 immunotypes, which do not recover to healthy levels, may represent post-infection 199

Alterations of immunotypes associated with disease severity 201
Depending on the treatment given, active and convalescent patients were stratified into 'mild ' 202 (symptomatic), 'moderate' (symptomatic with suppl. O 2 ) and 'severe' (suppl. O 2 and ICU 203 admission). Thus we obtained groups I, II and III denoting active mild, active moderate, and 204 active severe, respectively and groups IV (severe), V (moderate), and VI (mild) for patients 205 in the convalescent phase ( Figure 1A). The global changes in the cell subset composition 206 are particularly evident in UMAP plots colored according to the fold change in reference to 207 healthy controls ( Figure 3A). The most striking association was observed for LD Neu. While 208 a minor increase was observed in group I (active, mild), the numbers increased dramatically 209 in group II (active, moderate) and peaked in group III (active, severe). The numbers 210 persisted in group IV (conv., severe) and gradually declined from group V (conv., moderate) 211 to group VI (conv., mild) ( Figures 4A, 4B and S2A). This was also reflected in the increased 212 LD Neu-to-lymphocyte ratios compared with group VI mild convalescent and healthy ( Figure  213 S3A). The expansion of LD Neu was due to CD16 + neutrophils and CD16 high neutrophils. Two other immune cell subsets, which closely associated with severe  are 217

V2 T and NKT cells (Figures 4B and S2A). Their frequencies sharply declined in group 218
II/III/IV patients. Further analysis revealed that the loss of V2 T is likely due to V2 TCM 219 and V2 TEM, which remained significantly reduced in frequency in groups IV/V compared 220 with healthy ( Figures 4B, 5A and S4A). Similarly for NKT sub-populations, the CD57 -NKT 221 cell but not CD57 + NKT cell subset was reduced in frequency in groups II/III/IV ( Figure S4D). 222 The loss of the CD57 -NKT cells included the CD4 + , CD8 + and the double-negative (DN) 223 NKT sub-populations. Although reduced numbers were also detected for dendritic cell 224 subsets, the frequencies of pDC and cDC2 were only marginally suppressed across groups 225 Figures 4B, S4B and S4C). For plasmablasts, a subtle increase in frequency was 226 observed in the active phase groups I to III but waned during convalescence ( Figures 4A, 4B, 227 S5A, and S5B). Notably, IgAplasmablasts showed a higher increase compared with its IgA + 228 counterpart (6.53-fold vs 2.89-fold compared with HD) in group III ( Figures 4B and S5B). 229 Among the T cells, the strongest association with severity was observed for 230 HLADR + CD38 + CD8 + T cells. Unlike the frequencies of MAIT and NK cell subsets across 231 groups I/II/III and groups IV/V/VI, which were largely invariant across groups and thus did not 232 associate with severity ( Figures S6A and S6B). The numbers of hyperactivated 233 HLADR + CD38 + CD8 + T cells peaked in severity groups III/IV, whereby the highest frequency 234 was detected in group IV during convalescence ( Figures 4B and S7A). Also, a number of 235 emerging studies suggested a role of inflammatory monocytes in the pathogenesis of 236 . In line with these results, although our data also showed elevated numbers of C.  The association of the frequencies of the 38 basic immune subsets with the six 241 severity groups as well as the associated surface marker expression is summarized in 242 Figure S2. Their phenotypes can be assessed by using the online viewer (see Materials and 243 Methods: Data and Code Availability). Figure 5A shows the subsets out of the 327 common 244 immunotypes, which exhibited the most significant frequency changes with regard to the six 245 severity groups. Based on the frequency distribution, they were divided into five different 246 clusters ( Figure 5A, left). Immune subsets in the first cluster positively associated with mild 247 infection such as CD161 + NKT and CD5 + cDC2, which only increased in group I and may 248 thus play a potential protective role ( Figure 5A). The second cluster is positively associated 249 with active SARS-CoV-2 infection across mild to severe groups I/II/II. Besides the 250 plasmablasts, CD169 + /CD38 + C. Mono and CD86 + /CD45RA -/CD38 + Int. Mono but not total 251 monocytes remained elevated in groups II/III. Also, like CD169 + C. Mono, both 252 CD38 high CD45RA + pDC and HLA-DR + CD56 DIM NK peaked in group I and remained high in 253 groups II/III above healthy levels. Notably, HLA-DR Low CD141 + C. Mono, which has been 254 reported to be over-represented in ICU patients did not show a pronounced association with 255 the most severe group III but was rather increased throughout the active phase (Silvin et al., 256 2020). This suggests that the aforementioned subset is unlikely to be a marker for severity. 257 The third cluster is positively associated with severe . In this cluster, CXCR5 + B 258 cells memory subset was only increased in more severe groups II/III but various LD Neu 259 analytes using either Luminex or high-sensitivity Quanterix bead arrays. Of these, thirteen 287 cytokines showed statistically significant associations with the six severity groups (adjusted p 288 < 0.05) ( Figure 6A). A comparison between the association with severity and the timing of 289 the immune response revealed that IFN, while abundantly found in the early active phase, 290 had very little prognostic value with regard to severity ( Figure 6B). While the levels of IL6 291 seem to better match disease severity, this applied only for the active phase. Also, IL6 was 292 mostly detected during the early active phase. The association with severity however 293 substantially improved for IP10, TNF, HGF and VEGF-A. Especially the latter two were 294 detected in similar extent during both the active and convalescent phases. 295 To further identify the cytokine/immune cell associations during SARS-CoV-2 296 infection, we first correlated the cytokine levels with the frequencies of the 327 immunotypes. 297 The various plots for these correlations can be assessed in an interactive online viewer (see 298 Materials and Methods: Data and Code Availability). As the immune responses in different 299 phases of infection are vastly different, we carried out separate correlations for the active 300 and convalescent phases ( Figure 6C,  and HGF was positively associated with LD Neu but negatively with NKT. 306 During convalescence, a greater number of cytokines was associated with various 307 immunotypes, albeit the association was typically weaker as compared to the active phase 308 ( Figure 6C, bottom). VEGF-A was strongly associated with HLA-DR + CD38 + CD8 + T cells but 309 negatively with CD8 + CD56 -MAIT cells. Interestingly, IgA + , CD5 + and PD-L1 + LD Neu 310 positively associated with TNF, which is known to promote neutrophil degranulation (Cross 311 et al., 2008;Salamone et al., 2001). Additionally, PIGF1 and IL17A positively associated with 312 CD38 + NKT and CXCR5 -B cells, respectively, while negative associations of both 313 proinflammatory TNF and HGF with V2 T cells were observed, as well as negative 314 associations of both IL6 and HGF with CD8 + CD56 - MAIT. 315 As cytokines can also directly influence the surface expression of biomarkers, we 316 carried out a similar correlation for the surface molecules detected during the active phase 317 on the 327 immunotypes ( Figure 6D). While there were fewer cytokines found significant, 318 they often correlated with the same surface marker on a number of different cell subsets. 319 IFN was positively associated with upregulated CD169 + on both C. and Int. Mono. The 320 same cytokine also triggered the upregulation of CD38 on various NC. Mono as well as 321 Mono

Integrated network analysis of immune subsets and plasma cytokines 328
Current data analysis so far has revealed a number of subsets and cytokines clearly linked 329 to COVID-19 but the interaction of these components as well as the underlying pathways 330 appear to be very complex and under-appreciated. To obtain a comprehensive overview of 331 the interplay of these key players, we performed Bayesian network analysis using the 332 complete sets of the mass cytometry data of the 327 cell subsets as well as the 333 corresponding bead array measurements of the 13 cytokines ( Figure 7 and Table S3). In 334 order to separate the immune signatures of COVID-19 severity from those involved in anti-335 viral immune response, the fluctuations in these datasets were correlated independently with 336 regard to the time point of sampling during the SARS-CoV-2 infection ( Figures 7A and 7B) 337 and the severity score assigned to the respective patient ( Figures 7C and 7D). Interactive 338 viewer for each of the networks is available as online resource (see Materials and Methods: 339

Data and Code Availability) 340
For the timing of the immune response in the active phase of SARS-CoV-2 infection, 341 the fold change of subset frequency and cytokine levels was determined independently for 342 the median day 9 PIO ("early active"; Figure 7A) and median day 24 PIO ("late active"; 343 Figure 7B). In the early active phase, the strongest correlation was observed for IFN 344 ( Figure 7A). The increase in levels of IFN inversely correlated with a general decline in the 345 number of NC. Mono including the CD141 + CD11b -, CD141 -HLA-DR1 -, CD45ROand 346 CD16 + CD11bsubsets. For these cells, the absolute fold change was between 5 to 8, and 347 the most pronounced reduction observed during the early active phase. The most substantial 348 increase was detected for CD38 high CD45RA + pDC and subsets of C. Mono expressing 349 CD169, a marker directly induced by IFN (compare Figure 6D). Other subsets positively 350 associated with the early phase of infection included several subsets of plasmablasts, 351 activated B memory and V2 T cells. Of these, IgAplasmabasts positively correlated with 352 IL6, which is also significantly upregulated in the early phase. The increase in plasma IL6 353 was also associated with a loss of CD8 + T cells, MAIT and cDC2. In addition, elevated IP10 354 was a node associated with a depletion of MAIT and certain B cell subsets (NSM 355 CD27 + CD38and NSM CD27 -CD38 + ). However, we did not find any nodal association with 356 V2 T cells, C. Mono and pDC populations to any of the cytokines, while SCF and IL1RA 357 were nonetheless early responders in the blood. 358 During the late active phase of SARS-CoV-2 infection, the IFN levels dominating 359 the early phase strongly subsided ( Figure 7B). Although the correlation with NC. Mono 360 nearly diminished, Int. and C. Mono strongly expanded in this phase, in particularly for the 361 CD169 + subsets with a 8-fold increase in frequency. Next, the correlation with IL6 levels 362 maintained at the late active phase but also two other cytokines, VEGF-A and TNF, begun 363 to show multiple cellular associations corresponding to significant losses of  T, ILC2 and 364 MAIT cells, while IL6 was a node associated with late increase in LD Neu and Int. Mono,365 potentially linking these cytokines to the lymphopenia, neutrophilia and monocytosis, which 366 are characteristic of SARS-CoV-2 infection. 367 A strikingly different picture emerged when the network analysis was carried out in 368 reference to the severity of COVID-19 manifestation. Due to the different nature of the 369 immune response during active infection and convalescence, two independent networks 370 were generated for severity groups I to III ("active"; Figure 7C) and groups IV to VI ("conv."; 371 Figure 7D). The degree of association was determined by a linear correlation of the subset 372 frequencies and cytokine levels in these groups and is expressed by their rho value. Notably 373 during the active phase, the strongest positive correlation was observed for LD Neu and its 374 CD38 + , PD-L1 + , IgA + and CD5 + subsets ( Figure 7C). With the exception of the latter, all of 375 them also positively correlated with HGF, which was also directly associated with disease 376 severity. While an increase in HGF correlated with an increased frequency of LD Neu 377 subsets, it was also associated with a loss of NKT cells, namely of the CD57subsets. 378 Although their frequencies were inversely associated with disease severity, this does not 379 necessarily indicate a beneficial effect, as the disappearance of these subsets from the 380 blood may likely be a consequence of their redistribution towards the inflamed tissue. Other 381 subsets, which positively associated with severity are IgAplasmablasts and CD16 + CD11b -382 C. Mono, possibly pointing to the role of IgG in the COVID-19 pathogenesis. Moreover, 383 plasmablast frequencies directly correlated with IL6 levels, which were also inversely 384 associated with cDC2. Similar to pDC and V2 T cells, their frequencies in the blood also 385 inversely correlated with disease severity. 386 The direct association of the LD Neu subsets with severity was also observed during 387 convalescence, and no significant negative correlation was detected for any of the immune 388 subsets ( Figure 7D). Positive correlations were found for IgA + NC. Mono, several CD8 + TEM 389 subsets, as well as CD38 + , CD5 + , IgA + and PD-L1 + LD Neu. Of the cytokines, VEGF-A, HGF 390 and TNF correlated directly with severity, while fewer interactions between cytokines and 391 cell subsets were detected. At least during convalescence, the increase in TNF but not 392 HGF was positively associated with IgA + LD Neu and PD-L1 + LD Neu subsets (Cross et al., 393 2008). Additionally, elevated VEGF-A levels strongly associated with enriched HLA-394 DR + CD38 + CD8 + T cells (Voron et al., 2015), which are also directly associated with severity. 395 While especially the early response against SARS-CoV-2 seems to be dominated by  and IL6-driven immune responses ( Figures 7A and 7B), network analysis did not provide any 397 evidence that the pathways and cellular components activated by these cytokines have a 398 major impact on the severity of the disease in either the active phase of infection ( Figure 7C) 399 or during convalescence ( Figure 7D).  subset was assessed by comparing their relative frequencies in these phases, and their 415 contribution to severity was quantified by the rho 2 parameter of the disease score correlation 416 ( Figure S8). Based on this, we propose the following immune progression: infection with 417 SARS-CoV-2 leads to early IFN-α production, as evidenced by an early upregulation of 418 CD169 on monocytes. This drives the disappearance of NC. Mono from the blood, which 419 together with V2 T, cDC2, pDC, NKT, CD8 + MAIT and CD8 + T cells appear to be recruited 420 to the inflamed tissues. In parallel, during this early active phase (median days PIO: 9) there 421 is also an expansion of activated memory B cells and importantly, antibody-producing 422 plasmablasts. The plasmablast frequency remains high during the late active phase, a 423 process associated with elevated levels of the proinflammatory cytokines IL6, IP10 and 424

TNF. 425
The late active phase (median days PIO: 20) is also characterized by an expansion 426 of C. Mono, which likely differentiates into Int. and NC. Mono. The relative fraction of Int. 427 Mono remained high during convalescence, above healthy controls even at the late 428 convalescent phase (median days PIO: 39). In comparison, NC. Mono frequency was the 429 highest during convalescence ( Figure 3A). The same applies for CD8 + TEM cells, in 430 particular for the HLADR + CD38 + subset, which has been found to be enriched in lungs of 431 deceased COVID-19 patients (Xu et al., 2020). Notably, major changes in the frequency of 432 NK subsets were only observed during convalescence, suggestive of a minor role during the 433 early phase of infection. In summary, while the cellular and molecular interactions underlying  are 499 very complex, we have identified some new candidates for potential treatment interventions. 500 Here, the HGF/LD Neu axis may represent a promising new lead for direct interventions, and 501 future studies have to show if HGF is actually a better pharmacological target than IFN and 502 IL6. The trials with IL6 inhibitors have already failed and there is acute demand for effective 503 treatments. In support of this hunt, this study generates a database that could be used as a 504 crucial resource to evaluate the pathways and to validate, identify or exclude candidate 505 targets that could help to control the ongoing pandemic. 506 507

Study design, sample size and participants 509
For this study, 77 COVID-19 patients and 10 healthy donors were recruited. Enrollment of 510 COVID-19 patients was via PROTECT, a Singapore COVID-19 cohort study among seven 511 public health institutions. Healthy individuals were recruited under a Singapore Immunology 512 Network study entitled, "Study of blood cell subsets and their products in models of infection, 513 inflammation and immune regulation". Both studies had received prior approval from their 514 respective institutional review boards (IRBs). All individuals involved in this study were over 515 the age of 21, comprising 66 males and 21 females. Additional demographic details can be 516 found in Table S1.  Internal control samples were included in each plate to remove any potential plate effects. 581 Readouts of these samples were then used to normalize the assayed plates. A correction 582 factor was obtained from the median concentration values observed across the multiple 583 assay plates and this correction factor was then used to normalize all the samples. The Severity based clinical end points were defined for active and convalescence phase samples 607 separately. Three severity groups were defined for each phase consisting of symptomatic 608 patients, patients requiring oxygen supplementation and patients requiring oxygen 609 supplementation and awarded into intensive care unit as shown in Figure 1A. 610 Mass cytometry and cytokine measurements were associated to the clinical end points (time 611 based as well as severity based) using Kruskal-Wallis tests followed by Dunn's post hoc 612 tests. Correlations between mass cytometry and cytokine measurements were done using 613 Spearman Rank correlations. In the event that multiple samples from the same patient was 614 available for same time period, the earliest of the samples were used for analyses to ensure 615 that all samples used in the analyses are distinct. Multiple testing correction was done using 616 the method of Benjamini and Hochberg. P values less than 0.05 were deemed to be 617 significant. All statistical tests were two-sided (when appropriate) unless otherwise indicated. 618 Statistical analyses were done using the R statistical language version 3. 6 Overviews of the mass cytometry immune cell subpopulations were generated using UMAP 623 in R version 3.6.2 using the uwot package. Heat maps were generated in R version 3.6.2 624 using the CompexHeatmap package. Graphs of the significant associations were generated 625 in R version 3.6.2 using the iGraph package and visualized in Cytoscape version 3. 8 The mass cytometry, cytokine and clinical data is available as an Excel file at 641 https://www.dropbox.com/s/yd2spn3lholhuv3/all_cytof_multimodal_data_paper.xlsx?dl=1 642 An interactive viewer of interaction network in Figure 7A  Timelines for individual                      (2) cDC2 (12) B IgD+CD27+ (1) VD2 (4) cDC2 CD5-(7) NKT CD57-CD8+     I  III  VI  HD  II  IV  V   1  2  3  4  5  6 f r e s h h e a l t h y