SARS-CoV-2 receptor ACE2 and TMPRSS2 are predominantly expressed in a transient secretory cell type in subsegmental bronchial branches

The SARS-CoV-2 pandemic affecting the human respiratory system severely challenges public health and urgently demands for increasing our understanding of COVID-19 pathogenesis, especially host factors facilitating virus infection and replication. SARS-CoV-2 was reported to enter cells via binding to ACE2, followed by its priming by TMPRSS2. Here, we investigate ACE2 and TMPRSS2 expression levels and their distribution across cell types in lung tissue (twelve donors, 39,778 cells) and in cells derived from subsegmental bronchial branches (four donors, 17,521 cells) by single nuclei and single cell RNA sequencing, respectively. While TMPRSS2 is expressed in both tissues, in the subsegmental bronchial branches ACE2 is predominantly expressed in a transient secretory cell type. Interestingly, these transiently differentiating cells show an enrichment for pathways related to RHO GTPase function and viral processes suggesting increased vulnerability for SARS-CoV-2 infection. Our data provide a rich resource for future investigations of COVID-19 infection and pathogenesis.

To quantify gene expression in the lung, single nuclei RNA sequencing was performed on surgical specimens of healthy, non-affected lung tissue from twelve lung adenocarcinoma (LADC) patients, resulting in 39,778 sequenced cell nuclei. All major cell types known to occur in the lung were identified (Fig. 1B, C). Independent of the cell types present in the lung, the median ACE2 levels were below five counts per million (CPM) (Fig. 1D), which given a typical mRNA content of 500,000 mRNA molecules per cell indicate that only about half of all cells were statistically expected to contain even a single ACE2 transcript. The reads per patient and cell type were therefore aggregated into pseudo-bulks and analysis was continued. As expected from prior literature, the AT2 cells showed highest ACE2 expression in the lung both in terms of their CPM values (Fig. 1D, further referred to as ACE2 + cells) and the percentage of positive cells (Supp. Fig. 3). The expression of TMPRSS2 across cell types of the lung (further referred to as TMPRSS2 + cells) was much stronger with a certain specificity for AT2 cells (Fig. 1E), which is in agreement with previous studies.
For the subsegmental bronchial branches, air-liquid-interface (ALI) cultures were grown from primary human bronchial epithelial cells (HBECs) and subjected to single cell RNA sequencing, resulting in 17,451 cells across four healthy donors (further referred to as HBECs). In this dataset, we identified all expected cell types, including recently described and rare cell types such as FOXN4-positive cells and ionocytes ( Fig. 1F, Supp. Fig. 1, 2). Expression levels of ACE2 were comparable but slightly elevated compared to the lung tissue samples, with the strongest expression being Lukassen, Chua, Trefzer, Kahn, Schneider et al. (2020) 6 observed in a subset of secretory cells ('secretory3'; Fig. 1G). While TMPRSS2 was less strongly expressed in HBECs than in lung tissue cells, its expression was still markedly higher than that of ACE2 (Fig. 1H). Interestingly, ACE2 + cells were also most strongly expressing TMPRSS2 (Fig. 1F, H). In the HBECs, a significant enrichment of the number of ACE2 + /TMPRSS2 + double-positive cells was observed (1.6 fold enrichment, p=0.002, Supp. Fig. 4).

Transient secretory cells display a transient cell state between goblet and ciliated cells.
As the 'secretory3' cells in the HBECs showed comparably strong expression of both genes implicated in SARS-CoV-2 entry, with ACE2 expression levels in a similar to higher range than those in AT2 cells, we decided to further investigate this cell type.
As airway epithelial cells are known to continuously renew in a series of differentiation events involving nearly all major HBE cell types in a single trajectory, we mapped the 'secretory3' cells according to their position in this hierarchy using pseudo-time mapping ( Fig. 2A). 'Secretory3' cells were found at an intermediate position between club or goblet cells and ciliated cells ( Fig. 2A), which could be confirmed by shared marker gene expression (Fig. 2B). This placed 'secretory3' cells at a point shortly before terminal differentiation is reached (Fig. 2C, D). Hence, we will refer to these cells as 'transient secretory cells '. In agreement with ongoing differentiation towards an epithelial cell lineage, the cell-

FURIN expressing cells overlap with ACE2 + /TMPRSS2 + and ACE2 + /TMPRSS2cells in lung and in bronchial cells.
We next sought to explore whether we could derive additional evidence for other factors recently suggested to be involved in SARS-CoV-2 host cell entry. Therefore, we investigated the expression of FURIN, as SARS-CoV-2 but not SARS-CoV was reported to have a FURIN cleavage site in its S protein, potentially increasing its priming upon ACE2 receptor binding. In lung tissue, AT2, endothelial, and immune cells were most strongly positive for FURIN expression (Fig. 3A). Across cell types, and FURIN were less strongly expressed in HBECs than in distant cells from surgical lung tissue, their expression was still markedly higher compared to ACE2. In the HBECs, a significant enrichment of the number of ACE2 + /TMPRSS2 + double-positive cells was observed, while ACE2 + /TMPRSS2 + /FURIN + enrichment did not reach significance (Fig. 3D). The latter finding, however, may be due to differences in coexpression or the generally lower number of positive cells reducing statistical power.
Interestingly, our data showed that the additional possibility of priming the SARS-CoV-2 S protein by FURIN would potentially render 25% more cells vulnerable for infection as compared to by exclusively TMPRSS2 S protein priming. Thus, our data suggest that FURIN might increase overall permissiveness of cells in the respiratory tract by potentially equipping more cells with proteolytic activity for SARS-CoV-2 S protein priming after ACE2 receptor binding.

Sex, age and history of smoking in correlation with ACE2 expression
Early epidemiological data on SARS-CoV-2 transmission and spread have suggested It has to be noted, though, that the power of our datasets to detect such changes with lung cells from surgical lung tissue samples from twelve patients (three male, nine female) and HBECs from four patients (three male, one female) is limited. On the individual cell type level, no ACE2 expression differences correlating with age could be observed (Fig. 4E, F), and neither did the cell type composition change notably (Supp. Fig. 6). Although no dependency of ACE2 expression on age, gender and sex was found in single cell populations, we observed interesting trends in the lung for age and gender dependencies when aggregating all reads per sample into a single ACE2 expression value ( Figure 4G, H). Here, we see a trend for increasing overall expression of ACE2 with age for all female lung samples (R 2 =0.40; p=0.065). Since we only have samples from males of younger age available, we cannot study age dependencies for males here. However, when comparing ACE2 expression of all five female and three male samples in the young age group between 40 and 50 years of age, we see a clearly higher overall expression of ACE2 in males compared to females (p=0.048). Note that smoking did not seem to affect ACE2 expression levels in the lung cells within our dataset as all smokers in our study were of about the same age and ACE2 expression correlated with age ( Fig. 4B).
Altogether, we here demonstrate how single cell sequencing studies of samples from anatomically precisely defined parts of the respiratory system can provide new insights into potential host cell vulnerability for SARS-CoV-2 infection.

DISCUSSION
The here presented transcriptome data on single cell level of healthy human lung tissues, including surgical lung specimen and subsegmental bronchial branches, contains valuable information about human host factors for SARS-CoV-2 infection and other related infectious diseases affecting the respiratory tract. This resource certainly comes with limitations (see below). However, we believe the unprecedented Lukassen, Chua, Trefzer, Kahn, Schneider et al. (2020) 9 depth of our analysis on a single cell level will provide a valuable resource for future mechanistic studies and target mining for pulmonary host factors that are involved in facilitating virus entry and replication, ultimately leading to defining genes that are of urgent interest for studying transcriptional changes in COVID-19 patients or the SARS-CoV-2 pathogenicity in general.
This data set comprises 16 individuals and a total of 57,229 cells. Thus, this large single-cell and nuclei expression resource enables the analysis of weakly expressed genes such as ACE2 and identification of rare cell types and the transient secretory cell type, for which our data showed a particularly high level of potential vulnerability for SARS-CoV-2 infection assuming that ACE2 is the receptor that is likely to be crucial for its cell entry. Although we strongly believe that the here presented data is well suited as reference data set to study SARS-CoV-2 infection on the single cell transcriptional level, we are also well aware of potential limitations of our data.
First, the here presented data is derived from individuals that have no infection history with SARS-CoV-2. However, we deem our data clinically meaningful as our patient cohort is representative for those being affected by SARS-CoV-2 associated disease. The patient cohort analyzed here is in line with recently published data on characteristics of COVID-19 patients with regards to age (this study: median age 48 years -lung cohort or 55 years -HBEC cohort; Chen et al., 2020: 55 years;Guan et al., 2020: 47 years) [Chen et al. (2020), Guan et al. (2020] (Guan NEJM, Chen Lancet) and especially comorbidity burden [Chen et al. (2020)] with 50% of the patients being affected by chronic cardiovascular or pulmonary disease or diabetes (this study: 50%;Chen et al., 2020: 51%;Guan et al., 2020: no data available).
However, there was a different prevalence of smoking or history of smoking in our cohort (this study: 25% -lung cohort or 100% HBEC cohort; Guan et al., 2020: 85%) Second, we are combining data of different sections of the lung and use two different RNA sequencing methods on single cell level. While single cell RNA sequencing typically delivers higher quality data, the cells must be intact for processing, as damaged cells present a loss of RNA molecules with mitochondrial reads preferentially retained, introducing a skewed expression profile [Luecken and Theis (2019)]. Adequate sampling of lung specimen for preparing suspensions of intact cells is often challenging. Single nuclei RNA sequencing also reveals transcriptome data on single cell level and can be used on frozen tissue. This method typically provides a more faithful estimate of cell type proportions due to lessened dissociation bias. However, it typically comes with less reads per cell and less genes detected [Bakken et al. (2018)], although this bias seems low in our dataset (Supp. Fig. 7).
The identification of closely related cell populations is mostly concordant in both methods, with single cell RNA sequencing being inferior for detection of lowly expressed genes on a single-cell level [Bakken et al. (2018) Third, the number of samples included here is very much limiting the scope of understanding the pathogenicity of SARS-CoV-2 in the context of different confounding factors such as age, gender and history of smoking. As a consequence of small sample numbers, the significance level for dependency of expression of single genes, most predominantly ACE2, on such confounding factors are at best weak. However, we believe that it is a strength of our data that our clinical samples are well annotated w.r.t. anatomical position and patient characteristics. Thus, our data is in principle suited for testing hypothesis derived from larger epidemiological studies. Clearly, measuring expression levels for specific genes in larger patient cohorts is required to further substantiate the hypothesis derived from our and others' data.
Bearing the above limitations in mind, we investigated ACE2 expression levels in all cell types in the lung and bronchial branches revealing very low expression levels that are nevertheless significantly enriched in AT2 cells in the lung as described previously [Hamming et al. (2004)] and are comparable with the levels in specific cell types in the bronchial branches. As ACE2 is the only currently known SARS-CoV-2 receptor on the host cell surface, we propose that higher expression levels of ACE2 facilitate infection by SARS-CoV-2.
While we did not find any dependency of ACE2 expression on sex, gender and age on the single cell level, we observed a trend for age dependency on ACE2 expression aggregated over all cell types in lung samples from females. Recent data suggest that infection rates are similar in children as in adults, but remains undiagnosed since the clinical picture is often subclinical [Bi et al. (2020)]. Hence, it would be of utmost interest to study both healthy and infected children to understand the transcriptional basis accounting for differences in development of clinical symptoms. Furthermore, analyzing further samples from both males and females, both healthy and infected, of different age groups on the single cell level will lay the foundation for understanding vulnerability of different cell types for SARS-CoV-2 infection in different parts of the respiratory system.
One emergent question is why the human-to-human transmission of SARS-CoV-2 is much higher compared to SARS-CoV or MERS-CoV. Potential explanations comprise i) the binding of SARS-CoV-2 to another, yet unknown receptor on the host cell surface, ii) enhanced cleavage of the SARS-CoV-2 S protein resulting in higher efficiency of the virus' entry into the cell, and iii) additional host factors increasing the virus entry into the cell, e.g. by facilitating membrane fusion. We used our reference map and our above described findings, seeking for additional host factors that might be involved in any of those mechanisms.
Coronaviruses were shown to be able to enter into the host cell via several pathways, including endosomal and non-endosomal entry in the presence of proteases As the involvement of an additional protease during S protein priming might enhance entry of SARS-CoV-2 compared to SARS-CoV, we also investigated co-expression of ACE2, TMPRSS2, and FURIN. Using the here presented reference data set, we were able to show co-expression of ACE2 with TMPRSS2 and/or FURIN in both healthy human lung tissue and HBECs. The potential additional priming of 25% of cells for SARS-CoV-2 infection by FURIN may support the hypothesis for increased human-to-human transmission rates. This hypothesis, however, must be followed up by future studies.  Fig. 5). Taken together, these data offer the interesting hypothesis that both high human-to-human transmission rate of SARS-CoV-2 and the severe COVID-19 cases might be caused by additional cleavage sites, resulting in higher ACE2 binding affinity and/or more efficient membrane fusion.
The most striking finding of this study was the detection of a specific cell type in the bronchial branches expressing ACE2, i.e. the transient secretory cells. Subsequent cell type marker analysis revealed that these cells reflect differentiating cells on the transition from club or goblet to terminally differentiated ciliated cells, Interestingly, SARS-CoV replication was reported to be 100-to 1,000-fold higher via the non-endosomal pathway as compared to the endosomal pathway, with the nonendosomal pathway being dependent on the presence of proteases [Matsuyama et al. (2005)]. Therefore, our data suggest to experimentally investigate whether SARS-CoV-2 is able to enter into the cell also via the non-endosomal pathway, either by the involvement of several proteases or by the ongoing membrane remodeling in the transient secretory cells, which might include RHO GTPases as one prominent pathway specifically enriched in this cell type.
Taken together, we present a rich resource for studying transcriptional regulation of SARS-CoV-2 infection, which will serve as reference dataset for future studies of primary samples of COVID-19 patients and in vitro studies addressing the viral replication cycle. With all limitations in mind, which we discussed above, we demonstrate the potential of this resource for deriving novel and testing existing hypotheses that need to be followed up by independent studies.

DECLARATION OF INTERESTS
Lukassen, Chua, Trefzer, Kahn, Schneider et al. (2020) 15 The authors declare no competing interests.    Whiskers extend to 1.5 times the inter-quartile-range. All individual data points are indicated on the plot.

LEAD CONTACT AND MATERIALS AVAILABILITY
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact Roland Eils (roland.eils@charite.de).
Primary lung and subsegmental bronchial samples are available upon request and upon installment of a material transfer agreement (MTA).
There are restrictions to the availability of the dataset due to potential risk of deidentification of pseudonymized RNA sequencing data. Hence, the raw data will be available under controlled access. Accession numbers, DOIs, or unique identifiers for raw data will be listed here upon submission to the community-endorsed public repository.
Processed data in the form of count tables and metadata tables containing patient ID,

Human lung tissues and bronchial branches
All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki. The Cryoconserved surgical healthy lung tissue from twelve patients was provided by the Lung Biobank Heidelberg.
Sex, age and smoking behavior for every individual is provided in figure 4a and additional information can be found in table 1.

Lung tissues
Tissue was assembled during routine surgical intervention from lung cancer patients.
Briefly, snap frozen healthy lung tissue from lung adenocarcinoma patients was cut into pieces with less than 0.3 cm diameter and single nuclei were isolated at low pH by homogenizing the cells in 1 ml of citric acid-based buffer (Sucrose 0.25 M, Citric Acid 25 mM, Hoechst 33342 1 g/ml) at 4°C using a glass dounce tissue grinder. After one stroke with the "loose" pestle, the tissue was incubated for 5 minutes on ice, further homogenized with 3-5 strokes of the "loose" pestle, followed by another incubation at 4°C for 5 minutes. Nuclei were released by 5 additional strokes with the "tight" pestle and the nuclei solution was filtered through a 35 µm cell strainer. Cell debris was removed via centrifugation at 4°C for 5 minutes at 500x g, the supernatant was removed, followed by nuclei cell pellet resuspension in 700 µl citricacid based buffer and centrifugation at 4°C for 5 minutes at 500x g. After carefully removing the supernatant, the nuclei cell pellet was resuspended in 100 µl cold resuspension buffer (25 mM KCl, 3 mM MgCl 2 , 50 mM Tris-buffer, 400 U RNaseIn, 1 mM DTT, 400 U SuperaseIn (AM2694, Thermo Fisher Scientific), 1g/ml Hoechst (H33342, Thermo Fisher Scientific). Nuclei concentration was determined using the Countess II FL Automated Cell Counter and optimal nuclei concentration was obtained by adding additional cold resuspension buffer, if needed. Subsequently, samples were processed using the 10x Chromium device with the 10x Genomics scRNA-Seq protocol v2 to generate cell and gel bead emulsions, followed by reverse transcription, cDNA amplification, and sequencing library preparation following the Cells were spun at 300x g for 5 minutes and resuspended in cryopreservation medium (10% DMSO, 10% FBS, and 80% Culture Medium (PneumaCult-ALI) and frozen down gradually before shipping on dry-ice.

Single cell RNA sequencing library preparation (ALI cultures)
Cryopreserved cells were thawed at 37°C, spun down at 300xG for 5 minutes, the

Pre-processing and data analysis
CellRanger software version 2.1.1 (10x Genomics) was used for processing of the raw sequencing data and the transcripts were aligned to the 10x reference human genome hg19 1.2.0. Low-quality cells were removed during pre-processing using Seurat version 3.0.0 (https://github.com/satijalab/seurat) based on the following criteria: (a) >200 or, depending on the sample, <6000 -9000 genes (surgical lung tissues)/ <3000 -5000 genes (ALI cultures), (b) <15% mitochondrial reads (Supp. Fig. 7). The remaining data were further processed using Seurat for lognormalization, scaling, merging, clustering, and gene expression analysis.
Afterwards, all control samples were merged using the "FindIntegrationAnchors" and "IntegrateData" functions with their results being merged again and used for downstream analysis. Monocle3 [Cao et al. (2019)] was used infer cellular trajectories and dynamics. Dimensionality reduction, cell clustering, trajectory graph learning, and pseudotime measurement were performed with this tool.

Statistical analyses
Statistical analyses were performed using R and Python 3.7.1 with scipy 0.14. Supp. Figure 6 -Age-dependent cell type composition. Cell type composition in the primary lung and HBEC dataset (age is color-coded).
Supp. Figure 7 - o  s  t  c  e  l  l  e  n  t  r  y  o  f  M  i  d  d  l  e  E  a  s  t  r  e  s  p  i  r  a  t  o  r  y  s  y  n  d  r  o  m  e   c  o  r  o  n  a  v  i  r  u  s  a  f  t  e  r  t  w  o  -s  t  e  p  ,  f  u  r  i  n  -m  e  d  i  a  t  e  d  a  c  t  i  v  a  t  i  o  n  o  f  t  h  e  s  p  i  k  e  p  r  o  t  e  i  n  .  P  r  o  c  N  a  t  l  A  c  a  d  S  c  i   U  S  A   1  1  1 ,