A spatial multi-omics atlas of the human lung reveals a novel immune cell survival niche

Multiple distinct cell types of the human lung and airways have been defined by single cell RNA sequencing (scRNAseq). Here we present a multi-omics spatial lung atlas to define novel cell types which we map back into the macro- and micro-anatomical tissue context to define functional tissue microenvironments. Firstly, we have generated single cell and nuclei RNA sequencing, VDJ-sequencing and Visium Spatial Transcriptomics data sets from 5 different locations of the human lung and airways. Secondly, we define additional cell types/states, as well as spatially map novel and known human airway cell types, such as adult lung chondrocytes, submucosal gland (SMG) duct cells, distinct pericyte and smooth muscle subtypes, immune-recruiting fibroblasts, peribronchial and perichondrial fibroblasts, peripheral nerve associated fibroblasts and Schwann cells. Finally, we define a survival niche for IgA-secreting plasma cells at the SMG, comprising the newly defined epithelial SMG-Duct cells, and B and T lineage immune cells. Using our transcriptomic data for cell-cell interaction analysis, we propose a signalling circuit that establishes and supports this niche. Overall, we provide a transcriptional and spatial lung atlas with multiple novel cell types that allows for the study of specific tissue microenvironments such as the newly defined gland-associated lymphoid niche (GALN).


Summary
Multiple distinct cell types of the human lung and airways have been defined by single cell RNA sequencing (scRNAseq). Here we present a multi-omics spatial lung atlas to define novel cell types which we map back into the macro-and micro-anatomical tissue context to define functional tissue microenvironments. Firstly, we have generated single cell and nuclei RNA sequencing, VDJ-sequencing and Visium Spatial Transcriptomics data sets from 5 different locations of the human lung and airways. Secondly, we define additional cell types/states, as well as spatially map novel and known human airway cell types, such as adult lung chondrocytes, submucosal gland (SMG) duct cells, distinct pericyte and smooth muscle subtypes, immune-recruiting fibroblasts, peribronchial and perichondrial fibroblasts, peripheral nerve associated fibroblasts and Schwann cells. Finally, we define a survival niche for IgA-secreting plasma cells at the SMG, comprising the newly defined epithelial SMG-Duct cells, and B and T lineage immune cells. Using our transcriptomic data for cell-cell interaction analysis, we propose a signalling circuit that establishes and supports this niche. Overall, we provide a transcriptional and spatial lung atlas with multiple novel cell types that allows for the study of specific tissue microenvironments such as the newly defined gland-associated lymphoid niche (GALN).
10 key words: Human lung cell atlas Single cell RNA-seq Single nuclei RNA-seq Visium B cell survival niche IgA plasma cell Nerve associated fibroblasts Schwann cells Non-myelinating Schwann cells pericyte Introduction Lung disease is the third-largest cause of mortality worldwide (Angelidis et al. 2019)), making a better understanding of the cell types that define lung function imperative. Histological definition of the lungs identified around 45 different cell types (Gehr, Bachofen, and Weibel 1978;Ross and Pawlina 2006), ranging from well-characterised lung epithelial cells to rare neuroendocrine and tuft cells and varying by location from the trachea to the small airways and alveoli (Boers, Ambergen, and Thunnissen 1998;Travaglini et al. 2020). In addition to its respiratory function, the lung has an important barrier function. Whilst other mucosal barrier tissues such as the gut and nasopharynx orchestrate adaptive immunity through well defined mucosa-associated lymphoid tissue (MALT), such secondary lymphoid structures have not been reported in the healthy human lung (Kato et al. 2013).
However, unbiased spatial mapping of cell types back into the tissue context is still largely lacking, and enzymatic dissociation procedures can lead to unrepresentative cell type retrieval. To overcome these limitations, we combined single cell and single nuclei RNA sequencing with Visium Spatial Transcriptomics (ST) to create a lung cell atlas of 129,340 cells, 63,768 nuclei and 16 tissue sections from human trachea, bronchi and parenchyma. We describe the transcriptional signatures of 77 cell types and states, including for the first time that of human airway chondrocytes, two types of Schwann cells, nerve-associated cells, submucosal gland (SMG) duct cells, and multiple perivascular, fibroblast and macrophage subsets. Our deep tissue profiling coupled with spatial gene expression profiling has allowed us to identify a new immune niche for IgA-secreting plasma cells at the SMG of the airways. We predict cell-cell interactions across endothelial, epithelial, stromal and immune cells that help to establish this niche. IgA secretion plays a critical role in preventing respiratory infections, and hence our findings are relevant for a deeper understanding of a wide range of respiratory infectious diseases, including COVID-19 (Sterlin et al. 2021).

Results and Discussion
A multi-omic transcriptional atlas of the human lung and airway with spatial resolution To generate a comprehensive representation of the cell types in human lung and airway we applied single cell and nuclei RNAseq (scRNAseq, snRNAseq), VDJ sequencing and spatial transcriptomics to deep tissue samples (Figure 1a, Tables S1-3). Tissue pieces (1-2 cm 3 ) from healthy donors were collected from the trachea, bronchi at the 1st or 2nd generation airway (Bronchi 2-3), bronchi at the 3rd to 4th generation airway (Bronchi 4), parenchyma from the upper left lobe (UpperPar) and the lower left lobe (LowerPar) to capture deep structures such as cartilage, muscle and the submucosal glands in the airways ( Figure S1a).
In total 193,108 good quality transcriptomes were annotated into transcriptionally distinct broad cell type groups: epithelial (secreting, basal, alveolar type 1 and 2 -AT1 & AT2pneumocytes, ciliated, and SMG cells), immune (lymphoid, myeloid, B-cells, plasma, mast cells), erythrocytes, endothelial (vascular-VE, lymphatic -LE) and stromal cells (smooth muscle, fibroblasts, mesothelial and chondrocytes) according to widely accepted marker genes (Figure 1b, Figure S1b). We combined the analysis of suspension cells and nuclei with Visium Spatial Transcriptomics (ST) analysis to map the identified cell types back to tissue structures on 16 tissue sections from 5 locations ( Figure S2) which were manually annotated into 13 regions (Table S4). As expected, we mapped ciliated epithelial cells to the lumen of the airway surrounded by basal cells, and AT1 and AT2 to lung parenchyma (Figure 1c,d), using the Cell2location (Kleshchevnikov et al., n.d.) algorithm. In order to examinethe benefits of different protocols and sampling locations, we used a Poisson linear mixed model to assess the relative effects of these variables on cell type composition (see Methods) ( Figure 1e). As expected, different enzymatic treatments enriched for specific cell type groups but had little effect, less than 1% of the total variance, on gene expression (Figure S1 h).
As a demonstration of the depth and novelty of our atlas, we transcriptionally defined chondrocytes for the first time in human lungs (ACAN, CHAD, COL9A3, HAPLN1 and CYTL1) (Figure S1 c, d) and mapped these cells to the cartilage with Visium ST (Figure 1 c, d). Chondrocytes were mostly released using single nuclei sequencing from trachea (Figure 1 e, Figure S1 e-g), demonstrating the utility of our multi-omics, multi-location atlasing of the human lung.
Overall, we generated a large scRNAseq and snRNAseq dataset with wide representation of distinct cell lineages and matching spatial gene expression data, allowing further exploration of cellular heterogeneity and possible cell-cell interactions to better understand lung function.
Identification of rare fibroblasts with immune recruiting properties.
The sequential clustering of lung and airway fibroblasts identified 11 cell types with distinct marker genes (Figure 2a, Figure S3 a, b). We annotate previously described myofibroblasts, mesothelial, adventitial and alveolar fibroblasts Xia et al. 2014;Gomez et al. 2016;Bauer et al. 2015;Kanamori-Katayama et al. 2011), as well as multiple novel and less studied cell types that were not predicted with existing annotations (Figure S3 c,d). While most of the cell types were isolated from all locations and protocols, there were clear differences in cell type proportions (Figure 2b, Figure S3b). As expected, the adventitial fibroblasts (Fibro-adv) were enriched in the airway samples, while the alveolar fibroblasts (Fibro-alv) were depleted. We further observed a rare cell type recovered only from the single cells, but not from single nuclei. This rare type expressed the chemokines CCL19 and CCL21, also highly expressed in fibroblast reticular cells responsible for T-cell positioning in the lymph nodes, and GREM1 (Kapoor et al. 2021), CXCL13 and FDCSP, a follicular dendritic cell (FDC) marker responsible for B-cell positioning (Wang et al. 2011;Marshall et al. 2002) (Figure 2c). These cells mapped to an area of immune cell infiltration on the bronchus with Visium ST, as well as with smFISH methods (Figure 2d, Figure S4 a, b) and were therefore annotated as immune recruiting fibroblasts (IR-fibro). Immune infiltrates were only found in a proportion of the donors and sections, but in keeping with the expected frequency in donors with and without smoking history (Elliot et al. 2004). IR-fibros display some characteristics of fibroblasts from secondary and tertiary lymphoid structures, and this similarity is further supported by our finding that the gene signature of germinal center fibroblast cell types from the Peyer's Patches of human gut (R. Elmentaite et al. 2021) also map to the area of the immune infiltrate on lung Visium ST sections ( Figure S4b). In conclusion we describe an IR-fibro population with a likely role in immune cell recruitment in the healthy bronchus.
Perichondrial and peribronchial fibroblasts reveal disease associations. We next examined two fibroblast populations enriched in trachea: peribronchial (PB-fibro) and perichondrial (PC-fibro) fibroblasts ( Figure 2b). For the first time we provide the human transcriptome for PB-fibro with COL15A1 and ENTPD1 markers and localisation around the airway epithelium (Figure 2e, Figure S5a). Functional GWAS analysis, that quantifies systematic associations between cell-type specific gene expression and disease-associated SNPs (R. Elmentaite et al. 2021), demonstrated that PB-fibros are linked to chronic obstructive pulmonary disease (COPD). PB-fibro also express LGR5, as reported for mouse (Tsukui et al. 2020;J.-H. Lee et al. 2017), which suggests potential for analogous mesenchymal stem cell function in humans ( Figure S5b). A COPD-related increase in WNT-5A downregulates Lgr5 (Baarsma et al. 2017), which suggests PB-fibro could be targeted therapeutically to inhibit regenerative failure in COPD specifically in the airways.
The second airway-enriched cell type, PC-fibro, was detected only in the single nuclei data (Figure 2b), likely due to difficulty in dissociation of cartilage. The cell type was located around the chondrocytes within cartilage in our Visium ST ( Figure 2g) and could be identified by the expression of the marker gene COL12A1 in the human protein atlas ( Figure S5c). Among the marker genes for PC-fibro, we detected bone development genes LGR4 and LGR6 (Doherty and Sanjay 2020) ( Figure S5b). PC-fibro, which express standard fibroblasts markers, is unique in expressing genes relevant for bone development, placing these cells into a trajectory from adventitial fibroblasts to chondrocytes ( Figure S5 d, e). Finally, PC-fibro markers were enriched in genes causing skeletal abnormalities in humans ( Figure S5f), including FLNB and FGFR2 that are mutated in rare diseases causing skeletal malformations (Bochukova et al. 2009;Robertson 2008). In conclusion, we have identified perichondrial fibroblasts around the healthy human airway cartilage.
Resolution of four distinct cell types in the peripheral nerves from the airway Finally, among the fibroblast compartment we identify four rare cell clusters within the peripheral nerves in the airway: myelinating Schwann cells (NFASC, NCMAP, MBP, PRX), non-myelinating Schwann cells (nmScwann) (NGFR, SCN7A, CHD2, L1CAM, NCAM1) Gerber et al. 2021;Renthal et al. 2020;Wolbert et al. 2020), endoneurial nerve-associated fibroblasts (NAF) (SOX9, OSR2) ) and epineurial NAF ( Figure S6a). Localisation of those cell types in peripheral nerves is shown by bulk RNA-seq across tissues ( Figure S6d Figure S6 i). Specifically, we show the expression of epineurial NAF around, and endoneurial NAF with Schwann cells inside the nerve bundle in human airway samples. This further confirms the identity of myelinating and non-myelinating Schwann cells with marker gene set enrichment in myelination and cell adhesion categories respectively (Figure S6 b, c). This analysis further reveals EVX1 -a homeobox gene in spinal cord development -as a potential regulator of myelinating Schwann cells. Second, the nmSchwann expressed some genes involved in Schwann cell development and maintenance (SOX10) and myelination (ERBB3 and LGI4) ( Figure S6 a, Figure 2i, Figure S6 i). Finally, we observed specific expression of rare peripheral nervous system (PNS) disease genes in these populations ( Figure S6 j). Thus, our atlas discerns microanatomical location within structures such as peripheral nerves ( Figure 2j) and pinpoints disease associations, even for very rare cell types. The cell types described here exist in other tissues; therefore this data could have implications outside the lung.

Resolution of the vascular cell types in the systemic and pulmonary circulation
We next focused on studying of the vasculature, which in the lung consists of both systemic circulation providing oxygen to the tissue, and pulmonary circulation where gas exchange occurs. Uniquely in our dataset, we could distinguish clusters of  Ellis et al. 2020;Niethamer et al. 2020;Gillich et al. 2020;Schupp et al. 2021). In addition, we now distinguish arterial endothelial cell types (systemic E-Art-syst and pulmonary E-Art-pulm), which have distinct localisation on Visium ST with E-Art-syst enriched in the trachea (Figure 3a, b). The smooth muscle cells included non-vascular airway smooth muscle cells (ASM), pulmonary perivascular cell types (smooth muscle SM-pulm and pericytes Peri-pulm) ) and systemic perivascular cells (smooth muscle SM-Art-syst, pericytes Peri-syst and venous perivascular cells IR-Ven-Peri) (Figure 3a, c Supplementary Figure S7b). The capture of ASM was mainly enabled by single nuclei data (Figure 3d) that allowed for the detection of reliable marker genes for non-vascular smooth muscle cell types across tissues (Supplementary Figure S7 c, d).
In addition, we identified another perivascular cell type in the airways, expressing ABCC and ICAM1, but not CSPG4 -similarly to the postcapillary venous perivascular cells with a role in the homing of immune cells in peripheral lymph nodes (Murfee, Skalak, and  In summary, we distinguish between cells of the systemic and pulmonary circulation, and describe a novel immune recruiting postcapillary venous pericyte (IR-Ven-Peri). We provide marker genes for all of these populations, map these cells back into the tissue context and thereby further define the relationship between the endothelial and perivascular cells in the systemic and pulmonary circulation (Figure 3h).

Identification of duct and myoepithelial cells of the human airway submucosal glands (SMG)
Further analysis of the epithelial compartment (excluding AT1 and AT2 cells) identified expected ciliated, goblet, basal, suprabasal, dividing basal, club, SMG mucous and serous populations along with rare deuterosomal, ionocyte & brush and neuroendocrine cells. In addition to these, we identified a novel population between serous, mucous and club/goblet cells (Figure 4a, Figure S8 a, b) with marker genes overlapping with SMG duct cells in human oesophagus and preferential localisation in trachea (Madissoon et al. 2019;Nowicki-Osuch et al. 2021). Thus, we hypothesise that these cells are the columnar non-ciliated duct cells of the human airway, previously only characterised in mice at the single cell level (Widdicombe and Wine 2015;Meyrick, Sturgess, and Reid 1969;A. E. Hegab et al. 2012). smFISH staining for ALDH1A3, MIA and RARRES1 validated the localisation at the SMG and distinguished these cells from the serous, mucous and other epithelial cells (Figure 4b, Figure S8 e, f). Cell2location integration of single-cell and Visium data could also distinguish distinct locations of SMG duct cells compared to SMG mucous and serous cells, providing orthogonal evidence of the identification of a new, distinct cell type ( Figure S8h). In addition, Velocyto analysis suggested these cells lie on a trajectory towards the surface epithelial populations ( Figure S8c), supporting a potential role in airway surface regeneration, as has been demonstrated in mice (Tata et al. 2018; Ahmed E. Hegab et al. 2011).
An additional population was revealed from snRNAseq, exhibiting both basal epithelium and muscle markers with localisation around the glands (Figure 4a, Figure S8g), consistent with a myoepithelial cell signature (Goldfarbmuren et al. 2020b). Co-localisation of FHOD3, epithelial marker KRT14 and muscle marker TAGLN in the same cells confirmed the myoepithelial cell signature by smFISH ( Figure S8g). Interestingly, mouse myoepithelial cells have also been shown to regenerate the surface airway epithelium (Tata et al. 2018)). This cell type is not as well described in humans, potentially due to difficulties in dissociating this cell type from the airways consistent with exclusive recovery of myoepithelial cells from nuclei data. Overall, we provide the human transcriptome for novel SMG populations, and mapped the constituents of the SMG and their spatial location using both spatial transcriptomics and smFISH.
Proportions and gene signatures of the airway epithelial cells Using statistical modelling that accounted for material, donor and dissociation protocol (see Methods), we examined the proportions of airway epithelial cells at the 5 different locations sampled ( Figure S8c), with significance given as local true sign rate (LTR). As expected, club cells were enriched in the parenchyma, whereas SMG epithelial cells were enriched in the trachea. While donor and protocol had little effect on the epithelial cell type proportions, the starting material of cells versus nuclei strongly influenced cell type composition ( Figure  S8c). SMG epithelial cell capture was increased in nuclei, while the capture of other epithelial cell types was better achieved with the dissociation into single cells ( Figure S8c).
Taking advantage of our multi-location data, we compared gene expression for ciliated cells, as one of the most abundant cell types present, across the five locations. We avoided any artefacts due to differing ambient RNA contamination between locations using only snRNAseq data where samples were pooled across locations for the 10X sequencing reaction (Figure 1a). Using a linear mixed model (A. M. H. Young et al. 2021) (Methods) we detected 80 differentially expressed genes in ciliated cells from trachea compared to other locations (LTSR>0.9), many of which are upregulated in nasopharyngeal carcinoma including FBXL7, TSHZ2 and RAET1E ( Figure S8i) (Borchers et al. 2009;Motz et al. 2010;Wortham et al. 2013). We also examined ACE2 expression, a SARS-COV-2 entry gene, which we found to be highest in ciliated cells from the trachea, and lower in the distal regions of the lung, where expression of ACE2 is likely to be more relevant in AT2 cells as reported previously ( Figure S8j) (Deprez et al. 2020).
Altogether, we describe transcriptomes for two key cell types involved in the SMG structure in the human airways which also functions as an immunological niche which we describe below.

Myeloid cells show previously undescribed heterogeneity
For the immune compartment, we identified all major populations including myeloid, T&NK, B lineage, mast cells and megakaryocytes which were analysed separately to reveal previously undescribed heterogeneity, especially in the myeloid cells ( Figure 4c, Figure  S9a-c, Figure S10 a). We found all major myeloid cell types (DCs, monocytes and macrophages) including many known and novel subsets. Previously identified macrophage subsets included intravascular macrophages (expressing LYVE1 and MAF) (Chakarov et al. 2019;Evren et al. 2021), Macro-AW-CX3CR1 (Chakarov et al. 2019;Pirzgalska et al. 2017;Y. Wolf et al. 2017;Hulsmans et al. 2017), Macro-CHIT1 (CHIT1 expressing with roles in asthma, COPD and lung fibrosis) Chang, Sharma, and Dela Cruz 2020) and interstitial macrophages (Macro-interstitial expressing chemokines CXCL9, 10 and 11) (Evren et al. 2021). We identified a new cluster expressing both monocyte CD14 and macrophage markers, termed Macro-intermediate ( Figure S9a). Among alveolar macrophages, two more clusters appeared: dividing cells (Macro-alv-dividing), and a novel cell cluster expressing metallothioneins (Macro-alv-MT) including MT1G, MT1X and MT1F. Metallothioneins have a role in binding and metabolising metal ions (Artur Krężel 2017), immunity and stress response (Subramanian Vignesh and Deepe 2017;Takano et al. 2004), and therefore this population may have a function in response to air pollution. The final rare novel population of macrophages expressed chemokines including CXCL8, CCL4 and CCL20 and was named Macro-CCL. While the expression of CCL4 was previously identified in interstitial macrophages (Evren et al. 2021), the expression of CXCL8 and CCL20 distinguishes this novel subset. Dysregulation of CXCL8 expression is associated with multiple lung conditions including infection, asthma, IPF and COPD (Mukaida 2003) and was identified as a marker for a separate macrophage population in psoriatic skin ).
Overall, we have identified multiple known and novel myeloid populations in the healthy human lungs and airways, many of them expressing specific sets of chemokines, orchestrating the complex lung immune homeostasis.
The T and NK cells displayed striking donor-to-donor variability in cell type proportions compared to the myeloid clusters ( Figure S9 g-i), consistent with higher inter-individual variability in the adaptive immune compartment. The location of origin, material and protocol explained little variation for any of the cell types proportions, including for the B cells.
We also obtained TCR VDJ sequencing data that confirmed MAIT cell type annotation (with preferential use of TRAJ33 and TRAV1-2) (Treiner et al. 2003), and showed low clonal expansion in naive and Treg populations compared to memory and effector subsets ( Figure  S9 e). Lastly we show that, as expected, there was no clonal sharing between individuals, but expanded clones were found in multiple locations of the lung within one donor ( Figure S9 f).
Co-localisation of IgA plasma cells with the SMG B cells included naive and memory B cells, and plasma cells that were further annotated into immunoglobulin (Ig) IgG or IgA secreting plasma cells and plasmablasts ( Figure 4c, Figure  S10a, b), and this annotation was supported by VDJ-seq data via Scirpy BCR isotype analysis. IgA, which is important for mucosal immunity (Corthesy 2013, Kunkel andButcher 2003, Salvi andHolgate 1999), was the most frequent isotype in the airway samples, while only the third most abundant in the parenchyma (Figure 4d, Figure S10 e). Interestingly, we observe that proportions of IgA plasma cells, relative to IgG, are increased in COVID-19 patients versus healthy controls in single cell data from nasal and bronchial brush samples ( Figure 4e) (Yoshida et al. 2021). The distinguishing markers for IgA versus IgG plasma cells included CCR10 and B-cell maturation antigen BCMA (TNFRSF17) (Figure S10 b), which are important for plasma cell localisation and survival, respectively (Morteau et al. 2008;Kunkel and Butcher 2003;O'Connor et al. 2004) Using Visium ST, we observed the localisation of IgA B plasma cells, but not B or IgG plasma cells, in the airway SMG in trachea and bronchi sections ( Figure 4f). Annotation of anatomical regions across all Visium sections confirmed co-localisation of IgA plasma cells with duct, serous and mucous SMG cells, whilst IgG mapped to immune infiltrates ( Figure  4f). The mapping of IgA plasma cells to the SMG is confirmed in the Human Protein Atlas which shows an abundance of plasma cells (MZB1+) in the SMG region of the bronchus and the nasopharynx (Figure S10 c, d), along with a study from the 1970s that showed localisation of IgA plasma cells in human airway SMG (Soutar 1976).
Using multiplex IHC we showed the specific presence of IgA2 plasma cells in the SMG at single cell resolution, while IgG positive cells were present in the airways only outside the SMG, consistent with Visium ST (Figure 4g, Figure S10 f). We also detected IgD+ naive B cells and CD3+CD4+ T helper cells in the SMG (Figure 4g), suggesting that IgA plasma cells are supported by a complement of cell types that can orchestrate B cell maturation for IgA secretion directly into the airway mucous. We hypothesise that together these different cell types constitute an immune niche which we term gland associated lymphoid niche (GALN). Understanding the immunological mechanisms at the SMG can help understand disease, as increased plasma cell numbers in SMG have been shown in smokers (Soutar 1976), in COPD (Zhu et al. 2007) and in Kawasaki disease (Rowley et al. 2000).

Cell-cell interactions and the SMG immune cell niche
To understand colocalization of B cells, IgA plasma cells and T cells in the SMG (Figure 4f), we explored the molecular mechanisms underpinning the SMG as a potential immune niche. We report that expression of pIgR, which facilitates transcytosis of polymeric Ig across the surface epithelium, was high across all SMG epithelial cells, as was Mucosal Epithelial Chemokine (MEC)/CCL28, known to recruit IgA plasma cells through CCR10 in other mucosal sites (Figure 5a, b) (Wilson and Butcher 2004;Morteau et al. 2008). Using cell-cell interaction analysis tool CellChat on cells from the airways (Jin et al. 2021) we saw that, in addition to the CCL28-CCR10 axis between SMG cells and B plasma cells (combined IgA, IgG and plasmablasts), SMG-Duct cells were predicted to interact with CCR6 on memory and naive B cells and CD4 T cells (combined CD4 subsets, excluding Tregs) through CCL20 (A. Y. S. Elgueta et al. 2015;Bowman et al. 2000) (Figure 5b-d).
Cellchat also predicted interactions between HLA genes expressed by SMG epithelial cells, particularly SMG-Duct cells, and CD4-T cells (Figure 5e, f). The expression of HLA-DRA and HLA-DRB1 in SMG-Duct cells was comparable to ciliated cells and, as expected, less than B memory and naive cells (Figure 5f). Interestingly, we observed strong expression of HLA-DR protein by IHC in the glands, but not in the surface epithelium showing discrepancy between transcript and protein levels ( Figure 5g). We additionally found expression of CD40, a co-stimulatory molecule key for APC-T cell interactions, in SMG epithelial cells (Figure 5f). The expression of both these factors suggests that SMG-Duct cells can present antigen to CD4 T cells. Expression of HLA-DR and CD40 has previously been observed in airway and nasal epithelial cells, resulting in promotion of T cell proliferation in vitro (Rossi et al. 1990;Kalb et al. 1991;Cagnoni et al. 2004;Gormand et al. 1999;Tanaka et al. 2001). Interestingly, CD40 expression has been observed in duct, but not other cells, of the salivary gland and is upregulated in Sjögren's syndrome, a systemic autoimmune exocrinopathy (Dimitriou et al. 2002).
In further support of an immune niche, A Proliferation Inducing Ligand (APRIL), a factor important for B cell survival, differentiation and class switching, was expressed by SMG-Duct cells, and to a lesser extent by SMG-Serous cells (Figure 5h,j). Cell-cell communication through these ligands was predicted between SMG epithelial cells and B-memory and B-plasma cells, based on differential expression of the receptors TACI and BCMA ( Figure 5 h, j). In the colon, APRIL expression can be induced on intestinal epithelial cells and leads to IgA2 class-switch recombination (CSR) in the local tissue environment (He et al. 2007). We also find that a portion of B memory cells express activation-induced cytidine deaminase (AIDCA), suggesting the possibility of local CSR at the SMG through TACI-APRIL signaling (Figure S10 g). SMG-Duct, and to a lesser extent SMG-Serous, cells expressed IL-6, which was predicted to interact with IL-6R/IL-6ST on B plasma and a subset of CD4-T cells, the CD4-naive/CM T cells (Figure 5 i,j). In combination with APRIL, IL-6 induces and supports long lived plasma cells, and is a potent inducer of IgA secretion in IgA committed plasma cells (Beagley et al. 1989;Hirano et al. 1986). Furthermore, IL-6 has been shown as a required factor for CD4 T cell memory formation, and for overcoming Treg mediated suppression (Nish et al. 2014). Salivary gland epithelial cells have been shown to induce CD4 T cell differentiation into Tfh cells in an IL-6 dependent manner (Gong et al. 2014), which induced proliferation of B cells. In this study, proliferation of B cells was also induced in co-culture only with salivary gland epithelial cells, suggesting additional T-independent mechanisms. IL-6 is upregulated in serum and bronchoalveolar lavage fluid in asthma and COPD patients, suggesting that the balance of IL-6 signaling in the SMG is important for disease (Mercedes Rincon 2012; Savelikhina et al. 2018;Tillie-Leblond et al. 1999).
In conclusion, we have identified the localisation of IgA plasma cells, naive/memory B cells and T cells at the SMG along with a large number of molecular signalling pathways involved in B lymphocyte recruitment, antigen presentation and both T cell-dependent and -independent maturation and survival. These pathways are known to be functional in other secondary lymphoid organs, including within MALT, and we now show their possible involvement in establishing and maintaining the GALN (Figure 5k) of the lung.

Summary and Conclusions
We have produced a detailed annotation of the transcriptomes of cell types in the human lung; assessing their different locations along the tracheobronchial tree and parenchyma. Within the over 193,000 high quality transcriptomes presented, we identify 77 cell types or states, including for the first time the transcriptomes of chondrocytes, peripheral nerve cells, and SMG duct cells as well as subtypes of Schwann cells, muscle and macrophages. Among the newly identified cell types we find specialised cell types expressing specific sets of chemokines within the fibroblast, smooth muscle and macrophage compartments. We also identify a novel IgA immune cell niche within the airway submucosal glands, which we term the gland associated lymphoid niche (GALN), with potential disease relevance.
The strength of this study comes from combining multiple complementary technologies: scRNAseq, scRNAseq, VDJ analysis and spatial transcriptomics. snRNAseq, though more limited in the number of genes detected than scRNAseq, is not biased by enzymatic treatments and dissociation artefacts. This has allowed us to define the transcriptomes of chondrocytes and perichondrial fibroblasts (PC-Fibro) and to separate rare cell types such as Schwann cell subsets and nerve-associated fibroblasts (NAFs). Visium ST has allowed us to demonstrate the localisation of cell types within the tissue; confirming those already known (such as ciliated epithelial cells), those previously transcriptionally undefined such as chondrocytes, and entirely undescribed cell types in the human lung.
We characterise cell types that may cooperate to generate an immune niche at the SMG, finding signalling circuits also observed in lymph nodes, MALT and immunologically active tissues such as the gut, resulting in both T cell-dependent and independent B cell responses. We propose cell-cell interactions between SMG epithelial cells, including the SMG-Duct cells, CD4-T cells and B-naive, -memory and IgA secreting plasma cells, that we map to the SMG. The survival, maturation and potentially class switching of B-lineage cells are supported by APRIL and IL-6, expressed by SMG-Duct cells, providing T cell independent factors. Additionally, the SMG epithelial cells have the potential capacity to induce T cell dependent immunity and B cell responses through expression of MHC-II and CD40. Further functional analysis will have to investigate our findings, examine the roles of the T cells we detect, and test the function of SMG-Duct cells as professional APCs and in promoting class switch recombination and somatic hypermutation at these sites.
Many of the signalling pathways we described have been observed in other tissues and are particularly well described for the recruitment of IgA plasma cells to the mammary gland and to the Peyer's patches of the gut. No such secondary lymphoid structures or pathways within healthy lung tissue has been observed in the airways, but we postulate that the SMG microenvironment may fulfil a similar function.
Overall, the multi-omics data presented herein offer a comprehensive view of cell types and states within the human lung, with both macro-and micro-anatomical cell localisation information. We present new cell types with potential impact on a number of disease conditions both within and outside of the lung, and computationally analyse cellular interactions, advancing our understanding of lung composition and immunity. This comprehensive analysis has allowed us to take a systems approach that begins to define organ function as a result of the interactions between cells from distinct compartments. The interactions observed at the newly defined GALN may be relevant for other IgA secreting sites, such as mammary, salivary and tear glands.
Our findings are likely to be highly relevant to infectious disease biology, since IgA secretion is critical in combating infectious pathogens such as SARS-CoV-2 (Sterlin et al. 2021). We report higher proportions of IgA plasma cells in the airways of COVID-19 patients, supporting the importance of IgA immune niche. Nasal vaccines could induce a strong local sIgA response, yet all currently licenced COVID-19 vaccines are administered intramuscularly (Lund and Randall 2021). These vaccines are highly effective in preventing serious disease, but preclinical studies suggest that they provide little protection against viral replication and shedding in the upper airway (Bleier, Ramanathan, and Lane 2021;van Doremalen et al. 2021). Novel nasal vaccines are being developed that in non-human primates have shown promising prevention of replication and shedding of the virus due to the induction of a mucosal IgA response in the upper and lower respiratory tract (reviewed in (Tiboni, Casettari, and Illum 2021)). Whether the same is true in humans remains to be determined, but in COVID-19 patients, SARS-CoV-2 neutralisation was more closely correlated with IgA than IgM or IgG (Sterlin et al. 2021), and increased titres of IgA were observed in tears, nasal fluid and saliva (Cervia et al., n.d.) further highlighting the importance of the IgA immune niche in COVID-19 immunity. A better understanding of the mucosal IgA immune niche and the pathways that establish this niche, which we describe here, may offer options to augment this immune response.

Declaration of Interest
In the past three years, SAT has received remuneration for consulting and Scientific Advisory Board Membership from Genentech, Roche, Biogen, GlaxoSmithKline, Foresite Labs and Qiagen. SAT is a co-founder, board member and holds equity in Transition Bio.

Data availability
Sequencing data for scRNAseq, snRNAseq, VDJ-seq and Visium ST will be available via the DCP (http://data.humancellatlas.org). Sc/snRNAseq data will be available for browsing and gene expression data download via cellxgene website.                Marker genes for all the described cell types.

Aim and study design
We aimed to generate a multi-omics view of different regions of the human lung using single cell and nuclei sequencing (10X Genomics droplet sequencing) and spatial analysis (Visium Spatial Transcriptomics and RNAScope in situ hybridisation).

Experimental Methods
Access to human tissue Tissue dissociation and single cell sequencing Tissue was collected from seven donors from five lung locations including trachea, bronchi at the 2nd/3rd generation, bronchi at the 4th generation, upper left lobe parenchyma and lower left lobe parenchyma (Figure 1a and Supplementary Tables 1-3). Following collection at the clinic, samples (ranging 1-4cm 3 ) were immediately placed into cold Hypothermasol FRS as per the protocol outlined by Madissoon et al. (Madissoon et al. 2019) and transferred to the processing laboratory. Within 12h of cessation of circulation samples were divided into fresh portions of around 1g for immediate dissociation and pieces for embedding in OCT / freezing in isopentane cooled on dry ice for later spatial analysis and nuclei isolation. The majority of samples (n=5) were digested using liberase and trypsin, and CD45 positive cells loaded on 10X as a separate fraction (protocols.io 39ygr7w), although one donor was digested with collagenase D for comparison (protocols.io 34kgquw). Details on which samples received which treatments can be found in Supplementary Table 1&2; all protocols are available on protocols.io and linked in the metadata collated in the Human Cell Atlas Data Co-ordination Platform. Briefly, for the main tissue dissociation protocol used (for 5 donors), a 1g piece of lung tissue was washed with PBS-/-and minced finely with scalpels, before treatment with 13U/ml liberase TL and 0.1mg/ml DNase I for 30 minutes at 37°C with rocking. Cells were filtered through a 70μm strainer, washed with neutralisation media (RPMI + 20% FBS) and pelleted (sample P1). Tissue remaining in the cell strainer was further digested with 0.25% trypsin-EDTA with DNase I for 30 minutes at 37°C with rocking, filtered and washed with neutralisation media. Meanwhile, sample P1 was treated with red blood cell lysis buffer before being separated into CD45 positive and negative fractions using MACS (Miltenyi, as according to the manufacturer's protocol). The CD45 negative fraction was pooled with cells from trypsin treatment, resulting in two samples for loading on 10X: CD45 positive cells from liberase TL digestion (to enrich for immune cells) and pooled CD45 negative liberase-treated cells with trypsin-treated cells (non-immune fraction). These two cell fractions were resuspended in 0.04% BSA / PBS, counted using a Nucleocounter (Chemometec) and loaded on the 10X Genomics Chromium instrument for single cell sequencing, aiming to capture 5000 cells. The majority of samples were sequenced using 10X Genomics 5'v1 chemistry (5 donors; 2 were donors run using 3'v2 chemistry; Supplementary Table 1) according to the manufacturer's protocol.

Single nucleus sequencing
Single nuclei were isolated from frozen tissue using a modified version of the protocol RNAse inhibitors RNasin (Promega) -0.4 U/ul and Superasin (Invitrogen) 0.2 U/ul). Tissue was homogenised using~15 strokes with pestle A (clearance 0.0028-0.0047 in.) then pestle B (clearance 0.0008-0.0022 in.). Isolated nuclei were filtered through a 40 uM filter, collected at 2000 x g followed by resuspension in 0.5 ml of storage buffer (PBS containing 4% BSA and RNasin (Promega) -0.2 U/ul. Nuclei were incubated with NucBlue® (ThermoFisher), 1 drop was added to 0.5 ml nuclei suspension. Nuclei were purified by FACs sorting, stained with Trypan blue and counted using a disposable haemocytometer. Nuclei were diluted to an appropriate concentration and 5,000 nuclei from 5 different samples were pooled and analysed using the 10X 3' platform. 25,000 nuclei were and loaded onto the 10X Chromium controller using the 3' v3.1 kit as per the Chromium Single Cell 3ʹ Reagent Kits v3 User Guide, targeting to recover~3,000 nuclei per sample. Post GEM-RT cleanup, cDNA amplification and 3' gene expression library construction were carried out according to the appropriate user guide. The resulting libraries were sequenced on the Novaseq platform.

Visium Spatial Transcriptomics
Samples no larger than 0.5cm 2 were cut from each of the 5 locations of lung (trachea, bronchi at the 2nd/3rd generation, bronchi at the 4th generation, upper left lobe parenchyma and lower left lobe parenchyma) and additional small airway. Majority of the parenchyma was removed from the airway tissues, embedded in OCT and flash frozen in dry-ice cooled isopentane at -55°C. Haematoxylin and eosin staining was used to determine the morphology of tissue blocks, in order to choose those to proceed to Visium. Ten micron sections were then cut from the blocks onto Visium slides (10X Genomics) and processed according to the manufacturer's protocol. Full list of samples, locations, donors and permeabilisation times are shown in Supplementary Table 3 and Supplementary Figure 3. Haematoxylin and eosin images generated during the Visium protocol were captured at 20x magnification on a Hamamatsu Nanozoomer S60 and exported as tiled tiffs for analysis.
Dual-indexed libraries were prepared as in the 10X Genomics protocol, pooled at 2.25nM and sequenced 4 samples per Illumina Novaseq SP flow cell with read lengths 28bp R1, 10bp i7 index, 10bp i5 index, 90bp R2.

RNAScope
Tissue blocks for RNAScope in situ hybridisation were chosen based on haematoxylin and eosin staining results as described under 'Visium'. Ten micron-thick cryosections cut on to superfrost plus slides were processed using the RNAScope 2.5 LS multiplex fluorescent assay (ACD, Bio-Techne) on the Leica BOND RX system (Leica). Fresh frozen lung sections were fixed for 90 minutes with chilled 4% paraformaldehyde, washed twice with PBS and dehydrated through an ethanol series (50%, 70%, 100%, 100% ethanol) before processing according to the manufacturer's protocol with protease IV treatment. Samples were first tested with RNAScope positive and negative control probes before proceeding to run probes of interest. Slides were stained for dapi (nuclei) and three (or four where indicated) probes of interest, with fluorophores opal 520, opal 570, opal 650 and atto 425 (4-plex only) at between 1:500 -1:1000 concentration. These were then imaged on a Perkin Elmer Opera Phenix High Content Screening System with water immersion at 20x Magnification.

Tissue preservation and antibody staining
The fresh airway tissue was received in cold Hypothermasol, fixed with 1% PFA solution for 24h at 4˚C and transferred to cold 30% sucrose solution overnight, before freezing in OCT. The fixed tissue was sectioned at 10µm thickness. Iterative staining of human trachea sections was performed as described by Radtke et al. (Radtke et al. 2020). 10µm sections were permeabilised and blocked in 0.1M TRIS, containing 0.1% Triton (Sigma), 1% normal mouse serum, 1% normal goat serum and 1% BSA (R and D). Samples were stained for 2h at RT (primary antibodies) or 1h at RT (secondary antibodies) in a wet chamber with the appropriate antibodies, washed 3 times in PBS and mounted in Fluoromount-G® (Southern Biotech). Images were acquired using a TCS SP8 (Leica, Milton Keynes, UK) inverted confocal microscope. The coverslip was then removed and slides were washed 3 times in PBS to remove any mounting medium. Bleaching of the fluorochromes was achieved using a 1mg/mL solution of lithium borohydride in water (Acros Organics) for 15 minutes at RT. The slides were then washed 3 times in PBS prior to staining with a different set of antibodies as described above. The process was repeated a total of 5 times. Raw imaging data were processed using Imaris (Bitplane) using Hoechst as fiducial for the alignment of subsequent images. The staining set-up and antibody information is in Supplementary Table 5

Spatial mapping of cell types using Visium Spatial Transcriptomics and cell2location
Visium Spatial Transcriptomics (ST) data was used to establish tissue locations of the cell types presented in the paper. This was achieved by applying the Cell2location method to estimate the abundance of each cell type in the Visium slides by integrating the single cell/single nuclei and spatial transcriptomes (Kleshchevnikov et al., n.d.). The first step of cell2location workflow is estimation of reference expression signatures of cell types representing gene expression profiles for each annotated cell type and subpopulation (n=85). This step was performed using Negative Binomial regression (from cell2location package) that allows accounting for technology (sc/sn), batch and donor effects. To spatially map iBALT-like immune infiltrate regions we extended the cell type reference by including immune populations involved in the germinal center reaction. This was done by extending the above mentioned set of reference cell type signatures with the signature of these populations which were derived using published human gut data sets that featured good representation of such tissue infiltrating populations (R. Elmentaite et al. 2021). In the second step of cell2location workflow these reference signatures are used to estimate absolute spatial abundance of cell types, integrating and normalising data across 11 Visium sections (5 were excluded based on quality control metrics). 10X Visium data was processed as described above to untransformed and unnormalised mRNA counts, filtered to genes shared with scRNA-seq, which were used as input to cell2location with hyperparameters that were determined using information about the tissue and experiment quality: 1) Expected cell abundance per location , which were estimated by computinĝ = 20 average cell count across 20 representative locations derived by manually counting cells in histology images.
2) Regularization of within-experiment variation in RNA detection sensitivity , α = 20 representing more lenient regularisation to account for large technical within-slide variability in total count observed in this data.
The mode was trained until convergence was achieved (40,000 iterations). To aid interpretation, all locations in the Visium data were annotated with anatomical regions label using paired histology images. Each location on each Visium slide that was aligned with the tissue was annotated as "Tissue" (manual annotation) and was used for analysis. Specific tissue regions were annotated as follows: Cartilage, Glands, Multilayer epithelium, Parenchyma, Nerve bundle, Perichondrium, Airway smooth muscle, Arterial vessel, Venous vessel, Weird morphology, Mesothelium, Pulmonary vessel, Immune infiltrate (iBALT-like), Small airway (on parenchyma sections only). The annotation was used to compute average estimated cell abundance of each cell type across annotated regions, which was normalised per cell type and presented as a dotplot.

Single cell and nuclei RNA-seq analysis
The CellRanger unfiltered matrices were used as an input for the SoupX algorithm (M. D. Young and Behjati 2020) to remove ambient RNA contamination. The single cell RNA-seq libraries were corrected according to the tutorial (https://github.com/constantAmateur/SoupX). For each single nuclei RNA-seq library, CellRanger filtered nuclei were first subjected to a set of QC filters. Pass QC nuclei were processed using standard scanpy pipeline and were clustered at a relatively low resolution to form 5-10 clusters. Nuclei that didn't pass QC were assigned to those clusters by logistic regression. This clustering was then passed to SoupX, which used it to derive a set of cluster-specific genes for automatic estimation of contamination rate, which together with estimated soup composition were used by SoupX to clean the data. Default values were used when calling SoupX's functions, except for "autoEstCont()" where "soupQuantile" was set to 0.8 to improve stability of estimation. The single-cell and nuclei libraries with SoupX correction were analysed using the standard scanpy 1.7.1 workflow (F. A. Wolf, Angerer, and Theis 2018). The cells with >4000 counts in nuclei and 20,000 counts in the cells were removed. In the cells, droplets with >10,000 features were removed. Lower threshold of 1000 features was applied to donor A37 due to ambient RNA contamination that was difficult to remove. The rough level clustering was used to annotate master cell types. Master cell types were further subtracted and re-analysed with scanpy workflow including new highly variable genes detection. Typically, between 1000 and 3000 highly variable genes (HVG-s) were used for calculating 40 principal components that were used for calculating the UMAP coordinates. Data integration via Harmony (Korsunsky et al. 2019) or BBKNN (Polański et al. 2020) or scVI (Lopez et al. 2018) was used with either "Material" as input to correct between cells and nuclei, or "Material and Donor" in order to take into account the variability between donors. Clustering at different resolutions was done in order to determine biologically meaningful cell types. Doublets and contaminated cells were detected via overclustering and observing markers from more than one cell type, and in case of nuclei-seq also observing higher counts. Doublet clusters were removed. Iterative clustering approach was used to further derive clusters for less abundant cell types. In addition to known marker genes, new ones were derived using the scanpy' rank_genes_groups function. Cell types and master clusters were annotated according to known and newly derived markers as in Supplementary File 1.

Post-processing analysis
Gene set enrichment analysis was performed in GSEA online tool (https://www.gsea-msigdb.org/gsea/index.jsp) (Subramanian et al. 2005) for specific gene sets and in gProfileR (https://biit.cs.ut.ee/gprofiler/gost). The analysis of gene expression in GTEx tissues was performed in the GTEx portal (https://gtexportal.org/home/). Cell-cell interaction analysis was performed with CellPhoneDB (https://www.cellphonedb.org/) (Efremova et al. 2020) and with CellChat (http://www.cellchat.org/). In order to cope with donor to donor differences, the dataset was downsampled to a set number of cells per every donor-celltype combination. Pseudotime analysis for selected cell populations was performed with Monocle3 (ref), its functionality to infer a pseudotime based on UMAP coordinates. The root was identified as the cell with the highest combined expression of canonical progenitor markers (VCAN for chondrocytes; TGM2, HMCN2 and SULF1 for smooth muscle). Cell trajectory analysis was performed using the scVelo package (v0.2.1) (Bergen et al. 2020) and specifying stochastic model. Label transfer was performed via Azimuth tool (https://azimuth.hubmapconsortium.org/) with the lung reference data . Human protein atlas (https://www.proteinatlas.org/humanproteome/tissue) was used for extracting images of protein stainings with antibodies on human tissues. BCR and TCR analysis from VDJ-data VDJ analysis was done with Scirpy 0.6.0 (https://icbi-lab.github.io/scirpy/) (Sturm et al. 2020). For TCR data, clonotypes were defined based on CDR3 nucleotide sequence identity. For BCR data, clonotypes were defined based on the Hamming distance between CDR3 amino acid sequences with a cutoff of 2 and orphan VJ chains removed. In both cases, V gene identity was required and the CDR3 sequence similarity was evaluated across all of a cell pair's V(D)J chains.

Poisson linear mixed model for cell type composition analysis
Poisson regression with various metadata as covariates was applied to adjust confounding effects on the cell type count data as previously described Rasa Elmentaite et al. 2020). We used Location as biological factor and Protocol, Material (scRNAseq vs snRNAseq) and Donor as technical factors in the model as random effects to overcome the collinearity (see Supplementary Notes in Elmentaite et al. for more details).

Variance in gene expression
In order to determine the effects of the metadata features on the expression data, a linear mixed model was used (A. M. H. Young et al. 2021). Genes expressed in less than 5% of the samples were filtered out. The count matrix was then normalised and log-transformed. The percentages of variance in gene expression data explained by each metadata feature were obtained through fitting the linear mixed model. The Bayes factor was then computed to determine the gene-specific effects of some metadata features in the expression data, assigning an effect size and a local true sign rate for all genes analysed. Genes presenting a local true sign rate (ltsr) value greater than 0.9 were considered significantly affected by the metadata feature analysed. See Supplementary Notes of (A. M. H. Young et al. 2021) for more details.

Functional GWAS analysis (fGWAS)
The fGWAS approach to determine disease relevant cell types is described elsewhere (R. Elmentaite et al. 2021

Cell-cell interaction analysis
Cell-cell interaction from scRNAseq data was predicted using CellChat (Jin et al. 2021). B plasma subsets were combined (B-plasma) and cell types of interest (B-naive, B-memory, SMG-Duct, SMG-Mucous, SMG-Serous, CD4-naive/CM, CD4-EM/Effector and CD4-TRM) were downsampled to 200 cells per cell type per donor. Analyses were performed both with individual CD4-T cell subsets and all CD4 subsets combined. Normalised count matrix along with cell annotation metadata was processed through the standard CellChat pipeline, except that the communication probability was calculated with a truncated mean of 10%.