Abstract
Cigarette smoke first interacts with the lung through the cellularly diverse airway epithelium and goes on to drive development of most chronic lung diseases. Here, through single cell RNA-sequencing analysis of the tracheal epithelium from smokers and nonsmokers, we generated a comprehensive atlas of epithelial cell types and states, connected these into lineages, and defined cell-specific responses to smoking. Our analysis inferred multi-state lineages that develop into surface mucus secretory and ciliated cells and contrasted these to the unique lineage and specialization of submucosal gland (SMG) cells. Our analysis also suggests a lineage relationship between tuft, pulmonary neuroendocrine, and the newly discovered CFTR-rich ionocyte cells. Our smoking analysis found that all cell types, including protected stem and SMG populations, are affected by smoking, through both pan-epithelial smoking response networks and hundreds of cell type-specific response genes, redefining the penetrance and cellular specificity of smoking effects on the human airway epithelium.
Introduction
The human airway epithelium is a complex, cellularly diverse tissue that plays a critical role in respiratory health by facilitating air transport, barrier function, mucociliary clearance, and the regulation of lung immune responses. These airway functions are accomplished through interactions among a functionally diverse set of both common (ciliated, mucus secretory, basal stem) and rare cell types (pulmonary neuroendocrine, tuft, ionocyte) which compose the airway surface epithelium. This remarkable cell diversity derives from the basal airway stem cell by way of multiple branching lineages1,2, yet, the nature of these lineages, their transcriptional regulation and the functional heterogeneity to which they lead, remain incompletely defined in humans. Equally important to airway function, if even more poorly understood on both a molecular and cellular level, is the epithelium of the airway submucosal glands (SMG), a network that is contiguous with the surface epithelium and a critical source of airway mucus and defensive secretions.
Gene expression and histological studies of the airway epithelium have demonstrated that both molecular dysfunction and cellular imbalance due to shifting cell composition in the epithelium are common features of most chronic lung diseases, including asthma3 and chronic obstructive pulmonary disease4 (COPD). This cellular remodeling is largely mediated by interaction of the epithelium with inhaled agents such as air pollution, allergens, and cigarette smoke, which are risk factors for these diseases. Among these exposures, cigarette smoke is the most detrimental and, as the primary driver of COPD5 and a common trigger of asthma exacerbations6, constitutes the leading cause of preventable death in the U.S.7 Smoking is known to induce mucus metaplasia8 and gene expression studies based on bulk RNA-sequencing have established the dramatic influence of this exposure on airway epithelial gene expression9–12. However, these bulk expression changes are a composite of all cell type expression changes, frequency shifts, and emergent metaplastic cell states, making it impossible to determine the precise cellular and molecular changes induced by smoke exposure using this type of expression data.
Here, we use single cell RNA-sequencing (scRNA-seq) to define the transcriptional cell types and states of the tracheal airway epithelium in smokers and nonsmokers, infer the lineage relationship among these cells, and determine the influence of cigarette smoke on individual surface and SMG airway epithelial cell types with single cell resolution.
Results
Smoking induces both shared and unique gene expression responses across diverse airway epithelial cell types
To interrogate the cellular diversity of the human tracheal epithelium, we enzymatically dissociated tracheal specimens from seven donors and subjected these cells to scRNA-seq (Figure 1a). These donors included never-smokers, light smokers and heavy smokers (Supplementary Table S1), allowing us to evaluate the transcriptional effects of smoking habit on each epithelial cell type. Shared nearest neighbor (SNN) clustering of expression profiles from 13,840 epithelial cells identified eight broad cell clusters, each containing the full range of donors and smoking habits (Figure 1b, Supplementary Figure S1ab). Between 200 and 1,500 differentially expressed genes (DEGs) distinguished these clusters from one another (Figure 1c, Supplementary Figure S1c).
a. Schematic for studying human tracheal epithelium by scRNA-seq.
b. tSNE visualization of cells in the human trachea depicts unsupervised clusters defining broad cell types present.
c. Dot plots highlight known and novel markers distinguishing broad cell categories based on average expression level (color) and ubiquity (size). Colored bar corresponds to the cell types/states in b.
d. Immunofluorescence (IF) labeling of human tracheal sections shows MKI67 (red) and TP63 (green) in a subset of KRT5high (white) basal cells. Dotted line denotes the approximate apical surface of epithelium. Scale bar = 25 μm. DAPI labeling of nuclei (blue).
e. Fluorescence in situ hybridization (FISH) co-localizes MUC5B (white) and MUC5AC (red) mRNA to non-ciliated cells. FOXJ1 (ciliated marker, green).
f. IF labeling localizes KRT8 (green) to mid-upper epithelium, MUC5AC (red), KRT5 (white).
g. FISH distinguishes rare cell mRNA markers: Left, PNECs (CHGA, white) and ionocytes (CFTR, red); Right, ionocytes and tuft-like cells (POU2F3, green).
h. IF labeling localizes MUC5B (white) and KRT14 (green) to distinct cells in the SMGs.
i. Published bulk RNA-seq upregulated smoking genes9 are upregulated in most heavy smoker cell populations. Box plots show distributions of mean normalized expression, p-values are based on one-sided t-tests comparing means for heavy and non-smokers. Rare cells were excluded due to small cell numbers. Box plots for downregulated genes from the same published study are in Supplementary Figure S1f.
j. Venn diagram summarizes core and unique smoking responses across seven broad cell populations (colors match those in b), with number of up and downregulated genes unique to populations given in the tips and number of core genes affected in ≥ four populations in the center. Note degree of overlap in the diagram is not proportional to gene overlap for readability. Detailed percentages in Supplementary Figure S1g.
k. Protein-protein interaction (PPI) networks show shared function across core smoking upregulated genes. Redness indicates number of cell populations where a gene was significantly upregulated (FDR < 0.05). See also Supplementary Figure S1, Supplementary Tables S1-S2.
Assignment of clusters to cell types or states was accomplished by examining expression of known epithelial cell type markers13,14 (Figure 1c). High KRT5 expression identified two basal cell populations, distinguished by the presence/absence of proliferation markers (e.g. MKI67, Supplementary Figure S1c). Proliferative heterogeneity within KRT5high basal cells was confirmed by immunofluorescence (IF) labeling in tracheal tissue sections (Figure 1d). Expression of the two major airway gel-forming mucins15, MUC5AC and MUC5B, identified the mucus secretory population via fluorescence in situ hybridization (FISH) (Figure 1ce). The ciliated cell population was identified through expression of ciliogenesis and ciliary function markers, including FOXJ116 and DNAH1117 (Figure 1ce). We also identified a cluster characterized by high KRT8 expression. Consistent with KRT8 being a differentiating epithelial cell marker18, KRT8high cells localized to the mid-to-upper epithelium, above KRT5high basal cells and often reaching the airway surface by IF (Figure 1f). Gene expression across KRT8high cells was highly heterogeneous, with a wide range of expression for both basal (KRT5, TP63) and early secretory cell (SCGB1A1, WFDC2) markers. The smallest cluster contained cells expressing markers diagnostic for pulmonary neuroendocrine cells or PNECs19 (CHGA, ASCL1), ionocytes20,21 (FOXI1, CFTR) and tuft cells22 (POU2F3). Although these cells comprised less than 1% of the total epithelium, FISH of tracheal tissue identified cells expressing each of these markers (Figure 1g).
In addition to surface epithelial populations, two clusters highly expressed known glandular genes23 (Figure 1c, Supplementary Figure S1d), suggesting our digest isolated SMG epithelial cells. One of these SMG clusters highly expressed basal markers (e.g. KRT5, KRT14), while the other exhibited a mucus secretory cell character, including high MUC5B expression. IF labeling of tracheal tissue using these markers allowed visualization of SMG secretory and basal cells (Figure 1h, Supplementary Figure S1e).
Each identified population contained cells from all three smoking groups (Supplementary Figure S1b) and thus, we investigated the transcriptomic effects of smoking habit in each of the cell populations independently. We first examined smoking effects of genes previously reported to be differentially expressed between current and never-smokers using bulk RNA-seq data from bronchial airway epithelial brushings9. For both the upregulated and downregulated gene lists reported, a significant shift in mean expression between heavy smokers and nonsmokers was observed in six of seven populations that matched the direction of effect in the bulk data (Figure 1i, Supplementary Figure S1f). These effects, however, were not consistently observed in light smokers (Supplementary Figure S1f), leading us to focus further investigation of smoking effects on heavy smoker vs. nonsmoker cells.
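The comparison of a published gene list between heavy smokers and nonsmokers can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that each cell is scored by its mean normalized expression over the gene list and that the two groups are compared with a Welch t statistic. The function names, the placeholder genes (`G1`, `G2`) and the toy values are hypothetical, and the conversion of the statistic to a one-sided p-value is left out.

```python
from math import sqrt
from statistics import mean, variance


def gene_set_score(cell_expr, gene_set):
    """Mean normalized expression of a gene set in one cell.

    cell_expr maps gene symbol -> normalized expression; genes that are
    absent from the cell are treated as 0 (an assumption of this sketch).
    """
    return mean(cell_expr.get(gene, 0.0) for gene in gene_set)


def welch_t(group_a, group_b):
    """Welch t statistic for mean(group_a) - mean(group_b)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical published "upregulated with smoking" gene list.
gene_set = ["G1", "G2"]

# Toy cells from one population (values are invented for illustration).
heavy_smoker_cells = [
    {"G1": 2.0, "G2": 3.0},
    {"G1": 3.0, "G2": 4.0},
    {"G1": 4.0, "G2": 5.0},
]
nonsmoker_cells = [
    {"G1": 0.5, "G2": 1.5},
    {"G1": 1.5, "G2": 2.5},
    {"G1": 2.5, "G2": 3.5},
]

heavy_scores = [gene_set_score(c, gene_set) for c in heavy_smoker_cells]
non_scores = [gene_set_score(c, gene_set) for c in nonsmoker_cells]

# A positive statistic matches the direction expected for an upregulated list.
t_stat = welch_t(heavy_scores, non_scores)
```

A one-sided test then asks only whether the statistic is extreme in the direction reported in the bulk data, which is why the sign of `t_stat` matters here.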
Transcriptome-wide single-gene differential expression analysis identified over 150 DEGs between heavy and nonsmokers in each cell population (Supplementary Figure S1g, Supplementary Table S2). Importantly, 7%-87% of the smoking DEGs for each population were unique to that population, revealing a previously unappreciated cell type-specific aspect to the smoking response, discussed below (Figure 1j, Supplementary Figure S1g). Additionally, we identified a “core” response to heavy smoking, encompassing genes consistently up- or downregulated in at least four populations. Among these, MUC5AC was notably upregulated in six of seven (non-rare) cell types, while protein-protein interaction (PPI) network analysis of upregulated core genes revealed a pan-epithelial induction of nine interacting secretion-related genes with heavy smoking (Figure 1k). The upregulated core response also included genes enriched for xenobiotic metabolism and chemokine signaling (Figure 1k), suggesting that known airway responses to smoking, like toxin metabolism and macrophage recruitment10,24, are a joint effort conducted across epithelial cell types. The downregulated core response largely involved deactivation of immune function, such as the complement system, which helps clear microbes and damaged cells, and secretoglobin 1A1 (SCGB1A1) production, important for airway defense25 (Supplementary Figure S1i). Notably, the downregulated core response contained multiple HLA type I and II genes (Supplementary Figure S1i), possibly signaling an underappreciated role for antigen presentation in the epithelium, which is suppressed by smoking.
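The distinction between core and population-unique smoking genes can be made concrete with a short sketch. It assumes one set of significant DEGs per population for a single direction of change (run once for upregulated and once for downregulated genes), takes the "at least four populations" threshold from the text, and uses placeholder gene names (`GENE_A` to `GENE_E`) and a toy input that are purely hypothetical.

```python
from collections import Counter


def split_core_and_unique(degs_by_population, min_populations=4):
    """Split smoking DEGs into a shared core and population-unique sets.

    degs_by_population maps population name -> set of gene symbols that
    change in one direction (all up, or all down) in that population.
    """
    # Number of populations in which each gene is a DEG.
    counts = Counter(g for genes in degs_by_population.values() for g in genes)
    core = {g for g, n in counts.items() if n >= min_populations}
    unique = {
        pop: {g for g in genes if counts[g] == 1}
        for pop, genes in degs_by_population.items()
    }
    return core, unique


# Toy upregulated DEG sets; only MUC5AC is taken from the text.
toy_up = {
    "basal": {"MUC5AC", "GENE_A"},
    "KRT8high": {"MUC5AC", "GENE_B"},
    "mucus_secretory": {"MUC5AC", "GENE_B", "GENE_C"},
    "ciliated": {"MUC5AC", "GENE_D"},
    "SMG_basal": {"GENE_E"},
}

core, unique = split_core_and_unique(toy_up)

# Fraction of each population's DEGs that are unique to it.
percent_unique = {
    pop: 100.0 * len(unique[pop]) / len(genes) for pop, genes in toy_up.items()
}
```

Genes that are neither core nor unique (here `GENE_B`, shared by two populations) fall into the intermediate overlaps of the Venn diagram.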
Secretory cells form a continuous lineage that culminates in mucus secretory cells
Airway secretory cells canonically include club cells, which produce SCGB1A1-laden defensive secretions, and mucus secretory cells, which varyingly express the major gel-forming mucins (MUC5AC and MUC5B). Although inflammatory stimuli have been shown to induce conversion of club cells into mucus cells in mice26, the lineage relationship between these cells in the homeostatic human airway is unclear. Moreover, while NOTCH signaling is a likely mediator of secretory cell fate in the differentiating airway27,28, and the transcription factor (TF), SPDEF29,30, specifically drives inflammation-induced mucus metaplasia, little else is known regarding regulation of human secretory cell development. To investigate this area, we reconstructed the human secretory cell lineage using pseudotime trajectory analysis31 of the mucus secretory cells and KRT8high populations, which contained cells with both an intermediate basal/secretory profile and club-like cells. This analysis aligned most cells along a single lineage (Supplementary Figure S2a) in which basal-like cells transitioned into mucus secretory cells through expression of three successive gene modules (Figure 2a). These modules included TFs and signaling molecules that may drive their expression (Figure 2b). The first of these modules (secretory preparation) was highly enriched for genes involved in ATP production and protein translation elongation, likely reflecting necessary preparation for the high energy demands of secretory protein production (Figure 2a). Secretory preparation genes were enriched for NOTCH signaling and included the NOTCH3 receptor, as well as potential novel TF regulators (BTF3, KLF3) (Figure 2ab).
a. Heat map of smoothed expression across a Monocle-inferred lineage trajectory shows transitions in transcriptional programs that underlie differentiation in the in vivo human airway epithelium, from basal-like pre-secretory (KRT8high) cells into mucus secretory cells. Select genes that represent these programs are shown, all significantly correlated with pseudotime. Key enrichment pathways and genes belonging to each block are indicated at right. Subcluster colors are the same as those in the pseudotime trajectory shown in Supplementary Figure S2a.
b. Scaled, smoothed expression of select transcriptional regulators (colored lines) and canonical markers (black dashed/solid lines) across pseudotime differentiation of human tracheal secretory cells in vivo. The x-axis corresponds to the x-axis in a.
c. Pie chart depicts proportions of all mucus secretory cells exhibiting different MUC5AC and MUC5B mucin co-expression profiles.
d. Co-expression of common secretory markers at the mRNA level (Left, FISH with SCGB1A1 in red, MUC5B in green) and protein level (Right, IF labeling with MUC5AC in red, MUC5B in green and KRT5 in white). Scale bar 25 μm.
e. Smoking-independent correlation coefficients of MUC5B-correlated and MUC5AC-correlated genes. Genes are colored based on whether they were significantly correlated with only MUC5B (green), only MUC5AC (blue), or both (orange). Select genes are labeled.
f. Box plots illustrate the converse effects of smoking on the mean expression of MUC5B and MUC5AC-correlated genes. P-values are from one-sided Wilcoxon tests.
See also Supplementary Figure S2.
A second module followed that was characteristic of club cells32 including expression of SCGB1A1, WFDC2, and CYP2F1. This module was highly enriched for O-linked glycosylation of mucins and xenobiotic metabolism, and contained airway transmembrane mucin genes33 (MUC1, MUC4, MUC16). In this club secretory phase of pseudotime, an array of known and novel TFs increased expression, eventually reaching a crescendo in the mucus secretory cells, consistent with these cells transitioning into mucus secretory cells. The first of these to appear was the novel cAMP responsive TF, CREB3L1, which was followed by expression of both SPDEF and another novel TF, MAGED2. Expression of SCGB1A1 began later and was coincident with expression of XBP1, a TF likely driving the cellular stress response to the initial secretion of secretoglobin and accompanying secreted proteins34,35 (Figure 2b). The secretory cell trajectory terminated with the mucus secretory module, which contained MUC5AC, MUC5B, and the TF FOXA3 and was highly enriched for genes involved in O-linked glycosylation, vesicle coating, SLC-mediated membrane transport, and unfolded protein response, consistent with these cells actively producing and secreting mucus (Figure 2ab). Together, these data support a single developmental lineage of human secretory cells, driven by sequentially activated TFs, which transitions through functional intermediates (club cells) to culminate in a multi-functional mucus secretory cell.
Heavy smoking drives mucus secretory cells to express a MUC5AC secretory program
We investigated whether transcriptionally functional subsets of mucus secretory cells exist that may carry out the known mucociliary and airway defense responsibilities of these cells. Agnostic subclustering yielded two subpopulations (Supplementary Figure S2b), one of which contained only 15% of secretory cells and was surprisingly distinguished by its expression of many known ciliated cell markers, including critical regulator FOXJ116 (Supplementary Figure S2c). We speculated that this hybrid secretory/ciliated population was undergoing transdifferentiation, discussed later. The larger subpopulation, consisting of mucus secretory cells, exhibited no functionally distinct transcriptome-level subgroups and failed to reflect goblet cell subtypes previously reported36 (Supplementary Figure S2d).
To further explore the heterogeneity in this mucus secretory subpopulation, we inspected the distribution of the canonical secretory genes, SCGB1A1, MUC5AC and MUC5B, and observed high co-expression of all three genes, with 94% of cells expressing SCGB1A1 and 78% of cells expressing both MUC5AC and MUC5B (Figure 2c). Pervasive co-expression was confirmed in tracheal sections at both the mRNA and protein level (Figures 1e, 2d). These patterns are consistent with the transcriptome-wide homogeneity observed and suggest these genes all reach peak expression in this mucus secretory state.
Considering that MUC5AC was a core smoking response gene, we next examined whether mucin co-expression differed between nonsmokers (NS) and heavy smokers (HS). We found a sharp increase in the frequency of MUC5AC+ only cells (NS=1%, HS=10%) with heavy smoking and a corresponding decrease in both MUC5B+ only (NS=17%, HS=5%) and MUC5AC− / MUC5B− double negative cells (NS=8%, HS=2%; Supplementary Figure S2e), consistent with published data showing that MUC5B is more homeostatic and defensive37, whereas MUC5AC is more inducible and characteristic of inflammatory disease states15,30. Moreover, we found little overlap in genes correlated with MUC5AC and MUC5B, suggesting these mucins are associated with distinct functional programs (Figure 2e). For example, MUC5B-specific correlated genes encoded known secretory defense proteins, including C3, CFB, SAA1, SAA2, and LCN2, whereas MUC5AC-specific correlated genes contained a different set of defensive proteins (MSMB, LYZ, TFF1, BPIFB2, CEACAM5) while also being enriched for pro-secretory pathways related to ER-based protein processing and glycosylation (Figure 2e). Notably, among genes uniquely co-expressed with MUC5AC was XBP1, a TF previously implicated in both mucus production and its associated unfolded protein responses34,38. Not only did the two mucins themselves respond in opposite directions to smoking, but mean expression of MUC5AC- or MUC5B-correlated genes also increased or decreased, respectively, with smoke exposure (Figure 2f). Both IL33, a master regulator of type 2 mucus metaplasia39–41, and NKX3-1 are potential regulators of these smoking-induced changes in secretory cells (Supplementary Figure S2f). Together, these data further support the concept of a continuous secretory cell lineage and show how smoking may mediate an additional transition, from mucin-balanced terminal secretory cells into an extended endpoint where MUC5AC (and its co-expressed program) dominates.
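The mucin co-expression classes compared between nonsmokers and heavy smokers can be tabulated as in the sketch below. It assumes that a cell counts as positive for a mucin when its count exceeds a threshold (zero here, which may differ from the cutoff used in the study); the function names and the toy cells are hypothetical.

```python
from collections import Counter

CLASSES = (
    "MUC5AC+ only",
    "MUC5B+ only",
    "MUC5AC+/MUC5B+",
    "double negative",
)


def mucin_class(muc5ac, muc5b, threshold=0):
    """Assign one cell to a mucin co-expression class."""
    has_ac = muc5ac > threshold
    has_b = muc5b > threshold
    if has_ac and has_b:
        return "MUC5AC+/MUC5B+"
    if has_ac:
        return "MUC5AC+ only"
    if has_b:
        return "MUC5B+ only"
    return "double negative"


def class_fractions(cells):
    """Fraction of cells in each class; cells is a list of (MUC5AC, MUC5B) counts."""
    counts = Counter(mucin_class(ac, b) for ac, b in cells)
    return {label: counts[label] / len(cells) for label in CLASSES}


# Toy (MUC5AC, MUC5B) counts per cell, invented for illustration.
nonsmoker_cells = [(0, 3), (2, 5), (0, 0), (1, 4)]
heavy_smoker_cells = [(6, 0), (4, 2), (3, 1), (5, 0)]

ns_fractions = class_fractions(nonsmoker_cells)
hs_fractions = class_fractions(heavy_smoker_cells)
```

Comparing `ns_fractions` with `hs_fractions` class by class gives the kind of shift summarized above, such as a gain in MUC5AC+ only cells at the expense of MUC5B+ only and double negative cells.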
MUC5Bhigh SMG secretory cells shift toward MUC5AC production and away from specialized defensive secretions with heavy smoke exposure
Human airway mucus is formed from the composite of secretions produced by both surface and SMG mucus secretory cells42. We thus compared expression profiles between these two populations to examine similarities and differences in their secretory products and the molecular mechanisms that underlie them.
We identified over 100 DEGs defining mucus secretory cells in both SMG and surface populations, which were enriched for transmembrane transport and mucosal defense (Supplementary Figure S3a). Despite these similarities, an even larger number of genes were uniquely characteristic of one or the other cell type (Figure 3a, Supplementary Figure S3b). The SMG population specifically expressed a highly unique repertoire of secretory proteins with strong enrichment for bacterial defense and innate immunity functions (Figure 3a). Furthermore, we found that while both populations highly expressed MUC5B (Figure 3b), expression of MUC5AC in SMG secretory cells was much lower (18.4-fold reduction) and less ubiquitous (SMG=30% vs. surface=84%). Reduced MUC5AC within SMG cells was accompanied by significantly reduced or absent expression of a host of genes involved in ER-to-Golgi vesicle-mediated transport, protein processing in the ER, and both O-linked and N-linked mucin glycosylation (Figure 3a). Distinct panels of TFs in the two groups likely govern these different expression states. For example, CREB3L1 and SPDEF, the canonical secretory cell TF, were most predominant on the surface, while SMG cells uniquely expressed the SMG TF, SOX9, as well as FOXC1 and BARX2, which are known to be involved in lacrimal gland development43–45 (Figure 3a). Together, our data suggest that unique TF drivers in SMG cells result in MUC5B-dominated mucus, which requires considerably less post-translational processing and glycosylation than surface mucus production, and is equipped with specialized defensive functions. This is consistent with recent studies detailing distinct physical properties of mucus from the SMG compared to epithelial surface46,47.
a. Heat map depicts select genes, functional terms and TFs that distinguish surface and SMG secretory cells. Detailed heat map in Supplementary Figure S3a.
b. Box plots of normalized mucin expression across surface secretory (orange), SMG secretory (brown) and non-secretory cells (grey). Median fold change between surface and SMG secretory cells is indicated.
c. Top, Pie chart depicts proportions of SMG secretory cells exhibiting different MUC5AC and MUC5B mucin co-expression profiles. Bottom, bar plot showing how proportions of cells belonging to each mucin co-expression class differ between nonsmokers (black; n = 3) and heavy smokers (red; n = 2).
d. Dot plot showing the expression of markers that unite and distinguish SMG basal cell substates, relative to surface populations and each other.
e. Scaled, smoothed expression of key genes and regulators across a pseudotime trajectory that models the differentiation process of SMG cells. MEC, myoepithelial cells. A minimum spanning tree of the trajectory can be found in Supplementary Figure S3h.
f. IF labeling illustrates myoepithelial cells (ACTA2+, green) transitioning to SMG basal cells (KRT5+, red). Example myoepithelial (green arrows), SMG basal (red arrow) and transitioning (yellow arrow) cells are highlighted. DMBT1 is in white and scale bar is 50 μm.
g. IF labeling illustrates presence of TSLP (red) in SMG basal cells of heavy smokers. ACTA2, green; MUC5B, white. Scale bar, 25 μm.
See also Supplementary Figure S3.
Examining smoking effects in SMG mucus secretory cells, we found that MUC5AC was induced and MUC5B was suppressed by heavy smoking (Figure 3c), echoing mucin responses on the surface. Heavy smoking also increased levels of inflammatory cytokine interleukin-6 (IL6) uniquely in these cells, which has been implicated in lung disease48 (Supplementary Figure S3c). However, most notable was the unique downregulation of 48 genes in SMG cells with smoking, which were related to multiple functions, including calcium ion binding (S100A6, S100A16) and secretion (e.g., DMBT1, TF, and GNAS) (Supplementary Figure S3c). Heavy smoking thus appears to induce inflammation, shift the balance of mucins, and diminish the diversity of specialized proteins produced by SMG mucus cells, likely compromising barrier and defense functions of the airway.
Human SMG basal cell states include myoepithelial cells and are modified by heavy smoking
Recent work in mice has established the myoepithelial cell population as the SMG stem cell, which can differentiate into luminal cells through a basal cell intermediate. These cells were also shown to regenerate the surface epithelium in settings of severe injury. How these observations translate to the human airway is unclear. Investigating this, we identified a cell population with high glandular gene expression which highly expressed KRT14, a marker of murine SMG basal cells2,49,50. Also upregulated in this population were several other genes associated with glandular basal cells, including CAV1, CAV2, IFITM3, ACTN1, and VIM51–53 (Figure 3d, Supplementary Figure S3d).
Subclustering of this group revealed additional heterogeneity, with three major states identified (Supplementary Figure S3defg). The smallest of these expressed over 200 genes related to muscle function absent from the other two subpopulations (Supplementary Figure S3dg), a profile highly similar to murine SMG myoepithelial (ACTA2+) cells54,55. This population was not proliferating (MKI67−) and poorly expressed KRT5, suggesting it represents a quiescent human myoepithelial population. Compared to myoepithelial cells, the other two major SMG basal states exhibited expression more typical of surface basal cells, including high KRT5 expression. One of these KRT5high populations was proliferating (MKI67+) and in fact clustered with surface proliferating basal cells upon epithelium-wide clustering (Figure 1b). The second state was non-proliferative and appeared to be differentiating in that it highly expressed IL33, a marker of surface differentiating basal cells in our dataset (Figure 3d).
A pseudotime trajectory31 of all (except proliferating) SMG populations proceeded from myoepithelial cells into differentiating basal, and then SMG mucus secretory cells (Figure 3e, Supplementary Figure S3h). Transitioning out of the myoepithelial state involved losing expression of ACTA2 and muscle-related genes while simultaneously gaining expression of basal cell genes (KRT5, IL33). TFs distinctively characteristic of SMG (compared to surface) mucus cells (SOX9, FOXC1, and BARX2) initiated high expression in the differentiating basal population, consistent with this state being the precursor to mucus SMG cells (Figure 3e). IF labeling of tracheal sections further supported these transitions as well as the presence of these populations at the protein level (Figure 3f).
Even these SMG basal cells at the base of glands were affected by prolonged smoking, exhibiting a total of 174 DEGs (Supplementary Figure S3ij). Notably, smoker SMG basal cells uniquely upregulated TSLP, a major driver of type 2 airway inflammation56,57, which we confirmed with IF labeling (Figure 3g), suggesting a potentially unrecognized role for these cells in the onset of chronic inflammatory airway disease.
Sequential transcriptional programs drive motile ciliogenesis
Upon cell fate acquisition, nascent ciliated cells activate expression of a large ciliary program, precipitating the generation of hundreds of cytoplasmic basal bodies which traffic to and dock with the apical membrane where they then elongate motile axonemes58. As our in vivo scRNA-seq data did not wholly capture the heterogeneity reflective of this progression, we studied the process by culturing basal tracheal epithelial cells from a subset of the donors at air-liquid interface (ALI) and harvesting replicate cultures at 20 timepoints across mucociliary differentiation for scRNA-seq analysis (Supplementary Figure S4abc). Clustering of 5,976 cells yielded three in vitro populations distinguished by their high expression of ciliary genes (Figure 4a, Supplementary Figure S5a). Trajectory reconstruction59 identified two major lineages, one of which transitioned from basal through early secretory cells, culminating in the three ciliated cell populations (Supplementary Figure S4d), whose ordering matched the real-time appearance of states across ALI differentiation (Supplementary Figure S5b). The first state to appear was highly enriched for genes involved in basal body assembly60,61 (DEUP1, STIL, PLK4) (Figure 4a) and also contained known early transcriptional drivers of ciliogenesis62–64 (MCIDAS, MYB, and TP73) (Figure 4b). We also found that the TF E2F7 was highly expressed in this state. Since E2F4 and E2F5 act at the top of the ciliogenesis program62, other family members may also be involved in this initial early ciliating stage.
a. Heat map depicts gene signatures of three ciliated cell states (function summarized in schematic above) in human airway epithelial ALI cultures sampled across differentiation. Select genes from each state are indicated.
b. Dot plots reveal TFs exhibiting expression associated with ciliated states in vitro.
c. Wholemount IF labeling of FOXN4 knockout in human tracheal epithelial ALI cultures is consistent with early expression of FOXN4 (green). Acetylated α-Tubulin (ACT), red, identifies immature (white arrows) and mature ciliated cells (yellow arrows) in control cultures at ALI timepoints indicated. Scale bar = 25 μm.
d. Left, Quantification of mature and immature ciliated cells on Day 21 as determined by ACT labeling morphology (see Methods). Right, Wholemount IF labeling of FOXN4 knockout illustrates aberrant ciliogenesis where basal bodies are generated (γ-Tubulin, green), but fail to dock (white arrows) and deuterosomes are assembled (DEUP1, red), but retained (yellow arrows). Scale bar = 25 μm.
e. Average expression of markers from two in vitro ciliogenesis states (top and middle) and in vivo mature mucus secretory cells (bottom) reveals that the hybrid secretory/ciliated state contains both early ciliating and mature secretory character. Marker expression for the later ciliating in vitro state in Supplementary Figure S5e.
f. Left, box plots summarize expression of genes uniquely upregulated in mature ciliated cells with heavy smoking, Right, functional gene network (FGN) of non-core upregulated genes in mature ciliated cells. Colored edges indicate shared enrichment annotations between genes that belong to functional categories summarized by the exemplar terms listed, grey edges indicate shared annotations across different functional categories. Edge thickness corresponds to the number of shared terms. Genes in bold are those uniquely upregulated in mature ciliated cells.
g. Smoking downregulates ciliogenesis in hybrid secretory/ciliating cells but not mature ciliated cells. Left, box plots summarize expression of genes uniquely downregulated in hybrid ciliating cells, Right, FGN summarizes functional relatedness among these genes and is as described for f except that genes in bold are those annotated as cilia-related by CiliaCarta82.
See Supplementary Figures S4-S5.
The two subsequent states were both highly enriched for mature ciliated cell genes, but the first of these in pseudotime was distinguished by the presence of basal body docking (CEP290, TTBK2)65 and axoneme assembly (IFT52) genes (Figure 4a), as well as peak expression of known ciliogenesis TFs (GRHL2, RFX2 and RFX3)17,66,67. These TFs were downregulated in the third and final state, which displayed the highest expression of another canonical ciliogenesis and ciliary maintenance TF, FOXJ116 (Figure 4b). This state also showed high expression of mitochondrial formation and ATP synthesis genes, consistent with the significant energy requirements of axonemal motility68 (Supplementary Figure S5c). Finally, 233 known ciliary genes displayed higher unspliced-to-spliced ratios69 in the second compared to the third state, while only 10 genes showed the inverse pattern (Supplementary Figure S5de), supporting the trajectory’s ordering of states and illustrating a putative role of mRNA processing during the final completion of ciliogenesis.
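The logic of the unspliced-to-spliced comparison can be illustrated with a simplified sketch. It assumes pooled unspliced and spliced read counts per gene in each state and a pseudocount of 1 to avoid division by zero; it ranks genes by the direction of the difference only and does not reproduce whatever statistical testing was applied in the study. The gene names and counts are placeholders.

```python
def unspliced_to_spliced(unspliced, spliced, pseudocount=1.0):
    """Ratio of unspliced to spliced counts, stabilized by a pseudocount."""
    return (unspliced + pseudocount) / (spliced + pseudocount)


def compare_states(counts, earlier="state2", later="state3"):
    """Return genes whose ratio is higher in the earlier vs. the later state.

    counts maps gene -> {state: (unspliced, spliced)}.
    """
    higher_in_earlier, higher_in_later = [], []
    for gene in sorted(counts):
        r_early = unspliced_to_spliced(*counts[gene][earlier])
        r_late = unspliced_to_spliced(*counts[gene][later])
        if r_early > r_late:
            higher_in_earlier.append(gene)
        elif r_late > r_early:
            higher_in_later.append(gene)
    return higher_in_earlier, higher_in_later


# Toy pooled (unspliced, spliced) counts, invented for illustration.
toy_counts = {
    "GENE_A": {"state2": (30, 10), "state3": (5, 40)},
    "GENE_B": {"state2": (12, 12), "state3": (3, 20)},
    "GENE_C": {"state2": (2, 50), "state3": (9, 9)},
}

higher_in_second, higher_in_third = compare_states(toy_counts)
```

A relative excess of unspliced transcripts suggests recent induction, so a gene set that is mostly higher in the second state is consistent with that state preceding the third along the trajectory.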
Interestingly, forkhead box N4 (FOXN4), a known regulator of ciliogenesis in Xenopus70, was highly and nearly exclusively expressed in the early state, and may thus be a novel driver of this population. Consistent with its early expression, we detected nuclear FOXN4 at ALI Day 9 but no signal in mature ciliated cells at ALI Day 21 (Figure 4c). CRISPR-Cas9 knockout (KO) of FOXN4 carried out in basal cells resulted in a partially penetrant block to ciliogenesis upon differentiation. At Day 21, 76% of ciliated cells had no or only short, sparse cilia compared to only 2% in the control (Figure 4d, left; Supplementary Figure S5f). The abnormal KO cells retained basal bodies and deuterosomes60 in the cytoplasm (Figure 4d, right), indicating that the basal body generation machinery was intact, but that basal body docking and deuterosome disassembly were blocked. Thus, our data are consistent with FOXN4 regulating this later step in early ciliogenesis.
In vivo, most ciliated cells were mature and the only cluster resembling the early ciliating state (Figure 4e, top), including high FOXN4 expression, was the hybrid secretory/ciliated cell population that subclustered out of mature mucus secretory cells (Supplementary Figure S2cd). These hybrid cells expressed SPDEF, MUC5AC, and mature mucus secretory genes (Figure 4e, bottom), in contrast to the non-mucus producing early secretory cells that gave rise to the FOXN4+ early ciliating state in vitro (Supplementary Figure S4d). Together these data suggest that during de novo epithelization, ciliated cells derive from early secretory cells, but in the homeostatic airway, mature mucus cells transdifferentiate into ciliated cells, possibly in response to stimulus.
Mature ciliated cells exhibited 42 genes uniquely upregulated in heavy smokers which included the TF, XBP1, and genes involved in ER processing and unfolded protein and heat shock responses (Figure 4f). As heat shock family chaperonins were recently shown to be required for axonemal protein complex assembly71, prolonged smoking may enhance the ciliated cell-specific protein-folding program to counteract smoking-related protein damage and misfolding.
Smoking leads to decreased ciliary function and ciliated cell loss72–75, yet we found that genes downregulated by heavy smoking in mature ciliated cells were not related to ciliogenesis or ciliary function (Supplementary Table S2). However, these genes were strongly downregulated in the hybrid secretory/ciliated cells (Figure 4g), suggesting that the ciliogenesis program in this hybrid population is uniquely vulnerable to prolonged smoking. Thus, smoking may hinder the regeneration of ciliated cells rather than impairing their function once fully developed.
Decoupled FOXI1 and CFTR expression and a potential lineage relationship among rare epithelial cell types
Subclustering of the rare cell population identified three distinct groups, each expressing canonical markers of highly disease-relevant epithelial cell types: PNECs19 (CALCA), tuft cells22,76,77 (POU2F3), or ionocytes21,36,78 (CFTR) (Figure 5a). We transcriptionally defined the function of these cells in humans using differential expression analyses, which confirmed highly enriched expression of Achaete-Scute family BHLH TFs, ASCL1, ASCL2 or ASCL3, in PNECs, tuft cells or ionocytes, respectively36,76,79 (Figure 5b, Supplementary Figure S6a). These data confirm that ionocytes populate the human tracheal epithelium and highly express CFTR, as recently recognized36,78. On a per cell basis, ionocyte CFTR expression was between 17- and 467-fold higher than in other cell types, yet the low frequency of these cells (average 0.2%) means that only 11% of the total CFTR expressed by the epithelium was derived from ionocytes, whereas other, more abundant cells contributed more, such as the KRT8high population, which supplied 56% of epithelial CFTR (Figure 5c). To explore whether this result was due to a scRNA-seq sampling bias, we examined bulk RNA-seq data from our ALI differentiation time course, finding that CFTR expression began and peaked much earlier than ionocyte marker genes (Figure 5d), further supporting a significant CFTR contribution from other epithelial cell types.
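The distinction between per-cell expression and a population's share of total CFTR can be made concrete with a small sketch. It assumes a simple list of (cell type, CFTR UMI count) pairs; the function name, the cell type labels and the toy counts are hypothetical and do not reproduce the values in Figure 5c.

```python
def cftr_share(cells):
    """Summarize one gene's UMIs per cell type.

    cells: iterable of (cell_type, umi_count) pairs, one per cell.
    Returns dict cell_type -> frequency, mean UMIs per cell, share of all UMIs.
    """
    totals, counts = {}, {}
    for cell_type, umis in cells:
        totals[cell_type] = totals.get(cell_type, 0) + umis
        counts[cell_type] = counts.get(cell_type, 0) + 1
    n_cells = sum(counts.values())
    total_umis = sum(totals.values())
    return {
        cell_type: {
            "frequency": counts[cell_type] / n_cells,
            "mean_per_cell": totals[cell_type] / counts[cell_type],
            "share_of_total": totals[cell_type] / total_umis if total_umis else 0.0,
        }
        for cell_type in counts
    }


# Toy data: a rare, high-expressing type and an abundant, low-expressing type.
toy_cells = [("rare_high", 50)] * 2 + [("abundant_low", 1)] * 998
summary = cftr_share(toy_cells)

for cell_type, stats in summary.items():
    print(cell_type, stats)
```

In this toy case the rare population expresses 50-fold more per cell but, at 0.2% of cells, supplies under a tenth of the total, which mirrors the logic of the argument above without reproducing its numbers.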
a. Left, tSNE depicts subclustering of rare cells found in the human tracheal epithelium. Right, violin plots show expression of the rare cell markers that identify the three subclusters.
b. Heat map of unique gene signatures across rare cell types. Select genes in each block are indicated at left and select gene ontology terms enriched by the genes in each block are indicated at right.
c. Left, Level of CFTR expression is shown across the tSNE plot of cells in the in vivo human tracheal epithelium. Grey points indicate cells with zero expression. Rare cell cluster is circled. Right, Table details the distribution of CFTR UMIs across major cell populations in the in vivo human tracheal epithelium.
d. Geometric mean of scaled bulk RNA-seq expression for in vivo marker genes of non-rare cells and ionocytes, and scaled expression of CFTR, across samples from 20 timepoints of human epithelial ALI differentiation, indicating that bulk CFTR appears days before ionocytes in culture.
e. Violin plots show that FOXI1 is expressed in both tuft-like cells and ionocytes in vivo. Point color indicates co-expression of FOXI1 with CFTR (red), POU2F3 (green) or neither (white).
f. FISH of FOXI1 (white), CFTR (red) and POU2F3 (green) illustrates overlap of FOXI1 in ionocytes (FOXI1+/CFTR+, pink arrows) and tuft-like cells (FOXI1+/POU2F3+, yellow arrows) in the human tracheal epithelium in vivo. Green arrow, FOXI1−/POU2F3+ tuft-like cell. Representative images from more than 8.4 cm of basolateral membrane across four donors are shown.
g. Average co-expression quantification of FISH for the three markers in f across 561 total ionocytes and tuft-like cells imaged in 4 donors. Number of cells is indicated in each pie. A breakdown of each donor’s profile can be found in Supplementary Figure S6f.
h. Tuft-like cell markers appear days before ionocyte or PNEC markers in vitro. The geometric mean of scaled bulk RNA-seq expression from the top 25 in vivo markers for each rare cell type is shown.
See also Supplementary Figure S6.
The expression signature in our human PNECs revealed neurotransmitter processing pathway genes employed by these cells as well as a host of secreted neurotransmitters (Figure 5b). Despite characteristic POU2F3 and ASCL2 expression in our human tuft cell population, expression signatures in these cells were distinct from those reported in murine tuft cells36 (Figure 5b, Supplementary Figure S6a). For example, many diagnostic markers in mice (GNAT3, TRPM5, GNG13, HMX2, etc.) were not well-represented in our human scRNA-seq dataset, while other murine markers were present (HOXA5, HCK, and LRMP). Therefore, we classified our POU2F3+/ASCL2+ cells as “tuft-like” to signal the uniqueness of their transcriptional profile compared to previously described tuft cells.
Along with murine lineage tracing results36, the appearance of all these populations in ALI cultures (Supplementary Figure S6bcd) demonstrates that rare cells all ultimately derive from basal airway epithelial cells. Yet, little is known about the paths a basal cell takes to differentiate into these cell types. That these three rare populations clustered together when the entire epithelium was analyzed (in vitro and in vivo, and by others78) potentially indicates a shared origin as well as phenotype. Supporting this, differential expression analysis identified 67 genes highly expressed across only these three populations, as well as hundreds of genes uniquely shared between pairs of rare cell types (Supplementary Table S3). Interestingly, HES6 was one of the 67 genes uniting rare cells, suggesting a possible role for NOTCH competition80 during fate determination of these cell types, and also potentially reflecting a shared neuronal character81 (Supplementary Figure S6e).
Most intriguing of the rare cell pair relationships was that observed between ionocytes and tuft-like cells, which uniquely shared expression of 114 genes, including the reported ionocyte TF, ASCL336,78 (Supplementary Table S3). Moreover, we found that the ionocyte marker, FOXI120, whose expression has been reported to be sufficient to produce CFTRhigh ionocytes36,78, was expressed by roughly half of POU2F3+ tuft-like cells (Figure 5e). Despite FOXI1 expression levels comparable to ionocytes, these tuft-like cells lacked detectable CFTR expression. Confirming this, FOXI1 was present by FISH in nearly all CFTRhigh cells and about half of POU2F3+ cells (Figure 5f). Quantification of the FISH data showed that, on average, 48% of FOXI1+ cells exhibited an ionocyte expression pattern (CFTR+/POU2F3−), while 38% of FOXI1+ cells exhibited a tuft-like pattern (CFTR−/POU2F3+) (Figure 5g, Supplementary Figure S6f). Consistent with a precursor role for tuft-like cells, bulk RNA-seq data from the ALI differentiation time course show that the tuft-like expression signature begins and peaks early in differentiation, whereas signatures of both ionocytes and PNECs appear much later, continuing to increase after much of the epithelium has matured (Day 21) (Figure 5h). Tuft-like cell expression was also most similar to that in basal cells (Supplementary Figure S6g), further supporting the possibility that they could serve as precursor cells. Together, these data suggest that tuft-like cells may be a precursor to ionocytes, and possibly PNECs.
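The marker-signature time courses used in this comparison (geometric mean of scaled expression across a marker set, per timepoint) can be sketched as follows. The scaling method is not specified here, so per-gene min-max scaling and a small offset to keep the geometric mean defined at zero are assumptions, as are the function names, placeholder marker labels and toy expression values.

```python
import math


def min_max_scale(values):
    """Scale one gene's time course to [0, 1] (assumed scaling method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def signature_score(expression, offset=0.01):
    """Geometric mean of scaled marker expression at each timepoint.

    expression: dict mapping gene -> list of expression values over time.
    offset: hypothetical constant added before taking logs, then removed.
    """
    scaled = [min_max_scale(values) for values in expression.values()]
    n_timepoints = len(scaled[0])
    scores = []
    for t in range(n_timepoints):
        mean_log = sum(math.log(gene[t] + offset) for gene in scaled) / len(scaled)
        scores.append(math.exp(mean_log) - offset)
    return scores


def peak_timepoint(scores):
    """Index of the timepoint with the highest signature score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy time courses for placeholder markers (not real data).
early_markers = {"M1": [0, 5, 10, 4, 2], "M2": [1, 8, 9, 3, 1]}
late_markers = {"M3": [0, 0, 1, 5, 10], "M4": [0, 1, 2, 6, 12]}

early_peak = peak_timepoint(signature_score(early_markers))
late_peak = peak_timepoint(signature_score(late_markers))
print(early_peak, late_peak)
```

Because a geometric mean is pulled down by any marker near zero, a signature scores highly only when most of its markers are expressed together, which makes it a conservative readout of when a cell type's program appears.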
We observed fewer tuft-like cells and more ionocytes in heavy smokers (Supplementary Figure S6h), suggesting that smoking may alter fate choice in favor of ionocytes over tuft-like cells and lending further evidence for a possible lineal relationship among these populations. In ionocytes, we observed 307 genes downregulated in heavy smokers, which included many genes highly specific to this cell type (Supplementary Figure S6ij), suggesting that ionocytes present in smokers, while not decreasing in number, may exhibit compromised function.
Discussion
In this study, we have generated an agnostic atlas of the human in vivo tracheal airway epithelium, identifying and characterizing cell types, cell states, and lineage relationships among them. As such, our study expands on the mouse in vivo and human in vitro airway epithelial atlases published recently36,78, and we also provide a much more densely sampled in vitro time course of human airway epithelial differentiation. Our data reveal that during both in vitro differentiation and in vivo homeostasis, ciliated cells derive from a secretory progenitor through multiple, discrete transcriptional states, regulated by a suite of TFs that includes FOXN4, which we identify as a novel regulator of the earliest ciliating state. Similarly, we show that the heterogeneity in secretory cells (club, mucus secretory cells expressing one or both of MUC5B and MUC5AC) is likely all part of a continuous secretory lineage that culminates in a multi-mucin producing mucus secretory cell.
Our atlas also produces the first transcriptional picture of human airway SMG cells, allowing us to identify a human equivalent to the recently described murine myoepithelial stem cell54,55. Our analysis suggests this human counterpart also exhibits stem function, as it silences its muscle expression program to assume both surface basal (KRT5, TP63) and unique glandular expression (SOX9), as well as to engage in proliferation. This basal cell state can then differentiate into a mucus secretory cell, as orchestrated by TFs distinct from those involved in surface mucus secretory cell differentiation. The uniqueness of this program produces a vastly different secretory cell, with distinct mucin expression and processing and a specialized repertoire of protein secretions. It remains unclear whether these SMG stem cells can repopulate the surface epithelium in humans as in mice54,55. We confirm that the homeostatic human airway epithelium does contain ionocytes and that they highly express CFTR. However, the large proportion of CFTR expression deriving from other epithelial cell types and our observation of FOXI1/CFTR decoupling caution against the simple FOXI1 → CFTR → cystic fibrosis model. Lastly, our data suggest an unrecognized lineage relationship between at least tuft cells and ionocytes, if not also PNECs, which may relate to recently reported tuft-like variants of small cell lung cancer, generally thought to be a PNEC-derived tumor76.
Importantly, we use scRNA-seq to deconstruct smoking effects on the epithelium to the cell type level, which we can then reassemble into a comprehensive model of how smoking modifies epithelial function as a whole (Figure 6). To summarize, pan-epithelial effects of smoking reach the basal stem cells and include induction of chemokine signaling and xenobiotic metabolism at the expense of antigen presentation and innate immune signaling. Surface and SMG secretory cells shift their mucin programs toward a MUC5AC-dominated inflammatory state, while SMG secretory cells lose many of their distinctive defensive secretions and SMG basal cells upregulate the type 2 inflammatory cytokine, TSLP. Early ciliating cells preferentially lose ciliary function, potentially hindering regeneration of ciliated cells upon injury, and tuft-like cells are depleted in conjunction with an increase in functionally impaired ionocytes. Taken together, these data paint a smoker epithelium that has been rendered more functionally monochromatic, carrying out a MUC5AC inflammatory program at the expense of performing its normal defensive, interactive and reparative roles essential to lung health and homeostasis.
a. Functional Gene Network (FGN) based on all genes upregulated with heavy smoking shows how genes that respond to smoking in distinct cell types of the airway epithelium may collaborate in carrying out dysregulated function. Node (i.e. gene) colors in Node Key refer to the cell type in which a gene was differentially expressed if “unique”; nodes for “semi-unique” and “core” DEGs are white and black, respectively. Edges connect genes annotated for the same enriched term. Exemplar enriched functions are given next to each functional metagroup (or category), which are indicated by the underlay colors that encompass all genes annotated only for the terms within the metagroup. Nodes without colored underlay represent genes in multiple metagroups. Other properties of the network, including node size, connecting edge thickness, and label size/redness are defined in the Network key.
b. FGN as in a, but for all genes downregulated with heavy smoking in the airway epithelium. Legend serves for both a and b.
c. Schematic summarizes the smoking response of the whole epithelium.
Author contributions
Conceptualization, K.C.G. and M.A.S.; Methodology, K.C.G., N.D.J., S.P.S. and M.A.S.; Software, N.D.J., S.P.S., N.D. and K.S.L.; Validation, K.C.G., N.D.J., E.K.V. and M.A.S.; Formal Analysis, N.D.J., S.P.S., N.D., E.G.P. and K.S.L.; Investigation, K.C.G., C.L.R., M.T.M., J.L.E. and E.K.V.; Resources, K.C.G., C.L.R., J.L.E., E.K.V. and M.A.S.; Writing – Original Draft, K.C.G., N.D.J. and M.A.S.; Writing – Review & Editing, K.C.G., N.D.J., E.K.V. and M.A.S.; Visualization, K.C.G., N.D.J., S.P.S., N.D., K.S.L., E.G.P., E.K.V. and M.A.S.; Supervision, K.C.G., N.D.J., E.K.V. and M.A.S.; Funding Acquisition, M.A.S.
Declaration of interests
The authors have no competing interests.
Acknowledgements
This work was supported by the National Jewish Health Regenerative Medicine and Genome Editing Program (REGEN) and NIH grants R01 HL135156, R01 MD010443, R01 HL128439, P01HL132821, P01 HL107202. The authors would like to thank Dr. HongWei Chu, Dr. Reem Al Mubarak and Nicole Pavelka in the NJH Live Cell Core, as well as Dr. Carolyn Morris, Dr. Yingchun Li, Ari Stoner and Dr. Meghan Cromie in the Seibold Lab and Dave Heinz, Katrina Diener and Todd Woessner for assistance with tissue processing, sequencing and useful discussion.