The enteric nervous system of the human and mouse colon at a single-cell resolution

1Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA 2Department of Cell Biology, Harvard Medical School, Boston, MA, USA 3Broad Institute of MIT and Harvard, Cambridge, MA, USA 4Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA 5Massachusetts General Hospital, Cambridge, MA, USA 6Department of Medical Oncology, Dana Farber Cancer Institute, Boston, MA, USA 7Department of Surgery, Massachusetts General Hospital, Cambridge, MA, USA 8Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA. 9Gastrointestinal Unit and Center for the Study of Inflammatory Bowel Disease, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA. 10Harvard Medical School, Boston, MA, USA. 11Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA. 12Howard Hughes Medical Institute and Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA. 13Corresponding authors: orit@broadinstitute.org (O.R.R.), xavier@molbio.mgh.harvard.edu (R.J.X.), aregev@broadinstitute.org (A.R.).


7
To develop snRNA-seq methods that are compatible with a broader range of tissues, including colon, we performed an optimization with nuclei from adult Sox10-Cre; INTACT  Detergent type, detergent concentration, buffer, and mechanical force each impacted quality metrics (Supp. Fig. 1b-d and 2) and we identified two conditions with high ENS recovery and low contamination rates (~20% neurons, 55% glia, 25% contamination across both conditions, Fig. 3b), which also yielded high-quality profiles enriched in the ENS signature score (Fig. 3c).
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint Preservation of ribosomes or rough endoplasmic reticulum on the nuclear envelope allows for mature mRNA capture To understand the basis for these performance differences among nuclei preparations, we compared nuclei structure between CST, TST, and published preparations for snRNA-seq (40, 41), using ultrathin-section transmission electron microscopy (TEM) (Materials and Methods; Fig.   3d; Supp. Fig. 2). As expected, the two published methods yielded isolated intact nuclei (Fig. 3d).
In contrast, CST preserved not only the nuclear envelope, but also the ribosomes (42) on the outer nuclear membrane (Fig. 3d); we thus termed this method RAISIN (Ribosomes And Intact SIngle Nucleus) RNA-seq. TST maintained both the rough ER and its attached ribosomes (43) on the outer nuclear membrane (Fig. 3d); we thus termed this method, INNER Cell (INtact Nucleus and Endoplasmic Reticulum from a single Cell) RNA-seq. Consistent with the TEM results, both RAISIN RNA-seq and INNER Cell RNA-seq yielded higher exon:intron ratios than the published methods ( Fig. 3e; 41% and 64% increases, respectively), suggesting greater recovery of mRNA relative to pre-mRNA.
Of the two methods, we opted to use RAISIN RNA-seq to profile the mouse and human ENS, because it captures more neurons and has fewer contaminants than INNER Cell RNA-seq ( Fig.   3b; Supp. Fig. 1b-d). To test whether RAISIN RNA-seq is compatible with massively parallel droplet-based scRNA-seq, we also sequenced 10,889 unsorted RAISINs from the mouse colon (Materials and Methods). We recovered most major cell types in the colon, including epithelial cells, myocytes, fibroblasts, endothelial cells, immune cells, mesothelial cells, neurons, and glia ( Fig. 3f), without any apparent "doublet" clusters (Materials and Methods), indicating that RAISINs correspond to single nuclei rather than to cellular aggregates. Therefore, even though . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019.
Broadly, neurons partitioned into either cholinergic (Chat + ) or nitrergic (Nos1 + ) subsets (Fig. 4f, Ach and NO producing, respectively), except for four subsets expressing both Chat and Nos1 (defined as log2(TP10K+1) > 0.5), which we validated in situ (Supp. Fig. 5a), and one subset that expressed neither marker. Based on expression of known marker genes, we defined putative neurons subsets (Fig. 4d,e; Supp. Table 2 The only major marker that we could not detect was the neuronal enzyme for serotonin synthesis, Tph2 (51,52). We probed for Tph2 in situ in the colon as well as targeted brain regions, which served as positive (raphe nuclei) and negative (pontine reticular nucleus) controls (Supp. Fig. 4a-. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint e), but only observed Tph2 signal in the brain. We considered the possibility that Tph2-expressing enteric neurons are rare (53,54), and examined published bulk RNA-seq data (55), finding Tph2 expression in the brain, but not the colon (Supp. Fig. 4f). Lastly, an independent scRNA-seq study of the small intestine myenteric plexus did not yield serotonergic neurons (56). However, we cannot exclude the possibility that Tph2 is expressed only under different physiological conditions, in other locations, or cannot be captured using current genomic and RNA-FISH tools. One possibility is that serotonergic neurons only populate the small intestines, as conditional Tph1 knock-out mice crossed with a Villin-Cre driver, which lack serotonin production by the mucosa, have detectable serotonin in the duodenum and jejunum; although these regions still had detectable Tph1 mRNA in the conditional knock-out (57).

ENS composition and expression programs vary by region and with circadian oscillations
To systematically assess sources of variation in the ENS, we leveraged the fact that our atlas comprises samples that vary by genetic background, age, sex, circadian phase, and intestinal location, to test how each factor impacts ENS composition (i.e. the relative proportions of neuron subsets) or gene expression within each neuron subset.
The transgenic background had profound effects on neuron composition (Fig. 4f), suggesting distinct developmental origins for some neuron subsets. In particular, two subsets of putative sensory neurons (PSN1 and PSN2) were nearly absent from Sox10-Cre mice (Fig. 4f), suggesting they may arise from distinct lineages (58,59). ENS composition also varied significantly along the length of the colon within each of the Sox10-Cre; INTACT and Uchl1-H2B mCherry lines, with distinct neuron subsets enriched in different regions (Fig. 4f). For example, PSN1 and PSN2 . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint were enriched in the proximal colon (P < 10 -22 and 10 -6 , respectively; Fisher's exact test), whereas distinct subsets of putative motor neurons (PMNs) were enriched in either the proximal or distal colon (Fig. 4f).
We next used a regression framework to identify genes that were differentially expressed (DE) with respect to age, sex, circadian phase, and colon location, in a manner shared across neuron subsets (Materials and Methods). Overall, few DE genes were associated with age or sex beyond genes on the sex chromosomes (Supp. Table 3); however, the circadian clock and colon location had substantial impacts on gene expression of many neuron genes (Supp. Table 3). For example, core clock regulators were among the most DE genes during morning (Arntl) and evening (Per1, Per2, Per3) (Fig. 5a). In the morning, there was also increased expression of cytoskeletonassociated genes (e.g., Tubb3, Prph, Tubb2a, Cfl1), suggesting circadian regulation of structural remodeling (60), and genes involved in neuronal signaling (e.g., Scg2, Pcsk1n, and Slc7a11). In PSN1 and PSN2, we also observed morning upregulation of genes involved in neuro-immune signaling (e.g., Calcb, Il13ra1) ( Fig. 5b; Supp. Table 2) (61,62). In the evening, several TFs were upregulated relative to morning, including Nr1d2, Tef, Rfx2, and Dbp ( Fig. 5a; Supp. Table 2), many of which are known circadian regulators (63).
In addition, there were significant changes in gene expression across colon regions ( Fig. 5c; Supp.  Fig. 4f). Most notably, neurons in the distal mouse colon had higher expression of several neurotransmitter receptors, including serotonin receptors (Htr3a, Htr3b), glutamate receptors . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint (Gria3, Grid1), acetylcholine receptors (Chrna7, Chrm1), and potassium and sodium channels (Kcnq5, Scn5a), suggesting electrophysiological differences along the ENS.

Motor neuron expression profiles suggest that mechanosensation drives the peristaltic reflex
The myenteric plexus is a major functional unit of the ENS, moving luminal contents along the intestine through coordinated muscle contraction and relaxation (64). The canonical model of the peristaltic reflex (Fig. 5d) (65)(66)(67) begins with the release of serotonin (5HT) by enterochromaffin cells in the mucosa, which acts on sensory neurons. Interneurons then relay this signal to ascending and descending motor neurons, which elicit muscle contraction and relaxation, respectively. This model is based on associations between 5HT stimulation and muscle contraction (initially described (68,69), but was subsequently questioned because (1) removal of the mucosa does not ablate contraction (70)(71)(72)(73)(74), (2) the timing of serotonin release does not precede contraction (75), and (3) deletion of Tph1 (mucosal serotonin synthetic enzyme) has no effect on GI transit in vivo (76); although the knock-out may have been incomplete (77). We therefore hypothesized that the molecular signatures of neuron subsets could help build and test models of peristalsis.
The transcriptional profiles of putative motor neurons suggest possible revisions to the peristaltic model, with a potential role for direct mechanosensation of gut distention by motor neurons in driving peristaltic reflexes. First, nearly all putative motor neurons express the mechanosensitive ion channel, Piezo1 (Fig. 5e, PEMNs and PIMNs; confirmed in situ, Supp. Fig. 5b), suggesting they have the capacity to directly sense distention. Previous studies reported Piezo1 expression in the ENS (78, 79), which our atlas allows us to map to specific subsets. This refinement raises the . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint hypothesis that peristalsis is at least partially driven by distention, specifically via motor neuron depolarization through Piezo1. Moreover, 5HT receptor 4 (Htr4) expression on enteric neurons is well-documented and its expression has been attributed to sensory or secretomotor neurons (80,81); also possibly expressed more broadly (82). We find Htr4 expression in two putative sensory neuron (PSN) subsets and all putative excitatory motor neuron (PEMN) subsets (Fig. 5e); we confirmed this in situ in Chat + neurons of the myenteric plexus (Supp. Fig. 5c). Because Htr4 activation facilitates acetylcholine release in enteric neurons (83) we hypothesize a model where distention evokes motor neuron activity (through Piezo1) and mucosal serotonin enhances this distention-evoked stimulation of muscle contraction. This model was previously proposed for sensory neurons (83) and our atlas has allowed its refinement to excitatory motor neurons.

Sensory neurons express key regulators of ILC responses and tissue homeostasis
We identified four subsets of putative sensory neurons (PSNs) by expression of calcitonin generelated peptide (CGRP), a marker of sensory neurons expressed in two forms (Calca, Calcb), which is involved in feeding, pain sensation, hormone secretion, and inflammation (84). While all four subsets express Calcb, only PSN3 expresses Calca at significant (but low) levels ( Fig. 4f; Supp.   Fig. 6a), which we confirmed in situ ( Fig. 5f; Supp. Fig. 5d). The CGRP receptor (Calcrl) and one of its three co-receptors (Ramp1) are expressed in all neurons, except putative secretomotor neurons (Supp. Fig. 6a).
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint We inferred the likely target cells for each PSN subset based on the signaling molecules and receptors that they express ( Fig. 4f; Supp. Fig. 6a,b; Supp. Table 3). For example, most sensory neuron subsets express receptors for glucagon (Gcgr), glucagon-like peptide 1 (Glp1r), and galanin (Galr) (Fig. 4f; Supp. Fig. 6a), peptides that are produced by enteroendocrine cells with roles in hunger and satiety (85). One subset, PSN3, co-expresses Cck and Vip (Fig. 5e), markers of intestinofugal neurons that innervate the prevertebral ganglia (86), thus supporting connections to the sympathetic nervous system. This subset also uniquely expresses brain-derived neurotrophic factor (Bdnf, Supp. Fig. 6b), which is elevated in patients with irritable bowel syndrome (IBS), where it is correlated to abdominal pain (87), and Piezo2 (Fig. 5e), a mechanosensitive ion channel, which may help detect and regulate smooth muscle tone (88); confirmed in situ; Supp. Fig. 5e).
Another Calcb + subset, PSN4, uniquely expresses somatostatin (Sst; Fig. 4f; Supp. Fig. 6b; validated in situ, Supp. Fig. 5f), previously attributed to interneurons (89); the role of SST in the GI tract is poorly understood, but has been broadly linked to regulating most GI functions, including motility, secretion, absorption and the sensation of visceral pain (90). Localization of Sst expression to a single neuron subset will help dissect its function in the ENS.
One sensory neuron subset, PSN1, uniquely expresses Noggin (Nog) and Neuromedin U (Nmu) ( Fig. 4f; Supp. Fig. 6b; validated in situ Fig. 5g,h), and is rarely marked by the Sox10-Cre driver and immune cells (92), respectively. In particular, Noggin is a BMP antagonist that is necessary for maintaining the intestinal stem cell niche, but whose cellular source is unknown. Noggin expression by sensory neurons raises the hypothesis that these neurons could help regulate the positioning or differentiation of intestinal stem cells. Furthermore, the neuropeptide NMU . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available  Fig. 6a), and Il-7 (Supp. Fig. 6b), a major regulator of ILC differentiation and survival (94). Lastly, both PSN1 and PSN2 cells express gastrin-releasing peptide (Grp, Fig. 4f), which in the lung is produced by neuroendocrine cells and contributes to the response to tissue injury (95).

Secretomotor neurons may integrate epithelial and immune signals
Secretomotor/vasodilator neurons (SVNs) integrate signals from the mucosa and sympathetic ganglia to regulate fluid movement between the body and the lumen. We identified two subsets of putative secretomotor/vasodilator neurons (PSVNs) corresponding to non-cholinergic (PSVN1) and cholinergic (PSVN2) subtypes (96) (Fig. 4d,f) . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint Profiling the human muscularis propria using RAISIN RNA-seq Next, we profiled human colon enteric neurons. Unlike genetic mouse models, we could not enrich for nuclei from human enteric neurons, and thus opted to profile the muscularis propria (MP), which has a higher proportion of neurons than the submucosa or mucosa. We isolated and profiled nuclei from cancer-adjacent normal colon segments from colorectal cancer resections from both genders (5 male, 5 female) and a range of ages (35 -90) (Supp. Table 4). Based on our mouse data (Supp. Fig. 3c), we conservatively estimated a 0.5% capture rate for neuron nuclei, such that in order to capture 500 human neurons, we would need to profile at least 100,000 unsorted nuclei.
Profiling 134,835 human RAISINs from the muscularis propria recovered transcriptomes from neurons, resident adipocytes, endothelial cells (lymphatic, vascular), fibroblasts, glia, immune cells (macrophages, mast cells, lymphoid cells), interstitial cells of Cajal (ICCs), myocytes, and pericytes ( Fig. 6a), each annotated by expression of marker genes ( Fig. 6a; Supp. Table 4). Some subsets were enriched in specific patients (Supp. Fig. 7a-f), which may be due to differences in sampled locations, variable cellular states or variation in the sampling of rare cells. Additionally, human RAISIN RNA-seq data contained more background contamination than either mouse RAISIN SMART-Seq2 or droplet data, possibly due to delayed tissue freezing time following resection.

Human enteric neurons cluster into 11 subsets with distinct transcriptional programs
The 134,835 RAISINs include 831 human enteric neurons (0.6%), which clustered into 11 subsets ( Fig. 6b) after correcting for putative differences in cell quality (Supp. not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. Methods). The neuron recovery rate in humans slightly exceeded our original estimate, likely because the muscularis propria is somewhat enriched for neurons relative to the rest of the colon.
Although we detect many hallmark neurotransmitters, CHAT was lowly expressed (Supp. Fig.   8a), either due to actual low expression in human cells, reduced levels in the nucleus, an alternative 3' end compared to the current annotation (102), or cancer-adjacent effects. We do detect SLC5A7 (Supp. Fig. 8a), a transporter that mediates choline uptake into cells, and is known to be expressed with CHAT in cholinergic neurons (103), and is co-expressed with CHAT in human neurons in our data (Pearson's r=0.60); likewise, Slc5a7 is co-expressed with Chat in mouse neurons. We therefore used SLC5A7 as a surrogate marker for CHAT in human neurons. Interestingly, we observed broad, albeit low, levels of expression of tryptophan hydroxylase 2 (TPH2; required for serotonin biosynthesis) across almost all human neuron subsets (Supp. Fig. 8b), but not in mouse neurons (above, Supp. Fig. 4), suggesting differences in serotonergic signaling between the two species.
Human ENS contains sensory, motor, interneuron, and secretomotor/vasodilator subsets that share core transcriptional programs with mouse We used a classification-based approach (Materials and Methods) to map the 11 subsets of human neurons onto the 24 mouse subsets (Fig. 6c), leveraging the larger number of cells and deeper sequencing data in mouse to annotate the human cells. The neurons partitioned into 2 PEMN subsets, 5 PIMN subsets, 1 PSN subset, 2 PIN subsets, and 2 PSVN subsets (Fig. 6b), annotated with known markers (Supp. Fig. 8a,b). Despite representing distinct regions of the colon (i.e. full colon vs. muscularis propria), both species contained similar neuron compositions, . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint with excitatory and inhibitory motor neurons being the most abundant classes (Fig. 6c). However, sensory neurons were more abundant and more diverse in mouse: humans contained only one sensory subset, whereas mice contained four. This may be due to removal of the human submucosa, although we cannot entirely rule out the possibility that the different number of profiled neurons may contribute to this difference as well. Furthermore, while the fraction of secretomotor/vasodilator neurons was similar across both species, the human muscularis propria lacked the cholinergic subtype, as assessed by CHAT and SLC5A7 expression, whereas mice contained both cholinergic and non-cholinergic subsets.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint

Human Interstitial Cells of Cajal (ICCs) may underlie smooth muscle relaxation
Our reference map of the human muscularis propria includes 431 KIT + ANO1 + ICCs (Fig. 6a), which are regarded as pacemaker cells that rhythmically alter the excitability of smooth muscle tissue (105,106). Two major models have been proposed for ICC function (107): either (1) neurons signal directly to smooth muscle, with an indirect role for ICCs (e.g., to generate motor patterns), or (2) neurons signal to ICCs, which then relay signals to smooth muscle to coordinate peristalsis.
To help distinguish between these possibilities, we defined a gene signature for ICCs (Fig. 6e) and mapped known ligand-receptor pairs onto neurons, ICCs, and smooth muscle cells (Materials and Methods). Although motor activity requires both excitatory (i.e. cholinergic) and inhibitory (i.e. nitrergic) signals to elicit contraction and relaxation, respectively, smooth muscle cells only expressed the receptors for acetylcholine (Fig. 7a). In contrast, the receptor for nitric oxide was expressed by ICCs (Fig. 7a), which we validated in situ (Fig. 7b). As a positive control, we note that nitric oxide receptors are detected in pericytes and neurons ( Fig. 7a) (108). These results suggest a revised model of smooth muscle function, where enteric neurons directly activate smooth muscle contraction, but elicit smooth muscle relaxation indirectly via ICCs (Fig. 7c). Consistent with this hypothesis, smooth muscle-specific knockout of the β1 subunit of the nitric oxide receptor only partially reduces relaxation, whereas its global knockout nearly abolishes relaxation (109).

Enteric neurons can interact with diverse stromal and immune cells in the colon
To systematically examine interactions between the enteric nervous system and other cell types in the human colon, we analyzed profiles from the 134,835 RAISINs from the muscularis propria (above) together with scRNA-Seq profiles from 115,517 cells from the colon mucosa (i.e. . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint epithelium and lamina propria) (110). In total, these data span a wide range of cell types in the human colon, including 16 epithelial subsets, 26 immune subsets (myeloid and lymphoid), 7 endothelial subsets, 9 fibroblast subsets, myocytes, ICCs, resident adipocytes, 2 glia subsets (muscularis propria and lamina propria), and 11 neuron subsets. We mapped thousands of receptorligand pairs onto this dataset and identified pairs of cell subsets expressing a significantly greater number of cognate receptor-ligand pairs than is expected under a null model ( Fig. 7d; Materials and Methods).
Broadly, neurons were enriched for putative interactions with other cells from the muscularis propria rather than from the mucosa, suggesting the recovery of local interactions. This approach highlighted known interactions between excitatory motor neurons and smooth muscle (111), secretomotor/vasodilator neurons and both epithelial cells (i.e. tuft and enteroendocrine) and lymphatics (112), and glia and multiple subsets of neurons (Fig. 7d).
More unexpectedly, we found statistically enriched interactions between neurons and diverse stromal cells, most notably resident adipocytes and fibroblasts (Fig. 7d,e), the two largest producers of neurotrophic growth factor (NGF) outside of the ENS in our data (Supp. Table 4).
Even if cell subsets are not enriched for interactions, they may still interact through a more limited, but functionally important, receptor-ligand repertoire. Given recent reports describing neuroimmune crosstalk (115), we searched for specific examples of interactions between neurons and immune cells (Fig. 7e). We identified potential neuron signaling to (1)

Human enteric neurons express risk genes for enteric neuropathies, intestinal inflammatory disorders, and extra-intestinal disorders with GI dysmotility
To interrogate potential contributions of the ENS to human diseases, we examined whether enteric neurons expressed any genes associated with diseases with varying degrees of known ENS involvement. These ranged from Hirschsprung's disease (HSCR), a primary enteroneuropathy that directly affects the ENS, to autism spectrum disorder (ASD) and Parkinson's disease (PD), which are extra-intestinal CNS disorders that are associated with dysfunctions in gut motility that occur early in disease progression (118)(119)(120). In addition, because the ENS is thought to play a pivotal role in inflammation -for example, through the activation of ILCs (121) -we also examined whether IBD-associated genes are expressed by enteric neurons.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint Mapping a curated list of 185 disease-associated genes (Materials and Methods) onto cell subsets from the muscularis propria, lamina propria, and epithelium (as above), we identified many genes that were specifically enriched in enteric neurons (Fig. 8a). For example, even though it is a neurodevelopmental disorder, most HSCR-associated genes were expressed in adult enteric neurons, including RET, PHOX2B, GFRA1, ZEB2, and ECE1 (Fig. 8a). The two exceptions, EDN3 and EDNRB, mediate endothelin signaling in the embryonic neural crest (122). Although most IBD risk genes are expressed in epithelial and immune cells, a subset of genes were most highly expressed in neurons, including GRP, BTBD8, KSR1, NDFIP1, and REV3L (Fig. 8b). In particular, GRP products stimulate GI hormone release, muscle contraction, and epithelial cell proliferation (123). Another such gene, REV3L, is also perturbed in the craniofacial neurologic disorder Möbius syndrome (124). Indeed, increased expression of many neuropeptides (e.g., tachykinin and galanin) has been reported in IBD patients (125).
The risk genes for CNS diseases with concomitant GI dysfunction were predominantly expressed in enteric neurons, with some notable exceptions in ASD and PD (e.g., P2RX5 and IL1R2 in B cells and epithelial cells, respectively) ( Fig. 8c). CNS disease risk genes that mapped specifically to enteric neurons include ANK2, DSCAM, and NRXN1 for ASD, and DLG2, SCNA and SCN3A for PD (Fig. 8c). Expression of these risk genes specifically by enteric neurons, compared with a colon reference map, motivate further investigation of the role that enteric neurons play in the development and progression of dysmotility in intra-and extra-intestinal disorders; it also suggests that studying the ENS, in the far more accessible GI, can provide a window for investigation of GWAS genes of some CNS disorders.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint

Discussion
Here, we constructed reference maps of the colon ENS of adult mice and humans at single cell resolution, revealing the broad potential of neurons to orchestrate tissue homeostasis. Isolating individual enteric neurons from adult animals for transcriptional profiling has not been possible due to technical limitations, and recent efforts using whole-cell dissociations have been limited to embryonic or post-natal animals (126,127). Our development of RAISIN and INNER Cell RNAseq, which preserve ribosome-attached RNA on intact nuclei, allowed us to profile 2,447 mouse and 831 human enteric neurons, along with other diverse cell types from both species (e.g., epithelial, stromal, and immune cells). These methods can be applied to both fresh and frozen tissue specimens, opening the way to characterizing the ENS and a range of archived frozen tissue samples. Additionally, preservation of the ER on nuclei may allow for the enrichment of nuclei with antibodies targeting specific membrane proteins, which are synthesized in the ER.
We identified all major classes of enteric neurons, spanning 24 mouse subsets and 11 human subsets, including motor, sensory, secretomotor/vasodilator and interneuron types. Mining their expression signatures allowed us to infer signaling among neurons and between neurons and nonneuronal cells, such as resident adipocytes, ICCs, immune cells, and epithelial cells. We show circadian regulation of the ENS, including core clock genes, motivating further investigation into temporal variation of ENS function, nutrient absorption, and metabolism (128). We also show differences in neuron composition across the mouse colon (e.g., sensory neurons enriched in the proximal colon) suggesting that ENS function varies along the length of the GI tract. Comparison of mouse and human neurons allowed us to derive core transcriptional signatures for subsets across species, highlighting biological processes that can be modeled in mouse; for example, sensory . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint neurons in both species express Noggin, a gene known to support the epithelial stem cell niche (129). Taken together, these data enable the generation of testable hypotheses and experimental dissection of ENS function.
Finally, given the extensive potential for neuro-immune signaling we observe in the mouse and human ENS, we propose that neuronal dysfunction can lead to immune dysregulation, which can exacerbate inflammation and related pathologies. For example, several IBD risk genes are expressed in neurons, raising the need to further characterize the role of enteric neurons in intestinal inflammation. Intriguingly, dozens of risk genes for early-life and late-onset CNS disorders with concomitant gut dysmotility are highly expressed by enteric neurons suggesting a mechanism for gut motility dysfunction in these diseases, and that profiling the much more accessible ENS may allow us to study human disease biology. Furthermore, recent associations between the gut microbiota and extra-intestinal diseases, such as autoimmune disorders (reviewed in (130)) and cancers and cancer therapies (reviewed in (131)), suggest that immune modulation in the gut can have systemic effects. Proper immune function is thought to be necessary for CNS maintenance and repair, with immune dysregulation contributing to neurodegenerative disease (reviewed in (132). Thus, the ENS may be a central nexus linking the gut, the immune system and the brain, and neurological dysfunction in the gut may exacerbate diseases of the CNS.    conditional tdTomato alleles were used to evaluate concordance of genetically labeled cells and TUBB3 immunofluorescence.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint         . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint

Human donors and tissue samples
All colon resection samples were obtained from colon cancer patients after informed consent at either the Dana Farber Cancer Institute, Boston (IRB 03-189; ORSP 3490) or Massachusetts General Hospital, Boston (IRB 02-240; ORSP 1702). Normal colon located proximal to tumor was placed into conical tubes containing Roswell Park Memorial Institute (RPMI) media supplemented with 2% human serum and placed on ice for transport to the Broad Institute, Boston. Upon arrival, the muscularis propria was dissected from the remainder of the tissue (e.g., submucosa), divided into pieces (approximately 20-120 mg), which were placed into cryo-vials, frozen on dry-ice and stored at -80°C. When possible, a portion of the tissue was fixed overnight in 4% paraformaldehyde at 4°C for histology. not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available

Tissue collection for snRNA-seq
For snRNA-seq optimization, tissue was collected from 11-14 week animals. For the ENS atlas, tissue was collected from 11-14 week old and 50-52 week old mice at either 7-8am or 7-8pm. Each colon was isolated and rinsed in ice cold PBS. Next, the colon was opened longitudinally and separated into four equally-sized sections, which were frozen in a 1.5 mL tube on dry ice. For brain collection, the brain was removed, quartered and frozen in a 1.5 mL tube on dry ice. Frozen tissue was stored at −80°C until subsequent tissue processing.
Tissue collection and preparation for RNA fluorescence in situ hybridization and immunohistochemistry . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint For RNA fluorescence in situ hybridization (RNA FISH) and Immunohistochemistry (IHC), isolated colon was cut into four sections of equal size and processed as described (140). Briefly, tissue was fixed in 4% paraformaldehyde overnight at 4°C. Then, tissue was sequentially passed through PBS containing 7.5%, 15% and 30% (w/v) sucrose at 4°C. Tissue was then embedded in

Immunofluorescence (IF)
Slides with tissue sections were washed three times in PBS for 10 minutes, blocked 1 hour in CAS-Block Histochemical Reagent (00-8120, Thermo Fisher Scientific), incubated with primary antibodies overnight at 4°C, washed three times in PBS for 10 minutes, and then incubated with secondary antibodies at for 1 hour at room temperature. Slides were then washed twice in PBS for 10 minutes and then for 10 minutes with a PBS containing DAPI (D9542, Sigma-Aldrich). Lastly, slides were mounted using Southern Biotech Fluoromount-G (010001, VWR) and sealed.

Combined smFISH and IF
smFISH was performed as described above, with the following changes. After the final HRP-Block step, tissue sections were incubated with primary antibodies overnight at 4°C, washed in TBST for 5 minutes, twice, and then incubated with secondary antibodies for 30 min at room temperature. Slides were then washed in TBST for 5 minutes, twice, followed by a 10 minutes wash with containing DAPI (Sigma-Aldrich) before mounting with Southern Biotech Fluoromount-G (VWR) and sealed.

Nuclei Extractions
The following nuclei extractions were performed from either mouse colon or brain and subsequently processed for profiling: . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint Dounce homogenization: Nuclei were extracted using either dounce homogenization followed by sucrose gradient centrifugation as described (134), or using the Nuclei EZ Prep (NUC101-1KT, Sigma-Aldrich) as described (22), with the following modifications. The tissues were dounce homogenized with a 7 mL Dounce Tissue Grinder (VWR 22877-280) (20 times pestle A, 20 times pestle B) and buffer volumes were increased to 5 mL for homogenization.  Table 1).
Chopping extraction: Fresh-frozen tissues were disaggregated in 1 mL of custom nuclear extraction buffer (see Supp. Table 1 for all combinations used) with mild chopping by Tungsten Carbide Straight 11.5 cm Fine Scissors (14558-11, Fine Science Tools, Foster City, CA) for 10 minutes on ice. Large debris were removed with a 40 µm strainer (Falcon). An additional 1 mL of buffer was used to wash the filter before proceeding to fluorescence-activated cell sorting (FACS).
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint For droplet-based RNA-Seq, nuclei were isolated as described above, but with the addition of 3 ml of ST (Salts and Tris; Supp. Table 1) to extracted nuclei. Nuclei were then pelleted at 500g for 5 mins at 4°C. Supernatant was discarded and the nuclei pellet was resuspended in 100-500 µL of ST buffer (Salts and Tris; Supp. Table 1) before filtering through a 40 µm strainer-capped round bottom tube (Falcon).

Fluorescence-activated cell sorting (FACS)
Prior to sorting, isolated nuclei and RAISINs were stained with Vybrant DyeCycle Ruby Stain (V-10309, Thermo Fisher Scientific). Sorting was performed on a MoFlo Astrios EQ Cell Sorter The 96 well plate was sealed tightly with a Microseal F and centrifuged at 800g for 3 minutes before being frozen on dry ice. Frozen plates were stored at −80°C until whole-transcriptome amplification, library construction, sequencing, and processing.

Whole-transcriptome amplification, library construction, sequencing, and processing
Libraries from isolated single nuclei and RAISINs were generated using SMART-seq2 as described (147), with the following modifications. RNA from individual wells was first purified with Agencourt RNAClean XP beads (A63987, Beckman Coulter) prior to oligo-dT primed reverse transcription with Maxima reverse transcriptase (EP0753, Thermo Fisher Scientific) and locked TSO oligonucleotide, which was followed by 21 cycles of PCR amplification using KAPA HiFi HotStart ReadyMix (NC0295239, KAPA Biosystems). cDNA was purified twice using . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint Agencourt AMPure XP beads (A63881, Beckman Coulter) as described (148). The Nextera XT Library Prep kit (FC-131-1096, Illumina, San Diego, CA) with custom barcode adapters (sequences available upon request) was used for library preparation. Libraries from 384 wells (nuclei/RAISINs) with unique barcodes were combined and sequenced using a NextSeq 500 sequencer (FC-404-2005, Illumina).

Droplet-based RAISIN RNA-seq
Single RAISINs were processed through the GemCode Single Cell Platform using the GemCode Gel Bead kit (v2 chemistry), Chip and Library Kits (10X Genomics, Pleasanton, CA), following the manufacturer's protocol. RAISINs were resuspended in ST buffer (Salt and Tris; Supp. Table   1). An input of 7,000 RAISINs was added to each channel of a chip. The RAISINs were then partitioned into Gel Beads in Emulsion (GEMs) in the GemCode instrument, where lysis and barcoded reverse transcription of RNA occurred, followed by amplification, shearing and 5′ adaptor and sample index attachment. Libraries were sequenced on an Illumina NextSeq 500.

Transmission electron microscopy (TEM)
Extracted nuclei and RAISINs were pelleted and fixed at 4°C overnight in 2.5% Glutaraldehyde and 2% Paraformaldehyde in 0.1 M sodium cacodylate buffer (pH 7.4). The pellet was then washed in 0.1M cacodylate buffer, and post-fixed with 1% Osmiumtetroxide (OsO4) and 1.5% Potassiumferrocyanide (KFeCN6) for 1 hour. Next, the pellet was washed in water 3 times and incubated in 1% aqueous uranyl acetate for 1 hour followed by 2 washes in water and subsequent dehydration in grades of alcohol (10 minutes each; 50%, 70%, 90%, 100%, and 100%). The pellet was then put in propyleneoxide for 1 hour and infiltrated overnight in a 1:1 mixture of . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available Ultrathin sections (about 60nm) were cut on a Reichert Ultracut-S microtome, picked up on to copper grids stained with lead citrate and examined in a JEOL 1200EX Transmission electron microscope and images were recorded with an AMT 2k CCD camera.

Processing FASTQ reads into gene expression matrices
For SMART-seq2, FASTQ files were demultiplexed and aligned to a reference transcriptome (see "Mouse and human reference transcriptomes"), and transcripts were quantified using RSEM, as previously described (149). For droplet-based scRNA-Seq, Cell Ranger v2.0 was used to demultiplex the FASTQ reads, align them to a reference transcriptome, and extract their "cell" and "UMI" barcodes. The output of each pipeline is a digital gene expression (DGE) matrix for each sample, which records the number of transcripts or UMIs for each gene that are associated with each cell barcode. DGE matrices were filtered to remove low quality cells, defined as cells with fewer than 500 detected genes. This cutoff was set to remove contaminating cells, while retaining neurons and glia, which typically have high numbers of detected genes. To account for differences in sequencing depth across cells, DGE counts were normalized by the total number of transcripts or UMIs per cell and converted to transcripts-per-10,000 (henceforth "TP10K").

Mouse and human reference transcriptomes
For the optimization of nuclei extraction conditions, reads were aligned to the mm10 reference transcriptome. However, for the mouse and human ENS atlases, we augmented the reference . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint transcriptomes with introns, thus allowing pre-mRNAs to be mapped along with mature mRNAs. Both the mm10 and hg19 reference transcriptomes were modified according to the instructions provided by the 10X Genomics website (https://support.10xgenomics.com/single-cell-geneexpression/software/pipelines/latest/advanced/references). Briefly, we converted the standard GTF files into pre-mRNA GTF files by changing all "transcript" feature tags to "exon" feature tags. Using these modified GTF files, we then constructed Cell Ranger compatible references using the Cell Ranger "mkref" command. These modified GTF files were used for both the Cell Ranger pipeline and for our SMART-seq2 data (i.e. mouse ENS atlas).

Cell clustering overview
To cluster single cells into distinct cell subsets, we followed the general procedure we have previously outlined in (150) with additional modifications. This workflow includes the following steps: the selection of variable genes, batch correction, dimensionality reduction by PCA, and clustering. In all cases, clustering was performed twice: first, to separate neurons and glia from other cells, and then, to sub-cluster the neurons and glia to obtain high-resolution clusters within each group.

Partitioning cells into neuron, glia, and "other" compartments
Cells were partitioned into neuron, glia, and non-ENS compartments based on their expression of known marker genes (see "Gene signatures"). Signature scores were calculated as the mean log2(TP10K+1) across all genes in the signature. Each cluster was assigned to the compartment of its maximal score and all cluster assignments were inspected to ensure the accurate segregation of cells. Neurons and glia were then assembled into two separate DGE matrices for further analysis.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint

Variable gene selection
To identify variable genes within a sample, we first calculated the mean (µ) and the coefficient of variation (CV) of expression of each gene. Genes were then grouped into 20 equal-frequency bins (ventiles) according to their mean expression levels. LOESS regression was used to fit the relationship, log(CV) ~ log(µ), and the 1,500 genes with the highest residuals were evenly sampled across these expression bins. To extend this approach to multiple samples, we performed variable gene selection separately for each sample to prevent "batch" differences between samples from unduly impacting the variable gene set. A consensus list of 1,500 variable genes was then formed by selecting the genes with the greatest recovery rates across samples, with ties broken by random sampling. This consensus gene set was then pruned through the removal of all ribosomal, mitochondrial, immunoglobulin, and HLA genes, which were found to induce unwanted batch effects in some samples in downstream clustering steps.

Batch correction
We observed substantial variability between cells that had been obtained from different mice or different individuals, which likely reflects a combination of technical and biological differences.
In some cases, these "batch effects" led to cells clustering first by mouse or individual, rather than by cell type or cell state. To control for these batch differences, we ran ComBat (151) with default parameters on the log2(TP10K+1) expression matrix, allowing cells to be clustered by cell type or cell state. Importantly, these batch-corrected data were only used for the PCA and other steps relying on PCA (e.g. clustering, t-SNE visualization); all other analyses (e.g. differential expression analysis) were based on the original expression data. Note that we tested two additional methods for batch correction -one based on Canonical Correlation Analysis (152) and another on . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint a k-nearest neighbors (k-NN) approach (153) -but did not obtain any enhancement in performance (data not shown).

Dimensionality reduction, graph clustering, and t-SNE visualization
Cells were clustered at two stages of the analysis: first, to initially partition the cells into neuron, glia, and "other" compartments, and second, to sub-cluster neurons and glia into different subsets.
In all cases, we ran low-rank PCA on the variable genes of the batch-corrected log2(TP10K+1) expression matrix. We then applied Phenograph (154) to the k-NN graph defined using the first n PCs and k nearest neighbors, which were separately estimated for each dataset. First, to estimate n, we calculated the number of "significant" PCs using a permutation test. Because this test may underestimate the number of PCs, we conservatively increased this number (i.e. to 15 or 30; see table below) to ensure that most of the variability in the dataset was captured. Next, to estimate k, we considered a range of clustering solutions with varying values of k, and calculated the marker genes for each set of clusters. We selected k based on inspection of the data. When clustering data from multiple cell types, we tried to select k such that the major cell types (e.g. neurons, glia, and muscle) were split, without fragmenting them into several sub-clusters. When clustering neurons and glia, we tried to select a k yielding the highest granularity clusters that were still biologically distinct, determined by close examination of the marker gene lists. Finally, the Barnes-Hut t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm was run on the selected PCs with perplexity = 20 and for 10,000 iterations to produce two-dimensional embeddings of the data for visualization.

Dataset Cell type # Sig PCs Used PCs k-NN
Optimization All cells 13 1 to 15 250 (separates neurons and glia) . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available

Clustering of human neurons
Initial clustering of the 831 human neurons revealed 15 subsets (Supp. Fig. 7h). However, in several cases, we noticed that a single neuron type had been split into two clusters based on the expression of oxidative phosphorylation genes, which were strongly enriched in PC1 (Supp. Fig.   7i,j). This could reflect differences in differentiating vs. mature neurons (155), cancer-proximal effects, or a rapid transcriptional response to tissue resection or handling. We therefore re-clustered the cells based on the other PCs (i.e. PCs 2 to 30), yielding 11 final subsets of human neurons (Supp. Fig. 7c,g).

Scoring nuclei extraction conditions
To identify optimal conditions for snRNA-seq of the ENS, we performed nuclei extractions while  Table 1). In total, 104 different extraction conditions . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint were examined. For each extraction, we profiled single nuclei transcriptomes by SMART-Seq2 and clustered the resulting RNA into neurons, glia, and "other" (i.e. non-ENS or low quality) clusters (see "Cell clustering overview"). To compare extractions, we calculated several quality metrics for each condition: (1) the proportion of recovered neurons, glia, and "other" cells, (2) the mean number of detected genes per cell, and (3) the mean ENS signature score (derived from markers of neurons and glia; see "Cell type signatures"). Conditions that yielded high-quality nuclei enriched in the ENS signature score were then identified.

Cell lineage dendrogram
As an auxiliary tool, cell subsets were organized on a dendrogram according to their transcriptional similarities (Fig. 2b, top). To construct this tree, we performed complete linkage clustering on the distance matrix corresponding to the mean transcriptional distances among all cell subsets, calculated using the variable genes from the log2(TP10K+1) expression matrix. These calculations were performed using the "hclust" and "dist" functions in R with default parameters.

Enteric neuron annotation and classification
We employed the following markers and considerations in annotating enteric neurons post hoc.

Broad segmentation of the mouse ENS
Broadly, neurons segmented into two major divisions comprising either cholinergic or nitrergic subsets. This broad division was correlated with several other genes. For example, the glial cell line-derived neurotrophic factor (GDNF) family receptors α1 (Gfra1) and α2 (Gfra2) segregate Nos1 and Chat expressing neurons, respectively. Gfra1/2 are co-receptors for the GDNF receptor, . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint Ret, which is necessary for ENS formation (156,157). Similar, Chat and Nos1 expressing subsets also differentially expressed the transcription factors (TFs), Casz1 and Etv1.

Annotating mouse inhibitory motor neurons
We annotated 7 subsets of putative inhibitory motor neurons (PIMNs) ,which have high Nos1 and Vip co-expression (161,162), and occupy one subtree of the dendrogram (Fig. 2b). In total, 73% of Vip-positive neurons co-express Nos1, which is consistent with the previously reported estimate of 75% (163,164).
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint Some of these subsets (3,5,6) are at least partly matched as discrete clusters in our data, whereas others (1,2,4) are not clearly observed in our atlas. PIMN7 is a potential candidate for the descending Vip + Chat + Nos1 + INs with ATP signaling (3 above), based on co-expression of Vip, Chat, Nos1, and various ATP transporters (e.g. SLC28a1, Slc28a2, Slc28a3, Slc29a1, Slc29a2, Slc29a3, Slc29a4; (169). PSN3 also express these genes, but their expression of Cck, Calca, and Calcb makes it unlikely they are interneurons. Three subsets of Chat + Penk + putative INs (PIN1-3) may reflect either descending Penk + INs (5 above; responsive to Sst), or ascending Chat + Penk + INs with ATP signaling (6 above). Because all express combinations of Sst receptors, they may be descending INs. However, given the substantial number of additional receptors expressed by all of these PINs (for 5HT, VIP, GAL, GLP, prolactin, prostaglandin E2, EGF and BMP) or some of them (e.g., catecholamine synthetic enzymes), they may not be INs. Finally, there was little to no evidence for other IN subtypes: we did not detect any serotonergic (5HT) neurons (1 above) in our sampling, consistent with previous observations (170); found no discernible cluster of Nos1 + Vip + Grp + Chatcells; and the only Chat + Sst + neurons we observed were the Calcb + PSN4 subset, which we interpret as a sensory neuron, not INs.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint

Annotating human interneuron subtype 2
Human PIN2s express two specific markers of mouse sensory neurons, CALCB and GRP, suggesting they may be misannotated sensory neurons. Another possibility is that PIN2s correspond to multiple neuron subtypes, which cannot be resolved with the number of neurons we profiled. Consistent with this possibility, PENK and CALCB expression are mutually exclusive within this subset (3 of 34 co-positive cells; expected = 7.24; Fisher test, p < 0.001).

Differential expression analysis
Differential expression (DE) tests were performed using MAST (173), which fits a hurdle model to the expression of each gene, consisting of logistic regression for the zero process (i.e. whether the gene is expressed) and linear regression for the continuous process (i.e. the expression level).

Acquisition and scoring of gene signatures
We compiled the following lists of marker genes for enteric neurons and glia from the literature (174). These gene signatures were then combined to construct an overall "ENS" signature score ( Fig. 1c; Supp. Fig. 2).
To prevent highly expressed genes from dominating a gene signature score, we scaled each gene vector of the log2(TP10K+1) expression matrix by its root mean squared expression across all cells (using the 'scale' function in R with center = FALSE). The signature score for each cell was then computed as the mean scaled expression across all genes in the signature.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint

Estimation of false discovery rate
Unless otherwise specified, false discovery rates were estimated with the Benjamini-Hochberg correction (175), using the "p.adjust" R function with the "fdr" method.

Matching human and mouse subsets
To map human neurons onto their mouse counterparts, we first trained a Random Forest classifier to distinguish each of 24 subsets of mouse neurons using the log2(TP10K+1) expression matrix of the mouse variable genes that also had human orthologs (see "Variable gene selection"). The Random Forest model was built with the R "randomForest" package using default parameters with the following exception: to account for class imbalances, we down-sampled each neuron class to the minimum class size while constructing each tree (implemented using the "sampsize" argument). In total, the "out of bag" estimate of the error rate (which estimates test rather than training error) was 8.8%, indicating that we can accurately distinguish among major neuron classes. Next, to extend this model to humans, we predicted the class for each human neuron using expression data for the human orthologs of the variable genes. All class assignments were then manually examined to ensure accurate predictions for all cells. Note that we also tested an alternative approach using a variational autoencoder (VAE) (176), but did not observe a noticeable improvement in performance (data not shown).

Identifying a core transcriptional program for major neuron classes
To identify conserved transcriptional signatures for each of the five major neuron classes (i.e., PEMN, PIMN, PIN, PSN, PSVN), we first mapped all mouse genes to their corresponding human orthologs (using only 1:1 orthologs), and combined both expression matrices according to these . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint genes. We next calculated DE orthologs within each major neuron class (see "Differential expression analysis"), then selected genes that were significantly DE in the combined dataset, the mouse dataset, and the human dataset (Supp. Table 5).

Using receptor-ligand pairs to infer cell-cell interactions
To identify cell-cell interactions, we mapped the FANTOM5 database of literature-supported receptor-ligand interactions (177) onto our lists of cell subset markers. Following a recent approach (CellPhoneDB (178)), we filtered this database to remove all integrins (defined using the HUGO "Integrin" gene group), which were involved in many non-specific cell-cell interactions.
We further required cell subset markers to be expressed in at least 5% of all cells within the subset.
For all networks, we quantified the interaction strength between two cell subsets as the number of unique receptors and ligands connecting them, resulting in an adjacency matrix summarizing all cell-cell interactions within the dataset. Statistical significance was then empirically assessed by permuting the receptors and ligands among all cell subsets, thus preserving the number of receptors and ligands encoded within each cell subset, and preserving the distribution of ligand-receptor connectivity (but possibly changing the connectivity between cell subsets, in those cases where one receptor has multiple ligands, or vice versa). After running 10,000 total permutations, p-values were computed as the number of times the edge strength in the permuted network was greater than or equal to the edge strength in the true network. To plot cell-cell interaction networks, we applied the Fruchterman-Reingold layout algorithm to a network defined using the -log10(p-value), using only the edges with p-value < 0.05. Although edge weights were used to generate the layout, they were removed from the final visualization for visual clarity (Fig. 3i).
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint

Defining disease risk genes
We compiled lists of genes that have been implicated by human genetics studies as contributing to risk for the following diseases: Hirschsprung's disease (HRSC), inflammatory bowel disease (IBD), autism spectrum disorders (ASDs), and Parkinson's disease (PD). Because human genetics studies do not always pinpoint a causative risk gene, we used the literature to identify sets of genes that are particularly likely to contribute to disease risk, including: 9 HRSC-associated genes (179), 106 IBD-associated genes (180), 28 ASD-associated genes (181), and 29 PD-associated genes (182).
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint   . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted September 4, 2019. ; https://doi.org/10.1101/746743 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available