Discrete and Continuous Cell Identities of the Adult Murine Striatum

The striatum is a large brain region containing two major cell types: D1 (dopamine receptor 1) and D2 (dopamine receptor 2) expressing spiny projection neurons (SPNs). We generated a cell type atlas of the adult murine striatum using single-cell RNA-seq of SPNs combined with quantitative RNA in situ hybridization (ISH). We developed a novel computational pipeline that distinguishes discrete versus continuous cell identities in scRNA-seq data, and used it to show that SPNs in the striatum can be classified into four discrete types that reside in discrete anatomical clusters or are spatially intermingled. Within each discrete type, we find multiple independent axes of continuous cell identity that map to spatial gradients and whose genes are conserved between discrete types. These gradients correlate well to previously-mapped gradients of connectivity. Using these insights, we discovered multiple novel spatially localized region of the striatum, one of which contains patch-D2 SPNs that express Tac1, Htr7, and Th. Intriguingly, we found one subtype that strongly co-expresses both D1 and D2 dopamine receptors, and uniquely expresses a rare D2 receptor splice variant. These results collectively suggest an organizational principal of neuron identity in which major neuron types can be separated into discrete classes with little overlap and no implied spatial relationship. However these discrete classes are then continuously subdivided by multiple spatial gradients of expression defining anatomical location via a combinatorial mechanism. Finally, they suggest that neuronal circuitry has a substructure at far higher resolution than is typically interrogated which is defined by the precise identity and location of a neuron.

Within each discrete type, we find multiple independent axes of continuous cell identity that map to 23 spatial gradients and whose genes are conserved between discrete types. These gradients correlate well to 24 previously-mapped gradients of connectivity. Using these insights, we discovered multiple novel spatially 25 localized region of the striatum, one of which contains patch-D2 SPNs that express Tac1, Htr7, and Th.

26
Intriguingly, we found one subtype that strongly co-expresses both D1 and D2 dopamine receptors, and 27 uniquely expresses a rare D2 receptor splice variant. These results collectively suggest an organizational 28 principal of neuron identity in which major neuron types can be separated into discrete classes with little 29 overlap and no implied spatial relationship. However these discrete classes are then continuously 30 subdivided by multiple spatial gradients of expression defining anatomical location via a combinatorial 31 mechanism. Finally, they suggest that neuronal circuitry has a substructure at far higher resolution than is 32 typically interrogated which is defined by the precise identity and location of a neuron.

36
Neuronal computation is typically thought of as the interactions of defined circuits between large groups 37 of cells called neuron types, which are interrogated by using marker genes that are expressed only in a 38 given type in a brain region. Single-cell transcriptomics has advanced these efforts by enabling the 39 identification of markers for hundreds of candidate neuron types across many brain regions [1][2][3][4][5][6][7][8][9][10] . Broad 40 neuron classes generally form clear, non-overlapping clusters, often distinguishable by a single marker 41 gene 1,11,12 . Within those classes, however, neuron types often overlap or form gradients 1,11,12 suggesting 42 the existence of a further layer of identity structure within each discrete type 14 . It is not clear how best to 43 analyze single cell data to reveal these differences, or whether this continuous heterogeneity can even be 44 used to classify neurons 13,14 . These controversies are partly due to the possibility that continuous 45 heterogeneities represent transient cell states or genetic differences between animals 13 . More generally, this phenomenon calls into question the paradigms of assigning cell types into discrete identities and defining neuronal circuits by large, homogeneous groups of cells.

49
We created a single-cell sequencing dataset of a brain region paired with an in situ analysis of every 50 subtype that provides a ground-truth reference for which neuronal identities are discrete and which are 51 varying continuously. To analyze this data we developed a novel algorithm that distinguishes between 52 continuous and discrete variation. We used this algorithm to show that striatal heterogeneity is comprised 53 of four discrete subtypes of SPNs; three of these discrete subtypes vary continuously along multiple 54 shared axes of gene expression, and these shared continuous axes encode for spatial location. We also 55 find spatial compartments that are limited to a single discrete type, and we discovered a novel region of 56 the striatum, which we call the ventromedial patch and is composed of D2 SPNs that co-express Tac1, 57 Penk, Htr7, and Th. We used full-length cDNA to show that a subtype, which we previously referred to as 58 D1-Pcdh8, co-expresses Drd1a and the rare "short" isoform of the D2 receptor 12 . Due to this hybrid-like 59 expression, we name it the D1H SPN.

61
Our results reveal an important organizing principle of neuronal identity and circuits. Continuous 62 variation of identity has been observed previously to relate to spatial location in simple systems 63 comprised of single cell types or single spatial axes [15][16][17][18] . Here we show that cell types are subdivided by 64 multiple, independent spatial gradients of expression. We further show that the genes underlying these 65 spatial axes can be conserved between discrete subtypes, allowing for the inference of spatial position of 66 multiple cell types in complex tissues from a small number of reference markers. This also provides a 67 fundamental explanation for why combinations of genes often are required to specify neuronal cell types: 68 a cell is best described by the combination of its discrete identity and its position along each independent 69 spatial axis. This discovery has obvious implications for the choice of strategy to establish comprehensive 70 cell atlases of model organisms.

72
More broadly, our results have implications for neuronal circuits and computation. We find that neurons 73 of the same type have highly localized gene expression patterns due to multiple spatial gradients of gene 74 expression. We find that previously mapped connectivity correlates well to these gene expression patterns, 75 suggesting that neuronal circuits can be subdivided into very small units depending on gene expression.

76
This contrasts with existing models of neuronal circuits, which are traditionally defined by large groups of 77 cells with shared regulatory regions from single marker genes. Our results suggest that developing 78 combinatorial, spatially-precise approaches to targeting neurons may substantially improve our 79 understanding of neuronal computation.

81
We previously found that most cells clearly classified as D1 or D2 SPNs 12 . However, we also reported 82 several novel subtypes that co-expressed Tac1 and Penk (D1-Pcdh8, D2-Htr7) as well as continuous 83 gradients of expression within the major D1 and D2 subtypes. Importantly, we observed that some of 84 subtypes seemed more discrete and defined by binary on/off patterns of gene expression, and other 85 heterogeneity was more continuous and defined by gradients of gene expression. However, we were 86 limited in our ability to clearly distinguish these distributions and determine the exact number of 87 intermediates by our limited number of cells. We therefore prepared 1343 single dissociated D1-tdTom+ 88 or D2-GFP+ SPNs from 5 adult mouse brains using Smart-Seq2 full-length amplification, and analyzed 89 1211 cells with at least 1e5 mapped reads and 1000 detected genes (Sup Fig 1A,B).

91 92
Improved detection of discrete SPN types

93
We applied a standard clustering pipeline, PCA of the top variable genes followed by tSNE projection 94 and Louvain clustering on a shared-nearest-neighbor (SNN) graph. This pipeline produced two apparently 95 discrete clusters on a tSNE mapping. We first assessed the continuity of the clusters using partial 96 approximate graph abstraction (PAGA) 19 . We initially found that none of the clusters separated into 97 discrete types (Fig 1A-i). However, the gene expression underlying the D1/D2 difference appeared to 98 change sharply (Fig 1B, Sup Fig 1E). Previously, we showed that the gene expression distribution 99 underlying the D1/D2 separation was strongly bimodal and nearly all cells could be classified as either D1 100 or D2 12 . Additionally, we showed that we could considerably decrease the amount of technical variability 101 in a PCA dimensionality reduction by truncating the PC loadings so that only the top PC-associated genes 102 contributed to PCA (truncated PCA) 20 . We applied this modification to PCA and PAGA, and this resulted 103 in a clear, discrete separation between D1 and D2 SPNs (Fig 1A-ii). We found evidence that this 104 adjustment reduced the average contribution of batch (sequencing plate) variability to PCs (Sup Fig 1L), 105 possibly because batch effects are caused by small changes consistent across thousands of genes. We also 106 removed PCs that likely did not represent stable cell identity, like immediate early gene expression or 107 batch effect (Sup Fig F-I). After this computational optimization, we identified 4 discrete clusters: D1 108 SPN, D2 SPN, a subtype expressing Drd1a and Rreb1 that we identified as SPNs located in the islands of 109 Calleja (ICj), and a subtype expressing Nxph4 and Pcdh8 that we had previously reported (D1-Pcdh8) 110 ( Fig 1B). We then re-applied dimensionality reduction and clustering to each of the 4 discrete subtypes.

111
We found that truncated PCA-based clustering did not reveal any further discrete subtypes; however, 112 truncated PCA of D2 subtypes considerably reduced batch effect (Sup Fig 1J). We compared cluster 113 marker genes to a publicly-available RNA ISH database 40 to map the heterogeneity within D1 and D2

114
SPNs to anatomical regions of the striatum (

120
Mapping the discrete cell types of the striatum.

121
We then sought to map the discrete cell types of the striatum and understand whether their in situ 122 distributions were also discrete. We quantified the discrete identity of SPNs by RNAscope in situ 123 hybridization of their markers. We first stained and quantified D1 vs. D2 identity by the log ratio of Tac1   124 to Penk expression (Fig 2A). As expected, D1 and D2 cells were spatially intermingled throughout most 125 of the striatum ( Fig 2B). We found that this identity was very strongly bimodal with few intermediates, 126 corresponding to the discrete nature of D1 vs. D2 identity (with a small number of co-expressing cells that 127 correspond to other subtypes) ( Fig 2C).

129
We then analyzed the spatial distribution of the small discrete populations in our analysis. Our 130 computational analysis predicted that the islands of Calleja (ICj) SPN subtype was discretely separated 131 from the rest of the SPN types ( Fig 1A-ii). Interestingly, this subtype is not present in previously 132 published single-cell atlases of the nervous system. We quantified the identity of the ICj subtype as in log intensity space ( Fig 2E). We found that expression of this identity 134 changed in a discrete step along a spatial axis from the olfactory tubercle to the ICj (Fig 2F, G). This 135 matches with previous anatomical observations that the islands of Calleja appeared distinct from the rest 136 of the striatum, supporting the validity of our scRNAseq and in situ analyses 21 .

138
Finally, we stained for the Tac1/Penk-co-expressing D1H subtype, which is comprised of two continuous 139 subtypes (Nxph4+ or Tac2+). D1H-Nxph4 cells were intermingled with classical D1 and D2 cells in the 140 dorsal striatum and clustered in ventrolateral structures (Fig 2I-K). We quantified D1H identity in situ as in log-intensity space and found a discrete step change in D1H identity from the 142 ventral striatum to the D1H cluster (Fig. 2L). We performed a similar analysis for the Tac2+ D1H subtype, 143 and saw it exclusively in discrete ventromedial clusters (Fig 2N-P (Fig 3A, B). Generally, we found that continuous subtypes were spatially-adjacent. For 152 example, the subtype mapping to the medial shell was continuous only with the NAcc subtype and not the 153 CPu, which corresponds to the fact that the medial shell is physically adjacent only to the NAcc (Fig 3C, 154 D). This led us to believe that the gene expression distributions underlying continuous subtypes 155 represented gradients of expression. We found a signature of genes correlated to Cnr1 and Crym shared 156 between 3 discrete subtypes in the scRNAseq data ( Fig 3E).

166
We then found that genes correlated to the patch-matrix markers Kremen1 and Id4 formed a gradient in 167 both D1 and D2 cells that was largely orthogonal to the Cnr1-Crym gradient (Fig 1D, Fig 3I). We stained 168 for the patch-matrix organization using probes to Kremen1, Sema5b (which was highly correlated to 169 Kremen1), and Id4 ( Fig 3I). We quantified patch-matrix identity as in log-170 intensity space. The patches appeared to lack discrete boundaries, but the 2D spatial arrangement was too 171 complex to easily map to a single spatial axis (Fig 3J, K). Therefore, to determine whether the patch-172 matrix organization was discrete or not, we manually identified boundaries of the patches and performed 173 a Monte Carlo simulation of discrete patches (Fig 3K). We quantified the degree of spatial organization as 174 the correlation between a cell and its second-nearest-neighbor (ߩ ଶ ) ( Fig 3L). We found that the patch-175 matrix was much more spatially organized than a discrete-patch model would predict ‫(‬ ൏ 0 . 0 0 1 ) (Fig   176   3L, M). This indicated that patch-matrix identity is not binary and instead depends continuously on 177 distance from the center of a patch. We thus present the first proof that the patch-matrix distribution is a 178 continuous spatial relationship at the single cell level.

180
We also observed continuous subtypes that predicted anatomical location over very small length scales.

181
One of the D1-only continuous subtypes localized to the "ruffles", small protrusions of the olfactory 182 tubercle. In the scRNA-seq data, a small subgroup of cells expressing Nts was most directly continuous 183 with the D1 population expressing Cxcl14, a marker enriched in the OT ruffles (Fig 1D, Fig 3N). To 184 determine if these were gradients, we stained for Cxcl14, Nts, and Synpr (a gene de-enriched in the OT) 185 (Fig 3O, P). We found that in situ expression levels of these genes changed continuously from the ventral 186 NAcc to the OT (Sup Fig 3F, G). Their expression again changed continuously along a spatial axis from 187 the flat OT to the OT ruffles over just 150 µm (Fig 3G, H). This demonstrates that continuous 188 relationships in scRNAseq data predict anatomical position even over very small length scales. We also 189 identified a novel expression gradient within the OT-ruffles where Tnnt1 increased with proximity to the 190 pial surface only in the lateral OT-ruffles (Fig 1D). Previous single-cell atlases of the brain did not 191 analyze the ventrolateral and OT regions of the striatum 10 and therefore omitted these subtypes.

195
As reported previously, a subtype of D2 SPNs (D2-Htr7 or D2H) co-expresses Penk and Tac1 10,12 . Unlike 196 the D1H population, this population exclusively expresses Drd2 and not Drd1a. Our scRNAseq analysis 197 predicts that it is continuous with the D2-patch cells, but not the D2-matrix cells (Fig 4A). This is 198 supported by the fact that there is strong overlap between marker genes defining the D2-patch and D2H 199 cell types (Fig 4A). Since the D2-patch cells have a distinct spatial distribution, and we have shown that 200 continuous subtypes are always anatomically related, we hypothesized that the spatial distribution of the 201 D2H cells would also be patch-like. We thus stained for Th, Htr7, and Tac1. Htr7+/Tac1+ cells appeared 202 in patchy distributions, but more ventromedial than most of the patches marked by Kremen1 or Sema5b 203 ( Fig 3J, Fig 4B, C). Htr7+/Tac1+ cells were 14x more likely to be Th+ than Htr7-/Tac1+ cells, matching 204 the scRNA-Seq data (Fig 4A,D). Our scRNAseq analysis predicted that Th would mark only the most 205 distinct tip of the D2H continuum ( Fig 4A). Indeed, triple-positive cells were limited to the most 206 ventromedial part of the D2H-patches, particularly the medial shell ( Fig 4B). We therefore define a novel 207 spatial compartment of the striatum composed of ventromedial patch-regions containing Tac1+ (Fig 1C). The subtype with highest co-expression of Drd1a and Drd2 was the D1H-Nxph4 219 subtype ( Fig 1C). Drd1a/Drd2 co-expression in this subtype was not observed in recent droplet 220 microfluidics atlases, possibly due to the low detection rate of those technologies 10,22 . The D2 receptor is 221 expressed in two isoforms, short (D2S) and long (D2L) 23 (Fig 5B). We found that just 8% of isoform-222 specific reads mapped to D2S in D2 SPNs, consistent with previous literature (Fig 5C) 23 . Expression was 223 similar in the novel ventromedial-patch subtype D2-Htr7, which co-expresses Tac1 and Penk, but not 224 Drd1a and Drd2 (Fig 1B, Fig 5C). Intriguingly, we found that 34% of isoform-specific reads mapped to 225 D2S in the D1H-Nxph4 subtype (Fig 5C). Therefore we find that the D2S isoform is specific to D1R/D2R 226 co-expression.

228
To understand the molecular pathways driving this phenomenon, we investigated transcription factors 229 (TFs) and splicing factors (SFs) enriched in the D1H-Nxph4 subtype. We found 3 SFs enriched in D1H-

258
However, these results have not been combined into a broad framework that can be applied to more 259 complex systems.

261
Our work addresses this challenge by providing a clear framework for relating neuronal identity and 262 anatomy, and for revealing in greater detail than ever before how discrete and continuous heterogeneity 263 relates to anatomy. In the striatum, we showed that discrete types have no implied spatial relationships to    518  519  520  521  522  523  524  525  526  527  528  529  530  531  532  533  534  535