Whole-cortex in situ sequencing reveals peripheral input-dependent cell type-defined area identity

The cortex is composed of neuronal types with diverse gene expression that are organized into specialized cortical areas. These areas, each with characteristic cytoarchitecture (Brodmann 1909; Vogt and Vogt 1919; Von Bonin 1947), connectivity (Zingg et al. 2014; Harris et al. 2019), and neuronal activity (Schwarz et al. 2008; Ferrarini et al. 2009; He et al. 2009; Meunier et al. 2010; Bertolero et al. 2015), are wired into modular networks (Zingg et al. 2014; Harris et al. 2019; Huang et al. 2020). However, it remains unclear whether cortical areas and their modular organization can be similarly defined by their transcriptomic signatures and how such signatures are established in development. Here we used BARseq, a high-throughput in situ sequencing technique, to interrogate the expression of 104 cell type marker genes in 10.3 million cells, including 4,194,658 cortical neurons over nine mouse forebrain hemispheres at cellular resolution. De novo clustering of gene expression in single neurons revealed transcriptomic types that were consistent with previous single-cell RNAseq studies(Yao et al. 2021a; Yao et al. 2021b). Gene expression and the distribution of fine-grained cell types vary along the contours of cortical areas, and the composition of transcriptomic types are highly predictive of cortical area identity. Moreover, areas with similar compositions of transcriptomic types, which we defined as cortical modules, overlap with areas that are highly connected, suggesting that the same modular organization is reflected in both transcriptomic signatures and connectivity. To explore how the transcriptomic profiles of cortical neurons depend on development, we compared the cell type distributions after neonatal binocular enucleation. Strikingly, binocular enucleation caused the cell type compositional profiles of visual areas to shift towards neighboring areas within the same cortical module, suggesting that peripheral inputs sharpen the distinct transcriptomic identities of areas within cortical modules. Enabled by the high-throughput, low-cost, and reproducibility of BARseq, our study provides a proof-of-principle for using large-scale in situ sequencing to reveal brain-wide molecular architecture and to understand its development.


Abstract 1
The cortex is composed of neuronal types with diverse gene expression that are organized into specialized 2 cortical areas. These areas, each with characteristic cytoarchitecture (Brodmann 1909 compositions of transcriptomic types, which we defined as cortical modules, overlap with areas that are 15 highly connected, suggesting that the same modular organization is reflected in both transcriptomic 16 signatures and connectivity. To explore how the transcriptomic profiles of cortical neurons depend on 17 development, we compared the cell type distributions after neonatal binocular enucleation. Strikingly, 18 binocular enucleation caused the cell type compositional profiles of visual areas to shift towards 19 neighboring areas within the same cortical module, suggesting that peripheral inputs sharpen the distinct 20 transcriptomic identities of areas within cortical modules. Enabled by the high-throughput, low-cost, and 21 reproducibility of BARseq, our study provides a proof-of-principle for using large-scale in situ sequencing 22 to reveal brain-wide molecular architecture and to understand its development. 23 24

Main text 1
The vertebrate brain is organized into subregions that are specialized in function and distinct in 2 cytoarchitecture and connectivity. This spatial specialization of function and structure is established by 3 developmental processes involving intrinsic genetic programs and/or external signaling (Cadwell et al. 4 2019). Both intrinsic and extrinsic developmental processes drive the expression of specific sets of genes, 5 which together specify the cell fates and establish specialized cellular properties (Hobert 2008). Although 6 gene expression can change during cell maturation and remains dynamic in response to internal cellular 7 conditions and external stimuli, a core transcriptional program that maintains the cellular identity usually 8 remains steady in mature neurons (Zeng and Sanes 2017). Thus, resolving the expression of core sets of 9 genes that distinguish different types of neurons provides insight into the functional and structural 10 specialization of neurons. 11 Many large brain structures are spatially organized into divisions, or modules, within which neurons are 12 more similar in morphology, connectivity, and activity. In the cortex, these modules usually involve a set 13 of adjacent cortical areas that are highly interconnected (Zingg et  . This modular organization is perturbed in Alzheimer's disease (Stam et 16 al. 2007), mild cognitive impairment (Yao et al. 2010), schizophrenia (Lynall et al. 2010), and depression 17 (Fingelkurts et al. 2007), suggesting that cortical modules are critical to normal functions. In contrast to this 18 modular organization in activity and connectivity, many cortical areas share the same medium-grained and 19 fine-grained transcriptomically defined neuronal types (Tasic et al. 2018; Yao et al. 2021b). Whether and 20 how the areal and modular organization of cortical connectivity and activity is reflected in the 21 transcriptomic signatures of areas is unknown. 22 To address this question, here we apply BARseq (Chen et  organization of gene expression at cellular resolution over whole brain structures, such as the cortex. 36 Here we use BARseq as a standalone technique for sequencing gene expression in situ, and scale it up to a 37 brain-wide scale in nine animals, with or without binocular enucleation, to resolve the distribution of 38 neuronal populations and gene expression across the cortex. We generate high-resolution maps of 10. 3 39 million cells with detailed gene expression, including 4,194,658 cortical cells. We find that although most 40 neuronal populations are found in multiple cortical areas, the composition of neuronal populations is distinct 41 across areas. The neuronal compositions of highly connected areas are more similar, suggesting a modular 42 organization of the cortex that matches cortical hierarchy and modules defined by connectivity in previous 43 studies ( confusion, we first define the cell type nomenclature we will use in this paper. The highest hierarchical 8 level we use, or H1 type, divides neurons into excitatory neurons, inhibitory neurons, and other cells; this 9 level is referred to as "class" level in many studies. Within each H1 type, we subdivide neurons into H2 10 types, which are sometimes referred to as "subclasses" (Yao et al. 2021a; Yao et al. 2021b). In particular, 11 cortical excitatory neurons can be divided into nine H2 types that are shared across most cortical areas. This 12 division refines the traditional projection-based IT (intra-telencephalic) /PT (pyramidal tract) /CT 13 (corticothalamic) neuron (Harris and Shepherd 2015) classification as follows: PT and CT neurons 14 correspond to L5 ET (extra-telencephalic neurons) and L6 CT neurons, respectively, whereas IT neurons 15 are subdivided into L2/3 IT, L4/5 IT, L5 IT, L6 IT, NP (near-projecting neurons), Car3, and L6b. This 16 division follows recent single-cell RNAseq studies but differs from the classical tripartite of IT/PT/ CT 17 neurons, which were defined largely based on long-range projections and failed to capture differences 18 among some transcriptomically distinct cell types, such as NP, Car3, and L6b. Each H2 type can be further 19 divided into H3 types, which correspond to "cluster" or "type" level in some studies (Yao et al. 2021a; Yao 20 et al. 2021b). Previous studies have shown that H1 and H2 types are largely shared across most cortical 21 areas, but the expression of many genes is localized to specific parts of the cortex both during development 22 (O'Leary et al. 2007; Cadwell et al. 2019) and in the adult (Lein et al. 2007). Clusters at the H3 level 23 appeared to be enriched in neurons dissected from different parts of the cortex in single-cell RNAseq studies 24 (Tasic et al. 2018; Yao et al. 2021a). However, because these dissections could only be performed at a 25 coarse resolution, the detailed distribution of neuronal populations at this higher granularity across cortical 26 areas remains unclear. 27 To assess the distribution of neuronal populations across the cortex, we first generated a pilot dataset by 28 applying BARseq to interrogate the expression of 104 cell type marker genes (Supplementary Table 1) in 29 40 hemi-brain coronal sections that cover the whole forebrain in one animal (Fig. 1A, B). We applied the 30 same approach that we previously used to resolve cortical excitatory neuron types in the motor cortex (Sun 31 et al. 2021). Briefly, we selected marker genes that were optimized for distinguishing excitatory neuronal 32 types in the cortex (see Supplementary Note 1). We evaluated this gene panel using a comprehensive 33 single-cell RNAseq dataset from the cortex (Yao et al. 2021b), and found that these genes performed 34 similarly to the full transcriptome in distinguishing H2 (i.e. subclass-level) and H3 (i.e. cluster-level) types 35 ( Fig. 1C; ED Fig. 1; see Supplementary Note 1). We used up to 12 padlock probes to target each gene; 36 each probe carried a 7-nt gene identification index (GII) that uniquely identified the gene. These GIIs were 37 designed to have a minimum hamming distance of 3-nt to allow for error correction. We further included 5 38 blank GIIs that were not present in the padlock probes as negative controls when decoding the GIIs; these 39 blank GIIs allowed us to control and evaluate false detection rates. We decoded the GIIs from seven rounds 40 of sequencing using BarDensr (Chen et al. 2021a) while maintaining an optimal false detection rate (~5% 41 estimated using the blank GIIs). We registered our data to the Allen Mouse Brain Common Coordinate 42 Framework v3 (CCF v3)(Wang et al. 2020) using a semi-manual procedure that utilized QuickNii, 43 Visualign (Puchades et al. 2019), and custom python scripts (see Data and code availability). Three highly 44 expressed high-level marker genes (Slc17a7 for excitatory neurons, Gad1 for inhibitory neurons, and 45 Slc30a3 for IT neurons) were detected by hybridization rather than by sequencing (Fig. 1B). We segmented 1 cell bodies with Cellpose using DAPI as the nucleic channel in Cellpose and sequencing signals from all 2 imaging channels as the cytoplasmic channel in Cellpose (Stringer et al. 2020) (Fig. 1B), which resulted in 3 2,167,762 cells. We then removed cells with insufficient number of reads (20 reads/cell and 5 genes/cell 4 minimum), resulting in 1,259,256 cells after quality control (QC, see Methods) with a mean of 60 unique 5 reads/cell and 27 genes/cell (Fig. 1D, E). 6 1 At a gross anatomical level, many genes were differentially expressed across major brain structures and 2 cortical layers (Fig. 1A). These expression patterns were consistent with the patterns of in situ hybridization 3 in the Allen Brain Atlas (Lein et al. 2007) (Fig. 1F). For example, classical cortical layer-specific markers, 4 including Cux2, Fezf2, and Foxp2, were expressed in layer 2/3, layer 5/layer 6, and layer 6, respectively. 5 Rorb, a layer 4 marker, was seen in layer 4 throughout the cortex except in the motor cortex and medial 6 areas, which lack a classically defined layer 4. Scnn1a was expressed mostly in the retrosplenial cortex and 7 primary sensory areas, with the strongest expression in the primary visual cortex and primary 8 somatosensory cortex. Thus, our pilot dataset recapitulated the known spatial distribution of gene 9 expression. 10

11
De novo clustering reveals neuronal subpopulations that are consistent with reference transcriptomic 12 types 13 We next identified transcriptomic types of individual neurons based on single-cell gene expression. 14 Generally, two approaches can be used to identify cell types in new transcriptomic datasets. In the first 15 approach, we can map individual neurons in a new dataset directly to clusters in reference single-cell 16 RNAseq datasets to determine cell type identities. This approach can match small datasets to cell types 17 discovered in much larger, higher-resolution datasets (Cadwell et  We applied hierarchical clustering to separate neuronal populations at H1, H2, and H3 levels ( Fig. 2A;  and Gad1 (ED Fig. 2A). Of these 1.2 million cells, 517,428 were in the cortex and were the focus of the 30   neurons in the cortex were inhibitory neurons in our dataset (427,766 excitatory neurons, 83,394  3 inhibitory neurons, and 6,268 other cells). Because the excitatory and inhibitory neurons were defined by 4 clustering on the expression of all genes, a small fraction of them did not have detectable Slc17a7 (3,800  5 of 427,766 excitatory neurons, 0.9%) or Gad1 expression (100 of 83,394 inhibitory neurons, 0.1%). 6 Because we only sampled Slc17a7 and Gad1 for excitatory and inhibitory neuron markers, the excitatory 7 neurons identified were dominated by cortical neurons, although we also saw neurons in the pons and the 8 epithalamus in this group. The third group of cells, which expressed neither Slc17a7 nor Gad1, included 9 subpopulations of subcortical neurons (e.g., the midbrain and the thalamus) and non-neuronal cells (e.g., 10 glial cells, immune cells, and epithelial cells). We expect this group of cells to be under-sampled, because 11 these cells may not express cortical cell type marker genes that we probed at sufficient levels to pass quality 12 control (Fig. 1D, E). Based on the fraction of excitatory neurons that express both Slc17a7 and Gad1, we 13 estimated that the probability of segmentation errors in which two neighboring cells were merged together, 14 i.e. doublet rate, to be between 5% and 7% (ED Fig. 2B, C; see Supplementary Note 2). The 24 clusters, 15 comprising the three H1 types, largely corresponded to coarse anatomical structures in the brain (Fig. 2B). 16 For example, different clusters were enriched in the lateral group and ventral group of the thalamus, the 17 intralaminar nuclei, the epithalamus, the medial, basolateral, and lateral nuclei of the amygdala, the 18 striatum, and the globus pallidus (Fig. 2B). These results recapitulated the clear distinction of transcriptomic 19 types across anatomically defined brain structures as observed in whole-brain single-cell RNAseq studies 20 ( We then re-clustered the excitatory and inhibitory neurons separately into H2 types ( Fig. 2C; ED Fig. 2D) 22 to improve the resolution of clustering. At this level, we recovered major inhibitory neuron subclasses 23 (Pvalb, Sst, Vip/Sncg, Meis2-like, and Lamp5), all excitatory subclasses that are shared across the cortex 24 (L2/3 IT, L4/5 IT, L5 IT, L6 IT, L5 ET, L6 CT We then re-clustered each excitatory H2 type into H3 types. To quantify how well H3 types corresponded 36 to reference transcriptomic types identified in previous single-cell RNAseq studies, we used a k-nearest 37 neighbor-based approach to match each H3 type to leaf-level clusters in a previous single-cell RNAseq 38 dataset (Yao et al. 2021b) (see Methods). We found that cortical H2 types had a one-to-one correspondence 39 to subclass-level cell types in the single-cell RNAseq data (Fig. 2E). Within each H2 type, the H3 types 40 differentially mapped onto single or small subsets of leaf-level clusters in the single-cell RNAseq data ( dataset resolved fine-grained neuronal subpopulations corresponding to clusters obtained previously using 43 single-cell RNAseq data. 44 Both H2 types and H3 types were organized in an orderly fashion along the depth of the cortex. H2 types 1 were concentrated in distinct layers, whereas H3 types within a H2 type were enriched in finer divisions 2 within each layer (p < 1×10 -61 using one-way ANOVA to compare the laminar positions of all H3 types 3 within each H2 type after Bonferroni correction; Fig. 2F). For example, multiple H3 types of L2/3 IT, L4/5 4 IT, and L6 IT clearly occupied distinct sublayers in the somatosensory cortex (Fig. 2G) their matching clusters in single-cell RNAseq datasets were also found in similar cortical areas (Fig. 2H,  4 ED Fig. 2G, H, ED Fig. 3). For example, the H3 type "PT AUD" and its corresponding single-cell RNAseq 5 cluster (242_L5_PT CTX) were both enriched in the lateral cortical areas (TEa-PERI-ECT) and auditory 6 cortex (AUD), whereas the H3 type "PT CTX P" and its corresponding single-cell RNAseq clusters 7 (245_L5_PT CTX and 259_L5_PT CTX) were all enriched in the visual cortex (VIS). Thus, at this coarse 8 spatial resolution, our data recapitulated previously observed areal distribution of cortical excitatory 9 neurons. 10 To summarize, our pilot dataset resolved fine-grained transcriptomic types of cortical excitatory neurons 11 that were consistent with previous single-cell RNAseq  area-specific gene expression. Three sources of variation could contribute to the differences in gene 4 expression across areas (Fig. 3A). First, the composition of H2 types may drive the differences in gene 5 expression across the cortex (Fig. 3Aa, the cell type composition model). For example, the ratio of H2 type 6 X to type Y might be high in visual cortex but low in motor cortex, so genes that are expressed more highly 7 in X than in Y will be more highly expressed in visual cortex. Second, the expression of some genes may 8 vary across space regardless of H2 type, i.e., they change consistently across space in multiple H2 types 9 (Fig. 3Ab, the spatial gradient model). In this model, gene A may be more highly expressed in the visual 10 cortex than in the motor cortex in both H2 types X and Y. Finally, the expression of some genes may vary 11 across space in an H2-specific manner (Fig. 3Ac, the area-specialized cell type model). For example, gene 12 A may be more highly expressed in the visual cortex than in the motor cortex in H2 type X. Our pilot dataset 13 suggests that all three models contribute to the spatial variation of gene expression in the cortex, but the 14 model that contributes most to the variation for each gene can vary. 15 To determine the contribution of each source to the variation of gene expression across areas, we discretized 16 the cortex on each coronal slice into 20 spatial bins. We "un-warped" the cortex on each slice (see Methods  17 for details on unwarping), then divided the un-warped cortex into bins that spanned all cortical layers and 18 had the same number of cells across bins in the same slice (ED Fig. 4A). We then assessed how much the 19 variation in bulk gene expression across bins can be explained by space or by composition of H2 or H3 20 types using one-way ANOVA (Fig. 3B, C; see Methods). We found that variations in many genes were 21 strongly explained by the composition of H2 types; these patterns were consistent with the cell type 22 composition model (Fig. 3Aa). An example gene of this category is Ctgf, which is specifically expressed 23 in L6b neurons at a consistent level across space (Fig. 3D, top). Thus, variations in the expression (up to 24 80%) of Ctgf across space were largely explained by the fraction of L6b neurons in each bin rather than by 25 variation in gene expression within cells. Other genes, in contrast, were largely explained by the 26 composition of H3 types rather than the composition of H2 types (Fig. 3C, i.e. the area-specialized cell 27 type model in Fig. 3Ac). Many genes in this category were also highly spatially variable (Fig. 3C, rho = 28 0.23), which suggest that H3 types were likely differentially distributed across the cortex (e.g. Nnat, marker 29 of lateral areas, Fig. 3D, middle). Finally, some genes displayed high spatial variability, but relatively low 30 H2 and H3 variability. These genes were usually expressed in multiple H2 types (e.g. Tenm3, Fig. 3D, 31 bottom) and varied consistently in space across these H2 types, suggesting a general spatial gradient that is 32 independent of H2 types. The spatial patterns of these genes were consistent with the spatial gradient model 33 (Fig. 3Ab). 34 Because the spatial patterns of many genes were similar across genes and H2 types, we sought to extract 2 basic spatial components that were shared across genes and H2 types using non-negative matrix 3 factorization (NMF) (Lee and Seung 1999). Briefly, we performed NMF on the residuals of gene expression 4 of spatial bins after accounting for differences in the composition of H2 types in each bin. We extracted ten 5 NMF components, seven of which captured spatial variations in gene expression (the other three 6 components captured slice-specific technical variability and were not used for subsequent analyses; see 7 Supplementary Note 4 and ED Fig. 4B, C). We found that the majority of NMF components were 8 expressed not in broad gradients along major spatial axes, but rather in areas that were functionally related 9 and highly interconnected ( Fig. 3E; ED Fig. 4D). For example, NMF5 was expressed mostly in the visual 10 areas, whereas NMF8 was expressed in somatosensory areas. Other NMF modules, including NMF1 11 (medial areas) and NMF10 (lateral areas), were expressed in combinations of areas that were functionally 12 distinct but also highly interconnected ( (Fig. 3D, bottom); Tenm3 was strongly associated with NMF5 ( Fig.  17 3F), which was also expressed in the same sets of areas (Fig. 3E). Thus, the finding that gene expression 18 varies along sets of interconnected areas suggests an intriguing link between gene expression and intra-19 cortical connectivity across areas. 20

Cortical areas have distinct compositional profiles of H3 types. 22
Because the spatially varying NMF modules were obtained after controlling for variability in the 23 composition of H2 types, but not H3 types, we hypothesized that these modules reflected differences in the 24 composition of H3 types across cortical areas. Consistent with this hypothesis, each H3 type was enriched 25 in a small subset of NMF modules ( Fig. 4A; see Methods). All H2 types contained at least one H3 type 26 that was associated with NMF modules that were expressed in the medial and lateral areas (NMF 1, 3, 7, 27 10), and one to four H3 types that were associated with NMF modules that were expressed in subsets of the 28 dorsal cortex, including the motor, somatosensory, and visual areas (NMF 4,5,8). The associations between 29 H3 types and the NMF modules were different across H2 types, suggesting that different H2 types were 30 specialized to different degrees at the H3 level. For example, the H3 types of L5 ET neurons had strong 31 associations with individual NMFs, whereas H3 types of NP and L6b neurons showed little specialization 32 within the dorsal cortex. 33 Consistent with the association in gene expression between NMF modules and H3 types, the H3 types also 34 overlapped with their corresponding NMF modules in space ( Fig. 4B; ED Fig. 6; see Methods). For 35 example, L4/5 IT ML-P was associated with NMF5_VIS and was enriched most strongly in visual cortex; 36 similarly, L4/5 IT P-L/LA was associated with NMF1_TEa_AUD and NMF7_SSs_AUD and was most 37 highly enriched in auditory cortex and temporal association areas. Consistent with the expression patterns 38 of NMF modules, many of these sets of areas were largely within cortical modules that were defined by that are highly interconnected. 42

Figure 4. The composition of cell types is distinct across cortical areas. (A) AUROC of the enrichment for top NMF genes in each H3 type (see Methods). (B) Overlap between the spatial patterns of NMF module expression and the spatial distribution of H3 types. (C) Histogram showing the number of areas (out of 37 areas) that an H3 type was in. An H3 type was considered present in an area if that area contained at least 3% of that H3 type. (D) The spatial distribution of example L4/5 IT and L5 ET H3 types across the cortex plotted on cortical flatmaps. Colors indicate relative cell counts in each cubelet. Gray lines delineate cortical areas. (E) The distribution of L4/5 IT and L5 ET H3 types in example coronal sections. Dashed lines indicate area borders in CCF. Magnified views of the dashed boxes are shown on the right. Brackets in the top plot indicate barrels in the barrel cortex.
To further assess the areal distribution of H3 types, we discretized the cortex on each coronal slice into 1 "cubelets" with similar widths along the mediolateral axis across all slices. Each cubelet spanned all cortical 2 layers and was about 100 μm to 200 μm wide along the curvature of the cortex, and 20 μm thick along the 3 A-P axis (i.e. the thickness of each coronal section; see Methods; ED Fig. 4A). These cubelets were of 4 similar physical sizes and were narrower on the mediolateral axis than the spatial bins used in the previous 5 analysis; this higher lateral resolution makes it easier to assign cubelets to individual cortical areas. We 6 found that H3 types were shared by multiple cortical areas and not specific to any single area (each H3 type 7 was found in 9 ± 3 areas, median ± standard deviation; Fig. 4C, D; Supp. Fig. 1). Thus, the distinctness of 8 neighboring cortical areas cannot be explained simply by the presence or absence of an area-specific H3 1 type. However, we noticed that the compositional profiles of H3 types often changed abruptly near area 2 borders defined in CCF ( Fig. 4E; ED Fig. 7A). Most salient changes occurred at the lateral and medial 3 areas, which are consistent with single-cell RNAseq data (Yao et al. 2021b). Within the dorsolateral cortex, 4 although neighboring cortical areas sometimes shared sets of H3 types, their compositions usually changed 5 abruptly at or near area borders. For example, three L4/5 IT types, including L4/5 IT UL, L4/5 IT ML-P, 6 and L4/5 IT DL, were found in three adjacent somatosensory areas (trunk area, SSp-tr; barrel cortex, SSp-7 bfd; secondary somatosensory cortex, SSs; Fig. 4E, top row), but the numbers of neurons of each type were 8 distinct across the three areas. L4/5 IT UL was found in only small numbers in SSp-tr but expanded in both 9 the number of neurons and their laminar span in the barrel cortex. Furthermore, L4/5 IT UL neurons were 10 clustered along the mediolateral axis into structures that resembled barrels. In the secondary somatosensory 11 cortex, the distribution of L4/5 IT UL lost the barrel-like patterns but remained present in substantial 12 numbers. In contrast, L4/5 IT DL became more dominant relative to L4/5 IT ML-P. Similarly, in L5 ET 13 neurons, PT CTX P was more dominant in SSp-tr, whereas PT CTX ML was found mostly in SSs (Fig. 4E,  14 bottom row). To quantify how well abrupt changes in H3 type composition corresponded to borders 15 between all cortical areas, we identified positions where H3 type composition changed abruptly by 16 identifying peaks in the absolute value of the first derivatives of H3 type composition along the ML axis 17 within each slice. Consistent with the impression from images of coronal sections, 53% of peaks in the first 18 derivatives were within 150 µm from the closest CCF border (we could not assess matching using a more 19 stringent distance because 150 µm is already comparable to the cubelet size in this dataset). The fraction of 20 peaks that were close to CCF borders was higher than 99% of shuffled controls (ED cortical areas are largely distinct in their compositional profiles of H3 types, but individual H3 types are 25 rarely specific to a single cortical area. 26

H3 type composition reveals modular organization of the cortex. 28
Because variation in gene expression and H3 types both followed spatial patterns that were reminiscent of 29 highly interconnected cortical areas, we hypothesized that, like the modular organization of corticocortical 30 connectivity, cortical areas were also organized into modules based on H3 types: cortical areas within a 31 module would be composed of similar sets of H3 types, whereas areas in different modules would be more 32 distinct in their compositions of H3 types. To test this hypothesis, we first asked how well cortical areas 33 could be distinguished using the compositions of H3 types. We then identified cortical modules using the 34 differences in H3 type compositions. 35 To test how well the compositions of H3 types could predict cortical areas, we first used random forest 36 classifiers to predict the AP and ML coordinates of each cubelet given either the total gene expression in 37 that cubelet (Fig. 5A) or its H3 type composition (i.e. the fraction of each H3 type within a cubelet; Fig.  38 5B). We found that cubelet gene expression was highly predictive of locations in the cortex, capturing 94% 39 of variance on both the AP and ML axes. The distance between the predicted and true location of a cubelet 40 across the whole cortex was 235 ± 270 μm (median ± std) along the AP axis (spanning 5,900 μm) and 245 41 ± 364 μm along the ML axis (spanning 8,400 μm) (Fig. 5A, C). These prediction errors were close to the 42 sampling frequency imposed by cubelet size (200 μm between adjacent slices on the AP axis, and 100 μm 43 to 200 μm cubelet width on the ML axis). Strikingly, the H3 type compositions performed similarly well, 44 capturing 89% variance on the AP axis and 92% variance on the ML axis (the prediction errors were 312 ± 1 360 μm on the AP axis and 269 ± 402 μm on the ML axis, median ± std; Fig. 5B, C). Consistent with the 2 high precision in predicting the absolute locations in the cortex, both gene expression and the composition 3 of H3 types were highly predictive of the area labels in CCF (75% correct using gene expression and 69% 4 correct using H3 type composition, compared to 8% in shuffled control; Fig. 5D, E). H3 types within a 5 single parent H2 type were also somewhat predictive of cubelet locations, but those of H2 types in 6 superficial layers (e.g. L2/3 IT and L4/5 IT) were generally more predictive of cubelet locations along the 7 ML axis than those of H2 types in the deep layers (e.g. L6 IT and L6 CT) (ED Fig. 7B, C). The predicted 8 maps correctly captured the locations of most cortical areas, and most of the incorrect predictions occurred 9 along the borders of areas (Fig. 5D). Thus, both cubelet gene expression and H3 type compositions are 10 highly predictive of the locations along the tangential plane of the cortex and the identity of the cortical 11 areas. 12

(G) Cortical flatmaps colored by cell type-based modules as defined in (F) (left) and by connectivity-based modules identified by Harris et al. (2019) (right). Areas in gray did not contain sufficient numbers of cubelets and were excluded from this analysis.
We next assessed the similarity and modularity of cortical areas based on how well cortical areas could be 1 distinguished by their H3 type compositions ( Fig. 5F; see Methods). Briefly, we built a distance matrix 2 between cortical areas based on how well they can be distinguished pairwise using H3 type composition, 3 then performed Louvain clustering on the distance matrix. We identified six clusters, each of which 4 consisted of more than one area (Fig. 5F, gray boxes); these included two clusters that corresponded to the 5 visio-auditory areas and one cluster each for the association areas, somatosensory cortex, motor cortex, and 6 lateral areas. This modular organization is robust to small errors in CCF registration (ED Fig. 7D; see 7 Methods). We further combined these clusters with areas that did not cluster with other areas (PL, RSPd, 8 RSPv) into cortical modules based on similarity in H3 type composition. These modules largely included 9 the visio-auditory areas, somatomotor areas, the association areas, medial areas, lateral areas, respectively 10 ( Fig. 5F). Strikingly, these modules were largely consistent with cortical modules that are highly connected 11 (connectivity-based modules) (Harris et al. 2019) (Fig. 5G). Thus, highly interconnected cortical areas also 12 share similar H3 types and, consequently, characteristic transcriptomic signatures. 13 14

The H3 type compositional profiles reveal neonatal refinement by peripheral inputs 15
Transcriptomic types, areas, and modules reflect cortical organization at different scales, which suggest that 16 they may be generated through different developmental mechanisms. As a first step in understanding the 17 developmental processes that contribute to cortical organization at different scales, we applied BARseq to cell types that are not seen in a normal brain; alternatively, it could enrich or deplete existing cell types 25 (Fig. 6A). Because BARseq is cost-effective and has high throughput, it is uniquely suited for interrogating 26 changes in neuronal gene expression and cell type compositional profiles on a brain-wide scale across many 27 animals, with or without developmental perturbation. 28 We performed binocular enucleation on four mice at postnatal day 1 and collected their brains at postnatal 29 day 28 along with four matched littermate controls (N = 8 total animals) (Fig. 6B). We then performed 30 BARseq using a high-throughput in situ sequencing system with improved lasers and cameras and larger 31 field-of-views (see Methods), which allowed us to sequence the 104-gene panel in 32 hemi-brain coronal 32 sections per mouse, covering most of the forebrain of all eight animals (Fig. 6B) in 18 days (i.e., 2.3 days 33 per brain). The in situ sequencing data were processed, segmented, registered to Allen CCF v3 (Wang et  34 al. 2020), and quality controlled using the same approach as the pilot brain. The full dataset contained 9.1 35 million cells after quality control with a median of 87 reads per cell and 37 genes per cell (Fig. 6C). These 36 quality control measures were significantly improved from the pilot dataset, likely due to improvements in 37 the microscope and in data processing (see Supplementary Note 5 for discussion). Cells from individual 38 brains were interdigitated with cells from other brains in UMAP space, suggesting that there were minimal 39 batch effects ( Fig. 6D; ED Fig. 8A). Therefore, we performed de novo clustering hierarchically on the 40 concatenated data into 3,957,252 excitatory neurons, 1,526,182 inhibitory neurons, and 3,635,402 other 41 cells at the H1 level. The fraction of other cells was significantly higher than that in the pilot dataset, likely 42 because the improved data quality allowed more cells with lower read counts to pass QC and be included. 1 We then re-clustered the excitatory neurons into 35 H2 types (Fig. 6E) and 154 H3 types, including 12 H2 2 types and 70 H3 types that were predominantly found in the cortex. These H3 types in the new dataset 3 closely matched those in the pilot dataset (ED Fig. 8B, C; see Supplementary Note 5 for mapping to the 4 pilot dataset). 5 We first examined whether the effect of enucleation was predominantly reflected in enrichment and/or 6 depletion of H3 types that were already present in the control brains, or distinct H3 types that were absent 7 in the control brains. Consistent with the lack of batch effect seen in the UMAP plots (Fig. 6D), all four 8 pairs of littermates had similar fractions of H1, H2, and cortical H3 types (ED Fig 8D-F). Strikingly, no 9 H3 type was strongly enriched in either the control brains or the enucleated brains (ED Fig 8F). The lack 10 of condition-specific H3 types can be visualized in UMAP plots, in which neurons color-coded by their 11 conditions were smoothly intermixed together (Fig. 6F). This lack of new cell types is also reflected in the 12 spatial distribution of H2 types, which were visually similar across the two conditions ( Fig. 6G; but see 13 below for the distribution of H3 types). To further test the hypothesis that H3 types were shared by both the 14 control and the enucleated brains, we trained a nearest neighbor classifier to predict whether a neuron is 15 from a control or an enucleated brain. Briefly, for each neuron from a held-out litter, we found the 100 16 nearest neighbor neurons in transcriptomic space across the other three litters and predicted whether the 17 neuron was from a control or an enucleated brain based on the number of control neighbors (see Methods). 18 If an H3 type or a subpopulation of an H3 type is seen only in the enucleated brain, then we would expect 19 that neurons in that neighborhood to be surrounded by mostly neurons from the enucleated brains, thus 20 resulting in a high AUROC score. In contrast, the classifier largely failed to predict whether a neuron was 21 from a control or an enucleated brain (mean AUROC was 0.49), although we saw subtle secondary shifts 22 in gene expression within some H3 types in visual areas (see Supplementary Note 6; ED Fig. 8G, H). 23 Although we cannot fully rule out the possibility that minor changes in gene expression were missed at our 24 transcriptomic resolution, these results suggest that enucleation did not lead to the creation of new cell types 25 at the H3 level; rather, the main effect of enucleation was likely reflected in changes in the compositional 26 profiles of H3 types.

Enucleation enriches medial and lateral cell types in the visual areas. 2
Having established that enucleation did not create new H3 types, we sought to characterize enucleation-3 induced changes in area-specific H3-type composition. We divided the cortex into cubelets using a similar 4 approach as we used for the pilot data (see Methods). This discretization resulted in about 270 neurons per 5 cubelet, with a mean distance between adjacent cubelets on a section of 181 µm. To visualize the H3 type 6 composition, we plotted UMAP plots based on the fraction of H3 types in each cubelet (Fig. 7A-C). 7 Consistent with the lack of batch effect seen in the single-neuron gene expression (Fig. 6D), cubelets from 8 all eight animals mixed smoothly in most areas (Fig. 7A). Color-coding cubelets by conditions (Fig. 7B), 9 however, revealed an "island" (left) within which cubelets from the two populations (enucleated vs. control) 10 were largely segregated. This island contained mainly cubelets from VISp and other visual areas (Fig. 7B,  11 C, insets). To quantify the differences in the compositional profiles of H3 types between the control and 12 the enucleated brains, we trained a classifier to assess how distinct cubelets from each cortical area were 13 between the two conditions. If enucleation consistently altered the compositional profile of H3 types in a 14 cortical area, then we would expect that the classifier should be able to predict whether a cubelet was from 15 a control animal or an enucleated animal based on its H3 type composition above chance level. For each 16 cubelet, we identified the 10 nearest neighbor cubelets from the same cortical area in the six animals that 17 came from different litters, then trained a classifier to predict the condition of the cubelet based on the 18 conditions of the neighbors; we then compared the accuracy of prediction to a classifier trained on data with 19 the brain conditions shuffled (see Methods; Fig. 7D). In most cortical areas, the classifier performed at 20 chance level, but VISp cubelets were highly predictive of condition ( Fig. 7D; AUROC 0.90 ± 0.06 21 compared to shuffled AUROC 0.56 ± 0.27, median ± standard deviation; p = 2 × 10 -33 using rank sum test 22 and Bonferroni correction). Two higher visual areas (VISpm and VISl; AUROC medians ± standard 23 deviations are 0.70 ± 0.12 and 0.66 ± 0.15, respectively and the shuffled AUROC medians ± standard 24 deviations are 0.50 ± 0.25 and 0.44 ± 0.25; p = 3 × 10 -9 and 7 × 10 -9 , respectively, comparing each area to 25 shuffled control using rank sum test and Bonferroni correction) and a non-visual area (SSp-ll; AUROC 0.57 26 ± 0.09 and shuffled AUROC 0.42 ± 0.18, median ± standard deviation; p = 2 × 10 -8 comparing to shuffled 27 control using rank sum test and Bonferroni correction) were also predictive above chance level, although 28 the predictive powers were much lower. Thus, enucleation largely affected the relative composition of H3 29 types within visual areas. 30 The effect of enucleation can be directly observed in the distribution of H3 types in the primary visual area 31 ( Fig. 7E; ED Fig. 9A). For example, many L2/3 IT ML_2 neurons (Fig. 7E, yellow dots) were found in 32 VISp in control animals, but L2/3 IT L-M neurons (Fig. 7E, green dots) became enriched in VISp in the 33 enucleated animals. Similarly, L6 IT DL neurons were seen in higher numbers in the VISp of the enucleated 34 animals compared to that of the control animals. To systematically examine how enucleation affected the 35 compositional profiles of cortical excitatory cell types in each area, we looked for H3 types that were 36 enriched or depleted in enucleated brains using an ANOVA model, adjusting for litter and area effects (see 37 Methods). We found that 46 H3 types in 18 areas across the whole cortex (Fig. 7F) were over-or under-38 represented in the enucleated animals compared to control animals. VISp had the most H3 types (16 types) 39 whose compositions were altered by enucleation. The affected H3 types were found across most H2 types, 40 with the strongest enrichment or depletion of H3 types of L2/3 IT, L4/5 IT, and L6 IT (Fig. 7G). 41 Intriguingly, L6b/CT A-L_2, a transitional type between L6 CT and L6b H2 types that was usually found 42 only in lateral areas, was also highly enriched in VISp after enucleation. The affected H3 types remained 43 in their characteristic sublaminar positions (ED Fig. 9B), and the overall changes were consistent with, but 1 broader than those observed after dark rearing during the critical period ( areas that were immediately medial and lateral to the visual areas (ED Fig. 9D). Thus, enucleation broadly 5 shifted neurons in VISp towards H3 types that were usually enriched in the medial and lateral areas in 6 control brains. 7

Peripheral inputs sculpt characteristic cell type compositions of the visual areas within the visio-auditory
Because the enriched H3 types were consistently found in medial and/or lateral areas, we wondered if these 4 changes also shifted the overall area identity -as defined by the H3 type compositional profiles -of the 5 visual cortex towards other areas. To examine how area identities changed after enucleation, we used a 6 nearest-neighbor based approach that was inspired by MetaNeighbor (Crow et al. 2018) to assess how 7 similar cubelets in the control and the enucleated brains were to other cubelets in control brains (see 8 Methods). If enucleation shifted the compositional profile of an area towards a target area, then cubelets 9 from that area in the enucleated brain would have more neighbors in the target area than cubelets from the 10 same area in the control brain. For each cubelet in a littermate pair, we found the 20 nearest neighbor 11 cubelets in H3 type composition in the control brains of the other three pairs of littermates. We then 12 calculated the similarity, quantified by the AUROC for assigning cubelets from each area to areas in the 13 control brains based on the nearest neighbors (ED Fig. 9E). control and enucleated VISpm), indicating that their H3 type compositions remained highly distinct to other 17 areas despite the changes induced by enucleation. However, each affected area also shifted towards the 18 identities of neighboring areas as judged from the fraction of neighbors from an area (Fig. 7H). For 19 example, VISp cubelets from the enucleated brains had higher AUROC scores with both VISl and VISpm 20 than VISp cubelets from the control brains (0.85 and 0.89, for the enucleated cubelets, and 0.76 and 0.83 21 for the control cubelets; ED Fig. 9E). Consistent with the high AUROC scores, VISp cubelets from the 22 enucleated brains also had more neighbors in VISl and VISpm (Fig. 7H). Similarly, VISl cubelets from the 23 enucleated brains had more neighbors in auditory areas, and VISpm cubelets from the enucleated brains 24 had more neighbors in VISam and RSPagl (Fig. 7H, insets). Strikingly, all three areas shifted towards 25 neighboring areas that were physically further away from VISp, but within the visio-auditory module (black 26 outlines in Fig. 7H). To examine whether these changes reflected a shift in area borders or a change in 27 composition across an area, we plotted each cubelet from the enucleated brains and colored them by the 28 differences in the number of neighbor cubelets in VISl (ED Fig. 9F, top) and VISpm (ED Fig. 9F, top). In 29 VISp, the enrichment of neighbors in VISl and VISpm were found in cubelets across the whole area. In 30 particular, cubelets that had more neighbors in VISpm following enucleation (red dots in VISp in ED Fig.  31 9F, top) appeared to be concentrated at the center of VISp rather than at the borders, suggesting that the 32 changes in the similarity among these areas reflected an overall change in cell type composition, not a shift 1 in area borders. Thus, enucleation shifted the H3 type composition-defined area identities of the visual areas 2 towards neighboring areas within the visio-auditory module. 3 4 Discussion 1 Using BARseq, we have generated cortex-wide maps of transcriptomic types of excitatory neurons with 2 high transcriptomic and spatial resolution across nine animals. These maps revealed that the spatial patterns 3 of gene expression and the compositions of neuronal types delineate the divisions of cortical areas. This 4 area-based specialization in gene expression further revealed similarities across subsets of cortical areas, 5 which we call modules, that are also highly interconnected. By comparing binocular enucleated animals to 6 control littermates, we further showed that the precise cell type compositional profiles of cortical areas are 7 refined by activity-dependent postnatal development. Together, these results reveal that peripheral inputs 8 play a key role in shaping the distinct cell type compositional profiles of cortical areas, especially within a 9 cortical module. The present study is focused on understanding the organization of cell types in the cortex, but the same 26 approach can be flexibly applied to any other brain region. Application to other brain regions will likely 27 require optimizing gene panels for the target regions and/or optimizing cell segmentation. Because BARseq 28 uses combinatorial coding in the GII, the number of genes that can be resolved increases exponentially with 29 the number of sequencing cycles (Sun et al. 2021). Thus, BARseq can potentially interrogate more genes 30 to distinguish more transcriptomic types in other brain regions. When examining large numbers of genes, 31 optical crowding may eventually limit the number of reads that can be resolved in each cell, but 32 computationally demixing overlapping signals (Chen et al. 2021a) and/or using multiple sets of sequencing 33 primers can potentially overcome this limitation. Cell segmentation may also be challenging in brain 34 regions with densely packed cell bodies (e.g., dentate gyrus or piriform cortex), and new approaches, such 35 as Cellpose2 (Pachitariu and Stringer  In addition to generating a cell type map of the cortex, our study further leverages the high throughput, low 38 cost, and reproducibility of BARseq to establish causal relationships between developmental perturbations 39 and cortex-wide cell type organization by comparing across multiple animals. In this study, we collected 40 eight hemispheres from enucleated and control littermates in eighteen days of sequencing using a single 41 microscope and spent $2,000 in reagents per brain. Furthermore, data across the eight animals had no 42 apparent evidence of batch effects. This ability to compare molecular architecture across animals at such a 43 large scale would be difficult to achieve using other spatial transcriptomic techniques. 44 1

Fine-grained neuronal types reflect cortical area-based specialization 2
By leveraging a high-resolution map of gene expression across the cortex, our data reveal the 3 transcriptomically defined neuronal type and gene expression basis of cortical areas at a higher spatial 4 resolution than has previously been possible (Tasic et al. 2018;Yao et al. 2021b). Our results suggest that 5 the cell type compositional profiles of cortical areas reflect their modular organization seen in connectivity 6 studies: cortical areas that are highly interconnected also have similar H3 types (Fig. 8, top). This 7 relationship between transcriptomically defined neuronal types and connectivity, which we summarize as 8 "wire-by-similarity," is distinct from conventional wiring rules that are commonly observed at individual 9 neuronal type level. In most circuits, similar cell types usually share similar inputs and/or outputs. For 10 example, the connections from cortical neurons in a given layer to another layer are similar across areas 11 (Shepherd et  will provide crucial evidence to support or refute our hypothesis. Cheng et al. 2022). In our experiments, the combination of single cell resolution, high transcriptomic 10 resolution, and broad interrogation across many cortical areas allowed us to reveal unprecedented details 11 of how gene expression and cell type compositional profiles change after enucleation. Overall, the effects 12 of enucleation on the visual areas suggest that peripheral activity refines the cell type compositional profiles 13 of cortical areas, so that they acquire the compositional profiles that are characteristic to each area. 14 Interestingly, many H3 types that were enriched in VISp following enucleation were often found in the 15 control brains in lateral and medial areas, such as the anterior cingulate areas and the agranular insular 16 areas, which are far outside of the visio-auditory module. This broad distribution of enriched H3 type is 17 expected from our results in the pilot dataset: Whereas most individual H3 types are broadly distributed 18 higher visual areas (Fig. 8, bottom). Our study does not provide information on the mechanism through 23 which peripheral activity refines cell type composition: These changes could be driven by thalamic inputs 24 regardless of activity pattern, by patterned activity that is characteristic of visual inputs, by inducing cell 25 death in select cell types, and/or by rewiring of thalamocortical axons and/or cortical connectivity due to 26 lack of activity. Distinguishing these possibilities and determining whether the refinement role of peripheral 27 inputs is restricted to within modules for other sensory modalities will require future experiments exploring 28 changes in projections with finer manipulation of thalamic and cortical activity. 29

BARseq enables brain-wide comparison of cellular gene expression across individuals. 31
Animals in a natural population, like human populations, are diverse in their genetics and anatomy; even 32 mice within the same inbred strain can display differences in brain morphology and development 33 (Valiquette et al. 2023 three animals. The number of animals that can be interrogated is largely constrained by the low throughput 42 and high cost of many spatial transcriptomic approaches. Furthermore, technical variability across samples 43 (i.e., batch effects) also confounds integrating data across multiple animals, making it difficult to interpret 44 any differences found across animals. These factors raise questions of how generalizable these atlases are 45 across individual animals and across strains of mice, and limit how they can be used to help interpret other 1 experiments. BARseq overcomes all three barriers (throughput, cost, batch effect) to enable brain-wide 2 interrogation across many animals. Because it is easier to associate neuronal morphology, 3 electrophysiology, connectivity, and neuronal activity with gene expression than with cytoarchitecture, 4 redefining a mouse brain common coordinate framework using transcriptomic type information could be 5 more informative than the existing one based on cytoarchitecture. Our current throughput (two days per 6 hemi-brain) already allows hundreds of animals to be interrogated in a microscope year, and moderate 7 scaling up can easily achieve thousands of animals per year. BARseq thus provides a path to go beyond a 8 reference mouse brain transcriptomic atlas based on representative individuals (Langlieb et  In addition to creating a pan-transcriptomic atlas of the mouse brain, the ability to interrogate and compare 12 brain-wide molecular architecture across multiple individuals can facilitate discovery broadly across the 13 neuroscience field. This study provides a proof-of-principle by identifying how cell types change across 14 the whole cortex in response to a developmental perturbation. The same approach can also be used to 15 generate information on cell type differences associated with neuropsychiatric disease models, age, species, 16 and other experimental perturbations. The ability to identify differences brain-wide is especially useful to 17 understanding neuropsychiatric conditions, because many neuropsychiatric conditions and/or vulnerability 18 to mental disorders are associated with differences across many brain regions. Our approach based on 19 BARseq can thus be generalized to other brain regions and questions to bridge brain-wide network-level 20 dynamics with detailed changes in gene expression in single neurons and to establish causal relationship 21 between developmental processes and brain-wide cell type organization. Boise, ID). The eyelids were opened with a sterile surgical scalpel blade. Using fine forceps, eyeballs were 12 lifted away from the orbit and freed from the optic nerve and surrounding musculature. Following eye 13 removal, eyelids were closed and sealed with surgical glue (Vetbond, 3M, Maplewood, MN). After surgery, 14 pups were placed in a plastic box in a warm water bath maintained at 37°C for ~1 h for recovery before 15 returning to their mothers. Only one-half of the entire litter underwent enucleation surgery to ensure 16 maternal care and the thriving of the pups. Sham surgery was performed as the control on the other half of 17 the litter at the same age, where pups were subjected to anesthesia and revival procedures as described 18 above. All pups were housed with their mothers until they were used for experiments at P28 and were 19 weighed routinely to ensure normal thriving. 20 To collect brains for BARseq, we euthanized the animals with isoflurane overdose, decapitated the animals, 21 embedded the brain in OCT and snap-froze in an ethanol dry-ice bath. In experiments in which only one 22 hemisphere was used, we bisected the brain along the midline before OCT embedding. The brains were 23 then sliced to 20 μm hemi-coronal sections and mounted onto Superfrost Plus Gold slides. Eight sections 24 were mounted onto each slide. 25

BARseq library preparation 26
BARseq samples were prepared as previously described (Sun et al. 2021). Briefly, slides were fixed in 27 PFA, dehydrated in ethanol, rehydrated in PBST (PBS with 0.5% Tween-20), then reverse transcribed using 28 random primers with a N-terminal amine group and Revert-aid H-minus reverse transcriptase (Thermo 29 Fisher). On the second day, we crosslinked the cDNA products, then performed padlock probe hybridization 30 and ligation, followed by rolling circle amplification with ϕ29 DNA polymerase (Thermo Fisher). On the 31 third day, we crosslinked rolonies. To sequence the samples, we hybridized sequencing primers and 32 manually performed sequencing using either the Illumina HiSeq Rapid SBS kit v2 (for the pilot dataset) or 33 the Illumina Miseq nano v2 kit (for the eight littermates) following previous protocols (Sun et al. 2021). 34 After seven sequencing cycles were completed, we then striped the sequencing primers using three 35 incubations at 60℃ for 5 mins each in 60% formamide, 2× SSC, 0.1% Tween 20. We then hybridized 36 fluorescent probes in 10% formamide, 2× SSC for 10 mins, followed by incubation in 2 µg/mL DAPI in 37 PBST for 5 mins. We then imaged the final hybridization cycle.

BARseq data collection 42
BARseq data from the pilot brain was collected as described previously (Sun et al. 2021) on a Olympus IX-1  81 microscope with PI P-736 piezo z-stage, Olympus UCPLFLN20X 20× 0.7NA air objective, a Crest  2  xlight v2 spinning disk confocal, 89North LDI-7 laser bank, and Photometrics BSI-prime camera. Because  3  the microscope stage could only fit 3 slides at a time, the posterior 22 sections and the anterior 18 sections  4 were collected in two separate batches. BARseq data from the eight littermate brains were collected on a 5 Nikon Ti-2E microscope with PI P-736 piezo z-stage, Nikon CFI S Plan Fluor LWD 20XC 0.7NA 6 objective, a Crest xlight v3 spinning disk confocal, Lumencor Celesta 7-line laser, and a Photometrics 7 Kinetix camera. Four slides with eight slices per slide, all from the same animal, were imaged in each batch. 8 The filters and lasers used are listed in Supplementary Table 2.  9 10 BARseq data processing 11 BARseq data were processed as described previously (Sun et al. 2021) with slight modifications. Briefly, 12 we created max-projections from the image stacks, applied noise reduction using Noise2Void (Krull et al. 13 2018), applied background subtraction, corrected for channel shift and bleed-through, and registered images 14 through all sequencing cycles. We segmented cells using Cellpose (Stringer et al. 2020), decoded rolonies 15 using BarDensr (Chen et al. 2021a), and assigned rolonies to cells. During BarDensr decoding, we used 16 negative control GIIs (i.e., GIIs that were not carried by any padlock probe) to estimate false discovery rate 17 (FDR), and automatically adjusted decoding threshold to target about 5% FDR. Finally, the images were 18 stitched to generate whole-slice images. All processing steps were performed on each imaging field of view 19 (FOV) separately, not on the stitched images, to avoid stitching errors, to minimize alignment artifacts due 20 to imperfect optics, and to facilitate parallel processing. The stitched images were only used to generate a 21 transformation, which we applied to each rolony and cell after all other steps were finished. Overlapping 22 cells from neighboring FOVs were detected using a custom implementation of sort and sweep (Baraff and  23 Witkin 1992), an algorithm that is commonly used in detecting object collision in video games. When two 24 or more cells were detected as overlapping, cells that had more read counts (which we assumed to have 25 higher read quality) were kept, and all other cells were removed. See Data and Code Availability for 26 processing script and intermediate processed data. 27 We then registered each stitched slice to Allen CCFv3. For each slice, we first generated an image that 28 color coded each cell by its H2 type identity. We then used QuickNii (v.3 2017) to manually select the CCF 29 plane for each slice and Visualign (v. 0.9) to align area borders within each slice. After we obtained CCF 30 coordinates for each neuron, we registered all cortical cells onto a CCF flat map using python 31 ccf_streamlines package (https://pypi.org/project/ccf-streamlines/). Streamlines represent the paths that 32 most directly connect the pia of the isocortex to the white matter while following the curvature of those 33 surfaces. Because the streamlines were defined in CCF space and are thus consistent across all animals, this 34 approach allowed us to more reliably obtain relative cortical depth and relative cortical location of each 35 neuron across the eight brains compared to the manual approach used in the pilot brain. 36 37

Quality control 38
After segmentation, we obtained a total of 13,886,988 segmented cells. We kept all cells expressing at least 39 5 unique genes and at least 20 total counts, resulting in a count matrix with 104 genes and 10,378,092 cells. 40

Iterative clustering and annotation of BARseq transcriptomic data 42
To obtain H1, H2, and H3 types in the pilot brain, we adopted an iterative clustering pipeline adapted from 1 single-cell RNA sequencing (scRNAseq) studies (Amezquita et al. 2020). We performed 3 rounds of 2 clustering with the following steps: normalization, dimension reduction (PCA), computation of a shared 3 nearest neighbor (SNN) network, Louvain clustering. We normalized counts to CP10 (counts per 10)  4 values, then applied the log1p transformation (log with a pseudo-count of 1). Because our panel consists of 5 marker genes, we skipped highly variable gene selection and ran PCA on all genes. We computed PCA 6 using the scater::runPCA function (McCarthy et al. 2017) and kept the first 30 PCs. We built the SNN 7 network using the scran::buildSNNGraph function with 15 nearest neighbors and the "rank" metric. We ran 8 the Louvain clustering as implemented in the igraph::cluster_louvain function with default parameters. All 9 UMAP visualizations were obtained using the scater::runUMAP function starting from the PCA dimension 10 reduction and using the 15 nearest neighbors. 11 In the first round of clustering, we separated cells in three classes reflecting neurotransmitter expression: 12 excitatory (expressing Slc17a7), inhibitory (expressing Gad1), and others (expressing neither Slc17a7 nor 13 Gad1). Because marker genes are frequently undetected at the single-cell level, we ran the clustering 14 pipeline on all cells, obtaining 24 clusters, then assessed marker expression at the cluster level. From the 15 UMAP visualization, we distinguished 3 groups of clusters. The first group contained excitatory clusters 16 expressing Slc17a7, the second group contained inhibitory clusters expressing Gad1, the third group 17 expressed neither of these markers. Based on these observations, we manually annotated the clusters as 18 "excitatory", "inhibitory", and "other", respectively (H1 types, 642,340, 427,939, and 188,977 cells, 19 respectively). In the second round of clustering, we extracted all cells labeled as "excitatory" and 20 "inhibitory", then ran our pipeline again on each H1 type separately. By re-running the pipeline, the 21 dimension reduction and clustering are better targeted at finding variability specific to either excitatory or 22 inhibitory cells. We obtained 18 excitatory clusters and 11 inhibitory clusters, which formed the basis for 23 our H2 types. In the third round of clustering, we re-ran the pipeline on each excitatory H2 type, obtaining 24 roughly 5 to 6 clusters by type (117 total H3 types). 25 To annotate H2 and H3 types in the pilot brain, we examined the brain-wide distribution and the marker 26 expression of H2 and H3 types. Almost all H3 types showed brain-area specificity, suggesting that the data 27 were clustered at a biologically meaningful granularity. We annotated isocortical H2 types based on 28 aggregate marker expression (see Supplementary Note 1 for marker selection). We annotated non-29 isocortical H2 types based on the localization of cells (e.g., hippocampal areas, thalamus, entorhinal cortex). 30 When H2 types contained a mix of isocortical and other cells (e.g., "L6 IT-like"), we split the H2 type into 31 multiple H2 types, one containing the isocortical cells (e.g., "L6 IT"), the others containing the other cells 32 (e.g., "PIR L6 IT-like", "AON DL"). In the end, we obtained a list of 11 cortical H2 types and 23 non-33 cortical H2 types. After mapping to scRNAseq reference types, we noticed that four H3 types ("PT AUD", 34 "PT P RSP/IT-like?", "RSP DL", "CT CTX A/L-V") were assigned to incorrect H2 types ("L5 IT", "RSP 35 DL", "L5 IT", "L6b"); we manually corrected their H2 annotation (to "PT", "PT", "RSP DL", "CT"). 36 The eight littermate brains were processed as a single batch using the same overall procedure as the pilot 37 brain, with the following adjustments at the H1 and H2 levels to minimize computational load. We used 38 scater's calculatePCA and calculateUMAP functions (with external_neighbors=TRUE and 39 BNPARAM=AnnoyParam()). We computed clusters using Seurat's FindNeighbors function(Stuart et al. annotation, we used a combination of the strategy described above and top hits to pilot brain annotations 43 using MetaNeighbor. 44

Mapping of BARseq types to scRNAseq reference types 1
To map BARseq types to reference types, we used a k-nearest neighbor (kNN) approach to label each cell 2 according to its closest neighbors in a whole-cortex and hippocampus reference compendium (Yao et al. 3 2021b). First, we evaluated the accuracy of the kNN approach on a subset of the reference compendium 4 using leave-one-cell-out cross-validation (10X MOp dataset, "Glutamatergic" cells, cortical cells labeled 5 as "CTX" or "Car3", clusters with ≥ 50 cells, CP10K and log1p normalization). To transfer labels, we 6 picked each cell's closest 15 neighbors using the BiocNeighbors::queryKNN function, then predicted the 7 cell's type by taking a majority vote across the neighbors. We compared accuracies across 4 gene panels: 8 highly variable genes (HVGs, 2000 genes selected using scran::modelGeneVar and scran::getTopHVGs), 9 HVG selection followed by PCA (30 components, scater::runPCA), the BARseq panel (104 genes, after 10 excluding Gad1 and Slc17a7), the BARseq panel with reads down-sampled to match BARseq sensitivity 11 (104 genes, binomial sampling of reads, re-normalization through sample-wise ranking and scaling). For 12 the latter gene set, reads were downsampled for each gene according to the sensitivity ratio (BARseq 13 average counts divided by reference average counts). For genes that had a sensitivity ratio > 1, we 14 oversampled reads to match BARseq sensitivity (reads multiplied by ⌊ ⌋ + binomial sampling with 15 probability ⌊ ⌋ − ). 16 Having validated that the kNN mapping procedure was able to assign cell types with high accuracy, we 17 applied the same procedure (sensitivity adjustment of reference datasets, sample-wise ranking and scaling 18 of reference and target datasets, BiocNeighbors::queryKNN with 15 neighbors) to assign a reference label 19 to each BARseq cell. In contrast with the previous evaluation, we used all excitatory cells from the reference 20 compendium (40 individual datasets, all "Glutamatergic" cells) and adjusted reference reads using a 21 simplified downsampling procedure (reads multiplied by the sensitivity ratio for each gene). To compute 22 the overlap between BARseq and reference cell types, we used the Jaccard coefficient (number of cells 23 labeled as BARseq type and predicted to be reference type divided by the number of cells of type or 24 predicted to be type ). 25 We mapped both H3 types in the cortex (Fig. 2E) and those in the hippocampal formation (ED Fig. 2F) to 26 single-cell RNAseq data. However, we do not expect perfect matching for clusters outside of the cortex, 27 because our dataset sampled additional brain regions that were not sampled in the single-cell RNAseq data. 28 In addition, our gene panel was optimized for cortical excitatory neurons and could miss highly 29 differentially expressed genes in other brain regions. 30 We mapped H3 types from the eight littermate brains against the pilot brain and scRNAseq data from 31 ) using the procedure outlined above. 32

Variance of expression explained by H2 types, H3 types, and space 34
To evaluate how well H2 types, H3 types, and spatial information recapitulate the variability of expression, 35 we performed one-way ANOVA on pseudo-bulk data. This analysis was run on a subset of data containing 36 the 8 isocortical H2 types with isocortex-wide localization ("L2/3 IT", "L4/5 IT", "L5 IT", "L6 IT", "PT", 37 "NP", "CT", "L6b"). 38 We started by computing the pseudo-bulk expression matrix , providing expression of gene g for H3 39 type t in spatial bin s. We defined 540 spatial bins containing an average of 14 cells per H3 type as follows: 40 27 slices along the A-P axis (corresponding to slices 7 to 33), 20 bins along the un-warped M-L axis for 41 each slice (chosen to balance the number of cells in each bin, computed independently for each slice). 42 Slices at the anterior and posterior end of the brain were excluded because coronal sections were not 1 perpendicular to the cortical surface at these extreme positions and would thus bias gene expression. 2 Starting from the gene by cell count matrix , we have = ∈ , ∈ ( ). 3 Next, we computed the variance explained by the 8 H2 types, 51 H3 types and 540 spatial bins by applying 4 the one-way ANOVA formula for each gene and factor independently. Let = ∈ 3, ∈ ( ) be 5 the overall average expression and = ∑ ∈ 3, ∈ ( − ) 2 be the total variance. For gene g, we 6 computed the variability explained as follows: 7 Because H3 types are nested factors of H2 types, the variability explained by H3 types is necessarily higher; 11 the additional variability explained by H3 types is given by

Extraction of recurrent spatial patterns using non-negative matrix factorization 14
The ANOVA analysis revealed the presence of recurrent spatial patterning across genes and H2 types. We 15 used non-negative matrix factorization to extract these patterns. First, we defined a pseudo-bulk matrix 16 using the same procedure as the ANOVA analysis (see above), except that we computed the pseudo-bulk 17 matrix at the level of H2 types. Starting from the count matrix , we have = ∈ , ∈ ( ) ,  18 where is one of the 8 H2 types. Here, we consider the spatial bins as features (rows), genes and types as 19 variables (columns), resulting in a matrix with 540 rows and 848 columns. Because spatial patterns had 20 different scales (average level of expression) across genes and H2 types, we rescaled each column using 21 L2-normalization (squared columns sum to 1). This rescaling ensures that factors reflect recurrent patterns 22 (seen in multiple genes and H2 types) rather than a single instance (highly expressing gene in one H2 type). 23 This procedure (pseudo-bulking at H2 level and rescaling) can also be seen as a correction for H2 type 24 composition (overall expression patterns are dominated by the most common or the highest expressing H2 25 type). We extracted 10 NMF factors using the NMF::nmf function using default parameters (Brunet 26 algorithm), obtaining a nonnegative basis matrix W (540 bins by 10 factors) containing spatial patterns and 27 a nonnegative coefficient matrix H (10 factors by 848 columns) such that B≈W.H. For later analysis, we 28 removed 3 NMF factors that reflected obvious slice-specific batch effects (see Supplementary Note 4). 29 To identify genes associated with each NMF factor, we computed the average fraction of expression 30 variance explained by each NMF factor. Given an NMF factor f, we computed the predicted gene expression 31 for gene g in type t as . � = . . , where . is the column in W representing factor f, and is 32 the coefficient associated with factor f, gene g and type t. The variance explained is then computed as = 33 ( − ∑ ∈ ( − � ) 2 )/ , where = ∑ ∈ 2 is the total uncentered variance. We then took 34 the average variance explained across the 8 H2 types. To estimate the null fraction of variance explained, 35 we permuted coefficients associated with each factor (permutation of rows in H), then recomputed the 1 average variance explained across all genes and factors. We labeled gene-NMF associations as "significant" 2 if the average variance explained exceeded the 99th percentile of the null distribution. 3 To identify H3 types associated with each NMF factor, we computed two measures of association: 4 enrichment of NMF-associated genes and correlation of spatial distribution with NMF patterns. Using the 5 MetaMarkers package, we computed differentially expression (DE) statistics for each H3 type (1-vs-all 6 differential expression against other H3 types from the same H2 type). We then asked whether top DE 7 genes were enriched for NMF-associated genes (genes with significant association with a single NMF). We 8 report the enrichment as an AUROC, asking how well expression fold changes predict NMF-associated 9 genes. Independently, we computed the overlap between NMF patterns (columns in W) with the spatial 10 distribution of H3 types (count of cells in bin s divided by total count) using the Spearman correlation. 11 Because the dynamic range of the correlation coefficient depends on the sparsity of the pattern (sparser 12 patterns tend to have a smaller range of correlation values), we report the association as the Z-scored 13 Spearman correlation (Z-scoring across H3 types within a given H2 type and factor). 14 15 16 We first binned cells in the cortex into cubelets, which were drawn separately on each coronal section, 17 spanned all cortical layers, and were about 100 μm -200 μm on the M-L axis along the curvature of the 18 cortex. To draw the cubelets so that their medial and lateral borders were perpendicular to the layers, we 19 manually drew matching points along the top and bottom surface of the cortex, especially at locations where 20 the curvature of the cortex is extreme (e.g., the medial part of the cortex). This step thus separates both the 21 bottom and top surfaces of the cortex into several segments. Within each segment, we then cut the two 22

Predicting cortical areas and locations using gene expression and H3 type composition
surfaces into the same number of smaller segments of roughly equal distance. Each cubelet was then defined 23 by connecting the ends of the small segments. Slices at the anterior and posterior end of the brain were 24 excluded because coronal sections were not perpendicular to the cortical surface at these extreme positions 25 and would thus bias composition of neuronal populations. Unlike the spatial bins used in the NMF analysis, 26 which had equal cell numbers within each slice but unequal widths, the cubelets had similar widths across 27 all slices, but not necessarily similar cell numbers (ED Fig. 4A). 28 Because the cubelets are generated within each of the large segments, they may be slightly different in size 29 across different segments. We thus normalized both H3 type counts and gene read counts within each 30 cubelet by the total number of cells and/or gene reads for downstream analyses. For the A-P locations of 31 each cubelet, we used the CCF coordinates directly. For the M-L locations, we calculated an un-warped 32 coordinate along the cortex within each coronal section as follows. We connected the centroids of adjacent 33 cubelets and defined the un-warped distance between two cubelets as the sum of all connected lines across 34 all cubelets between the two. We then defined the zero position along this un-warped M-L axis as the point 35 that was closest to the point in CCF where the midline of the brain intersects the top surface of the brain. 36 Thus, cubelets on the medial side have negative M-L coordinates, whereas those on the dorsolateral side 37 have positive M-L coordinates. 38 To predict the coordinates of each cubelet, we trained random forest regression models, each with 50 trees 39 and an in-bag-fraction of 0.5. We used 100-fold cross-validation to evaluate the performance of the models. 40 To evaluate the prediction performance using the composition of H3 types within each H2 type, we trained 41 similar regression models using only the relevant H3 types and evaluated performance using 5-fold cross-42 validation. To predict cortical area labels in CCF, we first assigned CCF area labels to each cubelet using 43 its centroid location. We then trained random forest classifiers with 500 trees and an in-bag-fraction of 0. 5 44 and evaluated the performance with 10-fold cross validation. All models were built in MATLAB using the 1 TreeBagger function. 2 3

Correspondence between abrupt changes in the composition of H3 types and area borders defined by
To find positions of abrupt changes in the composition of H3 types within each coronal section, we 6 performed principal component analysis on the composition of H3 types and calculated the two-norm of 7 the 1st derivative of the first five principal components along the un-warped M-L axis. We then convoluted 8 the two-norm of the derivatives with a smoothing window of the shape [0.25, 0.5, 0.25]. We then looked 9 for local peaks with prominence that was larger than half a standard deviation, and with the value at the 10 peak higher than the median of the smoothed norm of the derivatives. Peaks were considered close to a 11 CCF-defined border if it was within 150 μm from that border, calculated based on the CCF coordinates of 12 the centroids of cubelets. All relevant analyses were performed in MATLAB. 13

Inferring cortical modules from H3 type composition 15
We only clustered areas with at least 7 cubelets. For each pair of areas, we built a support vector machine 16 classifier with Lasso regularization to predict the area identity given H3 type composition. We calculated 17 the area under the ROC curves following 5-fold cross validation. The AUROC values were used to build a 18 distance matrix among cortical areas. We then performed Louvain community detection using this distance 19 matrix, which generated six clusters and three areas that did not cluster with any other areas. We then built 20 a dendrogram of the six clusters and three areas using the medians of all pairwise distances between areas 21 from each pair of clusters and ward linkage. We then manually cut the dendrogram to generate the five To test how the strength of cortical modularity, we calculated the modularity of clusters with the following 25 perturbations: (1) Randomly shuffle cortical area labels across all cubelets. (2) Randomly assign each 26 cubelet with an area label within 1-cubelet distance with same probabilities. That is, for each cubelet, there 27 is an equal chance of assigning an area label of the cubelet itself, or the labels of the two cubelets adjacent 28 to it. (3) Randomly assign each cubelet with an area label within 2-cubelet distance with the same 29 probabilities. In addition, for the random shuffling control, we calculated modularity for both the clusters 30 identified by Louvain clustering on the original data and the clusters identified by Louvain clustering on 31 the shuffled data. 32 All relevant analyses were performed in MATLAB. The linear classifier was generated using the fitclinear 33 function, and Louvain community detection was performed using a MATLAB implementation by Antoine 34 Scherrer. 35

Gene expression changes in the enucleated animals 37
To assess the nature of the expression shift from control to enucleated animals, we first investigated the 38 neighborhood of individual neurons. If types appeared or disappeared in enucleated animals, we would 39 expect that the neighborhood of groups of neurons would be entirely composed of neurons of either control 40 or enucleated animals. To avoid donor and litter effects, we only considered neighborhoods across litters 41 (e.g., neurons from donors 3 to 8 for donor 1). For each neuron, we picked the k = 100 closest neighbors 1 (using the same procedure as kNN dataset mapping, sample-wise ranking and scaling followed by 2 queryKNN), then computed the fraction of neurons from control animals. 3 We then used the same procedure to better understand the nature of the shift. Instead of looking at the 4 "condition" label of each neighbor (control or enucleated), we collected the brain area labels of the top k = 5 20 neighbors. We aggregated these counts at the H2 level for each animal, then performed a two-sided 6 hypergeometric test within each litter to identify brain region neighborhoods that were significantly 7 depleted and enriched in enucleated animals. We combined p-values from individual litters into an overall 8 p-value using Fisher's method, then corrected p-values using the FDR procedure. 9 10

Cell type composition changes in the enucleated animals 11
To assess how cell type composition is related to cortical areas in control and enucleated animals, we 12 analyzed the differences in H3 type composition for each cubelet in all control and enucleated littermate 13 brains. For the eight littermate brains, we used the flatmap mapping to define cubelets. For each slice, we 14 fitted a polynomial curve line using flatmap coordinates of the middle layer neurons (relative depth between 15 80 to 120 on a scale of 0 to 200) and used this fit to represent the midline of the cortex. We then projected 16 neurons at all depths onto this midline based on their flatmap coordinates. After projecting all neurons, we 17 segmented the fitted line into 50 roughly equal segments. In this way, we assigned neurons into cubelets 18 with consistent number of neurons for the control and enucleated littermate brains. 19 For all cubelet-based analyses, we filtered out cubelets with damaged or warped tissue, as well as cubelets 20 on the posterior edge of cortex that did not contain all cortical layers. Areas were assigned to each cubelet 21 based the area assigned to the majority of neurons within that cubelet. UMAP plots were generated using 22 the fraction of H3 types in each cubelet. 23 To test how distinguishable an area is between the enucleated and the control brains, for each cubelet we 24 identified the 10 nearest neighbor cubelets from the same cortical area in the six animals that came from 25 different litters. We randomly selected 20% of all available cubelets for each iteration and performed 50 26 iterations. For shuffled controls, we shuffled the conditions of the six brains, but cubelets within each brain 27 retained the same condition label as other cubelets from the same brain. We then calculated AUROC based 28 on the condition labels for the nearest neighbors. We performed t-test and used the Benjamini-Hochberg 29 procedure to find the false discovery rate (FDR). 30 To test how well each area from the enucleated or the control brains maps onto areas in the control brains, 31 for each cubelet we identified the 20 nearest neighbor cubelets from any cortical area in the three control 32 animals that came from different litters.  For each BARseq H2 type, we show the mapping of BARseq H3 types with reference scRNAseq type (left), 3 the CCF enrichment of H3 types (top right), and the CCF enrichment of scRNAseq types (bottom left). The 4 mapping between BARseq and scRNAseq types is quantified as the Jaccard Index, significant associations 5 (permutation test) are shown by outlining dots with black circles. The regional enrichment is quantified as 6 odds ratios, significant deviations (hypergeometric test) are shown by outlining dots with black circles. In 7 the mapping panel, all BARseq H3 types are shown but, for readability, only scRNAseq types with 8 significant associations are plotted. In contrast, the CCF enrichment is shown for all scRNAseq types 9 belonging to subclasses that are equivalent to the BARseq H2 type (e.g., the BARseq L4/5 IT type 10 corresponds to the L4 IT and L4/5 IT subclasses in the scRNAseq dataset). Colors indicate log odds ratios 11 and circle size indicates the fraction of cells among all cells of that H2 type.  The left column shows the pattern associated 5 with each NMF component; the right column 6 shows the distribution pattern (fraction of cells 7 from the H3 type found in each bin) of H3 8 types showing above-null association with the 9 NMF pattern. Numbers in parentheses next to 10 H3 type names show the scaled Spearman 11 correlation between the NMF pattern and the 12 distribution pattern. slice, and the difference in the fractions of positions that were close to a CCF area border between the real 4 data and shuffled data was calculated (see Methods). Positive values indicate that abrupt changes in the 5 composition of H3 types were more likely to be associated with area borders in real data than in shuffled 6 control. This shuffling was repeated 5,000 times, and the distribution of this difference is plotted in a 7 histogram. (B)(C) The composition of H3 types within each indicated H2 types (x-axes) were used to 8 predict the AP (B) and ML (C) locations of a cubelet. For each H2 type, we performed 100 trials. In each 9 trial, we randomly held 10% of data as test set to determine the fractions of variance explained. (D) The 10 distribution of modularity of shuffled data, or data with 1-2 cubelets of jitter in CCF registration. For 11 shuffled data, we calculated modularity based on either the same clusters obtained from real data, or by the 12 best clusters obtained by Louvain community detection on the shuffled data. VISl, VISp, and VISpm. Enu, enucleated. 13