Abstract
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues, such as adult mammalian brains, is challenging. Here, we integrate sucrose-gradient assisted nuclear purification with droplet microfluidics to develop a highly scalable single-nucleus RNA-Seq approach (sNucDrop-Seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ~11,000 nuclei isolated from adult mouse cerebral cortex, we demonstrate that sNucDrop-Seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity, but also enables analysis of long non-coding RNAs and transient states such as neuronal activity-dependent transcription at single-cell resolution in vivo.
A fundamental challenge in deciphering cellular composition and cells’ functional states in complex mammalian tissues manifests in the extraordinary diversity of cell morphology, size and local microenvironment. While current high-throughput single-cell RNA-Seq approaches have proved to be powerful tools for interrogating cell types, dynamic states and functional processes in vivo 1, these methods require the preparation of intact, single-cell suspensions from freshly isolated tissues, which is only practical for easily-dissociated embryonic and young postnatal tissues. This requirement poses an even greater challenge for cells with complex morphology such as mature neurons. Harsh enzymatic treatment not only favors recovery of easily dissociated cell types, but also introduces aberrant transcriptional changes during the dissociation process 2. In addition, skeletal and cardiac muscle cells are frequently multinucleated and are large in size. For instance, each adult mouse skeletal muscle cell contains hundreds of nuclei and is ~5,000 μm in length and 10-50 μm in width 3. Thus, existing high-throughput single-cell capture and library preparation methods, including isolation of cells by fluorescence activated cell sorting (FACS) into multi-well plates, sub-nanoliter wells, or droplet microfluidic encapsulation, are not optimized to accommodate these unusually large cells. Isolating individual nuclei for transcriptome analysis is a promising strategy, as single-nucleus RNA-Seq methods avoid strong biases against cells of complex morphology and large size 2, 4-6, and can be potentially standardized to accommodate the study of various tissues. However, current single-nucleus RNA-Seq methods rely on fluorescence-activated nuclei sorting (FANS) 4, 5 or Fluidigm C1 6 to capture nuclei, and thus cannot easily be scaled up to generate a comprehensive atlas of cell types in a given tissue, much less a whole organism.
An ideal solution to increase the throughput of single-nucleus RNA-Seq is to integrate nucleus purification with massively parallel single-cell RNA-Seq methods such as Drop-Seq 7, InDrop 8, or equivalent commercial platforms (e.g. 10x Genomics 9). However, single-nucleus RNA-Seq is currently not supported on these droplet microfluidics platforms. Inhibitory effects due to cellular debris contamination and/or inefficient lysis of nuclear membranes might contribute to this failure. Historically, nuclei of high purity can be isolated from solid tissues or from cell lines with fragile nuclei by centrifugation through a dense sucrose cushion to protect nucleus integrity and strip away cytoplasmic contaminants. The sucrose gradient ultracentrifugation approach has been adapted to isolate neuronal nuclei for profiling histone modifications 10, nuclear RNA 11, and DNA methylation 11, 12 at genome-scale. To test whether this nuclei purification method supports single-nucleus RNA-Seq analysis, we isolated nuclei from cultured cells, as well as freshly isolated or frozen adult mouse brain tissues through douncing homogenization followed by sucrose gradient ultracentrifugation (Fig. 1a and Supplementary Fig. 1). After quality assessment and nuclei counting, we performed emulsion droplet barcoding of the nuclei and library preparation with both Drop-Seq and 10x Genomics platforms. While the10x Genomics single-cell 3’ solution workflow supports cDNA amplification only from whole cells (possibly due to inefficient lysis of nuclear membrane), the Drop-Seq platform yielded high quality cDNA and sequencing libraries from both whole cells and nuclei (freshly isolated or frozen samples) (Supplementary Fig. 2). These results suggest that nucleus purification and nuclear membrane lysis are critical factors for efficient library preparation in single-nucleus RNA-Seq.
We next validated the specificity of sucrose gradient-assisted single-nucleus Drop-Seq (sNucDrop-Seq) with species-mixing experiments, using nuclei isolated from in vitro cultured mouse and human cells. This analysis indicates that the rate of co-encapsulation of multiple nuclei per droplet (~2.6%) is comparable to standard Drop-Seq (Supplementary Fig. 3a). To assess the sensitivity of sNucDrop-Seq, we performed shallow sequencing of cultured mouse 3T3 cells at either single-cell (with Drop-Seq: detecting on average 3,325 genes with ~25,000 reads per cell for 1,160 cells with >800 genes detected) or single-nucleus (with sNucDrop-Seq: detecting on average 2,665 genes with ~23,000 reads per nucleus for 1,984 nuclei with >800 genes detected) resolution (Fig. 1b). With standard Drop-Seq microfluidics devices and flow parameters, the throughput of sNucDrop-Seq (1.9%, 1,829 / 95,000 barcoded beads) is comparable to that of Drop-Seq (1.5%, 1,160 / 77,000 barcoded beads). Comparative analysis of Drop-Seq and sNucDrop-Seq reveals that mitochondria-derived RNAs (e.g. mt-Nd1, mt-Nd2) and nucleus-enriched long-noncoding RNAs (e.g. Malat1) were enriched in cytoplasmic and nuclear compartments, respectively (Supplementary Fig. 3b). Thus, integrating sucrose gradient centrifugation-based nuclei purification with the current Drop-Seq microfluidics device and workflow may support massively parallel single-nucleus RNA-Seq.
To demonstrate the utility of sNucDrop-Seq in studying complex adult tissues, we analyzed nuclei isolated from adult mouse cerebral cortex. The average expression profiles of single nuclei from two biologically independent replicates were well correlated (r=0.993; Supplementary Fig. 3c). Out of reads uniquely mapped to the genome (78.0% of all reads), 76.3% of reads were aligned to the expected strand of genic regions (25.3% exonic and 51.0% intronic), and the remaining 23.7% to intergenic regions or to the opposite strand of annotated genic regions. The relatively high proportion of intronic reads is similar to previous single-nucleus RNA-Seq study of human cortex (~48.7%) 5, reflecting the enrichment of nascent, pre-processed transcripts in the nucleus. Because most exonic (91.4%) and intronic (86.0%) reads were mapped to the expected strand of annotated transcripts, we retained both exonic and intronic reads for downstream analyses. After quality filtering, we retained 10,996 nuclei (~20,000 uniquely mapped reads per nucleus) from 13 animals, detecting, on average, 4,273 transcripts (unique molecular identifiers [UMIs]), and 1,831 genes per nucleus (Fig. 1b). After correcting for batch effects, we identified highly variable genes, and determined significant principal components (PC) with these variable genes. We then performed graph-based clustering and visualized distinct groups of cells using non-linear dimensionality reduction with spectral t-distributed stochastic neighbor embedding (tSNE) (Methods). This initial analysis segregated nuclei into 19 distinct clusters (Fig. 1c). Each cluster contains nuclei from multiple animals, indicating the transcriptional identities of these cell-type-specific clusters are reproducible across biological replicates (Supplementary Fig. S4a).
On the basis of known markers for major cell types, we identified 10 excitatory neuronal clusters (Ex 1-10; Slc17a7+), four inhibitory neuronal clusters (Inh 1-4; Gad1+), and five non-neuronal clusters (astrocytes [Astro; Gja1+], oligodendrocyte precursor cells [OPC; Pdgfra+], oligodendrocytes [oligo; Mog+], microglia [MG; Ctss+], and endothelial cells [EC; Flt1+]) (Fig. 1c-d and Supplementary Fig. 4b). We readily uncovered all major subtypes of GABAergic inhibitory neurons expressing known canonical markers: Sst (somatostatin; cluster Inh1), Pvalb (parvalbumin; cluster Inh2), Vip (vasoactive intestinal peptide; cluster Inh3) and Ndnf (neuron-derived neurotrophic factor; cluster Inh4) (Supplementary Fig. 5a). For glutamatergic excitatory neurons, hierarchical clustering grouped the ten clusters into two major groups (Fig. 1e), largely corresponding to their cortical layer positions, from superficial (cluster Ex1-5: L2/3 and L4) to deep (cluster Ex6-10: L5a/b and L6a/b) layers (Fig. 1d and Supplementary Fig. 5). Consistent with previous studies 5, 13, 14, we readily annotated anatomical location of each excitatory neuronal cluster post-hoc by its expression of known layer-specific marker genes (Supplementary Fig. 6a-b). In addition to protein-coding marker genes, we have also identified a list of long non-coding RNAs that are specifically expressed in distinct cell clusters (Fig. 1e and Supplementary Fig. 5b). For instance, 1700016P03Rik is specifically detected in cluster Ex5, and this non-coding transcript acts mainly as a primary transcript encoding two neuronal activity-regulated microRNAs (Mir212 and Mir132) 15, 16 (Supplementary Fig. 7), which is consistent with the enrichment of other activity-dependent genes (Fos, Arc, Npas4) in this excitatory neuronal cluster (Supplementary Fig. 6a), and raises the possibility that Ex5 is enriched of activated neurons (see below). The identification of both protein-coding and non-coding transcripts as cell-type-specific markers highlights the potential of sNucDrop-Seq in exploring the emerging role of non-coding RNAs at single-cell resolution in vivo.
Cortical interneurons are highly diverse in terms of morphology, connectivity and physiological properties 17. To further annotate these inhibitory neuronal subtypes, we performed sub-clustering on the 876 inhibitory neuronal nuclei in our dataset, identifying eight sub-clusters (cluster A-H in Fig. 2a). Unlike previous single-cell RNA-Seq analysis that employed pre-enrichment of cortical inhibitory neurons from transgenic mouse lines 18, sNucDrop-Seq samples the nuclei in proportion to cells’ abundance in their native environment, which provides a more accurate description of the cellular composition at the transcriptomic level. This analysis identified Pvalb-expressing subtypes (cluster D and E; n=359/876 nuclei, 41.0%) and Sst-expressing subtypes (cluster F, G, H; n=304/876 nuclei, 34.7%) as two major groups of cortical interneurons (Fig. 2b-d), in accordance with previous observations derived from in situ hybridization (ISH)- or immunostaining-based methods that Pvalb- and Sst-positive groups account for ~40% and ~30% of interneurons, respectively, in the neocortex 19. Beyond the major interneuron subtypes, we identified one Ndnf-expressing subtype (cluster A; n=84/876 nuclei), one Vip-expressing subtype (cluster B; n=74/876 nuclei), and one synuclein gamma (Sncg)-expressing subtype (cluster C; n=55/876 nuclei) (Fig. 2b-d and Supplementary Fig. 8a). On the basis of combinatorial expression of known marker genes associated with specific cortical layer and developmental origin, interneuron subtypes identified by sNucDrop-Seq parallel those identified from previous studies of mouse or human cortex 5, 18, revealing inhibitory neuronal heterogeneity in both cortical layer distribution (Supplementary Fig. 8a-b) and the developmental origin from subcortical regions of the medial or caudal ganglionic eminences (MGE or CGE) (Fig. 2e). Therefore, sNucDrop-Seq is able to resolve cellular heterogeneity and quantify cell-type composition at transcriptomic level with high sensitivity, including rare interneuron subtypes.
For glutamatergic neurons, unsupervised graph-based sub-clustering of two groups of excitatory neurons (upper layers versus lower layers) identified a total of 18 subtypes (Upper Ex 1-11 and Lower Ex 1-7; Fig. 3a). We associated each excitatory neuronal sub-cluster with a distinct combination of known markers indicative of their superficial-to-deep layer distribution (Supplementary Fig. 9a), capturing finer distinctions between closely related subtypes in each cortical layer, which is in high concordance with subtypes previously identified in human 5 and mouse 14, 18 cortices (Fig. 3b and Supplementary Fig. 9b). Beyond excitatory neuronal subtypes defined by cortical layer-specific markers, our analysis also resolved heterogeneity in neuronal activation states. In response to an activity-inducing experience, cortical excitatory neurons express a complex program of activity-dependent genes 20. Both upper-Ex3 (n=209; 3.1% of 6,770 nuclei in upper layer sub-clusters) and lower-Ex5 (n=213; 8.1% of 2,642 nuclei in lower layer sub-clusters) neurons are specifically associated with high-level expression of activity-dependent genes (Fig. 3b and Supplementary Fig. 9c), including immediately early genes (IEGs) such as Fos, Arc, and Egr1 as well as other activity-regulated transcription factors (e.g. Npas4), genes encoding proteins that function at synapses (e.g. Homer1), and non-coding RNAs (e.g. 1700016P03Rik that encodes Mir132). We determined the genes specifically enriched in upper-Ex3 (n=160 genes, as compared to other upper-Ex sub-clusters) or lower-Ex5 (n=134 genes, as compared to other lower-Ex sub-clusters) neurons (Fig. 3c). Transcriptional signatures identified in these two sub-populations are enriched for genes involved in the MAPK signaling pathway (e.g. Dusp1; adjusted P=2.67x10-2 for upper-Ex3 sub-cluster), as previously reported in low-throughput single-nucleus RNA-Seq analysis of Fos-positive nuclei isolated from the hippocampus of adult mice exposed to a novel environment 2. Together, these results demonstrate the utility of sNucDrop-Seq in the identification of transient transcriptional states, such as neuronal activation.
In conclusion, sNucDrop-Seq is a robust approach for massively parallel analysis of nuclear RNAs at single-cell resolution. Because intact nuclei isolation can potentially be accomplished by mechanical douncing and sucrose gradient ultracentrifugation in almost any primary tissue, including frozen archived human tissues, sNucDrop-Seq and similar approaches pave the way to systematically identify cell-types, reveal subtype composition, and dissect transient functional states such as activity-dependent transcription in complex mammalian tissues.
AUTHOR CONTRIBUTIONS
H.W. and Z.Z. conceived the project. H.W., P.H. and E.F. performed experiments and carried out data analysis. H.W. wrote the manuscript.
Material and Methods
Cell cultures and animals
Mouse NIH3T3 cells were purchased from ATCC (cat# CRL-1658) and were grown in Dulbecco’s Modified Eagle’s Medium (DMEM) (Life Technologies, cat# 11965084) supplemented with 10% fetal bovine serum (FBS) (Life Technologies, cat# 26140079) and 2 mM L-glutamine (Life Technologies, cat# 25030081) at 37°C in 5% CO2. The culture was passaged every 2-3 days using 0.05% Trypsin (Life Technologies, cat# 25300054). Human embryonic stem cells (H7) have been procured from WiCell (Madison, WI) and maintained at 37°C in 5% CO2 on growth factor-reduced Matrigel matrix (Corning, cat# 354230) coated six-well tissue culture plates. The six-well plates were coated with diluted (1:30) Matrigel matrix in DMEM/F12 (Life Technologies, cat# 11320033). H7 cells (between passage 40 and 70) were maintained in TeSR-E8 medium (Stem Cell Technologies, cat# 05940) and passaged every 5-6 days as small aggregates using an enzymatic digestion-free method (0.5 mM ETDA in D-PBS without CaCl2 and MgCl2).
Animal experiments were conducted in accordance with the ethical guidelines of the US National Institutes of Health and with the approval of the Institutional Animal Care and Use Committee of the University of Pennsylvania. All of the experiments described were performed using mice of C57BL/6 background.
sNucDrop-Seq
Isolation and purification of nuclei
Mouse brains (postnatal 6 weeks) were rapidly resected on ice. Frozen cortices were flash frozen in liquid nitrogen for 2 minutes and subsequently kept at -80°C for 2 hours before nuclear isolation. 14 mL of sucrose cushion (1.8 M sucrose (CAS# 57-50-1, RNase & DNase free, ultra pure grade), 10 mM Tris-HCl pH 8.0 (Invitrogen, cat# 15568-025), 3 mM MgAc2 (CAS# 16674-78-5), protease inhibitor cocktail (Roche, cat# 11873580001)) was added to the bottom of centrifuge tubes (Beckman Coulter, cat# 326823). Using a glass homogenizer (Wheaton, cat# 357544), a freshly isolated or frozen mouse cortex sample was subjected to dounce homogenization (21 times with loose pestle followed by 7 times with tight pestle) in 12 mL of homogenization buffer (0.32M sucrose, 5 mM CaCl2 (CAS# 10043-52-4), 3mM MgAc2, 10 mM Tris-HCl pH 8.0, 0.1% Triton X-100 (CAS# 9002-93-1), 0.1 mM EDTA (Invitrogen, cat# 15575-020), protease inhibitor cocktail). For in vitro cultured cells, cell pellets (~5 million cells) were resuspended in homogenization buffer and dounced for 30 times with a loose pestle. Homogenates (~12 mL) were layered onto the sucrose cushion in the centrifuge tubes, and 10 mL of homogenization buffer was added atop of the homogenates. The tubes were then centrifuged in a Beckman Coulter L7-65 Ultracentrifuge at 25,000 rpm at 4°C for 2.5 hours using a Beckman Coulter SW28 swinging bucket rotor (Beckman Coulter, cat# 342207). The supernatant was carefully removed via aspiration. 1 mL of chilled DPBS with protease and RNase inhibitor (Lucigen, cat# 30281-2) was added to resuspend the nuclear pellet, and nuclei were subsequently transferred to a 1.5-mL tube. Nuclei were pelleted at 5,000 rpm for 10 min at 4°C, and then resuspended in 0.01% BSA (Sigma-Aldrich, cat# A8806-5G) in DPBS (Invitrogen, cat# 14190136). After resuspension, nuclei were filtered through a 40-μm cell strainer (Fisher Scientific, cat# 002087711), visually inspected for morphology and quality assurance, and counted using a Fuchs-Rosenthal counting chamber before droplet microfluidic encapsulation. For mouse cortices, we obtain 3.45 ± 2.00 x106 nuclei, per round of isolation (based on 11 measurements). The nuclear isolation efficiency for in vitro cultured cells is ~84% (number of nuclei/number of input cells x 100).
Library preparation and sequencing
The nuclear suspension was diluted to a concentration of 100 nuclei/μL in DPBS containing 0.01% BSA. Approximately 1.25 mL of this single-nucleus suspension was loaded for each sNucDrop-Seq run. The single-nucleus suspension was then co-encapsulated with barcoded beads (ChemGenes, cat# MACOSKO-2011) using an Aquapel-coated PDMS microfluidic device (uFluidix), connected to syringe pumps (KD Scientific) via polyethylene tubing with an inner diameter of 0.38 mm (Scientific Commodities, cat# BB31695-PE/2). Barcoded beads were resuspended in lysis buffer (200 mM Tris-HCl pH8.0, 20 mM EDTA, 6% Ficoll PM-400 (GE Healthcare/Fisher Scientific, 45-001-745), 0.2% Sarkosyl (Sigma-Aldrich, cat# L7414-50mL), and 50 mM DTT (Fermentas, cat# R0862; freshly made on the day of run)) at a concentration of 120 beads/μL. The flow rates for cells and beads were set to 4,000 μL/hour, while QX200 droplet generation oil (Bio-rad, cat# 186-4006) was run at 15,000 μL/hour. A typical run lasts ~20 min. Droplet breakage with Perfluoro-1-octanol (Sigma-Aldrich, cat# 370533-25G), reverse transcription and exonuclease I treatment were performed, as previously described, with minor modifications 7. Specifically, up to 120,000 beads, 200 μL of reverse transcription (RT) mix (1x Maxima RT buffer, 4% Ficoll PM-400, 1 mM dNTPs (Clontech, cat# 639125), 1 U/μL RNase inhibitor (Lucigen, cat# 30281-2), 2.5 μM Template Switch Oligo (TSO; AAGCAGTGGTATCAACGCAGAGTGAATrGrGrG) 7, and 10 U/μL Maxima H Minus Reverse Transcriptase (Life Technologies, cat# EP0753)) were added. The RT reaction was incubated at room temperature for 30 minutes, followed by incubation at 42°C for 150 minutes. To determine an optimal number of PCR cycles for amplification of cDNA, an aliquot of 6,000 beads (corresponding to ~100 nuclei) was amplified by PCR in a volume of 50 μL (25 μL of 2x KAPA HiFi hotstart readymix (KAPA biosystems, cat# KK2602), 0.4 μL of 100 μM TSO-PCR primer (AAGCAGTGGTATCAACGCAGAGT) 7, 24.6 μL of nuclease-free water) with the following thermal cycling parameter (95°C for 3 min; 4 cycles of 98°C for 20 sec, 65°C for 45 sec, 72°C for 3 min; 9 cycles of 98°C for 20 sec, 67°C for 45 sec, 72°C for 3 min; 72°C for 5 min, hold at 4°C). After two rounds of purification with 0.6x SPRISelect beads (Beckman Coulter, cat# B23318), amplified cDNA was eluted with 10 μL of water. 10% of amplified cDNA was used to perform real-time PCR analysis (1 μL of purified cDNA, 0.2 μL of 25 μM TSO-PCR primer, 5 μL of 2x KAPA FAST qPCR readymix, and 3.8 μL of water) to determine the additional number of PCR cycles needed for optimal cDNA amplification (Applied Biosystems QuantStudio 7 Flex). We then prepared PCR reactions per total number of barcoded beads collected for each sNucDrop-Seq run, adding 6,000 beads per PCR tube, and ran the aforementioned program to enrich the cDNA for 4 + 10 to 12 cycles. We then tagmented cDNA using the Nextera XT DNA sample preparation kit (Illumina, cat# FC-131-1096), starting with 600 pg of cDNA pooled in equal amounts, from all PCR reactions for a given run. Following cDNA tagmentation, we further amplified the library with 12 enrichment cycles using the Illumina Nextera XT i7 primers along with the P5-TSO hybrid primer (AATGATACGGCGACCACCGAGATCTACACGCCTGTCCGCGGAAGCAGTGGTATCAACGCAGAGT*A*C) 7. After quality control analysis using a Bioanalyzer (Agilent), libraries were sequenced on an Illumina NextSeq 500 instrument using the 75-cycle High Output v2 Kit (Illumina cat# FC-404-2005). We loaded the library at 1.9 pM and provided Custom Read1 Primer (GCCTGTCCGCGGAAGCAGTGGTATCAACGCAG AGTAC) at 0.3 μM in position 7 of the reagent cartridge. The sequencing configuration was 20 bp (Read1), 8 bp (Index1), and 50 or 60 bp (Read2). In total, 13 mouse cortex samples were analyzed with sNucDrop-Seq in four sequencing runs.
Single-cell RNA-Seq library preparation using Drop-Seq and 10x Genomics platforms
Drop-Seq was performed as previously described, with default settings7. The cell suspension was diluted to 100 cells/μL with DPBS containing 0.01% BSA and 1 mL cell suspension was loaded for each Drop-seq run. After cell capture, reverse transcription, exonuclease treatment, cDNA amplification and tagmentation, libraries were diluted to 1.9 pmol, and equal amounts of distinctively indexed libraries were mixed and subjected to 75 cycles of paired-end sequencing on Illumina NextSeq 500 sequencer. 10x Genomics single-cell 3’ libraries were constructed, as previously described 9, with recommended settings using Chromium single cell 3’ v2 reagent kits.
Data analysis of sNucDrop-Seq
Preprocessing of sNucDrop-Seq data
Paired-end sequencing reads of Single-nucleus RNA-seq were processed using publicly available the Drop-Seq Tools v1.12 software 7 with some modifications. Briefly, each mRNA read (read2) was tagged with the cell barcode (bases 1 to 12 of read 1) and unique molecular identifier (UMI, bases 13 to 20 of read 1), trimmed of sequencing adaptors and poly-A sequences, and aligned using STAR v 2.5.2a to the mouse (mm10, Gencode release vM13) or a concatenation of the mouse and human (for the species-mixing experiment) reference genome assembly. Because a substantial proportion (~50%) of reads derived from nuclear transcriptomes of mouse cortices were mapped to the intronic regions, the intronic reads were retained for downstream analysis. A custom Perl script was implemented in the Drop-Seq Tools pipeline to retrieve both exonic and intronic reads mapped to predicted strands of annotated genes. Uniquely mapped reads were grouped by cell barcodes. Cell barcodes were corrected for possible bead synthesis errors, using the DetectBeadSynthesisErrors program from the Drop-Seq Tools v1.12 software. To generate digital expression matrix, a list of UMIs in each gene (as rows), within each cell (as columns), was assembled, and UMIs within ED = 1 were merged together. The total number of unique UMI sequences was counted, and this number was reported as the number of transcripts of that gene for a given cell.
Cell clustering and marker gene identification
Raw digital expression matrices were combined and loaded into the R package Seurat. For normalization, UMI counts for all nuclei were scaled by library size (total UMI counts), multiplied by 10,000 and transformed to log space. Only genes found to be expressing in >10 cells were retained. Nuclei with a relatively high percentage of UMIs mapped to mitochondrial genes (>=0.1) were discarded. Moreover, nuclei with fewer than 800 or more than 6,000 detected genes were omitted, resulting in 11,471 nuclei that pass filter.
Before clustering, batch effects from multiple animals were regressed out using the function RegressOut in R package Seurat. The highly variable genes were identified using the function MeanVarPlot with the parameters: x.low.cutoff = 0.0125, x.high.cutoff = 3 and y.cutoff = 0.8, resulting in an output of 2,178 highly variable genes. The expression level of highly variable genes in the nuclei was scaled and centered along each gene, and was conducted to principal component analysis. We then used two methods to assess the number of PCs to be utilized in downstream analysis: 1) The cumulative standard deviations accounted for by each PC were plotted using the function PCElbowPlot in Seurat to identify the ‘knee’ point at a PC number after which successive PCs explain diminishing degrees of variance, and 2) the significance for each gene’s association with each PC was accessed by the function JackStraw in Seurat. Based on these two methods, we selected first 20 PCs for two-dimensional t-distributed stochastic neighbor embedding (tSNE), implemented by the Seurat software with the default parameters. Based on the tSNE map, twenty-one clusters were identified using the function FindCluster in Seurat with the resolution parameter set to 1.0. Clusters that co-express both non-neuronal and neuron markers, representing cell doublets, were removed. As a result, we were able to assign 11,283 nuclei (98.4% of our data) into 19 cell type clusters.
To identify the marker genes, differential expression analysis was performed by the function FindAllMarkers in Seurat with likelihood-ratio test. Differentially expressed genes that were expressed at least in 30% cells within the cluster and with a fold change more than 0.25 (log scale) were considered to be marker genes. In total, 2,399 protein-coding genes and 127 long non-coding RNAs were identified for 19 clusters. For the marker genes, average gene expression for each cluster was determined, and Euclidean distances between all pairs was calculated. This dataset was used as input for complete linkage hierarchical clustering and dendrogram assembly. To generate a heatmap of marker genes across clusters, the average expression level of marker genes within each cluster were calculated. For each cluster, the average expression was centered and scaled by each gene. Next, the heatmap.2 function in the R gplots package was used to generate the heatmap, and the parameter “dendrogram” was set to "column" to show the cluster dendrogram.
Sub-clustering
Excitatory neuronal nuclei from upper layers (cluster Ex1, Ex2, Ex3, Ex4, and Ex5) and deep layers (cluster Ex6, Ex7, Ex8, Ex9, and Ex10) were first combined, then sub-clustered using the same strategy described above, respectively. To correct for potential over-clustering, we merged the clusters showing high similarity. Briefly, pairwise comparison was conducted to identify the differentially expressed genes using the function FindAllMarkers in Seurat, with likelihood-ratio test. We merged the clusters showing <5 genes with an average expression difference greater than 2-fold between clusters. Specifically, we identified 2,218 highly variable genes associated with 12 PCs in 6,770 upper layer nuclei, which were further assigned into 11 sub-clusters, while 2,642 lower layer nuclei, containing 1,969 highly variable genes along with 9 PCs, were sub-clustered into 7 groups. After filtering out 138 unassigned nuclei, we grouped 9,412 nuclei (98.6 % of excitatory nuclei) into 17 layer-specific sub-clusters. For a comparison between active and inactive neurons, differentially expressed genes were identified using a likelihood ratio test with the p-value threshold set to 0.001.
For inhibitory neurons, nuclei from clusters Inh1, Inh2, Inh3 and Inh4 were combined and loaded into Seurat for sub-clustering. We identified 2,337 highly variable genes in 1,025 nuclei that were subjected to PCA analysis. First 10 PCs were selected for tSNE analysis and sub-clustering. After filtering out 149 unassigned nuclei, we identified 8 clusters, made up of 876 nuclei.
GO enrichment analysis
To identify functional categories associated with genes enriched in activated neurons, the GO annotations were downloaded from the Ensembl Biomart database and KEGG annotations were retrieved by KEGG API. An enrichment analysis was performed via a hypergeometric test. The p-value was calculated using the following formula: where N is the total number of genes, n is the total number of marker genes, M is the number of genes annotated to a certain GO term or KEGG pathway, and i is the number of marker genes annotated to a certain GO term or KEGG pathway. The p-value was corrected by function p.adjust with a false discovery rate (FDR) correction in R. GO terms or KEGG pathways with a FDR below 0.05 were considered enriched. All statistical calculations were performed in R.
ACKNOWLEDGEMENTS
Z.Z is supported by NIH grant R56MH111719. H.W. is supported by the National Human Genome Research Institute (R00HG007982).
References
References
- 1.
- 2.