Abstract
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues, such as adult mammalian brains, is challenging. Here, we integrate sucrose-gradient assisted nuclear purification with droplet microfluidics to develop a highly scalable single-nucleus RNA-Seq approach (sNucDrop-Seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼11,000 nuclei isolated from adult mouse cerebral cortex, we demonstrate that sNucDrop-Seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity, but also enables analysis of long non-coding RNAs and transient states such as neuronal activity-dependent transcription at single-cell resolution in vivo.
A fundamental challenge in deciphering cellular composition and cells’ functional states in complex mammalian tissues lies in the extraordinary diversity of cell morphology, size and local microenvironment. While current high-throughput single-cell RNA-Seq approaches have proved to be powerful tools for interrogating cell types, dynamic states and functional processes in vivo 1, these methods require the preparation of intact, single-cell suspensions from freshly isolated tissues, which is only practical for easily dissociated embryonic and young postnatal tissues. This requirement poses an even greater challenge for cells with complex morphology such as mature neurons. Harsh enzymatic treatment not only favors recovery of easily dissociated cell types, but also introduces aberrant transcriptional changes during the dissociation process 2. In addition, skeletal and cardiac muscle cells are frequently multinucleated and large. For instance, each adult mouse skeletal muscle cell contains hundreds of nuclei and is ∼5,000 μm in length and 10-50 μm in width 3. Thus, existing high-throughput single-cell capture and library preparation methods, including isolation of cells by fluorescence-activated cell sorting (FACS) into multi-well plates, sub-nanoliter wells, or droplet microfluidic encapsulation, are not optimized to accommodate these unusually large cells. Isolating individual nuclei for transcriptome analysis is a promising strategy, as single-nucleus RNA-Seq methods avoid strong biases against cells of complex morphology and large size 2, 4–6, and can potentially be standardized to accommodate the study of various tissues. However, current single-nucleus RNA-Seq methods rely on fluorescence-activated nuclei sorting (FANS) 4, 5 or Fluidigm C1 6 to capture nuclei, and thus cannot easily be scaled up to generate a comprehensive atlas of cell types in a given tissue, much less a whole organism.
An ideal solution to increase the throughput of single-nucleus RNA-Seq is to integrate nucleus purification with massively parallel single-cell RNA-Seq methods such as Drop-Seq 7, InDrop 8, or equivalent commercial platforms (e.g. 10x Genomics 9). However, single-nucleus RNA-Seq is currently not supported on these droplet microfluidics platforms. Inhibitory effects due to cellular debris contamination and/or inefficient lysis of nuclear membranes might contribute to this failure. Historically, nuclei of high purity can be isolated from solid tissues or from cell lines with fragile nuclei by centrifugation through a dense sucrose cushion to protect nucleus integrity and strip away cytoplasmic contaminants. The sucrose gradient ultracentrifugation approach has been adapted to isolate neuronal nuclei for profiling histone modifications 10, nuclear RNA 11, and DNA methylation 11, 12 at genome scale. To test whether this nuclei purification method supports single-nucleus RNA-Seq analysis, we isolated nuclei from cultured cells, as well as from freshly isolated or frozen adult mouse brain tissues, through Dounce homogenization followed by sucrose gradient ultracentrifugation (Fig. 1a and Supplementary Fig. 1). After quality assessment and nuclei counting, we performed emulsion droplet barcoding of the nuclei and library preparation with both Drop-Seq and 10x Genomics platforms. While the 10x Genomics single-cell 3’ solution workflow supports cDNA amplification only from whole cells (possibly due to inefficient lysis of the nuclear membrane), the Drop-Seq platform yielded high-quality cDNA and sequencing libraries from both whole cells and nuclei (freshly isolated or frozen samples) (Supplementary Fig. 2). These results suggest that nucleus purification and nuclear membrane lysis are critical factors for efficient library preparation in single-nucleus RNA-Seq.
We next validated the specificity of sucrose gradient-assisted single-nucleus Drop-Seq (sNucDrop-Seq) with species-mixing experiments, using nuclei isolated from in vitro cultured mouse and human cells. This analysis indicates that the rate of co-encapsulation of multiple nuclei per droplet (∼2.6%) is comparable to standard Drop-Seq (Supplementary Fig. 3a). To assess the sensitivity of sNucDrop-Seq, we performed shallow sequencing of cultured mouse 3T3 cells at either single-cell (with Drop-Seq: detecting on average 3,325 genes with ∼25,000 reads per cell for 1,160 cells with >800 genes detected) or single-nucleus (with sNucDrop-Seq: detecting on average 2,665 genes with ∼23,000 reads per nucleus for 1,984 nuclei with >800 genes detected) resolution (Fig. 1b). With standard Drop-Seq microfluidics devices and flow parameters, the throughput of sNucDrop-Seq (1.9%, 1,829 / 95,000 barcoded beads) is comparable to that of Drop-Seq (1.5%, 1,160 / 77,000 barcoded beads). Comparative analysis of Drop-Seq and sNucDrop-Seq reveals that mitochondria-derived RNAs (e.g. mt-Nd1, mt-Nd2) and nucleus-enriched long non-coding RNAs (e.g. Malat1) were enriched in cytoplasmic and nuclear compartments, respectively (Supplementary Fig. 3b). Thus, integrating sucrose gradient centrifugation-based nuclei purification with the current Drop-Seq microfluidics device and workflow may support massively parallel single-nucleus RNA-Seq.
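As a rough illustration of how a species-mixing experiment yields a co-encapsulation estimate, the sketch below labels each barcode by the share of its transcripts assigned to mouse or human and reports the mixed fraction. It also recomputes the bead-capture percentages from the bead counts quoted above. The function names, the 90% purity cutoff and the toy counts are assumptions made for illustration; they are not the thresholds or data used in this study.

```python
from collections import Counter


def classify_barcode(mouse_umis, human_umis, purity=0.9):
    """Label one barcode by the species contributing most of its transcripts.

    `purity` is a hypothetical cutoff: a barcode is called single-species
    only if at least this fraction of its transcripts map to one genome.
    """
    total = mouse_umis + human_umis
    if total == 0:
        return "empty"
    if mouse_umis / total >= purity:
        return "mouse"
    if human_umis / total >= purity:
        return "human"
    return "mixed"


def mixed_rate(barcodes, purity=0.9):
    """Fraction of non-empty barcodes that carry transcripts from both species."""
    labels = Counter(classify_barcode(m, h, purity) for m, h in barcodes)
    called = labels["mouse"] + labels["human"] + labels["mixed"]
    return labels["mixed"] / called if called else 0.0


def capture_rate(n_profiles, n_beads):
    """Profiles recovered per barcoded bead."""
    return n_profiles / n_beads


# Toy (mouse, human) transcript counts for five barcodes; invented values.
toy = [(950, 12), (8, 1200), (400, 380), (1500, 20), (3, 900)]
print(mixed_rate(toy))

# Bead-capture percentages from the counts quoted in the text.
print(round(100 * capture_rate(1829, 95000), 1))  # sNucDrop-Seq
print(round(100 * capture_rate(1160, 77000), 1))  # Drop-Seq
```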
To demonstrate the utility of sNucDrop-Seq in studying complex adult tissues, we analyzed nuclei isolated from adult mouse cerebral cortex. The average expression profiles of single nuclei from two biologically independent replicates were well correlated (r=0.993; Supplementary Fig. 3c). Out of reads uniquely mapped to the genome (78.0% of all reads), 76.3% were aligned to the expected strand of genic regions (25.3% exonic and 51.0% intronic), and the remaining 23.7% to intergenic regions or to the opposite strand of annotated genic regions. The relatively high proportion of intronic reads is similar to that reported in a previous single-nucleus RNA-Seq study of human cortex (∼48.7%) 5, reflecting the enrichment of nascent, pre-processed transcripts in the nucleus. Because most exonic (91.4%) and intronic (86.0%) reads were mapped to the expected strand of annotated transcripts, we retained both exonic and intronic reads for downstream analyses. After quality filtering, we retained 10,996 nuclei (∼20,000 uniquely mapped reads per nucleus) from 13 animals, detecting, on average, 4,273 transcripts (unique molecular identifiers [UMIs]) and 1,831 genes per nucleus (Fig. 1b). After correcting for batch effects, we identified highly variable genes, and determined significant principal components (PCs) with these variable genes. We then performed graph-based clustering and visualized distinct groups of cells using non-linear dimensionality reduction with spectral t-distributed stochastic neighbor embedding (tSNE) (Methods). This initial analysis segregated nuclei into 19 distinct clusters (Fig. 1c). Each cluster contains nuclei from multiple animals, indicating that the transcriptional identities of these cell-type-specific clusters are reproducible across biological replicates (Supplementary Fig. 4a).
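The per-nucleus transcript and gene counts reported above come from collapsing reads that share a cell barcode, gene and UMI into one molecule. A minimal sketch of that counting step follows, assuming reads have already been aligned and tagged. The record layout, the function names and the `min_genes` threshold are hypothetical. UMIs are collapsed by exact match only, whereas production pipelines typically also merge UMIs that differ by sequencing errors.

```python
from collections import defaultdict


def per_nucleus_counts(records):
    """Summarize transcripts (distinct UMIs) and genes detected per barcode.

    `records` is an iterable of (cell_barcode, gene, umi) tuples, one per
    uniquely mapped read; this layout is an assumption for illustration.
    """
    molecules = defaultdict(set)
    for barcode, gene, umi in records:
        # Reads with the same gene and UMI count as a single molecule.
        molecules[barcode].add((gene, umi))

    summary = {}
    for barcode, mols in molecules.items():
        genes = {gene for gene, _ in mols}
        summary[barcode] = {"transcripts": len(mols), "genes": len(genes)}
    return summary


def passes_filter(stats, min_genes):
    """Keep a nucleus only if enough genes were detected (hypothetical cutoff)."""
    return stats["genes"] >= min_genes


# Toy reads: the second record is a PCR duplicate of the first.
reads = [
    ("AAAC", "Slc17a7", "TTGCA"),
    ("AAAC", "Slc17a7", "TTGCA"),
    ("AAAC", "Slc17a7", "GGATC"),
    ("AAAC", "Malat1", "CCTAG"),
    ("GGTT", "Gad1", "ACGTA"),
]
summary = per_nucleus_counts(reads)
kept = [bc for bc, stats in summary.items() if passes_filter(stats, min_genes=2)]
print(summary)
print(kept)
```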
On the basis of known markers for major cell types, we identified 10 excitatory neuronal clusters (Ex 1-10; Slc17a7+), four inhibitory neuronal clusters (Inh 1-4; Gad1+), and five non-neuronal clusters (astrocytes [Astro; Gja1+], oligodendrocyte precursor cells [OPC; Pdgfra+], oligodendrocytes [oligo; Mog+], microglia [MG; Ctss+], and endothelial cells [EC; Flt1+]) (Fig. 1c-d and Supplementary Fig. 4b). We readily uncovered all major subtypes of GABAergic inhibitory neurons expressing known canonical markers: Sst (somatostatin; cluster Inh1), Pvalb (parvalbumin; cluster Inh2), Vip (vasoactive intestinal peptide; cluster Inh3) and Ndnf (neuron-derived neurotrophic factor; cluster Inh4) (Supplementary Fig. 5a). For glutamatergic excitatory neurons, hierarchical clustering grouped the ten clusters into two major groups (Fig. 1e), largely corresponding to their cortical layer positions, from superficial (cluster Ex1-5: L2/3 and L4) to deep (cluster Ex6-10: L5a/b and L6a/b) layers (Fig. 1d and Supplementary Fig. 5). Consistent with previous studies 5, 13, 14, we readily annotated the anatomical location of each excitatory neuronal cluster post hoc by its expression of known layer-specific marker genes (Supplementary Fig. 6a-b). In addition to protein-coding marker genes, we also identified long non-coding RNAs that are specifically expressed in distinct cell clusters (Fig. 1e and Supplementary Fig. 5b). For instance, 1700016P03Rik is specifically detected in cluster Ex5, and this non-coding transcript acts mainly as a primary transcript encoding two neuronal activity-regulated microRNAs (Mir212 and Mir132) 15, 16 (Supplementary Fig. 7), which is consistent with the enrichment of other activity-dependent genes (Fos, Arc, Npas4) in this excitatory neuronal cluster (Supplementary Fig. 6a), and raises the possibility that Ex5 is enriched for activated neurons (see below).
The identification of both protein-coding and noncoding transcripts as cell-type-specific markers highlights the potential of sNucDrop-Seq in exploring the emerging role of non-coding RNAs at single-cell resolution in vivo.
Cortical interneurons are highly diverse in terms of morphology, connectivity and physiological properties 17. To further annotate these inhibitory neuronal subtypes, we performed sub-clustering on the 876 inhibitory neuronal nuclei in our dataset, identifying eight sub-clusters (cluster A-H in Fig. 2a). Unlike previous single-cell RNA-Seq analysis that employed pre-enrichment of cortical inhibitory neurons from transgenic mouse lines 18, sNucDrop-Seq samples the nuclei in proportion to cells’ abundance in their native environment, which provides a more accurate description of the cellular composition at the transcriptomic level. This analysis identified Pvalb-expressing subtypes (cluster D and E; n=359/876 nuclei, 41.0%) and Sst-expressing subtypes (cluster F, G, H; n=304/876 nuclei, 34.7%) as two major groups of cortical interneurons (Fig. 2b-d), in accordance with previous observations derived from in situ hybridization (ISH)- or immunostaining-based methods that Pvalb- and Sst-positive groups account for ∼40% and ∼30% of interneurons, respectively, in the neocortex 19. Beyond the major interneuron subtypes, we identified one Ndnf-expressing subtype (cluster A; n=84/876 nuclei), one Vip-expressing subtype (cluster B; n=74/876 nuclei), and one synuclein gamma (Sncg)-expressing subtype (cluster C; n=55/876 nuclei) (Fig. 2b-d and Supplementary Fig. 8a). On the basis of combinatorial expression of known marker genes associated with specific cortical layers and developmental origins, interneuron subtypes identified by sNucDrop-Seq parallel those identified in previous studies of mouse or human cortex 5, 18, revealing inhibitory neuronal heterogeneity in both cortical layer distribution (Supplementary Fig. 8a-b) and developmental origin from subcortical regions of the medial or caudal ganglionic eminences (MGE or CGE) (Fig. 2e).
Therefore, sNucDrop-Seq is able to resolve cellular heterogeneity and quantify cell-type composition at transcriptomic level with high sensitivity, including rare interneuron subtypes.
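The subtype proportions quoted above follow directly from the per-cluster nucleus counts. The short sketch below recomputes them; the grouping labels and the helper name `composition` are illustrative choices, while the counts are those given in the text.

```python
def composition(counts):
    """Return each group's share of the total, as a percentage to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Nuclei per interneuron group, as reported for the 876 inhibitory nuclei.
interneuron_counts = {
    "A (Ndnf)": 84,
    "B (Vip)": 74,
    "C (Sncg)": 55,
    "D-E (Pvalb)": 359,
    "F-H (Sst)": 304,
}

shares = composition(interneuron_counts)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Because nuclei are sampled without pre-enrichment, these shares can be read as estimates of relative abundance, which is what allows the comparison with ISH- and immunostaining-based figures.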
For glutamatergic neurons, unsupervised graph-based sub-clustering of two groups of excitatory neurons (upper layers versus lower layers) identified a total of 18 subtypes (Upper Ex 1-11 and Lower Ex 1-7; Fig. 3a). We associated each excitatory neuronal sub-cluster with a distinct combination of known markers indicative of their superficial-to-deep layer distribution (Supplementary Fig. 9a), capturing finer distinctions between closely related subtypes in each cortical layer, which is in high concordance with subtypes previously identified in human 5 and mouse 14, 18 cortices (Fig. 3b and Supplementary Fig. 9b). Beyond excitatory neuronal subtypes defined by cortical layer-specific markers, our analysis also resolved heterogeneity in neuronal activation states. In response to an activity-inducing experience, cortical excitatory neurons express a complex program of activity-dependent genes 20. Both upper-Ex3 (n=209; 3.1% of 6,770 nuclei in upper layer sub-clusters) and lower-Ex5 (n=213; 8.1% of 2,642 nuclei in lower layer sub-clusters) neurons are specifically associated with high-level expression of activity-dependent genes (Fig. 3b and Supplementary Fig. 9c), including immediate early genes (IEGs) such as Fos, Arc, and Egr1 as well as other activity-regulated transcription factors (e.g. Npas4), genes encoding proteins that function at synapses (e.g. Homer1), and non-coding RNAs (e.g. 1700016P03Rik, which encodes Mir132). We determined the genes specifically enriched in upper-Ex3 (n=160 genes, as compared to other upper-Ex sub-clusters) or lower-Ex5 (n=134 genes, as compared to other lower-Ex sub-clusters) neurons (Fig. 3c). Transcriptional signatures identified in these two sub-populations are enriched for genes involved in the MAPK signaling pathway (e.g. Dusp1; adjusted P=2.67×10⁻² for upper-Ex3 sub-cluster), as previously reported in low-throughput single-nucleus RNA-Seq analysis of Fos-positive nuclei isolated from the hippocampus of adult mice exposed to a novel environment 2. Together, these results demonstrate the utility of sNucDrop-Seq in the identification of transient transcriptional states, such as neuronal activation.
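One simple way to see how activated nuclei stand out is to score each nucleus by its average normalized expression of a few immediate early genes. The sketch below does this for toy UMI counts. It is an illustrative alternative, not the sub-clustering procedure used in this study: the gene set, the log-normalization with a scale factor of 10,000 (a common convention), the function names and the example counts are all assumptions.

```python
import math

# Hypothetical marker set drawn from the activity-dependent genes named in the text.
IEGS = ("Fos", "Arc", "Egr1", "Npas4")


def log_normalize(counts, scale=10_000):
    """Scale UMI counts to a fixed total per nucleus, then apply log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: math.log1p(n / total * scale) for gene, n in counts.items()}


def activity_score(counts, genes=IEGS):
    """Mean log-normalized expression over the marker set (undetected genes count as 0)."""
    expr = log_normalize(counts)
    return sum(expr.get(gene, 0.0) for gene in genes) / len(genes)


# Toy UMI counts for two nuclei; invented values.
nuclei = {
    "activated_like": {"Fos": 6, "Arc": 4, "Egr1": 5, "Npas4": 2, "Slc17a7": 83},
    "resting_like": {"Slc17a7": 95, "Malat1": 5},
}
scores = {name: activity_score(c) for name, c in nuclei.items()}
for name, score in scores.items():
    print(name, round(score, 2))
```

A score of this kind only ranks nuclei; deciding which ones count as activated would still require a threshold or a clustering step chosen for the dataset at hand.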
In conclusion, sNucDrop-Seq is a robust approach for massively parallel analysis of nuclear RNAs at single-cell resolution. Because intact nuclei can potentially be isolated by mechanical douncing and sucrose gradient ultracentrifugation from almost any primary tissue, including frozen archived human tissues, sNucDrop-Seq and similar approaches pave the way to systematically identify cell types, reveal subtype composition, and dissect transient functional states such as activity-dependent transcription in complex mammalian tissues.
AUTHOR CONTRIBUTIONS
H.W. and Z.Z. conceived the project. H.W., P.H. and E.F. performed experiments and carried out data analysis. H.W. wrote the manuscript.
ACKNOWLEDGEMENTS
Z.Z. is supported by NIH grant R56MH111719. H.W. is supported by the National Human Genome Research Institute (R00HG007982).