Abstract
Labeling and perturbation of specific cell types in multicellular systems have transformed our ability to understand them. The rapid pace of cell type identification by new single-cell analysis methods has not been matched by efficient means of accessing these newly discovered types. To enable access to specific neural populations in the mouse cortex, we collected single-cell chromatin accessibility data from select cell types. We clustered the single-cell data and mapped them to single-cell transcriptomics to identify highly specific enhancers for cell subclasses. When cloned into AAVs and delivered to the brain by retro-orbital injection, these enhancers drive transgene expression in specific cell subclasses throughout the mouse brain. This approach will enable functional investigation of cell types in the mouse cortex and beyond.
One-sentence summary: Combining scATAC-seq and scRNA-seq identifies subclass-specific enhancers to label cells in mouse cortex at high resolution.
Introduction
The functional interplay of neural cell types gives rise to the complex functions of neural tissues. To fully understand the biology of the brain, we need to be able to distinguish and describe these cell types, and identify markers that can be used to selectively label and perturb them for further study1,2. In mouse, recombinase driver lines have been used to great effect to label cell populations that share marker gene expression3–5. However, the creation, maintenance, and use of lines that label cell types with high specificity can be costly, frequently requiring triple-transgenic crosses that yield experimental animals at low frequency.
Recent advances in single-cell profiling, such as single-cell RNA-seq6,7 and surveys of neuronal electrophysiology and morphology8, have revealed that many recombinase driver lines label multiple cell types, often from more than one subclass. For example, the Rbp4-Cre mouse driver line, which labels layer 5 (L5) glutamatergic neurons, labels two transcriptomically and connectionally distinct subclasses: L5 intratelencephalic (L5 IT) and pyramidal tract (L5 PT) neurons9.
Here, we demonstrate that a combination of single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) can be used to identify functional and specific enhancer elements. These elements can be combined with recombinant adeno-associated viruses (AAVs) to generate subclass-specific or cell type-specific viral labeling tools. These tools can be delivered using minimally invasive retro-orbital injections to consistently label or perturb neural cell subclasses in mice with specificity that surpasses existing driver lines (Fig 1).
Results
We isolated individual neuronal and non-neuronal cells from transgenically labeled mouse cortex by FACS (Supp Fig 1) and examined them using the single-cell Assay for Transposase-Accessible Chromatin with next-generation sequencing (scATAC-seq; Fig 1)10,11. This strategy allowed us to interrogate both abundant (e.g. L4 IT neurons, ∼17% of VISp neurons) and very rare cell types (e.g. Sst Chodl neurons, ∼0.1% of VISp neurons) with the same method. To sample cells both broadly and specifically in the mouse brain, we used 25 different Cre or Flp driver lines or their combinations, crossed to appropriate reporter lines (Supp Fig. 2, Supp Table 1). We previously characterized many of the same lines by single-cell RNA-seq6. In addition, we employed retrograde labeling by recombinase-expressing viruses to selectively sample cells with specific projections (Retro-ATAC-seq; Supp Fig 3, Supp Tables 2, 3). Our method yielded scATAC-seq libraries of comparable quality to previously published scATAC-seq studies10,12,13 (Supp Fig 4).
To generate scATAC-seq data that would be directly comparable to our scRNA-seq dataset (Tasic 2018), we focused our dissections on visual cortex for glutamatergic cell types, but allowed broader cortical sampling for GABAergic cell types. This strategy is rooted in our observation that GABAergic cell types are shared across two distant poles of mouse cortex, whereas glutamatergic cell types are distinct among different cortical regions6. Retro-ATAC-seq cells were collected only from the visual cortex. In total, we collected 3,381 single cells from 25 driver-reporter combinations in 60 mice, 126 retrogradely labeled cells from injections into 3 targets across 7 donors, and 96 samples labeled by one retro-orbital injection of a viral tool generated in this study (Supp Table 1). After FACS, individual cells were processed using ATAC-seq and sequenced in batches of 60-96 samples on a MiSeq (Methods). We performed quality control filtering to select 2,416 samples with >10,000 uniquely mapped paired-end fragments, >10% of which had a fragment size longer than 250 bp, and with >25% of fragments overlapping high-depth cortical DNase-seq peaks generated by ENCODE14 (Fig 2a, Supp Fig 2, Supp Table 4).
Previous studies have shown that most recombinase driver lines label more than one transcriptomic cell type6,7. To increase the cell type resolution of chromatin accessibility profiles beyond that provided by driver lines, we clustered the scATAC-seq data using a novel, feature-free method for computation of pairwise Jaccard distances (Fig 2b). These distances were used for principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), followed by Phenograph clustering15 (Fig 2c, Methods). This clustering method clearly grouped cells from class-specific driver lines together, and segregated them into multiple clusters as expected based on transcriptomic analyses (Fig 2d). Cluster identity was then assigned by comparison of accessibility near transcription start sites (TSS ± 20 kb) to our scRNA-seq dataset for VISp6 using median correlation (Fig 2c, Methods). We found that subclass-level assignments for each driver line matched closely with those observed for the same driver lines by scRNA-seq (Supp Fig 5). Once assigned, clusters from the same subclass (e.g. Vip or L5 IT) or distinct cell type (e.g. Pvalb Vipr2) were aggregated for peak calling and examination of accessibility patterns (Fig 2e). Comparisons of these scATAC-seq aggregate profiles to previously published ATAC-seq from cortical populations showed strong correspondence between aggregate profiles and the corresponding populations (Supp Fig 6), and comparisons to previously published cortical scATAC-seq data16,17 demonstrated an increase in cell type resolution using our current dataset (Supp Fig 7).
Layer 5 (L5) of mouse cortex contains three major subclasses of excitatory neurons: intratelencephalic (IT) neurons that project to other cortical regions, near-projecting (L5 NP) neurons that have mostly local projections, and corticofugal (a subset of which is called pyramidal tract, L5 PT) neurons that project to subcortical brain regions such as thalamus6,18. The driver line Rbp4-Cre labels both L5 IT and L5 PT neurons in cortex, but not L5 NP6. Our scATAC-seq clustering identified L5 PT and L5 IT neurons in our dataset based on correlation with scRNA-seq cell types (Fig 2c). Labeling of these cells by Rbp4-Cre, together with retrograde labeling from a known L5 PT target region, the lateral posterior nucleus of the thalamus (LP, Supp Fig 3), indicated that these cells are likely L5 IT neurons (Rbp4-Cre+ only) and L5 PT neurons (Rbp4-Cre+ and LP Retro-ATAC-seq). We searched near transcriptomic marker genes for ∼500 bp putative enhancer regions that were specific to L5 PT or L5 IT cells and had strong sequence conservation (Fig 2e). We refer to these regions as mouse single-cell regulatory elements (mscREs, Supp Table 5).
To functionally test mscREs, we cloned their genomic sequences upstream of a minimal beta-globin promoter driving fluorescent proteins SYFP2 or EGFP in a recombinant adeno-associated virus (AAV) genome (Fig 3a). These constructs were packaged using a PHP.eB serotype, which can cross the blood-brain barrier19, to enable delivery by retro-orbital injection. We screened four mscREs for L5 PT cells and two for L5 IT (Supp Table 5). Two weeks after retro-orbital injection, we collected the brains of infected mice and screened expression by visual inspection of native fluorescence and immunohistochemistry to enhance SYFP2 and EGFP signal. Two of these enhancers provided specific labeling of cells in L5 (Fig 3b), and were selected for further validation.
To assess the utility of enhancer-driven fluorophores as viral tools, we performed retro-orbital injection of the mscRE4-SYFP2 virus in additional animals. From two of these, we dissected L5 of VISp, sorted labeled cells by FACS (Supp Fig 2), and performed scRNA-seq as described previously6. We compared scRNA-seq expression profiles to a VISp reference dataset6 using centroid classification of cell types (Methods). We found that the mscRE4-SYFP2 virus yielded >91% specificity for L5 PT cells within L5 (Fig 3c). We confirmed labeling of L5 PT cells by electrophysiological characterization of labeled vs unlabeled cells in the cortex (Fig 3c,d, Supp Fig 8). Cells labeled by mscRE4 had characteristics of L5 PT neurons, whereas cells that were label-negative more closely matched L5 IT neurons20. This experiment demonstrates the utility of these viral tools for electrophysiology experiments targeted to specific neuronal subclasses for which driver lines are not available. Finally, we tested stereotaxic injection of the mscRE4 fluorophore viruses directly into VISp. We found that we could achieve extremely bright and specific labeling using stereotaxic injection, although the specificity depended on the volume of injection, likely reflecting a loss of specificity at high numbers of viral genome copies per cell (Supp Fig 9).
L5 PT cells are often difficult to isolate from heterogeneous single-cell suspensions because of differential cell survival6,7, and there is currently no reliable driver line that selectively labels L5 PT cells. We therefore used retro-orbital injection of the mscRE4-SYFP2 virus to augment our scATAC-seq dataset, using FACS to collect cells labeled by mscRE4. As expected based on scRNA-seq analysis, 55 of 61 high-quality mscRE4 scATAC-seq profiles (90.2%) clustered together with other L5 PT samples.
Although fluorophore expression provided enough signal to sort cells by FACS or perform patch-clamp experiments, expression of a recombinase from a specific enhancer virus would expand the utility of these tools as drivers for reporter lines that express fluorophores, activity reporters, opsins, or other genes that are too large to package in AAVs3,21. To test the specificity of enhancer-driven recombinase expression, we cloned mscRE4 into constructs containing a minimal beta-globin promoter driving destabilized Cre (dgCre), iCre, FlpO, or tTA2, and packaged them in PHP.eB viruses (Fig 4a). These viruses were delivered by retro-orbital injection into mice with genetically encoded reporters for each recombinase (Ai14 for dgCre and iCre22; Ai65F for FlpO21; and Ai63 for tTA23). Labeling was assessed by sectioning and microscopy of native fluorescence (Fig 4a). FlpO, iCre, and tTA2 viral constructs yielded labeling of cells in L5 of the mouse cortex with varying levels of specificity, while dgCre showed non-specific labeling of cortical layers. We applied the same strategy to screen both mscRE4 and mscRE16 drivers of FlpO, iCre, and/or tTA2 by retro-orbital injection at two different titers (1×10^10 and 1×10^11 total genome copies, GC). We found that the specificity and completeness of labeling depended heavily on both the injected titer and the recombinase-reporter combination (Supp Fig 10). Based on these experiments, we chose a single titer for each FlpO virus for in-depth characterization, and injected additional animals for scRNA-seq and whole-brain two-photon tomography by TissueCyte (Fig 4b-e, Fig 5a-b, Supp Fig 11). Each of these viruses showed a high degree of layer and subclass specificity in the cortex: 87.5% of cells labeled by mscRE4-FlpO corresponded to L5 PT cells (Fig 4c) and 42% of cells labeled by mscRE16-FlpO corresponded to L5 IT cells (Fig 4e), with little overlap between the two populations.
TissueCyte imaging revealed that two viruses labeled additional subcortical populations (mscRE4 in APr, CEa, and HIP, Fig 5a; and mscRE16 in pons, BLA, and HIP, Supp Fig 11).
Viruses can also be coadministered to label multiple populations of cells, either exclusively or intersectionally (Fig 5c). This strategy reduces the need for triple- or quadruple-transgenic crosses to obtain co-labeled populations of cells. We tested brain-wide co-labeling of both L5 IT and L5 PT populations by retro-orbital injection of mscRE4-iCre (to label L5 PT cells, green) and mscRE16-FlpO (to label L5 IT cells, red) in the same Ai65F;Ai140 animal (Fig 5d). We found distinct labeling of these two cell populations in L5 by microscopy (Fig 5e), demonstrating that multiple enhancer-driven viruses can be used to simultaneously label or perturb populations of prospectively defined subclasses in the same animal.
Discussion
Single-cell genomics methods continue to uncover the diversity of cell types in complex tissues. We have demonstrated that a combination of scRNA-seq and scATAC-seq identifies functional subclass-specific enhancers that can label or perturb specific cell populations using viral tools at unprecedented cell type resolution. Previous methods for enhancer identification, such as STARR-seq23, CRE-seq24, and EDGE25, rely on large screens of genomic elements identified in bulk tissues, with retrospective identification of labeled cell types. In contrast, our methodology extends concepts previously used in the retina, where defined populations of cell types were used to identify putative regulatory elements for screening26,27, and in Drosophila, where whole-organism scATAC-seq revealed developmental clade-specific regulatory elements that were validated by transgenesis28. Using single-cell measurements of chromatin accessibility in the mouse cortex allowed us to prospectively generate highly targeted tools with minimal screening of putative enhancers.
The rich scATAC-seq dataset in this study provides a foundational resource for locating enhancer elements for targeting diverse cell types in the mouse cortex. We were able to resolve cell subclasses in layer 5 of the mouse cortex, yielding viral tools with specificity beyond available recombinase driver lines. These viral tools are immediately useful in conjunction with existing reporter and/or driver lines to augment and expand existing genetic toolkits3. Further optimization of these natural regulatory elements may result in increased specificity and higher on-target expression, as demonstrated for AAV tools targeting cell types in muscle29. In some cases, these highly targeted viral tools may supersede germline transgenesis for labeling and perturbation of specific cell types. Targeted, enhancer-driven viral reagents open new frontiers for the exploration of cell types in the brain.
Materials and Methods
Mouse breeding and husbandry
Mice were housed under Institutional Animal Care and Use Committee protocols 1508 and 1802 at the Allen Institute for Brain Science, with no more than five animals per cage, maintained on a 12 hr day/night cycle, with food and water provided ad libitum. Animals with anophthalmia or microphthalmia were excluded from experiments. Animals were maintained on a C57BL/6J genetic background.
Retrograde labeling
We performed stereotaxic injection of CAV-Cre (gift of Miguel Chillon Rodrigues, Universitat Autònoma de Barcelona)30 into brains of heterozygous or homozygous Ai14 mice using stereotaxic coordinates obtained from the Paxinos adult mouse brain atlas31. Specific coordinates used for each injection are provided in Supp. Table 3. tdTomato+ single cells were isolated from VISp by FACS. Example FACS gating is provided in Supp. Fig 2.
Single cell ATAC
Single-cell suspensions of cortical neurons were generated as described previously32, except that papain was used in place of pronase for some samples and trehalose was added to the dissociation and sorting medium for some samples, as shown in Supp. Table 2. We then sorted individual cells by FACS, gating on DAPI-negative, fluorophore-positive (tdTomato, EGFP, or SYFP2) events to select live neuronal cells, or on DAPI-negative, fluorophore-negative events to select live non-neuronal cells. Example FACS gating is provided in Supp. Fig 2.
For GM12878 scATAC, cells were obtained from Coriell Institute, and were grown in T25 culture flasks in RPMI 1640 Medium (Gibco, Thermo Fisher Cat#11875093) supplemented with 10% fetal bovine serum (FBS) and penicillin/streptomycin. At ∼80% confluence, cells were transferred to a 15 mL conical tube, centrifuged, and washed with PBS containing 1% FBS. Cells were then resuspended in PBS with 1% FBS and 2 ng/mL DAPI (DAPI*2HCl, Life Technologies Cat#D1306) for FACS sorting.
Single cells were sorted into 200 μL 8-well strip tubes containing 1.5 μL tagmentation reaction mix (0.75 μL Nextera Reaction Buffer, 0.2 μL Nextera Tn5 Enzyme, 0.55 μL water). After collection, cells were briefly spun down in a bench-top centrifuge, then immediately tagmented at 37°C for 30 minutes in a thermocycler. After tagmentation, we added 0.6 μL Proteinase K stop solution to each tube (5 mg/mL Proteinase K solution (Qiagen), 50 mM EDTA, 5 mM NaCl, 1.25% SDS), followed by incubation at 40°C for 30 minutes in a thermocycler. We then purified the tagmented DNA using AMPure XP beads (Beckman Coulter) at a ratio of 1.8:1 resuspended beads to reaction volume (3.8 μL added to 2.1 μL), with a final elution volume of 11 μL. Libraries were indexed and amplified by adding 15 μL 2X Kapa HiFi HotStart ReadyMix and 2 μL Nextera i5 and i7 indexes to each tube, followed by incubation at 72°C for 3 minutes and PCR (95°C for 1 minute; 22 cycles of 98°C for 20 seconds, 65°C for 15 seconds, and 72°C for 15 seconds; then final extension at 72°C for 1 minute). After amplification, sample concentrations were measured in duplicate using a Quant-iT PicoGreen assay (Thermo Fisher). For each sample, the mean concentration was calculated by comparison to a standard curve, and the mean and standard deviation of concentrations were calculated for each batch of samples. Samples with a concentration more than 2 standard deviations above the batch mean were excluded from downstream steps, because in early experiments such samples dominated sequencing runs. All other samples were pooled by combining 5 μL of each sample in a 1.5 mL tube. We then purified the combined library by adding AMPure XP beads at a 1.8:1 ratio, with final elution in 50 μL. The mixed library was then quantified using a BioAnalyzer High Sensitivity DNA kit (Agilent).
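The concentration-based exclusion step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a linear PicoGreen standard curve (concentration vs. fluorescence), averages the duplicate readings for each sample, and uses the sample standard deviation (ddof=1) of each batch. The function names are hypothetical.

```python
import numpy as np


def concentrations_from_standard_curve(sample_fluor, std_fluor, std_conc):
    """Convert duplicate PicoGreen readings to mean sample concentrations.

    sample_fluor: array-like of shape (n_samples, 2), duplicate fluorescence readings.
    std_fluor, std_conc: fluorescence and known concentration of the standards.
    Assumes the standard curve is linear over the measured range.
    """
    slope, intercept = np.polyfit(std_fluor, std_conc, 1)
    per_replicate = slope * np.asarray(sample_fluor, dtype=float) + intercept
    return per_replicate.mean(axis=1)  # mean of the duplicates


def keep_for_pooling(concentrations, n_sd=2.0):
    """Boolean mask that is False for samples more than n_sd standard
    deviations above the batch mean (these tended to dominate a run)."""
    conc = np.asarray(concentrations, dtype=float)
    cutoff = conc.mean() + n_sd * conc.std(ddof=1)
    return conc <= cutoff


# Hypothetical batch: 20 typical libraries and one very concentrated library.
batch = np.array([1.0] * 20 + [10.0])
mask = keep_for_pooling(batch)
print(mask.sum(), "of", mask.size, "samples pooled at 5 uL each")
```

Only a high-side cutoff is applied, mirroring the text: low-concentration libraries are still pooled at a fixed 5 μL.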
scATAC sequencing, alignment, and filtering
Mixed libraries, containing 60 to 96 samples, were sequenced on an Illumina MiSeq at a final concentration of 20-30 pM. After sequencing, raw FASTQ files were aligned to the GRCm38 (mm10) mouse genome using Bowtie v1.1.0 as described previously32. After alignment, duplicate reads were removed using samtools rmdup, which yielded only single copies of uniquely mapped paired reads in BAM format. For analysis, we removed samples with fewer than 10,000 paired-end fragments (20,000 reads) or with fewer than 10% of sequenced fragments longer than 250 bp. An additional filter was created using ENCODE whole cortex DNase-seq HotSpot peaks (sample ID ENCFF651EAU from experiment ID ENCSR00COF). Samples with less than 25% of paired-end fragments overlapping DNase-seq peaks were removed from downstream analysis. Cells passing these criteria had both sufficient unique reads for downstream analysis and high-quality chromatin accessibility profiles, as assessed by fragment size analysis (Supp Fig 1). As an additional QC check, we compared aggregate scATAC-seq data to bulk ATAC-seq data from matching Cre-driver lines, where available, and found that aggregate single-cell datasets matched previously published bulk datasets well (Supp Fig 6).
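The three per-cell QC thresholds can be expressed as a small filter. This is a sketch under stated assumptions: fragment length is taken as the paired-end insert size, per-fragment DNase-seq peak overlap flags are assumed to be computed elsewhere (for example with an interval-overlap tool), and the boundaries are treated as inclusive, following the "fewer than ... were removed" wording. `CellQC`, `summarize_cell`, and `passes_qc` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CellQC:
    n_fragments: int      # unique, uniquely mapped paired-end fragments
    frac_long: float      # fraction of fragments longer than 250 bp
    frac_in_dnase: float  # fraction overlapping ENCODE cortex DNase-seq peaks


def summarize_cell(fragment_lengths: Iterable[int], in_dnase_peak: Iterable[bool]) -> CellQC:
    """Compute QC metrics from per-fragment lengths and peak-overlap flags."""
    lengths = list(fragment_lengths)
    flags = list(in_dnase_peak)
    if len(lengths) != len(flags):
        raise ValueError("need one peak-overlap flag per fragment")
    n = len(lengths)
    if n == 0:
        return CellQC(0, 0.0, 0.0)
    frac_long = sum(1 for length in lengths if length > 250) / n
    frac_dnase = sum(1 for flag in flags if flag) / n
    return CellQC(n, frac_long, frac_dnase)


def passes_qc(qc: CellQC, min_fragments: int = 10_000,
              min_frac_long: float = 0.10, min_frac_dnase: float = 0.25) -> bool:
    """Keep a cell only if it meets all three thresholds."""
    return (qc.n_fragments >= min_fragments
            and qc.frac_long >= min_frac_long
            and qc.frac_in_dnase >= min_frac_dnase)
```

The fragment-size criterion favors libraries that retain nucleosome-spanning fragments, while the DNase-seq overlap criterion screens out cells whose fragments look like random background.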
Jaccard distance calculation, PCA and tSNE embedding, and density-based clustering
To compare scATAC-seq samples, we downsampled all cells to an equal number of uniquely aligned fragments (10,000 per sample), extended these fragments to a length of 1 kb, then collapsed any overlapping fragments within each sample into regions defined by the outer boundaries of the overlapping fragments. We then counted the number of overlapping regions between every pair of samples and divided by the total number of regions in both samples to obtain a Jaccard similarity score. These scores were converted to Jaccard distances (1 − Jaccard similarity), and the resulting matrix was used as input for t-distributed stochastic neighbor embedding (t-SNE). After t-SNE, samples were clustered in t-SNE space using the RPhenograph package with k = 6 to obtain small groups of similar neighbors15. Cluster assignments used for correlation with transcriptomic data are shown in Supp. Table 6.
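The feature-free distance computation above can be sketched in plain Python and NumPy. This is an illustration, not the authors' implementation, and it makes several assumptions: fragments are `(chrom, start, end)` tuples, each fragment is extended to 1 kb around its midpoint, a region counts as shared if it overlaps at least one region in the other cell, and the denominator is taken as the union (regions in both cells minus shared ones). Counting is done from the first cell's perspective, so the overlap count is not strictly symmetric.

```python
import numpy as np


def downsample(fragments, n=10_000, seed=0):
    """Randomly keep n fragments for one cell (all of them if it has fewer)."""
    if len(fragments) <= n:
        return list(fragments)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(fragments), size=n, replace=False)
    return [fragments[i] for i in sorted(idx)]


def extend_and_merge(fragments, length=1000):
    """Extend each (chrom, start, end) fragment to `length` bp around its
    midpoint, then merge overlapping or abutting intervals per chromosome."""
    half = length // 2
    by_chrom = {}
    for chrom, start, end in fragments:
        mid = (start + end) // 2
        by_chrom.setdefault(chrom, []).append((max(0, mid - half), mid + half))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def count_overlapping(a, b):
    """Count regions in `a` that overlap at least one region in `b`.
    Both inputs map chrom -> sorted, non-overlapping half-open intervals."""
    n = 0
    for chrom, ivs_a in a.items():
        ivs_b = b.get(chrom, [])
        j = 0
        for s, e in ivs_a:
            # skip regions of b that end before this region of a starts
            while j < len(ivs_b) and ivs_b[j][1] <= s:
                j += 1
            if j < len(ivs_b) and ivs_b[j][0] < e:
                n += 1
    return n


def jaccard_distance_matrix(cells):
    """Pairwise Jaccard distances (1 - shared / union) between cells."""
    regions = [extend_and_merge(c) for c in cells]
    sizes = [sum(len(v) for v in r.values()) for r in regions]
    n = len(cells)
    dist = np.zeros((n, n))
    for i in range(n):
        for k in range(i + 1, n):
            shared = count_overlapping(regions[i], regions[k])
            union = sizes[i] + sizes[k] - shared
            dist[i, k] = dist[k, i] = 1.0 - shared / union if union else 1.0
    return dist
```

The resulting matrix could then be embedded, for example, with scikit-learn's `TSNE(metric="precomputed", init="random")`, since PCA initialization is not available for precomputed distances. The PCA and Phenograph steps described in the text are not shown.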
Correlation with single-cell transcriptomics
Phenograph-defined neighborhoods were assigned to cell subclasses and clusters by comparing accessibility near transcription start sites (TSS) to median expression values of scRNA-seq clusters at the cell type level (e.g. L5 PT Chrna6) and at the subclass level (e.g. Sst) from mouse primary visual cortex6. To score each TSS, we retrieved TSS locations from the RefSeq Gene annotations provided by the UCSC Genome Browser database, and generated windows spanning TSS ± 20 kb. We then counted the number of fragments for all samples within each cluster that overlapped these windows. For comparison, we selected differentially expressed marker genes from the Tasic, et al. (2018) scRNA-seq dataset using the scrattch.hicat package for R. We then correlated the Phenograph cluster scores with the log-transformed median exon read count values for this set of marker genes for each scRNA-seq cluster from primary visual cortex, and assigned the transcriptomic cell type with the highest correlation. We found that this strategy of neighbor assignment and correlation allowed us to resolve cell types within the scATAC-seq data close to the resolution of the scRNA-seq data, as types that were split too far would resolve to the same transcriptomic subclass or type by correlation.
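The scoring and assignment steps can be sketched as follows. This is a minimal illustration under assumptions not fixed by the text: Pearson correlation is used, the log transform is taken to be log2(median + 1), ATAC window counts are used untransformed, and fragment-window overlap is tested by a simple linear scan rather than an interval index. Function names and the toy genes are hypothetical.

```python
import numpy as np


def tss_window_counts(fragments, tss, flank=20_000):
    """Count fragments overlapping TSS +/- flank for each gene.

    fragments: list of (chrom, start, end) pooled from all cells in a cluster.
    tss: dict mapping gene -> (chrom, tss_position).
    Simple O(genes x fragments) scan; an interval index would be used at scale.
    """
    counts = {}
    for gene, (chrom, pos) in tss.items():
        lo, hi = pos - flank, pos + flank
        counts[gene] = sum(1 for c, s, e in fragments if c == chrom and s < hi and e > lo)
    return counts


def assign_type(cluster_counts, median_expr, marker_genes):
    """Return (best_type, r): the scRNA-seq type whose log2(median + 1)
    expression over the marker genes best correlates (Pearson) with the
    cluster's TSS-window counts."""
    x = np.array([cluster_counts.get(g, 0) for g in marker_genes], dtype=float)
    best_type, best_r = None, -np.inf
    for cell_type, expr in median_expr.items():
        y = np.log2(np.array([expr.get(g, 0) for g in marker_genes], dtype=float) + 1)
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:  # NaN (from constant vectors) never wins
            best_type, best_r = cell_type, r
    return best_type, best_r
```

Because every cluster is forced onto its best-matching transcriptomic type, over-split ATAC clusters naturally collapse onto the same label, which is the behavior the paragraph above relies on.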
scATAC-seq grouping and peak calling
For downstream analysis, we grouped cell type assignments to the subclass level, with the exception of highly distinct cell types (Lamp5 Lhx6, Sst Chodl, Pvalb Vipr2, L6 IT Car3, CR, and Meis2). Unique fragments for all cells within each of these subclass/distinct type groups were aggregated to BAM files for analysis. Aligned reads from single cell subclasses/clusters were used to create Tag Directories and peaks of chromatin accessibility were called using HOMER33 with settings “findPeaks -region -o auto”. The resulting peaks were converted to BED format.
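The grouping rule (subclass level, except for a fixed set of distinct types) can be made explicit with a small lookup. This is a sketch: the `subclass_of` table and the cell-type labels in the usage example are hypothetical placeholders for the actual scRNA-seq taxonomy, and only the set of distinct types comes from the text.

```python
from collections import defaultdict

# Cell types kept as their own aggregation groups (listed in the text above).
DISTINCT_TYPES = {"Lamp5 Lhx6", "Sst Chodl", "Pvalb Vipr2", "L6 IT Car3", "CR", "Meis2"}


def aggregation_group(cell_type, subclass_of):
    """Group label used for pooling fragments before peak calling."""
    if cell_type in DISTINCT_TYPES:
        return cell_type
    return subclass_of[cell_type]


def group_cells(assignments, subclass_of):
    """assignments: cell_id -> assigned transcriptomic type.
    subclass_of: type -> subclass (hypothetical lookup table).
    Returns group label -> list of cell ids."""
    groups = defaultdict(list)
    for cell_id, cell_type in assignments.items():
        groups[aggregation_group(cell_type, subclass_of)].append(cell_id)
    return dict(groups)


# Illustrative labels only.
subclass_of = {"L5 PT Chrna6": "L5 PT", "L5 PT type 2": "L5 PT", "Sst Chodl": "Sst"}
print(group_cells({"c1": "L5 PT Chrna6", "c2": "L5 PT type 2", "c3": "Sst Chodl"}, subclass_of))
```

Each group's fragments would then be written to one BAM file and passed to HOMER (`makeTagDirectory`, then `findPeaks -region -o auto`) as described above.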
Population ATAC of Sst neurons
We performed population ATAC-seq of neurons from Sst-IRES2-Cre;Ai14 mice as described previously32. Briefly, cells from the visual cortex of an adult mouse were microdissected and FACS sorted into 8-well strips as described above, but with 500 cells per well instead of single cells as for scATAC-seq. Cell membranes were lysed, and nuclei were pelleted before resuspension in the same tagmentation buffer described above at a higher volume (25 µL). Tagmentation was carried out at 37°C for 1 hour, followed by addition of 5 µL of Cleanup Buffer (900 mM NaCl, 300 mM EDTA), 2 µL 5% SDS, and 2 µL Proteinase K, incubation at 40°C for 30 minutes, and cleanup with AMPure XP beads (Beckman Coulter) at a ratio of 1.8:1 beads to reaction volume. Samples were amplified using KAPA HotStart Ready Mix (Kapa Biosystems, Cat# KK2602) and 2 µL each of Nextera i5 and i7 primers (Illumina), quantified using a Bioanalyzer, and sequenced on an Illumina MiSeq.
Comparisons to bulk ATAC-seq data
For comparison to previously published studies, we used data from GEO accession GSE63137 from Mo, et al. (2015)34 for Camk2a, Pvalb, and Vip neuron populations, and GEO accession GSE87548 from Gray, et al. (2017)32 for Cux2, Scnn1a-Tg3, Rbp4, Ntsr1, Gad2, mES, and genomic controls. For these comparisons, we also included the population ATAC-seq of Sst neurons described above. For each population, we merged reads from all replicates and downsampled each merged dataset to 6.4 million reads. We then called peaks using HOMER as described above for aggregated scATAC-seq data. We used the BED-formatted peaks for scATAC-seq aggregates, with or without bulk ATAC-seq datasets, as input for comparisons using the DiffBind package for R as described previously32.
Identification of mouse single-cell regulatory elements
We performed a targeted search for mouse single-cell regulatory elements (mscREs) by performing pairwise differential expression analysis of scRNA-seq clusters to identify genes uniquely expressed in the L5 PT and L5 IT subclasses relative to all other glutamatergic subclasses. We then searched for unique peaks within 1 Mbp of each marker gene, and manually inspected these peaks for conservation and for low or no accessibility in off-target cell types. If a region of high conservation overlapped the peak region but the peak was not centered on the highly conserved region, we adjusted the peak selection to include the neighboring highly conserved sequence. For cloning, we searched for primers within 500 bp regions centered at the middle of the selected peak regions, allowing up to 100 bp of additional sequence on either side. Final region selections and PCR primers are shown in Supplementary Table 5.
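The automatable parts of this search (distance to the marker gene and the cloning window) can be sketched as below. Assumptions not stated in the text: distance is measured from the peak midpoint to the marker gene's TSS, and coordinates are half-open `(chrom, start, end)` tuples. The manual conservation and off-target accessibility review is not modeled, and the function names are hypothetical.

```python
def peaks_near_gene(peaks, tss, window=1_000_000):
    """Keep peaks whose midpoint lies within `window` bp of a marker gene TSS.

    peaks: list of (chrom, start, end); tss: (chrom, position).
    """
    chrom, pos = tss
    return [p for p in peaks if p[0] == chrom and abs((p[1] + p[2]) // 2 - pos) <= window]


def cloning_windows(peak, core=500, flank=100):
    """Return the core region (default 500 bp) centered on the peak midpoint,
    plus the wider window (core + `flank` bp per side) searched for primers."""
    chrom, start, end = peak
    mid = (start + end) // 2
    core_start = mid - core // 2
    core_end = core_start + core
    search = (max(0, core_start - flank), core_end + flank)
    return chrom, (core_start, core_end), search


print(cloning_windows(("chr1", 1_000, 1_400)))
```

Fixing the amplicon size around the peak center keeps candidate enhancers comparable in length, while the 100 bp margin gives primer design some room.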
Viral genome cloning
Enhancers were cloned from C57BL/6J genomic DNA using enhancer-specific primers and Phusion high-fidelity polymerase (M0530S; NEB). Individual enhancers were then inserted into an rAAV or scAAV backbone that contained a minimal beta-globin promoter, a transgene, and a bovine growth hormone polyA sequence using standard molecular cloning approaches. Plasmid integrity was verified by Sanger sequencing and by restriction digests to confirm intact inverted terminal repeat (ITR) sites.
Viral packaging and titering
Before transfection, 105 μg of AAV viral genome plasmid, 190 μg pHelper, and 105 μg AAV-PHP.eB were mixed with 5 mL of Opti-MEM I media (Reduced Serum, GlutaMAX; ThermoFisher Scientific) and 1.1 mL of a solution of 1 mg/mL 25 kDa linear polyethylenimine (Polysciences) in PBS at pH 4-5. This cotransfection mixture was incubated at room temperature for 10 minutes. Recombinant AAV of the PHP.eB serotype was generated by adding 0.61 mL of this cotransfection mixture to each of ten 15-cm dishes of HEK293T cells (ATCC) at 70-80% confluence. At 24 hours post-transfection, the cell medium was replaced with DMEM (with high glucose, L-glutamine and sodium pyruvate; ThermoFisher Scientific) with 4% FBS (Hyclone) and 1% Antibiotic-Antimycotic solution. Cells were collected 72 hours post-transfection by scraping in 5 mL of medium, and were pelleted at 1500 rpm at 4°C for 15 minutes. Pellets were suspended in a buffer containing 150 mM NaCl, 10 mM Tris, and 10 mM MgCl2, pH 7.6, and were frozen on dry ice. Cell pellets were thawed quickly in a 37°C water bath, then passed through a syringe with a 21-23G needle 5 times, followed by 3 more rounds of freeze/thaw and 30 minutes of incubation with 50 U/mL Benzonase (Sigma-Aldrich) at 37°C. The suspension was then centrifuged at 3,000 × g and purified on a layered iodixanol step gradient (15%, 25%, 40%, and 60%) by centrifugation at 58,000 rpm in a Beckman 70Ti rotor for 90 minutes at 18°C; virus was collected by extracting the volume just below the 40%/60% interface. Viruses were concentrated using an Amicon Ultra-15 centrifugal filter unit by centrifugation at 3,000 rpm at 4°C, and reconstituted in PBS with 5% glycerol and 35 mM NaCl before storage at −80°C.
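As a quick consistency check on the transfection quantities above, the per-dish amounts and the PEI:DNA mass ratio follow directly from the stated volumes. This sketch assumes the whole mixture is split evenly across dishes; `transfection_plan` is a hypothetical helper, not part of the protocol.

```python
def transfection_plan(plasmid_ug, opti_mem_ml=5.0, pei_ml=1.1,
                      pei_mg_per_ml=1.0, ml_per_dish=0.61):
    """Derive per-dish DNA/PEI amounts from the bulk cotransfection mix."""
    total_dna_ug = sum(plasmid_ug.values())
    pei_ug = pei_ml * pei_mg_per_ml * 1000.0   # mg -> ug
    total_ml = opti_mem_ml + pei_ml            # volume of the final mixture
    n_dishes = total_ml / ml_per_dish
    return {
        "total_dna_ug": total_dna_ug,
        "pei_to_dna_mass_ratio": pei_ug / total_dna_ug,
        "n_dishes": n_dishes,
        "dna_ug_per_dish": total_dna_ug / n_dishes,
        "pei_ug_per_dish": pei_ug / n_dishes,
    }


plan = transfection_plan({"AAV genome": 105, "pHelper": 190, "AAV-PHP.eB": 105})
print(plan)
```

With the stated inputs this gives 400 μg total DNA, a PEI:DNA mass ratio of 2.75, and about 40 μg DNA plus 110 μg PEI per dish across ten dishes, consistent with 0.61 mL of a 6.1 mL mixture per dish.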
Virus titers were measured by qPCR with a primer pair that recognizes a 117 bp region in the AAV2 ITRs (Forward: GGAACCCCTAGTGATGGAGTT; Reverse: CGGCCTCAGTGAGCGA). qPCR reactions were performed using QuantiTect SYBR Green PCR Master Mix (Qiagen) and 500 nM primers. To determine virus titers, a positive control AAV with known titer and newly produced viruses with unknown titers were treated with DNase I. Serial dilutions (1/10, 1/100, 1/500, 1/2500, 1/12500, and 1/62500) of both the positive control and the newly generated viruses were loaded on the same qPCR plate. A standard curve of virus particle concentration vs. Cq value was generated from the positive control virus, and the titers of the new viruses were calculated from this standard curve.
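The standard-curve calculation can be sketched as a linear fit of Cq against log10 concentration. This is an illustration under assumptions the text does not specify: a single linear fit over all control dilutions, and the unknown titer reported as the median of the back-calculated values from each dilution. The efficiency formula is the standard qPCR relation, included only as a sanity check; function names are hypothetical.

```python
import numpy as np

# Serial dilutions used for both the control and the unknown virus (from the text).
DILUTIONS = np.array([1 / 10, 1 / 100, 1 / 500, 1 / 2500, 1 / 12500, 1 / 62500])


def fit_standard_curve(control_titer, control_cq, dilutions=DILUTIONS):
    """Fit Cq = slope * log10(concentration) + intercept for the control virus."""
    log_conc = np.log10(control_titer * dilutions)
    slope, intercept = np.polyfit(log_conc, np.asarray(control_cq, dtype=float), 1)
    return slope, intercept


def titer_from_cq(sample_cq, slope, intercept, dilutions=DILUTIONS):
    """Back-calculate the undiluted titer from each dilution; return the median."""
    log_conc = (np.asarray(sample_cq, dtype=float) - intercept) / slope
    undiluted = 10.0 ** log_conc / dilutions
    return float(np.median(undiluted))


def amplification_efficiency(slope):
    """Standard qPCR efficiency; 1.0 means perfect doubling per cycle."""
    return 10.0 ** (-1.0 / slope) - 1.0
```

In practice, dilutions whose Cq values fall outside the linear range of the standard curve would be dropped before taking the median. The titer units are those of the control virus.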
Retro-orbital injections
To introduce AAVs into the bloodstream, mice aged 21 days or older (C57BL/6J, Ai14, Ai65F, or Ai63 mice21) were briefly anesthetized with isoflurane, and 1×10^10 to 1×10^11 viral genome copies (GC) were delivered into the retro-orbital sinus in a volume of 50 µL or less. This approach has been used previously to deliver AAVs across the blood-brain barrier and into the murine brain with high efficiency19. For delivery of multiple AAVs, the viruses were mixed beforehand and then delivered simultaneously into the retro-orbital sinus. Animals were allowed to recover and were sacrificed 1-3 weeks post-infection to analyze virally introduced transgenes within the brain.
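The stock volume needed for a given dose is simply dose divided by titer. This sketch assumes titers are expressed in GC/mL (the units are not stated in the text) and checks a co-injected mixture against the 50 µL limit. Any dilution to a final injection volume is not described in the text and is not modeled. The function names are hypothetical.

```python
def stock_volume_ul(dose_gc, titer_gc_per_ml):
    """Volume of virus stock (uL) that contains dose_gc genome copies,
    assuming the titer is given in GC per mL."""
    return dose_gc / titer_gc_per_ml * 1000.0


def mixed_injection_volume_ul(doses_and_titers, max_ul=50.0):
    """Total stock volume for co-injected viruses; raises if above max_ul."""
    total = sum(stock_volume_ul(dose, titer) for dose, titer in doses_and_titers)
    if total > max_ul:
        raise ValueError(f"{total:.1f} uL exceeds the {max_ul} uL limit")
    return total


# Example: 1e11 GC from a 1e14 GC/mL stock plus 1e10 GC from a 1e13 GC/mL stock.
print(mixed_injection_volume_ul([(1e11, 1e14), (1e10, 1e13)]))
```

At the highest dose used here (1×10^11 GC), a stock would need a titer of at least 2×10^12 GC/mL for the undiluted volume to stay within 50 µL.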
Stereotaxic injections
Viral DNA was packaged in a PHP.eB serotype19 to produce recombinant adeno-associated virus (rAAV) for the mscRE4-minBGprom-EGFP-WPRE3, mscRE4-minBGprom-tTA2-WPRE3, and mscRE4-minBGprom-FlpO-WPRE3 viruses (titers: 1.64 × 10^14, 5.11 × 10^13, and 6.00 × 10^13, respectively), or self-complementary AAV (scAAV) for the mscRE4-minBGprom-SYFP2-WPRE3-BGHpA virus (titer: 1.34 × 10^13). Each virus was delivered bilaterally at 250 and 50 nL or 50 and 25 nL into the primary visual cortex (VISp; coordinates: A/P: −3.8, ML: −2.5, DV: 0.6) using a pressure injection system (Nanoject II, Drummond Scientific Company, Catalog# 3-000-204). rAAV-mscRE4-minBGprom-EGFP-WPRE3 and scAAV-mscRE4-minBGprom-SYFP2-WPRE3 were injected into male and female C57BL/6J mice and transgene-negative (wild-type) mice from transgenic lines (Htr2a-Cre (-), SST-IRES-Cre;Ai67 (-), Cck-IRES-Cre (-)); rAAV-mscRE4-minBGprom-FlpO-WPRE3 and rAAV-mscRE4-minBGprom-tTA2-WPRE3 were injected into heterozygous Ai65F and Ai63 mice, respectively. To mark the injection site, rAAV-EF1a-tdTomato or rAAV-EF1a-EGFP was co-injected with the experimental virus at a dilution of 1:10. Expression of all viruses was analyzed at 14 days post-injection. For tissue processing, mice were transcardially perfused with 4% paraformaldehyde (PFA) and then transferred to 30% sucrose for 1-2 days. Sections (50 µm) were prepared using a freezing microtome, and fluorescent images of the injections were captured from mounted sections using a Nikon Eclipse TI epifluorescence microscope.
Immunohistochemistry
Mice were transcardially perfused with 0.1 M phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (PFA). Brains were removed, post-fixed in PFA overnight, and then incubated overnight in 30% sucrose. Coronal sections (50 µm) were cut using a freezing microtome, and native or antibody-enhanced fluorescence was analyzed in mounted sections. To enhance EGFP fluorescence, free-floating brain sections were stained with a rabbit anti-GFP antibody. Briefly, sections were rinsed three times in PBS, blocked for 1 hour in PBS containing 5% donor donkey serum, 2% bovine serum albumin (BSA), and 0.2% Triton X-100, and incubated overnight at 4°C in the anti-GFP primary antibody (1:2000; Abcam ab6556). The following day, sections were washed three times in PBS, incubated in blocking solution containing an Alexa 488-conjugated secondary antibody (1:1500; Invitrogen), washed in PBS, and mounted in Vectashield containing DAPI (H-1500, Vector Labs). Epifluorescence images of native or antibody-enhanced fluorescence were acquired on a Nikon Eclipse Ti microscope or on a TissueCyte 1000 (Tissue Vision) system.
Single cell RNA sequencing and cell type mapping
scRNA-seq was performed using the SMART-Seq v4 kit (Takara Cat#634894) as described previously6. In brief, single cells were sorted into 8-well strips containing SMART-Seq lysis buffer with RNase inhibitor (0.17 U/µL) and immediately frozen on dry ice for storage at −80 °C. SMART-Seq reagents were used for reverse transcription and cDNA amplification. Samples were tagmented and indexed using a NexteraXT DNA Library Preparation kit (Illumina FC-131-1096) with the NexteraXT Index Kit V2 Set A (FC-131-2001) according to the manufacturer’s instructions, except that the volumes of all reagents, including cDNA, were reduced to 0.4× the recommended volume. Full documentation for the scRNA-seq procedure is available in the ‘Documentation’ section of the Allen Institute data portal at http://celltypes.brain-map.org/. Samples were sequenced on an Illumina HiSeq 2500 or Illumina MiSeq as 50 bp paired-end reads. Reads were aligned to GRCm38 (mm10) using STAR (v2.5.3)35 in twopassMode, and exonic read counts were quantified using the GenomicRanges package for R as described in Tasic, et al. (2018). To assign a cell type to each scRNA-seq sample, we used the scrattch.hicat package for R6. We selected marker genes that distinguished each cluster and used this gene panel in a bootstrapped centroid classifier, which performed 100 rounds of correlation, each using a randomly selected 80% of the marker panel.
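The bootstrapped centroid classifier can be illustrated with a short Python sketch. This is an illustrative re-implementation under stated assumptions, not the scrattch.hicat code: the function name bootstrap_centroid_map is hypothetical, correlation is assumed to be Pearson correlation on marker-gene expression values, and the returned confidence is taken to be the fraction of the 100 rounds won by the best-matching cluster.

```python
import numpy as np


def bootstrap_centroid_map(cell, centroids, labels, n_rounds=100, frac=0.8, seed=0):
    """Map one cell to reference clusters by repeated correlation on random marker subsets.

    cell:      1-D array of marker-gene expression for the query cell
    centroids: 2-D array with one row per reference cluster (same marker order)
    labels:    cluster names, one per centroid row
    Returns (best_label, confidence), where confidence is the fraction of rounds
    in which best_label was the most correlated centroid.
    """
    cell = np.asarray(cell, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    rng = np.random.default_rng(seed)
    n_markers = cell.shape[0]
    k = max(2, int(round(frac * n_markers)))  # e.g. 80% of the marker panel
    wins = np.zeros(len(labels), dtype=int)
    for _ in range(n_rounds):
        idx = rng.choice(n_markers, size=k, replace=False)
        # Pearson correlation between the cell and each centroid on this subset;
        # undefined correlations (constant vectors) are treated as -1.
        cors = np.array([np.corrcoef(cell[idx], c[idx])[0, 1] for c in centroids])
        cors = np.nan_to_num(cors, nan=-1.0)
        wins[int(np.argmax(cors))] += 1
    best = int(np.argmax(wins))
    return labels[best], wins[best] / n_rounds


if __name__ == "__main__":
    # Toy reference with two hypothetical clusters and ten marker genes
    centroids = np.array([[10, 0, 0, 5, 1, 0, 8, 0, 2, 0],
                          [0, 10, 7, 0, 0, 6, 0, 9, 0, 3]])
    query = centroids[1] * 1.1 + 0.2  # a rescaled copy of cluster "B"
    print(bootstrap_centroid_map(query, centroids, ["A", "B"]))
```

Because the toy query is a linear rescaling of the "B" centroid, it correlates perfectly with "B" on every marker subset, so every round selects "B" and the confidence is 1.0. For a real cell, a confidence below 1.0 indicates that its assignment depends on which markers are sampled.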
Comparisons to previous scATAC-seq studies
For comparisons to GM12878 datasets, raw data were downloaded from GEO accession GSE67446 for Cusanovich, et al. (2015)13, GSE65360 for Buenrostro, et al. (2015)36, and GSE109828 for Pliner, et al. (2018)12. Processed 10x Genomics data were retrieved from the 10x Genomics website. The Buenrostro, Cusanovich, and Pliner samples and our own GM12878 samples were aligned to the hg38 human genome using the same Bowtie pipeline described above for mouse samples to obtain per-cell fragment locations. 10x Genomics samples were analyzed using the fragment locations provided by 10x Genomics. For comparison to TSS regions, we used the RefSeq Genes tables provided by the UCSC Genome Browser database for hg19 (for 10x Genomics data) and hg38 (for the other datasets). For comparison to ENCODE peaks, we used ENCODE GM12878 DNase-seq HotSpot results from ENCODE experiment ID ENCSR000EJD aligned to hg19 (ENCODE file ID ENCFF206HYT) or hg38 (ENCODE file ID ENCFF773SPT).
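The text does not specify the statistic used for the TSS and ENCODE comparisons. A common choice is the fraction of a cell's fragments that overlap the reference regions, sketched below in Python. The sketch assumes half-open genomic coordinates and a hypothetical ±1 kb window around each TSS (the window size is not given in the text). All function names are illustrative, and this is not the pipeline used to generate the supplementary figures.

```python
import bisect
from collections import defaultdict


def build_index(intervals):
    """Group half-open (chrom, start, end) intervals by chromosome, merge overlaps, and sort."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if the half-open fragment [start, end) overlaps any indexed interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_left(starts, end) - 1  # last interval starting before `end`
    return i >= 0 and ends[i] > start


def fraction_in_regions(fragments, index):
    """Fraction of (chrom, start, end) fragments overlapping the indexed regions."""
    if not fragments:
        return 0.0
    hits = sum(overlaps(index, c, s, e) for c, s, e in fragments)
    return hits / len(fragments)


def tss_windows(tss_positions, flank=1000):
    """Hypothetical +/- flank window around each (chrom, position) TSS."""
    return [(chrom, max(0, pos - flank), pos + flank) for chrom, pos in tss_positions]
```

Overlapping intervals are merged first. Because the merged intervals are disjoint and sorted, only the last interval that starts before a fragment's end can overlap it, so each lookup is a single binary search. The same index can be built from ENCODE HotSpot peaks in place of TSS windows.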
For comparisons to previously published mouse cortex datasets, raw FASTQ files were downloaded from GEO accession GSE111586 for Cusanovich, Hill, et al. (2018)16 and from GEO accession GSE100033 for Preissl, Fang, et al. (2018)17. Multiplexed files were aligned to the mm10 genome using Bowtie v1.1.0, and were demultiplexed using an R script prior to removal of duplicate location alignments. Only barcodes with > 1,000 mapped reads were retained for analysis. Per-barcode statistics were computed using the same algorithms used for per-cell statistics from our own dataset, and samples from the Cusanovich, Hill, et al. (2018) dataset that passed our QC criteria were subjected to the same analysis pipeline as our own data after demultiplexing and duplicate read removal. Metadata from Cusanovich, Hill, et al. (2018) were obtained from the Mouse sci-ATAC-seq Atlas website at http://atlas.gs.washington.edu/mouse-atac/.
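The duplicate removal and read-count filtering described above can be sketched in a few lines of Python. This is a hypothetical stand-in for the R script used in the study. It assumes each alignment is a (barcode, chrom, start, end) tuple and treats reads with identical coordinates within a barcode as duplicates. It also applies the > 1,000 read threshold after deduplication; the order of these two steps is our assumption.

```python
from collections import defaultdict

MIN_READS = 1000  # barcodes must have > 1,000 mapped reads (threshold from the text)


def dedup_and_filter(alignments, min_reads=MIN_READS):
    """Remove duplicate-location reads per barcode and keep barcodes above the threshold.

    alignments: iterable of (barcode, chrom, start, end) tuples
    Returns {barcode: set of unique (chrom, start, end) locations}.
    """
    unique = defaultdict(set)
    for barcode, chrom, start, end in alignments:
        # Reads at identical locations within a barcode collapse into one set entry
        unique[barcode].add((chrom, start, end))
    return {bc: locs for bc, locs in unique.items() if len(locs) > min_reads}
```

A set per barcode makes duplicate removal implicit. With very large files, a sorted, streaming approach would use less memory.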
Physiology
Coronal mouse brain slices were prepared using the NMDG protective recovery method37. Mice were deeply anesthetized by intraperitoneal administration of Avertin (20 mg/kg) and transcardially perfused with an NMDG-based artificial cerebrospinal fluid (ACSF) containing (in mM): 92 NMDG, 2.5 KCl, 1.25 NaH2PO4, 30 NaHCO3, 20 HEPES, 25 glucose, 2 thiourea, 5 Na-ascorbate, 3 Na-pyruvate, 0.5 CaCl2·4H2O, and 10 MgSO4·7H2O. Slices (300 µm) were cut on a Compresstome VF-200 (Precisionary Instruments) using a zirconium ceramic blade (EF-INZ10, Cadence). After sectioning, slices were transferred to a warmed (32-34°C) recovery chamber filled with NMDG ACSF under constant carbogenation. After 12 minutes, slices were transferred to a holding chamber containing ACSF composed of (in mM): 92 NaCl, 2.5 KCl, 1.25 NaH2PO4, 30 NaHCO3, 20 HEPES, 25 glucose, 2 thiourea, 5 Na-ascorbate, 3 Na-pyruvate, 2 CaCl2·4H2O, and 2 MgSO4·7H2O, continuously bubbled with 95% O2/5% CO2.
For patch clamp recordings, slices were placed in a submerged, heated (32-34°C) recording chamber continuously perfused with carbogenated ACSF containing (in mM): 119 NaCl, 2.5 KCl, 1.25 NaH2PO4, 24 NaHCO3, 12.5 glucose, 2 CaCl2·4H2O, and 2 MgSO4·7H2O (pH 7.3-7.4). Neurons were viewed with an Olympus BX51WI microscope using infrared differential interference contrast optics and a 40x water-immersion objective. Patch pipettes (3-6 MΩ) were pulled from borosilicate glass using a horizontal pipette puller (P1000, Sutter Instruments). Electrical signals were acquired using a Multiclamp 700B amplifier and pClamp 10 data acquisition software (Molecular Devices). Signals were digitized (Axon Digidata 1550B) at 10-50 kHz and filtered at 2-10 kHz. Pipette capacitance was compensated and the bridge was balanced throughout whole-cell current clamp recordings. Access resistance was 8-25 MΩ.
Data were analyzed using custom scripts written in Igor Pro (WaveMetrics). All measurements were made at resting membrane potential. Input resistance (RN) was calculated from the linear portion of the voltage-current relationship generated in response to a series of 1 s current injections. The maximum and steady-state voltage deflections were used to determine the maximum and steady-state RN, respectively. Voltage sag was defined as the ratio of maximum to steady-state RN. Resonance frequency (fR) was determined from the voltage response to a constant-amplitude sinusoidal current injection whose frequency either increased linearly from 1 to 15 Hz over 15 s or increased logarithmically from 0.2 to 40 Hz over 20 s. Impedance amplitude profiles were constructed from the ratio of the fast Fourier transform of the voltage response to the fast Fourier transform of the current injection, and fR was taken as the frequency at which impedance was maximal. Although most neurons included in this study were located in primary visual cortex (n=10 YFP+, 10 YFP−), we also recorded from motor cortex (n=1 YFP+) and primary somatosensory cortex (n=4 YFP). For illustrative purposes, we also compared the properties of YFP+ and YFP− neurons with those of 32 L5 pyramidal neurons located in somatosensory cortex of an uninfected mouse. To classify these neurons as IT-like or PT-like, we used DIvisive ANAlysis clustering (diana) from the cluster package in R. Because Ih-related membrane properties are known to differentiate IT and PT neurons across many brain regions20, the features used for clustering were restricted to the Ih-related membrane properties: sag ratio, RN, and fR. To assess the statistical significance of the clustering, we used the sigclust package in R.
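The membrane-property calculations can be illustrated with a NumPy sketch. The actual analysis used custom Igor Pro scripts, and the function names and units (pA, mV, MΩ) here are assumptions for illustration. In the sketch, RN is the slope of a linear fit to caller-selected points from the linear range of the V-I relationship, and sag ratio is maximum RN divided by steady-state RN. fR is the frequency of maximal |FFT(V)|/|FFT(I)| within the stimulated band. The diana clustering step is not reproduced.

```python
import numpy as np


def input_resistance(currents_pA, voltages_mV):
    """Slope of a linear fit to the V-I relation, in MOhm (1 mV/pA = 1000 MOhm)."""
    slope, _intercept = np.polyfit(currents_pA, voltages_mV, 1)
    return slope * 1000.0


def sag_ratio(currents_pA, max_deflection_mV, steady_deflection_mV):
    """Ratio of maximum to steady-state input resistance."""
    return (input_resistance(currents_pA, max_deflection_mV)
            / input_resistance(currents_pA, steady_deflection_mV))


def resonance_frequency(current, voltage, fs_hz, fmin_hz=1.0, fmax_hz=15.0):
    """Frequency of peak impedance |FFT(V)| / |FFT(I)| within [fmin_hz, fmax_hz].

    The band should match the stimulated frequencies (e.g. 1-15 Hz for the
    linear chirp), where the current spectrum is non-negligible.
    """
    current = np.asarray(current, dtype=float)
    voltage = np.asarray(voltage, dtype=float)
    freqs = np.fft.rfftfreq(current.size, d=1.0 / fs_hz)
    impedance = np.abs(np.fft.rfft(voltage)) / np.abs(np.fft.rfft(current))
    band = (freqs >= fmin_hz) & (freqs <= fmax_hz)
    return freqs[band][np.argmax(impedance[band])]
```

Restricting the impedance search to the stimulated band avoids spurious peaks at frequencies where the current spectrum is close to zero and the ratio becomes unstable.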
TissueCyte imaging and analysis
TissueCyte images were collected, registered, and segmented as described previously38. After registration, 3D arrays of signal binned into 25 µm voxels were analyzed in R by subtracting background and averaging the signal within each of the finest structures in the Allen Brain Atlas structural ontology. To propagate signals from fine to coarse structures in the ontology, we performed hierarchical calculations that, proceeding from the bottom to the top of the ontology, assigned to each parent node the maximum value of its child nodes. We then filtered the ontology to remove very fine structures and used the taxa and metacodeR packages for R39 to display the resulting ontological relationships and structure scores.
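The bottom-up propagation of maximum values through the structure ontology can be sketched as a memoized tree traversal in Python. The function name is hypothetical, and the example uses a simplified parent map with Allen Brain Atlas-style acronyms and made-up scores. In the real ontology, intermediate levels lie between Isocortex and root.

```python
from collections import defaultdict


def propagate_max(parent_of, leaf_scores):
    """Assign each non-leaf structure the maximum score among its children.

    parent_of:   {structure: parent structure, or None for the root}
    leaf_scores: {finest structure: background-subtracted mean signal}
    Returns {structure: score}; leaves without a score, and parents whose
    children all lack scores, are None.
    """
    children = defaultdict(list)
    for node, parent in parent_of.items():
        if parent is not None:
            children[parent].append(node)

    scores = {}

    def score(node):
        if node in scores:
            return scores[node]
        kids = children.get(node, [])
        if not kids:
            value = leaf_scores.get(node)
        else:
            # Children are resolved first, so values flow from bottom to top
            kid_values = [v for v in (score(k) for k in kids) if v is not None]
            value = max(kid_values) if kid_values else None
        scores[node] = value
        return value

    for node in parent_of:
        score(node)
    return scores


if __name__ == "__main__":
    # Simplified, hypothetical ontology with made-up scores
    parent_of = {"root": None, "Isocortex": "root", "VISp": "Isocortex",
                 "MOp": "Isocortex", "TH": "root"}
    print(propagate_max(parent_of, {"VISp": 0.8, "MOp": 0.3, "TH": 0.05}))
```

Taking the maximum rather than the mean means that a coarse structure is scored by its most strongly labeled substructure. Sparse, region-restricted labeling is therefore not diluted at higher levels of the ontology.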
Data analysis and visualization software
Analysis and visualization of scATAC-seq and transcriptomic datasets were performed using R v3.5.0 or later in the RStudio IDE (Integrated Development Environment for R) or RStudio Server Open Source Edition, together with the following packages: for general data analysis and manipulation, data.table, dplyr, Matrix, matrixStats, purrr, and reshape2; for analysis of genomic data, GenomicAlignments, GenomicRanges, and rtracklayer; for plotting and visualization, cowplot, ggbeeswarm, ggExtra, ggplot2, and rgl; for clustering and dimensionality reduction, Rphenograph and Rtsne; for analysis of transcriptomic datasets, scrattch.hicat and scrattch.io; for taxonomic analysis and visualization, metacodeR and taxa; and for management of plate-based experimental results and metadata, plater.
Manuscript Updates
This preprint has been updated from the original bioRxiv submission. The major changes are as follows:
Correction of GM12878 cell culturing and collection methods. GM12878 is a suspension cell line, not adherent. Thanks to Darren Cusanovich for identifying this error.
Addition of GM12878 data from Pliner, et al. (2018) to Supp Fig 4 and updates to some calculations. This better shows the high quality of current sci-ATAC-seq methods. Thanks to Jay Shendure and Darren Cusanovich for recommending this addition.
Updated analysis of Cusanovich, Hill, et al. (2018) data in Supp Fig 7 by changing the label metadata column selected. Thanks to Andrew Hill for recommending these higher-resolution labels.
Proper citation of Cusanovich, Hill et al. (2018), and discussion of the prior work in Drosophila in Cusanovich, Reddington, Garfield et al. (2018). Thanks to Jay Shendure for identifying these omissions.
Listing of R packages used for analysis.
Author Contributions
B.T., L.T.G., and T.D. designed the study; J.T., B.L., and J.M. provided viral genome constructs and cloning protocols; G.L. performed cloning experiments. A.S.-C., T.N., and L.T.G. performed scATAC-seq experiments. B.K. performed electrophysiology experiments. S.Y. and M.M. performed viral packaging and purification. T.N., T.K., and M.M. performed retrograde injections. T.N. and E.S. performed stereotaxic injections. M.W. performed sectioning and IHC experiments. T.N. and E.G. performed additional validation experiments. N.D. managed tissue processing for RNA-seq experiments. K.S. managed RNA-seq experiments. A.C. managed virus production. Z.Y. and L.T.G. performed RNA-seq analysis. L.T.G. and A.S.-C. performed scATAC-seq analysis. H.Z. and E.L. lead the Cell Types Program at the Allen Institute. L.T.G. and B.T. wrote the manuscript, with input from all coauthors.
List of Supplementary Tables
Supp Table 1. Mouse lines
Supp Table 2. Driver/reporter donor metadata
Supp Table 3. Stereotaxic and retro-orbital injection donor metadata
Supp Table 4. scATAC-seq cell alignment and QC statistics
Supp Table 5. mscRE locations and cloning primers
Supp Table 6. scATAC-seq clustering and cell type mapping results
Supp Table 7. Virus descriptions and sources
Supp Table 8. scRNA-seq sample alignment and mapping statistics
Acknowledgments
We could not have performed this study without the support of the Allen Institute Animal Care Team for mouse husbandry, the Allen Institute Transgenic Colony Management team for colony management, the Allen Institute Laboratory Animal Services team for acute cage preps; Tamara Caspar, Kirsten Chrichton, Matthew Kroll, Josef Sulc, and Herman Tung for tissue processing; Nadiya Shapovalova, Daniel Hirschstein, and Susan Bort for FACS sorting; and Darren Bertagnolli, Michael Tieu, Delissa McMillen, Thanh Pham, Christine Rimorin, Katelyn Ward, Alexandra Glandon, and Amy Torkelson for scRNA-seq processing. The authors also thank Andrew Hill and Darren Cusanovich for assistance in accessing and reusing data published in Cusanovich, Hill, et al. (2018); and A.H., D.C., and Jay Shendure for critical reading of the manuscript. The project described was supported by award number 1R01DA036909-01 REVISED from the National Institute on Drug Abuse. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health and the National Institute on Drug Abuse. The authors thank the Allen Institute founder, Paul G. Allen, for his vision, encouragement, and support.