Abstract
Mammalian tissues are composed of highly specialized cell types defined by distinct gene expression patterns. Identification of cis-regulatory elements responsible for cell-type specific gene expression is essential for understanding the origin of the cellular diversity. Conventional assays to map cis-elements via open chromatin analysis of primary tissues fail to resolve their cell type specificity and lack the sensitivity to identify cis-elements in rare cell types. Single nucleus analysis of transposase-accessible chromatin (ATAC-seq) can overcome this limitation, but current analysis methods begin with pre-defined genomic regions of accessibility and are therefore biased toward the dominant population of a tissue. Here we report a method, Single Nucleus Analysis Pipeline for ATAC-seq (SnapATAC), that can efficiently dissect cellular heterogeneity in an unbiased manner using single nucleus ATAC-seq datasets and identify candidate regulatory sequences in constituent cell types. We demonstrate that SnapATAC outperforms existing methods in both accuracy and scalability. We further analyze 64,795 single cell chromatin profiles from the secondary motor cortex of mouse brain, creating a chromatin landscape atlas with unprecedent resolution, including over 300,000 candidate cis-regulatory elements in nearly 50 distinct cell populations. These results demonstrate a systematic approach for comprehensive analysis of cis-regulatory sequences in the mammalian genomes.