ABSTRACT
High acuity vision in primates, including humans, is mediated by a small central retinal region called the fovea. As more accessible model organisms lack a fovea, its specialized function and dysfunction in ocular diseases remain poorly understood. We used 165,000 single-cell RNA-seq profiles to generate and validate comprehensive cellular taxonomies of macaque fovea and peripheral retina. More than 80% of >65 cell types match between the two regions, but exhibit substantial differences in proportions and gene expression, some of which we relate to functional differences. Comparison of macaque retinal types with those of mice reveals that interneuron types are tightly conserved, but that projection neuron types and programs diverge, despite conserved transcription factor codes. Key macaque types are conserved in humans, allowing mapping of cell-type and region-specific expression of >190 genes associated with 6 human retinal diseases. Our work provides a framework for comparative single-cell analysis across tissue regions and species.
INTRODUCTION
Specialized anatomic regions and neural circuitry in the central nervous system (CNS) of primates are the result of evolutionary adaptations that underlie their unique sensory and cognitive capacities. One such capacity is high visual acuity. Most primates, including humans, see objects clearly only when they look straight at them so their image falls on a small central region of the retina called the fovea, which lies within the somewhat larger macula (Figure 1A). This is because the fovea, which may be the only primate-specific structure in the mammalian central nervous system, mediates high acuity vision (Bringmann et al., 2018; Provis et al., 2013). Although it occupies <1% of the retinal surface, it provides ~50% of the input to primary visual cortex (Bringmann et al., 2018; Provis et al., 2013). Diseases that selectively affect this region, such as macular degeneration and diabetic macular edema, are leading causes of human blindness in the developed world (Bourne et al., 2013). These important aspects of human retinal function and dysfunction cannot be satisfactorily studied in more accessible model organisms, including mice, because they lack a fovea (Bird and Bok, 2017).
Basic aspects of retinal structure and function are shared by all vertebrate species (Figure 1B), and between the fovea and the peripheral retina. Photoreceptors (PR, rods and cones) convert light into electrical signals, which are transferred to interneurons (horizontal, bipolar and amacrine cells; HCs, BCs and ACs). Interneurons process the information and deliver it to retinal ganglion cells (RGCs), which send axons through the optic nerve to the rest of the brain (Masland, 2012). Selective connectivity among multiple AC, BC and RGC types, along with type-specific intrinsic differences, endow each RGC type with selective responsiveness to specific visual features (Gollisch and Meister, 2010; Sanes and Zipursky, 2010).
Despite a conserved fundamental plan, there are numerous differences between the fovea and the peripheral retina (Bringmann et al., 2018; Greeff, 1874; Sinha et al., 2017). For example, many RGCs in the fovea are supplied by a single PR, which maximizes acuity, whereas peripheral RGCs are supplied by dozens to hundreds of PRs, enhancing sensitivity (Figure S1A). In addition, the fovea is rich in cones, which mediate color and contrast-sensitive vision in well-lit conditions, whereas peripheral retina is dominated by rods, which can function in dim light levels, further enhancing sensitivity (night vision). Beyond compositional differences, cell types in the two regions exhibit distinct morphologies and physiological functions (Figure S1A) (Greeff, 1874; Kolb and Marshak, 2003; Sinha et al., 2017). Because molecular information is lacking for most primate retinal cell types, the underpinnings of these differences are little explored. For example, it is unclear the extent to which differences stem from cell types unique to one region, from altered proportions across the same set of types, or from region-specific molecular programs within shared types.
Lack of molecular knowledge has also precluded systematic assessment of the conservation of retinal cell types between primates and model organisms for which greater molecular and functional information is available. As evolutionary distances grow, identification of orthologous cell types becomes challenging in the absence of such knowledge. The retina provides an opportunity to assess molecular strategies for establishing orthologies. In mouse retina, there are >110 neuronal types, with molecular signatures as well as morphological and physiological characterization available for many of them (Sanes and Masland, 2015; Shekhar et al., 2016). In contrast, although many primate retinal types have been identified (~60 to date), they have been defined largely by morphological criteria e.g. (Masri et al., 2016; Morgan and Wong, 2008; Santina and Ou, 2018), which has not proven to be a satisfactory means of matching them to mouse types. Molecular characterization could provide a lingua franca to enable mapping, which would not only inform how common model organisms relate to primates, including humans, but also aid in understanding the evolution of cell types and their associated expression programs.
To systematically address these questions, we generated and analyzed an atlas of >165,000 single-cell RNA-seq (scRNA-seq) profiles from the fovea and peripheral retina of adult crab-eating macaques (Macaca fascicularis), a widely used primate model in vision studies (Figure S1B). We identified and molecularly characterized >65 cell types in each region, providing novel molecular markers for most types, and validated many by pairing them with cell morphology. We used a comparative analytic framework to match cell types between the fovea and the periphery, showing that the two regions share a common repertoire of types, but exhibit marked differences in cell type proportions and cell-intrinsic expression programs, which likely mediate their functional differences. We then assessed the conservation of macaque retinal cell types with those in other species. We found a tight correspondence between mouse and macaque PRs, BCs and ACs, but a divergence in RGC types, despite a conserved transcription factor code. Key macaque types and molecular features are also conserved in marmosets and humans. Finally, we used our atlas to determine the cell- and region-specific expression of >190 genes implicated in 7 complex diseases that cause human blindness, demonstrating striking patterns of region- and cell type-specific expression in macaques, which are, however, not conserved in mice. Our study provides a general framework for using molecularly derived taxonomies of cell types to understand regional and species specializations in the nervous system and other organs.
RESULTS
Comprehensive molecular taxonomy of the primate retina using scRNA-seq
We generated cell atlases of the foveal and the peripheral regions of the primate retina using droplet-based scRNA-seq (Zheng et al., 2017) (see Methods), yielding high-quality cell profiles from 92,628 foveal, and 73,053 peripheral retinal cells. For foveal samples, cells were dissociated from 0.5-1.5 mm diameter samples obtained from 4 adult macaques (1 fovea per animal), and profiled without further processing. Peripheral retinal cells were obtained from 4 adults, one of which had also provided a foveal sample. Because the peripheral retina is dominated by rod PRs (~80% of total cells), we used magnetic columns to deplete rods (CD73+) or enrich RGCs (CD90+) (Figure S1B, see Methods). In parallel, we assembled a retina-specific transcriptome, which substantially improved the mapping of scRNA-seq reads compared to existing references (Figures S1C-E), emphasizing the importance of high-quality tissue-specific transcriptome for effective scRNA-seq analyses.
To maximize our ability to distinguish cell types consistently across biological replicates, we adopted an informatics approach in which we first grouped cells into the 6 major retinal classes (RGCs, BCs, PRs, ACs, HCs, and non-neuronal cells) across all samples based on known class-specific gene signatures (Figures 1C, S1F, see Table S1), and then iteratively clustered cells within each class separately for the fovea and the periphery, building on our earlier methods (Shekhar et al., 2016).
Altogether, we distinguished 64 foveal (3 PR, 2HC, 12 BC, 27 AC, 16 RGC, 4 non-neuronal), and 71 peripheral (2 PR, 2 HC, 11 BC, 34 AC, 18 RGC and 4 non-neuronal) clusters (Figures 1D-I); all clusters were found in each animal (Figures S1G-I). Non-neuronal cells were comprised of Müller Glia, pericytes, endothelial cells and microglia (Figure 1I). There are no oligodendrocytes in normal retina, and astrocytes, which predominantly ensheath RGC axons in the nerve fiber layer (Vecino et al., 2016), were not recovered in our samples. Using differential expression analysis, we identified molecular markers for each cluster within the foveal and peripheral samples. We then used this information to assign clusters to individual retinal cell types. We first describe the types, then return to document differences between corresponding foveal and peripheral types.
L and M cones are distinguished only by expression of opsin paralogs
We first analyzed PRs. Retinas of trichromatic primates, including macaques and humans, contain rods plus three cone types that preferentially detect long (L, red), medium (M, green) and short (S, blue) wavelengths, based on their expression of L, M and S opsins (OPN1LW, OPN1MW, OPN1SW) respectively. S cones were readily distinguished, whereas M and L cones mapped to a single cluster (Figure 1D). OPN1MW and OPN1LW have ~98% identical coding sequences (Onishi et al., 2002) and are not distinguished in the macaque reference genome. However, by examining relevant OPN1LW/MW reads for diagnostic single-nucleotide polymorphisms, we could assign ~70% of M/L cones as either L or M (Figure S2A-C,E).
Remarkably, no genes other than OPN1MW or OPN1LW were differentially expressed (DE) between the M and L cones (Figures 2A, S2D), supporting a model in which stochastic choice between the neighboring OPN1MW and OPN1LW genes assigns otherwise identical cones to one of two groups (Wang et al., 1999); it is less consistent with models that imply distinct transciptional programs in the two cone types (Lee et al., 2012). In contrast, many genes distinguished M/L from S cones and rods, and S cones from rods, several of which we validated using fluorescent in situ hybridization (FISH, Figures 2B,C and S2F).
Molecular classification and validation of interneuronal types
Using a combination of prior knowledge and new histological analyses, we next associated molecularly defined clusters of foveal and peripheral interneurons (BCs, ACs, and HCs) with individual cell types.
BCs, the major excitatory interneurons of the retina, are divided into those that receive predominant input from cones or rods, and cone BCs are further divided into those excited (ON BCs) or inhibited (OFF BCs) by light; all rod BCs are ON (Euler et al., 2014). Twelve BC types (1 rod, 5 OFF cone and 6 ON cone) have been previously reported in macaque peripheral retina based on morphological and immunohistochemical analyses (Joo et al., 2011; Tsukamoto and Omi, 2015, 2016).
We identified 12 foveal BC and 11 peripheral BC clusters; 10 mapped 1:1 to known types (Table S2), and supervised methods separated another cluster into two known types in both regions (Figure S2H). One foveal cluster corresponded to a type not described previously. Based on gene expression, it is likely an OFF type; we provisionally call it OFFx (Figure S2I). We identified markers for each BC type and validated several of them (Figures 2D,E, S2G,J-O).
HCs and ACs are the main source of inhibitory input to retinal circuitry. We identified 2 clusters of HCs in both fovea and periphery (Figure 1E), and confirmed that the corresponding cluster-specific markers labeled distinct HC subsets in tissue. We also validated SPP1, which encodes for the secreted protein osteopontin, as a pan-HC marker (Figure S5D). We similarly identified 27 foveal AC clusters, and 34 peripheral AC clusters, including both GABAergic and glycinergic subfamilies (Figures 1G, S4A, S5E, Table S2). ACs remain the least characterized retinal class across all vertebrate species. However, we tentatively assigned six clusters to known mouse types based on their expression of orthologous markers, including Starburst, VIP+, SEG, catecholaminergic, glutamatergic (VGluT3+), and SEG ACs (Figures S4A and S5E). In addition, we found the expression of neuropeptides in subsets of GABAergic AC types, which is consistent with previous studies (Table S2).
Markers for midget and parasol RGCs
Morphological studies have distinguished 17-20 RGC types in primate retina (Dacey, 2004; Masri et al., 2016). ON and OFF Midget RGCs (MGCs), the smallest and most abundant, comprise ≥85% of foveal RGCs; larger ON and OFF parasol RGCs (PGCs) comprise ~10%; small bistratified RGCs (SBCs) and ~12 types with larger dendritic arbors account for the remaining ~5%. MGCs and PGCs are also the most abundant RGC types in the periphery, together accounting for ~70% of peripheral RGCs (Dacey, 2004). Axons from MGCs, PGCs and SBCs innervate different layers of the lateral geniculate nucleus and, indirectly, different sets of cortical targets (Dacey, 2004).
We identified 18 peripheral and 16 foveal RGC clusters (Figure 1H). Based on their abundance, we tentatively identified one pair of clusters as ON and OFF MGCs (~85% of RGCs in fovea, 83% in periphery) and a second pair as ON and OFF PGCs (11% in fovea, 4.8% in periphery). We identified markers for these types, including TBR1 (OFF MGC), TPBG (ON MGC), CHRNA2 (ON PGC) and CA8 (OFF PGC). Among MGCs and PGCs, SPP1 and RBPMS2 are expressed by both PGC types and GUCY1A3 by both MGC types (Figure 2F, S3C-O). We confirmed the identities of these four RGC types by combining in situ hybridization for selective markers with viral labeling to visualize morphology (Figures 2G, S3B,P,Q). Interestingly, somata expressing transcription factors selective for the major ON and OFF RGC types (MGCs and PGCs) were segregated into outer and inner halves of GCL, suggesting a molecular basis for the laminar organization of physiologically distinct subclasses (Perry and Silveira, 1988) (Figures 2H and S3P). We were unable to further partition MGCs and PGCs by supervised methods.
Three peripheral RGC clusters expressed OPN4 (melanopsin), a marker of intrinsically photosensitive RGCs (ipRGCs) (Do and Yau, 2010). We identified markers for the remaining clusters, including SATB2 as a marker for SBCs and two other types (Figures 2I and S3A). To our knowledge, these are the first molecular markers for the major RGC types of the primate retina (Figure S3A).
Most retinal cell types match 1:1 between the fovea and the periphery, but differ in proportion
We devised a computational approach to relate foveal and peripheral clusters based on their molecular expression patterns. While it was straightforward to match PR types (rods and cones) and non-neuronal types (Müller glia, pericytes, endothelial cells and microglia) across regions based on known markers, HCs, BCs, ACs and RGCs required a more systematic approach. We used a multi-class learning framework that associated each foveal cell with a corresponding peripheral identity, and then examined the extent to which clusters across the two regions mapped specifically (see Methods).
Remarkably, 77% of foveal clusters exhibited a 1:1 mapping with peripheral clusters including 10 of 12 BC, 2 of 2 HC, 20 of 27 AC and 14 of 16 RGC clusters (Figures 3A-D). The matches between the foveal and the peripheral clusters was significantly higher than predicted by chance, as quantified by the Adjusted Rand Index (ARI) values, a measure of similarity between data clusterings. Six foveal clusters (RGC clusters fRGC11 and fRGC14, and AC clusters fGa3, fGa7, fGa13 and fGl8) could be partitioned because they each mapped to two closely related peripheral clusters. Notably, there were only three instances of multiple (≤ 3) foveal clusters mapping to a single peripheral cluster. Most instances of multi-mapping were for AC clusters, which likely reflects a biased frequency distribution in the peripheral samples, which were not optimized for AC recovery (Figures S4B).
Quantifying the compositional similarity between the fovea and the periphery for each neuronal class suggested that differences decrease from outer to inner retina. Specifically, values of the Jensen-Shannon divergence (JSD) - a distance measure in frequency space ranging from 0 (identical cell type composition) to 1 (non-overlapping, region-specific cell types) were: PRs (0.21), BCs (0.16), ACs (0.06), and RGCs (0.035) (see Methods), suggesting that the major compositional differences between the fovea and periphery is within the sensors (PRs) and the earliest processors (BCs). There were numerous differences in relative proportions for 1:1 matched types (Figures 3E-H and S4B, Table S2), some consistent with previous reports, such as the predominance of rod BCs in peripheral retina, and the enrichment of cone midget bipolars (IMB and FMB) in fovea. A few types appeared to be highly restricted to either fovea or periphery. Six RGC types were greatly underrepresented (>4-fold) in foveal compared to peripheral samples (Figure 3G). In addition, 6 GABAergic AC types appeared to be periphery-specific (Figure 3C) and the OFFx BC type appeared to be fovea-specific (Figure S2I). Taken together, these results suggest that foveal and peripheral neuronal circuitry draw upon a similar but not identical “parts list” of cell types.
Programs for phototransduction and GABAergic neurotransmission differ between matching types in the fovea and periphery
We next asked whether changes in gene expression between corresponding cell types further distinguished the fovea and periphery (Figures 4, S4). Indeed, we found a median of 17 ± 8 genes significantly DE between corresponding types (Table S3), with more differences for RGCs and non-neuronal cells (median 130 genes per type) compared to interneurons (HCs, BCs and ACs; median 13 genes per type) (Figure 4A).
One noteworthy difference, which might underlie a functional foveal specialization, is related to phototransduction. The phototransduction component GNGT1, which is selectively expressed by rod PRs in mice and humans, and at higher levels by rods than cones in macaques (Figure 4B), is expressed at higher levels in foveal than peripheral cones, as verified by FISH and immunohistochemistry (Figure 4C and see below). A recent study showed that light responses are slower in macaque foveal than peripheral cones (Sinha et al., 2017), and homozygous deletion of Gngt1 accelerates rod responses in mice (Kolesnikov et al., 2011). It is thus tempting to speculate that GNGT1 contributes to the kinetic differences in macaques. Conversely, another phototransduction component GNB3 is thought to be selectively expressed by cones but is also expressed at substantial levels by foveal rods (Figure 4B). Likewise, although the molecular profiles of foveal and peripheral BC types were extremely similar, the cation channel gene TRPM1 is expressed in peripheral but not foveal OFF DB1 cells (Figures 4D,E).
Another prominent difference is that several genes involved in GABAergic neurotransmission are expressed at higher levels in peripheral types than in their foveal counterparts. They include the GABAA receptor β2 subunit GABRB2 in peripheral PRs (Figures 4B,C), the GABA synthetic enzyme GAD2 in peripheral rod BCs (Figure 4D) (Lassova et al., 2010) and several GABA receptor subunits in multiple peripheral GC types, including MGCs and PGCs (Figure 4F-H and S4C). Thus, the paucity of inhibitory drive to foveal midgets observed physiologically (Sinha et al., 2017) reflects both the absence of several GABAergic AC types from fovea (Figure 3C) and decreased expression of GABAergic components in shared cell types. These differences suggest distinct signaling processes in the two regions.
Interestingly we also observed >100 DE genes between foveal and peripheral MGs, including SPP1 enriched in peripheral MGs and CYP26A1 in foveal MGs (Figures 4I-K, S4D). These differences provide potential substrates for differences between their interactions with neurons in the two regions (Bringmann et al., 2018; Greeff, 1874; Sinha et al., 2017).
Tight molecular correspondence between macaque and mouse BC and AC types
While the conservation of genes across species has been extensively studied, we know less about the extent to which cell types are conserved, and such comparisons have often been challenging given the potentially rapid evolution of defining markers (Shay et al., 2013). scRNA-seq enables systematic comparison of cellular composition across species by leveraging gene orthologues to compare gene expression programs rather than individual markers (Marioni and Arendt, 2017). To exploit this opportunity, we next compared our macaque atlases to those previously collected from mice (Macosko et al., 2015; Shekhar et al., 2016), the currently most-studied model. To this end, we adapted the multi-class classification framework used for fovea-periphery comparisons to relate mouse and macaque retinal cell types.
As expected from the conserved retinal architecture, there was a 1:1 match between cell classes in the two species (Figure S5A). In addition, there was also a tight correspondence between mouse and macaque PR, HC, BC and AC types. Mice are dichromats, with only M and S cone opsins, and many cones express both opsins (Euler et al., 2014). Mouse M and S cones most closely resembled macaque M/L and S cones, respectively (Figures 5A, S5B). Mouse HCs were more transcriptionally similar to macaque H1, which they resemble morphologically, than to H2 (Wassle et al., 2000) (Figures 5B, S5C,D, Table S2). For ACs, there was a striking correspondence between macaque and mouse, based on comparison to our previously reported (but underpowered) estimate of 21 molecularly distinct mouse AC clusters (Macosko et al., 2015) (Figure 5C). For example, GABAergic and glycinergic types corresponded, and there were clear macaque equivalents of well-studied mouse types such as starburst, AII, glutamatergic (VGluT3+), catecholaminergic, VIP-positive and SEG ACs (Figures 5D, S5E and Table S2). A more comprehensive ongoing study has revealed >60 mouse AC types, and increased the 1:1 matches with macaque (W.Y., K.S., A.R., and J.R.S. in preparation).
For BCs, 9 peripheral macaque types mapped preferentially to mouse types with a similar light response (ON vs. OFF, Figure 5E; 8 were 1:1), and expression patterns of type-enriched orthologues were conserved between the two species (Figure 5F). As noted below, we were unable to identify murine equivalents of MGCs or PGCs, the most abundant primate RGCs. However, the BCs that selectively innervate MGCs (FMB and IMB) correspond to mouse BC1 and 7; and the BCs that provide strongest innervation to PGCs (DB2, 3a and 4) (Tsukamoto and Omi, 2015, 2016) correspond 1:1 to mouse BC4, 3a and 5a. It will be interesting to ask whether postsynaptic targets of these BC types share features with MGCs and PGCs.
Dramatic divergence in expression programs between macaque and mouse RGC types
In contrast to the correspondence between mouse and macaque BCs and ACs, RGC types differ greatly in both number (>40 in mice (Baden et al., 2016; Bae et al., 2018) vs. ~ 20 in macaque) and distribution (no single mouse type accounts for >10% of all RGCs whereas MGCs account for >50% of macaque RGCs). Moreover, using a recently published scRNA-seq atlas of early post-natal mouse RGCs (Rheaume et al., 2018), we found only a few clear matches between the species (ARI = 0.16), one being the evolutionarily ancient melanopsin-positive intrinsically photosensitive RGC (Do and Yau, 2010) (Figures 5G,I); macaque RGC types exhibit a similar lack of correspondence with adult mouse RGC types (K.S., W.Y., A.R. and J.R.S, in preparation). Notably, we found no clear mouse equivalents of macaque ON MGCs, OFF MGCs or OFF PGCs; there may be a mouse equivalent of ON PGCs (Figure 5G), but it has not been characterized.
Nonetheless, the RGC comparison did reveal similar patterns of type-enriched transcription factor expression across macaque and mouse RGC clusters. For example, Tbr1, Eomes, Satb2, Tbx20 and Foxp2, several of which regulate type-specific features of mouse RGCs (Liu et al., 2018; Mao et al., 2014; Peng et al., 2017), are also expressed by restricted macaque types (Figure 5H). Among them, Satb2 is expressed by bistratified RGCs in both species (Figure 2I) (Peng et al., 2017). Moreover, patterns of transcription factor co-expression were similar in both species (Figure S5F). For example, Eomes and Meis2 mark largely non-overlapping sets of clusters whereas Neurod2 and Satb2 are coexpressed. This suggests a model wherein subfamilies of RGC types in both species are specified by TF codes, but with substantial divergence of the regulatory targets of those codes, accompanied by an abundance of MGCs and lesser diversity in macaques.
Conservation of cell types and markers across primates
A main value of studies in non-human primates is that their properties are likely to be shared with humans. We asked whether this is true of retina. To this end, we analyzed retinas of humans and another non-human primate, the common marmoset (Callithrix jacchus), which is increasingly used because of its small size and genetic accessibility (Mitchell et al., 2014).
We found strong conservation of key molecular and cellular features across macaque, human and marmoset retinas. For example, markers of MGCs, PGCs, S cones and HC types were conserved in marmoset (Figures 6A, S6A,B) and the laminar segregation of foveal OFF and ON RGCs was apparent in both marmoset and human (Figures 6B, S6D). Distinctions between corresponding peripheral and foveal cell types were also conserved, such as the enrichment of GNGT1 in marmoset foveal cones (Figure S6C) and enrichment of CYP26A1 and SPP1 in foveal and peripheral MGs respectively in both humans and marmosets (Figures 6C,D and S6E, 7E). Moreover, profiling 2,383 BCs from human peripheral retina, we found close correspondence (ARI = 0.74) between human and macaque types (Figures 6E,F, S6F-H), extending the conservation of macaque BC types from mouse to humans. Together with previous morphological studies (Haverkamp et al., 2003; Liao et al., 2016), these results reveal strong conservation of cell types among primates.
Cell type specific expression of human retinal disease genes
Finally, encouraged by the correspondence between human and macaque cell types, we assessed the cellular distribution of genes that have been implicated in 6 groups of human ocular diseases associated with disabling vision loss: retinitis pigmentosa, cone-rod dystrophy, diabetic macular edema and retinopathy, congenital stationary night blindness, glaucoma (including primary open angle and primary angle closure types), and both wet and dry forms of age-related macular degeneration. We focused on ~200 genes for which association has been demonstrated by rare highly penetrant mutations or by GWAS studies (Farrar et al., 2017; Graham et al., 2018; MacGregor et al., 2018; Wiggs and Pasquale, 2017; Zeitz et al., 2015).
For each gene we first calculated an enrichment score for each macaque foveal and peripheral cell class. Aggregating these scores by human retinal disease groups (Figure 7A) shows that, in general, expression was highest in cell classes primarily affected by the diseases: PRs for retinitis pigmentosa and rod-cone dystrophy; PRs and BCs for congenital stationary night blindness; RGCs for glaucoma; and non-neuronal cells for diabetic macular edema and retinopathy. As expected, many genes implicated in macular degeneration were expressed in PRs and non-neuronal cells, but several were also highly expressed in other cell classes, especially ACs, a class whose role in disease etiology and progression remains to be explored.
Patterns of expression for several genes are noteworthy. At least three genes associated with diabetic macular edema which, as the name implies, selectively affects the macula, were expressed at higher levels in certain foveal cell types than their peripheral counterparts, such as PDGFB and EDN1 in endothelial cells (Figure 7C) (Graham et al., 2018). Likewise, the macular degeneration susceptibility gene VTN (Fritsche et al., 2016) was enriched in foveal compared to peripheral rods and cones (Figure S7). HTRA1, a major susceptibility gene for macular degeneration (Fritsche et al., 2016), which has been implicated in dysfunction of retinal pigment epithelium (anatomically distinct from neural retina and not represented in our dataset) is also expressed at high levels in macaque HCs, but not in mouse HCs. Glaucoma associated genes CYP1B1 and CYP26A1 were enriched in a GABAergic AC type and foveal MGs, respectively (Figure 7B,E and S7). Several other glaucoma-associated genes were enriched in specific RGC subsets: for example, MAP3K1 by MGCs and PGCs; SIX6 by MGCs; and GAS7 in multiple non-MGC and PGC types (Figure 7D). These region- and cell type-selective expression patterns can inform future analyses of genetic variants associated with disease risk. Interestingly, expression patterns of majority of human retinal-disease genes were not conserved in mice, highlighting the limitations of using a mouse atlas as a reference (Figure S7).
DISCUSSION
By profiling >165,000 single-cells, we generated an atlas of molecularly defined cell types in the primate retina. The number of cells needed to generate a comprehensive atlas depends on multiple factors, including the number of types represented in the tissue and the extent to which types differ from each other (Shekhar et al., 2016). In our case, we were able to detect very rare cell types – for example ipRGCs, comprising <0.002% of all retinal cells (Liao et al., 2016). Reasonably complete compendia of cell types from comparably complex tissues will likely require at least similar cell numbers. Using this atlas as a foundation, we then explored regional differences within a tissue and examined evolutionary specializations across species. We undertook two approaches to achieve this: comparing the cellular compositions of fovea and peripheral retina of macaques; and comparing retinal cell types between macaque (which have a fovea) and mice (which do not).
Comparing the fovea and the peripheral retina revealed an overall similar “parts list” but specializations in cellular proportions and cell-intrinsic expression programs of the fovea. Our results also suggest that the compositional differences between the fovea and the periphery are higher at the sensory and preprocessing layers (PRs and BCs) than for downstream processors and feature detectors (ACs and RGCs). A few cell types are unique to or highly enriched in each region, and these may underlie certain functional differences between them. The OFFx bipolar is particularly interesting in this regard, and the absence of some foveal GABAergic AC types may contribute to the low level of foveal inhibition (Sinha et al., 2017). Nevertheless, despite these rare unique types, our results suggest that the dramatic specializations of foveal circuitry and function arise from regional differences in proportion and gene expression within shared types rather than the presence of region-specific types. Some of the differentially expressed genes, such as GNGT1 and subunits of the GABA receptor, may underlie the differences in the visual responses mediated by the foveal and peripheral circuitry. These differences might arise from developmental programs that specify the fovea (Bringmann et al., 2018; da Silva and Cepko, 2017; Hendrickson et al., 2012; Hoshino et al., 2017). Future studies will focus on elucidating the functions of such region-specific genes to understand signaling processes in the fovea.
By comparing the macaque and the mouse retinal cell taxonomies, we provide compelling evidence for orthology across cell classes, with the dramatic exception of RGCs. The known evolutionary conservation of the fundamental retinal plan extends from cell classes to cell types, with strong conservation for PRs and interneurons (HCs, BCs and ACs). However, this conservation is tenuous for RGCs at the level of gene expression patterns, number and frequency distribution of types. Most notably, MGCs account for most retinal output neurons in primate but lack clear counterparts in mouse. These differences suggest that the outer retina may comprise a conserved set of information processors, with adaptation to species-specific visual processing requirements beginning at the level of RGCs, and then continuing centrally.
Interestingly, despite dramatic differences between macaque and mouse RGCs, there may be an underlying shared genetic regulatory program that specifies RGC diversity, as evidenced by the conserved patterns of transcription factor expression and co-expression restricted to small sets of RGC types. These factors may specify groups that diversify differentially in the two species. Together with recent comparative transcriptomic studies of other parts of the nervous system (La Manno et al., 2016), this raises the possibility that cross-species differences in cell type diversity in homologous regions can be explained by a rewiring of the downstream programs specified by a conserved set of transcription factors.
Finally, our study provides both technical and conceptual foundations for investigations aiming to understand cellular mechanisms underlying blinding retinal diseases. By assessing expression of nearly 200 genes implicated in human retinal diseases, we found that many are selectively expressed in particular retinal cell classes, in particular types within classes, and in foveal or peripheral cohorts on particular types. For example, HTRA1, one of the two major risk genes for macular degeneration, is expressed predominantly by HCs among retinal neurons; its role, if any, in these cells merits investigation. Likewise, PDGFB and EDN1, implicated in diabetic macular edema, are expressed at significantly higher levels by foveal than peripheral endothelial cells, suggesting a basis for the increased susceptibility of the macula in this disease. Interestingly, several of the patterns we document in macaque are not conserved in mice -for example, HTRA1 expression in HCs-emphasizing the value of using primate models for investigating human retinal dysfunction.
AUTHOR CONTRIBUTIONS
Y.-R.P., K.S., A. R., and J.R.S conceived the study and wrote the manuscript with input from all authors. K.S., A.S. and W.Y. performed bioinformatic analysis, with guidance from A. R.. Y-R.P. and D.H. performed molecular and histological experiments, with guidance from M. T. H. D and J. R. S.. T.v.T. and G.S.B. procured and prepared tissue. M.T.H.D. provided guidance on study design and analysis.
METHODS
Tissue Procurement
Non-human primate and human retinas were obtained and used in accordance with the guidelines for the care and use of animals and human subjects at Harvard University and Boston Children’s Hospital, and supplying institutions. All the procedures on non-human primates were approved by the Institutional Animal Care and Use Committees. Acquisition and use of human tissue was approved by the Human Study Subject Committees (DFCI Protocol Number: 13-416 and MEE - NHSR Protocol Number 18-034H).
Eyes from male macaques (Macaca fascicularis, 3-9 years of age) were kindly provided by institutions including Biomere and Massachusetts General Hospital. Eyes were collected either pre-mortem under deep anesthesia or ≤ 45 min post-mortem. In some cases, whole globes were immediately placed in ice-cold Ames solution (Sigma-Aldrich; equilibrated with 95% O2/5% CO2 for all use), where they were stored before experimentation. In others, a rapid hemisection was performed to remove the vitreous and the anterior chamber, and the posterior eyecup was immersed in room-temperature Ames. Retinas were then dissected free and stored in ice-cold Ames solution. Experiments commenced within 8 hours. For access to macaque tissue, including the samples used in this study, we thank V. Belov, C. Cetrulo, D. Guberski, A. Hall, P. Kovalenko, A. LaRochelle, J. Madsen, M. Nedelman, M. Papisov, and S. Smith.
Eyes from male and female common marmosets (Callithrix jacchus, 2-10 years of age) were generously provided by the McGovern Institute for Brain Research (Massachusetts Institute of Technology). Animals were perfused with 4% paraformaldehyde (PFA) under deep anesthesia and post-fixed for 1 hr in 4% PFA. Eyes were then collected and stored in ice-cold Ames solution. We are gratefully to G. Feng, Q. Zhang and C. Wu for access to this tissue.
Macaque and marmoset eyes were collected from animals that had reached the end of unrelated studies at supplying institutions. No ocular or visual abnormalities were noted. Data presented in this manuscript did not covary with any treatment that had been applied to the animals.
The human eye used for sequencing and in situ hybridization was collected ~6 hours postmortem from a 74 year-old male. The whole globe was immediately transported back to the lab in a humid chamber. Hemisection was performed to remove the anterior chamber and the posterior pole was recovered in Ames equilibrated with 95% O2/5% CO2 before further dissection and dissociation. The donor was confirmed to have no history or clinical evidence of ocular disease or intraocular surgery We are grateful to Dr. Dejan and the Rapid Autopsy Program, Susan Eid Tumor Heterogeneity Initiative, Massachusetts General Hospital for expeditious access to this material, enabling recovery of high quality RNA for sequencing.
Other human eyes used for in situ hybridization were provided by the Lions VisionGift (Portland, OR) and the National Disease Research Interchange (NDRI; Philadelphia, PA). They were collected <11 hr postmortem from a 28 year-old male and a 69 year-old female, hemisected, and fixed overnight in ice-cold 4% PFA following removal of the cornea. No ocular disease was reported in these donors.
RNA-sequencing
Single Cell Isolation
0.5-1.5 mm diameter foveal tissues centering on the foveal pit were dissected out from four macaque eyes. Foveal samples were digested with 200 units papain (Worthington, LS003126). Foveae M1 and M2 were digested at 37°C; foveae M3 and M4 were digested at 22°C. Following digestion, retinas were dissociated and triturated into single cell suspensions with 0.04% bovine serum albumin (BSA) in Ames solution. Peripheral retinal pieces were dissected and pooled from all quadrants of the retina. Single cell suspensions were dissociated at 37°C as described for fovea. Dissociated cells were incubated with CD90 microbeads (Miltenyi Biotec, 130-096-253) to enrich RGCs or with anti-CD73 (BD Biosciences, clone AD2) followed by anti-mouse IgG1 microbeads (Miltenyi Biotec, 130-047-102) to deplete rods. CD90 positive cells or CD73 negative cells were selected via large cell columns through a MiniMACS Separator (Miltenyi Biotec). Single cell suspensions were diluted at a concentration of 500-1800 cells/μL in 0.04% BSA/Ames for loading into 10X Chromium Single Cell A Chips. Human cells were dissociated and treated with anti-CD73 as above.
Droplet-Based scRNA-seq
Single cell libraries were prepared using the Chromium 3’ v2 platform (10X Genomics, Pleasanton, CA) following the manufacturer’s protocol. Briefly, single cells were partitioned into Gel beads in EMulsion (GEMs) in the GemCode instrument followed by cell lysis and barcoded reverse transcription of RNA, amplification, shearing and 5’ adaptor and sample index attachment. On average, approximately 10,000 single cells were loaded on each channel and approximately 6,000 cells were recovered. Libraries were sequenced on the Illumina HiSeq 2500 (Paired end reads: Read 1, 26bp, Read 2, 98bp).
Viral Transduction
Macaque or marmoset retinas were divided into four quadrants according to cardinal axes with the macula in the temporal region and mounted on Millicell culture inserts (Millipore) with the ganglion cell layer facing up. Retinas were infected with 5 μl of AAV retrograde virus encoding GFP directed by the CAG promoter (Addgene) with a titer ~10e12. Methods were adapted from (Johnson and Martin, 2008; Meyer-Franke et al., 1995). Transduced retinas were cultured in Neurobasal A medium (Fisher 10888-022) supplemented with 0.1 mg/ml penicillin/streptomycin (Sigma G4333), 1 mM L-glutamine (Sigma G7513), B27 (Fisher 17504-044), 0.005 mM forskolin (Sigma F6886), 1 mM sodium pyruvate (Fisher 11360-070), 25 ng/ml BDNF (PeproTech 450-02), and 10 ng/ml CNTF (PeproTech 450-13) for four days. Retinas were then fixed in 4% PFA in phosphate-buffered saline (PBS) for 1hr at 4 and separated from the culture insert after fixation.
Biolistic Transfection
The biolistics procedure was adapted from published protocols (Masri et al., 2016; Morgan and Wong, 2008; Santina and Ou, 2018). Briefly, gold particles (1.6 μm diameter; 12.5 mg; Bio-Rad) were coated with pCMV-GFP DNA plasmid (25 μg). The particles were delivered to retinal cells in whole-mount retinas with GCL facing up using a Helios gene gun (Bio-Rad). The retinal explants were cultured for 3 days followed by fixation as described for virus transduced retinas.
Fluorescent In Situ Hybridization
Eyes were fixed in 4% PFA. Marmoset and human eyes were fixed by perfusion and immersion respectively as described under “Tissue Procurement.” For macaque eyes, slits or windows were made in the cornea of eyes collected post-mortem, and the globe was immersion-fixed overnight in ice-cold 4% PFA. Probe templates were generated using cDNA derived from adult macaque, marmoset, or human retinas following RNA extraction and reverse transcription with AzuraQuant™ cDNA Synthesis Kit (Azura, AZ-1995). Antisense probes were generated by PCR using a reverse primer with a T7 sequence adaptor to permit in vitro transcription (see Table S4 for primer sequences). DIG rUTP (Roche, 11277073910), DNP rUTP (Perkin Elmer, NEL555001EA), and Fluorescein rUTP (Roche, 11685619910) were used for synthesizing probes for single, double or triple fluorescent in situ hybridization (Kolb et al.) experiments. Retinas fixed as above were rinsed with PBS, immersed with 30% sucrose overnight at , embedded in Tissue Freezing Medium (EMS) and cryosectioned at 20 μm. Details of FISH were described in ref. 12. Briefly, sections were mounted on Superfrost slides (Thermo Scientific), treated with 1.5 μg/ml of proteinase K (NEB, P8107S), and then post-fixed and treated with acetic anhydride for deacetylation. Probe detection was performed with anti-DIG HRP (1:1000), anti-DNP HRP (1:500), anti-Fluorescein (1:1000) followed by tyramide amplification. Detection of protein epitopes was performed following probe detection. Antibodies were diluted in block consisting of 3% donkey serum (Jackson, 017-000-121), and 0.3% Triton-X in PBS at concentrations of 1:500 (anti-GFP), 1:1000 (anti-Calbindin), 1:2000 (anti-PKCα), and 1:100 (anti-CD15) (See below for antibody information).
Immunohistochemistry
Tissue was fixed as described above. Immunostaining was performed as described in refs. 12 and 31. Antibodies used were as follows: chick and rabbit anti-GFP (1:500, Abcam; 1:2000, Millipore); mouse anti-Satb2(1:1000, Abcam); goat anti anti-CHX10 (1:300, Santa Cruz); rabbit anti-TFAP2B (1:1000, cell signaling); goat anti-Choline Acetyltransferase (1:500, Millipore); goat anti-VAChT (1:1000, Millipore); rabbit and guinea pig anti-RBPMS (1:1000, Abcam; 1:5000, PhosphoSolutions); rabbit anti-Calbindin (1:2000, Swant); mouse anti-Calretinin (1:5000, Millipore); mouse anti-Gad65/67 (1:1000, Millipore); mouse and rabbit anti-PKCa (1:2000, Abcam; 1:2000, sigma); rabbit anti-Secretagogin (1:10,000; BioVendor); mouse anti-Human CD15 (1:100, BD); rabbit anti-GNGT1 (1:1000, gift from Dr. N. Gautam), mouse anti-7G6 (1:1000, gift from Dr. Peter MacLeish), and chick anti-OPN1M/LW (1:2000, gift from Dr. Jeremy Nathans). Nuclei were labeled with DAPI (1:1000, Invitrogen). Secondary antibodies were conjugated to Alexa Fluor 488, 568, and 647 (Invitrogen) and used at 1:1000. Sections and whole mounts were mounted in ProLong Gold Antifade (Invitrogen).
Image Acquisition, Processing and Analysis
Images were acquired on Zeiss LSM 710 confocal microscopes with 405, 488-515, 568, and 647 lasers, processed using Zeiss ZEN software suites, and analyzed using ImageJ (NIH). Images were acquired with 16X, 40X or 63X oil lens at the resolution of 1024×1024 pixels, a step size of 0.5-1.5μm, and 90μm pinhole size. ImageJ (NIH) software was used to generate maximum intensity projections and neuronal dendrites were reconstructed with ImageJ plugin simple neurite tracer. Adobe Photoshop CC was used for adjustments to brightness and contrast.
Assembling a Retina-specific Transcriptome for M. fascicularis
We obtained high quality total RNA from macaque retinal tissue (RNA Integrity Number (RIN) score 9.8), and prepared strand-specific libraries using the TruSeq strand-specific Total RNA kit (Illumina Inc.), which was sequenced on the NextSeq 500 system to obtain 45 million 100bp paired end reads. We used StringTie (v1.3.3) (Pertea et al., 2015) to assemble a genome-guided transcriptome, using an available NCBI transcriptome for M. fascicularis1 (annotation release 101) as an initial guide. Briefly, we mapped RNA-seq reads onto the existing transcriptome in a strand-specific manner using the Hisat2 software (with command line options --dta --rna-strandedness RF), following published guidelines (Pertea et al., 2016). Next, we used StringTie (with command line option --rf) to assemble a new transcriptome based on TruSeq reads. We then reran StringTie (with the command line option merge) to obtain an updated transcriptome annotation, which contained modifications of transcript body definitions from the existing NCBI reference as well as novel transcripts supported by the TruSeq reads. While the modified transcripts retained gene names from the original annotation, the novel transcripts were initially named according to Stringtie’s naming convention (e.g. MSTRG.5141). We call this the NCBI + TruSeq reference in Figure S1. There were a few instances where sense and antisense transcripts were predicted for some genes; in these cases we added suffixes “_p” and “_n” against the corresponding gene name to indicate this fact (e.g. FEZF1_p and FEZF1_n).
To facilitate transcriptional mapping of macaque types to mouse and human types, we used BLAT (Kent, 2002) to associate each macaque transcript with its closest ortholog (reciprocal best matches) in the NCBI mouse (version GRCm38) and human (version GRCh37) transcriptomes, respectively. Encouragingly, this automated procedure matched transcript gene names with similar letter codes (e.g. macaque TRPM1 with mouse Trpm1 and human TRPM1). We use this mapping to associate each reconstructed transcript with the closest human gene by sequence, in cases where such homology was strong. In some cases, we also found that certain tenuously named loci in the NCBI reference could be associated with homologous mouse and human gene names (e.g. macaque LOC101864869 could be associated with mouse Rps27a, LOC102132859 could be associated with mouse Cyp26a1 etc). For many loci, however, we were unable to associate a human gene and for these cases, we retained the naming convention assigned by StringTie (e.g. MSTRG.5141).
Pre-Processing Of 3’ Droplet-based scRNA-Seq Data
Sample demultiplexing, alignment to the NCBI+TruSeq reference, quantification and initial quality control (QC) was performed using the Cell Ranger software (version 2.1.0, 10X Genomics) for each sample (i.e., 10X channel) separately. We used the option “--force-cells 6000” in Cell Ranger count to obtain a 36,162 genes x 6,000 cells count matrix each sample to deliberately extract a larger number of cell barcodes in the data, as we found that the automatic estimate of Cell Ranger was too conservative, and was unfavorable for small cell-classes like BCs. Here, 6,000 represented a “loose” upper bound on the number of cells that could be recovered given the density of each the cell suspension loaded onto every channel per the manufacturer’s estimates. We grouped the count matrices separately for the foveal and the peripheral samples to generate a consolidated matrix for each region. Only cells which expressed >500 genes were retained for further analysis. Further pruning of cells (low quality cells, doublets) was done for each class separately (see below).
Separation of Major Cell Classes
Both the foveal and the peripheral datasets consisted of multiple biological replicates (Figs. S1G-I). Moreover, depending on the enrichment method, each sample contained widely different distributions of the main cell classes. For example, the foveal samples typically comprised of ~30% RGCs, ~28% BCs, ~8% ACs (Figure S1H), the peripheral CD90+ samples contained ~28% RGCs, 0% BCs, ~50% ACs, and the peripheral CD73-samples contained <1% RGCs, ~30% BCs and ~10% ACs (Figure S1I). Moreover, we anticipated that the differences in proportions also extends to types within a class (Figure 3), and that foveal and peripheral types likely had molecular differences (Figure 4). This made conventional batch correction using methods such as SVA (Leek et al., 2012) and ComBat (Johnson et al., 2007) difficult, because they make strict distributional assumptions that are violated here. Furthermore, covariates that can be used to counteract these assumptions are not known a priori. We also attempted to perform global batch correction using recently published methods that use Canonical Correlation Analysis (CCA; (Butler et al., 2018)) or Mutual Nearest Neighbour (MNN;(Haghverdi et al., 2018)) matching, but in many cases these did not completely remove batch effects, collapsed distinct types with known molecular markers, or appeared to have varying impact on different cell classes.
To make batch correction more tractable and to avoid extensive biases in the initial stages for the reasons stated above, we first analyzed the foveal and the peripheral datasets separately to stratify the cells by their class. We followed our earlier PCA + Louvain-Jaccard graph-clustering pipeline (Shekhar et al., 2016) without batch correction to obtain a set of transcriptionally distinct clusters. At this stage, we deliberately set parameters to “overcluster” the data to avoid combining distinct cell classes. Next, we scored each cell for gene signatures of well-known retinal cell classes classes - Rods, Cones, Bipolar Cells (BC), Horizontal Cells (HC), Amacrine Cells (AC), Retinal Ganglion Cells (RGC), and non-neuronal cells such as Müller Glia (MG) and others (Table S1). Classes with shared markers often scored similarly, but with one higher than the other (e.g. TFAP2A/2B and ONECUT1/2 are expressed in ACs and HCs, with TFAP2A/2B being higher in the former and ONECUT1/2 being higher in the latter). In such cases, we combined these classes and analyzed them as a group in downstream pipelines. Thus, we separated the foveal and peripheral datasets into 5 groups comprising - (i) Rods and Cones, (ii) Amacrine Cells (Fritsche et al.) and Horizontal Cells (HCs), (iii) Bipolar Cells (BCs), (iv) Retinal Ganglion Cells (RGCs), and (v) Others consisting predominantly of Müller Glia, but also pericytes, endothelial cells and microglia (Figure. 1C). Astrocytes are absent from the fovea and are largely confined to the optic fiber layer in the periphery (23). None were recovered from either region in our samples. Encouragingly, cells of the same class were more highly correlated with each other than cells of other classes, supporting this separation (Figure S1F).
Although clusters at this stage exhibited batch effects, ~90% of clusters could be unequivocally assigned to a class. The other clusters comprised low quality cells and “doublets.” These were often characterized by overall low gene count that was tightly distributed around the minimum cutoff value (~500-700 genes/cell), the mixed expression of more than one class-specific scores albeit at lower levels than “pure” clusters, and the absence of expression of non-class markers that were specific. Moreover, these cells typically clustered proximally on the dendrogram or tSNE visualization to another large, bonafide cluster, as observed previously (Shekhar et al., 2016). To avoid discarding cells erroneously, we analyzed these clusters separately (not shown), and confirmed that they were doublets and low-quality cells. Because the excluded cells were heavily biased towards low gene-count cells, we could have excluded them ab initio by setting a more stringent threshold, but this would have risked excluding genuinely small, but intact single-cell libraries.
Dimensionality Reduction, Clustering and Visualization For Each Foveal and Peripheral Group
Following the initial clustering step, we analyzed the 6 major classes of cells - RGCs, BCs, ACs, HCs, PRs, and non-neuronal cells for fovea and periphery separately (Figures 1D-I). HCs and ACs were separated for this analysis. We employed the following computational steps to cluster each of these groups as described below.
Normalization: Expression values Ei,j for gene i in cell j were calculated following (Shekhar et al., 2016). Briefly, the Cell Ranger reported UMI (i.e. transcript) count value for each gene i in each cell j was divided by the sum of the total UMI counts in cell j to normalize for differences in library size, and then multiplied by M, the median UMI counts for all cells within the group, resulting in Transcripts-per-median (TPMi,j) values. Ei,j was then calculated as log(TPMi,j + 1)
Identification of highly variable genes (HVGs): We first calculated the mean (μ) and coefficient of variation (CV) of transcript counts for each gene in the data. We then computed for each gene, the “excess CV” (or eCV) by subtracting the observed CV from a null model of CV vs μ. This null model is based on a Poisson-Gamma mixture, which was shown to accurately model the null CV vs μ relationships for data containing UMIs for a wide range of 3’-biased protocols (Pandey et al., 2018). We calculated the mean and standard deviation (sd) of the eCV values, and selected genes (HVGs) that had eCV > mean + 0.7*sd. This typically selected 700-2500 HVGs for each group.
Batch correction: We restricted the expression matrix Ei,j to HVGs and used a linear regression model (adapted from the source code of the ‘RegressOut’ function of the R package ‘Seurat’) to remove correlations of expression values associated with three covariates – the animal of origin (Figure S1G), the total number of genes observed per cell, and the expression strength of ribosomal genes within each cell. The choice of covariates was guided by running initial tests on PRs, BCs and RGCs where some prior knowledge of the underlying types exists. We tested different combinations of covariates, and settled on a combination that led to the alignment of known types across replicates (e.g., cones or IMBs), and used this scheme consistently for all cell group analyses. Here, we also tested the CCA-based (Butler et al., 2018) and MNN-based (Haghverdi et al., 2018) batch correction strategies, but found that the performance of these methods depended on the choice of certain tunable parameters, which were different for different groups. The linear regression model, on the other hand, did not contain additional parameters outside of the choice of covariates, and was computationally much faster than the other two methods. In a few cases, some residual batch effects still remained, which we corrected for in a supervised manner (see below). For BCs and RGCs, two groups extensively validated in this study, we additionally confirmed the absence of any artifacts by re-clustering single batches separately and verifying that all the abundant clusters were preserved.
We refer to the corrected expression matrix as . For each of the HVGs, we compute the Pearson correlation coefficient between the values in E and those in . Encouragingly, most genes had a similar correlation coefficient (r > 0.97), but ~30 genes had a significant reduction in correlation (r < 0.6). We flagged these “batch specific genes” in our differentialexpression analysis of clusters in 5 (see below).
Principal component analysis and 2D visualization: We restricted the corrected expression matrix to HVGs, and values were centered and scaled before input to Principal Component Analysis (PCA), which was implemented using the R package ‘irlba’. Statistically significant PCs were estimated using the Tracy-Widom distribution (Kiselev et al., 2017; Shekhar et al., 2016), which we found to be an excellent approximation to exact values computed using the more exact, but computationally intensive, bootstrap permutation test (Shekhar et al., 2016). For visualization purposes only, dimensionality was further reduced to 2D using t-distributed stochastic neighbor embedding (t-SNE). We used a recently developed rapidly accelerated implementation (Linderman, 2017) of the original algorithm (Hinton, 2008) setting the “perplexity” parameter to 30. The resulting t-SNE cell coordinates were used to generate the visualizations in Figure 1D-I.
Graph-based clustering: To partition the data into clusters of transcriptionally related cells, we used unsupervised clustering based on the Louvain algorithm (Vincent D. Blondel, 2008) with the Jaccard correction (Levine et al., 2015; Shekhar et al., 2016). We first constructed a k-nearest neighbor graph (kNN) using the Euclidean distance metric between pairs of cells in the reduced dimension space of the significant PCs, as implemented in the R package ‘RANN’. k=30 was used for all groups, except peripheral HCs where k=15 was used because he low cell count.
Cluster refinement: As the Louvain method suffers from a well-known “resolution” limit (Fortunato and Barthelemy, 2007), we refined the set of initial clusters as well as corrected for potential overclustering due to residual batch effects in three steps as follows. Step 1 and Step 2 were performed for each of the initial clusters in (5) separately. In Step 1, each cluster was sub-clustered using the more sensitive Infomap community detection algorithm (Rosvall and Bergstrom, 2008) on a new k-nearest neighbor graph (k = 5*log(cluster size)). However, this often ended up overclustering the data, as noted before (Shekhar et al., 2016). In Step 2, the subclusters within each cluster were assessed for differential expression, and iteratively merged as follows. We built a dendrogram on the subclusters (hierarchical agglomerative clustering with Euclidean distance, complete linkage), and tested the closest three pairs of leaves of the dendrogram for differential gene expression using the MAST test (Finak et al., 2015), corresponding to the three pairs of the most transcriptionally similar subclusters. If any pair did not show sufficient differential expression (minimum 5 DE genes enriched both ways, fold-change > 1.4, p-value < 0.001, excluding mitochondrial transcripts and ribosomal protein genes, or “batch genes” defined above), the corresponding subclusters were merged. If any of the three pairs of subclusters were merged, we rebuilt the dendrogram using the new subclusters, and repeated this procedure until convergence. Step 1 and Step 2 resulted in the splitting of some of the original Louvain clusters. In Step 3 we built a dendrogram on the new set of clusters and iteratively merged (as in Step 2, except with more stringest parameters) until convergence. During this final dendrogram-based merging round, we elected to merge transcriptionally proximal clusters using a more stringent rule. In addition to the criteria described above (≥ 5 DE genes enriched both ways, fold-change > 1.4, p-value < 0.001, excluding mitochondrial and ribosomal, or batch-specific genes), we required that the two clusters being tested be distinguished by ON/OFF expression (i.e. not just differences in expression levels) of at least one gene in each of them. More specifically, we required that each of the two clusters show at least one gene that was expressed in at least 4-fold higher proportion of cells the other cluster (e.g. ≥80% of cells in cluster 1 and ≤20% of cells in cluster 2, and ≥60% of cells in cluster 2 and ≤15% of cells in cluster 2).
Removing doublets and contaminants: Although dimensionality reduction and clustering were performed separately for each cell class after an initial segregation, we noted that the data was not entirely free of contaminants. For example, we found small PR-like or AC-like clusters when analyzing BCs in the fovea or periphery. These were likely doublets as these clusters also expressed BC-markers, and comprised <1-2% of the cells in the dataset. While such clusters appeared more BC-like in the initial stage, when all cell classes were analyzed together, their contaminant-like features became more apparent when analyzed only with pure BCs. These contaminant-like clusters were removed manually, and we verified that the final clusters for each cell-class expressed at most negligible levels of markers of other cell-classes.
Distinguishing M vs L Cones
The transcript sequences of OPN1MW and OPN1LW are extremely similar across Old World Monkeys and humans (Nathans et al., 1986; Onishi et al., 2002). In the crab-eating macaque, there are only 26 single nucleotide polymorphism (SNPs) over 1,095 nucleotides of the full-length cDNA sequences of OPN1MW and OPN1LW, located between exon 2 and exon 5 (Onishi et al., 2002). This made it difficult to unambiguously assign a short read (98bp) aligning to this locus to either OPN1MW and OPN1LW, as SNPs could not be distinguished from sequencing errors. Thus, cells in the foveal cone cluster that expressed transcripts aligning to this locus, but not the S-opsin OPN1SW, were initially tagged as “M/L cones.” However, these SNPs could be robustly distinguished from sequencing errors when aggregating across all the reads aligning to this locus (Figure S2A), and their positions were consistent with previous studies (Onishi et al., 2002). For each M/L cone cell, we then counted the OPN1MW- and OPN1LW-specific SNPs (Figure S2B). ~70% of M/L cones in the data possessed reads that had at least one transcript covering one of the 26 SNPs. Cells with exclusively OPN1MW- and OPN1LW-specific SNP counts were labeled M and L-cones, respectively, and the remainder were called “mixed” (Figures S2D,E).
Differential Expression Analysis
Differential expression (DE) tests for a gene between a pair of clusters or between a cluster and the remaining clusters were performed using the ‘MAST’ package (Finak et al., 2015), which implements a two-part ‘hurdle’ model to control for both technical quality and cross-animal variation. P-values were corrected for multiple hypotheses by controlling the false discovery rate (FDR; (Hochberg, 1995)) using the R function ‘p.adjust’.
We sought markers for a cluster by comparing it to other clusters only within the same class for DE genes, since class specific markers are well-defined (Table S1, Figs. 1C). To obtain highly specific markers for histological validation, we ranked significantly enriched (>1.5-fold difference, FDR adjusted p < 10−6) markers in increasing order of the proportion of background cells where they were detected (Table S3). Differential gene expression patterns were visualized either in dotplots (e.g. Figure 2B) or box-and-whisker plots (e.g. Figure 4B).
To identify fovea or periphery specific DE genes for cell types that had 1:1 matches across the two regions, we collected the corresponding cells from each region and applied MAST as before, while controlling for library size differences. In choosing candidates to pursue for validation, we also avoided weakly enriched genes that could be explained by large differences in abundance of a different cell class in the same region. For example, RHO and GNAT1 were weakly enriched in peripheral RGCs, BCs and ACs, which likely reflects contamination from the predominant rods in the periphery.
Transcriptional Mapping of Foveal Types to Peripheral Types within A Class
We compared the consistency between foveal and peripheral clusters for HCs, BCs, ACs and RGCs within each cell class using a multi-class classification approach, as described before (Shekhar et al., 2016). For PRs and non-neuronal cells, the fovea vs periphery mapping was straightforward as the types could be easily identified based on known markers, and their transcriptional identities could be matched between the two regions.
For BCs, RGCs, ACs and HCs, we trained a multi-class classifier using Random Forests (RF;(Breiman, 2001)) as well as the recently published Xgboost algorithm, which uses gradient boosted trees (Tianqi Chen, 2016) using the R packages ‘randomForest’ and ‘xgboost’, respectively. In each case, we trained the classifier on the peripheral cluster labels, using as features the common HVGs identified in the foveal and peripheral datasets as above (cluster-specific markers were not favored in any way). The training was performed on 50% of the cells in peripheral datasets and its performance was tested on the remaining, “held-out” dataset. (The foveal data was not used for training.) For each of the 4 classes (HCs, BCs, ACs and RGCs), the maximum error rate for types in the held-out data was < 1% (not shown), suggesting that the type-specific transcriptional signatures were robust, learnable, and not prone to overfitting.
The trained classifier was then used to assign each foveal cell a peripheral identity, in a manner that was agnostic to its foveal cluster identity. For each cell, we computed the “margin”, i.e the fraction of decision trees that vote for the “winner” type (majority vote). We excluded cells which fell into the following criterion,
This was done to buffer against potential misclassifications due to a marginal majority.
For all of the cell classes tested, < 4% of cells were excluded due to this criterion. The congruences of the foveal cluster identities and their peripheral matches were visualized as confusion matrices for each class (Figures 3A-D). A mapping between a foveal cluster F and a peripheral cluster P was considered 1:1 if > 85% of cells from F mapped to P, and if < 5% of cells from every other peripheral cluster mapped to F. Results from Xgboost and RF were extremely comparable, although the former was much more computationally efficient. The confusion matrices (Figures 3A-D) shown are equally-weighted aggregated results of Xgboost and RF.
To quantify the extent of 1:1 mapping between foveal and peripheral clusters we computed the Adjusted Rand Index (ARI), a measure of the similarity between two data clusterings, between the foveal cluster labels and the Xgboost/RF–assigned cluster labels using the R package “mclust”. The significance of the computed ARI values were evaluated by comparing against “null” ARI values computed by randomly permuting the cluster labels across cells. Mean and standard-deviation of null ARI values were computed by averaging 10,000 trials.
Comparison of Cluster Frequencies between the Fovea and the Periphery
We computed frequency differences of 1:1 matched types between the fovea and the periphery, and assessed their significance using a two sample t-test. For foveal types that were a mixture of multiple peripheral types (e.g. fRGC11), we assigned each foveal cell its matched peripheral cluster identity prior to comparison (see previous section). A fovea vs. periphery frequency comparison could not be meaningfully done for ACs, as CD90 non-uniformly labels ACs. We believe that this phenomenon likely explains the poor correlation between foveal and peripheral AC type frequencies (r=0.12); we could therefore not make a fair comparison of AC types between the regions. Interestingly, although the CD73-samples contributed only ~10% of all peripheral ACs, the AC type frequencies in these samples correlated much better with the foveal population (r=0.79; Figure S4B), and suggested that a few peripheral types (e.g. VIP+ ACs) were conspicuously underrepresented in the foveal samples (Figure 3C).
We quantified the compositional similarity between foveal and peripheral clusterings for each cell class by computing the Jensen-Shannon divergence (JSD), as follows. Let P = {p1, p2,…, pn} and Q = {q1, q,…,qn} be the peripheral and foveal frequencies of n clusters of a cell class C, each of which sums to unity. Here n is the total number of clusters which is a sum of the 1:1 matched clusters, and the clusters unique to either region. Then the JSD between the fovea and the periphery for cell class C is computed as,
Here, , and D(P | Q) is the Kullback-Leibler (KL) divergence between probability distributions P and Q as follows,
Note that while D(P | Q) ≠ D(Q | P), JSD(P | Q) is symmetric in that JSD(P | Q) = JSD(Q | P).
In comparing compositional similarity between foveal and peripheral cell classes, we used average frequencies across all replicates for foveal PRs, BCs, HCs, ACs and RGCs, but only BCs, HCs, ACs and RGCs in the periphery. This is because our collection procedure (Figure S1B) methods selected against peripheral PRs. We therefore used a published estimate of peripheral PR composition: 95% rods, 4.75% M/L cones and 0.25% S cones (Roorda et al., 2001; Wikler et al., 1990). In addition, we considered only the CD73-samples for ACs because the CD90+ samples differentially excluded some AC types. For peripheral ACs, we only used their composition in the CD73-samples (see above). We expect rods to be underrepresented in our peripheral samples due to our collection procedure.
Transcriptional Mapping of Cell Types across Species
Mouse: We transcriptionally related macaque cones, HC, BC, AC, and RGC types to their mouse counterparts from earlier publications using the same approach that related foveal to peripheral types, with minor modifications (see below). We conducted the following mappings:
11 macaque peripheral BC clusters (BB/GB* was considered as a single cluster and then subclustered post hoc) were mapped to 15 mouse BC types published in our earlier study that identified these types molecularly and matched them to morphology (Shekhar et al., 2016)
34 Macaque peripheral AC clusters were mapped to 21 mouse AC clusters identified in our earlier whole-retina study (Macosko et al., 2015), which represents an incomplete molecular characterization of ACs, estimated to contain >60 types based on morphological (Diamond, 2017) and molecular (JS, WY, KS, AR, and Mallory Laboulaye, unpublished) diversity.
Macaque peripheral H1 and H2 were transcriptionally related to cells from a single cluster of mouse HCs in our previous whole retina study (Macosko et al., 2015) to determine whether mouse HCs were molecularly more similar to one of the two types.
18 macaque peripheral RGC clusters were mapped to a recently published dataset of postnatal day 5 (P5) mouse RGCs, which identified 40 distinct molecular clusters (Rheaume et al., 2018).
As we did not sample peripheral S-cones, we related macaque foveal M/L and S cones to mouse cones from our previous full retina data (Macosko et al., 2015). We computationally separated mouse cones into those that exclusively expressed Opn1sw or Opn1mw to enrich for pure S-cones and pure M-cones, respectively.
In all cases except HCs, we trained the classifier on the mouse types, and mapped the macaque cells on to these types. In each case 1-5 above, we trained RF/Xgboost classifiers on the training data (mouse for Cones, BCs, ACs, RGCs and macaque for HC) using as features the common set of highly variable genes (HVGs) in both the mouse and the macaque datasets among the 1:1 orthologues. Restricting the features to the common set of HVG orthologues was important for the performance of the classifier; in our initial tests, we found that using HVG orthologues identified only in one of the two species resulted in mappings that were more diffuse, likely due to genes that expressed in one species at high levels but, were completely absent in the other due to transcriptional rewiring. However, we emphasize that no manual curation of the feature set was performed to favor or disfavor certain genes based on their cell-type specific expression. We trained the classifier using 80% of the cells in the relevant mouse dataset, and validated it on the “held-out” cells in that dataset, which showed excellent performance in all cases. Next, we applied the classifier on the test set (macaque Cones, BCs, ACs, RGCs and mouse HCs), and assigned each cell an identity in the other species. As above, cluster identity of cells in the test set was not used to guide the mapping. Mappings of cell types across species were visualized using confusion matrices for Cones, ACs, BCs and RGCs (Figures 5A,C-E) and a barplot for HCs (Figure 5B). A mapping between a test cluster Ctest and a training cluster Ctrain was considered 1:1 if > 75% of cells from Ctest mapped to P, and if < 5% of cells from every other test cluster mapped to Ctrain.
Human: We related human BC clusters to macaque BC clusters using an approach similar to that described above for mouse. We trained a classifier on the macaque peripheral BC data, applied it to map each human BC to a macaque cluster, and visualized the resulting confusion matrix to identify 1:1 matches (Figure 6F).
Conservation of Transcription Factor (TF) Codes in Mouse and Macaque RGCs
We downloaded a curated list of human and mouse transcription factors from the TFCat Transcription Factor Database Catalog (www.tfcat.ca), CIS-BP database (http://cisbp.ccbr.utoronto.ca/) and (Lambert et al., 2018), and used these to assemble a corresponding list of macaque transcription factors from our orthology list (see above). We filtered this list in two ways. First, we restricted the list to those TFs that had 1:1 mouse and macaque orthologs. Second, we only considered those TFs that were expressed (> 0 UMIs/cell) in > 25% of the cells in at least one cluster.
For each of the remaining TFs, we computed a TF specificity score (TFSS) among RGC types separately in macaque and mice. Specifically, for a transcription factor t, we computed a vector of the fraction of cells expressing t in every RGC cluster. For example, implies that 70% of cells in RGC cluster 1, 0% of cells in cluster 2, 30% in cluster 3, and so on, express transcription factor t. We normalized to sum to 1, and computed the Shannon index of the resulting normalized vector,
We computed TFSSmouse(t) and TFSSmacaque(t) for all filtered transcription factors t (n=93), and compared them (Figure 5G). Note that constitutively expressed TFs like Pou4f1/Six6 have low values on the plot, because of their low Shannon indices.
To compute TF co-expression scores and compare across species, we further filtered the 1:1 orthologous TF pairs to only include those TFs that are expressed in at least 15% of clusters in both species. For each species, we then binarized TF expression in RGC clusters as a matrix T, where T(t, c) = 1 if TF t is expressed in at least 25% of cells in cluster c. For every TF t, we computed its probability of expression in types, α(t), in any species as,
Here, Nc refers to the number of clusters. The observed probability of co-expression of TFs t1 and t2, A(t1,t2), can be computed as,
We compute the TF co-expression scores (TFCS) in either species as,
Figure S5G compares TFCSmouse(t1, t2) and TFSSmacaque(t1, t2).
Evaluating Cell-Type Specific Expression of Disease Associated Loci
We assembled lists of genome-wide associated study genes for prominent retinal diseases that cause blindness – primary open angle glaucoma (POAG), primary angle closure glaucoma (PACG), cone-rod dystrophy (CRD), retinitis pigmentosa (Leferink et al.), congenital stationary night blindness (CSNB), age-related macular degeneration (Springelkamp et al.) and diabetic macular edema (DME) from (Farrar et al., 2017; Fritsche et al., 2016; Graham et al., 2018; MacGregor et al., 2018; Wiggs and Pasquale, 2017; Zeitz et al., 2015).
To facilitate 1:1 comparison between the fovea and the periphery, we assigned each foveal cell a peripheral type identity using the RF/Xgboost mapping described above. We only considered those genes that were expressed in at least 20% of cells of at least one cell type, which resulted in 196 genes for further consideration. We note that this does not necessarily rule out the possibility of these genes turning on in specific cell types during disease pathogenesis, just as the expression of a gene in a cell type at baseline does not necessarily imply vulnerability in disease.
We visualized expression patterns of the disease genes in our atlas using one of three visualizations:
Dotplots, which show cell-type resolution (as in Figure 7D),
Heatmaps showing relative expression of individual genes in major cell classes (Figures 7B, S7), and
Heatmap showing aggregated expression of groups of genes corresponding to individual diseases in major cell classes (Figure 7A). While 2. and 3. lack cell-type resolution, they do show statistical enrichment of genes and gene groups in specific cell classes, and differences in fovea and periphery, if they exist.
For each of the fovea and periphery, we summarized the expression of all disease genes (see above) across all cell types, by computing two matrices E and F. For gene g and cell-type c, F(g,c) was the fraction of cells in cell type c that had non-zero expression ( > 0 UMIs) of g, and E(g,c) was the average number of transcripts in cells with non-zero expression. To remove outliers, we capped values in E to the 99.5th quantile of expression. We then computed a matrix of gene expression scores for all genes across all cell types as the product of these two matrices,
For (2), the we computed the expression strength S(g,C) of each gene g in a cell class C (one of Rods, Cones, HCs, BCs, ACs, RGCs, and non-neuronal cells)
Here, we computed the maximum, rather than the average or aggregate, to highlight specific instances of highly cell-type specific expression of genes in a class. We combined Sfovea(g, C) and Speriphery(g, C), and visualized them as a single heatmap, where we z-scored the rows (corresponding to genes) to highlight relative expression strengths .
Specific rows are highlighted in Figure 6I. Here, the initial filtering of removing genes that were not expressed in at least 20% of cells in any cell type avoided spurious patterns.
For (3), we subset by disease and computed the mean relative expression strengths within each cell type group for each disease. This aggregate expression strength was visualized in Figure 7A.
In Figure S7, we visualized the relative expression strengths of genes grouped by disease as in Figure 7B, but after subdividing non-neuronal cells into MGs, Pericytes, Endothelial cells and Microglia to highlight differential expression among these types.
ACKNOWLEDGMENTS
We thank R. Born, A. Liu and J. Wiggs for advice; and L. Gaffney, A. Hupalowska and E. Martersteck for assistance. This work was supported by grants from the NIH (MH105960, EY025840, EY028633, EY025555 and EY028625) and the BrightFocus Foundation (M2014055).