Abstract
Background: Bulk RNA-Seq has been extensively utilized to investigate the molecular changes accompanying motor neuron degeneration in Amyotrophic Lateral Sclerosis (ALS). However, due to the heterogeneity and degenerating phenotype of the neurons, it has proved difficult to assign specific changes to neuronal subtypes and identify which factors drive these changes. Consequently, we have utilized single cell transcriptomics of degenerating motor neurons derived from ALS patients to uncover key transcriptional drivers of dysfunctional pathways.
Results: Single cell analysis of spinal neuronal cultures derived from ALS and isogenic iPSC allowed us to classify cells into neural subtypes including motor neurons and interneurons. Differential expression analysis between disease and control motor neurons revealed downregulation of genes involved in synaptic structure, neuromuscular junction, neuronal cytoskeleton and mitochondrial function. Interestingly, interneurons did not show similar suppression of these homeostatic functions. Single cell expression data enabled us to derive a context-specific transcriptional network relevant to ALS neurons. Master regulator analysis on this network identified core transcriptional factors driving the ALS disease signature. Specifically, we were able to correlate suppression of HOXA1 and HOXA5 to synaptic dysfunction in ALS motor neurons. Our results suggest that suppression of HOX genes may be a general phenomenon in SOD1 ALS.
Conclusions: Our results demonstrate the utility of single cell transcriptomics in mapping disease-relevant gene regulatory networks driving neurodegeneration in ALS motor neurons. We propose that ALS-associated mutant SOD1 leads to inhibition of transcriptional networks driving homeostatic programs specific to motor neurons, thereby providing a possible explanation for the relative resistance of spinal interneurons to degeneration in ALS.
Introduction
Amyotrophic Lateral Sclerosis (ALS), also known as Lou Gehrig’s disease, is an age-onset fatal neurodegenerative disorder that affects motor neurons in the brain and spinal cord(1). Patients display progressive paralysis and eventually die due to failure of the respiratory muscles, commonly within 3-5 years of diagnosis(2). Despite extensive research, the causes underlying the observed degeneration are incompletely understood. Hence, so far, there is no cure for ALS. Understanding the molecular drivers of neurodegeneration in ALS can potentially lead to the development of life-saving therapies. Approximately 20% of ALS cases are familial with mutations identified in genes spanning diverse cellular functions including SOD1(3). Animal models have been extensively used to study ALS and have revealed important insights into disease mechanisms(4). However, species-specific differences and the need to overexpress the mutant proteins to generate phenotypes have necessitated developing human models of the disease that can complement existing animal models(5).
Induced pluripotent stem cells (iPSC) derived from patients with ALS provide a powerful model to study the disease. ALS patient-derived iPSC bear the disease-causing mutations in a physiologically relevant background and can be readily differentiated into clinically relevant neurons(6). Such diseased neurons can be compared to healthy neurons to model key aspects of the disease in vitro, such as neuron survival, morphometric defects, electrophysiological dysfunctions and protein/RNA aggregation foci(7–12).
Molecular characterisation of these neurons using “omics” tools has uncovered important insights into disease pathophysiology(8,9,13,14). However, application of genomic tools such as RNA-sequencing to ALS neurons in bulk has serious drawbacks. Current differentiation protocols generate motor neurons at efficiencies ranging from 50-80% with the efficiency varying depending upon the iPSC line used. The differentiated neurons are usually a mix of motor neurons (MN) and “non-motor neurons”, typically spinal interneurons (IN). Additionally, long term cultures of these neurons that are required for phenotypic characterization commonly display some degree of glial cell proliferation. Importantly, ALS motor neurons display progressive degeneration. This suggests that at any given time point, neurons can be expected to be in different stages of degeneration and hence may display a differential expression of key drivers of disease pathology. Bulk RNA analysis of these cultures would not only average cell type expression but also the motor neuron specific degeneration expression signatures. To address these issues, we performed single cell transcriptomic analysis of degenerating ALS SOD1 mutant neurons. Our analysis reveals MN specific transcriptional networks regulating synaptic function and neuronal cytoskeleton to be downregulated in ALS MN. Importantly, single cell data enabled us to build a disease relevant transcriptional network that was used to identify key transcription factors driving the disease signature.
Materials and Methods
Human iPSC culture
ALS patient-derived iPSC bearing SOD1 E100G/+ (ND35662) mutation were obtained from the Coriell Institute for Medical Research. ALS iPSC and the genome edited isogenic control iPSC were maintained as colonies on human ES qualified matrigel (Corning) in mTeSR (StemCell Technologies). Colonies were routinely passaged in a 1:6 split using Dispase. Mycoplasma testing was conducted regularly to rule out mycoplasma contamination of cultures.
Differentiation of iPSC into spinal motor neurons
iPSC were plated as colonies onto matrigel and differentiated by treatment with neuronal differentiation media (N2B27: DMEM/F12, Neurobasal, N2 supplement 1%, B27 supplement 1%, L-glutamine 1%, ascorbic acid 5uM, insulin 20ug/ml) supplemented with SB431542 (80uM), CHIR99021 (3uM) and LDN8312 (0.2uM) from day 0 till day 4. Cells were caudalized by treatment with 0.1uM retinoic acid starting at day 2 and ventralized with 1uM purmorphamine starting at day 4 and continued till day 10. At day 10, OLIG2 positive motor neuron progenitors were replated onto poly-D-lysine/laminin coated wells and differentiated by treating the cells with N2B27 media supplemented with BDNF 20ug/ml, GDNF 10ug/ml and DAPT 10uM. DAPT treatment was stopped at day 14 and neuronal cultures were pulsed with mitomycin at a dose of 10ug/ml for 1 hour to prevent further proliferation of any undifferentiated progenitors. Neuronal cultures were maintained till day 44 by changing media every other day. Survival of MN was assessed using 3 independent differentiations.
Immunofluorescence
Cells were fixed with 4% paraformaldehyde, permeabilized with ice-cold methanol for 5 minutes and washed with PBS containing 10% serum for 1 hour at room temperature. Cells were then incubated overnight at 4°C with primary antibodies (Table S1) diluted in PBS containing 10% serum. The next day, cells were washed and incubated with Alexa Fluor-conjugated secondary antibodies (Molecular Probes) for 45 minutes at room temperature, and nuclei were stained with Hoechst 33342 (Molecular Probes).
Quantitative RT-PCR
Total RNA was extracted with the miRNeasy kit (Qiagen) and reverse transcribed using random hexamers and the High Capacity reverse transcription system from Applied Biosystems. Quantitative PCR was performed using the SYBR GREEN PCR Master Mix from Applied Biosystems. The target gene mRNA expression was normalized to the expression of RPL13 and relative mRNA fold changes were calculated by the ΔΔCt method. Primer sequences are included in Table S2.
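The ΔΔCt calculation can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name and the Ct values are hypothetical, and the sketch assumes a primer efficiency close to 100%, so that one PCR cycle corresponds to a two-fold change.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the delta-delta-Ct method.

    Assumes ~100% amplification efficiency (one cycle = two-fold change).
    """
    # Normalize the target gene to the reference gene (RPL13 in the text)
    # separately within each condition.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the two conditions, then convert cycles into a fold change.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, for illustration only.
print(fold_change_ddct(26.0, 18.0, 24.0, 18.0))
```

A fold change below 1 indicates lower expression in the sample than in the control.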
Single-cell capture and library preparation
Single cells were captured using the standard protocol of the C1 Single-Cell Auto Prep System (Fluidigm). Briefly, differentiated neuronal cultures at day 44 were dissociated into single cells with Accutase and loaded onto the C1 chip. After cell capture, each well of the chip was manually inspected to identify wells bearing single cells. Next, lysis, reverse transcription and PCR amplification of the cDNA were performed in an automated fashion within the C1 instrument. To prepare single-cell libraries, cDNA products from each single cell were harvested from the C1 chip, followed by concentration and quantification using the PicoGreen dsDNA Assay kit. Sequencing libraries were generated using the Illumina Nextera XT library preparation kit.
Read processing, mapping and quality control
Sequence data were processed and mapped to the human reference genome (hg19) using TopHat (v2.0.11)(15) with Bowtie2 (v2.2.1)(16). Gene expression levels were quantified with HTSeq-count (v0.6.1p1)(17) and converted to fragments per kilobase of transcript per million mapped reads (FPKM) using the human gene annotation (GENCODE release 19)(18). This yielded 57,241 transcripts from 365 libraries. Only libraries deemed to be single cells were retained for further analysis (332 cells). Additionally, libraries were classified as low quality if: 1) the mapping ratio (mapped reads/total sequenced reads) was < 0.5; 2) the total number of genes expressed was < 4000; or 3) the total number of mapped reads was < 0.5 million. A gene was deemed to be poorly expressed if it was present in fewer than 20 cells at an FPKM expression threshold of 0.1. The filtering process yielded 303 single cells and 14,774 genes for further analysis.
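The filtering criteria above can be expressed as two simple predicates. The sketch below is illustrative only: the function names and the dictionary layout are hypothetical, and it assumes that a gene counts as "present" in a cell when its FPKM is at or above the 0.1 threshold.

```python
def passes_library_qc(mapped_reads, total_reads, genes_detected):
    """Return True if a single-cell library passes all three thresholds."""
    mapping_ratio = mapped_reads / total_reads if total_reads else 0.0
    return (mapping_ratio >= 0.5          # at least half of the reads map
            and genes_detected >= 4000    # enough genes detected
            and mapped_reads >= 500_000)  # enough mapped reads


def expressed_genes(fpkm, min_cells=20, min_fpkm=0.1):
    """Keep genes detected in at least `min_cells` cells.

    fpkm: dict mapping gene name -> list of FPKM values, one per cell.
    Assumption: "detected" means FPKM >= min_fpkm.
    """
    return [gene for gene, values in fpkm.items()
            if sum(v >= min_fpkm for v in values) >= min_cells]


# Hypothetical toy data: three cells, two genes, relaxed cell cutoff.
toy = {"GENE_A": [1.0, 0.2, 0.0], "GENE_B": [0.05, 0.0, 3.0]}
print(expressed_genes(toy, min_cells=2))
```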
Weighted gene coexpression network analysis
To identify functional modules of genes associated with ALS we used weighted gene coexpression network analysis (WGCNA) implemented in the R statistical language. FPKM values for filtered libraries and features (as described above) were log transformed and used to build a coexpression network. The coexpression network was constructed with WGCNA using the signed-hybrid approach with a soft thresholding power of 6. Modules in the network were identified using the cutreeDynamic function with a minimum module size of 20. Modules with similar expression profiles were merged if their eigengene correlation coefficients were >= 0.9. To define modules related to ALS we used Pearson’s correlation coefficient to assess associations between module eigengenes and disease state. P-values were corrected using the method of Benjamini and Hochberg and correlations with an adjusted p-value < 0.01 were deemed significant. Gene ontology enrichment analysis of disease associated modules was carried out using the anRichment R package.
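For readers unfamiliar with the signed-hybrid option, the adjacency between two genes is their Pearson correlation raised to the soft-thresholding power when the correlation is positive, and zero otherwise. The sketch below illustrates that definition in plain Python; it is not the WGCNA implementation, and the function names and toy expression vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_hybrid_adjacency(x, y, power=6):
    """Signed-hybrid adjacency: cor**power if cor > 0, else 0."""
    r = pearson(x, y)
    return r ** power if r > 0 else 0.0


# Hypothetical log-expression values across five cells.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.1, 1.9, 3.2, 3.8, 5.1]   # co-expressed with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # anti-correlated with gene_a
print(signed_hybrid_adjacency(gene_a, gene_b))
print(signed_hybrid_adjacency(gene_a, gene_c))
```

With this choice, anti-correlated gene pairs contribute no connection strength, so modules group genes that move in the same direction.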
Master regulator analysis
The master regulator analysis (MRA) was performed using the RTN R package. Log transformed FPKM values for 211 neurons were used as input to build a transcriptional network for 845 TFs present in the dataset. TF annotations were obtained from AnimalTFDB. P-values for network edges were estimated from a pooled null distribution using 1000 permutations. Edges with an adjusted p-value < 0.05 were retained for DPI filtering. After DPI filtering, the analysis generates, for each TF, a list of predicted targets that is termed the regulon. The regulon for each TF can be classed as positive or negative based on the Pearson correlations. To identify master regulators, the differential gene expression between ALS and control MN was used as a phenotype and sorted from most upregulated to most downregulated. The RTN package was used to conduct a GSEA-like analysis to identify whether a TF regulon (positive or negative) was enriched towards one end of the sorted list of differentially expressed genes. P-values were estimated based on 1000 permutations of the dataset and adjusted using the Benjamini-Hochberg method. At a p-value cutoff of 0.05, we identified 52 TFs as significant. Next, we assessed whether the identified TFs were differentially expressed and whether the direction of fold change was concordant with the regulon of that TF. Applying this criterion generated a list of 13 TFs that were deemed master regulators of the ALS signature.
Overexpression of HOXA1 and HOXA5 in SH-SY5Y cells
HOXA1 and HOXA5 cDNA expressing plasmids driven by a CMV promoter were purchased from OriGene. A CMV driven GFP plasmid was used as control. SH-SY5Y cells were plated on poly-D-lysine coated 24-well plates at 1 million cells per well. Plasmid transfections were performed using Lipofectamine 3000 according to the manufacturer’s instructions. Cells were induced to differentiate the next day by changing media to N2B27 + 10uM retinoic acid. This caused the cells to exit the cell cycle, preventing plasmid loss. Cells were harvested at day 6 of differentiation.
Analysis of HOXA1 and HOXA5 binding sites
ChIP-Seq derived peaks for HOXA1, HOXA5, H3K4Me1 and H3K4Me3 were downloaded from the Gene Expression Omnibus for datasets GSM1208634, GSM1239461, GSM1208810 and GSM1208811, respectively. Fold enrichments were used as provided in the peak files. Peaks were classified into 3 classes: strongly enriched (fold enrichment > 2.0), weakly enriched (fold enrichment >= 1.5 and <= 2.0) and not enriched (fold enrichment < 1.5). A gene was deemed expressed if a H3K4Me3 or H3K4Me1 peak was found within 5kb of the transcriptional start site or if a H3K4Me1 peak was found within the gene body. A gene was deemed a HOXA1 or HOXA5 target if a HOXA1 or HOXA5 binding site was found within 10kb of the transcriptional start site, within the gene body or within 2 kb of the transcriptional termination site.
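The peak classes and the target-gene rule can be written as two small functions. This is a sketch under stated assumptions, not the code used for the analysis: the function names are hypothetical, each peak is reduced to a single coordinate (for example its summit), and the 10 kb and 2 kb windows are taken to extend on both sides of the start and termination sites.

```python
def classify_peak(fold_enrichment):
    """Assign a peak to one of the three enrichment classes in the text."""
    if fold_enrichment > 2.0:
        return "strongly enriched"
    if fold_enrichment >= 1.5:
        return "weakly enriched"
    return "not enriched"


def is_hox_target(peak_pos, tss, tts):
    """Return True if a binding site lies near or within a gene.

    peak_pos, tss, tts: coordinates on the same chromosome. The test is
    strand-agnostic because it only uses distances and the gene interval.
    """
    near_tss = abs(peak_pos - tss) <= 10_000
    in_gene_body = min(tss, tts) <= peak_pos <= max(tss, tts)
    near_tts = abs(peak_pos - tts) <= 2_000
    return near_tss or in_gene_body or near_tts


# Hypothetical gene spanning 100,000 to 150,000 on the forward strand.
print(classify_peak(1.7), is_hox_target(91_000, 100_000, 150_000))
```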
Analysis of mouse SOD1 G93A ALS MN
Microarray expression values for mouse SOD1 G93A MN were downloaded from the Gene Expression Omnibus for dataset GSE46298. Expression values were background subtracted, normalized and log transformed using the RMA R package. The threshold for low expression was set at the 30th percentile of the entire dataset. P-values for individual genes were estimated using a 2-tailed Student’s t-test and corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure. Only genes with median expression values above background were included in the analysis.
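The Benjamini-Hochberg correction used here and in the network analyses can be illustrated with a short step-up implementation. The sketch below is a plain-Python illustration of the standard procedure, not the code used in the study (in R the same adjustment is typically obtained with `p.adjust(p, method = "BH")`); the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```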
Results
Single cell RNA-Seq analysis of ALS and control neurons
We previously developed an in vitro model of ALS MN degeneration where MN differentiated from mutant SOD1 iPSC display disease-specific phenotypes such as reduced cell survival and morphometric defects compared to their isogenic control counterparts(19). We differentiated iPSC derived from patients bearing the SOD1 E100G mutation, as well as the corresponding CRISPR edited isogenic control into MN (Fig.1a) (19). Mature MN appeared by day 30 and expressed MN markers such as ISL1, CHAT and NF-H in addition to the pan-neuronal markers TUJ1 and MAP2 (Fig. 1b). ALS and control iPSC could be differentiated into MN at similar efficiencies (~72% ISL1+ TUJ1+ neurons). When cultured for a further 2 weeks (day 44), ALS iPSC derived neurons displayed a 40% loss in survival compared to the isogenic control MN (Fig. 1c). To gain deeper insights into the mechanisms driving neurodegeneration, we performed single cell RNA-sequencing on the ALS and isogenic control MN at day 44 of our differentiation protocol (Fig. 1a). Neuronal cultures were dissociated into single cells, captured and lysed in a fully automated fashion using a microfluidic platform followed by library preparation and deep sequencing of transcripts in the individual cells (Methods). We captured a total of 332 individual cells that included 165 cells from the ALS culture and 167 cells from the isogenic control.
Single cell transcriptomes were sequenced to an average depth of 1.5e6 reads per cell and the overall sequencing depth was similar across the ALS and control datasets (Figure S1a). The proportion of total reads mapped to the genome is an indicator of library quality(20). Both ALS and control cells displayed similar mapping statistics, indicating there was no bias towards poor quality cells in either dataset (Figure S1b). At an FPKM threshold of 0.1, both datasets expressed 8000 to 9000 genes (Figure S1c). To remove poorly amplified RNA libraries (indicated by low mapping ratios and low numbers of expressed genes) and lowly expressed genes, single cell libraries were subjected to a set of quality control criteria (Methods). After quality filtering, we retained a total of 303 high quality cells (153 cells for ALS and 150 cells for the control). These 303 cells expressed 14774 genes in total, with ALS cells expressing on average 7170 genes and control cells expressing on average 7745 genes (Figure S1d); the overall distribution of the number of genes was similar between the two datasets. Each gene was classified based on whether it was protein coding, long non-coding, pseudogene or small nuclear/nucleolar RNA. We did not find any systematic difference in the distribution of gene classes expressed between the ALS and control datasets (Figure S1e). In summary, our single cell transcriptomes for the ALS and control sets were similar in quality and character on a genome-wide level.
Our differentiation protocol was designed to generate spinal MN(21). In vivo, spinal MN at different rostro-caudal levels of the spinal cord are demarcated by specific combinations of HOX gene expression (known as the HOX code) (22). To ascertain the rostrocaudal address of our in vitro differentiated neurons, we estimated the percentage of cells expressing each of the 39 HOX genes and plotted the data as a heatmap (Fig. 1d). The heatmap showed that most of the cells expressed HOXA5 and HOXB5, with very few cells expressing HOXB8 and HOXD8 and none expressing HOX genes from paralog groups 9 and higher (Fig. 1d). This indicated that our cells were largely restricted to a hindbrain or brachial spinal cord identity, as would be expected from our differentiation protocol that employed 0.1uM retinoic acid without any GDF11(23). Next, we assessed expression of the classical markers of neuronal subtypes for motor neurons (ISL1, CHAT, VAChT or SLC18A3), interneurons (GAD1, GAD2, ARX) and non-neuronal cells (S100B, SOX9, MKI67) across all cells. As we have shown previously, our data indicate that iPSC-derived neuronal cultures display wide variation in expression across individual single cells that typically gets averaged in bulk analysis(24) (Fig. 1e). To resolve this heterogeneity and enable differential expression between relevant classes of neurons, we sought to classify cells into specific neural lineages.
Classification of single cells into neural subtypes
Gene expression of selective markers in single cells is routinely used to classify neurons into distinct lineages(25,26). However, using one or two markers to classify single cells can lead to erroneous classification as single cell data typically has a high rate of dropouts, especially for lowly expressed transcripts. On the other hand, clustering of single cell data based on the expression of all detected transcripts might lead to sub-optimal clustering due to the inclusion of genes that are irrelevant to the classification. We sought to circumvent these problems by first identifying genes that can be used to classify cells into relevant cell types (neurons vs glia and MN vs IN). These lists of genes were termed as classifier gene sets. Cells were classified into distinct neural subtypes by sequentially clustering single cells based on the relevant classifier gene set as described below (Fig. 2a).
Classification of single cells into neurons and glia
We first identified genes differentially present between neurons and glia using a recently published gene expression dataset on purified human neurons, astrocytes and oligodendrocytes from frozen brain tissue(27). Genes that displayed a fold change of at least 20 between neurons and astrocytes or oligodendrocytes were included for further analysis. Differential gene expression analysis identified 522 genes as upregulated and 433 genes as downregulated in neurons versus glial cells. Gene ontology analysis using DAVID(28) on the differentially expressed gene set showed enrichment of specific functional categories related to neuronal physiology. Categories related to neuron development such as GABAergic synapse and postsynaptic cell membrane were found to be enriched in the neuron-activated genes while cell cycle and glial differentiation terms were enriched in the downregulated genes, confirming that our identified gene set was able to distinguish neurons and glia (Figure S2). Out of the 955 genes, 755 genes were expressed in our filtered single cell dataset and were further used to cluster cells into neurons and glia. We identified 63 cells out of the starting pool of 332 cells as mitotic, as these expressed high levels of the proliferating cell marker MKI67, and removed them from further analysis. The remaining 269 single cells were clustered based on the neurons vs glia classifier gene set. Consensus k-means clustering using the R package SC3(29) identified three distinct clusters (Fig. 2b). We calculated the mean expression of classical neuronal markers (SYP, RBFOX1, SYN1, CAMKV), neuronal lineage specific markers (CHAT, GAD1 and GAD2) and glial markers (SOX9, S100B, REST, PAX6, MKI67) in each of these 3 clusters. Expression profiles of classical markers indicated that clusters 2 and 3 were neuronal, i.e. these cells expressed pan-neuronal and neuronal subtype specific markers but did not express glial markers (total 211 cells), while cluster 1 included neuronal progenitors and glial cells, i.e. its marker expression profile was opposite to that of clusters 2 and 3 (58 cells) (Fig. 2c).
Classification of neurons into MN and IN
We sought to further classify the neuronal cells in clusters 2 and 3 into MN and IN by clustering the cells based on established MN and IN markers that were expressed in our dataset (MN markers: CHAT, SLC18A3 or VACHT, ISL1, ONECUT1, PRPH; IN markers: GAD1, GAD2, ARX, DLX1, DLX6). To avoid confounding the classification due to the SOD1 mutation, the ALS and isogenic control datasets were clustered separately. Consensus clustering of the control neuronal cells using the MN/IN marker gene set identified three clusters (Figure S3a). Analysis of mean expression of hallmark MN and IN markers indicated that cluster c1 consisted of IN, i.e. these neurons expressed high levels of IN markers but not MN markers (49 cells), while clusters c2 and c3 were MN, i.e. these cells expressed high levels of MN markers but not IN markers (total 74 cells) (Figure S3b). However, we observed a subset of IN that expressed ISL1 (Figure S3b). ISL1 expression in interneurons has been observed in the mouse spinal cord within laminae V, VI and VII(30). Nevertheless, to avoid the possibility that these were cell doublets that had escaped detection, this subset (28 cells) was removed. Re-clustering the remaining neurons identified three clusters with expected marker expression: cluster c1 (21 cells) displayed IN markers and clusters c2 and c3 (total 74 cells) displayed MN markers (Fig. 2d and 2e). We used a similar clustering workflow on the SOD1 neuronal cells to identify MN and IN clusters (Fig. 2a, 2f, 2g). For the SOD1 ALS dataset, marker expression levels clearly indicated that cluster a1 consisted of IN while cluster a2 consisted of MN. Cluster a3 displayed expression of both MN and IN markers (possible doublets) and was removed from further analysis. In summary, our clustering approach identified 74 MN (c2 plus c3 clusters in Fig. 2d,e) and 21 IN (c1 cluster in Fig. 2d,e) in the control dataset, while 24 MN (a2 cluster in Fig. 2f,g) and 33 IN (a1 cluster in Fig. 2f,g) were identified in the SOD1 ALS dataset.
Differential expression analysis of ALS and control neurons
To gain deeper insights into the transcriptional changes within ALS neurons, we compared gene expression in SOD1 ALS MN with the isogenic control MN using Monocle(31). Differential expression analysis identified 84 upregulated genes and 332 downregulated genes in ALS MN at an FDR < 0.05. Gene ontology (GO) enrichment analysis of the differentially expressed genes identified several pathways dysregulated in ALS MN. The downregulated gene ontology terms were distributed across 2 main categories: 1) synaptic function and 2) mitochondrial structure and function. Enriched terms related to synaptic function included “synapse organization”, “synapse assembly”, “axon”, “trans-synaptic signalling” and “neuromuscular junction”, while mitochondria related terms included “mitochondrial outer membrane”, “NADH: ubiquinone oxidoreductase, mitochondria” and “oxidative phosphorylation” (Fig. 2h). On the other hand, GO terms enriched in the upregulated genes included terms related to the cell cycle and protein targeting to the endoplasmic reticulum (ER) (Fig. 2i). Deficiency in the mitochondrial oxidative phosphorylation pathway as well as suppression of translation secondary to ER stress with concomitant increase in cell cycle related genes has been identified previously in bulk analysis of ALS SOD1 MN(19), and was recapitulated in our single cell expression analysis. Importantly, dysregulation of synaptic signalling and structure in our ALS model supports the dying back hypothesis of ALS, which posits that neuronal degeneration is secondary to pathology initiated at the distal end of the axon and neuromuscular junction(32). Next, we asked whether mutant SOD1 induces similar changes in the IN gene expression program. Strikingly, comparative analysis between ALS and control IN revealed far fewer genes dysregulated in ALS IN compared to ALS MN (24 genes upregulated and 48 genes downregulated). Additionally, there was minimal, though significant, overlap in the gene expression programs perturbed between the two classes of neurons with 4 genes shared in the upregulated set and 9 genes shared in the downregulated set (Figure S4). Given the small number of genes identified as dysregulated in ALS IN, GO enrichment analysis was unable to find significant enrichment of any functional categories. Overall, our data indicate that mutant SOD1 causes significant changes in the transcriptional program of MN compared to IN.
Network analysis of single neurons using WGCNA
To gain a systems level understanding of the observed transcriptional changes, we performed weighted gene co-expression network analysis (WGCNA)(33,34) to identify modules of co-regulated genes dysregulated in ALS MN and IN. Since co-expression analysis works on identifying co-regulated sets of genes, we postulated that a large sample set with high variability per gene across the samples would improve performance of the network analysis. Hence, we included the entire set of neurons for both SOD1 and isogenic control datasets (a total of 211 cells). WGCNA identified a total of thirteen co-regulated modules of which three (modules blue, purple and black) were significantly downregulated in ALS MN compared to isogenic control MN (adjusted p-value < 0.01) (Fig. 3a). GO enrichment analysis of these modules revealed association of each module with specific functional categories (Fig. 3b). Synaptic function and structure terms were enriched in the blue module, microtubule and cytoskeleton terms were enriched in the purple module, while oxidative phosphorylation and mitochondrial membrane terms were enriched in the black module. These observations were in accordance with the downregulation of these functional categories observed in the differential gene expression analysis of ALS MN. Interestingly, the blue module (enriched for synaptic terms) associated significantly with MN compared to IN in both the ALS and control samples but was observed to be strongly downregulated in ALS MN (Fig. 3a). This indicates that genes mediating synaptic signalling are highly expressed in MN compared to IN. Downregulation of these genes specifically in ALS MN suggests a possible mechanism that can explain the relative susceptibility of ALS MN to degeneration.
We hypothesized that downregulation of the observed gene modules may be driven by concomitant dysregulation of specific transcription factors (TFs). To test this hypothesis, the expression of ~1000 TFs in our dataset was correlated to the module eigengenes estimated by WGCNA. Correlation analysis identified sequence-specific TFs that correlated both positively and negatively with the modules. Given a module that is downregulated in ALS MN, a TF that correlates positively with this module is expected to be inhibited in ALS MN while a TF that correlates negatively is expected to be activated in ALS MN. TFs that matched these expectations were deemed as concordant. Application of the criterion of concordance to our correlation analysis identified TFs associated with specific modules (Fig. 3c). For example, HOXA5 displayed greater than 4-fold suppression in ALS MN compared to isogenic controls while its expression correlated positively with the blue module that was identified as downregulated in ALS MN (Fig. 3c). This strongly indicates that HOXA5 is involved in regulating synaptic functions in MN. In addition, we observed SOX4 and HMGB2 activated in ALS MN and negatively correlated with the blue module (Fig. 3c), indicating that these TFs act in an opposite fashion to HOXA5. Additionally, we observed positive correlations between HOXA5 and ATF2 with the purple module (cytoskeleton organization) while ATF2 also correlated positively with the black module (oxidative phosphorylation) (Fig. 3c). The WGCNA analysis strongly suggested that downregulation of specific pathways by mutant SOD1 is accompanied by corresponding de-regulation of TFs and such factors are readily identified from our single cell dataset.
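The concordance criterion can be stated compactly: the observed direction of change of a TF should match the sign of its correlation with the module multiplied by the direction of change of the module. The sketch below illustrates this rule; the function name is hypothetical and the numerical values are invented for illustration, not taken from our data.

```python
def is_concordant(tf_log2_fc, tf_module_corr, module_log2_fc):
    """Check whether a TF's expression change agrees with its module.

    tf_log2_fc:      log2 fold change of the TF (ALS vs control MN)
    tf_module_corr:  correlation of the TF with the module eigengene
    module_log2_fc:  direction of change of the module (negative = down)
    """
    # A positively correlated TF is expected to move with the module,
    # a negatively correlated TF against it.
    expected_direction = tf_module_corr * module_log2_fc
    return expected_direction * tf_log2_fc > 0


# Hypothetical values for a module that is downregulated in ALS MN.
print(is_concordant(-2.1, 0.5, -1.0))   # repressed TF, positive correlation
print(is_concordant(1.2, -0.4, -1.0))   # activated TF, negative correlation
print(is_concordant(1.2, 0.5, -1.0))    # activated TF, positive correlation
```

Only the first two cases satisfy the criterion; the third describes a TF whose change contradicts its correlation with the module.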
Master regulator analysis to identify TFs driving ALS disease signature
Genes differentially expressed in ALS MN can be considered as a molecular phenotype that drives the cellular phenotype of neuronal degeneration. We wanted to identify TFs that regulated this disease signature. Identification of such master regulators requires a context-specific transcriptional network(35). Our single cell RNA-Seq data offered a unique opportunity to build such a network relevant to ALS and MN.
The gene interactions detected by WGCNA include both direct and indirect interactions. We postulated that removal of indirect interactions would significantly enhance our power to identify dysregulated TFs driving ALS MN degeneration. To this end, we deployed the ARACNE algorithm that uses an information theoretic approach to prune indirect regulations in transcriptional networks(36). Though the pruning procedure does not necessarily remove all indirect interactions, it significantly enriches the network for direct interactions.
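The pruning step in ARACNE relies on the data processing inequality (DPI): within any fully connected triplet of genes, the edge with the lowest mutual information is treated as a likely indirect interaction and removed. The following is a minimal sketch of that idea, not the ARACNE or RTN implementation: the function name, the edge representation and the mutual information values are hypothetical, and the tolerance parameter used by real implementations is omitted.

```python
from itertools import combinations


def dpi_prune(mi):
    """Remove the weakest edge of every fully connected gene triplet.

    mi: dict mapping frozenset({gene_a, gene_b}) -> mutual information.
    Edges are only marked during the scan and removed at the end, so the
    result does not depend on the order in which triplets are visited.
    """
    nodes = sorted({gene for edge in mi for gene in edge})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(edge in mi for edge in edges):
            continue  # the DPI only applies to closed triangles
        weakest = min(edges, key=lambda edge: mi[edge])
        strongest_of_rest = min(mi[e] for e in edges if e != weakest)
        if mi[weakest] < strongest_of_rest:  # strict: ties are kept
            to_remove.add(weakest)
    return {edge: value for edge, value in mi.items()
            if edge not in to_remove}


# Hypothetical triangle: TF1 -> TF2 -> GENE_X, with a weak TF1-GENE_X edge.
network = {
    frozenset(("TF1", "TF2")): 0.8,
    frozenset(("TF2", "GENE_X")): 0.7,
    frozenset(("TF1", "GENE_X")): 0.2,
}
pruned = dpi_prune(network)
print(sorted(sorted(edge) for edge in pruned))
```

In this toy network the weak TF1 to GENE_X edge is dropped, leaving the chain through TF2 as the more plausible route of regulation.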
ARACNE was implemented on the gene expression matrix derived from all 211 neurons using the RTN package(37), and downstream targets were identified for 845 TFs expressed in the neuronal cells. Key TF drivers were identified using master regulator analysis (Figure S5). The predicted targets of a TF (termed as the regulon of that TF) were classified as activated or repressed based on the direction of the correlation between the TF and the target gene. Next, an ALS signature was defined as genes that were differentially expressed between ALS MN and isogenic control MN. Master regulators were identified based on whether there was a statistically significant overlap between the activated or repressed regulon of a TF and the ALS disease signature (Figure S5, Methods). The master regulator analysis (MRA) identified 51 TFs at a FDR < 0.05. We filtered out TFs that were not found to be differentially expressed in the ALS MN compared to control. Next, we checked for concordance between the expression change of a TF and its regulon and filtered out non-concordant TFs (Figure S5). This identified a core set of 13 TFs that satisfied the following criteria: 1) the TFs were differentially expressed in ALS MN compared to control, 2) the regulons of these TFs showed a significant overlap with the ALS gene expression signature, 3) the TF and its regulon expression was concordant (Fig. 4a). Interestingly, HOXA5 was amongst the TFs identified by this analysis, where the positive regulon of HOXA5 was downregulated in ALS MN while the negative regulon was upregulated in ALS MN (Figure S6). Additionally, HOXA5 itself was downregulated in ALS MN (Fig. 4a). To gain further insights into the functions of the identified master regulators, we performed gene ontology enrichment analysis on the regulons of these TFs (Figure. S7). 
Strikingly, the HOXA1, HOXA5 and HOXD1 regulons were enriched in genes related to synaptic structure and signalling as well as to the organization of the axon cytoskeleton (Figure S7). This indicated a direct link between inhibition of synaptic functions in ALS MN and HOXA1/HOXA5/HOXD1 downregulation. Additionally, the GO enrichment analysis linked downregulation of ATF2 to autophagy and PRDM2 to mitochondrial structural defects. Thus, our master regulator analysis linked functional deficits in ALS MN to downregulation of specific TFs. We asked whether the identified TFs regulated each other. Extracting a sub-network comprising only the master regulators revealed that the HOX TFs interacted closely among themselves compared with the remaining TFs, with HOXA1, HOXA5 and HOXD1 displaying the maximum number of connections to other TFs (Fig. 4b). This suggested that synaptic dysregulation driven by altered HOX gene expression is a major driver of the disease signature in ALS.
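Extracting a regulator-only sub-network and counting connections per TF can be sketched in a few lines of Python. The edge list and node labels below are placeholders rather than edges from the inferred network, and the function name `regulator_subnetwork` is hypothetical.

```python
from collections import Counter


def regulator_subnetwork(edges, regulators):
    """Keep only edges whose two endpoints are both master regulators.

    Edges are treated as undirected, so (a, b) and (b, a) count once.
    Returns the set of retained edges and the degree of each regulator.
    """
    regs = set(regulators)
    sub = {
        frozenset((a, b))
        for a, b in edges
        if a in regs and b in regs and a != b
    }
    degree = Counter(node for edge in sub for node in edge)
    return sub, degree


# Placeholder network: TF1-TF4 are regulators, geneX/geneY are plain targets.
toy_edges = [
    ("TF1", "TF2"),
    ("TF2", "TF1"),   # duplicate of the edge above, other direction
    ("TF1", "TF3"),
    ("TF1", "geneX"),
    ("TF4", "geneY"),
]
sub, degree = regulator_subnetwork(toy_edges, ["TF1", "TF2", "TF3", "TF4"])
most_connected = degree.most_common(1)[0][0]
```

Ranking regulators by `degree` gives the kind of "maximum number of connections" comparison described above; a regulator with no TF-TF edges, such as `TF4` in the toy data, simply has degree zero.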
Next, we asked whether these TFs were dysregulated in ALS IN as well. Comparison of the gene expression profiles of the identified TFs across MN and IN in both the ALS and isogenic cells revealed distinct patterns of expression across the four subsets (Fig. 4c). Group 1 TFs were expressed at higher levels in control MN compared to IN and were significantly downregulated in ALS MN. HOXA1, HOXA5, HOXD1 and MAFG belonged to this group: the average expression levels of these TFs were 2-10 fold higher in MN than in IN but were strongly downregulated (> 5 fold) in ALS MN. Group 2 and 3 TF expression levels were similar in control MN and IN; however, Group 2 TFs were dysregulated specifically in ALS MN (ATF2, PRDM2, TSC22D2, ZMIZ1, ZNF134) while Group 3 TFs were dysregulated in both ALS MN and IN (HMGB2, HOXD8, ZBTB6) (Fig. 4c). Finally, Group 4 included just SOX4, whose expression was lower in control MN compared to IN but significantly elevated in both ALS MN and IN (Fig. 4c). The expression pattern of Group 1 TFs suggested that these genes could be involved in MN homeostatic networks and that downregulation of these TFs in ALS MN could, at least in part, explain the relative susceptibility of MN compared to IN in ALS. Group 2 TFs regulated core cellular functions such as autophagy and mitochondrial respiration, and downregulation of these TFs could result in cellular stress due to dysregulation of core cellular pathways. Dysregulated expression of Group 3 and 4 TFs was observed in both MN and IN. This suggested that these TFs may play an ancillary role in accelerating the degeneration process but may not be major drivers of degeneration on their own.
Dysregulation of TFs in the SOD1 mouse model of ALS
Since our observations regarding dysregulated TFs in SOD1 ALS were based on analysing iPSC derived from a single patient, we decided to confirm our findings in a mouse model of SOD1 ALS. The SOD1 G93A mouse is a well-established animal model of ALS that displays degeneration of MN over time accompanied by disruption of synaptic functions. A recent study isolated MN from the lumbar spinal cord of G93A mice via laser capture microdissection and analysed gene expression changes using microarrays(38). The study used two different mouse strains, 129Sv and C57, each expressing the mutant G93A hSOD1 gene, and analysed gene expression changes at various stages over the course of MN degeneration. The 129Sv strain displays fast disease progression while the C57 strain displays slow disease progression. We first analysed gene expression of all candidate TFs in this dataset except HOXA1 and HOXA5, as these were not expressed in lumbar MN. We did not observe significant differences in TF expression at the pre-symptomatic stages in either strain (Fig. 5a, 5c). However, both Sox4 and Hmgb2 displayed strong activation in ALS MN at later stages in both strains, in concordance with our data (Fig. 5a, 5c). Further, Atf2 displayed downregulation in the 129Sv strain at the end stage of the disease (Fig. 5a). Surprisingly, Prdm2, Zbtb6 and Zmiz1 displayed activation in ALS MN, in contrast to our observation in iPSC-derived MN. Since HOXA1 and HOXA5 were not expressed in lumbar MN, we analysed expression of other Hox genes relevant to the mouse lumbar spinal region (Fig. 5b, 5d). We observed significant downregulation of multiple Hox genes in ALS MN derived from both the 129Sv and the C57 strains (Fig. 5b, 5d). The expression changes in the slow-progressing C57 strain, though milder, were largely in concordance with changes observed in 129Sv (Fig. 5b). Overall, our analysis indicated the involvement of ATF2, SOX4, HMGB2 as well as HOX genes in SOD1 ALS.
Early drivers of MN degeneration
We next asked whether any of these TFs displayed dysregulation in the early stages of the degeneration process using our iPSC-based model. The expression levels of the 13 master regulators were assessed in MN differentiated from ALS and control iPSC at D30 of differentiation. At this stage, ALS MN did not display any significant differences in survival compared to the control MN. Strikingly, HOXA1 and HOXA5 but not HOXD1 showed significant downregulation in ALS neuronal cultures in spite of the assay being performed in bulk (Fig. 6a). HOXD8, PRDM2 and ZBTB6 were also downregulated in ALS albeit to a lesser extent (Fig. 6a). We did not observe significant upregulation of SOX4 or HMGB2 at this stage. This indicated that HOXA1 and HOXA5 could be early drivers of neurodegeneration in ALS. Since HOX genes were also observed to be downregulated in the mouse model dataset and our network analysis had linked HOX genes to synaptic function, we decided to investigate the role of HOX genes in ALS. We next assessed whether the targets of HOXA1 and HOXA5, as predicted by the MRA, were downregulated in ALS. In concordance with our expectation, several HOXA1/HOXA5 synaptic targets were significantly downregulated, although modestly, in ALS neurons at D30 compared to the control (Fig. 6b). To confirm a direct role of HOXA1/HOXA5 in regulating synaptic gene expression, we exogenously expressed HOXA1 and HOXA5 in the neuronal cell line SH-SY5Y and induced these cells to differentiate. Overexpression of HOXA1 and HOXA5 significantly upregulated several synaptic genes confirming that HOXA1 and HOXA5 activate synaptic gene expression in neurons (Fig. 6c). Next, we asked whether synaptic genes are direct targets of these TFs. To answer this question, we analysed ChIP-seq data mapping HOXA1 and HOXA5 binding sites in colon carcinoma cells(39). 
For most synaptic genes expressed in the carcinoma cells (assessed by the proximal localization of H3K4Me3 or H3K4Me1 signal), the ChIP-Seq analysis identified either a HOXA1 or HOXA5 site proximal to the promoter or within the gene body, thereby indicating that at least a subset of synaptic genes are direct targets of these TFs (Fig. 6d). Interestingly, the ChIP-Seq data revealed HOXA1 binding sites in close proximity to the HOXA5 promoter and vice versa (Figure S8a), suggesting that HOXA1 and HOXA5 display reciprocal regulation. Further, we found HOXA1 binding sites proximal to the promoters of 9 of the 13 TFs identified by the MRA (Figure S8b). This observation indicated that HOXA1 might regulate other master regulators including HOXA5 and suggested the possibility that downregulation of HOXA1 could be an early event triggering synaptic breakdown in SOD1 ALS MN.
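The notion of a binding site lying "proximal to the promoter or within the gene body" can be made concrete with a small interval check in Python. This is only a sketch: the 5 kb upstream window, the coordinates and the names `Gene`, `Peak` and `has_proximal_peak` are assumptions for illustration and are not taken from the ChIP-Seq analysis reported here.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    chrom: str
    start: int   # leftmost coordinate of the gene
    end: int     # rightmost coordinate of the gene
    strand: str  # "+" or "-"


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


def has_proximal_peak(gene, peaks, upstream=5000):
    """True if any peak overlaps the gene body or the upstream window.

    The upstream window (an assumed 5 kb by default) extends from the
    transcription start site, which sits at `start` for "+" strand genes
    and at `end` for "-" strand genes.
    """
    if gene.strand == "+":
        lo, hi = gene.start - upstream, gene.end
    else:
        lo, hi = gene.start, gene.end + upstream
    return any(
        p.chrom == gene.chrom and p.start <= hi and p.end >= lo
        for p in peaks
    )


# Made-up coordinates for a forward-strand and a reverse-strand gene.
plus_gene = Gene("chr1", 10_000, 20_000, "+")
minus_gene = Gene("chr1", 10_000, 20_000, "-")
upstream_of_plus = [Peak("chr1", 7_000, 7_200)]
upstream_of_minus = [Peak("chr1", 22_000, 22_300)]

plus_hit = has_proximal_peak(plus_gene, upstream_of_plus)
minus_hit = has_proximal_peak(minus_gene, upstream_of_minus)
```

Because the window is strand-aware, the same peak can count as promoter-proximal for a forward-strand gene and as downstream, and therefore not proximal, for a reverse-strand gene at the same coordinates.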
Discussion
ALS patient-derived iPSC have provided unprecedented access to human diseased motor neurons, enabling researchers to follow the course of degeneration in a dish(6). Investigation of these models using genomics has uncovered key pathways dysregulated in ALS neurons. However, so far, iPSC-derived neuronal cultures have typically been analysed in bulk. Hence, it has been difficult to assign observed pathway aberrations specifically to MN or other neuronal subsets present in the in vitro culture. The advent of single cell genomics has now allowed the analysis of individual neurons in mixed cultures(24,40). This technology allows genome-wide measurement of gene expression in individual neurons, thereby enabling neuronal classification into subtypes. We have applied this technology to analyse RNA expression in individual neurons derived from ALS iPSC. This not only allowed us to distinguish gene expression changes in MN and IN, but also enabled us to construct context-specific gene regulatory networks. Network analysis is a powerful way to understand how genes interact with each other to bring about cellular phenotypes(41,42), in the current context neuronal degeneration. By building a transcriptional network specific to the ALS landscape, we were able to uncover network modules specifically downregulated in ALS MN. Further, by using master regulator analysis, we were able to identify TFs that drive these disease-associated network modules. Significantly, our analysis uncovered a hitherto unknown link between HOXA1/HOXA5 downregulation and synaptic dysfunction in ALS MN.
The dying back hypothesis posits that neuropathology is initiated in the distal axons and synapses of motor neurons subsequently leading to axon retraction and degeneration of the soma proximally(32). In support of this hypothesis, defects in the neuromuscular junction and distal axons were identified in mouse models of ALS before onset of symptoms(43). Similarly, defects in synaptic activity have been observed in MN derived from ALS associated C9ORF72 repeat expansion and mutant TDP43 patient derived iPSC(9,10). Interestingly, these defects were observed on prolonged in vitro culture of the MN correlating with onset of degenerative phenotypes(9). These observations strongly indicate that synaptic dysfunction is a major driver of neurodegeneration in ALS. Synaptic collapse can be a downstream effect of either impaired delivery or production of synaptic proteins and mRNAs. Impaired delivery can occur secondary to defects in the cytoskeleton leading to inefficient axonal transport. Accordingly, neurofilament protein inclusions have been observed in mutant SOD1 MN that correlated with onset of neurite collapse(7). On the other hand, our data reveals inhibition of the synaptic genes at the transcriptional level thereby impairing production. Further, we find direct links between transcriptional suppression of the synaptic program and downregulation of HOXA1 and HOXA5 transcription factors.
HOX genes encode homeobox containing transcription factors that play a role in spinal cord development(44). Specifically, HOX genes are involved in spinal cord patterning along the rostro-caudal axis(22,45). Mutational analyses on mouse Hox genes have revealed roles of these factors in defining regional boundaries, MN development and axon guidance(22). However, little is known about the expression and role of these genes subsequent to MN-specification. Our data indicates that mature MN display high levels of HOX gene expression under normal physiological conditions. We considered the possibility that the observed expression of HOX genes may be due to the fact that iPSC derived MN display a foetal transcriptome. However, this is not the case as HOX genes including HOXA5 were found to be expressed in adult MN micro-dissected from human spinal cords(46). Additionally, both HOXA1 and HOXA5 were found to be expressed in micro-dissected postnatal mouse and rat cervical spinal MN(46,47). This strongly suggests that HOX genes have a role beyond MN specification. Our data indicates that HOXA1 and HOXA5 activate genes involved in synaptic structure and assembly. Additionally, ChIP-Seq mapping of binding sites indicates that most of the identified synaptic genes are direct downstream targets of these transcription factors. Accordingly, deletion of Hoxa5 in post-natal mice was shown to downregulate genes involved in synaptic functions(48). Strikingly, we find that expression of these two transcription factors is specifically high in MN compared to IN but is strongly downregulated in ALS. This suggests one possible mechanism via which mutant SOD1 targets MN homeostatic TFs thereby explaining the relative susceptibility of MN to degeneration. The fact that these factors are also downregulated in D30 ALS MN cultures before the onset of any degenerative phenotypes suggests that these may be the early drivers of degeneration. 
We also find downregulation of ATF2 and PRDM2, TFs that were linked to core cellular processes such as oxidative phosphorylation, mitochondrial structure and autophagy. It must be noted that inhibition of mitochondrial function and autophagy have been directly linked to defects in axonal transport and neuromuscular synaptic function(49,50). On the other hand, SOX4 and HMGB2 were found to be upregulated in our dataset as well as in ALS mouse models. However, whether these factors drive degeneration in parallel to or downstream of HOXA1/A5, is unclear. The observation that these TFs were not as strongly dysregulated in D30 MN indicates their subsequent downregulation or upregulation may cause exacerbation of the degenerative cycle set in motion earlier by HOXA1 and HOXA5 suppression. Accordingly, knockdown of Hmgb2 has been shown to increase viability while replating mouse MN suggesting that Hmgb2 may be a generic stress-related factor(35). Is HOX gene suppression observed in other regions of the spinal cord? We analysed Hox gene expression in microarray expression data from adult lumbar MN that were microdissected from the SOD1 G93A mouse model of ALS and observed downregulation of Hox genes relevant to the lumbar rostro-caudal address of the spinal cord. This strongly suggests that mutant SOD1 suppresses HOX gene expression across other spinal segments.
The underlying cause of MN degeneration in ALS is very likely to be multi-factorial with multiple drivers collaborating to cause MN demise. Our data indicates that one of these drivers could be the inhibition of HOXA1 and HOXA5 that leads to suppression of genes required for axonal cytoskeleton and synaptic function, thereby resulting in lower availability of synaptic proteins at the neuromuscular junction.
Conclusions
Using in-depth analysis of degenerating MN, we have demonstrated that TFs regulating MN homeostasis are affected in ALS, which may explain why MN are highly susceptible to degeneration. Our results display the power of applying single cell analysis to iPSC-based neurodegenerative models to uncover core transcriptional drivers of specific pathways involved in motor neuron degeneration. We expect that wider use of single cell genomics, especially multi-omics technologies that measure different molecular entities from the same cell(51), combined with network biology, will help uncover novel regulators that can be targeted using small molecules or gene therapy.
Conflict of Interests
The authors declare that they have no conflict of interest.
Funding
This work was generously supported by the Wellcome Trust Institutional Strategic Support Award (WT204909MA), the Joint Council Office A*STAR, Singapore and start-up funds provided to the lead author by the University of Exeter, U.K. R.A. is supported by a UKRI EPSRC Innovation Fellowship. C.R.G.W. is supported by the Biotechnology and Biological Sciences Research Council-funded South West Biosciences Doctoral Training Partnership [BB/J014400/1; BB/M009122/1].
Authors’ contributions
A.B. and L.W.S. provided funding for the project. A.B., S.C.N. and L.O.G. designed and conducted the experiments. A.B., P.T., R.A. and C.R.G.W. analysed the data. All authors contributed towards interpreting the data and writing the manuscript.
Acknowledgements
We thank the sequencing centre at the Genome Institute of Singapore for help with Illumina sequencing and mapping.