Abstract
Brain development requires a complex choreography of cell proliferation, specialisation, migration and network formation, guided by the activation and repression of gene expression programs. It remains unclear how this process is disrupted in neuropsychiatric disorders. Here we integrate human genetics with transcriptomic data from the differentiation of human embryonic stem cells into cortical excitatory neurons. This reveals a cascade of transcriptional programs, activated during early corticoneurogenesis in vitro and in vivo, in which genetic variation is robustly associated with neuropsychiatric disorders and cognitive function. Within these early neurogenic programs, genetic risk is concentrated in loss-of-function intolerant (LoFi) genes, capturing virtually all LoFi disease association. Down-regulation of these programs in DLG2 knockout lines delays expression of cell-type identity alongside marked deficits in neuronal migration, morphology and action potential generation, validating computational predictions. These data implicate specific cellular pathways and neurodevelopmental processes in the aetiology of multiple neuropsychiatric disorders and cognition.
Introduction
Although subsequently expanding to encompass other disorders, this study initially sought to explore the role of developmental processes in schizophrenia (SZ). SZ is highly heritable1, 2, with genetic variation across the frequency spectrum contributing to disease risk3–7. While rare variant studies consistently implicate the disruption of specific mature postsynaptic complexes in SZ aetiology4, 5, 8–11, the precise cellular pathways mediating common variant risk (an estimated 30-50% of the total genetic contribution to liability5) remain largely obscure. Synaptic complexes enriched for SZ rare variants display little evidence for GWAS association, and while there is convincing evidence for enrichment in broader synapse-related gene sets12, 13, these only capture a modest proportion of the overall GWAS signal12. In contrast, nearly 50% of genic SNP-based heritability is captured by loss-of-function intolerant (LoFi) genes12. Being under extreme selective constraint, LoFi genes are likely to play important developmental roles and are known to be enriched for rare variants contributing to autism spectrum disorders (ASD) and intellectual disability/severe neurodevelopmental delay (ID/NDD)14 as well as SZ11, 15, 16. This suggests that a significant proportion of SZ common variants may contribute to disease via the disruption of neurodevelopmental pathways harbouring a concentration of LoFi genes, with more severe genetic perturbation of these pathways increasing risk of SZ and disorders associated with more severe developmental phenotypes (e.g. ASD, ID/NDD).
Supporting a neurodevelopmental role for SZ common variants, there is growing evidence that many such risk factors impact gene expression in the foetal brain17–20 and are enriched in cell-types at multiple stages of cortical excitatory neuron development21. This raises the question: do SZ common variants converge on specific gene expression (transcriptional) programs that are normally activated or repressed during foetal cortical excitatory neuron development? Mutations disrupting key regulators of such programs would be expected to possess a higher contribution to disease risk, reflected in a larger effect size and lower allele frequency. We therefore sought rare, single-gene mutations linked to SZ where the affected gene is expressed in human foetal brain and has the potential to regulate developmental processes. This led us to DLG2: multiple independent deletions have been identified at the DLG2 locus in both SZ and ASD patients8, 22; DLG2 mRNA is present from 8 weeks post-conception in humans23 and throughout all stages of in vitro differentiation from human embryonic stem cells (hESCs) to cortical projection neurons24. Furthermore, the invertebrate orthologue of DLG1-4 (Dlg) is a core component of the Scrib signalling module, which regulates cell polarity, differentiation and migration during development25. Primarily studied as a post-synaptic scaffold protein, DLG2 is required for the formation of NMDA receptor complexes26: these complexes regulate the induction of several forms of synaptic plasticity27 and are enriched for rare mutations in SZ cases4, 5, 8–11. This raises the intriguing possibility that DLG2 may be required for the normal operation of both adult and developmental signalling pathways relevant to SZ pathophysiology.
To explore the role of DLG2 in neurodevelopment we engineered homozygous loss-of-function DLG2 mutations into hESCs using the CRISPR-Cas9 system. Mutant (DLG2-/-) and isogenic sister wild-type (WT) hESC lines were differentiated into cortical excitatory neurons, and the cells were characterised at multiple developmental timepoints to identify phenotypes and gene expression changes in DLG2-/- lines (Fig. 1a). Neurodevelopmental expression programs dysregulated in DLG2-/- lines were identified and analysed for risk variant enrichment; we then explored the biological function of disease-relevant programs, both computationally and experimentally, and evaluated the contribution of LoFi genes to common and rare variant associations (Fig. 1a).
Results
Knockout generation and validation
Two DLG2-/- lines were created from H7 hESCs using the CRISPR/Cas9-D10A nickase system targeting the first PDZ domain (Supplementary Fig. 1). Sequencing of predicted off-target sites revealed no mutations (Methods, Supplementary Fig. 2 & Supplementary Table 1). All subsequent analyses compared these lines to an isogenic WT sister line that went through the same procedure but remained genetically unaltered.
DLG2-/- and WT lines were differentiated into cortical excitatory neurons using a modified dual SMAD inhibition protocol28, 29; RNA was extracted in triplicate from each line at 4 timepoints spanning cortical excitatory neuron development and gene expression quantified (Fig. 1b, Supplementary Fig. 3). A significant decrease in DLG2 mRNA was observed for exons spanning the first PDZ domain, with a similar decrease inferred for PDZ-containing transcripts, indicating degradation of DLG2-/- transcripts via nonsense-mediated decay (Supplementary Fig. 4). Quantitative mass spectrometry-based proteomic analysis of peptide-affinity pulldowns using the NMDA receptor NR2 subunit PDZ peptide ligand30 confirmed the presence of DLG2 in pulldowns from WT but not DLG2-/- lines (Fig. 1c-f, Supplementary Table 2). Genotyping revealed no CNVs in either DLG2-/- line relative to WT (Supplementary Fig. 5a). Both DLG2-/- lines expressed pluripotency markers OCT4, SOX2 and NANOG at 100% of WT levels (Supplementary Fig. 5b). Cells were extensively characterised for their cortical identity using western blotting and immunocytochemistry from days 20-60. Over 90% of day 20 cells were positive for FOXG1, PAX6 and SOX2 and <1% cells expressed ventral genes such as DLX1, GBX2, NKX2.1 and OLIG3 (Supplementary Fig. 6), confirming dorsal forebrain fate. In addition, staining of markers expressed in ventral forebrain-derived neurons from striatal, thalamic and hypothalamic nuclei confirmed no or trace expression (Supplementary Fig. 6).
DLG2 knockout impacts gene expression during cortical excitatory neuron development
To robustly identify genes dysregulated by DLG2 knockout, expression data from the two DLG2-/- lines were pooled and compared to WT at each timepoint (Methods). Disruption of DLG2 had a profound effect: of the >13,000 protein-coding genes expressed at each timepoint, ∼7% were differentially expressed between DLG2-/- and WT at day 15, rising to 40-60% between days 20 and 30 then decreasing to ∼25% by day 60 (Fig. 1g, Supplementary Table 3).
Common risk variants implicate disruption of neurogenesis in SZ
We next tested whether genes differentially expressed in DLG2-/- lines at each timepoint were enriched for SZ common risk variants. Taking summary statistics from the largest available SZ GWAS12, we utilised the competitive gene-set enrichment test implemented in MAGMA (version 1.07)31. As expected, the set of all genes expressed at one or more timepoints in DLG2-/- or WT lines (allWT+KO) was highly enriched for common variant association (P = 2.1 × 10⁻¹⁷), reflecting the neural lineage of these cells. We therefore tested genes up- and down-regulated at each timepoint for genetic association conditioning on allWT+KO using the strict condition-residualize procedure (all subsequent GWAS enrichment tests were conditioned on allWT+KO in the same way). This revealed strong association enrichment solely in genes down-regulated at day 30 (30down-/-: Pcorrected = 1.9 × 10⁻⁷, Fig. 2a), coinciding with active neurogenesis (Fig. 1b). When compared to allWT+KO, 30down-/- genes were over-represented in GO terms related to neuronal development, function and migration (Methods, Supplementary Table 4). Iterative refinement via conditional analyses identified 23 terms with independent evidence for over-representation (Fig. 2b, Methods). This suggests that loss of DLG2 dysregulates transcriptional programs underlying neurogenesis (neuronal growth, electrophysiological properties and migration) and implicates these processes in SZ aetiology.
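The logic of a conditional competitive test can be sketched in a few lines. The example below is an illustrative simplification and not the MAGMA implementation: it assumes gene-level association Z-scores are already available, that the tested set is nested within the conditioning set (as 30down-/- is within allWT+KO), and it ignores the gene-gene correlation and gene-level covariates that MAGMA accounts for. Under these assumptions, regressing Z on indicators for both sets yields a set coefficient equal to the difference in mean Z between set genes and the remaining conditioning-set genes. The function name and toy values are hypothetical.

```python
from math import sqrt


def conditional_set_effect(z_scores, gene_set, background):
    """Set coefficient in the model Z ~ 1 + background + gene_set.

    Assumes gene_set is a subset of background, so the model fits one
    mean per group (outside, background only, in set) and the set
    coefficient is a difference of two group means.
    Returns (beta, t_statistic).
    """
    if not gene_set <= background:
        raise ValueError("gene_set must be a subset of background")

    outside, background_only, in_set = [], [], []
    for gene, z in z_scores.items():
        if gene in gene_set:
            in_set.append(z)
        elif gene in background:
            background_only.append(z)
        else:
            outside.append(z)

    groups = [outside, background_only, in_set]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each of the three gene groups must be non-empty")
    dof = len(z_scores) - 3
    if dof <= 0:
        raise ValueError("need more genes than model parameters")

    means = [sum(g) / len(g) for g in groups]
    # Residual sum of squares around each group's own mean.
    rss = sum((z - m) ** 2 for g, m in zip(groups, means) for z in g)
    sigma2 = rss / dof

    beta = means[2] - means[1]
    se = sqrt(sigma2 * (1 / len(in_set) + 1 / len(background_only)))
    t_stat = beta / se if se > 0 else float("nan")
    return beta, t_stat


# Toy data (hypothetical): nine genes in three groups of three.
toy_z = {
    "g1": 0.0, "g2": 0.2, "g3": -0.2,   # not expressed in the cultures
    "g4": 0.5, "g5": 0.7, "g6": 0.3,    # expressed, not in the tested set
    "g7": 1.5, "g8": 1.7, "g9": 1.3,    # expressed and in the tested set
}
toy_background = {"g4", "g5", "g6", "g7", "g8", "g9"}
toy_set = {"g7", "g8", "g9"}
toy_beta, toy_t = conditional_set_effect(toy_z, toy_set, toy_background)
```

A positive coefficient here indicates stronger association in the tested set than in other expressed genes, which is the sense in which conditioning on allWT+KO removes the enrichment shared by all expressed genes.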
Dysregulation of neurogenesis in DLG2-/- lines delays cortical cell-fate expression
To validate disruption of neurogenesis in DLG2-/- lines and investigate whether this leads to differences in the number or type of neurons produced, we compared the expression of cell-type specific markers in DLG2-/- and WT lines from days 30-60 via immunocytochemistry (ICC) and Western blotting (Fig. 2c-i). From ICC it was clear that DLG2-/- cells are able to differentiate and produce postmitotic neurons expressing characteristic neuronal markers such as NEUN and TUJ1 plus cortical deep layer markers TBR1 and CTIP2 (Fig. 2c-i, Supplementary Fig. 7). Western blot of NEUN (Fig. 2c) and MAP2 (Supplementary Fig. 7) and quantification of NEUN+ cells following ICC (Fig. 2f) revealed no difference in the percentage of neurons produced by DLG2-/- cultures. This is in line with the comparable percentage of cells in the cell cycle/neural progenitors at days 30-60 in DLG2-/- and WT cultures indicated by a similar proportion of KI67+ and SOX2+ cells (Supplementary Fig. 7). At these early timepoints we would not expect to see the generation of upper layer neurons. Although we could identify a small percentage of SATB2+ cells in both WT and KO lines, all co-expressed CTIP2 (Supplementary Fig. 7) indicating their deep layer identity32. An analysis of deep layer markers TBR1 and CTIP2 revealed a significant decrease in CTIP2+ cells but a comparable proportion of TBR1+ neurons for all timepoints investigated (Fig. 2d, e, g-i). On average the proportion of CTIP2+ cells recovered from 15% of the WT level on day 30 to 50% by day 60, although there was notable variation between DLG2-/- lines (Supplementary Fig. 8); total CTIP2 protein level also recovered to some extent, but at a slower rate (Supplementary Fig. 8). Thus, DLG2-/- does not affect the rate at which neurons are produced but delays the expression of subtype identity in new-born deep layer neurons.
DLG2-/- lines display deficits in neuron morphology & migration
Given the over-representation of 30down-/- genes in terms related to neuron morphogenesis and migration (Fig. 2b), we sought to experimentally validate these phenotypes. Immature (day 30) and mature (day 70) neurons were traced and their morphology quantified (Fig. 3). At both timepoints DLG2-/- neurons displayed a simpler structure than WT, characterised by a similar number of primary neurites projecting from the soma (Fig. 3a) but with greatly reduced branching (Fig. 3b). Total neurite length did not differ (Fig. 3c), leading to a clear DLG2-/- phenotype of longer and relatively unbranched primary neurites (Fig. 3e). There was no significant difference in soma area (Fig. 3d). Day 40 DLG2-/- neurons had a slower speed of migration (Fig. 3f) and reduced displacement from their origin after 70 hrs (Fig. 3g, h). In summary, DLG2-/- neurons show clear abnormalities in both morphology and migration, validating the GO term analysis.
Distinct transcriptional programs regulated by DLG2 are enriched for common SZ risk alleles
We postulated that loss of DLG2 inhibits the activation of transcriptional programs driving neurogenesis, which starts between days 20 and 30 and steadily increases thereafter. If this is the case, then SZ genetic enrichment in 30down-/- should be captured by genes normally upregulated between days 20 and 30 in WT cultures (20-30upWT). Analysing differential expression between WT samples at successive timepoints, we found strong risk variant enrichment in 20-30upWT (Fig. 4a). The overlap between 20-30upWT and 30down-/- captured the signal in both sets (Poverlap = 3.23 × 10⁻¹⁰; 30down-/- only P = 0.44; 20-30upWT only P = 0.62). This was not simply due to the size of the overlap (3075 genes, 85% of 20-30upWT) as the regression coefficient for the set of overlapping genes (β = 0.14), which reflects magnitude of enrichment, was significantly greater than for genes unique to 30down-/- (β = 0.006, Pdifferent = 0.0015) or 20-30upWT (β = −0.015, Pdifferent = 0.0045). Thus, it is neurogenic transcriptional programs that are typically upregulated in WT but down-regulated in DLG2-/- lines that are enriched for SZ common variants.
To more precisely identify SZ-relevant transcriptional programs active during neurogenesis, we classified 20-30upWT genes based on their subsequent WT expression profiles (Fig. 4b, Methods): early-increasing genes, whose expression continues to rise between days 30 and 60; early-stable genes, whose expression stays at a relatively constant level; and early-transient genes, whose expression is later down-regulated. We also defined a set of late genes, whose expression only increases significantly after day 30. These were further partitioned into genes that were down-regulated at day 30 in DLG2-/- lines (e.g. early-stable-/-) and those that were not (e.g. early-stableWT only). The sole exception to this was the late set, which had minimal overlap with 30down-/- (62 out of 1399 genes) and was therefore left intact. Early-stable-/- and early-increasing-/- sets were robustly enriched for SZ association (conditioning on allWT+KO, Fig. 4c), revealing that SZ GWAS association is restricted to 2 distinct transcriptional programs normally activated during the onset of neurogenesis but down-regulated in DLG2-/- lines.
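The partitioning described above can be expressed as a small decision rule. The sketch below is illustrative only: it assumes each gene has already been called as significantly up-regulated (or not) between days 20 and 30 in WT, and as 'up', 'down' or 'ns' (not significant) between days 30 and 60. The actual thresholds and tests are given in Methods and are not reproduced here; the function names and label strings are hypothetical.

```python
def classify_wt_profile(up_20_30, change_30_60):
    """Assign a WT expression profile to one gene.

    up_20_30     -- True if significantly up-regulated from day 20 to 30.
    change_30_60 -- 'up', 'down' or 'ns' for the day 30 to 60 comparison.
    Returns a profile name, or None if the gene fits no profile.
    """
    if change_30_60 not in {"up", "down", "ns"}:
        raise ValueError("change_30_60 must be 'up', 'down' or 'ns'")
    if up_20_30:
        return {
            "up": "early-increasing",   # keeps rising after day 30
            "ns": "early-stable",       # stays roughly constant
            "down": "early-transient",  # later down-regulated
        }[change_30_60]
    # Not up-regulated early: 'late' if expression only rises after day 30.
    return "late" if change_30_60 == "up" else None


def label_program(gene, profile, down_in_ko_day30):
    """Split early profiles by whether the gene is in 30down-/-.

    The late set is left intact, as in the text.
    """
    if profile is None or profile == "late":
        return profile
    suffix = "-/-" if gene in down_in_ko_day30 else " WT only"
    return profile + suffix


# Toy usage with hypothetical gene names.
down_ko = {"GENE_A"}
label_a = label_program("GENE_A", classify_wt_profile(True, "ns"), down_ko)
label_b = label_program("GENE_B", classify_wt_profile(True, "ns"), down_ko)
```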
Cascade of transcriptional programs predicted to drive neurogenesis & differentiation
We next investigated the biological function of early neurogenic programs dysregulated in DLG2-/- lines. Each was over-represented for a coherent set of GO terms indicating a distinct biological role (Supplementary Table 5): early-transient-/- for histone/chromatin binding and transcriptional regulation; early-stable-/- for signal transduction, transcriptional regulation, neurogenesis, cell projection development, migration and differentiation; and early-increasing-/- for axon guidance, dendrite morphology, components of pre- and post-synaptic compartments and electrophysiological properties. These functions suggest a linked, time-ordered cascade of transcriptional programs spanning early neurogenesis. This begins with an initial phase of chromatin remodelling (early-transient-/-) that establishes neuron sub-type identity and leads to activation of a longer-term program guiding the growth and migration of new-born neurons (early-stable-/-). This in turn promotes the fine-tuning of sub-type specific neuronal structure, function and connectivity as cells enter the terminal phase of differentiation (early-increasing-/-).
To test support for the existence of this transcriptional cascade and its disruption in disease, we identified disease-relevant regulatory genes from each program whose downstream targets have been experimentally identified or computationally predicted (Methods). Reflecting our hypothesis that dysregulation of these pathways is likely to play a role in multiple neurodevelopmental disorders, we sought regulators linked to SZ, ASD and ID/NDD: chromatin modifier CHD833–37 from early-transient-/-; transcription factor TCF4 12, 38–40 and translational regulator FMRP12, 41, 42 from early-stable-/-; and transcription factors (and deep layer markers) TBR142, 43 and BCL11B (CTIP2) 12, 13, 42, 44 from early-increasing-/-. We predicted that a substantial proportion of early-stable-/- genes would be directly regulated by CHD8, while early-increasing-/- would be enriched for genes directly regulated by TCF4, TBR1 and BCL11B. Since early-transient-/- is responsible for activating subsequent programs, we predicted that early-increasing-/- would be enriched for indirect targets of CHD8 (genes not directly regulated but whose expression is altered when CHD8 is perturbed) that are down-regulated in CHD8 knockdown cells36. We also predicted that early-transient-/- genes would not be enriched for targets of terminal phase regulators BCL11B and TBR1. FMRP represses the translation of its mRNA targets, facilitating their translocation to distal sites of protein synthesis41, 45, and its function is known to be important for axon and dendrite growth46. We therefore predicted that early-stable-/- and -increasing-/- (but not -transient-/-) genes would be enriched for FMRP targets. Over-representation tests emphatically confirmed these predictions, supporting the existence of a regulatory cascade driving early neurogenic transcriptional programs disrupted in neuropsychiatric disorders (Fig. 4d).
In addition, the targets of TCF4, FMRP, BCL11B and TBR1 were more highly enriched for SZ association than other genes in early-increasing-/- (Fig. 4d), highlighting specific pathways through which these known risk genes are likely to contribute to disease.
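For readers unfamiliar with over-representation testing, one common formulation is a one-sided hypergeometric test of whether a program contains more regulator targets than expected by chance, given a background of expressed genes. The sketch below assumes this formulation; the exact test and background used here are specified in Methods and may differ. Function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_targets, n_program, n_background):
    """P(X >= k) when n_program genes are drawn without replacement from
    n_background genes, of which n_targets are regulator targets."""
    total = comb(n_background, n_program)
    upper = min(n_targets, n_program)
    favourable = sum(
        comb(n_targets, i) * comb(n_background - n_targets, n_program - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def over_representation(program, targets, background):
    """Return (overlap size, one-sided P) for targets within a program.

    Both gene lists are first restricted to the background so that all
    counts refer to the same universe of genes.
    """
    background = set(background)
    program = set(program) & background
    targets = set(targets) & background
    overlap = len(program & targets)
    p_value = hypergeom_upper_tail(
        overlap, len(targets), len(program), len(background)
    )
    return overlap, p_value


# Toy usage: 10 background genes, 4 targets, a 3-gene program with 2 targets.
toy_background = [f"g{i}" for i in range(10)]
toy_targets = ["g0", "g1", "g2", "g3"]
toy_program = ["g0", "g1", "g9"]
toy_overlap, toy_p = over_representation(toy_program, toy_targets, toy_background)
```

Restricting both lists to a shared background mirrors the use of allWT+KO as the comparison set elsewhere in the analysis, so that enrichment is not driven simply by genes being expressed in these cultures.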
Convergence of genetic risk on perturbed action potential generation
We next tested whether biological processes over-represented in early-stable-/- or early-increasing-/- (Supplementary Table 5) captured more or less of the SZ association in these programs than expected (Methods). None of the 13 semi-independent GO term subsets identified in early-stable-/- differed substantially from early-stable-/- as a whole (Supplementary Table 6), indicating that risk factors are distributed relatively evenly between them. Of the 16 subsets for early-increasing-/-, somatodendritic compartment and membrane depolarization during action potential displayed evidence for excess enrichment relative to the program as a whole (Fig. 4e). No single term showed evidence for depletion, suggesting that diverse biological processes regulating neuronal growth, morphology and function are perturbed in SZ. The enhanced enrichment in action potential (AP) related genes is particularly striking: while postsynaptic complexes regulating synaptic plasticity are robustly implicated in SZ4, 5, 8–11, this represents the first evidence that the molecular machinery underlying AP generation is also disrupted. We therefore sought to confirm the disruption of APs in DLG2-/- lines (Fig. 5a-j), also investigating the impact of DLG2 loss on synaptic transmission (Fig. 5l-n).
Day 50 DLG2-/- neurons were less excitable, with a significantly more depolarised resting membrane potential (Fig. 5a). Stepped current injection evoked AP firing in 80% WT but only 43% DLG2-/- neurons (Fig. 5c). APs produced by DLG2-/- cells were characteristic of less mature neurons (Fig. 5d), having smaller amplitude, longer half-width and a slower maximum rate of depolarisation and repolarisation (δV/δt) (Fig. 5e-h). We found no change in AP voltage threshold, rheobase current (Fig. 5i, j) or input resistance (Fig. 5b). The percentage of neurons displaying spontaneous excitatory postsynaptic currents (EPSCs) was comparable at days 50 and 60 (Fig. 5n) as was EPSC frequency and amplitude (Fig. 5l, m). Lack of effect on synaptic transmission may reflect compensation by DLG4, whose expression shows a trend towards an increase in synaptosomes from day 65 DLG2-/- neurons (Fig. 5o). In summary, developing DLG2-/- neurons have a reduced ability to fire APs and produce less mature APs.
Neurogenic programs capture risk variant enrichment in loss-of-function intolerant genes
Having identified neurodevelopmentally expressed pathways enriched for common SZ risk variants and investigated the phenotypic consequences of their dysregulation in DLG2-/- lines, we sought to test our hypothesis that these pathways capture a significant proportion of the SZ GWAS enrichment seen in LoFi genes12. We predicted that LoFi genes would primarily be concentrated in earlier transcriptional programs where the impact of disruption is potentially more severe. LoFi genes were over-represented in all early neurogenic programs but notably depleted in the late set (Fig. 6a). LoFi SZ GWAS association (conditioned on allWT+KO) was captured by the overlap with early-stable-/- and early-increasing-/-, localising the GWAS signal to a fraction of LoFi genes (less than a third) located in specific neurogenic pathways (Fig. 6b).
Under our proposed model, early-transient-/- initiates activation of other early neurogenic programs, thus its dysregulation has the potential to cause more profound developmental deficits. We therefore speculated that – while displaying no evidence for SZ common variant association – LoFi genes in early-transient-/- would be enriched for rare mutations linked to SZ and/or more severe neurodevelopmental disorders. All early neurogenic programs displayed a markedly elevated rate of de novo LoF mutations relative to allWT+KO that was captured by LoFi genes: early-transient-/- for mutations identified in NDD and ASD cases47; early-stable-/- for NDD, ASD and SZ16; and early-increasing-/- for NDD (Fig. 6c). De novo LoF mutations from unaffected siblings of ASD cases47 showed no elevation. In all three programs, a clear gradient of effect was evident from NDD (largest elevation in rate) to ASD to SZ, visible only in LoFi genes (Fig. 6d). A modest gradient was also evident for LoFi genes lying outside early neurogenic programs (‘Other LoFi genes’, Fig. 6d), despite de novo rates not being robustly elevated here. This suggests the existence of additional biological pathways less central to disease pathophysiology that harbour disease-relevant LoFi genes, although larger samples are likely to be required for their identification.
Given the robust rare variant enrichment across multiple disorders, we investigated whether neurogenic programs are also enriched for common risk variants contributing to disorders other than SZ, analysing a range of conditions with which SZ is known to share heritability: ASD48; attention-deficit/hyperactivity disorder (ADHD)49; bipolar disorder (BP)50; and major depressive disorder (MDD)51. Since altered cognitive function is a feature of all these disorders, we also tested programs for enrichment in common variants linked to IQ52. Remarkably, all disorders showed evidence for common variant enrichment in one or more early neurogenic program that was again captured by LoFi genes (Fig. 6e, f). In contrast, common variants conferring risk for neurodegenerative disorder Alzheimer’s disease (AD)53 were not enriched. Whereas rare variant enrichment was concentrated towards the initial stages of the transcriptional cascade (early-transient-/-, early-stable-/-), GWAS association was confined to later stages (early-stable-/-, early-increasing-/-). Conditions with the least prior evidence for an early neurodevelopmental component, BP and MDD, displayed enrichment only in the latest stage (early-increasing-/-). Dysregulation of transcriptional programs underlying cortical excitatory neurogenesis thus contributes to a wide spectrum of neuropsychiatric disorders. Furthermore, robust enrichment of early-stable-/- and early-increasing-/- for IQ association (Fig. 6e, f) strongly suggests that perturbation of these neurogenic programs contributes to the emergence of cognitive symptoms in these disorders.
Divergence between disorders at the level of cellular pathways
For each phenotype-program association identified above, we tested whether GO terms over-represented in that program captured significantly more/less of the association signal than the program as a whole (Methods). As seen for SZ (Fig. 4e, Supplementary Table 6), none of the independent GO term subsets within early-stable-/- or early-increasing-/- showed evidence for depletion of GWAS association (Supplementary Table 7), suggesting that common variants contributing to neuropsychiatric disorders and cognition are distributed across the diverse molecular processes encapsulated by each neurogenic program. However, clear differences in emphasis were evident between disorders: of the two conditions whose association was restricted to early-increasing-/- (Fig. 6e, f), BP showed strong evidence for excess enrichment in membrane depolarization during action potential genes – also enriched for SZ GWAS association (Fig. 4e) – but MDD did not (Supplementary Table 7).
Turning to de novo rare variant associations, all independent GO term subsets in early-transient-/- (primarily related to transcriptional regulation, Supplementary Table 5) displayed increased enrichment for ASD mutations, whereas excess NDD association was restricted to chromatin binding genes (Supplementary Table 8). Amongst early-stable-/- genes, both ASD and NDD de novo mutations were concentrated in transcription factors; in contrast, components of early endosomes were strongly depleted for NDD mutations, as were G-protein-coupled signalling molecules (guanine nucleotide exchange factors, Rab interactors) – key regulators of endosome function54, 55 (Supplementary Table 8). In early-increasing-/-, NDD mutations were concentrated in terms related to synaptic transmission and also in membrane depolarization during action potential genes (Supplementary Table 8), once again highlighting the importance of this relatively small gene-set.
Neurogenic programs are active during excitatory corticoneurogenesis in vivo
Genetic analyses (Figs. 4, 6) leave little doubt that early neurogenic programs are highly relevant to cognitive function and the pathogenesis of neuropsychiatric disorders. However, these programs were identified from bulk RNAseq data in vitro and it remains to be shown that their constituent genes are actively co-expressed in the appropriate cell-types during cortical excitatory neurogenesis in vivo. To address this, we extracted gene expression data for cell-types spanning cortical excitatory neurodevelopment from an existing single-cell RNAseq study of human foetal brain tissue32. After normalising the expression for each gene across all cells, we calculated the average expression for each gene in each cell-type/stage of development available: early radial glia (RG), RG, intermediate progenitor cells (IPCs), transitioning cells (intermediate between progenitors and neurons), new-born and maturing neurons (Methods). We then plotted the expression of each program (mean and standard error of gene-level averages) in each cell-type/stage and tested for differences in expression between successive types/stages (Fig. 7, Supplementary Table 9). The expression profile seen for each program in vitro was recapitulated in vivo (Fig. 7). Notably, while other programs in the cascade were significantly upregulated during the transition from progenitors to neurons, early-transient-/- expression was found to be low in early RG, rising in more mature neural progenitor cells (NPCs) then declining in neurons. This is consistent with its predicted role in shaping neuronal sub-type identity, which recent evidence indicates is determined by the internal state of NPCs immediately prior to their exit from the cell-cycle56.
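The per-program summary described above (normalise each gene across cells, average within each cell-type/stage, then take the mean and standard error of the gene-level averages) can be sketched as follows. This is a minimal illustration under stated assumptions: normalisation is taken to be a per-gene z-score, which the text does not specify, and the input layout, function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def zscore(values):
    """Z-score one gene's expression across all cells (assumed normalisation)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def program_profile(expression, cell_types, program):
    """Mean and standard error of gene-level averages per cell-type.

    expression -- dict: gene -> list of expression values, one per cell.
    cell_types -- list of cell-type labels aligned with those cells.
    program    -- iterable of gene names defining the program.
    Returns dict: cell-type -> (mean, standard error).
    """
    gene_averages = []
    for gene in program:
        if gene not in expression:
            continue  # gene not measured in the single-cell data
        by_type = {}
        for label, value in zip(cell_types, zscore(expression[gene])):
            by_type.setdefault(label, []).append(value)
        gene_averages.append({t: mean(v) for t, v in by_type.items()})
    if not gene_averages:
        raise ValueError("no program genes found in the expression data")

    profile = {}
    for label in dict.fromkeys(cell_types):  # keeps first-seen order
        values = [avg[label] for avg in gene_averages]
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
        profile[label] = (mean(values), se)
    return profile


# Toy usage: four cells, two cell-types, two program genes.
toy_types = ["RG", "RG", "neuron", "neuron"]
toy_expression = {"gene1": [0, 0, 2, 2], "gene2": [1, 1, 3, 3]}
toy_profile = program_profile(toy_expression, toy_types, ["gene1", "gene2"])
```

Averaging within genes first and then across genes means that each gene contributes equally to the program-level value regardless of its absolute expression level, which is why the standard error is taken over gene-level averages rather than over cells.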
Discussion
A complex choreography of cell proliferation, specification, growth, migration and network formation underlies brain development. To date, limited progress has been made pinpointing aspects of this process disrupted in neuropsychiatric disorders. Here we uncover 3 distinct gene expression programs active during early excitatory corticoneurogenesis in vitro and in vivo (Fig. 7). These programs are highly enriched for variants contributing to a wide spectrum of disorders and cognitive function (Fig. 6). The extent of association across 9 independent genetic datasets is remarkable, each program displaying robust association in multiple studies: 2 for early-transient-/-; 7 (including common and rare variation in both ASD and SZ) for early-stable-/-; and 6 for early-increasing-/-. These programs harbour well-supported risk genes for complex and Mendelian disorders, a number of which are highlighted in Fig. 8a. This convergence of genetic evidence leaves little doubt that these programs play an important aetiological role in a wide range of psychiatric disorders.
Each program has a unique gene expression profile and molecular composition, indicating a distinct functional role during early neurogenesis: based on our findings we propose that they form a transcriptional cascade regulating neuronal growth, migration and differentiation (Fig. 8a). Computational analyses of gene/mRNA regulatory interactions implicate known neurodevelopmental disorder risk genes (CHD8, TCF4, FMRP, BCL11B and TBR1) as regulators of this cascade and reveal pathways through which they are highly likely to contribute to disease (Fig. 4d). Supporting this model, down-regulation of neurogenic programs in DLG2-/- lines is accompanied by deficits that reflect their predicted functions: impaired migration; simplified neuron morphology; immature action potential generation; and delayed expression of cell-type identity (see also Supplementary Discussion). Further experimental work is required to more precisely delineate phenotypes associated with the disruption of individual programs and the risk genes they harbour and to map out regulatory interactions shaping their expression and activity, testing computational predictions. Here we focus on phenotypes expressed by individual new-born excitatory neurons; in future studies it will be important to investigate the persistence of these phenotypes and explore longer-term effects on neuronal circuit formation and function, particularly in light of the predicted role of early-increasing-/- (Fig. 8a).
Cell developmental processes such as growth and migration57 arise from the coordinated operation of diverse molecular pathways distributed across multiple organelles. In general, risk variant enrichment in early neurogenic programs did not appear to be restricted to specific functional subsets of genes, suggesting that genetic perturbation of multiple sub-cellular processes is likely to contribute to each disease-relevant cellular phenotype. However, there was evidence that within a given neurogenic program the relative importance of sub-cellular processes varies between disorders. Of the 5 disorders with association in early-increasing-/-, SZ, BP and NDD displayed evidence for an excess concentration of genetic risk in a small subset of genes linked to action potential generation, while ADHD and MDD did not. Besides their role in mature neuronal function, mutations in these genes disrupt cortical neuron migration and neurite outgrowth and are implicated in severe neurodevelopmental syndromes (Supplementary Discussion). Within early-stable-/- (predicted to regulate neuronal growth and migration, Fig. 8a) NDD and ASD de novo rare variants were strongly concentrated in transcription factors, NDD variants also being highly depleted from genes linked to early endosome function. Amongst other roles, endosomal trafficking regulates the cell-surface expression of receptors and adhesion molecules during migration and axon guidance55. It is possible that, even where they impact the same higher level cell developmental process, risk variants for different disorders may display a unique pattern of disruption across underlying molecular pathways.
A clear pattern of enrichment was evident across early neurogenic programs, with rare damaging mutations contributing to more severe disorders concentrated in initial stages (early-transient-/-, early-stable-/-) of the cascade (Fig. 6d) and common variant association increasing towards later (early-stable-/-, early-increasing-/-) stages (Fig. 6f). It has been proposed that adult and childhood disorders lie on an aetiological and neurodevelopmental continuum: the more severe the disorder the greater the contribution from rare, damaging mutations and the earlier their developmental impact58–60 (Fig. 8b). Our data support this model and ground it in developmental neurobiology, embedding genetic risk for multiple disorders in a common pathophysiological framework.
Genetic risk for all disorders was concentrated in LoFi genes, indicating far wider relevance for these genes than previously appreciated; here we provide the first real insight into their pathophysiological roles. Being under high selective constraint, LoFi genes profoundly impact development through to sexual maturity. It has not been clear whether LoFi genes harbouring pathogenic mutations are distributed across diverse pathways shaping pre-/post-natal growth or are concentrated in specific pathways and/or stages of development. Our analyses reveal that not all neurodevelopmental pathways are enriched for LoFi genes (Fig. 6a), and that the subset of LoFi genes (∼40%) concentrated in early neurogenic programs captures virtually all common and rare variant LoFi association across a wide spectrum of disorders (Fig. 6). While early-transient-/- expression is limited to initial stages of neurogenesis (peaking as radial glia mature, Fig. 7), early-stable-/- and early-increasing-/- continue to be up-regulated during the NPC-neuron transition and persist as neurons mature (Fig. 7), shaping form, function and (potentially) network level organisation at later stages (Fig. 8a).
While it was knockout of DLG2 that led us to the identification of disease-relevant programs and allowed us to investigate cellular phenotypes associated with their dysregulation, DLG2 itself has yet to reach the status of a canonical SZ/ASD risk gene. DLG2 is primarily known for its role as a postsynaptic scaffold protein in mature neurons, where it is required for normal formation of NMDA receptor signalling complexes26. These complexes are themselves enriched for rare mutations in SZ cases4, 5, 8–11, revealing an unexpected link between mature and developmental disease mechanisms. The extent of this link may be far greater than generally appreciated: as noted above, channels involved in action potential generation are also present in early neurogenic programs and are known to impact neuron growth and migration.
Our data reveal that DLG2 expression is important for cortical excitatory neurodevelopment, but the mechanism by which it operates remains to be uncovered. Based on its known function, and involvement of invertebrate Dlg in the developmental Scrib signalling module25, we propose that DLG2 links cell-surface receptors to signal transduction pathways regulating the activation of neurogenic programs (Supplementary Fig. 9). We hypothesise that stochastic signalling in DLG2-/- lines delays and impairs transcriptional activation, disrupting the orchestration of events required for normal development and the specification of neuronal properties. Precise timing is crucial during brain development, where the correct dendritic morphology, axonal length and electrical properties are required for normal circuit formation. Consequently, even transient perturbation of neurogenesis may have a profound impact on fine-grained neuronal wiring, network function and ultimately perception, cognition and behaviour. Clearly much work remains to be done, but we believe that the current findings sketch out a useful neurobiological model upon which future studies into the developmental origins of psychiatric genetic disorders can build.
Methods
hESC culture
All hESC lines were maintained at 37°C and 5% CO2 in 6 well cell culture plates (Greiner) coated with 1% Matrigel hESC-Qualified Matrix (Corning) prepared in Dulbecco’s Modified Eagle Medium: Nutrient Mixture F-12 (DMEM/F12, Thermo Fisher Scientific). Cells were fed daily with Essential 8 medium (E8, Thermo Fisher Scientific) and passaged at 80% confluency using Versene solution (Thermo Fisher Scientific) for 1.5 minutes at 37°C followed by manual dissociation with a serological pipette. All cells were kept below passage 25 and confirmed as negative for mycoplasma infection.
DLG2 Knockout hESC line generation
Two guide RNAs targeting exon 22 of the human DLG2 gene, covering the first PDZ domain, were designed using a web-based tool (crispr.mit.edu) and cloned into two plasmids containing D10A nickase mutant Cas9 with GFP (PX461) or a puromycin resistance gene (PX462)61. pSpCas9n(BB)-2A-GFP (PX461) and pSpCas9n(BB)-2A-Puro (PX462) were gifts from Feng Zhang (for PX461, Addgene plasmid #48140; http://n2t.net/addgene:48140; RRID:Addgene_48140; for PX462, Addgene plasmid #48141; http://n2t.net/addgene:48141; RRID:Addgene_48141). H7 hESCs (WiCell) were nucleofected using P4 solution and CB150 programme (Lonza) with 5µg of plasmids, FACS sorted on the following day and plated at a low density (∼70 cells/cm2) for clonal isolation. 19 clonal populations were established, with 6 WT and 13 mutant lines identified after targeted sequencing of exon 22. One WT and two homozygous knockout lines were chosen for study: our WT and KO lines therefore originate from the same H7 parental line and have gone through the same process of nucleofection and FACS sorting together.
Genetic validation
The gRNA pair had zero predicted off-target nickase sites (Supplementary Fig. 2). Even though we did not use a wild-type Cas9 nuclease (where only a single gRNA is required to create a double-stranded break), we further checked genic predicted off-target sites for each individual gRNA by PCR and Sanger sequencing (GATC & LGC). Out of 30 sites identified, we randomly selected 14 (7 for each gRNA) for validation. No mutations relative to WT were present at any site (Supplementary Table 1). In addition, genotyping on the Illumina PsychArray v1.1 revealed no CNV insertions/deletions in either DLG2-/- line relative to WT (Supplementary Fig. 5).
Cortical differentiation
Differentiation to cortical projection neurons (Fig. 1b) was achieved using the dual SMAD inhibition protocol28 with modifications (embryoid body to monolayer and replacement of KSR medium with N2B27 medium) suggested by Cambray et al., 201229. Prior to differentiation, Versene treatment and mechanical dissociation were used to passage hESCs at approximately 100,000 cells per well into 12 well cell culture plates (Greiner) coated with 1% Matrigel Growth Factor Reduced (GFR) Basement Membrane matrix (Corning) in DMEM/F12; cells were maintained in E8 medium at 37°C and 5% CO2 until 90% confluent. At day 0 of the differentiation E8 medium was replaced with N2B27-RA neuronal differentiation media consisting of: 2/3 DMEM/F12, 1/3 Neurobasal (Thermo Fisher Scientific), 1x N-2 Supplement (Thermo Fisher Scientific), 1x B27 Supplement minus vitamin A (Thermo Fisher Scientific), 1x Pen Strep Glutamine (Thermo Fisher Scientific) and 50 µM 2-Mercaptoethanol (Thermo Fisher Scientific), which was supplemented with 100 nM LDN193189 (Cambridge Biosciences) and 10 µM SB431542 (Stratech Scientific) for the first 10 days only (the neural induction period). At day 10 cells were passaged at a 2:3 ratio into 12 well cell culture plates coated with 15 µg/ml human plasma fibronectin (Merck) in Dulbecco’s phosphate-buffered saline (DPBS, Thermo Fisher Scientific); passage was as previously described with the addition of a 1 hour incubation with 10 µM Y27632 Dihydrochloride (ROCK inhibitor, Stratech Scientific) prior to Versene dissociation. During days 10 to 20 of differentiation cells were maintained in N2B27-RA (without LDN193189 or SB431542 supplementation) and passaged at day 20 in a 1:4 ratio into 24 well cell culture plates (Greiner) sequentially coated with 10 µg/ml poly-d-lysine hydrobromide (PDL, Sigma) and 15 µg/ml laminin (Sigma) in DPBS.
Vitamin A was added to the differentiation media at day 26, standard 1x B27 Supplement (Thermo Fisher Scientific) replacing 1x B27 Supplement minus vitamin A, and cells were maintained in the resulting N2B27+RA media for the remainder of the differentiation. Cells maintained to day 40 received no additional passage beyond passage 2 at day 20 while cells kept beyond day 40 received a third passage at day 30, 1:2 onto PDL-laminin as previously described. In all cases cells maintained past day 30 were fed with N2B27+RA supplemented with 2µg/ml laminin once weekly to prevent cell detachment from the culture plates.
Immunocytochemistry
Cells were fixed in 4% paraformaldehyde (PFA, Sigma) in PBS for 20 minutes at 4°C followed by a 1 hour room temperature incubation in blocking solution of 5% donkey serum (Biosera) in 0.3% Triton-X-100 (Sigma) in PBS (0.3% PBST). Primary antibodies, used at an assay dependent concentration (see ‘Antibody concentration’), were diluted in blocking solution and incubated with cells overnight at 4°C. Following removal of primary antibody solution and 3 PBS washes, cells were incubated in the dark for 2 hours at room temperature with appropriate Alexa Fluor secondary antibodies (Thermo Fisher Scientific) diluted 1:500 with blocking solution. After an additional 2 PBS washes cells were counterstained with DAPI nucleic acid stain (Thermo Fisher Scientific), diluted 1:1000 with PBS, for 5 mins at room temperature and following a final PBS wash, mounted using Dako Fluorescence Mounting Medium (Agilent) and glass coverslips. Imaging was with either the LSM710 confocal microscope (Zeiss) or Cellinsight Cx7 High-Content Screening Platform (Thermo Fisher Scientific) with HCS Studio Cell Analysis software (Thermo Fisher Scientific) used for quantification.
Western blotting
Total protein was extracted from dissociated cultured cells by incubating in 1x RIPA buffer (New England Biolabs) with added MS-SAFE Protease and Phosphatase Inhibitor (Sigma) for 30 minutes on ice with regular vortexing; concentration was determined using a DC Protein Assay (BioRad) quantified with the CLARIOstar microplate reader (BMG Labtech). Proteins for western blotting were incubated with Bolt LDS sample buffer (Thermo Fisher Scientific) and Bolt Sample Reducing Agent (Thermo Fisher Scientific) for 10 minutes at 70°C before loading into Bolt 4-12% Bis-Tris Plus gels (Thermo Fisher Scientific). Gels were run at 120V for 2-3 hours in Bolt MES SDS Running Buffer (Thermo Fisher Scientific) prior to protein transfer to Amersham Protran nitrocellulose blotting membrane (GE Healthcare) using a Mini Trans-Blot Cell (BioRad) and Bolt Transfer Buffer (Thermo Fisher Scientific) run at 120V for 1 hour 45 minutes. Transfer was confirmed by visualising protein bands with 0.1% Ponceau S (Sigma) in 5% acetic acid (Sigma) followed by repeated H2O washes to remove the stain.
Following transfer, membranes were incubated in a blocking solution of 5% milk in TBST, 0.1% TWEEN 20 (Sigma) in TBS (Formedium), for 1 hour at room temperature. Primary antibodies, used at an assay dependent concentration, were diluted with blocking solution prior to incubation with membranes overnight at 4°C. Following 3 TBST washes, membranes were incubated in the dark for 1 hour at room temperature with IRDye secondary antibodies (LI-COR) diluted 1:15000 with blocking solution. After 3 TBS washes staining was visualised using the Odyssey CLx Imaging System (LI-COR).
Antibody concentration
Synaptosomal preparation
Synaptic protein was extracted by manually dissociating cultured cells in 1x Syn-PER Reagent (Thermo Fisher Scientific) with added MS-SAFE Protease and Phosphatase Inhibitor (Sigma). Following low speed centrifugation to pellet cell debris (1,200g, 10 min, 4°C) the supernatant was centrifuged at high speed to pellet synaptosomes (15,000g, 20 min, 4°C) which were resuspended in fresh Syn-PER Reagent. Protein concentration was determined using a DC Protein Assay (BioRad) quantified with the CLARIOstar microplate reader (BMG Labtech).
Peptide affinity purification
PDZ domain containing proteins were enriched from total protein extracts by peptide affinity purification. NMDA receptor subunit 2 C-terminal peptide “SIESDV” was synthesised (Pepceuticals) and fully dissolved in 90% v/v methanol + 1M HEPES pH7 (both Sigma). Dissolved peptide was coupled to Affi-Gel 10 resin (Bio-Rad) that had been washed 3 times in methanol, followed by overnight room temperature incubation on a roller mixer. Unreacted NHS groups were subsequently blocked using 1M Tris pH9 (Sigma) with 2 hours room temp incubation on a roller mixer. The peptide bound resin was then washed 3 times with DOC buffer (1% w/v sodium deoxycholate; 50mM Tris pH9; 1X MS-SAFE Protease and Phosphatase Inhibitor, all Sigma) and stored on ice until required. Total protein was extracted from dissociated cultured cells by incubating in DOC buffer for 1 hour on ice with regular vortexing, cell debris was pelleted by high speed centrifugation (21,300g, 2 hours, 4°C) and the supernatant added to the previously prepared “SIESDV” peptide bound resin. After overnight 4°C incubation on a roller mixer, the resin was washed 5 times with ice cold DOC buffer and the bound protein eluted by 15 minute 70°C incubation in 5% w/v sodium dodecyl sulphate (SDS, Sigma). The eluted protein was reduced with 10 mM TCEP and alkylated using 20 mM Iodoacetamide, trapped and washed on an S-trap micro spin column (ProtiFi, LLC) according to the manufacturer’s instructions and protein digested using trypsin sequence grade (Pierce) at 47°C for 1 hour. Eluted peptides were dried in a vacuum concentrator and resuspended in 0.5% formic acid for MS analysis.
Mass spectrometry analysis
LC-MS/MS analysis was performed and data were processed and quantified as described previously62. Briefly, peptides were analysed by nanoflow LC-MS/MS using an Orbitrap Elite (Thermo Fisher) hybrid mass spectrometer equipped with a nanospray source, coupled to an Ultimate RSLCnano LC System (Dionex). Peptides were desalted on-line using a nano trap column, 75 μm ID × 20 mm (Thermo Fisher) and then separated using a 130-min gradient from 3 to 40% buffer B (0.5% formic acid in 80% acetonitrile) on an EASY-Spray column, 50 cm × 50 μm ID, PepMap C18, 2 μm particles, 100 Å pore size (Thermo Fisher). The Orbitrap Elite was operated with a cycle of one MS (in the Orbitrap) acquired at a resolution of 60,000 at m/z 400, with the top 20 most abundant multiply charged (2+ and higher) ions in a given chromatographic window subjected to MS/MS fragmentation in the linear ion trap. An FTMS target value of 1e6 and an ion trap MSn target value of 1e4 were used with the lock mass (445.120025) enabled. Maximum FTMS scan accumulation time of 500 ms and maximum ion trap MSn scan accumulation time of 100 ms were used. Dynamic exclusion was enabled with a repeat duration of 45 s with an exclusion list of 500 and an exclusion duration of 30 s. Raw mass spectrometry data were analysed with MaxQuant (v1.6.10.43)63. Data were searched against a human UniProt sequence database (downloaded December 2019) using the following search parameters: digestion set to Trypsin/P, methionine oxidation and N-terminal protein acetylation as variable modifications, cysteine carbamidomethylation as a fixed modification, match between runs enabled with a match time window of 0.7 min and a 20-min alignment time window, label-free quantification enabled with a minimum ratio count of 2, minimum number of neighbours of 3 and an average number of neighbours of 6. PSM and protein match thresholds were set at 0.1 ppm. A protein FDR of 0.01 and a peptide FDR of 0.01 were used for identification level cut-offs.
CNV analysis
Following manual dissociation of WT and DLG2 KO hESC into DPBS, genomic DNA was extracted using the ISOLATE II Genomic DNA kit (Bioline). Following DNA amplification and fragmentation according to the associated Illumina HTS assay protocol, samples were hybridized to an Infinium PsychArray v1.1 BeadChip (Illumina). The stained bead chip was imaged using the iScan System (Illumina) and Genome Studio v2.0 software (Illumina) was subsequently used to normalise the raw signal intensity data and perform genotype clustering. Final analysis for Copy Number Variation (CNV) was carried out with PennCNV software64.
RNA sequencing
WT and DLG2 KO cells were cultured to days 15, 20, 30 and 60 of cortical differentiation as described above (See ‘Cortical differentiation’). Total transcriptome RNA was isolated from triplicate wells for all cell lines at each time point by lysing cells in TRIzol Reagent (Thermo Fisher Scientific) followed by purification with the PureLink RNA Mini Kit (Thermo Fisher Scientific). RNA quality control (QC) was performed with the RNA 6000 Nano kit analysed using the 2100 Bioanalyzer Eukaryote Total RNA Nano assay (Agilent). cDNA libraries for sequencing were produced using the KAPA mRNA HyperPrep Kit for Illumina Platforms (Kapa Biosystems) and indexed with KAPA Single-Indexed Adapter Set A + B (Kapa Biosystems). Library quantification was by Qubit 1x dsDNA HS Assay kit (Thermo Fisher Scientific) and QC by High Sensitivity DNA kit analysed using the 2100 Bioanalyzer High Sensitivity DNA assay (Agilent). Sequencing was performed using the HiSeq4000 Sequencing System (Illumina) with libraries split into 2 equimolar pools, each of which was run over 2 flow cell lanes with 75 base pair paired end reads and 8 base pair index reads.
Processing of all samples was modelled on the long-rna-seq-pipeline used by the PsychENCODE Consortium and available at https://www.synapse.org/#!Synapse:syn12026837. Briefly, the fastq files from Illumina HiSeq4000 were assessed for quality using the FastQC tool (v0.11.8)65 and trimmed for adapter sequence and low base call quality (Phred score < 30 at ends) with cutadapt (v2.3)66. The mapping of the trimmed reads was done using STAR (v2.7.0e)67 and the BAM files were produced in both genomic and transcriptomic coordinates and sorted using samtools (v1.9)68. The aligned and sorted BAM files were further assessed for quality using Picard tools (v2.20.2)69. This revealed a high level of duplicate reads in day 30 KO2 samples (∼72% compared to an average of 23% for other samples). These samples were removed prior to further analyses, which were thus performed on KO1 and WT samples for this timepoint. GRCh38.p13 was used as the reference genome and the comprehensive gene annotations on the primary assembly from Gencode (release 32) used as gene annotation. Gene and transcript-level quantifications were calculated using RSEM (v1.3.1)70. Both STAR and RSEM executions were performed using the PsychENCODE parameters.
RSEM gene and isoform level estimated counts were imported using the tximport package (v1.12.3)71. Protein coding genes expressed (cpm>=1) in at least 1/3 of the samples were taken forward for differential analyses of genes, transcripts and exons. Differential gene expression analysis was performed using the DESeq2 package (v1.24.0)72 and differentially expressed genes were considered significant if their p value after Bonferroni correction was < 0.05. Differential exon usage was analysed using the DEXSeq pipeline73. Briefly, the GENCODE annotation .gtf file was translated into a .gff file with collapsed exon counting bins by using the dexseq_prepare_annotation.py script. Mapped reads overlapping each of the exon counting bins were then counted using the dexseq_count.py script and the HTSeq software (v0.11.2)74. Finally, differential exon usage was evaluated using DEXSeq (v1.30)73 and significant differences identified using an FDR threshold of 0.05. All differential analyses were performed using R (v3.6.1).
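The expression filter and significance threshold described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DESeq2 workflow itself: the function names and the input layout (one dictionary of raw counts per sample) are hypothetical, and every sample is assumed to have a non-zero library size.

```python
def cpm(counts):
    """Convert one sample's raw counts (gene -> count) to counts per million."""
    total = sum(counts.values())  # assumed to be non-zero
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def expressed_genes(samples, min_cpm=1.0, min_fraction=1 / 3):
    """Genes with cpm >= min_cpm in at least min_fraction of the samples."""
    cpms = [cpm(s) for s in samples]
    genes = set().union(*(s.keys() for s in samples))
    needed = min_fraction * len(samples)
    return {
        g for g in genes
        if sum(1 for s in cpms if s.get(g, 0.0) >= min_cpm) >= needed
    }


def bonferroni(pvalues):
    """Bonferroni-adjust a dict of gene -> p-value (capped at 1)."""
    n = len(pvalues)
    return {g: min(1.0, p * n) for g, p in pvalues.items()}


# Toy example (hypothetical counts for four samples)
samples = [
    {"A": 500, "B": 500, "C": 0},
    {"A": 1000, "B": 0, "C": 0},
    {"A": 900, "B": 100, "C": 0},
    {"A": 1000, "B": 0, "C": 0},
]
kept = expressed_genes(samples)
adjusted = bonferroni({"g1": 0.01, "g2": 0.5})
```

In the toy data, gene C is never detected and is therefore dropped, while B is retained because it passes the threshold in two of four samples.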
When analysing differential gene expression in DLG2-/- relative to WT, samples from KO1 and KO2 lines were combined i.e. for each timepoint a single differential gene expression analysis was performed, comparing expression in KO1 & KO2 samples against wild-type. To assess the impact of sample dropout at day 30, we investigated the similarity in gene expression between lines by clustering all KO1, KO2 and WT samples (Supplementary Fig. 3a). At all 4 timepoints, all replicates from KO1 and KO2 cluster together: while KO2 samples from day 30 are not of sufficient quality to reliably inform further analyses, they are clearly similar to KO1 day 30 samples. We also performed differential expression analyses separately for each line (i.e. KO1 v WT and KO2 v WT) at all other timepoints. The overlap in expressed genes accounted for >98% of the genes expressed in each line and gene expression fold change was highly correlated between KO1 v WT and KO2 v WT (Spearman’s ⍴day 15 = 0.67, ⍴day 20 = 0.95, ⍴day 60 = 0.75). Over-representation odds ratios for GO terms also remain well correlated for significantly up-regulated (⍴day 15 = 0.70, ⍴day 20 = 0.92, ⍴day 60 = 0.67) and down-regulated (⍴day 15 = 0.55, ⍴day 20 = 0.95, ⍴day 60 = 0.56) genes. We noted that agreement between lines was greatest for day 20, which also lies close to the onset of neurogenesis and displays a high level of differential expression between KO and WT lines, comparable to that for day 30 (Fig. 1g). Further indicating a limited impact for sample dropout, phenotypes predicted by GO term analysis of differential expression at day 30 (deficits in neuron migration, morphology and action potential generation) were experimentally validated (Fig. 3 & 5); and all early neurogenic transcriptional programs identified in these data were shown to possess an identical profile of expression across human neurodevelopmental cell-types in vivo (Fig. 7).
Transcriptional programs
Genes were partitioned based upon their WT expression profiles as follows. Differentially expressed genes (Bonferroni P < 0.05) were first identified between pairs of timepoints (analysing WT data only): genes up-regulated in day 30 relative to day 20 (20-30upWT); genes up-regulated in day 60 relative to day 30 (30-60upWT); and genes up-regulated in day 60 relative to day 20 (20-60upWT). Early-transient, early-stable, early-increasing and late programs were then defined based upon the intersections of these gene-sets as shown in Fig. 4b.
Human foetal cortex single-cell RNA sequencing data
Single-cell RNA-Seq gene expression data from Nowakowski et al., 201732 were downloaded from bit.ly/cortexSingleCell. Cells corresponding to distinct neurodevelopmental cell-types (including cell-types at different stages of maturity) were identified and extracted, collating all cells from the corresponding in vivo cell clusters32 as follows:
Progenitors
RG (early): “RG-early”
RG: “RG-div1”, “RG-div2”, “oRG”, “tRG”, “vRG”
IPC: “IPC-div1”, “IPC-div2”
Transitioning: “IPC-nEN1”, “IPC-nEN2”, “IPC-nEN3”
Cortical excitatory neurons
Newborn: “nEN-early1”, “nEN-early2”, “nEN-late”
Maturing: “EN-PFC1”,“EN-PFC2”,“EN-PFC3”,“EN-V1-1”,“EN-V1-2”, “EN-V1-3”
Cells with less than 5% of all protein-coding genes expressed (TPM>0) and genes expressed in less than 5% of cells were filtered out. The remaining dataset consisted of 2318 cells and 9239 protein-coding genes. Gene expression counts (TPM) were z-score normalised for each gene across all cells, then the average normalised expression score for each gene was calculated for each of the above cell-types. Over 80% of genes for each in vitro program (early-transient-/-, early-stable-/- and early-increasing-/-) were present in the in vivo data; all of these genes passed our filtering criteria. Taking the set of genes corresponding to each in vitro program, we calculated the mean and standard error of their gene-level averages in each in vivo cell-type (Fig. 7). For each program, the difference between successive neurodevelopmental cell-types/stages was calculated using a two-tailed Student’s t-test and p-values Bonferroni-corrected for the 6 comparisons made: RG (early) v RG; RG v IPC; RG v Transitioning; IPC v Transitioning; Transitioning v Newborn neurons; and Newborn v Maturing neurons.
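The normalisation and averaging steps above can be illustrated with a short sketch. It assumes expression is held as a dictionary of gene to per-cell TPM values with a parallel list of cell-type labels; these names and the data layout are hypothetical. The use of the population standard deviation for z-scoring is also an assumption, as the text does not specify which estimator was used.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def zscore_by_gene(expr):
    """Z-score each gene across all cells (expr: gene -> list of TPM values)."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)  # assumes sd > 0 after filtering
        out[gene] = [(v - mu) / sd for v in values]
    return out


def celltype_averages(z, cell_types):
    """Average normalised score per gene within each cell type."""
    labels = sorted(set(cell_types))
    return {
        gene: {
            lab: mean([v for v, c in zip(values, cell_types) if c == lab])
            for lab in labels
        }
        for gene, values in z.items()
    }


def program_profile(averages, program_genes):
    """Mean and standard error of gene-level averages in each cell type."""
    genes = [g for g in program_genes if g in averages]  # genes present in vivo
    profile = {}
    for lab in averages[genes[0]]:
        vals = [averages[g][lab] for g in genes]
        profile[lab] = (mean(vals), stdev(vals) / sqrt(len(vals)))
    return profile


# Toy example: two genes measured in four cells of two types
expr = {"g1": [0, 0, 2, 2], "g2": [1, 1, 3, 3]}
cell_types = ["RG", "RG", "Newborn", "Newborn"]
profile = program_profile(
    celltype_averages(zscore_by_gene(expr), cell_types), ["g1", "g2", "g3"]
)
```

Genes absent from the in vivo data (here the hypothetical "g3") are simply skipped, mirroring the restriction of each program to genes present in the single-cell dataset.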
Gene set construction
GO
The Gene Ontology (GO) ontology tree was downloaded from OBO: http://purl.obolibrary.org/obo/go/go-basic.obo
Ontology trees were constructed separately for Molecular Function, Biological Process and Cellular Component using ‘is_a’ and ‘part_of’ relationships. GO annotations were downloaded from NCBI:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
Annotations containing the negative qualifier NOT were removed, as were all annotations with evidence codes IEA, NAS and RCA. Annotations were further restricted to protein-coding genes. Genes corresponding to each annotation term were then annotated with all parents of that term, identified using the appropriate ontology tree. Finally, terms containing between 20 and 2000 genes were extracted for analysis.
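A minimal sketch of this gene set construction is given below. The input layout is hypothetical: annotations are assumed to have been parsed into (gene, term, evidence code, negated) tuples and the ontology into a mapping from each term to its direct ‘is_a’/‘part_of’ parents. Restriction to protein-coding genes is assumed to have been applied beforehand.

```python
EXCLUDED_EVIDENCE = {"IEA", "NAS", "RCA"}


def ancestors(term, parents, cache):
    """All ancestors of a term, following parent edges recursively."""
    if term not in cache:
        found = set()
        for parent in parents.get(term, ()):
            found.add(parent)
            found |= ancestors(parent, parents, cache)
        cache[term] = found
    return cache[term]


def build_gene_sets(annotations, parents, min_size=20, max_size=2000):
    """Filter annotations, propagate genes to parent terms, apply size limits."""
    cache, gene_sets = {}, {}
    for gene, term, evidence, negated in annotations:
        if negated or evidence in EXCLUDED_EVIDENCE:
            continue  # drop NOT-qualified and excluded-evidence annotations
        for t in {term} | ancestors(term, parents, cache):
            gene_sets.setdefault(t, set()).add(gene)
    return {
        t: genes for t, genes in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }


# Toy ontology: GO:3 -> GO:2 -> GO:1 (child -> parent)
parents = {"GO:3": {"GO:2"}, "GO:2": {"GO:1"}}
annotations = [
    ("g1", "GO:3", "IDA", False),
    ("g2", "GO:2", "IMP", False),
    ("g3", "GO:3", "IEA", False),  # removed: evidence code IEA
    ("g4", "GO:3", "IDA", True),   # removed: NOT qualifier
]
sets_small = build_gene_sets(annotations, parents, min_size=2, max_size=10)
```

The size limits are lowered in the toy call purely so that the example produces output; the defaults match the 20 to 2000 gene range stated above.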
Regulator targets
Predicted TBR1 and BCL11B targets42
Transcription factor-target gene interactions identified by elastic net regression were downloaded from the PsychEncode resource website (http://resource.psychencode.org/#Derived) and predicted targets for TBR1 and BCL11B extracted (interaction file: INT-11_ElasticNet_Filtered_Cutoff_0.1_GRN_1.csv). Gene symbols were mapped to NCBI/Entrez ids using data from the NCBI gene_info file.
TCF4 targets40
identifiers were updated using the gene_history file from NCBI.
FMRP targets41
NCBI/Entrez mouse gene identifiers were updated using the gene_history file from NCBI. Genes were then mapped from mouse to human using Homologene, restricting to protein-coding genes with a 1-1 mapping.
Direct CHD8 mid-foetal promoter targets37
symbols (NCBI gene_info file) and locations (NCBI Build37.3) were used to map genes to NCBI/Entrez gene ids. Note: using Sugathan et al., 201436 rather than Cotney et al., 201537 to define direct CHD8 targets did not alter the observed pattern of overlap with transcriptional programs (Fig. 4)(data not shown).
Indirect CHD8 targets36
Ensembl ids were updated (Ensembl stable_id_event file) then mapped to current NCBI/Entrez ids using Ensembl and NCBI id cross-reference files. Taking genes with altered expression on CHD8 shRNA knockdown, we removed those identified as direct CHD8 targets in NPCs36 or as CHD8 mid-foetal promoter targets37. Using a stricter definition of genes with altered expression on CHD8 shRNA knockdown than that taken by Sugathan et al., 201436 – genes with a Bonferroni corrected differential expression P < 0.05, rather than genes with a nominal P < 0.05 – did not alter the pattern of overlap with transcriptional programs (data not shown).
Functional over-representation test (gene set overlap)
The degree of overlap between pairs of gene sets was evaluated using Fisher’s Exact test, where the background set consisted of all genes expressed in either WT or DLG2-/- lines (allWT+KO). This was used for GO terms; the analysis of regulator targets (Fig. 4e); and the overlap between LoFi genes and transcriptional programs (Fig. 6a). In order to identify a semi-independent subset of over-represented annotations from the output of GO term tests (Fig. 4f, Supplementary Table 4, 5), we used an iterative refinement procedure. Briefly, we selected the gene set with the largest enrichment odds ratio; removed all genes in this set from all other over-represented annotations; re-tested these reduced gene-sets for over-representation in 30down-/- genes; then discarded gene-sets with P ≥ 0.05 (after Bonferroni-correction for the number of sets tested in that iteration). This process was repeated (with gene-sets being cumulatively depleted of genes at each iteration) until there were no remaining sets with a corrected P < 0.05.
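The overlap test and iterative refinement can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, the Fisher's exact test is computed as a one-sided (over-representation) hypergeometric tail, and the sets passed to the refinement step are assumed to have already been identified as over-represented.

```python
from math import comb


def overlap_test(set_a, set_b, background):
    """Odds ratio and one-sided Fisher's exact P for overlap of two gene sets."""
    a, b = set_a & background, set_b & background
    n, k = len(background), len(a & b)
    # P(overlap >= k) when len(b) genes are drawn at random from the background
    tail = sum(
        comb(len(a), i) * comb(n - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    p = tail / comb(n, len(b))
    only_a, only_b = len(a) - k, len(b) - k
    neither = n - k - only_a - only_b
    num, den = k * neither, only_a * only_b
    if den:
        odds = num / den
    else:
        odds = float("inf") if num else 0.0
    return odds, p


def refine(gene_sets, target, background, alpha=0.05):
    """Select a semi-independent subset of over-represented gene sets."""
    current = {name: set(genes) for name, genes in gene_sets.items()}
    odds = {n: overlap_test(g, target, background)[0] for n, g in current.items()}
    selected = []
    while current:
        best = max(current, key=odds.get)  # largest enrichment odds ratio
        selected.append(best)
        removed = current.pop(best)
        # Deplete the remaining sets of the selected genes and re-test them
        reduced = {n: g - removed for n, g in current.items()}
        results = {n: overlap_test(g, target, background) for n, g in reduced.items()}
        # Bonferroni correction for the number of sets tested in this iteration
        current = {n: reduced[n] for n, (_, p) in results.items()
                   if p * len(results) < alpha}
        odds = {n: results[n][0] for n in current}
    return selected


# Toy example: Y is largely redundant with X, Z is independent of X
background = set(range(100))
target = set(range(20))
gene_sets = {
    "X": set(range(10)),
    "Y": set(range(10)) | {50},
    "Z": set(range(10, 20)) | {60},
}
selected = refine(gene_sets, target, background)
```

Because sets are cumulatively depleted at each iteration, an annotation that overlaps the target only through genes shared with a stronger annotation (Y in the toy data) drops out once those genes are removed.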
Common variant association
All common variant gene-set enrichment analyses were performed using the competitive gene-set enrichment test implemented in MAGMA version 1.07, conditioning on allWT+KO using the condition-residualize function. To test whether GO terms (Fig. 4f, Supplementary Table 6) or regulator targets (Fig. 4e) enriched in a specific program captured more or less of the SZ association in these programs than expected, a two-sided enrichment test was performed on term/target genes within the program, conditioning on allWT+KO and on all genes in the program. All other GWAS enrichment tests were one-sided. To test whether common variant enrichment differed between two gene-sets, we took the regression coefficient β and its standard error SE(β) for each gene-set from the MAGMA output file and compared z = d/SE(d) to a standard normal distribution, where d = β1 – β2 and SE(d) = √[SE(β1)² + SE(β2)²]. Gene-level association statistics for schizophrenia were taken from Pardiñas et al., 201812; those for ADHD49, bipolar disorder50 and Alzheimer’s disease53 were calculated using the MAGMA multi model, with a fixed 20,000 permutations for each gene. Prior to analysis, SNPs with MAF < 0.01 or INFO score < 0.6 were removed from the bipolar GWAS, bringing it into line with the other datasets.
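The comparison of enrichment between two gene-sets reduces to a simple z-test on the difference in regression coefficients. The sketch below assumes β and SE(β) have already been read from the MAGMA output; the function name is hypothetical, and reporting a two-sided P-value is an assumption, since the sidedness of this comparison is not stated above.

```python
from math import erfc, sqrt


def compare_enrichment(beta1, se1, beta2, se2):
    """z statistic and two-sided P for the difference between two betas."""
    d = beta1 - beta2
    se_d = sqrt(se1 ** 2 + se2 ** 2)
    z = d / se_d
    p_two_sided = erfc(abs(z) / sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Toy values (hypothetical): d = 0.4, SE(d) = 0.5, so z = 0.8
z, p = compare_enrichment(0.5, 0.3, 0.1, 0.4)
```

The formula for SE(d) treats the two coefficient estimates as independent, as in the expression given in the text.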
Rare variant association
The de novo LoF mutations for SZ analysed here are described in Rees et al., 202016. De novo LoF mutations for NDD, ASD and unaffected siblings of individuals with ASD were taken from Satterstrom et al., 202047: these were re-annotated using VEP75 and mutations mapping to > 2 genes (once readthrough annotations had been discarded) were removed from the analysis. A two-sided Poisson rate ratio test was used to evaluate whether the enrichment of de novo LoF mutations in specific gene-sets was significantly greater than that observed for all other expressed genes (using allWT+KO). The expected rate of de novo LoF mutations in a set of genes was estimated using individual gene mutation rates76.
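One common formulation of a two-sided Poisson rate ratio test conditions on the total number of mutations, which reduces it to an exact binomial test. The sketch below follows that formulation as an assumption; the text does not state which implementation was used, and the function and argument names are hypothetical. Here x1 and x2 are the observed de novo LoF counts in the gene-set and in all other expressed genes, and e1 and e2 are the corresponding expected counts derived from gene mutation rates.

```python
from math import comb


def poisson_rate_ratio_test(x1, e1, x2, e2):
    """Rate ratio and two-sided exact P, conditioning on x1 + x2.

    Suitable for modest counts only; requires x2 > 0 for the ratio.
    """
    n = x1 + x2
    p0 = e1 / (e1 + e2)  # null probability that a mutation falls in the set
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    # Two-sided P: total probability of outcomes no more likely than observed
    cutoff = pmf[x1] * (1 + 1e-7)
    p_value = min(1.0, sum(p for p in pmf if p <= cutoff))
    ratio = (x1 / e1) / (x2 / e2)
    return ratio, p_value


# Toy example: 3 observed v 1 expected in the set, 1 v 1 elsewhere
ratio, p_value = poisson_rate_ratio_test(3, 1.0, 1, 1.0)
```

With equal expected counts the null probability is 0.5, so the toy example is an exact binomial test of 3 successes in 4 trials.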
Migration assay
Cells were cultured and differentiated to cortical projection neurons as previously described. Neuronal migration was measured during a 70-hour period from day 40 by transferring cell culture plates to the IncuCyte Live Cell Analysis System (Sartorius). Cells were maintained at 37°C and 5% CO2 with 20X magnification phase contrast images of whole wells taken every 2 hours for the analysis period. The StackReg plugin77 for ImageJ was used to fully align the resulting stacks of time-lapse images, after which the Cartesian coordinates of individual neuronal soma were recorded over the course of the experiment, enabling the distance and speed of neuronal migration to be calculated. Data sets (Fig. 3f, g) were analysed by unpaired two-tailed Student’s t-test.
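Distance and speed can be derived from the recorded soma coordinates as sketched below. This is an illustrative calculation only: the function name is hypothetical, and taking distance as the cumulative path length between successive frames (rather than net displacement) is an assumption. Units follow whatever units the coordinates were recorded in.

```python
from math import hypot


def migration_stats(track, interval_hours=2.0):
    """Total path length and mean speed for one soma.

    track: list of (x, y) coordinates from successive time-lapse frames.
    """
    steps = [
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(track, track[1:])
    ]
    distance = sum(steps)
    duration = interval_hours * len(steps)  # hours between first and last frame
    return distance, distance / duration


# Toy track: moves 5 units, pauses for one frame, then moves 5 units
distance, speed = migration_stats([(0, 0), (3, 4), (3, 4), (6, 8)])
```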
Morphology analysis
Cells were differentiated to cortical projection neurons essentially as described and neuronal morphology assessed at days 30 and 70. To generate low density cultures for analysis, cells were passaged at either day 25 or 50 using 15-minute Accutase solution (Sigma) dissociation followed by plating at 100,000 cells per well on 24 well culture plates. 72 hours prior to morphology assessment cells were transfected with 500ng pmaxGFP (Lonza) per well using Lipofectamine 3000 Reagent (Thermo Fisher Scientific) and Opti-MEM Reduced Serum Media (Thermo Fisher Scientific) for the preparation of DNA-lipid complexes. At days 30 or 70, cells were fixed in 4% paraformaldehyde (PFA, Sigma) in PBS for 20 minutes at 4°C before mounting with Dako Fluorescence Mounting Medium (Agilent) and glass coverslips. Random fields were imaged using a DMI6000B Inverted microscope (Leica) and the morphology of GFP expressing cells with a clear neuronal phenotype quantified using the Neurolucida 360 (MBF Bioscience) neuron tracing and analysis software package. Data sets (Fig. 3a-d) were analysed by two-way ANOVA with post hoc comparisons using Bonferroni correction, comparing to WT controls.
Electrophysiology
Whole cell patch clamp electrophysiology was performed on cells cultured on 13mm round coverslips and the most morphologically mature neurons were patched in each culture; hence the most comparable subpopulation of cells from each genotype was compared. On day 20 of hESC differentiation, 250,000 human neural precursor cells from WT and KO hESCs were dissociated and plated on each PDL-coated coverslip in 30µl diluted (20x) Matrigel (Corning) together with 20,000 rat primary glial cells. Postnatal day 7-10 Sprague-Dawley rats (Charles River) bred in-house were sacrificed via cervical dislocation and the cortex was quickly dissected. Tissues were dissociated using 2mg/ml papain and plated in DMEM supplemented with 10% Foetal bovine serum and 1% penicillin/streptomycin/Amphotericin B and 1x Glutamax (all Thermo Fisher Scientific). Microglia and oligodendrocyte precursor cells were removed by shaking at 500 rpm for 24 hours at 37°C. All animal procedures were performed in accordance with Cardiff University’s animal care committee’s regulations and the European Directive 2010/63/EU on the protection of animals used for scientific purposes. Plated cells were fed with BrainPhys medium (STEMCELL Technologies) supplemented with 1x B27 (Thermo Fisher Scientific), 10ng/ml BDNF (Cambridge Bioscience) and 200µM ascorbic acid (Sigma). To stop the proliferation of cells, 1x CultureOne (Thermo Fisher Scientific) was added from day 21. For postsynaptic current experiments, coverslips were transferred to a recording chamber (RC-26G, Warner Instruments) and perfused with HEPES Buffered Saline (HBS) (119 mM NaCl; 5 mM KCl; 25 mM HEPES; 33 mM glucose; 2mM CaCl2; 2mM MgCl2; 1µM glycine; 100µM picrotoxin; pH 7.4), at a flow rate of 2-3 ml per minute. Recordings were made using pipettes pulled from borosilicate glass capillaries (1.5 mm OD, 0.86 mm ID, Harvard Apparatus), and experiments were performed at room temperature (∼20 °C).
mEPSC recordings were made using recording electrodes filled with a Cs-based intracellular filling solution (130 mM CsMeSO4; 8 mM NaCl; 4 mM Mg-ATP; 0.3 mM Na-GTP, 0.5 mM EGTA; 10 mM HEPES; 6 mM QX-314; with pH 7.3 and osmolarity ∼295 mOsm). Cells were voltage clamped at −60 mV using a MultiClamp 700B amplifier (Axon Instruments). Continuous current acquisition, series resistance and input resistance were monitored and analysed online and offline using the WinLTP software78 (http://www.winltp.com). Only cells with series resistance <25 MΩ with a change in series resistance <10% from the start were included in this study. Data were analysed by importing Axon Binary Files into Clampfit (version 10.6; Molecular Devices). A threshold function of >12 pA was used to identify mEPSC events, which were then subject to manual confirmation. Results were exported to SigmaPlot (version 12.5, Systat Software), where analysis of peak amplitude and frequency of events was performed. Current clamp was used to record resting membrane potential (RMP) and action potentials (AP). Data were sampled at 20kHz with a 3 kHz Bessel filter using a MultiClamp 700B amplifier. Coverslips were transferred into the recording chamber maintained at RT (20-21°C) on the stage of an Olympus BX61W (Olympus) differential interference contrast (DIC) microscope and perfused at 2.5ml/min with the external solution composed of 135mM NaCl, 3mM KCl, 1.2mM MgCl2, 1.25mM CaCl2, 15mM D-glucose, 5mM HEPES (all from Sigma), and pH was titrated to 7.4 by NaOH. The internal solution used to fill the patch pipettes was composed of 117mM KCl, 10mM NaCl, 11mM HEPES, 2mM Na2-ATP, 2mM Na-GTP, 1.2mM Na2-phosphocreatine, 2mM MgCl2, 1mM CaCl2 and 11mM EGTA (all from Sigma), and pH was titrated to 7.3 by NaOH. The resistance of a patch pipette was 3–9 MΩ and the series resistance component was fully compensated using the bridge balance function of the instrument.
The RMP of cells was recorded immediately after breaking into the cells in gap-free mode. A systematic current injection protocol (duration, 1 s; increment, 20 pA; from −60 pA to 120 pA) was applied to neurons held at −60 mV to evoke APs. Input resistance (Rin) was calculated as Rin = (Vi − Vm)/I, where Vi is the potential recorded from the −10 pA current step. AP properties were measured from the first overshooting AP. Further analysis for action potential characterisation was carried out using Clampfit 10.7 software (Molecular Devices).
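The input resistance formula above can be illustrated with a short worked example. It assumes that Vm denotes the membrane potential immediately before the current step (the −60 mV holding potential) and that I is the injected current; neither is defined explicitly in the text. The function name and example values are hypothetical.

```python
def input_resistance_mohm(v_step_mv, v_baseline_mv, current_pa):
    """Input resistance in megaohms from Rin = (Vi - Vm) / I.

    v_step_mv: potential during the current step, Vi (mV).
    v_baseline_mv: potential before the step, Vm (mV); assumed to be the
        holding potential.
    current_pa: injected current, I (pA); must be non-zero.
    """
    if current_pa == 0:
        raise ValueError("injected current must be non-zero")
    # mV / pA gives gigaohms; multiply by 1000 to convert to megaohms.
    return (v_step_mv - v_baseline_mv) / current_pa * 1000.0


if __name__ == "__main__":
    # Hypothetical cell: a -10 pA step moves the membrane from -60 mV to -65 mV.
    print(input_resistance_mohm(-65.0, -60.0, -10.0), "MOhm")
```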
Statistical analysis and data presentation
Unless specifically stated in each methodology section, GraphPad Prism (version 8.3.0) was used to test the statistical significance of the data and to produce the graphs. Stars above bars in each graph represent Bonferroni-corrected post hoc tests: *P<0.05; **P<0.01; ***P<0.001; ****P<0.0001 vs. WT control. All phenotypic validation results were from a minimum of two independent differentiations unless otherwise stated. Within a given differentiation, triplicate samples were used per cell line at each time point investigated. All data are presented as mean ± SEM.
Bonferroni test correction
GO term analyses were corrected for the ∼4,200 terms tested.
Fig. 2a – corrected for 8 tests (up & down regulated x 4 timepoints)
Fig. 4a – corrected for 6 tests (up & down regulated x 3 pairs of timepoints)
Fig. 4c – corrected for 7 tests (7 gene expression programs)
Fig. 4d (Poverlap) – corrected for 21 tests (7 regulator target sets x 3 programs)
Fig. 4d (PGWAS) – corrected for 10 tests (10 over-represented sets taken forward for genetic analysis)
Fig. 4e – corrected for 16 tests (16 semi-independent over-represented GO terms)
Fig. 6a – corrected for 4 tests (4 programs)
Fig. 6b – uncorrected (secondary, exploratory analysis: results in bold survive correction for 7 tests)
Fig. 6c – each disorder (& SIB controls) corrected for 7 tests (4 LoFi + 3 non-LoFi gene-sets)
Fig. 6e – each disorder corrected for 7 tests (4 LoFi + 3 non-LoFi gene-sets)
Fig. 7 – each program (early-transient-/-, early-stable-/-, early-increasing-/-) corrected for 6 tests
Supplementary Table 6 – corrected for 13 tests (13 semi-independent over-represented GO terms)
Supplementary Table 7 – each disorder corrected for number of GO terms analysed (13 early-stable-/-, 16 early-increasing-/-)
Supplementary Table 8 – each disorder corrected for number of GO terms analysed (4 early-transient-/-, 13 early-stable-/-, 16 early-increasing-/-)
Supplementary Table 9 – each program (early-transient-/-, early-stable-/-, early-increasing-/-) corrected for 6 tests
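The corrections listed above all follow the same rule, sketched here for illustration: each raw P-value is multiplied by the number of tests in its family and capped at 1. The star thresholds mirror those given under Statistical analysis and data presentation. The function names and example P-values are hypothetical and are not results from this study.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-corrected P-value: raw P times number of tests, capped at 1."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


def significance_stars(p_corrected):
    """Star notation for a corrected P-value (*<0.05, **<0.01, ***<0.001, ****<0.0001)."""
    for threshold, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_corrected < threshold:
            return stars
    return ""


if __name__ == "__main__":
    # Hypothetical raw P-value tested within a family of 8 tests (as for Fig. 2a).
    p_corr = bonferroni_adjust(0.004, 8)
    print(p_corr, significance_stars(p_corr))
```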
Data usage acknowledgements
We thank the International Genomics of Alzheimer’s Project (IGAP) for providing summary results data for AD common variant analysis. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in analysis or writing of this report. IGAP was made possible by the generous participation of the control subjects, the patients, and their families. The i–Select chips were funded by the French National Foundation on Alzheimer’s disease and related disorders. EADI was supported by the LABEX (laboratory of excellence program investment for the future) DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2 and the Lille University Hospital. GERAD was supported by the Medical Research Council (Grant n° 503480), Alzheimer’s Research UK (Grant n° 503176), the Wellcome Trust (Grant n° 082604/2/07/Z) and German Federal Ministry of Education and Research (BMBF): Competence Network Dementia (CND) grant n° 01GI0102, 01GI0711, 01GI0420. CHARGE was partly supported by the NIH/NIA grant R01 AG033193 and the NIA AG081220 and AGES contract N01–AG–12100, the NHLBI grant R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and Erasmus University. ADGC was supported by the NIH/NIA grants: U01 AG032984, U24 AG021886, U01 AG016976, and the Alzheimer’s Association grant ADGC–10–19672.
We thank the research participants and employees of 23andMe for the sharing of summary statistics for MDD common variant analysis.
Author contributions
Conceptualization, AP, ES
Methodology, AE, DB, AP, ES
Software/Data curation, DDA, AP
Formal analysis/Investigation, BS, DDA, MC, TS, ER, YZ, GC, SL, AFP, DW, AP, ES
Writing – Original Draft, BS, DDA, MC, YZ, DW, AP, ES
Writing – Review & Editing, BS, DDA, MC, ER, MOD, MO, AE, DB, DW, AP, ES
Visualization, BS, DDA, MC, YZ, DW, AP, ES
Supervision, AP, ES
Funding acquisition, AH, MOD, MO, AP, ES
Competing Interests statement
DDA, YZ, AH, WG, MOD, MO, AP are supported by a collaborative research grant from Takeda (Takeda played no part in the conception, design, implementation, or interpretation of this study). The other authors report no financial relationships with commercial interests.
Correspondence and requests for materials
should be addressed to E.S. or A.J.P.
Data availability
RNAseq data generated by this study have been deposited in the European Nucleotide Archive with the accession number PRJEB35773. Additional data that support the findings of this study but are not included in the manuscript, figures and supplementary information are available from the corresponding author upon request.
Code availability
All publicly available software used is noted in Methods. The custom R script used to perform GO term over-representation and refinement analyses is available from GitHub (https://github.com/ajp-cdf/Gene-set-over-representation-refinement).
Supplemental Information
Supplementary Table 1. CRISPR/Cas9 off-target validation
Supplementary Table 2. DLG2 unique peptides (LC-MS/MS of day 30 and 60 samples)
Supplementary Table 3. Differential gene expression (KO vs WT and successive WT timepoints)
Supplementary Table 4. GO over-representation analysis (KO vs WT day 30 down-regulated genes)
Supplementary Table 5. GO over-representation analysis (neurogenic transcriptional programs)
Supplementary Table 6. Schizophrenia GWAS enrichment (GO terms over-represented in early-stable-/-)
Supplementary Table 7. GWAS enrichment in GO terms over-represented amongst early-stable-/- and early-increasing-/-
Supplementary Table 8. De novo LoF enrichment in GO terms over-represented amongst early-stable-/- and early-increasing-/-
Supplementary Table 9. Expression of neurogenic programs across human in vivo neurodevelopmental cell-types
Supplementary Discussion
Acknowledgments
This work was supported by Wellcome Trust Strategic Award (100202/Z/12/Z), MRC programme grant (G08005009), MRC Centre grant (MR/L010305/1), Waterloo Foundation ‘Changing Minds’ programme and start-up funding from the Neuroscience and Mental Health Research Institute, Cardiff University. We acknowledge excellent technical support for RNA sequencing from Joanne Morgan (MRC Centre) and MS analysis from Lydia Kiesel (University of Sheffield) and assistance in morphology tracing from Sophie Pocklington. We appreciate excellent general lab support from Emma Dalton, Trudy Workman and Olena Petter. We thank Prof. Meng Li for her advice and Dr. Claudia Tamburini for technical support in the initial stages of the project and Profs. Yves Barde and Lesley Jones for helpful comments on the manuscript and Emily Adair for providing rat primary glial cells. For data usage acknowledgements, see Methods.