Introduction

The aggregation of psychiatric conditions and symptoms in families has long been recognized,1, 2, 3, 4, 5 with more recent genetic analyses suggesting an overlap between a number of disorders.1, 6, 7, 8, 9 Recent studies considering single-nucleotide polymorphism-based genetic correlation demonstrated a marked correlation between schizophrenia (SCZ) and bipolar disorder (BPD) and to a lesser extent between SCZ and autism spectrum disorder,1 suggesting shared genetic etiologies. However, because of limited brain-tissue availability, there have been fewer studies at the level of gene expression. We and others hypothesize that gene expression studies may begin to unravel how genetic correlations may functionally overlap in neuropsychiatric disorders.

In a recent publication, Zhao et al.10 suggested that SCZ and BPD show concordant differential gene expression (R=0.28) and that the genes contributing to this overlap are enriched for genetic association signal in both SCZ and BPD while highlighting several biological pathways.10 Two separate recent studies of gene expression in autism (AUT) have resolved gene expression changes related to altered synaptic and neuronal signaling as well as immunological differences in AUT-affected brains.11, 12 In particular, a marked increase was observed in gene expression related to alternative activation of the innate immune system, or the M2 response in AUT-affected brains, relative to controls.12

Here we set out to analyze RNA-sequencing (RNA-Seq) data in combination from AUT, SCZ and BPD to identify cross-disorder transcriptomic relationships. We highlight the highly correlated nature of the SCZ and AUT transcriptomes, which together demonstrate a downregulation of genes involved in neurotransmission and synapse regulation across the two disorders.

Materials and methods

AUT sample information

RNA-Seq for 104 cortical brain-tissue samples across three brain regions (BA10, BA19 and BA44/45), comprising 57 samples from 40 control subjects and 47 samples from 32 AUT subjects, was previously carried out.12 We note that, as in the initial publication of these data,12 AUT samples harboring copy number variants recurrent in AUT spectrum disorder have not been included in these analyses. Details related to samples, sequencing, quality control and informatics can be found in Gupta et al.12 and are summarized in Supplementary Table 1.

SCZ and BPD sample information

RNA-Seq data were obtained from the Stanley Medical Research Institute (SMRI, http://www.stanleyresearch.org/) consisting of 82 (31 SCZ, 25 BPD and 26 controls) anterior cingulate cortex (BA24) samples. Detailed sequencing information can be found in Zhao et al.10 Sample information for those included in this analysis can be found in Supplementary Table 2.

RNA-Seq, alignment and quality control

Sequencing, alignment, quality control and gene expression estimation for the AUT samples were carried out as previously described.12 The reads from both the AUT and SMRI sequencing were subjected to a common analysis pipeline12 in which quality control of raw sequences included removing both polyA stretches and adaptor sequence contamination using a Python script, ‘cutadapt’ (v1.2.1).13 Sequences were then aligned to the Genome Reference Consortium Human build 37 (GRCh37/hg19) assembly using TopHat2 (refs 14, 15), allowing for only uniquely aligned sequences with fewer than three mismatches to align.

Gene expression estimation and normalization

Gene count estimates were obtained for 62 069 Ensembl gene annotations (GRCh37/hg19) using HTSeq (http://www-huber.embl.de/users/anders/HTSeq/) under an intersection-strict model. Of these, 8856 genes with at least 10 reads across 75% of the SMRI samples were then normalized for gene length and GC content using conditional quantile normalization.16 In the AUT samples, the 13 262 genes previously included for analysis12 were normalized for gene length and GC content using conditional quantile normalization. Outliers were then removed from the conditional quantile normalization-normalized gene expression estimates on a per-gene basis as described previously.17 In either data set, any sample whose gene expression value was more than 2.7 s.d. from the mean of the gene expression was excluded from analysis at that particular gene before linear modeling.

Differential gene expression analysis

Due to the unique experimental design in which multiple brain regions were sequenced from the same individual, AUT gene expression estimates were fit using a linear mixed effects model, with subject ID included as a random intercept term, and case–control status as the primary variable of interest. Age, sex, site of sample collection, brain region and 12 surrogate variables (SVs)18 were included as fixed effects in the model to account for known and unknown covariates. SVs function to remove batch effects and sources of noise in gene expression data by adjusting for unknown or unmodeled sources of variation and are therefore included for analysis.18

SCZ and BPD RNA-Seq data were analyzed using standard linear regression, with case–control status as the primary variable of interest. The known covariates to which we had access and that were included in the analysis by Zhao et al.10 (age, sex, cumulative antipsychotic use, brain pH and post-mortem interval) were incorporated into the model here along with SVs to account for unknown sources of variation.

Because the SCZ and BPD cases share controls, two separate differential gene expression analyses were performed. For the comparison with AUT, all cases (SCZ or BPD) and all controls from the SMRI data set were included in the analysis. Alternatively, when SCZ and BPD were to be compared directly, we employed a strategy similar to how these data were handled previously, in which controls were divided randomly into half.10 One set of controls was then compared with the SCZ cases, whereas the other set of controls was compared with the BPD cases. This procedure was carried out 100 times for each cross-disorder comparison, and the Z-scores (effect size/standard error) were recorded for each gene for each run. The median Z-score for each gene across these 100 runs was then used for analyses comparing SCZ with BPD.

Null differential gene expression analysis

To obtain a null set of differential gene expression values, each of the analyses in the previous section was carried out modeling the data exactly as described above, save for the permutation of case–control status. In AUT data sets, the case–control status was randomized between samples from the same collection sites, as described previously.12 To minimize the possibility of reporting false-positive findings, 1000 null permutations were utilized to determine significance.

Calculating genes differentially expressed across disorders

To determine which genes were differentially expressed across disorders, Z-scores were multiplied across each of the three disorder comparisons (ZSCZ* ZBPD, Z SCZ* ZAUT, ZBPD* ZAUT). Genes with large cross-disorder Z-scores were considered to be differentially expressed across disorders, with significance determined by permutation. For each cross-disorder comparison, the most extreme cross-disorder Z-score for each of these 1000 null permutations was recorded. Of these values, the cross-disorder cutoff for significance (defined at P<0.05) to determine which genes were differentially expressed across disorders was determined by taking the value for which only 5% of the null values were more extreme.

To determine differentially and concordantly expressed genes (DCEGs) common to all three disorders, Z-scores were multiplied for the 2895 genes with Z-scores in the same direction across all three disorders (ZAUT* ZSCZ* ZBPD). As SCZ and BPD are directly compared in the analysis, split-control-generated Z-scores for SCZ and BPD were utilized to account for the shared control samples. To assess significance, the same analysis was carried out with 1000 null permutations as described above.

Calculating the correlation of DCEGs across phenotypes

Pearson’s correlation coefficient (R) was calculated for the Z-scores from each disorder comparison (SCZ–AUT, SCZ–BPD and BPD–AUT) to assess the similarity of genes differentially expressed across disorders. To determine the significance of this correlation, Pearson’s correlation coefficient was calculated after testing each of the 1000 null permutations.

Pathway analysis of DCEGs

Pathway enrichment analysis was carried out on genes differentially expressed across disorders. Gene Ontology (GO) gene sets were downloaded from MsigDB (1466 gene sets, http://www.broadinstitute.org/gsea/msigdb/collections.jsp#C5). For each gene and across all three disease comparisons, Z-scores were summed across disorders using the Stouffer’s method19 and pathways were tested for enrichment (details can be found in Supplementary Methods). Significance was determined empirically by permutation for each cross-disorder comparison (1.51 × 10−4 for AUT–SCZ, 1.72 × 10−4 for AUT–BPD and 4.25 × 10−6 for SCZ–BPD).

As a complementary approach, we utilized the following two open source programs for pathway analysis: WebGestalt (v2, http://bioinfo.vanderbilt.edu/webgestalt/)20, 21 to run a GO analysis22, 23 and Database for Annotation, Visualization and Integrated Discovery (DAVID)24 (v6.7, https://david.ncifcrf.gov/) for functional pathway analysis. As the input for these approaches requires gene lists, we input genes that were differentially expressed (absolute value (Z-score) >2.2) in both disorders of the comparison: (1) SCZ–AUT (191 genes), (2) BPD–AUT (38 genes) and (3) SCZ–BPD (16 genes).

GO analysis used a hypergeometric test for enrichment utilizing the Benjamini–Hochberg method25 for multiple test corrections. GO categories whose adjusted P-values<0.001 were considered to be statistically significantly enriched. For DAVID, gene lists were uploaded and a ‘Functional Annotation Chart’ was generated using default settings. Functional categories whose Bonferroni-adjusted P-value<0.05 were reported as significant.

To ensure that results from these analyses were not biased by the different number of gene input into the pathway analysis, we also carried out the GO and DAVID analyses described above with a fixed number of 191 genes from each cross-disorder comparison.

Enrichment for genetic signal analysis

Genome-wide association study (GWAS) results were downloaded from the Psychiatric Genetic Consortium (http://www.med.unc.edu/pgc/) for AUT, BPD and SCZ.

Gene-based P-values were computed on the summary data for each disorder using FAST (v1.8)26 for the 8856 genes included in the cross-disorder differential gene expression analysis (DGEA). Details for settings used can be found in the Supplementary Methods. To test for enrichment of genetic signal, we first took suggestive genes (gene-based P<0.05) for each individual GWAS (SCZ, BPD and AUT) and compared these to P-values from the DGEA. Data were plotted in a QQ-plot among 100 null permutations to look for enrichment relative to the null data. To ensure that this analysis was not a reflection of the gene-based P-value restriction imposed on the data, a more permissive (P<0.1) and more restrictive (P<0.01) GWAS cutoff were used and the same enrichment analysis carried out.

Code availability

Codes used throughout for data processing, quality control and analysis are available from corresponding author.

Results

Sample summary

Of the 105 samples in the SMRI array collection, 82 cortical brain samples (BA24) were sequenced and included for analysis (31 SCZ, 25 BPD and 26 controls). To accompany these data, 104 AUT samples from three cortical brain regions (BA10, BA19 and BA44/45) were included for analysis, composed of 57 control and 47 AUT samples. A summary of sample statistics is provided in Table 1 with detailed sample information in Supplementary Tables 1 and 2 for AUT and SMRI data, respectively. Further sample information can be found in the original publications.10, 12

Table 1 Sample summary

Genes differentially expressed across SCZ, BPD and AUT

Nine genes were differentially expressed (P<0.05) in both SCZ and AUT. None were significant when comparing BPD to SCZ, and one gene reached significance in the AUT–BPD comparison (Table 2 and Supplementary Tables 3 and 4). We note that the single gene differentially expressed between AUT–BPD, IQSEC3, is significant in both AUT–SCZ and AUT–BPD comparisons. The relatively large Z-scores in SCZ (Z=−3.59) and BPD (Z=−3.46) suggest that this result is not simply driven by the altered gene expression in AUT alone.

Table 2 Genes significantly differentially expressed across disorders

Differentially expressed genes (DEGs) across all three disorders were identified in a joint analysis of genes whose direction of effect was consistent across all three disorders (ZAUT*ZSCZ*ZBPD). Two genes, IQSEC3 (Z=−35.45, P=0.001) and COPS7A (Z=−22.52, P=0.017), are transcriptome-wide significant (P<0.05, absolute value (ZAUT*ZSCZ*ZBPD)>19.56), indicating a common role for altered gene expression of these genes across all three neuropsychiatric disorders (Table 2 and Supplementary Figure 1). We note that these two genes, IQSEC3 and COPS7A, are syntenic (12p13.33 and 12p13.31, respectively) with their expression being markedly correlated in both the SMRI and AUT data sets (R=0.41 and R=0.70, respectively; Supplementary Figure 2).

Correlation in gene expression across SCZ, BPD and AUT

The transcriptomic relationship across disorders and correlation of test statistics (Z-scores) was investigated. SCZ–AUT demonstrated the most significant correlation (R=0.298, P<0.001). SCZ–BPD also demonstrated a positive correlation (R=0.11). This level of correlation was neither significant (P=0.41) nor as high as previously reported (R=0.28).10 Similarly, the correlation between AUT and BPD was minimal and did not differ significantly from the null (R=0.06, P=0.25; Figure 1, Supplementary Figure 3).

Figure 1
figure 1

Correlation of cross-disorder differential gene expression. Z-scores for each cross-disorder comparison ((a) AUT–SCZ (autism–schizophrenia), (b) AUT–BPD (AUT–bipolar disorder) and (c) SCZ–BPD) are plotted. The best fit line is in red. Pearson’s Correlation Coefficient (R) is included on the graph, quantifying the level of correlation between the transcriptomes of each cross-disorder comparison.

To explore the discrepancy between the correlation reported here for SCZ and BPD and that previously reported, we carried out the same analysis without the inclusion of SVs in the model. The failure to include unknown covariates in the model led to a marked increase in the correlation between SCZ and BPD (R=0.50), suggesting that the previously reported correlation between these disorders may have been influenced by hidden structure in the data (Supplementary Figure 4).

Pathway enrichment analyses of genes differentially expressed across disorders

Combined pathway analysis utilizing lists of genes differentially expressed across disorders (absolute value (Z-score)>2.2 in both disorders) was carried out using both GO enrichment and DAVID pathway analysis. For this analysis, 191 DEGs for AUT–SCZ, 38 for AUT–BPD and 16 for SCZ–BPD met these criteria. DAVID pathway analysis highlighted the role of neuron projection development (PBonferroni=0.012) in those genes differentially expressed in both AUT and SCZ (Table 3). Similarly, when these genes were characterized by GO, there was a clear abundance of altered gene expression in neuronal and synapse-related GOs (Figure 2). Further, when these DEGAUT–SCZ genes were split up into those either concordantly up- or downregulated in both disorders, 106 genes differentially downregulated in both disorders were driving the GO enrichments, with no contribution from the 69 genes upregulated in both disorders (Supplementary Figure 5). As for AUT–BPD comparisons, there were no enrichments detected for any gene ontologies, and the only emergent DAVID pathway was genes related to phosphoproteins (PBonferroni=1.2 × 10−4; Table 3). Similarly, no GO or DAVD pathways were found to be significant for DEGSCZ–BPD. Substantially, similar results were observed when the number of genes from each cross-disorder comparison input into the pathway analysis was fixed rather than imposing a Z-score cutoff (Supplementary Table 5 and Supplementary Figures 6). Finally, we found that the number of cross-disorder discordant DEGs (upregulated in one disorder but downregulated in the other) differs across the three comparisons, such that there are fewer discordant cross-disorder DEGs (16/191, 8.4%) in the comparison between SCZ and AUT than in the comparison between AUT and BPD (76/191, 39.8%) or between SCZ and BPD (38/191, 19.9%), further supporting the transcriptomic similarities between AUT and SCZ.

Table 3 DAVID pathway analysis for cross-disorder DEGs
Figure 2
figure 2

Gene Ontology (GO) analysis of cross-disorder DEGAUT–SCZ. Genes differentially expressed in both autism (AUT) and schizophrenia (SCZ; absolute (Z-score)>2.2) were analyzed for ontological enrichment of biological processes, developmental processes and cellular component. Onotological categories with at least five genes and an adjusted P-value<0.001 are highlighted in red. This tree highlights the role of nerve impulse transmission, synaptic transmission and neurotransmitter transport in those genes differentially expressed in both AUT and SCZ.

Traditional pathway analysis requires a significance cutoff for the gene input for analysis. To avoid a potential bias by choosing an arbitrary cutoff, we used a Z-score-based approach (see Methods) and identified gene enrichment of DCEGs common to all three disorder comparisons using the GO data from MSigDB. Three GO pathways—each of which indicated some enrichment for altered gene expression in transporter genes—were enriched for DEGs in both AUT and SCZ. No pathways were study-wide significant in the other two disorder comparisons (Supplementary Table 6).

Cross-disorder DEG enrichment in association signals

To test whether genes differentially expressed across disorders were enriched for genetic associations, we compared cross-disorder DGEA results to gene-level GWAS results. We first directly compared gene-based GWAS P-values (P<0.05) from each individual GWAS (AUT, SCZ and BPD) to P-values from the cross-disorder DGEA (AUT–SCZ, AUT–BPD and SCZ–BPD). No comparison was identified that would suggest any enrichment in signal overlap with respect to the null (Supplementary Figure 8). Three additional P-value cutoffs (P<0.1, P<0.01 and P<1) demonstrated that neither these null findings nor the inflation seen are a function of the gene-based P-value cutoff imposed on the data (Supplementary Figures 9–11, Supplementary Table 7). Likewise, there were no enrichments for cross-disorder DEGs seen in these analyses relative to the null. Finally, loss of function variants have recently been reported in a number of AUT studies;27, 28, 29, 30, 31, 32 however, Gupta et al. demonstrated that these gene expression data are neither enriched for the findings from the exome studies nor for SVs.12 Accordingly, these lists of variants have not been included in these analyses.

Discussion

To our knowledge, this is the first study to combine next-generation sequencing gene expression analyses across AUT, SCZ and BDP to assess the transcriptomic relationship and how gene expression relates to GWAS findings. We report that, at the transcriptome level, AUT and SCZ demonstrate a highly overlapping gene expression profile. The cross-disorder DEGs between AUT and SCZ highlight a shared relationship in synapse and projection formation, suggesting a role for neuronal development underlying the correlation. Further, despite the lack of global significant differential transcriptomic correlation between either BPD and SCZ or AUT and BPD, we highlight two genes, IQSEC3 and COPS7A, for their consistent downregulation across all three disorders, and support further investigation into these specific genes’ expression and function to better understand their role in neuropsychiatric disorders. Finally, we report that the genes differentially expressed across disorders were not enriched in genetic association signals for AUT, SCZ or BPD.

Correlations in differential gene expression across disorders highlight similarities between AUT and SCZ

After modeling the data for each individual-disorder comparison relative to their controls, the cross-disorder comparison demonstrated that SCZ and AUT share a similarly altered transcriptome (P<0.001), whereas AUT–BPD and SCZ–BPD (P=0.25 and P=0.41, respectively) do not show a significant correlation (Figure 1; Supplementary Figure 3). We note that the lack of significant correlation between BDP–SCZ in our analysis is in conflict with a previous report,10 and is likely because of our inclusion of SVs to account for unknown sources of variation, suggesting that the previously reported analysis of these data is overstated (see Supplementary Discussion). Further, although these data do not directly support transcriptomic overlap between SCZ and BPD, this is likely reflective of the shared control design of the experiment. This experimental design results in a smaller effective sample size and a study underpowered to assess overlap between these two disorders. Given the genetic relationship between these disorders (where SCZ–BPD>AUT–SCZ>AUT–BPD),1 future work utilizing a larger sample for analysis may likely demonstrate a shared transcriptomic profile between SCZ and BPD; however, these data do not.

Analyzing the pathways in which DEGs in both SCZ and AUT were involved, we found that the genes differentially expressed in AUT and SCZ were enriched for neuron projection development (P=0.012, Table 3). In addition, there was a clear enrichment for genes involved in synaptic and neuronal processes. The other two nonsignificant cross-disorder comparisons (AUT–BPD and SCZ–BPD) failed to demonstrate any enrichment for biological process ontology, even when controlling for the number of cross-disorder DEGs, further supporting the conclusion that differential transcriptomic correlation is biologically relevant between SCZ and AUT but is not observed in the other two cross-disorder comparisons (Supplementary Figure 6). When the DEGs across AUT and SCZ were broken down into those concordantly upregulated versus those concordantly downregulated, the enrichment in GO was only present in those genes concordantly downregulated (Supplementary Figure 5), suggesting that these synaptic and neuronal alterations were a result of decreased brain expression in both disorders.

Finally, in assessing which specific genes were differentially expressed across disorders, we identified IQSEC3 and COPS7A as differentially expressed in all three disorders (Table 2). IQSEC3 (KIAA1110) is a protein-coding gene that has been shown to be specifically expressed in the human adult brain with particularly high levels in the human cortex.33 IQSEC3 has been suggested to act as a guanine exchange factor for ARF1 in endocytosis,33 and ARF1 critically regulates actin dynamics in neurons and synaptic strength and plasticity, potentially aligning with pathways previously implicated in AUT, SCZ and BPD. COPS7A is expressed broadly across tissues,34 and encodes part of the COP9 signalosome, a multisubunit protease with a role in regulating the ubiquitin–proteasome pathway.35

Differences in genetic variation not explained by overlapping gene expression profiles

We report no enrichment for significant cross-disorder DEGs among GWAS signal in any of the comparisons (Supplementary Figure 8) relative to the null. These findings suggest either that (1) alterations at the genetic level do not largely manifest themselves in altered gene expression concordantly across these disorders or (2) that primary genetic defects do not result in altered gene expression across disorders at the time points measured but could, perhaps, alter gene expression at other time points, such as during development or (3) the effects of these genetic perturbations are small and that increased sample sizes will be required to detect these slight differences in cross-disorder altered gene expression. Regardless, large differences in gene expression across these disorders appear to be independent of known genetic variation in each of these disorders.

There were a number of limitations associated with our observations. As the analyses combine data across two studies with notable design differences in each (shared controls in the SMRI data, multiple brain regions from the same individual in the AUT data, limited ability to detect lowly expressed genes and comparison of different cortical brain regions), there was certainly variation unrelated to the disease state introduced into the differential gene expression analyses. However, we have controlled for this to the best of our ability by accounting for unknown covariates in all analyses and by determining all levels of significance relative to null permutations. Although we have controlled for the differences in experimental design in our analysis, we note that the reported overlap in AUT and SCZ was significant (P<0.001), despite the fact that different cortical brain regions were studied in the two data sets. Owing to this limitation, we hypothesize that our observed correlation between AUT and SCZ may underestimate the true transcriptomic correlation and that the similarities may be even more pronounced between AUT and SCZ, had the same brain regions been studied. Similarly, sequencing depths in these data sets are lower than many RNA-Seq data sets currently being published. Thus, whereas lowly expressed genes are not well-estimated here, their omission from analysis would only lead to false negatives—or genes missing from overlap. This does not detract for the findings, herein, but simply acknowledges that some genes may not be included in the analysis, herein. Conversely, we acknowledge that our power to detect correlation between SCZ and BPD is limited because of the smaller effective sample size, a consequence of the shared control design of the experiment and that, given a larger sample size, transcriptomic correlation between these two disorders may likely become evident and reflective of the known genetic relationships.1

With future studies employing larger sample sizes and more powerful characterizations, we will gain a better understanding of the transcriptomic relationships that are common and disparate among neuropsychiatric disorders. Besides providing context for how the altered genetic landscape of each disorder affects the brain, we hope that identification of common aspects underlying susceptibility might be novel targets to therapeutically address the underlying pathogenic mechanisms.