Abstract
Autism spectrum disorders (ASD) are a group of related neurodevelopmental diseases displaying significant genetic and phenotypic heterogeneity1–4. Despite recent progress in understanding ASD genetics, the nature of phenotypic heterogeneity across probands remains unclear5,6. Notably, likely gene-disrupting (LGD) de novo mutations affecting the same gene often result in substantially different ASD phenotypes. Nevertheless, we find that truncating mutations that affect the same exon frequently lead to strikingly similar intellectual phenotypes in unrelated ASD probands. Analogous patterns are observed for two independent proband cohorts and several other important ASD-associated phenotypes. These results suggest that exons, rather than genes, often represent a unit of effective phenotypic impact for truncating mutations in autism. The observed phenotypic effects are likely mediated by nonsense-mediated decay (NMD) of splicing isoforms, with autism phenotypes usually triggered by relatively mild (15-30%) decreases in overall gene dosage. We find that exons with biases towards prenatal and postnatal expression preferentially contribute to ASD cases with lower and higher IQ phenotypes, respectively. We further demonstrate that LGD mutations in the same exon usually lead to similar expression changes across human tissues. Therefore, analogous phenotypic patterns may also be observed in other genetic disorders.
In this study, we focused on severely damaging, so-called likely gene-disrupting (LGD) mutations, which include nonsense, splice site, and frameshift variants. We used genetic and phenotypic data, including exome de novo mutations and corresponding phenotypes of ASD probands7, for more than 2,500 families from the Simons Simplex Collection (SSC). De novo LGD mutations are observed at significantly higher rates in SSC probands compared to unaffected siblings8,9. This demonstrates a substantial contribution of these mutations to disease etiology in simplex ASD families8, i.e. families with only a single affected child among siblings. In this paper, we primarily considered the impact of de novo LGD mutations on several well-studied intellectual phenotypes: full-scale (FSIQ), nonverbal (NVIQ), and verbal (VIQ) intelligence quotients8,10,11. Notably, these scores are standardized by age and normalized across a broad range of phenotypes7.
We first investigated the variability of intellectual phenotypes associated with de novo LGD mutations in the same gene. The IQ differences between probands with mutations in the same gene were slightly smaller than the differences between all pairs of probands. Specifically, the mean pairwise difference for probands with mutations in the same gene was 25.7 NVIQ points (~12% smaller compared to all pairs of ASD probands, Mann-Whitney U one-tail test P = 0.14; Supplementary Table 1). We next explored whether LGD mutations at similar locations within the same gene resulted, on average, in more similar phenotypes (Supplementary Fig. 1). Indeed, IQ differences between probands with LGD mutations ≤ 1000 base pairs apart were significantly smaller than differences between probands with more distant mutations; ≤ 1 kbp NVIQ average difference 10.4 points; > 1 kbp average difference 28.6 points (MWU one-tail test P = 0.005). However, across the entire range of nucleotide distances between LGD mutations, we did not observe either a significant correlation or a monotonic relationship between IQ differences and mutation proximity (NVIQ Spearman’s ρ = 0.1, P = 0.4; Mann-Kendall one-tail trend test P = 0.5).
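The comparison of same-gene pairs with all pairs of probands can be illustrated with a minimal sketch. The gene labels and scores below are hypothetical toy values, not SSC data, and the helper names are introduced here only for illustration; the statistical test itself (Mann-Whitney U) is not reimplemented.

```python
from itertools import combinations
from statistics import mean

def pairwise_abs_diffs(scores):
    """Absolute score differences for all unordered pairs of probands."""
    return [abs(a - b) for a, b in combinations(scores, 2)]

def same_gene_diffs(probands):
    """probands: list of (gene, score). Differences only for pairs sharing a gene."""
    by_gene = {}
    for gene, score in probands:
        by_gene.setdefault(gene, []).append(score)
    diffs = []
    for scores in by_gene.values():
        diffs.extend(pairwise_abs_diffs(scores))
    return diffs

# Hypothetical toy data (gene label, NVIQ-like score); not taken from the study.
toy = [("GENE_A", 60), ("GENE_A", 90), ("GENE_B", 100), ("GENE_B", 104), ("GENE_C", 75)]

within_gene = same_gene_diffs(toy)
all_pairs = pairwise_abs_diffs([score for _, score in toy])

print("mean same-gene difference:", mean(within_gene))
print("mean all-pairs difference:", mean(all_pairs))
```

The two lists of differences would then be passed to a one-tailed rank test to obtain P values of the kind reported above.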
To explain the observed patterns of phenotypic similarity, we next considered the exon-intron structure of target genes. Specifically, we investigated truncating mutations affecting the same exon in unrelated ASD probands; we took into account LGD mutations in the exon’s coding sequence as well as disruptions of the exon’s flanking canonical splice sites, since such splice site mutations should affect the same transcript isoforms (Supplementary Fig. 2). Interestingly, the analysis of 16 unrelated ASD probands (8 pairs) with such mutations showed that they have strikingly more similar phenotypes (Fig. 1, red bars) compared to probands with LGD mutations in the same gene (Fig. 1, dark green bars); same exon FSIQ/NVIQ/VIQ average IQ difference 8.9, 8.3, 17.3 points, same gene average difference 28.3, 25.7, 34.9 points (Mann-Whitney U one-tail test P = 0.003, 0.005, 0.016). Because of well-known gender differences in autism susceptibility11–13, we also compared IQ differences between probands of the same gender harboring truncating mutations in the same exon (Fig. 1, orange bars) to IQ differences between probands of different genders; same gender FSIQ/NVIQ/VIQ average difference 5.4, 7.2, 12.2; different gender average difference 14.7, 10, 25.7 (MWU one-tail test P = 0.04, 0.29, 0.07). Thus, stratification by gender further decreases the phenotypic differences between probands with LGD mutations in the same exon. Notably, the phenotypic similarity only extended to mutations in the same exon. The average IQ differences between probands with LGD mutations in neighboring exons were not significantly different compared to mutations in non-neighboring exons (MWU one-tail test P = 0.6, 0.18, 0.8; Supplementary Fig. 3). The observed effects are also specific to LGD mutations; probands with either synonymous (P = 0.93, 0.97, 0.95; Supplementary Fig. 4) or missense (P = 0.8, 0.5, 0.8; Supplementary Fig. 5) mutations in the same exon were as phenotypically diverse as random pairs of ASD probands.
We next explored the relationship between phenotypic similarity and the proximity of truncating mutations in the corresponding protein primary sequences. This analysis revealed that probands with LGD mutations in the same exon often had similar IQs, despite being affected by truncating mutations separated by scores to hundreds of amino acids in protein sequence (Fig. 2a; Supplementary Fig. 6). Notably, probands with LGD mutations in the same exon were more phenotypically similar than probands with LGD mutations separated by comparable amino acid distances in the same protein (NVIQ distance-matched permutation test P = 0.002; Supplementary Fig. 7). We also investigated whether de novo mutations truncating a larger fraction of protein sequences resulted, on average, in more severe intellectual phenotypes. The analysis showed no significant correlations between the fraction of truncated protein and the severity of intellectual phenotypes (Fig. 2b); NVIQ Pearson’s R = 0.05 (P = 0.35; Supplementary Fig. 8). We also did not find any significant biases in the distribution of truncating de novo mutations across protein sequences compared with the distribution of synonymous de novo mutations (Kolmogorov-Smirnov two-tail test P = 0.9; Supplementary Fig. 9). It is possible that the lack of correlation between phenotypic impact and the fraction of truncated protein is due to signal averaging across different proteins. Therefore, for genes with recurrent mutations, we used a paired test to investigate whether truncating a larger fraction of the same protein leads to more severe phenotypes. This analysis also showed no significant differences (average NVIQ difference 0.24 points; Wilcoxon signed-rank one-tail test P = 0.44). Using the Pfam database14 we also investigated whether mutations that truncate the same protein domain lead to more similar phenotypes. We found that mutations in different exons, even when truncating the same protein domain, resulted in phenotypes as different as those due to random LGD mutations in the same gene (average NVIQ difference 28.1 points; Supplementary Fig. 10).
The results presented above suggest that it is the occurrence of de novo LGD mutations in the same exon, rather than simply the proximity of mutation sites in nucleotide or amino acid sequence, that leads to similar phenotypic consequences. To explain this observation, we hypothesized that truncating mutations in the same exon usually affect, due to nonsense-mediated decay (NMD)15, the expression of the same splicing isoforms. Therefore, such mutations should lead to similar functional impacts through similar effects on overall gene dosage and the expression levels of affected transcriptional isoforms. To explore this mechanistic model, we used data from the Genotype-Tissue Expression (GTEx) Consortium16,17, which collected exome sequencing and human tissue-specific gene expression data from hundreds of individuals and across multiple tissues. Using ~4,400 LGD variants in coding regions and corresponding RNA-seq data, we compared the expression changes resulting from LGD variants in the same and different exons of the same gene (Fig. 3). For each truncating variant, we analyzed allele-specific read counts18 and then used an empirical Bayes approach to infer the effects of NMD on gene expression (see Methods). This analysis demonstrated that the average gene dosage changes were more than 7 times more similar for individuals with LGD variants in the same exon compared to individuals with LGD variants in different exons of the same gene (Fig. 3a); 2.2% versus 17.3% average difference in overall gene dosage decrease (Mann-Whitney U one-tail test P < 2×10⁻¹⁶). Moreover, by analyzing GTEx data for each tissue separately, we consistently found drastically more similar dosage changes resulting from LGD variants in the same exons (Fig. 3a).
Distinct splicing isoforms often have different functional properties19,20. Consequently, LGD variants may affect phenotypes not only through NMD-induced changes in overall gene dosage, but also by altering the expression levels of different splicing isoforms. To analyze changes in the relative expression of specific isoforms, we used GTEx variants and calculated the angular distance metric between vectors describing isoform-specific expression changes (see Methods). This analysis confirmed that changes in relative isoform expression are significantly (~5 fold) more similar for LGD variants in the same exon compared to variants in different exons (Fig. 3b); 0.1 versus 0.46 average angular distance (Mann-Whitney U one-tail test P < 2×10⁻¹⁶). The results were also consistent across tissues (Fig. 3b). Overall, the analyses of GTEx data demonstrate that the changes in expression due to truncating variants in the same exon are indeed substantially more similar than the changes due to variants in different exons of the same gene.
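As an illustration of the angular distance metric, the sketch below computes the angle between two vectors of isoform-specific expression changes. The exact normalization is given in the Methods and is not reproduced here; dividing the angle by π, so that the distance lies in [0, 1], is an assumption made for this example, and the function name is hypothetical.

```python
import math

def angular_distance(u, v):
    """Angle between two isoform expression-change vectors, scaled to [0, 1].

    Assumption: the angle (in radians) is divided by pi; the study's own
    normalization may differ.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine) / math.pi

# Hypothetical expression-change vectors over three isoforms of one gene.
same_direction = angular_distance([-0.2, -0.1, 0.0], [-0.4, -0.2, 0.0])
orthogonal = angular_distance([1.0, 0.0], [0.0, 1.0])
```

Because the metric depends only on direction, two variants that reduce the same isoforms in the same proportions have a distance near zero even if the overall magnitude of the change differs.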
Truncating variants in highly expressed exons should lead, through NMD, to relatively larger decreases in overall gene dosage. To confirm this hypothesis, we used RNA-seq data from GTEx to quantify the relative exon expression for each exon harboring a truncating variant. To calculate relative exon expression, we normalized GTEx expression values of each exon by GTEx expression values of the corresponding gene. Indeed, we observed a strong correlation between the relative expression levels of exons harboring LGD variants and the corresponding changes in overall gene dosage (Fig. 4; Pearson’s R = 0.69, P < 2×10⁻¹⁶; Spearman’s ρ = 0.81, P < 2×10⁻¹⁶; see Methods).
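The normalization and correlation described above can be sketched as follows. The input values are hypothetical and the function names are introduced only for this example; the sketch shows the exon-to-gene ratio and a plain Pearson coefficient, not the full GTEx processing described in the Methods.

```python
import math

def relative_exon_expression(exon_expr, gene_expr):
    """Exon expression normalized by the expression of its gene."""
    if gene_expr <= 0:
        raise ValueError("gene expression must be positive")
    return exon_expr / gene_expr

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (exon expression, gene expression, observed dosage decrease) triples.
toy = [(5.0, 20.0, 0.05), (10.0, 20.0, 0.12), (18.0, 20.0, 0.20)]
rel_expr = [relative_exon_expression(e, g) for e, g, _ in toy]
dosage_decrease = [d for _, _, d in toy]
r = pearson_r(rel_expr, dosage_decrease)
```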
Notably, NMD-induced dosage changes may mediate the relationship between the expression levels of target exons and the corresponding phenotypic effects of truncating mutations. To investigate this relationship we used the BrainSpan dataset21, which contains exon-specific expression from human brain tissues. The BrainSpan data allowed us to estimate expression dosage changes resulting from LGD mutations in different exons of ASD-associated genes (see Methods). Notably, it is likely that there is substantial variability in the sensitivity of intellectual phenotypes to dosage changes across human genes. Therefore, to quantify the IQ sensitivities for genes with recurrent truncating mutations in SSC, we considered a simple linear dosage model. Specifically, we assumed that changes in probands’ IQs are linearly proportional to decreases in gene dosage; we further assumed the average neurotypical IQ (100) for wild type gene dosage. We restricted our analysis to LGD mutations predicted to cause NMD-induced gene dosage changes, i.e. we excluded mutations within 50 bp of the last exon junction complex22. Using this model, we estimated the sensitivity of IQs to dosage changes for each gene with recurrent truncating ASD mutations (Supplementary Fig. 11; see Methods). Calculated in this way, the IQ sensitivity for a gene is equal to the estimated phenotypic effect of a truncating mutation in an exon with average expression.
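A minimal sketch of the linear dosage model follows. It assumes that IQ equals 100 at wild type dosage and decreases in proportion to the predicted dosage decrease, as stated above. Estimating the per-gene sensitivity by least squares through the origin is an assumption made for this example; the fitting procedure actually used is described in the Methods. All names and numbers are hypothetical.

```python
NEUROTYPICAL_IQ = 100.0  # assumed IQ at wild type gene dosage

def fit_sensitivity(dosage_decreases, iqs):
    """Per-gene IQ sensitivity s in the model IQ = 100 - s * dosage_decrease.

    Assumption: s is fit by least squares with no intercept term.
    """
    numerator = sum(d * (NEUROTYPICAL_IQ - iq) for d, iq in zip(dosage_decreases, iqs))
    denominator = sum(d * d for d in dosage_decreases)
    if denominator == 0:
        raise ValueError("at least one nonzero dosage decrease is required")
    return numerator / denominator

def predict_iq(sensitivity, dosage_decrease):
    """Predicted IQ for a mutation causing the given fractional dosage decrease."""
    return NEUROTYPICAL_IQ - sensitivity * dosage_decrease

# Hypothetical gene with two truncating mutations in different exons.
s = fit_sensitivity([0.1, 0.2], [90.0, 80.0])
predicted = predict_iq(s, 0.15)
```

Dividing each proband's observed IQ decrease by the fitted sensitivity of the affected gene gives a normalized phenotypic effect that can be compared across genes.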
The aforementioned model revealed that mutation-induced dosage changes are indeed strongly correlated with the normalized phenotypic effects; NVIQ Pearson’s R = 0.63, permutation test P = 0.02 (Fig. 5a; Supplementary Fig. 12); very weak correlations were obtained using randomly permuted data, i.e. when truncating mutations were randomly re-assigned to different exons in the same gene (average NVIQ Pearson’s R = 0.18; see Methods). Since the heritability of intelligence is known to significantly increase with age23, we also investigated how the results depend on the age of probands. When we restricted our analysis to the older half of probands in SSC (median age 8.35 years), the strength of the correlations between the predicted dosage changes and normalized phenotypic consequences increased further; NVIQ Pearson’s R = 0.75; permutation test P = 0.019 (Fig. 5b; Supplementary Fig. 13). The strong correlations between target exon expression and intellectual ASD phenotypes suggest that, when gene-specific effects are taken into account, a significant fraction (30%-40%) of the relative phenotypic effects of de novo LGD mutations can be explained by the resulting dosage changes in target genes.
Next, we evaluated the ability of our linear dosage model to explain the effects of LGD mutations on non-normalized IQs. For each gene with multiple truncating mutations, we used our regression model to perform leave-one-out predictions of each mutation’s effect on proband IQ scores (Fig. 5c, inset; see Methods). Notably, for LGD mutations that trigger NMD, the inference errors of the dosage model were significantly smaller than the differences in IQ scores between probands with LGD mutations in the same gene; NVIQ median prediction error 11.0 points; same gene median IQ difference 22.0 points; MWU one-tail test P = 0.014 (Fig. 5c; Supplementary Fig. 14). The inference based on probands of the same gender had significantly smaller errors compared to inferences based on probands of the opposite gender, confirming functional differences in ASD genetics between genders; same gender NVIQ median error 9.1 points; different gender median error 19.9 points (MWU one-tail test P = 0.018). Moreover, the inference errors decreased for older probands; for example, for probands older than 12 years, median NVIQ error 7.6 points (Fig. 5c, Supplementary Fig. 14 and 15).
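The leave-one-out procedure can be sketched as below. For each mutation in a gene, the sensitivity is re-estimated from the remaining mutations and used to predict the held-out proband's IQ. The least-squares fit through the origin and the baseline IQ of 100 follow the assumptions of the linear dosage model; the fitting details and all names and values here are illustrative rather than the exact regression used in the Methods.

```python
NEUROTYPICAL_IQ = 100.0  # assumed IQ at wild type gene dosage

def leave_one_out_errors(dosage_decreases, iqs):
    """Absolute prediction error for each held-out mutation of one gene."""
    errors = []
    for i in range(len(iqs)):
        train_d = dosage_decreases[:i] + dosage_decreases[i + 1:]
        train_iq = iqs[:i] + iqs[i + 1:]
        denominator = sum(d * d for d in train_d)
        if denominator == 0:
            continue  # sensitivity cannot be estimated without dosage variation
        # Assumed fit: least squares through the origin on the training mutations.
        sensitivity = sum(
            d * (NEUROTYPICAL_IQ - iq) for d, iq in zip(train_d, train_iq)
        ) / denominator
        predicted = NEUROTYPICAL_IQ - sensitivity * dosage_decreases[i]
        errors.append(abs(predicted - iqs[i]))
    return errors

# Hypothetical gene with two truncating mutations.
errors = leave_one_out_errors([0.1, 0.2], [90.0, 70.0])
```

The median of such errors across genes is the quantity compared above with the median IQ difference between probands carrying mutations in the same gene.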
Given that relative exon usage substantially changes across neural development21,24, we next investigated the relationship between developmental profiles of exon expression and ASD phenotypes. To that end, we sorted exons from genes harboring LGD mutations8 into four groups (quartiles) based on their developmental expression bias; the developmental bias was calculated as the fold change between prenatal and postnatal exon expression levels (Fig. 6a). We then analyzed the enrichment of LGD mutations in each exon group (see Methods). Notably, compared to exons with no substantial developmental bias, we found significant enrichment of LGD mutations not only in exons with a strong prenatal bias (binomial one-tail test P = 8×10⁻³, Relative Rate = 1.33), but also in exons with postnatal biases (P = 0.018, RR = 1.31) (Fig. 6b). To understand the origin of the observed biases, we stratified probands into lower (≤ 70) and higher IQ (> 70) cohorts (Fig. 6c). This analysis demonstrated that while LGD mutations associated with lower IQs were strongly enriched only in prenatally biased exons (binomial one-tail test P = 6×10⁻³, RR = 1.62), mutations associated with higher IQs displayed enrichment exclusively in postnatally biased exons (P = 0.05, RR = 1.27). These results reveal that mutations in exons with biases towards prenatal and postnatal expression preferentially contribute to ASD cases with lower and higher IQ phenotypes, respectively. Notably, the observed exon developmental biases for LGD mutations are not simply driven by biases at the gene level, as mutations associated with both higher and lower IQ phenotypes showed enrichment exclusively towards genes with prenatally biased expression (Supplementary Fig. 16).
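The grouping of exons by developmental bias can be illustrated with a short sketch. The text defines the bias as the fold change between prenatal and postnatal exon expression; taking the log2 of that ratio and adding a pseudocount to avoid division by zero are assumptions made for this example, as is the rank-based quartile assignment. All names and values are hypothetical.

```python
import math

def developmental_bias(prenatal, postnatal, pseudocount=1.0):
    """log2 fold change of prenatal over postnatal exon expression.

    Assumption: log2 scale and pseudocount are illustrative choices.
    Positive values indicate a prenatal bias, negative values a postnatal bias.
    """
    return math.log2((prenatal + pseudocount) / (postnatal + pseudocount))

def assign_quartiles(values):
    """Rank-based quartile index (0 = lowest, 3 = highest) for each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    quartiles = [0] * len(values)
    for rank, index in enumerate(order):
        quartiles[index] = min(3, rank * 4 // len(values))
    return quartiles

# Hypothetical bias values for eight exons.
biases = [0.5, -2.0, 3.0, 1.0, -1.0, 2.0, 0.0, -3.0]
groups = assign_quartiles(biases)
```

Under this convention, quartile 3 holds the most prenatally biased exons and quartile 0 the most postnatally biased ones; mutation counts per group can then be compared with the counts in the groups without substantial bias.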
Although we primarily analyzed the impact of autism mutations on intellectual phenotypes, similar dosage and isoform expression changes in affected genes may also lead to analogous patterns for other quantitative ASD phenotypes25,26. Indeed, for LGD mutations predicted to lead to NMD, we observed similar results for several other key phenotypes. Specifically, probands with truncating mutations in the same exon exhibited more similar adaptive behavior abilities compared to probands with mutations in the same gene (Fig. 7a, Supplementary Fig. 17); Vineland Adaptive Behavior Scales (VABS)27 composite standard score difference 4.7 versus 12.1 points (Mann-Whitney U one-tail test P = 0.017). In contrast, VABS differences between probands with truncating mutations in the same gene were not significantly smaller than for randomly paired probands (Fig. 7a, Supplementary Fig. 17); 12.1 versus 13.7 points (11% smaller; MWU one-tail test P = 0.23). Probands with truncating mutations in the same exon also displayed more similar motor skills; the Purdue Pegboard Test, 1.2 versus 3.0 for the average difference in normalized tasks completed with both hands (MWU one-tail test P = 0.02; Supplementary Fig. 18; see Methods). Coordination scores in the Social Responsiveness Scale questionnaire were also more similar in probands with mutations in the same exon; 0.6 versus 1.1 for the average difference in normalized response (MWU one-tail test P = 0.05; Supplementary Fig. 19).
Finally, we sought to validate the observed phenotypic patterns using an independent cohort of ASD probands. To that end, we analyzed an independently collected dataset from the ongoing Simons Variation in Individuals Project (VIP)28. The analyzed VIP dataset contained genetic information and VABS phenotypic scores for 41 individuals with de novo LGD mutations in 12 genes. Reassuringly, and consistent with our findings in SSC, probands from the VIP cohort with truncating de novo mutations in the same exon also exhibited strikingly more similar VABS phenotypic scores compared to probands with mutations in the same gene (Fig. 7a, Supplementary Fig. 20); VABS composite standard score difference 6.0 versus 12.4 (Mann-Whitney U one-tail test P = 0.014). Similar to the SSC cohort, LGD mutations in neighboring exons did not result in more similar behavior phenotypes; VABS composite standard score average difference 13.6 points (MWU one-tail test P = 0.6). The fraction of truncated proteins also did not show significant correlation with the VABS scores of affected probands (Pearson’s R = −0.08, P = 0.7). Overall, these results confirm the phenotypic patterns observed in the SSC cohort, indicating the generality of the reported findings.
Using VABS scores from both SSC and VIP, we next investigated whether, analogous to the IQ phenotypes (Fig. 3a), the similarity of VABS scores is primarily due to the presence of mutations in the same exon, rather than proximity of truncating mutations within the corresponding protein sequence. Indeed, LGD mutations in the same exon often resulted in similar adaptive behavior abilities even when the corresponding mutations were separated by hundreds of amino acids (Fig. 7b; Supplementary Fig. 21). By comparing mutations in the same exon to mutations separated by similar amino acid distances in the same protein, we confirmed that probands with mutations in the same exon were significantly more phenotypically similar (permutation test P = 3×10⁻⁴; Supplementary Fig. 22; see Methods).
Discussion
Previous studies explored phenotypic similarity in syndromic forms of ASD due to mutations in specific genes29–33. Nevertheless, across a large collection of contributing genes, the nature of the substantial phenotypic heterogeneity in ASD remains unclear. Our study reveals several main sources of the observed heterogeneity in simplex ASD cases triggered by highly penetrant truncating mutations. There is substantial variability in the IQ sensitivity to dosage and isoform expression changes across human genes (Supplementary Fig. 11). We also estimate that, due to the imperfect efficiency of NMD, truncating mutations usually result in relatively mild changes in gene dosage, on average decreasing overall gene expression by ~15-30% (Supplementary Fig. 23; see Methods). Nevertheless, when gene-specific sensitivities are taken into account, the relative phenotypic effects are significantly correlated with expression dosage changes, which depend on the target exon expression (Fig. 5). Furthermore, even perturbations leading to similar dosage changes in the same gene may result in diverse phenotypes, if different functional isoforms are affected. We found that the similarity of truncated isoforms between LGD mutations significantly correlated with phenotypic similarity (NVIQ Spearman’s ρ = −0.21, P = 0.02; VABS Spearman’s ρ = −0.19, P = 0.006; see Methods). When exactly the same set of isoforms is perturbed, as is the case for LGD mutations in the same exon, the phenotypic diversity in unrelated probands decreases even further (Fig. 1). For intellectual phenotypes, same exon membership between LGD mutations accounts for a substantially larger fraction of phenotypic variance than multiple other genomic features, including expression, evolutionary conservation, pathway membership, and domain truncation (see Methods). Overall, these results demonstrate that for de novo LGD mutations, exons, rather than genes, represent a unit of effective phenotypic impact.
It is also likely that differences in genetic background and environment represent other important sources of phenotypic variability34–36. As the heritability of IQ phenotypes usually increases with age, it is reassuring that we observe a substantially higher correlation between phenotypes and gene dosage changes for older probands (Fig. 5b).
In the present study, we focused specifically on simplex cases of ASD, in which de novo LGD mutations are highly penetrant. In more diverse cohorts, individuals with LGD mutations in the same exon will likely display substantially greater phenotypic heterogeneity. For example, the Simons Variation in Individuals Project identified broad spectra of phenotypes associated with specific variants in the general population28,37–39. We also observed significantly larger phenotypic variability for probands from sequenced family trios, i.e. families without unaffected siblings (Supplementary Fig. 24). For these probands, the enrichment of de novo LGD mutations is likely to be substantially lower and the contribution from genetic background larger40, thus resulting in more pronounced phenotypic variability.
Our study may have important implications for precision medicine34,41,42. From a therapeutic perspective, compensatory expression of intact alleles, which has already been tested in mouse models of ASD43–45 and other diseases46, may provide an approach for alleviating phenotypic effects for at least a fraction of highly penetrant LGDs. From a prognostic perspective, our results suggest that by sequencing and phenotyping sufficiently large patient cohorts harboring truncating mutations in different exons of contributing ASD genes, it may be possible to understand likely phenotypic consequences, at least for a subset of cases resulting from highly penetrant de novo LGD mutations in simplex families. Furthermore, because we observe similar patterns of expression changes across multiple human tissues, medically relevant phenotypic analyses may also be extended to other disorders caused by highly penetrant truncating mutations.
Footnotes
Results supplemented with new findings, including an analysis of developmental expression patterns. Abstract revised.