Abstract
Tourette syndrome (TS) is highly heritable, although identification of its underlying genetic cause(s) has remained elusive. We examined a European ancestry sample composed of 2,435 TS cases and 4,100 controls for copy-number variants (CNVs) using SNP microarrays and identified two genome-wide significant loci that confer a substantial increase in risk for TS (NRXN1, OR=20.3, 95%CI [2.6-156.2], p=6.0 × 10−6; CNTN6, OR=10.1, 95% CI [2.3-45.4], p=3.7 × 10−5). Approximately 1% of TS cases carried one of these CNVs, indicating that rare structural variation contributes significantly to the genetic architecture of TS.
Tourette syndrome (TS) is a complex developmental neuropsychiatric disorder of childhood onset characterized by multiple motor and vocal tics, with an estimated population prevalence of 0.3-0.9%1. Twin and family-based studies of TS have repeatedly demonstrated that it is highly heritable (e.g., h² of 0.77 in a recent analysis of the Swedish National Patient Register2), while analysis of genome-wide SNP data suggests that TS risk is highly polygenic and distributed across both common and rare variation3. To date, the TS samples with available genome-wide genotyping have been inadequately sized for common variant association studies of a complex trait. To further characterize the genetic influences on TS, we assessed the impact of rare CNVs on disease risk, as it has been shown repeatedly that such variants contribute to susceptibility for other heritable neurodevelopmental disorders, including intellectual disability (ID), autism spectrum disorder (ASD) and schizophrenia (SCZ)4.
We genotyped TS cases and ancestry-matched controls on the same genome-wide SNP array platform (Illumina OmniExpress, Supplementary Table 1a). Following standard quality control (QC) steps (Supplementary Table 1b, Supplementary Figure 1, and Supplementary Text), including genotype-based determination of ancestry (Supplementary Figure 2), we analyzed CNVs in a SNP dataset of 6,535 unrelated European ancestry samples, including 2,435 individuals diagnosed with TS (by DSM-IV-TR criteria) and 4,100 healthy controls. To improve specificity, we generated CNV calls with two widely used Hidden Markov Model (HMM)-based segmentation algorithms, PennCNV5 and QuantiSNP6, and retained the intersection of CNVs detected by both methods. We also conducted an additional QC step to test for any differential sensitivity in CNV detection between cases and controls, both within and across batches and sites, by analyzing CNV calls from 11 common HapMap3 CNVs using a sensitive locus-specific intensity clustering method, generating a total of 4,758 non-reference CNV calls across all samples (Supplementary Figure 3). Comparison of these genotypes with our consensus HMM-based calls confirmed the absence of any differential bias in the sensitivity of CNV detection between phenotypic groups, whether assessed across all loci (p=0.53, Fisher’s exact test) or between individuals (p=0.15, Welch’s t-test, Supplementary Table 4 and Supplementary Text).
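The consensus step described above, in which only the intersection of PennCNV and QuantiSNP calls is retained, can be illustrated with a minimal sketch. The tuple layout `(chrom, start, end, kind)` and the function name `intersect_calls` are assumptions made for illustration; the actual merging rules are given in the Supplementary Text and may differ.

```python
def intersect_calls(calls_a, calls_b):
    """Intersect two call sets for ONE sample.

    Each call is a hypothetical tuple (chrom, start, end, kind), where
    kind is "del" or "dup". Only the shared coordinates of calls that
    agree on chromosome and CNV type are kept.
    """
    consensus = []
    for chrom_a, start_a, end_a, kind_a in calls_a:
        for chrom_b, start_b, end_b, kind_b in calls_b:
            if chrom_a != chrom_b or kind_a != kind_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # the two calls share at least one base
                consensus.append((chrom_a, start, end, kind_a))
    return sorted(consensus)


# Toy example with made-up coordinates.
penncnv_calls = [("chr2", 100, 500, "del"), ("chr3", 1000, 2000, "dup")]
quantisnp_calls = [("chr2", 300, 700, "del"), ("chr3", 1500, 1800, "del")]
consensus_calls = intersect_calls(penncnv_calls, quantisnp_calls)
```

In this toy example only the chr2 deletion survives, trimmed to the coordinates supported by both callers; the chr3 calls disagree on CNV type and are dropped.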
In total, we resolved 8,365 rare (as defined by a minor allele frequency [MAF] < 1% across all samples [50% reciprocal overlap]) CNV calls of at least 50 kb in length and spanning a minimum of 10 probes. We assessed global CNV burden in terms of the number of CNVs, total CNV length, and the number of genes affected by CNVs, stratified by CNV type, size, and frequency. When considering all CNVs, we observed a modest but significant enrichment in TS across all metrics of burden for single-occurrence events (or singletons, corresponding to a MAF of approximately 0.00015) only (OR per CNV of 1.09 [1.01-1.18], p=0.03; OR per 100kb of 1.022 [1.006-1.035], p=6 × 10−3; OR per gene of 1.016 [1.004-1.030], p=0.01; Supplementary Table 5). In general, CNV burden for TS was greater with increasing event size and rarity, with the most substantial effect seen among large singleton deletions (>1 Mb), with an OR per CNV of 2.82 [1.36-6.18], p=7 × 10−3 (Figure 1).
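Two quantities in the preceding paragraph lend themselves to a short worked example: the 50% reciprocal overlap criterion used to decide whether two calls represent the same event, and the singleton frequency of roughly 0.00015. The sketch below assumes that the quoted frequency is one carrier divided by the 6,535 analyzed samples, which reproduces the stated value; the function names are hypothetical.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Smaller of the two overlap fractions between intervals A and B."""
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def same_event(call_a, call_b, threshold=0.5):
    """True if two (start, end) calls overlap reciprocally by >= threshold."""
    return reciprocal_overlap(*call_a, *call_b) >= threshold


# 2,435 cases + 4,100 controls, as reported in the text.
n_samples = 2435 + 4100
# Assumption: frequency counted per sample (one carrier among all samples).
singleton_frequency = 1 / n_samples
```

Under this criterion a 100 kb call and a 1 Mb call that fully contains it would not be grouped, because the shared segment covers only 10% of the larger call.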
We next evaluated the dataset for possible enrichment of rare CNVs at specific loci, conducting a point-wise (segmental) test of association, treating deletions and duplications independently. As non-overlapping CNVs that affect the same gene would be unaccounted for by segmental assessments of enrichment, we also performed a complementary test collapsed on individual genes conditioned on exonic CNVs affecting protein-coding genes. In contrast to genome-wide association studies of SNPs, there is no generally accepted threshold to indicate genome-wide significance for CNVs. Therefore, for both tests, we established locus-specific (Plocus) and genome-wide corrected (Pcorr) p-values empirically through 1,000,000 permutations, using the max(T) method to control for familywise error rate (FWER)7. Both tests converged on the same two distinct loci, one for deletions and another for duplications, which were enriched among TS cases and survived correction for multiple testing.
For deletions, the peak segmental association signal was located at rs13418185 (Plocus=6 × 10−6, Pcorr=8.6 × 10−4, Figure 2a), corresponding to heterozygous losses across the first three exons of NRXN1, found exclusively among TS cases (N=10, Figure 2a and b). In the gene-based test of genome-wide exonic CNVs, NRXN1 deletions were the most significant association (Supplementary Table 6 and Supplementary Figure 6), representing 12 cases (0.49%) and a single control (0.02%); OR=20.3 [2.6-156.2]; Plocus=6.2 × 10−5; Pcorr=6.7 × 10−4. Consistent with deletions previously identified for this gene in ASD, SCZ, and epilepsy, these exon-spanning CNVs clustered at the 5’ end of the gene and predominantly affected the α isoform of NRXN18.
The most significant segmental association with a duplication was located within the CNTN6 gene at rs4085434 (Plocus=3.7 × 10−5, Pcorr=6.5 × 10−3) with a secondary peak located directly upstream (Plocus=5.4 × 10−5, Pcorr=6.5 × 10−3, Figure 2a and c). Closer inspection of the locus revealed an enrichment of large duplications spanning this gene. A gene-based test determined that duplications overlapping CNTN6 correspond to an OR=10.1 [2.3-45.4] for TS (Plocus=2.5 × 10−4, Pcorr=8.3 × 10−3), with gains found in 12 cases (0.49%) and 2 controls (0.05%) (Supplementary Table 6 and Supplementary Figure 7). All duplications detected across CNTN6 were heterozygous and spanned exons.
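The gene-based odds ratios quoted for NRXN1 and CNTN6 can be checked from the carrier counts given in the text (12 of 2,435 cases versus 1 or 2 of 4,100 controls). The sketch below uses the sample odds ratio with a Wald (log-scale) 95% confidence interval. This choice of interval is an assumption, since the main text does not name the method, although it gives values close to those reported.

```python
import math


def odds_ratio_wald(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Sample odds ratio with a Wald confidence interval on the log scale."""
    a = case_carriers
    b = n_cases - case_carriers          # non-carrier cases
    c = control_carriers
    d = n_controls - control_carriers    # non-carrier controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Carrier counts as reported in the text.
nrxn1 = odds_ratio_wald(12, 2435, 1, 4100)   # compare with 20.3 [2.6-156.2]
cntn6 = odds_ratio_wald(12, 2435, 2, 4100)   # compare with 10.1 [2.3-45.4]
```

The wide intervals follow directly from the very small number of control carriers, which dominates the standard error of the log odds ratio.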
No other loci were significant after controlling for FWER, under either segmental or gene-based tests of association. We obtained similar results after pair-matching each individual case with its closest ancestrally matched control, demonstrating that these results are robust and not the result of inter-European population stratification or case-control sample biases (Supplementary Text and Supplementary Figure 8). Furthermore, we observed no significant enrichment of any CNVs among controls.
Excluding these two genome-wide significant loci, we conducted a secondary analysis testing for an increased burden among 27 rare, recurrent CNVs previously associated with various neurodevelopmental/neuropsychiatric disorders. We observed no nominally significant enrichment, either considering these CNVs individually (Supplementary Table 7) or in concert (P=1.0, 2-sided Fisher’s exact test).
Although previous studies have reported heterozygous exonic NRXN1 deletions in 4 TS patients9,10, the small sample sizes in these prior studies precluded any definitive association of this deletion with TS. Here, we demonstrate that exonic deletions affecting NRXN1, particularly those spanning exons 1-3, confer a substantial increase in risk for the disorder. Of note, among the 12 TS cases with exonic NRXN1 deletions, four had another previously diagnosed neurodevelopmental disorder (NDD), including two with ASD (Supplementary Table 8). The association of NRXN1 deletions with different neurodevelopmental disorders represents one of the most consistent findings regarding CNVs in neuropsychiatry8,11,12. Our data suggest an approximately two-fold higher prevalence of exonic NRXN1 deletions in TS compared to other neuropsychiatric disorders12, although much larger replication cohorts will be necessary to affirm this apparent comparative enrichment. Despite the diverse clinical presentation exhibited by NRXN1 deletion carriers, in vitro models using human neurons differentiated from induced pluripotent stem cells have shown that independent lines carrying different exonic deletions in the NRXN1-α isoform exhibit markedly similar defects in synaptic transmission13. NRXN1-α interactions are also critical for thalamocortical synaptogenesis and plasticity14, underscoring a potential mechanism for its repeated association with developmental neuropsychiatric disorders.
Like NRXN1, CNTN6 encodes a cell-adhesion molecule that has been shown to promote neurite outgrowth. On the basis of structural variation, CNTN6 has been proposed as a candidate gene for intellectual disability and/or developmental delay15,16, and deletions affecting CNTN6 are significantly enriched in ASD17. However, none of the subjects with CNTN6 duplications identified here had a known NDD (Supplementary Table 8). Notably, the CNTN6 duplications identified in our sample are considerably larger in TS cases than in controls (641.0 vs. 142.9 kbp). Nine of the 12 TS carriers harbor a duplication exceeding 500 kbp in length, while both of the CNTN6 duplications found in controls were less than 200 kbp. Furthermore, in a previous CNV study of 1,086 TS cases and 1,789 controls unrelated to the samples used in the current analysis, duplications directly upstream of CNTN6 demonstrated the greatest enrichment18, reinforcing a possible pathogenic significance of CNTN6 duplications in this disorder. Consistent with northern blot analysis in the adult human nervous system19, examination of human brain RNAseq data from the BrainSpan project indicates that CNTN6 is widely expressed postnatally with highest expression seen both prenatally and postnatally in the cerebellum and mediodorsal thalamus, with additional focal expression in mid-gestational frontal and sensorimotor cortex (Supplementary Figure 9). The thalamus has long been a region of proposed involvement in TS based on multiple levels of evidence, including thalamic lesions, human neurophysiology studies, and recent treatment successes using deep-brain stimulation20,21. The cerebellum has also recently been implicated in TS by functional magnetic resonance imaging22.
In summary, we have conducted the largest survey of structural variation in TS to date. We identified two genome-wide significant loci that are enriched for rare CNVs in TS: deletions in NRXN1 and duplications in CNTN6. Approximately 1% of TS cases carry a CNV in either gene. Furthermore, we demonstrate a significant increase in global CNV burden, primarily for large, extremely rare deletions. This result suggests that additional CNVs that confer susceptibility to TS remain, but their discovery will likely require substantial increases in sample size.
Summarized Methods
Sample ascertainment and data generation
TS cases were ascertained through 21 sites across North America, Europe, and Israel through either specialty clinics or a web-based recruitment effort using a validated diagnostic instrument (TICS Inventory, Supplementary Text). A definite DSM-IV-TR diagnosis of TS, determined by an expert clinician, was a requisite for study inclusion. Unselected control samples were collected in conjunction with TS cases. Additional unscreened controls were obtained from four external studies, and SNP data was generated for all samples on the Illumina OmniExpress platform (Supplementary Text and Supplementary Table 1) according to the manufacturer’s specifications. Raw intensities were obtained using GenomeStudio (Illumina). Quality Control (QC, Supplementary Text and Supplementary Table 1b) was conducted using PLINK v. 1.90, Perl, and R scripts. Samples were further excluded if they were of discrepant or indeterminate genetic sex or were outliers based on heterozygosity (Supplementary Figure 2a). When samples exhibited an excessive amount of cryptic relatedness (PI_HAT > 0.185), only the sample with the higher call rate was retained. In addition, control samples that exhibited an excessive amount of cryptic relatedness to individuals clinically diagnosed with a neuropsychiatric phenotype were also removed.
Ancestry inference and matching
Genotype data was combined with data from publicly available HapMap samples of European, African, and Asian continental ancestry (Illumina). All available European (EU) population samples from the 1000 Genomes Project were also included to establish an appropriate calibration threshold for EU ancestry designation. A total of 19,024 LD-independent markers were used for ancestry inference, and samples were excluded if they contained > 0.0985 non-EU ancestry as determined using fastStructure (Supplementary Figure 2).
CNV calling
Only SNP assays common to all versions of the OmniExpress arrays were used for CNV detection (n=689,077) to mitigate any disparity in CNV detection due to probe coverage. Raw CNV calls were generated on all autosomal chromosomes using PennCNV and QuantiSNP. In addition to hard cutoffs used to flag problematic assays, samples were excluded if they represented outliers in a number of CNV quality metrics (determined as mean ±3 SD or by manual inspection, Supplementary Figure 1). Rare CNVs, defined by a prevalence of <1% across all samples, were validated with an alternative locus-specific CNV genotyping algorithm that considers normalized, median-summarized intensity values across each putative CNV region. An overview of the CNV processing pipeline is presented in Supplementary Figure 1 and described in detail in the Supplementary Text.
Analysis of global CNV burden
Under a logistic regression model, we assessed global CNV burden as measured by the total number of CNVs, cumulative CNV length, or number of genes spanned by CNVs, including covariates found to be significantly associated with these burden metrics (Supplementary Text and Supplementary Table 9). Odds ratios indicate an increase in risk for TS per unit of CNV burden. P-values were calculated using the likelihood ratio test.
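The burden model described above can be sketched as a single-predictor logistic regression with a likelihood ratio test against the intercept-only model. This is a simplified illustration: the covariates used in the actual analysis are omitted, the Newton-Raphson fitter and the function names are assumptions, and the input data are invented.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1, loglik)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return b0, b1, loglik


def burden_lrt(burden, is_case):
    """Return (OR per unit of burden, LRT statistic, p-value on 1 df)."""
    _, b1, ll_full = fit_logistic(burden, is_case)
    p_null = sum(is_case) / len(is_case)
    ll_null = sum(math.log(p_null) if yi else math.log(1.0 - p_null)
                  for yi in is_case)
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return math.exp(b1), stat, p_value


# Invented toy data: burden of 0 or 1 CNV in 10 cases and 10 controls.
toy_burden = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
toy_status = [1] * 10 + [0] * 10
toy_or, toy_stat, toy_p = burden_lrt(toy_burden, toy_status)
```

With a binary predictor the fitted odds ratio equals the sample odds ratio of the corresponding 2×2 table, which provides a simple check on the fitter.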
Locus-specific tests of association
The segmental test of association was performed at all unique CNV breakpoints. For gene-based association tests, we considered only CNVs spanning exons of coding genes as defined by RefSeq annotation. Significance for both tests of association was determined by 1,000,000 permutations of phenotype labels. In each case, both locus-specific and genome-wide corrected p-values were obtained using the max(T) permutation method as implemented in PLINK v1.07, which controls for family-wise error rate by comparing the locus-specific test statistic to all test statistics genome-wide within each permutation.
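The max(T) procedure described above can be sketched in a few lines. This is an illustration of the general technique rather than of the PLINK implementation: the test statistic (excess of case carriers over expectation), the data layout, and the function name are assumptions, and far fewer permutations are used than the 1,000,000 reported.

```python
import random


def max_t_permutation(carriers, is_case, n_perm=1000, seed=0):
    """Empirical locus-specific and family-wise corrected p-values.

    carriers: list of sets, each holding the sample indices that carry
              a CNV at one locus (hypothetical layout).
    is_case:  list of booleans, one per sample.
    """
    rng = random.Random(seed)
    n_samples = len(is_case)
    n_cases = sum(is_case)

    def statistics(labels):
        # One-sided statistic: observed minus expected case carriers.
        return [sum(1 for i in locus if labels[i])
                - len(locus) * n_cases / n_samples
                for locus in carriers]

    observed = statistics(is_case)
    exceed_locus = [0] * len(carriers)
    exceed_max = [0] * len(carriers)
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)                 # permute phenotype labels
        permuted = statistics(labels)
        permuted_max = max(permuted)        # genome-wide maximum
        for j, t_obs in enumerate(observed):
            if permuted[j] >= t_obs:
                exceed_locus[j] += 1
            if permuted_max >= t_obs:
                exceed_max[j] += 1
    p_locus = [(c + 1) / (n_perm + 1) for c in exceed_locus]
    p_corr = [(c + 1) / (n_perm + 1) for c in exceed_max]
    return p_locus, p_corr


# Toy data: 20 cases (indices 0-19) and 20 controls (indices 20-39).
toy_is_case = [True] * 20 + [False] * 20
toy_carriers = [set(range(8)), {0, 20}]     # locus 0 is carried by cases only
toy_p_locus, toy_p_corr = max_t_permutation(toy_carriers, toy_is_case,
                                            n_perm=2000, seed=1)
```

Because the permuted maximum is never smaller than the permuted statistic at any single locus, the corrected p-value is always at least as large as the locus-specific one.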
Analysis of known neuropsychiatric susceptibility loci
A list of known CNVs with strong evidence of association to various neuropsychiatric disorders, including ASD, ID/DD, SCZ, and BD, was assembled from the literature4. For recombination hotspots, a CNV was counted if it overlapped with the reported region by at least 50%. Single-gene associated CNVs were considered if they overlapped gene boundaries as annotated in RefSeq. Locus-specific P-values were determined by 100,000 permutations of phenotype labels.
URLs
PennCNV, http://penncnv.openbioinformatics.org; QuantiSNP, https://sites.google.com/site/quantisnp/home; HapMap3, ftp://ftp.ncbi.nlm.nih.gov/hapmap; 1000 Genomes Project, http://www.1000genomes.org; BrainSpan, http://www.brainspan.org
ACKNOWLEDGMENTS
The authors wish to thank all the patients with Tourette Syndrome and their families, as well as unaffected volunteers, who generously agreed to participate in this study. This study was supported by the US National Institutes of Health U01 NS040024 to Drs. Pauls, Mathews, and Scharf and the Tourette Syndrome Association International Consortium for Genetics, ARRA Grant NS040024-09S1, K23 MH085057, and K02 NS085048 to Dr. Scharf, ARRA Grants NS040024-07S1 and NS016648 to Dr. Pauls, MH096767 to Dr. Mathews and NINDS Informatics Center for Neurogenetics and Neurogenomics grant P30 NS062691 to Drs. Coppola and Freimer, by grants from the Tourette Association of America to Drs. Paschou, Pauls, Mathews, and Scharf and from the German Research Society to Dr. Hebebrand.
AUTHOR CONTRIBUTIONS
All authors were involved in the conception and design of the genetic study. A.H., P.P., C.A.M., J.M.S., and G.C. designed and oversaw the analyses. A.H., D.Y., L.K.D., J.H.S., F.T., V.R., I.Z., E.M.R., L.O., J.A.C., L.M.M., B.M.N., N.B.F., P.P., C.A.M., J.M.S., and G.C. participated in the conduct of the analyses. Major contributions to writing and editing were made by A.H., C.A.M., J.M.S., and G.C. All authors assisted with critically revising the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online version of the paper.
P.S., M.G., H.S.S., R.A.K., Y.D., G.R., C.L.B., G.L., W.M.M., D.L.P., N.J.C., N.B.F., P.P., C.A.M. and J.M.S. have received research funding from the Tourette Association of America (TAA). J.M.S. and C.A.M. have received travel support from the TAA and serve on the TAA Scientific Advisory Board. J.M.S. has also received consulting fees from Nuvelation Pharma, Inc. P.S. received unrestricted Educational Grants in support of conferences he organized from Purdue and Shire, a CME speaker fee from Purdue University, industry sponsored clinical trial support from Otsuka and is a member of the Data Safety Monitoring Committee at Psyadon. C.L.B. has received funding for clinical trials from Psyadon Pharmaceuticals, Neurocrine Pharmaceuticals, Synchroneuron Pharmaceuticals, AUSPEX Pharmaceuticals, and TEVA Pharmaceuticals. She was a paid speaker for the TAA CDC program and a paid consultant for Bracket eCOA. I.A.M. has participated in research funded by the National Parkinson Foundation, TAA, Abbvie, Auspex, Biotie, Michael J. Fox Foundation, Neurocrine, Pfizer, and Teva, but has no ownership interest in any pharmaceutical company. I.A.M. has been reimbursed for speaking for the National Parkinson Foundation and TAA. M.S.O. serves as a consultant for the National Parkinson Foundation, and has received research grants from NIH, NPF, the Michael J. Fox Foundation, the Parkinson Alliance, Smallwood Foundation, the Bachmann-Strauss Foundation, the TAA, and the UF Foundation. M.S.O.’s DBS research is supported by: R01 NR014852 and R01 NS096008. He has previously received honoraria, but in the past >60 months has received no support from industry. M.S.O. has received royalties for publications with Demos, Manson, Amazon, Smashwords, Books4Patients, and Cambridge, is an associate editor for New England Journal of Medicine Journal Watch Neurology, has participated in CME and educational activities on movement disorders in the last 36 months sponsored by PeerView, Prime, QuantiaMD, WebMD, Medicus, MedNet, Henry Stewart, and by Vanderbilt University. The institution and not M.S.O. receives grants from Medtronic, Abbvie, Allergan, and ANS/St. Jude, and the PI has no financial interest in these grants. D.W.W. has received royalties from Guilford Press and Oxford University Press, speaking honoraria from the TAA and serves on the TAA Medical Advisory Board. All other authors have no competing financial interests to declare.
Footnotes
On behalf of the Tourette Syndrome Association International Consortium for Genomics (TSAICG) and the Gilles de la Tourette Syndrome GWAS Replication Initiative (GGRI)