Complex genetic network underlying the convergence of Rett syndrome-like (RTT-L) phenotype in neurodevelopmental disorders

Mutations of the X-linked gene encoding methyl-CpG-binding protein 2 (MECP2) cause classical forms of Rett syndrome (RTT) in girls. Patients who have features of classical Rett syndrome but do not fulfill all the diagnostic criteria (e.g. absence of a MECP2 mutation) are defined as having atypical Rett syndrome. Mutations in the genes encoding cyclin-dependent kinase-like 5 (CDKL5) and forkhead box G1 (FOXG1) are more commonly found in patients with atypical Rett syndrome. Nevertheless, a subset of patients who have a phenotype overlapping with RTT but lack a mutation in a gene that causes typical or atypical RTT are described as having a Rett syndrome-like phenotype (RTT-L). Whole exome sequencing (WES) of 8 RTT-L patients from our cohort revealed mutations in the genes GABRG2, GRIN1, ATP1A2, KCNQ2, KCNB1, TCF4, SEMA6B, and GRIN2A, which are seemingly unrelated to Rett syndrome genes. We hypothesized that the phenotypic overlap between RTT and RTT-L is caused by mutations in genes that affect common cellular pathways critical for normal brain development and function. We annotated the list of genes identified as causing RTT-L in peer-reviewed articles and performed a protein-protein interaction (PPI) network analysis. We also investigated their interactions with RTT (typical or atypical) genes such as MECP2, CDKL5, NTNG1, and FOXG1. We found that the RTT-L-causing genes were enriched in biological pathways such as circadian entrainment, the CREB pathway, and RET signaling, and in neuronal processes such as ion transport, synaptic transmission, and transcription. We conclude that genes that significantly interact with the PPI network established by RTT genes cause RTT-L, explaining the considerable overlap in features between RTT-L and RTT.


Introduction
Rett syndrome (RTT), first described by Austrian neurologist Andreas Rett, is a debilitating X-linked neurodevelopmental disease mainly affecting females, which is characterized by the loss of spoken language and the replacement of purposeful hand use with hand stereotypies that mimic handwashing 1. Mutations of the X-linked gene encoding methyl-CpG-binding protein 2 (MECP2) cause RTT in girls, severe encephalopathy in male infants, and X-linked mental retardation 2.
Classical Rett syndrome is mostly attributed to de novo mutations in MECP2 3, with affected probands displaying a characteristic phenotype, including loss of acquired purposeful hand skills, loss of acquired spoken language, gait abnormalities, and stereotypic hand movements 4. Patients who do not fulfill all the diagnostic criteria of RTT but present with an overlapping phenotype are classified as having atypical Rett syndrome 5. Several pathogenic variants for atypical RTT have been described in recent years, such as the early-seizure onset variant attributed to the gene CDKL5 (cyclin-dependent kinase-like 5) 6, the preserved speech variant attributed to MECP2 7, 8, the rare variant attributed to the gene NTNG1 (netrin-G1) 9, and the congenital variant attributed to the gene FOXG1 (forkhead box G1) 10.
Mutations also occur in loci other than MECP2, CDKL5, NTNG1, and FOXG1, with patients displaying some but not all the clinical features associated with classic and atypical RTT 12 .
A proband with this phenotype is classified as having a Rett syndrome-like phenotype (RTT-L) 12. The use of next-generation sequencing (NGS) has expanded the locus heterogeneity in individuals diagnosed with RTT-L; however, the molecular pathways and processes causing the RTT-L phenotype are still unknown 12. The identification of pathogenic mutations in RTT-L probands in genes not previously associated with RTT or neurodevelopmental disorders indicates the need to understand the molecular mechanism so that effective therapies can be developed.
We hypothesized that several of the mutated genes causing the RTT-L phenotype encode proteins that interact closely with RTT and atypical RTT proteins, so that analysis of their interactions might reveal a common biological pathway implicated in RTT-L. We report the use of a family-based exome sequencing approach in a cohort of 8 families with clinical features of RTT-L and the identification of de novo mutations in genes outside of the usually studied genes MECP2, CDKL5, FOXG1, and NTNG1. We then demonstrate the interactions that occur within the protein network between MECP2, CDKL5, FOXG1, and NTNG1. Finally, after assembling a list of de novo mutations in genes that cause RTT-L from published reports (including our own de novo mutations), we establish the interconnectedness of many genes from our list in relation to the protein network established by MECP2, CDKL5, FOXG1, and NTNG1.

Patient samples
In this study, we identified a cohort of 8 Caucasian trios with RTT-L clinical features according to the revised RettSearch International Consortium criteria and nomenclature 4; all lacked mutations in MECP2, CDKL5, FOXG1, and NTNG1. The RettSearch International Consortium criteria are an instrument for assessing and diagnosing RTT; they classify characteristics of RTT into typical and atypical RTT through a series of behavioral criteria. The parents of the affected children did not exhibit clinical features of RTT-L or intellectual disability. Genomic DNA from these trios was obtained from peripheral blood leukocytes. The study protocol and consent procedure were approved by the Western Institutional Review Board (WIRB; study number 20120789). Informed consent was obtained from patients.

Whole exome sequencing
A trio-based WES method was used. Genomic DNA was extracted from blood leukocytes for each member of the family trio, and genomic libraries were prepared using 1.2 µg of DNA with the TruSeq DNA sample preparation and Exome Enrichment kit (62 Mb; Illumina, San Diego, CA). Paired-end sequencing was performed on a HiSeq2000 instrument (Illumina Inc, San Diego, USA), and reads were then aligned to the human genome (hg19/GRCh37) using the Burrows-Wheeler alignment tool (BWA-MEM v0.7.8). Polymerase chain reaction (PCR) duplicates were removed using Picard (v1.128), and base quality recalibration and indel realignment were performed using the Genome Analysis Toolkit (GATK v3.3). Variants were called jointly with HaplotypeCaller 13, recalibrated with GATK, and annotated with dbNSFP (v3.1) and snpEff (3.2a) for protein-coding events. Our WES analysis focused on de novo single nucleotide variants since they are often implicated in causing intellectual disability-related diseases, although other modes of inheritance such as autosomal recessive and X-linked were considered as well. To select variants that were not present in healthy human populations, variants with an allele frequency >1% in the Single Nucleotide Polymorphism database (dbSNP) or the 1000 Genomes Pilot Project 14 were filtered out, and the remaining proband variants were filtered against variants found in the parents.
Finally, de novo candidates were selected based on the alignment quality, damage predictors, and conservation level of each of the genes. The presence of the variants in large control databases such as the Genome Aggregation Database (gnomAD) was also evaluated 15. De novo variants in subsequent analyses were defined as variants demonstrated by both exome sequencing and Sanger sequence validation to be present in a proband and absent from both parents. The predicted functional impact of each candidate de novo missense variant was assessed using in silico tools. Variants were considered potentially deleterious based on several criteria, including MutationAssessor, MutationTaster, PROVEAN, Combined Annotation Dependent Depletion (CADD), and PolyPhen scores, and splice-altering predictions for splice sites 16.
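The trio filtering logic described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `TrioVariant` record, its field names, the `MAX_POPULATION_AF` constant, and the placeholder gene names are all hypothetical, and genotypes are reduced to a simple carrier flag. Only the 1% allele frequency cutoff and the "present in proband, absent from both parents" rule come from the text.

```python
from dataclasses import dataclass
from typing import List

# Cutoff taken from the text: variants with allele frequency >1% are discarded.
MAX_POPULATION_AF = 0.01


@dataclass(frozen=True)
class TrioVariant:
    """Hypothetical record for one variant observed in a family trio."""
    gene: str
    population_af: float  # highest allele frequency seen in reference databases
    proband_alt: bool     # proband carries the alternate allele
    mother_alt: bool      # mother carries the alternate allele
    father_alt: bool      # father carries the alternate allele


def is_candidate_de_novo(v: TrioVariant) -> bool:
    """Rare in the population, present in the proband, absent from both parents."""
    rare = v.population_af <= MAX_POPULATION_AF
    return rare and v.proband_alt and not v.mother_alt and not v.father_alt


def candidate_de_novo(variants: List[TrioVariant]) -> List[TrioVariant]:
    """Keep only the variants that pass the trio filter, preserving input order."""
    return [v for v in variants if is_candidate_de_novo(v)]


if __name__ == "__main__":
    # Placeholder data for illustration only; not taken from the study.
    example = [
        TrioVariant("GENE_A", 0.0, True, False, False),   # passes
        TrioVariant("GENE_B", 0.05, True, False, False),  # too common
        TrioVariant("GENE_C", 0.0, True, True, False),    # inherited from mother
    ]
    print([v.gene for v in candidate_de_novo(example)])
```

In practice each surviving candidate would still need the quality review, in silico scoring, and Sanger validation described above; the sketch covers only the frequency and inheritance filter.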

RTT-L Gene Literature Search
We manually curated a list of genes implicated in RTT-L. An extensive literature search was conducted on PubMed for peer-reviewed articles published up to July 2017 that described patients with RTT-L or an RTT-like disorder. We focused on genes where the literature clearly described the patient's phenotypic features as overlapping with RTT. The list was meant to be exhaustive and was therefore routinely updated; the genes identified in our own study were included in the list. We also compiled a list of phenotypic characteristics observed in the subjects of the peer-reviewed articles we found. These characteristics were based on the criteria outlined by the RettSearch International Consortium as well as frequently described proband phenotypes 4.

Functional Enrichment Analysis and Ontology Lists
Functional enrichment analysis was performed on the RTT-L gene list along with RTT and atypical RTT-causing genes (RTT genes) to infer significant biological processes and pathways. We used the Overrepresentation Test 17 followed by Fisher's exact test (http://pantherdb.org, Version 13) based on the GeneOntology (GO) database 18 (released 2017-12-27). Biological processes with the most significant overrepresentation were recorded, and the genes composing them were compiled into process-level gene lists. The involvement of the combined RTT-L and RTT gene list in biological pathways was analyzed using the GeneAnalytics program (http://geneanalytics.genecards.org), which is powered by GeneCards, LifeMap Discovery, MalaCards, and PathCards 19, to characterize molecular pathways. Related pathways from the gene list were grouped into Superpathways to improve inferences and pathway enrichment, reduce redundancy, and rank genes within a biological mechanism via the GeneAnalytics algorithm.
Compiled pathways were considered statistically overrepresented at a significance threshold of p < 0.0001 and a corresponding -log(p-value) score > 16.
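For readers unfamiliar with overrepresentation testing, the sketch below shows how a one-sided Fisher's exact p-value can be computed for a single annotation term as a hypergeometric tail probability. It is an illustration only and does not reproduce the PANTHER or GeneAnalytics implementations, which may apply additional corrections; the function name, parameter names, and the `ALPHA` constant are hypothetical, and only the 0.0001 threshold is taken from the text.

```python
from math import comb

# Significance threshold stated in the text.
ALPHA = 0.0001


def overrepresentation_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided Fisher's exact p-value for over-representation.

    population: number of genes in the reference set
    annotated:  reference genes that carry the annotation term
    sample:     number of genes in the list being tested
    hits:       genes in the list that carry the annotation term

    Returns P(X >= hits) for X drawn from the hypergeometric distribution.
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    upper = min(annotated, sample)
    if not (0 <= hits <= upper):
        raise ValueError("hits must lie between 0 and min(annotated, sample)")

    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


def is_overrepresented(p_value: float) -> bool:
    """Apply the significance threshold used in the text."""
    return p_value < ALPHA
```

With a reference set of 10 genes of which 5 are annotated, a 5-gene list in which all 5 are annotated gives `overrepresentation_p(10, 5, 5, 5)`, which equals 1/252 because only one of the 252 possible 5-gene draws consists entirely of annotated genes.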

RTT- and RTT-L-Implicated Genes Network Analysis
Protein-protein interaction (PPI) networks were generated from the Homo sapiens interaction database created by GeneMania (2017-03-14 release) 10. The RTT-L-causing genes identified through our institution and our exhaustive literature search, as well as the RTT-implicated genes (MECP2, CDKL5, FOXG1, NTNG1), were used as the seed list of a PPI network of 250 total genes. Network figures were created using Cytoscape 3.5.1 21. PPI network statistics were generated using Cytoscape's NetworkAnalyzer 22 and Centiscape 23. To estimate the statistical significance of observations drawn from gene list PPI networks, we compiled a gene list of 1,460 highly expressed central nervous system (CNS) genes identified through "The Human Protein Atlas" 24 (www.proteinatlas.org). For each biological process and pathway gene list identified, control PPI networks containing 250 genes were created from a seed of random brain genes (n=65) drawn from The Human Protein Atlas gene list, in order to establish the threshold for statistical significance. The mean and standard deviation of the network centralization, network density, node degree, and edge length of the random gene lists were used to calculate z-scores of the RTT-L and RTT statistics (Figure 1).
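The z-score comparison against random control networks can be sketched as follows. This is a minimal example under stated assumptions: the function names are hypothetical, the sample standard deviation is assumed (the text does not say whether the sample or population form was used), and converting the z-score to a one-sided p-value through a normal approximation is also an assumption, since the text does not state how p-values were derived.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def z_score(observed: float, random_values: Sequence[float]) -> float:
    """Standardize an observed network statistic against random control networks.

    random_values holds the same statistic (e.g. network density) measured
    on each random-seed control network.
    """
    mu = mean(random_values)
    sigma = stdev(random_values)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("control networks show no variation in this statistic")
    return (observed - mu) / sigma


def upper_tail_p(z: float) -> float:
    """One-sided p-value for a z-score under a normal approximation (assumption)."""
    return 1.0 - NormalDist().cdf(z)
```

For example, an observed value of 5.0 against control values of 1.0, 2.0, and 3.0 has a control mean of 2.0 and a sample standard deviation of 1.0, so `z_score(5.0, [1.0, 2.0, 3.0])` evaluates to 3.0.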

Clinical and Molecular Characterization of Pathogenic Variants
Here we present a summary of the clinical reports behind each RTT-L case at our institution (Table 1). All these patients were initially diagnosed with RTT or RTT-L by the clinician before the genetic diagnosis. Through WES, we identified disease-causing variants in the GABRG2 (patient 1), GRIN1 (patient 2), ATP1A2 (patient 3), KCNQ2 (patient 4), KCNB1 (patient 5), GRIN2A (patient 6), TCF4 (patient 7), and SEMA6B (patient 8) genes, all of which occur at evolutionarily conserved positions (Figure 2). In addition, none of the variants was observed in controls in the Genome Aggregation Database (gnomAD). These genes are associated with phenotypes that overlap with those seen in patients with Rett syndrome (Table 1). Patient 1 is heterozygous for a de novo substitution mutation (c.316G>A) in the GABRG2 gene, which is associated with childhood febrile seizures, resulting in a missense change (p.Ala106Thr) in the ligand-binding region. Patient 2 carries a heterozygous de novo mutation in the GRIN1 gene, which is associated with neurodevelopmental disorder with seizures. The heterozygous c.2443G>C substitution results in a missense change (p.Gly815Arg). Patient 3 is heterozygous for a de novo substitution mutation (c.977T>G; p.Ile326Arg) in the ATP1A2 gene, which is associated with childhood hemiplegia. Trio WES performed on Patient 4 revealed a de novo substitution mutation (c.740C>A) in the gene KCNQ2, which is associated with early infantile epileptic encephalopathy, resulting in a nonsense variant (p.Ser247Ter) whose transcript is predicted to be targeted by nonsense-mediated decay. Patient 5 has a de novo variant (c.916C>T) in the KCNB1 gene, which is associated with early infantile epileptic encephalopathy, resulting in a missense change (p.Arg306Cys). Patient 6 is heterozygous for a de novo mutation (c.1845T>C) in the gene GRIN2A, which encodes a member of the glutamate-gated ion channel protein family and is associated with focal epilepsy and mental retardation. This mutation results in a missense change (p.Asn614Ser) in exon 8 and affects the ligand-gated ion channel domain in the cytoplasmic region. Patient 7 is heterozygous for a de novo splice site mutation in the TCF4 gene. WES of Patient 8 revealed a de novo frameshift (c.1991delG) in the gene SEMA6B, which is predicted to lead to a prematurely truncated protein (p.Gly664fs).

Phenotype Clustering of RTT-L Patients
We analyzed the phenotypes of RTT-L patients to evaluate how frequently the main and supportive RTT criteria appeared. Of the main criteria, RTT-L patients often displayed at least two of the following clinical RTT phenotypes: regression, partial or complete loss of acquired purposeful hand skills, partial or complete loss of acquired spoken language, gait abnormalities, and hand stereotypies. In addition, RTT-L patients frequently displayed Rett supportive criteria, including bruxism (56%), growth retardation (67%), respiratory disturbances (43%), and screaming fits (33%).

Functional Enrichment
We then conducted functional enrichment using GeneAnalytics on the complete RTT-L and RTT gene list to identify the over-represented or under-represented biological processes and pathways. The biological functions identified (p < 0.0001) were ranked according to p-value (Table 3). RTT-L genes showed significant over-representation of GO biological processes involving ion transport, synaptic transmission, regulation of ion transport, visual learning, and regulation of postsynaptic membrane potential (Table 3). Similarly, RTT-L genes were over-represented in GeneAnalytics pathways including synaptic transmission, dopamine signaling, neurophysiological process, NMDA receptor activation, and circadian entrainment (Table 3). These results highlight the diversity of biological pathways and processes in which RTT-L genes are implicated.

Comparison of RTT-L-Causing Genes with SFARI Autism Gene Set
In 2009, Wall et al. created a phylogeny 41 that grouped autism with RTT under the term "autism sibling disorders". The authors used the SFARI gene database to collect 991 genes implicated in autism. To evaluate our data, we compared the genes implicated in causing RTT-L with the SFARI autism gene set. The SFARI gene set shared a substantial number of genes with our curated gene list: 37% of the genes in the combined RTT-L and RTT list were also present in the autism-causing SFARI gene set (Supplementary Table 1).

Network Analysis of Protein-Protein Interactions Between RTT-L and RTT Genes
Phenotypic overlap between clinical features of RTT and the RTT-L patients in our cohorts was observed, prompting an investigation of protein-protein interactions of genes implicated in RTT-L Syndrome (Table 1). We used the GeneMania interaction data sets to examine whether de novo mutations causing RTT syndrome and RTT-L syndrome clustered in protein-protein interaction networks. Remarkably, 63 of the 67 genes mapped to an interconnected PPI network with 351 edges, with only the genes SHANK3 and ZNF620 being disconnected from the network.
These 63 genes interacted with one another through coexpression (45.47%), predicted interactions (17.20%), pathway interactions (12.34%), colocalization (11.72%), physical interactions (9.61%), shared protein domains (2.52%), and genetic interactions (1.15%) (Figure 3A). This PPI network was then supplemented with 183 additional genes, identified by the GeneMania association algorithm as having the strongest interactions with the 67 curated RTT and RTT-L genes; the expanded PPI network (n=250) was analyzed for network centralization, average node degree, average edge length, and network density (Table 4). The PPI network had a significantly higher average node degree (p=0.0003), network density (p=0.0007), and network centralization (p=0.0015) than control networks of brain genes identified through The Human Protein Atlas, consistent with greater connectivity of the network and more shared neighbors between nodes (Table 4 and Figure 3B). The network was also characterized by the presence of genes (FOXG1, CAMK2D, GFRA1, DAB1, MEF2C, PIK3R1, PTPRT, SNAP25, MAP3K10, SH3GL2, SYT1) with high betweenness centrality, indicating that these particular genes play an outsized role in connecting different biological pathways and processes (Figures 3C and 3D).
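The network statistics discussed in this section can be illustrated on a toy graph. The sketch below is not the Cytoscape NetworkAnalyzer or Centiscape implementation: it assumes a simple undirected, unweighted graph in which parallel edges between the same pair of nodes are collapsed, and all function names and node labels are hypothetical. Betweenness is computed with Brandes' algorithm and left unnormalized.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map; duplicate edges collapse into one."""
    adj: Graph = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_count(adj: Graph) -> int:
    """Each undirected edge appears in two adjacency sets."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


def average_degree(adj: Graph) -> float:
    """Mean number of neighbors per node: 2E / N."""
    return 2 * edge_count(adj) / len(adj)


def density(adj: Graph) -> float:
    """Fraction of possible node pairs that are connected: 2E / (N(N-1))."""
    n = len(adj)
    return 2 * edge_count(adj) / (n * (n - 1))


def betweenness(adj: Graph) -> Dict[str, float]:
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                           # nodes in order of discovery
        preds = {v: [] for v in adj}         # predecessors on shortest paths
        paths = {v: 0 for v in adj}          # number of shortest paths from source
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                         # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):            # accumulate from farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


if __name__ == "__main__":
    # Toy graph: a triangle A-B-C with a pendant node D attached to C.
    toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
    print(average_degree(toy), density(toy), betweenness(toy))
```

In the toy graph, node C lies on the only shortest paths from D to A and from D to B, so it has the highest betweenness, which is the sense in which high-betweenness genes act as connectors between otherwise separate parts of the network.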

RTT-L Gene Interaction with MECP2 Epigenetic Network
MECP2, the gene most frequently associated with classical Rett syndrome, forms a complex regulatory network that controls neuronal function and development by regulating RNA splicing and chromatin structure as well as by interfering with DNA methylation. We analyzed the MECP2 regulatory network from Ehrhart et al. for RTT-L genes and pathways. We found that RTT-L genes were often directly involved in the MECP2 regulatory network. MECP2 activates the formation of the MECP2-HDAC complex, which in turn inhibits MEF2C, another RTT-L gene. MECP2 also forms complexes with the RTT-L gene CREB1 to activate transcription, as well as with YB1 and PRPF3 to regulate the alternative splicing of the RTT-L gene GRIN1.
Additionally, MECP2 affects the glutamate and GABA pathways, both of which contain many RTT-L genes, through the activation of AMPA and the inhibition of noncoding RNA Ak081227, respectively. Network analysis of the PPI network generated with the RTT and RTT-L syndrome-causing genes shows the presence of a significantly interconnected protein network facilitated by genes with high betweenness centrality, defined as the number of shortest paths in the network that pass through the gene. These genes serve as key interactors with genes that cause RTT-L syndrome. For instance, one of these high betweenness centrality genes, synaptotagmin 1 (SYT1), has protein-protein interactions with 20 of the 56 genes that cause RTT-L syndrome as well as with CDKL5. Moreover, the PPI network generated by these high-centrality genes contains many of the GO biological processes also implicated in the RTT-L PPI network.

List of genes contributing to classical RTT, atypical RTT, RTT-L features of our cohort, RTT-L, and both RTT-L and RTT. Bolded genes indicate those identified in our patient cohort as harboring pathogenic mutations.
Table 3: Lists of genes associated with each GeneOntology biological process and GeneAnalytics Pathways. Bolded genes indicate those identified in our patient cohort as harboring pathogenic mutations.
Table 4: Network analysis of PPI networks generated from the seed gene list.

Supplementary Materials
Supplementary Figure 1: Number of RTT-L-causing genes associated with biological processes described by GeneOntology.
Supplementary Figure 2: Pathway clustering of the varied biological processes of RTT-L-causing genes.