Abstract
Background Mutations in the human L1CAM gene cause a group of neurodevelopmental disorders known as L1 syndrome (CRASH syndrome). The L1CAM gene provides instructions for producing the L1 protein, which is found throughout the nervous system on the surface of neurons. L1 syndrome has variable features, the most common of which is muscle stiffness. Patients with L1 syndrome can also suffer from difficulty speaking, seizures, and underdeveloped or absent tissue connecting the left and right halves of the brain.
Method Data on the human L1CAM gene were retrieved from dbSNP/NCBI. Of the 1499 Homo sapiens SNPs, 450 were missense mutations. These were selected for comprehensive bioinformatics analysis with several in silico tools to investigate the effect of the SNPs on the structure and function of the L1CAM protein.
Results Of the 450 nsSNPs, 34 missense mutations (26 of them novel) were predicted to be the most deleterious, affecting L1CAM at both the structural and the functional level.
Conclusion Comprehensive bioinformatics analysis gave a better understanding of L1 syndrome caused by mutations in the L1CAM gene. These findings describe 26 novel L1CAM mutations, which improve our understanding of genotype-phenotype correlation and may be used as diagnostic markers for L1 syndrome and in cancer diagnosis, specifically in breast cancer.
1. Introduction
L1 syndrome (also known as CRASH syndrome) is a group of very rare inherited disorders that primarily affect the nervous system. It is characterized by hydrocephalus with stenosis of the aqueduct of Sylvius (HSAS), intellectual disability, corpus callosum hypoplasia (or agenesis), adducted thumbs and spastic paraplegia (1-4). It is a recessive X-linked disorder that exclusively affects males, with an incidence of 1 in 30,000 male births (5), and it is caused by mutations in the L1CAM gene (6-19), located near the telomere of the long arm of the X chromosome at Xq28 in humans (20-22). Over 200 mutations in the L1CAM gene have been reported (3, 17). The gene encodes the L1 cell adhesion molecule protein, a member of the immunoglobulin superfamily. As the name implies, the protein enables the adhesion of neural cells to one another, and it is a key regulator of synapse formation, synaptic plasticity, and the growth and formation of axons and dendrites (23-26). L1CAM has also been found to be a major driver of tumor cell invasion and motility (27); a full understanding of its function will therefore aid the diagnosis and treatment of different types of cancer (27-50).
The underlying pathogenetic mechanisms of L1 syndrome remain unresolved (6, 25, 51), and treatment requires shunting of the cerebrospinal fluid as needed (52). We hope for a better understanding of the condition and believe that a thorough investigation of the L1CAM gene might contribute to it.
The aim of this study is to identify pathogenic mutations in the coding region of the L1CAM gene using various bioinformatics tools. Such mutations might then be used as diagnostic markers and may help in the development of new therapeutic strategies based on gene therapy and pharmacogenomics. This is the first in silico analysis of the coding region of the L1CAM gene that prioritizes the functional analysis of nsSNPs. Using several bioinformatics tools was beneficial because it reduces cost and allows the results to be confirmed by software based on different parameters; it should also facilitate future genetic studies (53).
2. Materials and Methods
2.1 Data mining
The data on the human L1CAM gene were collected from the National Center for Biotechnology Information (NCBI) website (54) (https://www.ncbi.nlm.nih.gov/), and the protein sequence was collected from UniProt (55) (https://www.uniprot.org/).
2.2 SIFT
We used SIFT to assess the effect of amino acid substitutions on protein function. SIFT predicts damaging SNPs on the basis of the degree of conservation of amino acid residues in alignments of closely related sequences gathered through PSI-BLAST (56, 57). It is available at (http://sift.jcvi.org/).
2.3 PolyPhen
We used PolyPhen (version 2) to study the probable impact of amino acid substitutions on the structural and functional properties of the protein, using physical and comparative considerations (58, 59). It is available at (http://genetics.bwh.harvard.edu/pph2/).
2.4 PROVEAN
PROVEAN is a software tool that predicts whether an amino acid substitution or indel has an impact on the biological function of a protein. It is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important (60, 61). It is available at (http://provean.jcvi.org/index.php).
2.5 SNAP2
SNAP2 is a trained classifier based on a machine learning method, the neural network. It distinguishes between disease-associated and neutral non-synonymous SNPs by taking a variety of sequence and variant features into account. It is available at (https://rostlab.org/services/snap2web/).
2.6 SNPs&GO
SNPs&GO is an accurate method that, starting from a protein sequence, can predict whether a variation is disease related or not by exploiting the corresponding protein functional annotation (62, 63). It is available at (http://snps.biofold.org/snps-and-go/snps-and-go.html).
2.7 PHD-SNP
PhD-SNP is an online support vector machine (SVM) based classifier optimized to predict whether a given single-point protein mutation can be classified as disease related or as a neutral polymorphism. It is available at (http://snps.biofold.org/phd-snp/phdsnp.html).
2.8 I-Mutant 3.0
I-Mutant 3.0 is a neural network based tool for the routine analysis of protein stability changes caused by single-site mutations. The FASTA sequence of the protein, retrieved from UniProt, is used as input to predict the effect of each mutation on protein stability (64). It is available at (http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi).
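Stability predictors of this kind take the protein sequence together with a substitution written as wild-type residue, position and mutant residue (for example P240L). The sketch below shows one way such labels could be parsed and checked against a sequence before submission. The function names and the toy sequence are hypothetical and are not part of I-Mutant 3.0 or of the original workflow.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Matches labels such as "P240L": wild-type residue, 1-based position, mutant residue.
_LABEL = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    ref: str  # wild-type residue
    pos: int  # 1-based position in the protein sequence
    alt: str  # mutant residue


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'P240L' into its three components."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return the mutant sequence, verifying the wild-type residue first."""
    index = sub.pos - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {sub.pos} is outside the sequence")
    if sequence[index] != sub.ref:
        raise ValueError(
            f"expected {sub.ref} at position {sub.pos}, found {sequence[index]}"
        )
    return sequence[:index] + sub.alt + sequence[index + 1:]


# Toy sequence for illustration only; it is NOT the L1CAM sequence.
toy_sequence = "MKTPLLA"
mutant = apply_substitution(toy_sequence, parse_substitution("P4L"))
```

Checking the wild-type residue before substituting catches labels that refer to a different isoform or numbering scheme than the sequence being submitted.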
2.9 MUpro
MUpro is a support vector machine-based tool for predicting changes in protein stability caused by nsSNPs. It predicts the value of the energy change and reports a confidence score between −1 and 1. A score <0 means the variant decreases protein stability; conversely, a score >0 means the variant increases protein stability. It is available at (http://mupro.proteomics.ics.uci.edu/).
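The sign rule above can be written as a small helper. This is a minimal sketch, not MUpro code: the function name is hypothetical, and because the text does not say how a score of exactly 0 is treated, the sketch assumes it means no predicted change.

```python
def mupro_direction(score: float) -> str:
    """Interpret a MUpro confidence score in the range [-1, 1].

    Negative scores indicate decreased stability and positive scores
    indicate increased stability, as described in the text. The handling
    of a score of exactly 0 is an assumption of this sketch.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the expected range [-1, 1]")
    if score < 0:
        return "decrease"
    if score > 0:
        return "increase"
    return "no predicted change"
```

For example, `mupro_direction(-0.4)` returns `"decrease"` and `mupro_direction(0.2)` returns `"increase"`.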
2.10 GeneMANIA
We submitted the L1CAM gene to GeneMANIA and selected the data sets to query from the list provided. GeneMANIA predicts protein function by integrating multiple genomics and proteomics data sources to make inferences about the function of unknown proteins (65-67). It is available at (http://www.genemania.org/).
2.11 Identification of Functional SNPs in Conserved Regions by using ConSurf server
The ConSurf web server provides evolutionary conservation profiles for proteins of known structure in the PDB. Amino acid sequences similar to each sequence in the PDB were collected and aligned using CSI-BLAST and MAFFT, respectively. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. It is available at (http://consurf.tau.ac.il/).
2.12 Structural Analysis
2.12.1 Detection of nsSNPs Location in Protein Structure
Mutation3D is a functional prediction and visualization tool for studying the spatial arrangement of amino acid substitutions on protein models and structures. Its authors also present a systematic analysis of whole-genome and whole-exome cancer datasets, demonstrating that Mutation3D identifies many known cancer genes as well as previously underexplored target genes (68). It is available at (http://mutation3d.org).
2.13 Analysis of 3′ UTR and 5′ UTR of L1CAM gene
Sequences of the 3′ UTR and 5′ UTR of the L1CAM gene were retrieved from the Ensembl website (https://www.ensembl.org/index.html) and submitted to the RegRNA 2 website (http://regrna2.mbc.nctu.edu.tw/) to identify the related microRNA sequences. These results were then passed to the miRmap software (https://mirmap.ezlab.org/) to obtain free energy and conservation values.
3. Results
4. Discussion
Using comprehensive bioinformatics analysis tools, 26 novel mutations were found to have a damaging effect on the stability and function of the L1CAM protein. The L1 cell adhesion molecule (L1CAM) gene is a protein-coding gene that plays a role in axon outgrowth and pathfinding during the development of the nervous system. Axons are specialized extensions of neurons that transmit nerve impulses. L1CAM mutations have been identified in a group of overlapping X-linked neurological disorders known as L1 syndrome (52).
The many Homo sapiens SNPs that are now catalogued (https://www.ncbi.nlm.nih.gov/snp) open the door to a better understanding of genotype-phenotype correlation. We therefore carried out a comprehensive bioinformatics analysis to prioritize SNPs that have a structural and functional impact on the L1CAM protein. The most frequent type of genetic variation is the single nucleotide polymorphism (SNP). Non-synonymous SNPs (nsSNPs), or missense mutations, arise in the coding region and result in a single amino acid substitution, which may affect the structure and/or function of the protein (69). In this study we focused on SNPs in both coding and non-coding regions. We investigated the effect of each SNP on the function and stability of the protein, and on the expression of related genes, using bioinformatics tools that rely on different parameters and approaches, in order to confirm our results and to minimize error as far as possible. The software used were SIFT, PolyPhen-2, PROVEAN, SNAP2, SNPs&GO, PhD-SNP, I-Mutant 3.0, MUpro and Mutation3D (Figure 1).
A total of 1498 SNPs in the coding region of the L1CAM gene were retrieved from the dbSNP/NCBI database. Of these, 450 were nsSNPs (missense mutations), which were submitted to functional analysis by SIFT, PolyPhen-2, PROVEAN and SNAP2. SIFT predicted 140 deleterious SNPs, PolyPhen-2 predicted 224 damaging SNPs (75 possibly damaging and 149 probably damaging), and PROVEAN predicted 146 deleterious SNPs. The 49 SNPs predicted to be deleterious by all three tools (triple positive) were then submitted to SNAP2, which predicted 43 of them to be deleterious (Table 1). These 43 quadruple-positive SNPs were submitted to SNPs&GO and PhD-SNP to further investigate their effect at the functional level. PhD-SNP predicted 37 disease-associated SNPs and SNPs&GO predicted 38; the 34 SNPs that were positive in both tools (Table 2) were submitted to I-Mutant 3.0, P-MUT and MUpro (Table 3) to investigate their effect at the structural level. All SNPs were predicted to decrease the stability of the protein, except for six SNPs predicted by I-Mutant 3.0 and one SNP (P240L) predicted by MUpro to increase stability. P-MUT predicted all 34 SNPs to be deleterious.
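The staged filtering described above amounts to intersecting the sets of variants that each tool calls deleterious. The sketch below illustrates the idea with placeholder variant labels (V1 to V6) and made-up calls; it does not reproduce the study's data, and the function name is hypothetical.

```python
def consensus(calls: dict[str, set[str]], tools: list[str]) -> set[str]:
    """Return the variants called deleterious by every tool in `tools`."""
    if not tools:
        raise ValueError("at least one tool is required")
    return set.intersection(*(calls[tool] for tool in tools))


# Placeholder calls for illustration only; these are not results from the study.
calls = {
    "SIFT": {"V1", "V2", "V3", "V5"},
    "PolyPhen-2": {"V1", "V2", "V3", "V4"},
    "PROVEAN": {"V1", "V2", "V3", "V6"},
    "SNAP2": {"V1", "V2", "V4"},
    "SNPs&GO": {"V1", "V2"},
    "PhD-SNP": {"V1", "V3"},
}

# Stage 1: variants positive in the first three tools.
triple_positive = consensus(calls, ["SIFT", "PolyPhen-2", "PROVEAN"])
# Stage 2: keep only those that SNAP2 also calls deleterious.
quadruple_positive = triple_positive & calls["SNAP2"]
# Stage 3: keep only those that both disease-association tools flag.
final_set = quadruple_positive & consensus(calls, ["SNPs&GO", "PhD-SNP"])
```

Because each stage is an intersection, the candidate set can only shrink from one stage to the next, which matches the progression from 49 to 43 to 34 SNPs reported above.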
Interestingly, GeneMANIA could not predict L1CAM gene function after the mutations. The genes that are co-expressed with L1CAM, share a similar protein domain, or participate in a similar function are illustrated by GeneMANIA and shown in Figure 2 and Tables 4 and 5.
We also used the ConSurf server. nsSNPs located at highly conserved amino acid positions tend to be more deleterious than nsSNPs located at non-conserved sites (see supplemental figure 4 for the ConSurf result, which is available at https://www.biorxiv.org/).
Several studies have reported pathogenic nsSNPs that cause L1 syndrome (11, 70-78), which is consistent with our results. L1CAM polymorphisms have also been associated with several types of human cancer (27-50, 79). Furthermore, this study confirms that the SNPs E1175K, Y784C, Y750S, D598N, C539G, G452R, G370R, P333R, W276R, C264Y, P240L and R184Q are pathogenic, in agreement with their earlier annotation in the dbSNP/NCBI database. In addition, the SNPs that were retrieved as untested (C57Y, A116G, A123T, L297V, N316S, N408K, G411R, G493R, Y682C, R755C, P816S, N825Y, N825K, N945I, R990C, L1132P, R1145C, R1145H, G1149R and K1150R) were all predicted to be pathogenic.
We also used the Mutation3D server (Figure 3). The SNPs shown in red (R184Q, P240L, C264Y, W276R, L297V and N316S) are clustered mutations. Such mutation clusters are abundant in human cancers, afflictions that can be considered cases of highly accelerated evolution within somatic tissues (82), and we think this cluster may be associated with breast cancer. Recent studies have revealed several molecular mechanisms of clustered mutagenesis (83). The SNPs shown in blue (C57Y, A116G, A123T, P333R, G370R, N408K and G411R) are covered mutations, and those shown in gray are uncovered mutations.
Many cellular functions are regulated by non-coding RNA molecules. MicroRNAs (miRNAs) are a recently discovered class of small non-coding RNAs that activate and/or suppress protein translation in the cell at the post-transcriptional level (84). In the present study, we predicted miRNA target sites in the L1CAM gene. We found sites for hsa-miR-4707-3p and hsa-miR-4763-3p in the 3′ UTR, and sites for hsa-miR-3189-5p and hsa-miR-4743-5p in the 5′ UTR. These sites were noted to be conserved among different species, which suggests a significant role in the function of the final protein. This insight gives wet-lab researchers a starting point for studying the expression pattern of the L1CAM gene and how mutation affects the binding of mRNA and miRNA.
This study revealed 26 novel pathogenic mutations that have a potential functional impact and may thus be used as diagnostic markers for L1 syndrome, and L1CAM may be an ideal target for tumor therapy (32, 80). The properties of L1CAM, in addition to its cell surface localization, make it a potentially useful diagnostic marker for cancer progression and a candidate for anti-cancer therapy (39, 81). Finally, wet-lab studies are suggested to support these findings.
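The idea of a miRNA binding site can be illustrated with canonical seed matching, in which nucleotides 2 to 8 of the miRNA pair with a complementary stretch of the transcript. The sketch below is a deliberate simplification and is not the method used by RegRNA 2 or miRmap, which also consider features such as free energy and conservation. The function names are hypothetical, and the miRNA and UTR sequences are invented for illustration; they are not the sequences of the miRNAs or of the L1CAM UTRs discussed above.

```python
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the transcript motif that pairs with the miRNA seed (nt 2-8)."""
    seed = mirna.upper().replace("T", "U")[1:8]
    # The site on the transcript is the reverse complement of the seed.
    return "".join(_COMPLEMENT[base] for base in reversed(seed))


def find_seed_sites(utr: str, mirna: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in the UTR."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    return [
        start
        for start in range(len(utr) - len(site) + 1)
        if utr[start:start + len(site)] == site
    ]


# Invented sequences for illustration only.
toy_mirna = "UACGGAUCCAGUUGACUGAAGC"
toy_utr = "CCAAGAUCCGUAAGGCUUGAUCCGUCC"
positions = find_seed_sites(toy_utr, toy_mirna)
```

A perfect seed match is only a first filter; dedicated tools rank candidate sites further before any of them is treated as a likely target.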
5. Conclusion
A large number of different pathogenic L1CAM mutations have been identified. It was therefore crucial to confirm these nsSNPs in L1 syndrome using comprehensive bioinformatics analysis. These findings describe 26 novel L1CAM mutations, which improve our understanding of genotype-phenotype correlation and may be used as diagnostic markers for L1 syndrome and in cancer diagnosis.
Conflict of interest
The authors declare no conflict of interest.
Acknowledgment
The authors wish to acknowledge the enthusiastic cooperation of Africa City of Technology, Sudan.