Abstract
Background Woodhouse-Sakati Syndrome refers to a group of inherited disorders characterized by alopecia, hypogonadism, diabetes mellitus, hypothyroidism and progressive extrapyramidal signs. The aim of this study is to identify the pathogenic SNPs in the DCAF17 gene with their related mciroRNAs and their effect on the structure and function of the protein.
Material and Methods We used different bioinformatics tools to predict the effect of each SNP on the structure and function of the protein. After that we defined the miRNAs founded in the 3′-UTR region on the DCAF17 gene and studied the annotations relative to it.
Results Ten deleterious SNPs out of 339 were found to have a damaging effect on the protein structure and function, with one significant micoRNA in the 3′-UTR region.
Conclusion This was the first in silico analysis of DCAF17 gene, in which 10 novel mutations were found using different bioinformatics tools that could be used as a diagnostic markers for Woodhouse-Sakati syndrome, with one relevant microRNA that can regulate the function of the protein.
1. Introduction
Woodhouse-Sakati Syndrome (WSS, MIM: 241080) is a rare autosomal-recessive multi systemic disorder which is characterized by a combination of hypogonadism, alopecia, Diabetes Mellitus (DM), mental retardation and extrapyramidal signs.(1–10)
It was originally described in a number of consanguineous Saudi families in the Middle East, but has recently been reported in other ethnicities as well.(11, 12) Since its original description in 1983, approximately 50 cases have been reported until now.(13) Exogenous hormone therapy promoting secondary sex characteristic development and insulin are among the recommended treatments for the disease.(14, 15)
The availability of vast amounts of sequence data, coupled to advances in computational biology in the recent years provides an ideal framework for in silico gene expression analysis. Single Nucleotide Polymorphisms (SNPs) make up about 90% of DNA sequence variations, making it the most common type of genetic variation. The SNPs that are most likely to have a direct impact on the protein product of a gene are those that change the amino acid sequence and variants in gene regulatory regions, which control protein expression levels.(16-18)
The disease was first described in 2008 by al Alazami et al,(4, 11) Mutations in the DCAF17 gene are the cause of Woodhouse-Sakati syndrome.(3, 19, 20) It is located on chromosome 2q31.1.(4, 21) it encodes a nucleolar protein with poorly understood function, adding to that the pathogenic mechanism underlying WSS is also un known.(13)
The aim of this study was to identify the pathogenic SNPs in DCAF17 gene located in the coding region and analyzing the microRNA(miRNA) in the 3′-untranslated regions (3′-UTR) using in silico prediction software’s that determine the structure, function and regulation of their respective proteins. This is the first in silico analysis of it is kind in DCAF17 gene to address this matter.
2. Material and Methods
2.1. Retrieving nsSNPs
SNPs associated with DCAF17 gene were obtained from the Single Nucleotide Polymorphism database (dbSNP) in the National Center for Biotechnology Information (NCBI). (http://www.ncbi.nlm.nih.gov/snp/).
The sequence and natural variants of DCAF17 protein were obtained from the UniProt database as it considered as the most reliable and unambiguous database for protein sequences.(22) (https://www.uniprot.org/).
A total number of 339 nonsynonymous Single Nucleotide Polymorphisms (ns SNPs) were found from NCBI database, all of them were subjected to in silico analysis using ten different algorithms and softwares; SIFT, PROVEAN, PolyPhen-2, SNPs&GO, PhD-SNP, PMUT, I-mutant, Project Hope, GeneMANIA and Chimera.
2.2. Identifying the most damaging nsSNPs and disease related mutations
Five different bioinformatics tools were utilized to predict functional effects of nsSNPs obtained from dbSNP database on the protein. These algorithmic include: SIFT, PROVEAN, PolyPhen-2, SNAP2 and PhD-SNP.
The SNPs predicted deleterious by at least four softwares were considered high risk nsSNPs and investigated further.
2.2.1. SIFT Server
Phenotypic effects of amino acid substitution on protein function were predicted by using Sorting Intolerant From Tolerant (SIFT) server, which is a powerful tool used to fulfill this purpose. A list of nsSNPs (rsIDs) from NCBI’s dbSNP database was submitted as a query (original) sequence to SIFT to predict tolerated and deleterious substitutions for every position of a protein sequence. The server divides the results into “Deleterious” and “Tolerated”, nsSNPs with SIFT score ≤ 0.05 were classified as deleterious and are further analyzed to identify the damaging ones, and those > 0.05 were classified as tolerated and are not further analyzed. (Available at: http://sift.bii.a-star.edu.sg/).(23–26)
2.2.2. Provean Server
(Protein Variation Effect Analyzer) is the second software tool used. It also predicts the effect of an amino acid substitution on the biological function of a protein. It predicts the damaging effects of any type of protein sequence variations to not only single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. The results are obtained as either “Deleterious” if the prediction score was <-2.5, while score > −2.5 indicates that the variant is predicted to have a “Neutral” effect. (Available at: http://provean.jcvi.org/index.php).(27, 28)
2.2.3. Polyphen-2 Server
Polymorphism Phenotyping v2.0 (PolyPhen-2) is another online tool that predicts the possible effects of an amino acid substitution on the structure and function of the protein. The results are classified into “PROBABLY DAMAGING” that is the most disease causing with a score near to 1 (0.7-1), “POSSIBLY DAMAGING” with a less disease causing ability with a score of 0.5-0.8 and “BENIGN” which does not alter protein functions with a score closer to zero; (0-0.4). (Available at: http://genetics.bwh.harvard.edu/pph2/).(22, 29, 30)
2.2.4. SNPs & Go server
An online web server that used to ensure the disease relationship with the studied single nucleotide polymorphisms SNPs. It gives three different results based on three different analytical algorithms; Panther result, PHD-SNP result, SNPs&GO result. Each one of these results is composed of three parts, the prediction which decides whether the mutation is neutral or disease related, reliability index (RI), and disease probability (if >0.5 mutation is considered as disease causing nsSNP). (Available at: http://snps-and-go.biocomp.unibo.it/snps-and-go/).(31)
2.2.5. PMUT Server
PMUT is a powerful web-based tool used for the prediction of pathological variants on proteins. The prediction results are classified as “Neutral” or “Disease”. It is available at (http://mmb.irbbarcelona.org/PMut).(32)
2.3. Protein Structural analysis
I-Mutant:
A online web-tool that is used for the prediction of protein stability changes upon single point mutations, determining whether the mutation increases or decreases the protein stability. (Available at: http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi).(33)
2.4. Project HOPE
It is an online web-server used to analyze the structural and functional variations on protein sequence that have been resulted from single amino acid substitution. It searches protein 3D structures by collecting structural information from a series of sources, including calculations on the 3D coordinates of the protein and sequence annotations from the UniProt database. Protein sequences are submitted to project HOPE server then HOPE builds a complete report with text, figures, and animations. It is available at (http://www.cmbi.ru.nl/hope).(34)
2.5. GeneMANIA
A user-friendly web interface tool approaches to know protein function, analyzing submitted gene lists and prioritizing genes for functional assays. It provides lists with functionally similar genes that it identifies and recognizes using available genomics and proteomics data. (Available at: (http://www.genemania.org/).(35)
2.6. Modeling nsSNP locations on protein structure
Chimera:
Chimera 1.8 software is used for visualization and editing of the three dimensional structure of protein. It visualizes the original amino acid with the mutated one to see the impact that can be produced. It is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies and sequence alignments. High-quality images and animations can be generated. (Availabe at: http://www.cgl.ucsf.edu/chimera).(36)
2.7. 3′-UTR analysis
2.7.1. Ensemble 95
RNA sequence were retrieved from ensemble, which is website that makes key genomic data sets available to the entire scientific community without restrictions.(37, 38) (Available at: https://www.ensembl.org/index.html).
2.7.2. RegRNA2.0
Then the RNA sequences were inserted to RegRNA 2.0 (which is an integrated web server for identifying functional RNA motifs and sites.(39, 40) (Available at: http://regrna2.mbc.nctu.edu.tw/) to find the related microRNA (miRNA).
2.7.3. miRmap22.1
Then the miRNA sequence were further inserted in to miRmap website to predict the miRNA targets and study the repression strength using thermodynamic, probabilistic, evolutionary and sequence-based approaches.(41) (Available at: https://mirmap.ezlab.org/).
2.7.4. miRBase
After that we use mirbase website to study its annotations (miRBase website provides a wide-range of information on published microRNAs, including their sequences, their biogenesis precursors, genome coordinates, and community driven annotation).(42–44) (Available at: http://www.mirbase.org/).
Result
339 missense SNPs were retrived from National Center for Biotechnology Information (NCBI) website and submitted to SIFT, Proven, Polyphen-2, SNAP2 respectively. After analysis, 60 mutations out of 339 missense mutations were found to be deleterious in all four server. as found in table (1)
Shows the most deleterious SNPs by four softwares.
Furthermore, the above result were submitted to SNP&GO, PHD, and Pmut to study the relation between the SNPs and the disease. Out of 60 SNPs 10 were found to be related to disease. As indicated in table (2). Then I mutant was used to detect the stability of the protein and the result indicated that all the SNPs except one (rs1279189246,H478L) lead to decrease in the stability which lead to unstable amino acid interaction. As found in table (3).
Disease effect nsSNPs associated variations predicted by SNPs&GO and PHD-SNP softwares.
Stability analysis predicted by I-Mutant version 3.0.
One significant microRNA was found and investigated as shown in table (4) and (5).
miRNAs target identified in 3′-UTR region of DCAF17 gene by miRmap website.
Alignment of Query to mature miRNAs.
The gene co-expressed, share domain and interaction with DCAF17 gene network.
Diagrammatic representation of in silico work flow for DCAF17 gene in coding and 3′-UTR regions.
Interaction between DCAF17 gene and related genes using GeneMANIA.
D69V: amino acid change from Aspartic acid to Valine at position 69.
R72W: amino acid change from Arginine to Tryptophan at position 72.
V186D: amino acid Valine change to Aspartic acid at position 186.
P197L: amino acid Proline change to Leucine at position 197.
L204P: aminoacid Leucine change to Proline at position 204.
L224P: amino acid change from Leucine to Proline at position 224.
L433S: amino acid Leucine change to Serine at position 4333.
H478R: amino acid Histidine change to Arginine at position 478.
D483G: amino acid Aspartic acid change to Glycine at position 483.
F498V: amino acid Phenylalanine change to Valine at position 498.
Discussion
Ten novel mutations have been found that affect the stability, structure and the function of DCAF17 gene with one associated microRNA in the 3′-UTR region utilizing various in silico bioinformatics tools. These methods for prediction used standard tools and multiple disciplines to detect and characterize the missense variation. In order to ensure our results and increase the accuracy; eight web server in silico algorithm have been used: SIFT, Proven, Polyphen, SNAP 2, SNP&GO, PHD, Pmut and I mutant.
This gene considered to be with no function but it has relation to the disease Woodhouse-Sakati Syndrome with MIM no 241080 (from NCBI)(13, 45). Because there is no similar mutation pathogenicity prediction study to this gene; this founding considered to be novel mutation(46), (13), (47).
GeneMANIA did not indicated specific function for DCAF17 gene, but there is interaction with another genes(47), (15), (48), (14). The function of the gene is not clear until now(13).
Project HOPE server was used to predict the effect of the amino acid substitutions on the molecular level of the protein. The ten SNPs were submitted to hope server individually and the obtained result indicated a change in the size, hydrophobicity and charge of the final protein, all the SNPs were found in Ddb1- And Cul4-Associated Factor 17 IPR031620 domain region in the gene, which further confirm its pathogenicity.
All the mutant amino acids except three (R72W, V186D, P197L) were found to be smaller in size in comparison to the wild type which may lead to loss of interactions with other molecules or residues. The hydrophobicity of the wild-type and mutant residue differs, there were 6 SNPs mentioned in project hope report (D69V, R72W, V186D, L433P, H478L, D483G) that has mutant amino acid with more hydrophobicity that could lead to loss of hydrogen bonds and/or disturb correct folding. There is a difference in charge between the wild-type and mutant amino acid, with three SNPS(D69V, R72W, D483G) mentioned in the report to have different nerutral charge and one SNPs (V186D) with negative charge in the mutant amino acids which could cause a loss of interactions with other molecules or residues.
Different biological processes are controlled by transcriptional regulation done by non-coding RNA molecules (miRNA) consisting of 18–24 nucleotides in length which could activate and/or suppress protein translation inside the cells at post-transcriptional level.(49–52) it was found in gene card site (https://www.genecards.org/) that DCAF17 gene may be dysregulated by miRNA, so we analyzed the microRNA in the 3′-UTR region of DCAF17 gene using multiple softwares (ensemble, regRNA2, miRmap and mirbase). We found one relevant miRNA; (hsa-miR-4436b-3p) with PhyloP conservation score of 0.71 and after analyzing its annotation through miRmap using sequence from 5 species human,mouse,worm,fly and Arabidopsis it appeared to have a role in inhibiting adipose triglyceride lipase, carcinogenesis and other possible roles.
This study was limited as it depended in computerized software and it need further wet lab studies for confirmation.
We now appreciate the regulatory role of the 3′-UTR in the function of DCAF17 protein and with this 10 novel mutation we could develop a precise biomarker for Woodhouse-Sakati Syndrome in the future.
Conclusion
Ten novel SNPs were found to be deleterious in this paper with possible role as future biomarker for Woodhouse-Sakati syndrome, and one microRNA was found in the 3′-UTR to regulate the function of the protein.
Conflict of Interest
No conflict of interest to declare.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Acknowledgment
We appreciate the kind help and assistance we had in Africa City of Technology.