Bioinformatic analysis of human ZPR1 gene pathogenic exome mutations

Advanced sequencing technologies enable rapid detection of sequence variants, aiming to uncover the molecular foundations of human genetic disorders. The challenge lies in interpreting the influence of new exome variants that lead to diverse phenotypes. Our study introduces a detailed, multi-tiered method for assessing the impact of novel variants, particularly focusing on the zinc finger protein 1 (ZPR1) gene. Herein, we employed a combination of variant effect predictors, protein stability analyses, and the American College of Medical Genetics and Association of Molecular Pathology (ACMG/AMP) guidelines. Our structural analysis pinpoints specific amino acid residues in the ZPR1 zinc finger domains that are sensitive to changes, distinguishing between benign and disease-causing coding variants using rigorous in silico tools. We examined 223 germline ZPR1 exome variants, uncovering significant ethnic disparities in the frequency of heterozygous harmful ZPR1 variants, ranging from 0.04% in the Ashkenazi Jewish population to 0.34% in African/African Americans. Additionally, the discovery of three homozygous carriers in European and South Asian groups suggests a higher occurrence of ZPR1 variants in these demographics, meriting further exploration. This research provides insights into the prevalence and implications of amino acid substitutions in the ZPR1 protein.


Introduction
The zinc finger protein 1 (ZPR1) gene in humans produces a protein comprising 459 amino acids.This protein is recognized as isoform I and represents the primary version of ZPR1 [1].ZPR1 is a C4-type zinc finger protein and part of a family of zinc finger proteins linked to various conditions, including cancer, non-alcoholic fatty liver disease (NAFLD), type 2 diabetes mellitus (T2DM), and other genetic disorders [2].The gene's cytogenetic location is on chromosome 11 at the specific site 11q23.3.Its coordinates are Chr11: 11:116,773,799-116,788,023 on the reverse strand, according to the GRCh38.p13[3].
In 2018, Ito et al. reported a novel rare autosomal recessive disorder (RARD) observed among four children in three New Mexican Hispanic ancestral families.A proband from one of these families had an exome sequence that showed a homozygous c.587T>C (p.Ile196Thr) mutation in the ZPR1 gene [4].Further analysis revealed that this mutation was present in a heterozygous state among the parents and unaffected siblings [4].From a clinical perspective, patients with homozygous mutations exhibited a systemic syndrome marked by growth limitations both before and after birth, inborn hair loss, kidney dysfunction, developmental delays, hearing impairment, and increased early mortality.In another family, it was found that the parents of an affected individual were heterozygous carriers of the same ZPR1 mutation, and there were no other siblings with a homozygous manifestation of the condition [4].The fact that both families resided in the Rio Grande Valley area of New Mexico suggests the possibility of a founder effect mutation [4] combined with endogamous practices [5,6].
It is thought that the function of the ZPR1 gene is similar to other genes linked to primordial dwarfism, which negatively impacts cell cycle progression and cell proliferation [4,7].For example, mutations present in the ORC1, ORC4, ORC6, CDT1, and CDC6 genes alter G1/S transition and S phase progression by perturbing the pre-replication complex and predisposition to Meier-Gorlin syndrome [8,9], and mutations present in the ATR gene alter S phase progression and predisposition to Seckel syndrome type 1 disease [10,11].The missense mutation in the ZPR1 gene c.587T>C (p.Ile196Thr) has been shown to stop the cell cycle from moving past the G1 phase [4], which is similar to genes linked to primordial dwarfism.At least two studies have indicated that ZPR1 protein primarily resides in the cytoplasm of quiescent mammalian cells but is translocated into the nucleus after binding eukaryotic translation elongation factor 1A (eEF1A), a process regulated by epidermal growth factor (EGF) and disrupts the interaction with epidermal growth factor receptor (EGFR) [12,13].Another study indicated that the ZPR1 protein binds to the cytoplasmic tyrosine kinase domain of EGFR via two zinc finger structural motifs and accumulates in the nucleolus of proliferating cells [14].Collectively, these findings imply that the ZPR1 protein plays a crucial role in the nucleolar function of proliferating cells.[13].
It has been reported that decreased expression of ZPR1 in humans contributes to spinal muscular atrophy (SMA), a neurodegenerative condition.In cells deficient in ZPR1, the nuclear positioning of survival motor neurons (SMN) and the NPAT transcription factor is disturbed, leading to a blockage in S phase progression and arrests in both the G1 and G2 phases.[15].These changes in subnuclear architecture and cell cycle progression could be attributed to transcriptional defects [15].In normal proliferating cells, the ZPR1 protein spreads throughout the cell during the G1 and G2/M phases and redistributes to the nucleus during the S phase [16].The eEF1A has a conserved binding epitope required for normal cell growth, proliferation, and cell cycle progression.Based on a previous study [17], the ZPR1 protein can form a complex with the GDP-bound eEF1A to regulate the activation of eEF1A.Also, it can form a complex with the SMN protein in response to stimuli induced by EGF.The ZPR1 domains are structurally different from each other, which leads to functional divergence that changes how the ZPR1 protein interacts with those two complexes [17].Nonetheless, the truncated mutant ZPR1-ΔA (193-246) can still bind eEF1A in the absence of Zn2+ [18].To date, the interactions between ZPR1 and eEF1A2 continue to represent an incredibly critical research area concerning motor neuron biology [19].A study has indicated that the ZPR1 protein is capable of binding to RNA polymerase II to increase the expression of the SMN2 gene [20], and overexpressing the ZPR1 protein can raise the levels of SMN throughout the body and prevent the disease from starting in SMA mice [20].
Besides the previously mentioned RARD linked to ZPR1 deficiency, a prevalent mutation, rs964184, has related to abnormal glucose metabolism and Type 2 Diabetes Mellitus (T2DM) in Japanese populations [21].Another frequent mutation, rs2075294, located in the ZPR1 gene within the APOA1/C3/A4/A5-ZPR1-BUD13 gene cluster, is associated with dyslipidemia [22].Supporting these observations, a separate study [2] showed that the ZPR1 protein interacts with high-fat diets, influencing energy metabolism and neurological degeneration.Furthermore, ZPR1 expression levels in breast cancer tissues were significantly elevated compared to adjacent nontumorous tissues (P < 0.001), suggesting a potential role of ZPR1 gene mutations in promoting the spread of breast cancer through cell invasion and migration, possibly via the ERK/GSK3/Snail signaling pathway [23].Finally, a reduction in ZPR1 gene expression is linked with the development of non-small cell lung cancer (NSCLC), indicating that the ZPR1 protein might suppress proliferation and invasion of NSCLC cells by inhibiting the FAK-AKT signaling pathway [24].
Since the ZPR1 protein has a significant role in multiple human diseases, it is important to know the functional effects of the potential pathogenic mutations in the gene.We looked at the ZPR1 pathogenic exome mutations in a large-scale quantitative population genetics and ethnogeographic way in this study.Using the data from the Genome Aggregation Database (gnomAD), a large publicly accessible database of population variation derived from harmonized sequencing data, we predicted the effects of 223 rare ZPR1 exome mutations using four computational methods: SIFT, Polyphen-HVAR, FoldX, and Consurf.The obtained results have improved our understanding of the molecular basis of how the ZPR1 gene or protein, and the functional-related binding partner proteins interact and predispose to disease susceptibility.

Subjects of analysis and data availability
The exome data of the dataset GRCH37/hg19 used in this study is publicly and openly available in the Genome Aggregation Database V 2.1.1 at https://gnomad.broadinstitute.org/.The data was collected on March 12, 2021.

Bioinformatic analyses of ZPR1 gene mutations
ANNOVAR [25], a software package consisting of a series of bioinformatic tools, was used in the bioinformatic analyses.Specifically, to annotate the functions of the ZPR1 gene mutations, the script annotate_variation.plwas used with GRCH37/hg19.The gnomAD allele frequency for exome variants was used for gnomad_exome.For the impact scores for the mutations, the script table_annovar.pl was used to run two variant effect predictor (VEP) algorithms: SIFT [26] and Polyphen2_HVAR [27].To achieve a stringent classification of the pathogenic and neutral mutations, the default thresholds of the calculated scores were adjusted based on a previous study [26,27].For SIFT, a score of 0.50-1.00and a score of 0.0-0.49were set for neutral and pathogenic mutations, respectively.For polyphen2_HVAR, a score of 0.00-0.89and a score of 0.91-1.00were set for neural and pathogenic mutations, respectively.We used ten more VEP algorithms: SIFT4G [40], Polyphen2_HDIV [27], LRT [41], MutationTaster [42], MutationAssessor [43], PROVEAN [44], M-CAP [33], BayesDel [34], ClinPred [35], and FATHMM-MKL [36] to look into the effects of the pathogenic mutations found by the SIFT and Polyphen2_HVAR scores.
Scores from each of these ten VEP algorithms were calculated using the script table_annovar.pl with the dbnsfp41a keyword specification.To be aware of, this ANNOVAR analysis did not include exome mutations that gnomAD had flagged.All the encoded protein products involved in the present study correspond to isoform I. Mutations leading to a different isoform were not included.

Protein evolutionary conservation analysis
The ConSurf server (https://consurf.tau.ac.il/) [28] scores the evolutionary conservation based on an empirical Bayesian algorithm and the phylogenic relations between close homologous amino acid sequences that are calculated with the Maximum Likelihood (ML) method [29].It was used to analyze the evolutionary conservation of the ZPR1 protein.In this analysis, a conservation score of 1-9 was obtained for each residue position.The positions were classified based on the scores as fluid (1-4), intermediary (5-6), or highly conserved (7)(8)(9).Based on recommendations from the previous studies, a residue at a position that is highly conserved and exposed was proposed to be functional; a residue at a position that is highly conserved but buried was proposed to be structural.

Structural analysis of the ZPR1 protein
Iterative Threading Assembly Refinement (I-TASSER) [30] was used to model the structure of the ZPR1 protein with no specific assumptions.The modeling conducted with I-TASSER is based on a fold recognition that meta-threads the amino acid sequence of ZPR1 against the PDB (Protein Data Bank) library to identify the homolog template(s).The major threading template I-TASSER identified in this modeling is Mus musculus ZPR1 (PDBID: 2qkdA) [31].The structural stability of the ZPR1 protein resulting from the gene mutations was predicted by calculating the free energy change (ΔΔG) of folding using the software FoldX 5.0 [49].

Statistical analysis
A Chi-square test using one-sided P-values was performed to determine the significant differences between variant frequencies and gender distributions.A resultant value less than 0.05 was considered to indicate a significant difference.

Compliance with ethical standards
Written informed consent was obtained from each participant before participation, in accordance with the Declaration of Helsinki.
The dataset, excluding the 'other' category, totals 122,678 individuals from six specific ethnic groups, providing a rich resource for population-genetic and disease-oriented research.Within this dataset, 223 rare exome mutations in the ZPR1 gene (ZEMs) were identified across these six groups, showing a minor allele frequency (MAF) ranging from 3.98e-08 to 0.0658.Notably, only two of these mutations, A419V (rs144966144) and I196T (rs368697578), are listed in the ClinVar database [32].These rare ZEMs were found in 344 individuals as heterozygous carriers and 5 as homozygous.
We analyzed the 223 rare ZEMs by calculating their SIFT scores and PolyPhen-2_HVAR scores.
According to our findings (see Supplementary Information 1), 60 of these mutations were identified as pathogenic.Notably, the I196T mutation (rs368697578) that was first found in ClinVar was correctly identified as harmful by both SIFT and PolyPhen-2_HVAR.The A419V mutation (rs144966144), which was also found in ClinVar annotated as benign and in conflict with these results.The results also indicated that the PolyPhen-2_HVAR score of 0.729, which means it might be harmful, but a SIFT score of 0.46 meant it was probably tolerated.As a result, PolyPhen-2_HVAR correctly identified this mutation while SIFT did not.Beyond these, both SIFT and PolyPhen-2 categorized 119 mutations as tolerated.The remaining 44 rare ZEMs could not be conclusively classified and therefore excluded from further analysis.
To discern between pathogenic and tolerated mutations in the ZPR1 gene, we employed ten additional algorithms, with the findings detailed in Supplementary Information (SI) sections 3 and 4. Figure 1 in our report illustrates the location of each amino acid in the ZPR1 protein, including the N-disordered region (amino acids 1-32), ZnF1 (51-83), and ZnF2 (259-291) regions.The protein is further divided into two domains: the A-domain (amino acids 100-234) and the Bdomain (amino acids 308-438), as previously outlined [4].Notably, a number of pathogenic ZPR1 exome mutations (ZEMs) (25 out of 60) are located within the A-domain, which is known for its interaction with eEF1A.The B-domain contains the second highest number of pathogenic ZEMs (15 out of 60) (SI 8).The mutation I196T (rs368697578), associated with a rare ZPR1 disorder [4], is situated in the A-domain.In terms of pathogenic ZEMs distribution, the highest count is in the region encoding the A-domain (236 out of 349 individuals), followed by mutations in the Bdomain (26 individuals), the ZnF1 and ZnF2 domains (16 individuals), and other areas (16 individuals), as elaborated in SI 8.
In summary, in this analysis, the gnomAD data containing the information from 122,678 individuals was sampled.223 ZEMs were found in 349 individuals.60 of these rare ZEMs were classified as pathogenic, and 119 were classified as tolerated.In another prediction, the effects of the classified mutations were made clearer.Looking at all those who were sampled, 90% of the pathogenic ZEMs (54/60) are found in 44/122,678 of the population, while 10% (6/60) are found in 227/122,678.

Ethnic differences in the prevalence of ZPR1 pathogenic mutations
The prevalence of heterozygous and homozygous individuals carrying the identified ZPR1 pathogenic mutations are shown in Table 1 by ethnic groups.Also, the data for the neutral mutations is shown in SI (4).
In summary, out of the 60 pathogenic mutations identified, 28 were exclusive to Europeans, three to Latino/Americans, eight to South Asians, and seven to African/African Americans, while none were specific to Ashkenazi Jewish or East Asian populations (as shown in Table 1).Additionally, 46 of these pathogenic mutations displayed an ethnic-specific pattern and found in 67 heterozygous carriers and 2 homozygous carriers.Notably, both of the homozygous carriers were of South Asian descent.

Evolutionary conservation analysis of the ZPR1 protein
Mutations that destabilize proteins often result in degradation by the proteasome, leading to reduced protein levels in cells [45,46], and can also cause misfolding and/or aggregation [48].Furthermore, disturbances in protein stability may act as a contributing factor in diseases linked to genes with haploinsufficiency [47].The relationship between the severity of diseases and structurally destabilizing mutations in genetic or monogenic disorders has been substantiated by Scheller et al. [46] and Yue et al. [45].To investigate to what extent the 179, including 60 pathogenic and 119 tolerated, rare ZEMs, affect stability of the protein structure, we calculated the difference in free energy of folding (ΔΔG in kcal/mol) between a wild-type ZPR1 residue and the corresponding mutant, i.e., ΔΔG = ΔGmut -ΔGwt, using FoldX [49], an algorithm that evaluates the effects of mutations using an empirical force field where the side chains are allowed to move but not the backbone.A positive ΔΔG indicates the destabilization effect, while a negative indicates stabilization.The experimental error twilight zone is ΔΔG = ± 0.5 kcal/mol, so mutations with the calculated ΔΔG = ± 0.5 kcal/mol are defined as twilight mutations [50].The same metric was adopted in our test.Also, mutations with ΔΔG up to 1 kcal/mol have been described as "mild destabilizers" [50].This metric was adopted too.Mutants with ΔΔG > 2 kcal/mol were found to make the folded state less stable, which could not be told apart from mutants that cause disease [51].In our test, a mutation is considered significant if ΔΔG > 1.5 kcal/mol.Residues with ΔΔG > 1.5 kcal/mol were identified and shown in Figure 3.
Based on recommendations from the previous studies [52], a residue at a position that is highly conserved and exposed was proposed to be functional; a residue at a position that is highly conserved but buried was proposed to be structural.In the ZPR1 protein, residue at position 196 is buried in the sixth beta-strand of the A-domain that occupies the hydrophobic core [4].The hydrophobic core network consists of residues I196, L149, L194, A123, V117, Y103, and V205 [4].A mutation changing this residue will disrupt this network [4].In our test, a ΔΔG of 2.3 kcal/mol, indicating a disease-causing mutant, was obtained for mutation p.IIe196Thr.Using ConSurf [28], we confirmed that the residue at the position of 196 is buried and structurally important (SI (7)).A previous study demonstrated that the mutation p.Ile196Thr impedes cell cycle progression beyond the G1 phase [4].In comparison to control fibroblast cells, hardly any cells from affected patients managed to progress through the cell cycle, and only a minor fraction of the patient's cells was observed in the late S phase and G2/M phases [4].This residue is found in the ZnF1-A domain of the ZPR1 protein.This domain is particularly important for binding eEF1A, which is needed for normal cell growth, proliferation, and cycle progression.Therefore, the mutation p.Ile196Thr might play a significant role in the interaction with eEF1A [14].
In the case of a monomeric protein, FoldX predicts its overall stability by considering a range of energy contributions.These include backbone and side chain hydrogen bonds, van der Waals and electrostatic interactions, penalties for burying polar and hydrophobic groups, inter-residue van der Waals clashes, entropic costs for fixing side chains and the main chain, the energy cost of cis peptide bonds, intra-residue van der Waals torsional clashes, electrostatics from helix dipoles, water bridges, disulfide bonds, partial covalent bonds (such as metal interactions), ionization energy, and the count of residues [49].The side chain atoms of the mutated residue play a notable role in ΔΔG calculation [49].Utilizing Python 3.10.2[53], we analyzed the Pearson correlation between ΔΔG and various FoldX energy terms.We discovered a moderate correlation with torsional clashes (from intra-residue van der Waals torsional clashes; Pearson correlation 0.49); and a strong correlation with van der Waals clashes (Pearson correlation 0.86).In terms of solvation polarity, the correlation was good (0.41).Our data indicated that torsional and van der Waals clashes had more influence on ΔΔG than other factors.This observation, where one or two energy terms provide a clearer trend than the total energy, aligns with findings from a previous study on laccase mutants [54].Overall, our FoldX results are consistent with prior experimental findings for ZPR1, except for those related to the disordered region [49].This accounts for the high ΔΔG value (20.2 kcal/mol) for the A21V mutation, which occurs in the disordered region.

Discussion
This study analyzed 223 rare ZEMs, all with low minor allele frequencies (MAF) ranging from 3.98e-8 to 0.0658.A major portion (94.1%) had MAFs between 3.98e-06 and 9.55e-05.Despite previous suggestions [55] that pathogenicity is not always tied to extremely low frequencies, the mutation p.Ala264Val (rs35120633) associated with plasma triglyceride levels [36] and identified as pathogenic in our study had the highest MAF at 0.0658.Ethnic disparities were evident among the 60 pathogenic ZEMs identified, with over 80% found in the African/African American population (SI 3).Three homozygous carriers of rare ZEMs were detected, two in South Asians and one in a European, possibly indicating late-onset or non-fully penetrant diseases [37].
Interestingly, the ZPR1 gene was not mentioned in ORPHANET [56], the portal for rare diseases and orphan drugs, as of January 10, 2022.
Our analysis examined the prevalence of ZPR1 mutations across various ethnic groups using data from gnomAD, the largest exome and genome sequencing database available.This approach minimized bias [38] and provided significant statistical results [39], though it lacked representation from Arab, Pacific Island, and Native Australian groups [37].The study underlines the need for further clinical and in vitro functional characterizations to confirm the pathogenic nature of these mutations.
We assessed the impact of amino acid substitutions from the 223 rare ZEMs on protein stability, showing their positions in the ZPR1 protein domains (Fig. 1) and providing a basis for future functional analysis.Using FoldX software, we calculated the ΔΔG for these mutations.Although FoldX is highly regarded for identifying pathogenic mutations and efficient in performance [51], caution is advised in interpreting its results, especially in disordered regions.Our results showed that more than half of the ACMG/ACP variants changed the structure of the protein.This shows that predicted structural changes and VEP algorithms are similar.
This study meets the ACMG/AMP evidence standards by putting all exome variants in ZPR1 from the gnomAD database into two groups: those that cause disease and those that don't.We used SIFT and PolyPhen-2-HVAR consensus to find 60 pathogenic, 119 neutral, and 44 variants of unknown significance (VUS).The increased use of multiple VEPs, however, showed limitations in reducing VUS, particularly for variants not previously associated with disease [53].The frequency of ZPR1 exome variants challenges the traditional understanding of genetic causality based on MAF.We saw a strong link between ΔΔG and steric clashes.It is important to note that variants with a ΔΔG of at least ±1.5 kcal/mol require more research (SI 6).The study highlights the importance of genetic analysis before marriage as a preventive measure against monogenic disorders and ARDrelated burdens.Our data can be a valuable resource for population genetic screening programs, especially in studying autosomal recessive disorders that are typically under-researched.

Fig 1 .
Fig 1. ZPR1 protein model generated with the software PROTTER [57].This model indicates the locations of ancestral amino acids, including the location of the pathogenic ZEMs.

Fig 2 .Fig 3 .
Fig 2. The distribution of pathogenic variants among different ethnic groups in at least five individuals.

Table 2 .
Ethnic differences and global prevalence of neutral ZPR1 mutations.