LIM domain-wide comprehensive mutagenesis reveals the role of leucine in CSRP3 protein stability

Cardiomyopathies are a severe and chronic cardiovascular burden worldwide, affecting a large cohort in the general population. Cysteine and glycine-rich protein 3 (CSRP3) is one of key proteins implicated in dominant dilated cardiomyopathy (DCM) and hypertrophic cardiomyopathy (HCM). In this study, we device a rapid in-silico screening protocol that creates a mutational landscape map for all possible allowed and disallowed substitutions in the protein of interest. This map provides the structural and functional insights on the stability of LIM domains of CSRP3. Further, the sequence analysis delineates the eukaryotic CSRP3 protein orthologs which complements the mutational map. Next, we also evaluated the effect of HCM/DCM mutations on these domains. One of highly destabilising mutations - L44P (also disease causing) and a neutral mutation - L44M were further subjected to molecular dynamics (MD) simulations. The results establish that L44P substitution affects the LIM domain structure. The present study provides a useful perspective to our understanding of the role of mutations in the CSRP3 LIM domains and their evolution. Experimentally verifying every reported mutation can become challenging both in time and resources used. This study provides a novel screening method for quick identification of key mutation sites for specific protein structures that can reduce the burden on experimental research.


Introduction
Cysteine and glycine-rich proteins (CSRPs) belong to the LIM-only domain family proteins.
Three proteins (CSRP1, CSRP2 and CSRP3/MLP) are members of this group characterised by the presence of two LIM domains. The LIM domains within the CRPs are separated by a 50-60 amino acid linker region. The cysteine and glycine-rich protein 3 (CSRP3), also called Muscle lim protein (MLP), is a dual role mechanosensor that shuttles between the nucleus and cytoplasm in cardiac myocytes 1,2 . It is singularly expressed in the heart and skeletal muscle 3 . It is also a scaffold protein involved in multiple protein-protein interactions within the Z-disc, including actin-binding protein α -actinin, titin-binding protein telethonin, and myogenic transcription factors like myoblast determination protein1 (MyoD) 4-6 . Mutations in the human CSRP3 gene exhibit dominant dilated cardiomyopathy (DCM) and hypertrophic cardiomyopathy (HCM) phenotypes 5,7-9 . HCM and DCM are severe and chronic diseases affecting an estimated 1:500 and 1:250 individuals in the general population 10,11 . Therefore, focused studies on the genes involved in these diseases are essential for therapeutic reasons.
The LIM domain bears a unique sequence motif containing two independent zinc fingers that function as the building block for protein interactions. This motif was first named after Lin-11, Isl-1, and Mec-3 proteins from C. elegans, rat, and C. elegans, respectively for mutations to change protein charge and cause protein de-stability. Even amino acids with comparable properties can affect stability (e.g., L44P, Y57S and V127I).   T  a  b  l  e  1  H  C  M  /  D  C  M  m  u  t  a  t  i  o  n  s  a  n  d  t  h  e  i  r  s  e  v  e  r  i  t  y  b  a  s  e  d  o  n  P  R  O  V  E  A  N  a  n  d  P  o  l  y  p  h  e  n  2  p  r  e  d  i  c  t  i  o  n  .   a   C  h  a  n  g  e  i  n  c  h  a  r  g  e  i  n  t  h  e  m  u  t  a  t  i  o  n  r  e  s  i  d  u  e  s  .  A  s  t  e  r  i  s  k  s  (  *  )  m  a  r  k  e  d  s  u  b  s  t  i  t  u  t  i  o  n  s  a  r  e  p  r  e  d  i  c  t  e  d  t  o  b  e  d  e  -s  t  a  b  i  l  i  s  i  n  g  i  n  t  h  e  s  t  a  b  i  l  i  t  y  a  n  a  l  y  s  i Correlation map analysis of LIM1 highlights differential dynamics of WT, L44M

and L44P
We carried out cross-correlation dynamics in the WT, L44M and L44P mutant trajectories.
The inter-residue cross-correlation map revealed that WT shows visible differences compared to L44P, whereas the cross-correlation map of WT and L44M bear higher resemblance (Fig.   3). Focused analysis for residues 43, 44 and 45 shows substantial variations in the crosscorrelation. It should be noted that mutation of leucine to proline at position 44 imposes a greater loss of contacts in distant regions of LIM1 domain as compared to the wildtype and L44M replacement (Fig. 3). The inter residue contact of WT, L44M and L44P were projected on two-dimensional map ( Supplementary Information Fig. S2). We observed no stark difference in the contacts of WT, L44M and L44P trajectories. These findings suggest that L44P mutational effect on residue interactions is not extreme in the vicinity of L44 but rather has long range effects in the LIM1 domain.

Loss of secondary structure in L44P mutant trajectory timeline
Since previous result point to difference in inter-residue cross correlation in L44M and L44P mutations, secondary structure analysis of WT, L44P and L44M trajectories was carried out. mutation, while is still intact in the WT throughout the 100ns timeline. In addition, there is a significant loss of isolated bridge conformation at position 43 in the L44P trajectory. Figure   4a also highlights the significance of the more acceptable substitution of L44M. In the case of the L44M, whilst the extended β -strand conformation is disrupted, the isolated bridge conformation is preserved throughout the trajectory timeline.
In LIM1 domain structure, hydrophobic and hydrogen bond interactions are affected in the L44P substitution (Fig. 4b)

CSRP3 sequence orthologs complements the mutational map
Earlier stability result indicates that LIM1 and LIM2 domain have differential mutational tolerance. We carried out sequence conservation analysis in eukaryotes to specify LIM1 and LIM2 domains in CSRP3 protein. Human CSRP3 protein orthologs sequences from 30 representative eukaryotes were downloaded from NCBI and subjected to multiple sequence alignment (MSA) for sequence conservation analysis. These eukaryotes varied from fishes, amphibians, birds to primates (Supplementary Information Table S1). Alignment of these sequences highlighted the strongly conserved nature of the CSRP3 protein (Fig. 5a). The conserved positions have also been mapped on the three-dimensional structure of LIM1 and LIM2 (Fig. 5b). Mutational map and sequence map decipher structurally critical residues in the LIM domains. The majority of the residue substituted in eukaryotes CSRP3 LIMs were neutral on the mutational landscape (see Fig. 2a analysis identified that many of the substitutions were localised to a particular taxon, therefore to assess evolution of CSRP3, we performed phylogeny analysis on the named sequences.

CSRP3 evolution follows ancestral timescale
Maximum Likelihood (ML) phylogeny construction of representative eukaryotes revealed that CSRP3 evolved according to the evolutionary timescale in most of them (Fig. 5c).
Fishes, birds, marsupials, rodents, placentals, carnivores and primates clustered separately and this signifies that CSRP3 evolved in the evolutionary timescale. However, it should be noted that the Burmese python clustered outside the reptiles' clade (Fig. 5c). This anomaly we believe may have risen due to uneven distribution of sequences across the reptile family.
In addition, we separately explored the phylogeny of LIM1 and LIM2 across these representative eukaryotes to see the closeness in these domains ( Supplementary Information   Fig. S4). In the phylogeny, LIM1 and LIM2 form two distinct cluster and the separation indicates that LIM1 and LIM2 have evolved distinctly. .

Discussion and Conclusions
Cardiomyopathies are predominant cardiovascular diseases primarily caused by mutations in sarcomeric proteins that affect muscle contraction-relaxation activity 10,11 . Hypertrophic cardiomyopathy and dilated cardiomyopathy are highly prevalent and estimated to affect 1:500 and 1:250 individuals among the general population 10,11 . Majority of the therapeutic efforts are directed towards providing only a symptomatic relief to the patients through administration of beta-and calcium blockers, blood thinners and heart rhythm drugs Mutations in CSRP3 protein are reported to cause HCM and DCM diseases 5,7-9 . CSRP3 is a scaffolding, dual role mechanosensor protein that shuttles between the nucleus and cytoplasm in cardiac myocytes 1,2 . It has been shown to interact with multiple proteins such as actinbinding proteins, titin-binding proteins and myogenic transcription factors 4-6 . Previous studies have shown that mutations in key positions affect CSRP3 protein-protein interactions and causes protein depletion mediated by proteosome action (for example: L44P, C58G and S54R/E55G) which resulted in hypertrophic cardiomyopathy in mice models 9 . Majority of the reported mutations lie in the LIM1 domain of CSRP3 protein 19,23 . Therefore, it is imperative to conduct a global study of mutational effects across the LIM domains.
Seventeen mutations from HGMD database and a published study 19 were selected and 11 of these were predicted to be deleterious. We narrowed down to 11 mutations that were found to be specific to the LIM domains. Mutational landscape mapping of these 11 mutations through FoldX analysis revealed that 2 of these were highly destabilising, 4 were slightly destabilising and 5 were neutral. Out of the two highly destabilising mutations, L44P and C58G, we focussed on structural importance of L44 position as this has been overlooked by several In conclusion, the methodology presented in this study provides a blueprint for targeting essential mutations in proteins that are key for structural stability pertaining to disease phenotypes. For example, L44P was identified as a key position for structural stability in CSRP3 involved in HCM/DCM. Such a method can be employed as a prerequisite for future research on genetic analysis of important diseases such as Huntington's disease, cystic fibrosis, and other neurodegenerative disorders.

Selection of HCM/DCM point mutations and prediction of deleterious mutations
The Human Gene Mutation Database (HGMD) 23 database and the data from an earlier study 19 were used to retrieve cardiomyopathy point mutations reported in the CSRP3 gene. The above cardiomyopathy disease-causing mutations list was checked whether a point mutation is expected to be benign or damaging. For this, we utilised the Protein Variation Effect Analyzer v1.1 (PROVEAN v1.1) and Polymorphism phenotyping 2 (Polyphen2) algorithm tools 27,28 . These methods rely on either sequence information or structural information, or both, to predict the functional impact of SNPs.

Sorting tolerant from intolerant
The PROVEAN is a generalised trained computational tool to predict the functional effects of single or multiple amino acid substitutions in protein sequences 27 . The CSRP3 protein sequence, alongside the previously obtained cardiomyopathy mutations, were uploaded to the server, using the default settings. Mutations were assigned neutral or deleterious based on their alignment score value of more or less than -2.5. PolyPhen 2 (Polymorphism Phenotyping v2) uses eight sequence-based and three structure-based predictive features to predict the impact of protein sequence variants 28 . It utilises naive Bayes posterior probability to provide accurate predictions with three possible outputs, namely -probably damaging, possibly damaging or benign. The CSRP3 protein sequence and cardiomyopathy SNPs were submitted as queries to the polyphen2 server. The outcome levels were qualitatively assigned benign, possibly damaging and probably damaging from the default output.

Structural stability of disease mutations
We

Mutational landscape of CSRP3 LIM domain
Each of the CSRP3 LIM residues were mutated to all other 19 amino acid residues (1083= 57 residues × 19 possible mutations per residue) using FoldX 4.0. FoldX was preferred to accomplish this high throughput task due to its relatively high accuracy among fast algorithms. The same FoldX procedure as mentioned above was followed in this case too including the mutations stability categorisation protocol.

Protein preparation
The first conformer of CSRP3 LIM domain from the NMR structure (PDB ID 2O10) was used for structural analysis. Mutant structures (L44P & L44M) were generated from the original structure in the Maestro package (Schrödinger Release 2020: Maestro, Schrödinger, LLC, New York, NY, 2020). All structures (WT and mutants) were minimised at pH 6.8 using PROPKA from Protein Preparation Wizard. Each structure was restrain minimised using the OPLS3e force field.

Protein solvation
Each restrain minimised structure was solvated with the TIP3P solvent system in the System builder from Desmond module of Schrodinger 30 . Orthorhombic box shape was used for boundary conditions having a buffer distance of 10 and the box volume was minimised.
The system was neutralised with either K + or CLions, and additional 150 mM KCL salt was added. OPLS3e force field 31 was assigned for the run. The output generated by the System Builder was used for Molecular Dynamics production run using the built in Molecular Dynamics package.

MD simulations
The default relaxation protocol was followed for the solvated system from the previous step.
After relaxation, production MD was executed in NPT constraint parameters using the OPLS3e force field. The default settings of RESPA integrator 32 (2 femtoseconds time step for bonded or near non bonded interactions and six femtoseconds for far non bonded interactions) were incorporated for the simulation. The nose-Hoover thermostat algorithm was used to keep the temperature at 300 K 33 . Similarly, the pressure was kept at 1 bar using method Martyna-Tobias-Klein method 34 . The production MD was simulated for 100 nanoseconds in triple replicates.

Correlation Analysis
Correlation analysis is a crucial method in MD analysis. This method can provide information about the impact of the amino acid substitution on the protein dynamics and specify which residues are involved in the structural changes and their role. Cross-correlation maps of the residues' motion were used to identify the regions moving in or out of phase in the MD trajectory (also see Supplementary information material and methods). An interresidue cross-correlation based on the number of contacts rather than molecular fluctuations was also analysed. The value of residue pair can vary from -1 (completely anticorrelated motion) to +1 (completely correlated motion).

Simulation event analysis
MD trajectories were analysed using simulation interaction diagram ( I  M  p  r  o  t  e  i  n  p  r  o  m  o  t  e  s  m  y  o  g  e  n  e  s  i  s   b  y  e  n  h  a  n  c  i  n  g  t  h  e  a  c  t  i  v  i  t  y  o  f  M  y  o  D  .   M  o  l  e  c  u  l  a  r  a  n  d  C  e  l  l  u  l  a  r  B  i  o  l  o  g  y  1  7   ,  4  7  5  0  -4  7  6  0  (  1  9  9  7  ) .  A  n  c  i  e  n  ,  F  .  ,  P  u  c  c  i  ,  F  .  ,  G  o  d  f  r  o  i  d  ,  M  .  &  R  o  o  m  a  n  ,  M  .  P  r  e  d  i  c  t  i  o  n  a  n  d  i  n  t  e  r  p  r  e  t  a  t  i  o  n  o  f  d  e  l  e  t  e  r  i  o  u  s   c  o  d  i  n  g  v  a  r  i  a  n  t  s  i  n  t  e  r  m  s  o  f  p  r  o  t  e  i  n  s  t  r  u  c  t  u  r  a  l  s  t  a  b  i  l  i  t  y  .   S  c  i  e  n  t  i  f  i  c  R  e  p  o  r  t  s  8 ,