In silico screening of TMPRSS2 SNPs that affect its binding with SARS-CoV2 spike protein and directly involved in the interaction affinity changes

In this paper, we used in silico analysis to shed light on the possible interaction between TMPRSS2 and SARS-CoV2 spike (S) protein by examining the role of TMPRSS2 single nucleotide polymorphisms (SNPs) in relation with susceptibility and inter-individual variability of SARS-CoV2 infection. First, we used molecular docking of human TMPRSS2 protein to predict the binding site of TMPRSS2, especially the TMPRSS2 link loops, in order to assess the effect TMPRSS2 SNPs. The latter lead to missense variants on the interaction between TMPRSS2 and SARS-CoV2 S protein. In a second step, we further refine our analysis by performing a structure-function analysis of the complexes using PyMol software, and finally by MD simulations to validate the as-obtained results. Our findings show that 17 SNPs among the 692 natural TMPRSS2 coding variants are in positions to influence the binding of TMPRSS2 with the viral S protein. All of them give more important interaction energy as assessed by docking. Among the 17 SNPs, four missense variants E389A, K392Q, T393S and Q438E lead to “directly increasing” the interaction affinity and 2 missense variants R470I and Y416C cause it “directly decreasing”. The R470I and Y416C present in African and American population, respectively. While the other 4 SNP variants (E389A; K392Q; T393S and Q438E) are present only in the European population, which could link the viral infection susceptibility to demographic, geographic and genetic factors.


INTRODUCTION 50
The pathogenesis of the coronavirus disease (COVID-19) is triggered by the entry of 51 SARS-CoV2 via the spike protein into angiotensin-converting enzyme 2 (ACE2)-bearing 52 host cells, especially pneumocytes, resulting in overactivation of the immune system 53 (cytokine storm), which attacks the infected cells and damages the lung tissue (Hakmi 54 et al.,2020). Cell entry of the betacoronaviruses, depends on the binding of the surface 55 unit, S1, of the viral spike protein to ACE2 receptor, which facilitates viral attachment to 56 the surface of the target cells. Moreover, to fuse membranes, the S protein needs to be 57 proteolytically activated at the S1/S2 boundary, such that S1 dissociates and S2 undergoes 58 a radical structural modification, therefore, viral entry requires not only the binding to 59 the ACE2 receptor but also the priming of the viral S protein by the transmembrane 60 protease serine 2 (TMPRSS2), which cleaves the S proteins at the S1/S2 and S2 sites 61 Most recently, the ongoing COVID-19 pandemic has created the hypothesis that inter-80 individual genetic differences may affect the spatial transmission dynamics of SARS-81 is to detect the TMPRSS2 polymorphisms affecting binding interfaces, and which are 109 directly associated with the increase or decrease of the interaction affinity with the S1/S2 domain of the spike protein, which can be considered as protective or susceptibility 111 variants to SARS CoV-2 infection. 112

Protein molecular modelling 133
When this study was started, the crystal structure of human TMPRSS2 has not been filed 134 in the Protein Data Bank (PDB), therefore, we modelled the structure of human 135 TMPRSS2 employing I-TASSER (Iterative Threading Assembly Refinement), which is a 136 strong predictor of protein 3D structure, aiming to determine by computational 137 calculations the spatial location of every atom in a protein molecule from the amino acid 138 sequence (Zhang, 2008). In April 2021, the crystal structure of human TMPRSS2 in 139 complex with Nefamostat has been deposited in the Protein Data Bank (code PDB: 7MEQ), we compared our structure to the one recently deposited in PDB in order to 141 verify the quality and reliability of our model using PyMOL software. 142

Identification of TMPRSS2 binding interfaces, selection and characterization of 143
SNPs 144 Although there is not enough information about the active site and the catalytic site of 145 TMPRSS2, by running a protease conserved domain (CD), TMPRSS2 was analyzed, and 146 all its link loops residues were predicted with PyMOL. Following the identification of 147 the binding interfaces, we selected only the variants located at the level of these 148 connecting loops from the list of missense and damaging TMPRSS2 SNPs already 149 predicted and extracted from databases. dbSNP is a database of genetic variants 150 implemented at the National Center for Biotechnology Information "NCBI" and 151 GnomaAD database were exploited to characterize the selected SNPs (population and 152 allelic frequency). 153

Homology modelling of selected TMPRSS2 SNPs affecting binding interfaces 154
To explore the structural changes in the protein encoded by different alleles of 155 TMPRSS2, molecular models of all the selected protein variants were developed and 156 superimposed over the structurally resolved template of wild-type TMPRSS2 using 157 SWISS-MODEL, which allows a fully automated protein structure homology modelling. 158 The FASTA sequence of TMPRSS2 was obtained from the UniProt protein knowledge 159 database (UniProt Id O15393, corresponding to 492 amino acid transcript). The sequence 160 of each TMPRSS2 variant is generated at the base of the wild-type sequence by a simple 161 substitution of the amino acid coding for the missense mutation. 162

Molecular docking 163
AutoDock Vina was used to carry out the molecular docking between S1/S2 domain of 164 SARS-CoV2 spike protein and TMPRSS2 wild type or missense variants. In our analysis 165 we used, as a receptor, the TMPRSS2 wild type or missense variants, and, as a ligand, the 166 S1/S2 domain of SARS-CoV2 spike protein model (Code PDB :6ZB4) downloaded from 167 RCSB-PDB database. To obtain the optimal docking, the interactions of the wild-type 168 receptor and variants with the partner were simulated using different parameters therefore receptor and ligand we used a grid size was set to 80× 80 × 80 points with a 170 spacing of 1 Å. 171

Structure analysis of TMPRSS2 variants and SARS-CoV2 spike protein complexes 172
To further understand the effect of polymorphisms on receptor recognition by the S1/S2 173 domain of SARS-CoV2 a structural analysis was performed by PyMOL. This is an 174 approach combined with the molecular docking output files to analyze the interactions 175 between the ligand and its receptor. We evaluate the complexes obtained by docking to 176 monitor intermolecular hydrogen bonds, electrostatic, and hydrophobic interactions 177 between SARS-CoV2 S protein and TMPRSS2 missense variants compared to the wild 178 type. Representation of TMPRSS2 protein binding loops. 245

Polymorphisms of the TMPRSS2 gene related to the protein binding region to 246 the viral particle and geographical distribution of this TMPRSS2 SNPs 247
The simplest way that missense variation could impact SARS-CoV-2 infection would be 248 by altering the TMPRSS2-S interface. TMPRSS2 missense variants located at residues 249 that bind the S protein are most likely to have such effects.

3D structures of the selected variants modelled by SWISS-MODEL 260
Structurally, all TMPRSS2 variants bear the characteristic domains of TMPRSS2 wild 261 type. The overall protein architecture of TMPRSS2 allelic variants is largely conserved. 262 The 3D structure of the 17 TMPRSS2 SNPs presents a significant similarity with the 3D 263 structure of the wild type and the Ramachandran analysis of the different analogues 264 proves that all the amino acids of the models are found in the favorable regions. 265

Missense variants impact on binding of TMPRSS2 receptor to the viral S protein 281
Firstly, the interaction of wild type TMPRSS2 and the selected missense variants with 282 the 3D structure of spike monomer protein were simulated using Autodock Vina. 283 Additionally, both hydrogen, electrostatic bonds and hydrophobic bonds within 284 different TMPRSS2-spike protein complexes were assessed using PyMOL. We classified 285 the polymorphisms into two categories. The first one includes mutations that directly 286 increase the interaction affinity within the complex, these variants increase the number 287 of electrostatic interactions or decrease the distance of interaction between the receptor 288 and ligand residues. We term these mutations as "directly increasing". The second one 289 includes mutations that directly decrease the interaction affinity of the complex by 290 decreasing the number of electrostatic interactions or increasing the distance of 291 interaction between the residues of receptor and its ligand, we term these mutations as 292 "directly decreasing". All these mutants were found to affect the residues in the 293 TMPRSS2/S protein binding interface and in direct contact with the virus domain S1/S2 294 residues. 295 Therefore, this structure analysis allowed us to identify four missense variants E389A, 296 K392Q, T393S and Q438E "directly increasing" the interaction affinity and 2 missense 297 variants R470I and Y416C "directly decreasing" it. 298 299 Figure 3. Secondary structure of the catalytic domain of TMPRSS2 protein (shown in 300 blue) and the domain S1/S2 of the SARS-CoV2 (shown in orange). The mutated amino 301 acids that "directly increasing" are colored in magenta and those "directly decreasing" 302 are presented as green spheres. 303

MD study 304
The complexes of the binding site (255-492) of TMPRSS2 and missense variants with the 305 spike protein were subject of MD simulation studies over a period of 120 ns to 306 understand their stability and study the structural consequences of these substitutions. 307 To determine the stability and mechanistic aspects of the wild type and mutant 308 complexes, hydrogen bond interactions, RMSD, RMSF, Rg and their binding profiles 309 were analyzed and discussed below. 310

Analysis of RMSD 311
The RMSD is usually used to measure the protein drift from a reference structure, to 312 study the residue behavior of the protein during the simulations and to describe the 313 dynamic stability of systems as it measures the global fluctuations of proteins or 314 complexes. It reflected the mobility of an atom during the MD simulation trajectory. As 315 a result, a higher residue RMSD value suggests higher mobility; inversely, a lower residue 316 RMSD value suggests lower mobility. Therefore, the RMSD analysis was carried out for 317 the MD simulations of each system to determine the change in the overall stability of 318 the protein after mutation, more specifically to understand the effect of TMPRSS2 319 missense variants on the stability of complexes. In addition, we compared the RMSD of 320 the wild type and mutants TMPRSS2/S protein complexes to the free forms of receptor 321 ( Figure S1) and the wild type complex to mutants TMPRSS2/S protein complexes (Figure 322 complexes. Data are displayed in Figure S4. 343

Dynamics of hydrogen bonds 344
Analysis of the hydrogen bonds (HB) during ligand binding is another important factor 345 that influences protein stability, it has a significant role to strengthen protein-protein 346 interactions. To elucidate how the mutations affect the TMPRSS2 and viral protein 347 interaction at molecular level, the dynamics of hydrogen bonds of each system is 348 displayed in Figure S5   suggested that TMPRSS2 DNA polymorphisms were likely to be associated with 382 susceptibility to COVID-19 and would contribute to differences in SARS-CoV2 infection. 383 In the other hand, recent studies showed that ACE2 genetic variation is very rare in the obtained. After modelling the TMPRSS2 3D structure using I-TASSER, we have 397 compared the catalytic domain of our structure to the one recently deposited in PDB 398 (code PDB: 7MEQ), which shows a strong similarity with an RMSD value equal to 0.705 399 Å, then we predicted the binding site with the S1/S2 domain of the S protein. In a further 400 step, we focused only on the missense variants whose spatial position is at the level of 401 the binding site in order to identify those that are able to modify the interaction affinity 402 in a direct way with the S protein. 403 In the present study, we performed an in-silico analysis of the SNP variants localized at 404 the binding loops. Those SNPs can be directly involved in the alteration of interaction 405 affinity based on molecular docking to obtain the complexes of TMPRSS2. We also 406 selected missense variants with the viral protein to predict the interaction affinity 407 between the two partners, followed by a structure function analysis to identify the key 408 bonds of interaction. In a last step, we carried out a MD study for the wild-type complex 409 and variants that have the potential to alter the interaction affinity between TMPRSS2 and the S protein with aiming to validate the previous results and identify missense