Implicit model to capture electrostatic features of membrane environment

Membrane protein structure prediction and design are challenging due to the complexity of capturing the interactions in the lipid layer, such as those arising from electrostatics. Accurately capturing electrostatic energies in the low-dielectric membrane often requires expensive Poisson-Boltzmann calculations that are not scalable for membrane protein structure prediction and design. In this work, we have developed a fast-to-compute implicit energy function that considers the realistic characteristics of different lipid bilayers, making design calculations tractable. This method captures the impact of the lipid head group using a mean-field-based approach and uses a depth-dependent dielectric constant to characterize the membrane environment. This energy function Franklin2023 (F23) is built upon Franklin2019 (F19), which is based on experimentally derived hydrophobicity scales in the membrane bilayer. We evaluated the performance of F23 on five different tests probing (1) protein orientation in the bilayer, (2) stability, and (3) sequence recovery. Relative to F19, F23 has improved the calculation of the tilt angle of membrane proteins for 90% of WALP peptides, 15% of TM-peptides, and 25% of the adsorbed peptides. The performances for stability and design tests were equivalent for F19 and F23. The speed and calibration of the implicit model will help F23 access biophysical phenomena at long time and length scales and accelerate the membrane protein design pipeline.


Introduction
Electrostatic interactions are essential in membrane protein (MP) stability, structure, 15 and function. Such interactions determine the way macromolecules interact with 16 membranes: for example, adsorption and internalization of small peptides such as 17 melittin and indolocidin are dependent on the lipid surface charge; 1 signaling and 18 ion-transport through ion channels and active transporters are dependent on the 19 electrostatic potential across the cellular membrane; 2 and association of many 20 peripheral MPs involves electrostatic interactions with the membrane. 3,4 Particularly, 21 the lipid head groups can be charged, and the hydrophobic core of the bilayer can 22 enhance interactions between charged particles giving rise to biophysical features that 23 are less frequently observed in the aqueous phase. 5 Hydrophobicity scales the 24 propensity of a given residue to be present in the hydrocarbon region. 6,7 However, some 25 show unusual behavior: for example, residues Arg and Lys snorkel out of the membrane 26 interior to minimize the cost of inserting charged groups into the membrane; 8-10 27 aromatic residues Trp and Tyr mostly interact with the head-group region, but Phe is 28 present in both membrane core as well as the head-group region. 11 Additionally, the 29 non-polar bilayer increases the strength of traditional hydrogen bonds. 5,12 Despite the 30 importance of membrane electrostatic interactions, they are challenging to capture 31 computationally, primarily due to the high computational cost incurred in sampling 32 solvent and solute degrees of freedom. 33 One way for electrostatic calculations to overcome solute and solvent degrees of 34 freedom is to treat the solvent as a continuous medium and ignore solvent-solvent 35 interactions. A common approach in biological systems is solving the 36 Poisson-Boltzmann (PB) equation with finite-difference methods representing lipids 37 both explicitly and as continuum slabs; however, computational costs have been a 38 bottleneck. [13][14][15][16] To overcome the cost, the generalized Born (GB) approximation 39 represents the membrane environment through a switching function or heterogeneous 40 dielectric slab. [17][18][19] The calculated GB energy includes a screening radius which in turn 41 depends on its local environment and surrounding neighbors, however, it is still of residues in the phospholipid bilayer based on the Moon and Fleming (MF) scale. 33, 34 61 However, F19 omitted the electrostatic interactions of the membrane environment, 62 leading to errors in benchmarking tests. For example, in a test to calculate the tilt angle 63 of adsorbed peptide cecropin A(1 − 8)-magainin 2(1 − 12) (PDB:1f0d), F19 oriented it 64 at 56 • with the membrane normal with its N-terminus Lys residues facing the aqueous 65 phase. However, solution NMR found the tilt angle of the peptide at 90 • , which allowed 66 favorable attractive electrostatic interactions between the cationic Lys residues and the 67 anionic phosphate lipid head groups of DLPC. A second issue in F19 energy function 68 was an underestimated water-to-bilayer transfer energy ∆G w,l for Asp and Glu at 69 neutral pH. The transfer energies for Asp and Glu derived from the MF hydrophobicity 70 scale were measured at a pH of 3.8. 34 Since this pH is close to the nominal pKa values 71 of Asp and Glu side chains, the experimental transfer energy was influenced by the 72 presence of protonated Asp and Glu (D 0 , E 0 ) residues, which are less polar than the 73 deprotonated residues (D −1 , E −1 ) at neutral pH. In this paper, our objective is to add 74 the missing electrostatic features of the membrane to F19 energy function and test 75 whether the new energy function, named Franklin2023 (F23), better captures features of 76 MPs. 77 We explore three modifications to F19. First, we develop a fast-to-compute 78 Coulomb-based electrostatics method that includes a low dielectric constant of the lipid 79 bilayer. Second, we use a mean field-based calculation for the electrostatic potential due 80 to the lipid membrane and water by solving Poisson's equation using a fast Fourier 81 transform (FFT) based solver while avoiding any assumption or discrepancies due to 82 ionic strength incurred in PB or its linearized version. 35,36 Third, we used a modified 83 value of ∆G w,l for Asp and Glu residues at neutral pH.  Relative to F19, F23 improves the calculation of the tilt angle of membrane proteins for 94 90% of WALP peptides of different lengths, 15% of transmembrane (TM)-peptides, and 95 25% of the adsorbed peptides. The performances for stability and design tests are 96 equivalent to F19 and F23. Finally, we conclude the paper with perspectives on further 97 improvements for implicit membrane energy functions.

99
Implicit model development 100 Electrostatic energy term based on lipid type 101 We calculate the electrostatic effect of lipid bilayers based on all-atom molecular 102 dynamics (MD) simulation of lipid molecules and solvents alone. We derive the 103 electrostatic potential Ψ(x, y, z) by solving Poisson's equation using a 3D-grid based 104 FFT solver: where ρ w , ρ l and ρ s is the position-dependent charge densities of water, lipid, and salt averaged over MD trajectories, 38 and ε = 8.854 × 10 −12 CV −1 m −1 is the dielectric constant of vacuum. To extract the depth-dependent Ψ, we use non-linear regression to fit the piece-wise functionΨ w as shown in Figure 1a. We derived the fitting parameters for all simulated lipid bilayers as: where z c is the maximum of the critical points for the functionΨ w for |z| < z c . The 106 reference state of Ψ = 0 in water is set at depth z ≈ 40Å and the potential difference 107 ∆Ψ is obtained (as shown by the dashed lines in Figure 1a). We compare the 108 electrostatic potential due to salt and no salt in Figure 1b. As expected, the potential 109 difference is found to be lower due to the presence of salt. The electrostatic energy due 110 to the protein is then: where the sums are over all residues and all atoms in each residue, q r,a is the charge of a 112 protein atom, and Ψ(z) is the electric potential of the empty water-bilayer system as a 113 function of membrane depth. Ψ(z) is an average of Ψ(x, y, z) in the other directions. Inspired by IMM1 in F23, for structure prediction and design, we use a membrane 120 hydration f hyd dependent dielectric constant. Membrane hydration (f hyd ) is a proxy of 121 the membrane depth and is capable of capturing water cavity effects. 38 In the center of 122 the membrane, when an atom is exposed to lipids, f hyd = 0; whereas when an atom is 123 fully exposed to water, f hyd = 1.0. When an atom is in the membrane yet faces a water 124 cavity, 0.0 < f hyd < 1.0. For an atom pair, we define the hydration fraction f hyd,ij as 125 the geometric mean of f hyd,i and f hyd,j . Rosetta uses a distance-dependent dielectric 126 constant for soluble proteins with sigmoidal function ε sol (r ij ) to transition between the 127 protein core (ε core = 6) and surface (ε surface = 80). 40 To define the distance-dependent 128 dielectric constant within the membrane, we use the same sigmoidal function ε memb (r ij ) 129 to transition between protein core (ε core = 3) and surface (ε surface = 10) as summarized 130 in Figure 2a. 41,42 To combine the effect of distance dependence and membrane 131 hydration, we use a linear mixture equation: Figure 2b shows ε(r ij , f hyd,ij ) as a function of f hyd,ij and atom pair distance r ij . Based 133 on the modified dielectric constant, the Coulomb energy is: where q i , q j are partial charges of protein atoms, C 0 is Coulomb's constant 135 (322Å kcal/mol e -2 ), where e is the elementary charge and the electrostatic energy is 136 truncated at r max = 5.5Å. 33 To isolate the excess electrostatic energy due to the 137 membrane, in this energy term, we subtract the electrostatic energy from the solution. 138 In addition, nullifying the electrostatic energy ensures the reference energy of the 139 unfolded protein in the soluble region remains unaffected. 43 The modified energy is 140 given as: higher slope, which is based on side-chain water-to-octanol transfer energies (as seen in 149 Figure 3a). 34 where Ref2015 is Rosetta's standard energy function for soluble proteins, 40 ∆G w,l add 166 the transfer energy from water to bilayer, 38 and ∆G lipid and ∆E dielectric elec,ij are the 167 energies due to the lipid head group and the membrane dielectric constant. We can 168 recover the energy function F19 by setting w w,l = 0.5, w lipid = 0 and w dielec = 0. We 169 solved the simultaneous equations for the three unknown positive weights to maximize 170 the correlation coefficient for the experimentally measured ∆∆G mut 34, 45 and obtained 171 w w,l = 1, w lipid = 0.128 and w dielec = 0.01. Matrix rows for Gly, Ala, Pro, Asp, and Glu 172 residues were excluded to avoid overfitting.  Benchmark performance of F23 195 We evaluate the performance of F23 using five benchmark tests. The tests are designed 196 to evaluate an energy function's ability to replicate measured membrane protein 197 stability and perform structure prediction and design. We compare the performance of 198 F23 to two other existing implicit models: (1) F19 and (2) M12, which is based on 199 IMM1. 33,51 We chose these models for their low computational cost which allows 200 evaluation with structure prediction and design tests. Our second test also concerns the tilt angle for peptides. However, unlike Test 1, which 255 explored related families in a membrane, these peptides are of diverse sequences and in 256 different lipid environments, with some that adsorb on the surface of the membrane and 257 some that pass through it. The objective here is to see if the lowest energy orientation 258 of the membrane peptides identified by the energy function corresponds to the native 259 orientation. This test is the cornerstone of our benchmark because it was used for the 260 validation of early implicit membrane models. 37,56 The tilt angles of TM-peptides were 261 measured previously using solid-state NMR experiments using different lipid   (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 27, 2023. ; https://doi.org/10.1101/2023.06.26.546486 doi: bioRxiv preprint magainin-cecropin hybrid (pdb:1f0d) and novispirin G10 (pdb:1hu6) by 21 • and 16 • 288 respectively. 289 We compare the F23 per-residue energy of the magainin-cecropin hybrid (pdb:1f0d) 290 at configurations predicted by F19, F23, and the native state (Figure 6b). F23 penalizes 291 Lys residues at the tilt angle calculated by F19 and allows favorable attractive  Our third test evaluates how a change in the sequence of a membrane protein affects its 307 overall thermostability. ∆∆G mutation is the change in Gibbs free energy of folding from 308 water to the lipid bilayer of each mutant relative to the native sequence.
To design MPs, ∆∆G mutation is crucial in evaluating the effect of genetic mutations on 310 protein functions. To evaluate ∆∆G mutation , we use a Rosetta fixed-backbone and 311 fixed-orientation protocol. 38 We use two sets of ∆∆G mutation measurements from the  45 The experimental 316 uncertainty of the measured ∆∆G mutation values is ±0.6 kcal mol −1 .
317 Figure 7a-b compares the experimental and calculated ∆∆G mutation from Ala to all 318 other residues for OmpLA and PagP proteins at a fixed height in the membrane. We 319 also included D −1 and E −1 to emulate these residues at neutral pH. While F19 linearly 320 correlates well for both OmpLA and PagP, F23 predicts the ∆∆G mutation closer to the 321 experimental values (e.g. in OmpLA H improves from 7.2 to 5.2 REU, K from 6.4 to 5.2 322 REU, N from 6.2 to 4.7 REU; in PagP R improves from 5.9 to 3.8 REU and K from 8.5 323 to 7 REU), evident from the slope of the best-fit line (the fit excludes Asp and Glu, see 324 the caption for further details). (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  To evaluate the performance of energy functions, we compute the ∆∆G ins (Eq. 8) as 334 the energy difference between the lowest energy orientation of the peptide in the lipid 335 bilayer phase and in the aqueous phase: where G ref is the Gibbs free energy of an unfolded protein in solution.
337 Figure 8 compares ∆∆G ins calculated by F19 and F23 to that by MD simulation.

338
The correlation coefficient for F23 is 0.01 higher than F19, which is insignificant.

339
However, the slopes for the best-fit lines are about double for F23 relative to F19.

340
Relative to F19, the additional electrostatic terms in F23 increased favorable energy due 341 to insertion. F19 and F23 recognize the relative ∆∆G ins among the design peptides; 342 however, the energy of insertion is overestimated compared to the calculations from MD. 343 As in many other Rosetta-based calculations, these energy functions reproduce trends 344 but not the exact values. fixed-backbone protocol that samples possible sequences using a full protein 349 rotamer-and-sequence optimization and a multicool-simulated annealing. 59 Each protein 350 is initialized and fixed in the orientation from the OPM database. 60 We compute three 351 metrics: (1) sequence recovery, which measures the fraction of native amino acids including both α-helical bundles and β-barrels, with all entries having a resolution of 3.0 359 A or better and less than 25% sequence identity. The native and host lipid compositions 360 for these proteins vary widely and include compositions not yet covered by the F23 lipid 361 parameters; thus, for simplicity, we perform all design calculations in the DLPC bilayer. 362 High sequence recovery has long been correlated with strong energy function 363 performance for soluble proteins. 43 F19 designed sequences are 31.8% identical to their 364 native sequences (Figure 9a). With the new electrostatics terms, F23 recovered 30.8% of 365 native residues. The soluble protein energy function Ref15 and other previous implicit 366 membrane energy functions lagged behind in sequence recovery. To compare the 367 influence of different solvent environments, we calculated sequence recovery over subsets 368 of residues. First, we compare buried vs. solvent-exposed side chains (Figure 9b).

369
Among buried side chains, F19 recovered 3.4% higher native residues than F23. On the 370 surface, sequence recovery for M12 and F23 was 3% higher than F19. F23 recovered a 371 higher fraction and more variety of native residues (above random selection) relative to 372 F19 for both water-facing and lipid-facing residues (Figure 9c-d).

373
Amino acid distribution in design proteins is similar to that in native 374 proteins. To evaluate the distribution of amino acid types in the designed proteins 375 relative to that in native proteins, we calculated D KL as: where p des s , q nat s are the probabilities of amino acid type s in the design and native 377 sequences respectively and AA 20 is the set of all 20 canonical amino acids. p s is 378 calculated as: , buried vs. surface-exposed positions in (b), lipid-exposed positions in (c), and water exposed in (d).
where, pp r is the perplexity for a native amino acid type r, p (s|r) is the probability of 407 amino acid s being designed to replace a native residue r. The probability p (s|r) is 408 given as: where the superscripts des and nat denote designed and native sequences. The indicator 410 function in the numerator counts native residues r that convert to s:

413
When a model assigns a high probability to a given residue, imply its confidence in the 414 decision (meaning less perplexed). To elaborate, a perplexity of 20 means the model is 415 confused with 20 residue choices for design, and a perplexity of 2 means the model is 416 confused with 2 residue choices. However, low perplexity alone can be misleading, and it 417 is important for the designed residue choices to be of the right kind. Thus, an ideal 418 energy function will have a confusion matrix of residue types on design outcomes with a 419 high probability for diagonal elements (near diagonal elements represent the native 420 residue or of similar chemical properties) and low perplexity.

421
F19 has low perplexity for designing non-polar and special residues and higher 422 perplexity for charged and polar residues (Figure 11g-i). F23 follows the trends of F19; 423 however, it improved (reduced) the perplexity of the charged residues significantly for the aqueous region. For other polar residues, F23 has low perplexity; however, it often 425 selects charged residues in place of other polar residues.

426
To sum, F19 can capture native protein-like features slightly better than F23. The 427 perplexity and confusion matrix of F23 shows higher confidence and accuracy for 428 designing charged residues.   Table 1 summarizes the performance of F19 and F23 on the diverse benchmark test 442 set. Relative to F19, F23 has improved the calculation of the tilt angle of membrane 443 proteins for 90% of WALP peptides of different lengths, 15% of TM-peptides, and 25% 444 of the adsorbed peptides. To calculate the tilt angles, we used a fixed backbone for the 445 peptides. Experimental results have shown that the peptide backbone is quite flexible; 446 we hypothesize allowing backbone flexibility will further improve the tilt angle 447 calculation. F19 and F23 predicted stability equally well in ∆∆G mutation and ∆∆G ins . 448 F19 identifies native protein-like residues better than F23. However, F23 significantly 449 improved the confidence and accuracy of designing charged residues. For simplicity and 450 lack of exact lipid types, we performed all design calculations in the DLPC bilayer; this 451 is a broadly simplifying assumption that can be addressed by using the protein-matched 452 membrane types.

453
The results of the benchmark tests show that, relative to F19, F23 has improved the 454 ability of implicit membrane models to capture more relevant features of MPs. There 455 are now several future steps that can be taken to improve the implicit models. Some (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 27, 2023. ; https://doi.org/10.1101/2023.06.26.546486 doi: bioRxiv preprint potential steps are adding the effect of pH to account for membrane-induced pKa 457 shifts, 61, 62 using robust optimization techniques to ameliorate double counting of 458 physical effects due to use of empirical functions, 63 adding mixed and asymmetric lipid 459 layers, and adding deformation of the lipid interface as it is perturbed by charged amino 460 acids. 64,65 To capture the interface deformation or interaction of lipids with the residues, 461 we have to move beyond the slab-like rigid representation of the membrane. We 462 hypothesize that adding the above-mentioned features will capture the characteristics of 463 realistic membranes. Configurations generated by implicit models can be used as an 464 input or a filter for further investigations, including all-atom models to study 465 phenomena mediated by lipid and water interaction. With continued improvement in 466 MP feature representation, we anticipate that the implicit membrane model will enable 467 reliable, high-throughput, and high-resolution MP structure prediction and design.

468
Data Availability 469 Upon publication, the F23 energy function will be available in the Rosetta software 470 suite (https://www.rosettacommons.org). Rosetta is available to noncommercial users 471 for free and to commercial users for a fee. The scripts used to run the benchmark tests 472 will be available on github.