Abstract
Proteoglycans contain glycosaminoglycans (GAGs), negatively charged linear polymers made of repeating disaccharide units of uronic acid and hexosamine units. They play vital roles in numerous physiological and pathological processes, particularly governing cellular communication and attachment. Depending on their sulphonation state, acetylation, and glycosidic linkages, GAGs belong to different families. The high molecular weight, heterogeneity, and flexibility of GAGs hampers their characterization at atomic resolution, but this may be circumvented via coarse-grained (CG) approaches. In this work, we report a CG model for a library of common GAG types in their isolated or proteoglycan-linked states compatible with the widely popular CG Martini forcefields (versions 2.2 and 3.0). The model reproduces conformational and thermodynamic properties for a wide variety of GAGs, as well as matching structural and binding data for selected proteoglycan test systems. The parameters developed here may thus be employed to study a range of GAG-containing biomolecular systems, benefitting from the efficiency and broad applicability of the Martini framework.
1 Introduction
Proteoglycans (PGs) are heavily glycosylated proteins. These high molecular weight biomolecules found in the extracellular matrix (ECM) play vital roles in governing cellular interactions, the mechanical and viscous properties of tissues, and numerous physiological and pathological processes1,2. PGs bind to water, ions, chemokines, growth factors and many other biomolecules2. These properties of PGs arise from the covalently attached linear anionic glycosaminoglycans (GAGs), composed of repeating disaccharide units of uronic acid (UA) and hexosamine (HEX) units. The UA units can either be β-D-glucuronic acid (Glc) or α-L-iduronic acid (IdoA), while the hexosamine units can be α-D- or β-D-glucosamine (GlcN) or N-acetyl-β-D-galactosamine (GalNAc). Depending on their sulphonation state, acetylation, and glycosidic linkages, these building blocks make up various GAG families. These encompass non-sulphated GAGs such as hyaluronic acid (HA) and heparin (HP), and sulphated forms including heparin sulphate (HS), chondroitin sulphate (CS), dermatan sulphate (DS) and keratan sulphate (KS) (Figure 1). These GAGs are typically linked to a tetrasaccharide GlcA(β1→3)Gal(β1→3)Gal(β1→4)Xyl(β-Ser) sequence made of xylose (Xyl), two Galactose (Gal) units and Glucose (Glc), which then connects to the protein via an O-linked serine or threonine3.
The figure shows the major GAG structures for Heparin [IdoA2S(α1→4)GlcNS6S(α1→4)IdoA2S(α1→4)GlcNS6S], Heparan sulfate [GlcA(β1→4)GlcNAc(α1→4)GlcA(β1→4)GlcNAc], Chondroitin sulfate [GlcA(β1→3)GalNAc4S(β1→4)GlcA(β1→3)GalNAc4S], Dermatan sulfate [IdoA(α1→3)GalNAc4S(β1→4)IdoA(α1→3)GalNAc4S], Keratan sulfate [Gal6S(β1→4)GlcNAc6S(β1→3)Gal6S(β1→4)GlcNAc6S], Hyaluronic acid [GlcA(β1→3)GlcNAc(β1→4)GlcA(β1→3)GlcNAc]
GAGs are implicated in various disease states including inflammatory conditions4, cancers5–7, corneal opacification8,9, arthritis10 as well as age-related diseases11. Furthermore, GAGs in the glycocalyx surrounding cells can serve as attachment factors or receptors for infectious pathogens12. For example, host HS may stabilize the so-called ‘up’ conformation of the SARS-CoV-2 spike protein in concert with its own viral N-glycans, thereby facilitating angiotensin-converting enzyme 2 (ACE2) receptor binding and subsequent COVID-19 infection13. Unsurprisingly, there is significant therapeutic interest in GAGs, but experimentally studying their conformation, dynamics, and interactions at the molecular level is hampered by challenges associated with GAG synthesis14 and further complicated by their polymeric nature and the flexibility of their glycosidic linkages15. As an alternative, computational approaches may be utilized to better understand their structure-function relationships. The large size of polymeric GAGs makes quantum mechanics (QM) based methods computationally challenging, but their simulation is more tractable using molecular dynamics (MD) simulations. In MD, the nature and the accuracy of the atomic interactions are defined by bonded and non-bonded interaction potentials collectively termed the force field (FF). Such FFs are typically optimized based on QM and experimental data14,16. Many all-atom (AA) FFs such as GLYCAM17, CHARMM16 and GROMOS18 can correctly reproduce the conformational properties of GAG sequences and their protein binding properties19,20. AA studies have enabled e.g., improved understanding of anomeric effects or torsional preferences of sulphate groups, and characterization of critical inter- and intramolecular interactions that affect the flexibility and stability of GAGs20,21. Despite the high resolution accessible to AA FFs, their application to GAG sequences has typically been limited to, at best, dodecasaccharides, and in many cases, to tetra- or hexa-saccharides, due to the computational cost associated with obtaining sufficient sampling22. A solution to this is the development of simplified coarse-grained (CG) models, potentially enabling the study of conformational dynamics of higher-order GAG sequences as observed in nature (i.e. ∼20-200 disaccharides long).
In the CG approach, sets of atoms are grouped together and replaced by ‘pseudo-atomic’ particles, resulting in a reduced number of degrees of freedom in the system, thereby allowing access to longer simulation and time- and length-scales at the expense of molecular detail. One of the first CG models for CS and HA was developed by Bath et al., which could predict the influence of pH and ionic strength on persistence length and end-to-end distances, respectively23. The model was characterized by three beads representing a monosaccharide with an additional virtual particle representing its center of geometry and charge. In another CG model, Sattelle et al. implemented ring puckering in HP, CS and DS models to study their 3D shape, bioactivities and hydrodynamic properties24. In addition, the chain volume and its rigidity were shown to depend on glucuronic acid, the conformational properties of which could be modulated by the sulphonation state. An AMBER-compatible CG FF developed by Samsonov et al. used unique particle types based on several protein-GAG complexes25. In their model, the bonded interactions were optimized based on AA MD simulations, while the non-bonded parameters were optimized based on potential of mean force calculations. The resultant CG model reproduced many local and global properties from AA simulations, such as mean distributions of the root mean square deviation (RMSD), end-to-end distances, and radii of gyration (Rg). In a CG model by Kolesnikov et al., a polymeric field-theoretical approach could reproduce osmotic pressure data and many solution properties in the presence of ions26.
Although CG models for GAGs alone exist, they are not necessarily compatible with other essential biomolecules such as proteins, lipids, RNA, or DNA, hampering the capacity to study GAGs in realistic, complex biological environments. Martini is currently the most widely used CG FF for biomolecular systems27. This FF has been designed to reproduce thermodynamic properties and partitioning data by using a few building blocks that mimic the chemical specificity of the underlying atoms27. In the Martini model, four heavy atoms are typically represented as a single CG bead (4:1 mapping). In addition, finer mappings, including 3:1 and 2:1, are utilized for representing rings or high-resolution features. A typical uncharged bead of CG Martini water represents four AA water molecules. A three-bead polarizable water model is also available, two of which are dipoles connected to the central bead28. Implicit solvation is also supported, called ‘Dry’ Martini, and currently is only compatible with lipid systems29. The modular building blocks, along with a small number of parameters, result in broad applicability and transferability to other classes of biomolecules such as proteins, lipids, DNA, and N-glycans30–34. Since its introduction, the model has seen wide adoption among the scientific community for studying problems like lipid self-organization, membrane fusion, protein oligomerization, drug delivery, and many more, as summarized elsewhere35. The combination of the Martini FF with advances in computational resources and multiscale approaches have made possible simulations of large and complex biomolecular systems such as mitochondria36, liposomes37 and virus particles such as dengue38 and influenza A39. Certain shortcomings in version 2.2 of the Martini FF have surfaced in recent years such as overly strong or ‘sticky’ interactions for proteins40 and carbohydrates34. In an attempt to alleviate these problems, Martini v3.0 was introduced, where rebalancing of the non-bonded interactions was implemented41, while new bead types and an expanded interaction matrix resulted in a wider chemical space.
In the present work, we have extended the Martini FFs (both v2.2 and v3.0) towards the simulation of various GAG molecules, including HP, HS, CS, DS, and HA, and their O-linked proteoglycan variants. The bonded parameters for GAGs, including linkers, were obtained from extensive AA simulation sampling. Multiple mapping schemes were tested for heparin and the most successful model was employed for other GAGs. Bead types for monosaccharides were validated by calculating partition coefficients and comparing them to predictions from various empirical methods. The resultant CG models were used to predict local and global GAG properties and were applied to some key case studies based on availability of relevant experimental data in the literature. Firstly, HP is essential for homo- and heterodimerization of the fibroblast growth factor receptor (FGFR), and GAG binding modes from simulations were compared with high-resolution structural studies and used to confirm its previously proposed capacity for binding longer HP ligands. Secondly, bikunin is a serine protease inhibitor used to treat acute pancreatitis, and its PG form contains a single O-linked CS chain that is essential for function; simulations of the PG complex were carried out to characterize the dynamics of the GAG moiety, validated via hydrodynamic size estimates. The CG GAG library developed here represents a computationally efficient strategy for delineating the biological roles of GAG molecules, GAG-protein complexes, and PGs.
2 Methods
2.1 Simulation parameters
In AA simulations, the GROMOS54a7GLYC18 united-atom FF was used for the UA and HEX units involved in various GAG monosaccharides and Xyl-β-Ser O-glycosidic linkages representing the tetrasaccharide GAG linker. The GAG molecules were solvated using the SPC water model42. The systems were neutralised using Na+ and Cl- counterions. The simulations were performed at a temperature of 310 K using the velocity rescale thermostat43 with a relaxation time of 0.1 ps. The pressure was maintained at 1 bar using the weak coupling Berendsen barostat with a relaxation time of 1 ps. A cutoff of 1.4 nm was used for electrostatic and van der Waals interactions. Long-range electrostatics were treated using the particle mesh Ewald44 method with a cutoff of 1.2 nm.
In the case of CG simulations, both Martini v2.227,30,45 and v3.041 FFs were used. Systems were solvated using Martini water beads, and for systems using v2.2 FF an additional 10% antifreeze water beads were added. A temperature of 310 K and pressure of 1 bar were maintained using the velocity rescale thermostat46 and Parrinello-Rahman barostat47 with relaxation constants of 1 ps and 12 ps, respectively. The reaction-field method48 was used for electrostatics with a cutoff of 1.1 nm. Both AA and CG systems were minimized for 5,000 steps, equilibrated for 50 ns and then subjected to productions runs (summarised in Table S1Error! Reference source not found.).
GROMACS 201849 was used for all the simulations and run on: i) an in-house Linux cluster using 20 CPUs (E5-2690v3) and 1 GPU (NVIDIA K40) and ii) the ASPIRE 1 supercomputer of the National Supercomputing Centre Singapore (https://www.nscc.sg) using 24 CPUs (E5-2690v3) and 1 GPU (NVIDIA K40).
2.2 Mapping schemes
Two mapping schemes were initially tested for the assessment of the stability of GAG chains (Figure 2). Linear (‘map1’) and ring-like (‘map2’) mapping schemes were tested for HP. The linear mapping scheme is similar to that previously developed for carbohydrates30 and N-glycans34. The ring-like scheme has previously been used for glycolipids33. The underlying chemical nature of the grouped atoms and previously modelled glycans were considered when selecting the bead types. In both schemes, charged functional groups (i.e. sulphates and carboxylates) were treated with a single negatively charged bead. Initial bonded parameters were obtained from tetrasaccharide simulations.
The first scheme makes a linear linkage for the monosaccharides, while the second scheme uses a ring-like topology represented by three beads. Sulphate and carboxylate groups are represented as separate beads in both schemes.
The bonds between beads were fitted using:
where rbond and kbond are the equilibrium distance and the force constant, respectively. The angles within the CG beads were treated using:
where θ0 is the equilibrium angle and kangle is the force constant. The dihedrals were described using:
where θdihedral is the equilibrium angle between planes defined by the coordinates of the four consecutively bound atoms i, j, k and j, k, l respectively, kdihedral is the force constant, and n is the multiplicity. The dihedral distributions had a single minimum and were fitted using a multiplicity of 1.
2.2.1 Validation of non-bonded parameters
The initial bead types were selected by chemical intuition and by analogy with previously parametrized carbohydrate and N-glycan molecules30,34. The UA and HEX units had not been parametrized before. The additional sulphate groups make it necessary to validate the choice of the Martini bead types. As Martini parameterization is based on the reproduction of experimental partitioning coefficients (log Pow), these were calculated for various disaccharide combinations commonly found in GAGs. These coefficients were obtained using thermodynamic integration (TI) calculations with 55 windows along with additional windows around the high curvature region. Softcore potentials by Beutler (implemented in Gromacs) were used for sampling the LJ potential at high scaling values50. The calculations were performed in 1-octanol and water phases. The difference in solvation free energies of GAG disaccharides in 1-octanol (ΔGo) and water (ΔGw) was used to calculate the partition coefficients using ΔΔGow = −2.3RT log Pow. The water phase was simulated using one GAG disaccharide solvated in a box of 1,000 water molecules, while the organic phase consisted of 43 water and 519 1-octanol molecules, representing a 0.255 water-octanol molar fraction51. The coefficients were obtained for both v2.2 and v3.0 versions of the Martini FFs. The obtained simulation data was compared to the results of empirical (log Pow) prediction methods within the ALOGPs 2.152 program. The predicted coefficients from multiple methods such as ALOGPs, ALOGP, KOWWIN, ClogP, MLOGP, miLogP, AB/LogP. These methods differ from one another in terms of molecule fragmentation patterns and atomic contributions to the predicted log Pow. The choice of Martini beads for mapping the GAGs (based on the ‘map1’ scheme) is summarized in Table 1.
2.2.2 Protein-GAG studies
The AA crystal structures of FGFR1-FGF253 dimer (PDB:1FQ9) and bikunin54 (PDB:1BIK) were converted to CG representation using the martinize.py script available at cgmartini.nl. An elastic network was applied for preserving higher-order protein structure. The network was applied to atoms within the lower and upper cutoff distances of 0.5 and 0.9 nm with a force constant of 500 kJ mol-1 nm-2. These values ensured that the folds of each protein complex were retained, with the backbone RMSD of each system with respect to the experimental structure plateauing at ∼5 Å by the end of each trajectory (Figure S1).
In the case of a HP20 GAG pre-bound to the FGFR1-FGF2 complex, the GAG was manually aligned to the GAG HP fragment co-crystallized with the X-ray structure (PDB:1FQ9) and simulated for 500 ns using the Martini v2.2 FF with the protein-to-GAG non-bonded LJ interactions scaled by 0.9. This scaling of LJ interactions has previously been shown to be important for correctly representing the solution behaviour of glycans and their interactions with proteins34. Besides this, multiple unbiased self-assembly simulations were run, based on two sets of initial configurations of a simulation box containing FGFR1-FGF2 and two randomly placed HP20 molecules. Two replicates of each of the configurations were simulated for 1 μs which resulted in 4 independent simulations. In the case of bikunin, the GAG is attached via an O-linked modification at the Ser-10 position55. The protein structure of bikunin (PDB:1BIK) has been well studied54 while the heterogeneous GAG sequences56 attached to the protein have been characterized and found to range from a length of 27-39 saccharide units57. Therefore, either a 30-mer CS (CS30) or a 40-mer CS (CS40) chain was attached to the O-glycosylation site at Ser1058. Both systems were solvated in a cubic box and production runs were performed for 5 μs.
Results
2.3 Mapping scheme
Initially, AA tetrasaccharide simulations were used to seed bonded parameter estimates for simulations of longer GAG chains. In the case of HP10, the ‘map1’ scheme (Figure 2) resulted in a very stiff chain with higher end-to-end distances compared to AA data, and reducing the associated dihedral force constants did not lead to significant improvement. On the other hand, the ‘map2’ scheme (Figure 2) resulted in collapse of the polymer into an aggregated state, and consequently, very low end-to-end distances compared to AA data. In the latter case, only extremely strong (10-fold increase) dihedral force constants improved the end-to-end distances (Figure 3). Overall, the linear mapping scheme or ‘map1’ resulted in better correlations with the AA data in these preliminary tests.
(Top) Distribution of end to end distances. Distances were calculated from beads representing C1 and C4 atoms of the first and last monosaccharides. The AA trajectory was converted to a pseudo-CG trajectory to compare the end-to-end distances from the CG simulations. (Bottom) Final simulation snapshots of HP10 models. The hexosamine units are represented in red, while uronic acid units are colored in cyan.
To further improve the dynamics of longer GAG chains, backbone angles and dihedrals were next optimized against AA simulations of longer decasaccharide chains, initially focussing on HP10 (Figure 3). The backbone dihedral angles ϕ and ψ for beads connecting UA-HEX-UA-HEX and HEX-UA-HEX-UA sugars were monitored every GAG chain (Figure 4 and Figure 5). For example, in the case of HP, these angles were defined for G7-G3-G7-G3 and G3-G7-G3-G7 beads (Figure 4). This led to an improvement in end-to-end distances and dihedral conformations, and reproduction of the helical nature of the chains (Movie S1, Figure 5 and Figure 6), resulting in a much better overall correlation between AA and CG data (Figure 3). The ‘map1’ scheme (Figure 2) was eventually selected as the model of choice and as the basis for subsequent parameterization of all other GAG molecules. This was because the ‘map2’ scheme required four backbone dihedrals to represent each tetrasaccharide unit, compared to two for ‘map1’, and with higher force constants that often resulted in simulation instabilities arising from constraint errors. Furthermore, visual inspection of simulated chains using the ‘map2’ scheme revealed unwanted monosaccharide rotations. The final set of refined bonded parameters are detailed in Table S2.
In addition to the common GAG sequences, the tetrasaccharide linker or LIN [GlcA(β1→3)Gal(β1→3)Gal(β1→4)Xyl(β-SER)] for the linkage between GAG and SER was parametrised.
CG data was based on Martini v2.2. ϕ and Ψ angles were defined as dihedrals between neighboring disaccharide units in the CG model. For example, in the case of heparin, ϕ is defined as the dihedral angle between the G3-G7-G3-G7 beads while Ψ is defined as the angle between the G7-G3-G7-G3 beads. The angles were plotted as contour plots for HP, HS, CS, DS, KS and HA. The Martini bead definition is shown in Table 2.
The distances were calculated from beads representing C1 and C4 atoms of the first and last monosaccharides. AA trajectories were converted to the pseudo-CG trajectories to compare with Martini v2.2 CG simulations.
2.4 Conformational properties of GAGs
The local conformational properties of CG 10mer GAG sequences using the selected linear ‘map1’ scheme were generally in good agreement with the AA data (Figure 5), as were end-to-end distances (Figure 6). In the case of CS and HA, the dihedral distributions were almost identical to their AA counterparts. The CG models of HP, HS and DS resulted in slightly narrower dihedral distributions, while correctly reproducing the AA equilibrium values. As shown for HP (Figure S2), lowering the dihedral force constants of the CG model resulted in deviation from the global AA minimum. In the case of KS, the AA distributions exhibited multiple local minima, which is problematic to achieve at CG resolution. Nevertheless, our final CG model enabled two of the three minima to be well sampled during simulations, while accurately reproducing the chain flexibility and end-to-end distances between the central bead of the first and last monosaccharide in KS10 (Figure 6).
2.5 Hydrodynamic radii
The hydrodynamic radius (Rℎ) describes the sphere of diffusion volume, helping to characterize polymer dynamics, and is typically similar to the radius of gyration (Rg)59. We compared Rg values calculated from CG simulations of S100, DS80 and HP20 to available Rℎvalues obtained from dynamic light scattering (DLS) experiments60,61. The Rgwas measured for the last 500 ns of each 1 μs simulation and used to estimate Rℎ based on the relation Rg = 1.17 ∗ Rℎ derived for polymeric chains62. These computed Rℎ estimates for both the shorter and longer GAGs were in good agreement with experimental values (Table 2), consistent with the prior calibration of each GAG polymer in terms of both local structural/dynamic properties and end-to-end distances of up to 10mer sequences. The slight deviations observed for the longer CS100 and DS80 molecules may have been a result of simulating the fully sulphated forms, compared to the GAGs derived from natural sources used in experiments, which typically have variable sulphonation states that may affect chain flexibility and dynamics.
2.6 Validation of non-bonded parameters
The newly defined UA and HEX monosaccharides used across the various GAG sequences were validated by calculating partition coefficients (POct→Wat) for both Martini v2.2 and v3.0, based on a series of TI calculations as described in the Methods section. Solvation free energies for each GAG obtained in organic (ΔGoct) and water (ΔGwat) phases (Table 3) were used to obtain log Poct→wat coefficients (Table S3). As expected, differences in solvation free energies were observed between the two Martini FFs. All values obtained for both ΔGoctand ΔGwat were negative but were consistently greater in magnitude for v2.2 compared to v3.0 of the Martini FF, in agreement with the former being more ‘sticky’41,63. The negative ΔΔGoct→wat values obtained for both FFs reflect the preference for GAGs to partition into the water phase, due to the presence of polar hydroxyls along with anionic carboxylate and sulfate groups. Martini FF v3.0 exhibited slightly larger ΔΔGoct→wat values, indicating a greater tendency to partition into the bulk water phase.
As experimental values are unavailable, likely due to the highly polar nature of GAGs, the partition coefficients calculated from solvation free energies were compared to estimates obtained from empirical prediction methods including atomic (AlogP), hybrid (XlogP3) and fragment-based (ClogP, AC_logP) approaches (Figure 7 and Table S3Error! Reference source not found.). The predicted values correlated slightly better in Martini v3.0 than v2.2, wherein the average unsigned errors (AUE) compared to the mean of the empirical predictions were 8.14 and 8.5 kJ mol-1, respectively. The average of the predicted values from all these methods was generally slightly closer to the ones calculated from Martini v3.0 than to v2.2.
The log POW values obtained from simulations were compared to those obtained from empirical prediction methods as labelled. Avg represents the mean of values from all the prediction methods.
2.7 Aggregation studies
The highly polar and often charged nature of GAGs make them readily soluble in water. HP is water soluble with concentrations as high as 50 g L-1 64. Therefore, we performed aggregation studies of 35 HP tetrasaccharides corresponding to a significantly lower concentration of ∼23 g L-1 (Figure 8). In Martini v2.2 simulations, the GAGs aggregated within the first few hundred nanoseconds and remained bound to one another upon extending the simulations to 1 μs. This suggested that the non-bonded interaction potentials are too attractive. Similar behaviour has previously been shown for carbohydrates65 and N-glycans34 when using the Martini v2.2 FF. To correct for the imbalance in the glycan-glycan nonbonded interactions, the well depths (ε) of the LJ interaction potentials were scaled according to the following equation:
The average RDFs for CG and AA models are shown. A range of FFs and LJ scaling factors were tested, as labeled. The RDFs were calculated from the final 200 ns of each trajectory.

where λ is the scaling factor and εoriginal is the unscaled well depth of the LJ potential. The same correction factor has been used before for glycan34,65 and protein40 parameters. Multiple scaling factors, including 0.95, 0.9, 0.85 and 0.8 were tested. The radial distribution factions g(r) from CG simulations were compared to those obtained from AA simulations using the CHARMM36 FF (Figure 8). A drastic contrast in the g(r) values for unscaled Martini v2.2 simulations (λ=1.0) and the AA simulations was observed, suggesting excessive aggregation propensity. Scaling the interactions by λ=0.95 resulted in about 50% lower aggregation. Further scaling the interactions reduced the RDFs even more, and no aggregates were observed upon visual inspection of the trajectory in the case of λ=0.9. Although CG simulations could not perfectly reproduce the AA curve using λ=0.9 scaling or below, the differences between CG and AA models were minimal. Therefore, when using Martini v2.2, a scaling factor of 0.9 is recommended for GAG studies, similar to the value recommended for N-glycans34, where smaller glycans (up to 5 monosaccharides) needed the non-bonded interactions to be scaled by 0.9. In contrast, Martini v3.0 showed much lower aggregation propensity, as expected: the resultant RDF was similar to that obtained for Martini v2.2 with a scaling factor of 0.9.
2.8 GAG-protein studies
2.8.1 Heparin-binding to FGFR1-FGF2 complex
FGFR1 is a dimeric receptor that is tightly regulated by a unique set of FGFs ligands66,67, and comprises immunoglobulin-like domains in the extracellular region, a transmembrane helix, and cytoplasmic domains, including a kinase domain68,69. HP is essential for the homo- and heterodimerization of the receptor, and hence crucial in the proper functioning of the signalling process70,71. GAG binding experiments with FGFR showed dependence on HP chain length. Hexasaccharides and even disaccharides exhibit some binding activity against the receptor72–74, while progressively increasing affinities have been reported for up to dodecasaccharides 16,75. Crystallographic studies showed that FGFR GAG binding is facilitated by a positively charged ‘canyon’ which may be occupied by two HP10 chains53 (Figure S3Error! Reference source not found.) with docking studies indicating possible occupation of the entire canyon by longer chains71. Therefore, we studied interactions of HP20 binding to the FGFR1-FGF2 complex via both biased docking approaches and unbiased self-assembly simulations (see Methods section for further details). In the pre-docked simulations, HP20 remained stably bound throughout the simulations (Figure 9). The saccharide units within the central cavity of D-II showed minimum fluctuations, as assessed visually, with terminal sugars showing higher flexibility (Figure 9B). The homodimeric symmetric interaction pattern of HP20 within the positively charged canyon was evident from contact probabilities averaged over the entire trajectory (Figure 9C), which was similar to that observed in the crystal structure.
An ensemble of conformations of HP20 is shown as 20 overlapping and equally spaced frames derived from a 500 ns trajectory, viewed from (A) the side, and (B) the top of the complex. The GlcNS and IdoA sugars of HP20 are shown in licorice representation with cyan and green colors, respectively, and its backbone is colored salmon pink. Protein is colored as in Figure 8A-B. (C) Contact probability of HP20 to FGFR1-FGF2 complex (shown in surface representation) within a cutoff of 6 Å, colored on a scale from blue (P=0) through grey to red (P=1). Simulations were performed using Martini v2.2 parameters with a LJ scaling factor of 0.9.
As the CG representation allows access to higher timescales, unbiased self-assembly simulations of the receptor in the presence of two randomly placed HP20 ligands were next performed. In 3 out of 4 self-assembly simulations, HP20 ligands were able to find the binding site on FGFR1 within a few hundred nanoseconds (Figures S4A-C), with some HP20 saccharides binding to positively charged residues on FGF2 in the remaining replicate (Figure S4D). In one of the productive replicates, an HP20 ligand became bound to half of the FGFR1 cavity, similarly to the binding mode observed for HP10 in the crystal structure66, while the other HP20 ligand interacted partially with the remainder of the cavity as well as FGF2 (Error! Reference source not found.B). In the remaining productive replicates, HP20 sampled the whole of the binding cavity (Figure S4Error! Reference source not found.A, C), supporting previous hypotheses for the interaction of longer ligands with the FGFR1-FGF2 complex71.
2.8.2 Bikunin-CS proteoglycan
Bikunin PG is a 16 kDa serine protease inhibitor belonging to the kunin family of inhibitors 76 and contains a single GAG attached via an O-linked modification at the Ser-10 position55. The resultant PG is a therapeutically important molecule that has been used for the treatment of acute pancreatitis77. The 3D structure of the GAG sequences in either the free23,61,78–80 or bikunin-attached form61 have been studied, and found to range from a length of 27 to 39 saccharide units57 when attached to bikunin. Therefore, either a 30-mer CS (CS30) or a 40-mer CS (CS40) chain was used to generate an O-linked bikunin PG via a tetrasaccharide linker (LIN) (Figure 4). Subsequent simulations were used to explore the dynamics of these ‘extremes’ of CS chain length, validated against hydrodynamic radii estimated from previously performed DLS measurements.
The Rg values were monitored throughout the simulation. We found that starting from an extended conformation, the CS saccharides near the protein start interacting with the protein while the terminal ends point toward the solvent (Figure 10A, B). The same was observed while measuring the Rg values which were found to gradually reduce throughout the simulation, stabilizing at around ∼3 and ∼4 nm for CS30 and CS40, respectively. The experimentally estimated Rg values (Rg = 1.17 ∗ Rℎ) lie within the predicted Rg values observed for the two simulation extremes, confirming that the most prominent chain length indeed lies within the previously estimated chain length range of 27-39 CS units57 (Figure 10C).
Ensemble of conformations of (A) CS30 and (B) CS40 attached to Bikunin. GAG chains are colored in cyan while bikunin is colored in red. Snapshots are overlaid from every 100 ns of the entire trajectory. (C) Radius of gyration for bikunin complex formed with CS30 (orange) or CS40 (cyan). The experimental value of Rg is shown as a dashed line with errors in grey. Simulations were performed using Martini v2.2 parameters with a LJ scaling factor of 0.9.
2.9 Discussion
Structural characterization of GAGs in PGs is hampered by the fact that their observable lengths in crystal structures are typically short (typically only up to 5 to 8 saccharides)81. GAG-protein analyses performed in the past have typically focused on shorter GAG sequences82–86. Although these results may be extrapolated to longer polymers, the need for methodologies allowing for directly studying longer sequences has motivated the development of CG GAG models. The CG approach allows for the sampling of extended timescales for larger GAG polymers and PG complexes. Previous CG approaches23–26 have made advancements in understanding the behaviour of longer GAG chains, but lack wider compatibility with other classes of biomolecules.
Herein, we report a CG model for a library of common GAG types compatible with the widely popular CG Martini FFs (v2.2 and v3.0) for biomolecular simulations. CG models for HP, HS, CS, DS, KS and HA were carefully developed after assessing the stability of two alternative mapping schemes. A linear mapping scheme similar to previous ones for carbohydrates30 and N-glycans34 successfully reproduced the local and global properties of tetra- and decasaccharide structures according to AA data for all GAGs, with notable deviations only evident in the case of the flexible KS polymer whose multiple local minima in dihedral space were hard to fully replicate at CG resolution. The conformational properties of the GAGs were also further validated against available experimental hydrodynamic data.
The GAG parameters are still potentially limited by the extent to which anomeric linkages are distinguished in the di/tetrasaccharides. However, this effect is minimized with longer chains where the helical nature is dependent upon the backbone dihedrals and angles. In addition to the glycosidic linkages, the conformations of GAG chains are influenced by the monosaccharide ring conformations, which have been shown to affect local and global properties of glycans, including GAGs25,87–89. In our AA simulations (Figure S5), the UA and HEX units populated the 4C1 conformation as observed in other FFs as well as NMR studies87–89. 4C1 (major) and 2S0 (minor) ring conformers were observed for the GlcA sugar, similar to the lowest energy conformers observed in previously reported MD simulations88. The IduA sugar in HP and DS populated the 1C4 conformer, analogous to experimental observations87. However, a slight preference towards the 2So (37%) conformer was seen depending on the sulphonation state of IduA and its neighbouring hexosamine sugars. Hence, the AA FF used reproduces the experimentally observed puckering conformations, increasing the confidence in the obtained CG GAG models. Whilst puckering of individual units cannot be reproduced at the CG resolution, the global puckering dynamics should emerge in the angles and dihedrals along the GAG chains.
The non-bonded parameters governing the interactions of GAGs with water, ions and other biomolecules are equally important4,90–92. The Martini bead type choices were validated by comparing partition coefficients to empirical fragment-based prediction methods. Both v2.2 and v3.0 resulted in log Pow values in the range of the predicted data, meaning that either of the FF versions would be suitable for simulations. On the other hand, in the case of systems containing multiple solvated GAG molecules, we found that the GAGs aggregated within a few hundred nanoseconds for v2.2 of the FF. Similar behaviour has been previously reported wherein the v2.2 FF turned out to be ‘sticky’ especially for carbohydrate containing molecules33,34,65,93. Similar to N-glycan34 and carbohydrate65 studies, a scaling approach proved beneficial for optimizing the nonbonded interactions. Based on the solution aggregate simulations, a scaling factor of 0.9 could be recommended for Martini v2.2, while no scaling should be necessary with Martini v3.0. Following the corrections to the nonbonded interactions, the model performed very well in predictions of protein-GAG interactions and dynamics. Long HS chains were shown to bind spontaneously to FGFR1-FGF2 at the proposed binding site, whilst O-linked CS chains on the bikunin PG could map out the hydrodynamic space for the predicted GAG length. This confirms the model’s applicability to biologically relevant systems.
3 Conclusions
In summary, we focused on extending v2.2 and v3.0 of the Martini FF to GAGs. The resultant model can accurately predict local and global properties such as end-to-end distances and hydrodynamic radii. Solvation free energies were in good agreement with prediction data, thereby justifying the choice of bead types. In solution, a modest correction factor of 0.9 was best able to reproduce the underlying atomic resolution g(r) for Martini v2.2. Applying the parameters to test systems composed of protein-GAG and PG complexes successfully reproduced a range of structural and biophysical data. The model may thus be employed for studying GAG properties in solution or in complex with proteins, benefitting from the efficiency and broad applicability of the Martini CG framework.
Acknowledgements
The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg) and BII-ASTAR in house clusters. The research leading to these results was supported by funding from the Ministry of Education in Singapore (MOE AcRF Tier 3 Grant Number MOE2012-T3-1-008) and BII (A*STAR) core funds. PJB acknowledges funding from A*STAR (grant number FY21_CF_HTPO SEED_ID_BII_C211418001). JKM acknowledges AME Young Individual Research Grant (YIRG) number A2084c0160.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.
- 32.
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.
- 80.↵
- 81.↵
- 82.↵
- 83.
- 84.
- 85.
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.↵
- 91.
- 92.↵
- 93.↵