ABSTRACT
With approximately 400 encoding genes in humans, odorant receptors (ORs) are the largest subfamily of class A G protein-coupled receptors (GPCRs). Despite their relevance and abundance, the odorant-GPCRome is structurally poorly characterized: no experimental structures are available, and the low sequence identity of ORs to experimentally solved GPCRs is a major challenge for their modeling. Moreover, the receptive range of most ORs is unknown. The odorant receptor OR5K1 was recently and comprehensively characterized in terms of cognate agonists. Here we investigate the binding modes of the identified ligands in the OR5K1 orthosteric binding site using structural information from both AI-driven modeling, as recently released in the AlphaFold Protein Structure Database, and template-based modeling. Induced-fit docking simulations were used to sample the conformational space of the binding site for ensemble docking. Side chain sampling and model selection were guided by mutagenesis data. Compared with the starting models, the refined models better discriminate active (agonist) from inactive molecules and also capture differences in activity caused by small structural changes. We therefore provide a model refinement protocol that can be applied to the orthosteric binding site of ORs as well as of other GPCRs with low sequence identity to available templates.
INTRODUCTION
G protein-coupled receptors (GPCRs) are the largest family of membrane proteins in the human genome. Through interaction with their modulators, GPCRs mediate the communication between the cell and the extracellular environment and are therefore involved in almost all physiological functions.1-4 GPCRs are commonly grouped into six classes based on phylogenetic analysis: A (rhodopsin-like), B (secretin-like), C (metabotropic glutamate receptors), D (pheromone receptors), E (cAMP receptors), and F (frizzled/smoothened receptors).5-6 Class A GPCRs account for over 80% of all GPCRs and are the targets of 34% of all drugs on the market.7-8
Class A GPCRs share a basic architecture consisting of a bundle of seven transmembrane α-helices (TM1-TM7) connected by three intracellular loops (ICLs) and three extracellular loops (ECLs), a relatively short N-terminus in the extracellular region, and a short helix 8 connected to the C-terminus in the intracellular region. The ligand-binding site of class A GPCRs, commonly referred to as the orthosteric binding site, is located in the extracellular (EC) part of the 7TM bundle (formed by residues of TM3, TM5, TM6, and TM7) and is structurally highly diverse among receptor subtypes. The 7TM bundle is the most structurally conserved component of class A GPCR structures, presenting characteristic hydrophobic patterns and functionally important signature motifs.9-10
Odorant receptors (ORs), with approximately 400 encoding genes in humans, are the largest subfamily of class A GPCRs.11-15 Mammalian odorant receptors are split into two phylogenetically distinct groups, class I and class II ORs, which can be distinguished by some characteristic features that are highly conserved within their sequences.16-19 ORs present most of the class A GPCR signature motifs, despite an overall low sequence identity with the non-sensory class A GPCRs.20-21 The orthosteric binding site of ORs was also found to coincide with that of non-sensory class A GPCRs.20-25
The olfactory system uses a combinatorial code of ORs to represent thousands of odorants: a specific OR type may recognize more than one odorant, and each odorant may be recognized by more than one OR.26-31 Despite current efforts in assigning ORs to odorant molecules, or, vice versa, in defining the chemical ligand space of individual ORs, only the molecular recognition ranges of a few ORs have been investigated.27, 32-37
Structure-based virtual screening campaigns have been successfully applied to GPCR ligand discovery and are increasingly used thanks to the recent extraordinary advances in GPCR structural biology.38 No experimental structures of human ORs are currently available, and homology modeling has been used to rationalize the binding modes of odorants in ORs and to discover new OR ligands.37, 39-43 AI-based methods are emerging as compelling tools to predict the 3D structure of proteins.44-46 In the 14th Critical Assessment of Structure Prediction (CASP14), AlphaFold 2 (AF2) was shown to predict the structure of protein domains with an accuracy matching experimental methods.47 A database of over 360,000 protein models across 21 species has been released and is scheduled to grow to cover over 100 million proteins (https://alphafold.ebi.ac.uk/).48-49 The database expands the coverage of GPCR structures, including 4,192 proteins annotated as odorant receptors, 97% of which are mammalian.45
In this paper, we used both AlphaFold 2 and template-based modeling methodologies for OR5K1 structural prediction. OR5K1 has been recently characterized as the specialized OR for the detection of pyrazine-based key food odorants and semiochemicals.50 We investigated the interaction of the set of identified agonists within the binding site of OR5K1 and used ligand information and mutagenesis data to guide the model refinement process.
RESULTS AND DISCUSSION
OR5K1 agonists
Pyrazines contribute greatly to the aroma of roasted foods,51-53 but they are also renowned as semiochemicals,54-58 namely compounds that transfer chemical cues between individuals of the same and/or different species, most often eliciting a standardized behavior.59 Recently, OR5K1 was characterized as a specialized odorant receptor for the detection of pyrazine-based key food odorants and semiochemicals.50 The most potent agonist is compound 1 (2,3-diethyl-5-methylpyrazine, EC50 = 10.29 μM). The compounds tested against OR5K1 include molecules with shorter or missing aliphatic chains on the pyrazine ring (compounds 4, 6, 7, 12), and pyrazine itself does not activate this receptor.50 The activity of OR5K1 agonists therefore appears to depend on the presence and position of the aliphatic chains (Table 1). Interestingly, in the pyrazine screening, a mixture of the isomers 2-ethyl-3,5(6)-dimethylpyrazine activated OR5K1 with an EC50 of 21.18 μM.50 In this work, we prepared the two isomers separately and tested each against OR5K1. 2-Ethyl-3,6-dimethylpyrazine (compound 2) has an EC50 of 14.85 μM, whereas the EC50 of 2-ethyl-3,5-dimethylpyrazine (compound 13) could not be determined, because its response did not reach saturation within the available concentration range. This provides precise information on how the substitution pattern of the pyrazine ring affects activity.
Table 1. OR5K1 agonists and EC50 values. Data for compounds 1 and 3-12 are taken from the literature,50 while compounds 2 and 13 were tested in this work.
OR5K1 structure prediction
ORs and other chemosensory GPCRs share low sequence similarity (below 20%) with experimentally solved GPCRs.20, 60 The accuracy of 3D structures obtained by homology modeling is highly dependent on the templates: good models of membrane proteins can be obtained when template sequence identity exceeds 30%.61 A multi-template homology modeling approach has been used to successfully model different ORs, including OR51E1 and OR7D4.23, 62 In this approach, conserved motifs guide the sequence alignment of odorant receptors, and bovine rhodopsin (bRho), the human β2-adrenergic receptor (hβ2AR), the human adenosine A2A receptor (hA2A), and the human chemokine receptor CXCR4 (hCXCR4) are used as templates.21
OR5K1 shares 15-19% sequence identity with these templates (Figure S1). Because we aimed to use the model to investigate the binding modes of agonists, we built the 3D structure of OR5K1 using bRho, hβ2AR, and hA2A in their active state, while hCXCR4 is only available in its inactive state.38 The extracellular loop 2 (ECL2) of the templates is much shorter than that of OR5K1 (Figure S2). ECL2 is the largest and most structurally diverse extracellular loop of GPCRs,63 and OR ECL2s are among the longest in class A GPCRs.64 Modeling loops as long as ECL2 is highly challenging.65-67 We therefore remodeled this region using templates with more similar length and sequence composition (Figures S2 and S3). Specifically, we used the ECL2 of the NPY2 and CCK1 receptors as templates for the segment before the conserved Cys45.50 (S1574.57-Y179ECL2) and the apelin receptor for the segment after Cys45.50 (C18045.50-L1885.37).
We then downloaded the AlphaFold 2 (AF2) structure of OR5K1 (https://alphafold.ebi.ac.uk/entry/Q8NHB7) to compare it with our homology model (HM). Except for the N-terminus and ECL3, all regions of the model have a per-residue confidence score (predicted local distance difference test, pLDDT) above 90 (very high) or between 70 and 90 (confident) (Figure S3). The OR5K1 AF2 model is also among the high-confidence AF2 GPCR models as assessed by the per-model pLDDT80 score, which was suggested as a potential criterion to assess the quality of AF2 models for structure-based virtual screening.68 The Cα root mean square deviation (RMSD) between the AF2 and HM models is 3.26 Å. The major difference lies in the conformation of TM5, which is closer to the orthosteric binding site in the HM than in the AF2 model. We calculated the GPCR activation index of the two models with the A100 tool,69 which confirmed that the HM is in an active state (activation index 68.46), whereas the AF2 model is inactive (activation index -21.30). This is because the OR5K1 HM was built mostly from active-state templates, whereas AF2 was generated with an algorithm that does not necessarily take the activation state into account.
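The 3.26 Å value above is a Cα RMSD. As a minimal sketch, assuming the two models have already been superimposed (as done here with the Maestro Protein Structure Alignment module) and that Cα atoms are paired by residue, the RMSD reduces to the root mean square of the paired Cα distances. The function name and toy coordinates are hypothetical, and no optimal superposition (e.g., Kabsch fitting) is performed inside the function.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Cα RMSD (Å) between two pre-superimposed models.

    coords_a[i] and coords_b[i] must be the (x, y, z) coordinates of the Cα
    atom of the same residue in the two models; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy example: every Cα of model B is displaced by (1, 2, 2) Å, i.e. by 3 Å
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(x + 1.0, y + 2.0, z + 2.0) for x, y, z in model_a]
print(round(ca_rmsd(model_a, model_b), 2))  # 3.0
```

Because the function only measures, any difference in how the superposition was done (which atoms were fitted) changes the reported RMSD.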
To assess the predictive ability of the HM and AF2 models, we docked the known agonists as actives (13 compounds, Table 1) and all compounds with defined chirality that did not elicit a receptor response as inactives (131 compounds; the complete list with SMILES is available at https://github.com/dipizio/OR5K1_binding_site), and evaluated the performance of each model through Receiver Operating Characteristic (ROC) analysis.70-71 The Area Under the Curve (AUC) values are similar for HM (0.67) and AF2 (0.68), and the enrichment factor in the top 15% of the ranked molecules (EF15%) is very low in both cases: 0.11 for HM and 0.24 for AF2 (EF15% max = 1.63) (Figure S4). The AF2 model is not able to dock the most potent agonists in our set. The only highly ranked agonist in both the HM and AF2 models is compound 9 (EC50 = 527.76 μM), with docking scores of -5.68 and -4.91 kcal/mol, respectively. As expected, the HM and AF2 models have different residue arrangements in the binding site, but, surprisingly, the location of the predicted binding pocket also differs (Figure 1). The orthosteric binding site of the AF2 model is not accessible: the pocket calculated with SiteMap (Schrödinger Release 2021-3: SiteMap, Schrödinger, LLC, New York, NY, 2021)72-73 lies between TM5 and TM6 and extends towards the membrane bilayer (Figure 1), because the orthosteric site is partially occluded by ECL2. The folding of ECL2 is the most evident difference between the two models: in the HM we modeled it as an antiparallel β-sheet, whereas in AF2 it is an unstructured loop with a short α-helix that enters the orthosteric binding site.
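The ROC AUC and EF15% values above can be computed directly from docking scores. The sketch below assumes that more negative Glide scores rank higher and computes the AUC in its rank-based (Mann-Whitney) form, i.e., the probability that a randomly chosen active scores better than a randomly chosen inactive. The EF follows one common definition, the active rate in the top fraction divided by the active rate in the whole set; the text does not spell out its normalization (the quoted EF15% max of 1.63 suggests a different one), so absolute EF values from this sketch need not match. Compounds that fail to dock could be given a worst-possible score such as float("inf"), which is an assumption. Function names and scores are hypothetical.

```python
import math


def roc_auc(active_scores, inactive_scores):
    """Rank-based ROC AUC for docking scores where lower (more negative) is better.

    Counts, over all active/inactive pairs, how often the active scores better;
    ties count as half a win.
    """
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a < i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


def enrichment_factor(active_scores, inactive_scores, fraction=0.15):
    """EF at the top `fraction` of the score-ranked list (one common definition)."""
    # Sort best (most negative) first; on tied scores False < True puts
    # inactives ahead of actives, which gives a conservative EF.
    ranked = sorted([(s, True) for s in active_scores] +
                    [(s, False) for s in inactive_scores])
    n_total = len(ranked)
    n_top = max(1, math.ceil(fraction * n_total))
    hits = sum(1 for _, is_active in ranked[:n_top] if is_active)
    return (hits / n_top) / (len(active_scores) / n_total)


# Hypothetical docking scores (kcal/mol)
actives = [-6.1, -5.7, -4.2]
inactives = [-5.9, -5.0, -4.8, -4.5, -4.1, -3.9, -3.5]
print(roc_auc(actives, inactives))
print(enrichment_factor(actives, inactives, fraction=0.2))
```

The pairwise loop is quadratic, which is negligible for 13 actives and 131 inactives; larger libraries would call for a sort-based AUC.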
Figure 1. SiteMap volume, H-bond acceptor area, H-bond donor area, and hydrophobic area are reported for both models.
Moreover, the secondary structure of the terminal region of TM6 is poorly defined in the AF2 model: this portion has a local confidence (pLDDT) between 70 and 90 in its helical part and below 70 in the ECL3 part (Figure S3). The initial part of TM7 also differs between the two models, with a one-position shift in the helix and therefore different residue arrangements.
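Per-residue pLDDT values such as those quoted for TM6 and ECL3 can be read directly from an AlphaFold database model file, which stores pLDDT in the B-factor column of the PDB format. A minimal sketch, assuming a PDB-format file with standard fixed columns; the helper names and the file name are hypothetical, and the bands are those used by the AlphaFold Protein Structure Database (above 90 very high, 70-90 confident, 50-70 low, below 50 very low).

```python
def ca_plddt(pdb_lines):
    """Map residue number -> pLDDT, read from the B-factor field of Cα atoms.

    Uses fixed PDB columns: atom name 13-16, residue number 23-26,
    B-factor 61-66 (1-based), i.e. the 0-based slices below.
    """
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def plddt_band(score):
    """Confidence band as labelled in the AlphaFold Protein Structure Database."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def low_confidence_residues(scores, threshold=70.0):
    """Residue numbers whose pLDDT falls below `threshold`."""
    return sorted(res for res, s in scores.items() if s < threshold)


# Usage with a local copy of the model (hypothetical file name):
# with open("AF-Q8NHB7-F1-model.pdb") as fh:
#     scores = ca_plddt(fh)
#     print(low_confidence_residues(scores))
```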
OR5K1 model refinement
The AF2 and HM models propose two different ligand positions and binding poses. We performed induced-fit docking (IFD) simulations (Schrödinger Release 2021-3: Induced Fit Docking protocol; Glide, Schrödinger, LLC, New York, NY, 2021; Prime, Schrödinger, LLC, New York, NY, 2021)74 with the most active compound (compound 1) for both AF2 and HM, allowing flexibility of the binding site side chains to explore the conformational space of the orthosteric binding site of the two models. In total, 44 models were generated from the AF2 model and 57 from the HM. The ROC curves of these models show improved performance: the best models have AUC values of 0.81 and 0.85, and EF15% values of 0.24 and 0.50, for AF2 and HM, respectively (Figure S5). The binding modes of compound 1 in the best AF2 and HM models are different, but in both models the ligand is now located in the core of the orthosteric binding site (Figure S5). Interestingly, both models place two leucine residues, L1043.32 and L2556.51, in the binding pocket (Figure S5). This is consistent with the nature of odorants: they are typically small organic compounds below 300 Da with moderate-to-high hydrophobicity, and their binding to ORs is driven by shape complementarity and mostly hydrophobic interactions.64, 75
L1043.32 is conserved in 98% of the orthologs investigated across 51 species; the only exception is the receptor of the New World monkey Aotus nancymaae (XP_012332612.1), in which a conservative exchange replaces leucine 104 with isoleucine (Figure S7, Table S5). Similarly, L2556.51 of OR5K1 is conserved in 96% of all orthologs, except for the receptors of Aotus nancymaae, Loxodonta africana (African elephant, XP_003418985.1), and Urocitellus parryii (Arctic ground squirrel, XP_026258216.1). In all three orthologs and in the human paralog OR5K2, again, a conservative exchange replaces leucine 255 with isoleucine (Figure S7, Table S5). Single nucleotide missense variations have been reported at both positions in human OR5K1, L1043.32I (rs777947557) and L2556.51F (rs1032366530), albeit with frequencies well below 0.01. Moreover, both L1043.32 and L2556.51 belong to a set of 22 amino acids previously suggested to constitute a generalized odorant binding pocket in ORs,76 and both positions have been identified experimentally as odorant interaction partners in different receptors by several independent studies.24, 36, 62, 77-82 These leucine residues are therefore likely to play a relevant role in the recognition of OR5K1 agonists. We mutated them to alanine (L1043.32A, L2556.51A) and found that both mutations shift the EC50 of compound 1 to higher concentrations: 525.28 ± 92.28 μM for L1043.32A and 478.36 ± 185.10 μM for L2556.51A (Figure 2a). We then monitored, for the poses obtained with IFD, the distance between the ligand centroid and the midpoint between the Cα atoms of the two leucine residues: for the HM this distance reaches values as low as 0.2 nm, whereas for the AF2 model it remains above 0.4 nm (Figure 2b).
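The distance monitored above can be written as a short function. A sketch, assuming coordinates in Å (as in PDB and Maestro files, hence the division by 10 to report nm) and that the ligand centroid is taken over whichever ligand atoms are passed in, e.g. heavy atoms only; which atoms were included is not stated in the text. Names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def ligand_leucine_distance_nm(ligand_atoms, ca_l104, ca_l255):
    """Distance (nm) from the ligand centroid to the midpoint of two Cα atoms.

    ligand_atoms: list of (x, y, z) in Å; ca_l104, ca_l255: Cα coordinates in Å.
    """
    ligand_center = centroid(ligand_atoms)
    midpoint = tuple((a + b) / 2.0 for a, b in zip(ca_l104, ca_l255))
    return math.dist(ligand_center, midpoint) / 10.0  # Å -> nm


# Hypothetical pose: ligand centered 2 Å (0.2 nm) from the Cα midpoint
ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(ligand_leucine_distance_nm(ligand, (1.0, 4.0, 0.0), (1.0, 0.0, 0.0)))
```

Applying this to every IFD pose and keeping those below 0.4 nm reproduces the kind of filter used in the refinement; `math.dist` requires Python 3.8 or later.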
Figure 2. (a) Concentration-response relations of compound 1 (2,3-diethyl-5-methylpyrazine) on OR5K1 ref (black), OR5K1 L1043.32A (turquoise), and OR5K1 L2556.51A (pink). Data were mock control-subtracted, normalized to the response of OR5K1 ref to 2,3-diethyl-5-methylpyrazine (300 μM), and displayed as mean ± SD (n = 4). RLU = relative luminescence units. (b) Distance between the ligand centroid and the midpoint between the L1043.32 and L2556.51 Cα atoms in the first and second IFD simulation rounds.
To improve sampling of the conformational rearrangement around the ligand, we performed a second round of IFD simulations, allowing flexibility of the binding site side chains around compound 1. In this second round, HM conformations are sampled more extensively, and poses in close contact with L1043.32 and L2556.51 are enriched for the AF2 model (Figure 2b).
We then analyzed all poses in which the ligand is close to L1043.32 and L2556.51 (distance below 0.4 nm): 106 structures for AF2 (1 from the first IFD round and 105 from the second) and 110 for HM (39 from the first round and 71 from the second). We clustered these complexes into 31 and 34 possible binding poses for AF2 and HM, respectively; the distribution of the clusters is reported in Figure S6. Among all potential binding modes, 6 models from the refinement of the AF2 model and 12 from the refinement of the HM have an AUC higher than 0.8 (Table S1). These can be considered the most predictive binding site conformations and were submitted to a third round of IFD simulations to extensively sample the conformational space of L1043.32 and L2556.51. This round generated 555 structures from the AF2-derived models and 431 from the HM-derived models with an AUC greater than 0.8 and a distance between the ligand centroid and the midpoint between the L1043.32 and L2556.51 Cα atoms below 0.4 nm. Despite the high similarity of the generated structures, distinct binding modes were sampled (37 clusters from HM and 30 from AF2, Figure S8). The best-performing structure of each cluster is available at https://github.com/dipizio/OR5K1_binding_site. Based on performance, the shape of the ROC curves, and the contribution of L1043.32 and L2556.51 to binding, we selected the binding poses shown in Figure 3.
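The selection and clustering steps above can be sketched as follows. The text does not state which clustering algorithm or cutoff was used, so this sketch substitutes a simple greedy (leader) clustering on in-place ligand RMSD with a hypothetical 2.0 Å cutoff; poses are processed in order of decreasing AUC so that the first member of each cluster is its best-performing structure. The pose record layout (name, auc, dist_nm, ligand) is hypothetical, and ligand atoms are assumed to be listed in the same order in every pose and expressed in the common receptor frame.

```python
import math


def ligand_rmsd(lig_a, lig_b):
    """In-place RMSD (Å) between two poses of one ligand, atoms in identical order."""
    sq = sum((a - b) ** 2 for pa, pb in zip(lig_a, lig_b) for a, b in zip(pa, pb))
    return math.sqrt(sq / len(lig_a))


def select_and_cluster(poses, min_auc=0.8, max_dist_nm=0.4, rmsd_cutoff=2.0):
    """Keep poses passing both filters, then leader-cluster them by ligand RMSD.

    Returns a list of clusters; element 0 of each cluster is its best-AUC pose.
    """
    kept = [p for p in poses if p["auc"] > min_auc and p["dist_nm"] < max_dist_nm]
    kept.sort(key=lambda p: p["auc"], reverse=True)
    clusters = []
    for pose in kept:
        for cluster in clusters:
            if ligand_rmsd(pose["ligand"], cluster[0]["ligand"]) < rmsd_cutoff:
                cluster.append(pose)
                break
        else:  # no existing leader is close enough: start a new cluster
            clusters.append([pose])
    return clusters


# Hypothetical one-atom "ligands" to keep the example short
poses = [
    {"name": "A", "auc": 0.85, "dist_nm": 0.2, "ligand": [(0.0, 0.0, 0.0)]},
    {"name": "B", "auc": 0.82, "dist_nm": 0.3, "ligand": [(1.0, 0.0, 0.0)]},
    {"name": "C", "auc": 0.90, "dist_nm": 0.1, "ligand": [(5.0, 0.0, 0.0)]},
    {"name": "D", "auc": 0.70, "dist_nm": 0.2, "ligand": [(0.0, 0.0, 0.0)]},
    {"name": "E", "auc": 0.95, "dist_nm": 0.5, "ligand": [(0.0, 0.0, 0.0)]},
]
clusters = select_and_cluster(poses)
print([[p["name"] for p in c] for c in clusters])
```

Leader clustering depends on processing order, which is why sorting by AUC first matters here.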
Figure 3. (a) ROC curves and (b) binding modes of compound 1 in the OR5K1 binding site of the best AF2 and HM models obtained after extensive sampling of the conformational space of L1043.32 and L2556.51. Binding-site residues at positions common to both models are shown as sticks. Residue F852.65 is shown only for the HM, because TM2 in the AF2 model does not point towards the binding site (the Cα atoms of F85 in the two models are 8.85 Å apart).
The starting AF2 and HM models have different TM helix conformations, which prevents convergence when only side chain conformations are sampled. For example, Figure 3 shows the one-position shift of TM7 residues between the two models: position 7.42 is F278 in the AF2-derived model and T279 in the HM-derived model.
However, in both models the ligand adopts a similar position and interacts with L1043.32 and L2556.51. These two residues interact with the aliphatic chains attached to the pyrazine ring and might play a relevant role in ligand selectivity. Indeed, we have shown that even isomers, such as compounds 2 and 13, elicit different receptor activation (Figure 4).
Figure 4. Concentration-response relations of 2-ethyl-3,6-dimethylpyrazine (compound 2) and 2-ethyl-3,5-dimethylpyrazine (compound 13) on OR5K1. Data were mock control-subtracted, normalized to the OR5K1 signal of each ligand, and displayed as mean ± SD of independent transfection experiments (n = 4). RLU = relative luminescence units.
We computationally mutated L1043.32 and L2556.51 to alanine in these two models. Interestingly, the docking scores correlate with the drop in activation observed experimentally and are strongly influenced by the van der Waals (vdW) contribution of the leucine residues (Figure 5). Moreover, in both models the docking scores of compound 2 are lower (more favorable) than those of compound 13, in line with its higher activity. Both models therefore seem able to capture most differences in activity caused by small structural changes on either the ligand or the receptor side.
Figure 5. Schematic representation of the binding modes of pyrazines 1, 2, and 13 in the OR5K1 binding site of the selected models. For compound 1, the docking scores of the mutant models are also reported.
CONCLUSIONS
ORs are class A GPCRs for which no experimental structures are available and which share very low sequence identity with non-sensory GPCRs. The small size of OR modulators and the low resolution of structure modeling are major challenges for investigating the molecular recognition mechanisms of this important class of receptors. Most ORs are still orphan, and the receptive range of only a few ORs has been characterized to date. In this paper, we used the recently published ligand information on OR5K150 to model and refine the OR5K1 orthosteric binding site. We used a multi-template homology modeling approach, previously suggested to be a successful strategy for OR modeling.20-21, 23, 62 Moreover, we further refined ECL2, which we previously identified as a necessary step for low-resolution GPCR modeling.70, 83-84
We also used the AlphaFold 2 model of OR5K1, which allowed us to evaluate AlphaFold 2 OR structures for ligand-protein interaction studies. The AF2 and HM models differ in the backbone, which unavoidably affects the binding site conformations. Another difference between the HM and AF2 models is the activation state. The prevalence of inactive-state GPCR models was addressed in a recent paper by Heo et al.,85 who found that it may also affect the accuracy of binding site predictions and proposed multi-state models of GPCRs.
Altogether, we found that refinement was a necessary step for both the HM and AF2 models. For the AF2 model, refinement was needed not only to improve performance, as for the HM, but also to open the orthosteric binding site and allow docking of agonists. Through modeling, we identified residues relevant for the activity of OR5K1 agonists, namely L1043.32 and L2556.51. These positions are highly conserved in OR5K1 orthologs across 51 species and show an extremely low frequency of SNP-based missense variations according to the 1000 Genomes Project. Mutagenesis experiments provided valuable experimental information for model refinement.
In summary, we propose here an iterative experimental-computational workflow that allowed us to explore the conformational space of OR5K1 binding site and can be used to model the orthosteric binding site of ORs as well as that of GPCRs with low sequence identity to available templates.
MATERIALS AND METHODS
Synthesis of 2-ethyl-3,5(6)-dimethylpyrazine
2-Ethyl-3,5(6)-dimethylpyrazines were synthesized by a Grignard-type reaction according to Czerny et al.86 Briefly, a solution of ethylmagnesium bromide in tetrahydrofuran (THF; 20 mL; 1.0 M; 20 mmol) was placed in a three-necked flask (100 mL) equipped with a reflux condenser, a dropping funnel, and an argon inlet. While stirring at 40 °C, a small portion of the respective reactant (2.2 g; 20 mmol) dissolved in 20 mL THF was added dropwise via the dropping funnel. 2,5-Dimethylpyrazine was used for the synthesis of 2-ethyl-3,6-dimethylpyrazine, and 2,6-dimethylpyrazine was the starting material for 2-ethyl-3,5-dimethylpyrazine. Once the mixture was refluxing (73 °C), the remaining 2,5(6)-dimethylpyrazine solution was added over 30 min. The mixture was stirred under reflux for 2 h and cooled to room temperature, and water (20 mL) was added dropwise. The emulsion was extracted with diethyl ether (3 × 50 mL), and the extract was dried over anhydrous Na2SO4. The compounds were purified by flash column chromatography: the concentrated extract (1.0 mL) was placed on top of a water-cooled glass column (33 × 2.5 cm) filled with a slurry of silica gel 60 (with 7% added water, 40-63 μm, Merck, Darmstadt, Germany, # 1.09385.2500) in n-pentane. The target compounds were eluted with n-pentane/diethyl ether (100 mL, 40:60, v/v). The purity of each target compound was analyzed by gas chromatography-mass spectrometry (GC-MS) and nuclear magnetic resonance (NMR), and the concentration of each 2-ethyl-3,5(6)-dimethylpyrazine was determined by quantitative NMR (qNMR). For the NMR experiments, the solvent was distilled off and the residue was dissolved in CDCl3.
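As a quick consistency check on the quantities above, the amounts of Grignard reagent and dimethylpyrazine can be compared, and the nominal mass of the product can be matched to the molecular ion at m/z 136 in the MS data. A minimal sketch using standard average atomic masses for molar masses and integer masses of the most abundant isotopes for the nominal mass; the helper name and dictionary layout are illustrative choices.

```python
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007}   # g/mol
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14}                # most abundant isotopes


def mass(formula, table):
    """Sum of atomic masses for a formula given as {element: count}."""
    return sum(table[element] * count for element, count in formula.items())


dimethylpyrazine = {"C": 6, "H": 8, "N": 2}          # C6H8N2, starting material
ethyl_dimethylpyrazine = {"C": 8, "H": 12, "N": 2}   # C8H12N2, product

mmol_pyrazine = 2.2 / mass(dimethylpyrazine, AVERAGE_MASS) * 1000  # 2.2 g
mmol_grignard = 20.0 * 1.0                                         # 20 mL x 1.0 mol/L

print(f"dimethylpyrazine: {mmol_pyrazine:.1f} mmol; EtMgBr: {mmol_grignard:.1f} mmol")
print("nominal product mass:", mass(ethyl_dimethylpyrazine, NOMINAL_MASS))  # 136
```

The two amounts agree to within about 2%, i.e. the reaction is run at roughly equimolar ratio, as stated.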
2-Ethyl-3,5-dimethylpyrazine: MS (EI): m/z (%) 135 (100), 136 (M+, 81), 42 (18), 108 (17), 107 (15), 56 (12). 1H-NMR (CDCl3, 400 MHz, 25 °C) δ (ppm) 8.15 (s, 1H, H-C6), 2.80 (q, J = 7.6 Hz, 2H, H-C7), 2.53 (s, 3H, H-C9/10), 2.49 (s, 3H, H-C9/10), 1.27 (t, J = 7.6 Hz, 3H, H-C8). 2-Ethyl-3,6-dimethylpyrazine: MS (EI): m/z (%) 135 (100), 136 (M+, 92), 56 (24), 108 (16), 42 (12), 107 (11). 1H-NMR (CDCl3, 400 MHz) δ (ppm) 8.20 (s, 1H, H-C6), 2.81 (q, J = 7.5 Hz, 2H, H-C7), 2.54 (s, 3H, H-C9/10), 2.49 (s, 3H, H-C9/10), 1.28 (t, J = 7.5 Hz, 3H, H-C8).
Nuclear magnetic resonance (NMR)
NMR experiments were performed on an Avance III 400 MHz spectrometer equipped with a BBI probe (Bruker, Rheinstetten, Germany). Topspin software (version 3.2) was used for data acquisition. For structure elucidation, the compounds were dissolved in chloroform-d (CDCl3). Chemical shifts were referenced to the solvent signal. Quantitative 1H-NMR (qNMR) was performed according to Frank et al.87 For this, an aliquot (600 μL) of each solution was analyzed in NMR tubes (5 × 178 mm, Bruker, Faellanden, Switzerland).
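qNMR rests on the proportionality between a signal's integral and the number of nuclei producing it. As a sketch of that core relation only, assuming a reference of known concentration measured under identical, fully relaxed acquisition conditions, the analyte concentration follows from integrals normalized per proton; any instrument-specific corrections in the protocol of Frank et al. are not modeled. The function name and reference values are hypothetical; the analyte signal used is the aromatic 1H singlet of 2-ethyl-3,5-dimethylpyrazine at 8.15 ppm.

```python
def qnmr_concentration(integral_x, protons_x, integral_ref, protons_ref, conc_ref):
    """Analyte concentration from per-proton integrals relative to a reference.

    c_x = c_ref * (I_x / N_x) / (I_ref / N_ref)
    """
    return conc_ref * (integral_x / protons_x) / (integral_ref / protons_ref)


# Hypothetical: 10.0 mmol/L reference whose 3H singlet integrates to 3.0,
# analyte aromatic 1H singlet (8.15 ppm) integrating to 0.5
c_x = qnmr_concentration(0.5, 1, 3.0, 3, 10.0)
print(f"{c_x:.2f} mmol/L")  # 5.00 mmol/L
```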
Gas chromatography – mass spectrometry (GC-MS)
Mass spectra of the synthesized pyrazines were recorded in electron ionization mode using a GC-MS system consisting of a Trace GC Ultra gas chromatograph coupled to a single quadrupole ISQ mass spectrometer (Thermo Fisher Scientific, Dreieich, Germany), as described in more detail by Porcelli et al.88 A DB-1701 coated fused silica capillary column (30 m × 0.25 mm i.d., 0.25 μm film thickness; Agilent, Waldbronn, Germany) was used for chromatographic separation with the following temperature program: 40 °C held for 2 min, then raised at 10 °C/min to 230 °C (held for 4 min). Mass spectra were acquired over a scan range of m/z 40-300 at an ionization energy of 70 eV and evaluated with Xcalibur 2.0 software (Thermo Fisher Scientific).
Molecular cloning of OR5K1
The protein-coding region of human OR5K1 (NM_001004736.3) was derived from our previously published OR library.89 Amplification was carried out with a touchdown approach using gene-specific primers (Table S2): initial denaturation (98 °C, 3 min); ten cycles of denaturation (98 °C, 30 s), annealing (60 °C, decreasing by 1 °C per cycle down to 50 °C, 30 s), and extension (72 °C, 1 min); 25 cycles of denaturation (98 °C, 30 s), annealing (50 °C, 30 s), and extension (72 °C, 1 min); and a final extension step (72 °C, 7 min). Amplicons were inserted into the expression plasmid pFN210A,90 via EcoRI/NotI (#R6017/#R6435, Promega, Madison, USA) using T4-DNA ligase (#M1804, Promega, Madison, USA), and the constructs were verified by Sanger sequencing with internal primers (Table S3) (Eurofins Genomics, Ebersberg, Germany).
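The touchdown program above can be written out cycle by cycle, which makes explicit that the annealing temperature falls from 60 °C to 51 °C over the ten touchdown cycles and is then held at 50 °C. A minimal sketch; the helper name and the (step, °C, seconds) tuple layout are illustrative, and ramp times are ignored in the hold-time total.

```python
def touchdown_program(start_anneal=60, end_anneal=50, step=1, touchdown_cycles=10,
                      final_cycles=25, extension_s=60):
    """List of (step name, temperature in °C, hold time in s) for the PCR run."""
    program = [("initial denaturation", 98, 180)]
    for i in range(touchdown_cycles):
        anneal = max(start_anneal - i * step, end_anneal)  # -1 °C per cycle
        program += [("denaturation", 98, 30), ("annealing", anneal, 30),
                    ("extension", 72, extension_s)]
    for _ in range(final_cycles):
        program += [("denaturation", 98, 30), ("annealing", end_anneal, 30),
                    ("extension", 72, extension_s)]
    program.append(("final extension", 72, 420))
    return program


program = touchdown_program()
annealing = [temp for name, temp, _ in program if name == "annealing"]
print(annealing[:10])   # [60, 59, 58, 57, 56, 55, 54, 53, 52, 51]
print(sum(hold for _, _, hold in program) / 60, "min of hold time")  # 80.0
```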
PCR-based site-directed mutagenesis
The L1043.32A and L2556.51A mutants were generated by two-step PCR-based site-directed mutagenesis. The overlapping mutagenesis primers are listed in Table S4. The first step consisted of two amplifications: one with the forward vector-internal primer and the reverse mutagenesis primer, the other with the forward mutagenesis primer and the reverse vector-internal primer. Amplification followed the touchdown program described above. Both PCR amplicons were then purified and used as templates for the second step, in which the two overlapping amplicons were annealed with the following touchdown program: denaturation (98 °C, 3 min), then ten cycles of denaturation (98 °C, 30 s), annealing (starting at 60 °C, 30 s), and extension (72 °C, 2 min). Vector-internal forward and reverse primers were then added, and 25 further cycles of denaturation (98 °C, 30 s), annealing (50 °C, 30 s), and extension (72 °C, 1 min) were run, followed by a final extension step (72 °C, 7 min). The amplicons were then subcloned as described above.
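In overlap-extension mutagenesis the two internal mutagenic primers are complementary and carry the new codon. A minimal sketch of how such a primer pair can be derived from a coding sequence; the helper names are hypothetical, the sketch ignores melting temperature, GC content, and secondary structure (which real primer design must consider), and the toy sequence is not OR5K1. The primers actually used are those listed in Table S4.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(cds, codon_number, new_codon, flank=15):
    """Return (mutated CDS, forward primer, reverse primer) for one codon exchange.

    codon_number is 1-based (codon 1 = start codon); the primers span the new
    codon plus `flank` nucleotides on each side and are fully complementary.
    """
    start = (codon_number - 1) * 3
    end = start + 3
    if end > len(cds):
        raise ValueError("codon outside the coding sequence")
    mutated = cds[:start] + new_codon + cds[end:]
    forward = mutated[max(0, start - flank):end + flank]
    return mutated, forward, reverse_complement(forward)


# Toy example: Leu (CTG) at codon 2 -> Ala (GCT)
print(mutagenic_primers("ATGCTGAAA", 2, "GCT", flank=3))
# ('ATGGCTAAA', 'ATGGCTAAA', 'TTTAGCCAT')
```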
Cell culture and transient DNA transfection
We used HEK-293 cells,91 a human embryonic kidney cell line, as a test system for the functional expression of ORs.92 Cells were cultivated at 37 °C, 5% CO2, and 100% humidity in DMEM containing 4.5 g/L D-glucose, 10% fetal bovine serum, 2 mM L-glutamine, 100 U/mL penicillin, and 100 U/mL streptomycin. Cells were seeded in 96-well plates (Nunclon™ Delta Surface, #136102; Thermo Fisher Scientific, Schwerte, Germany) at 12,000 cells/well and cultured overnight. Cells were then transfected using 0.75 μL/well ViaFect™ (#E4981, Promega, USA) with the following constructs: 100 ng/well of the respective OR construct, 50 ng/well of the chaperone RTP1S,93 50 ng/well of the G protein subunit Gαolf,94-95 the olfactory G protein subunit Gγ13,96 and 50 ng/well of pGloSensor™-22F (Promega, Madison, USA).97 pGloSensor™-22F encodes a genetically engineered luciferase with a cAMP-binding domain, which yields a luminescence signal that depends directly on cAMP. All measurements were mock-controlled, i.e., pFN210A without an OR was transfected in parallel.
Luminescence assay
Concentration-response assays were measured 42 h post-transfection as described previously.92 In short, the supernatant was removed and cells were loaded with a physiological salt buffer (pH 7.5) containing 140 mmol/L NaCl, 10 mmol/L HEPES, 5 mmol/L KCl, 1 mmol/L CaCl2, 10 mmol/L glucose, and 2% beetle luciferin sodium salt (Promega, Madison, USA). Luminescence was measured with a GloMax® Discover microplate reader (Promega, Madison, USA). After 50 min of incubation in the dark, the basal luminescence of each well was recorded three times. The odorant, serially diluted in the physiological salt buffer with added Pluronic PE-10500 (BASF, Ludwigshafen, Germany), was then applied to the cells, and luminescence was recorded three times after 10 min of incubation. The final Pluronic PE-10500 concentration on the cells was 0.05%.
Data analysis of the cAMP-luminescence measurements
The raw luminescence data obtained from the GloMax® Discover microplate reader were analyzed for concentration-response assays by averaging the basal readings and the readings after odorant application, respectively. For each luminescence signal, the corresponding basal level was subtracted, and the corrected data set was normalized to the maximum amplitude of the reference. The mock control data set was then subtracted, and EC50 values and curves were derived by fitting the function

$$f(x) = \frac{\min - \max}{1 + \left(x/\mathrm{EC}_{50}\right)^{\mathrm{hillslope}}} + \max$$

to the data by nonlinear regression (SigmaPlot 14.0, Systat Software),98 where x is the odorant concentration, min and max are the lower and upper asymptotes, and hillslope is the Hill slope. Data are presented as mean ± SD.
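The normalization and curve fitting described above can be sketched without external libraries. The sketch assumes a four-parameter logistic (Hill-type) model, f(x) = (min - max) / (1 + (x/EC50)^hillslope) + max, with bottom and top in the code standing for min and max. It replaces SigmaPlot's iterative nonlinear regression with a different, simpler technique: a grid search over log10(EC50) and the Hill slope, in which bottom and top are solved exactly by linear least squares at each grid point, since the model is linear in them once EC50 and the slope are fixed. Grid ranges, function names, and the example data are hypothetical, and the fitted EC50 is only as fine as the grid.

```python
from statistics import mean


def corrected_signal(basal_reads, stimulated_reads):
    """Mean post-application reading minus mean basal reading for one well."""
    return mean(stimulated_reads) - mean(basal_reads)


def normalize(or_signals, mock_signals, reference_max):
    """Scale OR and mock signals to the reference maximum, then subtract mock."""
    return [(s - m) / reference_max for s, m in zip(or_signals, mock_signals)]


def four_pl(x, bottom, top, ec50, hill):
    """Four-parameter logistic; bottom = min and top = max asymptote."""
    return (bottom - top) / (1.0 + (x / ec50) ** hill) + top


def fit_four_pl(conc, resp, log_ec50_grid, hill_grid):
    """Grid search over (EC50, hill); bottom and top solved by least squares."""
    best = None
    for log_ec50 in log_ec50_grid:
        ec50 = 10.0 ** log_ec50
        for hill in hill_grid:
            # f = bottom * g + top * (1 - g) is linear in (bottom, top)
            g = [1.0 / (1.0 + (x / ec50) ** hill) for x in conc]
            suu = sum(u * u for u in g)
            svv = sum((1.0 - u) ** 2 for u in g)
            suv = sum(u * (1.0 - u) for u in g)
            suy = sum(u * y for u, y in zip(g, resp))
            svy = sum((1.0 - u) * y for u, y in zip(g, resp))
            det = suu * svv - suv * suv
            if abs(det) < 1e-12:  # curve flat over the tested range
                continue
            bottom = (suy * svv - svy * suv) / det
            top = (svy * suu - suy * suv) / det
            sse = sum((bottom * u + top * (1.0 - u) - y) ** 2
                      for u, y in zip(g, resp))
            if best is None or sse < best["sse"]:
                best = {"sse": sse, "bottom": bottom, "top": top,
                        "ec50": ec50, "hill": hill}
    return best


# Synthetic, noise-free data (concentrations in µM) generated with EC50 = 10
conc = [1, 3, 10, 30, 100, 300, 1000]
resp = [four_pl(x, 0.0, 1.0, 10.0, 1.5) for x in conc]
fit = fit_four_pl(conc, resp,
                  log_ec50_grid=[i / 100 for i in range(-100, 301)],  # 0.1 to 1000 µM
                  hill_grid=[i / 20 for i in range(10, 61)])           # 0.5 to 3.0
print(round(fit["ec50"], 2), round(fit["hill"], 2))  # 10.0 1.5
```

When a response does not saturate within the tested range, as for compound 13, the top asymptote is poorly constrained and any fitted EC50 should be treated with caution.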
Phylogenetic analysis
NCBI99 was used as the database for retrieving genetic information on Homo sapiens (human) odorant receptor genes and on orthologous receptor genes of OR5K1 (for accession numbers see Table S5). The phylogenetic reconstruction of ORs was performed with QIAGEN CLC Genomics Workbench 21.0 (https://digitalinsights.qiagen.com/) and MEGA X software.100 In a first step, all sequences were aligned using the ClustalW algorithm.101 The evolutionary history was inferred using the Neighbor-Joining method102 with 500 bootstrap replications.103 The scale bar refers to evolutionary distances computed with the Poisson correction method.104 Evolutionary analyses were conducted in MEGA X.100 Human rhodopsin (NCBI entry: NP_000530.1) was used as an outgroup to root the tree.
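Two ingredients of the MEGA X analysis above can be illustrated in a few lines: the Poisson-corrected amino acid distance, d = -ln(1 - p), where p is the proportion of differing aligned sites, and the resampling of alignment columns with replacement that underlies each bootstrap replicate. A minimal sketch, assuming gap-containing positions are skipped pairwise (MEGA also offers other gap treatments); the Neighbor-Joining tree building itself is not reproduced, and the function names and toy sequences are hypothetical.

```python
import math
import random


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences."""
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    p = sum(1 for a, b in sites if a != b) / len(sites)
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (same length as the input)."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


alignment = ["MELLIV", "MEFLIV", "ME-LAV"]
print(round(poisson_distance(alignment[0], alignment[1]), 4))
rng = random.Random(0)
replicates = [bootstrap_replicate(alignment, rng) for _ in range(500)]
```

Each replicate would be turned into a distance matrix and a tree; the bootstrap value of a branch is the fraction of replicate trees that contain it.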
Homology Modeling
The rhodopsin receptor (PDB ID: 4X1H), β2-adrenergic receptor (PDB ID: 6MXT), CXCR4 receptor (PDB ID: 3ODU), and A2A receptor (PDB ID: 2YDV) were used as templates for modeling the 3D structure of OR5K1, following the template selection of de March et al. 2015.20 The structures were downloaded from GPCRdb,105 and their sequences were aligned to the OR5K1 sequence (residues 20-292) with the Protein Structure Alignment module available in Maestro (Schrödinger Release 2021-3, Maestro, Schrödinger, LLC, New York, NY, 2021). The sequence alignment was then manually adjusted to ensure that conserved GPCR residues were correctly aligned (Figure S1). OR5K1 shares a sequence identity of 19% with 6MXT, 15% with 4X1H, 15% with 3ODU, and 16% with 2YDV. We modeled the ECL2 region (S1574.57-L1885.37) using NPY2 (PDB ID: 7DDZ) and CCK1 (PDB ID: 7MBY) as templates for the segment before Cys45.50, and apelin (PDB ID: 6KNM) for the segment after Cys45.50 (Figures S2 and S3). We also remodeled the region between P812.58 and L1053.32 on NPY2 to ensure the correct orientation of ECL2 towards TM3 and ECL1 and the formation of the conserved disulfide bridge between C3.25 and C45.50. One hundred homology models were generated using MODELLER v9.23.106 Four models were selected based on the DOPE score and visual inspection of ECL2, and the most predictive of these, based on ROC AUC (see Molecular Docking), was chosen for the subsequent analyses.
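Sequence identities such as the 15-19% values above depend on the chosen denominator, which the text does not specify. A sketch of one common convention, identical positions divided by positions where both aligned sequences carry a residue, next to an alternative that divides by the full alignment length; the two give different numbers for the same alignment. The function names and the toy aligned fragments are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    paired = [(q, t) for q, t in zip(aligned_query, aligned_template)
              if q != "-" and t != "-"]
    if not paired:
        raise ValueError("no gap-free aligned positions")
    identical = sum(1 for q, t in paired if q == t)
    return 100.0 * identical / len(paired)


def identity_over_alignment_length(aligned_query, aligned_template):
    """Alternative convention: identical positions divided by alignment length."""
    identical = sum(1 for q, t in zip(aligned_query, aligned_template)
                    if q == t and q != "-")
    return 100.0 * identical / len(aligned_query)


query = "NLLIL-TVAS"     # hypothetical aligned fragments, not real OR5K1 data
template = "NLAVLGTF-S"
print(percent_identity(query, template), identity_over_alignment_length(query, template))
```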
Protein preparation and binding site analysis
The OR5K1 AF2 model was downloaded from the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/entry/Q8NHB7). The OR5K1 AF2 and HM models were superimposed with the Protein Structure Alignment module available in Maestro (Schrödinger Release 2021-3, Maestro, Schrödinger, LLC, New York, NY, 2021). Hydrogen atoms and side chains of both models were optimized at physiological pH with the Protein Preparation Wizard (Schrödinger Release 2021-3, Maestro, Schrödinger, LLC, New York, NY, 2021). Ramachandran plots were generated to verify the reliability of the backbone dihedral angles in the models. The A100 tool was used to assess the activation state of the models.69
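Each point of a Ramachandran plot is a pair of backbone dihedral angles: φ(i) is defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ(i) by N(i), CA(i), C(i), N(i+1). A minimal dependency-free sketch of that computation follows. The input layout (a list of per-residue dicts keyed by the atom names "N", "CA", "C") is an assumption for illustration, not the format of any particular tool.

```python
import math
from typing import Optional

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues: list[dict[str, Vec]]) -> list[tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue from backbone atoms; None where a neighbor is missing."""
    angles = []
    last = len(residues) - 1
    for i, res in enumerate(residues):
        phi = (dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
               if i > 0 else None)
        psi = (dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
               if i < last else None)
        angles.append((phi, psi))
    return angles
```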
The SiteMap tool (Schrödinger Release 2021-3: SiteMap, Schrödinger, LLC, New York, NY, 2021) was used to characterize the binding cavities of both models.
Molecular Docking
The compounds screened by Marcinek et al. were used for model evaluation.50 From this set, we excluded 54 molecules that had been tested as mixtures of isomers: the activity measured for a mixture may not correspond to the activity of the individual stereoisomers (e.g., only one stereoisomer may be active), which would compromise the validation. From the remaining molecules with defined stereochemistry, we selected 11 agonists with EC50 values below 600 μM; together with the compounds characterized in this work, these formed the set of active molecules (Table 1). A total of 131 compounds that did not elicit a receptor response were used as inactives (the list of compounds is available at https://github.com/dipizio/OR5K1_binding_site).
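The selection rules above can be written as a small labeling function. This is a sketch under stated assumptions: the `Compound` record and its fields are hypothetical, the isomer-mixture exclusion is applied before any other rule, and agonists with an EC50 of 600 μM or more are left out of both sets because the text assigns them to neither.

```python
from dataclasses import dataclass
from typing import Optional

EC50_CUTOFF_UM = 600.0  # agonists must have an EC50 below 600 uM


@dataclass
class Compound:  # hypothetical record layout, not the authors' data format
    name: str
    isomer_mixture: bool               # tested as a mixture of isomers
    responded: bool                    # elicited a receptor response
    ec50_um: Optional[float] = None    # EC50 in uM, if determined
    characterized_here: bool = False   # newly characterized in this work


def label(c: Compound) -> Optional[str]:
    """Return 'active', 'inactive', or None (excluded from the benchmark)."""
    if c.isomer_mixture:
        return None  # assumption: exclusion applied first, to every compound
    if c.characterized_here:
        return "active"
    if not c.responded:
        return "inactive"
    if c.ec50_um is not None and c.ec50_um < EC50_CUTOFF_UM:
        return "active"
    return None  # weak or unquantified agonist: assigned to neither set


def split(compounds: list[Compound]) -> tuple[list[str], list[str]]:
    """Names of the active and inactive benchmark compounds."""
    actives = [c.name for c in compounds if label(c) == "active"]
    inactives = [c.name for c in compounds if label(c) == "inactive"]
    return actives, inactives
```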
The 3D structures of active and inactive molecules were retrieved from PubChem via their CAS numbers and prepared for docking with LigPrep, as implemented in the Schrödinger Small-Molecule Drug Discovery Suite 2021 (LigPrep, Schrödinger, LLC, New York, NY, 2021), which generated stereoisomers and protonation states at pH 7.2 ± 0.2. Glide Standard Precision (Glide, Schrödinger, LLC, New York, NY, 2021)107-108 was used to dock all compounds to the OR5K1 models. For the models obtained after the first round of IFD, the grid box was centered on the centroid of the combined SiteMap grid points of the HM and AF2 binding pockets; for the models obtained after the second round of IFD, it was centered on the centroid of docked 2,3-diethyl-5-methylpyrazine (compound 1).
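The two grid-centering rules reduce to centroid calculations. A minimal sketch follows, assuming the HM and AF2 models are already superimposed so that their SiteMap points share one coordinate frame, and assuming that "combined" means the union of both point sets rather than the average of the two pocket centroids. The function names are hypothetical, and this is not Glide's grid-generation code.

```python
Point = tuple[float, float, float]


def centroid(points: list[Point]) -> Point:
    """Arithmetic mean of a non-empty list of 3D points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def grid_center_after_ifd1(hm_site_points: list[Point],
                           af2_site_points: list[Point]) -> Point:
    """Center on the union of HM and AF2 SiteMap points (assumed superimposed)."""
    return centroid(hm_site_points + af2_site_points)


def grid_center_after_ifd2(docked_ligand_atoms: list[Point]) -> Point:
    """Center on the docked compound 1 (all supplied atom coordinates)."""
    return centroid(docked_ligand_atoms)
```

With the union, a pocket described by more SiteMap points pulls the center towards itself, whereas averaging the two pocket centroids would weight both pockets equally; the text does not state which convention was used.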
Compound 1 was docked into the OR5K1 mutants using in-place docking (Glide Standard Precision), with the grid centered on the centroid of the docked compound. Mutants were generated with the 'Mutate residue' tool available in Maestro.
An in-house Python script based on the scikit-learn package (v0.24.2) was used for the ROC curve analysis,109 and the data were plotted with the Matplotlib Python library.110 The AUC and EF15% of the training library were used to evaluate the performance of each model in discriminating between active and inactive compounds.
The ROC curves were obtained by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR). TPR and FPR values are calculated by the following equations:
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
where TP is the number of true positive compounds, and FN is the number of false negative compounds.
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
where FP is the number of false positive compounds, and TN is the number of true negative compounds.
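The ROC analysis can be sketched without dependencies as follows. Assumptions: docking scores follow the Glide convention that more negative means better, so compounds are ranked in ascending score order; when LigPrep yields several states per compound, only the best-scoring state is kept (the hypothetical `best_score_per_compound`); tied scores are processed as one threshold; and the AUC is the trapezoidal area under the resulting (FPR, TPR) points. This should agree with scikit-learn's `roc_auc_score` on the same inputs, but it is an illustration, not the authors' in-house script.

```python
def best_score_per_compound(poses: list[tuple[str, float]]) -> dict[str, float]:
    """Keep the most negative docking score per compound (assumption of this sketch)."""
    best: dict[str, float] = {}
    for name, score in poses:
        if name not in best or score < best[name]:
            best[name] = score
    return best


def roc_points(docking_scores: list[float], is_active: list[bool]) -> list[tuple[float, float]]:
    """(FPR, TPR) points, sweeping the threshold from the best to the worst score."""
    n_pos = sum(is_active)
    n_neg = len(is_active) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both actives and inactives")
    ranked = sorted(zip(docking_scores, is_active))  # most negative (best) first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:  # consume ties together
            if ranked[i][1]:
                tp += 1  # FN = n_pos - tp
            else:
                fp += 1  # TN = n_neg - fp
            i += 1
        # FPR = FP / (FP + TN), TPR = TP / (TP + FN)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, `auc(roc_points([-7.0, -6.5, -5.0, -4.0], [True, False, True, False]))` returns 0.75: three of the four active/inactive pairs are ranked correctly.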
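EF15% values are calculated by the following equation:
$$\mathrm{EF}_{15\%} = \frac{N_{\mathrm{actives}}(15\%) \,/\, \left[ N_{\mathrm{actives}}(15\%) + N_{\mathrm{inactives}}(15\%) \right]}{N_{\mathrm{actives}} \,/\, \left( N_{\mathrm{actives}} + N_{\mathrm{inactives}} \right)}$$
where Nactives(15%) and Ninactives(15%) represent the number of actives and inactives, respectively, in the top 15% of the ranked screened compounds, and Nactives and Ninactives are the total numbers of actives and inactives in the screened library.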
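A minimal sketch of the EF15% calculation follows, using the standard definition: the fraction of actives among the top-ranked 15% of the library divided by the fraction of actives in the whole library. It assumes Glide-style ranking (most negative score first) and rounds the size of the top subset to the nearest whole number of compounds, with a minimum of one; the text does not state the rounding rule or how ties at the cutoff are broken (here, by input order, since Python's sort is stable).

```python
def enrichment_factor(docking_scores: list[float], is_active: list[bool],
                      fraction: float = 0.15) -> float:
    """EF at the given fraction of the ranked library (most negative score ranked first)."""
    n_total = len(docking_scores)
    n_actives_total = sum(is_active)
    if n_total == 0 or n_actives_total == 0:
        raise ValueError("need a non-empty library with at least one active")
    ranked = sorted(zip(docking_scores, is_active), key=lambda pair: pair[0])
    n_top = max(1, round(fraction * n_total))  # rounding rule is an assumption
    n_actives_top = sum(1 for _, active in ranked[:n_top] if active)
    # hit rate in the top subset divided by the hit rate of the whole library
    return (n_actives_top / n_top) / (n_actives_total / n_total)
```

When actives make up no more than 15% of the library, the attainable maximum is roughly 1/0.15 ≈ 6.7 (exactly N divided by the rounded subset size), which helps when comparing EF15% values across models.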
Induced-fit docking simulations
In the first round of simulations (IFD1), the HM and AF2 starting models were refined with the Induced Fit Docking protocol of Schrödinger Suite 2021 (Glide, Schrödinger, LLC, New York, NY, 2021; Prime, Schrödinger, LLC, New York, NY, 2021).111 2,3-Diethyl-5-methylpyrazine (compound 1) was used as the ligand, and the side chains within 3 Å of the SiteMap grid points were treated as flexible. The best structures from IFD1, selected on the basis of AUC values and visual inspection (4 structures from the refinement of the HM and 7 from the refinement of the AF2 model), underwent a second round of simulations (IFD2), in which the residues within 4 Å of the ligand (compound 1) were allowed to move. The most predictive structures from IFD2 (Table S1) were submitted to a third round of IFD simulations (IFD3), in which only the side chains of L104^3.32 and L255^6.51 and the ligand were treated as flexible. To sample the conformations of these two leucine residues extensively, both compound 1 and compound 2 were used as ligands in IFD3.
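Defining the flexible region as the residues within a distance cutoff of a set of reference points (SiteMap grid points in IFD1, ligand atoms in IFD2) is a simple geometric test. The sketch below assumes that a residue qualifies if any of its side-chain atoms lies within the cutoff of any reference point; the exact criterion applied inside the Induced Fit Docking protocol may differ, and the input layout (residue number mapped to side-chain coordinates) is hypothetical.

```python
import math

Point = tuple[float, float, float]


def residues_within(side_chain_atoms: dict[int, list[Point]],
                    reference_points: list[Point],
                    cutoff: float) -> list[int]:
    """Residue numbers with a side-chain atom within `cutoff` (Å) of any reference point.

    Selection criterion (any atom, any point, inclusive cutoff) is an assumption.
    """
    selected = [
        res_num
        for res_num, atoms in side_chain_atoms.items()
        if any(math.dist(a, p) <= cutoff for a in atoms for p in reference_points)
    ]
    return sorted(selected)
```

Under this reading, IFD1 would call it with the SiteMap points and a 3.0 Å cutoff, IFD2 with the ligand atoms and 4.0 Å, while IFD3 replaces the distance test with the fixed pair of residues 104 and 255.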
Clustering of docking poses
For all poses from IFD1, IFD2, and IFD3, we monitored the distance between the ligand centroid and the midpoint between the alpha carbons of L104^3.32 and L255^6.51. Centroids and distances were calculated using PLUMED (version 2.7).112-114 The docking poses from IFD1 and IFD2 with a distance below 0.4 nm were clustered using the conformer_cluster.py script from Schrödinger (https://www.schrodinger.com/scriptcenter). First, a pairwise RMSD matrix was calculated for compound 1 and the residues within 7 Å of its centroid (for HM, residues 104, 105, 108, 159, 199, 202, 206, 255, 256, 276, 279, 280; for AF2, residues 101, 104, 105, 108, 178, 180, 181, 199, 255, 258, 259, 275, 278, 279); the complexes were then clustered by hierarchical clustering with average linkage. The number of clusters was set to 31 for AF2 and 34 for HM, based on the second minimum of the Kelley penalty score. Docking poses obtained from IFD3 were filtered by distance (below 0.4 nm) and AUC (greater than 0.8), and the binding-site conformations were clustered using conformer_cluster.py from Schrödinger. RMSD matrices of the best-performing structures from the different clusters were calculated with rmsd.py from Schrödinger (Figure S8).
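The pose filter and the clustering step can be sketched in plain Python. This is a generic re-implementation under stated assumptions, not Schrödinger's conformer_cluster.py or the authors' PLUMED input: coordinates are taken to be in Å (so the 0.4 nm cutoff becomes 4.0 Å), the RMSD matrix is assumed to be precomputed, the number of clusters is passed in directly rather than chosen via the Kelley penalty, and all function names are hypothetical.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]
CUTOFF_ANGSTROM = 4.0  # 0.4 nm, assuming coordinates in Å


def centroid(points: list[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def ligand_to_ca_midpoint(ligand_atoms: list[Point], ca_104: Point, ca_255: Point) -> float:
    """Distance between the ligand centroid and the midpoint of the two CA atoms."""
    return math.dist(centroid(ligand_atoms), centroid([ca_104, ca_255]))


def filter_poses(poses: list[tuple[list[Point], Point, Point]]) -> list[int]:
    """Indices of poses (ligand atoms, CA 104, CA 255) closer than the cutoff."""
    return [i for i, (lig, ca_a, ca_b) in enumerate(poses)
            if ligand_to_ca_midpoint(lig, ca_a, ca_b) < CUTOFF_ANGSTROM]


def average_linkage(dist: list[list[float]], n_clusters: int) -> list[list[int]]:
    """Agglomerative clustering with average linkage on a symmetric distance matrix."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # mean of all pairwise distances between members of the two clusters
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters
```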
ChimeraX (v1.3) was used to render the protein images.115
DATA AND SOFTWARE AVAILABILITY
The dataset and refined models can be downloaded from https://github.com/dipizio/OR5K1_binding_site.
Supporting Information
Multiple Sequence Alignment of OR5K1 with templates (Figure S1); ECL2 for OR5K1 and experimental class A GPCRs (Figure S2); OR5K1 models built with AlphaFold 2 and homology modeling (Figure S3); ROC analysis of the starting OR5K1 AF2 and HM models (Figure S4); Binding modes and ROC analyses of the OR5K1 AF2 and HM models after the first IFD simulation round (Figure S5); Distribution of the clustered binding poses of compound 1 in proximity to L104^3.32 and L255^6.51 (Figure S6); Leucine residues L104^3.32 and L255^6.51 are highly conserved in OR5K1 homologs (Figure S7); RMSD matrices for the orthosteric binding site of the best-performing models obtained after clustering IFD3 models (Figure S8); Models from IFD1 and IFD2 with d < 0.4 nm and AUC > 0.8 (Table S1); Oligonucleotides for molecular cloning of OR5K1 (Table S2); Vector internal oligonucleotides (Table S3); Oligonucleotides for Homo sapiens OR5K1 site-directed mutagenesis (Table S4); NCBI reference sequences of olfactory receptor genes investigated (Table S5).
AUTHOR INFORMATION
Authors
Alessandro Nicoli - Molecular Modeling Group, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
Franziska Haag - Taste and Odor Systems Reception Group, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
Patrick Marcinek - Taste and Odor Systems Reception Group, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
Ruiming He - Molecular Modeling Group, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany; Department of Chemistry, Technical University of Munich, D-85748 Garching, Germany
Johanna Kreißl - Analytical Technologies, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
Jörg Stein - Food Metabolome Chemistry Group, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
Alessandro Marchetto - Computational Biomedicine group, Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany; Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, 52062 Aachen, Germany
Andreas Dunkel - Integrative Food Systems Analysis Group, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
Thomas Hofmann - Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, 85354 Freising, Germany
Author Contributions
The manuscript was written through the contributions of all authors. All authors have approved the final version of the manuscript.
REFERENCES
- (1).
- (2).
- (3).
- (4).
- (5).
- (6).
- (7).
- (8).
- (9).
- (10).
- (11).
- (12).
- (13).
- (14).
- (15).
- (16).
- (17).
- (18).
- (19).
- (20).
- (21).
- (22).
- (23).
- (24).
- (25).
- (26).
- (27).
- (28).
- (29).
- (30).
- (31).
- (32).
- (33).
- (34).
- (35).
- (36).
- (37).
- (38).
- (39).
- (40).
- (41).
- (42).
- (43).
- (44).
- (45).
- (46).
- (47).
- (48).
- (49).
- (50).
- (51).
- (52).
- (53).
- (54).
- (55).
- (56).
- (57).
- (58).
- (59).
- (60).
- (61).
- (62).
- (63).
- (64).
- (65).
- (66).
- (67).
- (68).
- (69).
- (70).
- (71).
- (72).
- (73).
- (74).
- (75).
- (76).
- (77).
- (78).
- (79).
- (80).
- (81).
- (82).
- (83).
- (84).
- (85).
- (86).
- (87).
- (88).
- (89).
- (90).
- (91).
- (92).
- (93).
- (94).
- (95).
- (96).
- (97).
- (98).
- (99).
- (100).
- (101).
- (102).
- (103).
- (104).
- (105).
- (106).
- (107).
- (108).
- (109).
- (110).
- (111).
- (112).
- (113).
- (114).
- (115).