Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

A Deep Learning Bioinformatics Approach to Modeling Protein-Ligand Interaction with cryo-EM Data in 2021 Ligand Model Challenge

View ORCID ProfileNabin Giri, View ORCID ProfileJianlin Cheng
doi: https://doi.org/10.1101/2022.05.27.493799
Nabin Giri
1Department of Electrical Engineering and Computer Science, University of Missouri, Columbia; ,
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Nabin Giri
  • For correspondence: ngzvh@missouri.edu chengji@missouri.edu
Jianlin Cheng
1Department of Electrical Engineering and Computer Science, University of Missouri, Columbia; ,
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Jianlin Cheng
  • For correspondence: chengji@missouri.edu ngzvh@missouri.edu chengji@missouri.edu
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Preview PDF
Loading

Abstract

Elucidating protein-ligand interaction is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein-ligand interaction is traditionally tackled by molecular docking and simulation based on physical forces and statistical potentials, which cannot effectively leverage cryo-EM data and existing protein structural information in the protein-ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein-ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure to add ligands. The ligands are then identified and added into the structure to generate a protein-ligand complex structure, which is further refined. The method based on the deep learning prediction and template-based modeling was blindly tested in the 2021 EMDataResource Ligand Challenge and was ranked first in fitting ligands to cryo-EM density maps. The results demonstrate the deep learning bioinformatics approach is a promising direction to model protein-ligand interaction on cryo-EM data using prior structural information. The source code, data, and instruction to reproduce the results are open sourced and available on GitHub repository : https://github.com/jianlin-cheng/DeepProLigand

1. Introduction

Proteins are a building block of life. They carry out many vital biological functions. Whether it is acting as an enzyme to accelerate the chemical reactions or as regulatory molecules binding to other molecules to activate their functions, the detailed characterization of proteins and their interaction with their binding partners (e.g., the natural substrates or drugs as ligands) is of great importance in understanding the overall biological systems, designing drugs and treating diseases. A fundamental objective of computational structural biology is to understand and model such molecular interactions of living systems in sufficient details so that the behavior of the system can be predicted or modified as desired. With an ultimate goal to characterize the thermodynamic and kinetic behavior of components and their interactions of living organisms, a near atomic resolution images of interacting molecules such as protein-ligand complexes is required to analyze and understand the physical and geometrical constrains of the molecules.

Cryo-EM, an acronym for cryogenic electron microscopy technique [1] is a revolutionary technology that enables the determination of 3D structure of macro-molecular complexes at atomic resolution. With the development of various techniques in the cryo-EM realm to generate high resolution maps, as seen in Figure 1, EMDataResource [2] has seen a surge in deposition of cryo-EM derived protein density maps which elucidate the protein and ligand interactions in the molecules. The EMDataResource 2021 Ligand Model Challenge [3] was hosted to rigorously benchmark the current methods for generating models using cryo-EM density maps to improve the prediction and validation of protein and ligand interaction, and to identify the metrics which are most suitable for comparing the fit of atomic coordinate models into the cryo-EM maps.

Figure 1.
  • Download figure
  • Open in new tab
Figure 1.

The growth of cryo-EM density maps and cryo-EM-derived protein structures. The statistics was obtained from EMDataResource [15], an unified data resource for 3-Dimension electron microscopy (3DEM) on November 22, 2022.

One of the most popular approaches on modeling the protein-ligand complexes is the molecular docking [4–8], which uses physics- or statistical potential-based molecular simulations to generate protein-ligand complex models and a scoring function for estimation of their binding affinities to rank them. However, even with significant research efforts, despite some success, the protein-ligand interaction prediction problem still remains unsolved because existing methods cannot leverage vast structural data effectively. Inspired by the recent success demonstrated by AlphaFold [9] which uses novel neural network architectures trained on thousands of experimental protein structures to improve the prediction accuracy of protein structures, we in this work have utilized a deep learning-based architecture as a core component of our modeling pipeline for the 2021 Ligand Model Challenge held from February 1 to April 1, 2021. We combine the deep learning-based protein structure prediction with the template-based protein-ligand interaction prediction to determine the structures of protein-ligand complexes. Based on the official results provided by the assessors of the challenge, our method performed best in fitting ligands to cryo-EM maps, measured across all targets, demonstrating the unique value of the novel deep learning bioinformatics approach for modeling protein-ligand interaction.

2. Materials and Methods

We attempt to solve the problem of protein-ligand interaction by using a set of bioinformatics methods that incorporates cryo-EM data and known structural information such as reference protein structures. In particular, we leverage the recent advance of applying deep learning to directly predict the structure of proteins from high-resolution cryo-EM density maps; a succinct review of the methods can be found on [10]. We utilize an existing deep learning-based tool as a key component of our model building pipeline (DeepProLigand), which predicts the 3D coordinates of protein complex using only cryo-EM density map as input. The protein complex model serves as start point for the downstream ligand positioning and model refinement tasks. The workflow as illustrated in Figure 2 demonstrates our approach to generate the structure of protein complex by incorporating a fully automatic deep learning-based method as its primary building block. The modeling pipeline of Figure 2 has three key steps described as follows:

Figure 2.
  • Download figure
  • Open in new tab
Figure 2.

The workflow of DeepProLigand generating protein complex structure from cryo-EM map and reference structure. The cryo-EM map (EMD-22898) illustrated in the workflow is of SARS-CoV-2 ORF3a ion channel in lipid nanodiscs [11].

2.1. Protein complex reconstruction from cryo-EM density maps and reference structures

Using DeepTracer [12] we first predict the 3D backbone coordinates of the protein complex directly from cryo-EM density map. DeepTracer uses a 3D U-Net architecture which is modified from the original 2D U-Net [13] architecture developed especially for biomedical image segmentation. The output from DeepTracer block is a predicted 3D backbone coordinate structure that has the carbon, carbon alpha, nitrogen and oxygen atoms in the Protein Data Bank Format (PDB), which is standardized by wwPDB [14]. The predicted structure reflects the conformation of the protein in the ligand binding mode. Because the reference structure of the protein (prior structure without cryo-EM information) is also provided, we use the structure alignment to combine them to generate a posterior structure, conceptually similar to combining the prior probability and likelihood to generate a posterior probability in the Bayesian reasoning. Specifically, in order to combine the reference structure and the predicted structure together in terms of geometrical alignment, we utilize the UCSF Chimera’s [16] matchmaker function to superimpose both structure together. Once the structures are superimposed, we save the superimposed structure relative to reference structure into a new PDB file. The new PDB file now contains the atoms of both reference and predicted structure into same geometrical space allowing us to average the coordinates of the corresponding backbone atoms and utilize the reference structure’s residue, chain labeling for all the shared components between the two structures. The side chains are added on top of the combined backbone structures by SCWRL4 [17] tool. Finally, a full-atom combined structure consisting of multiple chains is produced for the downstream processing. It is worth noting that our approach of generating protein complex structures is different from the traditional approach of fitting a reference structure into a cryo-EM density map.

Algorithm 1

Average predicted structure and reference structure

Figure
  • Download figure
  • Open in new tab

Algorithm 1 depicts the pseudo code of averaging the backbone atoms’ coordinates of reference and predicted structures. We start by initializing an empty PDB file named average structure that follows the guideline of wwPDB [14]. For each residue of the reference structure, if the distance between the reference structure’s carbon alpha atom and any carbon alpha atom of predicted backbone structure in the 3D geometrical space is less than 1 Angstrom (Å), then all the backbone atoms’ coordinates of the residue in the particular reference structure are averaged with the predicted structure’s corresponding residue and saved in the average structure PDB file; otherwise, the coordinates of the residue in reference structure are simply saved in the average structure PDB file. We refer arithmetic mean, sum of collection of numbers divided by the count of numbers, as average throughout the paper. The distance threshold value we used is 1 Angstrom (Å), however, the threshold value for each chain can be modified as desired. Embedded Image Embedded Image

Here, let xr, yr, and zr be the coordinates for each carbon alpha residue of the reference structure and xp, yp, and zp be the coordinates for each carbon alpha residue of the predicted structure. With this notation, we follow the Algorithm 1 and generate new coordinates xavg, yavg, and xavg which are the average coordinates of both reference and predicted structure. The residues are averaged only if the distance between them, computed using Equation 1, is below a threshold value, we used threshold value as shown in Equation 2 for Target 202 and Target 203 of the 2021 EMDataResource Ligand Challenge. For Target 201, of the challenge, since most of the chains were turned into coils, we used threshold value of 0.3 Angstrom (Å) for chain C and 0.5 Angstrom (Å) for all other chains. The averaged coordinate structure is saved into standard PDB file format.

After the backbone atoms are computed using Algorithm 1, we utilize SCWRL4 [17] to add the side-chain conformation into the protein structure. The deep learning-based method utilized to predict the backbone atoms has high impact on determining the side-chains conformation as well, because high side chain accuracy is often achieved when the backbone prediction is accurate, as also demonstrated by AlphaFold [9].

Algorithm 2

Identify ligands and include them into average structure

Figure
  • Download figure
  • Open in new tab

2.2. Template-based Prediction of Protein-Ligand Interaction

After the protein structure that can accommodate ligands is generated using Algorithm 1, we utilize PyRosetta [18] to identify ligands and add them into the predicted structure by using the reference structure as template, as depicted in Algorithm 2. Since PyRosetta is a residue based tool, when a pose is created, all the atoms in a structure including ligand atoms are indexed by residue indices. Following Algorithm 2, we let res be each residue in reference structure that we want to check if it’s ligand. PyRosetta’s is_ligand function works by comparing the ligand to chemical component dictionary and returns a bool value (i.e, either True for ligand residue or False for non ligand residue) for each residue.

2.3. Refinement of Protein-Ligand Complex Model

After the prediction of protein-ligand complex structure using the approach outlined above, we further refined the predicted complex structure using Rosetta FastRelax. Relax does not perform extensive refinement and only searches the local low-energy backbone and side-chain conformations near the starting conformations by implementing rounds of packing and minimizing, with repulsive weight in the scoring function gradually increased from low value to normal value. The scoring function we used is ref2015_cst.wts, a default score function, which is repeated 5 times. Finally, after the refinement of the protein complex, we used UCSF Chimera’s Fit in Map function to do a rigid body optimization of the refined model. The 3D structure is rotated and aligned so that it fits to the density map. This refinement step is optional. During the blind experiment of the 2021 EMDataResource Ligand Challenge, we submitted both, an unrefined model and a refined model for each target.

2.4. Target cryo-EM Density Maps of 2021 Ligand Challenge

We blindly tested the protein-ligand modeling pipeline DeepProLigand on three targets that were released as 2021 EMDataResource Ligand Challenge targets from February to April 2021. The next section elaborates the three targets and the experimental setting used for each target.

2.4.1. Target 201: Escherichia coli beta-galactosidase

β-Galactosidase [19] target with atomic resolution of 1.9 Angstrom (Å) contains protein Beta-galactosidase, magnesium ion, sodium ion, water and 2-phenylethyl 1-thio-beta-D-galactopyranoside (PTQ) as ligand. The EMDB id of the target in EMDataResource is EMD-7770. We predicted the 3D structure of the complex using the workflow of DeepProLigand as highlighted in Figure 2 and during averaging of the structure, we initialize 0.3 Angstrom (Å) distance threshold for chain C and 0.5 Angstrom (Å) distance threshold for all other chains of the complex by re-initializing the threshold value of Equation 2. The reason for threshold of 0.3 Å in chain C was because most of the chains were turned into coils/turns with threshold of 0.5 Å. The ligand PTQ was appended using the Algorithm 2. Figure 3 shows the map-model overlay of cryo-EM density map EMD-7770 and our reconstructed protein structure model.

Figure 3.
  • Download figure
  • Open in new tab
Figure 3.

Target 201 (EMD-7770) map-model overlay at the recommended contour 0.52 (3.3 σ) with T0201EM0004_1 (ours)

2.4.2. Target 202: SARS-CoV-2 RNA-dependent RNA polymerase

The nsp12-nsp7-nsp8 complex bound to the template-primer RNA and triphosphate form of Remdesivir(RTP) [20] target with atomic resolution of 2.5 Angstrom (Å) contains RNA-directed RNA polymerase, Non-structural protein 8, Non-structural protein 7, Primer, Templete, ZINC ION, PYROPHOSPHATE 2-, MAGNESIUM ION, water, and [(2 R,3 S,4 R,5 R)-5-(4-azanylpyrrolo[2,1-f][1,2,4]triazin-7-y)-5-cyano-3,4-bis(oxidanyl)oxolan-2-yl]methyl dihydrogen phosphate as ligand. The EMDB id of the target in EMDataRe-source is EMD-30210. We predicted the 3D structure of the complex using the workflow of DeepProLigand as highlighted on Figure 2, and during averaging of the structure, we used 1 Angstrom (Å) distance threshold for all chains of the complex. The ligand F86 (remdesivir, covalent inhibitor) was appended using Algorithm 2. Figure 4 shows the map-model overlay of cryo-EM density map EM-30210 and our reconstructed protein structure model.

Figure 4.
  • Download figure
  • Open in new tab
Figure 4.

Target 202 (EMD-30210) map-model overlay at the recommended contour 0.058 (4.3 σ) with T0202EM004_1 (ours)

2.4.3. Target 203: SARS-CoV-2 protein 3a in lipid nanodiscs

SARS-CoV-2 3a ion channel in lipid nanodiscs [21] target with atomic resolution of 2.08 Angstrom (Å) contains ORF3a protein, Apolipoprotein A-I, water and 1,2-Dioleoyl-sn-glycero-3-phosphoethanolamine as ligand. The EMDB id of the target in EMDataResource is EMD-22898. We predicted the 3D structure of the complex using the workflow of DeepProLigand as highlighted in Figure 2, and during averaging of the structure, we used 1 Angstrom (Å) distance threshold for all chains of the complex. The ligand PEE was appended using the Algorithm 2. Figure 5 shows the map-model overlay of cryo-EM density map EMD-22898 and our reconstructed protein structure model.

Figure 5.
  • Download figure
  • Open in new tab
Figure 5.

Target 203 (EMD-22898) map-model overlay at the recommended contour 0.7 (10.3 σ) with T0203EM004_1 (ours)

3. Results

The analysis of the models in this section is based on the official results provided by the organizers of the 2021 Ligand Model Challenge. The fit to map for ligand is assessed by the Q-score [22] and the Z-scores. The Q-score measures how similar map values around an atom are to a Gaussian-like function we would see if the atom is well resolved. Q-score is calculated as a correlation between two vectors: u, which contains map values at points around the atom, and v, which contains values obtained from the reference Gaussian.

We used Q-score to compare the map to model fit for all the models that were submitted to the challenge. Table 1, shows the Q-score of ligand for all the models submitted for Target 201, our model is highlighted in bold for scrutiny. Figure 6 shows the ligand (PTQ)’s binding pose and orientation in our best predicted model, T0201EM004_1. Ligand PTQ binds to all four chains of Target 201, resulting in four binding sites for the ligand. We have visualized three binding locations for the ligand with its binding pose and orientations in Figure 6. Table 2 shows the Q-score of ligand for Target 202, similar to Target 201, our model is highlighted in bold for scrutiny. Figure 7 shows ligand (F86)’s binding pose and orientation in our best predicted model, T0201EM004_1. Ligand F86 binds to only one location in Target 202. We have visualized the binding location for the ligand with its binding pose and orientations in Figure 7.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 1.

Evaluation of Target 201: Escherichia coli beta-galactosidase on Q-score for all the models submitted in the 2021 Ligand Model Challenge

Figure 6.
  • Download figure
  • Open in new tab
Figure 6. Target 201.

(A) T0201EM004_1 (ours) docked by Target 201 (EMD-7770) and visualized with electrostatic potential surface generated in UCSF Chimera. (B) Ligand PTQ, image extracted from Protein Data Bank (PDB). (C) Protein-ligand interactions in T0201EM004_1 (ours) model. Chains are colored differently (chain A: blue, chain B: pink, chain C: green and chain D: golden). The ligand is labeled with its atom names as well as the ligand name (PTQ). For chain D: golden and chain C: green, we have labeled the chain residue names for understanding protein-ligand interaction better.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 2.

Evaluation of Target 202: SARS-CoV-2 RNA-dependent RNA polymerase on Q-score for all the models submitted in the 2021 Ligand Model Challenge

Figure 7.
  • Download figure
  • Open in new tab
Figure 7. Target 202.

(A) T0202EM0004_1 (ours) docked by Target 202 (EMD-30210) and visualized with electrostatic potential surface generated in UCSF Chimera. (B) Ligand F86, image extracted from Protein Data Bank (PDB). (C) Protein-ligand interactions in T0202EM004_1 (ours) model. Chains are colored differently (chain A: blue, chain B: orange, chain C: green, chain P: yellow, and chain T: teal). The ligand is labeled with its atom names as well as the ligand name (F86)

Table 3 shows the Q-score of ligand for Target 203. Similar to Target 201 and 202, our model is highlighted in bold for scrutiny in Table 3. Figure 8 shows the ligand (PEE)’s binding pose and orientation in our best predicted model, T0201EM004_1. Ligand PEE binds to two location in Target 203. We have visualized the binding location for the ligand with its binding pose and orientation in Figure 8.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 3.

Evaluation of Target 203: SARS-CoV-2 ORF3a putative ion channel in nanodisc on Q-score score for all the models submitted in the 2021 Ligand Model Challenge

Figure 8.
  • Download figure
  • Open in new tab
Figure 8. Target 203.

(A) T0203EM0004_1 (ours) docked by Target 203 (EMD-22898) and visualized with electrostatic potential surface generated in UCSF Chimera. (B) Ligand PEE, image extracted from Protein Data Bank (PDB). (C) Protein-ligand interactions in T0203EM0004_1 (our) model. Chains are colored differently (chain A: blue, chain B: pink, chain C: green, and chain D: golden). The ligand are labeled with their atom names as well as the ligands name (PEE)

Figure 9 shows cumulative Z-scores on Q-scores of 17 groups participating in the 2021 Ligand Model Challenge. Our DeepProLigand predictor (EM004) performs best overall on all three targets. Specifically, our protein-ligand model is ranked first for Target 202, second for Target 203, and in the middle for Target 201 as shown in the Z-scores on Q-scores Figure 9. The Q-scores of our best model for the three targets are shown in Table 4. Even though there are too few targets to draw a definite conclusion, the good results indicate that the deep learning structure prediction in conjunction with the template reference structure is able to build a good protein structure framework to accommodate ligands and the template-based protein-ligand prediction can assemble the ligands with the protein structure well for some targets. Incorporating deep learning approach in modeling enables us to predict the protein structure directly from cryo-EM map within minutes, making the approach highly useful on both prediction accuracy and time.

Figure 9.
  • Download figure
  • Open in new tab
Figure 9.

Z-scores on Q-scores for ligand of all the models submitted to 2021 Ligand Model Challenge. The pointed arrow represents our model.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 4.

Q-score of our best model (EM004_1) for three targets

However, we also notice the limitation of our approach in terms of the geometric quality of the atoms in the predicted protein structure. Particularly, there are some atom-atom clashes in the models, which may be caused by the violations of some geometric constraints of atom-atom distances in the protein structure predicted by the deep learning as well as in the averaging of the coordinates of the predicted structure and the reference structure. The violations of geometric and stereochemical restraints are not fixed by the current refinement protocol in the prediction pipeline. The refinement protocol even introduced some new clashes into the model. As AlphaFold demonstrates that the well-trained sophisticated deep learning architecture can accurately capture the geometric restraints of atoms and bonds in protein structure by predicting high-quality protein structures of atomic resolution that are highly similar to natural protein structures, more advanced deep learning architectures can be developed to predict high-quality protein structures compatible with geometric and stereochemical restraints of proteins from cryo-EM density maps and reference structures.

4. Conclusion and Future Work

In this work, we demonstrate that the deep learning prediction of protein structure from cryo-EM maps can generate good protein structures for constructing protein-ligand complexes and the template-based protein-ligand interaction prediction can fit ligands well into the predicted protein structures according to the outstanding performance of our protein-ligand modeling pipeline. It is also worth noting that our method is fully automatic and does not involve any manual tweaking of the models to improve the scores. As discussed before, the current protein-ligand prediction pipeline cannot resolve some violations of some geometric and stereochemical restraints of atoms in protein structures. In the near future, we plan to develop advanced end-to-end deep learning architectures similar to some components in AlphaFold to predict better protein structures from cryo-EM maps and reference structures. Moreover, we plan to design 3D-equivariant deep learning architectures like SE(3)-equivariant Transformer network [23–26] to tackle the problem of geometric constraints which are not addressed by current methods. Finally, an end-to-end direct deep learning prediction of the structure of protein-ligand complexes from cryo-EM density maps, reference structures and ligand information to fully automate all the steps of the entire pipeline in this work is also an interesting direction to pursue. We believe the application of deep learning approach to prediction of 3D structures of protein-ligand complexes leveraging cryo-EM and other related data is a promising avenue to accelerate the advancement of the protein-ligand interaction study [27]. With the proliferation of cryo-EM maps being deposited in EMDataResource database, the use of deep learning-based methods can help to determine the structure of the protein-ligand complexes rapidly and ultimately helps to expedite the drug discovery process.

Author Contributions

Conceptualization, JC; methodology, JC and NG; experiments and data collection, NG; data analysis, NG and JC; writing—original draft preparation, NG and JC; writing— review and editing, JC and NG; visualization, NG; supervision, JC; project administration, JC and NG; funding acquisition, JC. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by two NIH grants (R01GM146340 and R01GM093123), Department of Energy grants (DE-AR0001213, DE-SC0020400 and DE-SC0021303), and two NSF grants (DBI1759934 and IIS1763246).

Data Availability Statement

The cryo-EM density maps are available in the EMDB website [2] with id: EMD-7770, EMD-30210, and EMD-22898. The submitted models metadata and further detailed information on 2021 Ligand Model Challenge can be accessed from the official challenge web-page [3]. The protein complex generated from this method, DeepProLigand, and the software program are released, and can be accessed through the GitHub repository: https://github.com/jianlin-cheng/DeepProLigand

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

We thank the organizers and assessors of the 2021 EMDataResource Ligand Challenge for providing the data for this research and anonymous reviewers for their valuable suggestions.

Footnotes

  • Tables 1, 2 and 3 are revised. Source codes are released.

Abbreviations

The following abbreviations are used in this manuscript:

cryo-EM
Cryogenic electron microscopy

References

  1. 1.↵
    Cressey, D., Callaway, E. (2017). Cryo-electron microscopy wins chemistry Nobel. Nature, 550(7675)
  2. 2.↵
    EMDataResource. Available online: https://www.emdataresource.org/ (accessed on 22 November 2022).
  3. 3.↵
    2021 Ligand Model Challenge. Available online: https://challenges.emdataresource.org/?q=2021-model-challenge (accessed on 22 November 2022).
  4. 4.↵
    Forli, S., Huey, R., Pique, M. E., Sanner, M. F., Goodsell, D. S., Olson, A. J. (2016). Computational protein–ligand docking and virtual drug screening with the AutoDock suite. Nature protocols, 11(5), 905–919
    OpenUrl
  5. 5.
    Gschwend, D. A., Good, A. C., Kuntz, I. D. (1996). Molecular docking towards drug discovery. Journal of Molecular Recognition: An Interdisciplinary Journal, 9(2), 175–186.
    OpenUrl
  6. 6.
    Fan, J., Fu, A., Zhang, L. (2019). Progress in molecular docking. Quantitative Biology, 7(2), 83–89
    OpenUrl
  7. 7.
    Meng, X. Y., Zhang, H. X., Mezei, M., Cui, M. (2011). Molecular docking: a powerful approach for structure-based drug discovery. Current computer-aided drug design, 7(2), 146–157.
    OpenUrl
  8. 8.↵
    Stanzione, F., Giangreco, I., Cole, J. C. (2021). Use of molecular docking computational tools in drug discovery. Progress in Medicinal Chemistry, 60, 273–343
    OpenUrl
  9. 9.↵
    Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.
    OpenUrlCrossRefPubMed
  10. 10.↵
    Giri, N., Roy, R. S., Cheng, J. (2022). Deep learning for reconstructing protein structures from cryo-EM density maps: recent advances and future directions. arXiv preprint arXiv:2209.08171.
  11. 11.↵
    Kern, D. M., Sorum, B., Mali, S. S., Hoel, C. M., Sridharan, S., Remis, J. P., … Brohawn, S. G. (2021). Cryo-EM structure of SARS-CoV-2 ORF3a in lipid nanodiscs. Nature structural molecular biology, 28(7), 573–582.
    OpenUrl
  12. 12.↵
    Pfab, J., Phan, N. M., Si, D. (2021). DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proceedings of the National Academy of Sciences, 118(2), e2017525118.
    OpenUrlCrossRef
  13. 13.↵
    Ronneberger, O., Fischer, P., Brox, T. (2015, October). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention (pp. 234–241). Springer, Cham.
  14. 14.↵
    “Protein Data Bank: the single global archive for 3D macromolecular structure data.” Nucleic acids research 47, no. D1 (2019): D520–D528.
    OpenUrl
  15. 15.↵
    Catherine L. Lawson, Ardan Patwardhan, Matthew L. Baker, Corey Hryc, Eduardo Sanz Garcia, Brian P. Hudson, Ingvar Lagerstedt, Steven J. Ludtke, Grigore Pintilie, Raul Sala, John D. Westbrook, Helen M. Berman, Gerard J. Kleywegt, Wah Chiu, EMDataBank unified data resource for 3DEM, Nucleic Acids Research, Volume 44, Issue D1, 4 January 2016, Pages D396–D403, https://doi.org/10.1093/nar/gkv1126
    OpenUrlCrossRefPubMed
  16. 16.↵
    Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., Ferrin, T. E. (2004). UCSF Chimera—a visualization system for exploratory research and analysis. Journal of computational chemistry, 25(13), 1605–1612
    OpenUrlCrossRefPubMedWeb of Science
  17. 17.↵
    Krivov, G. G., Shapovalov, M. V., Dunbrack Jr, R. L. (2009). Improved prediction of protein side-chain conformations with SCWRL4. Proteins: Structure, Function, and Bioinformatics, 77(4), 778–795
    OpenUrl
  18. 18.↵
    Chaudhury, S., Lyskov, S., Gray, J. J. (2010). PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics, 26(5), 689–691.
    OpenUrlCrossRefPubMedWeb of Science
  19. 19.↵
    Bartesaghi, A., Aguerrebere, C., Falconieri, V., Banerjee, S., Earl, L. A., Zhu, X., … Subramaniam, S. (2018). Atomic resolution cryo-EM structure of-galactosidase. Structure, 26(6), 848–856.
    OpenUrlCrossRefPubMed
  20. 20.↵
    Yin, W., Mao, C., Luan, X., Shen, D. D., Shen, Q., Su, H., … Xu, H. E. (2020). Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir. Science, 368(6498), 1499–1504.
    OpenUrlAbstract/FREE Full Text
  21. 21.↵
    Kern, D. M., Sorum, B., Mali, S. S., Hoel, C. M., Sridharan, S., Remis, J. P., … Brohawn, S. G. (2021). Cryo-EM structure of SARS-CoV-2 ORF3a in lipid nanodiscs. Nature structural molecular biology, 28(7), 573–582.
    OpenUrl
  22. 22.↵
    Krissinel, E., Henrick, K. (2004). Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallographica Section D: Biological Crystallography, 60(12), 2256–2268.
    OpenUrlCrossRefPubMedWeb of Science
  23. 23.↵
    Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., … Baker, D. (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), 871–876.
    OpenUrlAbstract/FREE Full Text
  24. 24.
    Fuchs, F., Worrall, D., Fischer, V., Welling, M. (2020). Se (3)-transformers: 3d roto-translation equivariant attention networks. Advances in Neural Information Processing Systems, 33,1970–1981.
    OpenUrl
  25. 25.
    Morehead, A., Chen, X., Wu, T., Liu, J., Cheng, J. (2022). EGR: Equivariant Graph Refinement and Assessment of 3D Protein Complex Structures. arXiv preprint arXiv:2205.10390.
  26. 26.↵
    Wu, T., Cheng, J. (2022). Atomic protein structure refinement using all-atom graph representations and SE (3)-equivariant graph neural networks. bioRxiv
  27. 27.↵
    Dhakal, A., McKay, C., Tanner, J. J., Cheng, J. (2022). Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions. Briefings in Bioinformatics, 23(1), bbab476.
    OpenUrl
Back to top
PreviousNext
Posted November 22, 2022.
Download PDF
Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
A Deep Learning Bioinformatics Approach to Modeling Protein-Ligand Interaction with cryo-EM Data in 2021 Ligand Model Challenge
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
A Deep Learning Bioinformatics Approach to Modeling Protein-Ligand Interaction with cryo-EM Data in 2021 Ligand Model Challenge
Nabin Giri, Jianlin Cheng
bioRxiv 2022.05.27.493799; doi: https://doi.org/10.1101/2022.05.27.493799
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
A Deep Learning Bioinformatics Approach to Modeling Protein-Ligand Interaction with cryo-EM Data in 2021 Ligand Model Challenge
Nabin Giri, Jianlin Cheng
bioRxiv 2022.05.27.493799; doi: https://doi.org/10.1101/2022.05.27.493799

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Bioinformatics
Subject Areas
All Articles
  • Animal Behavior and Cognition (4086)
  • Biochemistry (8761)
  • Bioengineering (6479)
  • Bioinformatics (23339)
  • Biophysics (11750)
  • Cancer Biology (9149)
  • Cell Biology (13247)
  • Clinical Trials (138)
  • Developmental Biology (7416)
  • Ecology (11369)
  • Epidemiology (2066)
  • Evolutionary Biology (15087)
  • Genetics (10398)
  • Genomics (14009)
  • Immunology (9121)
  • Microbiology (22040)
  • Molecular Biology (8779)
  • Neuroscience (47366)
  • Paleontology (350)
  • Pathology (1420)
  • Pharmacology and Toxicology (2482)
  • Physiology (3704)
  • Plant Biology (8050)
  • Scientific Communication and Education (1431)
  • Synthetic Biology (2208)
  • Systems Biology (6016)
  • Zoology (1249)