MAGPIE: an interactive tool for visualizing and analyzing protein-ligand interactions

Quantitative tools to compile and analyze biomolecular interactions among chemically diverse binding partners would improve therapeutics design and aid in the study of molecular evolution. Here we present MAGPIE (Mapping Areas of Genetic Parsimony In Epitopes), a publicly available software package for simultaneously visualizing and analyzing thousands of interactions between a single protein or small molecule ligand (the “target”) and all of its protein binding partners (“binders”). MAGPIE generates an interactive 3D visualization from a set of protein complex structures that share the target ligand, as well as sequence logo-style amino acid frequency graphs that show all the amino acids from the set of protein binders that interact with user-defined target ligand positions or chemical groups. MAGPIE highlights all the salt bridges and hydrogen bonds made by the target, both in the visualization and in separate amino acid frequency graphs. Finally, MAGPIE collates the most common target-binder interactions as a list of “hotspots,” which can be used to analyze trends or guide the de novo design of protein binders. As an example of the utility of the program, we used MAGPIE to probe how two ligands bind orthologs of a well-conserved glycolytic enzyme, to gain a detailed understanding of evolutionarily conserved interactions involved in its activation and inhibition. MAGPIE is implemented in Python 3 and freely available at https://github.com/glasgowlab/MAGPIE, along with sample datasets, usage examples, and helper scripts to prepare input structures.


Introduction
It is challenging to determine patterns in a large set of biochemically diverse protein-ligand interactions. For example, analyzing how a library of computationally designed proteins interacts with a binding partner usually requires ranking the design models by several quality metrics and then inspecting the top-ranked models individually in detail. Similarly, although bioinformatics tools expedite assembling a multiple sequence alignment (MSA) of hundreds of evolutionarily related proteins, it remains difficult to visualize how sequence variability at one or more positions in a ligand binding pocket manifests in three dimensions (3D). The problem is exacerbated when important protein regions are non-contiguous in sequence space and when the protein can interact with multiple binding partners. Thus, there is a need for a versatile protein complex analysis tool that represents the sequence diversity and biochemistry inherent in molecular interactions on the fly.
A method to identify sequence conservation and variability in 3D among proteins that bind common ligands would help us understand the biochemical requirements for key protein interactions and functions, and engineer new proteins that respect these requirements. However, existing methods only partially address this need. For example, using sequence logos to study protein-protein interactions requires that all the protein binder sequences be similar enough to align in an MSA. 1 MSA generation algorithms are of limited use for investigating protein interactions in 3D because they are designed to align sequences of evolutionarily related proteins that share some sequence identity. [10][11][12][13] And while protein structural search methods such as TM-align 14 , Dali 15 , and FoldSeek 16 have enabled the identification of proteins with local structurally homologous regions, and programs like SSDraw can align sequences to show how secondary structures are conserved in a set of homologous proteins 17 , these methods are not designed to highlight or organize conserved binding interactions across diverse proteins: for example, small molecule ligands that bind multiple proteins via different binding pocket geometries, or antigens that bind a library of engineered antibodies at non-overlapping epitopes. [20][21][22][23] To meet this growing need in protein science, and complementing recent methods for visualizing protein-peptide interactions 24 and protein structural features 25 , we introduce MAGPIE: Mapping Areas of Genetic Parsimony In Epitopes. MAGPIE is protein complex visualization and analysis software that facilitates the simultaneous comparison of many protein-partner interactions in 3D by structure and sequence. MAGPIE identifies amino acids (AAs) in a set of proteins that make molecular interactions with a user-defined region on a single target ligand, which can be a protein or a small molecule. MAGPIE also generates AA frequency graphs on the fly that show how often different AAs in the set of proteins interact with specific target ligand AAs or heavy atoms (HAs). Furthermore, MAGPIE generates an interactive 3D representation of the target ligand and visually illustrates the positions of all alpha carbons (Cα) that are near it, coloring these by AA biochemical characteristics, highlighting patches of conserved or similar interactions, and collating trends as a list of "hotspots" that can be used to inform the computational design of new protein binders. We anticipate that MAGPIE will be useful in applications ranging from protein design, where it can identify biases in computational methods for engineering protein-ligand interactions, to molecular biology, where it could illuminate functional adaptation in proteins by revealing how their features have emerged and changed over evolutionary history.

Software overview
MAGPIE operates in several steps in a Jupyter notebook, with optional Python helper scripts for preparing input datasets (Figure 1). As inputs, the user provides a set of protein complex structures in Protein Data Bank (PDB) format that are structurally aligned on the target ligand (Figure 1A). The target ligand must have the same chain index in all input structures, and the protein binding partners must also share a chain index that differs from that of the target ligand. The helper scripts aid in preparing datasets for MAGPIE by automatically renumbering atoms, renaming chains, protonating protein residues, and aligning the structural models on the target ligands using either a reference structure from the dataset or an MSA (Figure 1A, B). For small molecule targets, the alignment helper script generates separate sets of aligned structures ("conformer pools") for different ligand conformational isomers ("conformers"), using a user-specified global root-mean-square deviation (RMSD) threshold (Figure 1B). First, MAGPIE records the positions of all atoms in the target chain in all the structures. Subsequently, it stores the positions of all atoms from the binder chains in the dataset (Figure 1C, i).
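The input convention above (one shared target chain, one shared binder chain) can be illustrated with a minimal sketch that reads atom records from PDB-format text using the fixed-column layout of the PDB format and splits them by chain. The function names `parse_pdb_atoms` and `split_target_and_binders` are hypothetical, not part of MAGPIE, and the sketch ignores alternate locations, insertion codes, and multi-model files.

```python
import numpy as np


def parse_pdb_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed-column PDB layout (0-based slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()
        atoms.append({
            "name": name,
            "resname": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            # Element symbol (columns 77-78); crude fallback to the first letter of the atom name.
            "element": line[76:78].strip() or name[0],
        })
    return atoms


def split_target_and_binders(atoms, target_chain, binder_chain):
    """Return (records, (n, 3) coordinate array) for the target chain and for the binder chain."""
    target = [a for a in atoms if a["chain"] == target_chain]
    binder = [a for a in atoms if a["chain"] == binder_chain]
    target_xyz = np.array([a["xyz"] for a in target], dtype=float).reshape(-1, 3)
    binder_xyz = np.array([a["xyz"] for a in binder], dtype=float).reshape(-1, 3)
    return (target, target_xyz), (binder, binder_xyz)
```

Because every structure in a MAGPIE dataset uses the same chain indices, the same `target_chain`/`binder_chain` pair can be applied to every file in the set.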
Second, MAGPIE employs a k-d tree to efficiently query which AAs from the binding partners fall within a user-defined distance, in Ångstroms (Å), of any AA or heavy atom (HA) in the target ligand (Figure 1C, ii). 26 This step identifies potential binding partner AA interactions with the target ligand. MAGPIE then calculates distances and torsion angles for all binder AAs in this group with respect to the nearest target AAs/HAs, to identify hydrogen bond and salt bridge interactions and to determine trends across the set of protein binders.
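A minimal sketch of the neighbor query in Step 2, assuming NumPy and SciPy are available: a k-d tree (`scipy.spatial.cKDTree`) is built over the target atom coordinates, and each binder atom is kept if its nearest target atom lies within the cutoff. The function name `contacts_within` is hypothetical; MAGPIE's own implementation, its residue-level bookkeeping, and its hydrogen bond geometry checks are not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree


def contacts_within(target_xyz, binder_xyz, cutoff):
    """Find binder atoms within `cutoff` Å of any target atom.

    Returns three NumPy arrays: the binder atom indices that pass the cutoff,
    the index of the nearest target atom for each, and the corresponding distance.
    """
    target_tree = cKDTree(target_xyz)
    # k=1 nearest-neighbor query: one (distance, index) pair per binder atom.
    dist, nearest = target_tree.query(binder_xyz, k=1)
    hit = dist <= cutoff
    return np.flatnonzero(hit), nearest[hit], dist[hit]
```

Querying from the binder side gives each contact its nearest target atom directly, which is the pairing the text describes for the subsequent distance and angle calculations.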
Third, MAGPIE employs the Plotly library to create a color-coded interactive 3D visualization that displays the 3D locations of all binding partner residue Cα atoms and the target ligand main chain or full molecular structure (Figure 1C, iii). 27 Users can explore the spatial arrangement of the protein-ligand interactions by zooming in on specific regions of the complex, panning and rotating the structure, and hovering over target and binding partner AAs/HAs to see their identities and visually spot trends in the interactions. The target structure is shown in black (for protein targets) or CPK coloring (for small molecules), while the binder AAs can be colored using the RasMol-based "Shapely" or "Amino" color schemes, which group AAs by biochemical characteristics such as charge, hydrophobicity, and size. 28 MAGPIE then generates plots displaying the frequency of AAs among the binder residues within a user-specified distance of target positions, which the user can select and change ad hoc while exploring the 3D representation (Figure 1C, iv). MAGPIE includes options to select a subset of the target ligand AAs (for protein ligands) or HAs (for small molecule ligands) for detailed analysis of specific regions. Once the positions are selected, MAGPIE identifies binding partner residue Cα atoms using the k-d tree from Step 2, calculates the frequency of each AA in the set of binders within the user-specified distance, and generates an AA frequency graph using the Logomaker Python package (Figure 1C, v). 29 The AA frequency graph plots the HA or AA index position in the target ligand against the frequency of AAs among binding partner residues from the input PDB dataset found within the specified distance. The height of each one-letter AA code corresponds to its frequency in the set of binding partners. The AA frequency graphs help quantify the distribution and conservation of AAs at specific positions involved in binding interactions with the target. The number of individual interactions counted for each column of the graph is reported in the x-axis label. Further, the AA frequency graphs can be customized to show hydrogen bond or salt bridge partners, giving a quantitative readout of specific interaction types at different target positions in the dataset. In the 3D visualization, the protein binder Cα colors can be toggled to show hydrogen bond or salt bridge partners in red, with all other AAs in white.
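The frequency graph described above can be sketched as a position-by-letter table: count the binder AAs found near each selected target position and normalize each column. This assumes the contacts have already been reduced to hypothetical `(target_position, binder_aa)` pairs; `aa_frequency_table` is a hypothetical name, and the per-column totals correspond to the counts that MAGPIE reports in the x-axis label.

```python
from collections import Counter, defaultdict


def aa_frequency_table(contacts):
    """Build per-position AA frequencies from (target_position, one_letter_aa) pairs.

    Returns (table, n_per_position), where table[pos][aa] is the fraction of
    contacts at `pos` made by `aa`, and n_per_position[pos] is the column total.
    """
    counts = defaultdict(Counter)
    for pos, aa in contacts:
        counts[pos][aa] += 1

    table, n_per_position = {}, {}
    for pos in sorted(counts):
        total = sum(counts[pos].values())
        n_per_position[pos] = total
        table[pos] = {aa: c / total for aa, c in counts[pos].items()}
    # For plotting, pandas.DataFrame(table).T.fillna(0.0) gives a positions-by-letters
    # matrix, which is the kind of input logomaker.Logo draws as a logo.
    return table, n_per_position
```

Restricting `contacts` to hydrogen bond or salt bridge pairs before calling the function yields the interaction-specific graphs mentioned above.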
To collate and report trends in interactions with the shared target ligand, MAGPIE additionally produces a data table of target ligand AAs or HAs that participate in hydrogen bonds and salt bridges enriched in the set of protein binders. MAGPIE uses the DBSCAN algorithm implemented in the Python library scikit-learn to generate a list of "hotspots" that identifies the AAs involved in the most prevalent interactions with the target ligand and groups them by their biochemical characteristics. 30,31 The hotspots can be used to guide efforts to reengineer important protein-ligand interactions or to build protein binding partners de novo using generative design methods. 32,33 The parameters for identifying hotspots, such as how many AAs are required to form a hotspot and how far apart they may be in Cartesian space, are also user-adjustable. The detailed MAGPIE methods and algorithms are available in section 2 of the supplemental material. We present three case studies to demonstrate how MAGPIE can be used. First, we explore trends in how dozens of antibodies bind to the same target antigen (Figures 2 and 3).
Second, we study how a common small molecule metabolite interacts with sequence- and structure-diverse protein binding partners in a large set of natural complexes (Figure 4). Finally, we build and analyze structural models of distantly related bacterial orthologs of a central glycolytic enzyme bound to allosteric ligands (Figures 5 and 6).
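The DBSCAN-based hotspot search described above can be sketched with scikit-learn's `DBSCAN`, clustering the pooled binder Cα coordinates from all aligned complexes. Mapping the user-facing parameters onto DBSCAN's `eps` (maximum distance between AAs, in Å) and `min_samples` (minimum AAs per hotspot) is an assumption; how MAGPIE additionally groups hotspot members by biochemistry is described in its supplement and is not reproduced. `find_hotspots` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def find_hotspots(ca_xyz, residue_names, max_distance=2.0, min_residues=15):
    """Cluster pooled binder Cα positions (n, 3) into hotspots.

    max_distance is used as DBSCAN eps (Å); min_residues as min_samples.
    Points labeled -1 (noise) do not belong to any hotspot.
    Returns {cluster_label: [one-letter AA of each member]}.
    """
    labels = DBSCAN(eps=max_distance, min_samples=min_residues).fit_predict(ca_xyz)
    hotspots = {}
    for label in sorted(set(labels.tolist()) - {-1}):
        members = np.flatnonzero(labels == label)
        hotspots[label] = [residue_names[i] for i in members]
    return hotspots
```

Because DBSCAN chains dense neighborhoods together, a hotspot can extend further than `eps` overall; `eps` bounds only the spacing between neighboring members.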

Case study 1: antibody-antigen complexes in which the antigen is the target ligand
Figure 2 illustrates how MAGPIE can be used to study protein-protein interactions. In this case study, we explored how a set of 63 nonredundant antibodies (specifically their fragment antigen-binding regions, or Fabs) and nanobodies bind to the SARS-CoV-2 spike protein receptor binding domain (RBD). A subset of these binders is shown aligned on the RBD in Figure 2A. 34 MAGPIE aids in visualizing the complete collection of antibody- and nanobody-binding epitopes within 8 Å of the RBD, with the binder paratope AAs colored according to the Amino color code (Figure 2B, C). MAGPIE shows which RBD epitopes are most involved in these interactions, as well as the prevalence of different binder AAs, as the user zooms, pans, and toggles among coloring schemes that highlight all interactions, hydrogen bonds, or salt bridges. We focused on Fab and nanobody interactions with a common RBD epitope comprising two loops (Figure 2B). Of the 63 binders, 27 interact with at least one of the two loops in this region via their complementarity-determining regions (CDRs) (Figure S1). MAGPIE compiles all the models in one place and allows the user to see the local environment around every AA in the two loops at different levels of detail and from any angle (Figure 2C-E). We generated an AA frequency graph to show which antibody and nanobody AAs are enriched in the local vicinity of every position in this RBD epitope across the full dataset (Figure 2F). MAGPIE revealed that the two loops in the epitope participate in markedly different interactions: while the first loop (residues 127-131) interacts almost exclusively with serines and glycines in this set of binders, the AAs that interact with the second loop (residues 146-149) are more diverse and include polar, hydrophobic, and charged side chains. This information allows the user to focus on specific target residues when inspecting individual complexes in the dataset, to understand the molecular details of each enriched interaction more deeply (Figure 2G). MAGPIE further revealed that RBD AAs in this epitope frequently hydrogen bond with the binders in the dataset (Figure 2H-J). The hydrogen bond AA frequency graph shows that in the first loop, serines account for 60% of the hydrogen bond partners, while glycine backbone interactions make up most of the remaining hydrogen bonds. The enrichment of these small AAs suggests that the CDR backbone lies close to the RBD in this region (Figure 2H). The hydrogen bond partners of the second RBD loop are more diverse and include larger AAs that are both polar and hydrophobic, suggesting that the small hydrophobic residues A146 and G147 at the beginning of this loop form backbone hydrogen bonds and hydrophobic interactions with the binders. Closer inspection of structures in the dataset confirms this hypothesis. For example, in Figure 2G, the small hydrophobic AAs at positions 146 and 147 in the second loop make backbone hydrogen bonds as well as close-packed hydrophobic interactions with one antibody CDR, whereas the other loop is involved in backbone and side chain hydrogen bonds with serines spanning multiple CDRs.
We observed that these RBD AAs make additional nonpolar interactions with hydrophobic and charged residues in the binders, as suggested by the visualization (Figure 2D) and confirmed by comparing the AA frequency graphs for all contacts vs. hydrogen bond partners only (Figures 2F, H). For example, about 15% of interactions with F127 in the first loop are with the hydrophobic AAs proline, tyrosine, and alanine, but none of these AAs contribute to any hydrogen bonds observed with F127 in the dataset. Instead, F127 pi stacks with these AAs in individual structures (Figure 2G). MAGPIE can also identify salt bridges in protein-protein complexes (Figure 2K), though we found that this two-loop epitope does not participate in any.
The user has the option to output a .csv file that contains detailed information about every hydrogen bond and charged interaction observed in the pool.
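A sketch of how salt bridges might be detected and written to a .csv file, assuming atom records are dictionaries with `resname`, `name`, `resseq`, and `xyz` keys. The charged-atom sets and the 4 Å cutoff between oppositely charged atoms are a common convention used here as an assumption, not MAGPIE's documented criteria; histidine, termini, and small molecule charged groups are ignored, and the all-pairs loop would be replaced by a k-d tree query on real data. All names are hypothetical.

```python
import csv
import io
import math

# Illustrative charged side-chain atoms (standard PDB atom names); an assumption.
CATIONIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}
ANIONIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
FIELDS = ["target_residue", "target_atom", "binder_residue", "binder_atom", "distance"]


def _charge(atom):
    """+1 for cationic side-chain atoms, -1 for anionic ones, 0 otherwise."""
    key = (atom["resname"], atom["name"])
    return 1 if key in CATIONIC else -1 if key in ANIONIC else 0


def salt_bridges(target_atoms, binder_atoms, cutoff=4.0):
    """Return one row per oppositely charged target/binder atom pair within `cutoff` Å."""
    rows = []
    for t in target_atoms:
        for b in binder_atoms:
            if _charge(t) * _charge(b) != -1:  # keep only +/- pairs
                continue
            d = math.dist(t["xyz"], b["xyz"])
            if d <= cutoff:
                rows.append({
                    "target_residue": f'{t["resname"]}{t["resseq"]}',
                    "target_atom": t["name"],
                    "binder_residue": f'{b["resname"]}{b["resseq"]}',
                    "binder_atom": b["name"],
                    "distance": round(d, 2),
                })
    return rows


def rows_to_csv(rows):
    """Serialize interaction rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```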
Finally, to help identify optimal epitopes for therapeutic targets, analyze docked models, or choose positions for de novo-designed binding interactions, MAGPIE can identify hotspots: groups of AAs in the pooled binding partners that are close in 3D space and have similar biochemical characteristics (Fig. 3). The maximum distance between AAs that constitute a hotspot and the number of AAs required for a hotspot are user-adjustable in the hotspot selection feature. Using the default parameters of a 2 Å distance threshold and a 15-AA minimum, MAGPIE identified five hotspots in this region with varying biochemical characteristics. The biochemical makeup of the AAs in each hotspot is summarized in a graph output (Fig. 3), with a simplified color scheme similar to the Amino scheme in the full 3D MAGPIE visualization. MAGPIE also outputs a .csv file that contains the hotspot residues and their exact AA composition.
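The per-hotspot composition summary can be sketched by mapping each hotspot member to a biochemical class and reporting class fractions. The grouping below is an illustrative assumption (for example, it places glycine with the hydrophobic residues) and is not the Amino-based scheme MAGPIE uses; `hotspot_composition` is a hypothetical name.

```python
from collections import Counter

# Illustrative grouping of the 20 standard AAs; an assumption, not MAGPIE's scheme.
AA_GROUPS = {
    "hydrophobic": "AVLIMFWPGC",
    "polar": "STNQY",
    "positive": "KRH",
    "negative": "DE",
}
AA_TO_GROUP = {aa: group for group, letters in AA_GROUPS.items() for aa in letters}


def hotspot_composition(residues):
    """Fraction of a hotspot's residues (one-letter codes) falling in each group."""
    counts = Counter(AA_TO_GROUP[aa] for aa in residues)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}
```

Applied to each hotspot's member list, this yields the kind of per-hotspot breakdown that the graph and .csv outputs summarize.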
In summary, the AA frequency graphs generated by MAGPIE showcase the diversity of Fab and nanobody interactions in the SARS-CoV-2 spike RBD binder dataset for any user-defined set of RBD AAs, which can be changed ad hoc. Further, the CDR biochemical features that MAGPIE identified as facilitating RBD binding in this dataset could enable the rapid design of antibody-alternative protein therapeutics that bind the antigen through a large, highly structured binding interface [35][36][37][38] , which may prove more robust to viral mutations than a flexible antibody CDR.

Case study 2: protein complexes with a shared small molecule target ligand
MAGPIE can also be applied to study protein-small molecule complexes. For example, identifying trends in how a specific small molecule binds structurally diverse proteins could provide guidance for designing a binding partner de novo for a chemically similar ligand. 39,40 The common metabolite coenzyme A (COA) binds hundreds of different proteins from a variety of organisms, and experimentally solved structures are available for many of these complexes in the PDB. Additionally, COA presents a challenging target for designing new protein binders given its inherent flexibility, with four main rotatable bonds that aid in its function as a carrier of acyl groups in various metabolic reactions. It is unknown to what extent topologically diverse binding partners converge on specific shared interactions with individual functional groups in common metabolites like COA.
In our case study, MAGPIE highlighted similarities and differences among COA interactions in 199 binding pockets and interfaces from bacterial enzymes, including ligases, acetyltransferases, synthases, and epimerases (Figure 4A). We compiled the COA dataset by searching the PDB for COA-containing structural models, which yielded more than 600 hits (details in section 8 of the supplementary material). We randomly selected 199 of these and used our cleaning and alignment method (Figure 1A, B) to standardize and separate the COA-bound structures into 31 conformer pools based on a 2.5 Å RMSD threshold (Figure S2).
Representative COA structures from four conformer pools are shown in Figure 4B. MAGPIE can be used to visualize the local chemical environments for each atom in each COA conformer (Figure 4C, D, Figure S3A, B). Focusing on the first conformer pool, which includes 22 structures, we observed that several of the COA nitrogen atoms are neighbored by a combination of hydrophobic AAs and a few charged AAs (Figure 4D, H), many of which make hydrogen bonds with COA (Figure 4E). Of the nitrogen HAs in the conformer pool, only the NH2 group of the adenine ring (N6A, Figure 4C) interacts appreciably with polar and charged AAs.
Using a 2 Å threshold for distance among hotspot AAs and a minimum of 20 AAs to form a hotspot, MAGPIE identifies six hotspots (Figure 4F), four of which cluster near the other nitrogen atoms in Figure 4C. The hotspots primarily contain hydrophobic and charged residues (Figure 4G). However, Hotspot 3, which is near the diphosphate group of COA and distal to all of the nitrogen atoms, includes several hydrophilic AAs.
MAGPIE can be used to investigate any COA conformer pool with more than a few members to discover how the local environment of specific functional groups changes with the global conformation of the metabolite when it is complexed with different proteins. To probe this, we compared the COA nitrogen atoms from Conformer 1, as shown in Figure 4, to the same nitrogen atoms in the 21 structural models comprising Conformer Pool 2 (Figure S3A). Using the same MAGPIE parameters, we observed that the COA-binding proteins in this pool favor interactions that are more localized to one side of the COA molecule, in the concave area between the 3'-phosphoadenosine and the cysteamine group (Figure S3B), and that a smaller proportion of these interactions are with hydrophobic AAs compared with Conformer Pool 1 (Figure S3F). Despite including a comparable number of models as the first pool, the second conformer pool yields only two hotspots in MAGPIE, which are dominated by hydrophilic and charged AAs (Figure S3D, E). As in the first conformer pool, one of these hotspots neighbors the β-alanine and cysteamine nitrogens in COA (N4P and N8P, respectively).
Comparing the local interaction partners for N4P and N8P between the two conformer pools, we find that they are quite different, with several charged and polar AAs enriched in the local environment in Conformer Pool 2 (Figure S3F). For example, whereas N4P in Conformer 1 is frequently supported by a nearby small or medium-sized hydrophobic residue (Figure S4), in Conformer 2 this atom tends to form hydrogen bonds with serines, asparagines, or glutamic acids (Figure S3G). The other nitrogen atoms we considered have more similar chemical environments in the two conformer pools: N9A is neighbored by valines and glycines, and N6A has a varied environment. This analysis illustrates how the flexible COA molecule might be constrained to a specific geometry in a binding site for a forward-design application.

Case study 3: trends in inhibitor binding for an evolutionarily conserved kinase
As a final case study involving protein-small molecule complexes, we explored how a well-conserved glycolytic enzyme, phosphofructokinase-1 (PFK-1), evolved across bacterial phyla while maintaining binding to an inhibitor and an activator in an allosteric pocket. PFK-1 catalyzes the ATP-dependent phosphorylation of fructose-6-phosphate (F6P) to fructose-1,6-bisphosphate (FBP). All known PFK-1 are allosterically inhibited by phosphoenolpyruvate (PEP) and allosterically activated by adenosine diphosphate (ADP), which bind the same allosteric site in the enzyme. We sought to determine how distantly related bacterial PFK-1 adapted to accommodate binding both PEP and ADP while maintaining their opposite regulatory effects, despite low apparent sequence identity in the allosteric pocket due to evolutionary drift.
Because few bacterial PFK-1 structures have been experimentally solved [41][42][43][44][45] , we instead compiled a diverse set of sequences from the UniProt database as inputs for structure prediction using AlphaFold2 46 , and subsequently modeled PEP into the allosteric pocket of each model by structural alignment with the previously solved complex of mutant Geobacillus stearothermophilus PFK-1 and PEP (PDB: 4I4I) (Figure 5A) (all-atom RMSD of the binding pocket = 0.371 Å; global RMSD = 0.493 Å). 41 We optimized each ortholog's PFK-1 model to a stable, energetically favorable conformation without steric clashes in the pocket using the relax application in the Rosetta macromolecular modeling software. 47 Finally, to confirm that the models were reasonable, we calculated the ligand energy, RMSD, and TM-score of each model using US-align, with the PEP-bound PFK-1 from Geobacillus as the reference structure, and filtered out outliers and models with positive ligand energies (Figure 5B). 14,41,48 Models were also excluded if the ligand was observed to bind outside the binding pocket after relaxing the structure. The final pool of PEP-bound bacterial PFK-1 structural models included 72 PFK-1 orthologs from 7 phyla and 39 orders. We calculated an average sequence identity of 50.2% between any two PFK-1 from the same phylum, vs. 29.54% between PFK-1 from different phyla, which suggested that our models sufficiently represented bacterial sequence diversity. ADP-bound PFK-1 models were prepared using the same strategy, with the crystal structure of ADP-bound Escherichia coli PFK-1 (PDB: 1PFK) as the reference for structural alignment. This resulted in a pool of 65 ADP-bound bacterial PFK-1 models from 4 phyla and 15 orders. MAGPIE visualization of the PEP-bound PFK-1 models revealed four different PEP conformers among the 72 structures after binning by 0.4 Å global RMSD using our helper script.
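The within- vs. between-phylum sequence identity comparison in this case study can be sketched as the mean pairwise percent identity over an existing multiple sequence alignment. Identity here is computed over columns where neither sequence has a gap, which is one of several common definitions; the source does not state which definition was used, so this is an assumption, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def percent_identity(a, b):
    """Percent identical residues over aligned columns where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def mean_identity_within_between(aligned, group):
    """aligned: {name: aligned sequence}; group: {name: phylum}.

    Returns (mean within-group identity, mean between-group identity) over all pairs.
    """
    within, between = [], []
    for i, j in combinations(aligned, 2):
        pid = percent_identity(aligned[i], aligned[j])
        (within if group[i] == group[j] else between).append(pid)
    return mean(within), mean(between)
```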
The top two conformers exhibited a nearly identical spatial arrangement of hydrogen bond partners (Figures 6A, S5). Using a 1.5 Å threshold for distance among hotspot AAs and a minimum of 15 AAs to form a hotspot, MAGPIE identifies two hotspots for each of the top two PEP conformers. Relative to the ligands' functional groups, both hotspots for the second conformer pool are localized to the same region, which overlaps with one of the hotspots from the first conformer pool. However, MAGPIE found a unique hotspot for Conformer 1 near the phosphate group of PEP. These findings suggest that PEP can bind in different conformations that interact with spatially and biochemically distinct sets of AAs enriched in the allosteric site of PFK-1 orthologs, while also maintaining some conformer-independent trends. We hypothesize that the enrichment of Conformer 1 in our dataset is due to its ability to form stabilizing interactions with binding pocket residues in Hotspot 2 that are not possible for other PEP conformers.
By contrast, MAGPIE visualization of the ADP-bound PFK-1 models revealed four different ADP conformers among the 65 input structures after binning conformer pools using a 0.7 Å global RMSD threshold for the ADP molecule (Figure 6B). Most models were binned into the first conformer pool (n=58), so we focused our analysis on this ADP conformer. Using a 1.5 Å threshold for distance among hotspot AAs and a minimum of 20 AAs to form a hotspot, MAGPIE identifies six hotspots in tight clusters around ADP. These hotspots localize around each functional group of the ADP molecule: for example, Hotspot 1 lies near the beta-phosphate, Hotspots 4 and 5 near the adenosine moiety, and Hotspot 2 near the ribose group. Interestingly, the hydrophobic region that dominates ADP Hotspot 1 overlaps spatially with PEP Conformer 1's unique hotspot. Differences between the PEP and ADP hotspots can inform the rational design of small molecules that inhibit or activate bacterial PFK-1 orthologs through the same mechanism as the enzyme's natural effectors, thereby artificially modulating the enzyme's activity.
Further, using MAGPIE to compare PEP interactions in structural models of bacterial vs. metazoan PFK-1 orthologs could guide the design of bacteria-specific inhibitors as antibiotics that do not disrupt human PFK-1 in tissues. 49

Discussion
As highly accurate protein structure prediction methods spur the development of computational approaches to model and design protein-ligand binding interactions, the need for versatile tools to analyze the biochemical details of structural complexes en masse grows. Methods to easily compare protein interactions with other biomolecules will aid in biological discovery and therapeutics development. MAGPIE provides a practical way to explore the diverse interactions that natural and designed proteins alike can form with protein and small molecule ligands.

Data collection and curation.
MAGPIE datasets of protein complex structures require at least one common target ligand.
They can be computationally generated models (e.g., homology models or designed complexes) or experimentally solved structures. Anyone can assemble a MAGPIE dataset using the PDB advanced search query builders. For the protein-protein complex case study, we used a 68-member subset of the previously compiled SARS-CoV-2 RBD-antibody dataset from Gowthaman et al. 34 To assemble the small molecule ligand (COA) dataset included here, we used the chemical similarity search in the PDB with the PubChem identifier for COA, 87642, as the chemical attribute, and restricted the refinement resolution to 1.5-3 Å.
The query resulted in >600 structural models, from which we randomly chose 199 models. More information about the models in this study is in section 8 of the supplementary material.

Input structure standardization and alignment.
Recognizing that input structures of protein-protein and protein-small molecule complexes for analysis by MAGPIE may include heterogeneous components, we prioritized flexibility in our structure preparation method (helper script MAGPIE_input_prep.py) by enabling a variety of user options to identify the target ligands and protein binders in the datasets. Users must define an input directory, an output directory, and identification options for both the target ligand and the protein binders. An optional pre-filter step, called mesh search, can be applied to the input PDB files: chain name(s), sequence fragment(s), ligand PDB code(s), and ligand indices can be supplied to conduct a nearest-neighbor search. The atoms matching these criteria become centers for a radial search, and anything outside an 8 Å radius is filtered out. If any atom from another chain falls within the search radius, the structure to which it belongs is retained. This pre-filter step removes undesired atoms that later filtering steps would fail to remove. For protein target ligands, users can define the chain name and/or all or part of the AA sequence as a string or file, optionally with a sequence identity percentage threshold. For small molecule target ligands, users can define the chain name and/or the ligand PDB ID code and/or residue index. For the protein binders, regardless of target ligand type, users can define chain name(s) and/or protein sequence(s). If more than one identifier is provided for the target or binder, the user can also optionally define the order in which they are identified. All elements of the structure that are neither target ligand nor protein binder are then deleted, and the chains are renamed so that all target chains share one name and all binder chains share another across the set of structural models. If more than one chain is defined as the binder or target, those chains are combined and renumbered as one chain. In the case of small molecule ligands, the atoms can also be renumbered and renamed consistently across all the models via a second-order connectivity graph, using a bond-length cutoff of 2.1 Å to find neighboring atoms. This method is called recursively to ensure consistent atom identification and naming.
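One way to realize connectivity-based atom matching is to give each atom a signature made of its element, the elements of its bonded neighbors, and the elements of its neighbors' neighbors, with bonds inferred from a 2.1 Å distance cutoff. This reading of "second-order connectivity graph" is an assumption, and the helper names are hypothetical. Atoms in symmetric environments (for example, equivalent phosphate oxygens) share a signature and are left unmatched here; resolving such ties presumably requires the recursive step mentioned in the text, which this sketch does not attempt.

```python
import numpy as np


def bonded_neighbors(xyz, max_bond=2.1):
    """Indices of atoms within `max_bond` Å of each atom (excluding itself)."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    adj = (d <= max_bond) & ~np.eye(len(xyz), dtype=bool)
    return [np.flatnonzero(row).tolist() for row in adj]


def second_order_signatures(elements, xyz, max_bond=2.1):
    """(element, sorted neighbor elements, sorted neighbor-of-neighbor element sets) per atom."""
    nbrs = bonded_neighbors(xyz, max_bond)
    sigs = []
    for i, el in enumerate(elements):
        first = tuple(sorted(elements[j] for j in nbrs[i]))
        second = tuple(sorted(
            tuple(sorted(elements[k] for k in nbrs[j] if k != i)) for j in nbrs[i]
        ))
        sigs.append((el, first, second))
    return sigs


def match_atoms(ref_elements, ref_xyz, elements, xyz):
    """Map query atom index -> reference atom index for atoms with a unique signature."""
    ref = second_order_signatures(ref_elements, ref_xyz)
    qry = second_order_signatures(elements, xyz)
    ref_index = {s: i for i, s in enumerate(ref) if ref.count(s) == 1}
    return {i: ref_index[s] for i, s in enumerate(qry) if s in ref_index}
```

Because signatures depend only on interatomic distances, the mapping is unaffected by how each input file orders or names the ligand atoms.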
Once the ligands and binders are standardized in the set of input structures, they are protonated and passed to a structural alignment method that we implemented as a command line tool using the macromolecular modeling software PyMOL (Schrödinger) (helper script align_protein_chain.py for protein target ligands, and align_small_molecule.py for small molecule target ligands). For complexes that include protein target ligands, the input structures are returned after global structural alignment on the target. For small molecule ligands, there may be several ligand conformations represented in the dataset. Therefore, the user can define a maximum RMSD threshold (Å), and the script returns one or more directories, each containing a set of aligned structural models whose target ligand conformations fall within the RMSD threshold. Each cluster should be input to MAGPIE separately to visualize and analyze the local environment of each of its atoms.
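Conformer pooling by an RMSD threshold can be sketched as greedy leader clustering: each conformer joins the first pool whose founding member lies within the threshold, and otherwise starts a new pool. This assumes ligand atoms are already consistently ordered (by the renaming step above) and computes RMSD after optimal superposition with the Kabsch algorithm. Whether the helper script superimposes ligands before computing RMSD, and how it assigns pools, is not stated, so both are assumptions; the function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) conformers with identical atom order, after optimal superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def bin_conformers(conformers, threshold):
    """Greedy leader clustering; each pool is a list of indices led by its first member."""
    pools = []
    for i, xyz in enumerate(conformers):
        for pool in pools:
            if kabsch_rmsd(xyz, conformers[pool[0]]) <= threshold:
                pool.append(i)
                break
        else:
            pools.append([i])
    return pools
```

Greedy assignment depends on input order, so a different ordering of the same structures can produce slightly different pools.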

Performance, scalability, and web server implementation.
MAGPIE is implemented in Python and uses multiple CPU cores to optimize speed. MAGPIE can be run on the cloud-based Google Colab service if the input PDB dataset includes fewer than about 1000 structures. For private and/or larger datasets where limited computing resource allocations are a concern, we advise instead downloading the MAGPIE source code and running it locally and/or using the multithreaded implementation.
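A minimal sketch of per-structure parallelism with the standard library's `concurrent.futures`; `analyze_all` and the per-structure function are hypothetical placeholders for MAGPIE's actual work. Threads help mainly when the per-structure work releases the GIL (as many NumPy and SciPy routines do) or is I/O-bound; for pure-Python CPU-bound work, `ProcessPoolExecutor` is a drop-in alternative that requires picklable, module-level functions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def analyze_all(paths, analyze_one, max_workers=None):
    """Apply `analyze_one` to every input structure path concurrently, preserving input order."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of `paths`, not completion order.
        return list(pool.map(analyze_one, paths))
```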

Figure 1. MAGPIE framework. (A) The user compiles a set of protein complex structural

Figure 3. Binding hotspots. (A) Default settings in MAGPIE's hotspot finding feature revealed

Figure 4. Protein-small molecule interactions: coenzyme A binds diverse proteins. (A) A

Figure 5. MAGPIE input structure preparation pipeline for case study 3. (A) MAGPIE input

Figure 6. MAGPIE analysis of ligand-protein binding interactions of the allosteric