Piperazine-based HIV-1 entry inhibitors: Massive in silico library design and screening for gp120 attachment inhibitors

HIV-1 attachment, despite being an ideal target stage to stop infection from the beginning, remains as one of the HIV lifecycle phases with less amount of designed and commercially available inhibitors. To contribute to the urgently needed discovery of new active compounds that could become part of the current highly active antiretroviral therapy, and as an attempt to explore a massive chemical space, high-throughput virtual screening of 16.3 million combinatorially generated and piperazine-cored compounds, was accomplished. Docking calculations, molecular dynamics simulations, and QSAR analyses were carried out to assess the suitability of each ligand to bind gp120 envelope glycoprotein, thus preventing it from binding to CD4 co-receptor. Ligand 255 stands out as a promising candidate to be tested beyond computational methodologies, and the 4,5,6,7-tetrahydroindole fragment is reported as a better group to bind inside the Phe43 cavity than the substituted indoles reported in the literature.


INTRODUCTION
Since its discovery in the early 1980s, the Human Immunodeficiency Virus (HIV) has attracted the attention for being the etiologic agent of Acquired Immune Deficiency Syndrome (AIDS), a global epidemic that peaked in 1999 (1) despite its probable spreading throughout the 1900s from primates to humans (2). Its high rates of recombination and mutation, persistence in the host's body and high turnover rates allows it to elude the immune response and resist drug therapy. This high genetic diversity poses a threat to the quick effective development of treatment methods against the virus since T lymphocytes or CD4 cells are the most affected by HIV infection. These 3 cells coordinate the action of other cells to fight infection, leading to a debilitated immune system when their count drops significantly (1).
Even though HIV-1 life cycle presents plenty of steps, enzymatic activities, and mechanisms that could be interfered by the action of drugs, very few from them have been fully exploited (3); such is the case of entry inhibitors for which a small amount of active compounds are available.
By 2013, 24 drugs approved by the Food and Drug Administration (FDA) were available for treatment of HIV-1, distributed in six different classes (3): Nucleoside/Nucleotide reverse transcriptase inhibitors (NRTIs), Non-nucleoside reverse transcriptase inhibitors (NNRTIs), Integrase inhibitors (InSTIs), Protease inhibitors (PIs), and Entry inhibitors. Classified as entry inhibitors, attachment inhibitors represent a promising group of biologically active compounds, due to their capacity to stop infection from the very beginning, even though they have not been explored as much as those ligands targeting other viral proteins, such as reverse transcriptase or protease, for which most commercial drugs are designed. Among the pharmacological targets of this class of inhibitors, gp120 is of special interest thanks to its role: binding to CD4 co-receptors, marking the start of the whole HIV life cycle.
The inhibition against attachment between gp120 and CD4 receptor has been addressed with the small molecule family of BMS-378806, showing promising results in clinical assays, although it has not been yet approved. BMS-378806 alters the conformation of gp120 by binding to a pocket in which the CD4 receptor would normally enter (3).
The prototype of BMS-378806 ( figure 1, compound 3), compound 1, was discovered thanks to a cell-based screening assay, and mechanistic studies showed that it compromised the interaction between gp120 and CD4. Since the C-4 position of the indole is lightly sensitive to the potency of 4 the whole compound, SAR studies suggested new substitutions like the one shown in compound 2 with a fluorine atom (4). Following the improvement of the lead molecules, compound 3 showed inhibition of infection in several types of clinical isolates of the B subtype virus; however, the compound failed to achieve target exposure and its development was truncated during clinical trials (5) Figure 1: Compounds 1 and 2, prototypes of . Compound

was used as the reference for docking calculations
The indole ring is considered the central core of these compounds because of its sensitivity to potency, offering the opportunity for structural modification. On the other hand, the piperazine ring turned out to be of critical importance for the antiviral activity (4,6), biologically and electronically. Because of its anti-HIV properties, its usual interaction at the entrance of the Phe43 cavity, and vast presence in a significant amount of anti-HIV compounds with reported activities.
It is important to bear in mind that the replacement of components in existent drugs is an important strategy to afford more synthetically accessible molecules. (4). 7 Figure 2: Interaction between gp120 (green) and CD4 (orange) proteins. The attachment of the virus depends on the insertion of CD4 Phe43 residue (blue) in the gp120 cavity.
The resulting structure was submitted to a final series of minimizations under an implicit solvent refinement process. In this case, the positions of all atoms were optimized from the substructures were minimized under a VSGB solvation model, using the automatic method provided by Prime(16), with 65 steps by iteration, and a total of 2 iterations. An RMSD gradient of 0.01 kcal/(mol Å) was considered for convergence. This process was repeated three times, until the final structure was visually checked removing the substructure C from the workspace, allowing the Phe43 cavity to be both accessible to a ligand, and suitable for the generation of a grid. Same steps were taken to prepare the gp120 structure coming from the 1G9M(18) PDB file used in the final steps of the docking stage, called consensus docking, preventing a structural bias in the docking score values.

Combinatorial library production and grid generation
Since piperazine was a common component among many anti-HIV compounds described in the literature, and its important role in anti-HIV activity has been identified (6), we decided to use this scaffold as the core of a series of combinatorially generated virtual compounds to perform high throughput virtual screening using Glide and its set of score functions to obtain possible hits. To generate this library, we turned to Combiglide(16) and two of its main features: Interactive Enumeration and Receptor Grid Generation.
Our fragment collection was built by importing the Glide 670-fragment database provided by Schrödinger Inc. online. A total of 4041 different bonding fragments were attached to both nitrogen atoms included in the piperazine core, named Pos1 and Pos2, with no linkers defined between the nitrogen atom and the fragment, where each resulting molecule was considered an independent combination. By combination, it generated a total of 16 329 681 compounds to be screened, an unprecedented number of analyzed ligands compared to gp120 inhibitor studies reported in the literature. To decrease the computational demand required for the minimization of more than 16 million compounds, the next stages of were performed with the gp120 tangled (unminimized) structures. Figure 3: Example of a combinatorially generated molecule by Combiglide, made of three sections, a piperazine core and two different or equal molecular fragments taken out of a previously prepared database From the chemical point of view, Combiglide-generated molecules are constituted by the piperazine core and two fragments bonded in different union points, providing new properties (such as solubility, pharmacokinetics, a right number of hydrogen bond donors/acceptors, and molecular weight, to reach the desire biological activity). It is worth mentioning that the initial Glide database only includes fragments derived from small molecules reported in the medicinal chemistry literature, as stated in their website, no larger than 37 atoms and molecular weights below 226 g/mol, making synthetic viability of the generated library considerably higher compared to those randomly generated.
Additionally, a grid box was prepared to perform future docking stages. This box was formed by an innerbox of 10x10x10 Å and an outterbox of 30x30x30 Å, sized for ligands with a length of 20 Å, both centered in residues Trp427, Trp112, Val255, Thr257, Glu370, Ile371, and Asp368 selected (x,y,z coordinates: 27.28, -11.81, 82.61). An OPLS3(16) force field was used to generate the grid box, which included a van der Waals radius scaling factor of 1.0 with a partial charge 10 cutoff of 0.25. A similar grid file is obtained for the gp120 structure coming from the 1G9M PDB file, prepared with the same steps previously explained.

Docking funnel
Molecular docking was performed with Glide(16) taking the generated library through a bottleneck process that narrows the search of drug-like molecules (figure 4). The docking funnel is divided into three steps: High Throughput Virtual Screening quality (HTVS) docking, Standard Precision (SP) quality docking, and Extra Precision (XP) quality docking. To select which ligands were going to remain in the screening analysis, a reference ligand had to be considered. Compound 1 (figure 1) was used as a reference; since a crystallographic structure of this molecule bound inside the Phe43 cavity of gp120 is not available, we carried out an XP quality docking obtaining a score of -7,7 kcal/mol used to filter out all possible hits during the screening. Jobs and advanced settings were set as predetermined by the program, rewarding intramolecular hydrogen bonds and penalizing nonplanar conformation for amides, while planarity of conjugated pi groups was enhanced, and ligand sampling was performed as flexible. Per-residue interaction scores for residues within 12 Å of the grid center were included in the calculation for SP and XP docking screenings While all the generated ligands were submitted to the first screening, only those ligands within the same score order as the reference ligand (see above) (better than -7.0) were submitted to the SP screening. Nonetheless, ligands submitted to the XP step were previously processed with LigPrep(16) to generate those states that would be found at physiological pH (7.4). Final hits were submitted to consensus docking, as well as to descriptor analysis and biological activities prediction using QikProp(16).

Consensus docking and first ranking
After performing all the screening steps, a consensus docking was carried out to identify those ligands capable of binding with high affinity and getting high docking scores with: a) a structure coming from a different PDB file, and b) a different docking software, as a measure to eliminate biased ligands that would only come as positive candidates with the first gp120 minimized structure, enhancing the possibility to find hits that would bind the protein in any of the conformations presented by the amino acids conforming the Phe43 cavity. 275 hits resulting from the XP step of the docking funnel, named them ligand_001 through ligand_275, and were submitted them to three different calculations: Glide XP quality molecular docking with gp120 from the PDB file 1G9M, AutoDock 4.2.6(16) molecular docking with gp120 from the PDB file 1GC1, and AutoDock molecular docking with gp120 from the PDB file 1G9M Gp120 envelope glycoprotein was prepared from both the 1GC1 and 1G9M PDB files. The standard preparation workflow was followed, by adding all hydrogens and assigning Gasteiger charges (19,20). Grid maps were calculated following the standard AutoDock protocol for fixed side chains. For the grid box, 55 points were specified in x, y, and z directions, having a spacing factor of 0.375 Å. Extended search parameters were specified in the script. A Lamarckian genetic algorithm was used, with 25 000 000 evaluations, a population size of 150, and 20 runs. A 2.0 Å RMSD tolerance was considered for clustered results.
To obtain the best 100 hits of this stage a ranking function (equation 1) was applied to both Glide XP and AutoDock calculations and their docking score values:

Equation 1: Consensus docking ranking function
Where the ranking value (RV) is defined by the absolute value of the sum of the docking score (DS) and XP docking score (XP) average, provided by Glide in the two PDB files, 1GC1 and 1G9M (1 and 2, respectively), and the average of the lowest energy (LE) and best cluster (BC) values provided by AutoDock, in both PDB files. Thereafter the ranking function was used to provide a position from 1 to 275 comparing every ligand value against the rest of the molecules included in the set.

Induced-fit docking, second ranking, and visual inspection
Once the top 100 hits had been selected, they were included into the next stage of the project using the Induced-Fit Docking(16) tool from Glide, using the standard protocol with an OPLS3 force field. Side chains within 5.0 Å of ligand poses were refined with Prime, and final structures within 30 kcal/mol of the best structure were redocked with XP quality. The second ranking consisted in the comparison between the absolute values of induced-fit docking score averages by each ligand-receptor complex group of poses (Please refer to supplementary info). These groups were previously filtered by leaving out those ligands with functional groups too reactive or too problematic for future redesign (e.g. esters).

Molecular Dynamics
Simulations were carried out to have a deeper understanding of the ligand-receptor interaction with the help of Desmond(16). These simulations were run during 25 ns in OPLS3 force field, using a TIP4PEW solvent model, an orthorhombic box with minimized volume of 612,144 Å3 (distances a=b=c=15Å), chloride ions placement by recalculation (usually 4 or 5 needed to neutralize system), and a 0.15 M NaCl concentration.

Generation Stage
Despite better performance in clinical trials by compound 3, in this work we decided to use compound 1 as a scored reference for various reasons: first, as the BMS-378806 prototype, it is an HIV-1 in vitro inhibitor with high potency and low cytotoxicity toward a range of cell lines (6,21), second, its structure is chemically simpler than compounds 2 and 3, therefore making it easier to synthesize (especially the piperazine moiety, against the chiral methyl-piperazine present in compound 3), and finally, it yielded a higher docking score, giving us a better cutoff value for the candidate filtering stage explained in the methodology. Finally, inhibitors targeting gp120 are expected to be more effective against HIV infection than those targeting gp41: gp120 binding site to CD4 is exposed in a native state, while gp41 binding sites are exposed after the gp120-CD4 complex is formed (5) Protein preparation with both PDB files through the protein preparation wizard in Glide, was the first successful step towards generating a grid that would discriminate between ligands fitting the cavity, and those with an inadequate conformation. Ramachandran plots were obtained for both procedures, showing that most calculated angles agreed with the template of energetically allowed regions for backbone dihedral angles in both cases.
14 Envelope proteins gp120/gp41 have specific affinity with CD4 receptor embedded in CD4+ T cells (22). This interaction is followed by CC or CXC chemokine receptors biding, which are present in the surface of lymphocytes and monocyte/macrophages. These interactions lead to conformational changes in the HIV-1 envelope, eventually ending up in membrane fusion in less than an hour after the first contact between the virus and the T cell.
The deglycosylated core of gp120 is formed by 25 β-strands, 5 α-helices and 10 defined loop segments, also grouped as five conserved constant regions, C1-C5, and five variable loop regions, V1-V5, conforming a prolate ellipsoid with 50x50x25 Å, folded into two major domains (inner and outer) connected through a bridging sheet. Loops in this protein are relatively mobile, and glycosylation sites are all surface-exposed. (5,17) CD4 co-receptor binds into a depression or cavity formed at the interface between the outer and inner domains, and the bridging sheet. Interaction between both proteins is held by complementary electrostatic potentials: 219 van der Waals interactions and 12 hydrogen bonds are spread around all the contacts between 22 and 26 residues in CD4 and gp120, respectively; among those contacts we can find CD4 residue Phe43 (hence the name of Phe43 cavity) surrounded by the following gp120 residues when inserted into the cavity (figure 4) : Trp427, Trp112, Val255, Thr257, Glu370, Ile371, and Asp368, while hydrophobic interactions are found between CD4 Phe43 and gp120 Trp427, Glu370, Gly473, and Ile371, and between CD4 Arg59 and gp120 Val430 (17). Equally important to mention is the fact that residues Trp427, Glu370, and Asp368 are mutational hotspots, meaning that their substitution would affect CD4 binding, even though Phe43 only interacts with residues Glu370, Ile371, Asn425, Met426, Trp427, Gly73, and Asp368. (17). At the beginning, minimization of the gp120 protein was performed without any other moieties attached, nevertheless, since the resulting surfaces showed a repositioning of the sidechains of the amino acids, we left CD4 structure bound inside the cavity. When this action was considered, after deleting CD4, Phe43 cavity was clearly present in the structure, eliminating major mistakes at the time of grid generation.

Reference compound
Compound 1 was docked (figure 5) with extra-precision quality to obtain a reference score value and future cutoff for the library filtering. An XP GlideScore of -7.7 kcal/mol was obtained for this 16 compound with three fragments: a piperazine core, an indole linked through a diacetyl bridge, and a benzamide moiety.
Compound 1 presented interactions inside and outside the Phe43 cavity ( figure 6). The benzamide fragment displayed two pi-pi stacking interactions with Phe382 and Tyr384 residues as the main binding force of the ligand, while the indole moiety was partially exposed to water.
The piperazine core occupied the "entrance" of the cavity, surrounded by Trp427, Asn425, Met436, and Gly473 residues, needed to establish an interaction between gp120 and CD4. It is important to bear in mind that interaction diagrams built by Maestro show all residues interacting with the ligand within a 5 Å, but the fact that an amino acid is displayed does not imply that the contact is significant, unless the program indicates so (i.e. H-Bond interactions, ionic interactions, etc.)  The ADME analysis properties are shown in the table 1 (abbreviations can be found in the annex B). Green boxes indicate values inside the recommended thresholds, yellow ones indicate acceptable but not optimal values, while red ones point out undesirable values. Compound 1 shows a majority of optimal values, with exception of the blood/brain barrier coefficient, which explains the -1 value for central nervous system activity.

Docking Funnel
As described in the methodology, high-throughput virtual screening was the start of the filtering process for a total of 16 329 681 ligands. The docking score cutoff value used for this stage was -7.0 kcal/mol, to include all those molecules within the same affinity range or order of magnitude than the reference compound (-7.7 kcal/mol), leaving 60,379 molecules to work with during the next step in the docking funnel.
Molecules with lower (better) values than the cutoff represent around 0.37% of the initial size of the library. This means that the majority of compounds are not suitable for gp120 inhibition because either the compound scores poorly and lacks stable interactions, or the size and conformation of the molecule prevent it from fitting inside the cavity, in which case the program does not score it.
While the HTVS docking stage served primarily as a way of eliminating all ligands with low scores, leaving those worth working with, the SP docking stage reinforced the evaluation of the remaining portion of the library through a more detailed scoring function. For this stage, a cutoff value of -7.7 kcal/mol was used to refine those compounds with equal or better activity than the reference, leaving a total of 12,446 ligands, or nearly 21% of the initial ligands for this step.
Resulting ligands of the SP docking stage were submitted to the XP docking stage after they underwent a conformational analysis with LigPrep. From the initial 12,446 ligands, a final number of 275 compounds (2.2%) was obtained with a cutoff value of -7.7 kcal/mol and the most robust scoring algorithm available within Glide.
It is equally important to mention that scores through the docking funnel ranged between -1.2 and -9.7 kcal/mol, with obvious exception of ligands that did not fit into the cavity. According to the equation ∆ = − , where ∆ represents the Gibbs free energy change for the formation of the protein-ligand complex and stands for the affinity constant of the same complex, we can obtain the difference of affinity between a ligand and the reference through the This means that the screened library contained ligands with affinities 7.4 times higher than the reference.

Consensus docking and first ranking
Once the docking funnel obtained the best 275 hits out of the initial library, the ligands were renamed 1 through 275 for future calculations. Consensus docking was performed using the 1G9M PDB file, and the AutoDock 4.2.6 software to eliminate those ligands binding only the gp120 structure created by minimization of the 1GC1 PDB file. This step becomes necessary when trying to separate those molecules biased to bind only one specific conformation of the protein from those that truly establish interactions with significant amino acids. Still, this is only helpful when both structures are structurally equal or very similar; for example, 1G9M and 1GC1 differ only by a mutation, which is not located in the region of our interest.
Even though scoring functions for both programs are different, a similarity in the order of the results is noticeable among them and both PDB files. The ranking shows in green the best 100 compounds, in yellow the following 100 best ligands, and in red the last 75 ones. Nevertheless, these compounds can still be analyzed and redesigned to become bioactive compounds, since they still figure as the best virtual hits from this library.

Refinement stage
After the screening and validation stage was finished, best ligands needed to be studied in more detailed through the refinement stage, where induced-fit docking (IFD) and molecular dynamics gave us a better insight of the calculated interactions.

Induced-Fit docking, second ranking, and visual inspection
After the first ranking was performed, the 100 best ligands were submitted to IFD calculations to find the best possible poses of the ligand inside the Phe43 cavity, and the best conformations of 21 the protein when in contact with the ligand. After output poses where averaged and ranked as a simple comparison of the absolute values of the IFD score, the best 20 compounds were obtained.
These 20 compounds were pre-filtered before resulting the best ligands. None of these compounds are expected to present undesired reactivity under physiological conditions and their metabolic degradation should proceed without further consequences for the organism. This represents two main advantages: one, the compounds are expected to be stable when isolated as final synthesis products, and two, they are less likely to present unusual reactivity or side reactions when tested in biochemical assays.
The reader must be aware that the generated library is made of symmetrical compounds, where  interaction, it was revealed that a small fraction of it corresponded to ionic contact between the residue and the positive charge of the amino group. This last piece of information, however, is subject to experimental evidence, since the nitrogen atom is not the only one that can be protonated under physiological conditions.

Ligand 002
Ligand 002 docking calculations only showed one strong interaction between the nitrogen atom owned by the 4,5,6,7-tetrahydroindole fragment and Asn 425, confirmed as a 98% simulation time contact by the MD studies, which also revealed a purely hydrophobic interaction between this fragment and Trp 427 amino acid. Classes A and B interactions were also present within the contact volume.
While docking calculations showed solvent exposure for the [1,6]

Ligand 009
This virtual hit presents almost the same interactions as ligand 081, with slightly better simulation times for residues Trp 427 and Asn 425. Despite durable bonds in the inner area of the cavity, the chromane moiety in the exterior does not present any interactions that might anchor the ligand outside the hydrophobic pocket, explaining the total solvent exposure shown in the simulation diagram.

Ligand 022
This ligand shows typical interactions for the 4,5,6,7-tetrahydroindole fragment, displaying a low 37% simulation time hydrophobic interaction with Trp 427 that is not predicted by docking. At the same time, the oxindole fragment presents several contacts of great interest. Docking studies show this fragment interacting through a donor H-bond between the nitrogen atom and the Gly 473 residue, and acceptor H-bond between the carbonyl group and the Met 475 residue, an amino acid that is not included in any of the interaction classes described at the beginning of this section.
On the other hand, MD simulations show no interaction with the Gly 473 residue, instead displaying a water-mediated double interaction between residues Asp 474 and Met 475, and the carbonyl group.
Crystallization of the complex would be needed to confirm if this water molecule is required to establish an interaction in the outer side of the pocket, or if it happens to just be a statistically significant interaction throughout the simulation with different water molecules.

Ligand 059
Ligand 059 showed a total of 5 interactions in docking calculations: a double hydrophobic interaction with Trp 427 from both rings forming the indole fragment, an H-bond from the nitrogen atom of this fragment towards Asn 425, and two H-bonds coming from the protonated nitrogen atom present in the 1-(methylenesulfonyl)piperazine with Asp 368. All of these class B interactions were confirmed by MD studies with simulation times higher than 90%, except for Trp 427 contacts. Solvent exposure was also shown to be the same for both studies.

Ligand 177
Along with ligand 059, ligand 177 has a different first fragment than the other virtual hits. The benzonitrile fragment presented no interactions within the inner section of the Phe 43 pocket in docking calculations, even though this cavity is highly hydrophobic. Nevertheless, poor interactions between this fragment and Trp 427 and Met 475 residues was documented by MD simulation, leading to fragile anchorage of the ligand inside the gp120 protein.
On the other hand, the 1H-1,5-benzodiazepine-2,4(3H,5H)dione moiety showed multiple interactions, albeit different between both analyses. Docking showed a double accepting H-bond between residues Arg 475 and His 105 (none of them class A or B) and one of the carbonyl groups, and a donor H-bond towards Trp 427 coming from one of the nitrogen atoms of the benzodiazepine group, while MD studies displayed one of these nitrogen atoms in contact with Met 426 residue through a poor 32% H bond, and its second nitrogen forming an occasional water-bridged contact with Asp 368. As it happened with previous hits, this molecule showed total solvent exposure throughout the simulation.

Ligand 085
Even though class A and B interactions were present within 5 Å from the molecule, only the 4,5,6,7tetrahydroindole fragment presented interactions like the ones described in previous ligands, as well as full solvent exposure inside the pocket. From both, docking and MD studies, we determined that the 1-methyl-2,3-dihydro-4(1H)-quinolinone moiety did not present significant contacts that might improve the ligand's inhibitory capacity.

Ligand 044
Structurally and conformationally similar to ligand 059, ligand 044 presented typical interactions for the 4,5,6,7-tetrahydroindole moiety in both analyses. Solvent exposure was shown only for the piperidine group present in the 1-(methylenesulfonyl)piperidine fragment. This last group showed no interactions in docking calculations, but one H-bond contact through a bridging water molecule with Asp 368 in the MD simulation.

Molecular dynamics quality parameters
As a first measure to monitor simulation quality, RMSD of the alpha carbon atoms of the protein complexes was recorded during the simulations, and then compared to that RMSD of the protein simulated on its own.
All of the simulations converged within 5 ns, a relatively small amount of time for medium-size systems such as gp120, having an RMSD of less than 5 Å during the following 20 ns. The exceptions to this description were simulations MD3 and MD5, that converged around 7-8 Å, and 5-6 Å respectively.
Alternatively, the protein RMSF was also recorded for each one of the simulations. The clear majority of residues displayed an RMSF value between 1-2 Å, showing low to zero fluctuation for most amino acids contained in gp120. However, it is worth discussing those peaks rising above 2 Å:

Residues 120-140:
Showing the highest RMSF values, up to 15 Å, these residues belong to the V1/V2 loop and β2 region of gp120 protein, the most mobile region in the protein when detached from the whole viral structure. Nonetheless, this area is also in contact with any ligand inside the cavity, making it susceptible to fluctuations due to existing contacts, clashes, and interactions.

Residues 150-160:
As a part of the V1/V2 loop, their 3 Å fluctuations might be caused by the loop's natural movement.

Residues 270-280:
These residues with 3 Å fluctuations belong to the β10 region, usually affected by movements made by domain V3 loop, next to it.

Residues 300-320:
The second most mobile region of the protein with RMSF values of up to 6 Å, these amino acids form the V3 loop, a region responsible for chemokine attachment once gp120-CD4 interaction has been achieved, triggering conformational changes.

Residues 340-350:
With values of 3 Å, these residues belong to the α2 helix of gp120. These changes cannot be attributed to the effect of the ligand, taking into consideration the distance between the helix and the subdomains surrounding the Phe 43 pocket.

Biological properties prediction (ADME)
A set of endpoints and descriptors were calculated with three programs (QikProp, T.E.S.T., and Derek Nexus)(16) to study the predicted chemical and biological properties of the best virtual hits.   Table 3 shows two endpoints calculated with T.E.S.T., oral LD50 in rats and mutagenicity (Ames test). No ligands, except for ligands 044 and 009, showed toxicities below 500 mg/Kg, while seven of them presented no evidence of mutagenic behavior. Ligand 255 again showed improved characteristics against the other 9 virtual hits, with an LD50 around 900 mg/Kg (Class III of toxicity: slightly toxic), and mutagenicity negative. Results showed in this table next to an asterisk were obtained through nearest neighbor method, which might be taken with caution, even if it is a valid approach. Results expressed as N/A mean a predictive quantitative structure-activity relationship model could not be generated for this molecule.  Table 4 shows the results for human, monkey and rat predictions of at least an "Equivocal" level alert. The

CONCLUSIONS
The bottleneck methodology applied in this work allowed us to explore a much bigger chemical space than those ones reported at experimental studies, with a lower use of resources and time.
Descriptors used in this work were sufficient to predict chemical structures with affinities up to 3 times higher, for the gp120 protein, than those reported as active in preclinical studies. At the same time, with this methodology we narrowed down a 16.3 million compounds database to a few molecules that can be analyzed, synthesized, assayed, and redesigned afterwards, within a short amount of time, and less resources.
Thanks to the analysis of all interactions established between gp120 and the virtual hits, it becomes possible to point out at a first scaffold to inhibit CD4 attachment: a piperazine core with a 4,5,6,7-tetrahydroindole moiety as a first fragment interacting in the inner section of the cavity, and a second fragment capable of forming H-bond/ionic interactions in the external region of the cavity. Even though the piperazine did not present any interactions with the surroundings, its conformation and volume are key for blocking the Phe43 cavity. Most important residues include Asn425 and Trp427, key amino acids in CD4 Phe43 interaction. These residues were present in almost all the interactions of our virtual hits, giving us a hint to redesign and improve functional groups in contact with them, in order to achieve better simulation time percentages.
Molecular dynamics studies confirmed the validity of the results obtained by the docking stages, since the ligand interactions were present towards the same amino acids in the majority of cases, and the quality parameters were good throughout the complete trajectories, whose 25 ns simulation time was much longer than other in silico studies reported in the literature. This last point was proved through quick convergence of the RMSD values (5 ns) and low deviations (5 Å).
Simulations showed that these ligands do not get out of the Phe 43 pocket, due to strong and durable H-bond interactions, and despite the solvent exposure of more than half the hits inside the cavity. RMSF values demonstrated that ligands have a significant effect on the V1/V2 loop region, and the β15 strand. These could lead to important conformational fluctuations that prevent gp120-CD4 interaction.
Biological properties prediction displayed great profiles for many of the top-ten ligands discussed in this work. While ligand 255 showed the most promising results on these characteristics, the rest of the molecules could also prove potential candidates during clinical trials, subject to redesign to improve inhibitory activity.
Novelty of this work resides in the combinatorial and massive generation of libraries of compounds that would possibly inhibit gp120-CD4 interaction by insertion inside the Phe43 cavity. The docking funnel described in the methodology represents a powerful and robust tool to discriminate between ligands with lower or higher binding affinity. Finally, we hope that the description of the methodology made in this work will help perform high throughput virtual screening of large libraries with different biological targets and diseases. Future directions include 38 the improvement of the methodology, the testing of new cores and generation of new libraries, as well as the reduction of computational resources needed to perform the screening.

ACKNOWLEDGEMENTS
We thank the General Direction of Computing and Information and Communication