Abstract
This work introduces Ligand GA, which approaches the problem of finding small molecules that inhibit protein function by using the protein binding site to search for optimal or near-optimal small-molecule binders. Genetic algorithms (GA) are an effective means of solving, or approximating solutions to, computationally hard mathematical problems with large search spaces such as this one. The algorithm is designed to constrain the generated molecules through ADME restrictions, localization in a binding site, specified hydrogen-bond requirements, toxicity prevention with respect to multiple proteins, substructure restrictions, and database inclusion. The algorithm and this work belong to the context of computational modeling, ligand design, and docking to protein sites.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
gordoncs@uga.edu
https://www.dropbox.com/sh/s5nm3yzsd3l4y5r/AADCjHIymuu0nWSYnD3V0dQRa?dl=0
1 An initial count using 4 heavy atom types with valence greater than 1 in a linear chain gives 4^N possible chains, with N the number of heavy atoms. The exact number of molecules for a given atomic and pseudo-residue content can be calculated with graph-counting methods developed in early and current studies of large orders in perturbation theory of quantum field theories.
2 This software is extensively documented. Stereoisomer and ring conformation information is also in the output.
3 There are many different SMILES expressions for a given molecule. The IUPAC convention for unique SMILES is not implemented in this work. Because modifications can take molecule A to molecule B along multiple paths, enforcing a unique representation may restrict the search, although conversion to a unique form is useful in database searching.
4 The details of CSD GOLD and its use are documented at [15].
5 The choice between a soft limit, which applies a penalty proportional to the violation, and a direct hard limit, which rejects any violating molecule, can affect the GA evolution of the population.
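
The distinction in footnote 5 can be illustrated with a minimal Python sketch. This is not the Ligand GA implementation: the function names, the candidate values, the molecular-weight limit of 500, the penalty weight, and the convention that a higher score is better are all assumptions chosen for illustration.

```python
# Illustrative sketch only: names, numbers, and the "higher is better"
# scoring convention are assumptions, not taken from Ligand GA.

def hard_limit_fitness(score, value, limit):
    """Hard limit: any violation makes the candidate infeasible."""
    return float("-inf") if value > limit else score


def soft_limit_fitness(score, value, limit, weight=1.0):
    """Soft limit: subtract a penalty proportional to the violation."""
    violation = max(0.0, value - limit)
    return score - weight * violation


# Hypothetical candidates: name -> (binding score, molecular weight).
candidates = {"A": (8.0, 480.0), "B": (9.5, 510.0), "C": (7.0, 300.0)}
MW_LIMIT = 500.0  # assumed ADME-style upper bound


def rank(fitness):
    """Order candidate names from best to worst under a fitness function."""
    return sorted(
        candidates,
        key=lambda name: fitness(*candidates[name], MW_LIMIT),
        reverse=True,
    )


hard_ranking = rank(hard_limit_fitness)
soft_ranking = rank(lambda s, v, lim: soft_limit_fitness(s, v, lim, weight=0.05))

print("hard limit:", hard_ranking)
print("soft limit:", soft_ranking)
```

In this toy population, candidate B slightly exceeds the limit. Under the hard limit it ranks last and would be unlikely to pass on its features, while under the soft limit with a small weight it still ranks first, so the two choices can steer selection in different directions.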