PT - JOURNAL ARTICLE
AU - Galkande Iresha Premarathna
AU - Leif Ellingson
TI - Classification of protein binding ligands using structural dispersion of binding site atoms from principal axes
AID - 10.1101/2020.12.21.423752
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.12.21.423752
4099 - http://biorxiv.org/content/early/2020/12/21/2020.12.21.423752.short
4100 - http://biorxiv.org/content/early/2020/12/21/2020.12.21.423752.full
AB - Many researchers have studied the relationship between the biological functions of proteins and the structures of both their overall backbones of amino acids and their binding sites. A large amount of the work has focused on summarizing structural features of binding sites as scalar quantities, which can result in a great deal of information loss since the structures are three-dimensional. Additionally, a common way of comparing binding sites is via aligning their atoms, which is a computationally intensive procedure that substantially limits the types of analysis and modeling that can be done. In this work, we develop a novel encoding of binding sites as covariance matrices of the distances of atoms to the principal axes of the structures. This representation is invariant to the chosen coordinate system for the atoms in the binding sites, which removes the need to align the sites to a common coordinate system, is computationally efficient, and permits the development of probability models. These can then be used to both better understand groups of binding sites that bind to the same ligand and perform classification for these ligand groups. We demonstrate the effectiveness of our method through classification studies with two benchmark datasets using nearest mean and polytomous logistic regression classifiers.
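
The encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details, so the sketch assumes that the principal axes are the eigenvectors of the covariance of the centered atom coordinates, that they pass through the centroid and are ordered by decreasing variance, and that the covariance is taken across atoms. The function name `axis_distance_covariance` is hypothetical.

```python
import numpy as np


def axis_distance_covariance(coords):
    """Return a 3x3 covariance matrix of atom-to-principal-axis distances.

    Assumed construction (not confirmed by the abstract): axes are the
    eigenvectors of the coordinate covariance, passing through the centroid.
    """
    coords = np.asarray(coords, dtype=float)  # shape (n_atoms, 3)
    centered = coords - coords.mean(axis=0)   # removes dependence on the origin

    # Principal axes: eigenvectors of the 3x3 coordinate covariance matrix.
    _, axes = np.linalg.eigh(np.cov(centered, rowvar=False))
    axes = axes[:, ::-1]  # eigh sorts ascending; reorder to descending variance

    # Distance from a point x to the line spanned by unit vector v is
    # sqrt(|x|^2 - (x . v)^2), so the sign of each eigenvector is irrelevant.
    proj = centered @ axes
    sq_norm = (centered ** 2).sum(axis=1, keepdims=True)
    dists = np.sqrt(np.clip(sq_norm - proj ** 2, 0.0, None))  # (n_atoms, 3)

    # Covariance across atoms of the three distance variables.
    return np.cov(dists, rowvar=False)
```

Under these assumptions, applying any rotation and translation to the input coordinates leaves the returned matrix unchanged up to floating-point error, provided the three principal variances are distinct. This is the property that lets sites be compared without alignment. Each site becomes a symmetric 3x3 matrix regardless of its atom count, which could then be passed to a classifier such as the nearest mean or polytomous logistic regression classifiers mentioned in the abstract.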