PT - JOURNAL ARTICLE
AU - Huikun Zhang
AU - Spencer S. Ericksen
AU - Ching-pei Lee
AU - Gene E. Ananiev
AU - Nathan Wlodarchak
AU - Julie C. Mitchell
AU - Anthony Gitter
AU - Stephen J. Wright
AU - F. Michael Hoffmann
AU - Scott A. Wildman
AU - Michael A. Newton
TI - Predicting kinase inhibitors using bioactivity matrix derived informer sets
AID - 10.1101/532762
DP - 2019 Jan 01
TA - bioRxiv
PG - 532762
4099 - http://biorxiv.org/content/early/2019/01/28/532762.short
4100 - http://biorxiv.org/content/early/2019/01/28/532762.full
AB - Prediction of compounds that are active against a desired biological target is a common step in drug discovery efforts. Virtual screening methods seek some active-enriched fraction of a library for experimental testing. Where data are too scarce to train supervised learning models for compound prioritization, initial screening must provide the necessary data. Commonly, such an initial library is selected on the basis of chemical diversity by some pseudo-random process (for example, the first few plates of a larger library) or by selecting an entire smaller library. These approaches may not produce a sufficient number or diversity of actives. An alternative approach is to select an informer set of screening compounds on the basis of chemogenomic information from previous testing of compounds against a large number of targets. We compare different ways of using chemogenomic data to choose a small informer set of compounds based on previously measured bioactivity data. We develop this Informer-Based-Ranking (IBR) approach using the Published Kinase Inhibitor Sets (PKIS) as the chemogenomic data to select the informer sets. We test the informer compounds on a target that is not part of the chemogenomic data, then predict the activity of the remaining compounds based on the experimental informer data and the chemogenomic data.
Through new chemical screening experiments, we demonstrate the utility of IBR strategies in a prospective test on two kinase targets not included in the PKIS. Using limited training data in both retrospective and prospective tests, bioactivity fingerprints based on chemogenomic data outperform chemical fingerprints in predicting active compounds in both standard virtual screening metrics and accurate identification of hits from novel chemical classes.
Author Summary: In the early stages of drug discovery efforts, computational models are used to predict activity and prioritize compounds for experimental testing. New targets commonly lack the data necessary to build effective models, and the screening needed to generate that experimental data can be costly. We seek to improve the efficiency of the initial screening phase, and of the process of prioritizing compounds for subsequent screening. We choose a small informer set of compounds based on publicly available prior screening data on distinct (though related) targets. We then use experimental data on these informer compounds to predict the activity of other compounds in the set against the target of interest. Computational and statistical tools are needed to identify informer compounds and to prioritize other compounds for subsequent phases of screening. Using limited training data, we find that selection of informer compounds on the basis of bioactivity data from previous screening efforts is superior to the traditional approach of selection of a chemically diverse subset of compounds. We demonstrate the success of this approach in retrospective tests on the Published Kinase Inhibitor Sets (PKIS) chemogenomic data and in prospective experimental screens against two additional non-human kinase targets.