Time-resolved compound repositioning predictions on a text-mined knowledge network

Background: Computational compound repositioning has the potential to identify new uses for existing drugs, and new algorithms and data-source aggregation strategies provide ever-improving results on in silico metrics. Even with these advances, however, the number of compounds successfully repositioned via computational screening remains low. New strategies for algorithm evaluation that more accurately reflect the repositioning potential of a compound could provide a better target for future optimizations.

Results: Using a text-mined database, we applied a previously described network-based computational repositioning algorithm, yielding strong results via cross-validation, averaging 0.95 AUROC on test-set indications. The text-mined data were then used to build networks corresponding to different time points in biomedical knowledge. Training the algorithm on contemporary indications and testing on future ones showed a marked reduction in performance, with metrics peaking for the 1985 network at an AUROC of 0.797. Examining the performance reductions caused by removing specific types of relationships highlighted the importance of drug-drug and disease-disease similarity metrics. Using data from future time points, we demonstrate that further acquisition of these kinds of data may help improve computational results.

Conclusions: Evaluating a repositioning algorithm on indications unknown to the input network better tunes its ability to find emerging drug indications, rather than simply recovering those which have been withheld. Focusing efforts on improving algorithmic performance in a time-resolved paradigm may further improve computational repositioning predictions.

according to the Rephetio framework. Finally, an ElasticNet-regularized logistic regression was performed using the Python wrapper (https://github.com/civisanalytics/python-glmnet) for the Fortran library used in the R package glmnet [18]. Hyperparameters were tuned via grid search and, once chosen, held constant throughout all subsequent runs.

To evaluate the model, the DrugCentral gold standard was partitioned by indication into 5 equal partitions. One-fifth of the indications were withheld during training, and negative training examples were sampled from the set of non-positive drug-disease pairs at a rate of ten times the number of positives. The corresponding TREATS edges for holdout indications were removed from the hetnet before feature extraction to limit the model's ability to learn directly from those edges. The five-fold cross-validations were performed a total of ten times, each with a different random partitioning.
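As an illustration, the training setup above can be sketched as follows. This is a simplified stand-in that uses scikit-learn's elastic-net solver rather than python-glmnet, with random toy matrices in place of the real DWPC features; all names, dimensions, and grid values here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Toy stand-ins for DWPC feature vectors of drug-disease pairs.
n_pos, n_feat = 50, 20
pos_X = rng.normal(1.0, 1.0, (n_pos, n_feat))      # known indications
neg_pool = rng.normal(0.0, 1.0, (5000, n_feat))    # non-positive pairs

# Sample negatives at ten times the number of positives, as in the paper.
neg_idx = rng.choice(len(neg_pool), size=10 * n_pos, replace=False)
X = np.vstack([pos_X, neg_pool[neg_idx]])
y = np.concatenate([np.ones(n_pos), np.zeros(10 * n_pos)])

# ElasticNet-regularized logistic regression; tune hyperparameters once by
# grid search, then hold them constant for all later runs.
grid = GridSearchCV(
    LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000),
    {"l1_ratio": [0.1, 0.9], "C": [0.1, 1.0]},
    cv=5, scoring="roc_auc",
)
grid.fit(X, y)
best_params = grid.best_params_
```

In the paper's pipeline, `best_params` would then be frozen and reused for every subsequent cross-validation replicate and network year.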

Time-restricted learning models
The models for the time-resolved networks were trained using the positive gold-standard indications where the drug was approved in the years up to and including the year of the network. Training negatives were selected randomly from the pool of non-positive drug-disease pairs at a rate of ten times the number of positives. After training, the models were then tested on positive indications dated after the year of the network, as well as a proportional number of negatives.

To combine the results of all of the models across the varying network years, each model's prediction probabilities were first converted to z-scores, allowing for cross-model comparison of the results. The standardized probabilities for gold-standard drug-disease indications were then grouped according to the difference in years between the network from which the probability was derived and the approval year of the drug in the indication. This grouping allowed performance metrics to be generated for each relative drug-approval year. Negative examples were chosen at random from the non-positive set of drug-disease pairs, across all models, at a rate of ten times that of the positives. Areas under the receiver operating characteristic curve (AUROC) and precision-recall curve (AUPRC) were then calculated for each of the time differences, from negative 20 to positive 20 years.
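A minimal sketch of the z-score combination step, using pandas on a hypothetical prediction table (the column names are illustrative, not taken from the paper's code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical prediction table: each row is one gold-standard indication
# scored by the model built on a given network year.
preds = pd.DataFrame({
    "network_year": rng.choice([1975, 1985, 1995], size=300),
    "approval_year": rng.integers(1960, 2015, size=300),
    "probability": rng.uniform(0, 1, size=300),
})

# Standardize each model's probabilities to z-scores so that scores from
# models trained on different network years are comparable.
preds["z"] = preds.groupby("network_year")["probability"].transform(
    lambda p: (p - p.mean()) / p.std(ddof=0)
)

# Group by the offset between drug approval year and network year;
# performance metrics can then be computed per relative approval year.
preds["offset"] = preds["approval_year"] - preds["network_year"]
mean_z_by_offset = preds.groupby("offset")["z"].mean()
```

After standardization, every model contributes scores on the same scale, so grouping by `offset` pools comparable predictions across network years.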

Feature performance analyses
To test the relative importance of each edge type to the model, one of the better-performing networks on future indications, 1985, was chosen as a baseline. We performed a 'dropout' analysis in which edge instances were removed randomly from the network at rates of 25%, 50%, 75%, and 100% before running the machine learning pipeline. For dropout rates of 25%, 50%, and 75%, 5 replicates were run with different random seeds, to account for the differences that specific edges may produce when selected for dropout. The AUROC and AUPRC metrics of these different dropout results were then compared.
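The dropout procedure can be sketched as below; the edge representation and type names are hypothetical, and the feature-extraction and retraining step is elided.

```python
import random

def dropout_edges(edges, edge_type, rate, seed):
    """Remove a random `rate` fraction of edges of one type; all other
    edge types are left untouched."""
    rng = random.Random(seed)
    keep = [e for e in edges if e["type"] != edge_type]
    targeted = [e for e in edges if e["type"] == edge_type]
    dropped = set(rng.sample(range(len(targeted)), round(rate * len(targeted))))
    keep.extend(e for i, e in enumerate(targeted) if i not in dropped)
    return keep

# Hypothetical toy edge list with two edge types.
edges = ([{"type": "TREATS", "id": i} for i in range(100)]
         + [{"type": "ASSOCIATED_WITH", "id": i} for i in range(50)])

# Five replicates with different seeds at each partial dropout rate ...
for rate in (0.25, 0.50, 0.75):
    for seed in range(5):
        net = dropout_edges(edges, "TREATS", rate, seed)
        # ... re-extract features and retrain the model on `net` here ...

# ... and a single run for complete (100%) dropout, which is deterministic.
net_full = dropout_edges(edges, "TREATS", 1.0, 0)
```

Running replicates with different seeds at partial rates accounts for variance from which particular edges happen to be removed; at 100% dropout there is no such variance, so one run suffices.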

5-fold cross-validation on text-mined data
A hetnet of biomedical knowledge was built from SemMedDB, a database containing subject-predicate-object triples text-mined from PubMed abstracts. After data processing steps (see Methods), the DrugCentral gold standard provided a set of drug-disease pairs. In mapping these drug and disease concepts to those found in SemMedDB, 3,885 indications were lost due to an inability to map the disease condition to a unique concept ID (see Methods for examples), and further reductions came from the merging of highly related disease concepts, resulting in 5,337 unique indications that could be used as true positives for training and testing purposes.

After preparation of the hetnet and the gold standard, the utility of this text-mined knowledge base for the prediction of novel drug-disease indications was examined using a modified version of the Rephetio algorithm.

The ElasticNet logistic regression in this analysis used feature selection to reduce the risk of overfitting with a highly complex model. In comparing the models, there was a fairly consistent selection of short metapaths with only two edges that include important drug-drug or disease-disease similarity measures (Figure 1E). These include two related drugs, one of which treats a disease (dwpc_CDrtCDtDO), or two associated diseases, one of which has a known drug treatment (dwpc_CDtDOawDO). However, other metapaths of length 3 which encapsulated drug-drug or disease-disease similarities were also highly ranked. These include two drugs that co-localize to a given anatomical structure (dwpc_CDloAloCDtDO), two diseases that present in the same anatomical structure (dwpc_CDtDOloAloDO), or diseases that affect similar phenomena (dwpc_CDtDOafPHafDO). In this case, anatomical structures could include body regions, organs, cell types or components, or tissues, while phenomena include biological functions, processes, or environmental effects.
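For intuition, here is a toy sketch of a degree-weighted path count (DWPC) for a two-edge metapath such as CDrtCDtDO (a compound related to a compound that treats the disease). This is a deliberate simplification with made-up entities, using plain out-degrees; the real Rephetio implementation damps each edge by metaedge-specific degrees at both of its endpoints.

```python
# Toy adjacency lists for the two-edge metapath
# Compound -(RELATED_TO)- Compound -(TREATS)-> Disease.
related = {"aspirin": ["ibuprofen"], "ibuprofen": ["aspirin"]}
treats = {"ibuprofen": ["pain"]}

def out_degree(adj, node):
    return len(adj.get(node, []))

def dwpc(source, target, w=0.4):
    """Sum over paths from a source compound to a target disease, with each
    path damped by the degrees of the nodes it traverses (exponent w)."""
    score = 0.0
    for mid in related.get(source, []):
        if target in treats.get(mid, []):
            damping = (out_degree(related, source)
                       * out_degree(related, mid)
                       * out_degree(treats, mid)) ** (-w)
            score += damping
    return score
```

The damping exponent `w` penalizes paths through highly connected hub nodes, so a path through a promiscuous drug or common disease contributes less than one through specific, low-degree nodes.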
It is important to again note that these 'similarity measures' are purely derived from text-mined relations.

While these results indicate a fairly accurate classifier in this synthetic setting, the paradigm under which the models are trained and tested is not necessarily optimal for finding novel drug-disease indications. A cross-validation framework essentially optimizes finding a subset of indication data that has been randomly removed from a training set. However, prediction accuracy on randomly removed indications does not necessarily extrapolate to prospective prediction of new drug-repurposing candidates. Framing the evaluation instead as one of future prediction based on past examples may be more informative. For example, the question 'given today's state of biomedical knowledge, can future indications be predicted?' may more closely reflect the problem being addressed in drug repositioning. The best way to address this question is to perform the predictions in a time-resolved fashion, training on contemporary data and then evaluating the model's performance on an indication set from the future.

In the networks constructed for the various time points, the number of nodes and edges always increased, but edges increased more quickly, with later time points producing a more connected network than earlier ones (Figures 2A and 2B).

The number of indications that could be mapped to a given network year increased quickly at first but rose much more slowly in the later years, even though the total number of concepts in the network continued to increase. For the majority of the network years, the split between current and future indications remained at a ratio of around 80% current and 20% future, ideal for a training and testing split.
However, after the year 2000, the number of mappable future indications continued to diminish year after year, reducing the test-set size for these years (Supplemental Figure S2, Additional File 1).
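The current/future split for a given network year amounts to a simple filter on approval dates; a sketch with hypothetical record fields:

```python
# Hypothetical gold-standard records with drug approval years.
indications = [
    {"drug": "d1", "disease": "x", "approval_year": 1978},
    {"drug": "d2", "disease": "y", "approval_year": 1992},
    {"drug": "d3", "disease": "z", "approval_year": 2004},
]

def time_split(records, network_year):
    """Train on indications approved up to and including the network year;
    test on indications approved afterwards."""
    train = [r for r in records if r["approval_year"] <= network_year]
    test = [r for r in records if r["approval_year"] > network_year]
    return train, test

train, test = time_split(indications, 1985)
```

For later network years, fewer records satisfy the `> network_year` condition, which is exactly the shrinking test-set problem described above.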

Machine learning results
The performance of each model against a test set of future indications steadily increased from the earliest network years. The selected metapaths were largely consistent across almost all models. One difference from the cross-validation results is the appearance of the `Physiology` metanode in two of the top selected metapaths, one connecting two diseases through common physiology, and one connecting two drugs that both augment a particular physiology. Model complexity was also diminished compared to that seen during cross-validation, with the majority of models selecting fewer than 400 features, or 20% of the total available (Supplemental Figure S3, Additional File 1).
Finally, one question to explore is whether there is a temporal dependence on the ability to predict indications. For example, is performance better on drugs approved 5 years into the future rather than 20, since a drug only 5 years from approval may already be in the pipeline, with some important associations already known in the literature? To answer this, the results from all network years were combined via z-scores. Grouping indications by approval year relative to the year of the network allowed an AUROC metric to be determined for different time points into the future (Figure 3C). This analysis revealed that there is still substantial predictive ability for drugs approved up to about 5 years into the future. After 5 years, however, this value quickly drops to a baseline of 0.70 for the AUROC and 0.15 for the average precision. These results indicate a temporal dependence on the ability to predict future indications, with the model being fairly inaccurate when looking far into the future.
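A sketch of computing AUROC and average precision per relative approval year, using synthetic standardized scores whose signal decays for approvals further past the network year (the decay shape and all numbers here are invented for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(2)

# Synthetic standardized scores: positives keep a signal for drugs approved
# near or before the network year, decaying for approvals further out.
records = []
for offset in range(-20, 21):
    signal = max(0.0, 1.5 - 0.1 * max(offset, 0))
    for _ in range(50):
        records.append((offset, 1, rng.normal(signal, 1.0)))   # positive
        for _ in range(10):                                    # 10x negatives
            records.append((offset, 0, rng.normal(0.0, 1.0)))

# AUROC and average precision per relative approval year.
metrics = {}
for offset in range(-20, 21):
    group = [(y, s) for o, y, s in records if o == offset]
    y_true = [y for y, _ in group]
    scores = [s for _, s in group]
    metrics[offset] = (roc_auc_score(y_true, scores),
                       average_precision_score(y_true, scores))
```

With this construction, metrics for small or negative offsets stay well above chance while distant future offsets fall toward baseline, mirroring the qualitative shape reported in Figure 3C.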

Edge dropout confirms importance of drug-disease links
Many other efforts in computational repositioning have found that emphasizing drug-drug and disease-disease similarity metrics results in accurate predictors [6,19,20]. To further investigate the types of information most impactful to the final model, an edge dropout analysis was run. The 1985 network was chosen as the base network for this analysis, both due to its relatively strong performance on future indications and its central time point among all the available networks. By taking each edge type, randomly dropping out edge instances at rates of 25%, 50%, 75%, and 100%, and comparing the resulting models, the relative importance of each edge type within the model could be determined. The edge type found to have the largest impact on the resulting model was the 'Chemicals & Drugs - TREATS - Disorders' edge, which reduced the AUROC by 0.098 (Figure 4A). This result reinforces the importance of known drug-disease treatment links to repositioning predictions.

[Figure 4 caption, partial] A) ... Error bars indicate the 95% confidence interval over 5 replicates with different seeds for dropout. The 9 edge types with the greatest reduction from 0 to 100% dropout are displayed. B) Edge replacement analysis showing changes in AUROC when edges are replaced with those of the same type from another year's network. The top 9 edges that showed the greatest loss in performance in the dropout analysis between 0 and 100% dropout are displayed.