A framework for translation of genomic responses from mouse models to human inflammatory disease contexts

Therapeutics that show promise in mouse disease models frequently fail to translate to patients, and this high failure rate is a pressing challenge in biomedical science. Nevertheless, mouse models remain a useful tool for evaluating mechanisms of disease and prioritizing novel therapeutic agents for clinical trials. Though retrospective studies have examined the fidelity of mouse models of inflammatory disease to their respective human in vivo conditions, approaches for prospective translation of insights from mouse models to patients remain relatively unexplored. Here, we develop a semi-supervised learning approach for prospective inference of disease-associated human in vivo differentially expressed genes and pathways from mouse model experiments. We examined 36 transcriptomic case studies where comparable phenotypes were available for mouse and human inflammatory diseases and assessed multiple computational approaches for inferring human in vivo biology from mouse model datasets. We found that a semi-supervised artificial neural network identified significantly more true human in vivo associations than interpreting mouse experiments directly (95% CI on F-score for mouse experiments [0.090, 0.175], neural network [0.278, 0.375], p = 0.00013). Our study shows that when prospectively evaluating biological associations in mouse studies, semi-supervised learning approaches that combine mouse and human data for biological inference provide the most accurate assessment of human in vivo disease and therapeutic mechanisms. The task of translating insights from model systems to human disease contexts may therefore be better accomplished by systems modeling driven approaches.

Author Summary
Comparison of genomic responses in mouse models and human disease contexts is not sufficient for addressing the challenge of prospective translation from mouse models to human disease contexts. Here, we address this challenge by developing a semi-supervised machine learning approach that combines supervised modeling of mouse experiment datasets with unsupervised modeling of human disease-context datasets to predict human in vivo differentially expressed genes and pathways as if the model system experiment had been run in the human cohort. A semi-supervised version of a feed-forward artificial neural network was the most efficacious model for translating experimentally derived mouse molecule-phenotype associations to the human in vivo disease context. We find that computational generalization of signaling insights from mouse to human contexts substantially improves upon direct generalization of mouse experimental insights, and we argue that such approaches can facilitate more clinically impactful translation of insights from preclinical studies in model systems to patients.


The utility of mouse models for studying inflammatory pathologies in particular was recently assessed by a pair of studies examining the correspondence between gene expression in murine models of inflammatory pathologies and human contexts (1, 2). The human and mouse microarray cohorts assembled by the two studies had the rare property that mouse molecular and phenotype data were well matched to human in vivo molecular and phenotype data. This

The key methodological difference between the two studies was that while Seok et al. examined

The aim of our study here is to address the challenge of prospective inference of human biology from a model system study by developing a machine learning approach for inferring human biological associations as if the model system study had been conducted in a human cohort. Within this framework, a machine learning approach is judged successful if it correctly predicts a higher proportion of human DEGs and enriched signaling pathways than were implicated by the corresponding mouse disease model prior to any computational analysis. The essence of our approach is to apply a machine-learning classifier to assign synthetic phenotypes derived from those in a mouse dataset to molecular datasets of disease-context associated human samples. These synthetic phenotype labels of the human samples are then used for differential expression and pathway enrichment analysis to derive a set of predicted molecule-phenotype associations for the human samples. We were able to assess the efficacy of this approach by testing it on the datasets from the Seok and Takao

The F-score provided a summary score that gave equal weight to the accuracy of DEG and pathway predictions and to how comprehensive the predictions were relative to the human-predicted associations.
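As an illustration of this kind of score, the sketch below computes precision (accuracy of the predictions), recall (how comprehensive they are), and their harmonic mean for two sets of gene identifiers. It assumes the F-score is the standard balanced F1 measure; the function name `f_score` and the gene symbols are hypothetical and are not taken from the study.

```python
def f_score(predicted, reference):
    """Return (precision, recall, F-score) for a predicted set against a reference set.

    Assumes the balanced F1 definition: the harmonic mean of precision and recall.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # No overlap: precision, recall, and F-score are all zero by convention.
        return 0.0, 0.0, 0.0
    precision = true_positives / len(predicted)  # fraction of predictions that are correct
    recall = true_positives / len(reference)     # fraction of reference items recovered
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical gene symbols, for illustration only.
human_degs = {"IL6", "TNF", "STAT3", "CXCL8"}
mouse_degs = {"IL6", "TNF", "MYC"}

precision, recall, f = f_score(mouse_degs, human_degs)
print(f"precision={precision:.3f} recall={recall:.3f} F={f:.3f}")
```

The same function applies unchanged to sets of enriched pathway names, so DEG and pathway predictions can be scored in the same way.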

The mouse model-predicted DEGs and enriched pathways constituted the baseline performance of mouse-to-human translation that our machine-learning approaches needed to improve upon to be considered successful (Figure 1A). We implemented supervised and semi-

In the supervised case, the mouse molecule-phenotype dataset was used to train a machine learning classifier and the resulting classifier was applied to the human molecular dataset to infer synthetic phenotype labels (Figure 1B). In the semi-supervised case, the mouse dataset was first used to train a machine learning classifier in a supervised manner and then this classifier was applied to infer synthetic phenotype labels in the human dataset (Figure 1C).

(Table S1). However, both α (p = 0.000242) and the type of machine learning method (p = 0.00189) significantly impacted the F-score metric (Table S2). The significance of the regularization parameter and classifier type for the F-score and not the AUC metric suggests that though each machine learning approach performed with comparable accuracy, the biological relevance of the predicted phenotypes was significantly influenced by the stringency of feature selection and choice of machine learning approach.
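The supervised case described above can be sketched in a few lines. The example is a minimal illustration under stated assumptions, not the classifiers used in the study: a nearest-centroid rule stands in for the trained classifier, a per-gene difference in means stands in for a proper differential expression test, and the toy expression values assume that mouse and human samples are already measured over the same homologous genes on a comparable scale. All function names are hypothetical.

```python
from statistics import mean


def centroids(samples, labels):
    """Mean expression profile for each phenotype label (stand-in classifier training)."""
    result = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        result[label] = [mean(column) for column in zip(*rows)]
    return result


def predict(profile, cents):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def distance(label):
        return sum((x - c) ** 2 for x, c in zip(profile, cents[label]))
    return min(sorted(cents), key=distance)


def mean_difference(samples, labels, case, control):
    """Per-gene difference in mean expression between two label groups.

    A crude stand-in for differential expression analysis.
    """
    case_rows = [s for s, l in zip(samples, labels) if l == case]
    control_rows = [s for s, l in zip(samples, labels) if l == control]
    return [mean(a) - mean(b) for a, b in zip(zip(*case_rows), zip(*control_rows))]


# Toy data (hypothetical): rows are samples, columns are three homologous genes.
mouse = [[5.0, 1.0, 3.0], [6.0, 1.2, 3.1], [1.0, 1.1, 3.0], [0.8, 0.9, 2.9]]
mouse_labels = ["disease", "disease", "control", "control"]
human = [[5.5, 1.0, 3.2], [0.9, 1.0, 3.0], [4.8, 1.1, 2.8], [1.2, 0.8, 3.1]]

# Step 1: train on the labelled mouse data.
cents = centroids(mouse, mouse_labels)
# Step 2: infer synthetic phenotype labels for the unlabelled human samples.
synthetic_labels = [predict(sample, cents) for sample in human]
# Step 3: analyse the human data using the synthetic labels.
effects = mean_difference(human, synthetic_labels, "disease", "control")
print(synthetic_labels, effects)
```

In this sketch the human samples never receive their true phenotypes; the downstream analysis depends only on the labels transferred from the mouse-trained classifier, which is the property the framework relies on.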

Since the F-score was a direct measure of the biological relevance of the predictions made by a particular algorithm, we focused on F-score as the most relevant performance metric. That is, we emphasized the capability for gaining biological insights over mere numerical predictive capacity. We computed the 95% confidence intervals of the F-scores for each machine learning approach and mouse model across all case studies and regularization parameters (Figure 2).
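The text above does not state how the confidence intervals were obtained. One common option, shown here purely as an assumption, is a percentile bootstrap of the mean F-score across case studies. The function name `bootstrap_ci`, the resample count, the seed, and the F-score values are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower_index = round((1 - level) / 2 * n_resamples)
    upper_index = round((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical F-scores for one approach across several case studies.
f_scores = [0.10, 0.15, 0.12, 0.20, 0.09, 0.17, 0.13, 0.11]
lower, upper = bootstrap_ci(f_scores)
print(f"mean={mean(f_scores):.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

Intervals computed this way for two approaches can be compared directly, in the same manner as the mouse-experiment and neural-network intervals reported in the abstract.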

The overall performance of mouse-derived DEGs for prospectively predicting human DEGs was

(Table S3). We then performed WMW tests comparing the DEG F-scores between all pairs of machine learning approaches and confirmed that the ssANN performed significantly better than all other machine learning approaches (Table S4). Finally, we examined the average performance of the ssANN specifically across all case studies for each setting of the regularization parameter α (Table S5). An α value of 1.0 corresponding to Lasso

Table S6). In most cases, the mouse model pathway F-score is higher than the DEG F-score indicating that the mouse models considered here are more

(Table S7). This pathway signature of human sepsis appears to be highly reproducible in multiple mouse sepsis models, rendering it a stable signature for assessing therapeutic interventions and benchmarking mouse sepsis models against human data.
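The pairwise WMW (Wilcoxon-Mann-Whitney) comparisons of F-scores mentioned above can be illustrated with a small sketch. It computes the U statistic by direct pair counting and a two-sided p-value from the normal approximation, with no tie or continuity correction; this is a simplification, and for small samples an exact test would normally be preferred. The function names and the F-score values are hypothetical and are not the study's results.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x versus y: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def wmw_two_sided_p(x, y):
    """Two-sided p-value from the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mu = n1 * n2 / 2                                  # mean of U under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard deviation of U under the null
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))           # equals 2 * (1 - Phi(|z|))


# Hypothetical DEG F-scores for two approaches across five case studies.
approach_a = [0.30, 0.35, 0.28, 0.40, 0.33]
approach_b = [0.20, 0.25, 0.31, 0.18, 0.22]

u = mann_whitney_u(approach_a, approach_b)
p = wmw_two_sided_p(approach_a, approach_b)
print(f"U={u}, p={p:.4f}")
```

Dividing U by the number of pairs gives the probability that a randomly chosen score from the first approach exceeds one from the second, which is a convenient effect size to report alongside the p-value.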

Model Performance Assessment
Classification models were evaluated by their ability to discriminate between human phenotypes and by the extent to which analyzing the human molecular data using the predicted human phenotypes implicated the same genes as using the true human phenotypes. Classification