Sparse network-based regularization for the analysis of patientomics high-dimensional survival data

André Veríssimo; Eunice Carrasquinha; Marta B. Lopes; Arlindo L. Oliveira; Marie-France Sagot; Susana Vinga

doi:10.1101/403402

Abstract

The abundance of data produced by modern sequencing technologies represents a major challenge in oncological survival analysis, as the increasing amount of molecular data hampers the generation of models that are both accurate and interpretable. To tackle this problem, this work evaluates the introduction of graph centrality measures into classical sparse survival models such as the elastic net.

We explore the use of network information as part of the regularization applied to the inverse problem, with networks obtained both from external knowledge about the evaluated features and from the data themselves. A sparse solution is obtained by promoting either features that are isolated from the network or, alternatively, hubs, i.e., features that are highly connected within the network.
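To make the idea concrete, the following is a minimal sketch of degree-weighted regularization, not the formulation used in the paper or in the glmSparseNet package. It assumes a network built by thresholding absolute Pearson correlations and a simple linear mapping from degree to per-feature penalty weights. The function names, the `cutoff` and `floor` parameters, and the mapping itself are hypothetical choices made for illustration only.

```python
import numpy as np


def degree_from_correlation(X, cutoff=0.6):
    """Degree of each feature in a network inferred from the data.

    Assumption: two features are connected when the absolute Pearson
    correlation between their columns in X is at least `cutoff`.
    """
    corr = np.corrcoef(X, rowvar=False)
    adj = np.abs(corr) >= cutoff
    np.fill_diagonal(adj, False)  # ignore self-connections
    return adj.sum(axis=1)


def penalty_weights(degree, promote="hubs", floor=0.05):
    """Map degrees to per-feature penalty weights (illustrative mapping).

    promote="hubs":    highly connected features receive a smaller penalty.
    promote="orphans": isolated features receive a smaller penalty.
    `floor` keeps every weight strictly positive.
    """
    d = np.asarray(degree, dtype=float)
    scaled = d / d.max() if d.max() > 0 else np.zeros_like(d)
    if promote == "hubs":
        w = 1.0 - scaled
    elif promote == "orphans":
        w = scaled
    else:
        raise ValueError("promote must be 'hubs' or 'orphans'")
    return np.maximum(w, floor)


def weighted_elastic_net_penalty(beta, weights, lam=1.0, alpha=0.5):
    """Elastic net penalty with a separate weight for each coefficient."""
    beta = np.asarray(beta, dtype=float)
    l1 = alpha * np.abs(beta)
    l2 = 0.5 * (1.0 - alpha) * beta ** 2
    return lam * np.sum(weights * (l1 + l2))


# Toy data: feature 0 correlates with features 1 and 2, feature 3 is isolated.
u = np.array([1, -1, 1, -1, 1, -1, 1, -1])
v = np.array([1, 1, -1, -1, 1, 1, -1, -1])
w = np.array([1, 1, 1, 1, -1, -1, -1, -1])
X = np.column_stack([u, u + v, u + w, u * v * w])

degree = degree_from_correlation(X)
hub_weights = penalty_weights(degree, promote="hubs")
orphan_weights = penalty_weights(degree, promote="orphans")
```

Under the hub-promoting weights, a nonzero coefficient on the most connected feature costs less than the same coefficient on the isolated feature, so a sparse fit tends to retain hubs. The orphan-promoting weights reverse that preference.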

We show that introducing the degree information of the features when inferring survival models consistently improves the predictive performance of the models on breast invasive carcinoma (BRCA) transcriptomic data from TCGA, while enhancing model interpretability. Preliminary clinical validation is performed using the Cancer Hallmarks Analytics Tool API and the STRING database.

These case studies are included in the recently released glmSparseNet R package¹, a flexible tool to explore the potential of sparse network-based regularizers in generalized linear models for the analysis of omics data.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.