Abstract
A tumor is thought to result from the successive accumulation of genetic alterations, with each resulting population manifesting a novel ‘cancer phenotype.’ In each such population, clones of higher fitness, which contribute to the selection of the cancer phenotype, enjoy a Darwinian selective advantage and thus drive tumor progression inexorably from abnormal growth and oncogenesis, through primary tumors, to metastasis. Evading glutamine deregulation, anoxia/hypoxia, senescence, apoptosis and immune surveillance are strategies that the tumor population employs along the way to navigate the various Malthusian disruptive pressures resulting from the interplay among evolutionary forces. Inferring how these processes develop and how altered genomes emerge is a key step in understanding the molecular mechanisms underlying cancer development and in developing targeted approaches to treat the disease. The problem also poses many interesting challenges for computer and data scientists, spurred primarily by the rapidly growing availability of uninterpreted cancer patient data. We develop an algorithm that seeks to estimate causation by augmenting the statistical notion of correlation, expressed in terms of probability raising (PR), with a frequentist notion of temporal priority (TP), thus developing a sound probabilistic theory of causation, as originally proposed by Suppes. This reconstruction framework can handle the unavoidable noise in the data, which arises primarily from the intrinsic variability of biological processes but also from experimental or measurement errors. Using synthetic data, we rigorously compare our approach against state-of-the-art techniques and, for some real cancer datasets, we highlight biologically significant conclusions revealed by our reconstructed progressions.
Abbreviations
- TP: Temporal Priority
- PR: Probability Raising
- DAG: Directed Acyclic Graph
- CNV: Copy-Number Variants
- CAPRI: CAncer PRogression Inference
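
To make the two conditions named in the abstract concrete, the following is a minimal sketch of how temporal priority (TP) and probability raising (PR) might be checked for a pair of alterations in cross-sectional data. It assumes that each alteration is recorded as a binary vector over patient samples, that TP is approximated by comparing marginal frequencies (the more frequent alteration is taken to occur earlier), and that PR compares the conditional frequency of the effect with and without the candidate cause. The function name `is_prima_facie_cause` and the toy data are hypothetical and chosen for illustration only; this is not the reconstruction algorithm itself.

```python
from typing import Sequence


def is_prima_facie_cause(a: Sequence[int], b: Sequence[int]) -> bool:
    """Check Suppes-style TP and PR conditions for 'a causes b'.

    a, b: binary vectors (1 = alteration observed) over the same samples.
    Assumption: marginal frequency is used as a proxy for temporal order.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("a and b must be non-empty and of equal length")

    n_a = sum(a)          # samples carrying alteration a
    n_not_a = n - n_a     # samples without alteration a
    if n_a == 0 or n_not_a == 0:
        # Conditional frequencies are undefined; no evidence either way.
        return False

    p_a = n_a / n
    p_b = sum(b) / n
    p_b_given_a = sum(1 for x, y in zip(a, b) if x and y) / n_a
    p_b_given_not_a = sum(1 for x, y in zip(a, b) if not x and y) / n_not_a

    temporal_priority = p_a > p_b                        # TP: P(a) > P(b)
    probability_raising = p_b_given_a > p_b_given_not_a  # PR: P(b|a) > P(b|not a)
    return temporal_priority and probability_raising


# Toy data (hypothetical): b is observed only in samples that also carry a.
a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [1, 1, 0, 0, 0, 0, 0, 0]
print(is_prima_facie_cause(a, b))  # a is more frequent and raises P(b)
print(is_prima_facie_cause(b, a))  # fails TP, since P(b) < P(a)
```

In this sketch the two conditions are evaluated on raw frequencies, so it does not address the noise handling that the abstract attributes to the full framework.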