Abstract
The ability to direct a Probabilistic Boolean Network (PBN) to a desired state is important to applications such as targeted therapeutics in cancer biology. Reinforcement Learning (RL) has been proposed as a framework that solves the discrete-time optimal control problem cast as a Markov Decision Process. We focus on an integrative framework powered by a model-free deep RL method that can address different flavours of the control problem (e.g., with or without control inputs; attractor state or a subset of the state space as the target domain). The method is agnostic to the distribution of probabilities for the next state, and hence does not require the probability transition matrix. The time complexity is linear in the number of time steps, i.e., the interactions between the agent (deep RL) and the environment (PBN), during training. Indeed, we explore the scalability of the deep RL approach to (set) stabilization of large-scale PBNs and demonstrate successful control on large networks, including a metastatic melanoma PBN with 200 nodes.
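To make the model-free setting concrete, the sketch below casts control of a toy PBN as an MDP: the agent only observes sampled next states, never the transition matrix. The 3-node network, its update-rule probabilities, the flip-one-node action scheme, and the target state are all hypothetical illustrations, and tabular Q-learning stands in for the paper's deep RL agent to keep the example dependency-free.

```python
import random
from collections import defaultdict

# Hypothetical 3-node PBN (illustrative only, not the paper's network).
# Each node has candidate Boolean update functions, one of which is
# sampled, with the given probability, at every synchronous step.
PBN = {
    0: [(lambda s: s[1] and s[2], 0.7), (lambda s: not s[1], 0.3)],
    1: [(lambda s: s[0] or s[2], 1.0)],
    2: [(lambda s: not s[0], 0.6), (lambda s: s[1], 0.4)],
}
TARGET = (True, True, True)   # assumed desired (target) state
ACTIONS = [None, 0, 1, 2]     # do nothing, or flip one node (control input)

def pbn_step(state, action):
    """One synchronous PBN transition. The agent never sees the
    probability transition matrix, only sampled next states."""
    s = list(state)
    if action is not None:
        s[action] = not s[action]                 # apply control input
    funcs_for = lambda n: zip(*PBN[n])
    nxt = []
    for node in sorted(PBN):
        funcs, probs = funcs_for(node)
        f = random.choices(funcs, weights=probs)[0]  # sample an update rule
        nxt.append(bool(f(s)))
    return tuple(nxt)

# Tabular Q-learning as a stand-in for the deep RL agent; cost grows
# linearly with the number of agent-environment interactions.
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(2000):
    state = tuple(random.random() < 0.5 for _ in range(3))
    for t in range(50):                           # episode horizon
        if random.random() < eps:                 # epsilon-greedy exploration
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
        nxt = pbn_step(state, ACTIONS[a])
        r = 1.0 if nxt == TARGET else 0.0         # reward only at the target
        best_next = max(Q[(nxt, i)] for i in range(len(ACTIONS)))
        Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
        state = nxt
        if r:                                     # target reached; stop early
            break
```

For the 200-node networks in the paper, the state space (2^200) rules out a Q-table, which is where the deep RL function approximator comes in; the interaction loop itself is unchanged.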
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Sotiris Moschoyiannis and Vytenis Sliogeris have been partly funded by UKRI Innovate UK, grant 77032.
(e-mail: s.moschoyiannis{at}surrey.ac.uk).
(e-mail: wuyuhu{at}dlut.edu.cn).