Spreading predictability in complex networks

Many state-of-the-art studies focus on predicting the infection scale or threshold of infectious diseases or rumors and propose vaccination strategies accordingly. Most of these works assume that the infection probability and the initially infected individuals are known from the very beginning. In practice, however, an infectious disease or rumor has usually been spreading for some time before it is noticed. Predicting which individuals will be infected in the future from only the current snapshot therefore becomes a key issue in controlling infectious diseases or rumors. In this report, we present a snapshot-based prediction model that predicts the individuals likely to be infected in the future, not just the macroscopic scale of infection. Experimental results on synthetic and real networks demonstrate that the infected individuals predicted by the model agree well with those actually infected in simulations.


Introduction
diverse collection of outbreaks and identified a fundamental entropy barrier for disease time series forecasting by adopting permutation entropy as a model-independent measure of predictability. Funk et al. [29] presented a stochastic semi-mechanistic model of infectious disease dynamics that was used in real time during the 2013-2016 West African Ebola epidemic; it was fitted to the simulated trajectories in the Ebola Forecasting Challenge and used to produce forecasts that were compared with subsequent data points. Venkatramanan et al. [30] proposed a data-driven agent-based modeling framework for forecasting the 2014-2015 Ebola epidemic in Liberia, which was subsequently used during the Ebola Forecasting Challenge. This data-driven approach can be refined and adapted for future epidemics, and the authors shared the lessons learned over the course of the challenge. Zhang et al. [31] proposed a measure of the effort Twitter users make to get their information spread. They found that a small fraction of users with distinctive participation behavior can gain great influence, while most other users act as intermediaries during information propagation.

Up to now, most research has focused on spreading prediction at the macro level, and little on the micro level. However, to contain the spread of serious infectious diseases such as SARS [32,33] and H7N7 [34,35], the specific infected individuals should be known. Beyond the overall infection coverage, finer details must therefore be considered to achieve fine-grained prediction. Chen et al. did some interesting work in this area [23]: they presented an iterative algorithm to estimate the infection probability of the spreading process and then applied it to a mean-field approach to predict the spreading coverage.
Combining mean-field or pair-approximation models with the infection-probability estimation strategy [23], one can predict the number of infected nodes from a given snapshot of the propagation on a network, but not which nodes will be infected. In this paper, we present a probability-based prediction model that estimates the infection probability of each node and, further, determines which nodes will be infected in the future.

Materials and methods

For a given snapshot, a susceptible node may become infected in the future with some probability. Denoting by $P_u(t)$ the score of node $u$ at time $t$, we have,

where $\Gamma_u$ is the set of neighbors of node $u$, and the infection probability µ is estimated by the IAIP model (Iterative Algorithm for estimating the Infection Probability) [23]. Since an infected node always attempts exactly once to infect each of its susceptible neighbors, and a recovered node does not infect any of its susceptible neighbors, it is reasonable in Eq. (1) to set $P_v(t) = 1$ for an infected node $v$ and $P_v(t) = 0$ for a recovered node $v$. For a susceptible node $u$, $P_u(t)$ is its probability of being infected at time $t$.
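Equation (1) did not survive text extraction. For orientation only, one form consistent with the description above (neighbors acting independently, infected neighbors scored 1, recovered neighbors scored 0) is shown below; the product form and the time index $t-1$ are assumptions, not the authors' confirmed expression. A one-step worked example follows it, with µ = 0.3 and a node $u$ that has two infected neighbors and one recovered neighbor:

```latex
% ASSUMED form of Eq. (1), for illustration only
P_u(t) = 1 - \prod_{v \in \Gamma_u} \bigl[\, 1 - \mu \, P_v(t-1) \,\bigr]

% worked example: \mu = 0.3, two infected neighbours (P_v = 1), one recovered (P_v = 0)
P_u = 1 - (1 - 0.3)(1 - 0.3)(1 - 0) = 1 - 0.49 = 0.51
```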

Obviously, the initial condition is

Under the iteration of Eq. (1), the score $P_u(t)$ of a susceptible node $u$ converges to a unique steady state $P_u(t_c)$, where $t_c$ is the convergence time. The final score $P_u = P_u(t_c)$ is the probability that susceptible node $u$ becomes infected once the spreading reaches a steady state.

of average over 10000 simulations, we use the predictability χ and the Pearson correlation ρ to evaluate our model. These two metrics can be calculated by

where $Q_l$ is the number of infected nodes in the $l$-th simulation started from the snapshot.
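The convergence described above can be illustrated with a minimal sketch. It assumes the independent-neighbor form $P_u(t) = 1 - \prod_{v \in \Gamma_u}[1 - \mu P_v(t-1)]$ for Eq. (1), an initial score of 0 for susceptible nodes, and synchronous updates; none of these details is confirmed by the extracted text, and `predict_scores` and `pearson` are hypothetical names. Because the definition of χ is not recoverable here, only a plain Pearson correlation is shown, to be applied to whichever two per-node sequences one compares (for instance, predicted scores against empirical infection frequencies over many simulations, again an assumption).

```python
from math import prod, sqrt


def predict_scores(adj, state, mu, tol=1e-9, max_iter=1000):
    """Iterate an ASSUMED form of Eq. (1) until the scores stop changing.

    adj   -- dict mapping each node to its neighbours (Gamma_u)
    state -- dict mapping each node to 'S', 'I' or 'R' in the snapshot
    mu    -- infection probability (estimated by IAIP [23] in the paper)
    Returns a dict node -> score; susceptible nodes hold the steady-state P_u.
    """
    # Infected nodes are pinned at 1 and recovered nodes at 0, as in the text;
    # starting susceptible nodes at 0 is an assumption.
    p = {u: 1.0 if s == 'I' else 0.0 for u, s in state.items()}
    for _ in range(max_iter):
        new = dict(p)
        for u, s in state.items():
            if s == 'S':
                # Assumed rule: u escapes infection only if every neighbour fails.
                new[u] = 1.0 - prod(1.0 - mu * p[v] for v in adj[u])
        delta = max(abs(new[u] - p[u]) for u in p)
        p = new
        if delta < tol:  # treat this iteration as the convergence time t_c
            break
    return p


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

On the path 0-1-2 with node 0 infected and µ = 0.5, this iteration converges to $P_1 = 4/7$ and $P_2 = 2/7$. A direct SIR calculation gives 0.5 and 0.25, because the product form lets node 2 feed probability back to node 1; whether the paper's Eq. (1) corrects for such echoes cannot be determined from this excerpt.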

To simulate the spreading process on networks, we employ the Susceptible-Infected-Removed (SIR) model [36]. In a network, we randomly select one node as the initial spreader. This node infects each of its susceptible neighbors with probability µ, namely the infection probability.

After infecting its neighbors, the node immediately becomes recovered (i.e., the recovery probability is 1). Nodes newly infected in the next step infect their neighbors in the same way as the initial node. Unless otherwise stated, we take the snapshot after five steps of spreading from the initial node as the known information.

We test our method on synthetic and real networks. Synthetic networks are

can be written as:

For B = 0, there are a few nodes with extremely large degree, and information spreads out easily as soon as it reaches a large-degree node, so it is relatively easy to predict which nodes will be infected in the future. As B increases, the network evolves toward a random network and whether a node gets infected becomes relatively hard to predict, so the predictability and correlation decrease as B increases, as shown in Fig. 3(a). If the rewiring probability p < 0.2, the WS network is almost regular and information hardly diffuses to other nodes, so the infected nodes are hard to predict. When p > 0.2, the network is relatively random and information reaches other nodes easily; consequently, the infected nodes are easier to predict, as shown in Fig. 3(b). In the GN network, a larger average internal degree $\langle k_{in} \rangle$ gives a clearer community structure; correspondingly, information has difficulty escaping the community boundary, and the predictability and correlation become worse, as shown in Fig. 3(c).
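The simulation protocol described above (SIR, one randomly chosen initial spreader, recovery probability 1, snapshot after five steps) can be sketched with the Python standard library. The adjacency-dict representation and the names `sir_snapshot` and `infection_frequency` are hypothetical; the 10000-run default mirrors the number of simulations mentioned in the text, but how the authors aggregate their runs is not shown in this excerpt.

```python
import random


def _spread_one_step(adj, state, infected, mu, rng):
    """One synchronous SIR step: every currently infected node makes a single
    attempt on each susceptible neighbour (success prob. mu), then recovers."""
    newly = []
    for u in infected:
        for v in adj[u]:
            if state[v] == 'S' and rng.random() < mu:
                state[v] = 'I'
                newly.append(v)
        state[u] = 'R'  # recovery probability is 1
    return newly


def sir_snapshot(adj, mu, steps=5, seed=None):
    """Spread from one random initial spreader for `steps` steps and return
    the snapshot as a dict node -> 'S' | 'I' | 'R'."""
    rng = random.Random(seed)
    state = {u: 'S' for u in adj}
    start = rng.choice(sorted(adj))  # one randomly chosen initial spreader
    state[start] = 'I'
    infected = [start]
    for _ in range(steps):
        if not infected:
            break
        infected = _spread_one_step(adj, state, infected, mu, rng)
    return state


def infection_frequency(adj, snapshot, mu, runs=10000, seed=None):
    """Fraction of `runs` continuations from the snapshot in which each
    snapshot-susceptible node ends up infected (an empirical counterpart of P_u)."""
    rng = random.Random(seed)
    hits = {u: 0 for u, s in snapshot.items() if s == 'S'}
    for _ in range(runs):
        state = dict(snapshot)
        infected = [u for u, s in state.items() if s == 'I']
        while infected:  # run until no infected node is left
            infected = _spread_one_step(adj, state, infected, mu, rng)
        for u in hits:
            if state[u] == 'R':  # was susceptible, got infected, then recovered
                hits[u] += 1
    return {u: h / runs for u, h in hits.items()}
```

For example, `infection_frequency(adj, sir_snapshot(adj, 0.1, seed=1), 0.1)` yields per-node infection frequencies that could then be compared with predicted scores.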

Besides the network parameters listed above, the network density, i.e., the average node degree ⟨k⟩, also affects the predictability and correlation, as shown in Fig. 4.

The effect of the snapshot stage

We further analyze the predictability χ and correlation ρ for snapshots taken at different stages of spreading, as shown in Fig. 5, where T is the spreading time at which the snapshot is taken.

Generally, it is difficult to estimate the infection rate precisely if only an early-stage snapshot is given, since there is little usable information; hence it is hard to predict the infected nodes. As T increases, more information becomes available, and the predictability χ and correlation ρ improve. In the late stage, many nodes of the snapshot are

snapshot, the information diffuses more easily, so it is easier to predict the infected nodes in the future; correspondingly, the predictability χ improves.

Besides synthetic networks, we also analyze the predictability χ and correlation ρ on 11 real networks. The properties of these real networks and the analysis results are shown in Table 1.

Up to now, most research has focused mainly on the infection scale or threshold when studying spreading dynamics in complex networks. However, the following question may be more important and interesting: which nodes will be infected in the future

presented a probability-based prediction model to predict the infected nodes. Three synthetic and eleven real networks are used to evaluate the proposed model.

Experimental results demonstrate that the proposed model can predict the infected nodes precisely in a probabilistic sense. In this paper, we discuss the prediction model only on static networks; the analysis becomes more difficult if the networks evolve over time [42][43][44]. Furthermore, although we analyze the effect of network structure, we do not consider the movement or self-protection of individuals during disease outbreaks.

In practice, information about a disease alerts individuals and prompts them to take preventive measures, and the effect of such protection is more striking in small communities [45]. We will study these more comprehensive cases in depth in the future.