Reinforced Random Walker meets Spike Timing Dependent Plasticity

We blend the Reinforced Random Walker (RRW) and Spike Timing Dependent Plasticity (STDP) into a minimalistic model for studying the plasticity of neural networks. The model consists of walkers that wander randomly on a weighted network; a walker selects a link with probability proportional to its weight. If the node at the other end of the link is empty, the move succeeds and the link's weight is strengthened (Long Term Potentiation). If it is occupied, the move fails and the link's weight is weakened (Long Term Depression). Depending on the number of walkers, we observed two phases, ordered (a few strong loops) and disordered (all links alike), and detected a phase transition between them. At the transition point, where potentiation and depression balance, the system became scale-free and the histogram of weights followed a power law. This work demonstrates how the dynamics of a complex adaptive system can drive its structure to critical behavior via an STDP-like rule.


Introduction
Learning is among the most fundamental capabilities of the brain, and it is carried out through the strengthening and weakening of synapses between neurons (synaptic plasticity) [1,2]. Neuronal activity that results in the strengthening of a specific synapse is called Long Term Potentiation (LTP); activity that results in its weakening, Long Term Depression (LTD) [3].
Spike Timing Dependent Plasticity (STDP) is an asymmetric form of Hebbian learning that models synaptic plasticity based on the time difference between the firing of pre- and post-synaptic neurons [4,5,6]. In the most common form of the STDP model, for which there is ample experimental support, a pre-synaptic spike preceding the post-synaptic spike results in LTP, and a pre-synaptic spike following the post-synaptic spike results in LTD [7,8,9].
Much work has been done to study the effects of STDP on the evolution of neural networks and on the properties of their final structure [10,11,12,13,14]. In this paper, we propose a model of reinforced random walkers that reconstructs the main properties of STDP.
Reinforced Random Walkers (RRW) are random walkers on a network whose passage through a link increases the probability that they traverse the same link in the future [15]. RRWs have been used to model the behavior of cells whose passage through a specific point adjusts the chemical environment to ease the transit of the remaining cells [16]; examples include the movement of myxobacteria (soil-dwelling bacteria) [17] and the migration of endothelial cells during tumor-induced angiogenesis [18,19,20].
In the original implementation, walkers only reinforce the links they traverse, but in a recent model Mehraban and Ejtehadi introduced an RRW with two kinds of walkers: the first reinforces connections and the second plays a weakening role. Depending on the ratio of the two kinds of walkers, they observed three phases: ordered, disordered, and a transition phase [21].
In the following sections we show that our model, although much simplified, possesses the main characteristics of STDP, and we report how a network subject to this model evolves into its final structure.
Much evidence suggests that the brain operates at, or close to, a critical point: scale-free behavior is observed in variables such as avalanche sizes [22,23,24], and one benefit of criticality is the maximization of information processing [25].
It has been suggested that synaptic plasticity plays a crucial role in reaching the critical point [26,27,28]. Most of these studies address functional criticality, where the activity of the network shows signs of critical behavior [29,30].
In the current research we focus on the structure of the network. We show that at a specific point, where what we interpret as LTP and LTD play equal roles in the evolution of the system, the system finds itself at a critical point with respect to the synaptic weights. This suggests a new idea about how the brain approaches the critical point.

Model
Our model is a reinforced random walk on a directed network without self-loops. We start from a fully connected network in which all weights are equal, and then insert some walkers into the network. Each node holds at most one walker (walkers are fermions), which sits on its node until picked. The model has three parameters: the number of nodes (n), the number of walkers (m), and the evolution rate constant a = 1 + ε, with ε ≪ 1. At each time step we randomly pick a node i; if it contains a walker, we select one of its neighbors j with probability p_ij, where P is the weight (transition probability) matrix of the network. After selecting the walker's destination, two cases may occur:

(i) The selected destination node j is not occupied by any other walker. The move takes place and the selected link is strengthened:

p_ij → a p_ij / (1 + (a − 1) p_ij).    (1)

(ii) The selected destination node j is already occupied by another walker. The move does not take place and the selected link is weakened:

p_ij → p_ij / (a − (a − 1) p_ij).    (2)

In our simulations the evolution rate is fixed at a = 1.1 (ε = 0.1). The weights of the outgoing links of a given node sum to one, so in addition to the strengthening (weakening) of the selected link's transition probability, the transition probabilities of the remaining links leaving the departure node are weakened (strengthened); the denominators in Eqs. (1) and (2) express this renormalization, which the remaining links of row i share.
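The dynamics described above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: it assumes a multiplicative update of the selected link (multiply by a on success, divide by a on failure) followed by renormalization of the departure node's outgoing row, and the helper names are ours.

```python
import numpy as np

def init_network(n, m, rng):
    """Fully connected directed network (no self-loops) with equal weights,
    plus m walkers placed on distinct nodes."""
    p = np.ones((n, n)) / (n - 1)
    np.fill_diagonal(p, 0.0)
    occupied = np.zeros(n, dtype=bool)
    occupied[rng.choice(n, size=m, replace=False)] = True
    return p, occupied

def step(p, occupied, a, rng):
    """One time step: pick a random node; if it holds a walker, try to move it."""
    n = p.shape[0]
    i = rng.integers(n)
    if not occupied[i]:
        return
    j = rng.choice(n, p=p[i])              # destination chosen with probability p_ij
    if not occupied[j]:                    # success: move walker, strengthen link (LTP)
        occupied[i], occupied[j] = False, True
        p[i, j] *= a
    else:                                  # failure: walker stays, weaken link (LTD)
        p[i, j] /= a
    p[i] /= p[i].sum()                     # renormalize node i's outgoing weights

rng = np.random.default_rng(0)
p, occ = init_network(n=51, m=10, rng=rng)
for _ in range(100_000):
    step(p, occ, a=1.1, rng=rng)
```

Because walkers only move on success, their number is conserved, and the row renormalization keeps each node's outgoing weights summing to one throughout the evolution.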
We observed the evolution of the network over time. The initial network was fully connected with equal weights; however, due to the self-organizing property of the model, the properties of the network after reaching the steady state are quite independent of the initial conditions, i.e., the initial values of the transition probability matrix P and the initial placement of the walkers.
We assume that the system is in the steady state when the average Shannon entropy per node has reached a constant value and remains unchanged thereafter. The entropy per node is defined by [31]

S = −(1/n) Σ_i Σ_j p_ij ln p_ij.    (3)

In our model, each node plays the role of a neuron, and the presence of a walker on a node corresponds to the activity (spiking) of that neuron. The model, albeit very simple, possesses the main properties of the strengthening and weakening of a synapse:

(i) LTP and LTD are reconstructed by successful and unsuccessful movement attempts. Each time a walker succeeds (the post-synaptic neuron is stimulated by the pre-synaptic neuron and spikes), the synapse between the two neurons (the link between the two nodes) is strengthened (LTP). Each time a walker fails (the pre-synaptic neuron tries to stimulate the post-synaptic neuron just when the latter is already active), the synapse between the two neurons is weakened (LTD).
(ii) Hebbian learning usually requires competition between synapses [32,11]. In our model, as we saw, a successful move not only increases the transition probability of the chosen link, but also decreases the transition probabilities of the other links leaving the departure node.
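The steady-state criterion above can be evaluated directly from the weight matrix. A minimal sketch of the entropy per node (the helper name is ours), checked against the two limiting configurations:

```python
import numpy as np

def entropy_per_node(p):
    """Average Shannon entropy per node, S = -(1/n) sum_ij p_ij ln p_ij;
    terms with p_ij = 0 contribute nothing."""
    q = p[p > 0]
    return -np.sum(q * np.log(q)) / p.shape[0]

n = 101
# Disordered limit: all off-diagonal weights equal 1/(n-1).
uniform = (np.ones((n, n)) - np.eye(n)) / (n - 1)
# Ordered limit: a single loop passing through every node.
loop = np.roll(np.eye(n), 1, axis=1)

print(entropy_per_node(uniform))   # ln(n - 1) ~ 4.605, the maximum
print(entropy_per_node(loop))      # 0.0, the fully ordered value
```

The two printed values are the extremes between which the order parameter moves as the walker density is varied.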

Results
We simulated networks with different numbers of nodes n and walkers m, and judging by the results, the final state of the network is merely a function of the walker density x = (m − 1)/(n − 1) (we subtract one from m and n to exclude the currently selected walker and node). We used the entropy per node (Eq. 3) as an order parameter. Sweeping x from zero to one and monitoring the final entropy of the system, we observe a phase transition which takes place at exactly x = 1/2 (Fig. 1). Depending on the value of x, the system falls into one of three distinct phases: ordered, disordered, or the transition phase.
Ordered phase: For x < 1/2, the system reaches the ordered phase. In this phase, a few loops form in the network and all the walkers move along them. The emergence of loops means that the transition probabilities of each participating node i behave as follows: there is a node k for which p_ik → 1, while for all other destinations j ≠ k, p_ij → 0. A sample of this phase can be seen in Fig. 3. If every node of the network were incorporated in a loop, the entropy of the system would equal zero (Eq. 3). But as can be seen in Fig. 1, for x < 0.4 the entropy is greater than zero and increases as x decreases. The reason is that for small x some nodes do not evolve and in practice take part in no loop: after a while, the system creates a few loops that attract all the walkers, leaving no walkers for the remaining nodes to evolve. This effect is suppressed as x increases, because a larger number of walkers needs loops that incorporate more nodes to move through; if too few nodes participate, the number of failed moves increases, which effectively destroys the loop. This behavior can also be observed in the histogram of transition probabilities (Fig. 2): as the number of walkers increases, there are more links with probability close to 1 and fewer links with probabilities between 0 and 1.
Disordered phase: For x > 1/2, the system reaches the disordered phase. With more than half of the nodes occupied, at each time step the probability of weakening the selected link is greater than the probability of strengthening it, and the decrease in the transition probability of any specific link pushes it towards the same state as the other links. Eventually, the transition probability matrix becomes uniform within each row, i.e., all links leaving a node carry equal probability, p_ij = 1/(n − 1) for i ≠ j. In this case the entropy reaches its maximum (Eq. 3), and as we can see in Fig. 1 and Fig. 4, the results agree with what we expect from the model.

Transition phase: For x = 1/2, the system does not gravitate towards either of the two attracting phases. Neither the strengthening nor the weakening role prevails: half of the nodes are empty and the other half occupied, so the occupation and availability probabilities of a selected node are equal. The phase transition takes place exactly when the number of nodes is odd and the number of walkers is

m = (n + 1)/2.

In Fig. 5, the transition probability histogram is shown for a network with n = 501 and m = 251. As can be seen, the histogram is scale-free, which indicates that for x = 1/2 the system sits at a critical point: at x = 1/2 an equilibrium is established between LTP and LTD, which leads the system to criticality.
As previously mentioned, the final state of the system is entirely independent of the initial conditions. To put this to the test, we changed the number of walkers after the system had reached its steady state; the system immediately switched to the phase corresponding to its new number of walkers (Fig. 6).
It is also possible to diagnose the phase of the system from its dynamics. As shown in Fig. 7, in the transition phase the fluctuations of the system after reaching equilibrium are much larger than in the other two phases. There are also post-equilibrium fluctuations in the disordered phase that do not appear in the ordered phase; in the ordered phase the entropy freezes, as a result of the emergence of loops and the walkers trapped inside them.
To further investigate the steady state of the system as a function of the walker density, we measured the mean clustering coefficient of the network, the standard deviation of the in-strength of the nodes, and the mean standard deviations of the input and output weights.
Various definitions are available for the clustering coefficient [33,34,35]; we used the Zhang-Horvath definition [36], adapted to a weighted directed network in which all weights are normalized by the maximum element of W. As expected, for x < 1/2 the clustering coefficient is small, due to loop formation, and it increases as x goes above 1/2, where all weights tend to become equal (Fig. 8-a).

The in-strength and out-strength of a node i are defined by

S_i^in = Σ_j p_ji,    S_i^out = Σ_j p_ij.

As a result of normalization, the out-strength of every node equals S_i^out = 1, whereas the in-strength is free to take various values. Since walkers are neither destroyed nor created, the average in-strength over the whole network should be 1; the standard deviation of the in-strength therefore indicates the similarity among nodes (Fig. 8-b). Upon decreasing x the standard deviation increases, because a few loops form, all walkers are trapped in them, and that in turn prevents the other links from evolving.

Figs. 8-c and 8-d show the mean standard deviations of the output and input weights of a node, for a network of fixed size n = 101, as functions of the walker density x. For x < 1/2, the standard deviation of both output and input weights is rather large due to the loops being formed: for a 101-node network in which every node participates in a loop, this value equals approximately 0.1, and within the ordered phase an increase in x drives it towards 0.1. For x > 1/2, the standard deviation approaches zero, which indicates that all incoming and outgoing links become equal. More generally, the standard deviation of the output (input) weights of a specific node indicates how similar the outgoing (incoming) links of that node are (Fig. 8-c,d).
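These quantities are straightforward to read off the weight matrix. A minimal sketch (helper name ours), evaluated on the idealized single-loop configuration, which also recovers the ~0.1 weight spread quoted above for n = 101:

```python
import numpy as np

def strength_and_spread(p):
    """In-/out-strengths of every node and the per-node standard
    deviations of outgoing and incoming weights (diagonal excluded)."""
    n = p.shape[0]
    off = ~np.eye(n, dtype=bool)                 # mask that drops self-loops
    s_out = p.sum(axis=1)                        # = 1 for every node (normalization)
    s_in = p.sum(axis=0)                         # averages to 1 (walker conservation)
    std_out = np.array([p[i, off[i]].std() for i in range(n)])
    std_in = np.array([p[off[:, j], j].std() for j in range(n)])
    return s_out, s_in, std_out, std_in

n = 101
loop = np.roll(np.eye(n), 1, axis=1)             # every node in one loop
s_out, s_in, std_out, std_in = strength_and_spread(loop)
print(std_out.mean())                            # ~0.0995, i.e. the ~0.1 value for n = 101
```

Each row of the loop matrix holds a single weight of 1 among its n − 1 = 100 off-diagonal entries, so the per-node standard deviation is sqrt(0.0099) ≈ 0.0995, matching the plateau seen in the ordered phase.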

Conclusion
In this paper, STDP is modeled using a network that evolves with respect to the movement of its reinforced random walkers. We have shown that in the presented model the steady state of the system is self-organized and does not depend on the initial conditions (Fig. 6), and that upon changing the density of walkers the system falls into one of three distinct phases: ordered, disordered, or the transition phase (Fig. 1). We have also shown that the system has a critical point (the transition phase), at which it behaves in a scale-free manner: the histogram of transition probabilities (link weights) is a power law (Fig. 5) and the relaxation time is very long (Fig. 7). As mentioned, the presented model is a simplification of STDP, and much can be done to improve it. For instance, in a neural network the activity rises and falls depending on adaptation, external stimuli, or the internal state of the system, which means the number of walkers need not be conserved.
Appendix

Using the update rules for the transition probabilities, and denoting the walker density by x = (m − 1)/(n − 1), the transition probability evolves on average according to

p_ij(t + dt) = p_ij(t) + ε (1 − 2x) p_ij ( p_ij − Σ_k p_ik^2 ).    (9)

In the stationary state p_ij(t + dt) = p_ij(t), hence the second term on the right-hand side of Eq. 9 vanishes:

p_ij ( p_ij − Σ_k p_ik^2 ) = 0.    (10)

Eq. 10 has two equilibrium points:

p_ij = 0  or  p_ij = Σ_k p_ik^2.    (11)

We also know that

∀i : Σ_{j=1}^{n} p_ij = 1,    (12)

∀i, j : p_ij ≤ 1.    (13)
Due to Eqs. 11, 12, and 13, our system has two kinds of stationary states [21]:

(i) Disordered state: the non-zero solution of Eq. 11 is satisfied for all destination nodes connected to node i (∀j ≠ i : p_ij = Σ_k p_ik^2). Together with normalization (Eq. 12) this gives

∀j ≠ i : p_ij = 1/(n − 1).
(ii) Ordered state: there exists a node k that satisfies the non-zero solution of Eq. 11, while for all other destination nodes the zero solution of Eq. 11 holds. Considering Eq. 12 we have

p_ik = 1,  ∀j ≠ i, j ≠ k : p_ij = 0.
The stability of the stationary states is determined by linearizing Eq. 9 around them. Writing p_ij → p_ij + δp_ij and keeping first-order terms,

δp_ij(t + dt) = δp_ij(t) + ε (1 − 2x) ( δp_ij ( p_ij − Σ_k p_ik^2 ) + p_ij ( δp_ij − 2 Σ_k p_ik δp_ik ) ).    (14)
If (m − 1)/(n − 1) > 1/2, the perturbation growth rate in Eq. 14 is greater than zero in the ordered state (so this fixed point is unstable in this regime) and less than zero in the disordered state (so this fixed point is stable in this regime). Hence, for (m − 1)/(n − 1) > 1/2, our network reaches the disordered state.
If (m − 1)/(n − 1) < 1/2, the growth rate in Eq. 14 is less than zero in the ordered state and greater than zero in the disordered state. Hence, for (m − 1)/(n − 1) < 1/2, our network reaches the ordered state. Finally, for (m − 1)/(n − 1) = 1/2, the right-hand side of Eq. 10 vanishes for all configurations, and in agreement with the simulation results, at this point the network operates at a critical point.
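The stability argument above can be checked numerically by iterating the averaged dynamics of Eq. 9 for a single node's outgoing row. This is a sketch under the mean-field assumption that the destination's occupancy probability is simply x; the function name and parameter values are ours.

```python
import numpy as np

def mean_field_step(p, x, eps):
    """One step of the averaged dynamics of Eq. 9 for one outgoing row:
    dp_j = eps * (1 - 2x) * p_j * (p_j - sum_k p_k^2)."""
    p = p + eps * (1 - 2 * x) * p * (p - np.sum(p ** 2))
    return p / p.sum()          # guard against floating-point drift; the sum is conserved

rng = np.random.default_rng(1)
row = rng.dirichlet(np.ones(100))        # random initial outgoing weights of one node

ordered = row.copy()
disordered = row.copy()
for _ in range(200_000):
    ordered = mean_field_step(ordered, x=0.25, eps=0.1)        # x < 1/2
    disordered = mean_field_step(disordered, x=0.75, eps=0.1)  # x > 1/2

print(ordered.max())      # approaches 1: winner-take-all, the ordered fixed point
print(disordered.std())   # approaches 0: uniform row, the disordered fixed point
```

For x < 1/2 the factor (1 − 2x) is positive, so weights above Σ_k p_k^2 grow and the rest decay, driving the row to the ordered fixed point; for x > 1/2 the sign flips and the uniform row p_j = 1/(n − 1) becomes the attractor, exactly as the linear stability analysis predicts.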