Interacting roles of lateral and medial Orbitofrontal cortex in decision-making and learning : A system-level computational model

In the context of flexible and adaptive animal behavior, the orbitofrontal cortex (OFC) is one of the crucial regions in the prefrontal cortex (PFC) influencing the downstream processes of decision-making and learning in the sub-cortical regions. Although the OFC has been implicated in a variety of related behavioral processes, the exact mechanisms through which it encodes or processes information related to decision-making and learning remain unclear. Here, we propose a systems-level view of the OFC, positioning it at the nexus of sub-cortical systems and other prefrontal regions. In particular, we focus on one of the most recent implications of neuroscientific evidence regarding the OFC: a possible functional dissociation between two of its sub-regions, lateral and medial. We present a system-level computational model of decision-making and learning involving the two sub-regions, taking into account their individual roles as commonly implicated in neuroscientific studies. We emphasize the role of the interactions between the sub-regions within the OFC as well as the role of other sub-cortical structures which form a network with them. We leverage the well-known computational architecture of thalamo-cortical basal ganglia loops, accounting for recent experimental findings on monkeys with lateral and medial OFC lesions performing a 3-arm bandit task. First, we replicate the seemingly dissociable effects of lesions to the lateral and medial OFC during decision-making as a function of the value difference of the presented options. Further, we demonstrate and argue that such an effect is not necessarily due to dissociable roles of the two subregions, but is rather a result of complex temporal dynamics between the interacting networks in which they are involved.

Author summary

We first highlight the role of the Orbitofrontal Cortex (OFC) in value-based decision making and goal-directed behavior in primates. We establish the position of the OFC at the intersection of cortical mechanisms and thalamo-basal ganglia circuits. In order to understand possible mechanisms through which the OFC exerts emotional control over behavior, among several other possibilities, we consider the case of dissociable roles of two of its topographical subregions: the lateral and medial parts of the OFC. We gather the predominant roles of each of these sub-regions, as suggested by numerous experimental studies, in the form of a system-level computational model that is based on existing neuronal architectures. We argue that, besides a possible dissociation, these sub-regions could interact with each other and through other sub-cortical structures, in distinct mechanisms of choice and learning. The computational framework described accounts for experimental data and can be extended to a more comprehensive level of detail in the representations required to understand the processes of decision-making and learning, the role of the OFC and, subsequently, the regions of the prefrontal cortex in general.

Transitivity [15], but also to represent largely varying subjective values in an adaptive manner [16,17]. These representations were proposed to be based on a common currency [1,6,18] that guides the comparison for the decision between different objects that are otherwise incomparable. As an alternative to the theory of common currency, it was proposed that what the OFC facilitates is the process of common scaling [19][20][21], which is qualitatively distinct from converting different rewards into a common currency. Instead, common scaling corresponds to retaining the individual value of each reward and converting these values to a different scale that makes them comparable. Evidently, complex possibilities in the process of valuation give rise to different possibilities for action selection processes.

Furthermore, the exact representations and mechanisms through which the OFC contributes to behavior are still under active debate [22]. Moreover, several early implications of the OFC in paradigms related to reversal learning [23], response inhibition [24,25] and flexible stimulus-outcome associations [26,27] have been overturned using the same experimental techniques [7] or even more accurate ones [28], using modified task structures [29,30], or by pointing out that findings from other related brain regions explain certain implications better [31][32][33][34][35].

Dissociate roles of lateral and medial OFC

The evident underlying complexity of studying the role of the OFC in value-based decision making and learning, and in goal-directed behavior, is underlined by the large heterogeneity of the region, unlike the rest of the PFC, which is homogeneously granular.
The heterogeneity is multi-fold: different groups of neurons that encode different aspects of the choice process in a single task context [11], cyto-architecturally different areas (granular and agranular), and their remarkably distinct connectivity pathways through different brain structures [36][37][38]. The possibility of functionally dissociable roles of topologically different sub-regions of the OFC has been of wide interest recently. While other sub-divisions of the OFC are reported to play functionally distinct roles in behavior [39][40][41], the distinction most extensively reported to imply strikingly different functional roles is the one between the lateral and medial parts of the OFC [28,42,43]. In the scope of this article, as in most of the related experimental works, the ventromedial prefrontal cortex (vmPFC) is also considered under the purview of the medial OFC [44,45]. It has also been observed that the lateral and medial OFC have clearly divergent connections to different networks [46]. In both monkeys and humans, the lateral OFC is reported to receive extensive projections from diverse sensory modalities through the somatosensory and insular cortices, and also heavy projections from the amygdala [47][48][49]. The medial OFC, in contrast, has strong projections from the hippocampus, hypothalamus and ventral striatum (VS), relatively fewer projections from the amygdala, and is strongly connected with the cingulate cortical areas [50,51].

December 3, 2019 3/24

In this work, we present a recurrent neural network model of decision making and learning involving the OFC. The OFC, together with some nuclei of the basal ganglia (BG) (especially the VS) and the thalamus (Th), forms a closed loop whose dynamics lead to action selection by competition resolution. This loop is one of several similar generic loops that are formed between different cortical regions and different nuclei of the BG.
A generic loop will hereafter be referred to as a CBG loop. Notably, we separate the part involving the OFC into two CBG loops involving the lateral and medial OFC, accounting for the individual experimental implications of the lateral and medial sub-regions. The input to the lateral OFC loop represents exteroception, that is, the value information arising from external factors like visual cues; the input to the medial OFC loop is predominantly interoception, that is, value information relating more to internal motivational processes like satiety levels and internal needs [42] (footnote 1). Across both these loops, we use the idea of the Current Subjective Value (CSV) in the model as an input to the lateral OFC (lOFC) and the medial OFC (mOFC). Besides the activation in the loops that represents the visual salience of the cues, the CSV represents the value on the basis of which the sub-regions of the OFC contribute to the decision-making process. Such a subjective value is known to arise from a comprehensive relation of the lateral OFC with the basolateral amygdala and the ventral striatum in an ongoing task context [52][53][54][55] (see Materials, CSV).

We provide a plausible explanation for one of the prominent experimental observations regarding the dissociable roles of the lateral and medial OFC, studied through individual lesions in monkeys [43]. We represent this proposed dissociation in terms of the representation and processing of the task information. Furthermore, we argue that, in the context of the learning-versus-choice notion of the lateral and medial OFC [56], it is the temporal interaction of the two sub-regions, more than a clear dissociation, that highlights their roles at different stages of a decision task.

We first describe the performance of an existing model of decision-making and learning on a 2-arm bandit task with probabilistic reward. We take advantage of the generic nature of the task to highlight the fundamental dynamics of the model. We then show that the model presented here, with its distinct description of the lateral and medial OFC, replicates the results of the basic model robustly and on more realistic timescales. We further present complementary findings from separate simulated lesions of the lateral and medial OFC components in the model. We discuss the effect of these findings on the performance in different task contingencies, replicating neuroscientific evidence found in monkeys with lesions to different subregions of the OFC.

The multi-arm bandit task is a classic reinforcement learning problem that has been used in the study of decision-making in experimental [7,43,57] and computational neuroscience [58][59][60]. Typically, in an N-arm bandit task, there are N possible cues (bandits), each carrying a different probability of reward and requiring a particular action in order to select the cue. Fig. 1A shows an example trial of a 2-arm bandit task that has been used to study computational models of probabilistic reward-based learning involving the basal ganglia (BG) [60,61]. In this case, a cue is one of the four possible shapes. The reinforcement in the model during the task is driven by the probabilistic reward offered at the end of each trial, with a different probability for each cue. It has been shown that monkeys learn to perform the task [57], learning the reward contingencies over time and always choosing the best rewarding option after learning.

1 What was referred to as OFC in this work was actually recorded from the lateral areas of the OFC.

The basic model (hereafter referred to as the OFC model) is a set of inter-connected CBG loops and an associative network (ASC), each network processing different information and contributing to a decision within the network (Fig 1C). In each trial, the CBG cue loop (CBG_cue), labeled 'limbic', takes as input the activation for the shapes that are presented in the trial. This activation represents a constant visual salience component that, in the simplest case, is the same for every stimulus (shape). Similarly, the CBG position loop (CBG_pos) takes as input the activation of the positions where the shapes are presented. Since the positions are chosen randomly and carry no significance in obtaining reward, there is no value learning in the CBG_pos loop. Hence the activation of a position represents just the presence of a cue at that position.
Finally, the ASC network takes as input the combined information binding a specific shape to a specific position. The ASC network represents the associative loop through the lateral PFC and the dorsomedial striatum (DMS), which is believed to represent multi-modal information of stimulus-versus-position mapping [62]. This is implemented in the form of a 2-dimensional mapping of each shape against all possible positions and each position against all possible shapes (Fig 1B, blue squares). The networks are inter-connected in such a way that, while each of the CBG loops independently processes the information with which it is activated, it also affects the activity in the other through the ASC network. The network architecture within each CBG loop, which guarantees the resolution of competition between the options, is based on classical BG pathways that have been previously explained with computational accounts [59,60,63].
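The trial structure of the 2-arm bandit task described above can be illustrated with a minimal sketch. It is not the model itself: the cue names, the reward probabilities in `CUE_PROBS`, the use of exactly two positions, and the `choose_position` callback are all assumptions made for illustration. It only shows how a choice of position maps onto a choice of cue and a probabilistic reward.

```python
import random

# Hypothetical reward probabilities for four shapes (illustrative values only).
CUE_PROBS = {"A": 1.0, "B": 0.66, "C": 0.33, "D": 0.0}


def run_trial(cue_probs, choose_position, rng):
    """Simulate one 2-arm bandit trial with randomized cue positions.

    choose_position receives the cues ordered by position and returns the
    index of the chosen position; the cue at that position is the choice.
    """
    # Sample two distinct cues in random order, so position carries no value.
    cues = rng.sample(sorted(cue_probs), 2)
    position = choose_position(cues)
    chosen = cues[position]
    # Reward is delivered with the probability attached to the chosen cue.
    reward = 1 if rng.random() < cue_probs[chosen] else 0
    best = max(cues, key=lambda c: cue_probs[c])
    return chosen, reward, chosen == best


def oracle(cues):
    """Stand-in decision rule that always picks the better cue."""
    return max(range(len(cues)), key=lambda i: CUE_PROBS[cues[i]])


chosen, reward, correct = run_trial(CUE_PROBS, oracle, random.Random(0))
```

In the model, the `oracle` stand-in would be replaced by the outcome of the competition in the CBG position loop.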

In each trial of the task, the model is presented with pseudo-randomized pairwise [...] Fig 1D). Thus, after the 'Decision' phase of the trial, the shape at the chosen position is considered as the choice of 'cue' and the reward is delivered according to the predetermined probability associated with that cue.

Fig 1. [...] The position that is chosen implies the choice of the shape made. B. Basic model involving two CBG loops and an associative loop (ASC), one CBG loop leading to a choice between the two cues and the other between the two positions. The final output considered from the model within a trial is the decision of the CBG position loop; the cue shown at the chosen position is considered the chosen cue. Note that the CBG cue loop is labelled limbic, as it will be developed further into components representing sub-regions of the OFC. The blue arrow represents the connection that can be modified by learning. C. The proposed change to the original model, which is described in detail in the following section. D. Activation of each cue shown in a choice, its position and the combined information. Also, the evolution of activity in a CBG loop: solid lines for cues, dashed lines for positions.

[...] cue and position is observed (Fig 2B, left). A running average over the choices of 10 trials is used to measure the performance over 120 trials. The performance of the model under the EASY condition replicates animals' behavior [57] (Fig 2D, blue). In the DIFFICULT condition (Fig. 2A, right), the reward probabilities of both shapes are lower or closer to each other. This should result in a lower rate of reinforcement and thereby make it difficult to make a correct choice. Animals, however, with a considerable amount of training, were shown to identify the option with the higher chance of reward and thus make correct choices [7,43]. We tested the same model as in the previous EASY case (Fig 2A, left), but the model could not learn the appropriate contingencies well.
The Decision Times (DTs) were longer compared to the previous case (Fig 2C) and the overall performance was sub-optimal (Fig 2D, red).

Fig 2. [...] Correct choice means the shape that rewards the most according to the predetermined probabilities. Lighter color filling represents the standard deviation.
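The performance measure used here, a running average of correct choices over 10 trials across a 120-trial session, could be computed as in the following sketch. How the first few trials are handled is not stated in the text; the sketch assumes a shorter window is used until 10 trials are available.

```python
def running_average(correct, window=10):
    """Running fraction of correct choices over the last `window` trials.

    Assumption: for the first trials, where fewer than `window` outcomes
    exist, the average is taken over the trials seen so far.
    """
    averages = []
    for t in range(len(correct)):
        recent = correct[max(0, t - window + 1): t + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# Example: a 120-trial session in which every choice was correct.
session = [1] * 120
performance = running_average(session)
```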

We then extend the 'limbic' CBG loop to individually describe two separate CBG loops, one representing the lateral OFC and the other representing the medial OFC. Hereafter, this version of the model will be referred to as the lmOFC model. The CBG loop involving the lateral OFC builds on top of the single limbic loop from the basic model (described in Fig 1B). In addition to the activation (I_ext) to the network, a Current Subjective Value (CSV) for each shape is also added to the input. The CSV represents the subjective value of a shape at any moment, taking into account the externally learned reward contingencies and the internal bodily desire for the reward that the shape leads to (see Materials, CSV). Another key aspect of the lOFC is that it properly assigns the obtained reward to the appropriate choice made in that trial (referred to as credit assignment). There is evidence that neurons in the lateral OFC are particularly active after reward delivery in a choice [42], and that medium spiny neurons, which are extensively involved in decision-making, are consistently active for a while after reward delivery [55]. This evidence supports the possibility that cortico-striatal synaptic plasticity is a plausible phenomenon in the context of obtaining reward. Similar arguments were made on the basis of other experimental findings [7].

The CBG loop with the medial OFC receives input from the CSV layer. The medial OFC has a separate value comparison mechanism implemented as a simple 'recurrent excitation, lateral inhibition' model, activated by the CSVs received. It was shown that the activity in the medial OFC correlates with the value difference between the options [64]. Supporting the view that the relative difference of the presented options is represented in the vmPFC, multiple value comparison mechanisms have been proposed. This value difference signal further allows the vmPFC to perform a value comparison to facilitate the choice through principles of recurrent excitation and lateral inhibition [21,65,66,67]. The output activities of the mOFC are fed into its CBG loop. It has been shown that one of the general functions of populations in the PFC is to maintain a history of decision events such as the previous action, the previous reward, etc. [68]. Accordingly, we implemented a simple history of rewards in the mOFC, without cue-specific information. As the lOFC maintains the current choice until reward delivery and later [42], a history of choices is possibly maintained in the lOFC. It was shown that lesions to the lOFC affect the appropriate consolidation of the reward history with the choice history [7]. Hence, for [...] action-values. The reason is that the task randomizes the positions where the cues are present and hence the action required to choose a cue.

We then tested the lmOFC model on the DIFFICULT condition as in the previous task. The model performed considerably better than the previous OFC model, with much faster DTs. Both models have an estimated value difference for the ongoing task, across all the trials. Interestingly, the precise value comparison in the mOFC estimates the value difference across all the trials better than the OFC model does under the DIFFICULT condition (Fig 3D).
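The 'recurrent excitation, lateral inhibition' principle attributed to the mOFC comparison above can be illustrated with a small rate-model sketch. This is not the mOFC implementation used in the model: the dynamics, the weights `w_exc` and `w_inh`, the time constant, and the clamping bounds are all hypothetical choices made only to show how such a circuit amplifies a small difference between two CSV inputs.

```python
def compare_values(csv, w_exc=0.5, w_inh=1.0, dt=0.01, tau=0.1, steps=2000):
    """Euler integration of tau * dx_i/dt = -x_i + clamp(drive_i), where
    drive_i = csv_i + w_exc * x_i - w_inh * sum(x_j for j != i).

    All parameter values are illustrative assumptions.
    """
    x = [0.0] * len(csv)
    for _ in range(steps):
        total = sum(x)
        drive = [
            csv[i] + w_exc * x[i] - w_inh * (total - x[i])
            for i in range(len(csv))
        ]
        # Activities are clamped to the assumed range [0, 1].
        x = [
            xi + (dt / tau) * (-xi + min(1.0, max(0.0, d)))
            for xi, d in zip(x, drive)
        ]
    return x


# Two options whose subjective values differ only modestly.
activity = compare_values([0.6, 0.4])
```

With these assumed parameters the population receiving the larger CSV ends up dominating while the other is suppressed, so the final activity separates the options far more than the raw inputs do.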
We tested the lmOFC model on a 3-arm bandit task (Fig 4). Each of the three cues shown in every trial has a reward probability upon its choice. As shown in [...] and star respectively in a given experimental session. The task is carried out under three different reward schedules (Fig 5A-C). In all the sessions, V1 and V3 are fixed at 0.7 and 0.05. The value of V2 is changed across three types of sessions, V2_HIGH, V2_MID and V2_LOW, where V2 is set to 0.6, 0.3 and 0.1 respectively. A similar task schedule was used on animals to test the effects of lesions of the lateral and medial OFC separately [43].

[...] In the case of V2_MID and V2_LOW, the performance was observed to be similar to that of controls, except for a slight delay in reaching better performance. Such normal performance in the case of V2_MID and V2_LOW can be attributed to the [...]

One of the major changes in the case of the lateral lesion is the credit assignment. In the control condition, when a reward is delivered, the population representing the chosen cue in the lOFC is active (Fig 1D, CBG, after 'Reward' until 2500 ms). When there is no lOFC in the network, the association of the current reward with only the current choice can no longer be made. In this case, we still consider that the CSV for each cue is sent as an input to the mOFC, because the mOFC/vmPFC has been shown to receive projections from the ventral striatum [50,51,73], which is a crucial component of the CSV layer.

[...] towards the later trials of the task [74]. However, it is important to note that, in a different formal description, it has been highlighted that the ventrolateral PFC (vlPFC) encodes the Availability (probability) of rewards whereas the OFC was shown to encode the Desirability (palatability) of rewards [75]. However, it was shown that activity in the medial and lateral orbitofrontal cortex, extending into the vmPFC, was correlated with the probability assigned to the action actually chosen on a given trial [70].
[...] space [9] and to encode the value of the offered and chosen goods [10,11].

State/Task space representation in the OFC

The OFC has been proposed to encode the task states and represent a cognitive map of this task space [9]. It has also been shown that OFC lesions in animals cause deficits in acquiring information about the task [27,76]. In this work, related to the simple 2-arm bandit task done under a DIFFICULT condition (Fig 2D, red), the [...] the reward delivery phase [77]. This model needs to be extended further by incorporating the cortico-cortical interactions, within the subregions of the OFC as well as with other related prefrontal regions like the ACC, which are believed to be rather interactive in their roles in behavior [78]. Furthermore, the role of any possible interaction between the two sub-regions of the OFC, given their connectivity through the medial orbital sulci [46], has not been explored much. However, even if the lateral OFC represents identity-specific rewards and the vmPFC represents general, scaled reward signals, it is unclear how these two signals could be linked to subserve goal-directed behavior. To this extent, there is not much evidence except one study showing that the functional connectivity was predictive of satiety-related changes in choice behaviour [79]. Similarly, the role of the VS also becomes crucial in serving such value signals that combine both value and internal motivation before the comparison processes in the mOFC/vmPFC.

As far as the interest in the dissociable contribution of subregions of the OFC is concerned, there is not much experimental evidence that could establish a double dissociation between different subregions [28,42,43,80]. Importantly, it has been found that both the lateral and medial regions of the OFC represent the perceived value of task events, albeit with different levels of participation in different task settings [42].
The evidence generally comes from single-neuron recording studies or lesion studies in macaques or rats [81], predominantly on the more accessible lateral OFC rather than the medial OFC, and from BOLD signal correlations in fMRI studies in humans. A few behavioral studies on patients with frontal damage have also discussed separate roles of the OFC and vmPFC [4,26,82,83]. [...] be made, the lOFC doesn't represent the value in a choice-free context [80].

Notwithstanding some complementary [85,86] as well as contrasting [87,88] findings, separable representations of absolute values and relative values seem to be a key dissociation between the medial [5,19,89,90] [...] after the lesion, and with that of the controls in the later sessions after the lesion [28].

We use a neuro-computational connectionist modeling approach to highlight the [...] and Substantia Nigra pars Reticulata (SNr).

We implement a computational model of parallel loops of three kinds, which were originally described as limbic, sensori-motor and associative [94]. [...] shown in Fig 1A, the cues (shapes) that are presented in each trial are represented within these limbic loops (CBG_cue in Fig 1A). The information about the position of the cue (and thus the action required to select the cue) is represented in the sensori-motor loops (CBG_pos in Fig 1A), from the regions in the parietal cortex to form the feedback [...] Fig 1A). The combined information of which cue is present in which position, which solves the binding problem, is represented in the lPFC and DMS [62].

The population dynamics and the learning mechanisms described below have been adapted from similar earlier works on thalamo-cortical BG loops [59,60].
The activation function f_n in Eq. 2 is the same for all the structures within a CBG loop and is a clamping function, except for the striatal structures. The activation of striatal populations, due to their neuronal properties [99][100][101], can be obtained by [...]

Also, there is a fixed gain parameter that characterizes the strength of interaction between the two populations to which i and j belong. For example, for any pair of connections ij between CTX(i) and STR(j), the gain Ĝ_CTX_STR is fixed. A positive or negative Ĝ defines the connection as excitatory or inhibitory respectively. The "direct" pathway, as a result of two inhibitory connections and one excitatory connection, is referred to as a positive feedback loop. The "hyperdirect" pathway, as a result of two excitatory connections and one inhibitory connection, is referred to as a negative feedback loop [63].
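The sign convention above lends itself to a short illustration: the net feedback sign of a loop is the product of the signs of its gains, so two inhibitory stages cancel. In the sketch below the gain magnitudes and the clamping bounds are hypothetical, since the text only states that the gains are fixed and that the non-striatal activation is a clamping function.

```python
def clamp(x, lo=0.0, hi=1.0):
    """Clamping activation; the bounds lo and hi are assumed values."""
    return max(lo, min(hi, x))


def loop_sign(gains):
    """Return +1 for a positive feedback loop and -1 for a negative one,
    computed as the product of the signs of the gains around the loop."""
    sign = 1
    for g in gains:
        if g == 0:
            raise ValueError("a zero gain means the loop is not closed")
        sign *= 1 if g > 0 else -1
    return sign


# Illustrative gains only: positive is excitatory, negative is inhibitory.
direct = [+1.0, -1.0, -1.0]       # one excitatory, two inhibitory stages
hyperdirect = [+1.0, +1.0, -1.0]  # two excitatory, one inhibitory stage

direct_sign = loop_sign(direct)            # positive feedback
hyperdirect_sign = loop_sign(hyperdirect)  # negative feedback
```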

The connections between the OFC and the CBG_cue loop in the basic model (Fig. 1B) are modifiable. Similarly, after the model is changed to the lmOFC model (Fig. 3), the [...] The RPE, δ_t, is calculated using a simple critic learning algorithm given below. [...]
where R, the reward, is 0 or 1, depending on whether a reward was given or not on that trial. After ∆W_t is calculated, the synaptic weights are updated according to S2 Equation. Upon weight changes, to make sure the weights stay within the initial bounds, every weight update is followed by a normalization of weights (S3 Equation). v_i is the CSV of the cue represented by neuron i in the CBG. The CSV of the chosen cue is then updated by: [...]
where α_c is the critic learning rate, set to 0.025, and α_LTP and α_LTD are set to 0.004 and 0.002 respectively.
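Since the update equations themselves are not reproduced here, the following sketch shows one common form that a critic-based update with these parameters could take. It assumes the RPE is δ = R − v_i for the chosen cue, that the CSV moves towards the reward by α_c · δ, and that the weight change uses α_LTP for positive and α_LTD for negative prediction errors. These forms are assumptions rather than the equations of the model, and the weight update and normalization of S2 and S3 Equations are not implemented.

```python
ALPHA_C = 0.025    # critic learning rate (from the text)
ALPHA_LTP = 0.004  # potentiation rate (from the text)
ALPHA_LTD = 0.002  # depression rate (from the text)


def critic_update(values, chosen, reward):
    """Return the RPE and the updated CSVs after one trial.

    Assumed form: delta = R - v_chosen, then v_chosen += ALPHA_C * delta.
    Only the chosen cue's value is changed.
    """
    delta = reward - values[chosen]
    updated = dict(values)
    updated[chosen] += ALPHA_C * delta
    return delta, updated


def weight_change(delta):
    """Assumed form of the weight change: potentiation scaled by ALPHA_LTP
    for a positive RPE, depression scaled by ALPHA_LTD otherwise."""
    alpha = ALPHA_LTP if delta > 0 else ALPHA_LTD
    return alpha * delta


# A rewarded choice of cue "A" starting from neutral values.
delta, new_values = critic_update({"A": 0.5, "B": 0.5}, "A", reward=1)
dw = weight_change(delta)
```

The asymmetry between the two rates means that, under these assumptions, an unexpected reward strengthens a connection twice as much as an equally unexpected omission weakens it.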

The OFC is known to represent a current subjective value (CSV) of a stimulus with respect to the body's internal state (like satiety or the desirability of the outcome the stimulus announces). Two primary brain structures that are crucially involved with the OFC in this regard are the amygdala and the ventral striatum (VS). The basolateral amygdala (BLA) has been shown to interact with the OFC and update its stimulus-outcome associations and hence the subjective value of a stimulus [53,54]. On the other hand, the ventral striatum was found to represent a unified quantity as a combination of subjective value and internal motivation using different kinds of neurons [55]. Several [...] representations [103][104][105][106][107]. A much more detailed representation and role of the ventral striatum, and its distinct relation to the lateral and medial OFC, could also be a key factor to study [108][109][110][111].

[...] instance, an indirect pathway, which are not considered in this work. The indirect pathway, involving the STN, GPe (Globus Pallidus pars externa) and STR, is also a part of the "classical" view of the CBG network [112][113][114]. Image re-illustrated, inspired from [93].

S1 Appendix.