Neural basis of common-pool resource exploitation

Why do people often exhaust unregulated common (shared) natural resources but manage to preserve similar private resources? To answer this question, in the present work we combine neurobiological, economic, and cognitive modeling approaches. Using functional magnetic resonance imaging in 50 participants, we show that sharp depletion of common and private resources is associated with deactivation of the ventral striatum, a brain region involved in the valuation of outcomes. Across individuals, when facing a common resource, ventral striatal activity is anti-correlated with resource preservation (less harvesting), whereas with private resources the opposite pattern is observed. This indicates that neural value signals modulate behavior differently in response to the depletion of common versus private resources. Computational modeling suggests that over-harvesting of common resources is facilitated by the modulatory effect of social comparison on value signals. In sum, the results provide an explanation of people's tendency to over-exploit unregulated common natural resources.

The study was approved by the local ethics committee of the Canton of Basel City, Switzerland.
Experiment. Participants had to manage a common-pool resource (CPR) in the form of a fish stock. To avoid demand effects and suspicion toward the two different (but structurally identical) conditions, we implemented a between-subjects design: participants were randomly assigned to either a social or a non-social condition. Overall, they completed 16 sessions (maximum of 8 trials per session). In every trial, participants chose among three net sizes, catching one, two, or three fish, respectively (Figure 1). Their task was to collect as many fish as possible, and each collected fish yielded a monetary payoff (0.25 Swiss francs per fish). In the social version of the experiment (social condition), two other participants (pre-recorded in a behavioral pre-study) also chose among the three net sizes. In the non-social version of the experiment (non-social condition), the same number of fish "migrated" to two neighboring lakes. Importantly, the change in the resource due to the two pre-recorded participants or to the "migration" to the two neighboring lakes was identical in both conditions. Participants were informed that although the fish stock in the lake is decreased by fishing, it is also replenished naturally. Accordingly, at the end of every trial, the number of fish remaining in the lake was multiplied by 1.5, which yielded the number of fish for the next trial (with 16 fish representing the maximum capacity of the lake). If no fish remained for the next trial, the session ended automatically. The instructions clearly explained that the number of fish taken out could increase, sustain, or decrease the fish population.
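As a minimal sketch of the stock dynamics just described, the Python code below applies the stated rule: the remaining fish are multiplied by 1.5, the stock is capped at 16, and the session ends when the lake is empty. Two details are assumptions not given in the text: fractional fish are rounded down, and the example starts from a full lake of 16 fish. The function names are hypothetical.

```python
import math

CAPACITY = 16   # maximum number of fish the lake can hold (from the task description)
GROWTH = 1.5    # replenishment factor applied to the remaining fish (from the task description)

def next_stock(stock: int, removed: int) -> int:
    """Fish available in the next trial after `removed` fish left the lake.

    Assumption: fractional fish are rounded down (not specified in the text).
    """
    remaining = max(stock - removed, 0)
    return min(math.floor(remaining * GROWTH), CAPACITY)

def run_session(start_stock: int, removals: list, max_trials: int = 8) -> list:
    """Stock trajectory: the starting stock, then the replenished stock after each trial.

    Stops as soon as the lake is empty, mirroring the automatic end of a session.
    """
    stocks = [start_stock]
    for removed in removals[:max_trials]:
        new_stock = next_stock(stocks[-1], removed)
        if new_stock == 0:
            break  # no fish left: the session ends automatically
        stocks.append(new_stock)
    return stocks

# Hypothetical session in which all three nets are the largest (3 + 3 + 3 = 9 fish per trial)
print(run_session(16, [9, 9, 9]))  # [16, 10, 1]
```

Here `removed` is the total taken out in a trial (the participant's catch plus the other players' catches or the migration), so the same function covers both conditions.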
Participants were informed that whenever the total number of fish collected by the three participants was smaller than six units, the fish population would increase over the trials.
In contrast, whenever the total number of fish collected by the three participants was larger than six, the fish population would decrease over the trials. If the total number of fish collected by the three participants was equal to six, the fish population would stay constant over the trials. Thus, a net size of two fish corresponded to a cooperative/sustainable level of harvesting, whereas a net size of three represented over-harvesting and a net size of one allowed replenishment. The experiment started with a short training session. On average, participants earned 33. 3
fMRI data analysis. Image analysis was performed with SPM8 (Wellcome Department of Imaging Neuroscience, London, UK). The first four EPI volumes were discarded to allow for T1 equilibration, and the remaining images were realigned to the first volume.
Images were then corrected for differences in slice acquisition time, spatially normalized to the Montreal Neurological Institute (MNI) T1 template, resampled to 3 × 3 × 3 mm³ voxels, and spatially smoothed with a Gaussian kernel of 8 mm full width at half maximum (FWHM). Data were high-pass filtered (cutoff 1/128 Hz). All five time windows (frames) of the trial were modeled separately within the general linear model (GLM) implemented in SPM. The last trials of each session were excluded from the analysis of interest. Motion parameters were included in the GLM as covariates of no interest.
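For reference, a Gaussian kernel with an FWHM of 8 mm has a standard deviation of FWHM/√(8 ln 2) ≈ 3.40 mm, or about 1.13 voxels at the 3-mm resampled resolution. The short Python check below computes these values; it is illustrative arithmetic only, not part of the SPM8 pipeline, and the function name is hypothetical.

```python
import math

def fwhm_to_sigma(fwhm: float) -> float:
    """Standard deviation of a Gaussian kernel with the given full width at half maximum."""
    return fwhm / math.sqrt(8 * math.log(2))

sigma_mm = fwhm_to_sigma(8.0)   # 8-mm FWHM smoothing kernel
sigma_vox = sigma_mm / 3.0      # expressed in units of the 3-mm resampled voxels
cutoff_hz = 1 / 128             # high-pass cutoff: drifts with periods longer than 128 s are removed

print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels; cutoff = {cutoff_hz:.4f} Hz")
```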
We constructed separate regressors for different scenarios of resource depletion: feedback on 'sharp' depletion (removal of 6 fish through over-exploitation by others or large migration) and on 'moderate' depletion (removal of 2-4 fish through fishing by others or migration) was modeled as individual hemodynamic responses (2 s after trial onset).
Based on the resulting parameter estimates, contrasts of interest were computed. For the group analysis, the contrast images were entered into a second-level analysis with participant as a random grouping factor. In a separate analysis examining regions that monitor perceived fluctuations of the CPR, a single regressor for all feedback events, regardless of the depletion scenario, was parametrically modulated by the total number of fish removed from the lake in each trial (by all parties). In addition, to examine regions associated with reward prediction errors (RPEs), one regressor was parametrically modulated by the trial-by-trial RPE computed from the social or non-social version of the reinforcement learning model (see below for details).
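The idea of a parametrically modulated regressor can be sketched as follows: a stick function at each feedback onset, scaled by the mean-centered modulator (for example, the total number of fish removed, or the model-derived RPE), is convolved with a hemodynamic response function and sampled once per scan. This is a simplified illustration, not SPM's implementation: it assumes a double-gamma HRF similar in shape to SPM's canonical HRF, and the function names, onsets, and TR are hypothetical. SPM additionally handles microtime resolution and orthogonalization in its own way.

```python
import math
import numpy as np

def double_gamma_hrf(dt: float = 0.1, length: float = 32.0) -> np.ndarray:
    """Double-gamma HRF (response peaking near 5 s, undershoot near 15 s), normalized to unit sum."""
    t = np.arange(0.0, length, dt)

    def gamma_pdf(x, shape):  # gamma density with unit scale
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6.0
    return h / h.sum()

def parametric_regressor(onsets, values, n_scans, tr, dt=0.1):
    """Convolve onset sticks scaled by the mean-centered modulator with the HRF; sample once per scan."""
    mod = np.asarray(values, dtype=float)
    mod = mod - mod.mean()  # mean-centering decorrelates the modulator from the unmodulated onsets
    n_fine = int(round(n_scans * tr / dt))
    sticks = np.zeros(n_fine)
    for onset, m in zip(onsets, mod):
        sticks[int(round(onset / dt))] += m
    signal = np.convolve(sticks, double_gamma_hrf(dt))[:n_fine]
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    return signal[scan_idx]

# Hypothetical feedback onsets (s) and total fish removed in each of those trials
reg = parametric_regressor(onsets=[2, 22, 42], values=[6, 3, 4], n_scans=40, tr=2.0)
```

If the modulator is constant across trials, the mean-centered regressor is zero everywhere: the parametric term only captures trial-to-trial variation around the average feedback response.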
We focused on the ventral striatum and the ventromedial prefrontal cortex (vmPFC) because of their essential role in valuation and reward-based learning as part of the brain's valuation system (Bartra et al., 2013; Levy & Glimcher, 2012). Based on previous studies, we created a bilateral region of interest (ROI) in the ventral striatum consisting of a 10-mm sphere centered at MNI coordinates [x = 16, y = 8, z = −8], corresponding to the peak location of prediction error activity in a previous study of sequential decision making (Deserno et al., 2015), together with its contralateral homologue. Additionally, we created a 10-mm spherical ROI in the vmPFC centered at [x = 2, y = 46, z = −8], based on a coordinate-based meta-analysis of subjective value (Bartra et al., 2013), together with its contralateral homologue. To control for α errors in whole-brain analyses, we set the cluster-forming threshold at p < .001 and applied family-wise error correction for multiple comparisons at the cluster level (p < .05).
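A bilateral spherical ROI of this kind can be sketched as the union of a sphere around the given MNI coordinate and its mirror image across the midline (x → −x). The Python sketch below operates on a hypothetical grid of MNI millimetre coordinates rather than on an image file; in practice the mask would be built on the grid of the normalized images. Function names are illustrative.

```python
import numpy as np

def sphere_mask(coords_mm, center_mm, radius_mm=10.0):
    """True for coordinates (N x 3, MNI mm) lying within `radius_mm` of `center_mm`."""
    coords_mm = np.asarray(coords_mm, dtype=float)
    dist = np.linalg.norm(coords_mm - np.asarray(center_mm, dtype=float), axis=1)
    return dist <= radius_mm

def bilateral_sphere_mask(coords_mm, center_mm, radius_mm=10.0):
    """Sphere around `center_mm` plus its contralateral homologue (x mirrored to -x)."""
    x, y, z = center_mm
    return (sphere_mask(coords_mm, (x, y, z), radius_mm)
            | sphere_mask(coords_mm, (-x, y, z), radius_mm))

# Hypothetical 3-mm grid of MNI coordinates from -60 to +60 mm on each axis
ticks = np.arange(-60, 61, 3)
grid = np.stack(np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1).reshape(-1, 3)

vs_roi = bilateral_sphere_mask(grid, (16, 8, -8))     # ventral striatum ROI
vmpfc_roi = bilateral_sphere_mask(grid, (2, 46, -8))  # vmPFC ROI
print(vs_roi.sum(), vmpfc_roi.sum())                   # number of grid points in each ROI
```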
We also computed a psychophysiological interaction (PPI) analysis (Friston et al., 1997) to assess the functional connectivity of the right ventral striatum (Figure S4). For the PPI analysis, signal time series were extracted from a 5-mm sphere centered at [9, 5, −5], the overall group maximum of the right ventral striatum deactivation to sharp depletion of the resource, computed in a second-level random-effects analysis that included all participants in both conditions.
Social reinforcement learning model. To explain the effect of the social context on fishing behavior in the CPR task, we used a variant of the reinforcement learning model (Sutton & Barto, 1998). The model assigns each choice option a subjective expected value, which is updated on a trial-by-trial basis. The probability $p_t(i)$ of choosing option (net size) $i$ at time $t$ depends on the options' subjective values, as specified by a softmax choice rule:

$$p_t(i) = \frac{\exp\left(\beta\, Q_{t-1}(i)\right)}{\sum_{j=1}^{3} \exp\left(\beta\, Q_{t-1}(j)\right)}$$

where $Q_{t-1}(i)$ is the current subjective (expected) value of choice $i$, and $\beta > 0$ is the inverse temperature parameter, which determines how sensitively choices follow differences in subjective value. Large values of $\beta$ mean that the option with the highest subjective value is chosen with high probability, whereas low values of $\beta$ signify high choice randomness. The subjective values $Q_t(i)$ were updated in each trial after the participant made a decision and received feedback about the two competitors' decisions (social condition) or the migration (non-social condition). Thus, for all trials $t \in \{1, \dots, 8\}$, we calculated the subjective value of each choice $i$:

$$Q_t(i) = Q_{t-1}(i) + \alpha \left(R_{i,t} - Q_{t-1}(i)\right)$$

where $R_{i,t}$ is the participant's reinforcement for choice $i$ in the current trial and $(R_{i,t} - Q_{t-1}(i))$ is the RPE between the participant's expectation and the actual reward. The parameter $\alpha \in [0, 1]$ denotes the learning rate.
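The softmax choice rule and the delta-rule update can be written compactly in Python with NumPy. This is a minimal sketch with hypothetical function names; passing a reinforcement value for every net size updates all three options at once, as in the variant used here.

```python
import numpy as np

def softmax_probs(q_prev, beta):
    """p_t(i) = exp(beta * Q_{t-1}(i)) / sum_j exp(beta * Q_{t-1}(j))."""
    z = beta * np.asarray(q_prev, dtype=float)
    z = z - z.max()          # shift for numerical stability; probabilities are unchanged
    e = np.exp(z)
    return e / e.sum()

def delta_update(q_prev, rewards, alpha):
    """Q_t(i) = Q_{t-1}(i) + alpha * (R_{i,t} - Q_{t-1}(i)) for each option i.

    Returns the updated values and the reward prediction errors (RPEs).
    """
    q_prev = np.asarray(q_prev, dtype=float)
    rpe = np.asarray(rewards, dtype=float) - q_prev
    return q_prev + alpha * rpe, rpe

# Hypothetical trial: equal prior values, then reinforcements of 1, 2, 3 for the three net sizes
p = softmax_probs([0.0, 0.0, 0.0], beta=1.5)
q, rpe = delta_update([0.0, 0.0, 0.0], [1, 2, 3], alpha=0.5)
```

The RPE array returned by `delta_update` is the quantity that, per trial, would serve as a parametric modulator in the fMRI analysis.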
Unlike in standard reinforcement learning models (Sutton & Barto, 1998), we assumed that not only the expectation of the chosen option but also the expectations of the two unchosen options were updated. The model assumes that when a participant starts fishing in the first trial ($t = 1$), she has an a priori expectation about the outcome of each choice, $Q_{t=0}(i)$. To estimate this expectation, we calculated the actual frequencies of choosing net sizes one, two, and three in the first trials of all sessions and multiplied them by four:

$$Q_{t=0}(i) = 4 \left(\frac{n_i}{N}\right)_{t=1}$$

where $(n_i/N)_{t=1}$ is the number of sessions in which net size $i$ was chosen in the first trial, divided by the total number of sessions $N$. The frequencies were multiplied by four to scale the initial expectations to the actual range of rewards that could be obtained in the task (1, 2, or 3 fish). Although $Q_{t=0}(i)$ differs between participants, it is constant for each participant and is not estimated as a free parameter when fitting the model. We further assumed that in the social condition people not only take their personal payoff into account but also compare it with the other players' payoffs to determine an overall reinforcement (following social preference models, e.g. Fehr & Schmidt, 1999). Therefore, the reinforcement of an outcome results from the personal payoff and a social comparison component. According to the social comparison component of our model, the participant received a negative reinforcement if her payoff was lower than the other players' average payoff, and a positive reinforcement (reward) if she took more than the other players took on average.
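The initial expectations $Q_{t=0}(i)$ can be computed from a participant's first-trial choices as in the Python sketch below. The function name and the example choices are hypothetical illustration values, not data from the study.

```python
from collections import Counter

def initial_expectations(first_trial_choices):
    """Q_{t=0}(i) = 4 * (n_i / N): share of sessions whose first choice was net size i, times four."""
    n_sessions = len(first_trial_choices)
    counts = Counter(first_trial_choices)
    return [4 * counts[i] / n_sessions for i in (1, 2, 3)]

# Hypothetical first-trial net sizes from 16 sessions of one participant
q0 = initial_expectations([2] * 8 + [1] * 4 + [3] * 4)
print(q0)  # [1.0, 2.0, 1.0]
```

Because the three frequencies sum to one, the initial expectations always sum to four.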
Therefore, in the social learning model, the reinforcement $R_{i,t}$ in a given trial $t$ is a weighted sum of the direct, resource-derived reward and a social comparison component $SocComp_{i,t}$:

$$R_{i,t} = (1 - \theta_s)\, OwnPayoff_{i,t} + \theta_s\, SocComp_{i,t}$$

where $0 \le \theta_s \le 1$ indicates the relative weight given to the personal payoff and the social comparison value. The social comparison component $SocComp_{i,t}$ was calculated in every trial $t$ for each of the three net sizes $i$ as the difference between the participant's own payoff $OwnPayoff_{i,t}$ and the average payoff of the other players $\langle OthersPayoff_t \rangle$:

$$SocComp_{i,t} = OwnPayoff_{i,t} - \langle OthersPayoff_t \rangle$$

Thus, the social learning model has three free parameters: the learning rate α, the inverse temperature β, and the social comparison weight θ_s. We designate the RPE derived from the social learning model as the social RPE (sRPE).
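A minimal Python sketch of the social reinforcement, assuming the convex-combination form R = (1 − θ_s)·OwnPayoff + θ_s·SocComp with SocComp = OwnPayoff − ⟨OthersPayoff⟩. It further assumes that the own payoff for net size i is i fish (a full net); the function name and example numbers are hypothetical.

```python
def social_reinforcement(own_payoffs, others_payoffs, theta_s):
    """R_i = (1 - theta_s) * OwnPayoff_i + theta_s * SocComp_i for every net size i,
    with SocComp_i = OwnPayoff_i - mean(OthersPayoff) (assumed weighted-sum form)."""
    mean_others = sum(others_payoffs) / len(others_payoffs)
    return [(1 - theta_s) * own + theta_s * (own - mean_others) for own in own_payoffs]

# Hypothetical trial: the two other players each took 3 fish (sharp depletion)
r = social_reinforcement(own_payoffs=[1, 2, 3], others_payoffs=[3, 3], theta_s=0.5)
print(r)  # [-0.5, 0.5, 1.5]
```

Taking fewer fish than the others on average makes the comparison term negative, which is how the model encodes the pressure to keep up when others over-harvest.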
Non-social learning model. We assumed that in the non-social condition, people take their personal payoff into account but are also motivated to sustain the resource in the long term. Therefore, the reinforcement of an outcome results from the weighted personal payoff and a sustainability component $SustComp_{i,t}$:

$$R_{i,t} = (1 - \theta_n)\, OwnPayoff_{i,t} + \theta_n\, SustComp_{i,t}$$

$SustComp_{i,t}$ is the negative absolute difference between the optimal (sustainable) total number of fish removed from the stock ($SustainableCatch = 6$ fish) and the sum of the participant's own catch ($OwnPayoff_{i,t}$) and the number of fish that migrated to the other lakes ($Outflow_{i,t}$):

$$SustComp_{i,t} = -\left| SustainableCatch - \left(OwnPayoff_{i,t} + Outflow_{i,t}\right) \right|$$

Hence, the sustainability component was either zero (when the total number of fish taken from the resource equaled the sustainable number) or negative (when "too many" or "too few" were extracted). The rationale behind this "punishment" is that taking "too few" misses a chance to profit, whereas taking "too many" harms the sustainability of the resource and thereby jeopardizes future payoffs. Thus, according to the sustainability component, a participant was penalized for taking too many fish when migration was large and, likewise, for taking too few when migration was small. Importantly, in both the social and the non-social conditions, participants were clearly informed in the instructions that when the resource decreased by 6 fish, the number of fish in the lake would stay constant over time. The non-social learning model also had three free parameters: the learning rate α, the inverse temperature β, and the sustainability weight θ_n. We designate the RPE derived from this model as the non-social RPE (nRPE).
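The sustainability-based reinforcement can be sketched in the same way, assuming the weighted-sum form R = (1 − θ_n)·OwnPayoff + θ_n·SustComp with SustComp = −|6 − (OwnPayoff + Outflow)|. The own payoff for net size i is again assumed to be i fish; names and example values are hypothetical.

```python
SUSTAINABLE_CATCH = 6  # total removal that keeps the stock constant, as stated in the instructions

def nonsocial_reinforcement(own_payoffs, outflow, theta_n):
    """R_i = (1 - theta_n) * OwnPayoff_i + theta_n * SustComp_i for every net size i,
    with SustComp_i = -|SustainableCatch - (OwnPayoff_i + Outflow)| (assumed weighted-sum form)."""
    rewards = []
    for own in own_payoffs:
        sust = -abs(SUSTAINABLE_CATCH - (own + outflow))
        rewards.append((1 - theta_n) * own + theta_n * sust)
    return rewards

# Hypothetical trial: 6 fish migrated (sharp depletion) and the sustainability weight is high
r = nonsocial_reinforcement(own_payoffs=[1, 2, 3], outflow=6, theta_n=0.8)
print([round(x, 2) for x in r])  # [-0.6, -1.2, -1.8]
```

With a high sustainability weight, after a sharp migration the smallest net becomes the most reinforced option, in line with the resource preservation observed in the non-social condition.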
Evaluation of the models. We first evaluated the models by comparing them with a null (baseline) model that assumed uniformly random choice among the three net sizes (i.e. a choice probability of 1/3 for each), using the Bayesian Information Criterion (BIC; Schwarz, 1978). We also tested the two learning models against two competing models (Table 1).
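The BIC comparison against the null model can be sketched in Python as follows. BIC = −2 ln L + k ln n, and the null model has no free parameters and assigns probability 1/3 to every choice, so its BIC for n choices is 2n ln 3. The function names are hypothetical; the choice probabilities would come from a fitted model such as the softmax rule above.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion, -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def null_model_bic(n_choices: int) -> float:
    """Baseline: every net size is chosen with probability 1/3; no free parameters."""
    return bic(n_choices * math.log(1 / 3), 0, n_choices)

def log_likelihood(choice_probs):
    """Sum of the log probabilities a model assigned to the choices actually made."""
    return sum(math.log(p) for p in choice_probs)

# Hypothetical comparison for 100 choices: a three-parameter learning model (alpha, beta, theta)
# beats the null model only if bic(its_log_likelihood, 3, 100) < null_model_bic(100).
```

For 100 choices, the null BIC is 200 ln 3 ≈ 219.7, so the extra penalty of 3 ln 100 ≈ 13.8 must be outweighed by a better fit.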
Accordingly, we implemented a simple reinforcement learning model (RW model; Rescorla & Wagner, 1972) and a modified inequity aversion model (Fehr & Schmidt, 1999). The reinforcement model considered only the personal payoffs in the task as reinforcement and had neither a social comparison nor a sustainability component. Thus, it was nested within the social and non-social learning models, obtained by setting the weight θ_s (or θ_n) of the corresponding model to zero.
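The inequity aversion model builds on Fehr and Schmidt (1999). Because the exact modified comparison component is not reproduced here, the Python sketch below uses the textbook Fehr-Schmidt inequity terms, averaged over the other players, with δ− weighting disadvantageous and δ+ weighting advantageous inequality. It is an assumption about the general form, not the authors' exact specification; the function name and example values are hypothetical.

```python
def inequity_terms(own, others, delta_minus, delta_plus):
    """Textbook Fehr-Schmidt (1999) inequity terms, averaged over the other players:
    -delta_minus * mean(max(x_j - own, 0))   (disadvantageous inequality)
    -delta_plus  * mean(max(own - x_j, 0))   (advantageous inequality)."""
    n = len(others)
    disadvantage = sum(max(x - own, 0) for x in others) / n
    advantage = sum(max(own - x, 0) for x in others) / n
    return -delta_minus * disadvantage - delta_plus * advantage

# Hypothetical trial: own catch of 1 fish while both other players took 3
print(inequity_terms(1, [3, 3], delta_minus=0.5, delta_plus=0.2))  # -1.0
```

Unlike the linear social comparison component, these terms penalize inequality in both directions, with separate weights for being behind and being ahead.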
Finally, the inequity aversion model was identical to the social learning model, except that the comparison component was defined in terms of advantageous (δ+) and disadvantageous (δ−) inequality coefficients (Fehr & Schmidt, 1999). According to BIC scores, the Rescorla-Wagner
Learning model fitting procedure. We estimated four types of models (Table 1)
Pilot Behavioral Study. The aim of this experiment was to test the CPR paradigm and to collect behavioral data for the follow-up behavioral and imaging studies.
Participants. In each experimental session, three participants played with each other while dealing with a CPR. The participants (N = 24, aged 18-28 years, mean 21.8 years, 9 females) performed the task simultaneously (Figure 1, main text). Participants performed 20 sessions (8 trials per session). The task was performed in groups of six participants in separate cubicles to ensure participants' anonymity.
Experiment. Participants were informed that they were taking part in a "fishing study" investigating decision making. Participants had to imagine that they were fishing at a lake together with two other fishermen. Their task was to collect as many fish as possible, and each collected fish led to a monetary payoff (0.25 Swiss francs per fish). In every trial, participants chose among three net sizes: one, two, or three.
Overall, depletion of the resource was caused by the participants' own behavior and the behavior of the two other anonymous players present in the room. Participants were informed that although the number of fish in the lake decreases through fishing, it also grows naturally through reproduction. Indeed, at the end of every trial, the remaining number of fish in the lake was multiplied by 1.5, which gave the total number of fish for the next trial (with a maximum of 16 fish representing the capacity of the lake). If no fish remained for the next trial, the session ended automatically. The instructions clearly explained that the number of fish removed by the players could increase, sustain, or decrease the fish stock. For example, participants were informed that whenever the total number of fish collected by the three participants was smaller than six, the fish population would increase over the trials. In contrast, whenever this total was larger than six, the fish population would decrease over the trials. If the total was equal to six, the fish population would stay constant over the trials. Thus, a net size of 2 fish corresponded to a cooperative/sustainable level of harvesting. The experiment started with a short training session. The task was programmed with the software z-Tree (Fischbacher, 2007). This pilot design was similar to the social version of the CPR task used in the follow-up fMRI study, with the difference that all participants in the lab made decisions at the same time.
Results. Overall, participants did not follow the game-theoretical prediction for completely self-interested players, who would always select the largest net size in all trials of the game. Nevertheless, participants over-harvested and depleted the CPR: on average, 58.7% (SD = 32.5) of sessions ended before the 8th trial, indicating over-harvesting (mean number of trials per session = 7.4). The average selected net size (2.3) was significantly higher than the sustainable net size of 2, t(23) = 5.73, p = 8 × 10⁻⁶. Two highly competitive participants (average net sizes of 2.6 and 2.7) were selected for the fMRI version of the study, and their behavioral data were used in the social and private conditions.
Behavioral Study. The goal of this study was to examine how people deal differently with social and private resources. We used a modified version of the CPR task from the Pilot Behavioral Study. The design was identical to that of the fMRI version, but the experiment was conducted in a behavioral laboratory.
Participants. We invited thirty-seven healthy students to test the CPR task for the follow-up fMRI study. To avoid demand effects and suspicion toward the two different (but structurally identical) conditions, we implemented a between-subjects design: participants were randomly assigned to the social or the private condition of the CPR task (N = 19 for the social and N = 18 for the private condition). Overall, they played 16 sessions (8 trials per session).
Experiment. In every trial, participants chose among three net sizes, catching one, two, or three fish. In the social version of the experiment (social condition), two other participants (pre-recorded in Study 1) also chose among the three net sizes. In the non-social version of the experiment (private condition), the same number of fish "migrated" to two neighboring lakes. Importantly, the change in the resource due to the two pre-recorded participants or to the "migration" to the two neighboring lakes was identical in both conditions.
Results. Similar to the fMRI experiment, participants depleted the fish resource significantly faster in the social condition than in the private condition (mean number of trials in the social condition = 6.24 vs. 7.00 in the private condition, t(1,35)=3. 30 illustrates that participants used the smallest net size more often in the private condition than in the social condition (Fig. S1b), and the largest net size more often in the social condition than in the private one. As in the fMRI study, in the social condition, after over-exploitation of the fish resource by others (6 fish collected by the other players), participants also over-exploited the resource in the next trial. In the private condition, however, a similar reduction of the fish stock (6 fish migrated) led to resource preservation. This observation was supported by a significant  S1c). Overall, these results were replicated in the behavioral data of the fMRI study reported in the main text, providing independent evidence for the observed effects.
Game-theoretical analysis of the CPR task. What is the game-theoretical solution for the fishing game when assuming only self-interested and rational (i.e. payoff-maximizing) players? In the CPR task, the solution can easily be determined by backward induction. The task has a finite number of trials, which is common knowledge to all players. Therefore, in the very last trial it is clearly best for everyone to choose the largest net size to maximize payoffs. Given this behavior, it is also rational to choose the largest net size in the second-to-last trial, and so on. Therefore, the game-theoretical solution is to choose the largest net size in all trials of the task.
How should a self-interested player behave in the non-social situation (private condition), in which no other players are involved? Here the solution depends on a person's belief about the amount of fish that migrates to the two other lakes. If a person believes that the migration rate is low, then the person should choose the largest net size all the time. In contrast, if the player believes that the migration rate is high, it can be payoff-maximizing to choose a small net size to sustain the resource to allow for future consumption.
However, the optimal behavior will depend on the specific beliefs about the migration rate. When assuming uniform priors over players' beliefs about the migration rate, it can be predicted that the consumption rate should be lower in the private condition than in the social condition, which is in line with the behavioral findings.
More specifically, we determined the optimal behavioral strategy for the game in the private condition given different beliefs about the migration rate. The migration to the first lake is denoted $L_t^1$, the migration to the second lake $L_t^2$, and their sum is the total migration $L_t$ in trial $t$. Beliefs about migration can be represented by the probability with which a player believes that a particular migration rate occurs, that is, $\Pr(L_t^1)$ and $\Pr(L_t^2)$ (note that the migration to each lake is discrete and ranges between 1 and 3 fish). We examined three different assumptions about the players' beliefs. First, we assumed that all three possible migration rates for each lake were constant and equally likely (Belief 1). Second, we assumed that the players' beliefs about the different migration rates reflected the average migration observed in the whole task (Belief 2; i.e. if a migration of 2 fish to one lake occurred in half of all trials and sessions, its probability would be .50). Third, we assumed that a player would start with an initial belief that every migration rate is equally likely (Belief 3).
After completing the first session, this belief is updated according to the migration rates observed in each trial. To update the belief after the completion of session S, we determine the updated probability of each migration rate L, where L represents the three possible migration rates of 1, 2, or 3 fish to one of the two lakes. Mathematically, the expected payoff for the whole task given a player's strategy is calculated by multiplying each total payoff that can be obtained in the task by the probability of obtaining this payoff given the strategy, and summing over all possible total payoffs:

$$E[\text{Payoff} \mid \text{strategy}] = \sum_{x} x \cdot \Pr(\text{Payoff} = x \mid \text{strategy})$$

The probability of obtaining a specific total payoff depends on the strategy: the strategy defines the net size and thus directly affects the payoffs, but it also affects the development of the resource and thereby its size in subsequent trials.
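One way to compute such an optimal strategy is backward induction over (trial, stock) for a single session, given a belief about the migration to each lake. The Python sketch below is an illustration under several assumptions not stated in the text: the player fishes before migration takes place, the two lakes' migrations are independent, fractional fish are rounded down, and the Belief 3 update adds observed counts to one pseudo-count per migration rate. All function names are hypothetical, and the strategy it returns need not reproduce Figure S1 exactly.

```python
import math
from collections import Counter
from functools import lru_cache
from itertools import product

CAPACITY, GROWTH, N_TRIALS = 16, 1.5, 8   # from the task description
NET_SIZES = RATES = (1, 2, 3)            # net sizes; migration to one lake per trial

def uniform_belief():
    """Belief 1 (and the starting point of Belief 3): every migration rate equally likely."""
    return {r: 1 / 3 for r in RATES}

def updated_belief(sessions, prior_count=1.0):
    """Belief 3 (sketch): uniform pseudo-counts plus the migration rates observed so far.
    `prior_count` is an assumption; the exact update rule is not given in the text."""
    counts = Counter(r for session in sessions for r in session)
    total = prior_count * len(RATES) + sum(counts.values())
    return {r: (prior_count + counts[r]) / total for r in RATES}

def total_migration_dist(belief):
    """Pr(L_t) for L_t = L_t^1 + L_t^2, treating the two lakes as independent."""
    dist = {}
    for l1, l2 in product(RATES, repeat=2):
        dist[l1 + l2] = dist.get(l1 + l2, 0.0) + belief[l1] * belief[l2]
    return dist

def solve(belief):
    """Backward induction; returns value(t, stock) -> (expected payoff from t on, best net size)."""
    mig = total_migration_dist(belief)

    @lru_cache(maxsize=None)
    def value(t, stock):
        if t > N_TRIALS or stock == 0:
            return 0.0, None                      # session over
        best_payoff, best_net = -1.0, None
        for net in NET_SIZES:
            own = min(net, stock)                 # assumption: the player fishes before migration
            expected = float(own)
            for m, p in mig.items():
                remaining = max(stock - own - m, 0)
                nxt = min(math.floor(remaining * GROWTH), CAPACITY)  # assumption: round down
                expected += p * value(t + 1, nxt)[0]
            if expected > best_payoff:            # ties go to the smaller net
                best_payoff, best_net = expected, net
        return best_payoff, best_net

    return value

value = solve(uniform_belief())
print(value(1, 16))  # expected payoff and best first-trial net size from a full lake (assumed start)
```

In the last trial the future term vanishes, so the largest available net is always optimal there, which is the same end-game logic as the backward-induction argument for the social condition.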
The results of this analysis are illustrated in Figure S1. When all migration rates are assumed to be equally likely, the best strategy is to choose a net size of 1 in the very first trial, to increase the net size to 2 in trials two to four, and to choose net size 3 from trial five onwards (Belief 1 optimal solution). When players are assumed to know the actual migration rates in all trials (which is unrealistic but useful as a benchmark), they should likewise choose net size 1 in trial 1, net size 2 in trials two to four, and net size 3 from trial five onwards (Belief 2 optimal solution). Finally, when equal priors for the first trial are updated on the observed migration rates, it is optimal to choose net size 1 in trial 1 and to increase the net size in the following trials, choosing net size 3 from trial six onwards (Belief 3 optimal solution). Overall, the analysis shows that, across a variety of beliefs, the payoff-maximizing strategy in the private condition is not to choose the largest net size at the beginning of the task but only towards the end, starting at the sixth trial at the latest. Thus, one would expect smaller net sizes in the non-social than in the social condition at the beginning of the task, which is consistent with the experimental findings. To sum up, independently of one's belief, the game-theoretically optimal strategy in the non-social condition is to start with smaller net sizes and to increase the net size towards the end of the game, and the behavioral data were congruent with this strategy. Importantly, the social learning model fit the behavioral data in the social condition better than the non-social learning model, whereas the non-social learning model fit the behavioral data in the non-social condition better than the social learning model (Figure 2).
The choice sensitivity parameter values were fairly homogeneous across participants and fitted model types (β ≈ 1.5, Figure S3), whereas learning rates varied greatly across both participants and model types (Figure S3). We also used a linear mixed-effects model (LME) to test the effects of ModelType (four model types) and Condition (non-social, social) on the BIC score, with participant as a random grouping factor (Table 2). Random intercepts accounted for the unobserved heterogeneity due to sampling participants from a population, allowing statistical inference to generalize to the population level. (Random slopes were not included because the variability of the model type and condition predictors across participants was too low to yield meaningful random-effects estimates.) The LME was fit with the MATLAB function fitlme, using restricted maximum likelihood and a trust-region-based quasi-Newton optimizer.

Results
Based on the LME model, an ANOVA with participant as a random grouping effect was performed, using the Satterthwaite approximation for the effective degrees of freedom (Table 1), to test the interaction between model type and treatment (Table 3, Figure 3). Mauchly's test indicated no violation of the sphericity assumption. The interaction term was significant (F(1,48) = 8.09, p = 6.5 × 10⁻³), confirming the congruency between the social and non-social models and the corresponding treatments.
To sum up, participants depleted CPR faster in the social than in the non-social condition, and this over-exploitation can be explained by a learning mechanism modulated by social comparison in the social condition.
Neuroimaging results. A sharp decrease of the CPR (removal of 6 fish due to over-exploitation by others or extensive migration) was associated with stronger ventral striatum deactivation than a moderate decrease (removal of 4 or fewer fish) in both conditions (Figure 4A, Table S2). To further test the hypothesis that the ventral striatum monitors resource changes differently in social and non-social contexts, we conducted a more detailed parametric analysis. Using the total number of fish removed from the lake in each trial (by all parties) as the modulation parameter, we found an effect of the total resource change on ventral striatum activity: ventral striatum activity correlated negatively with CPR depletion (total decrease of the CPR; Figure 5A, middle). This resource-monitoring modulation of right ventral striatum activity was stronger in the social than in the non-social condition (Table S3).
As shown in the lower part of Figure 5, the over-exploitation of the CPR was predicted by our social learning model. Using parametric fMRI analyses, we investigated the modulation of ventral striatum and vmPFC activity by different versions of the RPE (Fig. 4A right, and These results indicated that the dopaminergic regions monitor resources differently in the social and non-social conditions. Moreover, the activity of the right ventral striatum was sensitive to the social comparison of outcomes during CPR depletion. In the social condition, we observed positive task-related functional connectivity (sharp depletion < moderate depletion) between the ventral striatum and the anterior dorsolateral prefrontal cortex (anterior DLPFC): both regions decreased their activity in response to over-exploitation of the resource by others (Figure 4, Table S4). In the non-social condition, anterior DLPFC-ventral striatum connectivity was reduced, reflecting a trend toward negative connectivity (Figure S4). Interestingly, in the non-social condition, connectivity strength was anti-correlated with the tendency to preserve the resource. Thus, the anterior DLPFC could be involved in regulating ventral striatum activity in non-social contexts, whereas this control may be suppressed during social competition.

Discussion
The current study explores differences in how people deal with a private good as compared to a common/public good. The results of our study indicate that during the
) and policy (action selection) equation are common to all models (except the null).

Non-social learning | 3N | Payoff and Sustainability

Figure S1 | The optimal behavioral strategy to maximize payoff in the non-social condition. Belief 1: assuming equal prior beliefs for the three possible migration rates. Belief 2: assuming beliefs corresponding to the actual migration rates. Belief 3: assuming equal prior beliefs for the three possible migration rates in the first round of all games, updated in the following rounds.

Local maxima within these clusters are reported together with the number of voxels (No. of Voxels); BA, Brodmann area; x, y, z are MNI coordinates of the local maximum.
Table S3 | Brain regions parametrically modulated by the social and non-social prediction errors in the social and private conditions, correspondingly (whole-brain analysis).