Introduction

Humans are unique and sophisticated social beings (Herrmann, Call, Hernandez-Lloreda, Hare, & Tomasello, 2007) whose daily interactions require the ability to decipher and learn from a range of social signals. The impact of these signals is magnified during adolescence, a developmental period in which the social environment is shifting, with more time spent with peers and less time with parents (Larson & Richards, 1991). This change is associated with a tendency to rely on peers rather than parents for guidance and approval. Perhaps it is not surprising that adolescents, as compared with children and adults, show increased attention and neural activation in response to peer acceptance (Guyer, Choate, Pine, & Nelson, 2012; Silk et al., 2012). Feelings of relatedness with others and perceived acceptance during adolescence are associated with higher self-esteem, better adjustment in school, and greater self worth (Rudolph, Caldwell, & Conley, 2005; Vanhalst, Luyckx, Scholte, Engels, & Goossens, 2013; Wentzel & Caldwell, 1997). In contrast, peer rejection in the adolescent is associated with school withdrawal, aggression, and mental health problems (Dodge et al., 2003; Laird, Jordan, Dodge, Pettit, & Bates, 2001; Prinstein & Aikins, 2004; Veronneau, Vitaro, Brendgen, Dishion, & Tremblay, 2010; White & Kistner, 2011). Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period and how peers can impact quality of life and outcomes in the adolescent.

Social contexts are acutely salient to adolescents, which can ultimately lead to altered decision-making abilities around one’s peers (Blakemore & Mills, 2013; Somerville, 2013; Steinberg, 2008). Having peers in a car increases accident rates in adolescents, but not adults (Chen, Baker, Braver, & Li, 2000), and the presence of peers increases risky decision making in adolescents, relative to children and adults (Chein, Albert, O'Brien, Uckert, & Steinberg, 2011; Gardner & Steinberg, 2005; Weigard, Chein, Albert, Smith, & Steinberg, 2014). Importantly, adolescents who feel rejected by their peers are more likely to engage in risky behaviors in order to fit in with the group (La Greca, Prinstein, & Fetter, 2001). Thus, identifying how adolescents differ from children and adults in the way they process and learn from social feedback is central to understanding the link between social acceptance and risky behavior observed during adolescence.

It has been suggested that feedback from peers serves as a reinforcer to influence behavior. This hypothesis is supported by a growing body of work demonstrating overlapping neural circuitry for evaluating social (praise, gain in reputation, positive affect) and nonsocial (juice, money) rewards (Bhanji & Delgado, 2013; Fareri, Niznikiewicz, Lee, & Delgado, 2012; Izuma, Saito, & Sadato, 2008; Lin, Adolphs, & Rangel, 2012; Meshi, Morawetz, & Heekeren, 2013; Rademacher et al., 2010; van den Bos, McClure, Harris, Fiske, & Cohen, 2007). Our recent work in adults demonstrated that the ventral striatum supported learning from varying amounts of positive social feedback from peers (Jones et al., 2011) and that the most reinforcing peers had a greater influence on social preferences and reaction times, as one would predict with traditional reinforcement learning theory.

The goal of the present study is to evaluate differences across age in social reinforcement learning from peers. Recent work has shown that adolescent reaction times, choice behavior, and neural activity in the ventral striatum are hypersensitive to rewarding stimuli, as compared with children and adults (Cauffman et al., 2010; Cohen et al., 2010; Galvan et al., 2006; Geier, Terwilliger, Teslovich, Velanova, & Luna, 2010; Van Leijenhorst et al., 2010), with greater positive prediction error signals in the ventral striatum to large monetary rewards (Cohen et al., 2010). Therefore, one hypothesis is that adolescents’ learning will be hypersensitive to the receipt of positive reinforcement, reflected by higher positive learning rates and greater activity within the ventral striatum during prediction error learning, as compared with children and adults. Alternatively, recent work by Crone and colleagues suggests similar neural patterns across age during prediction errors (van den Bos, Cohen, Kahnt, & Crone, 2012) but increasing functional connectivity between the ventral striatum and prefrontal cortex with age. Therefore, developmental changes in decision making may be related to differences not in reward-related learning signals per se but, rather, in how these signals can guide expectations and behavior (van den Bos et al., 2012). At different ages, incentives and outcomes can have a differential influence on impulse control (Teslovich et al., 2014), and this developmental change in impulsivity to incentives like points or money may be true for how learning behavior is influenced by social positive feedback from peers. 
To address these alternatives, exploratory neuroimaging analyses were conducted to determine whether circuitry that processes affective salience, which is elevated in adolescence during appetitive and social processing (Guyer et al., 2012; Guyer, McClure-Tone, Shiffrin, Pine, & Nelson, 2009; Masten et al., 2009; Somerville, 2013) was uniquely elevated in adolescents, relative to children and adults, when processing positive social feedback.

Testing 8- to 25-year-old participants with a previously established paradigm (Jones et al., 2011), the present study sought to determine whether adolescents, as compared with children and adults, differentially learn to associate different peers with distinct probabilities of receiving positive feedback and measured neural responses related to learning processes using fMRI. We used a traditional reinforcement learning model, Rescorla–Wagner (Rescorla & Wagner, 1972), to compute positive and negative learning rates, which were used to model trial-by-trial neural responses to prediction errors and cue values. Two continuous age predictors, one that tested for linear age effects and a second that tested for quadratic effects (that peak or trough in adolescence), were used to test for age differences in response to varying amounts of positive social feedback.

Method

Participants

One hundred twenty-five healthy participants 8–25 years of age completed the behavioral task. Ninety-five individuals completed the task during fMRI scanning. Usable data were obtained from n = 120 individuals for the behavioral analyses and n = 78 for the fMRI analysis (see Table 1). No participant had a history of neurological or psychiatric disorders, based upon parent or self-report with the Structured Clinical Interview for DSM–IV Axis I Disorders or the Kiddie–Schedule for Affective Disorders and Schizophrenia for School Age Children–Present and Lifetime Version. Estimated IQ as measured by the Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999) did not differ by age and is reported in Table 1. Participants provided informed written consent (parental consent and participant assent for minors) approved by the Institutional Review Board of Weill Cornell Medical College. All participants were compensated following their participation.

Table 1 Age and gender demographics of participants included in behavioral (left) and fMRI (right) analyses

A subset of 95 participants completed the task during fMRI scanning. Participants were eligible for fMRI if they were right-handed, had no metal implants, including braces or metal retainers, and had no reported history of claustrophobia. Children and adolescents interested in MRI were first acclimated to the scanner environment in a mock MRI scanner, while being trained to remain still inside of the MRI environment. If participants were ineligible for MRI, did not pass the mock scanning session, or were not interested in undergoing fMRI, they completed the task outside of the scanner, contributing to behavioral data analyses. Data from five individuals were eliminated due to less than 60 % accuracy within a task condition. For participants who completed the scan, further exclusion criteria for head motion are described in the fMRI preprocessing methods section below. For a summary of demographics for the final sample of participants included in the behavioral and fMRI analyses, see Table 1.

Experiment cover story

The experiment was conducted during two separate sessions and is described in Jones et al. (2011). The first session introduced the cover story leading participants to believe that they would receive actual social feedback from peers during a task that would be completed on the second visit. Participants were shown up to five photographs of gender-, age-, and ethnicity-matched peers. They then selected three peers with whom they would like to interact and rated the peers on a scale from 1 (not very) to 10 (very) for how likeable and attractive they looked. Participants also completed a personal survey where they listed information about themselves (birthday, hometown, and favorite music, TV shows, books, quotes, and activities). Participants were told that each of the three selected peers would see their survey over the next few days, as well as the surveys of 2 other supposed participants. These three peers would write notes indicating a positive interest in the participant’s survey or in one of the other two surveys. Participants were told that each of these individuals could write a small number of notes, emphasizing their limited number and enhancing the positive value of receiving a note. Participants were then scheduled for a second session.

At the second session, participants were told that the experimenters had compiled notes from the three selected peers and they would be shown how often each of the peers decided to write notes to them (positive social reinforcement) or to one of the other supposed participants (no positive social reinforcement). At the beginning of the second session, participants were reminded that receiving a note indicated that the peer was interested in something written in their personal survey.

Unbeknownst to the participants, peer interaction (i.e., delivery of notes) was experimentally manipulated so that each of the three peers was associated with a distinct probability of social reinforcement (Fig. 1a): (1) rare interaction, defined by positive social reinforcement (notes) on 33 % of the trials and no positive social reinforcement on 66 % of the trials; (2) frequent interaction, defined by positive social reinforcement on 66 % of the trials and no positive social reinforcement on 33 % of the trials; and (3) continuous interaction, defined by positive social reinforcement on all trials (100 %). This contingency structure was based upon studies in nonhuman primates (Fiorillo, Tobler, & Schultz, 2003; Schultz, Dayan, & Montague, 1997). The probability of reinforcement associated with each of the face stimuli was counterbalanced across participants to equate for low-level stimulus features across conditions.

Fig. 1

Task parameters. (a) Three peers chosen by the participant are associated with distinct probabilities of positive reinforcement. (b) Schematic of one trial within a run. The face of one peer (cue) is displayed for 2 s, during which the face stimulus winks (500 ms) and participants press one of two buttons indicating in which eye the wink occurred, followed by a variable interstimulus interval, followed by the note outcome (feedback). In this example, the participant receives the note (positive social reinforcement) because it appears in the middle hand. If the note appears in one of the hands to the left or to the right of the middle hand, this indicates that the participant did not receive the note (no positive social reinforcement).

Task parameters

At the start of each trial (Fig. 1b), a picture of one of the three peers was presented for 2 s (cue). During the 2 s, the stimulus would wink for 500 ms in either the left or the right eye, indicating that a note was ready to be passed. Participants signaled that they were ready to receive the note by pressing one of two buttons indicating whether the wink was in the left or the right eye. This behavioral element was included to ensure attention to the cues and to acquire an objective reaction time measure of learning about the reinforcement contingencies for each of the three peers across the experiment. After a jittered interstimulus interval of a picture of a folded note (2, 4, 6, or 8 s), three hands appeared at the bottom of the screen, with one hand holding a note for 2 s (feedback). Participants had been instructed that if the middle hand held the note, this signified that the participant had received a note from that peer (positive social reinforcement). If the note appeared in one of the hands to the left or right of the middle hand, this signified that the note was given to someone else (no positive social reinforcement). If the participant pressed incorrectly or did not respond during the cue, no feedback was given. A jittered intertrial interval (2, 4, 6, or 8 s) followed, where participants rested while viewing a fixation crosshair. Participants viewed 18 trials per run in a pseudorandomized order, with 6 trials per condition (rare, frequent, continuous) for six runs, for a total of 108 trials, 36 trials per condition. To enhance the believability of the cover story and keep participants engaged, one of the supposed “notes” was shown between runs; these notes were generated by the experimenters and always indicated positive interest in the participant’s personal survey (e.g., “I love playing football too, and I am on my school’s team”; “Where did you go when you visited California?”; “I also love the book The Secret Garden”).
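The run structure described above can be sketched in a few lines of Python. This is only an illustration and not the E-Prime script used in the study. It assumes that the reinforcement counts were fixed within each run at 2, 4, and 6 of the 6 trials for the rare, frequent, and continuous peers (the text reports only the overall percentages), it uses a plain shuffle in place of whatever pseudorandomization constraints were actually applied, and the name `make_run` is hypothetical.

```python
import random

# Assumed number of reinforced trials out of 6 per run (33 %, 66 %, 100 %).
REINFORCED_PER_RUN = {"rare": 2, "frequent": 4, "continuous": 6}
TRIALS_PER_CONDITION = 6
N_RUNS = 6


def make_run(rng):
    """Return one shuffled run of 18 (condition, reinforced) trials."""
    trials = []
    for condition, n_reinforced in REINFORCED_PER_RUN.items():
        n_unreinforced = TRIALS_PER_CONDITION - n_reinforced
        outcomes = [True] * n_reinforced + [False] * n_unreinforced
        trials.extend((condition, outcome) for outcome in outcomes)
    rng.shuffle(trials)  # simple shuffle as a stand-in for pseudorandomization
    return trials


rng = random.Random(0)  # fixed seed so the illustration is reproducible
schedule = [make_run(rng) for _ in range(N_RUNS)]  # six runs, 108 trials
```

Under these assumptions each peer appears 36 times across the six runs, and the rare, frequent, and continuous peers deliver a note on 12, 24, and 36 of those trials, respectively.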

To further index learning with the reaction time data, at the end of the experiment, after the six experimental runs, participants completed a reversal run (18 trials), during which reaction times were recorded. Contingencies were reversed for the rare and continuous conditions such that the rare peer now provided 100 % reinforcement and the continuous peer now provided 33 % reinforcement to the participant. The frequent peer’s probability (66 %) did not change.

The task was presented using E-Prime software, and the participants who completed the task during fMRI viewed images on an overhead liquid crystal display panel with the Integrated Functional Imaging System–Stand Alone (IFIS–SA) (fMRI Devices Corporation, Waukesha, WI). E-Prime software, integrated with the IFIS system, recorded button responses and reaction times using the Fiber Optic Button Response System (Psychology Software Tools, Inc., Sharpsburg, PA).

At the end of the experiment, participants completed posttest ratings of attractiveness and likeability for each peer on the same scale as that used at the beginning of the experiment. All participants expressed that they believed the interaction was real and that they were actually receiving notes. To assess whether participants held explicit knowledge of the social reinforcement contingencies associated with each peer, they were asked whether any of the three peers provided positive reinforcement more often than any others. If the participant said yes, they were asked to describe what pattern they noticed, and descriptions were scored on the basis of whether the participant accurately stated which peer provided the most, middle, and least positive social feedback. Eight individuals (ages: 8, 11, 13, 14, 15, and three 22-year-olds) correctly ranked the three peers in this way and were thus considered explicitly aware of the social reinforcement contingencies. Behavioral results did not change when these participants were excluded from the analysis. Participants were then debriefed regarding the cover story and the rationale of the experiment.

Image acquisition

Participants were scanned with a General Electric Signa HDx 3.0T MRI scanner (General Electric Medical Systems, Milwaukee, WI) with a quadrature head coil. A high resolution, 3-D magnetization prepared rapid acquisition gradient echo anatomical scan was acquired (256 × 256 in-plane resolution, FOV = 240 mm; one hundred twenty-four 1.5-mm sagittal slices). Blood oxygenation level dependent (BOLD) functional scans were acquired with a spiral in and out sequence (Glover & Thomason, 2004) (repetition time TR = 2,000 ms, echo time = 30 ms, flip angle = 90°). Twenty-nine 5-mm-thick contiguous coronal slices were acquired per TR, for 129 TRs per functional run with a resolution of 3.125 × 3.125 mm (64 × 64 matrix, FOV = 200 mm) covering the entire brain except for the posterior portion of the occipital lobe.

Data analysis

Age effects

The goal of the present study was to determine whether there were developmental differences in learning the positive reinforcement contingencies associated with each peer. Dependent variables were analyzed for two distinct patterns of continuous age contingent changes: (1) quadratic, representing U or inverted-U effects for which adolescents differ from both children and adults and (2) linear, progressively increasing or decreasing age effects. A linear function was calculated by mean-centering age (in the behavioral sample, M = 15.86 years; in the fMRI sample, M = 16.69 years), and a quadratic function was calculated by squaring the mean-centered linear age variable. Dependent measures were entered into linear regression with the two continuous age predictors to determine whether linear or quadratic age differences explained variance in the data. Interactions between the task condition variables and age were tested using the continuous age predictors as covariates in analyses. Given previous work demonstrating sex differences in sensitivity to processing social feedback in adolescence (Guyer et al., 2009), significant age effects were also tested for additional modulation of participant sex.
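As a minimal sketch of how the two age predictors are constructed, the Python snippet below mean-centers age and squares the centered values. The ages are made up for illustration and are not study data, and the function name `age_predictors` is hypothetical.

```python
from statistics import mean


def age_predictors(ages):
    """Return (linear, quadratic): mean-centered age and its square."""
    m = mean(ages)
    linear = [a - m for a in ages]
    quadratic = [x ** 2 for x in linear]
    return linear, quadratic


ages = [8, 12, 16, 20, 24]  # illustrative ages only
linear, quadratic = age_predictors(ages)
```

Because the quadratic term is smallest near the sample mean (mid-adolescence here) and largest at both ends of the age range, a positive regression weight on it corresponds to a trough in adolescence and a negative weight to a peak.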

To analyze age-independent effects, additional analyses included comparing the dependent variables without the inclusion of the age predictors to find main effects of receiving varying amounts of positive social feedback on behavioral measures (preference ratings, accuracy, reaction times, and learning rates) and fMRI (cue and feedback portions of the trials). Statistical calculations for behavioral measures were conducted in PASW Statistics 19 software (SPSS, Chicago, IL).

Preference ratings

A difference measure was generated for the attractiveness and likeability ratings of the peers before and after the task by subtracting the preinteraction score from the postinteraction score. Main effects of age, probability of reinforcement (rare, frequent, continuous), and age × probability interactions on preference ratings (post- minus pretask) were assessed using a 1 × 3 repeated measures analysis of variance (ANOVA) with continuous linear and quadratic age predictors included as covariates. Post hoc analyses were performed with paired sample t-tests, and p < .05, two tailed, was considered significant. Three of 120 individuals (all 22 years of age) were missing both attractiveness and likeability ratings, and four individuals (9, 10, 13, and 17 years of age) were missing attractiveness ratings.

Mean accuracy and reaction times

Mean accuracy (correctly indicating whether the right or the left eye winked) was calculated for each probability (rare, frequent, continuous) for each participant. Accuracy analyses were completed as described above with a 1 × 3 repeated measures ANOVA with continuous linear and quadratic age predictors included as covariates. Reaction times to the cue after the wink occurred were z-score transformed to each individual’s mean and standard deviation after first removing outliers (defined as reaction times more than 3 standard deviations above or below the individual’s mean reaction time) and log-transforming each reaction time to satisfy normality assumptions. To test for reaction time modulation as a function of contingency reversal, we computed a difference score by subtracting the average z-scores in the final (sixth) run of the experiment from the average z-scores in the reversal run, separately for the two conditions in which the probabilities reversed (rare and continuous). The sixth run had the same number of trials as the reversal run. One participant (13 years of age) was missing data from the reversal run. Prior work (O'Doherty, Buchanan, Seymour, & Dolan, 2006) and adult behavior on this paradigm (Jones et al., 2011) demonstrated increased speeding by the late trials toward cues that provide the most reinforcement, as compared with those that provide less reinforcement. The z-scored reaction times from the late trials, defined as the final third of the experiment (fifth and sixth runs), were averaged by cue (rare, frequent, and continuous) and were used only in exploratory correlations with parameter estimates from the neuroimaging data (described in greater detail in the Neuroimaging analyses independent of reinforcement learning model section).
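The reaction time pipeline can be sketched as follows. This is one reading of the description, not the authors' code: outliers are removed on the raw reaction times, the remaining values are log-transformed, and the result is z-scored within participant. The use of the sample standard deviation and the function names `preprocess_rts` and `reversal_difference` are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def preprocess_rts(rts_ms):
    """Remove 3-SD outliers, log-transform, then z-score one participant's RTs."""
    m, sd = mean(rts_ms), stdev(rts_ms)
    # Keep reaction times within 3 SD of this participant's mean.
    kept = [rt for rt in rts_ms if abs(rt - m) <= 3 * sd]
    logged = [math.log(rt) for rt in kept]
    log_mean, log_sd = mean(logged), stdev(logged)
    return [(x - log_mean) / log_sd for x in logged]


def reversal_difference(z_reversal, z_run6):
    """Mean z-scored RT in the reversal run minus the mean in run 6 (one condition)."""
    return mean(z_reversal) - mean(z_run6)


# Illustrative data: 20 plausible RTs plus one extreme value that is removed.
example_rts = [400 + 5 * i for i in range(20)] + [5000]
z_scores = preprocess_rts(example_rts)
```

With this sign convention, a positive difference score for a condition means slower responding in the reversal run than in the sixth run.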

Reinforcement learning model

We used a simple reinforcement learning algorithm (Rescorla–Wagner) to model the trial-by-trial variance in participants’ reaction times (Rescorla & Wagner, 1972). The Rescorla–Wagner rule probes learning through a prediction error (PE) signal $\delta$, which is the difference between the experienced outcome ($R$: positive social feedback or no positive feedback) and expected outcome ($V$) for each trial. PE takes the form of $\delta = R - V$ and can be used to subsequently update expected outcome weighted by a fixed learning rate $\alpha$: $V_{t+1} = V_t + \alpha\delta_t$ for given trial $t$. Reaction time has been shown in previous studies to be a reliable indicator of learning contingencies and speeding to cues predicting higher value and slowing to cues predicting lesser value has been associated with conditioning as predicted by reinforcement learning models (Bray & O'Doherty, 2007; Seymour et al., 2004). We extended the standard Rescorla–Wagner learning model and used separate learning rates for positive social feedback ($\alpha^{+}$) and no positive social feedback ($\alpha^{-}$) (Caze & van der Meer, 2013; Kahnt et al., 2009):

$$ V_{t+1} = \begin{cases} V_t + \alpha^{+}\delta_t, & \text{if } \delta_t \ge 0 \\ V_t + \alpha^{-}\delta_t, & \text{if } \delta_t < 0 \end{cases} $$

We separately estimated learning parameters for the two types of feedback, since previous reinforcement learning studies have shown developmental differences in learning from positive and negative feedback (Christakou et al., 2013; van den Bos et al., 2012). While we are labeling $\alpha^{-}$ as negative, we acknowledge that this parameter represents updating of value based on no positive social reinforcement, rather than to overtly negative outcomes as in prior studies. The Rescorla–Wagner model was fit to each participant’s trial-by-trial z-score transformed logarithmic reaction times [log(RT)] using a maximum-likelihood estimation algorithm to derive the best-fitting model parameters ($\alpha^{+}$, $\alpha^{-}$, and initial $V$) for each participant.
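The two-rate update rule can be illustrated with a short Python sketch. It covers only the value update for a single peer and makes several assumptions that the text does not specify: outcomes are coded as 1 (note received) and 0 (no note), each peer's value is tracked separately, and the function name `rescorla_wagner` is hypothetical. The mapping from value to reaction time and the maximum-likelihood fitting step are not shown.

```python
def rescorla_wagner(outcomes, alpha_pos, alpha_neg, v0=0.0):
    """Return per-trial expected values V_t and prediction errors delta_t.

    outcomes: sequence of 1 (note received) or 0 (no note) for one peer.
    alpha_pos / alpha_neg: learning rates for delta >= 0 and delta < 0.
    v0: initial expected value.
    """
    v = v0
    values, errors = [], []
    for r in outcomes:
        delta = r - v  # prediction error: delta_t = R_t - V_t
        alpha = alpha_pos if delta >= 0 else alpha_neg
        values.append(v)
        errors.append(delta)
        v = v + alpha * delta  # V_{t+1} = V_t + alpha * delta_t
    return values, errors


# Illustrative parameter values, not estimates from the study.
values, errors = rescorla_wagner([1, 1, 0, 1], alpha_pos=0.5, alpha_neg=0.25)
# values -> [0.0, 0.5, 0.75, 0.5625]
# errors -> [1.0, 0.5, -0.75, 0.4375]
```

In this toy sequence the expected value climbs quickly after the two notes and drops only modestly after the omitted note, because the assumed rate for negative prediction errors is half the rate for positive ones.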

Similar to previous developmental studies (Christakou et al., 2013; van den Bos et al., 2012), differences in the rate of learning from positive social reinforcement and from no positive social reinforcement were modeled separately. To determine age differences, we examined $\alpha^{+}$ and $\alpha^{-}$ as the dependent variables in separate multiple regression analyses testing whether linear and/or quadratic age explained a significant portion of variance in positive or negative learning rates. Sex was added as a regressor to models where there was a significant effect of age. To determine whether higher learning rates correspond to quick behavior changes based upon the amount of positive feedback, the significant age effects on α were further interrogated with post hoc correlations with the difference scores between reaction times for pre- relative to postreversal cues (rare and continuous). A Bonferroni-adjusted critical α = 0.025 controlled for multiple tests with the two reversal conditions.

Neuroimaging preprocessing and first-level modeling

Functional images were slice-time corrected and realigned to the first volume using six-parameter rigid body transformation. Given the developmental sample, analyses minimized the influence of participant motion on fMRI signal. Functional volumes were flagged for excessive motion if associated with head movement exceeding 1.56 mm (half a voxel) in any plane, relative to the volume before it. Thirty participants had data that were flagged on the basis of these criteria. Twelve individuals had motion within a single TR that was greater than 4.99 mm and were excluded from analyses. Remaining individuals were included, but TRs with motion between 1.57 and 4.99 mm were censored from first-level general linear model (GLM) analyses (mean motion = 3.42 mm, standard deviation = 1.04 mm; number of censored TRs for each individual was less than 5 %). See Table 1 for demographics of the imaging sample.
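The motion criteria amount to a two-threshold rule, sketched below in Python. The thresholds come from the text; everything else is assumed for illustration, including the function name `classify_motion` and the input format (one precomputed value per volume giving the largest movement in any plane relative to the preceding volume).

```python
FLAG_MM = 1.56     # half a voxel; volumes above this are flagged
EXCLUDE_MM = 4.99  # any volume above this excludes the participant


def classify_motion(displacements_mm):
    """Return (excluded, flagged_indices) for one participant.

    displacements_mm: per-volume maximum movement in any plane, in mm,
    relative to the preceding volume (assumed to be precomputed).
    """
    excluded = any(d > EXCLUDE_MM for d in displacements_mm)
    flagged = [i for i, d in enumerate(displacements_mm) if d > FLAG_MM]
    return excluded, flagged


# Illustrative values only.
excluded, flagged = classify_motion([0.2, 1.8, 0.4, 3.0])
# excluded -> False; flagged -> [1, 3], the volumes censored from the GLM
```

For a participant who is not excluded, the flagged indices are the TRs that would be censored from the first-level model.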

Anatomical and functional data sets were spatially coregistered. Both sets of images were warped to Talairach and Tournoux (Talairach & Tournoux, 1988) coordinate space by applying the warping parameters obtained from the transformation of each subject’s high-resolution anatomical scan using a 12-parameter affine transformation to a template volume (TT_N27). Talairach-transformed functional images were smoothed with an isotropic 6-mm Gaussian kernel and resampled to a resolution of 3 × 3 × 3 mm.

Reinforcement learning model neuroimaging analyses

A GLM analysis was performed to estimate neural responses to stimuli as a function of reinforcement learning. Individual participant learning rate ($\alpha^{+}$, $\alpha^{-}$), prediction error ($\delta_t$), and cue value ($V_t$) parameters from the reinforcement learning models were included as parametric regressors with signed numbers in individual-subject GLMs. Each participant’s GLM contained five task regressors: (1) cue onset times, defined as the time points at which peer faces were presented; (2) a parametric regressor paired with cue timings containing value estimates for each trial ($V_t$); (3) feedback onset times, containing values corresponding to the time points at which the note feedback was presented; (4) a parametric regressor paired with feedback onset time representing prediction error values ($\delta_t$); and (5) incorrect trial onset times. Task regressors were convolved with a gamma-variate hemodynamic response function. Regressors of noninterest included motion parameters and linear and quadratic trends for each run to account for correlated drift and residual motion effects. In order to isolate positive prediction errors from negative prediction errors, a second set of first-level general linear model analyses were performed as described above, but with feedback trials divided and modeled on the basis of the two types of prediction errors: positive prediction error ($\delta^{+}$) and negative prediction error ($\delta^{-}$).

Following GLM estimation for each participant, we generated group random effects statistical maps for prediction error and cue learning value using the beta estimates for the parametric regressor representing prediction error values ($\delta_t$) and values to the cues ($V_t$). To test for main effects across all participants, separate within-subjects voxel-wise one-sample t-tests were performed to identify regions demonstrating activity that positively or negatively correlated with prediction error and that positively or negatively correlated with cue value based on learning history. To test for age effects, the linear and quadratic age predictor variables were entered as separate covariates on the parametric regressor that represented prediction errors, and separate age analyses were conducted on the parametric regressor that represented cue values. Follow-up analyses with the parametric regressors representing positive prediction error $\delta^{+}$ and negative prediction error $\delta^{-}$ were performed only in instances where there were significant age effects with either $\alpha^{+}$ or $\alpha^{-}$. To generate statistical maps that corresponded to the age effects observed with the behavioral learning rates, the linear and quadratic age predictor variables were entered as separate covariates on the parametric regressor representing positive prediction error $\delta^{+}$ and negative prediction error $\delta^{-}$.

Neuroimaging analyses independent of reinforcement learning model

Motivated by the age differences in the explanatory power of the reinforcement learning framework, we focused on neural activation patterns to the receipt of positive feedback that were independent of reinforcement learning parameters. Each participant’s GLM contained task regressors as described above, but without parametric modulation from the reinforcement learning model. Exploratory random effects group analyses were conducted on individual participant beta estimates for the regressor representing the receipt of positive reinforcement relative to baseline, with a single within-subjects voxel-wise one-sample t-test and the quadratic age predictor as a covariate. This analysis was performed to identify regions demonstrating activity to the receipt of positive social feedback that was unique to adolescents.

Two of the regions identified in the whole brain corrected analysis, the putamen and supplementary motor area, are regions involved in planning self-initiated movement (Alexander & Crutcher, 1990; Wiese et al., 2004). In order to understand whether activity in these regions corresponded to participants’ behavioral responses, we examined the relationship between z-scored reaction time data from the late trials and beta estimates from the supplementary motor area and putamen. We targeted the late trials because they reflect when participants have had the opportunity to learn the reinforcement contingencies associated with the three peers and allowed us to test whether neural motor activity corresponded to behavioral responses after learning. Correlations were performed between the z-scored reaction times for the three peer contingencies (rare, frequent, and continuous) and the beta estimates from the putamen and supplementary motor area, corrected for six distinct tests (Bonferroni-adjusted critical α = 0.008).

Results of all whole-brain analyses were considered significant by exceeding a p-value/cluster size combination that corresponded to whole-brain p < .05, corrected for multiple comparisons as calculated with 3dClustSim in AFNI (p < .005/49 voxels). For the main effect of prediction error across all participants, the surviving cluster extended from the prefrontal cortex into the striatum, as displayed in Fig. 3, with the peak in the prefrontal cortex. We used an anatomical mask of the striatum that included the entire caudate, putamen, and ventral striatum in order to identify a subpeak within this cluster, which was identified in the ventral striatum (x = −7, y = 8, z = 2).

All significant effects were plotted for inspection and possible outliers by extracting parameter estimates for each participant from a 6-mm (29-voxel) spherical region of interest around the cluster peak. Parameter estimates were also used in analyses to test possible age and sex differences and used to rule out potential age confounds in signal-to-noise ratio (SNR).

SNRs were calculated in order to determine whether age differences remained significant when accounting for differences in SNR across participants. For each participant, the ratio was computed with the mean baseline estimate from the GLMs divided by the standard deviation from the residual time series (Johnstone et al., 2005; Somerville et al., 2013). SNR values were calculated for each participant within regions of interest that were derived from the age differences maps. Partial correlation analyses tested whether age effects on the insula (Fig. 4), age effects on the putamen and supplementary motor area (Fig. 5), correlations between the insula and putamen (Fig. 5c), and correlations between the supplementary motor area and late trial z-scored reaction times (Fig. 5d) remained significant when controlling for SNR.
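The SNR definition above reduces to a single ratio, sketched here in Python. The inputs are assumed to have been extracted already from a region of interest; the function name `roi_snr`, the use of the sample standard deviation, and the example numbers are illustrative assumptions rather than details from the study.

```python
from statistics import mean, stdev


def roi_snr(baseline_estimates, residuals):
    """Mean GLM baseline estimate divided by the SD of the residual time series."""
    return mean(baseline_estimates) / stdev(residuals)


# Illustrative values only: three baseline estimates and a short residual series.
example_snr = roi_snr([100.0, 102.0, 98.0], [-2.0, 0.0, 2.0])
```

Each participant's value could then be entered as the control variable in the partial correlation analyses described above.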

Results

Behavioral data

Likeability and attractiveness ratings

An analysis of differential likeability ratings (posttask relative to pretask) revealed a main effect of reinforcement probability, F(2, 228) = 5.64, p < .01, indicating that preferences for the three peers changed differently after the task. Post hoc analyses demonstrated that participants liked peers who gave them continuous positive social feedback more than those who rarely gave them positive social feedback, t(116) = 3.45, p < .01. At marginal significance, t(116) = 1.971, p = .051, participants also liked peers who gave them frequent (66 %) positive feedback more than those who rarely (33 %) gave them positive feedback. There was no significant difference in preference ratings between the continuous peer and the frequent peer (p > .24). There were no interaction effects with age on likeability ratings (all ps > .38). Attractiveness ratings of the peers did not significantly change from before to after the task (main effect of attractiveness, p > .58). There was a significant interaction between linear age and probability of reinforcement on attractiveness ratings, F(2, 220) = 3.18, p < .05, but post hoc correlations were not significant (ps > .13).

Accuracy

Accuracy of detecting the wink in the left or right eye was high for all participants (M = 94 %, SD = 4.9 %), and as expected, accuracy increased with age, with a main effect of linear age, F(1, 117) = 11.66, p < .01. Regardless of age, task accuracy was modulated by the amount of positive social reinforcement that participants received, F(2, 234) = 19.45, p < .01. Post hoc analyses showed that participants were less accurate in their button response to the rare peer (M = 91.17 %, SD = 6.16 %), as compared with the frequent peer (M = 94.03 %, SD = 6.80 %), t(119) = 4.87, p < .01, and with the continuous peer (M = 94.61 %, SD = 5.38 %), t(119) = 6.38, p < .01. There was no difference in accuracy between the frequent and continuous peers (p > .36). There were no significant interactions between peer contingency and age on accuracy (ps > .78).

Reinforcement learning

A linear regression testing the age predictors on the omnibus reinforcement learning model fit (β) did not demonstrate linear or quadratic age main effects (ps > .42), confirming that parameter estimations could be compared across age. Linear regressions testing for age effects on individual parameter weights yielded a significant fit with the quadratic model on α + (β = 0.22, p < .02), with adolescents demonstrating lower positive learning rates than children and adults. This model also demonstrated a significant fit of the linear age predictor on α + (β = −0.26, p < .01), with increasing age predicting lower α +.

In order to understand how individuals who had a zero positive learning rate impacted the findings, we removed individuals for whom α + = 0, a value indicating no change in reaction times on trials following positive prediction errors (the index of learning from positive feedback). There were 20 individuals who had a zero positive learning rate; their ages are plotted in Fig. 2b, and inspection of the data indicated that the majority of these individuals were adolescents (see Supplementary Fig. 1a for the full age distribution). With these 20 individuals removed from the analysis, the quadratic age fit on α + remained significant (β = 0.21, p < .04) (Fig. 2a), as did the linear age fit on α + (β = −0.27, p < .01). There was no effect of sex on α + (p = .42). Neither age fit predicted variance in negative learning rates α - (ps > .09; see Supplementary Fig. 1b for the age distribution of individuals with α - = 0).

Fig. 2

Behavioral data. a Positive learning rate, α +, shows a quadratic fit with age demonstrating that adolescents, relative to children and adults, have lower α + values. b Age distribution plot of individuals with a zero α +. c α + is positively correlated with the change in reaction times from the final experimental run to the reversal run in the continuous condition. The relationship between learning rates and reversal reaction times demonstrates that individuals with higher learning rates are vigilant in tracking the varying amounts of positive social reinforcement, as reflected by quick behavior changes.

Given the age effects on α + demonstrating a reduced positive learning rate in adolescence, we conducted correlation analyses with the reversal data and positive learning rates in order to understand how quick changes in behavior may correspond to higher or lower positive learning rates. Participants who had higher α + values demonstrated greater slowing in their reaction times to the continuous peer, who provided less positive reinforcement during the reversal condition, r(117) = .23, p < .02. This effect remained significant after removing individuals who had an α + = 0, r(97) = .22, p < .03 (Fig. 2c). This positive correlation suggests that higher learning rates reflect vigilance to reinforcement contingencies, as indicated by a rapid change in behavior when the contingencies were reversed. The relationship between α + and the reversal reaction times for the rare peer was not significant after multiple comparison correction (p = .04; Bonferroni-adjusted α = .025).

Imaging

Cue values and prediction errors

Prediction error signals (δ t ) while processing the feedback portion of trials were positively associated with BOLD activity that extended from the medial prefrontal cortex (mPFC) to the striatum (see Fig. 3). There were no age differences at the cluster peak located in the mPFC (x = −1, y = 47, z = 8) or at the subpeak in the ventral striatum (x = −7, y = 8, z = 2) (all ps > .46). Additionally, whole-brain analyses demonstrated no linear or quadratic age-mediated patterns of neural activation. Additional regions that showed positive and negative correlations with prediction error signals (δ t ) are listed in Table 2.

Fig. 3

Regions demonstrating positive correlations with prediction errors (δ t ). Far right panel displays subpeak in a striatum mask. For all imaging pictures, R = L

Table 2 Regions demonstrating positive and negative correlations with prediction error (δ t ) and cue values (V t )

Motivated by behavioral findings indicating that adolescents show lower positive learning rates, relative to children and adults, we tested for neural activity that tracked with positive prediction errors (δ +) targeting adolescent-specific effects. We found that adolescents, relative to children and adults, demonstrated greater positive correlations in the anterior to mid insula (x = −40, y = 2, z = 2; 62 voxels) as a function of positive prediction error processing (Fig. 4). Adolescent-specific effects in the insula remained significant when controlling for SNR in this region. There were no gender differences in response in this region, and whole-brain analyses demonstrated no linear age effects for positive prediction errors.

Fig. 4

a Age differences for positive correlations with positive prediction errors (δ +). The insula was engaged more in adolescents, relative to children and adults. b The scatterplot displays the parameter estimates in the insula for positive prediction errors distributed by age for descriptive purposes. The line represents a quadratic fit. For all imaging pictures, R = L

Cue values (V t ) were positively associated with activity in the rostral anterior cingulate cortex (rACC) (x = −1, y = 47, z = 2; 102 voxels) that extended into the medial prefrontal cortex (see Supplementary Fig. 2), with greater activity in this region for larger cue values. No other regions demonstrated positive correlations with V t , and there were no age or gender differences in rACC response (ps > .35). Regions demonstrating a negative correlation with cue value are listed in Table 2. Analysis with the quadratic age predictor demonstrated that children and adults showed a greater positive correlation (U-shaped curve) with cue values (V t ) in the postcentral gyrus (x = −19, y = −40, z = 68; 92 voxels), the anterior caudate (x = −7, y = 20, z = 8; 60 voxels), and the uncus that extends into the amygdala (x = −22, y = 5, z = −22; 52 voxels), as compared with adolescents. There were no sex differences in these regions (ps > .36). The greater positive correlation (U-shaped curve) in these regions across age is consistent with the observed behavioral changes with age, in which adolescents demonstrated lower positive learning rates. Whole-brain analyses demonstrated no linear age effects on cue value learning.

Adolescent-specific response to positive social feedback without parametric modulation

Motivated by the finding that adolescents show lower positive learning rates, an additional set of GLMs was estimated using task timings and no learning parameters. These analyses were conducted to identify developmental shifts in the neural response pattern to receiving positive feedback, independent of learning-related parameters. Adolescents showed greater activity in the supplementary motor cortex and in the putamen when receiving positive social reinforcement, regardless of which peer gave the feedback (Table 3; Fig. 5). Adolescent-specific effects in the putamen and supplementary motor cortex remained significant when controlling for SNR within these regions. Greater parameter estimates in the insula during positive prediction error learning were positively correlated with greater activation in the putamen to positive feedback, r(76) = .27, p < .02, Bonferroni-adjusted α = .025 (Fig. 5c), but not with activation in the supplementary motor area (p = .44). The correlation between the insula and putamen remained when controlling for SNR in these two regions. It is important to note that activation in the supplementary motor area does not merely constitute carryover motor activation from the cue response. The supplementary motor area activation to the feedback portion of trials is spatially nonoverlapping with the peak primary motor activation observed in the cue portion of trials, and this primary motor activation during the cue demonstrated no age differences.

Table 3 Regions demonstrating adolescent-specific activation to the receipt of positive social feedback
Fig. 5

Age differences in activation to the receipt of positive social feedback. a Greater activity in the putamen and supplementary motor area (SMA) was found in adolescents, relative to children and adults. b The scatterplot displays the parameter estimates in the putamen and SMA for all positive social feedback distributed by age for descriptive purposes. The lines represent a quadratic fit. c Positive correlation between parameter estimates for positive prediction error in the insula and activation in the putamen. d Scatterplot showing the relationship between activation in the SMA and z-scored reaction times in the late trials for the least reinforcing peer. The negative association suggests that greater activation in the SMA corresponds to greater speeding to the peer who provides the least amount of positive feedback. For all imaging pictures, R = L

Exploratory analyses focused on understanding whether the adolescent-specific activation patterns in the supplementary motor cortex and the putamen corresponded to changes in participants’ behavior, as reflected by late trial z-scored reaction times. There was a trend for lower z-scores, which reflected faster reaction times, during the late trials to the rare cue to correspond with greater activity in the premotor cortex, r(76) = −.23, p < .039, Bonferroni-adjusted α = .008 (Fig. 5d), with no significant correlations between premotor activity and the frequent or continuous cue (ps > .23) or between putamen activity and any of the three cues (ps > .37). The trend for an association between reaction times to the rare cue in the late trials and activity in the premotor region remained when controlling for SNR in the premotor area. There were no significant sex differences in the supplementary motor cortex and putamen. Together, these results suggest that elevated activity within a motor circuit in adolescents when receiving positive social feedback is associated with speeding responses to cues of the least reinforcing peer.

Discussion

Using a paradigm that manipulated the probability of receiving positive social feedback, we observed adolescent-specific age differences in reinforcement learning behavior and neural response patterns. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents, as evidenced by lower positive learning rates and elevated activity in response planning circuitry to the receipt of positive feedback, regardless of the expected outcome. These behavioral and neural patterns support the hypothesis that adolescence is a period of unique sensitivity to peers but also suggest that adolescent behavior in social contexts is not explained by simple reinforcement learning theory.

Adolescents showed lower positive learning rates than did children and adults during social reinforcement learning, with reaction times serving as a behavioral index. Prior work has demonstrated age differences in behavioral performance (van Duijvenvoorde, Zanolie, Rombouts, Raijmakers, & Crone, 2008) and linear changes with age in reinforcement learning rates (Christakou et al., 2013; van den Bos et al., 2012) after receiving positive and negative feedback. Prior studies (Christakou et al., 2013; van den Bos et al., 2012) generated learning rates based on choice behavior and used nonsocial reinforcers (i.e., points or money). In the present study, positive learning rates showed a quadratic pattern. There are two possible explanations for this difference: (1) Adolescents did not learn to discriminate between the cues that are associated with different amounts of positive social feedback, or (2) adolescents’ behavior is not captured by simple reinforcement learning predictions. The model predicts that as the participant learns to associate different cues with different amounts of positive reinforcement, larger positive prediction errors result in greater changes in next-trial behavior, whereas small or no prediction errors will result in less change in behavior. A low positive learning rate, or a learning rate of zero, reflects either little change in behavior on trials following positive prediction errors or equal change in behavior after a small or large positive prediction error. The learning rate data were further explained by the fact that individuals who showed rapid behavior changes during the reversal test (when they expected to receive positive feedback and did not) had higher learning rates, suggesting that they were vigilant at tracking the varying amounts of feedback.
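The role of the positive learning rate in these predictions can be illustrated with a minimal sketch. It assumes a standard delta-rule (Rescorla-Wagner-style) update with separate learning rates for positive and negative prediction errors, feedback coded as 1 (positive) or 0 (no positive feedback), and an arbitrary starting value of 0.5. The function names are hypothetical, and the mapping from cue values to reaction times used in the study is omitted.

```python
def update_cue_value(value, outcome, alpha_pos, alpha_neg):
    """One delta-rule update of a cue value V given a feedback outcome.

    The prediction error is delta = outcome - V; alpha_pos scales positive
    prediction errors and alpha_neg scales negative ones.
    """
    delta = outcome - value
    rate = alpha_pos if delta > 0 else alpha_neg
    return value + rate * delta, delta


def simulate(outcomes, alpha_pos, alpha_neg, initial_value=0.5):
    """Return the cue value after each trial for one peer."""
    value = initial_value
    values = []
    for outcome in outcomes:
        value, _ = update_cue_value(value, outcome, alpha_pos, alpha_neg)
        values.append(value)
    return values


# A continuously reinforcing peer: positive feedback on every trial.
continuous = [1, 1, 1]
# With a zero positive learning rate, positive prediction errors leave V unchanged.
print(simulate(continuous, alpha_pos=0.0, alpha_neg=0.3))  # [0.5, 0.5, 0.5]
# With a nonzero positive learning rate, V moves toward the outcome.
print(simulate(continuous, alpha_pos=0.3, alpha_neg=0.3))
```

Under these assumptions, a larger positive prediction error produces a proportionally larger update only when the positive learning rate is greater than zero, which is the pattern the model predicts for next-trial behavior.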

Although adolescents demonstrate lower positive learning rates, it is unlikely that they simply do not learn. Preference ratings demonstrate that adolescents, similar to children and adults, rated peers who gave them more positive feedback as more likeable at the end of the experiment. Additionally, there was no difference observed in negative learning rates across development. Adolescents’ positive learning rate profile could be explained by an overall vigilance to the receipt of peer approval (Collins & Steinberg, 2007) and is consistent with work showing that a close friend, but also an anonymous or unknown peer, can enhance adolescents’ risk-taking behavior (Gardner & Steinberg, 2005; Weigard et al., 2014). Alternatively, lower positive learning rates could be explained in part by increased motivation toward that which is socially the least reinforcing, which would mean an equal speeding toward the least and most reinforcing peers. This explanation aligns with work suggesting that adolescents engage in risky behavior when they perceive themselves to be less socially accepted (Prinstein, Boergers, & Spirito, 2001). Future work will be necessary to differentiate between these two possible explanations. In addition, comparing monetary and social reward learning (Kohls, Peltzer, Herpertz-Dahlmann, & Konrad, 2009), specifically in adolescents, would help to illuminate the unique nature of the social learning rate differences observed across age and distinguish social reinforcement learning from other types of reinforcement learning (Christakou et al., 2013; Cohen et al., 2010; van den Bos et al., 2012).

Imaging data provide further insight into the observed age-related differences in social learning. We demonstrated that in adolescents, the anterior to mid insula response is correlated with positive prediction error fluctuations, more than in children and adults. Elevated activity in the insula to social cues during adolescence has been reported in a number of studies (Guyer et al., 2012; Guyer et al., 2009; Masten, Telzer, Fuligni, Lieberman, & Eisenberger, 2012), and the insula is considered to play an important role in processing emotional salience. For instance, the insula has been implicated in processing subjective feelings and awareness about one’s body (Craig, 2009; Critchley, Wiens, Rotshtein, Ohman, & Dolan, 2004; Damasio, 2003), feelings of distress or pain (Eisenberger, Lieberman, & Williams, 2003; Lamm, Batson, & Decety, 2007), and overall processing of affective states that are the result of interacting with other people (Lamm & Singer, 2010). In addition, a consistent role for the insula has been observed in detecting novel events (Downar, Crawley, Mikulis, & Davis, 2001) and incorporating this information with that of affective feelings to generate what has been described as a global subjective feeling state (Singer, Critchley, & Preuschoff, 2009). The nonlinear findings in the insula support the hypothesis that peer approval is emotionally salient to the adolescent and extend existing accounts of insula function to social learning contexts.

In addition to adolescent-specific findings in the insula, the data demonstrated that adolescents, more so than children and adults, activated regions within response planning circuitry when receiving positive social approval, regardless of which peer gave them the feedback. Nonhuman primate and human imaging work has shown that the putamen and supplementary motor area encode self-initiated preparation for movement (Alexander & Crutcher, 1990; Cunnington, Windischberger, Deecke, & Moser, 2002; Wiese et al., 2004), which suggests that peer approval may motivate adolescents toward action. A trend emerged such that those individuals who, at the end of the experiment, demonstrated greater speeding toward the least reinforcing peer also showed greater activation in the supplementary motor area while receiving positive feedback. This was not the case for speeding to the most reinforcing peers. Greater premotor activity at the time of receiving positive feedback and faster response times to the cue of the least reinforcing peer may suggest a heightened motivation in the adolescent for peer approval. It is important to note that the activation maps were exploratory, since they were generated by positive social feedback events versus baseline, rather than by subtracting a control condition. However, such an approach has merits in a developmental sample, exposing changes that may be otherwise hidden with a subtraction analysis (Church, Petersen, & Schlaggar, 2010). Increased activity in response planning circuitry could contribute to observed behavioral changes during adolescence in social contexts. Future work is necessary to explore possible connections between premotor activity and risk-taking behavior during adolescence.

We found that the ventral striatum and medial prefrontal cortex were equivalently engaged across age during social reinforcement learning. This finding is consistent with other reinforcement learning studies (van den Bos et al., 2012) and suggests that fundamental reinforcement learning mechanisms support social reinforcement learning from late childhood to adulthood. Adolescents’ lower positive learning rates, in conjunction with findings of common activation across age in reward-related circuitry, indicate that adolescents are not simply influenced by peers because they find their feedback more reinforcing. Likeability ratings also did not interact with age, suggesting that the perceived value of peers based on reinforcement history was equivalent for children, adolescents, and adults. Rather, the heightened activity in the insular cortex and regions within response planning circuitry of adolescents may suggest an affective-motivational sensitivity toward any peer approval.

Our analysis approach modeled an ideal function that peaks at 15 and 16 years of age in the behavioral and imaging data, respectively. Thus, the continuous analyses of age are not optimized to directly compare subgroups of adolescents. Recent studies of reinforcement and social cognition have shown increased sensitivity in affective-motivational circuitry in early versus late adolescence (Engelmann, Moore, Monica Capra, & Berns, 2012; Pfeifer & Blakemore, 2012). Visual inspection of the scatterplots in Figs. 2, 4, and 5 suggests that the naturally occurring peak/trough in age analyses consistently falls in late adolescence, consistent with prior research on adolescent social sensitivity (Somerville et al., 2013). However, more work will be required to further specify the ages of greatest social sensitivity during the adolescent years. Generally, our findings are consistent with recent models of adolescent development that propose adolescent-specific increases in the motivational salience of peers, thereby influencing neural circuitry function and, in turn, increasing sensitivity to peer approval and learning in the adolescent (Crone & Dahl, 2012; Somerville, 2013).

In conclusion, we show an adolescent-specific effect of positive social feedback from peers on learning and neural activation patterns. Differing amounts of positive reinforcement enhanced learning in children and adults, whereas adolescents were motivated by all positive peer feedback, even from the least reinforcing peer. Adolescents’ sensitivity to peer approval has important implications for understanding how peers influence adolescents to make both good and bad choices (Chen et al., 2000; Luthar & D'Avanzo, 1999; Wentzel & Caldwell, 1997), as well as the effects that peers have on adolescent health outcomes such as self-esteem, mental health, and school adjustment (Bishop & Inderbitzen, 1995; Laird et al., 2001). Ultimately, adolescents’ response to positive social signals may inform the development of interventions that target risky behaviors that occur in the presence of peers.