Fear Generalization as Threat Prediction: Adaptive Changes in Facial Exploration Strategies Revealed by Fixation-Pattern Similarity Analysis

Animals can effortlessly adapt their behavior by generalizing from past experiences, and avoid harm in novel aversive situations. In our current understanding, the perceptual similarity between learning and generalization samples is viewed as one major factor driving aversive generalization. Alternatively, the threat-prediction account proposes that perceptual similarity should lead to generalization to the extent it predicts harmful outcomes. We tested these views using a two-dimensional perceptual continuum of faces. During learning, one face was conditioned to predict a harmful event, whereas the most dissimilar face stayed neutral, introducing an adversity gradient defined only along one dimension. Learning changed the way humans sampled information during viewing of faces. These changes occurred specifically along the adversity gradient, leading to an increased dissimilarity of eye-movement patterns along the threat-related dimension. This provides evidence for the threat-prediction account of generalization, which conceives perceptual factors to be relevant to the extent they predict harmful outcomes.


Generalization is a fundamental cognitive ability, making it possible to use previously acquired knowledge in novel situations (Pavlov, 1927; Hovland, 1937; Spence, 1937; Guttman and Kalish, 1956; Shepard, 1987; Tenenbaum and Griffiths, 2001; Struyf et al., 2015). For instance, a previously unseen food item that is potentially harmful can be avoided if it resembles one that is known to be harmful. In potentially threatening situations, this ability is called fear (or equivalently aversive) generalization and provides an important benefit to the organism by shaping behavior adaptively (Dunsmoor et al., 2011; Greenberg et al., 2013; Onat and Büchel, 2015; Resnik and Paz, 2015; Kahnt and Tobler, 2016).

According to one prominent view, the perceptual model, it is the perceptual similarity between previously learned and novel generalization samples that determines the degree of generalization. In this view, the degree to which a novel stimulus is considered harmful is directly related to the degree of overlap in sensory features shared with a previously learnt harmful stimulus. An alternative view proposes that fear generalization is an active cognitive act (Shepard, 1987), related specifically to the prediction of potential threat in uncertain situations (Onat and Büchel, 2015). According to the threat-prediction model, perceptual factors can contribute to fear [...] perceptual similarity between the generalization and learning samples has been explicitly used as a cue for signaling the threat. As a result, threat-prediction has commonly been confounded by perceptual similarity, making it impossible to dissociate their independent contributions. Therefore, it remains to be shown whether threat-prediction or perceptual similarity explains fear generalization best in a paradigm that dissociates perceptual similarity and threat prediction.

In order to dissociate independent contributions of perceptual similarity and threat-prediction, we investigated fear generalization using a two-dimensional perceptual space (Fig 1A) with faces arranged along a circle within this space (Butter, 1963; Onat and Büchel, 2015). By pairing one item with an aversive outcome and keeping the most dissimilar one neutral (opposite face separated by 180 degrees), we introduced an adversity gradient defined exclusively along one perceptual dimension. The perceptual space can therefore be decomposed into threat-specific and unspecific components (Fig 1B, middle panel), where the latter models perceptual similarity independent of adversity. Hence, using this set of stimuli made it possible to dissociate independent contributions of perceptual factors related to similarity as such, from those relevant for the prediction of threat.

We based our analysis on multivariate eye-movement patterns. This is motivated by the fact that faces on the stimulus continuum are characterized by subtle differences. Humans explore such complex stimuli serially (Itti and [...] fixations to derive a similarity metric. One great benefit of this approach was that the circular organization of our stimuli allowed us to formulate hypotheses on how the similarity relationships between exploration patterns could change along the threat-specific and -unspecific directions of the perceptual continuum. This way, the perceptual similarity and threat-prediction accounts translate to different weighting of specific and unspecific components (Fig 1B-E, middle panels).

First, the perceptual similarity account predicts that exploration patterns should enable evaluation of perceptual similarity with the CS+ face. This would result in exploration strategies that strongly mirror the physical similarity relationships between faces (Fig 1C) and lead potentially to globally increased dissimilarity following learning.
As similarity information varies on both the specific and the unspecific dimensions, the perceptual similarity hypothesis predicts a stronger, but more importantly equal contribution of the underlying specific and unspecific components (Fig 1C, middle panel). In contrast, the threat-prediction account requires that eye-movements support a categorization process for faces based on their outcome as harmful vs. safe (Shepard, 1987; Ohl et al., 2001; Vervoort et al., 2014; Dunsmoor and Murphy, 2015; Qu et al., 2016). To achieve this, exploration strategies would be tailored to target locations that are maximally discriminative of the CS+ and CS- faces. This would lead to exploration patterns becoming more similar for faces sharing similar features with the CS+ and CS- faces, while simultaneously predicting an increased dissimilarity between these two sets of exploration patterns (Fig 1D). Increased similarity only along the threat-relevant dimension would then result in an ellipsoid representation of similarity relationships. Therefore, the adversity categorization hypothesis would lead to an increase of the adversity-specific component without influencing the unspecific component (Fig 1D, middle panel). Alternatively, but still in line with the threat-prediction framework, exploration strategies could support threat-prediction by tailoring viewing patterns to quickly identify the CS+. A new sensorimotor strategy exclusively for the adversity-predicting face would lead to a localized change in the similarity relationships around the adversity-predicting CS+ face (Fig 1E) [...]

We created 8 face stimuli that were organized along a circular similarity continuum characterized by subtle physical differences in facial elements across two dimensions (gender and identity; see SFig 1 for stimuli).
We calibrated the degree of similarity between faces using a simple model of the primary visual cortex known to mirror human similarity judgments (Yue et al., 2012; see SFig 2 for calibration). The physical similarity relationship between all pairs of faces conformed to a circular organization (Fig 1A, top right panel), such that dissimilarity varied with angular difference between faces (lowest for left and right neighbors and highest for opposing faces) with equidistant angular steps. Participants (n = 74) freely viewed these faces before and after an aversive learning procedure (Fig 2A) while we measured their eye-movements. During the conditioning phase, one of the eight faces was introduced as the CS+, being partially reinforced with an aversive outcome (UCS, mild electric shock in ~30% of trials). The CS- was the face most dissimilar to the CS+ (separated by 180°) and was not reinforced. During the subsequent generalization phase, all faces were presented again and the CS+ continued to be partially reinforced to prevent extinction of the previously learnt association. These reinforced trials were excluded from the analysis. To ensure comparable arousal states between the baseline and generalization phases, we administered UCSs also during the baseline period; however, they were fully predictable as their occurrence was indicated by a shock symbol (Fig 2A). Furthermore, we inserted null trials during all phases (i.e. trials without face presentation but otherwise exactly the same) in order to obtain reliable baseline levels for skin-conductance responses.
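
The circular geometry described above, and its split into a threat-specific and an unspecific component, can be sketched in a few lines. This is only an illustration: the cosine/sine parameterization, the indexing of faces (CS+ at 0°, CS- at 180°) and all variable names are assumptions made for this example and are not taken from the published Matlab code.

```python
import math

N_FACES = 8
# Angular position of each face on the circle. Index 0 is taken to be the
# CS+ and index 4 the CS- (180 degrees apart); this indexing is an assumption.
angles = [2 * math.pi * k / N_FACES for k in range(N_FACES)]


def outer(values):
    """Outer product of a vector with itself, returned as a list of rows."""
    return [[a * b for b in values] for a in values]


# Model similarity along the CS+/CS- axis (threat-specific) and along the
# orthogonal axis (unspecific).
specific = outer([math.cos(a) for a in angles])
unspecific = outer([math.sin(a) for a in angles])

# Equal weighting of both components yields a circular similarity matrix,
# because cos(a)cos(b) + sin(a)sin(b) = cos(a - b).
circular = [[specific[i][j] + unspecific[i][j] for j in range(N_FACES)]
            for i in range(N_FACES)]

# Dissimilarity grows with angular distance: smallest for neighbors,
# largest for opposing faces.
dissimilarity = [[1 - circular[i][j] for j in range(N_FACES)]
                 for i in range(N_FACES)]
```

With this construction, weighting the two components unequally stretches the circle into an ellipse, which is the geometric intuition behind the competing hypotheses.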

Fear tuning profiles in subjective ratings and autonomic activity
As expected, the effect of learning was mirrored both in autonomic nervous system activity and in subjective ratings of UCS expectancy ([...]

[...] Gray shaded areas in SCR depict response amplitudes evoked by null trials (mean and 95% CI). Scatter plots show the amplitude parameter of Gaussian fits (denoted by α) for each volunteer. Horizontal lines within the scatterplots depict group-level means, asterisks indicate significant differences in α (compared to baseline phase, paired t-test, ***: p < .001).

As in previous studies, we characterized fear generalization by computing fear tuning profiles based on subjective ratings and SCR. In both recording modalities, responses decayed with increasing dissimilarity to the CS+ face and reached minimal values for CS-. We modeled these with a Gaussian function centered on the CS+ face. At the group-level, model comparison favored the flat null model over the Gaussian in both recording modalities before learning (p = .44 for SCR, p = .17 for ratings, log-likelihood ratio test; black horizontal lines in Fig 2B). However, following the conditioning phase the Gaussian model fitted the data significantly better (comparison to flat null model, p < .001 for SCR and subjective ratings, log-likelihood ratio test). Fear-tuning profiles at the single-subject level were in agreement with the overall group-level picture. We summarized fear-tuning profiles of individual participants with the amplitude parameter of the fitted Gaussian function, which characterizes the modulation depth of fear tuning (i.e. the strength of fear tuning) after accounting for baseline shifts. The average amplitude parameter following the conditioning phase was significantly larger than in the baseline phase in both recording modalities (paired t-test, p < .001, see Fig 2B).
In summary, univariate fear-tuning in SCR and subjective ratings confirmed that aversive learning was successfully established and transferred towards other perceptually similar stimuli.
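
To make the role of the amplitude parameter concrete, the following minimal sketch fits a Gaussian tuning curve centered on the CS+ to a set of responses. It is a simplified stand-in for the fitting procedure, not the authors' implementation: the tuning width is held fixed (an assumption that turns the fit into a linear least-squares problem), the example data are made up, and the function names are hypothetical.

```python
import math


def gaussian_tuning(angle_deg, amplitude, sigma_deg, offset):
    """Gaussian centred on the CS+ (0 degrees) plus a baseline offset."""
    return offset + amplitude * math.exp(-angle_deg ** 2 / (2 * sigma_deg ** 2))


def fit_amplitude(angles_deg, responses, sigma_deg):
    """Least-squares amplitude and offset for a fixed tuning width."""
    g = [math.exp(-a ** 2 / (2 * sigma_deg ** 2)) for a in angles_deg]
    n = len(g)
    g_mean = sum(g) / n
    r_mean = sum(responses) / n
    cov = sum((gi - g_mean) * (ri - r_mean) for gi, ri in zip(g, responses))
    var = sum((gi - g_mean) ** 2 for gi in g)
    amplitude = cov / var
    offset = r_mean - amplitude * g_mean
    return amplitude, offset


# Angular distance of the 8 faces from the CS+ (0 degrees); CS- is at 180.
angles_deg = [-135, -90, -45, 0, 45, 90, 135, 180]
# Made-up, noise-free ratings generated from a known tuning curve.
ratings = [gaussian_tuning(a, 3.0, 60.0, 2.0) for a in angles_deg]

amplitude, offset = fit_amplitude(angles_deg, ratings, sigma_deg=60.0)
```

An amplitude of zero corresponds to the flat null model, so the amplitude directly expresses the modulation depth after the baseline offset has been accounted for.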

Multivariate fear tuning profiles in eye movements
We analyzed exploration behavior using fixation density maps (FDMs). These are two-dimensional [...]

Already during the baseline period the dissimilarity matrix was highly structured (Fig 3A). In agreement with a circular similarity geometry and the MDS depiction, lowest dissimilarity values (1.05 ± .01; M ± SEM) were found between FDMs of neighboring faces (i.e. first off-diagonal), whereas FDMs for opposing faces separated by 180° exhibited significantly higher dissimilarity values (1.21 ± .01; paired t-test, t(73) = 7.41, p < .001). Using the Perceptual Model, we investigated the contribution of physical characteristics of the stimulus set to the observed pre-learning dissimilarity structure (Fig 1B). This model uses a theoretically circular similarity matrix (consisting of equally weighted sums of specific and unspecific components) as a linear predictor. This model performed significantly better compared to a null model consisting of a constant similarity for all pairwise FDM comparisons (for Perceptual Model adjusted r² = .09; log-likelihood-ratio test for the alternative null model: p < 10⁻⁵; BIC_NullModel = -96.1, BIC_Perceptual = -244.7; see S1 Table for [...]

[...] Adversity Tuning model). w_circle: weight for the circular component, which is the sum of equally weighted specific and unspecific components; w_specific/w_unspecific: weights for specific and unspecific components; w_Gauss: weight for adversity component centered uniquely on the CS+. (**: p < .01; ***: p < .001, paired t-test).

We observed significant changes between baseline and generalization dissimilarity values in an element-wise comparison (Fig 3A, indicated by asterisks). This provides evidence for learning-induced changes in the similarity relationships.
Following learning, the same Perceptual Model was again significant (adjusted r² = .35; p < .001, log-likelihood ratio test), but now performed better compared to the baseline phase (BIC_Perceptual = -244.7 for the baseline vs. BIC_Perceptual = -1697.4 for the generalization phase; see S2 Table for model fitting results). Critically, we found a significant increase in the model parameter from baseline to generalization phase (w_circle = 0.13 ± 0.01; paired t-test, t(73) = 4.03, p < .001; Fig [...] in dissimilarity between FDMs. Overall, these results are compatible with the view that aversive learning led to a better separation of exploration patterns globally, in agreement with the Perceptual Model (Fig 1C), which predicted a major contribution of perceptual similarity on exploration patterns following learning.

However, according to the multi-dimensional scaling method the separation between exploration patterns occurred mainly along the adversity gradient defined by the CS+ and CS- faces, whereas the separation along the orthogonal unspecific direction did not exhibit any noticeable changes (Fig 3B). We thus extended the circular Perceptual Model to capture independent variance along the two orthogonal directions using the Adversity Categorization model (Fig 1D). [...] .001), and these learning-induced changes were significantly larger in the specific as compared to the unspecific component (t(73) = 2.92, p < .005, paired t-test). This observation provides evidence that increased overall dissimilarity was driven by changes in the scanning behavior specifically along the task-relevant adversity direction.

We next evaluated whether the observed anisotropy in the similarity geometry was relevant for the [...] w_unspecific) could predict stronger aversive learning, as measured with the modulation depth of fear-tuning profiles coming from subjective ratings and skin-conductance responses.
We found weak, but significant evidence for an association with an increased tuning strength in the ratings (r = .25, p = .03), which was only marginally present for SCR (r = .23, p = .07). This suggests that separation of exploration patterns along the adversity gradient is related to aversive learning, and argues against the possibility that it simply results from increased exposure to the CS+ and CS- faces throughout the conditioning period.

The remodeling of the similarity geometry along the adversity gradient can also be accompanied by exploration strategies that are specifically tailored for the adversity-predicting face but not for the CS-, resulting in localized changes only around the CS+ face. We subjected this view to model comparison by augmenting the previous model with a similarity component that consisted of a two-dimensional Gaussian centered on the CS+ face. A positive contribution of this predictor would lead to more similar exploration patterns specifically around the CS+ (Fig 1E). It can thus capture changes in similarity relationships that are specific to the CS+ face. [...] Fig 3C). Also, pair-wise differences between parameter estimates did not reach significance (t(73) = 0.72, p = 0.47). We therefore conclude that further extensions of the Adversity Categorization model to include components for adversity-specific changes around the CS+ did not result in a better understanding of the adversity-induced changes in the similarity geometry of exploration strategies.
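
The logic of fitting specific and unspecific weights to a pattern-similarity matrix can be illustrated with the sketch below. Several things here are assumptions for the sake of the example: similarity is taken as the Pearson correlation between flattened FDMs (dissimilarity being 1 minus that value), the predictors use a cosine/sine parameterization of the circle, the weights are estimated by ordinary least squares on the off-diagonal entries, and all function names are hypothetical. The published analysis may differ in each of these choices.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(fdms):
    """Pairwise Pearson r between flattened fixation density maps."""
    n = len(fdms)
    return [[pearson(fdms[i], fdms[j]) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(sim, angles):
    """Least-squares fit of the upper-triangle similarities on an intercept
    plus specific (cos*cos) and unspecific (sin*sin) predictors."""
    rows, y = [], []
    n = len(angles)
    for i in range(n):
        for j in range(i + 1, n):
            rows.append([1.0,
                         math.cos(angles[i]) * math.cos(angles[j]),
                         math.sin(angles[i]) * math.sin(angles[j])])
            y.append(sim[i][j])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    xty = [sum(r[p] * v for r, v in zip(rows, y)) for p in range(3)]
    return solve(xtx, xty)  # [intercept, w_specific, w_unspecific]


# Synthetic similarity matrix with known weights (made-up values).
angles = [2 * math.pi * k / 8 for k in range(8)]
sim = [[0.1 + 0.3 * math.cos(a) * math.cos(b) + 0.1 * math.sin(a) * math.sin(b)
        for b in angles] for a in angles]

intercept, w_specific, w_unspecific = fit_weights(sim, angles)
anisotropy = w_specific - w_unspecific
```

In this toy case the fit recovers the weights used to generate the matrix, and the positive anisotropy reflects a geometry that is stretched along the CS+/CS- axis.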

Temporal and spatial unfolding of adversity specific exploration
While SCRs and subjective ratings provide insights about the aggregate cognitive evaluations of a given stimulus, eye-movements have the potential to provide information on how these cognitive evaluations unfold over both spatial and temporal domains. We therefore repeated the FPSA using eye-movement data that originated [...] identify the CS+ based on their shock expectancy ratings (CS+ > CS-, n = 61). First, to gain insights on the temporal dynamics of specific exploration, we used a moving window of 500 ms with steps of 50 ms and repeated FPSA using the Adversity Categorization Model (Fig 4A). Before learning, the time-courses of adversity-specific and unspecific components were not distinguishable. However, during the subsequent generalization phase they diverged rather early. We tested the anisotropy difference in similarity geometry (w_specific - w_unspecific) before and after learning. The difference first reached significance at the time window corresponding to the interval 400-900 ms after stimulus onset (Fig 4A top row, paired t-test, p = 0.03). As humans explore visual scenes serially with fixational eye-movements, the order of fixations (1st fixation, 2nd fixation, and so on) is another natural metric to evaluate the temporal progress (Tatler et al., 2005). The same analysis indicated that adversity-specific exploration started following the first fixation (note that the first fixation is the landing fixation on the face following stimulus onset; Fig 4A) and stayed constant during the stimulus presentation. Overall, the temporal FPSA indicated that humans started to forage for adversity-specific information early, right after the first landing fixation.

One limitation of FPSA is that the information about the spatial origin of specific or unspecific exploration strategies is lost.
To circumvent this limitation, we ran FPSA at localized portions of the FDMs in a similar manner to a searchlight analysis in brain imaging (Kriegeskorte et al., 2007). For a given spatial portion (defined by a square window of 30 pixels, ~1 visual degree), we fitted the Adversity Categorization Model, and assigned specific and unspecific weights to the center position of the searchlight (Fig 4B). The map that is obtained by repeating this analysis at all spatial locations provides an indication of the facial locations that are explored either with a specific or unspecific exploration strategy. We found that both before and after learning, specific and unspecific components were strongly localized around the eye region. We tested the difference in anisotropy between the baseline and test phases within the three commonly used regions of interest at different facial elements (eyes, nose and mouth; ROIs shown in Fig 4C) [...] Overall, searchlight FPSA indicated that adversity-specific exploration strategies were specifically tailored to find differences around the eye region.

[...] anisotropy (w_specific - w_unspecific) between before and after aversive learning (test - baseline) for three different ROIs (*: p < 0.01, paired t-test).
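
The moving-window scheme used for the temporal analysis (500 ms windows advanced in 50 ms steps) can be sketched as follows. The trial duration of 1500 ms, the rule of assigning a fixation to a window by its onset time, and the function and field names are all assumptions for illustration; the text above does not specify them.

```python
def sliding_windows(duration_ms, width_ms=500, step_ms=50):
    """Return (start, end) pairs of all windows that fit inside the trial."""
    last_start = duration_ms - width_ms
    return [(start, start + width_ms)
            for start in range(0, last_start + 1, step_ms)]


def fixations_in_window(fixations, window):
    """Keep fixations whose onset falls in the half-open window [start, end)."""
    start, end = window
    return [f for f in fixations if start <= f["onset_ms"] < end]


# Trial duration is an assumed value, chosen only to make the example concrete.
windows = sliding_windows(duration_ms=1500)

# Hypothetical fixations of one trial, described by their onset times.
trial = [{"onset_ms": 120}, {"onset_ms": 450}, {"onset_ms": 950}]
subset = fixations_in_window(trial, (400, 900))
```

Each window would then yield its own set of FDMs, to which the model is refitted, giving one pair of specific and unspecific weights per time point.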

Comparison of FPSA to ROI-based analysis of eye-movements
To test whether the multivariate approach of FPSA on eye movement patterns exceeded the sensitivity of common ROI-based analyses on fixation counts, we computed changes in fixation counts in the three common regions of face stimuli, i.e. eyes, nose and mouth (same as depicted in Fig 4C). If conditioning before the generalization phase led to an increased saliency of facial features that are diagnostic of the CS+ face, one would expect a non-flat fear tuning in the number of fixations towards these facial features, which would receive more fixations with increasing similarity to the CS+ face. In line with previous reports (Walker-Smith et al., 1977), eyes together with the nose region were the most salient locations across the baseline and generalization phases, and attracted ~84% of all fixation density, whereas the mouth region had only a marginal contribution with ~3.5%. Investigating changes from baseline to generalization phase, we found that aversive learning increased the number of fixations directed at the nose (+4.4%) and mouth (+0.8%) regions at the expense of the eye region (-5.3%). Most importantly, model comparison on fixation density favored the flat null model for all facial elements across both baseline and generalization phases, even at liberal statistical thresholds (p > 0.05, log-likelihood ratio test; SFig 3). While a weak tuning was apparent in the mouth region during generalization, this did not reach significance (p > 0.1). Therefore, our observations at the group-level were limited to unspecific changes between phases that were independent of the adversity gradient introduced through conditioning.

The present work tested the validity of competing hypotheses on aversive generalization. We used a stimulus continuum that was defined by perceptual similarity, as well as threat-related information as two independent perceptual factors. As an extension of similarity-based multivariate pattern analyses used to investigate representational content of neuronal activity in fMRI (Kriegeskorte et al., 2008) [...] in dissimilarity along the direction of the adversity gradient. This adversity-specific exploration strategy appeared early, right after the landing fixation, and lasted continuously for the duration of stimulus presentation, mainly to forage perceptual evidence around the eye region. This shows that behavior during fear generalization was specifically tailored to detect differences along the threat-relevant stimulus dimension. Overall, these changes in exploration patterns indicate that fear generalization can be understood as an active process related to the prediction of threat, and not simply a response to perceptual similarity between learned and novel samples. Perceptual similarity plays a decisive role to the extent it predicts the occurrence of relevant events.

Our results showed that the separation between the CS+ and CS- exploration patterns increased the most. There are at least two different scenarios that could lead to this observation (Fig 5A). They concern the way learning potentiates sensory features as being diagnostic for the prediction of harmful outcomes, making them target locations for attentional allocation during active viewing. In the first scenario, aversive learning potentiates a unique combination of visual features that are specific to the harm-predicting item, namely the CS+ face identity (Fig 5A, top panel).
Alternatively, aversive learning can lead to an increased saliency for discriminative features that best separate the harm- and safety-predicting prototypes. In this view, aversive learning consists of recovering the vector of features that defines the adversity gradient (Fig 5A, bottom panel). This feature vector could overlap with categorical information that is either naturally present (such as gender, ethnicity or emotional expression), or learned de novo with experience (Kietzmann and König, 2010; Qu et al., 2016). Both scenarios lead to an increased separation between the CS+ and CS- poles as observed in this study. However, they have divergent predictions when tested with stimuli organized in three concentric circles (Fig 5B). If learning modifies uniquely identity-specific representations, faces from outer and inner circles that are close to the CS+ would be explored similarly, hence resulting in a shrinkage of the similarity geometry around the CS+ face. On the other hand, if learning is based on a vector representation, the pattern separation would add to differences that are already present, resulting in three concentric ellipses centered on the same point.

[...] harmful and safety predicting stimuli. When tested subsequently with stimuli organized as three concentric stimulus gradients, this scenario predicts three concentric ellipses sharing the same center of gravity (gray horizontal line). (B) In another scenario, during conditioning humans learn the specific feature values that predict a harmful outcome (black square). When tested with the concentric stimulus set, this scenario predicts faces that are similar to the CS+ face to be explored similarly. This would result in a global shift in the center of gravity towards the CS+ face (yellow, orange and red horizontal lines).

Our results contrast with previous conceptualizations of fear generalization as driven simply by perceptual similarity. According to the perceptual model, fear generalization is viewed as resulting from perceptual similarity between current observations and an event that is known to truly predict a behaviorally relevant outcome. This model has received substantial support from associative learning theory, which provided a mechanistic framework for the perceptual model (Lissek et al., 2014). In this view, the extent to which a novel stimulus evokes fear-related responses is directly proportional to the overlap in the associative connections formed previously during learning. Different theories within this associative framework have emphasized either elemental (Rescorla and Wagner, 1972) or configural (Pearce, 1987, 2002) factors in the computation of a perceptual similarity metric for the guidance of generalization. However, these invariably reduced the cognitive ability for generalization to a byproduct of perceptual similarity between learning and generalization samples, as has been indicated in recent reviews (Soto et [...] requires instead a consistent change in the similarity relationships of exploration patterns. Therefore, one major benefit of FPSA was making it possible to test the validity of hypotheses in the presence of highly variable inter-subject eye-movement patterns. Second, and equally importantly, it is not clear how one could test the different hypotheses outlined here using one-dimensional generalization profiles collected from subjective ratings, autonomic recordings or fixation counts from individual ROIs. Therefore, another benefit of FPSA was making it possible to subject a rich set of hypotheses to statistical testing.

Could our results be explained by an unbalanced exposure to the CS+ and CS- faces that were presented during the conditioning phase?
As participants have seen these faces more often, one can argue that this could potentially bias eye movement patterns. While we cannot completely exclude a contribution of exposure, we have shown that anisotropy correlates with indicators of aversive learning as measured by subjective ratings and, to a lesser extent, with autonomic activity. Furthermore, unpublished observations on neuronal recordings present evidence that humans do differentiate CS+ and CS- faces even when controlling for the effect of exposure. Therefore, we believe that the major drive that leads to the separation of patterns along the specific axis is the affective nature of learning, rather than mere exposure.

Eye-movement patterns can provide important insights about what the nervous system tries to achieve as they summarize the final outcome of complex interactions at the neuronal level (König et al., 2016). Our results demonstrate that changes induced by aversive generalization extend beyond autonomic systems or explicit subjective evaluations, and can also affect an entire sensory-motor loop at the systems level (Dowd et al., 2016). Furthermore, the methodology applied here can easily be extended to neuronal recordings, where gradients of activity during generalization have been successfully used to characterize selectivity of aversive representations. Therefore, it will be highly informative to test the different hypotheses we outlined here using neuronal recordings with representational similarity analysis during the emergence of aversive representations.

Participants
Participants were 74 naïve healthy males and females (n = 37 each) with normal (or corrected-to-normal) vision (age = 27 ± 4, M ± SD) and without history of psychiatric or neurological diseases, any medical condition or use of medication that would alter pain perception. Participants had not participated in any other study using facial stimuli in combination with aversive learning before. They were paid 12 Euros per hour for their participation in the experiment and provided written informed consent. All experimental procedures were approved by the Ethics committee of the General Medical Council Hamburg.

Data sharing
The dataset used in this manuscript has been published as a dataset publication (Wilming et al., 2017). We publicly provide the stimuli as well as the Matlab (MathWorks, Natick MA) code necessary for the reproduction of all the results presented in this manuscript (Onat and Kampermann, 2017). The code can be used to download the data as well.

Stimulus preparation and calibration of generalization gradient
Using a two-step procedure, we created a final set of 8 calibrated faces (Fig 1A, see also SFig 1) that were perceptually organized along a circular similarity continuum based on a model of the primary visual (V1) cortex. Using the FaceGen software (FaceGen Modeller 2.0, Singular Inversion, Ontario Canada) we created two gender-neutral facial identities and mixed these identities (0%/100% to 100%/0%) while simultaneously changing the gender parameters in two directions (more male or female). In the first step, we created a total of 160 faces by appropriately mixing the gender and identity parameters to form 5 concentric circles (see SFig 1) based on FaceGen defined parameter values for gender and identity. Using a simple model of the primary visual cortex known to closely mirror human perceptual similarity judgments (Yue et al., 2012), we computed V1 representations for each face after converting them to grayscale. The spatial frequency sensitivity of the V1 model was adjusted to match the human contrast sensitivity function with bandpass characteristics between 1 and 12 cycles/degree, peaking at 6 cycles/degree (Blakemore and Campbell, 1969). The V1 model consists of pairs of Gabor filters in quadrature at five different spatial scales and eight orientations. The activity of these 40 channels was averaged in order to obtain one single V1 representation per face. We characterized the similarity relationship between the V1 representations of the 160 faces using multidimensional scaling analysis with 2 dimensions (SFig 2). As expected, while two dimensions explained a large variance, the improvement with the addition of a third dimension was only minor, thus providing evidence that the physical properties of the faces were indeed organized along two dimensions (stress values for 1D, 2D and 3D resulting from the MDS analysis were 0.42, 0.04 and 0.03, respectively).
The transformation between the coordinates of the FaceGen software values (gender and identity mixing values) and coordinates returned by the MDS analysis allowed us to gather FaceGen coordinates that would correspond to a perfect circle in the V1 model. In the second step, we thus generated 8 faces that corresponded to a perfect circle. This procedure ensured that faces used in this study were organized perfectly along a circular similarity continuum according to a simple model of primary visual cortex with well-defined bandpass characteristics known to mirror human similarity judgments. Furthermore, it ensured that dimensions of gender and identity introduced independent variance on the faces.

To present these stimuli we resized them to 1000x1000 pixels (originals: 400x400) using bilinear interpolation, and slightly smoothed them with a Gaussian kernel of 5 pixels with full-width at half maximum of 1.4 pixels to remove any possible pixel artifacts that could potentially lead participants to identify faces. Faces were then normalized to have equal luminance and root-mean-square contrast. The gray background was set to the same luminance level, ensuring equal brightness throughout the experiment. Faces were presented on a 20" monitor (1600 x 1200 pixels, 60 Hz) using Matlab R2013a (Mathworks, Natick MA) with psychophysics toolbox (Brainard, 1997; Pelli, 1997). The distance of the participants' eyes to the stimulus presentation screen was 50 cm. The center of the screen was at the same level as the participants' eyes. Faces spanned horizontally ~17° and vertically ~30°, aiming to mimic a typical face-to-face social situation. Stimuli are available in (Onat and Kampermann, 2017).
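
The relation between pixel sizes, viewing distance and visual angle quoted above can be checked with a short calculation. The sketch assumes a 4:3 panel with square pixels (consistent with the 1600 x 1200 resolution) and a flat screen viewed frontally; the helper names are hypothetical. It also converts the stated full-width at half maximum of the smoothing kernel into the equivalent Gaussian standard deviation.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an object of a given size viewed frontally."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full-width at half maximum."""
    return fwhm / (2 * math.sqrt(2 * math.log(2)))


# Assumption: a 20-inch 4:3 panel with square pixels.
DIAGONAL_CM = 20 * 2.54
SCREEN_WIDTH_CM = DIAGONAL_CM * 4 / 5  # 3-4-5 triangle for a 4:3 panel
PIXEL_CM = SCREEN_WIDTH_CM / 1600      # 1600 pixels across the screen width
VIEWING_DISTANCE_CM = 50

# Vertical extent of the 1000-pixel stimulus image and of a 30-pixel searchlight.
image_height_deg = visual_angle_deg(1000 * PIXEL_CM, VIEWING_DISTANCE_CM)
searchlight_deg = visual_angle_deg(30 * PIXEL_CM, VIEWING_DISTANCE_CM)

# Smoothing kernel: FWHM of 1.4 pixels expressed as a standard deviation.
kernel_sigma_px = fwhm_to_sigma(1.4)
```

Under these assumptions the 1000-pixel image subtends roughly 28.5° and a 30-pixel window a little under 1°, in line with the approximate values given in the text.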

Experimental paradigm
The fear conditioning paradigm (similar to Onat and Büchel, 2015) consisted of baseline, conditioning and test (or generalization) phases (Fig 2A). Participants were instructed that the delivery of UCSs during baseline would not be associated with faces, whereas in the subsequent conditioning and generalization phases shocks would be delivered after particular faces had been presented. In all three phases, subjects were instructed to press a button when an oddball stimulus appeared on the screen.

Four equivalent runs with exactly the same number of trials were used during the baseline (1 run) and generalization (3 runs) phases, each consisting of 120 trials (~10 minutes). Every run started with an eye-tracker calibration. Between runs participants took a break and continued with the next run in a self-paced manner. We avoided having more than 1 run in the baseline period in order not to induce fatigue in participants. In each run of the baseline and generalization phases, the 8 faces were each repeated 11 times, UCS trials occurred 5 times and one oddball was presented. The oddball consisted of a blurred, unrecognizable face, to which volunteers were instructed to respond with a key press. We presented 26 null trials with no face presentation but otherwise the same trial structure (see sequence optimization below). In order to keep arousal levels comparable to the generalization phase, UCSs were also delivered during baseline; however, they were fully predictable by a shock symbol, thereby avoiding any face-to-UCS associations. [...] providing an optimal design efficiency (thus making deconvolution of autonomic skin conductance responses more reliable). However, all conditions in an m-sequence appear an equal number of times. Therefore, in order to achieve the required reinforcement ratio (~30%), we randomly pruned UCS trials and transformed them to null trials.
Similarly, oddball trials were pruned to have an overall rate of ~1%. This resulted in a total of 26 null trials. While this reduced the efficiency of the m-sequence, it was still a good compromise, as the resulting sequence was much more efficient than a random sequence. As a result of the intermittent null trials, SOAs were 6 or 12 seconds, approximately exponentially distributed.

Face onsets were preceded by a fixation cross, which appeared randomly outside of the face either on the [...] The side of the fixation cross was balanced across conditions to avoid confounds that might occur (Arizpe et al., 2015). Therefore, the first fixation consisted of a landing fixation on the face.
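The trial counts reported above can be checked with a short Python sketch of the pruning step. It assumes that the unpruned sequence contained 11 conditions (8 faces, UCS, oddball and null) over 120 trials, with the null condition occurring 10 times. This starting point is inferred from the reported totals and is not stated in the text. A shuffled list stands in for the m-sequence, and the reinforcement ratio is computed as reinforced over all CS+ presentations, which is one possible reading.

```python
import random
from collections import Counter

def prune(sequence, condition, keep, rng):
    """Turn randomly chosen trials of `condition` into null trials,
    leaving exactly `keep` of them in place."""
    idx = [i for i, c in enumerate(sequence) if c == condition]
    for i in rng.sample(idx, len(idx) - keep):
        sequence[i] = "null"
    return sequence

rng = random.Random(0)

# Placeholder for an m-sequence (assumption): 10 conditions shown 11 times
# plus a null condition shown 10 times, for 120 trials in total.
conditions = [f"face{i}" for i in range(1, 9)] + ["ucs", "oddball"]
sequence = [c for c in conditions for _ in range(11)] + ["null"] * 10
rng.shuffle(sequence)  # a real m-sequence has a fixed, optimized order

prune(sequence, "ucs", keep=5, rng=rng)      # 6 UCS trials become null
prune(sequence, "oddball", keep=1, rng=rng)  # 10 oddball trials become null

counts = Counter(sequence)
n_csplus = 11 + counts["ucs"]              # unreinforced + reinforced CS+
reinforcement = counts["ucs"] / n_csplus   # 5 / 16, i.e. about 31%
oddball_rate = counts["oddball"] / len(sequence)  # 1 / 120, i.e. about 1%
```

Under these assumptions the pruning reproduces the 26 null trials per run (10 + 6 + 10) alongside 88 face trials, 5 UCS trials and one oddball.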

Calibration and delivery of electric stimulation
Mild electric shocks were delivered by a direct current stimulator (Digitimer Constant Current Stimulator, Hertfordshire, UK), applied by a concentric electrode (WASP type, Speciality Developments, Kent, UK) that was firmly attached to the back of the right hand and held in place by a rubber glove to ensure constant contact with the skin. Shocks were trains of 5-ms pulses at 66 Hz, with a total duration of 100 ms. During the experiment, they were delivered right before the offset of the face stimulus. The intensity of the electric shock applied during the experiment was calibrated for each participant before the start of the experiment. Participants underwent a QUEST procedure (Watson and Pelli, 1983) presenting UCSs with varying amplitudes selected by an adaptive algorithm and were required to report whether a given trial was "painful" or "not painful" in a binary fashion using a sliding bar. The QUEST procedure was repeated twice to account for sensitization/habituation effects, thus obtaining a reliable estimate. Each session consisted of 12 stimuli, starting at an amplitude of 1 mA. The subjective pain threshold was the intensity that participants would rate as "painful" with a probability of 50%. The amplitude used during the experiment was 2 times this threshold value. Before starting the actual experiment, participants were asked to confirm whether the resulting intensity was bearable. If not, the amplitude was incrementally reduced and the final amplitude was used for the rest of the experiment.
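The timing of the pulse train can be made concrete with a small Python sketch. It assumes that the first pulse starts at the onset of the train, that pulses repeat every 1000/66 ms, and that only pulses fitting completely within the 100 ms are delivered. These details and the threshold value used below are illustrative assumptions, not taken from the text.

```python
def pulse_onsets_ms(rate_hz=66.0, pulse_ms=5.0, train_ms=100.0):
    """Onset times (ms) of all pulses that fit completely inside the train.

    Assumes the first pulse starts at t = 0 and that pulses repeat every
    1000 / rate_hz ms.
    """
    period = 1000.0 / rate_hz
    onsets = []
    k = 0
    while k * period + pulse_ms <= train_ms:
        onsets.append(k * period)
        k += 1
    return onsets

onsets = pulse_onsets_ms()
# Fraction of the 100-ms train during which current flows.
duty_cycle = len(onsets) * 5.0 / 100.0

# Stimulation amplitude as twice the pain threshold; the threshold value
# here is hypothetical.
threshold_ma = 1.3
amplitude_ma = 2.0 * threshold_ma
```

Under these assumptions a train contains 7 pulses spaced about 15.2 ms apart.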

Eye tracking and fixation density maps
Eye tracking was done using an Eyelink 1000 Desktop Mount system (SR Research, Ontario, Canada) recording the right eye at 1000 Hz. Participants placed their head on a headrest supported under the chin and forehead to keep a stable position. Participants underwent a 13-point calibration/validation procedure at the beginning of each run (1 baseline run, 1 conditioning run and 3 generalization runs). Across all runs, the average calibration error had Mean = 0.36°, Median = 0.34° and SD = 0.11°. 91% of all runs had a calibration error of 0.5° or less.

Fixation events were identified using commonly used parameter definitions (Wilming et al., 2017) (Eyelink cognitive configuration: saccade velocity threshold = 30°/s, saccade acceleration threshold = 8000°/s², motion threshold = 0.1°). Fixation density maps (FDMs) were computed by spatially smoothing (Gaussian kernel of 1° full width at half maximum) a 2D histogram of fixation locations, and were transformed to probability densities by normalizing to unit sum. FDMs covered the central 500x500 pixels, which contained all facial elements where fixations were mostly concentrated (~95% of all fixations).
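The computation of an FDM can be sketched in a few lines of Python. The sketch below works on a small toy grid, expresses the kernel width directly in pixels (the conversion from 1° to pixels depends on the setup and is not attempted here), truncates the kernel at three standard deviations and treats pixels outside the map as zero. All of these are simplifying assumptions.

```python
import math

def gaussian_kernel(fwhm_px):
    """1D Gaussian kernel with unit sum for a given FWHM in pixels."""
    sigma = fwhm_px / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_rows(grid, kernel):
    """Convolve every row with the kernel, treating outside pixels as zero."""
    radius = len(kernel) // 2
    out = []
    for row in grid:
        n = len(row)
        new = [0.0] * n
        for j in range(n):
            for k, w in enumerate(kernel):
                src = j + k - radius
                if 0 <= src < n:
                    new[j] += w * row[src]
        out.append(new)
    return out

def transpose(grid):
    return [list(col) for col in zip(*grid)]

def fixation_density_map(fixations, size, fwhm_px):
    """Histogram fixations (x, y in pixels), smooth, normalize to unit sum.

    Requires at least one fixation inside the map.
    """
    hist = [[0.0] * size for _ in range(size)]
    for x, y in fixations:
        col, row = int(x), int(y)
        if 0 <= row < size and 0 <= col < size:
            hist[row][col] += 1.0
    kernel = gaussian_kernel(fwhm_px)
    # Separable smoothing: along rows first, then along columns.
    smoothed = transpose(smooth_rows(transpose(smooth_rows(hist, kernel)), kernel))
    total = sum(sum(r) for r in smoothed)
    return [[v / total for v in r] for r in smoothed]

# Toy example on a 41x41 map with three fixations (hypothetical values).
fdm = fixation_density_map([(20, 20), (20, 20), (10, 30)], size=41, fwhm_px=6.0)
```

The resulting map sums to one and peaks at the location that was fixated twice.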

Shock expectancy ratings and autonomic recordings
After the baseline, conditioning and generalization phases, participants rated the different faces for subjective shock expectancy by answering the following question: "How likely is it to receive a shock for this face?". Faces were presented in random order and rated twice. Subjects answered using a 10-step scale ranging from "very unlikely" to "very likely" and confirmed by a button press in a self-paced manner.

Electrodermal activity evoked by individual faces was recorded throughout the three phases. Reusable Ag/AgCl electrodes filled with isotonic gel were attached to the palm of the subject's left hand using adhesive collars, placed in thenar/hypothenar configuration. Skin-conductance responses were continuously recorded using a Biopac MP100 AD converter and amplifier system at a sampling rate of 500 Hz. Using the Ledalab toolbox (Benedek and Kaernbach, 2010a, 2010b), we decomposed the raw data into phasic and tonic response components after downsampling them to 100 Hz. Ledalab applies a positively constrained deconvolution technique in order to obtain phasic responses for each single trial. We averaged single-trial phasic responses separately for each condition and experimental phase to obtain 21 average values (9 (8 faces + 1 null condition) each from baseline and generalization, and 3 (2 faces + 1 null condition) from the conditioning phase). CS+ trials with UCS were excluded from this analysis. These values were first log-transformed (log10(1 + SCR)) and subsequently z-scored for every subject separately (across all conditions and phases), then averaged across subjects. Therefore, negative values indicate phasic responses that are smaller than the average response recorded throughout the experiment. Due to technical problems, SCR data could only be analyzed for n = 63 out of the 74 participants.
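The transformation of the condition-averaged SCR values can be sketched as follows. The amplitudes are hypothetical, only 4 of the 21 phase-by-condition cells are shown, and the population standard deviation is used for z-scoring, which is an assumption since the text does not name the estimator.

```python
import math
import statistics

def transform_scr(scr_by_condition):
    """Log-transform and z-score one subject's condition-averaged SCRs.

    `scr_by_condition` maps (phase, condition) labels to mean phasic
    amplitudes. Z-scoring uses the population SD (an assumption).
    """
    logged = {k: math.log10(1.0 + v) for k, v in scr_by_condition.items()}
    values = list(logged.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {k: (v - mean) / sd for k, v in logged.items()}

# Hypothetical amplitudes for one subject.
subject = {
    ("baseline", "CS+"): 0.10,
    ("baseline", "CS-"): 0.12,
    ("generalization", "CS+"): 0.60,
    ("generalization", "CS-"): 0.08,
}
z = transform_scr(subject)
```

Because the z-scoring runs across all conditions and phases of a subject, a negative value marks a cell whose response lies below that subject's overall average.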

Nonlinear modelling and model comparison
We fitted a von Mises function (circular Gaussian) to generalization profiles obtained from subjective ratings, skin-conductance responses and fixation counts at different ROIs by minimizing the negative log-likelihood term in (1), following an initial grid search for parameters:

$$-\sum_{x} \log N\big(D(x) - G(x \mid \theta) \mid 0, \sigma\big) \tag{1}$$

where $x$ represents signed angular distances from a given volunteer's CS+ face; $G(x \mid \theta)$ is a von Mises-like function that was used to model the adversity tuning. It is defined by the parameter vector $\theta$, which codes for the resulting generalization profile; $D(x)$ represents the observed generalization profile for different angular distances; and $N(x \mid 0, \sigma)$ is the normal probability density function with mean zero and standard deviation $\sigma$. The fitting procedure consisted of finding parameter values that minimized the sum of negative log-transformed probability values. Using a log-likelihood ratio test, we tested whether this model performed better than a null model consisting of a horizontal line, effectively testing the significance of the additional variance explained by the model. $G(x)$ was a scaled and shifted version of a normalized von Mises function of the form

$$G(x) = a \cdot V(x \mid K, \mu) + q \tag{2}$$

where $a$ represents the depth of adversity tuning, which corresponds to the difference between peak and baseline responses, and $q$ sets the baseline level. $K$ and $\mu$ control the precision of the tuning and the peak position of adversity tuning, respectively. $V(x)$ is a modified von Mises function that is scaled to lie between 0 and 1 using the following equation:

$$V(x) = \frac{\exp\big(K \cos(x - \mu)\big) - \exp(-K)}{\exp(K) - \exp(-K)} \tag{3}$$
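The fitting logic can be sketched in Python as follows. The sketch implements equations (2) and (3) and the Gaussian negative log-likelihood described above, but it stops at a coarse grid search and sets the noise standard deviation to its maximum-likelihood value for each candidate. The grid values, the synthetic data and the omission of a subsequent numerical optimization are assumptions made for illustration.

```python
import math

def v(x, kappa, mu):
    """Von Mises bump rescaled to lie between 0 and 1, as in equation (3)."""
    return (math.exp(kappa * math.cos(x - mu)) - math.exp(-kappa)) / (
        math.exp(kappa) - math.exp(-kappa))

def g(x, a, kappa, mu, q):
    """Tuning profile of equation (2): amplitude a on top of baseline q."""
    return a * v(x, kappa, mu) + q

def neg_log_likelihood(residuals, sigma):
    """Sum of negative log normal densities N(r | 0, sigma)."""
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + r ** 2 / (2 * sigma ** 2) for r in residuals)

def fit_grid(xs, ds):
    """Coarse grid search over (a, kappa, mu, q); grid values are arbitrary."""
    best = None
    for a in [i * 0.25 for i in range(0, 13)]:
        for kappa in [0.5, 1, 2, 4, 8]:
            for mu in [math.radians(d) for d in range(-90, 91, 15)]:
                for q in [i * 0.25 for i in range(0, 9)]:
                    res = [d - g(x, a, kappa, mu, q) for x, d in zip(xs, ds)]
                    # Maximum-likelihood sigma, floored to avoid log(0).
                    sigma = max(math.sqrt(sum(r * r for r in res) / len(res)), 1e-6)
                    nll = neg_log_likelihood(res, sigma)
                    if best is None or nll < best[0]:
                        best = (nll, a, kappa, mu, q)
    return best

# Synthetic profile over 8 angular distances from the CS+ face (0 rad).
xs = [math.radians(d) for d in (-135, -90, -45, 0, 45, 90, 135, 180)]
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.0]
ds = [g(x, 2.0, 2.0, 0.0, 0.5) + e for x, e in zip(xs, noise)]

nll_model, a_hat, kappa_hat, mu_hat, q_hat = fit_grid(xs, ds)

# Null model: a horizontal line at the mean of the data.
mean_d = sum(ds) / len(ds)
res0 = [d - mean_d for d in ds]
sigma0 = math.sqrt(sum(r * r for r in res0) / len(res0))
nll_null = neg_log_likelihood(res0, sigma0)
lr_statistic = 2.0 * (nll_null - nll_model)
```

A large positive `lr_statistic` indicates that the tuned profile explains the data better than the flat line. Turning it into a p-value requires the difference in the number of free parameters, which is not spelled out here.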

Fixation-pattern similarity analysis
FPSA was conducted on single participants. Condition-specific FDMs (8 faces each for the baseline and generalization phases) were computed by collecting all fixations across trials on a single map, which was then normalized to unit sum. We corrected FDMs by removing the common mean pattern (done separately for baseline and generalization phases). We used 1 - Pearson correlation as the dissimilarity metric. This resulted in a 16x16 dissimilarity matrix per subject. Statistical tests for element-wise comparison of the similarity values were conducted after Fisher transformation of correlation values. The multidimensional scaling was conducted on the baseline and generalization phases jointly using the 16x16 dissimilarity matrix as input (mdscale in MATLAB). Importantly, as this metric is extremely sensitive to the signal-to-noise ratio (Diedrichsen et al., 2011) present in the FDMs, we took precautions that the number of trials in the generalization and baseline phases was exactly the same, in order to avoid differences that would have been caused by different signal-to-noise ratios. To account for the unequal number of trials during the baseline (11 repetitions) and generalization (3 runs x 11 = 33 repetitions) phases, we computed a dissimilarity matrix for each run separately in the generalization phase. These were later averaged across runs for a given participant. This ensured that FDMs of the baseline and generalization phases had comparable signal-to-noise ratios, therefore not favoring the generalization phase for having more trials.

We generated 3 different models based on a quadrature decomposition of a circular similarity matrix. A circular similarity matrix of 8x8 can be obtained using the term M⊗M, where M is an 8x2 matrix of the form [cos(x) sin(x)], and the operator ⊗ denotes the outer product. x represents angular distances from the CS+ face and is equal to 0 for CS+ and π for CS-. Therefore, while cos(x) is symmetric around the CS+ face, sin(x) is shifted by 90°. For the Bottom-up Saliency and Increased Arousal models (Fig 1B and C) we used M⊗M as a predictor together with a constant intercept. For the tuned exploration model depicted in Fig 1D, we used cos(x)⊗cos(x) and sin(x)⊗sin(x) to independently model ellipsoid expansion along the specific and unspecific directions, respectively. Together with the intercept, this model comprised 3 predictors. Finally, the aversive generalization model (Fig 1E) was created using the predictors of the tuned exploration model in conjunction with a two-dimensional Gaussian centered on the CS+ face (4 predictors in total). We tested different widths for the Gaussian and took the one that resulted in the best fit. This was equal to 65° of FWHM and similar to the values we observed for univariate explicit ratings and SCR responses.
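The construction of the model predictors can be illustrated with a short Python sketch. It reads the term M⊗M as the 8x8 matrix product of M with its transpose, so that each entry equals the cosine of the angular difference between two faces, and it builds the Gaussian predictor as the outer product of a one-dimensional Gaussian over angular distance from the CS+ face. Both readings are assumptions, as is the ordering of the faces in steps of 45° starting at the CS+ face.

```python
import math

# Angular positions of the 8 faces: 0 = CS+, pi = CS- (assumed ordering).
angles = [math.radians(a) for a in range(0, 360, 45)]

def outer(u, w):
    """Outer product of two vectors as a nested list."""
    return [[ui * wj for wj in w] for ui in u]

def add(a, b):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

cos_x = [math.cos(a) for a in angles]
sin_x = [math.sin(a) for a in angles]

specific = outer(cos_x, cos_x)        # expansion along the CS+/CS- axis
unspecific = outer(sin_x, sin_x)      # expansion along the orthogonal axis
circular = add(specific, unspecific)  # M (x) M with M = [cos(x) sin(x)]

def gaussian_predictor(fwhm_deg=65.0):
    """Gaussian bump centered on the CS+ face (angle 0).

    Built as the outer product of a 1D Gaussian over angular distance,
    which is one possible reading of the text.
    """
    sigma = math.radians(fwhm_deg) / (2 * math.sqrt(2 * math.log(2)))
    def dist(a):  # signed angular distance to the CS+ face
        return math.atan2(math.sin(a), math.cos(a))
    g1 = [math.exp(-0.5 * (dist(a) / sigma) ** 2) for a in angles]
    return outer(g1, g1)

gauss = gaussian_predictor()
```

The sum of the two quadrature components recovers the circular matrix, which is why fitting them with separate weights allows the geometry to stretch along one axis independently of the other.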
All linear modeling was conducted using non-redundant, vectorized forms of the symmetric dissimilarity matrices. For an 8x8 dissimilarity matrix this resulted in a vector of 28 entries. Different models were fitted as mixed-effects models, where intercept and slope contributed both as fixed and random effects (fitlme in Matlab). We selected mixed-effects models as these performed better than models defined solely with fixed effects on intercept and slope. For model selection, we used the Bayesian information criterion (BIC), as it compensates for an increase in the number of predictors between different models. Additionally, the different models were also fitted to single participants (fitlm in Matlab) and the parameter estimates were separately tested for significance using t-tests.
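How a set of FDMs turns into the 28-entry vector used for linear modeling can be sketched in Python. The sketch computes 1 - Pearson correlation between mean-corrected maps and then keeps the entries above the diagonal. The maps are random toy data with 6 pixels each, and the function names are hypothetical.

```python
import math
import random

def pearson(u, w):
    """Pearson correlation between two equally long vectors."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    cov = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sw = math.sqrt(sum((b - mw) ** 2 for b in w))
    return cov / (su * sw)

def dissimilarity_matrix(fdms):
    """1 - Pearson correlation between mean-corrected, flattened FDMs."""
    n_pix = len(fdms[0])
    mean_map = [sum(f[p] for f in fdms) / len(fdms) for p in range(n_pix)]
    corrected = [[f[p] - mean_map[p] for p in range(n_pix)] for f in fdms]
    n = len(fdms)
    return [[1.0 - pearson(corrected[i], corrected[j]) for j in range(n)]
            for i in range(n)]

def upper_triangle(matrix):
    """Non-redundant entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Toy data: 8 flattened maps, each normalized to unit sum.
rng = random.Random(1)
raw = [[rng.random() for _ in range(6)] for _ in range(8)]
fdms = [[v / sum(f) for v in f] for f in raw]

d = dissimilarity_matrix(fdms)
vec = upper_triangle(d)  # 8 * 7 / 2 = 28 entries
```

The resulting vector can then serve as the dependent variable in a regression on the equally vectorized model predictors.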
For the analysis of the temporal and spatial unfolding of adversity-specific exploration patterns, the same analysis was run, but restricted to the given time windows or fixations. In this analysis, we only included participants who showed significant fear tuning of explicit shock ratings in the generalization phase (n = 61). For the time-windowed approach, windows of 500 ms were used, repeatedly shifted by 50 ms, yielding a rolling-window analysis, while fixation-wise analyses were based on FDMs that included only the given (1st, 2nd, and so on) fixation.
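The two ways of restricting the data can be sketched in Python. The presentation duration of 1500 ms, the fixation records and the rule of selecting fixations by their onset time are assumptions made for illustration, since none of them is specified in this section.

```python
def rolling_windows(total_ms, width_ms=500, step_ms=50):
    """Start and end times (ms) of all windows that fit inside the trial."""
    return [(start, start + width_ms)
            for start in range(0, total_ms - width_ms + 1, step_ms)]

def fixations_in_window(fixations, window):
    """Keep fixations whose onset falls in [start, end).

    Selecting by onset time is an assumption; rules based on temporal
    overlap are equally plausible.
    """
    start, end = window
    return [f for f in fixations if start <= f["onset_ms"] < end]

def nth_fixations(trials, n):
    """Collect the n-th fixation (1-based) of every trial that has one."""
    return [trial[n - 1] for trial in trials if len(trial) >= n]

# Hypothetical 1500-ms presentation with three fixations.
windows = rolling_windows(1500)
trial = [{"onset_ms": 0, "x": 250, "y": 200},
         {"onset_ms": 320, "x": 230, "y": 260},
         {"onset_ms": 710, "x": 270, "y": 255}]
in_first = fixations_in_window(trial, windows[0])
second = nth_fixations([trial, trial[:1]], 2)
```

Each subset of fixations would then be converted to FDMs and passed through the same similarity analysis as the full data.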