Abstract
The present study attempts to identify how trait anxiety, measured as worry-level, affects the processing of threatening speech. Two experiments using dichotic listening tasks were implemented; where participants had to identify sentences that convey threat through three different information channels: prosody-only, semantic-only and both semantic and prosody (congruent threat). We expected different ear advantages (left or right) depending on task demands, information type, and worry level. We used a full Bayesian approach for statistical modelling and analysis. Results indicate that when participants made delayed responses (Experiment 1), reaction times barely increased as a function of worry level, but under time pressure (Experiment 2) worry level induced clear decreases in reaction times. We explain these results in terms of multistep models of anxiety and language, concluding that present results mainly indicate effects threat aversion-related over-attention to threat, and that we do not provide enough evidence for supporting the integration of anxiety and language multiphasic models.
Introduction
Humans can convey emotion information through different channels, and in the particular case of language the manipulation of tone and/or meaning (i.e. prosody and semantics) are common ways to do so. While prosodic information relies on suprasegmental variation of intensity, pitch, voice quality and duration, semantic information relies on segmental information: morphemes (minimal meaningful language units) composed of varying combinations of phonological segments (Liu et al., 2013). These different informational features (suprasegmental information associated with prosody, and segmental information associated with semantics) can develop together in a complex language emission such as an emotional sentence, and can convey emotional information simultaneously (Nygaard et al., 2009; Schirmer and Kotz, 2003). To our knowledge, whether intrinsic affect differences between individuals (e.g. variation in trait anxiety) has differentiable effects on prosody and semantics remains an unexplored problem in language perception and comprehension research. Investigating this possible connection can bring to light possible effects of anxiety on language processing, moving forward the understanding of individual differences in language processing but also refining understanding of speech information properties.
The present study aims to understand the effect of trait anxiety on these information properties of speech. We use dichotic listening (DL) which provides a robust test of functional hemispheric lateralization (Hugdahl, 2011), tapping into features of both speech (language) and anxiety (affect) processing. DL can provide a behavioural test of laterality in such a way that information- and affect-related aspects of processing can be disentangled. Normally, responses to DL tasks that do not involve prosody or emotion indicate a right ear advantage (REA): faster response times and/or higher accuracy for language processing at stimuli presented at the right ear, (Hugdahl, 2011). Differently, DL responses to emotional and/or prosodic stimuli show either diminished REAs or a left ear advantage (LEA) (Godfrey and Grimshaw, 2015; Grimshaw et al, 2003).
The idea of exploring laterality in this way is based on previous theoretical models and supporting evidence indicating that brain hemispheres have different processing functions for both speech’s information features and intrinsic affect. On the language side, evidence suggests the left lateralization of segmental aspects of speech and right lateralization of suprasegmental features of speech (Poeppel, 2003; Poeppel et al., 2007; Zatorre, 2001; Zatorre et al., 2002). On the affect side, the relationship between affect and cognition in anxiety is understood to be mediated by right-lateral prefrontal cortex (Gable et al., 2019). Other approaches distinguish between anxious arousal (physiological hyperarousal) and anxious apprehension (worry), where the first is posited as right lateralized and the second as left lateralized or bilateral (Heller et al., 1997; Nitschke et al., 1999; Spielberg et al., 2013). This could imply that intrinsic lateralization patterns induced by individual differences (e.g. anxiety) could match emotional speech’s lateralization patterns. Hence, different information properties (i.e. semantics or prosody) conveying similar emotions (i.e. threat) could affect anxious people in different ways by enhancing or dampening their inherent lateralization patterns when processing emotional stimuli. This motivates the question: what is the difference between semantic and prosodic comprehension in natural emotional expression as processed by anxious people? Before answering this question, we need to find out the points of connection between speech, emotional language and anxiety processing, if any.
Emotional Language Lateralization
Neuroscientific research has observed that information conveyed through prosody or semantics/syntax is processed via differently lateralized brain routes (Belin et al, 2004). Other findings indicate that this difference might be due mainly to the emotional content of language stimuli (Liebenthal et al., 2005). These differences, however, may not be exclusive. Indeed, if emotional language lateralization is considered as a phasic process, then differences in lateralization might change at any point of the processing time-course (Schirmer and Kotz, 2006). Hence, some of these differences might be related to informational processing and others to the processing of affect/cognition. Therefore, hemisphericity patterns might be due to both emotional and speech processing, but a particular observed left, right or bilateral orientation might be evident depending on the observed time phase. One model addressing this issue is the multistep model of emotional language, which proposes three main processing stages: early stage perceptual processing, mid stage recognition processing, late stage evaluation processing (Kotz and Paulmann, 2011).
Under this model, early stages involve the processing of acoustic properties (purely acoustic information), where greater right hemisphere (RH) engagement would be associated to prosodic processing and left hemisphere engagement (LH) would be associated to phonological processing (Poeppel, 2003; Zatorre, 2001). This leads to the interpretation that LH might process segmental (phonologically composed words) information better, while RH privileges suprasegmental (prosody) information. Mid stages might involve the emotional recognition of stimuli (e.g. integration of previously processed information), implying greater involvement of RH or LH depending on stimulus type and/or conveyed emotion (Schirmer and Kotz, 2006). Late stages would be associated to informational integration and evaluation of emotional stimuli (Kotz and Paulmann, 2011). Another crucial aspect of the model is that it also considers information transferring between hemispheres (Kotz and Paulmann, 2011). Mechanism of callosal relay have been proposed as important aspects for RH to LH (and vice versa) communication of prosodic and syntactic information (Friederici et al, 2007), and also interhemispheric communication of emotional prosody processing (Ross et al, 1997). In addition, callosal relay mechanisms have been proposed as an explanation for different effects of emotional semantics and prosody processing in a dynamic model of DL (Grimshaw et al, 2003). Hence, the observation of bilateral involvement does not necessarily mean that both hemispheres are processing the same information/task, and the observation of unilateral processing does not necessarily mean that the contralateral hemisphere does not play a role.
With all this in mind, there are some relevant issues that this model does not take into account. First, the process does not need to end at an evaluation stage, as natural responses to emotional stimuli are, in general, behaviourally oriented (Vuilleumier, 2005). Thus, a fourth stage associated to goal-orientation might be required to fully grasp emotional language processing. Goal-orientation can be understood as the interruption or pursuing of an organism’s current goals (Bar-Haim et al., 2007), such as the interruption of current behaviour after the perception of a threatening stimulus in order to re-assess situation and environment. Multistage models of intrinsic affect (i.e. anxiety) have proposed that goal-orientation comprises a fourth stage following three initial stages: pre-attentive evaluation of threat, re-orientation of attention, and threat evaluation (Bar-Haim et al., 2007). This is a remarkable match with models of emotional language processing (Kotz and Paulmann, 2011), which match well with the first three of these stages, characterized in the language literature as: identification, recognition, and evaluation. Hence, after evaluation, a deliberation (goal-orientation) stage is a theoretically relevant (if not necessary) theoretical addition. In this sense, tasks that induce overt behaviour should make such a deliberation stage evident, in which participants need to decide about their responses after evaluating the stimuli. Whether this stage has idiosyncratic lateralization patterns as proposed for the previous three stages (Kotz and Paulmann, 2011) is something that has not been consistently explored yet.
Anxiety and Threat: Affect Lateralization
The strong effects of anxiety over deliberation processes, such as those induced by worry (Corr and McNaughton, 2012; McLaughlin et al., 2007), could imply that people high in trait anxiety process and respond differently to threatening semantics or prosody, including diversified lateralization patterns. The lateralization of affect might not only depend upon processing the emotional content of a particular stimulus, of any type (not only language), that can induce a specific emotion (e.g. threat-inducing fear or anxiety), but also upon individual differences between participants that may also cause different lateralization patterns. This is especially indicated by studies that demonstrate such variation not only when processing emotional stimuli but also during resting state (Nietschke et al., 2000; Engels et al., 2007).
Variation in lateralization patterns related to anxiety have been discussed from a number of different theoretical perspectives. First, anxiety has been proposed to be elicited by a behavioural inhibition system (BIS), which stops approaching behaviour of the organism in order to allow this organism to scan the environment in search of potential threat (McNaughton and Gray, 2000; Corr and McNaughton, 2012). Second, in the approach-withdrawal model, LH would be more engaged in approach-related emotions, while RH would show more involvement on withdrawal-related emotions (Davidson, 1992). This has been also captured by the valence-arousal model (Heller et al, 1997), where two types of anxiety are distinguished: anxious apprehension (worry-related) and anxious arousal (physiological hyperarousal), processed by LH and RH respectively.
In effect, models of anxiety processing propose BIS as a conflict resolution system (Corr and McNaughton, 2012), where anxiety can be interpreted as a plausible intermediate state between approach and withdrawal, or calm and fear. Here, behaviour inhibition and arousal increase in preparation to approach/withdraw responses when possible or needed (McNaughton and Corr, 2014). In other words, behavioural inhibition for environmental scanning might increase arousal levels (McNaughton and Gray, 2000), which can induce fear-related responses if stimuli within the environment appear threatening enough. Thus, the interplay of lateralization patterns associated to worry and arousal might develop differently through the time-course of stimulus evaluation. Indeed, evidence from a functional magnetic resonance imaging (fMRI) study indicates that emotional language induces different lateralization responses, at different processing stages, for different types of anxiety (Spielberg et al, 2013). Where anxious apprehension is associated to a later and continued involvement of LH structures, interpreted as over-engagement with threat (e.g. rumination), and matching evaluation (mid-phase) and orientation/deliberation (late phase) stages (Bar-Haim et al., 2007). Differently, anxious arousal was associated with a faster and of shorter duration RH response interpreted as over-attention to threat, thus matching pre-attentive (early) and re-orientation (early-mid) stages (Bar-Haim et al., 2007).
Electroencephalography (EEG) research using the event-related potential technique (ERP), which offers high temporal resolution, has observed this over-attention and over-engagement response when threatening faces are used as stimuli (Eldar et al, 2010). Over-engagement with threat has also been observed when people with generalized anxiety disorder respond to threatening images (MacNamara and Hajcak, 2010). Furthermore, recent research has observed that socially anxious people present a right lateralized over-attention response to threatening words (Wabnitz, 2015). Hence, trait anxiety might directly affect the processing of emotional language. Previous EEG evidence indicates that anxiety has an effect on the recognition of prosody (Pell et al, 2015). However, not much is known about the interaction between emotion (threat) as conveyed through different information channels (prosody, semantics) and intrinsic affect (anxiety). If phasic lateralization patterns are integrated in a multistage model of anxiety (Bar-Haim et al, 2007) and this is compared to a multistep model of emotional language (Kotz and Paulmann, 2011), then it might be possible to predict very specific behavioural responses for anxious and non-anxious people. More precisely, there is a possible overlap between language processing mechanisms and anxiety processing mechanisms, which could become evident by comparing how people with higher trait anxiety processes different types of speech (i.e. prosody and semantics) as compared to less anxious people.
Present Experiment
As previously mentioned, emotional and/or prosodic stimuli show either diminished REAs or a left ear advantage (LEA) in some DL studies (Godfrey and Grimshaw, 2015; Grimshaw et al, 2003), indicating a RH processing preference for emotion and/or prosody. However, few dichotic listening (DL) experiments have researched the effects of anxiety on emotional speech processing (Gadea et al, 2011). They either use speech/prosody as an emotion-eliciting stimulus or use DL mainly as an attentional manipulation technique (Bruder et al., 1999; 2005; Leshem, 2018; Peschard et al., 2016; Sander et al., 2005). As a result, they are limited in the extent to which they reveal the relationship between dynamic variations in emotion language processing (prosody/semantics). Instead, studies focusing on the dynamic properties of emotional language, whether using DL or not (e.g. measuring laterality through electrophysiological measures), do not tend to consider individual differences (e.g. Godfrey and Grimshaw, 2015; Grimshaw et al., 2003; Kotz and Paulmann, 2007; Paulmann and Kotz, 2012; Techentin et al., 2009; Wabacq and Jerger, 2004). Therefore, on one side of the picture speech stimuli are typically treated as generic threatening stimuli, so possible differences induced by the informational features of speech that may vary over time are overlooked. On the other side, participants are typically regarded as a homogeneous group, so possible differences induced by anxiety-related processing, that may vary over time and may differ across informational features are overlooked.
Another important thing to consider is that in natural speech, emotional prosody might not be constrained to a single word, as is the case in the experimental manipulations of most of the studies we have cited above. However, semantics is always constrained by sentence’s structure and lexical meaning. In other words, while a lexical item needs to be identified within a sentence in order for emotional semantics to be recognized, prosody might be expressed from the beginning of a sentence. This makes difficult to generalize from word level, or highly controlled sentences, to real world emotional utterances.
To address these issues, we designed two web-based DL experiments, using semi-naturalistic sentences in order to ensure dynamic language processing beyond the single word level. Participants were asked to discriminate between neutral and threatening sentences (the latter expressing threat via semantics, prosody or both), in a direct-threat condition: identifying whether a threatening stimulus was presented to the left or right ear, and in an indirect-threat condition: identifying whether a neutral stimulus was presented the left or right ear. Participant’s anxiety level was measured by using a psychometric scale. By so doing we were able take advantage of past studies researching the attentional effects of threatening language on anxiety and of studies researching the dynamics of speech’s informational properties within a single study.
Both speech processing and anxiety literature seem to converge on theoretical perspectives incorporating multistep models, so we designed two experiments to tap into different points in processing for which individual variation in anxiety may affect speech. In particular, we aimed to differentiate responses made at late evaluative stages (delayed response) vs. responses made at earlier attentive stages (online response) as early over-attention to threat (Bar-Haim et al., 2007) might affect earlier prosody/semantic lateralization patterns (Kotz and Paulmann, 2011), and later over-engagement with threat (Bar-Haim et al., 2007) might affect later emotional language evaluation stages (Kotz and Paulmann, 2011). Thus, Experiment-1 required participants to wait until after sentences’ offset to respond (delayed response), and Experiment-2 required participants to respond during sentence presentation (online response).
For Experiment-1 we hypothesize that anxious over-engagement with threat at mid-late evaluative stages (Bar-Haim et al., 2007) should increase left hemisphere (LH) engagement (Spielberg et al, 2013), disturbing possible LH to right hemisphere (RH) information transferring (Grimshaw et al., 2003; Kotz and Paulmann, 2011). Hence, we predict that a left ear advantage (LEA), usually observed in DL experiments as an effect of prosody/emotional stimuli (Godfrey and Grimshaw, 2015; Grimshaw et al., 2003), should decrease as a function of anxiety, especially for semantic threat. This implies slower and less accurate responses for anxious people at their left ear when responding to semantically threatening but prosodically neutral stimuli (which we named Semantic stimuli). As present sentences are naturalistic, they have varied durations, but are long on average (∼2s). This implies that answering after sentence’s offset emphasizes late stage processing, understood to start at around 400ms (Kotz and Paulmann, 2011), followed by deliberation (∼600ms). This late stage could be sustained for a long period of time, as it is characterized by a cyclic BIS process (McNaughton el at., 2013). Hence, if trait anxiety extends deliberation through excessive worry, then responses locked to sentence’s offset should be slower.
For Experiment-2 we expect that, as responses are forced to be faster (online), prosody should induce the most noticeable effects, as online responses may overlap with early-mid emotional processing stages (Kotz and Paulmann, 2011). Therefore, we hypothesize that higher anxiety should reduce LH involvement (Spielberg et al., 2013) due to over-attention to threat effects, characteristic of earlier-mid processing stages (Bar-Haim et al., 2007). Hence, we predict an enhanced LEA for highly anxious participants, especially for prosodically threatening but semantically neutral stimuli (which we named Prosody stimuli). Thus, faster and more accurate responses for anxious people at their left ear when attending prosodic stimuli. In other words, as participants are required to answer as fast as possible, and prosody is readily identifiable in each sentences, but semantics required the identification of lexical items, processes before quick responses (∼100, ∼200ms) should take precedence for prosody, while semantics might be affected by later processes (∼400ms) as responses could be naturally slower independent of anxiety.
Methods
Experiment 1: Delayed Response
Participants
Participants were recruited using Prolific (prolific.ac). Only participants reporting being right-handed, having English as first language, without hearing and neurological/psychiatric disorders, and using only a desktop or laptop to answer the experiment were recruited. After exclusion, due to poor accuracy or not finishing the task properly, 44 participants (mean age = 31.7, 27 females) were retained (26 excluded). Participants were remunerated on a £7.5/hour rate. All participants gave their informed consent before participating. It is important to clarify, the web-based nature of the experiment implies that task compliance levels could be low, as there is no direct control over participants meeting requested requirements (e.g. appropriate headphones) or performance (e.g. answering randomly). For this reason, and also to avoid issue related to possible impulsive behaviour or to age-related audition loss, we decided to accept participants well above the adolescence threshold and amply below critical ages for audition loss. Hence, only participants between 24 and 40 years old were accepted to take part.
Materials
Four types of sentences were recorded: Prosody (neutral-semantics and threatening-prosody), Semantic (threatening-semantics and neutral-prosody), Congruent (threatening-semantics and threatening-prosody), and Neutral (neutral-semantics and neutral-prosody). We first extracted semantically threatening sentences from movie subtitles by matching the subtitles them with a list of normed threatening words from the extended Affective Norms for English Words (ANEW) (Warriner et al., 2013). For the present study, any word over 5 points in the arousal scale, and below 5 points in the valence and dominance scales was considered threatening (these scales ranged from 1 to 9 points). Every word with less than 5 arousal points and between 4 and 6 (inclusive) valence points was considered neutral. Words’ frequencies were extracted from SUBTLEX-UK (van Heuven et al., 2014), only sentences containing words with Zipf log frequencies over 3 were included. Before recording, ten participants rated the threat level of each visually presented sentence by using a 0-8 Likert scale presented in Gorilla (gorilla.sc). Sentences’ mean ratings were analysed using the Bayesian Estimation Superseeds t-test (BEST) method (Kruschke, 2013). Threatening semantics’ ratings (m = 5.48) were considerably higher than neutral semantics’ ratings (m = 0.3). See Annex for detailed results.
After this, sentences were recorded in an acoustically isolated chamber using a RODE NT1-A1 microphone by a male English speaker. The speaker was not a professional actor or voice actor (i.e. untrained or naïve speaker). The speaker was instructed to speak in what he considered his own angry threatening/angry or neutral voice for recording Prosody/Congruent and Semantic/Neutral sentences respectively. Sentences were not repeated across type (i.e. each type has a unique set of sentences). Neutral dichotic pairs were also unique across conditions (480 different sentences). Due to a technical problem several sentences were recorded with very low amplitude. Therefore, sentences were normalized and cleaned from noise in Audacity (Audacity Team, 2019, audacity.org). Figure 1 shows oscillograms and spectrograms of four example sentences, Table 1 in the Results section also includes a summary of stimulus properties by condition, and the full set of materials can be downloaded from our Open Science Framework (OSF) repository (link in the Data Statement section).
Sentences’ average length is 1720.65ms, and acoustic measures were extracted using Parselmouth (Jadoul et al., 2018) Python-Praat interface. Two relevant measures were compared via Bayesian estimation supersedes the t-test (BEST): Median pitch (MP: median F0 across whole sentence) and Hammarberg index (HI: maximum energy differences between the 0-2000hz and 2000-5000hz ranges). While MP is similarly high for Prosody (m ≈ 133Hz, SD ≈ 12Hz) and Congruent (m ≈ 130Hz, SD ≈ 9Hz) and similarly low for Semantic (m ≈ 90Hz, SD ≈ 2Hz) and Neutral (m ≈ 96Hz, SD ≈ 4Hz), HI is similarly low for Prosody (m ≈ 23Hz, SD ≈ 4Hz) and Congruent (m ≈ 24Hz, SD ≈ 5Hz) and similarly high for Semantic (m ≈ 32Hz, SD ≈ 5Hz) and Neutral (m ≈ 34Hz, SD ≈ 4Hz). Interestingly, semantic ANEW measures show a similar pattern for arousal and valence respectively. Arousal is similarly high for Semantic (m ≈ 6.0, SD ≈ 0.9) and Congruent (m ≈ 5.9, SD ≈ 0.9) and similarly low for Prosody (m ≈ 3.6, SD ≈ 0.9) and Neutral (m ≈ 3.9, SD ≈ 0.7), and Valence is similarly low for Semantic (m ≈ 2.9, SD ≈ 1.0) and Congruent (m ≈ 3.3, SD ≈ 1.2) and similarly high for Prosody (m ≈ 5.9, SD ≈ 0.8) and Neutral (m ≈ 5.8, SD ≈ 0.9). See annex for full results of BEST analyses, including ANEW measurements.
This may indicate that as arousal increases and valence decreases, sentences become more semantically threatening, namely convey offense or pain/harm (Borelli et al., 2018; Ho et al., 2015). Similarly, as MP increase (measuring pitch of sentence) and voice HI decreases (measuring voice quality of sentence) the sentence should become more prosodically threatening, express hot anger (Banse and Scherer, 1996; Hammerschmidt and Jürgens, 2007), thus conveying threat. To check this, a random subset of 7 prosody-only sentences was compared to a random subset of 7 neutral sentences in an online rating questionnaire in the same manner as semantic threat. Ten participants rated these spoken sentences in Gorilla (gorilla.sc). Results showed that threatening prosody (m = 5.6, SD = 2.3) is rated as more threatening than neutral prosody (m = 0.71, SD = 1.3). Ratings for semantic threat on written sentences also show that participants (n=22) rate threatening semantics (m = 5.6, SD = 2.2) as more threatening than neutral (m = 0.5, SD = 1.2). An ordered-logistic regression was used to statistically asses these ratings, for results see the Annex.
Next, sentences were paired using Audacity: sentences were paired such as their durations were as similar as possible. Silences between words were extended, never surpassing 40ms, to match sentences’ latencies as closely as possible. After this, sentences were allocated to one of the stereo channels (left or right) of the recording; each pair was copied with mirrored channels. A silence (∼50ms) was placed at the beginning and at end of each pair. This resulted in a total of 480 pairs where 80 sentences of each type (congruent, semantic, prosody) were each paired with a neutral sentence of the same length twice, so every sentence was presented once at each ear.
Procedure
Before starting the experiments, participants answered the Penn State Worry Questionnaire (PSWQ) (Meyer et al., 1990) to assess their worry-level, and the Anxious Arousal sub-scale of the Mood and Anxiety Symptoms Questionnaire (MASQ-AA) (Watson et al., 1995) to assess their arousal level. This follows previous approaches (Nitschke et al., 1999), with the difference that we used PSWQ scores as continuous predictor instead of splitting participants between high and low anxiety groups. PSWQ results indicated a distribution which is varied enough in terms of worry level (mean = 47.31, median = 48.0, range [33, 67]). PSWQ measures worry in a scale ranging from 16 to 80 points (median = 48 points), showing a consistent normal distribution in tested samples (mean close to median, as in our samples), and has been shown to have high internal consistence and validity (for details see: Meyer et al., 1990). MASQ-AA scores indicate that participants showed low levels of arousal, as none of them marked above the median. According to previous literature (e.g. Heller et al., 1997; Nitschke et al., 1999; Spielberg et al., 2013), high scores of MASQ-AA would be indicative of trait anxious arousal (hyperarousal), while high scores of PSQW would indicate trait anxious apprehension. Therefore, we can safely assume that our sample does not include participants with high or trait hyperarousal. So, we only included PSWQ scores in the analyses.
After a practice session, participants were randomly assigned to a list containing half of the total number of dichotically paired sentences (threat-neutral pairs) per threatening type (Prosody|Neutral, Semantic|Neutral, Congruent|Neutral), that is 40 pairs per type (120 in total). Sentences’ lists were created previous to the experiment using randomly selected sentences from the total pool. Sentences were presented randomly to participants. In one half of the study they were instructed to indicate at which ear they heard the threatening sentence by pressing the right or left arrow keys (direct-threat condition). In the other half of the study they were instructed to respond in the same way, but indicating which ear they heard the neutral sentence in the dichotic pair (indirect-threat condition). This was intended to address attention effects (Aue et al., 2011; Peschard et al., 2016). Starting ear (left or right) and starting condition (direct- or indirect-threat) were counterbalanced. Participants were told to answer, as fast as possible, only when the sentence finished playing and a bulls-eye (target) image appeared on the screen. A 1400ms inter-stimulus-interval (ISI) was used, and the target image stayed on the screen during this period.
Analysis
Reaction time (RT) data were recorded in milliseconds, locked to sentence’s offset. Accuracy was coded as correct=1 and else=0 (including misses and false alarms). Participants with hit rates below 70% were excluded, as lower thresholds are too close to chance. This is mainly due to the nature of web-based experiments, where compliance levels cannot be more directly controlled. Thus, we believe that using too low exclusion criteria (e.g. ∼50% or chance) is not methodologically warranted, as we cannot attest for how much variance or bias is added by unidentified non-compliance. Moreover, by setting a higher criterion for inclusion, we ensure that participants are understanding the sentence content sufficiently for the various proposed stages of processing to occur. Two Bayesian hierarchical models were built for reaction time (RT) and accuracy. The RT model, shown on Figure 2, was the basic model structure for all analyses, based on Kruschke (2015) and Martin’s (2018) guidelines. The model for indirect-threat RT was identical excepting the number of observations (obs. = 5,427). Models for accuracy (Figure 2’s lower panel) also show a different number of observations, where indirect-threat = 5,767 obs. Differences in the number of observations are due to the fact that RT data used only correct responses; also, overlapping responses (those going beyond the ISI), were treated as false alarms.
RT models used a robust regression (Kruschke, 2015) in order to account for outliers through a long-tailed Student-t distribution. In this way, RTs that are implausibly fast or implausibly slow do not need to be removed, but can be dealt with statistically. Both accuracy and RT models were sampled using Markov Chain Monte Carlo (MCMC) No U-turn Sampling (NUTS) as provided by PyMC3 (Salvatier et al., 2016). Four chains of 3000 tuning steps and 2000 samples were used for RT models and four chains of 4000 tuning steps and 4000 samples were used for accuracy models. Plots and model comparisons were produced using Arviz (Kumar et al, 2019) and Matplotlib (Hunter, 2007).
Presently, we are interested in a basic science interpretation of our results rather in an applied science interpretation where threshold decisions are necessary (see Kruschke, 2018). For this reason, we sill focus on the magnitude of effects of regressions’ estimates and on the certainty of these estimates. To account for this, we provide the highest density intervals (HDIs), sometimes referred as credible intervals, for all relevant measures. We interpret overlapping HDIs as less certain effects or no-effects if overlap is wide or total. More information about estimates can found as summary files in our OSF repository (link in the Data Statement section).
Experiment 2: Fast Response
Experiment 2’s methods were the same as Experiment 1’s methods, same inclusion criteria, same platform, and same materials. Again, the arousal scale did not show any scores above the scale’s median. PSWQ scores indicated a sufficiently varied distribution (mean = 45.22, median = 45.0, range = [26,61]). The only elements that changed from Experiment-1 were the following: 1) As this experiment is understood as more difficult as participants have to answer as fast as possible (before sentence ending) to a widely varied set of sentences, and they are compelled to refrain from any answer as soon as sentences end, accuracy rejection threshold was relaxed to 60% (slightly closer to chance). Given this, 24 participants were excluded and 52 participants (mean age = 31, 24 females) were kept for the final analysis. 2) Participants were instructed to answer, as fast and as accurately as possible, before the sentence finished playing, and to withhold any response when a stop sign image appeared on the screen after sentences’ end. 3) The same robust regression model was applied without duration, which would be inappropriate as participants answer before offset (incomplete sentence’s duration).
Results
Experiment 1: Delayed Response
All models sampled properly ( ≤ 1, ESS > 400); energy plots, traceplots and autocorrelation plots also indicate good convergence. Plots and results from these checks, including raw data and full summaries of parameters and conditions, can be found in our OSF repository (link in the Data Statement section).
Reaction time (RT) results from the direct-threat task indicate that worry, as measured by the Penn State Worry Questionnaire (PSWQ), has little to no effect on Congruent sentences or ear, with all regression estimates remaining around 400ms. Prosody sentences at the left ear show a similar pattern, but Prosody at right ear tends to increase around 115ms from the lowest worry score (33 points) to the highest worry score (67 points). Semantic estimates show to be higher overall, indicating slightly slower responses respect to other conditions (around 50 to 80ms), but they show little increase as a function of worry. Note that due to HDI overlap, within and across conditions, these increases cannot be considered certain, where right ear Prosody is the less uncertain increase. Table 2 and Figure 3 summarise these results.
In the indirect-threat task, estimates show to be similar, but RT increases seem to be slightly bigger for Congruent. Prosody and semantic conditions also show similar estimates. However, in this case the most certain increase is for Prosody at the left ear that shows the biggest and most certain increase, around 140ms from lowest to highest worry score. See Table 3 and Figure 4 for summaries of these results.
Accuracy results for the direct-threat task indicate that responses to all conditions are estimated to be over 70% accurate and, excepting Semantic, they show a small but uncertain increase. Conditions at the right ear show to be around 10% less accurate respect to contralateral ear, but with substantial HDI overlap. See Table 4 and Figure 5 for summaries.
Similarly, accuracy results for the indirect-threat task indicate that responses to all conditions are estimated to be over 70% accurate and. In this case, right ear shows small and relatively certain increases for Congruent and Semantic at the right ear (around 5% to 8%), and Prosody tends to slightly decrease at the left ear (around 14%). All other conditions show little to no effect of worry or ear. Table 5 and Figure 6 summarise these results.
Experiment 2: Fast Response
Models for Experiment 2 did not include duration in the regression, as the relationship between duration and worry level is not clear, as participants must always answer before the end of each sentence. Again, all effects show good precision and all models sampled properly ( ≤ 1, ESS > 400), with energy plots, traceplots, and autocorrelation plots showing good convergence (for images see our OSF repository, link in the Data Statement section).
Results from the direct-threat task indicate that RTs consistently decrease as a function of worry for all ears and conditions. Congruent shows some HDI overlap, indicating less certain decreases. Instead, Prosody and Semantic decrease between 170ms and 213ms. Semantic and Congruent show faster RTs respect to Prosody, with differences ranging around 200ms to 300ms. Note that Prosody and Semantic, though showing wider HDIs at worry score extremes (less precision), show no HDI overlap, indicating better certainty for estimated decreases. See Table 6 and Figure 7 for summaries of these results.
In the indirect-threat task, estimates show much smaller decreases in RT as a function of worry. Where HDI overlap is present in all conditions. Again, Congruent and Semantic show smaller RTs respect to prosody, in a similar range as previously observed. Table 7 and Figure 8 contain summaries of these results.
Accuracy results for the direct-threat task indicate that responses to all conditions are estimated to be over 59% accurate. Congruent shows almost no change as a function of worry or ear. Prosody and Semantic at the left ear show decreases of around 8-9% respectively, but decreases of around 14% to 19% respectively at the right ear. Note that the latter show no HDI overlap. See Table 4 and Figure 5 for summaries.
Accuracy results for the indirect-threat condition also indicate overall 59% accuracy. All conditions seem to decrease as a function of worry (less than 10%), with Semantic at the right ear being the only condition without HDI overlap and showing a decrease of around 12% from lowest to highest worry levels. See Table 9 and Figure 10 for summaries of these results.
Discussion
Results from the delayed response experiment (Experiment 1) indicate small effects of worry on RT and accuracy. In the direct-threat task RTs tend to increase slightly more at the right ear as a function of worry, in particular for Prosody. In the indirect-threat task, RTs increase generally more as a function of worry, in particular for the left ear and the Prosody condition. However, for both tasks, effects are considerably small as evidenced by relatively high uncertainty and HDI overlap. Accuracy results from direct-threat indicate small increases as a function of worry for all conditions, with slight but uncertain underperformance at the right ear for all sentence types. Results from indirect-threat indicate less certainty and a less coherent pattern, which may be indicative of no systematic effect. Results from the fast response experiment (Experiment 2) showed the opposite pattern. Direct-threat results indicate that RTs become faster as a function of worry. These effects are generally certain for Prosody and Semantic types at both ears. At the indirect-threat task, results indicate that effects are considerable reduced for all conditions, namely RTs decrease less. Direct-threat accuracy tends to decrease as a function of worry, with more certain effects for right ear Prosody and Semantic. Indirect-threat results indicate a reduction of effects and certainty, but only for left ear Prosody and right ear Semantic.
For present purposes we will take the most conservative approach. That is, we uphold the most straightforward interpretation, which takes all effects into account simultaneously due to their involvement into an interaction. This simply indicates that Experiment 1 (delayed response) results need to be considered as generally uncertain, and only Experiment 2 (fast response) direct-threat results can be considered relevant. In the latter task, Congruent RT results do not show very certain effects, but RTs of Prosody and Semantic at both ears tend to decrease up to around 200ms as a function of worry (PSWQ score), Prosody RTs are 100ms greater overall respect to other conditions (though with some HDI overlap). For the same experiment and task, accuracy shows to relevantly decrease only for Prosody and Semantics at the right ear. This implies that our hypotheses indicating ear preferences as distinct by type cannot be supported. However, our hypothesis on differing effects of fast responses is partially corroborated. This, however, is linked to an effect of congruency rather than emotional information type. In other words, when threatening prosody and semantics match (Congruent type), effects of worry are minimal, and when they do not match (Prosody and Semantic), worry induces faster responses (smaller RTs). Furthermore, this can evidence an accuracy-speed trade-off as faster responses tend to be associated with decreased accuracy, but only at the right ear.
In order put the present results in context, it is important to recapitulate important aspects that differentiate the current experiments from previous relevant studies: 1) The use of worry-level as a continuous variable. Worry is associated with anxious apprehension (Heller et al, 1997), which implies more chances of participants over-engaging with threat. 2) Stimuli were semi-naturalistic sentences, providing stronger contextual effects. In addition, their longer durations can facilitate engagement with their content. 3) Information channels were manipulated to disentangle effects of semantics and prosody from effects of emotional expression (Kotz and Paulmann, 2007). 4) The use of two tasks measuring responses directed to threatening or neutral stimuli (direct vs. indirect threat, e.g. Sanders et al, 2005) helps to check whether attention effects could be inducing different response patterns. 5) Two experiments were implemented to verify whether answering after sentences’ end or during sentence presentations (delayed vs. fast) can influence laterality patterns by tapping into different moments of a multistep emotional language processing mechanism (Kotz and Paulmann, 2011).
With this in mind, it is important to carefully interpret the lack of clear laterality (ear) effects for most tasks. Weak ear effects might be explained by the great variability between items and the high duration (also very variable) of sentences. However, the lack of sensitivity of DL when more naturalistic stimuli/context are provided cannot be discarded as a possible explanation. If DL effects are task dependent (Godfrey and Grimshaw, 2015), increased naturalness on stimuli and context can bring out a myriad of bilateral processing patterns that might make ear advantages disappear on the long run when prolonged auditory stimuli are listened to. Although there is previous evidence suggesting a right lateralized pattern for prosody vs. semantic evaluation in an EEG experiment (not considering anxiety), using a congruency (not DL) task with sentences as stimuli (Kotz and Paulmann, 2007), further experimentation using a similar paradigm has not observed this pattern (Paulmann and Kotz, 2012). Although this pattern is explained by the strong association between pitch recognition and RH engagement (Kotz and Paulmann, 2007; Zatorre et al., 2002), there are other frequency and spectral features that might be important for recognizing both threatening and neutral sentences (Banse and Scherer, 1996; Hammerschmidt and Jürgens, 2007; Liu et al., 2013; Xu et al., 2013; Zatorre et al., 2002). This could imply that distinguishing prosody and semantics might be a continuous process that can have diversified effects even during sentence presentation.
Indeed, by manipulating angry prosody changes at the beginning and end of sentences, an EEG study has observed that when prosody changes from angry to neutral within sentences, processing is more effortful (Chen et al, 2011). This might indicate that the rich acoustic nature of prosody might be detected quickly but resourcefully. Recent EEG research has observed that anxious people present ERP differences at both early and late processing stages when answering to threatening prosody and non-language vocalizations (Pell et al, 2015). This is consistent with the notion of early over-attention and later over-engagement, and indicates that behavioural responses might change given early or late variations in threat.
Another possible explanation is callosal relay (Atchely et al., 2011; Grimshaw et al, 2003), where increased anxiety would disrupt RH to LH callosal information transferring of threatening prosody. It has been proposed that callosal relay is highly relevant for language informational and emotional processing (Friederici et al, 2007; Kotz and Paulmann, 2011; Steinmann et al., 2017). Hence, interference at one hemisphere (e.g. rumination or worry impacting LH) can have an effect on information transferring to the other. Thus, callosal relay effects could have a relevant impact on how DL tasks are processed, subject to both top-down and bottom-up effects (Westernhausen and Hugdahl, 2008), which is particularly relevant when laterality effects induced by acoustic or lexical properties need to be disentangled from those induced solely by emotional processing (Grimshaw et al, 2003; Leshem, 2018).
These patterns indicate that our extension of dichotic listening models (Grimshaw et al., 2003) does not guarantee laterality effects induced by sentences’ information type. Also, our prediction of a strong effect of worry level (trait anxiety), affecting emotional language processing due to possible over-engagement with threat (Bar-Haim et al., 2007; Spielberg et al., 2013), was not strongly supported. It was proposed that delayed responses facilitate over-engagement with threat due to the long latency between sentence presentation and response. This, together with the high variability in sentences’ durations and content might have nullified ear and/or type effects and may not be enough to induce a sufficiently strong over-engagement-related delay in responses.
Although there is a trend indicating a response slow-down as a function of worry for direct-threat right ear Prosody and indirect-threat left ear prosody, more certain results are required to confirm these patterns. This implies that this trend may not be replicated in future studies, thus we do not develop further interpretation. Furthermore, in Experiment 2, we failed to observe any clear Type or ear effect when responses were forced to be fast (during sentence), besides a small accuracy effect on ear at the direct-threat task for Semantic and Prosody only. Partially aligned with our prediction, the pattern of Experiment 2 (fast response) is the opposite as in Experiment 1 (delayed response). That is, RTs decrease as a function of worry (apprehensive anxiety) and accuracy also decreases. This, however, may evidence an accuracy-speed trade-off related with a general effect task effect (Robinson et al., 2013), where more anxious participants tend to sacrifice accuracy to answer faster to stimuli presented under pressure. Thus, it is difficult to corroborate our hypothesis about specific early or mid-early effects of emotional speech. First, RT increases are associated with information type (semantic or prosodic), but with congruency. Second, the effects happen irrespective of ear, giving no evidence of lateralisation effects. Even so accuracy tends to show a more consistent effect at the right ear only, this does not match a hypothetical RH privilege for either prosodic or emotional content (Kotz and Paulmann, 2011).
Similarly, previous research using single words, dichotically presented as direct- and indirect-threat (or anger), and measuring anxiety, did not find differences in RT for left or right ears (Sander et al., 2005; Leshem, 2018; Peshard, 2016), but did find differences in attention focus per ear. Present results do not indicate a clear distinct pattern on indirect-threat tasks. Recent research (Leshem, 2018) did not find effects of trait anxiety on ear either; present results, going even further, evidence a precise pattern of weak or negligible interactions between ear and worry (trait anxiety). Although, it is also important to emphasize that the absence of other effects might be induced by stimuli’s high variability in length and content. This lack of clear ear effects makes difficult to integrate DL literature to explain whether this RT/accuracy decrease is actually associated with effects of anxiety on early-mid speech processing stages of language processing, as we proposed based on multistep models of emotional language (Kotz and Paulmann, 2011).
Given this, our results suggest that any type of threatening language, attended either directly or indirectly, is affected by apprehensive anxiety only under task pressure (i.e. fast response). Therefore, our proposal of adding a fourth stage to a multistep model of emotional language (Kotz and Paulmann, 2011) is not well supported. Previous research has found that non-language simple threatening stimuli (e.g. noise) can indeed induce similar over-engagement effects (slower RTs) in association with behavioural inhibition but not with trait anxiety (Massar et al., 2011), while other studies indicate that induced anxiety slows down RTs irrespective of stimulus emotional content (Aylward et al., 2017). Presently, by using longer and more naturalistic stimuli, we have observed the opposite pattern. Only direct attention (answering directly) to threat induces a speed-up in RTs and a decrease in accuracy when fast responses are required and stimuli are not congruent (both semantically and prosodically threatening). Note that present congruency is informational (prosodic-semantic) not emotional congruency as explored in previous research on induced anxiety (e.g. Robinson et al., 2012). This may imply that attention mechanism of early detection, namely over-attention to threat, could provide an explanation for faster responses when the stimulation period is shortened or responses are under pressure (Cisler and Koster, 2010), in particular for stimuli which are harder to recognise/process (i.e. not congruent).
Therefore, our initial assumption that a fast response experiment (Experiment 2) would be enough to identify difference at early processing stages is not necessarily correct, at least given present stimuli and task. The varied position of threatening lexical items and/or threatening intonation emphasis might cause a general slow-down of responses, because very specific features of sentences need to be identified and participants have time to do so (the whole extent of a sentence). While evaluation mechanisms could serve as an explanation, the fact that there are not strong effects associated with difficulties categorising of identifying stimuli makes them weak candidates. This may also be related to a general caveat of our approach. Behavioural measures such as DL, though able to portray a very general picture of underlying brain processes, might not be enough. Better spatial and temporal resolution is required to disentangle laterality and early-stage effects of threatening language. The latter is particularly relevant, as the time-course of emotional language processing might have crucial differences at much shorter time-scales, as evidenced by previous EEG research (Chen et al, 2011; Kotz and Paulmann, 2007; Paulmann and Kotz, 2012; Pell et al, 2015; Wabnitz et al., 2015; Wambacq and Jerger, 2004). In consequence, present tasks could be replicated by using EEG measures, in particular Experiment 1, where EEG measures such as event-related potentials could provide richer information about processing occurring during sentence listening, before response preparation and response execution. This could also provide lab results as point of comparison with present web-based results. But more importantly, this is crucial for identifying differences in the neural signature of anxiety and language processing, indispensable for properly understanding time-related models of language and anxiety processing.
In conclusion, present results indicate that extending multistep models of language processing (Schirmer and Kotz, 2006; Kotz and Paulmann, 2011) by including aspects of multistage models of anxiety (Bar-Haim et al, 2007; Corr and McNaughton, 2012) could be a relevant theoretical approach. However, present results do not provide strong support for present hypotheses, which makes necessary to test the language-anxiety relationship in different ways. Present studies suggest a speed-accuracy trade-off as a function of anxiety when responses are required to be fast, which mainly affects stimuli lacking one informational dimension (i.e. prosody-only threat or semantic-only threat). Nevertheless, these are not sufficient to ascertain a relationship between language and anxiety processing. Further experimental testing is thus required, in particular by implementing physiological measures such as EEG, and tasks that do not involve DL, using more controlled stimuli and investigating the effects of stimuli below or above the sentence level, such as phrases or narratives.
Declaration of interests
None.
Data statement
All data, analyses’ scripts, and additional info can be found at the Open Science Framework (OSF) repository: https://osf.io/z8pgf/?view_only=b5da5ce6c8644bc182231cd9b96be173. Identifier: DOI 10.17605/OSF.IO/Z8PGF.
Acknowledgements
This research is supported by CONICYT Becas Chile 72170145 (https://www.conicyt.cl/). Contributions to the study were as follows. Busch-Moreno: Conceptualization, Methodology, Software, Analysis, Investigation, Resources, Curation, Writing, Visualization, Administration. Vinson: Conceptualization, Methodology, Resources, Revision, Supervision, Administration. Tuomainen: Conceptualization, Resources, Revision, Supervision, Administration. Preliminary results of this experiment, using a frequentist framework, were presented at the International Congress of Phonetic Sciences (ICPhS), Melbourne, 2019.
Footnotes
§ Please note that the current version of this preprint has substantially changed, as we have corrected an important statistical flaw from previous versions. This has importantly altered results, hence the title change. Due to this flaw, the published manuscript of this study has been withdrawn.
Re-analysis after statistical flaw. Because of this, results and discussion have greatly changed.
https://osf.io/z8pgf/?view_only=b5da5ce6c8644bc182231cd9b96be173