Modulation of articulation muscles during listening and reading: a matter of intention to speak out loud

The articulation of speech sounds is often contingent on the intention to subsequently produce other sounds (co-articulation). Thus, intended acts affect the way current acts are executed. We show that the intention to subsequently repeat a short sentence, overtly or covertly, significantly modulated articulatory muscle activity already during speech perception or reading (input interval) and when delaying response (i.e., prior to production). Young adults were instructed to read (whole sentences or word-by-word) or listen to recordings of sentences to be repeated afterwards, either covertly or overtly. Surface electromyography (sEMG) recordings showed different patterns of articulatory muscle modulation in the two articulatory muscles measured – orbicularis oris inferior (OOI) and sternohyoid (STH). In the OOI, activity increased relative to baseline during speech perception or reading in both intended output conditions (overt and covert); in the STH, articulatory muscle activity decreased, during the input intervals, in both intended output conditions. However, the modulations in EMG activity were contingent on the intention to subsequently repeat the input overtly or covertly, so that activation in the OOI, and inhibition in the STH, were significantly more pronounced when overt responses were intended. Input modality was also a factor; immediately before overt responses, activity in the OOI muscle increased for listening and word-by-word reading, but not in reading whole sentences. The current results suggest that speech perception and articulation interact already during the input phase, listening or reading, reflecting the intended responses. However, this interaction may differentially affect facial-articulatory and laryngeal control mechanisms of speech production.


Introduction
Whether and how motor representations of speech are integral to speech perception has been debated for many decades. One perspective posits that the motor system plays a functional role both in speech production and in speech perception (Liberman, 1957; Liberman, 1967) [1][2][3][4][5][6]. A related position assumes an interaction between the perception of speech and the motor articulatory system, although the actual activation of the motor system is not necessary for perception (Hickok & Poeppel, 2007; Poeppel & Hickok, 2004; Scott et al., 2009) [7][8][9]. For example, according to feedback control models of speech production, there are pathways both for activation of motor speech systems from sensory input (forward prediction pathway) and for mutual activation of auditory speech systems from motor activation (feedback correction pathway) (Hickok et al., 2011; Guenther & Hickok, 2016) 10,11. Thus, the state of the articulatory musculature may be affected during speech perception (Galantucci et al., 2006; Glenberg & Gallese, 2012; Hickok et al., 2011) [12][13]10. Current notions of whether and how motor representations of speech contribute to speech perception do not differentiate between the effects of different language input modalities. The primary modality for language is auditory, but in literate individuals the interpretation of visual written language also becomes automatic (Logan, 1997; Augustinova & Ferrand, 2014) [14][15]. Input-modality-specific effects on articulatory muscles can lend support to the possibility of more than a single mode of interaction between language perception and speech production. In the present study, the state of the articulatory muscles during speech perception (the language input phase) was recorded to test the conjecture that the articulatory musculature is affected not only by the intention to subsequently produce covert vs. overt speech but also by the mode of language input (listening, reading).
To this end, a repetition task was developed, requiring the delayed production of either covert or overt speech in response to sentences presented as auditory or visual input, and the modulation of articulatory musculature activity preceding sentence repetition was examined.
We recorded surface electromyography (sEMG) over the Orbicularis Oris Inferior (OOI) and the Sternohyoid (STH) muscles throughout the repetition task, as shown in panel A of Figure 1. The motivation for including the STH stemmed from the crucial role that laryngeal muscle activity and activity in the Laryngeal Motor Cortex (LMC) have in learned vocal behaviors, specifically speech production and singing (Jürgens, 2002; Simonyan, Ostuni, Ludlow, and Horwitz, 2009) 16,17. Simonyan (2014) 18 suggests that LMC topography and connectivity in humans are potentially the underlying cause of the unique human ability to produce the highly controlled motor outputs characteristic of speech.
Previous studies have shown subtle activity in speech musculature using sEMG during verbal mental imagery, silent reading, and silent recitation (Jacobson, 1931; Livesay et al., 1996; McGuigan & Dollins, 1989) [19][20][21]. Also, activity in the OOI and the STH muscles was significantly higher when patients with schizophrenia reported hallucinations (Rapin et al., 2013) 22. However, a 'premotor silent period', that is, a brief inhibition relative to baseline, has been found before the initiation of both real and imagined movements in tasks unrelated to language (Conrad et al., 1983; Richartz et al., 2010; Kolářová et al., 2016) [23][24][25]. Thus, the approach to data analysis in the current study was to compare the direction (excitation or inhibition) and magnitude of the modulation of activation in articulatory muscles across the different input and intended response conditions during the input (i.e., perceptual) epoch, before sentence repetition was cued (panel B of Figure 1).

Participants
Twenty undergraduate students (5 males), 21-35 years old, participated in the study. All were healthy native speakers of Hebrew, right-handed, and had normal or corrected-to-normal vision and normal hearing. Informed consent was obtained from every participant prior to participation. The study was approved by the Ethical Review Committee of the University of Haifa, and all methods were carried out in accordance with relevant guidelines and regulations and complied with the standards defined in the Declaration of Helsinki. The participants received either course credit or monetary compensation for their time (70 NIS).

Stimuli
Given that stimulus recognition in the auditory and visual modality can be temporally distinct (Whiting et al., 2014) 26, there were two visual presentation conditions: a visual whole sentence presentation condition (the whole sentence was presented onscreen), and a visual word-by-word condition in which each word of the sentence was presented separately (at the center of the screen). The visual word-by-word condition was included to test whether a manipulation within the visual modality, where the linguistic input is presented incrementally in time (as in auditory presentation), would differ from the visual whole sentence presentation condition in terms of the modulation of motor activity in the articulatory musculature.
The experiment included a set of 60 sentences (3 words, subject-verb-object). There were 3 sets of 20 sentences (each set containing 96 syllables of 100 possible syllables in Hebrew). The stimulus sets were counterbalanced across participants and the three input mode conditions (auditory, visual whole sentence, visual word-by-word). Visual sentence presentation was in Ariel font (size 20), black on a gray background.
The duration of visually presented sentences was equated to the duration of the corresponding recorded sentences (Audacity(R) Version 2.0.0. (Audacity Team, 2012)); for word-by-word visual stimuli, each word was presented separately, at the center of the screen, for a duration proportionate to the duration of its voiced production.
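The timing rule for the word-by-word condition can be illustrated with a minimal sketch. It assumes that each word's on-screen time is its share of the summed spoken word durations, scaled so that the words together fill the duration of the recorded sentence. The function name and the example durations are hypothetical and are not taken from the study materials.

```python
def word_display_durations(spoken_word_ms, sentence_ms):
    """Split a sentence's total duration across its words.

    Each word is shown for a time proportional to its spoken duration,
    so that the display times sum to the recorded sentence duration.
    Hypothetical helper; the study's actual presentation script is not
    described in the text.
    """
    total_spoken = sum(spoken_word_ms)
    if total_spoken <= 0:
        raise ValueError("spoken word durations must sum to a positive value")
    return [sentence_ms * d / total_spoken for d in spoken_word_ms]


# Illustrative values only: three words spoken for 300, 500 and 200 ms
# within a recording that lasts 1500 ms in total.
durations = word_display_durations([300, 500, 200], 1500)
print(durations)
```

Under this assumption, pauses in the recording are distributed across the words in proportion to their spoken durations; other allocation schemes would be equally compatible with the description above.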

Apparatus
Surface EMG data were recorded using passive electrodes (impedance was ≥30Ω).
The sequence of events on each trial: a fixation target (2 sec) preceded stimulus presentation (fixation segment); a stimulus presentation segment of variable duration (with randomly chosen time jitter wherein the fixation cross reappeared, total 2.2-2.8 sec); a Go signal (0.5 sec); and a response interval (ending with a space bar press).

Experimental Design
The experimental design was a 2 (Intended Output: Covert and Overt) x 3 (Input Type: Auditory (A), Visual Whole Sentence (V1), and Visual Word-by-Word (V2)) within-subject design. Input conditions (A, V1, V2) were mixed and counterbalanced within each block. Each target sentence was presented twice (once for overt and once for covert repetition) but never in the same input modality. There were separate blocks in which either a covert or an overt repetition was required. The covert block always preceded the overt block (panel B of Figure 1).

Procedure
Before the presentation of each sentence, a fixation cross appeared at the center of the screen for 2 seconds. On each trial, participants were presented with the sentence to be repeated (in either the visual or auditory modality) but were instructed to repeat the target sentence only after being given a visual 'go' cue (Figure 1). Thus, each trial was divided into a baseline fixation epoch, an 'input' epoch, which was the sentence presentation part of the trial, and an 'output' epoch, after the cue to repeat. We compared musculature activity at a given interval in each trial to an immediately preceding, non-verbal baseline. Because each participant's own baseline for the ongoing trial was used, this analysis reduces the need for a temporally offset, separate baseline condition, such as a maximal volitional contraction (MVC) condition, to control for the large individual differences in the raw sEMG signal (Stepp 2012) 27. The difference measure thus enabled a within-subject analysis that decreases the variance due to individual differences, a ubiquitous methodological problem with sEMG.
The sentence to be repeated followed in one of three modes: 1) auditory, via headphones (the fixation cross stayed on screen), 2) visual, sentence in full at the center of the screen (no audition), 3) visual, word-by-word presentation of the sentence at the center of the screen (no audition). In the first block, participants were instructed that repetition was to be covert (silent repetition); in the second block, repetition was to be overt (repetition out loud). Immediately following sentence presentation offset, a fixation cross was presented at the center of the screen for a duration of 400-700 ms (jitter), until the onset of a red dot. The red dot cued response initiation (Go cue) and was presented at the center of the screen until the participant pressed the "space" key (end of response).
Response times (RT) were measured from the onset of the red dot to the "space" key press. Six trials with responses longer than 5 seconds or shorter than 200 ms were excluded. Inaccuracy of production (overt repetition block) occurred in eight trials (out of 1200 responses); these trials were excluded.
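The trial exclusion rule can be expressed as a short filter. The sketch below is illustrative only: the trial records, field names, and values are hypothetical, and only the two RT cut-offs (200 ms and 5 s) and the accuracy criterion come from the text.

```python
RT_MIN_S = 0.2  # responses shorter than 200 ms are excluded
RT_MAX_S = 5.0  # responses longer than 5 seconds are excluded


def keep_trial(rt_s, accurate=True):
    """Return True if a trial passes the RT and accuracy criteria.

    `accurate` only applies to overt repetition, where production
    accuracy could be checked; covert trials default to True.
    """
    return accurate and RT_MIN_S <= rt_s <= RT_MAX_S


# Hypothetical trial records (values are invented for illustration).
trials = [
    {"id": 1, "rt_s": 1.35, "accurate": True},
    {"id": 2, "rt_s": 0.12, "accurate": True},   # too fast
    {"id": 3, "rt_s": 5.80, "accurate": True},   # too slow
    {"id": 4, "rt_s": 2.10, "accurate": False},  # inaccurate repetition
]

kept = [t["id"] for t in trials if keep_trial(t["rt_s"], t["accurate"])]
print(kept)
```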

EMG data processing
Preprocessing was done using MATLAB R2018b (MathWorks Inc.). The sEMG raw data were rectified by absolute value and fed into a 20-450 Hz Butterworth band-pass filter (Butter, filtfilt, MATLAB). Additional movement artifact removal was done by offline inspection of the video recording. Swallowing was determined using both the video recordings (for physical movement of STH electrodes) and irregular peaks in the signal. About 4.7% of the trials (113/2400 trials) were excluded from analysis due to technical and movement artifacts.
In the analysis, we first addressed the dynamics of muscle activity during the fixation interval by comparing the RMS values of two intervals of 200 ms each, one from the beginning of the fixation epoch (Fix1) and one from the end (Fix2) (Figure 3). The goal of this analysis was to confirm that the difference measure of activity modulation was comparable between the intended output conditions (covert and overt). Therefore, we tested whether there would be a modulation throughout the fixation epoch as a function of the two intended output conditions (covert vs. overt). In both conditions and in both muscles, activation levels were lower in Fix2 than in Fix1; in the OOI, the difference between the two fixation intervals was greater in the Overt than in the Covert condition. We next compared muscle activity (relative to Fix2) in four intervals of 200 ms each (Figure 5) within the stimulus presentation segments: an interval starting at the onset of sentence presentation (interval A); an interval at the end of sentence presentation, from 300 ms to 100 ms prior to the end of the presentation (interval B); an interval starting at the offset of sentence presentation (interval C); and an interval just before the repetition cue (interval D). RMS values of sEMG activity (µV) were computed for each of the intervals, and a relative activity measure was computed by subtracting the RMS value (µV) of Fix2 from the RMS value (µV) of each interval.
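The relative activity measure can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study (which was run in MATLAB): the sampling rate, the function names, and the synthetic signal are assumptions introduced here, while the 200 ms window length and the subtraction of the Fix2 RMS follow the description above.

```python
import math

WINDOW_S = 0.2  # each analysis interval lasts 200 ms


def rms(samples):
    """Root mean square of a sequence of sEMG samples (in µV)."""
    if not samples:
        raise ValueError("cannot compute RMS of an empty window")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def window(signal, fs_hz, start_s, dur_s=WINDOW_S):
    """Slice `dur_s` seconds of `signal` starting at `start_s` seconds."""
    start = round(start_s * fs_hz)
    stop = start + round(dur_s * fs_hz)
    if start < 0 or stop > len(signal):
        raise ValueError("window falls outside the recorded signal")
    return signal[start:stop]


def delta_rms(signal, fs_hz, interval_start_s, fix2_start_s):
    """RMS of an interval minus the RMS of the Fix2 baseline window."""
    interval_rms = rms(window(signal, fs_hz, interval_start_s))
    fix2_rms = rms(window(signal, fs_hz, fix2_start_s))
    return interval_rms - fix2_rms


# Synthetic trial, assuming a 1000 Hz sampling rate (not stated in the text):
# 2 s of fixation at +/-2 µV followed by 1 s of input at +/-5 µV.
fs = 1000
trial = [2.0, -2.0] * 1000 + [5.0, -5.0] * 500

# Fix2 is the last 200 ms of the 2 s fixation epoch; interval A starts at
# sentence onset (2.0 s in this synthetic trial).
print(delta_rms(trial, fs, interval_start_s=2.0, fix2_start_s=1.8))
```

A positive value indicates activation relative to the end of the fixation epoch, and a negative value indicates inhibition.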

Statistics
Three-way within-subject ANOVAs were conducted with all three within-subject factors in the design (Input Type, Intended Output, and Interval), and multiple comparisons were controlled using the Bonferroni correction. Mauchly's Test of Sphericity was used to check for violations of sphericity in each factor. Violations of sphericity were corrected with either the Greenhouse-Geisser or the Huynh-Feldt correction. One-sample t-tests were run on all conditions of the design to examine whether they were significantly different from baseline. Statistical analyses were computed with IBM SPSS Statistics for Windows, version 23 (IBM Corp., Armonk, N.Y., USA).
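For readers unfamiliar with the Bonferroni correction, the adjustment can be sketched in a few lines. The p-values below are invented for illustration and are not results from this study; the function name and the 0.05 alpha level are likewise assumptions, and the actual analyses were run in SPSS.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust a family of p-values.

    Each p-value is multiplied by the number of comparisons (capped at 1),
    and a comparison counts as significant if its adjusted p-value is
    below `alpha`.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical p-values from three pairwise comparisons.
adjusted, significant = bonferroni([0.01, 0.04, 0.5])
print(adjusted, significant)
```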

Results
Figure 2 (single trial). A, B, C: auditory, visual whole-sentence, and visual word-by-word presentation, respectively, for covert repetition; D, E, F: auditory, visual whole-sentence, and visual word-by-word presentation, respectively, for overt repetition. The vertical lines indicate (from left to right) fixation onset, sentence presentation onset, Go cue onset, and repetition completion.

Fixation Epoch
An analysis of the fixation epoch was undertaken following the findings of an ECoG study in which data were recorded during a single-word repetition task (Pei et al., 2016) 28. Of relevance here is that in this ECoG study 28 there was a difference in the signal dynamics during baseline between the overt and the covert conditions (see their Figure 8, p. 2968). We therefore examined the dynamics within our baseline condition in both muscles. Figure 3 shows the changes in RMS activation values between the first and the last 200 ms of the fixation epoch (which lasted 2 sec).

Sentence Presentation Epoch Analyses
The analyses of the sentence presentation epoch compared muscle activity relative to the fixation baseline (Fix2). The ΔRMS values of the muscle activity during all intervals indicated that the STH muscle was inhibited throughout the sentence presentation epoch (see Table 1).

Discussion
The results of the current study show that differential inhibition, as a function of whether or not the input is to be subsequently vocalized, occurs in the STH, with more inhibition when overt responses were intended. In addition, the OOI appears to be more activated compared to baseline, and again, the effect was more pronounced when an overt response was intended. The modality of presentation interacted with both the intended output and the interval: articulatory muscle activation in the OOI in the covert condition revealed overall less activation during sentence presentation in the auditory than in the visual condition. When overt repetition was required, activation was significantly stronger in the last interval (prior to the Go cue to repeat) for both auditory and visual word-by-word sentence presentation, but not for the visual whole sentence presentation condition.
One of the important findings of the current study is the dissociation between the patterns of activation in the OOI and the STH. The articulatory motor system is often viewed as a singular, synchronized mechanism, especially in studies that examine neural activation during speech production. However, these results reflect the complexity of the articulatory system, in that the facial musculature involved in articulation is activated prior to repetition (in comparison to baseline) while the laryngeal musculature involved in articulation is inhibited prior to repetition (in comparison to baseline).
These findings suggest that motor activation of the facial articulatory musculature occurs during speech perception, and they align with findings from studies of the neural mechanisms of speech, which also show this. For example, Pulvermüller et al. (2006) 29 found that areas of the precentral gyrus associated with tongue and lip movements during phoneme production were also activated in a somatotopic manner during perception of the same syllables (i.e., when subjects listened to the same phonemes). Feedback and control models of speech production, such as the HSFC model and the DIVA model, also posit the involvement of motor cortical areas (namely, the inferior frontal gyrus) during speech perception (Hickok et al., 2011; Guenther & Hickok, 2016) 10,11.
Additionally, these results show that activation was affected by modality of presentation during the delay period, after sentence presentation and before the Go cue to initiate repetition.
Feedback and control models of speech production, such as the HSFC model and the DIVA model, posit that acoustic targets guide articulatory motor production. According to the DIVA model, an auditory target map is formed prior to speech production initiation, guiding forward predictions as to how the production should sound 11. The auditory target map can be manipulated by presenting variations of the acoustic targets. For example, in Tourville et al. (2008) 30, perturbed speech was presented to subjects as their own produced speech in several random trials, and the result was a compensation in the production of the syllables in the direction opposite to the perturbation. Thus, according to the model, differences in the stimuli to be repeated (i.e., auditory vs. visually presented sentences) can result in different motor output. When stimuli are presented visually, the auditory representation of the linguistic content is generated internally. This is the difference between the auditory (external auditory representation) and visually (internal auditory representation) presented sentences. The HSFC model assumes that when internal auditory representations serve as the auditory targets required for repetition (i.e., visually presented sentences to be repeated), production will differ from when repetition of an external auditory target is required (Hickok et al., 2011) 10.
The results of the current study present similar articulatory muscle activation trends for covert and overt speech. In the OOI, both covert and overt intended responses elicited activation throughout input perception and the delay intervals. In the STH, both covert and overt intended responses elicited inhibition throughout input perception and the delay intervals. In both cases, when overt responses were intended, activity in the OOI or inhibition in the STH was stronger compared to when covert responses were intended. This result can be linked to the findings of Brumberg et al. (2016) 34, who reported ECoG recordings in participants instructed to read a familiar text aloud or silently. They found left fronto-motor activity at 440-240 ms prior to the initiation of production, in both the overt and covert speech production conditions. However, this activity was significantly higher when overt speech production was intended compared to the covert speech production condition. The findings of the current study raise the possibility that the dynamics of the left fronto-motor areas may relate, directly or indirectly, to the differential dynamics of inhibition of the articulatory musculature in the language perception phase. However, our measurements were purely motor; we have no way of knowing their origins in the central nervous system.
Lebon et al. (2016) 36 found inhibitory preparatory processes only 200 ms (but not 500 ms or 900 ms) prior to an imperative signal to move either the left or right index finger in a delayed response task. Unlike the trend for increased inhibition closer to the Go signal found by Lebon et al. (2016) 36, in the current study inhibition tended to be consistent from the onset of sentence presentation up to the final 200 ms interval before the Go signal. A positive relationship between the magnitude of pre-movement depression of tonic muscle activity and the subsequent phasic innervation burst has been suggested for ballistic arm movements (Conrad et al., 1983) 23. Our findings may be linked to the findings of Conrad et al. (1983) 23 if one considers the sentence presentation (input) epoch in the current study as a predictive phase necessarily preceding the actual Go signal. With the caveat of a different time scale and resolution, the current results show that the intended speech action (overt or covert) was a major factor in determining the degree of inhibition or activation.
However, the extended time course of muscle modulation, and the fact that it occurred from the very beginning of the sentence presentation epoch whether a covert or an overt response was intended, suggest a pattern of modulation that differs from the pattern of inhibition described in non-language manual responses.
The notion that inhibition in the STH muscle can be linked to motor preparation of speech production is supported by findings of neuroimaging studies of the LMC. An fMRI study by Simonyan et al. (2009) 17 used a syllable repetition task and a controlled breathing task to examine the involvement of the LMC in controlled breathing and in speech. They found bilateral activation of the LMC during the controlled breathing task, and a left-lateralized activation of the LMC during the syllable repetition task. The authors also observed functional coupling between the LMC and the Inferior Frontal Gyrus (IFG), and together with the finding of evoked activity of the LMC during IFG electrical stimulation (Greenlee et al., 2004) 37, they concluded that the processing of any speech production component requires a functional link between these two brain regions to enable speech motor preparation. The finding that STH muscle activation is modulated by the intention to speak or not supports this hypothesis.

Conclusions
In a more general context, the current findings are also consistent with previous findings showing that planned, intended motor actions can significantly affect ongoing actions; e.g., coarticulation, the finding that skilled speakers generate the initial phonemes of a sequence differentially, depending on the final phonemes (Kühnert & Nolan, 1999) 38. This has been shown in a non-linguistic task as well. Rozanov et al. (2010) 39 showed that the performance of the initial movements of a well-trained sequence of finger movements was compromised by the intention to subsequently omit a movement. In the current study, anticipatory effects were evident at the stage of input, before articulation actually occurred.
The current results indicate that, prior to repetition, the articulatory system was differentially activated in the different muscle systems during speech perception and reading. Muscle activity was modulated by the intention to subsequently vocalize the input or not. Thus, the intention to act in the future (in terms of voicing) was a significant factor in articulatory muscle activity during language perception. The data of the current study suggest that the role of intention in language production during reading and listening is dynamic, continuous, and contextual.