Inter-Subject Correlation during New Music Listening: A Study of Electrophysiological and Behavioral Responses to Steve Reich’s Piano Phase

Musical minimalism utilizes the temporal manipulation of restricted collections of rhythmic, melodic, and/or harmonic materials. One example, Steve Reich’s Piano Phase, offers listeners readily audible formal structures containing unpredictable events at local levels. Pattern recurrences may generate strong expectations which are violated by small temporal and pitch deviations. A hyper-detailed listening strategy prompted by these minute deviations stands in contrast to the type of listening engagement typically cultivated around functional tonal Western music. Recent research has suggested that the inter-subject correlation (ISC) of electroencephalographic (EEG) responses to natural audio-visual stimuli objectively indexes a state of “engagement”, demonstrating the potential of this approach for analyzing music listening. But can ISCs capture engagement with minimal music, which features less obvious expectation formation and has historically received a wide range of reactions? To approach this question, we collected EEG and continuous behavioral (CB) data while 30 adults listened to an excerpt from Steve Reich’s Piano Phase, as well as three controlled manipulations and a popular-music remix of the work. Our analyses reveal that EEG and CB ISC are highest for the remix stimulus and lowest for our most repetitive manipulation. In addition, we found no statistical differences in overall EEG ISC between our most musically meaningful manipulations and Reich’s original piece. We also found that aesthetic evaluations corresponded well with overall EEG ISC. Finally we highlight co-occurrences between stimulus events and time-resolved EEG and CB ISC. We offer the CB paradigm as a useful analysis measure and note the value of minimalist compositions as a limit case for studying music listening using EEG ISC. 
We show that ISC is less effective at measuring engagement with this minimalist stimulus than with popular music genres and argue that this may be due to a difference between the type of engagement measured by ISC and the particular engagement patterns associated with minimalism.

The genre of musical minimalism is famously (or, perhaps infamously, depending on the listener) characterized by highly recurrent, starkly restricted pitch and rhythmic collections. From the early days of scholarship on minimal, or "repetitive music" as it was often called, commentators described the music's timbral and rhythmic staticity and its limited pitch patterns (Mertens, 1983, p. 12). While many advocates reported what we might call blissing out to this "meditative music" (to use yet another early term for this repertoire), some composers went on record to state their intention that the music should be listened to carefully (Henahan, 1970; Strongin, 1969). For example, the composer Steve Reich wrote in 1968 that he wanted to write works with musical processes that listeners could perceive: works where the process unfolded very gradually in order to "facilitate closely detailed listening" (Reich, 2009, p. 34). Reich's Piano Phase (1967) shows how this type of granular listening might unfold. The piece, written for two pianos or marimbas, alternates between two distinct and highly repetitive states resulting from a single process.
During in-phase sections, the two performers play a short musical unit in rhythmic unison, though varying in pitch alignment (Figure 1). In between these in-phase sections, one performer gradually accelerates, resulting in unpredictable note onsets (i.e., phasing sections). Over time these phasing sections lead to a new pitch alignment in the subsequent in-phase section.[1] The driving phasing process offers the listener an outline of how the piece unfolds at a macro-level while leaving many details unpredictable, from rhythms during the phasing sections to accent patterns during in-phase sections. For a listener interested in detailed minutiae and slight variation, the work may fascinate; in other moods or with other priorities, the piece can bore, confuse, and even anger (Rockwell, 1973). How might we measure listeners' engagement with such repertoire, given its reduced musical parameters and varied and polarized reception (Dauer, 2020)?

[1] The piece begins with one pianist (Pianist 1) playing a twelve-note pattern consisting entirely of sixteenth notes and containing five unique pitches in the treble register. The pattern can be divided into two groups of six sixteenth notes, and Reich gave a metronome marking of 72 beats per minute to the dotted quarter note (one group of six sixteenth notes). The score consists of numbered modules that are repeated an indeterminate number of times: Reich noted approximate ranges for the number of repetitions above each module. After the pattern is established in the first module, the second pianist (Pianist 2) fades in, playing the identical pattern in unison with Pianist 1. After repeating the pattern in unison for some time, Pianist 2 accelerates very slightly while Pianist 1 holds the opening tempo, causing the sound from the two pianos to wobble out of sync to varying degrees as the pattern is repeated at different tempos (we call these portions phasing sections). Various and unpredictable rhythm and pitch events emerge and disappear in these phasing sections. Eventually Pianist 2's acceleration process culminates in another unison module where each pianist's sixteenth notes are once again realigned (which we label in-phase sections). While the pianists' rhythms are realigned, the pitch content of the pattern will have shifted: In this example, Pianist 2 aligns the second pitch of the opening pattern with the first pitch of the pattern (played by Pianist 1). Piano Phase proceeds by alternating between phasing and in-phase sections, where each successive in-phase section presents the next shifted alignment of the opening, twelve-note pattern (note three aligns with the first note of the pattern, a phasing section occurs, then note four aligns with the first note of the pattern, etc.).

(2020) presented popular, Hindi-language songs from "Bollywood" films to participants and reported higher behavioral ratings and ISCs for their original versions when compared with phase-scrambled manipulations. Madsen

have also historically reported an arguably more mood-driven type of engagement with this type of music, which, in contrast with detailed listening, allows for a more internal floating away of attention, still connected to the stimulus but unlikely to be correlated between participants (Lloyd, 1966). Therefore, we also included a manipulation of Piano Phase with frequent changes in the content (resulting from reshuffling five-second segments of the original excerpt). If ISC indexes this style of engagement in Piano Phase, we predicted less of the listening style for this manipulation. To examine the possibility of listeners being bored by the original work, we also introduced a third control stimulus with extreme repetition, which we expected to elicit no meaningful engagement.
Finally, we included a

"Early Works" released by Double Edge (Reich, 1987). We used the first five minutes and

sections (i.e., points of arrival at the alignments of in-phase sections) are situated without the functional transitions (i.e., the phasing sections).

As a contrast to the sudden changes embodied by the Abrupt Change condition, we created the Segment Shuffle condition (Figure 2C). Here we divided the Original audio into five-second segments and randomly reordered them (i.e., "shuffled" them). In order to avoid

Reich's piece.[5] The entire track was used in the experiment and its duration (5:05) informed the length of the other stimuli. Listening to Remix, we identified moments (musical events) that we predicted would engage listeners (for a full list, see Table S1). These events guided our interpretation of time-resolved EEG and continuous behavioral (CB) results.
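The Segment Shuffle manipulation described above (cutting the Original audio into five-second segments and randomly reordering them) can be illustrated with a minimal sketch. It rests on assumptions: the function name `segment_shuffle`, its parameters, and the plain-list audio representation are hypothetical, and the sketch applies no smoothing at segment boundaries, so it should not be read as the procedure actually used to build the stimulus.

```python
import random


def segment_shuffle(samples, sample_rate, segment_seconds=5.0, seed=None):
    """Cut `samples` into fixed-length segments and return them in random order.

    Hypothetical helper for illustration only: no crossfading or other
    boundary treatment is applied between reordered segments.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")

    # Split the signal into consecutive segments (the last one may be shorter).
    segments = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]

    # Reorder the segments with a seedable generator for reproducibility.
    rng = random.Random(seed)
    rng.shuffle(segments)

    # Concatenate the reordered segments back into one signal.
    shuffled = []
    for seg in segments:
        shuffled.extend(seg)
    return shuffled
```

Each segment stays intact internally and only the order of segments changes, which is what separates this manipulation from sample-level scrambling.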

All stimuli were presented to participants as mono .wav files; the second audio channel was

Participants did not perform any task during the presentation of the stimuli and were told to refrain from moving their body in response to the music: they were told not to tap their feet or hands, or bob their heads. After each stimulus in Block 1, the participant rated how pleasant, well ordered, musical, and interesting the preceding stimulus was on a scale of 1 (not at all) to 9 (very) via key press using a computer keyboard. Participants were permitted to move and take short breaks in between stimuli (during which time a "break" screen appeared). When ready, the participant initiated the next stimulus by pressing the space bar on the keyboard.

The EEG net was removed after Block 1, and the participant returned to the sound booth to complete Block 2. Here the participant heard the same five stimuli (in random order) and this time completed a continuous behavioral task while listening. Their task was to continuously report their level of engagement, which was defined as "being compelled, drawn

the participant rated how engaging they found the preceding stimulus to be overall, using the same 1-9 key press scale used in Block 1. The ordering of blocks was not randomized (i.e., the EEG block always preceded the CB block) because we wanted to ensure that during recording of EEG data in Block 1, participants would not be biased by the definition of engagement and the continuous reporting task that came in Block 2.

The experiment was programmed in MATLAB using the Psychophysics Toolbox (Brainard, 1997). Stimuli were played through two Genelec 1030A speakers located 120 cm from the participant. Stimulus onsets were precisely timed by sending square-wave pulses to the EEG amplifier from a second audio channel (not heard by the participant). We used the Electrical Geodesics, Inc. (EGI) GES 300 platform (Tucker, 1993), a Net Amps 300 amplifier, and 128-channel electrode nets to acquire data with a 1 kHz sampling rate and vertex reference.

Before beginning the EEG block, we verified that electrode impedances were below 60 kΩ

Participants also rated the stimuli in both blocks. We computed ISC of both the EEG and CB measures, and also computed mean CB across participants. Finally, we analyzed the ratings to determine whether they differed significantly according to stimulus.

Figure 3: Analysis pipeline for experiment data. Participants heard each of the five stimuli twice, once in each block. During Block 1 we recorded EEG, and during Block 2 participants completed the continuous behavioral (CB) task. Participants answered questions about each stimulus after hearing it. For the EEG data we computed spatial components maximizing temporal correlation and projected electrode-by-time response matrices to component-by-time vectors. For vectorized EEG as well as CB vectors, we then computed inter-subject correlation (ISC) of the vectors on a per-stimulus basis, across time and in a time-resolved fashion. We additionally computed the time-resolved mean values between participants. We aggregated and analyzed ratings.
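To make the ISC step of the pipeline concrete, the following sketch computes ISC over one time vector per participant (a CB report or a component-projected EEG response), both across the whole stimulus and in sliding windows. It rests on assumptions: the text here does not specify how correlations were aggregated, so the sketch takes ISC to be each participant's mean Pearson correlation with every other participant; the function names and the `window` and `hop` parameters are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumed convention for constant vectors
    return sxy / sqrt(sxx * syy)


def isc_per_participant(responses):
    """Mean correlation of each participant's vector with all other participants'."""
    n = len(responses)
    out = []
    for i in range(n):
        rs = [pearson(responses[i], responses[j]) for j in range(n) if j != i]
        out.append(sum(rs) / len(rs))
    return out


def time_resolved_isc(responses, window, hop):
    """Group-mean ISC in sliding windows of `window` samples, advanced by `hop`."""
    n_samples = len(responses[0])
    values = []
    for start in range(0, n_samples - window + 1, hop):
        chunk = [r[start:start + window] for r in responses]
        per_subj = isc_per_participant(chunk)
        values.append(sum(per_subj) / len(per_subj))
    return values
```

Calling `isc_per_participant` on the full-length vectors gives one overall value per participant for a stimulus, while `time_resolved_isc` gives a time course that can be compared against musical events.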

Previous EEG ISC studies have applied a spatial filtering operation before computing correlations in order to maximize signal-to-noise ratio of the data while also reducing the dimensionality of each EEG trial from a space-by-time matrix to a time vector (Dmochowski et al., 2012). Therefore, we filtered the EEG data using Reliable Components Analysis (RCA) prior to computing ISC (Dmochowski et al., 2012, 2015). RCA maximizes across-trials covariance of EEG responses to a shared stimulus relative to within-trials covariance, and therefore maximizes correlated activity across trials (i.e., ISC). It is similar to PCA,

interesting, and engaging ratings, Remix was significantly higher than the other four conditions (p_FDR < 0.01, 10 comparisons) and Tremolo was significantly lower than the other four conditions (p_FDR < 0.01). However, these ratings did not differ significantly between Original, Abrupt Change, and Segment Shuffle conditions.
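The RCA criterion described above (across-trials covariance maximized relative to within-trials covariance) can be written as a ratio of quadratic forms. The notation below is introduced here for illustration and is an assumption rather than a quotation of the cited method: X_k is the mean-centered electrode-by-time response matrix of trial k, T is the number of time samples, and w is the spatial filter.

```latex
% Cross-covariance between trials k and l (electrodes x electrodes)
R_{kl} = \frac{1}{T} X_k X_l^{\top}

% Pooled across-trials and within-trials covariance
R_b = \sum_{k} \sum_{l \neq k} R_{kl}, \qquad
R_w = \sum_{k} R_{kk}

% Spatial filter maximizing across- relative to within-trials covariance
\hat{w} = \arg\max_{w} \frac{w^{\top} R_b \, w}{w^{\top} R_w \, w}

% Stationary points satisfy a generalized eigenvalue problem
R_b \, w = \lambda \, R_w \, w

% Projection of each trial onto the leading component (e.g., RC1)
y_k = \hat{w}^{\top} X_k
```

Under this formulation, the eigenvector with the largest eigenvalue gives the most reliable component, and each projected vector y_k is the time series that enters the ISC computation.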

Ratings for how "well ordered" the stimuli were followed a slightly different pattern. While Remix was significantly higher than all other conditions (see Table S4), Tremolo was significantly lower than all other conditions except Segment Shuffle (p_FDR = 0.719). In addition, Segment Shuffle was significantly lower than Abrupt Change (p_FDR = 0.036).

Figure 4: Behavioral ratings for all questions in the experiment (responses were ordinal and are slightly jittered for visualization only). Ratings for "pleasant", "musical", "well ordered", and "interesting" come from Block 1 and ratings for "engaging" come from Block 2. For pleasant, musical, interesting, and engaging, responses for Remix were significantly higher than for the other conditions. For these same questions, responses were also significantly lower for Tremolo compared to all other conditions. For ratings of well ordered, we saw a similar pattern except that Abrupt Change was significantly higher than Segment Shuffle.
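The p_FDR values reported for the pairwise rating comparisons are p-values corrected for multiple comparisons (10 per question). The text here does not name the correction procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name `fdr_bh` is hypothetical.

```python
def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumed procedure for illustration; the source reports FDR-corrected
    p-values without specifying the method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A comparison would then be called significant when its adjusted value falls below the chosen threshold (for example, 0.05).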

In computing the EEG ISCs, we first spatially filtered the responses for each stimulus in order to reduce their dimensionality from 125 electrodes to a single, maximally correlated spatial component (RC1) for each stimulus. These components are shown in Figure 5A.

Table S8 for a full list of

Table S9 for a full list). In addition to calculating the overall ISC for EEG and CB data, we were interested in observ-

consistently, in-phase sections fail to correspond to any significant ISC peaks. Both EEG and CB ISC also contain a significant peak at the start of the excerpt. In the time-resolved CB ISC data, only a handful of small peaks occur above the significance threshold after the initial drop; they seem unrelated to phasing and in-phase musical events, and only 4.7% of the ISC values are significant (Table 1). In contrast with phasing sections eliciting consistent peaks in the EEG ISC data, the CB mean data shows an increase in mean engagement rating after the start of each in-phase section. There also appears to be a slight decrease across the length of the stimulus.

EEG ISC data for the Abrupt Change condition shows significant peaks within five seconds of the in-phase shifts (shifts two, three, five, and six, marked with solid lines in Figure 7B; 18.6% of ISC values are significant; see Table 1). In contrast with the Original condition, the suddenly beginning in-phase sections of the Abrupt Change condition seem to elicit ISC peaks in the EEG data. The other small significance peaks in the EEG data come between in-phase changes, perhaps as participants anticipate stimulus alterations during the long stretches of unchanging material (perhaps something like the contingent negative variation between warning and imperative stimuli; Tecce, 1972). After an initial descent, the CB ISC data shows significant peaks around the first two and final two in-phase changes (percentage of significant time-resolved CB ISCs = 7.0%; see Table 1). The other two significant peaks appear between in-phase changes, perhaps related to the effect noted above. As in the Original condition, time-resolved CB mean data shows slight increases in engagement ratings after all six abrupt changes and an overall decline in engagement.

The perennially unpredictable changes in Segment Shuffle were met with frequent, small bursts of significant ISC in the EEG data (Figure 7C; 15.9% significant ISC values; see Table 1). Comparing EEG and CB ISC time courses reveals unreliable alignment:

Table 1).

Time-resolved ISCs for the Remix condition give ample opportunity to correlate peaks with musical events, with statistically significant EEG ISC in 45.6% of time windows and significant CB ISC in 25.9% of time windows (Table 1). We selected the coded events in Figure 7D based on moments in the work that we deemed most musically salient (see Table S1 for the timings and descriptions of all twenty events). Note that not all of these events aligned with ISC peaks, but here we discuss some that did. After a sample from Piano

This double event seems associated with an EEG ISC peak but no significant CB activity.

A similar compositional technique plays out before minute 3:00. Two coded lines before

Figure 7D). The ISC peaks in both the EEG and CB data anticipate the reentry of additional instrumental lines, possibly in line with the previously mentioned effect: an anticipation that something must be coming given the static situation.

We did not expect any significant EEG ISC peaks for Tremolo, with its static, stark content. We see only occasional, small peaks above significance (Figure 7E; percentage of significant time-resolved EEG ISCs = 7.4%; percentage of significant time-resolved CB ISCs = 1.0%; see Table 1). We also note that in contrast to the other stimulus conditions, the

trajectories are also present in minimalism but often in a stretched out form (Fink, 1996).

analyzing the EEG data (e.g., by assessing alpha power, or correlation thereof) may prove more appropriate measures for indexing listener states while listening to minimalist music. We might hypothesize that when participants are diversely engaged with a stimulus, a similar psychological state may be shared, but one that is better indexed by other means than

The data generated and analyzed in this study can be found in the Naturalistic Music EEG

Figure S1. Continuous behavioral (CB) reports of engagement from individual participants, grouped by stimulus condition.

Figure S2. Scatter plot of each participant's mean CB value with the behavioral rating in response to the question "How engaging was the stimulus?" Spearman's rho also reported for each correlation.