Modality-specific frequency band activity during neural entrainment to auditory and visual rhythms

Rhythm perception depends on the ability to predict the onset of rhythmic events. Previous studies indicate beta band modulation is involved in predicting the onset of auditory rhythmic events (Snyder & Large, 2005; Fujioka et al., 2009, 2012). We sought to determine if similar processes are recruited for prediction of visual rhythms by investigating whether beta band activity plays a role in a modality dependent manner for rhythm perception. We looked at source-level EEG time-frequency neural correlates of prediction using an omission paradigm with auditory and visual rhythms. By using omissions, we can separate out predictive timing activity from stimulus driven activity. We hypothesized that there would be modality specific markers of rhythm prediction in induced beta band oscillatory activity, characterized primarily by activation in the motor system specific to auditory rhythm processing. Our findings suggest the existence of overlapping networks of predictive beta activity based on common activation in the parietal and right frontal regions, auditory specific predictive beta in bilateral sensorimotor regions, and visually specific predictive beta in midline central, and bilateral temporal/parietal regions. We also found evidence for evoked predictive beta activity in the left sensorimotor region specific to auditory rhythms. These findings implicate modality dependent networks for auditory and visual rhythm perception. The results further suggest that auditory rhythm perception may have left hemispheric specific mechanisms.


Introduction
Perceiving a rhythm requires making predictions about the temporal onset of rhythmic events. This ability allows us to dance in time with music, play music with others, detect a musical beat, and notice when timing is off the beat. Common measures of rhythm perception are sensorimotor synchronization (SMS) tasks that involve synchronizing one's movements to rhythmic stimuli. While most humans have little trouble synchronizing to auditory rhythms accurately, synchronizing to visual rhythms can be more variable. SMS to auditory rhythms is more reliable and adaptive (Chen et al., 2002; Repp, 2003; Repp and Penel, 2004; Lorås et al., 2012) than SMS to flashing visual rhythms (Repp & Su, 2013). However, when synchronizing movements with rhythmically moving visual stimuli such as a bouncing ball, synchronization accuracy improves, although not to the level of auditory synchronization (Hove et al., 2010; Hove et al., 2013b; Iversen et al., 2015; Gan et al., 2015). The reasons for the disparity in SMS accuracy across auditory and visual modalities are as yet unclear, and a closer investigation of these mechanisms is required for a complete understanding of neural timing and synchronization processes. The present study aims to explore neurophysiological mechanisms of auditory and visual entrainment, particularly with regard to prediction of rhythmic events.
Previous research has shown overlap in the structures involved in visual and auditory rhythm perception, particularly within the premotor cortex, putamen, and cerebellum (Hove et al., 2013a; Araneda et al., 2017). While these areas appear to play a supramodal role in rhythm perception, putamen activation is stronger for auditory rhythms than for visual rhythms, suggesting the auditory system may be more tightly connected to timing networks (Hove et al., 2013a; Araneda et al., 2017). There is also evidence suggesting the visual system has its own in-house rhythm timing mechanisms with sources in the parietal lobes (Jäncke et al., 2000; Jantzen et al., 2005), in MT/V5 (Jantzen et al., 2005), and in visual cortex (Zhou et al., 2014). Taken together, we interpret this literature as support for modality dependent rhythmic processing mechanisms, although to our knowledge this has not yet been clearly shown with a targeted EEG study. Beyond modality dependent rhythm processing, it has been suggested that timing mechanisms in the brain are task specific (Wiener & Kanai, 2016; Comstock, Hove, & Balasubramaniam, 2018), and may be distinct for aspects of rhythm timing and duration perception (Ross et al., 2018; Grube, Lee et al., 2010; Grube, Cooper et al., 2010). Much of the evidence supporting predictive processing for rhythm comes from measures of neural oscillation within different frequency bands. This oscillatory modulation is believed to indicate communication between different regions of the brain, with lower frequency oscillations involved more in communication between regions that are farther away from each other, and higher frequencies involved more in localized communication (Sarnthein et al., 1998; Von Stein and Sarnthein, 2000).
Further, Bastos et al. (2015) have shown in non-human primates that activity in the gamma and theta bands is involved in feedforward, or bottom-up, visual processing, while the beta band is involved in feedback, or top-down, visual processing. Michalareas et al. (2016) have shown similar results in the human visual cortex, with gamma involved in bottom-up processing and alpha and beta involved in top-down processing. Interestingly, Michalareas et al. (2016) also found that alpha and beta top-down processing affects the ventral and dorsal visual stream areas differently, shifting dorsal stream activity higher in the functional hierarchy of visual processing and ventral stream activity lower. If frequency band activity relates to specific top-down or bottom-up processing networks, then by measuring frequency band activity during different rhythm timing tasks we can find markers of the type of network involved, supporting different networks for different tasks. Neural oscillations within different frequency bands are therefore a rich source of information for investigating timing networks, as timing information is communicated across brain networks.
Neural mechanisms of auditory rhythm perception rely on strong interactions between motor systems and auditory cortices (Janata et al., 2012; Repp and Su, 2013; Iversen and Balasubramaniam, 2016; Ross et al., 2016a, 2016b), possibly mediated through projections in parietal cortex (Patel and Iversen, 2014; Ross et al., 2018). Communication across these networks could be carried out through frequency band specific oscillatory activity. Activity in the beta band (14-30 Hz) is of primary interest as it has been shown to play a role in prediction and timing for auditory rhythms (Snyder & Large, 2005; Fujioka et al., 2009, 2012, 2015), as well as being implicated in the onset of movements (Kilavik et al., 2013). Snyder & Large (2005) found differentiation between induced and evoked activity in EEG high beta and low gamma bands (20-60 Hz), where induced activity is not phase locked to a stimulus onset and evoked activity is phase locked to the stimulus onset. By presenting subjects with a sequence of tones with occasional tones omitted, Snyder and Large found induced activity was similar in tone trials and omitted tone trials, indicating expectation for the tones in the sequence, while evoked activity was greatly reduced when there was no tone. Fujioka et al. (2009) used a similar omission paradigm with MEG and found induced beta from auditory cortices decreased after tone onset and increased in anticipation of the expected tone onset. A later MEG study showed the rate of beta increase in anticipation of tone onset is dependent on the tempo of the stimuli, while beta decrease following tone onset is consistent across multiple tempos (Fujioka et al., 2012). Fujioka et al. (2012) additionally found cortico-cortical coherence that followed the tempo of the rhythms between auditory cortices and sensorimotor cortex, supplementary motor area, inferior-frontal gyrus, and cerebellum.
The role of beta activity in visual rhythm perception is less studied. However, beta band amplitude modulation arising from the motor cortex has also been implicated in visually mediated temporal cues indicating expectation (Saleh et al., 2010). More recently, Varlet et al. (2020) showed cortico-muscular coupling of beta-band activity induced by audio-visual rhythms between EEG recorded over motor areas and EMG recorded from finger muscles pressing down on a force sensor. Significantly, the coupling appeared to be modulated by the tempo of the rhythm and peaked roughly 100 ms prior to each tone in the sequence. Interestingly, the study did not find significant cortico-muscular coupling in response to separate auditory or separate visual rhythms. While Saleh et al. (2010) and Varlet et al. (2020) suggest involvement of beta band modulation in visual rhythm perception, the role of beta band activity in visual rhythm perception remains unclear.
In order to investigate predictive mechanisms of rhythm perception across modalities, we used EEG to record beta band modulation during auditory and visual rhythms. To separate out the stimulus response activity from activity related to temporal prediction of the stimulus, we used an omission paradigm similar to that used by Snyder & Large (2005) and Fujioka et al. (2009). Given that previous studies have indicated involvement of sensorimotor beta in rhythm perception (Fujioka et al., 2012, 2015; Varlet et al., 2020), we investigated both sensory and motor related beta. Because EEG activity smears at the scalp, it can be difficult to separate out concurrent sources of activity. We used Independent Components Analysis (ICA) as a blind source separation method in an attempt to distinguish sensory and motor related activity.
Based on the assumption that beta oscillations play a general role in top-down processing, we hypothesized that we would find induced beta power modulation for both auditory and visual modalities following the same pattern seen in Fujioka et al. (2009). Specifically, we hypothesized we would find an induced increase in beta in anticipation of the onset of each rhythmic stimulus event, and also prior to the expected onset of an omitted event (omission onset), followed by a sharp decrease in beta power after event onset, but not after omission onset.
Further, we expected that evoked beta power would increase only in response to stimulus onset and not in anticipation of omission onset. Because the motor system has been implicated in both auditory and visual rhythm perception, and evidence of motor related beta for rhythm perception has been seen for auditory rhythms (Fujioka et al., 2012, 2015) and implicated in visual rhythms (Varlet et al., 2020), we expected to find motor related predictive beta activity for both auditory and visual modalities. We also expected to find distinct network activity in predictive beta for visual rhythm perception, specifically greater evidence for predictive beta in the parietal and visual cortices, given evidence of visual timing activity in these regions (Jäncke et al., 2000; Jantzen et al., 2005; Zhou et al., 2014).

Materials and methods
Participants
Eighteen subjects participated in the experiment (11 female; mean age 23.6 years, range 20-34), with one being rejected after data collection for poor signal to noise ratio. All participants were right-handed and had typical hearing and typical or corrected vision. The experimental protocol was carried out in accordance with the Declaration of Helsinki. This study was approved by the UC Merced Institutional Review Board for research ethics and human subjects, and all participants gave informed consent prior to testing.

Task
After subjects gave written consent, they were seated and fitted with a 32 electrode EEG cap. Subjects were then tasked with watching isochronous flashing visual rhythms or listening to isochronous auditory rhythms. Both kinds of rhythms had an interonset interval (IOI) of 600 ms, and both had occasional omissions of single tones or single flashes. The rhythms were broken into stimulus trains, with each train consisting of 100 tones or flashes with 7 omitted tones or flashes placed randomly within the train. The locations of the omitted tones or flashes in the stimulus trains were constrained such that there were at least 8 tones or flashes between successive omissions. There were 20 stimulus trains per condition, for a total of 140 omissions in each condition. Subjects completed all of the stimulus trains in one modality, followed by all of the stimulus trains in the other modality, in a design counterbalanced across subjects. Before the omission conditions, subjects were presented with a condition with no omissions consisting of 140 tones or flashes. The non-omission stimulus trains were of the same modality as the omission stimulus trains that would follow. This design resulted in 140 trials for each of the four conditions (tone non-omission, tone omission, flash non-omission, flash omission).
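The omission constraint described above can be illustrated with a minimal sketch. It is not the presentation script used in the study: the function name `make_train`, the rejection-sampling approach, and the choice to keep the first omission at least 8 events into the train are assumptions made for illustration only.

```python
import random

def make_train(n_events=100, n_omissions=7, min_gap=8, seed=None):
    """Return a list of booleans: True = stimulus presented, False = omitted.

    Hypothetical helper. Omission positions are drawn at random and redrawn
    until at least `min_gap` presented events separate successive omissions.
    Assumption: the first omission is not placed within the first `min_gap`
    events, so that a rhythm is established before anything is omitted.
    """
    rng = random.Random(seed)
    while True:
        positions = sorted(rng.sample(range(min_gap, n_events), n_omissions))
        # number of presented events strictly between successive omissions
        gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
        if all(g >= min_gap for g in gaps):
            break
    omitted = set(positions)
    return [i not in omitted for i in range(n_events)]

# 20 trains per condition, 7 omissions per train -> 140 omission trials
trains = [make_train(seed=s) for s in range(20)]
total_omissions = sum(t.count(False) for t in trains)
```

With 20 trains of 7 omissions each, `total_omissions` matches the 140 omission trials per condition reported above.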
To ensure that subjects were attending to the rhythms, after each train a shorter sequence of 5 tones or flashes was presented at a slightly slower or faster tempo than the experimental train, and subjects were asked to determine if the shorter rhythm was slower or faster than the preceding rhythm. The number of correct responses and response times were recorded and used to determine if subjects were adequately attending to the stimulus trains. The auditory metronome consisted of 1000 Hz tones lasting 50 ms with a 10 ms rise and 40 ms fall time, generated using Audacity digital audio software. The visual metronome consisted of light grey square flashes 3 cm x 3 cm lasting 50 ms each. In both cases there was a black screen with a dark grey fixation cross in the center of the screen where the lines were approximately 3 mm wide and 4 cm long. The visual flashes always appeared behind the fixation cross so that the cross never disappeared when the flash appeared behind it.
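As an illustration of the auditory metronome parameters given above (1000 Hz, 50 ms duration, 10 ms rise, 40 ms fall), the following sketch synthesizes one tone as a list of samples. The stimuli in the study were generated in Audacity; the linear ramp shape, the 44.1 kHz sampling rate, and the function name `tone_samples` are assumptions and are not taken from the original procedure.

```python
import math

def tone_samples(freq_hz=1000.0, dur_ms=50.0, rise_ms=10.0, fall_ms=40.0, sr=44100):
    """Sine tone with a linear rise and fall envelope (assumed shape)."""
    n = int(round(sr * dur_ms / 1000.0))        # total samples
    n_rise = int(round(sr * rise_ms / 1000.0))  # samples in the onset ramp
    n_fall = int(round(sr * fall_ms / 1000.0))  # samples in the offset ramp
    out = []
    for i in range(n):
        if i < n_rise:
            env = i / n_rise                     # ramp up from 0 to 1
        else:
            env = max(0.0, (n - i) / n_fall)     # ramp down from 1 towards 0
        out.append(env * math.sin(2 * math.pi * freq_hz * i / sr))
    return out

tone = tone_samples()
```

Because the rise and fall times sum to the full 50 ms, the tone has no steady-state portion under these assumptions.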
The stimuli were presented using Paradigm experimental stimulus presentation software (Perception Research Systems, 2007) on a 60 Hz monitor, which was approximately 65 cm from the subject's head. Subjects responded to any prompts using a keyboard placed on a desk in front of the chair they were seated in.
EEG data acquisition and processing
EEG was continuously recorded using an ANT-Neuro 32 channel amplifier with the ANT-Neuro 32 electrode Waveguard cap. The electrodes were situated according to the 10-20 International system and EEG was recorded with a sampling rate of 1024 Hz. The data were then processed using the EEGLAB v14.1.1 toolbox (Delorme and Makeig, 2004) within Matlab 2019a. Channel locations were added using the standard location montage for the Waveguard cap. EEG data were first pruned by hand to remove the break periods between stimulus train blocks. Following pruning, the data were downsampled to 256 Hz and then a high pass filter with a 2 Hz passband edge and 6 dB cutoff at 1 Hz was applied. A lowpass filter with a 50 Hz passband edge and 6 dB cutoff at 56.25 Hz was applied to remove 60 Hz line noise. Channels whose activity had a correlation lower than 0.8 with their surrounding channels were rejected as bad channels and then interpolated using spherical interpolation. We then removed single channel artifacts using artifact subspace reconstruction (ASR), which has been shown to effectively remove large-amplitude or transient artifacts in the data (Mullen et al., 2015; Chang et al., 2018). ASR was performed using a conservative burst criterion parameter of 50 standard deviations. After ASR was run, we re-referenced the data to the average reference. In order to separate out non-brain artifacts, and for the source level analysis, we ran Independent Components Analysis (ICA) using the AMICA algorithm (Palmer et al., 2012). Dipole source localization was performed on the resulting components using the MNI head model, and 2 dipoles were fit where appropriate instead of 1 using the FitTwoDipoles plug-in (Piazza et al., 2016). ICA components were checked to find eye blink and cardiac components, which were marked for later rejection.
The remaining independent components were used for source analysis. We then segmented the continuous data into long epochs for the 4 experimental conditions: non-omission visual flashes, non-omission auditory tones, visual omissions, and auditory omissions. The non-omission conditions came from the non-omission stimulus train block that preceded the omission block. Each condition was epoched from 1.67 seconds prior to each tone/flash to 1.67 seconds following the tone/flash. Epoch length was determined by calculating the window size needed for the later time/frequency calculations so that the resulting time/frequency data would span +/-1.5 seconds from the tone or flash onset of interest. The omission groups were epoched in the same way in relation to omission events. Following epoching, epochs were checked for blinks that occurred during either event onset (for the non-omission conditions) or expected onset (for the omission conditions), defined as a 50 µV or larger spike in frontal electrodes within +/-100 ms of onset or expected onset. After epochs with eye blinks at event onset or expected onset were rejected, the eye blink components identified by AMICA and marked earlier were rejected. Remaining epochs with amplitude spikes greater than +/-500 µV were then rejected. Finally, epochs that were deemed improbable were rejected by computing the probability distribution of values across the epochs for individual channels and across all channels. Any epoch that contained data values greater than 6 standard deviations for the channel or 2 standard deviations for all electrodes was rejected. One subject was rejected because over 50% of their total epochs were rejected. For the remaining 17 subjects there were [...]
EEG activity measured at the electrode level is smeared across the scalp, making it difficult to separate out signals from different sources.
Because we are interested in time sensitive neural activity from both sensory and motor areas that occurs simultaneously, we focus our analysis on the source level components. To compare independent components across subjects, we performed a cluster analysis using k-means clustering based on the component dipole locations and scalp topographies. To ensure non-brain sources were excluded from clustering, only components with dipoles located within the head and with a residual variance of less than 15% were used, resulting in a total of 289 brain components across 17 subjects. To determine the appropriate number of clusters, we applied three measures for cluster number optimization (Calinski-Harabasz, Silhouette, and Davies-Bouldin) for between 5 and 30 clusters.
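To make the idea of cluster number optimization concrete, the sketch below computes one of the three criteria, the mean silhouette width, for candidate labellings of toy two-dimensional points. It is an illustration only: the analysis in this study was run in Matlab/EEGLAB on dipole locations and scalp topographies, the Calinski-Harabasz and Davies-Bouldin criteria are not shown, and the function name `silhouette` and the toy data are assumptions.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width of a labelling (requires at least 2 clusters).

    For each point, a = mean distance to the other members of its own
    cluster, b = smallest mean distance to the members of any other
    cluster, and the silhouette is (b - a) / max(a, b).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # the point itself contributes distance 0, so divide by len(own) - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: three well separated pairs of points (hypothetical values)
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
candidates = {
    2: [0, 0, 0, 0, 1, 1],  # a two-cluster labelling that merges two pairs
    3: [0, 0, 1, 1, 2, 2],  # the three-cluster labelling
}
best_k = max(candidates, key=lambda k: silhouette(points, candidates[k]))
```

In a full analysis the candidate labellings would come from running k-means once per candidate cluster number, and the number with the best criterion value would be retained.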
The Calinski-Harabasz and Silhouette methods indicated the optimal number of clusters was 9, while the Davies-Bouldin method indicated an optimum number of 13. We used 9 clusters to maximize the number of unique subjects per cluster, plus 1 outlier cluster containing components positioned more than 3 standard deviations from any of the cluster centers. In addition, the parent cluster consisted of all 289 components. The 9 clusters (Figure 2, Table 1) averaged 31.78 components per cluster with a standard deviation of 7.1, and were made up from 15.78 subjects on average, standard deviation 0.97. The outlier cluster consisted of three components from 2 subjects.
[...] averaging the power at each step from between 14 and 30 Hz.

Attention task behavioral results
To assess if attention was maintained evenly between the two modalities, we analyzed the behavioral data from the attention task for the two omission conditions. Both auditory (94.72%) and visual (88.61%) conditions showed a correct response rate well above chance. To assess the differences between the auditory and visual conditions, the number of correct responses and response times were assessed using paired t-tests. There was a significant difference in number

Event Related Spectral Perturbations
To determine if ERSP power was being significantly modulated by the stimuli and [...]
Looking at the parent cluster containing all components, we find increased evoked power following both visual and auditory stimulus onset, but not in response to visual or auditory omission onsets (Figure 3a & 3b). Induced activity from the visual condition in the parent cluster increases significantly and peaks roughly at stimulus onset, but also increases at omission onset, particularly in the low beta range (Figure 3b). This pattern is also seen in the posterior clusters for visual activity (Figures 4 & 5). Auditory ERSP power modulation is less pronounced

Beta Band Slope Analysis
While significance testing in ERSP power can indicate significant power modulations in response to stimuli, we are interested in the dynamics of beta band activity, following findings that beta power rises to a peak at the expected onset of an auditory tone, where the rate of the rise is dependent on the tempo of the stimuli (Fujioka et al., 2012). Since we hypothesized that the rise in beta activity is related to the timing of the rhythmic stimuli, we expected to see beta power rise prior to the expected onset of the omitted stimuli. To test this hypothesis, 2 slopes were fitted to the averaged beta activity for each subject for each condition based on a least squares measure. The first slope started 300 ms prior to stimulus or omission onset and ended at stimulus or omission onset (0 ms). The starting point of -300 ms was chosen as the halfway point between stimuli. Because there is considerable variation across subjects in slope activity, a second slope was fitted starting at the lowest measured activity between -300 and -100 ms and ending at stimulus or omission onset. To provide a third condition for comparison, we shuffled the ERSP data used to find slopes in the control condition for each subject at each channel, and for each component for each cluster, and then extracted beta band power from the shuffled ERSPs. Slopes were then fitted in the same way as with the non-shuffled data, except that instead of finding the minimum beta power between -300 and -100 ms for the shuffled condition, we used the same starting point used in the non-shuffled control condition for the corresponding subject or component.
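The two slope measures can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function names, the example time axis, and the example beta values are hypothetical, and beta power is assumed to have already been averaged across the 14-30 Hz band at each time step.

```python
def ls_slope(times, values):
    """Ordinary least squares slope of values regressed on times."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    return sxy / sxx

def fixed_window_slope(times_ms, beta, start_ms=-300.0, end_ms=0.0):
    """Slope fitted from a fixed start (default -300 ms) to onset (0 ms)."""
    pairs = [(t, b) for t, b in zip(times_ms, beta) if start_ms <= t <= end_ms]
    return ls_slope([t for t, _ in pairs], [b for _, b in pairs])

def trough_slope(times_ms, beta, lo_ms=-300.0, hi_ms=-100.0, end_ms=0.0):
    """Slope fitted from the beta minimum within [lo_ms, hi_ms] to onset.

    Also returns the trough time, so the same starting point can be reused
    when fitting the corresponding shuffled data.
    """
    candidates = [(b, t) for t, b in zip(times_ms, beta) if lo_ms <= t <= hi_ms]
    _, t_trough = min(candidates)
    return fixed_window_slope(times_ms, beta, start_ms=t_trough, end_ms=end_ms), t_trough

# Hypothetical beta power (arbitrary units) sampled every 50 ms before onset
times_ms = list(range(-300, 1, 50))
beta = [5, 2, 0, 1, 2, 3, 4]
fixed = fixed_window_slope(times_ms, beta)
trough, t_trough = trough_slope(times_ms, beta)
```

In this toy example beta power dips after -300 ms before rising, so the fixed-window slope is nearly flat while the trough-fitted slope captures the rise, which is the motivation given above for fitting the second slope.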
Four sets of t-tests were used to determine if the fitted slope of beta activity prior to the onset of a tone or flash was equivalent to the fitted slope of beta activity prior to the expected but omitted onset of a tone or flash. The tests were run for both induced and evoked activity, and for both the slopes fitted from -300 ms to onset and the slopes fitted from the trough between -300 and -100 ms to onset. FDR correction was used to correct for multiple comparisons for all t-tests using the method described in Benjamini and Hochberg (1995) with alpha set to 0.05. The first three analyses were performed using paired t-tests comparing: the slopes of the omission conditions to the slopes of the control conditions, the slopes of the control conditions to the slopes of the shuffled conditions, and the slopes of the omission conditions to the slopes of the shuffled conditions. If beta power is being modulated such that it shows anticipation of the stimulus rather than only reaction to the stimulus, we would expect both the omission and control fitted slopes to be significantly different from the shuffled fitted slopes, and we would also expect the omission and non-omission fitted slopes to not be significantly different.
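For readers unfamiliar with the Benjamini and Hochberg (1995) procedure, a minimal sketch of the step-up rule is given here. The function name and the example p-values are hypothetical; in practice a statistics package would normally be used.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject

# Hypothetical p-values from four comparisons
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

Note that a p-value below 0.05 (here 0.03 and 0.04) need not survive the correction, because each sorted p-value is compared against its own rank-dependent threshold.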
Showing that a fitted slope in the omission condition is not significantly different from the slope in the non-omission condition, yet significantly different from a flat slope, is not sufficient to claim that the slopes in the omission and non-omission conditions are equivalent. This is because a comparison between significant results and nonsignificant results is not necessarily significant (Gelman & Stern, 2012). To assess the viability of the comparison between the two results, we applied a post-hoc comparison test as used in Abbott & Shahin (2018). The test calculated whether (the slope of the non-omission condition) + (the slope of the shuffled condition) - 2 × (the slope of the omission condition) was significantly different from zero, using a t-test with the same FDR correction as used for the other t-tests at each channel and each cluster.
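The post-hoc contrast can be written out explicitly. Algebraically, non-omission + shuffled - 2 × omission equals (non-omission - omission) - (omission - shuffled), so the test asks whether the omission slope sits closer to one reference condition than to the other. The sketch below computes the per-subject contrast and a one-sample t statistic; the function name and the slope values are hypothetical, and the p-value (which requires the t distribution) and the FDR step are left out.

```python
import math
from statistics import mean, stdev

def contrast_t(non_omission, shuffled, omission):
    """One-sample t statistic for non_omission + shuffled - 2 * omission.

    Each argument holds one fitted slope per subject, in the same subject
    order. Returns the t statistic and its degrees of freedom.
    """
    contrast = [n + s - 2 * o for n, s, o in zip(non_omission, shuffled, omission)]
    n_subj = len(contrast)
    t = mean(contrast) / (stdev(contrast) / math.sqrt(n_subj))
    return t, n_subj - 1

# Hypothetical slopes for four subjects
t_stat, df = contrast_t(
    non_omission=[1.0, 1.2, 0.8, 1.1],
    shuffled=[0.0, 0.1, -0.1, 0.0],
    omission=[0.9, 1.0, 0.9, 1.0],
)
```

In this toy example the omission slopes lie close to the non-omission slopes and far from the shuffled slopes, so the contrast is consistently negative across subjects.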
The results of these tests at the electrode level show that only channel P8 meets the criteria for the 4 tests: p > 0.05 for the omission to non-omission slopes comparison, p < 0.05 for the comparisons of the non-omission to shuffled and omission to shuffled slopes, and p < 0.05 for the post-hoc comparison test, as applied to the slopes fitted between the trough of beta power and onset for induced beta. Additional channels met the first 3 criteria, but did not reach significance in the post-hoc test for the induced trough fitted slope for both visual and auditory conditions (Figure 8). No channels met these criteria for the slopes fitted at the fixed values between -300 ms and onset for the visual condition for induced or evoked beta. No auditory channels met the 4 criteria for any of the conditions.
For slopes fitted to evoked beta at the cluster level, the parent cluster for both auditory and visual modalities, and cluster 2 (left sensorimotor) for the auditory modality, met all 4 slope criteria for the trough fitted slopes (Figure 6). Clusters 3 (midline central), 5 (right sensorimotor), and 8 (parietal) met the first 3 criteria for the trough fitted slope tests in both modalities. Clusters 6 (left temporal/parietal) and 1 (left frontal) in the auditory and visual modalities respectively met the first 3 slope criteria for the trough fitted slopes. No cluster met any of the necessary criteria for the slopes fitted between -300 and 0 ms to evoked beta activity.
All slope measures and tests for the visual and auditory slopes can be found in Supplemental Tables 1 (visual) and 2 (auditory).
[...] conditions appear to show stimulus modulated induced beta power that is not modulated by the expected stimulus onset. Evoked beta for both auditory and visual conditions increases after stimulus onset, and appears to also increase after the omission onset, but not significantly.

Evoked and Induced comparison
To further understand the different roles evoked and induced beta play in the temporal aspects of auditory and visual rhythm processing, we measured peak power and peak time in response to both present and omitted tones and flashes. To make the comparison, ERSP power P was converted from dB to µV² and normalized using the formula:

Summary of Results
Using a cluster based approach to describe network-level beta band activity, we described predictive timing in a modality-specific way. Analyses of the slopes of beta activity from the parent clusters reveal evidence for both induced and evoked predictive timing in auditory and visual modalities at the global level. A look at the slopes of beta activity from individual clusters indicates evidence of induced predictive timing in the visual modality in posterior regions (the left and right temporal/parietal clusters and the parietal cluster), in the midline central cluster, and in the right frontal cluster. Slope based evidence for induced predictive timing in the auditory modality was found in the parietal cluster. Cluster specific evidence of evoked predictive timing in slope measures was seen only in the auditory modality, and only in the left sensorimotor cluster.
Based on previous results from Snyder & Large (2005) we expected evoked beta peak power to be significantly lower for omission events compared to tone or flash events, and we expected there to be no significant difference in induced beta peak power between omission events and tone or flash events. This pattern was seen much more prominently in the auditory modality, specifically in the parietal, left and right sensorimotor, left and right frontal, and left temporal/parietal clusters. A significant difference would additionally be expected between how much evoked beta peak power shifted between non-omission and omission conditions and how much induced beta power shifted between non-omissions and omissions. This significant difference was replicated in several clusters: the parietal cluster, left and right sensorimotor clusters, and the right frontal cluster, thus providing strong evidence for auditory induced beta playing a predictive role in networks of those regions. There were a few differences in the peak times in auditory beta across both induced and evoked activity and conditions. The significant shift in peak time from tone to omitted tone between induced and evoked beta for the right sensorimotor cluster follows the expected pattern of induced beta peaking later in response to an omitted tone than in response to a non-omitted tone. The evoked beta peaked earlier in response to an omitted tone than in response to a non-omitted tone. While not significant, we find it interesting that the opposite pattern with beta peak time appears in the left sensorimotor cluster: induced beta peaked slightly earlier in response to omitted tones than in response to tones, yet evoked beta peaked slightly later in response to the omitted tones than in response to the tones. 
This is in concordance with what would be expected if evoked beta was playing a predictive role, and when taken in conjunction with the slope evidence of predictive evoked activity in the left sensorimotor cluster suggests the existence of significant hemispheric differences in auditory rhythm processing mechanisms.
Differences in evoked and induced beta power in response to visual non-omissions and omissions did not provide clear evidence of predictive beta as seen in the auditory case, except for in the shift of peak power between evoked and induced activity from flash to flash omission in the parent, parietal, midline central, right frontal, and right temporal/parietal clusters.
Interestingly, a look at differences in peak times does provide stronger evidence suggesting separate roles for evoked and induced beta for the parietal, right and left temporal/parietal, and occipital clusters. In these clusters the evoked beta peak came earlier in response to omitted flashes than to non-omitted flashes, while induced beta peaked later in response to omitted flashes than to non-omitted flashes, which is what would be expected if induced beta activity was playing a predictive role while evoked beta was only responsive to stimuli. Taken together with the slope results, we interpret these findings as evidence of induced beta playing a predictive role in visual rhythm perception similar to that reported in previous studies for auditory induced beta (Fujioka et al., 2009, 2012, 2015; Snyder & Large, 2005).

Predictive Beta band activity
Beta modulation has been shown to play a role in a wide range of activities, including top-down control of sensorimotor systems (Engel & Fries, 2010; Arnal et al., 2011; Picazio et al., 2014; Haegens & Golumbic, 2018) and facilitating long-range communication between cortical regions (Kopell et al., 2000; Kilavik et al., 2013), such as between sensorimotor and peripheral areas (Fujioka et al., 2015). It has also been suggested to play a role in encoding temporal intervals (Wiener et al., 2016). Beta band activity also correlates with motor behavior, with power attenuation just before and during movements (see Kilavik et al., 2013, for review). Considering the suggested role the motor cortex has in timing and predictive processing (Schubotz et al., 2000; Patel & Iversen, 2014), the role of beta in imposing general top-down control, and its role in facilitating communication with sensorimotor peripheral systems, it is not surprising that beta activity appears to play a role in rhythm perception and prediction.
Beyond the link to sensorimotor behavior, beta activity is known to play a role in auditory rhythm perception. Frontocentral induced beta and gamma modulation occurs with the onset of rhythmic events and can be seen at the expected onset of an omitted event (Snyder & Large, 2005). Fujioka et al. (2012) found that beta power arising from the auditory cortices increases before tone onset in an isochronous rhythm at a rate dependent on the tempo of the rhythm, and attenuates following the tone at a constant rate not dependent on the tempo of the rhythm. Beta activity has also been seen to play a role in maintaining beat and meter structure (Fujioka et al., 2015). Consistent with these findings, we find evidence of auditory induced beta power peaking in anticipation of both tones and omitted tones, with the strongest evidence coming from the parietal, left and right sensorimotor, and right frontal clusters. Because the sources of neural activations are more difficult to localize using EEG than MEG, some caution is needed in interpreting the location of these sources. However, given other findings suggesting predictive induced beta arising from fronto-central regions using EEG (Snyder & Large, 2005), and from the auditory cortices, sensorimotor cortices, and parietal cortices using MEG (Fujioka et al., 2012, 2015), we believe the regions indicated by the cluster locations are reasonable interpretations of the source of the predictive beta we measured. It is of note that we did not find evidence of predictive beta that we could tie clearly to the auditory cortex. This may be a limitation of the cluster approach we used with the independent components, but it has also been put forth that signals arising from the auditory cortex are more suited to being measured by MEG than EEG (Destoky et al., 2019).
When looking at beta modulation in the visual domain, we see a beta power increase at the expected onset of an omitted flash in multiple clusters. Comparing beta modulation in anticipation of the visual onset between the omission and non-omission conditions shows induced beta power increasing prior to onset, followed by a sharp power drop-off, but only after flash onset, and not following omission onset. While we expected to find predictive beta activity in the visual domain, it was surprising to see evidence of predictive induced beta modulated more clearly and across more clusters in the visual domain than in the auditory domain, because the timing aspects of rhythm perception in the auditory domain are thought to be more precise, as evinced by less variability in auditory SMS compared to visual SMS (Repp, 2005; Repp & Su, 2013). We suggest this discrepancy between auditory and visual beta modulation is due to a combination of factors. The most important factor is the size differential between the visual and auditory cortices: the visual cortex is much larger than the auditory cortex, so processing of visual stimuli involves more cortical neurons, resulting in more neural activity measured at the scalp than the auditory cortex would produce. Compounding this is the previously mentioned suggestion that auditory signals are more suited to measurement with MEG than with EEG (Destoky et al., 2019), resulting in a comparatively reduced measurement of beta modulated by auditory rhythms.
The clusters that show evidence of predictive beta activity for the visual modality do not perfectly overlap with what is seen in the auditory modality. In the bilateral sensorimotor clusters, we find evidence of auditory predictive beta only, and not of visual predictive beta. There is evidence of visual predictive beta in the midline cluster, which contains dipoles localized to the premotor regions. This may indicate motor system involvement and would be in line with research suggesting the medial premotor region plays a role in predictive timing in primates across sensory modalities (Merchant et al., 2013). However, this raises the question of why the same activity was not seen in the auditory modality if premotor timing activity is not modality specific. A possible explanation is given by work reporting that a greater number of cells in the primate SMA respond to visual timing cues than to auditory timing cues (Merchant et al., 2015), although it is not clear if this finding extends to humans or if it is specific to the primates involved in that study. It is also of interest that we find predictive visual induced beta activity from the slope analysis in the left and right temporal/parietal junction and parietal clusters, but not in the occipital cluster. Given the difficulty in localizing sources with EEG, and the component distribution of the four posterior clusters, it is likely the left and right temporal/parietal and parietal clusters contain activity arising from cortical patches within the occipital cortex. Considering the distribution of components, and the faster rebound in induced beta power in the occipital cluster (figure 5b), we consider it likely that activity from early processing areas of the visual cortex (e.g., V1) is more strongly represented in the occipital cluster than in the surrounding posterior clusters. This, however, cannot be confirmed given the spatial limitations of EEG, and will require a methodology with greater spatial precision to test.
While beta power modulation in response to visual rhythmic flashes has been seen before (Saleh et al., 2010; Meijer et al., 2016), to our knowledge this is the first time it has been shown predicting the onset of an omitted event. However, it has been questioned whether beta modulation is related to temporal prediction at all (Meijer et al., 2016). Meijer et al. (2016) investigated beta activity with a rhythmic visual task and found beta power modulation in response to isochronous visual rhythms of different tempi (IOIs of 1050, 1350, and 1650 ms), yet the rate of beta power modulation was the same regardless of the tempo used. This differs from what was found by Fujioka et al. (2012) in their study of auditory beta modulation, where the rate of beta power increase prior to tone onset was modulated by the tempo of the rhythm. Meijer et al. (2016) interpreted their result as evidence that beta activity is not playing an entraining role in the visual system, suggesting instead that the beta peaks seen may be caused by rebounding activity in response to the flash, peaking roughly 900 ms after event onsets. The current study provides contrary evidence, and suggests that beta modulation may be playing a role in prediction of the onset of visual events, since the beta modulation during the omission could not be in response to any event, and instead must be responding to the timing of the expected onset of the flash. Induced beta peaks less than 50 ms after the omission onset, that is, less than 650 ms after the onset of the prior stimulus given the 600 ms IOI (figure 4), which is much earlier than would be expected for a beta power rebound in response to the flash event, as described by Meijer et al. (2016). We suggest the discrepancy between Meijer et al.'s (2016) findings and those reported here may be due to their use of relatively slow tempi compared to the 600 ms IOI of this study. There is evidence that sub-second timing and supra-second timing use different networks (see Wiener et al., 2010 for a review). We therefore suggest beta synchronization may only be playing a predictive role at the sub-second time scale. In addition, the task used in the Meijer et al. (2016) study was much more complicated than simply attending to the timing of the rhythms as in our task, and demanded more attention and possibly competing resources.

Contribution of the motor system
Previous studies have described induced beta modulation to auditory rhythms arising from sensorimotor cortices (Fujioka et al., 2012, 2015). There is also evidence that auditory timing relies on motor cortex (Janata et al., 2012; Repp & Su, 2013; Iversen & Balasubramaniam, 2016; Ross et al., 2016a, 2016b) and motor networks with nodes in the parietal lobes, cerebellum, and basal ganglia (Repp & Su, 2013; Patel & Iversen, 2014; Levitin et al., 2018). This motor network activity could indicate that the motor system is playing an important role in predicting the timing of events in auditory rhythms, a role often discussed in the context of the evolution of social activities such as dance and language (Fitch, 2016; Iversen, 2016; Patel, 2006). The auditory beta modulation from the sensorimotor clusters we present here is consistent with the previous literature on the involvement of the motor system in auditory timing. This can be contrasted with our findings from the visual system, where there is no evidence of predictive beta timing in the bilateral sensorimotor clusters, and instead evidence in the mid-central cluster that may be related to activity arising from the SMA.
In the auditory modality, we found evoked predictive beta timing activity in the left sensorimotor cluster (figure 6a), yet we found evidence of induced predictive timing activity in the right sensorimotor cluster (figure 7a). The asymmetrical beta activity seen in the two sensorimotor clusters specific to the auditory conditions suggests hemispheric specialization specific to auditory processing. A recent meta-analysis on neural activation during music listening shows consistent MRI activation in the right but not left primary motor cortex during music listening tasks (Gordon et al., 2018). Interestingly, the authors found that studies that asked subjects to move a body part while listening elicited stronger activity in the right primary motor cortex than studies using passive listening tasks. Others describe a left hemisphere role (Pollok, Rothkegel, Schnitzler, Paulus, & Lang, 2008) or a non-motor-dominant hemisphere role (Kaulmann, Hermsdörfer, & Johannsen, 2017; Yadav & Sainburg, 2014) for motor timing.
Similarly, for language perception there appears to be hemispheric specialization in the auditory cortices, with the left hemisphere specialized in temporal changes and the right hemisphere in spectral changes (Zatorre et al., 1992; Zatorre & Belin, 2001). Specifically, it has been shown that activity in the left anterolateral superior temporal sulcus (STS) corresponds to processing of temporal aspects of speech perception, while perception of spectral features of speech is associated with the same structure in the right hemisphere (Obleser et al., 2008). Our results support bilateral motor contributions to auditory timing, although the mechanism that results in predictive evoked activity in the left hemisphere and predictive induced beta activity in the right hemisphere may be distinct.

Limitations & Future Directions
The current study suggests that timing and prediction for visual rhythm perception may employ non-motor networks. We cannot say what role, if any, the motor system plays in visual timing. A closer look at the connections between visual and motor systems is needed to elucidate the issue. Using flashing visual rhythms as opposed to moving visual rhythms may elicit a different picture of activation, as the visual system is better tuned to discerning temporal information when movement is present (Hove et al., 2013b).
Another limitation of the current study is that we did not use multiple tempi. Having only one tempo makes it unclear how much the change in time course of neural activations is related to the tempo. Using multiple rhythms with different tempi would allow for a clearer differentiation between tempo-dependent and tempo-independent aspects of timing. If those tempi spanned both sub-second and supra-second timing, it would also provide insight into the temporal limits of the mechanisms in visual rhythm perception.
Although we see frequency band specific oscillatory modulation during rhythm perception, caution should be used in assuming this is the brain's mechanism of timing. There is evidence for multiple mechanisms for timing (for review see Wiener et al., 2010; Wiener & Kanai, 2016; Comstock, Hove, & Balasubramaniam, 2018), and here we describe one reflection of these processes. Oscillatory dynamics likely reflect more broadly the mechanism for spreading information between or across networks, and timing perception is only a subset of neural communication happening during these tasks.
Additional investigation is needed into the differences seen between left and right motor contributions to auditory timing. While the differences suggest possible functional lateralization in auditory rhythm perception, it is unclear if those differences are driven by handedness (Kaulmann, Hermsdörfer, & Johannsen, 2017;Yadav & Sainburg, 2014) or other factors (Pollok, Rothkegel, Schnitzler, Paulus, & Lang, 2008). Future studies are needed to look more closely at specific hemispheric contributions.
Finally, the inherent low spatial resolution of EEG limits how confidently we can draw conclusions about neural sources. We describe broad cortical source regions/networks in lieu of more focal sources with respect to this methodological limitation, but argue that the ICA-based cluster analysis leads to reasonable spatial and functional grouping of neural activity likely from common sources. That being said, we cannot speak with certainty about the exact cortical sources of the activity we describe. A method with better spatial resolution that retains fine temporal resolution, such as MEG or ECoG, would provide better source resolution for predictive rhythm perception networks.

Conclusion
We investigated the mechanisms of prediction for auditory and visual rhythms using an omission paradigm. Results show induced beta activity predicting the expected onset of visual rhythmic events bilaterally in temporal/parietal clusters, in a dorsal medial cluster, in a parietal cluster, and in a right hemisphere frontal cluster. We also show induced beta activity predicting the expected onset of rhythmic auditory events bilaterally in sensorimotor clusters, in a parietal cluster, and in a right hemisphere frontal cluster. We additionally present evidence for evoked auditory predictive timing in a left motor cluster. Our results support theories of predictive timing in both visual and auditory modalities that can be observed in beta band oscillatory activity. Using a cluster-based approach, our results also support the view that visual and auditory prediction for rhythmic events may be subserved by modality-specific cortical networks, although they do not rule out the possibility that both auditory and visual networks are subserved by a common subcortical network. These findings also suggest that auditory timing may involve hemisphere-specific activity and reliance on motor networks.