GLMsingle: a toolbox for improving single-trial fMRI response estimates

Advances in modern artificial intelligence (AI) have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to tens of thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle, a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses (glmsingle.org). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experimental paradigm. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually-responsive cortex in all subjects. Furthermore, these improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. Specifically, we demonstrate that GLMsingle: (i) improves the decorrelation of response estimates between trials that are nearby in time; (ii) enhances representational similarity between subjects both within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli. GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets that sample brain activity across many experimental conditions.


INTRODUCTION
: Overview of GLMsingle GLMsingle takes as input a design matrix (where each column indicates the onset times for a given condition) and fMRI time-series in either volumetric or surface space, and returns as output an estimate of single-trial BOLD response amplitudes (beta weights). GLMsingle incorporates three techniques designed to optimize the quality of beta estimates: first, the use of a library of hemodynamic response functions (HRFs), where the best-fitting HRF from the library is chosen for each voxel; second, an adaptation of GLMdenoise (Kay et al., 2013) to the single-trial GLM framework, where data-derived nuisance regressors are identified and used to remove noise from beta estimates; and third, an efficient re-parameterization of ridge regression (Rokem and Kay, 2020) as a method for dampening the noise inflation caused by correlated single-trial GLM predictors.

94
To assess the impact of GLMsingle, we evaluate four different types of single-trial response estimates 95 (henceforth, beta versions). The first arises from a baseline procedure that reflects a typical GLM 96 approach for fMRI analysis (beta version b1), and each subsequent beta version (b2-b4) incorporates an  2. An optimal HRF is identified for each voxel (Allen et al., 2022) by iteratively fitting a set Figure 2: Impact of GLMsingle on voxel test-retest reliability To compute reliability for a given voxel, we measure the test-retest Pearson correlation of GLM beta profiles over repeated presentations of the same stimuli (see Methods). (A) Differences in reliability between b1 (derived from a baseline GLM) and b4 (the final output of GLMsingle) are plotted within a liberal mask of visual cortex (nsdgeneral ROI). Scatter plots show reliability values for individual voxels. (B) Relative differences in mean reliability within the nsdgeneral ROI. For each voxel, we computed the mean reliability value over all beta versions being considered (b1-b4), and then used this as the basis for thresholding voxels (from Pearson r = −0.2 to 0.6). At each threshold level, for each beta version, we compute the voxel-wise difference between the reliability of that specific beta version and the mean reliability value, and then average these difference values across voxels within the nsdgeneral ROI. The traces in the first column indicate the mean (+/-SEM) across subjects within each dataset. The bars in the second column indicate subject-averaged differences in reliability at threshold r = 0.2. The relative improvement in reliability due to GLMsingle (b1 vs. b4) tends to increase when examining voxels with higher reliability, and each optimization stage within GLMsingle (HRF fitting, GLMdenoise, ridge regression) confers added benefit to voxel reliability. estimation that can arise from having correlated single-trial predictors. To directly compare GLMsingle 139 to LSS, we computed auxiliary GLMsingle beta versions that do not incorporate GLMdenoise. This 140 allows us to isolate the effect of the GLM estimation procedure (i.e., LSS vs. fractional ridge regression).

141
For both the case of an assumed HRF and the case of voxel-wise tailored HRFs, we find that fractional 142 ridge regression yields more reliable signal estimates than LSS (Figure 3). These improvements 143 are most pronounced in the most reliable voxels (Figure 3c). LSS can be viewed as applying heavy 144 regularization uniformly across voxels, while our ridge regression approach is more flexible, tailoring 145 the degree of regularization to the SNR of each voxel. Heavy regularization may actually degrade the 146 quality of signal estimates in reliable voxels, and our approach avoids this possibility. 147 We then performed a complete assessment of all auxiliary beta versions and the primary versions 148 (b1-b4), in order to determine whether any other analysis strategy could achieve parity with b4 in the 149 quality of GLM outputs. Reassuringly, when summarizing the relative quality of all 8 beta versions 150 over a range of reliability thresholds, we observe superior performance from b4, the default output of 151 GLMsingle ( Figure 3a).

152
GLMsingle relies on an internal cross-validation procedure through which key hyperparameters (the 153 number of noise regressors and the voxel-wise levels of ridge regression regularization) are optimized to 154 maximize the consistency of responses across condition repetitions. This raises a possible concern that 155 our reliability estimates (e.g. Figure 2) are somewhat optimistic. As a strict assessment of reliability, 156 we repeated the reliability quantification for each of the 8 beta versions, this time computing test-retest  For each dataset, we compute the degree of temporal autocorrelation in each beta version by averaging session-wise representational similarity matrices over subjects. We plot results arising from analysis of voxels at two different reliability thresholds (r = 0 and r = 0.3) for NSD (A) and BOLD5000 (B). Assuming that ground-truth neural responses to consecutive trials should be uncorrelated on average, positive (or negative) Pearson r values off the diagonal imply sub-optimal estimation of BOLD responses. In the right-most column, we plot mean autocorrelation between all pairs of timepoints. Applying GLMsingle (b4) results in a substantial decrease in temporal autocorrelation compared to a baseline GLM approach (b1).
as well as in BOLD5000, where stimuli are spaced 10 s apart to alleviate issues relating to signal 173 overlap. To quantify these effects, we compute mean temporal autocorrelation as a function of time

183
When applying GLMsingle, these patterns of temporal autocorrelation change dramatically. In NSD 184 b4, autocorrelation drops to r = 0 much more rapidly than in b1, and in BOLD5000, beta maps from 185 successive trials in b4 are now nearly uncorrelated on average. This is an expected outcome, since RDMs of subject pairs between the two datasets. The latter result is especially striking given the many 210 methodological differences between NSD and BOLD5000: fMRI data were collected at different sites 211 on different scanners, at different field strengths (7T vs. 3T), with different behavioral tasks, and with 212 different inter-stimulus intervals (4 s vs. 10 s).

213
These results indicate that GLMsingle, through its multifaceted approach to mitigating the effects of 214 noise, helps reveal meaningful shared variance in neural responses across individuals who viewed the 215 same stimuli. The GLMsingle toolbox may therefore be a key resource for future fMRI studies seeking 216 to stitch together data across subjects from different sites or cohorts.  Figure 6: Impact of GLMsingle on MVPA decoding accuracy (A) Image-level linear SVM decoding accuracy by beta version. At each reliability threshold, we compute the mean decoding accuracy over subjects within each dataset, as well as the standard error of the mean. Classifiers are trained on n − 1 available image repetitions, and tested on the held-out repetition, with accuracy averaged over cross-validation folds. Applying GLMsingle (b4) yields dramatic increases in image decodability compared to a baseline GLM (b1). (B) The effect of GLMsingle on animacy representation is shown in an example NSD subject (subj01) using multi-dimensional scaling. GLMsingle clarifies the division in representational space between stimuli containing animate and inanimate objects. COCO stimuli containing identifiable human faces are masked with a rectangle for the sake of privacy.

234
As scientific datasets grow in scale and scope, new techniques for data processing will help to unlock 235 their potential. This is especially true in human neuroscience where data remain both expensive and 236 time-consuming to collect (Naselaris et al., 2021). This paper has introduced GLMsingle, a publicly 237 available toolbox for analyzing fMRI time-series data that leverages data-driven techniques to improve 238 the accuracy of single-trial fMRI response estimates. We have tested GLMsingle extensively using NSD 239 and BOLD5000, two of the largest fMRI datasets that densely sample responses within individuals.

305
Given the requirement of repeated discrete events, GLMsingle is not applicable to resting-state data, 306 since they contain no explicit task structure. Similarly, GLMsingle is not suitable for experiments that 307 involve continuous event structures -for example, movie watching, storytelling, dynamic exploration, 308 game-playing -unless certain events within the task are coded as discrete, repeated instances. For 309 example, the appearance on-screen of a particular character could be treated as a repeated "event" in 310 constructing a design matrix. Or, as another example, certain words or parts of speech could be treated 311 as "events" within a continuous auditory or linguistic experiment.

312
It is important to consider whether denoising comes at the potential cost of introducing bias (Kay,313 2022). Considering each component of GLMsingle, we believe that the risk of bias is minimal for most 314 use cases. First, considering the library-of-HRFs approach, we note that the conventional approach 315 of using a fixed canonical HRF actually incurs more risk of biasing response estimates than does an 316 approach that attempts to flexibly capture variations in HRFs. Nonetheless, we acknowledge that the 317 library may not necessarily capture all HRF shapes, and this represents one possible source of bias 318 (though likely minor). Second, considering the GLMdenoise procedure, we note that data-derived 319 nuisance regressors are not blindly removed from the time-series data prior to modeling, as this would especially for experiments where condition ordering is pseudorandom, it is unlikely that this form of 328 temporal regularization and its associated bias would lead to incorrect scientific inferences.

329
Online example scripts and tutorials 330 To enable easy adoption of GLMsingle, we provide extensive documentation and example scripts for 331 common neuroimaging use-cases (glmsingle.org). Publicly available online resources include code 332 implementation of GLMsingle in both MATLAB and Python, example scripts and notebooks, technical 333 documentation, and answers to frequently asked questions. The GLMsingle pipeline is designed to 334 be easy to implement in different neuroimaging pipelines. The example scripts we provide illustrate 335 typical GLMsingle usage for both event-related and block designs. These scripts guide the user through 336 basic calls to GLMsingle, using representative, small-scale example datasets. We hope these practical 337 resources facilitate the application of GLMsingle to existing and future neuroimaging datasets.

339
Our results suggest that GLMsingle represents a methodological advancement that will help improve 340 data quality across different fMRI designs. While improvements in MR hardware (e.g. magnetic field 341 strength, RF coil, pulse sequences) and experimental design (e.g. optimized study design and trial The fMRI data from NSD were pre-processed by performing one temporal resampling to correct 506 for slice time differences and one spatial resampling to correct for head motion within and across 507 scan sessions, EPI distortion, and gradient nonlinearities. This procedure yielded volumetric fMRI 508 time-series data in subject-native space for each NSD subject. In this paper, we analyze the standard-509 resolution pre-processed data from NSD which has 1.8-mm spatial resolution and 1.333-s temporal 510 resolution (the time-series data are upsampled during preprocessing).

512
For complete details of the BOLD5000 study and methodology, refer to the corresponding dataset paper  Functional data were collected at 3T, with 2-mm isotropic resolution, and with a TR of 2 s. Each trial 520 lasted 10 s, and consisted of the presentation of an image for 1 s, followed by a 9-s gap. A total of 521 either 9 or 10 runs were collected in one session, containing 37 stimulus trials each, for a total of either 522 333 or 370 trials per session.

523
The fMRI data from BOLD5000 were preprocessed using fMRIPrep (Esteban et al., 2019). Data 524 preprocessing included motion correction, distortion correction, and co-registration to anatomy (or 525 further details, please refer to the BOLD5000 dataset paper (Chang et al., 2019). This yielded volumetric 526 fMRI time-series data in subject-native space for each BOLD5000 subject.

527
Because GLMsingle requires condition repetitions in order to perform internal cross-validation proce-528 dures, and because BOLD5000 contains a limited number of within-session repetitions, we concatenated 529 data from groups of 5 sessions together before processing using GLMsingle. To account for differences 530 in BOLD signal intensity across different sessions, we performed a global rescaling operation to the 531 data within each session to roughly equate the time-series mean and variance across the 5 sessions 532 comprising one batch of data. Specifically, we first computed the global mean fMRI volume across all 533 5 sessions, and then, for each session, computed a linear fit between the mean volume from a single 534 session and the global mean volume. This yielded a multiplicative scaling factor applied to each session 535 in order to roughly equate signal intensities across sessions.

536
Applying GLMsingle to NSD and BOLD5000 537 We used GLMsingle to estimate single-trial BOLD responses in the NSD and BOLD5000 datasets.

538
For NSD, GLMsingle was applied independently to each scan session. For BOLD5000, groups of 539 5 sessions were processed together, following the rescaling procedure described above. The default 540 GLMsingle parameters were used for processing both NSD and BOLD5000, except that we evaluated 541 up to 12 nuisance regressors in GLMdenoise (default: 10).

542
Four different versions of single-trial GLM betas were computed and saved. The first beta version (b1, incorporating GLMdenoise, and the final beta version (b4, FitHRF + GLMdenoise + RR) arises from 550 Step 5, and reflects the additional use of ridge regression. For comparisons between GLMsingle and