Continuous Evaluation of Denoising Strategies in Resting-State fMRI Connectivity Using fMRIPrep and Nilearn

Reducing contributions from non-neuronal sources is a crucial step in functional magnetic resonance imaging (fMRI) connectivity analyses. Many viable denoising strategies for fMRI are used in the literature, and practitioners rely on denoising benchmarks for guidance in selecting an appropriate strategy for their study. However, fMRI denoising software is ever-evolving, and benchmarks can quickly become obsolete as techniques or implementations change. In this work, we present a denoising benchmark featuring a range of denoising strategies, datasets and evaluation metrics for connectivity analyses, based on the popular fMRIPrep software. The benchmark is implemented in a fully reproducible framework, where the provided research objects enable readers to reproduce or modify the core computations, as well as the figures of the article, using the Jupyter Book project and the NeuroLibre reproducible preprint server (https://neurolibre.org/). We demonstrate how such a reproducible benchmark can be used for continuous evaluation of research software by comparing two versions of the fMRIPrep software package. The majority of benchmark results were consistent with prior literature. Scrubbing, a technique which excludes time points with excessive motion, combined with global signal regression, is generally effective at noise removal. Scrubbing, however, disrupts the continuous sampling of brain images and is incompatible with some statistical analyses, e.g. auto-regressive modeling. In this case, a simple strategy using motion parameters, average activity in select brain compartments, and global signal regression should be preferred. Importantly, we found that certain denoising strategies behave inconsistently across datasets and/or versions of fMRIPrep, or had a different behavior than in previously published benchmarks.
This work will hopefully provide useful guidelines for the fMRIPrep user community, and highlight the importance of continuous evaluation of research methods. Our reproducible benchmark infrastructure will facilitate such continuous evaluation in the future, and may also be applied broadly to different tools or even research fields.


Supplementary Materials

Annex A. Common families of confound regressors
Mitigating the variance introduced by confounding fluctuations is necessary to obtain meaningful measures of brain connectivity (Power et al., 2015). The most common method to minimize the impact of confounds is linear regression (Friston et al., 1994) of nuisance signals. These nuisance signals fall into one of three general categories commonly acknowledged in the literature: head motion, physiological noise (cardiac and respiratory), and instrumental noise from the MRI scanner. The most common confound regressors are extracted following basic processing steps (i.e., motion correction, field unwarping, normalization, bias field correction, and brain extraction):

• Motion realignment measures capture head motion, a well-known source of disturbances in fMRI signals (Friston et al., 1996; Hajnal et al., 1994) which causes distance-dependent signal correlations and introduces systematic bias in group comparisons (Power et al., 2012; Satterthwaite et al., 2012; Van Dijk et al., 2012). Six rigid-body motion parameters (3 translations and 3 rotations) are typically estimated relative to a reference image and used as confound regressors (Friston et al., 1996).
• Non-grey matter tissue signals (such as white matter and cerebrospinal fluid) are unlikely to reflect neuronal activity and can be dominated by a mixture of motion and physiological artifacts (Fox et al., 2005). This type of signal is captured by averaging signals within anatomically derived masks.
• The global signal is a confound regressor extracted by averaging signals within the full brain volume (Fox et al., 2005). The global signal clearly captures motion and physiological fluctuations, but is also sensitive to global neural activity, making it a controversial choice for regression (Power et al., 2017; Saad et al., 2012).
• Scrubbing (Power et al., 2012) is a volume censoring approach that removes high-motion segments in which the framewise displacement (see Materials and Methods section Participant exclusion based on motion) exceeds some threshold. Scrubbing is applied alongside head motion parameters and tissue signal regressors.
• Temporal high-pass filtering accounts for low-frequency signal drifts introduced by physiological and scanner noise sources.
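The linear-regression approach underlying all of the regressor families above can be sketched with ordinary least squares: the confound regressors form a design matrix, and the denoised signal is the residual after projecting the BOLD data onto that matrix. The following minimal sketch uses random toy arrays in place of real parcellated BOLD and confound time series; the array shapes and the choice of nine confounds are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tp = 200
# Toy parcellated BOLD time series: 200 time points x 10 regions
bold = rng.standard_normal((n_tp, 10))
# Hypothetical confounds: 6 motion parameters + WM/CSF means + global signal
confounds = rng.standard_normal((n_tp, 9))

# Linear (OLS) regression of nuisance signals: fit the confound model
# and keep the residuals as the denoised signal
design = np.column_stack([np.ones(n_tp), confounds])  # intercept + confounds
beta, *_ = np.linalg.lstsq(design, bold, rcond=None)
cleaned = bold - design @ beta

# The residuals are orthogonal to every confound regressor
print(np.abs(design.T @ cleaned).max() < 1e-8)
```

In practice this projection is rarely hand-rolled; tools such as Nilearn wrap the same operation together with detrending and filtering, but the algebra is exactly the residualization shown here.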
The family of motion, non-grey matter and global signal regressors can be further expanded using their first temporal derivatives and their quadratic (square) terms (Satterthwaite et al., 2013) to capture potential non-linear effects of these noise sources. Optimal denoising results often require full expansion of head motion parameters (both derivatives and squares).
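The full expansion mentioned above turns the six motion parameters into 24 regressors. A minimal sketch with toy values follows; zero-padding the first row of the backward-difference derivative is one common convention, not something prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(1)
# Six rigid-body motion parameters (3 translations, 3 rotations), toy values
motion = rng.standard_normal((200, 6))

# Expansion (Satterthwaite et al., 2013): the parameters, their first
# temporal derivatives (backward differences, first row zero-padded),
# and the squares of both -> 6 * 4 = 24 regressors
deriv = np.vstack([np.zeros((1, 6)), np.diff(motion, axis=0)])
expanded = np.hstack([motion, deriv, motion**2, deriv**2])
print(expanded.shape)  # (200, 24)
```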
Aside from regressors directly modeling noise derived from realignment measures or anatomical properties, other approaches capture the impact of motion and non-neuronal physiological activity through data-driven methods. However, a denoising strategy can perform differently due to factors that strongly correlate with motion, such as psychiatric conditions, age, or the choice of subsequent analytical technique. For example, CompCor may only be maximally effective in low-motion data (Behzadi et al., 2007). Volume-censoring-based strategies are unsuitable for time series analyses that require a uniform sampling of signals, such as most time-frequency analysis implementations. Researchers thus need to evaluate the best denoising strategy based on the available benchmarks, the profile of their data, and their choice of analytical techniques.
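As a concrete illustration of the volume-censoring logic discussed above, framewise displacement can be computed from the six motion parameters and thresholded to build a censoring mask. The 0.5 mm threshold and the 50 mm head-radius conversion for rotations follow the common convention from Power et al. (2012); the motion values here are synthetic, with magnitudes chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic motion parameters: 3 translations (mm), 3 rotations (radians)
trans = rng.normal(0.0, 0.05, (200, 3))
rot = rng.normal(0.0, 0.001, (200, 3))
motion = np.hstack([trans, rot])

# Framewise displacement: sum of absolute backward differences, with
# rotations converted to arc length on a 50 mm sphere (radians -> mm)
diffs = np.abs(np.diff(motion, axis=0))
diffs[:, 3:] *= 50.0
fd = np.concatenate([[0.0], diffs.sum(axis=1)])

# Scrubbing: censor volumes whose FD exceeds the threshold
sample_mask = fd <= 0.5
print(sample_mask.sum(), "of", fd.size, "volumes kept")
```

Note that applying `sample_mask` removes time points, which is exactly why the resulting series is no longer uniformly sampled and is incompatible with the time-frequency analyses mentioned above.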
The existing benchmarks have a few pitfalls that limit their applicability to each researcher's unique needs. Benchmark research and denoising method development are conducted with in-house preprocessing solutions on different datasets. From a research standpoint, this is not necessarily a pitfall, as it showcases that workflows built upon different software, following shared general preprocessing principles, reach converging conclusions. However, it is a problem for users wishing to adopt a recommended strategy within their own choice of preprocessing workflow. To correctly implement the best approach for their study, researchers need to understand the extensive literature and then construct the workflow themselves. Another pitfall is that benchmark results are limited to the scope of the datasets evaluated. Because researchers use in-house tools and analysis code, there is no direct way to apply a suggested strategy, or to generate the benchmark statistics, for evaluation on a new dataset. The denoising benchmark literature provides a good overview of methodological progress in the field of resting-state functional connectivity, but falls short of providing a reproducible path for the general community to adapt the results to modern preprocessing solutions such as fMRIPrep.