Abstract
When fields lack consensus standards and ground truths for their analytic methods, reproducibility can be more of an ideal than a reality. Such has been the case for functional neuroimaging, where there exists a sprawling space of tools for constructing processing pipelines and drawing interpretations. We provide a critical evaluation of the impact of differences across five independently developed minimal preprocessing pipelines for functional MRI. We show that, even when handling identical data, inter-pipeline agreement was only moderate. Critically, this finding highlights the dependence of downstream analyses on the chosen processing pipeline and points to a potential driving factor behind prior reports of limited reproducibility across studies. Using a densely sampled test-retest dataset, we show that the limits imposed by inter-pipeline agreement become appreciable mainly when the reliability of the underlying data is high, which is increasingly the case as the field progresses into an era of unprecedented data quality and abundance. We highlight the importance of comparing analytic configurations, as both widely discussed (e.g., global signal regression) and commonly overlooked (e.g., MNI template version) decisions were found to produce marked variation. We provide recommendations, along with a supporting infrastructure, for incorporating tool-based variability into functional neuroimaging analyses.
Competing Interest Statement
The authors have declared no competing interest.