RT Journal Article
SR Electronic
T1 FAIRly big: A framework for computationally reproducible processing of large-scale data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.10.12.464122
DO 10.1101/2021.10.12.464122
A1 Adina S. Wagner
A1 Laura K. Waite
A1 Małgorzata Wierzba
A1 Felix Hoffstaedter
A1 Alexander Q. Waite
A1 Benjamin Poldrack
A1 Simon B. Eickhoff
A1 Michael Hanke
YR 2022
UL http://biorxiv.org/content/early/2022/02/01/2021.10.12.464122.abstract
AB Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset). Competing Interest Statement: The authors have declared no competing interest.