PT - JOURNAL ARTICLE
AU - Johnson, Erik C.
AU - Wilt, Miller
AU - Rodriguez, Luis M.
AU - Norman-Tenazas, Raphael
AU - Rivera, Corban
AU - Drenkow, Nathan
AU - Kleissas, Dean
AU - LaGrow, Theodore J.
AU - Cowley, Hannah
AU - Downs, Joseph
AU - Matelsky, Jordan
AU - Hughes, Marisa
AU - Reilly, Elizabeth
AU - Wester, Brock
AU - Dyer, Eva
AU - Kording, Konrad
AU - Gray-Roncal, William
TI - Toward A Reproducible, Scalable Framework for Processing Large Neuroimaging Datasets
AID - 10.1101/615161
DP - 2019 Jan 01
TA - bioRxiv
PG - 615161
4099 - http://biorxiv.org/content/early/2019/04/22/615161.short
4100 - http://biorxiv.org/content/early/2019/04/22/615161.full
AB - Emerging neuroimaging datasets (collected through modalities such as Electron Microscopy, Calcium Imaging, or X-ray Microtomography) describe the location and properties of neurons and their connections at unprecedented scale, promising new ways of understanding the brain. These modern imaging techniques can quickly accumulate gigabytes to petabytes of structural brain imaging data. Unfortunately, many neuroscience laboratories lack the computational expertise or resources to work with datasets of this size: computer vision tools are often not portable or scalable, and there is considerable difficulty in reproducing results or extending methods. We developed an ecosystem of neuroimaging data analysis pipelines that utilize open source algorithms to create standardized modules and end-to-end optimized approaches. As exemplars, we apply our tools to estimate synapse-level connectomes from electron microscopy data and cell distributions from X-ray microtomography data. To facilitate scientific discovery, we propose a generalized processing framework that connects and extends existing open-source projects to provide large-scale data storage, reproducible algorithms, and workflow execution engines. 
Our accessible methods and pipelines demonstrate that approaches across multiple neuroimaging experiments can be standardized and applied to diverse datasets. The techniques developed are demonstrated on neuroimaging datasets, but may be applied to similar problems in other domains.