TemplateFlow: FAIR-sharing of multi-scale, multi-species brain models

Reference anatomies of the brain (‘templates’) and corresponding atlases are the foundation for reporting standardized neuroimaging results. Currently, there is no registry of templates and atlases; therefore, the redistribution of these resources occurs either bundled within existing software or in ad hoc ways such as downloads from institutional sites and general-purpose data repositories. We introduce TemplateFlow as a publicly available framework for human and non-human brain models. The framework combines an open database with software for access, management, and vetting, allowing scientists to share their resources under FAIR—findable, accessible, interoperable, and reusable—principles. TemplateFlow enables multifaceted insights into brains across species, and supports multiverse analyses testing whether results generalize across standard references, scales, and in the long term, species.

Morphological variability manifests not only in differences between brains but also in the way that a brain changes across its lifespan, as it is remodelled by development, aging, and degenerative processes (Courchesne et al., 2000; Good et al., 2001; Sowell et al., 2003). These morphological differences often correspond with the effects of interest in neuroimaging studies and hinder direct spatial comparisons between brain maps (Brett et al., 2002). The substantial variability within and between individual brains necessitates a means of formalizing population-level knowledge about brain anatomy and function. Neuroscientists have answered this need by creating brain atlases as references for understanding and contextualizing morphological variability. Atlases map landmarks, features, and other knowledge about the brain as annotations that are consistent across individual brains.
The development of atlases in neuroscience has accelerated knowledge discovery and dissemination. Early endeavors, epitomized by the groundbreaking work of Brodmann (2006; originally published in German in 1909) and complemented by that of Von Economo and Koskinas (2008; originally published in German in 1925), leveraged careful scrutiny of microanatomy and cytoarchitectonic properties in small numbers of brains. Talairach's assiduous postmortem examination of a single brain (Talairach et al., 1957) remarkably incorporated stereotaxy by defining three spatial reference axes over the brain that allowed anchoring neural landmarks to coordinates.
This first stereotaxic atlas saw wide use, as did a later, improved version (Talairach and Tournoux, 1988). Stereotaxy also prompted the development of the earliest surgical neuronavigation systems. Schurr and Merrington (1978) developed a stereotaxic apparatus to surgically induce targeted brain lesions in cats. This work informed early sectional atlases of the rodent (Paxinos and Watson, 1997) and macaque (Martin and Bowden, 2000) brains.
On account of its implicit stereotaxy, its capacity to image the entire brain, and its non-invasive acquisition protocols, magnetic resonance imaging (MRI) has revolutionized neuroscience in general and the atlasing endeavor (Evans et al., 2012) in particular. Combined with progress in software tools that map homologous features between subjects, either on regular grids (Avants et al., 2008) or on reconstructed anatomical surfaces (Robinson et al., 2014), MRI has enabled researchers to create population-average maps of a particular image modality and/or particular sample with relative ease. These maps, called "templates", are typically created by averaging features across individuals who are representative of the population of interest to a study (Dickie et al., 2017). As a result, atlasing endeavors have become contingent on templates and have shifted away from the search for a single universal neuroanatomical pattern, instead making use of increasingly large samples with the aim of representing a population average of the distribution of morphological patterns (Evans et al., 1993).
Although neurotypical human adults have historically been the most comprehensively templated brains (Evans …

[Figure caption: fsaverage and fsLR are surface templates; the remaining templates are volumetric. Each template is distributed with atlas labels, segmentations, and metadata files. The 25 templates displayed here are only a small fraction of those created as stereotaxic references for the neuroimaging community.]

Advancing beyond the volumetric constraints of stereotaxy, researchers of the primate neocortex have also devised standard spaces based on geometric reconstructions of the cortical surface (Fischl et al., 1999). This surface-based approach has the advantage of respecting the intrinsic topology of cortical folds, a development that has led to further improvements in spatial localization (Van Essen et al., 2012; Coalson et al., 2018).
Such resources as atlases and templates, which provide standardized prior knowledge, have become an indispensable component of modern neuroimaging data workflows for two cardinal reasons. First, group inference in neuroimaging studies requires that individuals' features are aligned into a common spatial frame of reference where their location can be called standard (Brett et al., 2002). Second, templates engender a stereotaxic coordinate system in which atlases can be delineated or projected. Associating atlases with template coordinates also facilitates the mapping of prior population-level knowledge about the brain into images of individual subjects' brains (for instance, to sample and average the functional MRI signal within the regions defined in an atlas; Yeo et al., 2011). Because they are integral to analytic workflows, the most widely used templates and atlases are typically distributed along with neuroimaging software libraries. Alternatively, researchers distribute their templates through, e.g., the NeuroImaging Tools and Resources Collaboratory [NITRC; RRID:SCR_003430], institutional websites, or data storage systems such as FigShare [RRID:SCR_004328] or Dryad [RRID:SCR_005910]. Nonetheless, the most common practice is to use the default templates of the software toolbox chosen for analysis, as shown by Carp (2012b) in a review of 241 functional MRI studies.
In sum, the management, stewardship, distribution, reuse, and reporting of templates pose a number of challenges that merit attention. In an early perspective, Van Essen (2002) called for connecting templates in an aggregation of databases with "powerful and flexible options for searching, selecting, and visualizing data" and stressed the importance of resource accessibility. TemplateFlow provides a framework that satisfies these desiderata while following the "Findability, Accessibility, Interoperability, and Reusability (FAIR) Guiding Principles" (Box 1; Wilkinson et al., 2016). By following the FAIR Principles, TemplateFlow decouples standardized spatial data from software libraries while giving processing and analysis workflows (e.g., Esteban et al., 2017, 2019) the flexibility to select the most appropriate template available. TemplateFlow comprises a cloud-based repository of human and nonhuman imaging templates (the "TemplateFlow Archive"; Figure 1) paired with a Python-based library (the "TemplateFlow Client") for programmatically accessing template resources (Figure 2). The resource is complemented by a "TemplateFlow Manager" tool to upload new resources or update existing ones. When a new template is added, the Manager initiates a peer-reviewed contribution pipeline in which experts are invited to curate and vet new proposals. These software components, as well as all template resources, are version-controlled. Therefore, TemplateFlow not only enables "off-the-shelf" access to templates by humans and machines, but also permits researchers to share their resources with the community.

[Figure caption: The TemplateFlow Archive can be accessed at a "low" level with DataLad, or at a "high" level with a Python client. New resources can be added through the Manager Command Line Interface, which initiates a peer-review process before acceptance in the Archive.]
To implement several of the FAIR Principles, the TemplateFlow Archive features a tree-directory structure, metadata files, and data files following an organization inspired by the Brain Imaging Data Structure (BIDS; Gorgolewski et al., 2016). The online documentation hub and the resource browser located at TemplateFlow.org provide further details for users.

BIDS-inspired structure for template and atlas archival following FAIR Principles
BIDS prescribes a file naming scheme comprising a series of key-value pairs (called "entities"). Mirroring BIDS' patterns for participants, each template is associated with an identifier, which is an alphanumeric label. Template identifiers are unique across the Archive and are signified with the key tpl- (e.g., tpl-MNI152Lin), which is analogous to sub- in BIDS. Hence, every template and all associated metadata, atlases, etc. are assigned a unique and persistent identifier (Box 1, F1). The adaptation of BIDS to the domain of templates and atlases provides the tool with a robust implementation of principles I1-I3 in Box 1.
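To illustrate how entity-based names carry a template's identity, the following is a minimal sketch, not part of the TemplateFlow client, that splits a file name into its key-value entities, suffix, and extension. The function name parse_template_filename is hypothetical, and the sketch assumes, as in BIDS, that entity values contain no dots or underscores.

```python
def parse_template_filename(filename: str) -> dict:
    """Split a BIDS-like TemplateFlow file name into entities, suffix and extension.

    Hypothetical helper for illustration; assumes entity values contain
    no dots or underscores, as in BIDS.
    """
    stem, _, ext = filename.partition(".")  # everything after the first dot is the extension
    *pairs, suffix = stem.split("_")        # the last underscore-separated field is the suffix
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep or not value:
            raise ValueError(f"malformed entity {pair!r} in {filename!r}")
        entities[key] = value  # dicts keep insertion order, i.e. the naming hierarchy
    if next(iter(entities), None) != "tpl":
        raise ValueError(f"{filename!r} does not start with a tpl- identifier")
    return {"entities": entities, "suffix": suffix, "extension": "." + ext if ext else ""}


print(parse_template_filename("tpl-MNIPediatricAsym_cohort-3_res-high_T1w.nii.gz"))
# {'entities': {'tpl': 'MNIPediatricAsym', 'cohort': '3', 'res': 'high'}, 'suffix': 'T1w', 'extension': '.nii.gz'}
```

Requiring tpl- as the first entity mirrors the role of the template identifier as the root of every name.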
For each template, the TemplateFlow database includes reference volumetric template images (e.g., one T1-weighted MRI), a set of atlas labels and voxelwise annotations defined with reference to the template image, and additional files containing the template and atlas metadata. Similarly, TemplateFlow accommodates surface-based resources such as average features, geometry files, annotations, or metadata. The only requirement for feature averages and atlases sharing a unique template identifier is that they must be spatially in register, regardless of the data sampling strategy (i.e., volume, surface, or mixed, and the corresponding resolution or mesh density). Although the most widely used templates generally represent MRI-derived features, TemplateFlow is not limited to any specific set of modalities. Table 1 has been programmatically generated by the accompanying code examples and enumerates the templates currently distributed with the Archive, indicating their unique identifiers and giving a short description of each. Template resources are described with rich metadata (Box 1, F2 and R1), ensuring that the data usage license is clear and accessible (Box 1, R1.1), that data and metadata are associated with detailed provenance (Box 1, R1.2), and that data and metadata follow a domain-relevant structure transferred from the neuroimaging community standards of BIDS (Box 1, R1.3). Figure 3 summarizes the data types and metadata that can be stored in the Archive. Figure 4 provides an overview of the Archive's metadata specification, showing that metadata clearly and explicitly include the identifier of the data they describe (Box 1, F3). Using DataLad (Halchenko et al., 2021), data and metadata are retrievable through several open, free, standard communication protocols without authentication (Box 1, A1). Cloud storage for the Archive is supported by the Open Science Framework (osf.io) and Amazon's Simple Storage Service (S3). Version control, replication, and synchronisation of template resources across filesystems are managed with DataLad. Leveraging DataLad, metadata are stored on GitHub, ensuring that metadata remain accessible even when the corresponding data are no longer available (Box 1, A2). DataLad is based on Git and git-annex, which index all data and metadata (Box 1, F4).

Box 1. The FAIR Guiding Principles
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource

Table 1. TemplateFlow is designed to maximise the discoverability and accessibility of new templates, minimise redundancies in template creation, and promote standardisation of processing workflows. To enhance visibility of existing templates, TemplateFlow includes a web-based browser indexing all files in the TemplateFlow Archive (templateflow.org/browse/). This table has been automatically generated using the TemplateFlow Client tooling.

Template ID | Description
Fischer344 | …

TemplateFlow provides humans and machines with flexible and granular access to templates
TemplateFlow's Python client provides human users and software tools with reliable, programmatic access to the Archive. The client can be integrated seamlessly into image processing workflows to handle requests for template resources on the fly. It features an intuitive application programming interface (API) that can query the TemplateFlow Archive for specific files (Figure 5). The BIDS-inspired organization enables easy integration of tools and infrastructure designed for BIDS (e.g., the Python client uses PyBIDS; Yarkoni et al., 2019). To query TemplateFlow, a user submits a list of arguments corresponding to the BIDS-like key-value pairs in the target file names (see Online Methods).
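A query of this kind can be approximated as filtering file names by their entities. The sketch below is a hypothetical stand-in for the client's query machinery (which relies on PyBIDS); the function names, the example file names, and the convention that a None value requires the entity to be absent are assumptions made for illustration.

```python
from typing import Iterable, List, Optional


def _entities(filename: str) -> dict:
    """Map a BIDS-like file name to its entities plus suffix and extension."""
    stem, _, ext = filename.partition(".")
    *pairs, suffix = stem.split("_")
    ents = dict(pair.split("-", 1) for pair in pairs)  # "res-01" -> ("res", "01")
    ents.update(suffix=suffix, extension="." + ext)
    return ents


def query(files: Iterable[str], **criteria: Optional[str]) -> List[str]:
    """Return the files whose entities match every criterion.

    Assumed convention: a value of None requires the entity to be absent.
    """
    matches = []
    for name in files:
        ents = _entities(name)
        if all(ents.get(key) == value for key, value in criteria.items()):
            matches.append(name)
    return sorted(matches)
```

For example, query(files, tpl="MNI152NLin2009cAsym", res="01", suffix="T1w") keeps only the 1 mm T1-weighted image, while adding desc=None excludes derived files such as brain masks.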
To integrate template resources into neuroimaging workflows, traditional approaches required deploying an oftentimes voluminous tree of prepackaged data to the filesystem. By contrast, the TemplateFlow client implements lazy loading, which permits the base installation to be extremely lightweight. Instead of distributing neuroimaging data with the installation, TemplateFlow allows the user to dynamically pull from the cloud-based storage only those resources they need, as they need them. After a resource has been requested once, it remains cached in the filesystem for future utilization.
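Lazy loading with a local cache can be sketched in a few lines. In this minimal example, the hypothetical lazy_get function calls the supplied fetch callable (a placeholder for whatever download backend is used) only when the file is missing or empty in the cache directory; later requests are served from disk.

```python
from pathlib import Path
from typing import Callable, Union

Fetcher = Callable[[str, Path], None]  # hypothetical download backend: (name, destination) -> None


def lazy_get(name: str, cache_dir: Union[str, Path], fetch: Fetcher) -> Path:
    """Return a local path for `name`, downloading it only on first request."""
    target = Path(cache_dir) / name
    if target.is_file() and target.stat().st_size > 0:
        return target  # cache hit: no network access needed
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(name, target)  # cache miss: pull only this one resource
    return target
```

Because nothing is fetched until it is requested, the installed package can stay small no matter how large the Archive grows.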
We demonstrate the benefits of centralizing templates in general, and the validity of the TemplateFlow framework in particular, via its integration into fMRIPrep (Esteban et al., 2019), a functional MRI preprocessing tool. This integration provides fMRIPrep users with the flexibility to spatially normalize their data to any template available in the Archive (see Box 2). It has also enabled the development of fMRIPrep adaptations, for instance to pediatric populations or rodent imaging (MacNicol et al., 2021a), using suitable templates from the Archive. The uniform interface provided by the BIDS-like directory organisation and metadata enables straightforward integration of new templates into workflows equipped to use TemplateFlow. Further examples of tools leveraging TemplateFlow include MRIQC (Esteban et al., 2017), for quality control of MRI; PyNets (Pisner and Hammonds, 2020), a package for ensemble learning of functional and structural connectomes; ASLPrep (Adebimpe et al., 2021), an arterial spin labeling (ASL) preprocessing pipeline that makes use of TemplateFlow through sMRIPrep (the structural pipeline spun off from fMRIPrep); and NetPlotBrain (Thompson and Fanton, 2021), which uses TemplateFlow to display spatially standardized brain network data.
TemplateFlow eases the dissemination of templates and opens their vetting and maintenance to the community
Beyond redistributing the templates most commonly used in the literature, the resource aims to become a centralized, standard way for researchers to disseminate their new templates. TemplateFlow's pipeline for the submission of new templates integrates peer review with minimal technical overhead. This review process is proposed as a complement to the traditional assessment of template resources prior to publication, in which reviewers and editors focus on academic merit over accessibility and reusability potential. The submission pipeline is easily initiated with the Python-based TemplateFlow Manager, further described in the Online Methods.
TemplateFlow's management infrastructure also eases maintenance. The lack of infrastructure for templates and atlases has given rise to static development modes in which templates are packaged once and rarely revised. As a result, templates and atlases have remained outside version control, which is considered too onerous and requires informatics expertise that may exceed the resources of research teams. Although uncommon, errors in template and atlas resources have been reported (e.g., Rohlfing, 2013; Halchenko, 2013), and in each of these instances the issues were discovered through reuse of the template. In our experience with OpenfMRI and OpenNeuro (Markiewicz et al., 2021), reuse is a top cause for dataset revision after the first release of data. Because the flexibility and exposure that TemplateFlow affords templates and atlases will substantially lower barriers to reuse and feedback reporting, the resource will stimulate the early identification of problems and encourage requests for new features that improve available assets.
Inspired by the Conda-forge community repository and the Journal of Open Source Software, the GitHub-based "templateflow" organization is a site for dialogue between members of the neuroimaging community and TemplateFlow Archive curators. GitHub issues offer any community member the ability to share their needs with developers and Archive curators, for instance by identifying templates or workflow features for potential inclusion in the project. "Pull requests" provide a means for members of the community to directly contribute code or template resources to the TemplateFlow Archive.

Decoupling software and templates is indispensable for more reliable and more reproducible study designs
Although there is not yet any standard distance function that can objectively determine whether a template is phenotypically close to a study's sample, selecting an inappropriate template may introduce so-called "template effects" that bias morphometric analyses and produce incorrect results (Yoon et al., 2009). For example, since most commonly used templates were created with samples of adults of European ancestry, a study involving East Asian adults might require a non-default template. Because of the relative scarcity of nonhuman imaging resources, exposure to template effects is even more pressing in the nonhuman context: e.g., is it appropriate to use a mouse template for the spatial standardization of rat images? Not only are nonhuman templates and atlases fewer, but their accommodation in popular software tools is also generally limited. For instance, AFNI (Cox and Hyde, 1997) includes a rat template that can be applied in some contexts, while SPM provides such functionality only through third-party add-ons (e.g., Sawiak et al., 2009). A second issue is that deviating from software defaults places a knowledge burden on the user. Once the researcher has selected a reference standard space that is suitable for their study population, if that space is not included by default with the software they plan to use, they must locate and download the reference template or atlas and integrate it within their analytic pipeline. To our knowledge, only AFNI has made efforts in this direction, with its relatively recent @Install_<template_name> command, although the tool (i) is not designed to be compatible with other analysis alternatives, and (ii) provides only a minimal subset of TemplateFlow's features (i.e., resource download).
On the other hand, because tools like TemplateFlow open up a wide range of options for a given study, the resulting increase in methodological flexibility may become a point of concern. Carp (2012a) empirically investigated the consequences of methodological flexibility in neuroimaging, demonstrating that decision points in workflows can lead to substantial variability in analysis outcomes. In a contemporaneous paper, Carp (2012b) contextualized these findings vis-à-vis the inflated risk of false positives, underscoring that analytical variability degrades the reproducibility of studies only in combination with (intended or unintended) selective reporting of methods and results. Selective reporting, in this particular application, would mean that a researcher explores the results with reference to several templates or atlases and reports only those that confirm the study's hypotheses. TemplateFlow's design accounts for this risk by equipping researchers with the necessary tooling and metadata (unambiguous resource identifiers, versioning and provenance information, querying utilities, etc.) to minimize their reporting burden and ensure completeness. More recently, Botvinik-Nezer et al. (2020) advocated for another approach to the problem of analytical variability: "multiverse" analyses, wherein many combinations of methodological choices are all thoroughly reported and cross-compared when presenting results. Applied to the choice of template and atlas combinations, it would thus be desirable to report neuroimaging results with reference to several standard spaces and determine whether the interpretations hold across those references and atlases. TemplateFlow's interoperability empowers users to incorporate this type of analysis into their research by easily substituting templates or atlases for cross-comparison. For instance, Box 2 shows how TemplateFlow works with fMRIPrep to automatically generate preprocessed outputs in multiple standard spaces.
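A template multiverse amounts to running the same analysis once per standard space and reporting every outcome. The sketch below assumes a hypothetical analysis callable that returns a single effect estimate for a given template identifier; the helper names and the simple threshold criterion are illustrative choices, and any numbers fed to it in practice should come from real analyses, not from this example.

```python
from typing import Callable, Dict, Iterable


def run_multiverse(analysis: Callable[[str], float], spaces: Iterable[str]) -> Dict[str, float]:
    """Run the same (hypothetical) analysis once per standard space; keep every result."""
    return {space: analysis(space) for space in spaces}


def holds_across(results: Dict[str, float], threshold: float) -> bool:
    """True only if the conclusion (effect above threshold) holds in every space."""
    return all(value > threshold for value in results.values())


def report(results: Dict[str, float]) -> str:
    """Tabulate all spaces, not just those that support the hypothesis."""
    return "\n".join(f"{space}\t{value:.3f}" for space, value in sorted(results.items()))
```

Reporting the full table produced by report, rather than the subset of spaces that support a hypothesis, is what distinguishes a multiverse analysis from selective reporting.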
Producing outputs in several standard spaces facilitates assessment of the robustness of a result with respect to the template or atlas of choice, in accordance with the multiverse approach. Botvinik-Nezer et al. (2020) also promoted pre-registration of studies as a powerful means to fix methodological choices before the authors have access to intermediate and final results. Here, TemplateFlow emerges as a powerful tool to unambiguously document template and atlas choices at pre-registration time.

… (Esteban et al., 2019), we propose TemplateFlow as a timely resource to bridge gaps in the flexibility and reproducibility of neuroimaging studies. These gaps mostly stem from the software-bound distribution of templates and atlases, which is the de facto standard practice. We argue this practice is often excessively limiting for researchers (e.g., those studying the infant or nonhuman brain), if not problematic (e.g., template effects induced by the wrong choice of reference). More worryingly, many scientific communications assume that the provenance of templates is completely specified by the choice of software library (e.g., Carp, 2012b). As additional, anecdotal evidence, prompted in part by Carp (2012b), in the Supplementary Materials we present an exploration of 6,048 papers published in the specialized journals NeuroImage and NeuroImage: Clinical that contained the term "MNI" (for Montreal Neurological Institute, which has historically been established as the standard space of reference). Our analysis indicates that MNI space can refer to any of a family of templates and is not a unique identifier. Indeed, studies carried out with SPM96 (Friston et al., 2006) and earlier versions report their results in MNI space with reference to the single-subject Colin 27 average template (Holmes et al., 1998). However, beginning with SPM99, the software updated its definition of MNI space to refer to a new average of 152 linearly aligned subjects (Mazziotta et al., 1995).
As of SPM12, different modules alternatively use the latter template or a new version created by the MNI in 2009 by means of nonlinear registration (Fonov et al., 2011). By contrast, the MNI template bundled with FSL was developed by Dr. A. Janke in collaboration with MNI researchers (Evans et al., 2012). Although it was generated under the guidance of, and using the techniques of, the 2006 release of nonlinear MNI templates, this instance is not in fact part of the official portfolio distributed by the MNI. Further, FreeSurfer uses a modified version of the "MNI Average Brain (305 MRI) Stereotaxic Registration Model" (a linear MNI template based on 305 subjects; Evans et al., 1993) within some early steps of the recon-all pipeline. AFNI has opted to expand its support to all instances of the MNI spaces, providing some of TemplateFlow's functionality. However, using and configuring these tools is reserved for expert users, the tools involved are not designed for compatibility, and the maintenance burden remains on the shoulders of the package developers. Issues regarding access, unequivocal identification, integration within software workflows, reuse terms, and provenance tracking are only exacerbated when the necessary template or atlas is not a software default.

Discussion
We address such concerns by promoting a resource based on the FAIR Principles. Indeed, it is common to find resources in institutional repositories such as the MNI website that lack an explicit usage license (an example at the time of writing is Frey et al. (2009)¹). Distribution through general-purpose, manually curated repositories such as NITRC may be even more problematic. Beyond the absence of a shared data format (these repositories do not mandate any standard), additional problems arise that hinder or block access and reuse. We have found templates at NITRC flagged as available for freeware noncommercial use as a general category, but the actual terms for reuse are not accessible. More worryingly, authors may post private resources with the appearance of being open. NITRC's interface allows resources to be nominally released under a permissive open license, but before the data can be accessed, the resources' authors may require signing a Data Usage Agreement that forbids redistribution². Further details of the advantages of TemplateFlow over NITRC for the particular case of template and atlas resource distribution are provided in Table S3. Long-term sustainability of the resource is ensured by using minimal-cost services (all free at the time of writing) and open-source tools.

¹ Author OE notified the authors and the issue is being resolved.
² Author OE reported this practice to NITRC.
Limitations
TemplateFlow affords researchers substantial analytical flexibility in the choice of standard spaces of reference. Such flexibility helps researchers minimize "template effects" by making it easy to insert the most adequate template, but it also opens opportunities for incomplete reporting of experiments. Using DataLad or the TemplateFlow Client, researchers have at their disposal the necessary tooling for precise reporting: unique identifiers, provenance tracking, version tags, and comprehensive metadata. Nonetheless, the effectiveness of TemplateFlow in mitigating selective reporting is ultimately bounded by the user's discretion.
As a research resource, the scope of this manuscript is limited to describing the framework and infrastructure of TemplateFlow, highlighting how neuroscientists can leverage this new data archive and the tooling around it. Therefore, some fundamental issues related to this work must be left for future investigation: (i) the overarching problems of cross-template and cross-atlas consistency (Bohland et al., 2009); (ii) comparative evaluation of methodological alternatives for producing new templates, atlases and related data; (iii) providing neuroimagers with more objective means to determine the most appropriate template and atlas choices that apply to their research, as well as better understanding "template effects"; (iv) the adequacy of original (MRI, nuclear imaging, etc.) and derived (regularly gridded images, surfaces, etc.) modalities for a specific research application; or (v) the study of the validity and reliability of inter-template registration, as well as the evaluation of such a component of the TemplateFlow framework. However, the proposed framework serves as an ideal keystone to investigate some of the issues above. For instance, the Supplementary Materials describe one of the resource's tools to estimate nonlinear spatial mappings between templates that would be powerful in investigating the issue of consistency (i).

Conclusion
We introduce TemplateFlow, an open framework for the archiving, maintenance, and sharing of neuroimaging templates and atlases that is implemented under FAIR data sharing principles. We describe the current need for this resource in the domain of neuroimaging, and further discuss the implications of the increased analytical flexibility this tool affords. These two facets of reproducibility (the availability, under the FAIR Guiding Principles, of the prior knowledge required by the research workflow, and the analytical flexibility such availability affords) are ubiquitous concerns across disciplines. TemplateFlow's approach to addressing both establishes a pattern broadly transferable beyond neuroimaging. We envision TemplateFlow as a core research tool undergirding multiverse analyses, which assess whether neuroimaging results are robust across population-wide spatial references, as well as a stepping stone in the quest to map anatomy and function across species.

Methods
TemplateFlow comprises four cardinal components: (i) a cloud-based archive, (ii) a Python client for programmatically querying the archive, (iii) automated systems for synchronizing and updating archive data, and (iv) inter-template registration workflows. Here, we discuss the details of each component's implementation in turn, as well as the manner in which they interact with one another to form a cohesive whole.

The TemplateFlow Archive.
The archive itself comprises directories of template data in cloud storage. For redundancy, the data are stored both on Google Cloud, through the Open Science Framework (OSF), and on Amazon's Simple Storage Service (S3). Prior to storage, all template data must be named and organized in directories conforming to a data structure adapted from the Brain Imaging Data Structure (BIDS) standard (Gorgolewski et al., 2016). The specification of this data structure is a living document detailed on the TemplateFlow homepage (http://www.templateflow.org). We detail several critical features here.
The archive is organized hierarchically, and descriptive metadata follow a principle of inheritance: any metadata that apply to a particular level of the archive also apply to all deeper levels (Figure 3). At the top level of the hierarchy are directories corresponding to each archived template. If applicable, each template directory contains directories corresponding to sub-cohort templates. Names of directories and resource files constitute a hierarchically ordered series of key-value pairs terminated by a suffix denoting the datatype. For instance, tpl-MNIPediatricAsym_cohort-3_res-high_T1w.nii.gz denotes a T1-weighted (T1w) template image file at resolution "high" of cohort "3" in the "MNIPediatricAsym" template (the definitions of each resolution and cohort are specified in the template's metadata file in the TemplateFlow Archive). The most common TemplateFlow datatypes are indexed in Table S1; an exhaustive list is available in the most current version of the BIDS standard (https://bids.neuroimaging.io/).
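The inheritance rule can be expressed as a dictionary merge along the path from the archive root to a file, with deeper levels overriding shallower ones. The sketch assumes that the metadata declared at each level are available in a mapping keyed by relative path; the function name and the metadata keys in the usage example are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict


def inherited_metadata(path: str, levels: Dict[str, dict]) -> dict:
    """Merge metadata from the archive root down to `path`.

    `levels` maps a relative path ("" for the archive root) to the metadata
    declared at that level; deeper levels override shallower ones.
    """
    parts = PurePosixPath(path).parts
    merged: dict = {}
    for depth in range(len(parts) + 1):
        level = "/".join(parts[:depth])  # "", "tpl-X", "tpl-X/cohort-3", ...
        merged.update(levels.get(level, {}))
    return merged
```

With levels = {"tpl-MNIPediatricAsym": {"Name": "MNIPediatricAsym", "Species": "Homo sapiens"}, "tpl-MNIPediatricAsym/cohort-3": {"Name": "MNIPediatricAsym, cohort 3"}}, a cohort-3 image inherits Species from the template level while its Name comes from the cohort level.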
Within each directory, template resources include image data, atlas and template metadata, transform files, licenses, and curation scripts. All image data are stored in gzipped NIfTI-1 format and are conformed to RAS+ orientation (i.e., left-to-right, posterior-to-anterior, inferior-to-superior, with the affine qform and sform matrices corresponding to a cardinal basis scaled to the resolution of the image). Template metadata are stored in a JavaScript Object Notation (JSON) file called template_description.json; an overview of metadata specifications is provided in Figure 4. In brief, template metadata files contain general template metadata (e.g., authors and curators, references), cohort-specific metadata (e.g., ages of subjects included in each cohort), and resolution-specific metadata (e.g., dimensions of images associated with each resolution). Atlas metadata are often stored in TSV format and specify the region name corresponding to each atlas label. Transform files are stored in HDF5 format and are generated as a diffeomorphic composition of ITK-formatted transforms mapping between each pair of templates.
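The RAS+ convention can be checked directly on an image's 4x4 affine: the upper-left 3x3 block should be diagonal, with positive entries equal to the voxel sizes. The following sketch performs that check in plain Python; in practice, nibabel's aff2axcodes offers a more general orientation check. The function name and tolerance are assumptions, the offsets in the usage example are arbitrary, and the sketch does not verify that the qform and sform agree.

```python
from typing import Sequence


def is_ras_cardinal(affine: Sequence[Sequence[float]],
                    zooms: Sequence[float],
                    tol: float = 1e-6) -> bool:
    """True if a 4x4 affine is a RAS+ cardinal basis scaled by `zooms`.

    The rotation/scaling block must be diagonal with positive voxel sizes
    (x grows to the Right, y to the Anterior, z to the Superior); the
    translation column is left unconstrained.
    """
    if any(z <= 0 for z in zooms):
        raise ValueError("voxel sizes must be positive")
    for i in range(3):
        for j in range(3):
            expected = zooms[i] if i == j else 0.0
            if abs(affine[i][j] - expected) > tol:
                return False  # flipped axis, wrong scaling, or oblique basis
    return all(abs(a - b) <= tol for a, b in zip(affine[3], (0.0, 0.0, 0.0, 1.0)))


ras = [[2, 0, 0, -96], [0, 2, 0, -132], [0, 0, 2, -78], [0, 0, 0, 1]]
print(is_ras_cardinal(ras, (2, 2, 2)))  # True
```

A left-to-right flip (a negative first diagonal entry, i.e. LAS) or any off-diagonal term makes the check fail.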
The archive has a number of client-facing access points to facilitate browsing of resources. Key among these is the archive browser on the TemplateFlow homepage, which indexes all archived resources and provides a means for researchers to take inventory of possible templates to use for their study.

The Python client.
TemplateFlow is distributed with a Python client that can submit queries to the archive and download any resources as they are requested by a user or program. Valid query options correspond approximately to BIDS key-value pairs and datatypes. A compendium of common query arguments is provided in Table S1, and comprehensive documentation is available on the TemplateFlow homepage.
When a query is submitted to the TemplateFlow client, the client begins by identifying any files in the archive that match the query. To do so, it uses PyBIDS (Yarkoni et al., 2019), which exploits the BIDS-like architecture of the TemplateFlow Archive to efficiently scan all directories and filter any matching files. Next, the client assesses whether queried files exist as data in local storage. When a user locally installs TemplateFlow, the local installation initially contains only lightweight pointers to files in OSF cloud storage. These pointers are implemented using DataLad (Halchenko et al., 2021), a data management tool that extends git and git-annex. TemplateFlow uses DataLad principally to synchronize datasets across machines and to perform version control by tracking updates made to a dataset.
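The filtering step can be approximated in a few lines. The sketch below is not PyBIDS and not the client's implementation; it assumes query keys use the same names as the file-name keys (e.g., res rather than the client's resolution argument), compares values as strings, and treats None as "this entity must be absent", an assumption consistent with the desc=None query in Box 2. The names filter_files, _parse and _matches are hypothetical, and the file names in the usage lines are illustrative.

```python
def _parse(name):
    """Split 'key-value_..._suffix.ext' into (entities, suffix, extension)."""
    stem, _, extension = name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, extension  # extension without leading dot


def _matches(fields, key, value):
    if value is None:
        return key not in fields  # e.g. desc=None: the entity must be absent
    if key == "extension":
        value = str(value).lstrip(".")  # accept "nii.gz" or ".nii.gz"
    return fields.get(key) == str(value)


def filter_files(filenames, **query):
    """Keep the file names whose entities, suffix and extension match the query."""
    selected = []
    for name in filenames:
        entities, suffix, extension = _parse(name)
        fields = {**entities, "suffix": suffix, "extension": extension}
        if all(_matches(fields, key, value) for key, value in query.items()):
            selected.append(name)
    return selected


files = [
    "tpl-MNI152NLin6Asym_res-01_T1w.nii.gz",
    "tpl-MNI152NLin6Asym_res-01_desc-brain_mask.nii.gz",
    "tpl-MNI152NLin6Asym_res-02_T1w.nii.gz",
]
print(filter_files(files, res="01", desc=None, suffix="T1w", extension="nii.gz"))
```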
If the queried files are not yet synchronized locally (i.e., they exist only as pointers to their counterparts in the cloud), the client instructs DataLad to retrieve them from cloud storage. In the event that DataLad fails or returns an error, the client falls back on redundancy in storage and downloads the file directly from Amazon's S3. When the client is next queried for the same file, it will detect that the file has already been cached in the local filesystem. The use of resource pointers with the client thus enables lazy loading of template resources. Finally, the client confirms that the file has been downloaded successfully. If the client detects a successful download, it returns the result of the query; in the event that it detects a synchronization failure, it displays a warning for each queried file that encountered a failure.
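The retrieval logic (cached copy first, then a primary source, then a fallback, then confirmation of the download) can be sketched as follows. The two callables stand in for a DataLad get and a direct S3 download; their names and the fetch function are hypothetical, and treating a missing or empty file as "not yet synchronized" is an assumption of this sketch rather than the client's exact check.

```python
import warnings
from pathlib import Path


def _is_present(path):
    # A pointer whose content has not been fetched yet is treated as absent:
    # Path.is_file() is False for a dangling symlink, and an empty file is
    # taken as an incomplete download (an assumption of this sketch).
    return path.is_file() and path.stat().st_size > 0


def fetch(path, primary, fallback):
    """Return `path` once its content is available locally, else None.

    `primary` and `fallback` are callables that write the file to `path`
    (stand-ins for a DataLad get and a direct S3 download).
    """
    path = Path(path)
    if _is_present(path):
        return path  # cached by an earlier query: no download needed
    for getter in (primary, fallback):
        try:
            getter(path)
        except Exception as err:  # fall through to the next source
            warnings.warn(f"{getter.__name__} failed for {path.name}: {err}")
            continue
        if _is_present(path):
            return path  # download confirmed
    warnings.warn(f"could not synchronize {path.name}")
    return None
```

Because the cache check comes first, a second query for the same file never reaches either source, which is the lazy-loading behaviour described above.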
Continued functionality and operability of the client are ensured through an emphasis on maximizing code coverage with unit tests. Updating the client requires successful completion of all unit tests, which are automatically

Panels: Data, Metadata, Scripts.
NIfTI images include population-average templates and tissue class segmentations.
Masks are binary-valued NIfTI images indicating whether each voxel is in a region.
Atlases are NIfTI images that assign anatomical or functional labels to template voxels.
Transformations are HDF5 files containing maps between template coordinate spaces.
Cohort directories contain template resources specific to a sub-cohort of participants.
JSON metadata summarise information about templates, resolutions, and cohorts.
Tabular metadata contain dictionaries that pair atlas regions with anatomical labels.
A changelog chronicles changes and updates made to template resources.
License files specify usage rights for template resources.
Python scripts are used to prepare template resources.
The TemplateFlow Archive's browser, accessible at TemplateFlow.org, with a single template resource directory expanded. Template data are archived using a BIDS-like directory structure, with top-level directories for each template. Each directory contains image files, annotations, and metadata for that template. Following BIDS specifications, volumetric data are stored in NIfTI-1 format. Further surface-based data types are supported with GIFTI (surfaces) and CIFTI-1 (mixed volumetric-and-surface data).
Figure 4 annotations. Example template_description.json (excerpt) for MNIPediatricAsym:
{ "Authors": [ "Fonov V", "Evans AC", "Botteron K", "Almli CR", "McKinstry RC", "Collins DL" ], "Curators": [ "Esteban O" ], "Identifier": "MNIPediatricAsym", "License": "MIT-derived. See LICENSE file", "Name": "MNI's unbiased standard MRI template for pediatric data from the 4.5 to 18.5y age range", "RRID": "SCR_008796", ... }
General metadata: publications to reference when using the template, and salient links for template information.
Cohort metadata:
cohort (Object): top-level field containing all cohort metadata objects.
Cohort identifiers: each has a subdirectory in the template data directory, and each has a metadata object nested in the cohort field.
Cohort age bounds: 2-tuple array indicating the lower and upper bounds for participant age in the cohort, if the cohorts are stratified by age.
Cohort name: full descriptive name of the cohort.
Cohort age units: units for cohort age bounds.
Resolution metadata:
res (Object): top-level field containing all resolution metadata objects.
Resolution identifiers (Object): each has a metadata object nested in the res field. The metadata for each resolution apply to all images whose name includes res-<identifier>. The identifier itself does not necessarily correspond to the voxel size.
origin (Array): (x, y, z) spatial location of the voxel origin relative to the physical origin in mm.
shape (Array): (x, y, z) shape of the image in voxels.
zooms (Array): (x, y, z) size of each voxel in mm.

Figure 4. Overview of the metadata specification of the TemplateFlow Archive. TemplateFlow's metadata are formatted as JavaScript Object Notation (JSON) files located within each template set. An example template_description.json metadata file is displayed at the left for MNIPediatricAsym. In addition to general template metadata, datasets can contain cohort-level and resolution-level metadata, which are nested within the main metadata dictionary and apply only to subsets of images in the dataset.

Figure 5 (caption excerpt). After importing the API, the user submits a query for the T1-weighted FSL version of the MNI template at 1 mm resolution. The client first filters through the archive, identifies any files that match the query, and finds their counterparts in cloud storage. It then downloads the requested files and returns their paths in the local TemplateFlow installation directory. Future queries for the same resource can be completed without any re-downloading.
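Because resolution identifiers do not necessarily encode the voxel size, a program that needs, say, 0.5 mm data should look up the zooms field rather than parse the identifier. The sketch below assumes the res layout described for Figure 4 (entries carrying origin, shape and zooms); the metadata values are invented for illustration, and the helper names are hypothetical.

```python
import json

# Invented values in the layout of Figure 4 (not a real template's metadata).
EXAMPLE_DESCRIPTION = json.loads("""
{
  "Identifier": "ExampleTemplate",
  "res": {
    "01": {"origin": [-90.0, -126.0, -72.0], "shape": [182, 218, 182], "zooms": [1.0, 1.0, 1.0]},
    "high": {"origin": [-90.0, -126.0, -72.0], "shape": [364, 436, 364], "zooms": [0.5, 0.5, 0.5]}
  }
}
""")


def resolution_for_voxel_size(description, voxel_mm, tol=1e-6):
    """Return the res identifier whose zooms match an isotropic voxel size (mm)."""
    for res_id, meta in description.get("res", {}).items():
        if all(abs(z - voxel_mm) <= tol for z in meta["zooms"]):
            return res_id
    return None  # no resolution with that voxel size


def field_of_view_mm(res_meta):
    """Physical extent per axis: shape (voxels) times zooms (mm per voxel)."""
    return [n * z for n, z in zip(res_meta["shape"], res_meta["zooms"])]
```

Here the identifier "high" resolves to 0.5 mm voxels only through its zooms entry, which is the point made in the resolution metadata description.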

Ancillary and managerial systems.
TemplateFlow includes a number of additional systems and programs that serve to automate stages of the archive update process, for instance, the addition of a new template or the revision of current template resources. To facilitate the update and extension process, TemplateFlow uses GitHub actions to automatically synchronize dataset information so that all references remain up-to-date with the current dataset. These actions are triggered whenever a pull request to TemplateFlow is accepted. For example, GitHub actions are used to update the browser of the TemplateFlow Archive so that it displays all template resources as they are uploaded to the archive. Whereas the TemplateFlow Client synchronizes data from cloud storage to the local filesystem, a complementary TemplateFlow Manager handles the automated synchronization of data from the local filesystem to cloud storage. The Python-based manager is also used for template intake, i.e., to propose the addition of new templates to the archive. To propose adding a new template, a user first runs the TemplateFlow Manager using the tfmgr add <template_id> --osf-project <project_id> command.
The manager begins by using the TemplateFlow client to query the archive and verify that the proposed template does not already exist. After verifying that the proposed template is new, the manager synchronizes all specified template resources to OSF cloud storage. It then creates a fork of the tpl-intake branch of the TemplateFlow GitHub repository and generates an intake file in Tom's Obvious, Minimal Language (TOML) format; this intake file contains a reference to the OSF project where the manager has stored template resources. The TemplateFlow Manager commits the TOML intake file to the fork and pushes it to the user's GitHub account. Finally, it retrieves template metadata from template_description.json and uses the metadata to compose a pull request on the tpl-intake branch. This pull request provides a venue for discussion and vetting of the proposed addition of a new template.
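The first intake steps (checking that the template is new and recording where its resources were uploaded) might look like the sketch below. The TOML table and key names ([template], id, [osf], project) and the function compose_intake are hypothetical and do not reproduce TemplateFlow's actual intake schema; identifiers are assumed to be plain alphanumeric strings, so no TOML escaping is performed.

```python
def compose_intake(template_id, osf_project, existing_templates):
    """Return TOML text for a hypothetical intake file, refusing duplicates.

    The table and key names below are illustrative only; identifiers are
    assumed to be plain alphanumeric strings, so no TOML escaping is done.
    """
    # Step 1: the proposed template must not already be in the archive.
    if template_id in existing_templates:
        raise ValueError(f"template {template_id!r} already exists in the archive")
    # Step 2: record where the resources were synchronized on OSF.
    return (
        "[template]\n"
        f'id = "{template_id}"\n'
        "\n"
        "[osf]\n"
        f'project = "{osf_project}"\n'
    )
```

On Python 3.11 or later, tomllib.loads can be used to confirm that the generated text parses as TOML.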

Box 2. Integration of TemplateFlow in processing workflows
TemplateFlow maximizes the accessibility and reuse potential of templates and atlases. For example, let's reuse the base configuration file for FSL FEAT we proposed in our paper (Esteban et al., 2019). The design file design.fsf specifies a simple preprocessing workflow with FSL tools. The simplified code listing below shows that, just to make non-default templates available to FSL using the graphical user interface (GUI), at least five steps are necessary:

    # 1. User determines two non-default templates they want to spatially normalize into
    # 2. User manually downloads templates, extracts the required files from packages
    $ curl -sSL <url> | tar zxv --no-same-owner -C /data/templates/
    # 3. User opens FSL's GUI, edits the target template box content pointing to the appropriate files
    # 4. User generates FSL configuration files to permit batch execution on the command line
    # 5. For the default and the two non-default templates, execute FSL's feat:
    $ feat design_<template>.fsf

The outputs of each feat design_<template>.fsf call will follow the pre-specified patterns of FSL, with whatever customization the user has introduced into the design file. The user, therefore, must then adapt the downstream analysis tools to correctly interpret the derived dataset in each standard space, or reformat the output dataset according to the expectations of the analysis tools. The user is also responsible for all aspects of provenance tracking and for adequately reporting them in their communications. Information such as the version of the template (or download date), citations to relevant papers, and other metadata (e.g., RRIDs) must be accounted for manually throughout the research process.
In contrast, tools using TemplateFlow dramatically simplify the whole process (note that MNI152NLin2009cAsym and OASIS30Ants are the two templates not found within the FSL distribution, and MNI152NLin6Asym denotes FSL's MNI space, i.e., the default FSL template):

    $ fmriprep /data /derivatives participant --output-spaces MNI152NLin2009cAsym MNI152NLin6Asym OASIS30Ants

fMRIPrep generates the results with BIDS-Derivatives organization for the three templates. The tool also leverages TemplateFlow to generate boilerplate citation text that includes the full names, versions, and references needed to credit the authors of each template involved. fMRIPrep internally stages one spatial normalization workflow for each of the output spaces. Each of these normalization sub-workflows uses a simple line of Python code to retrieve the necessary resources through the TemplateFlow Client interface (Figure 5):

    >>> from templateflow.api import get
    >>> tpl_ref_file = get("MNI152NLin6Asym", desc=None, resolution=1, suffix="T1w", extension="nii.gz")

One detail overlooked in the FSL example is that a robust spatial normalization process generally uses a precise binary mask of the brain. While FSL would require the user to set this mask up manually in the GUI, with TemplateFlow it requires only a second minimal call:

    >>> msk_ref_file = get("MNI152NLin6Asym", desc="brain", resolution=1, suffix="mask", extension="nii.gz")

To contribute a new template to TemplateFlow, members of the community first organize template resources to conform to the BIDS-like TemplateFlow structure. Next, tfmgr (the TemplateFlow Manager, see Table S2) synchronizes the resources to OSF cloud storage and opens a new pull request proposing the addition of the new template. A subsequent peer-review process ensures that all data are conformant with the TemplateFlow standard. Finally, TemplateFlow curators merge the pull request, thereby adding the template into the archive.
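The per-space retrieval that fMRIPrep stages in Box 2 can be sketched with the same two get calls. Here the getter is passed in as an argument so the sketch runs without network access; in practice one would pass templateflow.api.get. The function name normalization_targets is hypothetical, and whether each template provides a resolution-1 T1w image and a brain mask should be checked against the archive.

```python
def normalization_targets(spaces, get, resolution=1):
    """Fetch a T1w reference and a brain mask for every output space.

    `get` is assumed to behave like templateflow.api.get: template name first,
    BIDS-like entities as keyword arguments.
    """
    targets = {}
    for space in spaces:
        targets[space] = {
            # Reference image: T1w template without a desc entity.
            "reference": get(space, desc=None, resolution=resolution,
                             suffix="T1w", extension="nii.gz"),
            # Binary brain mask, generally used for robust spatial normalization.
            "mask": get(space, desc="brain", resolution=resolution,
                        suffix="mask", extension="nii.gz"),
        }
    return targets
```

Note that templateflow.api.get may return a list of paths when a query matches more than one file, so production code should check the type of each result before using it.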
Our text mining results are consistent with a previous investigation by Carp (2012b) in the domain of functional MRI, which similarly illustrated a strong coupling between software library and authors' choices of templates and atlases. In this prior work, Carp (2012b) analysed 241 functional MRI studies, 90.9% of which reported normalizing brain images to a common template. Of those, 79.0% indicated the target space used for spatial normalization. Few studies reported critical parameters such as image modality, and only 50 out of 241 specified the template image, out of which 26.0% used "the MNI152 template", and 26.0% the "SPM library's echo-planar imaging template". Unfortunately, template selection is seen as a default parameter of the software library, which leads to the assumption that the target normalization space is implicitly reported by identifying the software tools of choice. The risks and limitations of this reporting scheme are further underscored by a recent comparison of analytic outcomes across software libraries. In this study, Bowring et al. (2019) implemented analogous image processing pipelines using tools from each of three software suites (AFNI, FSL and SPM) in order to identify challenges to reproducing published studies with openly shared raw data. When discussing the differences among software pipelines, they noted that, "while all packages are purportedly using the same MNI atlas space, an appreciable amount of activation detected by AFNI and FSL fell outside of SPM's analysis mask." Furthermore, the coupling between templates and software likely limits the use of templates other than the software's defaults. Custom templates (i.e., those not included as a default option for the software tool) range from population-specific templates to ad hoc templates created by averaging images of the study at hand.
In some settings, the use of default templates risks introducing "template effects" that confound the interpretation of results (such as those introduced when an adult template is used in pediatric imaging studies; Yoon et al., 2009). As the target population moves away from the population used to create a default template (generally, a neurotypical adult population), "template effects" become more concerning and custom templates more necessary. The problem is exacerbated in the case of nonhuman imaging, as the scarcity (or absence) of specific templates available within software packages hinders already challenging translational endeavors. Further, the consistency across templates and atlases is reportedly low (Bohland et al., 2009), and although there has not been any programmatic comparison to understand the extent to which this inconsistency alters the spatial interpretation of results, it is reasonable to expect that templates and atlases introduce a decision point and are therefore sources of some analytical variability.

Figure S1. The FSL and SPM software tools are associated with dominant topics of sentences including the term "MNI" across the literature. We performed topic modeling with latent Dirichlet allocation (LDA; Blei et al., 2003) on text sentences extracted from 6,048 articles that contained the word "MNI". For each topic identified, the 20 words with the highest loadings on that topic are displayed in a word cloud, with larger font size indicating higher loading of the word on the corresponding topic. Word clouds are sorted by descending topic dominance; ranking and relative dominance are shown above each topic's cloud. Two of the most dominant topics, #3 and #5, are associated with SPM and FSL, respectively.
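The first step of the analysis described in Figure S1, selecting sentences that mention "MNI", can be sketched as follows. This is an illustrative preprocessing helper, not the authors' pipeline: it uses a naive regular-expression sentence splitter (which will, for instance, split after "et al.") and matches "MNI" at the start of a word so that "MNI152" is kept while "OMNI" is not. The resulting sentences could then be vectorized and passed to an LDA implementation such as scikit-learn's LatentDirichletAllocation.

```python
import re

# Naive sentence boundary: whitespace that follows ., ! or ?.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_mentioning(text, term="MNI"):
    """Return the sentences of `text` that contain `term` at the start of a word."""
    pattern = re.compile(r"\b" + re.escape(term))
    sentences = _SENTENCE_BOUNDARY.split(text.strip())
    return [s for s in sentences if pattern.search(s)]
```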

Table S1.
| Data entity | API query example | Description |
|---|---|---|
| Template | "MNI152Lin" | The template dataset to which an image or other data file belongs. |
| Resolution | resolution=1 | The image resolution. Each resolution is assigned a key, which is defined in the res field of template_description.json. |
| Mask | desc="brain", suffix="mask" | Indicates that the image is a binary-valued annotation, where voxels labelled 1 are part of the mask. |
| Discrete segmentation | desc="malf", suffix="dseg" | Indicates that the image is an integer-valued annotation. Each segmentation image file (.nii.gz format) is paired with a dictionary of segment names (.tsv format). |
| Probabilistic segmentation | label="CSF", suffix="probseg" | Indicates that the image is a probabilistic annotation, wherein the value of each voxel indicates the probability of that voxel belonging to the specified label. |
| Atlas | atlas="Schaefer", desc="7Network" | The atlas to which a segmentation file belongs. |
| Transformation | from="MNI152Lin", suffix="xfm" | File containing a mapping between two stereotaxic coordinate spaces. The source space is defined in the from field, while the target space is defined in the tpl field. |
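Table S1's query arguments mirror the file-name entities, so a file name can be composed from them. One practical detail: from is a reserved word in Python, so the transformation query cannot be written literally as from="MNI152Lin" in a function call; unpacking a dictionary, e.g. get("MNI152NLin2009cAsym", suffix="xfm", **{"from": "MNI152Lin"}), works with any function that accepts arbitrary keyword arguments. The sketch below takes entities as a dictionary for the same reason; the entity order it uses and the name build_filename are assumptions of this sketch, and the BIDS specification defines the normative order.

```python
# Assumed entity order for this sketch; see the BIDS specification for the
# normative ordering of entities in file names.
ENTITY_ORDER = ("tpl", "from", "cohort", "res", "atlas", "label", "desc")


def build_filename(entities, suffix, extension="nii.gz"):
    """Compose a BIDS-like name: ordered key-value pairs, then suffix and extension.

    Entities set to None (e.g. desc=None) are omitted from the name.
    """
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"unsupported entities: {sorted(unknown)}")
    pairs = [f"{key}-{entities[key]}" for key in ENTITY_ORDER
             if entities.get(key) is not None]
    return "_".join(pairs + [suffix]) + "." + extension.lstrip(".")


print(build_filename({"tpl": "MNI152NLin2009cAsym", "from": "MNI152Lin"},
                     "xfm", extension="h5"))
```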

Table S2.
| Argument | Environment variable | Specifications |
|---|---|---|
| template_id | | Identifier of the template. This is the value of the tpl field in all file names. |
| --osf-project | OSF_PROJECT | The OSF project where the template data are to be stored. The project must be writable by the user account whose credentials are specified in the --osf-user and --osf-password arguments. |
| --osf-user | OSF_USERNAME | Account username or identifier for OSF cloud storage. |
| --osf-overwrite | | Flag indicating that the OSF client should force the overwrite of any existing files in the OSF project whose names conflict with those of new files. |
| --gh-user | GITHUB_USER | Account username for GitHub. The user account whose credentials are provided must have a fork of the TemplateFlow repo. |
| --path | OSF_PROJECT | Path to a local directory where template resources are located. The path must either be a directory whose name is tpl-<template_id> or contain such a directory. |
| --nprocs | | Maximum number of parallel processes to run when uploading to or fetching from OSF. |

Table S3. TemplateFlow and NITRC are complementary resources with very different scopes, goals and implementations. While TemplateFlow is specifically designed for sharing template and atlas resources, the NeuroImaging Tools & Resources Collaboratory (NITRC) "is an award-winning free web-based resource that offers comprehensive information on an ever expanding scope of neuroinformatics software and data" (https://www.nitrc.org/include/about_us.php). Indeed, NITRC provides a fundamental service to the neuroimaging community with an open-ended, general-purpose scope, and it has no comparable alternative. By contrast, TemplateFlow is a narrowly scoped tool that resolves a very specific set of issues around templates and atlases, their sharing and reuse, and their reporting. Reflecting such a hierarchy of resources and acknowledging the relevance of NITRC, TemplateFlow was registered therein (https://www.nitrc.org/projects/templateflow).
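The options in Table S2 map naturally onto an argparse parser in which flags paired with an environment variable fall back to that variable. This is a hypothetical re-creation for illustration, not the tfmgr source: the fallback behaviour, the defaults for --path and --nprocs, and the omission of an environment fallback for --path are all assumptions of this sketch.

```python
import argparse
import os


def build_add_parser(environ=None):
    """Hypothetical parser for the `tfmgr add` options listed in Table S2."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="tfmgr add (sketch)")
    parser.add_argument("template_id", help="value of the tpl field in all file names")
    # Assumed: each flag falls back to the environment variable shown in Table S2.
    parser.add_argument("--osf-project", default=env.get("OSF_PROJECT"),
                        help="OSF project where template data are stored")
    parser.add_argument("--osf-user", default=env.get("OSF_USERNAME"),
                        help="OSF account username or identifier")
    parser.add_argument("--osf-overwrite", action="store_true",
                        help="overwrite conflicting files in the OSF project")
    parser.add_argument("--gh-user", default=env.get("GITHUB_USER"),
                        help="GitHub account that holds a fork of the TemplateFlow repo")
    # Assumed defaults: current directory and a single process.
    parser.add_argument("--path", default=".",
                        help="directory named tpl-<template_id>, or one containing it")
    parser.add_argument("--nprocs", type=int, default=1,
                        help="maximum number of parallel OSF transfers")
    return parser
```

Reading the environment at parser-construction time keeps credentials out of shell history while still letting an explicit flag override them.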