Magnetic resonance imaging datasets with anatomical fiducials for quality control and registration

Tools available for reproducible, quantitative assessment of brain correspondence have been limited. We previously validated the anatomical fiducial (AFID) placement protocol for point-based assessment of image registration with millimetric (mm) accuracy. In this data descriptor, we release curated AFID placements for some of the most commonly used structural magnetic resonance imaging templates and datasets. The release of our accurate placements allows for rapid quality control of image registration, teaching neuroanatomy, and clinical applications such as disease diagnosis and surgical targeting. We release placements on individual subjects from four datasets (n = 132 subjects for a total of 15,232 fiducials) and more than 10 brain templates (4,288 fiducials), compiling over 300 human rater hours of annotation. We also validate human rater accuracy of released placements to be within 1-2 mm (using a total of 50,336 Euclidean distances), consistent with prior studies. Our data is compliant with the Brain Imaging Data Structure (BIDS) allowing for facile incorporation into modern neuroimaging analysis pipelines. Data is accessible on GitHub (https://github.com/afids/afids-data).


Open resources available for reproducible, quantitative assessment of brain correspondence have been limited [1]. The most common metrics employed for examining the quality of image registration, including the Jaccard similarity and Dice kappa coefficients, compute the voxel overlap between regions of interest (ROIs). These metrics have been shown to be insufficiently sensitive, whether used in isolation or in combination, for validating image registration strategies [1]. The ROIs used in voxel overlap are often larger subcortical structures that are readily visible on MRI scans (i.e., the thalamus, globus pallidus, and striatum), and thus lack the ability to detect subtle misregistration between images, which may be crucial to detecting erroneous significant differences and variability [1-5].

Inspired by classic stereotactic methods, our group created, curated, and validated a protocol for the placement of anatomical fiducials (AFIDs) on T1-weighted (T1w) structural magnetic resonance imaging (MRI) scans of the human brain [2]. The protocol involves the placement of 32 AFIDs found to have salient features that allow for accurate localization. The AFIDs are described using three-dimensional (x, y, and z) Cartesian coordinates, and thus correspondence between points can be computed using Euclidean distances across a variety of applications. After a brief tutorial, AFIDs have been shown to have high reproducibility even when placed by individuals with no prior knowledge of medical images, neuroanatomy, or neuroimaging software. This was shown in separate studies where placements were performed on publicly available templates and datasets [2] and a clinical neuroimaging dataset [3].

The AFIDs protocol provides a metric that is independent of the registration itself while offering sensitivity to registration errors at the scale of millimetres (mm). This margin is crucial in neuroimaging applications (including morphometric analysis and surgical neuromodulation), where a few millimetres may represent the difference between optimal and suboptimal therapy.
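As a minimal sketch of how such a point-based check might look, the snippet below maps subject fiducials into template space with a 4x4 affine and reports the per-fiducial Euclidean distance in millimetres. The function names, the dictionary layout, the labels, and the toy coordinates are illustrative assumptions and are not taken from the released data or tooling.

```python
import math

def apply_affine(affine, point):
    """Apply a 4x4 homogeneous affine (nested lists, row-major) to one (x, y, z) point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in affine[:3])

def fiducial_errors(subject_afids, template_afids, affine):
    """Per-fiducial Euclidean distance (mm) after mapping subject points to template space.

    Both inputs are dicts keyed by a fiducial label (hypothetical labels below).
    Only labels present in both dicts are compared.
    """
    return {
        label: math.dist(apply_affine(affine, subject_afids[label]), template_afids[label])
        for label in subject_afids
        if label in template_afids
    }

# Toy data: a 3 mm translation along x that leaves a 2 mm residual on each point.
subject = {"AC": (0.0, 0.0, 0.0), "PC": (0.0, -25.0, 0.0)}
template = {"AC": (5.0, 0.0, 0.0), "PC": (5.0, -25.0, 0.0)}
shift_x = [[1, 0, 0, 3.0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

errors = fiducial_errors(subject, template, shift_x)
mean_error = sum(errors.values()) / len(errors)
```

Because the distances are computed from landmark coordinates rather than from the image similarity that drove the registration, a summary such as `mean_error` stays independent of the registration's own cost function.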

The aim of this data descriptor is to provide the community with curated AFID placements and their associated MRI images. We release annotations on four datasets (n = 132; 15,232 fiducials) including healthy subjects and patients with neurological disorders, and more than 10 commonly used magnetic resonance imaging templates (4,288 fiducials), compiling more than 300 human rater hours of manual annotation of neuroanatomical structures. Descriptions of datasets and templates are provided in subsequent sections (see Table 1).

Current Applications:

Registration Assessment: We share our curated AFIDs annotations for a wide variety of datasets and templates of varying field strengths. This diversity of datasets will facilitate the testing and validation of image registration algorithms that can be used in many contexts. Users can select the datasets and templates that suit their neuroimaging application, then use the curated annotations to assess image registration quantitatively. For instance, AFIDs have been used to evaluate the process of iterative deformable template creation [6,7], showing that error metrics generated from AFIDs converged differently as a function of template iterations and registration method (i.e., linear vs non-linear). Sharing the AFID placements and their associated images in the Brain Imaging Data Structure (BIDS) format makes the data more convenient for end users and neuroimaging application developers [2,3,6,7].

Education: New raters can compare their AFID placements to the curated normative distribution of placements we release here. Our placements have been compiled over several years and can help raters assess their accuracy for specific fiducials and subject/template data. To improve user accessibility and navigation of our released AFIDs annotations and framework, we also release the AFIDs validator (https://validator.afids.io). This tool provides: 1) detailed documentation of the AFIDs placement protocol, 2) an interactive way for users to upload placements to a regulated database, and 3) interactive ways to view uploaded placements relative to curated placements, which helps guide users in improving their neuroanatomical understanding and placement accuracy [2,3].

Brain structure and volumetric analyses: Comparing the 32 AFIDs (and associated images) in our pathologic dataset against controls can offer insight into brain morphology and putative biomarkers of neurodegenerative diseases [3].

Prospective Applications:

Registration optimization and quality control: The released imaging and AFID placement data may be useful in a few ways for improving neuroimaging pipelines: 1) providing centralized and quality-controlled neuroimaging data (from more than 5 international neuroimaging datasets), allowing for a more accurate and generalizable head-to-head comparison amongst existing software for image registration, and 2) establishing a new registration metric that can be incorporated into neuroimaging software development workflows to optimize registration algorithm performance and to support quality control.

Automatic and accurate landmark placement: Our curated AFIDs can be used as ground truth placements when training machine-learning algorithms to automate brain landmark localization. Among the 32 AFID placements we release are the anterior and posterior commissures (AC and PC, respectively). Downstream applications of automatic localization include automatically computing AC-PC transforms (a common process in neuroimaging studies) and aspects of neurosurgical planning that involve the placement of these anatomical landmarks. The diversity of the released data (both hardware and disease status) will be crucial to the generalizability of such tools.
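To make the AC-PC transform concrete, here is one possible sketch that builds a rigid 4x4 matrix from AC, PC, and one additional midline point. Several choices are assumptions made for illustration only: the origin is placed at AC, the y axis runs from PC to AC, the input coordinates are taken to be roughly RAS-oriented so the x axis can be kept pointing toward +x, and all function names are hypothetical. Other conventions (for example, an origin at the mid-commissural point) are equally common.

```python
import math

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(c / norm for c in v)

def acpc_transform(ac, pc, midline):
    """Rigid 4x4 matrix placing AC at the origin and PC on the negative y axis.

    `midline` is any third point on the midsagittal plane that is not
    collinear with AC and PC (a collinear point gives a zero-length normal).
    """
    y_axis = unit(sub(ac, pc))                      # posterior -> anterior
    x_axis = unit(cross(y_axis, sub(midline, ac)))  # normal to the midsagittal plane
    if x_axis[0] < 0:                               # assumption: input is roughly RAS
        x_axis = tuple(-c for c in x_axis)
    z_axis = cross(x_axis, y_axis)                  # completes a right-handed frame
    matrix = [list(axis) + [-dot(axis, ac)] for axis in (x_axis, y_axis, z_axis)]
    matrix.append([0.0, 0.0, 0.0, 1.0])
    return matrix

def transform_point(matrix, point):
    return tuple(dot(row[:3], point) + row[3] for row in matrix[:3])

# Toy landmarks (mm): a head pitched about the x axis, with an inferior midline point.
ac, pc, midline = (1.0, 2.0, 3.0), (1.0, -18.0, -12.0), (1.0, 2.0, -7.0)
to_acpc = acpc_transform(ac, pc, midline)
pc_aligned = transform_point(to_acpc, pc)
```

In this toy case the AC-PC distance is 25 mm, so `pc_aligned` lies 25 mm behind the origin on the y axis, and any midline point maps to x = 0.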

Surgical targeting: We release ultra-high field (7-Tesla; 7-T) MRI data where small structures like the subthalamic nucleus (STN) [8] and zona incerta within the posterior subthalamic area are clearly visible [7]. Ground-truth locations of surgical targets (x, y, and z) can be related to the AFIDs placement locations via predictive models. This approach mitigates the lack of access to best-case neuroimaging in clinical settings, whether due to lack of high-field MRI or to motion degradation.

Brain anatomy abstraction and anonymization: AFIDs and the distances between them represent an abstraction of brain anatomy in an anonymized way while still allowing for accurate pooling of data. Other major anatomical landmarks (representing lesions, tumors, or other structures) can be described in reference to the AFIDs "coordinate system" we establish using these curated placements.

Rationale for fiducial selection and placement assessments
The current version of the AFIDs protocol involves the placement of 32 anatomical fiducials. These AFIDs were selected to be easily identified on T1w MRI scans across varying field strengths (1.5-T, 3-T, 7-T) and were validated in previous studies [2,3]. During the selection process, regions that were prone to geometric inhomogeneity and distortion were avoided to enhance the accuracy of fiducial placement across applications of the AFIDs protocol [2]. Ten fiducials fall on the midline and 11 are placed laterally in each hemisphere (10 + 2 × 11 = 32). The AFIDs protocol includes fiducials representing salient neuroanatomical features mostly located in the subcortex.

Additional proposed fiducials could be included in future versions of the AFIDs protocol, but would require undergoing a similarly rigorous validation process [2,3].

Fiducial localization error (FLE) is a term described by Fitzpatrick and colleagues [9] that represents the distance of a fiducial position from its intended location. This term is used when operating image-guidance systems during neurosurgical procedures. In the context of the AFIDs protocol, and inspired by this extant terminology, we have defined the term anatomical fiducial localization error (AFLE). This value, in millimetres, can be thought of as the error arising from the placement (i.e., localization) of each of the 32 fiducials. When used to communicate the accuracy of all 32 AFIDs together, we term it global AFLE. There are three contexts for applying AFLEs:
1) Mean AFLE: rater localization error relative to the intended location, defined as the mean placement of all raters for a specific fiducial (termed ground truth AFID in subsequent sections).
2) Inter-rater AFLE: rater localization error calculated as the pairwise distances between different rater placements. If a single rater applied the AFIDs protocol more than once, then their mean placement coordinates were used for the pairwise distance calculations.
3) Intra-rater AFLE: rater localization error evaluating the precision of multiple placements by a single rater, computed as the average pairwise distance between the same rater's placements.
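The three AFLE contexts can be illustrated for a single fiducial on a single scan with the sketch below. The data layout (a dict mapping each rater to a list of repeated placements), the function names, and the choice to pool all placements when forming the ground truth are assumptions made for this example and may differ from how the released values were computed.

```python
import math
from itertools import combinations
from statistics import mean

def centroid(points):
    """Coordinate-wise mean of a list of (x, y, z) tuples."""
    return tuple(mean(axis) for axis in zip(*points))

def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs (needs >= 2 points)."""
    return mean(math.dist(a, b) for a, b in combinations(points, 2))

def afle_summary(placements):
    """placements: dict rater -> list of repeated (x, y, z) placements
    of ONE fiducial on ONE scan."""
    pooled = [p for reps in placements.values() for p in reps]
    ground_truth = centroid(pooled)  # assumption: mean of all pooled placements
    # Raters with repeated placements contribute their mean coordinates.
    rater_means = [centroid(reps) for reps in placements.values()]
    return {
        "mean_afle": mean(math.dist(p, ground_truth) for p in pooled),
        "inter_rater_afle": mean_pairwise_distance(rater_means),
        "intra_rater_afle": {
            rater: mean_pairwise_distance(reps)
            for rater, reps in placements.items()
            if len(reps) > 1  # undefined for a single placement
        },
    }

# Toy data (mm): rater_a placed the fiducial twice, rater_b once.
toy = {
    "rater_a": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "rater_b": [(1.0, 2.0, 0.0)],
}
summary = afle_summary(toy)
```

Here rater_a's mean placement is (1, 0, 0), so the inter-rater AFLE is the 2 mm distance to rater_b's placement, while the intra-rater AFLE is defined only for rater_a.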

We also adopt the term fiducial registration error (FRE) in the context of the AFIDs protocol and term it the anatomical fiducial registration error (AFRE).

AFIDs protocol application
Before raters performed the AFIDs protocol, they attended a 3DSlicer workshop and applied the AFIDs protocol to a publicly available template as a form of training. For manual rater placements, the AFIDs protocol (https://afids.github.io/afids-protocol/) generally began with the placement of the anterior commissure (AC) and posterior commissure (PC) points (AFID01 and AFID02, respectively), which are defined to be at the center of each commissure. This was followed by the identification of one or two more midline points (often the pontomesencephalic junction, AFID04, and the genu of the corpus callosum, AFID19). After that, an AC-PC transform was performed, and the rest of the anatomical fiducials were placed. Rater placements deviating from a ground truth fiducial by more than 10 mm were considered outliers and removed, as these errors are likely due to mislabelling and do not reflect true localization accuracy. In addition to subsequent sections, Table 1 provides brief descriptions of the released datasets and templates, information about raters, and AFIDs applications.
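The 10 mm outlier rule described above could be expressed as in the following sketch. The threshold comes from the text; the function names, and the use of the mean of all placements as the reference point, are assumptions for illustration.

```python
import math
from statistics import mean

OUTLIER_THRESHOLD_MM = 10.0  # threshold stated in the text

def centroid(points):
    """Coordinate-wise mean of a list of (x, y, z) tuples."""
    return tuple(mean(axis) for axis in zip(*points))

def split_outliers(placements, threshold=OUTLIER_THRESHOLD_MM):
    """Separate rater placements of one fiducial into kept points and outliers.

    Distances are measured to the mean of all placements, which is an
    assumption about the reference used by the filter.
    """
    reference = centroid(placements)
    kept = [p for p in placements if math.dist(p, reference) <= threshold]
    outliers = [p for p in placements if math.dist(p, reference) > threshold]
    return kept, outliers

# Toy data (mm): four consistent placements and one likely mislabelled point.
toy = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (40.0, 0.0, 0.0),
]
kept, outliers = split_outliers(toy)
```

One caveat of this sketch is that a gross outlier pulls the mean toward itself, so with very few raters a robust reference (such as the coordinate-wise median) may behave better.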

AFIDs-HCP30 dataset
Subject demographics and imaging protocol
This subset consists of 30 unrelated healthy subjects (age: 21-52; 15 female and 15 male) chosen from the Human Connectome Project (HCP) dataset. All scans were T1-weighted MR volumes with 1 mm voxels acquired on a 3-T scanner [11].
Rater demographics and AFID placements
A total of 5 expert raters applied the AFIDs protocol. All raters had applied the AFIDs protocol before and had more than a year of neuroimaging, anatomy, and 3DSlicer experience. Three raters were previously heavily involved in validation studies [2,3] and were assigned 10 random scans such that a total of one application of the AFIDs

AFIDs-OASIS30 dataset
Subject demographics and imaging protocol
This subset consists of 30 subjects (age: 58.0 ± 17.9 years; range: 25-91; 17 female and 23 male) selected from the publicly available Open Access Series of Imaging Studies (OASIS-1) database [12] and imaged at 3-T. The subjects were cognitively intact (Mini-Mental State Examination = 30), and the MRI scans were specifically chosen by the senior author to be challenging (areas with more complex anatomy and asymmetries). More details on the selected subjects can be found in a previous study [2]. It is important to note that this subset of the OASIS-1 dataset is different from other currently existing subsets (for instance, the one used in the Mindboggle project [13]).

Rater demographics and AFID placements
Eight novice raters (11.5 ± 11.2 months of imaging experience, 14.2 ± 17.0 months of neuroanatomy experience, and 7.0 ± 8.8 months of 3DSlicer experience) and 1 expert rater (neurosurgical resident with 10 years of experience in neuroanatomy) applied the AFIDs protocol via 3DSlicer 4.8.

Healthineers, Erlangen, Germany). An 8-channel parallel transmit/32-receive channel coil was used. The ethics approval, detailed imaging protocol, and pre-processing steps were documented in a previous study [7].
Rater demographics and AFID placements
There were 3 expert and 6 novice raters recruited to apply the AFIDs protocol on the SNSX-32 dataset using 3DSlicer 4.8.1. No rater demographic data were collected; however, the 3 expert raters had more than 12 months of exposure to medical imaging, neuroanatomy, and 3DSlicer and had applied the AFIDs protocol in our previous validation study [2]. The 6 novice raters had prior exposure to medical imaging, neuroanatomy, and 3DSlicer but had never applied the AFIDs protocol before training. The raters were split into 3 equal groups with one expert rater placed in each. Each group was randomly assigned a subset of the 32 subjects (two out of three rater groups had 11 subjects to annotate). Each rater within a group applied the AFIDs protocol to all subjects allocated to their group. Thus, the AFIDs protocol was performed a total of 3 times on all 32 subject scans (3,072 fiducials), with each rater annotating either 10 or 11 different subjects once depending on their group assignment. Dataset can be found on:

The Agile12v2016 is an ultra-high field template created locally at our institution (CFMM). It consists of 12 healthy control subjects (6 female; age: 27.6 ± 4.4 years). Scans were acquired on a 7-T scanner (Agilent, Santa Clara, California, USA/Siemens, Erlangen, Germany) via a 24-channel transmit-receive head coil array [15,16].
The Colin27 is a template created from one subject scanned 27 times on a Philips 1.5-T MR unit [17].
Rater demographics and AFID placements
The same 8 novice raters that annotated the AFIDs-OASIS30 subset also annotated each of the templates mentioned above 4 times. Each rater performed the AFIDs protocol a total of 12 times, for a total of 96 protocols (96 × 32 = 3,072 fiducials). Since raters annotated the same template more than once, an intra-rater metric was calculated for these three templates (in contrast to the datasets). Annotations were performed via 3DSlicer 4.8.1.

BigBrainSym & MNI2009Sym & PD-25 templates
Template details and imaging protocol
BigBrain is an ultra-high resolution histological 3D model of the brain created using a large-scale microtome to cut a complete paraffin-embedded brain (65-year-old male) coronally at 20-µm thickness [18]. The BigBrainSym template refers to the BigBrain registered to MNI2009bSym space, defined in previous studies [2,6]. The MNI2009Sym is a symmetric version of the MNI2009Asym [14].
The PD-25 template is a multi-contrast MNI template of a PD cohort with 3-T field strength [19]. We used the PD25-T1MPRAGE for the AFIDs placements.
Rater demographics and AFID placements
A total of 2 expert raters (more than one year of experience in neuroimaging, anatomy, and 3DSlicer, and involved in prior validation studies [1,2]) were involved in placements. Each rater annotated each of the three templates once (192 fiducials) via 3DSlicer 4.8.1.

TemplateFlow templates
Template details and imaging protocol
All adult human structural MRI templates that could be found on TemplateFlow at the time of manuscript preparation were annotated (n = 8) [20].
Rater demographics and AFID placements
Three novice raters (no prior neuroimaging, anatomy, or 3DSlicer experience) and 1 expert rater (lead author; more than 10 years of experience in medical imaging, anatomy, and 3DSlicer) annotated a total of 8 templates (see Table 1). Each rater annotated the 8 templates once (1,024 fiducials) via 3DSlicer 4.10.

All placements for a given scan and fiducial were averaged to obtain the ground truth fiducial placement per participant or template, as shown in Figure 2a. For datasets, ground truth fiducial placements were computed for each subject in a dataset, as shown in Figure 2b.
To compute the mean AFLE, Euclidean distances from the ground truth fiducial location to each of the individual rater placements were averaged for each fiducial. The result is termed the subject or template mean AFLE per fiducial. This process was independently repeated for all subjects. All subject mean AFLEs were averaged to obtain a dataset mean AFLE per fiducial, as shown in Figure 3a. Finally, the dataset mean AFLE per fiducial was averaged across all fiducials to produce the global dataset mean AFLE. In a similar fashion, global inter-rater AFLE was computed for one subject across fiducials and then averaged across all subjects to produce a global dataset inter-rater AFLE, shown in Figure 3b.
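The aggregation steps above (ground truth per subject, subject mean AFLE per fiducial, dataset mean AFLE per fiducial, then the global dataset mean AFLE) can be sketched as follows. The nested-dictionary layout, the subject and fiducial labels, and the function names are hypothetical and chosen only to make the order of averaging explicit.

```python
import math
from statistics import mean

def centroid(points):
    """Coordinate-wise mean of a list of (x, y, z) tuples."""
    return tuple(mean(axis) for axis in zip(*points))

def subject_mean_afle(rater_points):
    """Mean distance from each rater placement to the subject's ground truth."""
    ground_truth = centroid(rater_points)
    return mean(math.dist(p, ground_truth) for p in rater_points)

def dataset_mean_afle(dataset):
    """dataset: dict subject -> dict fiducial -> list of (x, y, z), one per rater.

    Returns (dataset mean AFLE per fiducial, global dataset mean AFLE).
    """
    fiducials = sorted({f for subject in dataset.values() for f in subject})
    per_fiducial = {
        f: mean(
            subject_mean_afle(subject[f])
            for subject in dataset.values()
            if f in subject
        )
        for f in fiducials
    }
    return per_fiducial, mean(list(per_fiducial.values()))

# Toy dataset (mm): two subjects, two fiducials, two raters each.
toy = {
    "sub-01": {
        "AFID01": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        "AFID02": [(0.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    },
    "sub-02": {
        "AFID01": [(0.0, 0.0, 0.0), (0.0, 0.0, 6.0)],
        "AFID02": [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)],
    },
}
per_fiducial, global_mean = dataset_mean_afle(toy)
```

In the toy data the subject mean AFLEs are 1 mm and 3 mm for AFID01 and 2 mm and 0 mm for AFID02, giving dataset means of 2 mm and 1 mm and a global dataset mean AFLE of 1.5 mm. The inter-rater variant would follow the same aggregation with pairwise distances in place of distances to the ground truth.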

In total, we release the curated AFID placements and associated imaging of 4 datasets and 14 openly available human brain templates (a total of 19,520 manually placed anatomical landmarks, representing more than 300 human rater annotation hours). When available, individual rater placements were released; otherwise, the raters' mean (ground-truth) placements were made available.

Usage Notes
We recommend loading the shared AFIDs annotation files (*.fcsv) in 3DSlicer alongside their associated images, all of which are in BIDS format for ease of navigation. The local neuroimaging datasets we release here (namely, LHSCPD and SNSX) will be quality controlled and expanded as more patients are recruited. Additionally, quality and version control of the AFIDs framework will be introduced as more collaborations and initiatives begin incorporating it into their workflows and releases. New templates and brain images can be added to future versions of the data descriptor once they have met the standards for validation set by prior related studies [2,3].

Competing interests
The authors report no potential competing interests with the work published.

Figure 2. Panels (a) and (b) show the process of computing the intended AFID placement on a neuroimaging template or dataset, respectively. It is the mean of the rater point cloud at each AFID, referred to as "ground-truth" in the text. When calculating for a template, the number of subjects is 1.
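The ground-truth placement, described above as the mean of the rater point cloud at each AFID, can also be computed outside 3DSlicer, since .fcsv files are plain comma-separated text. The sketch below assumes the standard 3DSlicer markups column order, ignores the "#" header lines (including the coordinate-system flag), and keys fiducials by the label column; the sample file contents and function names are made up for illustration and are not taken from the released data.

```python
import csv
from statistics import mean

# Column order of 3DSlicer markups fiducial files (assumed; header lines are skipped).
FCSV_COLUMNS = ["id", "x", "y", "z", "ow", "ox", "oy", "oz",
                "vis", "sel", "lock", "label", "desc", "associatedNodeID"]

def read_fcsv(text):
    """Parse the text of one .fcsv file into {label: (x, y, z)}."""
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(data_lines, fieldnames=FCSV_COLUMNS)
    return {
        row["label"]: (float(row["x"]), float(row["y"]), float(row["z"]))
        for row in reader
    }

def ground_truth(rater_files):
    """Average each fiducial across raters, using labels present in every file."""
    parsed = [read_fcsv(text) for text in rater_files]
    shared = set(parsed[0]).intersection(*parsed[1:])
    return {
        label: tuple(mean(p[label][axis] for p in parsed) for axis in range(3))
        for label in sorted(shared)
    }

# Made-up file contents for two raters (two fiducials each).
RATER_A = """# Markups fiducial file version = 4.10
# CoordinateSystem = 0
# columns = id,x,y,z,ow,ox,oy,oz,vis,sel,lock,label,desc,associatedNodeID
vtkMRMLMarkupsFiducialNode_1,0.0,2.0,4.0,0,0,0,1,1,1,0,1,AC,
vtkMRMLMarkupsFiducialNode_2,0.0,-24.0,2.0,0,0,0,1,1,1,0,2,PC,
"""
RATER_B = """# Markups fiducial file version = 4.10
# CoordinateSystem = 0
# columns = id,x,y,z,ow,ox,oy,oz,vis,sel,lock,label,desc,associatedNodeID
vtkMRMLMarkupsFiducialNode_1,2.0,2.0,4.0,0,0,0,1,1,1,0,1,AC,
vtkMRMLMarkupsFiducialNode_2,0.0,-26.0,2.0,0,0,0,1,1,1,0,2,PC,
"""
truth = ground_truth([RATER_A, RATER_B])
```

With real files, each string would come from reading a file on disk, for example with `pathlib.Path(path).read_text()`.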

Code Availability
Figure 3. In (a), Euclidean distances (shown in pink) represent the distance from each rater placement to the ground-truth placement (red). The mean AFLE was calculated by dividing the sum of all Euclidean distances across all subjects (shown by the sigma notation) by the total number of Euclidean distances in the dataset for each AFID. In (b), Euclidean distances (shown in pink) represent the pairwise distances between all rater placements on a scan. Inter-rater AFLE was calculated by dividing the sum of the pairwise distances (shown by the sigma notation) by the total number of rater pairwise distances across a dataset per AFID.