Adjusting for Variable Brain Coverage in Voxel-Based fMRI Meta-Analysis

Meta-analyses of fMRI studies are vital to establish consistent findings across the literature. However, fMRI data are susceptible to signal dropout (i.e. incomplete brain coverage), which varies across studies and brain regions. In other words, for some brain regions, only a variable subset of the studies included in an fMRI meta-analysis have data present. These missing data can mean activations in fMRI meta-analysis are underestimated (type II errors). Here we present SPM (MATLAB) code to run a novel method of adjusting random-effects models for meta-analytic averaging of a group of studies and mixed-effects models for comparison between two groups of studies. In two separate datasets, meta-analytic effect sizes and z-scores were larger in the adjusted, compared to the unadjusted analysis. Relevantly, these changes were in regions such as the ventromedial prefrontal cortex where coverage was lowest. Limitations of the method, including issues of how to threshold the adjusted maps are discussed. Code and demonstration data for the adjusted method are available at https://doi.org/10.25377/sussex.c.4223411.


Introduction
Echo-planar images for functional magnetic resonance imaging (fMRI) are susceptible to signal dropout (Ojemann et al., 1997) leaving gaps in activation maps. The level of coverage can vary widely between individuals, scanners, and scan protocols. This presents a problem of false negatives (type II errors -wrongly concluding no effect exists) for both individual studies and for map-based fMRI meta-analyses.
The problem of missing data from lack of coverage is limited to a subset of regions, including ventromedial prefrontal cortex (vmPFC) and temporal lobes, due to factors such as nearby air and bone. False negatives are therefore localised to these regions and not uniformly distributed throughout the brain. Research on topics and tasks which rely on these regions, for example value-based decision-making in vmPFC (Levy and Glimcher, 2012), will be disproportionately affected by issues of coverage. In addition, if meta-analyses test for convergence, incomplete brain coverages may lead to incorrect p-values, because the test assumes that "false foci" are uniformly distributed across the brain (Albajes-Eizagirre and . While techniques have been developed to maximise coverage (Weiskopf et al., 2007), they are not uniformly successful or universally applied. In this report, we discuss an approach to reducing these type II errors in map-based fMRI meta-analyses.
Using meta-analysis techniques on neuroimaging data is vital to establish consistent neural correlates across studies (Müller et al., 2018;Wager et al., 2007Wager et al., , 2009. Several tools are available. One technique for meta-analysis is Anisotropic Effect Size Signed Differential Mapping software (AES-SDM, Radua et al., 2014) which combines coordinate-based metaanalysis with unthresholded maps  to reduce assumptions of the spatial extent of activations. Dropout in where signal is present can mean activations in fMRI meta-analysis are missed or underestimated. This will be, at least in part, due to voxels where no effect size was measured, being attributed the same variance estimates as voxels where effect sizes were measured. Here we present code which runs a novel method of adjusting both random and mixed-effects models, for meta-analytic averaging across a single group or comparison between two groups of studies respectively. The code adjusts each type of variance (withinstudy & between-study) in the models used in AES-SDM, which are usually assigned to every voxel, so only voxels where data was recorded are included.

Data selection
This technique to account for variable coverage was developed as part of an fMRI meta-analysis on prosocial behaviour so a detailed description of selection methods is provided elsewhere (Cutler and Campbell-Meiklejohn, 2019). Briefly, meta-analyses were calculated for each of two groups of decision type, "altruistic" and "strategic", using randomeffects models and these groups were compared with a mixed-effects model. The altruistic group contained 18 maps and 3 coordinates sets (n = 21, 557 participants) while the strategic group had 10 maps and 5 coordinates sets (n = 15, 593 participants). Due to different control conditions across studies, the adjusted analysis was only run on studies which contrast altruistic (n = 12) or strategic (n = 12) with selfish decisions.
To establish the wider relevance of the method, we conducted a second meta-analysis using data which researchers have made available through NeuroVault (Gorgolewski et al., 2015). It is vital to stress that this is in no way a comprehensive meta-analysis of any tasks and it is unlikely that a genuine meta-analysis would group these maps together. These maps were simply used as their CC0 license enables sharing as a demonstration set with the code (available at https://doi.org/10.25377/sussex.c.4223411).
Searches on NeuroVault were conducted for "choice" and "deci*" (for decision, decide etc.). Maps were selected if they had data from any decision task in the scanner with a contrast to no decision or a decision which varied on a parameter, for example complexity. This crude selection technique resulted in 18 maps (see Table 1).
Coverage was investigated by binarising each map, after registration to a common template, based on whether there was signal in each voxel and summing these images to create coverage maps ( Figure 2). Both the dataset on prosocial decisions and the NeuroVault dataset on decisions showed decreased coverage around the periphery, particularly in vmPFC. Korn 2018 Heuristic and optimal policy computations in the human brain during sequential decisionmaking

Combined image and coordinate meta-analysis
Using maps in fMRI meta-analysis has a number of benefits including enhanced sensitivity and detection of consistent but subthreshold effects. When maps are unavailable, AES-SDM recreates estimated maps from coordinates and their effect sizes using an anisotropic kernel. If obtainable, peaks can be entered in both directions of the contrast.
Statistics other than T are transformed before all maps, including those recreated from coordinates, are aligned to a common template. The software then implements a permutation-based analysis. The recommendation is 50 permutations which creates 50 randomisations with the same number of foci as the map of interest. These preprocessing steps result in recreated NIfTI maps of effect sizes and within-study variance for each study. These maps are used in the both the original, unadjusted method by the software and the adjusted technique described here. Adjusted analysis uses custom scripts in SPM12 (Statistical Parametric Mapping, http://www.fil.ion. ucl.ac.uk/spm) which are available under an MIT license from https://doi.org/10.25377/sussex.c.4223411 and github.com/jocutler/adjusting-dropout-fMRI-meta.

Unadjusted model
One widespread use of meta-analysis is to calculate mean effect sizes across studies. A common method, and the method used in AES-SDM, is a random-effects model. In the model, AES-SDM weights each study by the inverse of the total (within-study and betweenstudy) variance. The between-study variance, τ 2 , is obtained by the DerSimonian-Laird estimator (Dersimonian and Laird, 1986) as: Where wi is a weighting calculated as the inverse of the ith study's within-study variance, k is the number of studies and Qw is calculated as: Where yi is the ith study effect size estimate, wi is a weighting calculated as the inverse of the ith study's within-study variance and ȳw is the weighted estimate of the overall effect size calculated as: In simplified terms, used in the code: With Q calculated as above, degrees of freedom (DoF) the number of studies -1 and: Where FE weightings are the inverse of the studies' within-study variance. These are referred to as fixed-effects (FE) weightings as they are the ones used for a fixed-effects model, which just takes into account within-study (not between-study) variance.
In practice, these equations demonstrate that the between-study variance (τ 2 ) depends on the sum of the weightings (wi) which are the inverse of the within-study variance.

Adjusting random-effects models
In fMRI meta-analysis, the within-study variance is a single number which is applied as the variance across every voxel in the brain within the mask of interest. The effect size for that voxel is the transformed effect size created during preprocessing. However, if signal dropout has occurred, the effect size is zero. This means that voxels with no recorded signal are attributed variance but no effect size. When the FE weightings and effect sizes are each summed during calculation of the between-study variance, these voxels are contributing to the total variance without contributing an effect size. This could underestimate meta-analytic effect size due to inflated variance.
To account for this, calculations for the meta-analysis can be adjusted so only studies where data was recorded contribute weightings to the calculation of τ 2 . This can be done either at the single-voxel level with a spreadsheet (Figure 1) or across the whole brain, voxelby-voxel, by masking variance maps with their coverage. The DoF value is also adjusted to be the number of studies with data -1.
It is important to note that maps recreated from coordinates should not be adjusted unless the coverage is known. If the coverage is unknown, effect sizes with values of zero do not necessarily imply lack of signal and could meaningfully demonstrate the voxel is too far from any peaks to be attributed effect size. Of course, the maps which these coordinates were generated from could also suffer from signal dropout but this cannot be confirmed. This is another reason to obtain maps wherever possible.
If the coverage is known, for example if the paper states the cerebellum was not analysed, a coverage map reflecting this could be created and entered into the analysis as the mask for that study. This was not done for any of the studies in the current analyses.
Once τ 2 has been calculated as a single number for the between-study variance, it is added to the within-study variance for each study to provide total variance. The inverse of this total variance provides the random-effects (RE) weightings for each study, by which the effect-size estimates are multiplied.
The issue of variable coverage affects results again at this stage as the overall metaanalytic effect size (Hedges' g) is calculated by the sum of the weighted effect sizes divided by the RE weightings summed: The meta-analytic variance map, used to calculate standard error and z-scores, is calculated as the inverse of the RE weightings summed: Again, voxels with no effect-size estimate due to missing data will contribute zero to the effect sizes but increase the sum of the RE weightings in both of these calculations. A greater value of summed weightings leads to underestimation of g and overestimation of variance.
The same principle can be applied here as with the FE weightings, where studies are only included in the weightings sum if they have an effect size present.

Figure 1.
Demonstration of an adjusted random-effects model on a single voxel for five studies. Shaded rows (studies 1, 3 & 4) are those which are still taken into account in the adjusted analysis. "All" refers to the unadjusted analysis with all studies. Study 1 is from coordinates so this is included despite having an effect-size estimate of zero as this could be meaningful (see section 2.3.2 for details and exception to this rule). Studies 3 and 4 are maps with non-zero effect-size estimates in the voxel of interest, so are included. Studies 2 and 5 are maps with zero effect-size estimates, suggesting missing data due to signal dropout, so these are not included in the adjusted analysis. The fixed-effects section is purely used to calculate the between-study variance (τ 2 : tau-sq) with the calculations for this shown in the box "calculating betweenstudy variance". Although τ 2 increases from 0.20 to 0.29 in the adjusted analysis, likely linked to having less studies, the meta-analytic effect size (g) increases substantially from 0.36 to 0.61 as the sum of the RE weightings (which the sum of the weighted effect sizes is divided by) decreases from 19.20 to 8.59. An interactive spreadsheet in this format can be downloaded from https://doi.org/10.25377/sussex.c.4223411 to run this analysis on any voxel for any set of studies.

Unadjusted model
In addition to calculating the mean effect size for a group of studies, meta-analysis can calculate the difference between two groups using a mixed-effects model. The calculations and method of adjusting are similar to random-effects models, except the DoF equals the total number of studies across groups -2, the calculation of Q is: And the calculation of C is: Where Q and C for each group separately are calculated as above. Groups are referred to as 0 and 1 to match dummy coding in AES-SDM.
Calculating τ 2 follows the same process as above to produce a single number for the between-study variance across all the studies in both groups. This is added to the within-study variance and the inverse of this total variance provides the study's mixed-effects (ME) weighting.
The meta-analytic effect size (Hedges' g) is then calculated with the formula shown above for each group of studies separately -the sum of ME weighted effect sizes divided by the sum of ME weightings for that group. The meta-analytic variance is also calculated as above for each group of studies separately: the inverse of the ME weightings sum for that group.
The Hedges' g effect-size map for the difference between groups is the calculated by subtracting the two separate effect-size maps: The meta-analytic variance for the difference between groups is calculated by summing the two separate variance estimates:

Adjusted model
As with random-effects models, voxels with zero effect sizes due to dropout are attributed within-study variance and so increase the sums of FE and ME weightings. This is likely to underestimate average effect sizes. The same adjustments can avoid this problem in a mixed-effects model by excluding weightings of voxels where no effect size is present, unless the map was recreated from peaks with unknown coverage. This adjustment can again be done for a single voxel or across the whole brain. The DoF becomes the total number of studies, across both groups, which contribute weightings -2.

Z-maps and thresholding
Once the effect-size and variance maps have been adjusted, maps of standard error and z-scores can be produced. As the input came from permutations in AES-SDM, z-scores are "SDM-Z" because they do not follow a normal distribution.
Standard error (SE) is the square root of the meta-analytic variance, either for a single group or the difference between groups (shown): SDM-Z is the effect size (Hedges' g) divided by the standard error, either for a single group or the difference between groups (shown): Thresholding in AES-SDM uses a voxel-level threshold of p<0.005 which approximates p<0.05 corrected and balances specificity and sensitivity . However, in SDM-Z maps from the adjusted method, voxels have differing DoF meaning thresholding is not straightforward.
In the prosocial decisions meta-analysis (Cutler and Campbell-Meiklejohn, 2019), maps were thresholded with SDM-Z > 2.3. This was chosen as a common value for thresholding, close to the average of the critical SDM-Z values generated in the original, unadjusted analyses (with all studies) and AES-SDM analyses run with the 50% of maps with the best coverage. Here, we apply the same threshold to the NeuroVault data. We recognise this is not a perfect solution but it provides continuity and this analysis is not meaningful, regardless of thresholding method, other than for demonstrating the impact of the adjustment.

Results
In both random and mixed-effects analyses, in both datasets, the adjusted method increased effect sizes across the lower vmPFC where coverage was worst (Figure 2). Activations based on SDM-Z > 2.3 uncorrected were larger in the adjusted than the unadjusted analysis.

Prosocial decisions
In the prosocial decisions data, effect sizes and the size of SDM-Z > 3 activations increased in the adjusted analysis for altruistic vs. selfish (Figure 2b iii) and altruistic vs. strategic (Figure 2b iv). For altruistic > strategic in posterior vmPFC and strategic > altruistic in anterior vmPFC, some activations were shown only in the adjusted analysis. This dissociation fits with findings of a posterior to anterior vmPFC axis differentiating altruistic from strategic decisions (Cutler and Campbell-Meiklejohn, 2019).

Decision-making (NeuroVault data)
Results from the second dataset from NeuroVault on decision-making further support the use of the adjusted method to account for coverage. Again, it is vital to stress that this analysis purely provides a second application of the adjustments for signal dropout and results are not a comprehensive meta-analysis or a meaningful group.  in the adjusted analysis accounting for vmPFC signal dropout, compared to the unadjusted analysis with all studies. Decisions (data from NeuroVault; n = 18) (i) average (random-effects) and (ii) comparison (mixed-effects). Prosocial decisions (iii) altruistic average (n = 12; randomeffects) (iv) comparison between altruistic and strategic decisions (n = 24; mixed-effects).

Discussion
This paper presents a novel method of adjusting voxel-based fMRI meta-analysis technique AES-SDM (Radua et al., 2014) to account for variable brain coverage between studies. Increased effect sizes and larger regions with substantial z-scores were found in the adjusted, compared to the unadjusted analysis, using two datasets.
Our results suggest that regions that suffer from signal dropout, such as vmPFC, can show false negatives if coverage is not accounted for. The role of these regions may be underestimated or overlooked, preventing a complete understanding of their function. In meta-analyses, lack of coverage in some studies may obscure an activation despite high consistency in the studies with data in that region.
The uneven spatial distribution of signal dropout may also increase false positive results in the rest of the brain. Specifically, the test for convergence used in current coordinate-based meta-analyses assumes a uniform distribution of false positive study peaks (Albajes-Eizagirre and , but this assumptions is incorrect if some brain regions have no data in some studies. This additional problem should not happen in the upcoming version of SDM, which no longer conducts tests for convergence. To overcome these issues, our method adjusts random and mixed-effects metaanalysis models to only include variance, at each calculation stage, from voxels with studylevel effect-size data. This means the number of contributing studies ranges between one and the total number of studies. The voxel degrees of freedom will be the number of contributing studies -1 (random-effects model) or -2 (mixed-effects model). For some voxels with only a few studies contributing, single studies could dominate and prevent the benefits of metaanalysis from being realised. In comparisons between two groups, differences between the sizes of each group could be driven by coverage.
Perhaps the biggest challenge from varying degrees of freedom across voxels is thresholding the adjusted meta-analytic map. In the prosocial decisions meta-analysis, SDM-Z > 2.3 was applied to the adjusted and unadjusted analysis as a comparison (Cutler and Campbell-Meiklejohn, 2019). Although this is liberal for z-scores which follow a normal distribution, SDM-Z scores do not follow a normal distribution  2.3 was close to the threshold SDM-Z score generated in the unadjusted analysis with all studies and the half with the best coverage. Here, we also threshold the second meta-analysis with SDM-Z > 2.3 for continuity. This is not a perfect solution and this adjusted method is perhaps best used alongside unadjusted analyses thresholded in a more appropriate way. The usefulness of a simple SDM-Z threshold may be limited to visual comparisons between unadjusted and adjusted maps. Comparing adjusted and unadjusted effect-size maps (Hedges' g) can also reveal the impact of coverage deficits.
Evidence for the importance of adjusting calculations is presented using two datasets. One is a comprehensive meta-analysis of prosocial decisions. The second group is studies with data on NeuroVault for decision tasks. Decision-making was chosen as it activates vmPFC (Levy and Glimcher, 2012), a region prone to reduced coverage. This data was purely used as the availability enables sharing to demonstrate the code and results are not be meaningful in any other way. That the NeuroVault data did not show large vmPFC activations is likely due to variety in tasks, which would not be grouped together in a genuine meta-analysis.

Conclusion
Using meta-analysis techniques on fMRI data establishes consistent findings across samples, scanning sites and tasks, providing many benefits and overcoming issues with single studies. Including statistical maps enhances some of these benefits and methods like AES-SDM, which combine maps with coordinates, increase the chances of including a study. However, fMRI meta-analyses are prone to false negatives if lacking coverage leaves missing data in study-level statistical maps. Several key regions, including vmPFC and temporal areas suffer from signal dropout, meaning false negatives are unevenly distributed throughout the brain.
To account for and overcome these issues, we present a novel method of adjusting calculations for random and mixed-effects models for meta-analytic group averages and comparisons respectively. By adjusting models to only include variance for voxels with data present in the study's effect-size map, we demonstrate increased meta-analytic effect sizes in regions with the worst coverage. This highlights that failing to account for coverage underestimates effect sizes or may miss activations altogether.