
NeuroImage

Volume 19, Issue 3, July 2003, Pages 1240-1249

Technical note
Posterior probability maps and SPMs

https://doi.org/10.1016/S1053-8119(03)00144-7

Abstract

This technical note describes the construction of posterior probability maps that enable conditional or Bayesian inferences about regionally specific effects in neuroimaging. Posterior probability maps are images of the probability or confidence that an activation exceeds some specified threshold, given the data. Posterior probability maps (PPMs) represent a complementary alternative to statistical parametric maps (SPMs) that are used to make classical inferences. However, a key problem in Bayesian inference is the specification of appropriate priors. This problem can be finessed using empirical Bayes in which prior variances are estimated from the data, under some simple assumptions about their form. Empirical Bayes requires a hierarchical observation model, in which higher levels can be regarded as providing prior constraints on lower levels. In neuroimaging, observations of the same effect over voxels provide a natural, two-level hierarchy that enables an empirical Bayesian approach. In this note we present a brief motivation and the operational details of a simple empirical Bayesian method for computing posterior probability maps. We then compare Bayesian and classical inference through the equivalent PPMs and SPMs testing for the same effect in the same data.

Introduction

To date, inference in neuroimaging has been restricted largely to classical inferences based on statistical parametric maps (SPMs). The statistics that comprise these SPMs are essentially functions of the data (Friston et al., 1995). The probability distribution of the chosen statistic, under the null hypothesis (i.e., the null distribution), is used to compute a P value. This P value is the probability of obtaining the statistic, or the data, given that the null hypothesis is true. If sufficiently small, the null hypothesis can be rejected and an inference is made. The alternative approach is to use Bayesian or conditional inference based upon the posterior distribution of the activation given the data (Holmes and Ford, 1993). This necessitates the specification of priors (i.e., the probability distribution of the activation). Bayesian inference requires the posterior distribution and therefore rests on a posterior density analysis. A useful way to summarize this posterior density is to compute the probability that the activation exceeds some threshold. This computation represents a Bayesian inference about the effect, in relation to the specified threshold. In this technical note we describe an approach to computing posterior probability maps for activation effects, or more generally treatment effects, in imaging data sequences. A more thorough account of this approach can be found in Friston et al. (2002a, 2002b). We focus here on a specific procedure that has been incorporated into the SPM software. This approach is probably the simplest and most computationally expedient way of constructing posterior probability maps (PPMs).
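At a single voxel, the PPM value summarizing the posterior density is simply the Gaussian tail probability above the chosen threshold. The following is a minimal numerical sketch of that computation; the function name and example numbers are illustrative and not from the original note.

```python
import math

def posterior_prob_exceeds(eta, var, gamma):
    """Probability that an effect exceeds the threshold gamma, given a
    Gaussian posterior with mean eta and variance var."""
    z = (gamma - eta) / math.sqrt(var)
    # One minus the standard normal CDF, computed via the complementary
    # error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# e.g. posterior mean 2.0, posterior variance 0.25, threshold 1.0
p = posterior_prob_exceeds(2.0, 0.25, 1.0)   # about 0.977
```

Thresholding an image of such probabilities (e.g., displaying voxels with p above 0.95) yields a map that can be read in the same way as a thresholded SPM, but the quantity displayed is a posterior confidence rather than a classical statistic.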

The motivation for using conditional or Bayesian inference is that it has high face validity. This is because the inference is about an effect, or activation, being greater than some specified size that has some meaning in relation to underlying neurophysiology. This contrasts with classical inference, in which the inference is about the effect being significantly different from zero. The problem for classical inference is that trivial departures from the null hypothesis can be declared significant, with sufficient data or sensitivity. From the point of view of neuroimaging, posterior inference is especially useful because it eschews the multiple-comparison problem. In classical inference one tries to ensure that the probability of rejecting the null hypothesis incorrectly is maintained at a small rate, despite making inferences over large volumes of the brain. This induces a multiple-comparison problem that, for continuous spatially extended data, requires an adjustment or correction to the P values using Gaussian random field theory. This Gaussian field correction means that classical inference becomes less sensitive or powerful with large search volumes. In contradistinction, posterior inference does not have to contend with the multiple-comparison problem because there are no false positives. The probability that an activation has occurred, given the data, at any particular voxel is the same, irrespective of whether one has analyzed that voxel or the entire brain. For this reason, posterior inference using PPMs may represent a relatively more powerful approach than classical inference in neuroimaging. The reason that there is no need to adjust the P values is that we assume independent prior distributions for the activations over voxels. In this simple Bayesian model the Bayesian perspective is similar to that of the frequentist who makes inferences on a per-comparison basis (see Berry and Hochberg, 1999, for a detailed discussion).

PPMs require the posterior distribution or conditional distribution of the activation (a contrast of conditional parameter estimates) given the data. This posterior density can be computed, under Gaussian assumptions, using Bayes rule. Bayes rule requires the specification of a likelihood function and the prior density of the model’s parameters. The models used to form PPMs, and the likelihood functions, are exactly the same as in classical SPM analyses. The only additional information required is the prior probability distribution of the parameters of the general linear model employed. Although it would be possible to specify these in terms of their means and variances using independent data, or some plausible physiological constraints, there is an alternative to this fully Bayesian approach. The alternative is empirical Bayes, in which the variances of the prior distributions are estimated directly from the data. Empirical Bayes requires a hierarchical observation model where the parameters and hyperparameters at any particular level can be treated as priors on the level below. There are numerous examples of hierarchical observation models. For example, the distinction between fixed- and mixed-effects analyses of multisubject studies relies upon a two-level hierarchical model. However, in neuroimaging there is a natural hierarchical observation model that is common to all brain mapping experiments. This is the hierarchy induced by looking for the same effects at every voxel within the brain (or gray matter). The first level of the hierarchy corresponds to the experimental effects at any particular voxel and the second level of the hierarchy comprises the effects over voxels. Put simply, the variation in a particular contrast, over voxels, can be used as the prior variance of that contrast at any particular voxel.
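The last idea can be sketched numerically. In this hypothetical example, the prior variance of a contrast is taken as its between-voxel variance, corrected for within-voxel error variance, and the resulting empirical prior shrinks each voxel's estimate toward zero. All names and numbers are illustrative, not part of the SPM implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vox = 10_000
true_effects = rng.normal(0.0, 1.0, n_vox)        # hypothetical true effects
err_var = 0.5                                     # within-voxel error variance
c_hat = true_effects + rng.normal(0.0, np.sqrt(err_var), n_vox)

# Second level of the hierarchy: the spread of the contrast over voxels,
# corrected for error variance, serves as the prior variance at any voxel
prior_var = max(c_hat.var(ddof=1) - err_var, 1e-12)

# Posterior (shrunken) estimate at each voxel under the prior N(0, prior_var)
post_var = 1.0 / (1.0 / err_var + 1.0 / prior_var)
post_mean = post_var * (c_hat / err_var)
```

Because the posterior variance is always smaller than the error variance, every voxel's posterior mean lies between its ordinary estimate and zero: this is the shrinkage behavior referred to below.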

This technical note describes the computation of PPMs that is implemented in our software (SPM2, http://www.fil.ion.ucl.ac.uk/spm). The theoretical background, on which this approach is based, was presented in Friston et al 2002a, Friston et al 2002b and the reader is referred to these articles for a full description. The model used here is a special case of the spatiotemporal models described in Section 3 of Friston et al. (2002a). This special case is one in which the spatial relationship among voxels is discounted. The advantage of treating an image like a “gas” of unconnected voxels is that the estimation of between-voxel variance in activation can be finessed to a considerable degree (see Eq. A.7 in Friston et al., 2002b, and following discussion). This renders the estimation of posterior densities tractable because the between-voxel variance can then be used as a prior variance at each voxel. We therefore focus on this simple and special case and on the “pooling” of voxels to give precise restricted maximum likelihood (ReML) estimates of the variance components required for Bayesian inference. The main advance described in this article is the pooling procedure that affords a computational saving necessary to produce PPMs of the whole brain. In what follows we describe how this approach is implemented and provide some examples of its application.
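The pooling device can be illustrated in caricature: averaging the second-moment matrix of the data over voxels yields a single, precise covariance estimate from which variance hyperparameters can be fit once for all voxels, rather than voxel by voxel. The sketch below fits a single i.i.d. error component in closed form; the actual implementation uses ReML over several covariance components, and all names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_scans, n_vox = 50, 2_000
# Hypothetical mean-corrected data: one column of n_scans observations per voxel
Y = rng.normal(size=(n_scans, n_vox))

# Pooling: average the scan-by-scan second-moment matrix over all voxels,
# giving one precise n_scans-by-n_scans covariance estimate
S = (Y @ Y.T) / n_vox

# With a single i.i.d. component C_eps = lam * I, the hyperparameter has a
# closed-form estimate; ReML generalizes this to multiple components
lam = np.trace(S) / n_scans
```

The computational saving is that the expensive hyperparameter estimation is performed once on the pooled matrix S, while the voxel-specific posterior quantities remain cheap to evaluate.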


Conditional estimators and the posterior density

In this section we describe how the posterior distribution of the parameters of any general linear model can be estimated at each voxel from imaging data sequences. Under Gaussian assumptions about the errors, ε ∼ N(0, C_ε), of a general linear model with design matrix X, the responses are modeled as

y = Xθ + ε.

The conditional or posterior covariance and mean of the parameters θ are given by (see Friston et al., 2002b)

C_{θ|y} = (X^T C_ε^{-1} X + C_θ^{-1})^{-1}
η_{θ|y} = C_{θ|y} X^T C_ε^{-1} y,

where C_θ is the prior covariance and
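Given X, C_ε, and C_θ, the two expressions above translate directly into code. The following is a minimal sketch with simulated data; all sizes, values, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                  # design matrix
theta_true = np.array([1.0, 0.0, -0.5])
y = X @ theta_true + rng.normal(0.0, 1.0, n)

C_eps = np.eye(n)             # error covariance (i.i.d. unit variance here)
C_theta = 2.0 * np.eye(p)     # prior covariance of the parameters

Ce_inv = np.linalg.inv(C_eps)
# Posterior covariance:  C_{theta|y} = (X' C_eps^{-1} X + C_theta^{-1})^{-1}
C_post = np.linalg.inv(X.T @ Ce_inv @ X + np.linalg.inv(C_theta))
# Posterior mean:        eta_{theta|y} = C_post X' C_eps^{-1} y
eta_post = C_post @ X.T @ Ce_inv @ y
```

With a sufficiently flat prior (large C_θ) the posterior mean approaches the ordinary maximum likelihood estimator; with an informative prior it is shrunk toward the prior mean of zero.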

Applications

In this section we compare and contrast Bayesian and classical inference using PPMs and SPMs based on real data. The first data are the PET verbal fluency data that have been used to illustrate methodological advances in SPM over the years. In brief, these data were acquired from five subjects each scanned 12 times during the performance of one of two word-generation tasks. The subjects were asked either to repeat a heard letter or to respond with a word that began with the heard letter. These

Conclusion

In this note we have presented a simple way to construct posterior probability maps using empirical Bayes. Empirical Bayes can be used because of the natural hierarchy in neuroimaging engendered by looking for the same thing over multiple voxels. The approach provides simple shrinkage priors based on between-voxel variation in parameters controlling effects of interest. A computationally expedient way of computing these priors using ReML has been presented that pools over voxels. This pooling

Acknowledgements

This work was funded by the Wellcome Trust. We thank Marcia Bennett for preparing the manuscript.

References (9)

