Abstract
Functional magnetic resonance spectroscopy (fMRS) can be used to investigate neurometabolic responses to external stimuli in vivo, but findings are inconsistent. We performed a systematic review and meta-analysis of fMRS studies of the primary neurotransmitters Glutamate (Glu), Glx (Glutamate + Glutamine), and GABA. Data were extracted, grouped by metabolite, stimulus domain, and brain region, and analysed by determining standardized effect sizes. The quality of individual studies was rated. When results were analysed by metabolite type, small to moderate effect sizes of 0.29-0.47 (p < 0.05) were observed for changes in Glu and Glx regardless of stimulus domain and brain region, but no significant effects were observed for GABA. Further analysis suggests that Glu, Glx and GABA responses differ by stimulus domain or task and vary depending on the time course of stimulation and data acquisition. Here, we establish effect sizes and directionality of GABA, Glu and Glx responses in fMRS. This work highlights the importance of standardised reporting and minimal best practice for fMRS research.
Introduction
1. Background
γ-aminobutyric acid (GABA) and glutamate (Glu), the main inhibitory and excitatory neurotransmitters in the brain, respectively, are critical for normal neurological function. GABA and Glu play an important role in perception (Edden et al., 2009; Puts et al., 2011), learning (Floyer-Lea et al., 2006), memory (Jo et al., 2014), and other behavioural functions (Paredes and Agmo, 1992; Donahue et al., 2010). GABA and Glu are metabolically linked: GABA is synthesized from Glu by glutamic acid decarboxylase (GAD), which removes an α-carboxyl group from Glu (Cai et al., 2012). Several lines of evidence suggest that an imbalance in GABAergic and glutamatergic function is associated with neurological, neurodevelopmental, and neuropsychiatric disorders (Li et al., 2019; Tang et al., 2021; Nakahara et al., 2022). The interplay of GABA and Glu is of strong interest because of their role in the balance of excitation and inhibition (E/I balance), which is theorised to play an important part in healthy brain function and whose disruption is thought to be shared by several psychiatric disorders (Yizhar et al., 2011; Ferguson and Gao, 2018).
In humans, Magnetic Resonance Spectroscopy (MRS) is the only technique that allows for the non-invasive in vivo measurement of a wide range of neurometabolites, including GABA and Glu (Mullins et al., 2014; Schür et al., 2016; Harris et al., 2017). MRS allows for the quantification of endogenous brain metabolites based on their chemical structure. Proton (1H)-containing metabolites each have their own distinct chemical environment and thus appear at different positions along a “chemical shift” axis, although with substantial overlap. Recent developments in MRS instrumentation and acquisition techniques have broadened our knowledge of brain neurochemistry in both clinical and research domains, and this has been extensively reviewed (Duarte et al., 2012; Faghihi et al., 2017).
While baseline GABA and Glu levels have been associated with typical and atypical brain function and behaviour (Coghlan et al., 2012; Horder et al., 2018), metabolite levels assessed at rest limit interpretation, as they cannot provide information on the temporal dynamics of GABA and Glu, which may provide insight into typical or atypical function, the relationship between GABA and Glu, task-related changes, and responses to pharmacological intervention. This has led to an increased interest in functional MRS studies, which have the potential to measure a dynamic neurochemical system.
1.1 Functional MRS
Functional MRS (fMRS) refers to the use of MRS to estimate metabolite changes in response to external stimulation by acquiring data at different time points associated with changes in stimulus presentation. Typically, MRS spectra result from averaging the signal from repeated measurements (transients) to improve the signal-to-noise ratio (SNR), as metabolites have an inherently low SNR due to their low concentration. A single transient refers to the data collected in each repeat (repetition time, TR) during the MRS acquisition. The often-used term ‘averages’ in MRS stems from the averaging of these transients into a single ‘average’ spectrum. Functional MRS uses the same approach but tends to measure the signal over shorter durations, or to average a smaller set of transients, than static MRS. In this study, we refer to the number of repeated acquisitions per time point as transients to avoid confusion with the act of spectral averaging. It should be noted that different acquisition sequences exist for MRS, with the most popular single-voxel MRS sequences being spin-echo point-resolved spectroscopy (PRESS) (Bottomley, 1987), stimulated echo acquisition mode (STEAM) (Frahm et al., 1989), semi localization by adiabatic selective refocusing (sLASER) (Öz and Tkáč, 2011), and spin-echo full intensity acquired localised (SPECIAL) (Kuwabara et al., 1995). Details on these approaches are beyond the scope of this work but can be found in recent consensus work (Peek et al., 2020; Lin et al., 2021).
fMRS has been used to study a wide range of brain metabolites, from high-concentration metabolites such as N-acetyl aspartate, creatine, and choline to low-concentration metabolites such as lactate (see Prichard, 1992; Chen et al., 1993; Henning, 2018; Wilson et al., 2019; Peek et al., 2020). While the fMRS of Glu, and particularly that of GABA, is of immense interest due to their critical role in brain function, fMRS of these other metabolites is not yet well-established due to technical considerations (e.g., an absence of lactate at baseline) and perhaps more difficult interpretation of its outcomes.
Glu and GABA overlap considerably with signals from glutamine (Gln) and glutathione (GSH), particularly at clinical field strength (3 T) (see 1.4). Still, despite these challenges, fMRS of GABA and Glu has been used to study neurochemical changes associated with various types of exogenous stimulation, including pain (Gutzeit et al., 2013; Cleve et al., 2015), visual stimulation (Mangia et al., 2007; Apšvalka et al., 2015; Bednařík et al., 2015), working memory (Woodcock et al., 2018), learning and memory (Stanley et al., 2017), and motor tasks (Schaller et al., 2014; Kolasinski et al., 2019). However, substantial inconsistencies between studies exist in terms of acquisition, analysis, findings, and interpretation. To date, the body of fMRS literature on Glu and GABA has not been systematically evaluated and analysed. From here on, we refer to fMRS studies of GABA and Glu as ‘fMRS’.
1.2 Limitations in estimating GABA and Glu
The measurement of GABA and Glu is challenging, which contributes to variability across studies. GABA has a low concentration within the brain (1 - 2 mM), and its signal overlaps with those of high-concentration metabolites such as NAA and creatine, while Glu, Gln, and GSH have very similar chemical shifts. Spectral-editing techniques such as MEscher-Garwood Point-REsolved SpectroScopy (MEGA-PRESS) are often used to improve GABA resolution (Mescher et al., 1998; Edden and Barker, 2007; Near et al., 2011). These approaches rely on J-difference editing of the GABA signal, removing unwanted signal from the spectrum. For a technical review, see Puts and Edden (2012), Mullins et al. (2014), Wilson et al. (2019), and Deelchand et al. (2021). Spectral-editing MRS techniques typically require more transients (in the order of 8 minutes; 240+ transients for voxel sizes of 27 ml based on consensus for adequate data acquisition at 3T) compared to non-edited sequences for Glu (64 transients for voxel sizes of 8 ml at 3T) (Peek et al., 2020; Lin et al., 2021). Differences in MRS sequences, especially editing sequences, may affect the ability to interpret and reproduce studies (Terpstra et al., 2016; Baeshen et al., 2020). Whether linear-combination modelling approaches can successfully and reliably separate Glu from Gln, GABA and GSH remains inconclusive (Sanaei Nezhad et al., 2018; Zöllner et al., 2021) and thus, the composite measure Glx (= Glu + Gln) is commonly reported.
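As a rough illustration of the trade-off between transients, scan time, and SNR, the sketch below relates the number of transients to acquisition duration and relative SNR. It assumes a repetition time of 2 s (not stated in the text, but consistent with 240 transients taking about 8 minutes) and the usual approximation that SNR grows with the square root of the number of averaged transients when noise is uncorrelated. The function names are hypothetical, and voxel volume, which also affects SNR, is deliberately ignored.

```python
import math

def acquisition_minutes(n_transients: int, tr_seconds: float) -> float:
    """Total acquisition time in minutes for n transients at a given TR."""
    return n_transients * tr_seconds / 60.0

def relative_snr(n_transients: int, n_reference: int) -> float:
    """SNR relative to a reference acquisition with the same voxel and sequence.

    Assumes uncorrelated noise, so SNR scales with sqrt(number of transients).
    """
    return math.sqrt(n_transients / n_reference)

# Assumed TR of 2 s: 240 transients correspond to 8 minutes of scanning.
print(acquisition_minutes(240, 2.0))
# Relative SNR of 240 versus 64 averaged transients (voxel size held constant).
print(round(relative_snr(240, 64), 2))
```

The same relation hints at why fMRS time points built from only a handful of transients are noisy: halving the number of transients per time point costs roughly 30% of the SNR.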
1.3 Heterogeneity in fMRS approaches
There is little homogeneity regarding fMRS experimental design, stimulus type, brain region, and the quality of MRS acquisition and analysis methods, all of which often depend on the research question. fMRS is typically performed using one of two types of experimental paradigm: block designs or event-related designs (Mullins, 2018). Block designs contrast metabolite measurements between acquisition blocks that are often long in duration and contain numerous stimuli and transients. Event-related approaches rely on time-locking stimulus onset with the MRS acquisition and allow for the investigation of transient changes in metabolite levels immediately after stimulus onset (stimulus-locked). Block approaches typically have higher SNR, as more transients are averaged per spectrum and responses to stimuli presented in close succession are summed, but have limited interpretability of stimulus-locked neurochemical responses. Effect sizes are heterogeneous, with reported effect sizes (if reported at all) ranging from 2% to 18% change from baseline for visual stimulation, and up to 18% change from baseline for painful stimulation (Gussew et al., 2010; Mullins, 2018; Stanley and Raz, 2018). Event-related designs are more tightly associated with stimulus timings, but often suffer from low SNR due to the limited number of transients being averaged. Both approaches are limited by multiple unknowns, such as the response function describing the delay between stimulus and neurotransmitter change, optimal acquisition duration and timing, and optimal data analysis techniques.
1.4 Our approach
One prior meta-analysis of fMRS studies focused exclusively on Glu (Mullins, 2018); however, no meta-analysis has yet investigated the fMRS of GABA. With increasing interest in GABA and the popular concept of excitation-inhibition balance (E/I), a comprehensive meta-analysis of both GABA and Glu is of strong interest. We further investigate potential factors that could affect outcomes in fMRS studies, including fMRS design, fMRS parameters, quality of MRS studies, and other sources of bias.
2. Materials and Methods
2.1 Search strategy and Inclusion criteria
A systematic search of databases (PubMed, Ovid Medline, and Google Scholar) was performed using a Boolean search string generated with the litsearchr package in R (Grames et al., 2019), combined with additional search terms based on discussion with co-author NAP (for search terms, see Supplementary Table 1). After the initial search on 21st May 2021, the abstract of each article was screened to identify relevant studies using the metagear package in R (Lajeunesse, 2016). Studies that met the following criteria were included: 1) use of in vivo fMRS to measure neurometabolites in the brain; 2) the study investigated changes in GABA or Glutamate (both Glu and Glx) in response to non-invasive stimuli or tasks; 3) the study participants were healthy adult humans or the study contained a healthy human control group (no psychiatric or neurological condition); 4) the study had a baseline or control condition; 5) the study was published in a peer-reviewed journal, and was written in English or translated to English via Google Translate. Relevant articles from the reference sections of included studies were identified and manually added to the analysis after being discussed with a senior author (NAP).
2.2 Study selection and data extraction
Following PRISMA and PROSPERO guidelines for systematic evidence synthesis, we pre-registered this meta-analysis on PROSPERO (CRD42021257339) and identified relevant literature (Tricco et al., 2018). A two-stage method was used for study selection (Furlan et al., 2009). In the first stage, potentially relevant titles and abstracts were independently assessed by two investigators (DP and NAP). If the abstract was inconclusive, the full text was retrieved and assessed for eligibility. In the second stage, the investigators independently assessed the full text of potential studies selected in the first stage for their eligibility. A third investigator (JH) was consulted if disagreements persisted at either stage. Reasons for exclusion were documented.
Two investigators (DP and NAP) independently extracted the data using an identical extraction sheet. Data were extracted into four main topics of interest: 1) neurometabolite levels during fMRS; 2) study characteristics (i.e., sample size, age, gender); 3) reported MRS acquisition parameters according to the MRS-Q (e.g., MRS sequence, fMRS paradigm and timing, voxel size, TE, TR, pre- and post-processing); and 4) bibliometric data (e.g., authors, year of publication, and type of publication).
Concentrations of GABA, Glu and Glx were taken as reported by the study, mean and standard deviation (SD; meanmetab) or as percentage change from baseline (%changemetab). While it is possible to perform a meta-analysis on all data calculated as %changemetab, the SD of these two types of data are on different scales and therefore should not be combined together (Higgins, 2011). Our approach allows data points to be combined while avoiding secondary calculation of data. The data, whether time point or time-course data, were considered as separate datapoints and compared to ‘rest’ or ‘baseline’, as long as the actual data are reported separately. Dependence of time-course data is discussed below in section 2.5. If numerical data were not explicitly reported, imputation methods recommended by the Cochrane handbook were used (Higgins et al., 2011). Data not reported in-text but in figures were extracted using WebPlotDigitizer (Rohatgi, 2021). The time from the start of the MRS acquisition to the time of metabolite measurement was also extracted. Differences between brain regions (voxels) were considered independent and therefore data from multiple brain regions acquired in a single study were extracted as independent datapoints (Peek et al., 2020). If limited studies of specific voxels were available, we grouped them based on a broader brain region (e.g., ‘frontal’, or ‘parietal’).
2.3 Quality assessment
The Risk of Bias Assessment tool for Non-randomized Studies (RoBANS) (Kim et al., 2013a) was used to determine the quality of the methodological design and reporting. The MRS-Q (Peek et al., 2020; https://osf.io/8s7j9/) is a quality appraisal tool specifically designed for the systematic review of MRS studies. The MRS-Q was used to assess whether the reported acquisition methods satisfy the minimal best practice in MRS. The MRS-Q allows for assessing both the acquisition approach and whether reporting was adequate (Peek et al., 2020), and is in line with the recently published MRSinMRS (Lin et al., 2021). As the MRS-Q was designed for static MRS, its application to functional MRS experiments is discussed further in the Discussion. Studies were categorised as ‘low quality’ or ‘high quality’ based on the adequacy of reported MRS parameters. Studies that reported sufficient spectroscopy parameters and satisfied the consensus for adequate data acquisition were classified as ‘high quality’, studies that reported insufficient spectroscopy parameters or did not satisfy the consensus for adequate data acquisition were classified as ‘low quality’, and studies with not enough information to classify were considered ‘unsure’. While we used these terms (as per these guidelines), they do not always indicate that the study itself was of low quality; the study may simply not have reported sufficient information per the recommendations. These ratings should also be considered in their historical context. As detailed below, we analyse data with and without inclusion of ‘low-quality’ papers, but also take a more dimensional approach, testing the association between effect size and acquisition parameters. Two investigators (DP and NAP) independently assessed the quality of each study using both tools. Disagreements were discussed and resolved by consensus with a third investigator (JH).
2.4 Publication bias
Data were assessed for publication bias separately for each metabolite (GABA and Glu/Glx). Effect sizes were first aggregated for each metabolite within each study to avoid non-independence effects, and publication bias was then assessed using Egger’s regression and the trim-and-fill test (Duval and Tweedie, 2000; Bowden et al., 2015; Nakagawa et al., 2021). For the trim-and-fill test, a random-effects model was used on aggregated data, thus not accounting for non-independent effect sizes. The Knapp and Hartung method (IntHout et al., 2014) was used to test for publication bias instead of the Wald test (Z-test), as it has been suggested to perform better with trim-and-fill approaches (Nakagawa et al., 2021). Aggregate effect sizes for each study were calculated with the ‘aggregate’ function from the metafor package in R (Viechtbauer, 2010). A compound symmetric (CS) structure and a conservative rho value of 0.7 were applied as per Rosenthal (1986). Data are visualized using funnel plots (Begg and Mazumdar, 1994; Sterne and Egger, 2001) with standard error (SE) as a measure of uncertainty.
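To make the within-study aggregation step more concrete, the following is a minimal sketch of how several dependent effect sizes from one study can be collapsed into a single estimate under a compound symmetric structure with a common correlation of 0.7. It uses an unweighted mean for simplicity; this is an assumption, and the metafor function used by the authors may weight the estimates differently. The function name `aggregate_cs` is hypothetical.

```python
import math

def aggregate_cs(effects, variances, rho=0.7):
    """Collapse dependent effect sizes from one study into a single estimate.

    Assumes compound symmetry: every pair of effects shares the same
    correlation rho. Returns the unweighted mean and its sampling variance.
    """
    m = len(effects)
    if m == 0 or m != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    mean_effect = sum(effects) / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            if i == j:
                total += variances[i]  # variance terms on the diagonal
            else:
                # covariance implied by the assumed common correlation
                total += rho * math.sqrt(variances[i] * variances[j])
    return mean_effect, total / m ** 2

# Two hypothetical outcomes from the same participants.
g, v = aggregate_cs([0.2, 0.4], [0.04, 0.04], rho=0.7)
print(round(g, 3), round(v, 3))
```

With a high assumed correlation, the aggregated variance shrinks only slightly relative to a single outcome, which is why a value of 0.7 is regarded as conservative.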
2.5 Data analysis
The meta-analysis was performed on the extracted data to estimate effect sizes in each study using the Meta-Essentials tool in R (Suurmond et al., 2017). Standardized mean differences (Hedges’ g) and 95% confidence intervals were calculated from the mean metabolite concentration change from baseline and/or the percentage change from baseline (% change), together with their standard deviations, allowing us to compare data reported in different units. If not specified, the first rest period was selected as the baseline condition to calculate the mean difference for all fMRS designs (block, event-related and time course data).
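For readers unfamiliar with the standardized mean difference, a minimal sketch of a Hedges' g calculation is given here. It uses the textbook formula for two independent groups with a pooled standard deviation and the small-sample correction factor; this is an assumption made for illustration, since the tool used in this study may handle paired (within-subject) designs differently. The function and argument names are hypothetical.

```python
import math

def hedges_g(mean_task, sd_task, n_task, mean_rest, sd_rest, n_rest):
    """Hedges' g and an approximate 95% CI for task versus rest.

    Textbook independent-groups formula; paired designs would additionally
    need the correlation between conditions, which is not modelled here.
    """
    df = n_task + n_rest - 2
    pooled_sd = math.sqrt(
        ((n_task - 1) * sd_task ** 2 + (n_rest - 1) * sd_rest ** 2) / df
    )
    d = (mean_task - mean_rest) / pooled_sd       # Cohen's d
    j = 1 - 3 / (4 * df - 1)                      # small-sample correction
    g = j * d
    var_d = (n_task + n_rest) / (n_task * n_rest) + d ** 2 / (2 * (n_task + n_rest))
    se_g = math.sqrt(j ** 2 * var_d)
    return g, (g - 1.96 * se_g, g + 1.96 * se_g)

# Hypothetical Glu concentrations (arbitrary units), 12 participants per condition.
g, ci = hedges_g(10.5, 1.0, 12, 10.0, 1.0, 12)
print(round(g, 3), tuple(round(c, 3) for c in ci))
```

Because the difference is divided by the standard deviation, the same function applies to absolute concentrations and to percentage change, which is what allows data in different units to be placed on a common scale.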
Since data extracted from time courses are considered dependent, their effect sizes should be considered dependent as well. Therefore, time course data were first analysed separately and then sub-grouped within-study with a random variance component (Tau) weighting separately for each sub-group (Hak et al., 2016; Suurmond et al., 2017). Studies that did not allow for effect size calculation due to missing information (e.g., concentration or %change) were included in the systematic review but not in the meta-analysis. Heterogeneity of data was evaluated using I2 (Higgins et al., 2003). The I2 statistic is an estimate of the proportion of variance in effect sizes that reflects real heterogeneity. I2 is a relative measure with a range from 0 to 100. A low I2 suggests little heterogeneity in the data and no effect of moderators or potential clustering within the data. A high I2 suggests there are external factors and biases driving the dispersion of effect sizes, which should prompt further sub-group analysis (Hak et al., 2016; Borenstein et al., 2021).
Most of the effect size estimates extracted in the current study consisted of time series data, or several datapoints came from a single study (i.e., multiple outcomes from the same participants, for example, rest versus stimulation conditions). This led to statistical dependency between measures, which can lead to errors in variance estimation of the combined effect size (Borenstein et al., 2021). To take the relationships among outcomes into account, robust variance estimation (RVE) was used. RVE has the advantage of approximating the dependence structure rather than requiring exact dependence values between effect sizes, as these are unknown for most of the studies included (Pustejovsky and Tipton, 2021). We used a conservative correlation coefficient of 0.7 for all observations (i.e., pre- and post-observations; time course data) in accordance with Rosenthal’s (1986) recommendations.
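The relationship between Tau, I2, and the pooled effect can be sketched as follows. The example uses the DerSimonian-Laird estimator of the between-study variance, which is an assumption chosen for its simplicity and not necessarily the estimator used in this study; it also treats the inputs as independent. The function name `random_effects_dl` is hypothetical.

```python
import math

def random_effects_dl(effects, variances):
    """Random-effects pooling with DerSimonian-Laird tau^2 and I^2 (in %)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]                   # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2}

# Two hypothetical, precisely estimated but discrepant effect sizes.
result = random_effects_dl([0.0, 1.0], [0.01, 0.01])
print({key: round(val, 3) for key, val in result.items()})
```

In this toy case nearly all of the observed dispersion exceeds what sampling error alone would produce, so I2 is close to 100 and the pooled estimate carries a wide standard error.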
Our main aim was to identify general patterns in the fMRS responses of GABA, Glu and Glx. We then sub-grouped the data and analysed it based on type of stimulation, type of paradigm (i.e., block or event-related), and acquisition and analysis parameters (i.e., time, number of transients per time point). Beyond stimulation type we also analysed the data by brain area (region of interest). Because of variation in voxel location and limited available data for specific voxels we opted to analyse these data by region to ensure collation of data. We grouped the ROIs to optimize the number of studies yet retain a semblance of functional relevance. For example, motor cortex and medial prefrontal cortex were categorized as ‘frontal region’. We were also interested to establish whether there was an association between effect size and quality of acquisition (based on the MRS-Q). We first performed subgroup analysis on high-quality versus low-quality studies. We then estimated the correlation coefficient between effect size and number of transients and voxel size using Spearman’s rho. Finally, we explored effect size as function of time using LOESS (locally weighted least squares regression) fitting to investigate the non-linear trend of metabolite changes over the course of an acquisition, as an exploratory step to inform on potential temporal dynamics of the metabolite response (Ruppert and Wand, 1994). We do not expect this to be linear, nor do we have any a priori expectations regarding the non-linear trajectory. Only metabolite levels during stimulation periods were taken into account for this analysis; metabolite levels during breaks or periods of rest in between stimulation periods were excluded. The start of MRS acquisition was considered as t = 0 s.
3. Results
3.1 Study selection
The initial search returned 3,385 studies. After automatic removal of duplicates, 3,383 studies were eligible for abstract screening. 3,329 studies were excluded in the abstract screening stage for the following reasons: additional duplicate studies (n = 538); irrelevant topic (n = 2,778); and animal studies (n = 13). This left 54 studies eligible for full-text screening, of which four were excluded due to insufficient detail and one was excluded because it was a meta-analysis. In total, 49 studies were included in this study. A PRISMA flow diagram can be found in Figure 1.
3.2 Study characteristics
3.2.1 Spectroscopy
Thirty-one of the fMRS studies were performed on 3 T MR-systems, 15 at 7 T, two at 4 T, and one at 1.5 T. The most commonly used non-spectral-editing sequence was PRESS (18 studies) (Bottomley, 1984; Klose, 2008); six studies used STEAM, and five studies used sLASER. For spectral-editing sequences, 10 studies used MEGA-PRESS, six used SPECIAL, two studies used MEGA-sLASER, and one study used each of BASING or STRESS. Two studies reported the use of more than one editing sequence (Table 1). To measure GABA, 10 studies used MEGA-PRESS, three studies used SPECIAL, two studies used MEGA-sLASER, two studies used sLASER, and one study used STEAM. To measure Glu and Glx, 18 studies used PRESS, 10 studies used MEGA-PRESS, six studies used STEAM, six studies used SPECIAL, five studies used sLASER, and one study used each of BASING or STRESS.
3.2.2 Neurometabolites
Fifteen studies investigated only Glu levels, seven studies investigated only Glx, and nine studies reported both Glu and Glx levels. Seven studies investigated both Glu and GABA, ten studies investigated both Glx and GABA, and one study reported only GABA. See Table 1 for details.
3.2.3 Stimulus domains and brain regions
We grouped studies into eight stimulus domain categories. These domains were visual (n = 20), pain (n = 8), learning (n = 7), cognition (n = 5), motor (n = 4), stress (n = 2), tDCS (n = 1), and exercise (n = 3). Studies were considered to fall into the visual domain if they contained visual stimulation (i.e., flashing checkerboard, rotating checkerboard, visual attention tracking, and video clips); the pain domain if they contained stimuli that elicit pain (i.e., heat pain, dental pain, and electric shock); the learning domain if they contained a learning paradigm (i.e., object recognition, reinforcement learning, and n-back task, for short-term memory/implicit learning and working memory); the cognition domain if they contained a cognitive task (i.e., Stroop task, imaginary swimming, and categorization of either object or abstract stimuli); the motor domain if they contained a motor response (i.e., hand clenching and finger tapping); the stress domain if they contained psychological or pharmacological stress; and the exercise domain if they contained measurement or evaluation of heart rate in response to exercise.
The studies were grouped into six different brain regions of interest (ROIs). The most studied ROI was the occipital ROI for both Glu/Glx and GABA. Additional details of MR-parameters and fMRS experiment designs are presented in Table 1. Figures 2A and 2B summarise studies by brain ROI investigated for Glx/Glu and GABA, respectively, and additionally report stimulus domain.
3.3 Quality assessment
3.3.1 MRS-Q
Most studies (n = 31/49, 63.3%) satisfied the MRS-Q criteria of standardized reporting and best practice and were assessed to be of high quality (Figure 3A). Eighteen studies (36.7%) were assessed as low quality due to inadequate MRS parameters according to MRS-Q, mostly due to an insufficient number of transients or small voxel sizes (see Discussion for further consideration of using baseline MRS quality assurance approaches for fMRS). Among these low-quality studies, nine used spectral-edited fMRS (Maddock et al., 2011; Schaller et al., 2013; Cleve et al., 2015, 2017; Kühn et al., 2016; Coxon et al., 2018; Bezalel et al., 2019; Volovyk and Tal, 2020; Frank et al., 2021), while eight were non-edited (Gussew et al., 2010; Taylor et al., 2015a; Betina Ip et al., 2017; Stanley et al., 2017; Lynn et al., 2018a; Woodcock et al., 2018, 2019; Jelen et al., 2019). Among high-quality studies, nine used spectral-edited fMRS (Hasler et al., 2010; Michels et al., 2012; Schaller et al., 2014; Dennis et al., 2015, 2015; Chen et al., 2017; Mekle et al., 2017; Kurcyus et al., 2018; Boillat et al., 2020; Dwyer et al., 2021) and 24 were non-edited (Mullins et al., 2005; Mangia et al., 2007; Gutzeit et al., 2011, 2013; Lin et al., 2012; Siniatchkin et al., 2012; Kim et al., 2013b, 2014; Lally et al., 2014; Apšvalka et al., 2015; Bednařík et al., 2015; Huang et al., 2015; Taylor et al., 2015b, 2015a; Jahng et al., 2016; Betina Ip et al., 2017; Chiappelli et al., 2018; Kolasinski et al., 2019; Martínez-Maestro et al., 2019; Archibald et al., 2020; Fernandes et al., 2020; Vijayakumari et al., 2020; Frank et al., 2021; Koush et al., 2021b). Two edited-fMRS studies (Floyer-Lea et al., 2006; Stagg et al., 2009) reported insufficient information regarding the MRS parameters and were identified as ‘unsure’.
3.3.2 RoBANS
The risks of bias assessed using the RoBANS are summarized in Figure 3B. According to the RoBANS assessment, all but one study were considered to have a high risk of bias due to non-blinding of outcome, primarily because participants or experimenters were aware of receiving or delivering a functional paradigm. Only one study explicitly reported blinding of outcome. Given the nature of an fMRS experiment as a pre-post intervention study, some fMRS experimental designs might be impossible to blind. While there may be potential bias due to the fMRS examiner or participant being aware of the stimulus being given, the order of stimuli is often unknown to the participant, who is therefore ‘blind’ to the stimulus paradigm. Yet, this bias needs to be considered as it may impact the results (e.g., participants may behave differently when the purpose is known, and experimenters may bias their analysis based on the paradigm). Blinding criteria are likely more relevant for pharmacological studies than for typical fMRS experiments.
fMRS studies are often required to exclude data with unsatisfactory spectral quality. While this is common in MRS, based on the RoBANS criteria, studies with incomplete outcome data would be identified as high risk. Given the above criteria, 55.1% of studies were considered high-risk. Twenty-two studies (44.9%) stated that all data were included. Two studies (4.1%) were at high risk of bias for selective outcome reporting as they did not fully report all available outcomes. Bias of inadequate measurement was also identified via the MRS-Q by assessing whether studies reported adequate MRS parameters; 70% of all studies included were assessed to be at low risk of bias in this domain. No study reported potential bias in selection of participants.
3.4 Publication bias
A summary of the Egger’s regression and trim-and-fill tests for publication bias is shown in Table 2.
3.4.1 Egger’s regression test and funnel plot
Egger’s regression test (Egger et al., 1997) is a quantitative asymmetry test based on a simple regression model. The funnel plot illustrates the effect size of each study on the x-axis and the standard error on the y-axis; without publication bias, the studies should roughly follow the funnel shape with a symmetric distribution of datapoints (Lin and Chu, 2018). No asymmetry in small and large effect sizes was found for GABA (both %changeGABA and meanGABA). However, Egger’s test suggested significant asymmetry (p < 0.05) for Glu/Glx, as well as for Glu and Glx when analyzed separately, except for %changeGlx. Supplementary Table 2 shows the results from the Egger’s regression test, including the estimated effect sizes adjusted for publication bias. Supplementary Figure 1 shows the funnel plot using SE as a measure of uncertainty, color coded by stimulus domain. These data suggest that studies of Glu/Glx were asymmetrical due to an absence of small-effect-size studies in the positive direction.
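The logic of the regression test can be sketched in a few lines. In the classic formulation, the standardized effect (effect divided by its standard error) is regressed on precision (the inverse of the standard error), and an intercept far from zero indicates funnel plot asymmetry. The sketch below implements this classic ordinary least squares version as an assumption; the variant applied in this study, which works on aggregated effect sizes, may differ in detail. The function name is hypothetical.

```python
import math

def egger_regression(effects, std_errors):
    """Classic Egger regression of (effect / SE) on (1 / SE).

    Returns (intercept, slope, se_intercept). The intercept divided by its
    standard error can be compared with a t distribution on n - 2 degrees
    of freedom to test for asymmetry.
    """
    n = len(effects)
    if n < 3 or n != len(std_errors):
        raise ValueError("need at least three studies with matching standard errors")
    z = [y / s for y, s in zip(effects, std_errors)]    # standardized effects
    precision = [1.0 / s for s in std_errors]
    mx = sum(precision) / n
    mz = sum(z) / n
    sxx = sum((p - mx) ** 2 for p in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    slope = sum((p - mx) * (zi - mz) for p, zi in zip(precision, z)) / sxx
    intercept = mz - slope * mx
    rss = sum((zi - (intercept + slope * p)) ** 2 for p, zi in zip(precision, z))
    se_intercept = math.sqrt(rss / (n - 2) * (1.0 / n + mx ** 2 / sxx))
    return intercept, slope, se_intercept

# Hypothetical asymmetric data: less precise studies report larger effects.
intercept, slope, _ = egger_regression([0.1, 0.2, 0.4, 0.8], [0.05, 0.1, 0.2, 0.4])
print(round(intercept, 3), round(slope, 3))
```

When every study estimates the same effect regardless of its precision, the intercept is zero and the slope recovers that common effect, which is the symmetric-funnel case.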
3.4.2 Trim-and-fill
The trim-and-fill method is a non-parametric method used to visualize and correct data asymmetry due to publication bias (Duval and Tweedie, 2000). The principle of the method is to ‘trim’ the studies causing funnel plot asymmetry, use the trimmed funnel plot to estimate the true centre of the funnel, and then ‘fill’ the plot by adding back the trimmed studies together with their missing counterparts (studies presumed not reported due to publication bias). No study was added by the trim-and-fill method; therefore, the estimated effect sizes remained the same. All data demonstrated moderate to high heterogeneity, with I2 values of 60% - 90% (Supplementary Table 3). This means that the variability and inconsistency across studies reflect true heterogeneity in the data rather than chance (Higgins et al., 2003). Trim-and-fill analysis suggested there were no potential missing studies due to bias (Supplementary Figure 2). Due to the presence of between-study heterogeneity in the current study, these results need to be interpreted with care (Terrin et al., 2003; Ioannidis and Trikalinos, 2007; Shi and Lin, 2019).
3.5 Meta-analysis
3.5.1 Effect of fMRS-design
Neurometabolite levels across all studies
When we considered change in metabolite levels across studies regardless of stimulus domain, brain ROI, or other factors (e.g., voxel size, number of transients), meanGlu and meanGlx increased significantly compared to the respective baseline condition (meanGlu: Hedges’ g = 0.37, 95% CI: 0.09 – 0.645, I2 = 86.83; meanGlx: Hedges’ g = 0.29, 95% CI: 0.035 – 0.555, I2 = 87.71). The percentage change between baseline and active conditions in Glu was positive on average (%changeGlu: Hedges’ g = 0.47, 95% CI: 0.158 – 0.789, I2 = 82.81). No significant change was observed for GABA studies for either mean or percentage change when compared to baseline (Figure 4A).
Neurometabolite levels by type of paradigm
When effect sizes were computed by type of paradigm regardless of brain ROI and stimulus domain, block designs showed narrower confidence intervals in effect size relative to event-related designs and a significant overall positive change in Glu/Glx for both mean and %change (meanGlu/Glx: Hedges’ g = 0.27, 95% CI: 0.064 – 0.475, I2 = 86.28; %changeGlu/Glx: Hedges’ g = 0.36, 95% CI: 0.124 – 0.605, I2 = 86.28) (Figure 4B). A significant reduction in mean GABA was observed for event-related designs (Hedges’ g = −0.76, 95% CI: −1.285 – −0.227, I2 = 0.11), but no significant change was observed for block paradigms. It must be noted that the significant effect observed here comes from one study only; thus, the result must be interpreted with care.
3.5.2 Neurometabolite levels by stimulus domains
All stimulus domains that demonstrated a significant change from baseline contained only one individual study with 3 to 9 within-study outcomes (i.e., were driven by single studies that had multiple results at different timepoints, metabolite changes as a function of time, or different types of stimuli within a single study). The percentage change in GABA level was positive during exercise (Hedge’s GGABA-mean = 0.46, 95% CI: 0.023 – 0.906, I2 = 0.7). The %changeGlu/Glx was positive during learning (Hedge’s GGlu/Glx-%change = 0.29, 95% CI: 0.106 – 0.469, I2 = 0.23), whereas mean GABA showed a negative change from baseline during learning (Hedge’s GGABA-mean = −0.76, 95% CI: −1.285 – −0.227, I2 = 0.11). Mean GABA and %change in GABA showed significant changes in opposite directions in the motor domain (Hedge’s GGABA-mean = −0.76, 95% CI: −1.485 – −0.044, I2 = 0.6; Hedge’s GGABA-%change = 0.32, 95% CI: 0.184 – 0.459, I2 = 0). Stress stimulation was associated with a significant negative change for GABA (Hedge’s GGABA-mean = −0.87, 95% CI: −1.609 – −0.129, I2 = 0.69). During transcranial direct current stimulation, GABA showed a negative %change (Hedge’s GGABA-%change = −0.12, 95% CI: −0.238 – −0.006, I2 = 0). There were no significant changes related to visual stimulation for any measure of Glu/Glx and GABA (Figure 5). Again, it must be highlighted that only 1-2 studies contributed to each of these statistically significant results, and these findings therefore need to be interpreted cautiously.
3.5.3 Neurometabolite levels by ROI studied
When we investigated the neurometabolites by ROI, only a few studies were included for each metabolite. Across neurometabolites, regardless of stimulus domain, every ROI except for the limbic ROI showed a significant difference in neurometabolite levels compared to the baseline condition. The occipital ROI comprised most of the studies included (n = 22 across metabolites). Pooled effect sizes from six studies in occipital ROIs showed an overall increase in %change of Glu/Glx (Hedge’s GGlu/Glx-%change = 0.84, 95% CI: 0.089 - 1.588, I2 = 88.73). This was surprising since stimulation in the visual domain itself showed no significant effect. This may be because the effect of visual stimulation was not only tested in visual cortex but across different ROIs (see Figure 2 and Table 1). Significant increases compared to the baseline condition were also observed for frontal %changeGABA (Hedge’s GGABA-%change = 0.35, 95% CI: 0.046 – 0.649, I2 = 80.86) and insular meanGlu/Glx level (Hedge’s GGlu/Glx-mean = 0.52, 95% CI: 0.094 – 0.95, I2 = 75.97). Due to limited available data, temporal and parietal ROIs only had one study included for each analysis, except for percentage change in parietal GABA. While significant differences were observed, these data show very low heterogeneity (I2 = 0 – 0.23). This might suggest a potential bias in over- or under-estimating the observed effects since these results are from within-study outcomes. Data by ROIs are shown in Figure 6.
3.5.4 Effect sizes in relation to time
Several studies had time-course data available, and we were therefore able to explore effect sizes based on ‘time-in-acquisition’ (see Figure 7). The results show different temporal fluctuation for GABA/Glu/Glx in different stimulus domains. The fitted line (LOESS) suggests potential metabolic response patterns; GABA tends to start high but then decreases with increasing time-in-acquisition in learning paradigms. For Glu/Glxmean, three studies were included for exercise stimulus and one study was included for each of visual, learning and stress. For meanGABA, one study was included for visual stimulus. For %changeGlu/Glx, four studies were included for visual stimulus, two studies for learning, and one study each for motor and cognitive. For %changeGABA, one study was included for motor stimulus. The %changeGlu/Glx tends to increase with increasing stimulation for visual paradigms only. There were no clear patterns for meanGABA and meanGlu/Glx. It should be noted that while this is interesting, the amount of available data included is too small to make a firm conclusion.
3.6 Effect of fMRS-parameters
3.6.1 Effect of quality based on the MRS-Q
Supplementary Figure 3 illustrates data when only ‘high quality’ studies were included. Generally, Glu/Glx show a positive trend while GABA shows a small negative trend for meanGABA compared to baseline. These findings are in agreement with section 3.5.1 where we did not consider study quality. Unlike in section 3.5.1, however, the change in meanGlu/Glx was not significant from baseline, while %changeGlu/Glx was significant, with higher effect size from baseline compared to 3.5.1 (Hedge’s GGlu/Glx-mean = 0.24, 95% CI: −0.066 – 0.553, I2 = 85.04, p = 0.045). GABA data show an overall lower effect for both mean and %change and did not reach statistical significance, consistent with section 3.5.1.
Figure 8 shows data for Glu/Glx and GABA by stimulus domains across high-quality studies only. Several domains contained only a single high-quality study; therefore, results in domains such as stress (Glx and GABA) and motor (GABA) remained largely the same. MeanGlu shows a difference for the motor domain when only high-quality studies were included, indicating an increase of Glu-mean compared to the baseline condition (Hedge’s GGlu-mean = 0.37, 95% CI: 0.004 – 0.743, I2 = 0). The exercise, learning, pain, and visual domains remained non-significant for all metabolite types.
3.6.2 Effect of number of transients and voxel size
First, we assessed whether effect size was correlated with the number of transients and voxel size. The number of transients mentioned here is the number of transients that were averaged for metabolite quantification (e.g., per acquisition block or per one window width for sliding window analysis). There was a statistically significant relationship between effect size and the number of transients for meanGlu (ρ = −0.3, p = 0.0062). All other metabolites showed no significant relationship with number of transients (meanGABA: ρ = 0.021, p = 0.9, meanGlx: ρ = −0.27, p = 0.084). Percentage change in GABA (%changeGABA: ρ = −0.21, p = 0.079), Glu (%changeGlu: ρ = −0.15, p = 0.11), and Glx (%changeGlx: ρ = −0.26, p = 0.2) showed no significant correlations between number of transients and effect size (Figure 9).
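The rank correlations (ρ) reported above can be illustrated with the following minimal sketch of Spearman's ρ, computed as the Pearson correlation of average ranks. This is a dependency-free illustration only; the function names and the example data are hypothetical, and no p-value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical numbers of transients and effect sizes, for illustration only
transients = [32, 64, 128, 256]
effect_sizes = [0.9, 0.5, 0.4, 0.1]
print(spearman_rho(transients, effect_sizes))
```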
To analyse the association between effect size and number of transients, we binned studies based on the number of transients. Most of the studies used a number of transients in the range of 65-128 and 129-256 for metabolite quantification (n = 13 for each bin). A small increase in percentage GABA (Hedge’s GGABA-%change = 0.23 – 0.32) and a small decrease in meanGABA (Hedge’s GGABA-mean = −0.76) were observed for studies with a limited number of transients (1-32 and 33-64). However, these significant results included data from only one study (Figure 10). The results were inconclusive when analysing these data by stimulus type, as only 1-4 studies were included for each stimulus type (Supplementary Figure 4).
The relationship between voxel size and effect size differed by type of data (mean or %change) (Figure 11). For fMRS studies reported in mean metabolite levels, the effect sizes showed a negative relationship with voxel size (meanGABA: ρ = 0.42, p = 0.012; meanGlu: ρ = 0.066, p = 0.55; meanGlx: ρ = −0.42, p = 0.0059). Conversely, in studies reporting %change from the baseline condition, we observed a positive relationship between effect size and voxel size for all metabolite types (%changeGABA: ρ = 0.12, p = 0.3; %changeGlu: ρ = 0.19, p = 0.043; %changeGlx: ρ = 0.41, p = 0.039). Only meanGlu and %changeGABA did not demonstrate a significant relationship with voxel size (Figure 11).
Discussion
1. Summary of the findings
We systematically evaluated and synthesized the fMRS literature on GABA and Glu/Glx to date (mid 2021). Overall, results show a wide variability in effect sizes and directionality for both Glu/Glx and GABA when generalized across design and stimulus domain. Most of the Glu/Glx studies showed positive trends (increases) during stimulation compared to baseline (at rest), while GABA studies generally showed negative trends (decreases) compared to baseline. The increase in Glu/Glx levels is in agreement with several animal studies showing an association between neuronal activation and Glu/Glx in response to task or stimuli (Just and Faber, 2019; Takado et al., 2021), which also correlates with BOLD signal activation (Just et al., 2013; Baslow et al., 2016; Just and Sonnay, 2017). Significant changes in Glu and Glx from baseline had only a small to moderate effect size (Hedge’s GGlu and Glx = 0.29 - 0.47). Although changes in GABA compared to baseline were not statistically significant across studies, the general directionality of decreased GABA levels is consistent with a previous narrative review by Duncan et al. (2014) suggesting that GABA tends to be negatively correlated with task-evoked neuronal responses, as well as with studies showing that inhibition tends to decrease during repeated stimulation or learning (Stagg et al., 2011; Heba et al., 2016; Kolasinski et al., 2019). Ultimately, this meta-analysis shows that current fMRS studies show large variability within domain and stimulus type, small effect sizes, and susceptibility to factors beyond experimenter control. While standardised reporting is becoming more widespread in the MRS field, fMRS does not always adhere to the same principles and additional reporting standards need to be developed. This includes thorough reporting of stimulus details and analysis methods, including open access to analysis code and stimulation paradigms, as these are likely driving the heterogeneity as well.
This review revealed several important factors that need to be considered when performing and interpreting fMRS studies, which are detailed in the following sections.
2. Effect of fMRS design
2.1 Effect of fMRS paradigm: block paradigm or event-related
In the current meta-analysis, the magnitude of effect sizes was observed to be smaller for block designs than event-related designs. This is in agreement with a previous meta-analysis of fMRS of Glu (Mullins, 2018). However, block designs provided more consistent results for Glu/Glx from tighter 95%CI of the averaged effect sizes compared to event-related designs, suggesting that block paradigms may be better at capturing Glu/Glx changes. On the other hand, event-related paradigms showed a wider range of confidence intervals compared to block design (event-related: 95% CI of −0.23 to 5.59, Block: 95% CI of −0.406 – 0.605). Although speculative, perhaps the most relevant difference between these two paradigms is that they are likely probing different brain processes, i.e., fast-acting neurochemical response through event-related designs and slower homeostatic processing or plasticity in block paradigms.
Block designs have the potential advantage of robust metabolite quantification as signal averaging is performed during a sustained stimulus. Habituation and adaptation to repeated stimulation with a potential summative effect likely play a key role in block designs (Michels et al., 2012; Betina Ip et al., 2017; Ligneul et al., 2021). Signal averaging over a longer time course has been shown to smooth out any task-based dynamics of neural activity (Mangia et al., 2007; Mullins, 2018), and brain homeostasis during long stimulation blocks might lead to minimal, or missed, metabolic changes (Mangia et al., 2012; Apšvalka et al., 2015).
These limitations can be overcome by time-locking fMRS to stimulus onset and assessing metabolic changes with higher temporal resolution. The temporal resolution of the event-related approach can be brought to under 30 seconds, allowing for measurement of a relatively fast response at the cost of increased measurement uncertainty of the individual time point due to decreased SNR. Several approaches have been implemented to successfully improve temporal resolution without sacrificing SNR, including sliding-window analysis and/or averaging over participants, which will be discussed further in Section 3.1.
It is likely that the optimal choice of paradigm depends on the targeted stimulus domain. Any study of “long term” change (i.e., learning, memory, or even pharmacological approaches) may consider using block paradigms as these hold an advantage of higher SNR (Jahng et al., 2016; Bezalel et al., 2019; Vijayakumari et al., 2020). As previously discussed, block designs often involve repeated stimulation with a theorised summation of the brain response, while event-related designs with fewer transients are likely to elicit a smaller response, which, even when averaged together, is not driven by repeated summation of stimuli. While this is not the right approach for assessing transient responses, both our data and prior work suggest that block designs may be more robust when longer-term changes are of interest (Jahng et al., 2016; Bezalel et al., 2019; Vijayakumari et al., 2020). While this is speculative, our meta-analysis based on available data showed that block designs tend to have higher effect sizes than event-related designs. Nevertheless, careful fMRS paradigm design might allow for investigation of both block and event-related analysis within the same acquisition (Apšvalka et al., 2015; Stanley et al., 2017; Woodcock et al., 2018), but this is not widely used.
2.2 Effect of stimulus domain
The directions and magnitudes of metabolic changes are influenced by stimulus domain. A significant increase compared to baseline was observed for Glu/Glx in five domains (exercise, learning, motor, stress, tDCS). Increased Glu/Glx during stimulation is in line with studies showing that neuronal responses require increased energy metabolism and/or excitatory neurotransmission. Although effect sizes were small, GABA concentrations tended to decrease in response to stimulation, except for %change in the motor domain. This is in agreement with previous studies demonstrating a negative relationship between regional neural activation and GABA, and a deactivation of GABAergic mechanisms when excitation is required (Duncan et al., 2014; Kiemes et al., 2021). It has been suggested that task-related GABA changes are more robustly observed in stimulus paradigms with a change in behavioural performance (Ip and Bridge, 2021), such as learning (Frangou et al., 2018, 2019), motor or sensory performance (Stagg et al., 2011; Heba et al., 2016; Kolasinski et al., 2019), and stress (Houtepen et al., 2017; Lynn et al., 2018a); this is reflected in our meta-analysis results, although GABA changes do not appear particularly robust. fMRS studies in pain appeared to be most consistent, but most domains show large variation in their responses. GABA changes tend to be moderate at best and appear very domain- and approach-specific.
The high I2 across stimulus domains observed in this meta-analysis reflects the high degree of heterogeneity in results for different paradigms and stimuli even within stimulus domain. While we expected some variation as stimulus parameters and stimulation approach will differ between studies, we were surprised by this large heterogeneity. It should be noted that classification of stimulation domains may vary depending on individual opinion and judgement. For example, we grouped all visual stimulation fMRS studies into one category, despite differences in experimental design, stimulus intensity, and stimulus duration, which likely influenced the observed results (Mullins, 2018; Stanley and Raz, 2018; Ip and Bridge, 2021). Especially in the visual domain, we found a lot of heterogeneity, likely due to the variety in visual tasks including flashing checkerboards with different flicker frequencies, movies or video clips as visual stimuli, rotating checkerboards, and visual stimulation with variations in contrast level (Mangia et al., 2012; Kim et al., 2013b, 2014; Betina Ip et al., 2017; Mekle et al., 2017; Bednařík et al., 2018; Martínez-Maestro et al., 2019). Previous studies demonstrated that regional cerebral blood flow changes as a linear function of stimulus repetition rate, peaks at approximately 8 Hz, and then declines above this frequency (Fox and Raichle, 1984; Bejm et al., 2019). Previous fMRI studies also reported the BOLD response to depend on stimulus pattern (Krüger et al., 1998; Hoge et al., 1999). Similarly, perhaps approaches with higher SNR (such as 7T) are more sensitive to changes (Mangia et al., 2012).
Combining visual stimulus studies was necessary, however, as separating them out further would lead to single-study analyses, which are not particularly useful for meta-analytical purposes. Nevertheless, we do know that stimulus parameters can have different effects. Previous studies have demonstrated a lack of both Glu and BOLD signal changes at low visual contrast level, whereas only high stimulus intensity elicited a measurable and significant Glu response (Ip et al., 2019). This suggests that stimuli used for fMRS should preferably be of high intensity to evoke a sufficiently salient response (e.g., in a considerable number of neurons) to cause neurometabolite production or spillover (Yashiro et al., 2005; Gonçalves-Ribeiro et al., 2019), which leads to a transient change that can be measured with MRS. Additionally, MRS-derived neurometabolite signals are non-specific and reflect all cellular components (e.g., cytosol, extracellular space, vesicle, synaptic cleft, etc.). It is possible that a smaller brain response with less SNR (e.g., one induced by repetitive stimulation) could be masked by other metabolic responses with higher SNR (e.g., energy usage, steady state).
2.3 Effect of ROI
Although we intended to study the effect of ROI on effect size, there was insufficient data to draw firm conclusions. Despite the occipital ROI being the most studied ROI in fMRS (and MRS in general; Puts and Edden, 2012), and with the benefit of high-quality spectra due to its homogeneous field relative to other ROIs (Juchem and de Graaf, 2017), only %changeGlu/Glx was significantly increased compared to baseline. A significant increase of GABA was demonstrated for frontal and parietal ROIs, which included fMRS studies of visual, exercise, motor, stress and learning stimuli; all of these involved some kind of repeated stimulation and are likely to reflect plasticity. This is consistent with the notion that both frontal and parietal regions play important roles in regulating inhibitory control of behaviour (Aron et al., 2004; Narayanan and Laubach, 2017; Hermans et al., 2018). An increase in Glu/Glx was demonstrated for insular cortex and other temporal lobe regions. While we can only speculate why this appears to be more robust, it might be that there is less variation in the approach used for insular regions compared to other regions, for example, visual studies. It is possible that paradigms targeting insular/parietal regions elicit stronger responses in these regions than visual stimuli do in visual regions, but it might also be the case that voxels have less heterogeneity (as heterogeneity even within occipital lobe is large, and different occipital regions have very different roles).
The differences in both direction and magnitude due to anatomical and functional differences between ROIs are not surprising (Gordon et al., 2017; Zhang et al., 2020). Different brain regions typically contain different tissue compositions (i.e., white and grey matter) (Pouwels and Frahm, 1998; Amaral et al., 2013). Differences in tissue composition also lead to variation in metabolism, with grey matter having higher energy consumption compared to white matter (Amaral et al., 2013; Ford and Crewther, 2016), which, in turn, affects GABA, Glu and Glx levels (Rae et al., 2009; Rae, 2014; see also next section). We were not able to determine the role of tissue composition and subsequent partial volume correction, which accounts for much variation in the estimation of GABA and Glx/Glu, due to limited available and reported data. Another possible explanation for differences in effect sizes between ROIs is increased SNR in certain regions (e.g., occipital lobe) in close proximity to the receiver coil (Di Costanzo et al., 2007; Minati et al., 2010). Nevertheless, further primary studies are required to elucidate the relationship between effect sizes and brain region.
2.4 Possible mechanisms underlying metabolite changes
While directional changes in the neurometabolite responses were observed in this meta-analysis, the mechanisms underlying these changes remain unclear. Metabolite concentrations obtained in fMRS studies originate from all cell compartments (i.e., cell body, cytosol, synaptic cleft, etc.) (Puts and Edden, 2012). The brain’s response to external stimuli consists of a complex interplay between neuronal mechanisms. This includes changes in blood flow, changes in neurotransmitter transport, production and breakdown, and brain oxidative metabolism (Fox and Raichle, 2007; Mangia et al., 2009; Takado et al., 2021). Besides neuronal synaptic activity, metabolic processes also contribute to the neurometabolite levels measured in MRS (e.g., the TCA cycle)(Dienel, 2012; Magistretti and Allaman, 2015).
Our finding of increased Glu/Glx during stimulation/tasks is in agreement with several studies that link Glu and brain responses to stimuli or tasks such as perception, visual activation, motor activation, learning, and memory (Gao et al., 2013; Magalhães et al., 2019; Ligneul et al., 2021). Glu plays a major role during activity-dependent energy demands as the most abundant amino acid and the main excitatory neurotransmitter in the brain (Ligneul et al., 2021). Increasing evidence demonstrates the close regulation between glucose consumption and glutamate-glutamine cycling (Sibson et al., 1998; Rothman et al., 2003), which was theorised to lead to increasing Glu levels during the BOLD-activation period (Betina Ip et al., 2017; Vijayakumari et al., 2018; Martínez-Maestro et al., 2019). Additionally, Glu is also a major determinant of neuronal plasticity during periods of high neural activity, as Glu influences the production of brain-derived neurotrophic factor (BDNF), which regulates survival, differentiation and synaptogenesis in the CNS to change patterns of neuronal connectivity (Gonçalves-Ribeiro et al., 2019; Valtcheva and Venance, 2019). Indeed, Glu release by neurons and its uptake into astrocytes for recycling via glutamine is thought to represent 70-80% of total brain glucose consumption (Hertz and Rothman, 2016). That said, it is not possible to differentiate metabolic Glu from vesicular or synaptic Glu, and caution in the interpretation of Glu/Glx changes is important; one cannot simply extrapolate these changes to changes in neurotransmission.
Previous studies have demonstrated the relationship between GABA as measured with MRS and the gene encoding for glutamic acid decarboxylase (GAD) 67. GAD 67 converts Glu into GABA under baseline conditions, is responsible for the majority of GABA production, and is present in both cell bodies (Marenco et al., 2010) and synapses. Therefore, MRS-quantified GABA is often said to reflect ‘inhibitory tone’ (Rae, 2014; Peek et al., 2020). The relationship between GABA and neuronal activation (or deactivation) is less consistent, and often dependent on the task used. Previous work has shown that increased GABA levels are associated with increased BOLD signal in response to an interference task (Kühn et al., 2016) and in response to pharmacological manipulation in rat brain (Chen et al., 2005). However, other studies have shown that higher baseline GABA was associated with lower BOLD response amplitude (Muthukumaraswamy et al., 2012; Rae, 2014; Stanley and Raz, 2018). It has been suggested that Glu/Glx and GABA changes in response to stimulation comprise both energy usage and neural processes facilitating a shift into a new metabolic steady state by shifting the excitation/inhibition equilibrium, linking these two processes more directly (Just et al., 2013; Lynn et al., 2018b). A recent fMRS study in animal models showed that increases in GABA after repeated tactile stimulation were consistent with two-photon microscopy measures of increased inhibitory activity, and increases in Glu with increased excitatory activity, suggesting that functional changes in GABA and Glu measured through MRS are indeed reflective of increased inhibitory neurotransmission (Takado et al., 2021).
3. Effect of fMRS parameters
Beyond assessing fMRS through differences and changes in the ‘bulk’ metabolite response to stimulation, it is also important to investigate differences at the level of acquisition and analysis. In this meta-analysis, we demonstrated the effect of MRS parameters such as number of transients, voxel size, timing, and MRS quality limitations on reported metabolite concentrations.
3.1 Number of transients
The results reported in our meta-analysis illustrate the variability of methods used in fMRS studies. For both Glu/Glx and GABA, effect sizes seem to be higher for a lower number of transients, and effect size decreases as the number of transients increases. There are several possible explanations for this. One is that low numbers of transients lead to lower SNR and unreliable spectral quantification (Mikkelsen et al., 2018), which potentially leads to biased metabolite concentration changes. Another possible explanation is that rapid changes in the first few minutes due to neurotransmitter release might influence the effect sizes observed with a small number of transients due to their higher temporal resolution (Mullins, 2018; Ligneul et al., 2021). On the other hand, a larger number of transients might lead to a lower observed effect size due to the effect being averaged out over a longer period of time (Ip and Bridge, 2021), thus diluting any rapid changes. Ultimately, conclusions are difficult to draw without a measurable ground truth, since the spectral fitting process itself may introduce quantitative bias depending on SNR (and therefore the width of the averaging window). Synthetic simulated data can be useful to elucidate the accuracy, precision, and biases of spectral fitting when attempting to resolve small temporal changes.
Given the approximately 10,000 times lower metabolite concentrations relative to water, and thus low SNR, spectra are often collected with long acquisition times. These acquisition times are often longer than the assumed temporal dynamics, with fast metabolite changes in less than 1 s (Apšvalka et al., 2015; Bednařík et al., 2015; Mullins, 2018; Ligneul et al., 2021) which likely reflect changes in visibility in existing metabolite pools. Several spectral averaging methods have been applied to overcome this trade-off between temporal resolution and SNR (Kanowski et al., 2004; Mikkelsen et al., 2018). One of these methods averages fMRS data across short sequential acquisition blocks (Kolasinski et al., 2019). Others used time-locking to stimulus onset, then averaged the transients acquired during stimulus presentation and during baseline and compared the two (event-related averaging) (Lally et al., 2014; Apšvalka et al., 2015; Stanley et al., 2017). Some studies have averaged across a small number of transients but across participants to obtain group-level spectra with higher temporal resolution (Apšvalka et al., 2015; Bednařík et al., 2015; Fernandes et al., 2020). Others have applied a ‘sliding window’ or ‘moving averages’ approach (i.e., averaging transients in blocks, then shifting the averaging window over time by a certain number of transients) to detect a dynamic trace of metabolite changes (Mangia et al., 2007; Schaller et al., 2013; Fernandes et al., 2020; Rideaux, 2020).
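The ‘sliding window’ approach described above can be sketched as follows. This is a minimal illustration in which each transient is represented as a list of spectral points; the function names, window width, and step are hypothetical, and real pipelines would typically operate on complex-valued, frequency- and phase-corrected transients before spectral fitting.

```python
def average_spectra(spectra):
    """Point-wise mean of a list of equally long spectra."""
    n = len(spectra)
    return [sum(points) / n for points in zip(*spectra)]


def sliding_window_average(transients, width, step):
    """Average `width` consecutive transients, shifting by `step` each time.

    Returns one averaged spectrum per window position; overlapping windows
    (step < width) give a smoother but correlated time course.
    """
    if width < 1 or step < 1:
        raise ValueError("width and step must be positive")
    if width > len(transients):
        raise ValueError("width exceeds the number of transients")
    return [
        average_spectra(transients[start:start + width])
        for start in range(0, len(transients) - width + 1, step)
    ]


# Four hypothetical two-point 'spectra', for illustration only
transients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(sliding_window_average(transients, width=2, step=1))
```

A wider window increases the SNR of each averaged spectrum at the cost of temporal resolution, which is the trade-off discussed in this section.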
Our results are in agreement with studies suggesting that averaging across a small number of transients has the advantage of higher temporal resolution for detecting rapid modulation of metabolite levels (Lally et al., 2014; Betina Ip et al., 2017; Ligneul et al., 2021). A longer averaging window might be better suited to capturing the move towards a new metabolic steady state as described above (Betina Ip et al., 2017; Lynn et al., 2018a). Furthermore, the brain likely responds differently to different types of stimuli, and in a region-specific manner, once again emphasising that task design needs to be tailored towards the question of interest. Surprisingly, we know very little about the actual temporal dynamics of these metabolites, which makes it difficult to choose the best acquisition strategy a priori. Only a few studies could be included in the analysis of the impact of transient width, which supports the urgent need for more primary fMRS studies with varying time windows.
3.2 fMRS timing
Our analysis allowed us to explore whether effect sizes change with time of acquisition. While exploratory, these time-resolved fluctuation patterns suggest different response functions for different brain regions or stimulus domains, and between GABA and Glu/Glx. Some studies observed a fast Glu response early in a working memory task, but not later in the task (Woodcock et al., 2018), while others observed Glu reaching a new steady state 1 to 2 minutes after stimulus onset (Mangia et al., 2007; Schaller et al., 2013). Previous studies of GABA and Glx in response to visual stimulation demonstrated concentration drifts over time in opposite directions while participants were at ‘rest’ before stabilising (in steady state) after around 500 seconds (Rideaux, 2020). As discussed in previous sections, the time courses of neurometabolites in response to stimulus domain are a topic of great interest and require further elucidation. This perhaps can be achieved by varying the time of fMRS acquisition and stimulus onset in high-field MR at > 3 T, while aiming for the best temporal resolution possible.
3.3 Other MR-instrument-related limitations
fMRS is also sensitive to other instrument and acquisition-related limitations. MRS offers low spatial specificity as large voxels (often >15 ml) are required for sufficient SNR. Reducing voxel size requires increasing acquisition time to maintain SNR, which is not only impractical, but also increases the risk of scanner drift and participant motion, especially in clinically sensitive motion-prone groups such as prenatal and people with neurodevelopmental conditions (Mikkelsen et al., 2018; Hui et al., 2021; Ip and Bridge, 2021).
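The trade-off between voxel size and acquisition time can be made concrete with a rough sketch. It assumes the idealised textbook scaling in which SNR is proportional to voxel volume multiplied by the square root of the number of averaged transients, with everything else (field strength, coil, linewidth, sequence) held equal. The function name and the example values are hypothetical.

```python
import math


def transients_needed(ref_volume_ml, ref_transients, new_volume_ml):
    """Transients required at a new voxel volume to match the reference SNR.

    Assumes SNR ~ volume * sqrt(transients), so the number of transients
    scales with the square of the volume ratio.
    """
    if ref_volume_ml <= 0 or new_volume_ml <= 0 or ref_transients <= 0:
        raise ValueError("volumes and transients must be positive")
    scale = (ref_volume_ml / new_volume_ml) ** 2
    return math.ceil(ref_transients * scale)


# Halving a hypothetical 27 ml voxel acquired with 128 transients
print(transients_needed(27.0, 128, 13.5))
```

Under this assumption, halving the voxel volume requires four times as many transients, and therefore four times the acquisition time at a fixed repetition time, to retain the same SNR.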
4. Quality assurance of MRS
Differences in fMRS parameters go hand-in-hand with quality assurance. There is no consensus on minimal best practice for fMRS to date. Currently available quality assurance and reporting metrics (MRS-Q and MRSinMRS) were designed for static MRS (Peek et al., 2020; Lin et al., 2021), and do not take into account functional approaches where the averaged number of transients is often lower to achieve better temporal resolution. Notably, many studies reported here used smaller voxel sizes compared to consensus recommendation (~27 ml for edited MRS for GABA, 3 T, and ~3.4 ml for unedited at 128 transients, 3T) (Lin et al., 2021). Voxel sizes smaller than the consensus standards were observed in particular for spectral editing of GABA, and findings may be less reliable due to insufficient SNR. Here, we used standard language for quality assessment (such as high or low quality), but should of course note that this language often refers to studies not reporting sufficient information. It is our hope that with the increasing consensus in reporting, this will become less of a concern. We should also note that some studies used approaches that are “low quality” compared to the current consensus, and these need to be seen in a historical perspective. Despite several studies reporting inadequate fMRS parameters, our sensitivity analysis based on study quality shows no extreme changes from analyses including all studies. While there is room for improvement in the reporting of fMRS, most of the studies used adequate fMRS scan parameters. It is possible that the number of transients is less important when modelling time-course data and using within-participant designs. Establishing minimum reporting standards at this early stage would greatly increase reproducibility in a field that offers an almost unlimited number of data analysis strategies.
5. Sources of bias
As discussed in the previous sections, there are various sources of bias in fMRS study design, acquisition, and analysis parameters (i.e., brain area, voxel size, number of transients, and metabolite unit, e.g., percentage change or mean concentration). Study quality assessments further suggest that fMRS studies lack randomisation and blinding of participants. Additional risk of bias could arise from the selection of participants; for example, studies often use colleagues as participants. fMRS studies such as stress or pharmacological designs often use a pre-post within-participant design, which introduces bias into the analysis (Ma et al., 2020) and potentially leads to reporting of positive results (publication bias) (Rosenthal, 1979; Murphy and Aguinis, 2019). Additional sources of bias were beyond the scope of this meta-analysis. These include the general experimental design, such as population sampling and the type of baseline condition (e.g., a visual baseline of eyes closed versus a fixation cross) (Ip et al., 2019; Ip and Bridge, 2021); the choice of analysis approach, including differences between spectral modelling algorithms (Zöllner et al., 2021, 2022; Craven et al., 2022; Marjańska et al., 2022); quantification and referencing (metabolite in institutional units [IU], absolute concentration, or ratio to creatine [Cr], etc.) (Porges et al., 2017); and how results are reported (e.g., reported only in percentage change but not in concentration). In particular, the choice of quantification reference compound might have a strong impact, although it is assumed that typical reference compounds such as Cr or NAA are unlikely to change with stimulation (Wilson et al., 2019). One important parameter that needs further investigation and consensus treatment is linewidth adjustment based on the BOLD signal.
Because haemoglobin is paramagnetic when deoxygenated but diamagnetic when oxygenated, local magnetic susceptibility depends on the blood-oxygen level. BOLD activation narrows MRS lines and increases signal magnitude, which can lead to overestimation of metabolite levels if uncorrected (Zhu and Chen, 2001; Ip et al., 2017). In the present study, we did not have sufficient information to perform an analysis on these topics.
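To illustrate why BOLD-related line narrowing matters, the sketch below computes how the peak height of a Lorentzian line changes when its linewidth changes while its area (the quantity proportional to concentration) stays fixed. The function names and the example linewidths are hypothetical and chosen only for illustration; real spectra are fitted with full lineshape models, so the size of any bias depends on the fitting method.

```python
import math


def lorentzian_peak_height(area, fwhm_hz):
    """Peak height of a Lorentzian line with the given area and FWHM (Hz)."""
    if fwhm_hz <= 0:
        raise ValueError("fwhm_hz must be positive")
    return 2.0 * area / (math.pi * fwhm_hz)


def apparent_change_percent(fwhm_rest_hz, fwhm_active_hz):
    """Percent change in peak height caused by a linewidth change alone.

    The area is held at 1.0 in both conditions, i.e. the underlying
    metabolite concentration is assumed not to change.
    """
    rest = lorentzian_peak_height(1.0, fwhm_rest_hz)
    active = lorentzian_peak_height(1.0, fwhm_active_hz)
    return 100.0 * (active - rest) / rest


# Hypothetical linewidths: 8.0 Hz at rest, 7.5 Hz during stimulation
print(round(apparent_change_percent(8.0, 7.5), 2))  # 6.67
```

In this hypothetical case, a narrowing of 0.5 Hz raises the peak height by several percent with no change in concentration, which is comparable in size to many reported fMRS effects.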
As shown in the results, the type of data reported (mean or %change) influences the effect sizes observed. Most fMRS studies reported results as %change, followed by ratio to a reference molecule (e.g., tCr, NAA). Our meta-analysis avoids the secondary calculation of data by analysing data as presented. For the sake of transparency and to understand these impacts, we suggest that future studies report data both as a comparative result (e.g., %change, change from baseline) and as mean metabolite concentration (e.g., ratio to reference metabolite, mmol/kg, institutional units). These results often support and strengthen each other and increase comparability between studies. Future studies could investigate the Glu and GABA ratio as a theoretical index of E/I balance, although the exact relationship between MRS-derived Glu/GABA concentration and E/I balance is still under active debate (Steel et al., 2020; Rideaux, 2021).
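As a minimal sketch of reporting both forms from the same measurements, the example below derives a percentage change and a standardized mean difference (Hedges' g) from condition means and standard deviations. This is one common formulation and is not necessarily the calculation used in this meta-analysis. The function names and example values are hypothetical, and the two conditions are treated as independent groups, which is a simplification for within-participant designs (those also require the correlation between conditions).

```python
import math


def percent_change(baseline_mean, active_mean):
    """Comparative result: change from baseline in percent."""
    return 100.0 * (active_mean - baseline_mean) / baseline_mean


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (b minus a) with small-sample correction.

    Treats the two conditions as independent groups (an assumption).
    """
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_b - mean_a) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)  # Hedges' approximation
    return d * correction


# Hypothetical metabolite levels (arbitrary units), 12 participants each
print(round(percent_change(10.0, 10.4), 2))                   # 4.0
print(round(hedges_g(10.0, 1.0, 12, 10.4, 1.0, 12), 2))       # 0.39
```

The same 4% change would give a very different standardized effect if the between-participant standard deviation were larger or smaller, which is one reason reporting both the comparative and the mean values aids comparison across studies.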
6. Limitations of the current meta-analysis
Several limitations of this meta-analysis need to be acknowledged when interpreting the current work. While the data were considered based on data type (mean and %change), we had to assume that the units included were on the same scale, and that reference metabolite concentrations (e.g., creatine or NAA) remained relatively unchanged (Steen et al., 2005; Rae, 2014). Another potential limitation is that we included all studies in our main meta-analysis regardless of study quality as assessed by both ROBANS and MRS-Q. The sensitivity analysis of high-quality studies according to MRS-Q suggested that the included studies showed similar results regardless of quality, and we recognize that the MRS-Q, while useful, does not fully apply to functional MRS. While we aimed to comprehensively include all fMRS studies to date, our search strategy may have missed more recent work, such as Ip et al. (2019); we did not identify this paper through other means. Lastly, there was a lack of statistical power for some stimulus domains and fMRS paradigms due to the small number of studies included. Taken together, fMRS is a field with enormous possibility, but with several sources of bias and variability that need to be addressed.
In the current study, we employed the RVE method to synthesize effect sizes for each stimulus domain from the multiple outcomes available (i.e., multiple within-study outcomes). While some significant changes from baseline were noted in some stimulus domains, these often came from single-study meta-analyses, in which several datasets from various timepoints (3-9 per study) were included for a given domain. These results therefore need to be interpreted with care, as they may reflect the bias of a single study.
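The general idea of pooling several dependent outcomes per study with a cluster-robust variance can be sketched as follows. This is a simplified, intercept-only illustration under stated assumptions and not the analysis code used in this work: the function name is hypothetical, the between-study variance `tau_sq` is supplied rather than estimated, and no small-sample correction is applied.

```python
from collections import defaultdict


def rve_pooled_effect(effects, variances, study_ids, tau_sq=0.0):
    """Pooled effect with a cluster-robust (sandwich) standard error.

    Each of the k_j outcomes in study j receives the weight
    1 / (k_j * (mean sampling variance of study j + tau_sq)),
    so a study does not gain influence simply by reporting more timepoints.
    Returns (pooled effect, robust standard error, number of studies).
    """
    by_study = defaultdict(list)
    for y, v, s in zip(effects, variances, study_ids):
        by_study[s].append((y, v))

    weighted = []  # (study id, effect, weight)
    for s, rows in by_study.items():
        k = len(rows)
        mean_v = sum(v for _, v in rows) / k
        w = 1.0 / (k * (mean_v + tau_sq))
        weighted.extend((s, y, w) for y, _ in rows)

    total_w = sum(w for _, _, w in weighted)
    pooled = sum(w * y for _, y, w in weighted) / total_w

    # Sandwich variance: squared weighted residual sums, one per study
    resid_sums = defaultdict(float)
    for s, y, w in weighted:
        resid_sums[s] += w * (y - pooled)
    robust_var = sum(r**2 for r in resid_sums.values()) / total_w**2
    return pooled, robust_var**0.5, len(by_study)


# Hypothetical data: two outcomes from study "A", one from study "B"
pooled, se, n_studies = rve_pooled_effect(
    [0.2, 0.4, 0.6], [0.04, 0.04, 0.04], ["A", "A", "B"]
)
print(round(pooled, 3), round(se, 3), n_studies)  # 0.45 0.106 2
```

In this simplified form, when all outcomes come from a single study the weighted residuals sum to zero and the robust standard error collapses towards zero, so the uncertainty is not meaningfully estimated. This is consistent with the caution above about single-study results.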
7. Conclusion
We established effect sizes and directionality of the GABA, Glx and Glu response in all currently available fMRS studies. Our results demonstrated relatively small effect sizes and large heterogeneity, which limit the current use of fMRS as a technique for investigating neurodynamic responses in the healthy brain. However, we attempted to address these limitations, and we hope that advances in these approaches hold promise for application to atypical brain function. fMRS of clinical conditions is surprisingly under-studied, but holds promise for understanding a dynamic system, with potential implications for drug response and diagnosis. As such, fMRS holds great potential to be used alongside other techniques that perturb GABA and Glutamate mechanisms, including TMS and pharmacological challenges, and to assess the impact on the system in both typical and atypical brains. Furthermore, combining fMRS with other imaging techniques, such as EEG or fMRI, allows (f)MRS measures to be linked to distinct neural mechanisms associated with E/I balance.
This meta-analysis highlights the urgent need for consensus on standardised reporting and minimal best practices to improve the reproducibility of fMRS. Additionally, there remains a lack of fundamental knowledge of fMRS, for example, with respect to metabolic time courses. Establishing fMRS paradigms and parameters that evoke metabolic responses with high reliability and reproducibility would be of great interest at this early stage of the field, as it would allow atypical responses to be measured more readily and ultimately lead to elucidation of the underlying mechanisms of brain function in both health and disease.
Acknowledgments
DP is funded by a Chang Phueak Scholarship, Chiang Mai University, Chiang Mai, Thailand. GO receives salary support from R00 AG062230 and R21 EB033516.
Footnotes
Conflict of interest statement: The authors declare no competing financial interests.