Linewidth-related bias in modelled concentration estimates from GABA-edited 1H-MRS

J-difference-edited MRS is widely used to study GABA in the human brain. Editing for low-concentration target molecules (such as GABA) typically exhibits lower signal-to-noise ratio (SNR) than conventional non-edited MRS, varying with acquisition region, volume and duration. Moreover, spectral lineshape may be influenced by age-, pathology-, or brain-region-specific effects of metabolite T2, or by task-related blood-oxygen level dependent (BOLD) changes in functional MRS contexts. Differences in both SNR and lineshape may have systematic effects on concentration estimates derived from spectral modelling. The present study characterises the impact of lineshape and SNR on GABA+ estimates from different modelling algorithms: FSL-MRS, Gannet, LCModel, Osprey, spant and Tarquin. Publicly available multi-site GABA-edited data (222 healthy subjects from 20 sites; conventional MEGA-PRESS editing; TE = 68 ms) were pre-processed with a standardised pipeline, then filtered to apply controlled levels of Lorentzian and Gaussian linebroadening and SNR reduction. Increased Lorentzian linewidth was associated with a 2–5% decrease in GABA+ estimates per Hz, observed consistently (albeit to varying degrees) across datasets and most algorithms. Weaker, often opposing effects were observed for Gaussian linebroadening. Variations are likely caused by differing baseline parametrization and lineshape constraints between models. Effects of linewidth on other metabolites (e.g., Glx and tCr) varied, suggesting that a linewidth confound may persist after scaling to an internal reference. These findings indicate a potentially significant confound for studies where linewidth may differ systematically between groups or experimental conditions, e.g. due to T2 differences between brain regions, age, or pathology, or varying T2* due to BOLD-related changes. We conclude that linewidth effects need to be rigorously considered during experimental design and data processing, for example by incorporating linewidth into statistical analysis of modelling outcomes or development of appropriate lineshape matching algorithms.


D.2
Outcomes from fits to synthetic data ...

A MRSinMRS checklist
Site (name or number) [GPS](1-10) ) were used to transition first from the signal to the smooth spline fit in the last 0.05 ppm, and then from the smooth spline fit to zero in the subsequent 0.10 ppm beyond the fit range.

D.2 Outcomes from fits to synthetic data
Supplementary Figure 8 Synthetic spectra and fits by various algorithms FIT

E.1 Background and Methods
The default fitting algorithm in spant is ABfit.A key component of this algorithm is the adaptive baseline model, in which the degree of baseline flexibility is automatically adjusted to match the complexity of the data 1 .As with other algorithms, this is inevitably a balancing act between a baseline that is too smooth, and one that is too flexible -with either scenario potentially leading to bias or increased variance in the fit outcome.Most preexisting literature and documentation relating to spant at the time of this analysis focussed on non-edited spectra, such as may be obtained by short TE PRESS sequence.Early piloting suggested that default parameters suitable for such sequences may have allowed for too much baseline flexibility for GABA-edited datasets -as such, several different configurations of spant were applied to the in-vivo datasets to assess their relative suitability for GABAedited spectra.
There are a number of options available for controlling ABfit's baseline modelling; two of these are aic_smoothing_factor and bl_ed_ppm: • aic_smoothing_factor affects the criterion used for automatic estimation, with smaller values encouraging more flexible baselines, while remaining adaptive to any broad spectral features.Five different options for aic_smoothing_factor were assessed: 5 (default), 10, 20, 40 and 80, specifying increasingly firm baselines; these are subsequently denoted A05, A10, A20, A40 and A80 respectively.
• bl_ed_pppm specifies the baseline flexibility, in units of ED per ppm where ED is the "effective dimension" -analogous to the number of spline functions required per ppm; this is automatically determined except when auto_bl_flex (automatic baseline flexibility) is explicitly disabled.Three fixed values for bl_ed_pppm were tested (with auto_bl_flex disabled): 1.00, 1.67 and 2.00 ED/ppm, denoted F10, F17 and F20.
1 These parameters are detailed at https://martin3141.github.io/spant/articles/abfit-baseline-opts.html This analysis was performed using all in-vivo datasets from the main study, with processing and quality control as described in the main manuscript -including spectra with Lorentzian linebroadening, but not including those of experimentally degraded SNR.
To assess the relative suitability of each configuration, the determined baseline flexibility (bl_ed_pppm) and resultant signal-to-residual ratio (SRR) were inspected.
Furthermore, GABA+ estimates were assessed for accuracy and consistency with other algorithms methods 2 .Exploiting the different expected GABA concentrations between grey and white matter [3][4][5] , the degree of correlation between GABA estimates and voxel tissue fraction was used as an index of the accuracy of GABA estimation 6 .Robust (skipped) 7,8 Spearman correlation coefficients between voxel grey matter fraction and water-scaled GABA estimate without adjustment for tissue content (GABA+/H2OnoTC) were reported.
Finally, intra-class correlation coefficients (ICCs) were calculated between each spant variant and the median outcome from all other algorithms, using a two-way mixed-effects model for single-rater consistency (ICC(3,1), implemented in pingouin 9 ).

E.2 Results and Discussion
Outcomes for each of the tested configurations are presented in Supplementary Table 7.Using automatic baseline flexibility (Ann), the determined bl_ed_pppm factor was seen to clip at either edge of the allowable range (.502, 7.00) in a substantial proportion of cases; with A05 case (aic_smoothing_factor=5), the majority of estimates clipped to the predefined upper limit; in the A80 case, the majority clipped to the lower limit.The reported GABA.sd metric is suspiciously low in several cases (particularly with lower bl_ed_pppm values); as such, this metric is not considered further herein, but may warrant further investigation elsewhere.Signal-to-residual ratio (SRR) was generally higher in the cases with greater baseline flexibility, in particular A05.This is to be expected: more flexible baselines are more able to absorb signal fluctuations which would otherwise remain in the residuals.
The intraclass correlation coefficient (ICC), indicating agreement with other algorithms, was highest in the fixed bl_ed_pppm cases (Fnn), particularly F17 (bl_ed_pppm=1.67); this configuration is roughly equivalent to the 0.6 ppm baseline knot spacing adopted in both LCModel and Osprey, hence it is unsurprising that greatest agreement is seen in this case, where the baseline parameterisation is most similar.Using correlation between GABA+/H2OnoTC and tissue grey matter (GMcorr) as an indication of measurement accuracy, although subtle variation was seen, none of the configurations was found to be significantly more accurate than any other.The impact of line broadening, as assessed with the LLWF (see main manuscript section 2.5), was substantially reduced in magnitude as baseline flexibility was reduced.Supplementary Table 7 Selected fitting performance metrics for spant, using a variety of baseline configurations.Values are reported as median with [10,90]-percentiles or {95% confidence interval}.

Supplementary Figure 10 Relative difference in GABA+ concentration reported by spant in different configurations, in relation to applied line broadening
Although certain configurations displayed better performance for individual metrics, or for individual sites' datasets, no single configuration was found to clearly outperform the others across a range of quality metrics in the general case.For settings where linewidth differences are not expected to be an issue, the default configuration (A05) remains a reasonable choice; in cases where linewidth changes may be of concern, a more firm baseline model should be adopted.
In the present study, configuration A20 was adopted (aic_smoothing_factor=20) for the main analysis: this configuration is seen to allow the Abfit algorithm to operate over its configured range (rather than clipping significantly at either limit), has a 'typical' baseline resolution similar to that of other methods, and gives moderately good outcomes across a range of quality metrics without optimising for any particular factor.

F.1 Background and Methods
FSL-MRS exhibited two noteworthy characteristics in the main analysis: • For the filtered in-vivo datasets, fit outcomes showed notably lower linewidth dependency (for Lorentzian linebroadening) than for any of the other configurations assessed.
• Somewhat greater variance was seen for datasets of extremely low linewidth (particularly apparent in the simulated datasets) To investigate the first and mitigate the second, several alternative configurations of FSL-MRS were assessed: • Polynomial baseline order (--baseline_order) of -1 (no baseline), 1 (linear) and 2-7 (nth order polynomials) were assessed • Different optimization algorithms (--algo) were evaluated: Truncated Newton ("Newton") and full Metropolis Hastings ("MH").In typical usage, the latter is initialised internally by the substantially faster Newton algorithm.
• Modelling without and with a separate metabolite group for NAA and NAAG (--metab_groups NAA+NAAG), to allow shift and broadening of those metabolites to be optimised separately from all other components (hence allowing a slight frequency shift to mitigate the observed instability at very low linewidths) • Comparing the --lorentzian lineshape model against the default Voigt lineshape model: the former may be a closer match to the line broadening applied in the present analysis, whereas the latter is expected to more fully reflect the different broadening factors present in in-vivo data.
As with the spant baseline exploration, the same in-vivo datasets, processing and quality control as described in the main manuscript were used -including spectra with Lorentzian linebroadening spectra, but not including those of experimentally degraded SNR.

F.2 Results and Discussion
Lower-order baseline models (below 5 th order) exhibit the lowest linewidth dependence for moderate line-broadening factors, up to about 5 Hz; slightly increased linewidth dependence was seen with higher-order baseline models.With higher order baselines ascribing progressively more signal to baseline, the default 2 nd order baseline remains a reasonable choice for scenarios where moderate linewidth differences may be encountered.Adopting a pure Lorentzian lineshape model resulted in only marginal differences, with a tendency towards fractionally higher estimates and perhaps a slightly reduced linewidth dependence for moderate linewidths, in cases where a baseline was modelled (not statistically significant).

Supplementary Figure 11 Relative difference in GABA+ concentration reported by FSL-MRS with differing baseline configurations and default Lorentzian + Gaussian linewidth model, in relation to applied line broadening
G Exploratory Analysis #3: LCModel baseline knot spacing

G.1 Background and Methods
To assess the impact of baseline flexibility on linewidth dependence in LCModel, spectra were fit using a range of different baseline configurations: no baseline (NOBASE=T), and spline knot spacings from 0.1 to 4.0 ppm (in steps of 0.1 ppm between 0.1 to 1.2 ppm, steps of 0.2 ppm between 1.2 and 2.0 ppm, and in steps of 0.5 ppm from 2.0 up to 4.0 ppm).
As previously, in-vivo data sets were fit after processing as described in the main manuscript, including spectra with Lorentzian linebroadening but not including those of experimentally degraded SNR.

G.2 Results and Discussion
GABA+ estimates obtained when no baseline was modelled were significantly elevated relative to those obtained when also using a baseline model.Unsurprisingly, excessively flexible baseline models (< 0.4 ppm knot spacing) resulted in significantly stronger linewidth-dependent biases.Firmer baseline models (>0.4 ppm) led to lower linewidth-dependent bias, particularly in the 0.5 -1.0 ppm knot spacing range; this is compatible with previous recommendations for knot spacing The default Gannet processing pipeline applies zero-filling to 0.062 Hz spectral resolution, and line-broadening of 3 Hz.Both these steps were omitted in the present study.

H.1.3 LCModel
LCModel 13 was invoked with the following parameters for modelling GABA-edited difference spectra, to achieve a relatively stiff but non-zero cubic spline baseline model: • SPTYPE='mega-press-3' sets the "special type" appropriately for MEGA-PRESS.This implies a flat baseline (NOBASE=T), the use of NAA @ 2.01 ppm as a reference for basis set scaling (wsmet='NAA', wsppm=2.01),and default correction factor for attenuation of NMR-visible water, ATTH2O = 0.43 (suitable for TE = 68 ms).
• NOBASE=F to re-enable baseline modelling (which is disabled by default for MEGA-PRESS when using SPTYPE=mega-press-3), • DKNTMN=0.6 to set the baseline knot spacing to 0.6 ppm (the default value is 0.15).This value was varied in the exploratory analysis, Supplementary Section G. NAA was used as a reference for basis set scaling.Appropriate TE, sweep width and centre frequency (echo, fs, ft) were specified on the command line.Automatic phasing and referencing (auto_phase and auto_ref) were disabled.The start point for analysis of the difference spectra (start_pnt) was set at 5 ms (calculated as 0.005 times the sample frequency).Eddy-current correction was disabled (--water_eddy false).HSVD was performed (per defaults) to remove residual water.
Typical parameters for Tarquin, modelling GABA-edited difference spectra:

H.2.2 Gannet
In addition to the GABA+Glx model, Gannet independently fits a Lorentzian model to NAA around 2.01 ± 0.04 ppm, and a dual-Lorentzian model for Choline and Creatine, the latter centred at 3.02 ppm with a fixed 0.18 ppm separation, all from the edit-OFF subspectrum.A minor local modification was made to additionally yield water-referenced estimates from the existing Cr and NAA fits (usually only reported for those metabolites contained in the difference spectrum).

H.2.3 LCModel
Basic configuration was similar to that used for the difference spectra, with two key exceptions.PPMGAP was not defined -which implies fitting over the full defined fit range (0.2 -4.2 ppm), the default mode of operation.SPTYPE and DKNTMN were also not defined, again implying a default fit: a regular spline baseline was modelled (with default knot spacing 0.15 ppm), Creatine was used for internal basis set scaling, and LCModel's default set of soft constraints were adopted.

H.2.4 Osprey
Edit-OFF sub-spectra were modelled with default parameters, in the same invocation of OspreyFit as for the difference spectra.

H.2.6 Tarquin
Edit-OFF sub-spectra were quantified with HSVD water removal disabled (--water_width 0) but all other parameters equivalent to those used for the difference spectra.

Table 1 :
1RSinMRS checklist1summarising key details of the MRS acquisition See sup.table 2 for spectral width and number of points.15 ms editing pulses at 1.9 ppm (ON) and 7.46 ppm (OFF) g.Water suppression method See sub.table 2 h.Shimming method, reference peak, and thresholds for "acceptance of shim" chosen See sup.table 2; acceptance criteria varied across sites i. Triggering or motion correction method (respiratory, peripheral, cardiac triggering, incl.device
C Supplementary Details: In-vivo dataSupplementary Figure1Group-average fit by each algorithm, to original (unfiltered) data, low SNR and high linewidth
Supplementary Table5Groupwise statistics for GABA+/H2O LLWF (see section3.1.2)Furtherto details in the main manuscript, section 2.3: Background signal derived by this approach is only meaningful on the original fit range (0.2-4.2 ppm); this has implications for the measurement of SNR, which often uses a region well beyond the normal fit range to assess the noise factor.Moreover, there is a risk of discontinuities (hence potential numeric instability) if the background function is not close to zero at the edge of the fit range.To avoid such discontinuities, a smooth spline was fit to the last 0.05 ppm at either end of the fit range, and extrapolated 0.10 ppm beyond the fit range.A pair of Hanning functions Supplementary Table4Variance Partition Coefficients (VPCs) for GABA+ estimated from filtered spectra at specified SNR/Line Broadening levels; all VPCs quoted as percentages.
Gannet 12 models data by fitting peaks in the frequency domain.Version 3.1 was used, with the 'GABA+Glx' model applied to the difference spectrum between 2.79 and 4.1 ppm.This model fits the GABA+ peak with a single Gaussian peak around 3.02 ± 0.05 ppm and Glx as a pair of Gaussian peaks at 3.71 ± 0.02 ppm and 3.79 ± 0.02 ppm, and includes terms to characterize the baseline.The Gannet baseline is modelled as a linear slope with sinusoid and cosine terms, periodic at 2.62 ppm (i.e., A * (f -f0) + B * sin(  * f / 1.31 / 4 ) + C * cos(  * f / 1.31 / 4 ) where f is the frequency (in ppm) and f0 the offset to the first modelled Glx peak (±3.71 ppm).
A local build of Tarquin 15,16 v4.3.11 was used; processed data and per-vendor basis sets were supplied to Tarquin in the corresponding LCModel file formats.
141.4 OspreyOsprey14implements a frequency domain basis set fit; version 1.0.1.1 was used, with default fit and reconstruction settings: separate fit on the range 0.2-4.2ppm, incorporating macromolecule and lipid components in the basis set but without adding a co-edited MM3 peak. Knt spacing for the spline baseline was set at 0.6 ppm.Although Osprey implements a number of soft constraint models, in this instance the MM3co basis set component was incorporated without additional constraints on amplitude.