Magnetic Resonance Spectroscopy Spectral Registration Using Deep Learning

Deep learning‐based methods have been successfully applied to MRI image registration. However, there is a lack of deep learning‐based registration methods for magnetic resonance spectroscopy (MRS) spectral registration (SR).

Gamma-aminobutyric acid (GABA) is the primary inhibitory neurotransmitter in the human brain, but its concentration is very difficult to quantify because the overlapping metabolite creatine (Cr) is present in much greater concentrations.2,6-9 MEGA-PRESS is a J-difference editing (JDE) pulse sequence that separates overlapping metabolites from each other. However, a major limitation of JDE pulse sequences is that they rely heavily on the subtraction of spectrally edited "On Spectra" and non-edited "Off Spectra" to reveal the edited resonance in the "Diff Spectra."2 Because the overlapping resonances are an order of magnitude larger in intensity than the GABA resonance, small changes in scanner frequency and spectral phase will lead to incomplete subtraction in the edited spectrum.10 Small changes in scanner frequency can arise from gradient-induced heating of passive shim elements and long time-constant eddy currents, while small changes in spectral phase can arise from respiratory-induced magnetic field drifts.2,7,11 The standard approach in GABA editing is to apply frequency and phase drift correction of individual frequency-domain transients by fitting the Cr signal at 3 ppm.6,9 The major limitation of the Cr fitting-based correction method, however, is that it relies strongly on sufficient signal-to-noise ratio (SNR) of the Cr signal in the spectrum.4,5 However, the correction accuracy largely depends on the overall spectral SNR, where low SNR (e.g., 2.5) will deteriorate the performance as the signal becomes dominated by noise.7 Furthermore, due to increasingly demanding medical needs, it is crucial to develop a more robust, fast, and highly accurate registration technique.
To address incomplete spectral subtraction and low registration efficiency, deep learning, a popular technique for complex computational challenges, has been effective in facilitating medical image registration.12,13 A deep learning-based registration method uses a template image during training and optimizes a loss function to register each image. This approach avoids the time-consuming, computationally expensive per-image optimization once a deep learning image registration model has been trained. A multilayer perceptron (MLP) model and a convolutional neural network (CNN) model have recently been applied to single-transient sequential frequency-and-phase correction (FPC) for edited MRS.10,14 Both of these models (MLP-FPC and CNN-FPC) demonstrated the high potential of deep learning in MRS data preprocessing by pretraining models with simulated datasets spanning wide ranges of frequency and phase offsets.10,14 Although both models yielded low frequency and phase offset prediction errors, their utility in SR was limited because the models must be trained separately for frequency and phase offset prediction and used separately to perform FPC.10,14 A limitation of this training is that the subtraction errors caused by phase and frequency errors appear similar but require different corrections. If the error is misdiagnosed, an improper correction will be applied, which may degrade the quality of spectral subtraction. Therefore, a more efficient network that mimics the simultaneous FPC of spectral editing techniques could perform FPC of the given data more accurately. The CNN-SR model is designed to correct frequency and phase offsets at the same time while retaining the CNN's ability to exploit spatial and temporal invariance in recognizing features such as the overall shape of the signal and its peaks.
Thus, this study investigated the feasibility and utility of CNNs for SR of single-voxel MEGA-PRESS MRS data. Given an SR algorithm and CNN properties, we hypothesized that CNN-SR can achieve higher FPC accuracy than other approaches.

Materials and Methods
This study was approved by the local Institutional Review Board and the requirement for written informed consent was waived due to the retrospective study design.

Dataset
SIMULATED DATASET. The main challenge for deep learning is to determine the inputs and ground truth for network training in order to achieve a specific performance goal. Since there is no ground truth of frequency and phase offsets for the in vivo dataset, in this work, the MEGA-PRESS training, validation, and test transients were simulated using the FID Appliance (FID-A) toolbox (version 1.2) in Matlab R2021a (The Mathworks, Natick, Massachusetts, USA) with the same parameters (15-msec sinc-Gaussian editing pulses with FWHM = 88.5 Hz, 2048 data points sampled at 2 kHz spectral width) as described in previous work.10,14 The training set was allocated 32,000 OFF + ON spectra (2048 points each), with 4000 each for the validation and test sets. The models were also tested using datasets with added random Gaussian noise at SNR 20, and further challenged with lower SNR 2.5 and line broadening (0-20 msec). The SNR values were computed as the ratio of the Cr peak signal to the noise standard deviation.
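Under this definition, the SNR computation can be sketched as follows; the exact Cr-peak and noise windows (in ppm) are illustrative assumptions, not values taken from the study:

```python
import numpy as np

def compute_snr(spectrum, ppm, cr_range=(2.9, 3.1), noise_range=(-2.0, 0.0)):
    """SNR as the Cr peak height divided by the noise standard deviation.

    cr_range and noise_range (in ppm) are illustrative choices; the
    source does not specify the exact windows used.
    """
    cr_mask = (ppm >= cr_range[0]) & (ppm <= cr_range[1])
    noise_mask = (ppm >= noise_range[0]) & (ppm <= noise_range[1])
    cr_peak = np.max(np.real(spectrum[cr_mask]))        # Cr peak height at ~3 ppm
    noise_std = np.std(np.real(spectrum[noise_mask]))   # noise from a signal-free region
    return cr_peak / noise_std
```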

IN VIVO DATASETS.
In vivo data were retrieved from the publicly available Big GABA repository.15 All 101 medial parietal lobe MEGA-edited datasets from nine sites with Philips (Philips Healthcare, Best, The Netherlands) scanners (3 T field strength, TE = 68 msec, 2048 data points sampled at 2 kHz spectral width, 320 transients) were collected, where each dataset contained 320 OFF + ON transients. The in vivo cohort met the following criteria: 18-35 years old; approximately 50:50 female/male split. The models were also evaluated on this in vivo dataset with additional offsets (small, medium, large) introduced.

Network Architecture
Both supervised and unsupervised learning losses were incorporated in the proposed CNN-SR model, which was trained on the simulation dataset to optimize the network parameters (Fig. 1a). Given the nature of the CNN-SR model, the network parameters can also be further fine-tuned using unsupervised learning after initial training. Further fine-tuning on the simulation dataset under a more extreme condition (SNR 2.5 and line broadening) and on the in vivo MEGA-edited dataset was performed to demonstrate the further utility of the model's SR framework.
The structure of the network (Fig. 1b) was a sequential network, which took moving spectra and template spectra as inputs and predicted frequency and phase offsets at the same time. Both moving spectra and template spectra were processed to have a length of 1024 and were concatenated to form a single 2048-point input array. The network started with four successive blocks, each consisting of a one-dimensional convolutional layer followed by a batch-normalization layer and a one-dimensional max-pooling layer. The convolutional layers consisted of (in order) 2, 4, 8, and 16 kernels with a size of 128, and the max-pooling layers had a pool size of 2 with a stride of 2. Furthermore, three fully connected layers (FC) with 1024, 512, and 256 nodes were used, and a final fully connected linear output layer of 2 nodes was designed. Each hidden layer was followed by a rectified linear unit (ReLU) activation function to introduce non-linearity. An Adam optimizer was used to train the neural network with a 0.0001 learning rate.16 The output from the network was the predicted frequency and phase offsets. Each model was trained for 1000 epochs with a batch size of 320, and the mean absolute error (MAE) was used as the loss function. The model was designed to stop training if the lowest validation loss did not improve for 50 consecutive epochs.
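A minimal PyTorch sketch of the described architecture follows. The channel counts (2, 4, 8, 16), pooling, FC sizes, and 2-node output come from the text; the "same" padding is an assumption made so the flattened feature size matches the first FC layer, and the kernel size is left as a parameter (the main text reports 128, while the Fig. 1 caption reports 3):

```python
import torch
import torch.nn as nn

class CNNSR(nn.Module):
    """Sketch of CNN-SR: four Conv1d -> BatchNorm -> ReLU -> MaxPool blocks,
    then FC layers of 1024, 512, and 256 nodes and a 2-node linear output
    (predicted frequency and phase offsets)."""

    def __init__(self, kernel_size=3):  # kernel size is an assumption; see lead-in
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (2, 4, 8, 16):
            blocks += [
                nn.Conv1d(in_ch, out_ch, kernel_size, padding="same"),
                nn.BatchNorm1d(out_ch),
                nn.ReLU(),
                nn.MaxPool1d(kernel_size=2, stride=2),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        # input length 2048 halved four times -> 128 samples x 16 channels
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 128, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 2),  # [delta_f, delta_phi]
        )

    def forward(self, x):  # x: (batch, 1, 2048) = moving ++ template spectra
        return self.head(self.features(x))
```

Training would then pair this with an Adam optimizer (learning rate 0.0001), a batch size of 320, and MAE loss, as described above.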

Network Training
MAE loss was used to compute the differences between the predicted and true offsets and spectra. At the training stage, the model's loss function consists of two parts, a supervised loss and an unsupervised loss, as in Fig. 1a. The supervised loss computes the difference between the predicted and true frequency and phase offsets. The unsupervised loss computes the difference between the template spectra and the registered real and imaginary spectra obtained from the predicted offsets. The two loss functions were combined into a semi-supervised loss function, with weights and normalization factors implemented to optimize the training. Given more extreme or in vivo datasets where ground truths are difficult to obtain, the model can cope with this challenge by fine-tuning pre-trained model parameters using purely the unsupervised component of the loss function. This allows the model to adapt to the specific dataset and improve performance toward optimal SR.
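A minimal sketch of such a semi-supervised objective, assuming simple scalar weights (the paper's exact weighting and normalization are not reproduced here):

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(pred_offsets, true_offsets,
                         registered_spectra, template_spectra,
                         w_sup=1.0, w_unsup=1.0):
    """MAE on the predicted (frequency, phase) offsets plus MAE between the
    registered and template spectra. Setting w_sup=0 leaves the purely
    unsupervised component used for fine-tuning on in vivo data."""
    supervised = F.l1_loss(pred_offsets, true_offsets)           # offset MAE
    unsupervised = F.l1_loss(registered_spectra, template_spectra)  # spectra MAE
    return w_sup * supervised + w_unsup * unsupervised
```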

Network Testing
Uniformly distributed artificial offsets on the scale of −20 to 20 Hz and −90° to 90° were first applied to the FIDs to derive a frequency drift and a phase drift in the simulated data. Line broadening was then added to the distorted FIDs, and a Fast Fourier Transform was applied.17 The peripheral 1024 samples of the spectra were then cropped off, and the central 1024 samples were selected and normalized to the maximum signal in the spectrum. Different levels of Gaussian-distributed noise were added to these moving spectra prior to inputting them into the network. Next, the same normalization and cropping processes were applied to the template spectra (spectra with no artificial offsets and no line broadening introduced), which were concatenated to the moving spectra to form an array of length 2048. The network predicted the frequency offset (Δf) and the phase offset (Δϕ), which were used to perform FPC and applied to the moving spectra to generate the registered spectra. This research was conducted with an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz processor and an NVIDIA GeForce RTX 2080 Ti GPU (NVIDIA, Sunnyvale, California, USA) with 11 GB of memory. Python v.3.9 and the deep learning Python library PyTorch v.2.0 were used for this study.
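The testing pipeline above can be sketched as follows. The exponential form and rate of the line broadening are assumptions, and the 5 × 10⁻⁴ s dwell time corresponds to the 2 kHz spectral width:

```python
import numpy as np

def make_moving_spectrum(fid, dwell_time=5e-4, rng=None):
    """Distort a simulated FID with random frequency/phase offsets and line
    broadening, then FFT, crop to the central 1024 points, and normalize.
    The exponential line-broadening form and its rate are assumptions."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(fid)
    t = np.arange(n) * dwell_time
    df = rng.uniform(-20.0, 20.0)                    # frequency offset, Hz
    dphi = np.deg2rad(rng.uniform(-90.0, 90.0))      # phase offset, degrees -> rad
    lb = rng.uniform(0.0, 20.0)                      # assumed exponential decay rate, 1/s
    distorted = fid * np.exp(1j * (2 * np.pi * df * t + dphi)) * np.exp(-lb * t)
    spectrum = np.fft.fftshift(np.fft.fft(distorted))
    central = spectrum[n // 2 - 512 : n // 2 + 512]  # drop the peripheral 1024 samples
    central = central / np.max(np.abs(central))      # normalize to the maximum signal
    return central, (df, np.rad2deg(dphi))
```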

EVALUATION AND COMPARISON USING THE IN VIVO DATASET. The MEGA-edited datasets were used as the test set of the CNN-SR network. For a first comparison with the performance of the CNN model, a published model-based SR (mSR), a non-deep learning approach, was used to perform FPC in the time domain.3 Specifically, mSR uses a noise-free model as the template instead of the median transient of the dataset. Noise-free ON and OFF FID models were created in Osprey (version 1.0.0), an open-source Matlab toolbox (The Mathworks, Natick, Massachusetts, USA), following previous preprocessing recommendations.15 The CNN-SR model was also compared to a benchmark neural network, MLP-FPC, an MLP containing 3 FC layers (1024, 512, and 1 node(s)), and to CNN-FPC, a CNN containing two convolutional blocks (a convolutional layer with 4 kernels of size 3 and a max-pooling layer with down-sampling size 2 and stride 2) and 3 FC layers (1024, 512, and 1 node(s)).10,14 In both of these networks, each hidden FC layer was followed by a ReLU activation function, and a linear activation function followed the output layer.
To examine the network in a more extreme environment, additional series of artificial offsets were added to the in vivo data.

Performance Measurement
In the simulated dataset, the artificial offsets were set as the ground truth, and the MAE between the ground truth and predicted value was used as the criterion to measure the network's performance. Moreover, the difference between the true spectra and the spectra corrected using mSR, MLP-FPC, CNN-FPC, and CNN-SR was calculated and plotted. A Q score was used to determine the relative performance of each pair of methods, defined as Q = σ₂²/(σ₁² + σ₂²), where σᵢ² is the variance of the choline subtraction artifact in the average difference spectrum of method i.14 A Q score greater than 0.5 indicated that the first method performed better than the second method, and vice versa. The computation time per transient of CNN-SR and mSR was also measured.
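A sketch of the Q-score comparison, assuming the form Q = σ₂²/(σ₁² + σ₂²), which is consistent with Q > 0.5 meaning the first method performs better (the exact choline ppm window is an illustrative assumption):

```python
import numpy as np

def q_score(diff_a, diff_b, ppm, cho_range=(3.16, 3.285)):
    """Q = var_B / (var_A + var_B) over the choline region of the average
    difference spectra: Q > 0.5 means method A left the smaller subtraction
    artifact. The ppm window is an illustrative assumption."""
    mask = (ppm >= cho_range[0]) & (ppm <= cho_range[1])
    var_a = np.var(np.real(np.asarray(diff_a)[mask]))
    var_b = np.var(np.real(np.asarray(diff_b)[mask]))
    return var_b / (var_a + var_b)
```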

Statistical Analysis
A two-tailed paired t-test was used to generate the P-value comparing CNN-SR's MAE to that of the other approaches (MLP-FPC and CNN-FPC) when testing on the simulated test set (SNR 20, and SNR 2.5 with line broadening). Statistical significance was determined for each comparison (CNN-SR vs. MLP-FPC and CNN-SR vs. CNN-FPC, at SNR 20 and at SNR 2.5 with line broadening). Moreover, a two-tailed paired t-test was used to compute the P-value for the variance of the choline interval to determine the statistical significance of CNN-SR relative to the other approaches (MLP-FPC, CNN-FPC, and mSR) on the in vivo dataset. A P-value <0.05 was considered statistically significant in both analyses.
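For reference, the paired t statistic underlying these comparisons can be sketched in pure NumPy; the P-value itself would be read from the t distribution with n−1 degrees of freedom (e.g., via scipy.stats.ttest_rel), and the arrays in the usage note are illustrative, not study data:

```python
import numpy as np

def paired_t(a, b):
    """Two-tailed paired t statistic on matched samples a and b.
    For n = 100 pairs (99 dof), |t| > ~1.98 corresponds to P < 0.05."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```

In practice, `a` and `b` would be the per-transient MAEs of two models on the same test set.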

Model Performance Evaluation and Spectra Analysis for the Simulated Datasets
The results of the MLP-based approach and the CNN-based approaches on the simulated test dataset with SNR of 20 and SNR of 2.5 with line broadening are illustrated in Figs. 2 and 3. The comparison of the FPC errors of the MLP-based approach and the CNN-based approaches for the On spectra, Off spectra, and On/Off mismatch of the simulated test set at varying SNRs is illustrated in Fig. 4.
For the test set with SNR 20, the CNN-based approaches showed significantly lower frequency estimation errors than the MLP-based approach, and the CNN-SR model showed the lowest phase estimation errors for the On spectra, Off spectra, and On/Off mismatch (Fig. 4a). Taking the Off spectra as an example, the mean frequency offset errors were 0.043 ± 0.039 Hz for the MLP-FPC model, 0.014 ± 0.012 Hz for the CNN-FPC model, and 0.014 ± 0.010 Hz for the CNN-SR model. The mean phase offset errors were 0.132 ± 0.116° for the MLP-FPC model, 0.141 ± 0.106° for the CNN-FPC model, and 0.104 ± 0.076° for the CNN-SR model.
With a lower SNR of 2.5 and random 0-20 msec line broadening introduced (Fig. 4b), the CNN-SR model demonstrated significantly lower frequency and phase estimation errors than the other models for the On spectra, Off spectra, and On/Off mismatch. For example, the mean frequency and phase offset estimation errors for the Off spectra were 4.715 ± 3.221 Hz and 22.063 ± 20.122° for the MLP-FPC model, 3.465 ± 3.126 Hz and 10.468 ± 8.931° for the CNN-FPC model, and 0.058 ± 0.050 Hz and 0.416 ± 0.317° for the CNN-SR model. The results in Figs. 2 and 3 show that, compared to the MLP-based approach, the CNN-based approaches had smaller errors within the frequency and phase ranges tested. At SNR of 20, the CNN-SR model performed better than the MLP-FPC model and the CNN-FPC model. When the SNR decreased to 2.5 and line broadening was applied, the CNN-SR model performed better than the MLP-FPC and CNN-FPC models, which had less stable predictions and larger errors. Additionally, by extracting the spectral intervals corresponding to GABA (2.8-3.2 ppm) and Glx (3.55-3.95 ppm) from the derived mean difference spectra (Fig. 4), the residual spectra errors were found to be lower with the CNN-SR model (at SNR of 20 and at SNR of 2.5 with line broadening) than with the MLP-FPC and CNN-FPC models. Consequently, the residual spectra errors of the CNN-based models over the full spectra were significantly lower than those of the MLP-based model for the On spectra, Off spectra, and On/Off mismatch at a lower SNR, indicating the CNN-based models' higher performance and robustness to noise relative to the MLP-FPC model. Among the CNN-based models, the CNN-SR model performed best in terms of frequency and phase estimation errors and noise tolerance (Table 1). All results were statistically significant.

Model Performance Evaluation and Spectra Analysis for the In Vivo Big GABA Datasets

Figure 5a,b illustrates the Off and Diff spectra resulting from the 101 in vivo Big GABA datasets without (column 1) or with (columns 2-4) additional artificial offsets, showing no correction and the corrections from each model. For medium offsets, the performance of the MLP-FPC model and the CNN-FPC model was comparable, but the CNN-SR model still performed better. The mean performance score of the CNN-FPC model against the MLP-FPC model was 0.47 ± 0.17 (Fig. 5c, column 3), while it was 0.60 ± 0.19 for the CNN-SR model against the MLP-FPC model (Fig. 5d, column 3), and 0.62 ± 0.18 for the CNN-SR model against the CNN-FPC model (Fig. 5e, column 3).
When large offsets were added, the performance of the CNN-FPC model was slightly better than that of the MLP-FPC model. The CNN-SR model still significantly outperformed the MLP-FPC and CNN-FPC models. The mean performance score of the CNN-FPC model against the MLP-FPC model was 0.53 ± 0.17 (Fig. 5c, column 4), while it was 0.68 ± 0.15 for the CNN-SR model against the MLP-FPC model (Fig. 5d, column 4), and 0.66 ± 0.15 for the CNN-SR model against the CNN-FPC model (Fig. 5e, column 4).
For small and medium offsets, the CNN-FPC corrected spectra and MLP-FPC corrected spectra (Fig. 5b, columns 2-3) were similar to the original spectra (Fig. 5b, column 1). However, for large offsets, the MLP-FPC corrected spectra (Fig. 5b, column 4) diverged slightly from the original spectra, while the CNN-FPC corrected spectra remained largely consistent in shape and size with the original spectra. By comparison, the CNN-SR corrected spectra were always consistent with the original spectra, regardless of the scale of the added offsets. The superior performance of the CNN-SR model was also indicated by the variances of the choline intervals for the 101 in vivo datasets (Fig. 6). With no offset or large offsets, the CNN-FPC model had lower choline interval variance than the MLP-FPC model, but with small or medium offsets, the MLP-FPC model had lower choline interval variance than the CNN-FPC model. The CNN-SR model, in contrast, had relatively stable performance, and its variance of the choline interval was significantly lower than that of both the MLP-FPC model and the CNN-FPC model at all offset levels. Moreover, the larger the offset, the more CNN-SR outperformed the MLP-FPC and CNN-FPC models. All the results were statistically significant (Tables 2 and 3). Comparing the best-performing CNN-SR model to the published non-deep learning approach, mSR exhibited the same performance pattern as the CNN-SR model, with a similar mean performance score of 0.49 ± 0.08 for no additional offsets (Fig. 7). The same conclusion was drawn with small and medium additional offsets, with similar mean performance scores of 0.48 ± 0.09 and 0.49 ± 0.07, respectively. They all had a similar level of variance of the choline interval at around 0.6 × 10⁻⁴, with no significant difference. The P-values for no added offsets, small offsets, and medium offsets were 0.63, 0.41, and 0.20, respectively. For input with large added offsets, CNN-SR demonstrated significant improvement compared with mSR (0.57 ± 0.17, P < 0.05), which indicated the robustness of CNN-SR to various input artifacts. The SR computation time per transient on the in vivo dataset was also analyzed: mSR had a processing time of 0.1475 s/transient while the CNN-SR model had a processing time of 0.0415 s/transient.

Discussion
The metabolic profile of both human and animal brains may be non-invasively and quantitatively measured using MRS. It is beneficial for research and clinical applications since it provides essential information on the metabolic state of the brain. However, the collected data can be affected by scanner instability introduced by factors such as frequency drift and subject motion. To accurately represent and measure metabolites, FPC through SR is a crucial preprocessing step that avoids unwanted spectral distortions that may bias metabolite quantification.
From the results, the CNN-SR model was more robust and performed better than the other sequential FPC deep learning methods (CNN-FPC and MLP-FPC) under all testing conditions in the simulated data. At SNR 20, the MLP-FPC model exhibited the largest correction errors for frequency offset, phase offset, and On/Off mismatch, followed by the CNN-FPC model, with both being outperformed by the CNN-SR model. Likewise, the CNN-SR model surpassed both sequential FPC deep learning methods when faced with more distorted data (SNR 2.5 with line broadening of 0-20 msec). These results suggest that, with its smaller MAEs for both frequency and phase offset predictions and for Diff spectra derivation, this approach may be more robust to noise and provide more accurate predictions.
Moreover, due to the CNN-SR model's unsupervised learning component, further fine-tuning of the model to specific data is possible. Given the nature of in vivo data, where ground truths are not available and the data are disturbed by multiple variables (e.g., noise, subject motion), an unsupervised learning SR approach can be used to further refine the model parameters. Using CNN-SR's unsupervised learning framework, the pre-trained model was further trained with more distorted data (SNR 2.5 with line broadening). The resulting smaller correction errors for both phase and frequency, and the smaller residual spectra, show how the CNN-SR model can outperform the other FPC models.
When tested on in vivo data with different phase and frequency offsets, the CNN-SR model once again demonstrated superior performance. The Off and Diff spectra were cleaner for this model across all testing conditions, with shapes and peaks better preserved. Furthermore, Q scores were consistently higher for the CNN-SR model in comparison to all the other FPC deep learning models. Nevertheless, these results remain equivalent to mSR, the state-of-the-art non-deep learning numerical correction method, with no, small, and medium additional offsets, but CNN-SR performed better when larger offsets were introduced.3 These findings may illustrate the value of deep learning for SR and evince the utility and strengths of simultaneous FPC within an SR model framework. The CNN-SR model performs simultaneous frequency and phase correction and, compared to the CNN-FPC and MLP-FPC models, may produce more reliable, robust, and accurate results in a shorter processing time and with higher computational efficiency. Additionally, the framework is capable of adopting an unsupervised learning approach. Contrary to other FPC models that require a ground truth, this approach can take advantage of the spectral loss to learn in an unsupervised manner. This is an advantage widely applicable to training and testing on in vivo data. Additionally, the performance of CNN-SR was comparable to mSR when small or medium offsets were present in this dataset, but given the advantages stated previously (shorter processing time and higher computational efficiency), CNN-SR surpasses the utility of mSR, especially when larger offsets are introduced. The results of this study revealed that, by employing unsupervised learning, the model can be fine-tuned to state-of-the-art performance for any given dataset.

Limitations
This study was conducted solely on spectra obtained from humans, but MRS is also widely used in animals, playing a noteworthy role in pre-clinical studies.6 Further exploration on animal data could be carried out in the future to validate the generalizability of the CNN-SR framework. Additionally, testing was conducted on in vivo data, leaving open the possibility of also considering other conditions such as in situ, ex vivo, and in vitro. Regarding JDE sequences, the model was tested on MEGA-PRESS, but other sequences such as PRESS, sLASER, or MEGA-sLASER could be considered in the future for training and evaluation. Furthermore, data from vendors such as General Electric and Siemens are publicly available, so it would be valuable to evaluate the model performance on other vendor datasets.
Similarly, magnetic field strengths other than 3 T (e.g., 7 T, 9.4 T, 11.7 T) are important variables to take into account in the future. In this study, only frequency and zero-order phase offsets were considered. However, other parameters, such as first-order phases, amplitude, and bandwidth variance across transients, could be examined in upcoming studies. Although CNN-SR outperformed the other deep learning approaches, it remains comparable to the state-of-the-art model (mSR) when datasets are not affected by large deformation parameters. However, given the advantages the deep learning approach provides when conducting SR, such as its high computational efficiency, quick processing time, and ability to generalize well to datasets from different modalities, this approach may offer more utility to users.
The model can also adapt to very small datasets and can be applied to the same dataset numerous times, demonstrating its ability to overcome potential issues such as a lack of resources. For future work, creating a model that surpasses mSR in all cases should be explored. One way to approach this could be to consider other backbone frameworks, such as transformers. Finally, it would be of great interest to demonstrate the model's clinical utility by processing MRS spectra from patients with neuropsychiatric disorders. Neuropsychiatric disorders have symptoms that tend to impact brain function, emotion, and mood; they can affect concentration and lead to mood and memory problems. Many of these disorders are difficult to diagnose and treat, so advancing diagnosis and treatment in this area is of high importance.

Conclusion
This study investigated a novel CNN framework for MRS spectral registration using both supervised and unsupervised learning. The proposed CNN approach for spectral registration shows better performance and can deliver results that are more robust to noise compared with state-of-the-art spectral registration algorithms in both simulated and in vivo datasets.

FIGURE 1: The pipeline for assessment, sample output, and network structure of the model. (a) Flow chart of the computation to determine the registered spectra, with details of the input and output of the network architecture. (b) The network architecture of the CNN-SR model. Both the frequency and phase offsets were predicted with the proposed model, where the input is the concatenation of the moving spectra and template spectra. The network architecture was composed of four hidden 1D convolutional layers, four batch-normalization layers, four 1D max-pooling layers, and four fully connected layers. The convolutional layers consisted of kernels with a size of 3, and the max-pooling layers had a pool size of 2 with a stride of 2. Furthermore, three fully connected layers (FC) with 1024, 512, and 256 nodes, respectively, followed by a final fully connected linear output layer of two nodes were implemented. All hidden layers were each followed by a rectified linear unit (ReLU) activation function, and the output fully connected layer by a linear activation function that generated the predicted offsets. Simulated spectra manipulated from FID-A with artificially generated frequency or phase offsets were used as training data for the network. To compare different models, each network was trained for 1000 epochs, with early stopping implemented when 50 consecutive epochs did not improve the lowest validation loss.


FIGURE 2: Visualization of the performance of the deep learning models (MLP-FPC, CNN-FPC, CNN-SR) for frequency and phase correction using the published simulated dataset with added noise at SNR of 20. For the MLP-FPC, CNN-FPC, and CNN-SR models, the scatter plots on the left show the correction errors between the ground truths and model predictions at different frequency and phase offsets. The spectra on the right show the spectra predicted by each deep learning model, the true MEGA-PRESS difference spectra, and the subtraction between them. Among all three models, the MLP-FPC exhibits the largest correction errors for frequency and phase offset, followed by the CNN-FPC, with both being outperformed by the CNN-SR. (a) Output of the MLP-FPC model on the simulated dataset; (b) output of the CNN-FPC model on the simulated dataset; (c) output of the CNN-SR model on the simulated dataset.

FIGURE 3: Visualization of the performance of the deep learning models (MLP-FPC, CNN-FPC, CNN-SR) for frequency and phase correction using the published simulated dataset with line broadening and added noise at SNR of 2.5. For the MLP-FPC, CNN-FPC, and CNN-SR models, the scatter plots on the left show the correction errors between the ground truths and model predictions at different frequency and phase offsets. The spectra on the right show the spectra predicted by each deep learning model, the true MEGA-PRESS difference spectra, and the subtraction between them. Among all three models, MLP-FPC exhibits the largest correction errors for frequency and phase offset, followed by CNN-FPC, with both being outperformed by CNN-SR. (a) Output of the MLP-FPC model on the simulated dataset; (b) output of the CNN-FPC model on the simulated dataset; (c) output of the CNN-SR model on the simulated dataset.

FIGURE 4: Comparison between the MLP-FPC, CNN-FPC, and CNN-SR models for frequency-and-phase correction of the On spectra, Off spectra, and On/Off mismatch at SNR of 20 and at SNR of 2.5 with line broadening. From left to right: the frequency estimation error of the On spectra, the frequency estimation error of the Off spectra, the frequency On/Off mismatch error, the phase estimation error of the On spectra, the phase estimation error of the Off spectra, the phase On/Off mismatch error, the GABA residual spectra mean absolute error, and the Glx residual spectra mean absolute error. (a) Box plots showing the frequency estimation errors (in Hz), the phase estimation errors (in degrees), and the GABA and Glx residual spectra mean absolute errors of the MLP-FPC, CNN-FPC, and CNN-SR models at SNR of 20; (b) box plots showing the same quantities at SNR of 2.5 with line broadening. ****: the two-tailed P-value is less than 0.0001.

FIGURE 5: The in vivo Off and Diff spectra results of the models with different levels of added offsets, and performance scores comparing the CNN-FPC model to the MLP-FPC model, the CNN-SR model to the MLP-FPC model, and the CNN-SR model to the CNN-FPC model for the 101 in vivo datasets. (a) The original Off spectra and the results of the three models after applying corrections to the in vivo data without further manipulation and with additional frequency and phase offsets applied to the same 101 datasets: small offsets (0-5 Hz; 0-20°), medium offsets (5-10 Hz; 20-45°), and large offsets (10-20 Hz; 45-90°); (b) the original Diff spectra and the results of the three models under the same conditions as in (a); (c) comparative performance Q scores for the CNN-FPC model and the MLP-FPC model for each dataset; a score above 0.5 indicated that the CNN-FPC model performed better than the MLP-FPC model in terms of alignment, whereas a score below 0.5 indicated the opposite; (d) comparative performance Q scores for the CNN-SR model and the MLP-FPC model for each dataset; (e) comparative performance Q scores for the CNN-SR model and the CNN-FPC model for each dataset.

FIGURE 6: Comparison of the variance of the choline interval in the edited in vivo Diff spectra among the MLP-FPC, CNN-FPC, and CNN-SR models with different levels of added offsets. From left to right: box plots of choline interval variances with no offset, small offsets, medium offsets, and large offsets. The CNN-SR model has relatively stable performance, and its variance of the choline interval is significantly lower than that of both the MLP-FPC model and the CNN-FPC model at all offset levels. With no offset or large offsets, the CNN-FPC model has lower choline interval variance than the MLP-FPC model; with small or medium offsets, the MLP-FPC model has lower choline interval variance than the CNN-FPC model. ****: the two-tailed P-value is less than 0.0001; **: the two-tailed P-value is between 0.001 and 0.01; *: the two-tailed P-value is between 0.01 and 0.05.

FIGURE 7: Model performance comparison between the CNN-SR model and the mSR model for the in vivo datasets, in terms of the Off spectra, Diff spectra, performance scores, and the variance of the choline interval, without and with additional offsets. (a) The Diff spectra results of the CNN-SR model and the mSR model; (b) comparative performance scores Q for the CNN-SR model and the mSR model for each dataset; a score above 0.5 indicated that the CNN-SR model performed better than the mSR model in terms of alignment, whereas a score below 0.5 indicated the opposite; (c) box plots of the variance of the choline interval of the CNN-SR model and the mSR model. No significant difference is observed in the no, small, and medium additional offset cases, but a significant difference is observed in the large additional offset case.

TABLE 1. Models' Performance for the Simulated Datasets
Table of mean absolute errors of the MLP-FPC, CNN-FPC, and CNN-SR models for frequency correction, phase correction, GABA residual, and Glx residual on the simulation dataset with different levels of noise. MLP = multilayer perceptron; CNN = convolutional neural network; FPC = frequency-and-phase correction; SR = spectral registration.

TABLE 2. Q Scores for the In Vivo Datasets
Table of performance scores Q calculated between the MLP-FPC, CNN-FPC, and CNN-SR models under four conditions: no added offsets, small offsets, medium offsets, and large offsets. MLP = multilayer perceptron; CNN = convolutional neural network; FPC = frequency-and-phase correction; SR = spectral registration.

TABLE 3. Variance of Choline Residuals for the In Vivo Datasets
Table of choline variances calculated for the MLP-FPC, CNN-FPC, and CNN-SR models under four conditions: no added offsets, small offsets, medium offsets, and large offsets. MLP = multilayer perceptron; CNN = convolutional neural network; FPC = frequency-and-phase correction; SR = spectral registration.