## ABSTRACT

**Purpose** To develop a SNR enhancement method for chemical exchange saturation transfer (CEST) imaging using a denoising convolutional autoencoder (DCAE), and compare its performance with state-of-the-art denoising methods.

**Method** The DCAE-CEST model encompasses an encoder and a decoder network. The encoder learns features from the input CEST Z-spectrum via a series of 1D convolutions, nonlinearity applications, and pooling. Subsequently, the decoder reconstructs an output denoised Z-spectrum using a series of up-sampling and convolution layers. The DCAE-CEST model underwent multistage training in an environment constrained by Kullback–Leibler divergence, while ensuring data adaptability through context learning using a Principal Component Analysis (PCA)-processed Z-spectrum as a reference. The model was trained using simulated Z-spectra, and its performance was evaluated using both simulated data and in vivo data from an animal tumor model. Maps of amide proton transfer (APT) and nuclear Overhauser enhancement (NOE) effects were quantified using the multiple-pool Lorentzian fit, along with an apparent exchange-dependent relaxation metric.

**Results** In digital phantom experiments, the DCAE-CEST method exhibited superior performance, surpassing existing denoising techniques, as indicated by the peak SNR and Structural Similarity Index. Additionally, in vivo data further confirm the effectiveness of the DCAE-CEST in denoising the APT and NOE maps when compared to other methods. While no significant difference was observed in APT between tumors and normal tissues, there was a significant difference in NOE, consistent with previous findings.

**Conclusion** The DCAE-CEST can learn the most important features of the CEST Z-spectrum and provide the most effective denoising solution compared to other methods.

## 1. INTRODUCTION

Chemical Exchange Saturation Transfer (CEST) is an emerging MRI mechanism that exploits the exchange of protons between water and certain solute molecules to produce contrast. Over recent years, CEST has gained significant attention for its capability to probe molecular and physiological properties in biological tissues with enhanced detection sensitivity (1–5). In CEST imaging, a Z-spectrum, a plot of the water signal as a function of the frequency offset (*Δω*) of the saturation pulses, is typically acquired so that molecules with distinct resonance frequency offsets can be identified. In brain tissues, there are multiple pools, including amide proton transfer (APT) at around 3.5 ppm (6,7), the amine CEST effect close to 3 ppm (8,9), guanidinium CEST at around 2 ppm (10–12), and nuclear Overhauser enhancement (NOE) effects at around −1.6 ppm (12–15) and −3.5 ppm (16–19), termed the NOE(−1.6) and NOE(−3.5) effects. Among these effects, the APT and NOE(−3.5) are two major effects that have been widely studied. They have demonstrated potential in various applications, including tumor detection (20–23), ischemic stroke identification (24–26), and the diagnosis of multiple neurological disorders (27–35).

However, despite its enhanced sensitivity, CEST's practical application still faces challenges related to a low signal-to-noise ratio (SNR). This limitation arises from the typically low concentration of solute molecules as well as the scaled-down effect from the direct water saturation (DS) and magnetization transfer (MT) effects. The noise can significantly compromise the quality of CEST images, obscuring fine details, reducing contrast, and complicating subsequent image analysis, quantification, and interpretation. Overcoming this challenge necessitates advanced denoising techniques capable of preserving the essential molecular information while effectively suppressing noise.

To reduce the image noise, two strategies are generally used: increasing the number of signal averages/acquisitions (NSA) or slice thickness and applying post-processing methods (32). However, the former can lead to longer scanning times or loss of image details. Post-processing methods, which don’t require extra data collection, have gained attention. These techniques include the Principal Component Analysis (PCA) approach (36), the Multilinear Singular Value Decomposition (MLSVD) method (37), a hybrid approach combining non-local mean and coherence-enhanced diffusion (NLmCED) (38), and a method combining SVD and NLM, referred to as suBspace denoising with nOnlocal lOw-rank constraint and Spectral local-smooThness regularization (BOOST) (39). While generally effective, these methods have limitations, including dependency on regularization parameters, time-consuming iterative denoising with large or noisy data, and less effective performance with low SNR data.
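As background for the PCA approach referenced throughout, PCA denoising of Z-spectra can be sketched as a truncated SVD of the voxel-by-offset data matrix. The rank choice and the synthetic rank-2 data below are illustrative assumptions, not the published implementation:

```python
import numpy as np

def pca_denoise(z_stack, rank):
    """Denoise a (voxels x offsets) stack of Z-spectra by truncating the
    SVD of the mean-centered data to the leading `rank` components."""
    mean = z_stack.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(z_stack - mean, full_matrices=False)
    low_rank = u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank]
    return low_rank + mean

# Synthetic demo: rank-2 clean spectra plus additive Gaussian noise
rng = np.random.default_rng(0)
offsets = np.linspace(-5, 5, 89)
basis = np.stack([np.exp(-offsets**2), np.exp(-(offsets - 3.5)**2)])
weights = rng.random((200, 2))
clean = weights @ basis
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
denoised = pca_denoise(noisy, rank=2)
```

Because the truncation keeps only the leading components, noise outside that subspace is discarded, which is also why PCA cannot remove noise that falls inside the retained components.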

Recently, Convolutional Neural Network (CNN)-based denoising methods have demonstrated superior noise elimination performance compared to traditional methods (40–42). Chen et al. (43) introduced a spatiotemporal correlation-based denoising network for CEST image denoising, known as the denoising CEST network (DECENT). This method combines 3D anisotropic filtering with spectral filtering through deep learning, harnessing the global and spectral features essential for CEST denoising. However, these deep learning-based methods remain fundamentally tied to traditional denoising approaches, as the model's weights are adjusted to optimize a maximum likelihood estimation (MLE) objective, which assumes that the noise is purely random and does not consider any prior knowledge about the signal.

Simultaneously, the denoising convolutional autoencoder (DCAE) technique has been introduced and successfully implemented to improve the image SNR in various MRI fields (44,45). The DCAE is a deep learning model that can operate on the principle of maximum a posteriori (MAP) estimation. MAP estimation integrates prior information regarding the signal into the estimation procedure, aiming to determine parameters that optimize the likelihood of the signal considering both the observed noisy data and the prior knowledge (46). This prior knowledge can be very useful in denoising, as it can help to make more accurate estimations in some cases. Further details about the MLE and MAP are shown in Supporting Information Method S1. In this paper, we apply the DCAE, with necessary modifications, to reduce noise in the CEST Z-spectrum, referred to as DCAE-CEST, and compare it with state-of-the-art denoising methods based on MLE to demonstrate its advantages.
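The MLE-versus-MAP distinction can be illustrated with a minimal scalar example (Gaussian likelihood and Gaussian prior; all values are hypothetical). When the prior is accurate, the MAP estimate shrinks the noisy observation toward it and lowers the error; a wrong prior would introduce exactly the kind of bias addressed later by context learning:

```python
import numpy as np

def mle_estimate(y):
    # With Gaussian noise and no prior, the MLE of the clean signal
    # is simply the noisy observation itself.
    return y

def map_estimate(y, noise_var, prior_mean, prior_var):
    # Gaussian likelihood x Gaussian prior: the MAP estimate is a
    # precision-weighted average of the observation and the prior mean.
    w = prior_var / (prior_var + noise_var)
    return w * y + (1 - w) * prior_mean

rng = np.random.default_rng(1)
truth = 0.8                               # hypothetical clean signal value
noise_var, prior_var = 0.04, 0.01
y = truth + np.sqrt(noise_var) * rng.standard_normal(10000)
mle = mle_estimate(y)
map_ = map_estimate(y, noise_var, prior_mean=0.8, prior_var=prior_var)
```

Here the MAP error variance drops by the squared shrinkage factor relative to MLE, at the cost of bias whenever `prior_mean` deviates from the truth.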

## 2. METHODS

### 2.1 The architecture of DCAE-CEST

Fig. 1A depicts the architecture of the DCAE-CEST network, which primarily comprises a four-layer encoder and a four-layer decoder. The input of the DCAE-CEST network is the noisy Z-spectrum, while the output is the reconstructed denoised Z-spectrum. The encoder network includes a 1D convolution followed by an exponential linear unit (ELU) activation and max-pooling-based down-sampling in each layer. Conversely, the decoder network comprises up-sampling using a transposed convolutional layer, followed by a 1D convolution and linear activation, in each layer. The transposed convolutional layer performs the up-sampling through a zero-padded convolution operation. The encoder network contains 32, 64, 128, and 256 convolution filters, while the decoder network employs 256, 128, 64, and 32 convolution filters. To maintain alignment between the encoder and decoder dimensions, each intermediate feature undergoes zero padding and cropping. The architecture can use different convolution kernel sizes in temporal pathways to extract the latent salient features from the input data. A latent representation size of 32 is employed to preserve the most significant features of the input. In this study, the convolution kernel size is chosen to be 3.
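As a rough illustration of the dimension bookkeeping above, the following sketch traces feature lengths through the four encoder and decoder stages. The 89-point Z-spectrum length follows from the sampling scheme in Section 2.2; the 'same'-padded convolutions, ceil-halving max-pools, and final crop are assumptions, since the exact padding scheme of the published model is not specified:

```python
import math

enc_filters = [32, 64, 128, 256]
dec_filters = [256, 128, 64, 32]

def encoder_lengths(n):
    """Each encoder layer: length-preserving conv + ELU, then max-pool /2."""
    lengths = [n]
    for _ in enc_filters:
        n = math.ceil(n / 2)      # zero padding handles odd lengths
        lengths.append(n)
    return lengths

def decoder_lengths(n, target):
    """Each decoder layer: transposed conv up-samples x2; crop at the end."""
    lengths = [n]
    for _ in dec_filters:
        n = n * 2
        lengths.append(n)
    crop = lengths[-1] - target   # crop back to the input length
    return lengths, crop

enc = encoder_lengths(89)                          # 89 -> 45 -> 23 -> 12 -> 6
dec, crop = decoder_lengths(enc[-1], target=89)    # 6 -> 12 -> 24 -> 48 -> 96
```

The mismatch (96 vs. 89) at the decoder output is why the architecture needs the zero padding and cropping steps mentioned above.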

### 2.2 Training and testing data from simulations

Numerical simulations of multiple-pool model Bloch-McConnell equations (47) were conducted to generate training and testing data. Continuous wave (CW)-CEST Z-spectra were simulated with *ω*_{1} of 0.5µT and 1µT, along with *Δω* ranging from −2000 Hz to −1250 Hz with a step size of 250 Hz (−10 ppm to −6.25 ppm with a step size of 1.25 ppm at 4.7 T), −1000 Hz to 1000 Hz with a step size of 25 Hz (−5 ppm to 5 ppm with a step size of 0.125 ppm at 4.7 T), and 1250 Hz to 2000 Hz with a step size of 250 Hz (6.25 ppm to 10 ppm with a step size of 1.25 ppm at 4.7 T). A total of 185,472 clean Z-spectra for the two saturation powers were created by varying the sample parameters, listed in Supporting information Table S1. This total included 46,656 clean Z-spectra for each saturation power used for training, and 46,080 clean Z-spectra for each saturation power used for testing. The test spectra, simulated with a combination of sample parameters separate from those used in training data generation, were used in the creation of digital phantoms. These phantoms were employed to validate the proposed method through comparison with other state-of-the-art methods. Each digital phantom simulation was carried out with a constant amide pool concentration, while other sample parameters were altered. Eight digital phantoms were simulated, each with a varying level of amide pool concentration ranging between 0.04% and 0.18%, to emulate various signal levels.
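The sampling scheme above can be written out directly; the 200 Hz-per-ppm conversion at 4.7 T and the resulting 89 offsets follow from the stated ranges and step sizes:

```python
import numpy as np

# Frequency offsets (Hz): coarse 250-Hz steps in the outer ranges and
# fine 25-Hz steps between -1000 and 1000 Hz, as described in the text.
offsets_hz = np.concatenate([
    np.arange(-2000, -1250 + 1, 250),   # -10 to -6.25 ppm at 4.7 T
    np.arange(-1000, 1000 + 1, 25),     # -5 to 5 ppm
    np.arange(1250, 2000 + 1, 250),     # 6.25 to 10 ppm
])
offsets_ppm = offsets_hz / 200.0        # 1 ppm corresponds to ~200 Hz at 4.7 T
```

This yields 4 + 81 + 4 = 89 offsets per Z-spectrum, which also fixes the input length of the network.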

For each clean Z-spectrum used for the training, five noisy Z-spectra were generated by adding Gaussian noise to the real and imaginary components of the CEST signals at various levels, ranging from 1% to 5%, to emulate the Rician noise of MRI images. Here, the real components (*x*) were obtained from the simulations while the imaginary components (*y*) were set to 0. The final CEST signals, with Rician noise, were obtained using the formula *S* = √((*x* + *n*_{x})^{2} + (*y* + *n*_{y})^{2}), where *n*_{x} and *n*_{y} are independent zero-mean Gaussian noise samples. Furthermore, another 2.5% additive white Gaussian noise (AWGN) was added to the 233,280 (= 5 × 46,656) Z-spectra (with added Rician noise at each level) to simulate the signal fluctuation caused by the instability of the MRI system. For the testing data, the clean Z-spectra were used as references, and the Z-spectra with 1% Rician noise and 2.5% AWGN noise were used as input, except in instances specifically noted (i.e., Fig. 5A, 5B). The simulations of the Bloch equations were conducted using the ordinary differential equation solver (ODE45) in MATLAB (MathWorks, Natick, MA, USA).
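A minimal sketch of this noise model follows; the function name and the flat example spectrum are illustrative:

```python
import numpy as np

def add_rician_noise(z_clean, sigma, rng):
    """Rician noise: Gaussian noise on the real (x) and imaginary (y)
    channels, then the magnitude. Here x is the simulated signal and
    y is 0, as described in the text."""
    x = z_clean + sigma * rng.standard_normal(z_clean.shape)
    y = sigma * rng.standard_normal(z_clean.shape)
    return np.sqrt(x**2 + y**2)

rng = np.random.default_rng(42)
z = np.full(89, 0.9)                                # flat hypothetical Z-spectrum
noisy = add_rician_noise(z, sigma=0.01, rng=rng)    # 1% Rician noise level
drifted = noisy + 0.025 * rng.standard_normal(89)   # extra 2.5% AWGN term
```

Note that the magnitude operation makes the Rician component strictly positive, whereas the AWGN term added afterward is zero-mean.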

### 2.3. Neural network training and prediction

Fig. 1B and 1C outline the flowchart for the training and prediction procedure of DCAE-CEST. The training phase contains two steps and leverages the concept of curriculum learning, which gradually increases the complexity of the task (48,49). In the first step (i.e., the pre-training phase), the DCAE-CEST model was initially pretrained using the 2×46,656 clean Z-spectra simulated with the two saturation powers as references, and their noisy counterparts with added Rician noise at various levels as input. The learning rate was set at 1×10^{-4}, and the model was trained for 200 epochs. Subsequently, the training complexity was increased by incorporating the same 2×46,656 clean Z-spectra simulated with the two saturation powers as references, but their noisy counterparts with both added Rician noise at various levels and 2.5% AWGN noise as input. The Adam optimizer was utilized to minimize the mean square error (MSE) loss between the DCAE-CEST output and the reference. Supporting information Fig. S1 illustrates the MSE loss of the DCAE-CEST model training in relation to the number of iterations. In the second step (i.e., the fine-tuning phase), the models were fine-tuned separately for each saturation power by repeating the first step, but only using training data simulated with the corresponding single saturation power. In the prediction phase, context learning was used to reduce potential bias in the DL models (45). Specifically, the PCA-based denoised Z-spectra, along with an averaging approach using Z-spectra from a limited number of nearby voxels, were used as references for comparison with the prediction. If the loss between the reference and the prediction is larger than a certain threshold, the model is further fine-tuned. This procedure was performed for 0-3 iterations or until the loss was less than the threshold.
The fine-tuning and context learning are executed in an environment constrained by Kullback–Leibler (K-L) divergence, which serves as a measure of the difference between two distributions, steers optimization, and guides model adaptations (50). A comprehensive description, along with the pseudocode for the two-step training and the prediction, is provided in Supporting Information Algorithms S1-S3. The DCAE-CEST network was implemented in MATLAB (R2022b), and the training process took approximately 32 hours. Further details about the curriculum learning, pre-training and fine-tuning, context learning, K-L divergence, and ELU activation, which are used in the DCAE-CEST model, are provided in Supporting Information Method S2.
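The prediction-phase loop can be sketched as below. Here `predict` and `fine_tune` are stand-ins for the trained DCAE and its update step, and the threshold value is a hypothetical placeholder; only the K-L divergence check and the 0-3 iteration cap come from the text:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """K-L divergence between two discrete distributions (after normalization)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def context_learning(predict, fine_tune, z_noisy, z_reference,
                     threshold=1e-3, max_iters=3):
    """If the divergence between the model output and the PCA-based
    reference exceeds the threshold, fine-tune and predict again."""
    z_hat = predict(z_noisy)
    for _ in range(max_iters):
        if kl_divergence(z_hat, z_reference) <= threshold:
            break
        fine_tune(z_noisy, z_reference)
        z_hat = predict(z_noisy)
    return z_hat
```

The cap on iterations matters because each fine-tuning pass pulls the output toward the (only approximately clean) PCA reference, trading denoising strength against bias.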

### 2.4. Ablation study

An ablation study was conducted to examine nine distinct combinations of deep learning hyper-parameters. These combinations involved the use of DCAE with Rectified Linear Unit (ReLU) activation (#1), DCAE with ELU activation (#2, termed the conventional DCAE in this paper), DCAE with ReLU activation followed by context learning (#3), DCAE with ELU activation followed by context learning (#4), curriculum learning of DCAE with ReLU activation (#5), curriculum learning of DCAE with ELU activation (#6), curriculum learning of DCAE with ReLU activation followed by context learning (#7), curriculum learning of DCAE with ELU activation followed by context learning but without pre-training (#8), and curriculum learning of DCAE with ELU activation followed by context learning (#9, i.e., DCAE-CEST). For models #1-#4 and #8, a one-step training was conducted on CEST data obtained with one single saturation power (i.e., 1µT). For models #5-#7 and #9, a two-step training was first conducted on CEST data obtained with two saturation powers (i.e., 0.5µT and 1µT) for pre-training and then on CEST data obtained with one single saturation power (i.e., 1µT) for fine-tuning. The Z-spectrum samples were randomized, followed by an 80%-20% split for training and testing purposes, respectively. Of the training samples, 70% were used for actual training, while the remaining 30% were set aside for validation. The performance of these different combinations was then evaluated by comparing their mean MSE values between the predicted and the reference Z-spectra, using 1000 Z-spectrum samples randomly selected from the testing data and with 10 repeated experiments.

### 2.5 Multiple pool Lorentzian fit and Quantification metrics

The APT and NOE(−3.5) effects were quantified using the multi-pool model Lorentzian fit of the CEST Z-spectra. The mathematical framework for the multi-pool model Lorentzian fit is shown in Eq. (1):

*S*(Δ*ω*)/*S*_{0} = 1 − ∑_{i=1}^{N} *L*_{i}(Δ*ω*)    (1)

Here, *L*_{i}(Δ*ω*) = *A*_{i}/(1 + (Δ*ω* − Δ*ω*_{c})^{2}/(0.5*W*_{i})^{2}) represents the Lorentzian line with central frequency (Δ*ω*_{c}), full width at half maximum (*W*_{i}), and peak amplitude (*A*_{i}). *N* is the number of fitted pools. *S*(Δ*ω*) represents the CEST signals as a function of *Δω*. *S*_{0} is the control signal without RF saturation. A six-pool (amide at 3.5 ppm, guanidine at 2 ppm, water, NOE(−1.6), NOE(−3.5), and semisolid MT) model Lorentzian fit was first performed to process the Z-spectra. The number of fitted pools was estimated by observing exchange/coupling effects on the Z-spectrum. Supporting Information Table S2 lists the starting points and boundaries of the fit. Then, the reference signals (S_{ref}) for quantifying APT and NOE(−3.5) were obtained by summing all Lorentzians except for the corresponding pool (51). The label signals (S_{lab}) were obtained from the fitted CEST signals. An apparent exchange-dependent relaxation (AREX) method (26), which inversely subtracts *S*_{lab} from *S*_{ref} with T_{1obs} (=1/R_{1obs}) normalization, was used to quantify the APT and NOE(−3.5) effects, termed AREX_{mfit}:
AREX_{mfit}(Δ*ω*) = (*S*_{0}/*S*_{lab}(Δ*ω*) − *S*_{0}/*S*_{ref}(Δ*ω*))·*R*_{1obs}    (2)

in which R_{1obs} is the observed water longitudinal relaxation rate. The APT and NOE(−3.5) maps were obtained by choosing the maximum value between 3.25 ppm and 3.75 ppm and between −3 ppm and −4 ppm, respectively, on the AREX_{mfit} spectrum for each voxel. To further minimize potential biases in the DCAE-CEST, the value ranges of these APT and NOE(−3.5) maps, fitted from the DCAE-CEST denoised data, were normalized using the respective fitted APT and NOE(−3.5) maps from the PCA-denoised CEST data.
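A minimal sketch of the Lorentzian line shape and the AREX inverse subtraction follows; the two-pool example (water plus an amide pool at 3.5 ppm) and all parameter values are illustrative, not fitted:

```python
import numpy as np

def lorentzian(dw, amp, center, width):
    """Lorentzian line L_i as defined in Eq. (1)."""
    return amp / (1 + (dw - center)**2 / (0.5 * width)**2)

def arex(s_lab, s_ref, s0, r1obs):
    """AREX: inverse subtraction of label and reference signals,
    normalized by R1obs."""
    return (s0 / s_lab - s0 / s_ref) * r1obs

dw = np.linspace(-10, 10, 89)               # offsets in ppm
water = lorentzian(dw, 0.8, 0.0, 2.0)        # hypothetical water pool
amide = lorentzian(dw, 0.05, 3.5, 1.0)       # hypothetical amide pool
s0 = 1.0
s_lab = s0 * (1 - water - amide)             # fitted CEST signal (all pools)
s_ref = s0 * (1 - water)                     # all Lorentzians except amide
apt = arex(s_lab, s_ref, s0, r1obs=0.5)      # AREX_mfit spectrum for APT
```

The APT map value for a voxel would then be the maximum of `apt` over the 3.25-3.75 ppm window, as stated above.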

### 2.6 Evaluation metrics

MSE, mean absolute error (MAE), peak SNR (PSNR) (43), and structural similarity index (SSIM) (43), defined in Eq. (3–6) respectively, were used to evaluate the denoising performance:

MSE = (1/*n*) ∑_{i=1}^{n} (*S*_{i} − *Ŝ*_{i})^{2}    (3)

MAE = (1/*n*) ∑_{i=1}^{n} |*S*_{i} − *Ŝ*_{i}|    (4)

where *S* and *Ŝ* denote the signals and the estimated signals, and *n* is the number of data points.

PSNR = 10·log_{10}(*X*_{max}^{2}/MSE)    (5)

where *X* represents a given sample data, and *X*_{max} is the maximum value in this given sample data, which is 1 for a Z-spectrum.

SSIM = (2*μ*_{denoise}*μ*_{ref} + *c*_{1})(2*σ*_{XdenoiseXref} + *c*_{2}) / ((*μ*_{denoise}^{2} + *μ*_{ref}^{2} + *c*_{1})(*σ*_{denoise}^{2} + *σ*_{ref}^{2} + *c*_{2}))    (6)

where *μ* refers to the mean and *σ* to the standard deviation of *X*, with the subscript "denoise" indicating the denoised data and "ref" the reference data; *σ*_{XdenoiseXref} is the covariance of *X*_{denoise} and *X*_{ref}. The constants *c*_{1} = 1 × 10^{−4} and *c*_{2} = 9*c*_{1} are used to avoid division by zero (52). A lower value of MSE or MAE, or a higher value of PSNR or SSIM, indicates superior denoising performance.
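These four metrics can be sketched directly from their standard definitions; note this is the single-window (global) form of SSIM rather than the sliding-window variant, with *c*_{2} = 9*c*_{1} as stated:

```python
import numpy as np

def mse(ref, est):
    return float(np.mean((ref - est)**2))

def mae(ref, est):
    return float(np.mean(np.abs(ref - est)))

def psnr(ref, est, x_max=1.0):
    # Peak SNR in dB; x_max = 1 for a normalized Z-spectrum.
    return 10 * np.log10(x_max**2 / mse(ref, est))

def ssim(ref, est, c1=1e-4, c2=9e-4):
    # Global (single-window) structural similarity index.
    mu_r, mu_d = ref.mean(), est.mean()
    var_r, var_d = ref.var(), est.var()
    cov = ((ref - mu_r) * (est - mu_d)).mean()
    return ((2 * mu_d * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_d**2 + mu_r**2 + c1) * (var_d + var_r + c2))
```

For example, a uniform error of 0.1 on a signal with `x_max = 1` gives MSE = 0.01 and hence PSNR = 20 dB, while SSIM of any signal with itself is 1.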

### 2.7 Animal Preparation

Six rats bearing 9L tumors were prepared by injecting 1 × 10^{5} 9L glioblastoma cells in the right brain hemisphere. MR imaging was conducted after 2 to 3 weeks. All rats were immobilized and anesthetized with 2-3% isoflurane and 97-98% oxygen during the experiments. Respiration rate was monitored to be in a range from 40 to 70 breaths per minute. Rectal temperature was maintained at 37°C using a warm-air feedback system (SA Instruments, Stony Brook, NY). All animal procedures were approved by the Animal Care and Usage Committee of Vanderbilt University Medical Center.

### 2.8 MRI

CEST Z-spectra were acquired with the same ω_{1} and *Δω* as those in generating the training data through simulations. Control images were acquired with the frequency offset at 100,000 Hz (500ppm at 4.7T). R_{1obs} was obtained using an inversion recovery method (53). All measurements were performed on a Varian 4.7-T magnet with a 38-mm receive coil. All images have a matrix size of 64 × 64, field of view of 30 × 30mm^{2}, and slice thickness of 2mm.

### 2.9 State-of-the-art methods

To evaluate the advantages of our proposed DCAE-CEST method, we compared it with various state-of-the-art denoising methods, including PCA, MLSVD, NLmCED, DECENT, and the conventional DCAE. Default implementation parameters were employed unless otherwise stated. For MLSVD, we adopted the sequential truncation approach (54). The number of iterations for NLmCED denoising was set to 6 (38). For the DECENT method, default parameter settings were used.

### 2.10 Data analysis and statistics

ROIs of tumors and contralateral normal tissues were delineated from R_{1obs} maps. The ROIs of contralateral normal tissues were chosen to mirror the tumor ROIs. Student's t-test was employed to compare the ROI-averaged signals. Differences were considered statistically significant if *P* < 0.05. All the data processing was carried out in the MATLAB (R2022b) or Python environment, running on a machine with an Intel(R) Core (TM) i9-10900X CPU @ 3.70GHz × 20, equipped with 64GB RAM and an NVIDIA RTX A4000 GPU with 26.5GB. The MATLAB implementation code is available at https://www.mathworks.com/matlabcentral/fileexchange/167446-dcae-cest.

## 3. RESULTS

### 3.1 Ablation study

Fig. 2 shows the result from the ablation study examining nine different combinations of deep learning hyper-parameters. First, when comparing all DCAE models that use ReLU activation (#1, #3, #5, #7) with those that use ELU activation (#2, #4, #6, #8, #9), it is found that ELU activation significantly reduces the MSE values. This could be due to ReLU activation only considering positive values as informative, while transforming negative values into zeros. In contrast, the ELU activation function treats negative values as informative, rendering the DCAE model with ELU a superior solution for our data. Second, when comparing all DCAE models that employ curriculum learning (#5, #7 or #6, #8, #9), with those that do not (#1, #3, or #2, #4), for those with either ReLU or ELU, it is found that curriculum learning dramatically reduces the MSE values. This may be because, after a certain period of curriculum training, the process of gradient descent (used to update the model’s weights) becomes more efficient as the model’s predictions gradually align with the actual values. In other words, the curriculum can influence the direction of the model’s training weight variation and prevent arbitrary weight updates. Third, when comparing all DCAE models that use context learning (#7 or #9) with those that do not (#5 or #6), for those with either ReLU or ELU, it is found that context learning can further reduce the MSE values. Lastly, when comparing the DCAE model that uses ELU activation, curriculum learning, context learning, and the two-step training (#9) with the model that also uses ELU activation, curriculum learning, context learning but only employs a one-step training (#8), it is found that this two-step training is critical for reducing the MSE values. Based on this ablation study, we choose to use the DCAE model modified by including the ELU activation, curriculum learning, and context learning with the two-step training.
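The ReLU-versus-ELU distinction discussed above can be seen directly from the two activation functions (a minimal sketch):

```python
import math

def relu(x):
    # ReLU zeroes every negative input, discarding its information.
    return max(0.0, x)

def elu(x, alpha=1.0):
    # ELU keeps a smooth, bounded, non-zero response for negative inputs.
    return x if x >= 0 else alpha * (math.exp(x) - 1)
```

Both functions pass positive values through unchanged; only ELU propagates a (bounded) gradient signal for negative pre-activations, which is the property credited above for the lower MSE.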

### 3.2 Comparison with state-of-the-art methods via simulations

Fig. 3A-3F display a representative clean simulated Z-spectrum (reference) from a digital phantom with amide concentration of 0.1%, its noisy counterpart with 1% Rician noise and 2.5% AWGN noise (noisy), and the denoised Z-spectra using various methods, with ω_{1} of 1µT. The residual spectra, which represent the differences between the references and the denoised Z-spectra, were also plotted to compare the performance of various denoising methods. The mean MSE and MAE values of these residual spectra between 10ppm and −10ppm were calculated for all voxels in a digital phantom. The results show that the mean MSE values from this phantom for PCA, MLSVD, NLmCED, DECENT, DCAE, and DCAE-CEST were 0.0003601, 0.000056, 0.0004099, 0.0002562, 0.0001116, and 0.0000516, respectively, and the mean MAE values from this phantom for these denoising techniques were 0.01347, 0.00708, 0.01497, 0.00782, 0.00890, and 0.00639, respectively. It can be observed that the mean MSE and MAE residuals for the DCAE-CEST denoising are the lowest, suggesting that the DCAE-CEST method can effectively restore a given Z-spectrum. Fig. 3G-3L depict the AREX_{mfit} quantified APT and NOE(−3.5) spectra from the corresponding denoised Z-spectra shown in Fig. 3A-3F, as well as the ground truth (GT) APT and NOE(−3.5) spectra fitted from the reference Z-spectra. It can be observed that the AREX_{mfit} quantified APT and NOE(−3.5) spectra from the DCAE-CEST denoised Z-spectrum closely resemble the GT, while other methods show more deviations.

Fig. 4 presents the AREX_{mfit} quantified APT maps on digital phantoms generated with 1% Rician noise and 2.5% AWGN noise. These maps were derived from references, their noisy counterparts, and the denoised Z-spectra using various methods, with ω_{1} of 1µT. It was observed that the quality of the noisy APT maps had significantly deteriorated, in comparison to the reference APT map. However, the APT maps generated using all denoised methods displayed a substantial improvement compared to the noisy APT maps. Among the denoised methods, PCA, MLSVD, and NLmCED showed distinct differences from DECENT and DCAE-CEST. NLmCED demonstrated a minor patch effect, while DECENT performed well. Upon visual comparison, DCAE-CEST surpassed all other state-of-the-art methods when compared with the reference APT maps. Supporting information Fig. S2 shows the residual maps of the eight digital phantoms depicted in Fig. 4. These maps represent the differences between the reference APT maps and the denoised APT maps. Supporting information Fig. S3 and Tables S3 and S4 illustrate the median and interquartile range of the residual values, the mean MSE values, and the mean MAE values, respectively, for each residual map of the digital phantoms seen in Supporting Information Fig. S2. It is noteworthy that the DCAE-CEST has the smallest interquartile range and the lowest MSE value among all denoising methods. In addition, the DCAE-CEST has the lowest mean MAE value, except for the lowest amide concentration. Supporting Information Table S5 shows the time taken for the prediction of a digital phantom for all these denoising methods. The DCAE takes approximately 6.7 s, which is comparable to the 5.1 s taken by DECENT, but much less than the 14.3 s taken by NLmCED.

Fig. 5A and 5B compare the average PSNR of the simulated Z-spectrum (noisy) and their denoised counterparts using various denoising methods from all digital phantoms with various noise levels (1-5% Rician noise and 2.5% AWGN noise). Fig. 5C and 5D compare the average PSNR values of the APT map and NOE(−3.5) map, respectively, fitted from the simulated Z-spectrum (noisy) and their denoised counterparts using various denoising methods from all digital phantoms with 1% Rician noise and 2.5% AWGN noise. Fig. 5E and 5F compare the average SSIM values of the APT map and NOE(−3.5) map, respectively, from these digital phantoms. It was found that the DCAE-CEST method provides the highest PSNR and SSIM values. These results collectively demonstrate the effectiveness of our proposed DCAE-CEST method for improving the CEST signal denoising as well as enhancing the APT and NOE(−3.5) quantification.

### 3.3 Comparison with state-of-the-art methods via animal experiments

Fig. 6A-6G display two representative Z-spectra (noisy), as well as the corresponding denoised Z-spectra using various methods, from the tumors and the contralateral normal tissues, respectively, in a rat brain measured with ω_{1} of 1µT. Supporting information Fig. S4 depicts the AREX_{mfit} quantified APT and NOE(−3.5) spectra from the corresponding Z-spectra shown in Fig. 6. Notably, the denoised Z-spectra using the NLmCED and the conventional DCAE methods show significant deviations from their noisy counterparts, particularly in the NOE range.

Fig. 7 and Fig. 8 show the APT and NOE(−3.5) maps quantified by AREX_{mfit} with ω_{1} of 1µT, as well as the R_{1obs} maps on a representative rat brain. These CEST maps were derived from the measured noisy Z-spectra, and the denoised Z-spectra using various methods. It was observed that the DCAE-CEST method effectively minimizes the noise effect while preserving the internal texture, demonstrating superior performance compared to other denoising methods. Supporting information Fig. S5 to S26 show the corresponding Fig. 7 and Fig. 8, but from other rat brains and with all ω_{1} values. All images demonstrate improved denoising performance when the DCAE-CEST method is used as compared to other state-of-the-art methods.

Fig. 9 illustrates the statistical differences between tumors and the contralateral normal tissues for the AREX_{mfit} quantified APT and NOE(−3.5) values derived from the noisy data as well as the denoised data using the PCA, MLSVD, NLmCED, DECENT, DCAE, and DCAE-CEST methods. More detailed statistical data can be found in the supporting information Tables S6 and S7. It was observed that, while no significant difference in APT was found between tumors and normal tissues for all denoising methods except the conventional DCAE, there was a significant difference in NOE(−3.5), consistent with previous findings (15,55–57).

## 4. DISCUSSION

In this paper, we developed a DCAE-CEST model and applied it to denoise CEST Z-spectrum. We found that this proposed model is capable of restoring the original, uncorrupted CEST signals from noisy inputs by addressing inherent challenges in reconstructing CEST data, while minimizing noise interference. Experiments on digital phantoms and animals demonstrated that it is the most effective method for CEST signal denoising by comparing it with other methods.

The DCAE-CEST model, through an encoding phase, transforms input data into a hidden representation, then reconstructs the output to match the original input in a decoding phase. It includes convolution layers that enhance its ability to utilize Z-spectral characteristics effectively (37). To optimize the convolutional layer, we expanded the DCAE-CEST model’s depth and width, creating a broader network without overfitting. Using the ELU, we improved regularization and learning methods, enabling better noise filtering during iterative training. In the first training phase, the model is pre-trained on CEST data obtained with two saturation powers, enhancing its adaptability to various data distributions and complexities. In the second training phase, the model is fine-tuned to optimize its performance on CEST data obtained with specific saturation power and adapt it to particular characteristics. During this training, we also employed curriculum learning, starting with simpler tasks and gradually progressing to more complex ones, thereby improving convergence, generalization, and training stability. While MAP estimation-based models have advantages over MLE, they can introduce bias in the denoised output when dealing with data that is different from the training set. To mitigate this, we used the Z-spectrum from PCA-processed data as a reference for reducing the bias using the context learning during the prediction phase. It is important to note that PCA denoising only eliminates certain high frequency components, making it challenging to achieve completely noise-free results without losing information. However, these results still provide an approximate representation of the original data and introduce little or no bias. The context learning can effectively reduce the bias effect in DCAE-CEST and achieve superior denoising performance compared to PCA. 
Hence, the combined approach outperforms either technique used individually in terms of denoising without changing internal patterns. Furthermore, it is important to note that a reasonable number of iterations in the context learning process is crucial to maintain a balance between the denoising performance and bias. This is illustrated in Supporting Information Fig. S27, which emphasizes the significance of context learning and the choice of the iteration number for this process. Alongside context learning, we also normalized the value ranges of the fitted APT or NOE(−3.5) maps from the DCAE-CEST denoised data using the corresponding value ranges from the PCA-denoised data to further mitigate this bias.

The DCAE-CEST model was trained using simulated CEST signals, the parameters for which were derived from literature reviews (6,56,58–65) or based on our prior knowledge and experiences. Nonetheless, accurately quantifying the underlying CEST parameters remains a challenging task, primarily due to the difficulty of accurately isolating each CEST pool. As a result, a wide range of CEST parameters have been reported, using different quantification methods or fitting models. For instance, the amide-water exchange rate, as reported in previous studies (6,56,60,62,63), varies from a few dozen to several hundred s^{-1}. Consequently, the parameters we used for our simulation might not accurately reflect real tissue characteristics. This could potentially lead to prediction bias when training on these data. This bias was addressed by employing the context learning during the prediction phase.

The simulated CEST signals incorporated both Rician noise, resulting in a non-zero mean in the Z-spectra, and Gaussian noise, leading to a zero mean in the Z-spectra. This was based on our analysis of the noise characteristics present in the in vivo CEST Z-spectra. In Supporting information Fig. S28, we provided the probability density of the difference between the noisy Z-spectra and the corresponding Z-spectra processed through Gaussian smoothing, using all voxels in a representative rat brain. It’s important to note that Gaussian smoothing effectively acts as a low-pass filter, reducing both Gaussian and Rician noise components. Therefore, this subtraction can highlight the characteristics of these two types of noises. Our findings revealed that the probability density of this subtraction comprises two components: one with a zero mean and another with a non-zero mean, indicating the presence of both Gaussian and Rician noise.
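This two-component behavior can be reproduced with a minimal simulation (values illustrative): the magnitude of complex-valued Gaussian noise has a positive mean of σ√(π/2), while the additive Gaussian component averages to zero:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma, n = 0.02, 100_000

# Zero-mean component: plain additive Gaussian noise.
gaussian = sigma * rng.standard_normal(n)

# Non-zero-mean component: magnitude of complex Gaussian noise,
# i.e., the Rician noise floor at low signal level.
rician_floor = np.abs(sigma * rng.standard_normal(n)
                      + 1j * sigma * rng.standard_normal(n))
```

The clearly positive mean of `rician_floor` versus the near-zero mean of `gaussian` mirrors the two peaks seen in the probability density of Supporting Information Fig. S28.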

Compared with other CEST denoising methods, our DCAE-CEST model has a few key advantages: 1) By using MAP estimation, our model can effectively reconstruct the input data while preserving valuable data properties, making it more suitable for denoising CEST signals. In contrast, other methods use MLE, which may be less precise in removing noise while restoring the clean CEST signals. 2) The training of the DCAE-CEST model relies solely on the Z-spectrum, which can be easily generated from simulations. In contrast, many other methods rely on measured structural images of human subjects, which are more challenging to obtain, especially when the training data must encompass a range of pathologies and diseases. In addition, the measured structural images contain more information than the Z-spectrum, meaning that their use requires a significantly larger sample size for training. Moreover, training on both the structural images and the spectral features, as in DECENT, may be limited by the inherent spatially and temporally correlated noise. 3) DCAE-CEST can model complex non-linear functions, whereas PCA operates on the principle of linear correlation. Therefore, DCAE-CEST is likely to be more efficient than PCA when the data or noise exhibits non-linear characteristics. NLmCED produces patch-like artifacts (see Supporting Information Fig. S2) because it relies on the coherency of nearby data samples in an image. DCAE-CEST avoids this problem by using only a limited number of nearby data samples during context learning.

In another recent study (66), a ResUNet was developed for CEST denoising by modifying an existing UNet model. Although both ResUNet and DCAE-CEST are designed as autoencoders, they differ in important ways. The training of ResUNet depends on CEST image data, similar to DECENT, rather than solely on Z-spectral data, necessitating a significant quantity of training data. As a result, ResUNet is more time-intensive and requires more memory than DCAE-CEST due to its larger number of parameters. Another work (67) utilized a UNet for denoising MRS data. This method adheres to MAP and thus also exhibits prediction bias; however, unlike our approach, no bias-reduction strategy was implemented.

Due to various overlapping components in CEST imaging, quantification of the APT effect remains challenging. Conventionally, an asymmetric analysis of the magnetization transfer ratio, termed MTR_{asym}, which subtracts the label CEST signal acquired at +3.5ppm from the reference CEST signal acquired at −3.5ppm, has been used to provide APT-weighted imaging by reducing contaminations from the DS and MT effects (68,69). This metric shows significantly enhanced APT-weighted signals in tumors. However, the contributions from the asymmetric MT effect (70), the NOE effect (14,19,57,71,72), and the nearby amine CEST effect (73,74) to the APT-weighted contrast between tumors and contralateral normal tissues have not been thoroughly evaluated. To address this issue, the multiple-pool model Lorentzian fit and various other methods have been proposed. These methods also provide dramatically enhanced APT signals in tumors. However, direct subtraction of the label signal and the reference signal is typically used in these methods. Such a metric cannot fully remove the DS and MT contributions, owing to their 'shine-through' effect, or the influence of T_{1obs} (75–78). In our previous reports, we used the multiple-pool Lorentzian fit, along with the AREX metric, to remove all of these contaminations, and found no significant difference between tumors and contralateral normal tissues in APT imaging (15,55–57). The results of this paper are consistent with those earlier findings.
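For reference, the conventional MTR_asym metric described above reduces to a subtraction of the Z-spectral values at the two symmetric offsets. The sketch below is illustrative only; the nearest-offset lookup is an assumed convenience, not the exact procedure of the cited works:

```python
import numpy as np

def mtr_asym(z_spectrum, offsets, delta=3.5):
    """MTRasym at +/-delta ppm: Z(-delta) - Z(+delta).
    z_spectrum: normalized Z-spectral values; offsets: frequency offsets (ppm).
    Uses the sample nearest to each target offset (illustrative choice)."""
    z = np.asarray(z_spectrum, dtype=float)
    off = np.asarray(offsets, dtype=float)
    z_label = z[np.argmin(np.abs(off - delta))]  # label signal at +delta ppm
    z_ref = z[np.argmin(np.abs(off + delta))]    # reference signal at -delta ppm
    return z_ref - z_label
```

Because this single subtraction pools the asymmetric MT, NOE, and amine contributions into one number, it illustrates why the multiple-pool Lorentzian fit with the AREX metric is needed to isolate the APT effect.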

## 5. CONCLUSION

The proposed DCAE-CEST network can learn the most important features of the CEST Z-spectrum and provides the most effective denoising solution compared with other methods.

## Method, algorithm, table, and figure captions for Supporting Information

### Methods

**Supporting information Method S1:** Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP) estimation

**Supporting information Method S2:** Definitions of additional terms used in the DCAE-CEST model

### Algorithms

**Supporting information Algorithm S1:** DCAE-CEST algorithm for pre-training (the first-step training).

**Supporting information Algorithm S2:** DCAE-CEST algorithm for fine-tuning (the second-step training).

**Supporting information Algorithm S3:** DCAE-CEST algorithm for context learning during the prediction.

### Tables

**Supporting information Table S1.** Sample parameters used in generating the training data and digital phantom data.

**Supporting information Table S2.** Starting points and boundaries of the amplitude, width, and offset of the exchange/coupling pools in the Lorentzian fit. The unit of peak width and offset is ppm.

**Supporting Information Table S3**. The mean MSE of the APT values in each digital phantom simulated with a variety of amide pool concentrations (f_{s_amide}), without/with various denoising techniques at 1*μT*.

**Supporting Information Table S4:** The mean MAE of the APT values in each digital phantom simulated with a variety of amide proton concentrations (f_{s_amide}), without/with different denoising techniques at 1*μT*.

**Supporting Information Table S5**. Time consumption (in seconds) of different methods on the digital phantoms. The experiments were performed on a personal computer equipped with an Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz (20 threads) and an NVIDIA RTX A4000 GPU with 26.5GB of memory.

**Supporting information Table S6:** The AREX_{mfit} quantified APT and NOE(−3.5) values from tumor and contralateral normal tissues of each rat, derived from the noisy Z-spectra acquired with 0.5µT along with PCA, MLSVD, NLmCED, DECENT, DCAE, and DCAE-CEST.

**Supporting information Table S7:** The AREX_{mfit} quantified APT and NOE(−3.5) values from tumor and contralateral normal tissues of each rat, derived from the noisy Z-spectra acquired with 1µT along with PCA, MLSVD, NLmCED, DECENT, DCAE, and DCAE-CEST.

### Figures

**Supporting information Fig. S1.** Plot of the MSE loss of the DCAE-CEST model training in relation to the number of iterations.

**Supporting information Fig. S2**: Residual maps between the reference APT maps and the denoised APT maps of the eight digital phantoms. The numbers at the left side indicate the amide concentration for the corresponding row, while the text on the top identifies the methodology used for each column.

**Supporting information Fig. S3**: The median and interquartile range of the residuals of the eight digital phantoms with the amide concentration of 0.04% (A), 0.06% (B), 0.08% (C), 0.1% (D), 0.12% (E), 0.14% (F), 0.16% (G), and 0.18% (H), respectively.

**Supporting information Fig. S4**: The AREX_{mfit} quantified APT and NOE(−3.5) spectra from the corresponding Z-spectra in Fig. 6.

**Supporting information Fig. S5.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #1 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S6.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #2 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S7.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #2 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S8.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #3 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S9.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #3 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S10.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #4 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S11.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #4 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S12.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #5 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S13.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #5 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S14.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #6 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S15.** Maps of the AREX_{mfit} quantified APT effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #6 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S16.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #1 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S17.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #2 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S18.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #2 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S19.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #3 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S20.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #3 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S21.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #4 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S22.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #4 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S23.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #5 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S24.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #5 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S25.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #6 measured with ω_{1} of 0.5µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Fig. S26.** Maps of the AREX_{mfit} quantified NOE(−3.5) effect from the noisy Z-spectra (A), as well as the denoised Z-spectra using PCA (B), MLSVD (C), NLmCED (D), PCA (12) (E), DECENT (F), DCAE (G), and DCAE-CEST (H), from rat #6 measured with ω_{1} of 1µT. The map of R_{1obs} (I) is also shown for comparison.

**Supporting information Figure S27.** A comparison of the AREX_{mfit} quantified APT effect fitted from the noisy Z-spectra (A), the PCA-denoised Z-spectra (B), the DCAE-CEST denoised Z-spectra with 5 iterations of context learning (C), the DCAE-CEST denoised Z-spectra without context learning (D), and the DCAE-CEST denoised Z-spectra (E) from rat #1. Note that the DCAE-CEST in (E) used 3 iterations of context learning. Comparing (C) with (B), a similar image structure is observed; this is because more iterations of context learning make the DCAE-CEST output more closely mirror its reference, thereby reducing its denoising performance. Comparing (D) with (B), a higher SNR but a biased range of signal values is obtained in the DCAE-CEST denoised map without context learning. Conversely, comparing (E) with (B), a higher SNR and a similar range of signal values are observed. This underscores the importance of selecting a reasonable number of iterations to balance denoising performance and bias.

**Supporting information Figure S28.** The probability density of the difference between the noisy Z-spectra and the Gaussian-smoothing-processed Z-spectra from all voxels in a representative rat brain. The red curve represents the fitted Gaussian line shape. Two components can be observed: one with a zero mean and the other with a non-zero mean (zoomed image at the top right).