## Abstract

Functional near-infrared spectroscopy (fNIRS) measures the hemoglobin concentration changes associated with neuronal activity. Diffuse optical tomography (DOT) consists in reconstructing, from the optical density changes measured by scalp channels, the near-infrared light attenuation changes within the cortical regions. In the present study, we adapted a nonlinear source localization method developed and validated in the context of Electro- and Magneto-Encephalography (EEG/MEG), the Maximum Entropy on the Mean (MEM), to solve the inverse problem of DOT reconstruction. We first introduced a depth weighting strategy within the MEM framework to avoid biasing DOT reconstructions towards superficial regions. We also proposed a new initialization of the MEM model that improves the temporal accuracy of the original MEM framework. To evaluate MEM performance and compare it with the widely used depth weighted Minimum Norm Estimate (MNE) inverse solution, we applied a realistic simulation scheme comprising 4000 simulations generated from 250 different seeds at different locations and 4 spatial extents ranging from 3 to 40 cm^{2} along the cortical surface. Our results showed that overall MEM provided more accurate DOT reconstructions than MNE. Moreover, we found that MEM remained particularly robust in low signal-to-noise ratio (SNR) conditions. The proposed method was further illustrated on real data involving finger tapping tasks with two different montages, by comparison with functional Magnetic Resonance Imaging (fMRI) activation maps. The results showed that, when compared to MNE, MEM provided more accurate HbO and HbR reconstructions in spatial agreement with the fMRI main cluster.

**Highlights**

We introduced a new NIRS reconstruction method – Maximum Entropy on the Mean.

We implemented depth weighting strategy within the MEM framework.

We improved the temporal accuracy of the original MEM reconstruction.

Performances of MEM and MNE were evaluated with realistic simulations and real data.

MEM provided more accurate and robust reconstructions than MNE.

## 1. Introduction

Near-infrared spectroscopy (NIRS) is a non-invasive functional neuroimaging modality. It detects changes in oxy- and deoxy-hemoglobin (HbO/HbR) concentration within head tissues through the measurement of near-infrared light absorption using sources and detectors placed on the surface of the head (Scholkmann et al., 2014). In continuous wave NIRS, the conventional way to transform variations in optical density into HbO/HbR concentration changes at the level of each source-detector channel is to apply the modified Beer Lambert Law (mBLL) (Delpy et al., 1988). This model assumes homogeneous concentration changes within the detecting region, i.e. it ignores partial volume effects, which arise because the absorption of light varies locally within the illuminated regions. This assumption introduces serious and systematic errors when dealing with focal hemodynamic changes (Boas et al., 2001; Strangman et al., 2003).

In order to handle these important quantification biases associated with sensor level based analysis, diffuse optical tomography (DOT) has been proposed to reconstruct the fluctuations of HbO/HbR concentrations within the brain (Arridge, 1999). This technique not only provides better spatial localization and resolution of the underlying hemodynamic responses (Boas et al., 2004a; Joseph et al., 2006), but also avoids the partial volume effect of the classical mBLL and thus achieves better quantitative estimation of HbO/HbR concentration changes (Boas et al., 2001; Strangman et al., 2003). DOT has been applied to real NIRS data to reconstruct hemodynamic responses in the motor cortex during median-nerve stimulation (Dehghani et al., 2009; Hughes et al., 2004) and finger tapping (Boas et al., 2004a; Yamashita et al., 2016), in the visual cortex during retinotopic mapping (Zeff et al., 2007; White and Culver, 2010; Eggebrecht et al., 2012), and simultaneously over the motor and visual cortex (White et al., 2009).

In order to formalize DOT reconstruction, one needs to solve two main problems. The first one is the so-called forward problem, which generates a forward model or sensitivity matrix that maps local absorption changes within the brain to variations of optical density measured by each channel (Boas et al., 2002). The second problem is the so-called inverse problem, which aims at reconstructing the fluctuations of hemodynamic activity within the brain from scalp measurements (Arridge, 2011). The forward problem can be solved by generating a subject specific anatomical model that accurately describes the propagation of light within the head. Such an anatomical model is obtained by segmenting anatomical Magnetic Resonance Imaging (MRI) data, typically into five tissues (i.e. scalp, skull, cerebro-spinal fluid (CSF), white matter and gray matter), before initializing absorption and scattering coefficient values for each tissue type and for each wavelength (Fang, 2010; Machado et al., 2018). Solving the inverse problem is more difficult since it is ill-posed and does not admit a unique solution unless specific additional constraints are added. The most widely used inverse method in DOT is a linear approach based on the Minimum Norm Estimate (MNE), originally proposed for solving the inverse problem of Magnetoencephalography (MEG) and Electroencephalography (EEG) source localization (Hämäläinen and Ilmoniemi, 1994). It minimizes the *L*_{2} norm of the reconstruction error along with Tikhonov regularization (Boas et al., 2004b; Zeff et al., 2007; Dehghani et al., 2009; Eggebrecht et al., 2012, 2014; Tremblay et al., 2018). Other strategies to solve the DOT inverse problem have also been considered, such as sparse regularization using the *L*_{1} norm (Süzen et al., 2010; Okawa et al., 2011; Kavuri et al., 2012; Prakash et al., 2014; Tremblay et al., 2018) and the Expectation Maximization (EM) algorithm (Cao et al., 2007). A non-linear method based on a hierarchical Bayesian model, for which inference is obtained through an iterative process, has been proposed by (Shimokawa et al., 2012, 2013) and applied to finger tapping experiments in (Yamashita et al., 2016).

Maximum Entropy on the Mean (MEM) framework was first proposed by (Amblard et al., 2004) and then applied and carefully evaluated by our group in the context of EEG/MEG source imaging (Grova et al., 2006; Chowdhury et al., 2013). The MEM framework was specifically designed and evaluated for its ability to recover spatially extended generators (Heers et al., 2016; Pellegrino et al., 2016; Chowdhury et al., 2016; Grova et al., 2016), whereas we recently demonstrated excellent performances when dealing with focal sources as well (Hedrich et al., 2017) and when applied on clinical epilepsy data (Chowdhury et al., 2018; Pellegrino et al., 2020).

Inspired by these studies, our main objective was to adapt the MEM framework for DOT and carefully evaluate its performance. DOT reconstructions tend to be biased towards more superficial regions, because the light sensitivity profile decreases exponentially with the depth of the generators (Strangman et al., 2013). To reduce this bias, we implemented and evaluated a depth weighted variant of the MEM framework. NIRS DOT using MEM was then evaluated using realistic simulations of NIRS data.

The article is organized as follows. The methodology of depth weighted MNE and depth weighted MEM for DOT is first presented. Then, we describe our validation framework using realistic simulations and the associated validation metrics. Finally, illustrations of the methods on finger tapping NIRS data sets acquired with two different montages from 6 healthy subjects are provided and compared with functional Magnetic Resonance Imaging (fMRI) results.

## 2. Material and Methods

### 2.1. NIRS reconstruction

To perform DOT reconstructions, the relationship between measured optical density changes on the scalp and wavelength specific absorption changes within head tissue is usually expressed using the following linear model (Arridge, 1999):

$$Y = AX + e \tag{1}$$

where *Y* is a matrix (*p × t*) which represents the wavelength specific measurement of optical density changes in *p* channels at *t* time samples. *X* (*q × t*) represents the unknown wavelength specific absorption changes in *q* locations along the cortex at the *t* time samples. *A* (*p × q*) is called the sensitivity matrix, which is obtained by solving the forward problem and relates absorption changes in the head to optical density changes measured by each channel. Finally, *e* (*p × t*) models the additive measurement noise. Solving the NIRS tomographic reconstruction problem consists in solving an inverse problem which can be seen as the estimation of matrix *X* (i.e. the amplitude for each location *q* at time *t*). However, this problem is ill-posed and admits an infinite number of possible solutions. Therefore, solving the DOT inverse problem requires adding prior information or regularization constraints to identify a unique solution.

Anatomical constraints can first be considered by defining the reconstruction solution space (i.e. where *q* is located) within the gray matter volume (Boas and Dale, 2005). In EEG and MEG source localization studies (Dale and Sereno, 1993; Grova et al., 2006; Chowdhury et al., 2013), it is common to constrain the reconstruction along the cortical surface. In this study, the reconstruction space was considered as the mid surface defined as the middle layer between gray matter/pial and gray/white matter interfaces (Fischl et al., 2002).

### 2.2. Minimum Norm Estimation (MNE)

Minimum norm estimation is one of the most widely used reconstruction methods in DOT (Zeff et al., 2007; Dehghani et al., 2009; White et al., 2009; White and Culver, 2010; Eggebrecht et al., 2012, 2014; Yamashita et al., 2016). Such estimation can be expressed using a Bayesian formulation which solves the inverse problem by estimating the posterior distribution (i.e. the probability distribution of parameter *X* conditioned on data *Y*). A solution can be estimated by imposing zero-mean Gaussian priors on the generators, $P(X) = \mathcal{N}(0, \Sigma_s^{-1})$, and on the noise, $P(e) = \mathcal{N}(0, \Sigma_d^{-1})$. Σ_{d} is the inverse of the noise covariance, which could be estimated from baseline recordings. Σ_{s} is the inverse of the source covariance, which is assumed to be an identity matrix in conventional MNE.

The Maximum A Posteriori (MAP) estimator of the posterior distribution P(X|Y) can be obtained as:

$$\hat{X}_{MNE} = \underset{X}{\operatorname{argmax}}\; P(X \mid Y) = \underset{X}{\operatorname{argmin}} \left( \|Y - AX\|^{2}_{\Sigma_d} + \lambda \|X\|^{2}_{\Sigma_s} \right) = \left(A^{T}\Sigma_d A + \lambda \Sigma_s\right)^{-1} A^{T}\Sigma_d Y \tag{2}$$

where $\hat{X}_{MNE}$ is the reconstructed absorption changes along the cortical surface (i.e. mid surface). λ is a hyperparameter to regularize the inversion using the a priori minimum norm constraint $\|X\|^{2}_{\Sigma_s}$. In this study, we applied the standard L-Curve method to estimate λ, as suggested in (Hansen, 2000).
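As a minimal sketch of the minimum norm estimate described above, the following NumPy snippet solves the regularized normal equations for Gaussian priors. The function name `mne_solve`, the toy dimensions and the fixed value of `lam` are illustrative assumptions; in the study, λ was selected with the L-Curve method, which is not reproduced here.

```python
import numpy as np

def mne_solve(A, Y, lam, Sigma_d=None, Sigma_s=None):
    """Closed-form minimum norm estimate (hypothetical helper).

    A: (p, q) sensitivity matrix, Y: (p, t) optical density changes.
    Sigma_d / Sigma_s: inverse noise / source covariances (identity if None).
    """
    p, q = A.shape
    Sigma_d = np.eye(p) if Sigma_d is None else Sigma_d
    Sigma_s = np.eye(q) if Sigma_s is None else Sigma_s
    lhs = A.T @ Sigma_d @ A + lam * Sigma_s   # (q, q) regularized system
    rhs = A.T @ Sigma_d @ Y                   # (q, t)
    return np.linalg.solve(lhs, rhs)

# Toy problem: 5 channels, 20 cortical locations, 3 time samples.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 20))
X_true = np.zeros((20, 3))
X_true[4] = 1.0                               # one active location
Y = A @ X_true
X_hat = mne_solve(A, Y, lam=1e-2)
```

With far fewer channels than locations, the estimate is spread over many locations rather than concentrated on the single active one, which illustrates why additional priors are of interest.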

### 2.3. Depth weighted MNE

Standard MNE solutions assume Σ_{s} = *I*, which tends to bias the results towards the most superficial regions. When compared to EEG-MEG source localization, such bias is even more pronounced in NIRS since, within the forward model, light sensitivity values decrease exponentially with depth (Strangman et al., 2013). Depth weighted MNE was first proposed as an approach to compensate for this effect in DOT (Culver et al., 2003) and applied in (Zeff et al., 2007; Dehghani et al., 2009; White et al., 2009; Eggebrecht et al., 2012, 2014). Here we consider a more generalized expression as proposed in (Lin et al., 2006). It consists in initializing the source covariance matrix with depth-dependent weights, resulting in a so-called depth weighted MNE solution (Eq.3).

The depth weighted MNE solution therefore penalizes the most superficial regions by enhancing the contribution of deeper regions. *ω* is a weighting parameter tuning the amount of depth compensation to be applied. The larger *ω* is, the more depth compensation is considered. *ω* = 0 would therefore refer to no depth compensation and an identity source covariance model. *ω* = 0.5 refers to the standard depth weighting approach mentioned above. In the present study, we carefully evaluated the impact of this parameter on DOT accuracy with a set of *ω* values (i.e. *ω* = 0, 0.1, 0.3, 0.5, 0.7 and 0.9).
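The effect of *ω* can be sketched as follows. This is an illustration under stated assumptions, not the exact expression of Eq.3: the source covariance is assumed to be diagonal with entries diag(*A*^{T}*A*)^{−ω}, the noise covariance is assumed to be the identity, and the function name `depth_weighted_mne` is hypothetical. The exponent convention of the original equation may differ.

```python
import numpy as np

def depth_weighted_mne(A, Y, lam, omega):
    """Depth weighted MNE, assuming source covariance R = diag(A^T A)^(-omega)."""
    col_energy = np.sum(A ** 2, axis=0)        # diag(A^T A), one value per location
    R = col_energy ** (-omega)                 # diagonal of the assumed source covariance
    AR = A * R                                 # same as A @ diag(R)
    G = AR @ A.T + lam * np.eye(A.shape[0])    # (p, p) system in channel space
    return AR.T @ np.linalg.solve(G, Y)        # R A^T (A R A^T + lam I)^-1 Y
```

With `omega = 0` the weights are all equal to one and the function reduces to the standard minimum norm estimate; larger values give more prior variance to locations with low sensitivity, i.e. deeper ones.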

### 2.4. Maximum Entropy on the Mean (MEM) for NIRS 3D reconstruction

#### 2.4.1. MEM framework

The main contribution of this study is the first adaptation and evaluation of the MEM method (Amblard et al., 2004; Grova et al., 2006; Chowdhury et al., 2013) to perform DOT reconstructions in NIRS. Within the MEM framework, the intensity *x*, i.e. the amplitude of *X* at each location *q* in Eq.1, is considered as a random variable described by the probability distribution *dp*(*x*) = *p*(*x*)*dx*. The Kullback-Leibler divergence or *v*-entropy of *dp*(*x*) relative to a prior distribution *dv*(*x*) is defined as,

$$S_v(dp(x)) = -\int \log\left(\frac{dp(x)}{dv(x)}\right) dp(x) = -\int f(x)\log\left(f(x)\right) dv(x) \tag{4}$$

where *f*(*x*) is the *v*-density of *dp*(*x*) defined as *dp*(*x*) = *f*(*x*)*dv*(*x*). Following the Bayesian approach, to introduce the data fit, we denote *C*_{m} as the set of probability distributions on *x* that explain the data on average:

$$Y - \left[A \mid I_q\right]\begin{bmatrix} E_{dp}[x] \\ e \end{bmatrix} = 0, \quad dp \in C_m \tag{5}$$

where *Y* represents the measured optical density changes, *E*_{dp}[*x*] = ∫ *x dp*(*x*) represents the statistical expectation of *x* under the probability distribution *dp*, and *I*_{q} is an identity matrix with dimension (*q × q*). Therefore, within the MEM framework, a unique solution of *dp*(*x*) could be obtained,

$$dp^{*}(x) = \underset{dp(x) \in C_m}{\operatorname{argmax}}\; S_v(dp(x)) \tag{6}$$

#### 2.4.2. Construction of the prior distribution

To define the prior distribution *dv*(*x*) mentioned above, we assumed that brain activity can be depicted by a set of *K* non-overlapping and independent cortical parcels. Then the reference distribution of each parcel *k* is modeled as,

$$dv_k(x_k) = \left[(1-\alpha_k)\,\delta(x_k) + \alpha_k\,\mathcal{N}(\mu_k, \Sigma_k)\right] dx_k, \quad 0 < \alpha_k < 1 \tag{7}$$

where *x*_{k} denotes the intensities within parcel *k*. Each cortical parcel *k* is characterized by an activation state, defined by the hidden variable *S*_{k}, describing whether the parcel is active or not. Therefore we denote *α*_{k} as the probability of the *k*^{th} parcel to be active, i.e. *Prob*(*S*_{k} = 1). *δ* is a Dirac function that allows to “switch off” the parcel when considered as inactive (i.e. *S*_{k} = 0). *N*(*μ*_{k}, Σ_{k}) is a Gaussian distribution, describing the distribution of absorption changes within the *k*^{th} parcel when the parcel is considered as active (*S*_{k} = 1).

This type of spatial clustering of the cortical surface into *K* non-overlapping parcels was obtained using a data driven parcellization (DDP) technique (Lapalme et al., 2006). DDP consisted in first applying a projection method, the multivariate source prelocalization (MSP) technique (Mattout et al., 2005), estimating a probability-like coefficient (MSP score) between 0 and 1 for each dipolar source on the cortical mesh, characterizing the contribution of each source to the data, followed by region growing around local MSP maxima. Once the parcellization is done, the prior distribution *dv*(*x*) is then a joint distribution expressed as the product of the individual distributions of each parcel in Eq.7, assuming statistical independence between parcels,

$$dv(x) = \prod_{k=1}^{K} dv_k(x_k) \tag{8}$$

where *dv*(*x*) is the joint probability distribution of the prior and *dv*_{k}(*x*_{k}) is the individual distribution of parcel *k* described in Eq.7.

To initialize the prior in Eq.7, *μ*_{k}, which is the mean of the Gaussian distribution *N*(*μ*_{k}, Σ_{k}), was set to zero. Σ_{k} at each time point *t*, i.e. Σ_{k}(*t*), was defined by Eq.9 according to (Chowdhury et al., 2013),

$$\Sigma_k(t) = \eta(t)\, W_k(\sigma)^{T} W_k(\sigma) \tag{9}$$

where *W*_{k}(*σ*) is a spatial smoothness matrix, defined by (Friston et al., 2008), which controls the local spatial smoothness within the parcel according to the geodesic surface neighborhood order. The same value of *σ* = 0.6 was used as in (Chowdhury et al., 2013). *η*(*t*) was defined as 5% of the averaged energy of the MNE solution within each parcel at time *t*.

We can substitute this initialization into Eq.7 to construct the prior distribution *dv*(*x*). It can be proved that the *v*-entropy in Eq.4 is strictly a concave function that needs to be maximized under constraints in Eq.5. Finally, solving the optimization described in Eq.6 is equivalent to maximizing an unconstrained strictly concave Lagrangian function. Please refer to (Amblard et al., 2004; Chowdhury et al., 2013, 2016) for further details.
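To make the structure of the parcel-wise prior of Section 2.4.2 more concrete, the sketch below draws samples from a mixture of a Dirac mass at zero and a Gaussian, one parcel at a time. It only illustrates the prior model; it does not perform the MEM optimization, and the function names are hypothetical.

```python
import numpy as np

def sample_parcel_prior(alpha_k, mu_k, Sigma_k, rng):
    """Draw one sample from (1 - alpha_k) * delta(x) + alpha_k * N(mu_k, Sigma_k)."""
    if rng.random() < alpha_k:                       # parcel active (S_k = 1)
        return rng.multivariate_normal(mu_k, Sigma_k)
    return np.zeros_like(mu_k)                       # parcel switched off (S_k = 0)

def sample_prior(alphas, mus, Sigmas, rng):
    """Independent parcels: the joint prior is the product of the parcel priors."""
    return [sample_parcel_prior(a, m, S, rng)
            for a, m, S in zip(alphas, mus, Sigmas)]
```

A parcel with a low activation probability is most often exactly zero, which is how the model can switch off regions that do not contribute to the data.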

#### 2.4.3. Depth weighted MEM

In addition to adapting MEM for NIRS reconstruction, in this study we also implemented, for the first time, depth weighting within the MEM framework. Two depth weighting parameters, *ω*_{1} and *ω*_{2}, were involved in this process. *ω*_{1} was used to depth weight the source covariance matrix of each parcel *k* in Eq.9. *ω*_{2} was applied to solve the depth weighted MNE, as described in Eq.3, before using this solution as a prior to initialize the source covariance model within each parcel of the MEM model. Therefore, the standard MNE solution in Eq.9 was replaced by the depth weighted version of the MNE solution described by Eq.3. Consequently, the depth weighted version of Σ_{k}(*t*) used for prior initialization scales the covariance of Eq.9 by a depth weighting matrix defined for each parcel *k*, constructed with *ω*_{1} as described in Eq.3. This initialization followed the logic that depth weighting is in fact achieved by scaling the source covariance matrix. The other depth weighting parameter, *ω*_{2}, was considered when solving the depth weighted MNE, therefore avoiding biasing the initialization of the source covariance with a standard MNE solution.

To comprehensively compare MEM and MNE and also to investigate the behavior of depth weighting, we first evaluated the reconstruction performance of MNE with different *ω* (i.e. step of 0.1 from 0 to 0.9). Then two of these values (i.e. *ω* = 0.3 and 0.5) were selected for the comparison with MEM since they performed better than the others. Note that the following expressions of depth weighted MEM will be denoted as MEM(*ω*_{1}, *ω*_{2}) to represent the depth weighting strategies.

#### 2.4.4. Accuracy of temporal dynamics

The last contribution of this study was to improve the temporal accuracy of MEM solutions. In the classical MEM approach (Chowdhury et al., 2013), the MNE solution used in Eq.9 was globally normalized by dividing by its maximum over Ω and *T*, where Ω represents all the possible locations along the cortical surface and *T* is the whole time segment. Therefore, the constructed prior along time actually contained the scaled temporal dynamics of the MNE solution. To remove this effect, we performed a local normalization of the MNE solution at each time instance *t*, i.e. by dividing by its maximum over Ω at that time instance. This new feature preserves the spatial information provided by the prior distribution, while allowing MEM to estimate the temporal dynamics only from the data.
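The difference between the two normalizations can be illustrated with a short NumPy sketch. Using the absolute value of the MNE solution and the function names below are assumptions made for illustration only.

```python
import numpy as np

def normalize_global(X_mne):
    """Divide by the maximum over all locations and all time samples."""
    mag = np.abs(X_mne)
    return mag / mag.max()

def normalize_local(X_mne):
    """Divide each time sample by its own maximum over locations."""
    mag = np.abs(X_mne)
    peak_t = mag.max(axis=0, keepdims=True)
    peak_t = np.where(peak_t > 0, peak_t, 1.0)   # avoid division by zero
    return mag / peak_t
```

For a map with a fixed spatial pattern and a time-varying amplitude, `normalize_global` keeps the time course in the result, whereas `normalize_local` returns the same spatial pattern at every time sample.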

### 2.5. Validation of the proposed DOT methods

We evaluated the performance of the two DOT methods (MEM and MNE), first within a fully controlled environment involving realistic simulations of NIRS data, and then on real data acquired with a well controlled finger tapping paradigm. For realistic simulations, theoretical task-induced HbO/HbR concentration changes were simulated within cortical surface regions with a variety of locations, areas and depths. Corresponding optical density changes in the channel space were then computed by applying a dedicated NIRS forward model, before adding real resting state NIRS baseline signal as realistic physiological noise at different signal to noise ratio (SNR) levels. Realistic baseline data were obtained from the actual NIRS acquisition of a single subject during resting state. In a second phase, we also evaluated the reconstruction performance of the proposed methods on real data sets, namely NIRS data acquired on healthy participants during block designed finger tapping tasks with two different NIRS sensor montages (i.e. the full double density montage and the personalized optimal montage).

#### 2.5.1. MRI and fMRI Data acquisitions

Anatomical MRI data were acquired on 6 healthy subjects (25 ± 6 years old, right-handed male) and used to generate realistic anatomical head models. The subjects signed written informed consent forms for this study, which was approved by the Central Committee of Research Ethics of the Minister of Health and Social Services Research Ethics Board, Québec, Canada. MRI data were acquired in a GE 3T scanner at the PERFORM Center of Concordia University, Montréal, Canada. T1-weighted anatomical images were acquired with the 3D BRAVO sequence (1 × 1 × 1 mm^{3} voxels, 192 axial slices, 256 × 256 matrix), whereas T2-weighted anatomical images were acquired using the 3D Cube T2 sequence (1 × 1 × 1 mm^{3} voxels, 168 sagittal slices, 256 × 256 matrix). Functional MRI data were acquired in a GE 3T scanner using the gradient echo EPI sequence (3.7 × 3.7 × 3.7 mm^{3} voxels, 32 axial slices, TE = 25 ms, TR = 1,900 ms).

Participants also underwent functional MRI acquisition during finger opposition tapping tasks. The subject was asked to sequentially tap the left thumb against the other digits at 2Hz. For NIRS acquisition using the double density montage (1 participant), the finger tapping paradigm consisted of 10 blocks of 30s tapping task, each followed by a 30 to 35s resting period. For NIRS acquisition involving the NIRS optimal montage (5 participants), 20 blocks were acquired with a task period of 10s and a resting period ranging from 30s to 60s. fMRI Z-maps were generated by standard first-level fMRI analysis using FEAT from the FSL software (Smith et al., 2004; Jenkinson et al., 2012).

#### 2.5.2. NIRS Data acquisition

NIRS acquisitions were done at the PERFORM Center of Concordia University using a Brainsight NIRS device (Rogue-Research Inc, Montreal, Canada), equipped with 16 dual wavelength sources (685*nm* and 830*nm*), 32 detectors and 16 proximity detectors.

We considered a first acquisition involving a full Double Density (DD) montage, which is a widely used montage in NIRS acquisitions providing sufficiently dense coverage for local DOT (Kawaguchi et al., 2007). A 10 minute resting state session was acquired to provide the realistic physiological noise used in the realistic simulations. The subject was seated in a comfortable armchair and instructed to keep the eyes open and to remain awake. The optodes of the full DD montage (i.e. 8 sources and 10 detectors resulting in 50 NIRS channels) are shown in Fig.1e. It is composed of 6 second-order distance channels (1.5*cm*), 24 third-order channels (3*cm*) and 12 fourth-order channels with 3.35*cm* distance. Channels with 4.5*cm* distances were excluded since they were associated with too low SNR when checking the raw finger tapping data. In addition, we also added one proximity detector paired with each source to construct close distance channels (0.7*cm*) in order to measure superficial signals within extra-cerebral tissues. To place the montage with respect to the region of interest, the center of the montage was aligned with the center of the right “hand knob” area projected on the scalp surface and then each optode was projected on the scalp surface (see Fig.1d).

For personalized optimal montage cases, we followed the methodology we previously reported in (Machado et al., 2018). First, the hand knob within right primary motor cortex was drawn manually along the cortical surface and defined as a target region of interest (ROI) using the Brainstorm software (Tadel et al., 2011)(Available at http://neuroimage.usc.edu/brainstorm). Then we applied our optimal montage algorithm (Machado et al., 2014, 2018) in order to estimate personalized montages, Fig.7a, built to maximize a priori NIRS sensitivity and spatial overlap between channels with respect to the target ROI. To ensure good spatial overlap between channels for 3D reconstruction, we constructed personalized optimal montages composed of 3 sources and 15 detectors (see Fig.7b). The source-detector distance was set to vary from 2*cm* to 4.5*cm* and each source was constrained such that it has to construct channels with at least 13 detectors. Finally, we also manually added 1 proximity channel, located at the center of the 3 sources.

All montages (Double Density and optimal montages) were built to cover the right motor cortex. Knowing NIRS channels positions estimated on the MRI of each participant, we used a 3D neuronavigation system (Brainsight TMS navigation system, Rogue-Research Inc, Montreal) to guide the installation of the sensors on the scalp. Finally every sensor was glued on the scalp using a clinical adhesive, collodion, to prevent motion (Yücel et al., 2014; Machado et al., 2018). The same finger tapping task paradigm described above was considered for DD and optimal montage NIRS acquisitions.

#### 2.5.3. NIRS forward model

T1 and T2 weighted images were processed using FreeSurfer (Fischl et al., 2002) and Brain Extraction Tool2 (BET2) (Smith et al., 2004) in FMRIB Software Library (FSL) to segment the head into 5 tissues (i.e. scalp, skull, cerebrospinal fluid (CSF), gray matter and white matter; see Fig.1a).

Optical coefficients of the two wavelengths considered during our NIRS acquisition, 685*nm* and 830*nm*, were assigned to each tissue type mentioned above. Fluences of light for each optode (see Fig.1b) were estimated by Monte Carlo simulations with 10^{8} photons using MCXLAB developed by (Fang and Boas, 2009; Yu et al., 2018) (http://mcx.sourceforge.net/cgi-bin/index.cgi). Sensitivity values were computed using the adjoint formulation and were normalized by the Rytov approximation (Arridge, 1999).

For each source-detector pair of our montages, the corresponding light sensitivity map was first estimated in a volume space, and then further constrained to the 3D mask of gray matter tissue (see Fig.1c), as suggested in (Boas and Dale, 2005). Then, these sensitivity values within the gray matter volume were projected along the cortical surface (see Fig.1d and Fig.7c) using the Voronoi based method proposed by (Grova et al., 2006). This volume to surface interpolation method has the ability to preserve sulco-gyral morphology (Grova et al., 2006). After the interpolation, the sensitivity value of each vertex of the surface mesh represents the mean sensitivity of the corresponding volumetric Voronoi cell (i.e. the set of voxels that are closer to that vertex than to any other vertex). We considered the mid-surface from FreeSurfer as the cortical surface. This surface was then further downsampled to 25,000 vertices.
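The idea of averaging volumetric sensitivity values within the Voronoi cell of each vertex can be sketched as below. This is a simplified, brute-force illustration using Euclidean nearest-vertex assignment; it is not the implementation of (Grova et al., 2006), and the function name `voronoi_project` is hypothetical.

```python
import numpy as np

def voronoi_project(voxel_xyz, voxel_values, vertex_xyz):
    """Mean voxel value within the Voronoi cell of each surface vertex.

    voxel_xyz: (n_voxels, 3), voxel_values: (n_voxels,), vertex_xyz: (n_vertices, 3).
    Vertices whose cell contains no voxel receive 0.
    """
    # Squared Euclidean distance between every voxel and every vertex.
    d2 = ((voxel_xyz[:, None, :] - vertex_xyz[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)                      # nearest vertex of each voxel
    n_vertices = vertex_xyz.shape[0]
    sums = np.bincount(owner, weights=voxel_values, minlength=n_vertices)
    counts = np.bincount(owner, minlength=n_vertices)
    out = np.zeros(n_vertices)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out
```

The dense distance matrix is only practical for small examples; for a 25,000 vertex mesh a spatial index such as a k-d tree would be the usual choice.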

#### 2.5.4. NIRS data preprocessing

Using the coefficient of variation (Schmitz et al., 2005; Schneider et al., 2011; Eggebrecht et al., 2012; Piper et al., 2014), channels exhibiting a standard deviation larger than 8% of the signal mean were rejected. Superficial physiological fluctuations were regressed out at each channel using the average of all proximity channels’ (0.7*cm*) signals (Zeff et al., 2007). All channels were then band-pass filtered between 0.01Hz and 0.1Hz using a 3rd order Butterworth filter. Changes in optical density (i.e. Δ*OD*) were calculated using the conversion to log-ratio. Finally, Δ*OD* of finger tapping data were block averaged within the period of −10s to 60s around the task onsets. Note that since sensors were glued with collodion, we observed very minimal motion during the acquisitions.
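Two of the preprocessing steps above, channel rejection based on the coefficient of variation and the log-ratio conversion to Δ*OD*, can be sketched as follows. The function names are hypothetical, the temporal mean is assumed as the reference intensity, and the regression of proximity channels and the band-pass filtering are deliberately left out of this sketch.

```python
import numpy as np

def keep_channels(raw, cv_max=0.08):
    """Boolean mask of channels whose std is at most cv_max of their mean.

    raw: (n_channels, n_samples) raw light intensity.
    """
    cv = raw.std(axis=1) / raw.mean(axis=1)
    return cv <= cv_max

def delta_od(raw):
    """Log-ratio conversion, using the temporal mean as reference (assumption)."""
    return -np.log(raw / raw.mean(axis=1, keepdims=True))
```

A typical use would be `dod = delta_od(raw[keep_channels(raw)])`, so that only the retained channels are converted.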

#### 2.5.5. Realistic Simulation of NIRS Data

To carefully evaluate depth weighted MNE and MEM methods for DOT, we simulated a variety of realistic NIRS data in the channel space, generated by cortical generators with different locations, areas and depths. As presented in Fig.2a, we defined three sets of evenly distributed seeds within the field of view of DOT reconstruction. The locations were selected with respect to the depth relative to the skull, namely we simulated 100 “superficial seeds”, 100 “middle seeds” and 50 “deep seeds”. The cortical regions in which we simulated a hemodynamic response were generated by region growing around seeds, along the cortical surface. To simulate generators with different spatial extents, we considered four levels of neighborhood orders, growing geodesically along the cortical surface, resulting in spatial extents of *Se* = 3, 5, 7, 9 (corresponding to areas of 3 to 40cm^{2}). For simplification, these cortical regions within which a hemodynamic response was simulated will be denoted as ‘generator’ in this paper. For each vertex within a ‘generator’, a canonical Hemodynamic Response Function (HRF) was convolved with the experimental paradigm, which consisted of one block of 20s task surrounded by 60s pre-/post-baseline periods (Fig.2b). Simulated HbO/HbR fluctuations within the theoretical generator (Fig.2c) were then converted to the corresponding absorption changes at the two wavelengths (i.e. 685*nm* and 830*nm*). After applying the forward model matrix *A* in Eq.1, we estimated the simulated, noise free, task induced Δ*OD* in all channels.
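The temporal course of a simulated generator can be sketched by convolving a block paradigm with an HRF. The double-gamma shape parameters and the 10 Hz sampling rate below are assumptions chosen for illustration; the source does not specify which canonical HRF parameters or sampling rate were used.

```python
import math
import numpy as np

def canonical_hrf(t, peak=6.0, undershoot=16.0, ratio=1 / 6):
    """Double-gamma HRF with assumed shape parameters, normalized to a peak of 1."""
    def gamma_pdf(x, shape):
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)
    h = gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)
    return h / h.max()

fs = 10.0                                        # assumed sampling rate (Hz)
t = np.arange(1400) / fs                         # 60 s baseline + 20 s task + 60 s baseline
paradigm = ((t >= 60) & (t < 80)).astype(float)  # one 20 s block
hrf = canonical_hrf(np.arange(300) / fs)         # 30 s HRF kernel
response = np.convolve(paradigm, hrf)[: t.size] / fs
```

The resulting `response` is zero during the pre-task baseline and rises after task onset; scaling it per vertex and per chromophore would give the simulated HbO/HbR fluctuations within a generator.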

Δ*OD* of real resting state data were then used to add realistic fluctuations (noise) to these simulated signals. Over the 10min of recording, we randomly selected 10 baseline epochs of 120s each, free from any motion artifact. Realistic simulations were obtained by adding the average of these 10 real baseline epochs to the theoretical noise-free simulated Δ*OD*, at four SNR levels (i.e. SNR = 5, 3, 2, 1). SNR was calculated from the task-period signal Δ*OD*_{λ}[0, *t*_{1}] relative to the baseline variability *std*(Δ*OD*_{λ}[−*t*_{0}, 0]), where Δ*OD*_{λ}[0, *t*_{1}] is the optical density change of a certain wavelength λ in all channels during the period from 0s to *t*_{1} = 60s, and *std*(Δ*OD*_{λ}[−*t*_{0}, 0]) is the standard deviation of Δ*OD*_{λ} during the baseline period along all channels. Simulated trials for each of the four SNR levels are illustrated in Fig.2d. A total of 4000 realistic simulations were considered for this evaluation study, i.e., 250 (seeds) × 4 (spatial extents) × 4 (SNR levels).
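The simulation design can be enumerated as a simple grid, and a noise scaling step can be sketched as below. The SNR definition used in `add_noise_at_snr` (peak absolute signal divided by the standard deviation of the noise) is an assumption for illustration and may differ from the equation used in the study; the function name is hypothetical.

```python
import itertools
import numpy as np

seeds = range(250)                 # 100 superficial + 100 middle + 50 deep seeds
extents = (3, 5, 7, 9)             # spatial extents Se
snr_levels = (5, 3, 2, 1)
grid = list(itertools.product(seeds, extents, snr_levels))

def add_noise_at_snr(signal, noise, snr):
    """Scale the noise so that max(|signal|) / std(noise) equals snr (assumed definition)."""
    scale = np.max(np.abs(signal)) / (snr * np.std(noise))
    return signal + scale * noise
```

Here `len(grid)` equals 250 × 4 × 4 = 4000, matching the number of simulations reported above.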

#### 2.5.6. Validation metric

Following the validation metrics described in (Grova et al., 2006; Chowdhury et al., 2013, 2016; Hedrich et al., 2017), we applied 4 quantitative metrics to assess the spatial and temporal accuracy of NIRS 3D reconstructions. **Area Under the Receiver Operating Characteristic (ROC) curve (AUC)** was used to assess general reconstruction accuracy considering both sensitivity and specificity. **Minimum geodesic distance (Dmin)** measured the geodesic distance, following the circumvolutions of the cortical surface, from the vertex that exhibited the maximum of reconstructed activity to the border of the ground truth. **Spatial Dispersion (SD)** assessed the spatial spread of the estimated ‘generator’ distribution and the localization error. **Shape error (SE)** evaluated the temporal accuracy of the reconstruction. Further details on the computation of those four validation metrics are reported in Supplementary material S1.
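As an illustration of the first metric, a plain AUC can be computed from a reconstructed map and a binary ground truth using the rank (Mann-Whitney) formulation. This is a generic sketch with a hypothetical function name; the exact AUC procedure of the study is described in Supplementary material S1 and may differ, for instance in how inactive vertices are sampled.

```python
import numpy as np

def auc_score(amplitude, is_active):
    """Probability that an active vertex has a larger amplitude than an inactive one.

    amplitude: (n_vertices,) reconstructed map, is_active: (n_vertices,) boolean ground truth.
    Ties count for one half.
    """
    pos = amplitude[is_active]
    neg = amplitude[~is_active]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

A value of 1 means every vertex of the simulated generator has a larger reconstructed amplitude than every vertex outside it, whereas 0.5 corresponds to chance level.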

## 3. Results

We first investigated the effect of the depth weighting factor *ω* on depth weighted MNE. To do so, we evaluated the spatial and temporal performance of DOT reconstruction and retained the best values of *ω* for depth weighted MNE, *ω* = 0.3 and 0.5, for the subsequent analyses. Please refer to the Supplementary material S2 and Fig.S1 for details. Throughout the quantitative comparisons of methods involving different depth weighting factors *ω* in the results section, the Wilcoxon signed rank test was applied to test the significance of the paired differences for each comparison. For each statistical test, we reported the median value of the paired differences, together with its p-value (Bonferroni corrected). We only show results at 830*nm* for simulations, since we found similar trends for 685*nm* (results not shown).

### 3.1. Evaluation of MEM v.s. MNE using realistic simulations

Comparison of the performance of MEM and MNE on superficial realistic simulations are presented in Table. 1 and Fig.3, for 4 levels of spatial extent (*Se* = 3, 5, 7,9), using boxplot distribution of the 4 validation metrics. We evaluated three depth weighted implementations of MEM, MEM(*ω*_{1} = 0.3, *ω*_{2} = 0.3), MEM(0.3, 0.5) and MEM(0.5, 0.5), as well as two depth weighted implementations of MNE, MNE(0.3) and MNE(0.5).

For spatial accuracy evaluated using Dmin, we obtained median Dmin values of 0mm for all methods, indicating that the peak of the reconstructed map was indeed accurately localized inside the simulated generator.

When considering the spatial extent of the generators using AUC, for focal generators such as *Se* = 3 and 5, we found significantly larger AUC (see Table. 1) for MEM(0.3, 0.3) and MEM(0.3, 0.5) when compared to the most accurate version of MNE, i.e. MNE(0.3). When considering more extended generators, i.e. *Se* = 7 and 9, MEM(0.3, 0.5) and MEM(0.5, 0.5) achieved significantly larger AUC than MNE(0.3). However, the AUC of MNE(0.5) was significantly larger than that of MEM(0.3, 0.3) when *Se* = 7, and significantly larger than those of MEM(0.3, 0.5) and MEM(0.5, 0.5) when *Se* = 9.

In terms of spatial extent of the estimated generator distribution and the localization error, MEM provided significantly smaller SD in all comparisons. Finally, for temporal accuracy of the reconstruction represented by SE, MNE provided significantly lower values than MEM in all comparisons when *Se* = 3, 5, but the differences were small (e.g. 0.01 or 0.02).

Similar comparisons between MEM and MNE were conducted for middle seed and deep seed simulated generators, respectively. Results are reported in supplementary material (Fig.S2 and Table. S1 for middle seeds, Fig.S3 and Table. S2 for deep seeds).

To further illustrate the performance of MEM and MNE as a function of the depth of the generator, we present some reconstruction results in Fig.4. Three generators with a spatial extent of *Se* = 5 were selected for this illustration. They were all located around the right “hand knob” area, and were generated from a superficial, middle and deep seed respectively. The first column in Fig.4 shows the location and the size of the simulated generator, considered as our ground truth. The generator constructed from the superficial seed only covered the corresponding gyrus, whereas the generator constructed from the middle seed included parts of the sulcus and the gyrus. Finally, when considering the deep seed, the simulated generator covered both walls of the sulcus and extended just a little onto both gyri.

For the superficial case, MEM(0.3, 0.3) and MEM(0.3, 0.5) provided similar performances in terms of visualization of the results and quantitative evaluation (*AUC* = 0.96, *Dmin* = 0*mm*, *SD* = 1.94*mm*, 2.15*mm*, *SE* = 0.03). In comparison, MNE(0.3) and MNE(0.5) clearly provided less accurate reconstructions, spreading too much around the true generator, as confirmed by the validation metrics, notably quite large SD values (*AUC* = 0.86, 0.89, *Dmin* = 0*mm*, *SD* = 9.84*mm*, 14.63*mm*, *SE* = 0.02).

When considering the simulation obtained with the middle seed, MEM(0.3, 0.5) accurately retrieved the gyrus part of the generator but missed the sulcus component, since less depth compensation was considered. When increasing depth sensitivity, MEM(0.5, 0.5) clearly outperformed all other methods by retrieving both the gyrus and sulcus aspects of the generator, resulting in the largest *AUC* = 0.98 and the lowest *SD* = 2.93*mm*. MNE(0.3) was also unable to recover the deepest aspects of the generator and, in addition, exhibited a large spread outside the ground truth area, with severe false positives, as suggested by a large *SD* = 9.69*mm*. MNE(0.5) was able to find the main cluster well, but it exhibited the largest spread, *SD* = 10.16*mm*.

When considering the generator obtained from the deep seed, MNE(0.3) only reconstructed part of the gyrus, completely missing the main sulcus aspect of the generator, resulting in a low AUC of 0.57 and a large SD of 10.34*mm*. MEM(0.3, 0.5) was not able to recover the deepest aspects of the sulcus, but accurately reconstructed the sulcal walls, resulting in an AUC of 0.89 and an SD of 2.71*mm*. MEM(0.5, 0.5) recovered the deep simulated generator very accurately, as demonstrated by the excellent scores (*AUC* = 0.97, *SD* = 2.11*mm*) when compared to MNE(0.5). For those three simulations, all methods recovered the underlying time course of the activity with similar accuracy (i.e. similar SE values). In supplementary material, we added Video.1, illustrating the behavior of all the simulations and all methods, following the same layout as Fig.4.

### 3.2. Effects of depth weighting on the reconstructed generator as a function of the depth and size of the simulated generators

To summarize the effects of depth weighting in 3D NIRS reconstructions, we further investigated the validation metrics AUC, SD and SE as a function of the depth and size of the simulated generators. Dmin was not included because the previous results showed no clear differences among methods across all simulation parameters. In the top row of Fig.5, 250 generators created from all 250 seeds with a spatial extent of *Se* = 5 were selected to demonstrate the performance of different versions of depth weighting as a function of the average depth of the generator. In the bottom row of Fig.5, we used 400 generators constructed from all 100 superficial seeds with 4 different spatial extents, *Se* = 3, 5, 7, 9, to illustrate the performance of different versions of depth weighting as a function of the size of the generator. According to AUC, depth weighting was indeed necessary for all methods when the generator moved to deeper regions (e.g. > 2*cm*) as well as when the size was larger than 20*cm*^{2}. Moreover, any version of MEM always exhibited clearly fewer false positives, as indicated by SD values, than all MNE versions, whatever the depth or the size of the underlying generator. We found no clear trend or difference in temporal accuracy (i.e. SE) among methods when reconstructing generators of different depths and sizes.

### 3.3. Robustness of 3D reconstructions to the noise level

Whereas all previous results were obtained from simulations with an SNR of 5, in this section we compared the effect of the SNR level (Fig.6) on depth weighted versions of MNE and MEM, for superficial seeds only and generators of spatial extent *Se* = 5. We only compared MEM(0.3, 0.5) and MNE(0.5), since previous results showed that these two methods overall exhibited the best performances in this condition. Regarding Dmin, paired differences were not significant, but MNE exhibited more Dmin values above 0mm than MEM at all SNR levels, suggesting that MNE regularly missed the source, whereas MEM always located the maximum of activity within the simulated generator. Regarding AUC, MEM(0.3, 0.5) provided values higher than 0.8 at all SNR levels, whereas MNE(0.5) failed to accurately recover the generator for *SNR* = 1. Besides, in Table.2, we found that the difference in AUC between MEM and MNE increased when the SNR level decreased, suggesting that MEM is more robust at low SNR. The difference in SD also increased when the SNR level decreased. Moreover, MEM exhibited stable SD values, except at *SNR* = 1. Finally, for both methods, decreasing SNR levels resulted in less accurate time course estimation (SE increased), slightly more so for MEM than for MNE.

### 3.4. Evaluation of MEM and MNE on real NIRS data

For all finger tapping NIRS data considered in our evaluation (1 subject with the double density montage, 5 subjects with optimal montage), the two wavelengths (i.e. 685*nm* and 830*nm*) were reconstructed first and then converted to HbO/HbR concentration changes along the cortical surface using specific absorption coefficients. All the processes from NIRS preprocessing to 3D reconstruction were completed in Brainstorm (Tadel et al., 2011) using the NIRSTORM plugin developed by our team (https://github.com/Nirstorm). For the full double density montage, reconstructed HbR amplitudes were reversed to positive phase and normalized to their own global maximum, to facilitate comparisons. In Fig.7.a, we show the reconstructed HbR maps at the peak of the time course (i.e. 31s) of MEM and MNE for the 4 depth weighted versions previously evaluated, i.e. MEM(0.3, 0.3), MEM(0.3, 0.5), MNE(0.3) and MNE(0.5). The two depth weighted versions of MEM clearly localized the “hand knob” region well, while exhibiting very few false positives in its surroundings. On the other hand, both depth weighted versions of MNE clearly overestimated the size of the hand knob region and also showed some distant, possibly spurious activity. The Z-map obtained during the corresponding fMRI task is presented in Fig.7.b, after projection of the volume Z-map onto the cortical surface. Fig.7.c shows the time courses within the black patch representing the “hand knob”. Each line represents the reconstructed time course at one vertex of the hand knob region, and the amplitudes were normalized by the peak value within the whole region.
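The conversion from two reconstructed absorption changes to HbO/HbR concentration changes amounts to solving, at each vertex and time sample, a 2 × 2 linear system of the form Δμ_a(λ) = ε_HbO(λ)·ΔHbO + ε_HbR(λ)·ΔHbR. The following sketch illustrates this step under stated assumptions: the function name is ours, and the extinction coefficients are placeholder numbers in arbitrary units chosen only to make the example run; they are not tabulated values and not those used in NIRSTORM.

```python
def absorption_to_hb(dmua_685, dmua_830, eps):
    """Solve  dmua(l) = e_HbO(l) * dHbO + e_HbR(l) * dHbR  for two wavelengths.

    eps maps each wavelength (nm) to a pair (e_HbO, e_HbR).
    Cramer's rule is used because the system is only 2 x 2.
    """
    (a, b), (c, d) = eps[685], eps[830]
    det = a * d - b * c
    if det == 0:
        raise ValueError("extinction coefficients are linearly dependent")
    d_hbo = (dmua_685 * d - b * dmua_830) / det
    d_hbr = (a * dmua_830 - c * dmua_685) / det
    return d_hbo, d_hbr

# Placeholder coefficients (arbitrary units), NOT tabulated values.
eps_placeholder = {685: (0.3, 2.0), 830: (1.0, 0.8)}

# Round trip on toy values: forward model, then inversion.
true_hbo, true_hbr = 1.0, -0.3
dmua_685 = eps_placeholder[685][0] * true_hbo + eps_placeholder[685][1] * true_hbr
dmua_830 = eps_placeholder[830][0] * true_hbo + eps_placeholder[830][1] * true_hbr
est_hbo, est_hbr = absorption_to_hb(dmua_685, dmua_830, eps_placeholder)
print(est_hbo, est_hbr)
```

Because both wavelengths enter each estimate, noise in either reconstruction propagates to both HbO and HbR.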

Results obtained on 5 subjects for acquisitions involving personalized optimal NIRS montages and corresponding DOT reconstructions are presented in Fig.8. For every subject, fMRI Z-maps are presented along the left hemisphere only and thresholded at *Z* > 3.1 (*p* < 0.001, Bonferroni corrected). The most significant fMRI cluster along M1 and S1 was delineated using a black profile. Reconstruction maps at the corresponding HbO/HbR peak timings are shown in the middle of each subject panel. MEM provided accurate HbO and HbR reconstructions in spatial agreement with the main fMRI cluster, whereas there was almost no overlap between the MNE reconstruction and the fMRI main cluster for subjects 1, 2 and 3. For the other two subjects, both methods provided a good level of spatial overlap with the fMRI main cluster; MEM provided more focal results, and MNE reported more false positives outside the presumed activation region (fMRI cluster). Finally, averaged reconstructed time courses within the fMRI main cluster region are shown with the standard deviation as error bars. Similarly to the simulation results, MEM exhibited time course estimations very similar to those of MNE in all cases. Considering that the task duration was 10s, the reconstructed peak timing of HbO/HbR appeared accurately within the range of 10s to 20s.

## 4. Discussion

### 4.1. Spatial accuracy of 3D NIRS reconstruction using MEM

In the present study, we first adapted the MEM framework to the context of 3D NIRS reconstruction and extensively validated its performance. The spatial performance of reconstructions can be considered in two aspects: 1) correctly localizing the main cluster of the reconstructed map close enough to the ground truth area, and 2) accurately recovering the spatial extent of the generator. According to our comprehensive evaluations of the proposed depth-weighted implementations of MEM and MNE, accurate localization was overall not difficult to achieve for our realistic NIRS simulations and SNR levels, as demonstrated by the Dmin results. Almost all methods provided a median Dmin of 0*mm* in all simulation conditions, except at the lowest SNR level (*SNR* = 1). On the other hand, we believe that recovering the actual spatial extent of the underlying generator is the most challenging task in NIRS reconstruction, especially since we know from the fMRI literature that the expected hemodynamic response elicited by a task is usually relatively extended. Looking at the results of MNE on both realistic simulations and real finger tapping tasks, either from visual inspection (e.g. Fig.4, Fig.7 and Fig.8) or quantitative evaluation by SD (e.g. Fig.3, Table.1 and supplementary section S2), we found that MNE maps usually largely overestimated the size of the underlying generator. MEM was specifically developed, in the context of EEG/MEG source imaging, as a method able to recover the spatial extent of the underlying generators, which has been shown not to be the case for MNE-based approaches (Chowdhury et al., 2013, 2016; Grova et al., 2016; Hedrich et al., 2017; Pellegrino et al., 2020). This important property was again successfully demonstrated in our results on NIRS reconstructions. MEM provided accurate spatial extent estimations, whether evaluated visually on the reconstructed maps or with the AUC and SD metrics, across different sizes and depths of the simulated generators and for real data during finger tapping tasks.

### 4.2. Importance of depth weighting in 3D NIRS reconstruction

Biophysical models of light diffusion in living tissue clearly show that, at all source-detector separations, light sensitivity decreases exponentially with depth (Strangman et al., 2013). The general solution to compensate for this loss of sensitivity in DOT reconstruction is to introduce depth weighting during the reconstruction. In this study, we carefully investigated the impact of depth weighting on DOT reconstruction, as a function of the location and the spatial extent of the underlying generators. To our understanding, the weighting parameter *ω* plays a role in tuning the “effective field of view (FOV)” for reconstructions, which means that for smaller *ω* values, MNE will be “blind” to the deep regions even if these regions are inside or close to the true generator. Indeed, our results show that depth weighting values like *ω* = 0.0 and 0.1 were so small that they restricted the “effective FOV” to superficial regions such as the gyral crown (as suggested by low AUC values). On the other hand, higher *ω* values like 0.7 and 0.9 biased the reconstruction too much towards deep generators and consequently, the most superficial aspects of the underlying generators were not recovered. According to our detailed evaluation of MNE (Fig.S1), *ω* = 0.3 and 0.5 seemed to be good candidates offering the best trade-off. However, MNE(0.5) reported higher spatial dispersion than MNE(0.3). Depth weighting was also important when recovering more extended generators (> 20*cm*^{2}, Fig.5), for both MNE and MEM, since those extended generators covered both superficial and deep regions.
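The role of *ω* as a tuning knob for the “effective FOV” can be illustrated with a toy calculation. The sketch below assumes a common EEG/MEG-style form of depth weighting, in which the prior variance of each vertex is the squared norm of its forward-model column raised to the power −*ω*; this may differ in detail from Eq.3. The exponential sensitivity profile, its decay constant and the two depths are hypothetical values chosen only for illustration.

```python
import math

def depth_weights(column_norms, omega):
    """Assumed prior variance per vertex: (||a_i||^2) ** (-omega).

    omega = 0 gives uniform weights (no compensation).
    """
    return [(n * n) ** (-omega) for n in column_norms]

def compensated_norms(column_norms, omega):
    """Column norms after re-weighting: ||a_i|| * sqrt(w_i) = ||a_i|| ** (1 - omega)."""
    weights = depth_weights(column_norms, omega)
    return [n * math.sqrt(w) for n, w in zip(column_norms, weights)]

def deep_to_superficial_ratio(omega, depths_cm=(1.0, 3.0), decay_cm=0.5):
    """Compensated sensitivity of a deep vertex relative to a superficial one."""
    # Toy sensitivity profile: exponential decay with depth (hypothetical constants).
    norms = [math.exp(-d / decay_cm) for d in depths_cm]
    superficial, deep = compensated_norms(norms, omega)
    return deep / superficial

for omega in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9):
    ratio = deep_to_superficial_ratio(omega)
    print(f"omega = {omega:.1f}  deep/superficial sensitivity = {ratio:.3f}")
```

In this toy setting the ratio grows monotonically with *ω* and approaches 1 as *ω* approaches 1, which is one way to read the trade-off described above: small values leave deep vertices almost invisible, whereas large values treat deep and superficial vertices nearly equally.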

### 4.3. Implementation of depth weighting within the MEM framework

In this study, we propose for the first time a depth weighting strategy within the MEM framework, by introducing two parameters: *ω*_{1}, which scales the source covariance matrix, and *ω*_{2}, which tunes the initialization of the reference for MEM. When compared to depth weighted MNE, the MEM framework proposed here for NIRS reconstruction demonstrated its ability to reconstruct focal generators at different depths, as well as larger generators, with better accuracy and fewer false positives (as illustrated in Fig.5). When considering deeper focal generators (*depth* > 2*cm*), MEM(0.5, 0.5) clearly outperformed all other methods (see AUC and SD values in Fig.5). In terms of temporal accuracy, MNE and MEM provided an overall similar level of accuracy, which is an important result given that MEM is a nonlinear operator of the time courses. In summary, for a large range of depths and spatial extents of the underlying generators, MEM methods always exhibited accurate results (large AUC values) and fewer false positives (lower SD values) when compared to MNE methods.

In practice, we would suggest considering either *ω*_{2} = 0.3 or 0.5 for the initialization of MEM in all cases and only tuning *ω*_{1} according to the expected amount of compensation. This is due to the fact that MNE(0.3 or 0.5) provided a generally good reconstruction with a larger true positive rate in most scenarios, therefore providing MEM with an accurate reference model (*dv*(*x*)) to start with. Even when considering the most focal simulated generators (*Se* = 3; see Fig.3, Table.1 and Fig.5), MEM(0.3, 0.3) and MEM(0.3, 0.5) actually exhibited very similar performances. Our suggestion for tuning the *ω*_{1} and *ω*_{2} parameters was further confirmed by the results obtained from real data. For both montages, MEM(0.3, 0.3) resulted in excellent spatial agreement with fMRI Z-maps.

### 4.4. Implementation of depth weighting comparing to other similar approaches

Note that depth weighted strategy was originally introduced in DOT by (Culver et al., 2003) and had also been considered in other DOT studies either using MNE (Zeff et al., 2007; Dehghani et al., 2009; White et al., 2009; Eggebrecht et al., 2012, 2014) or a hierarchical Bayesian DOT algorithm (Yamashita et al., 2016). A spatially-variant regularization parameter *β* was applied to the regularization matrix. Different values of *β* were reported in these studies. For instance, *β* = 0.1 was used in (Zeff et al., 2007; Dehghani et al., 2009; Eggebrecht et al., 2014) whereas (Eggebrecht et al., 2012) used *β* = 0.01. Finally, 2.3 × 10^{4} and 2.5 × 10^{4} were considered for motor task and resting-state in (Yamashita et al., 2016) to control the minimum sensitivity for depth compensation to be the average sensitivity of around 2*cm* depth from the scalp. We introduced the depth weighting parameter *ω* which mapped the amount of compensation from 0 to 1 (as described in Eq.3). This is also a standard procedure introduced in EEG/MEG source localization studies (Fuchs et al., 1999; Lin et al., 2006).

### 4.5. Temporal accuracy of 3D NIRS reconstruction using MEM

Another important contribution of this study is that we improved the temporal accuracy of the time courses estimated within the MEM framework, resulting in temporal accuracy similar to that of MNE. For instance, the largest significant SE difference between MEM and MNE was only 0.02 for *Se* = 3 and 0.01 for *Se* = 5. Corresponding time course estimations are also reported for MEM and MNE in real data (Fig.7 and Fig.8), suggesting again very similar performances. For instance, the SE between MEM and MNE HbO time courses was estimated as 0.02 for *Sub*05 in Fig.8, suggesting only a small difference during the undershoot of the response (22s to 30s). Moreover, we found no significant SE differences between MEM and MNE for more extended generators (*Se* = 7, 9). These findings are important considering that MNE is just a linear projection; therefore the shape of the reconstruction directly depends on the averaged signal at the channel level. On the other hand, MEM is a nonlinear technique, applied at every time sample, and not optimized for the estimation of the resulting time courses.

### 4.6. SNR robustness of the reconstruction performance using MEM

To further investigate the effects of SNR on both reconstruction methods, we performed the comparisons along 4 different SNR levels, i.e. *SNR* = 1, 2, 3, 5. As shown in Fig.6 and Table.2, we found that MEM was more robust than MNE when dealing with simulated signals at lower SNR values. This is actually a very important result since when reconstructing HbO/HbR responses, one has to consider at least two ΔOD of two different wavelengths (e.g. 685*nm* and 830*nm*). For the simulation results, we reported reconstruction results obtained from 830*nm* data, whereas when considering real data (Fig.7 and Fig.8), we had to convert the reconstruction absorption changes at 685*nm* and 830*nm* into HbO/HbR concentration changes. Therefore, our final results were influenced by the SNR of all involved wavelengths.

There was also SNR variability between subjects, as shown in Fig.8. With a good SNR level in *Sub*05, both MEM and MNE could reconstruct the main cluster of the activation, but MNE provided many more false positive activations outside the ROI. When considering relatively lower SNR cases, e.g. *Sub*02 and *Sub*03, MEM recovered an activation similar to the fMRI map. In those cases, MNE not only reported suspicious activation patterns but also failed to correctly reconstruct the peak amplitude inside the presumed ROI. Our results suggesting MEM robustness in low SNR conditions for DOT are aligned with similar findings reported for EEG/MEG source imaging, when considering source localization of single trial data (Chowdhury et al., 2018).

### 4.7. Comprehensive evaluation and comparison of the reconstruction performance using MEM and MNE

To perform a detailed evaluation of our proposed NIRS reconstructions methods, we developed a fully controlled simulation environment, similar to the one proposed by our team to validate EEG/MEG source localization methods (Chowdhury et al., 2013, 2016; Hedrich et al., 2017). Indeed such environment provided us access to a ground truth, which is not possible when considering real NIRS data set. Previous studies validated tomography results (Eggebrecht et al., 2014; Yamashita et al., 2016) by comparing with fMRI activation map which can indeed be considered as a ground truth, but only for well controlled and reliable paradigms. Since fMRI also measures a signal of hemodynamic origin, it is reasonable to check the concordance between fMRI results and DOT reconstructions. Therefore, as preliminary illustrations, we also compared our MEM and MNE results to fMRI Z-maps obtained during finger tapping tasks on 6 healthy participants, suggesting qualitatively excellent performances of MEM when compared to MNE. Further quantitative comparison between fMRI and NIRS 3D reconstruction, was out of the scope of this paper and will be considered in future studies.

### 4.8 Sampling size of NIRS reconstructions

As opposed to several other NIRS tomography studies that reconstruct NIRS responses within a 3D volume space, here we proposed to use the mid-cortical surface as an anatomical constraint to guide DOT reconstruction. However, the maximum spatial resolution of our surface-based reconstruction was similar to that of the volume-based one. Indeed, DOT reconstruction within a volume space usually down-sampled light sensitivity maps to either 2 × 2 × 2 mm^{3} (Eggebrecht et al., 2014), 3 × 3 × 3 mm^{3} (Eggebrecht et al., 2012) or 4 × 4 × 4 mm^{3} (Yamashita et al., 2016) matrices, resulting in down-sampled voxel volumes ranging from 8mm^{3} to 64mm^{3}. In our case, when projecting from volume space into cortical surface space, a unique set of voxels was assigned to each vertex along the cortical surface according to the Voronoi based projection method (Grova et al., 2006). Considering the mid-surface resolution (i.e. 25,000 vertices) used in this study, the average volume of a Voronoi cell was 25mm^{3}, which falls within the same volume range. Therefore we can conclude that both volume-based and surface-based NIRS reconstructions as implemented here would result in similar sampling of the reconstruction space.
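The volume comparison above is simple arithmetic and can be checked in a few lines. The snippet below only restates the voxel sizes and the average Voronoi cell volume quoted in this section; the side length of a cube with the same volume as the Voronoi cell is added by us as an intuitive point of comparison.

```python
# Isotropic voxel side lengths (mm) quoted for the volume-based studies.
voxel_side_mm = {
    "Eggebrecht et al., 2014": 2,
    "Eggebrecht et al., 2012": 3,
    "Yamashita et al., 2016": 4,
}
voxel_volume_mm3 = {study: side ** 3 for study, side in voxel_side_mm.items()}

# Average Voronoi cell volume reported for the 25,000-vertex mid-surface.
voronoi_cell_mm3 = 25

lowest = min(voxel_volume_mm3.values())
highest = max(voxel_volume_mm3.values())
within_range = lowest <= voronoi_cell_mm3 <= highest

# Side of a cube having the same volume as the average Voronoi cell.
equivalent_side_mm = voronoi_cell_mm3 ** (1 / 3)

for study, volume in voxel_volume_mm3.items():
    print(f"{study}: {volume} mm^3")
print(f"Voronoi cell within [{lowest}, {highest}] mm^3: {within_range}")
print(f"equivalent cube side: {equivalent_side_mm:.2f} mm")
```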

### 4.9 NIRS montage for 3D reconstructions

In studies such as (Zeff et al., 2007; White and Culver, 2010; Zhan et al., 2012; Eggebrecht et al., 2012, 2014), a high density montage was considered, which was shown to provide higher spatial resolution and robustness to low SNR conditions (White and Culver, 2010). In the present study, we first considered a full double density (DD) montage, as proposed in (Kawaguchi et al., 2007), to generate realistic simulations and then to analyze finger tapping results on real data acquired on one subject. The DD montage has been used in several inverse modelling studies such as (Kawaguchi et al., 2004; Sakakibara et al., 2016; Machado et al., 2018). We therefore considered it important to use the full double density montage in this study for validating and comparing methods. Besides, we also illustrated, in 5 other subjects, MEM performance on real data sets acquired with optimal montages, exhibiting a large amount of local spatial overlap between channels. In this case, probe design was optimized to maximize the sensitivity to the hand knob ROI, while also ensuring sufficient spatial overlap between sensors (e.g. at least 13 detectors had to construct channels with each of the three sources, and the channel distance ranged from 2*cm* to 4.5*cm*, see Fig.8a). We have previously demonstrated in (Machado et al., 2018) that even if high density montages can be considered as a gold standard for DOT reconstruction, personalized optimal montages (Machado et al., 2014, 2018) have the ability to deliver accurate reconstructions along the cortical surface. Finally, evaluating the performance of MEM with high density NIRS montages would be of great interest but was out of the scope of the present study.

### 4.10 Availability of the proposed MEM framework

Several software packages have been proposed to provide NIRS reconstruction pipelines, for instance NeuroDOT (Eggebrecht et al., 2014, 2019), AtlasViewer (Aasted et al., 2015) and NIRS-SPM (Ye et al., 2009). To ensure easy access to our MEM methodology for the community, we developed and released a NIRS processing toolbox – NIRSTORM (https://github.com/Nirstorm), as a plugin of the Brainstorm software (Tadel et al., 2011), which is a renowned software package dedicated to EEG/MEG analysis and source imaging. NIRSTORM offers standard preprocessing, analysis and visualization, as well as more advanced features such as optimal montage design, access to forward model estimation using MCXlab (Fang and Boas, 2009; Yu et al., 2018), and the MNE and MEM implementations considered in this study.

### 4.11 Limitations and Perspectives

Tremblay et al. (2018) comprehensively compared a variety of NIRS reconstruction methods using a large number of realistic simulations. Since introducing MEM was the main goal of this study, we did not consider such a wide range of methodological comparisons. We decided to carefully compare MEM with MNE since MNE remains the main method considered for DOT, and is available in several software packages. As suggested in (Tremblay et al., 2018), DOT reconstruction methods based on Tikhonov regularization, such as least square regularization in MNE, usually allow great sensitivity but perform poorly in terms of spatial extent, usually largely overestimating the size of the underlying generator. On the other hand, L1-based regularization (Süzen et al., 2010; Okawa et al., 2011; Kavuri et al., 2012; Prakash et al., 2014) could achieve more focal solutions with high specificity but much lower sensitivity. As shown in our results, the proposed MEM framework allows reaching good sensitivity and accurate reconstruction of the spatial extent of the underlying generator. Bayesian model averaging (BMA), originally proposed for EEG source imaging by (Trujillo-Barreto et al., 2004), allows accurate DOT reconstructions with fewer false positives when compared to MNE. Similarly, we carefully compared MEM to Bayesian multiple priors approaches in (Chowdhury et al., 2013) in the context of MEG source imaging. Such a comparison of more advanced DOT reconstruction methods, including the one proposed by (Yamashita et al., 2016), would be of great interest but was out of the scope of this study.

Moreover, some studies (Eggebrecht et al., 2012; Yamashita et al., 2016) proposed to constrain the reconstruction space to both scalp and cortex when trying to disentangle the brain NIRS response from superficial layer signals (skin and muscle layers). These methods extended the linear model into a linear combination of cortex sources and scalp sources with their corresponding forward models. Since this is the first implementation of the MEM framework for NIRS, we did not consider this procedure in order to focus our implementation on standard cortical reconstructions. However, we did apply short distance channel regression at the sensor level in this study to remove the physiological noise measured from the scalp (Zeff et al., 2007). In the future, we will investigate the performance of MEM when reconstructing NIRS signals of both cortical and superficial layer origins.

Overall, one advantage of the MEM framework is its flexibility. Since the core structure of the MEM framework is to provide a unique reconstruction map by maximizing the entropy relative to a reference source distribution, one could implement one's own reference for a specific usage. For instance, as we did in this paper, the reference distribution considered the depth weighted MNE solution and spatial smoothing. Note that in this study we applied MEM independently to the two wavelengths and then calculated HbO/HbR concentration changes after reconstruction, whereas one could directly solve for HbO/HbR concentration changes within the reconstruction. Such a procedure has been suggested by (Li et al., 2004), by incorporating signals from the two wavelengths within the same DOT reconstruction model. In the future, the MEM framework would allow such a fusion model to be implemented easily, as suggested by (Chowdhury et al., 2015) in the context of MEG/EEG fusion algorithms. Since our MEM-based EEG/MEG fusion allows reaching more reliable source imaging results (Chowdhury et al., 2018), we will consider such an approach to directly estimate HbO/HbR fluctuations from the signals of the two wavelengths.

Additionally, there are other extensions of MEM framework such as wavelet based MEM (i.e.wMEM) which could reconstruct specific frequency bands (Lina et al., 2014). Implementing wMEM localization for NIRS, would allow us to target specifically the DOT reconstruction of oscillatory/rhythmic NIRS activities in different frequency bands.

Finally, considering that the main contribution of this study was to introduce the MEM framework for 3D NIRS reconstruction, we decided to first carefully evaluate the performance of MEM using well controlled realistic simulations. We therefore included only a few real data set reconstructions to illustrate the performance of the MEM reconstruction, whereas quantitative evaluation of MEM reconstructions on a larger database, involving more subjects, will be considered in our future investigations.

## 5. Conclusion

In the present paper, we proposed a new NIRS reconstruction method – Maximum Entropy on the Mean (MEM). We first implemented depth weighting in the MEM framework and improved its temporal accuracy. To carefully validate the method, we applied a large number (*n* = 4000) of realistic simulations with various spatial extents and depths. We also evaluated the robustness of the method when dealing with low SNR signals. The comparison of the proposed method with the widely used depth weighted MNE was performed using four different quantitative validation metrics.

We showed that the MEM framework could provide more accurate and robust reconstruction results, relatively stable for a large range of spatial extents, depths and SNRs of the underlying generator. Moreover, we implemented the proposed method in a new NIRS processing plugin – NIRSTORM in the Brainstorm software, to give users access to the method for applications, validations and comparisons.

## Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant Program (CG and JML) and an operating grant from the Canadian Institutes for Health Research (CIHR MOP 133619 (CG)). NIRS equipment was acquired using grants from NSERC Research Tools and Instrumentation Program and the Canadian Foundation for Innovation (CG). ZC is funded by the Fonds de recherche du Québec – Santé (FRQS) Doctoral Training Scholarship and the PERFORM Graduate Scholarship in Preventive Health Research. GP is funded by Strauss Canada Foundation.

## Appendix A. Supplementary material

Supplementary material associated with this article can be found at the end of this manuscript.

## Supplementary material

### S1. Validation metrics

Here is a detailed description of the four validation metrics considered in our evaluation. Except for the shape error (SE), all metrics were calculated at the time instant *τ* when the simulated Δ*OD* time course reached its peak value (e.g. 12.2s after onset).

**Area Under the Receiver Operating Characteristic (ROC) curve (AUC)** was used to assess the general detection accuracy of the reconstruction methods. We used the specific version of AUC proposed in (Grova et al., 2006), in order not to bias results towards false positives.

**Minimum geodesic distance (Dmin)** was defined as the geodesic distance, following the circumvolutions of the cortical surface, from the vertex exhibiting the maximum reconstructed activity to the border of the simulated ‘generator’. Dmin is 0 when the peak of the reconstruction map is located inside the simulated cortical region.
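As an illustration of the Dmin definition, the following minimal sketch approximates the geodesic distance by the shortest path along the edges of a triangulated surface, using Dijkstra's algorithm. This is an assumption for illustration only: the function names, the mesh representation (vertex coordinates plus triangle faces), the use of the absolute amplitude to find the peak, and the edge-path approximation are not taken from the original implementation. Edge paths slightly overestimate the true geodesic distance on a surface.

```python
import heapq
import math


def build_edge_graph(positions, faces):
    """Weighted adjacency map of a triangular mesh (hypothetical helper).

    positions: list of (x, y, z) vertex coordinates.
    faces: list of (a, b, c) vertex-index triplets.
    Edge weights are Euclidean edge lengths.
    """
    graph = {i: {} for i in range(len(positions))}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            w = math.dist(positions[u], positions[v])
            graph[u][v] = w
            graph[v][u] = w
    return graph


def dmin(positions, faces, amplitudes, generator):
    """Edge-path distance from the peak vertex to the nearest generator vertex.

    amplitudes: reconstructed amplitude of each vertex at time tau.
    generator: indices of the vertices inside the simulated generator.
    Returns 0.0 when the peak lies inside the generator.
    """
    targets = set(generator)
    # Assumption: the peak is the vertex with the largest absolute amplitude.
    peak = max(range(len(amplitudes)), key=lambda i: abs(amplitudes[i]))
    graph = build_edge_graph(positions, faces)

    dist = {peak: 0.0}
    heap = [(0.0, peak)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        if u in targets:
            return d  # first generator vertex popped is the closest one
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # generator not reachable from the peak
```

Because Dijkstra's algorithm pops vertices in order of increasing distance, the search can stop at the first generator vertex it reaches, which keeps the cost low when the peak is close to the generator.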

**Spatial Dispersion (SD)** assessed both the spatial spread of the estimated ‘generator’ distribution and the localization error, using Eq.12. The ideal value (i.e. SD = 0) is achieved when no activation is reconstructed outside the theoretical ‘generator’. The larger the SD, the more spatially spread the reconstructed map.

$$SD = \sqrt{\frac{\sum_{i=1}^{K} \min_{j\in\Theta}\left(D^{2}(i,j)\right)\,\hat{X}^{2}(i,\tau)}{\sum_{i=1}^{K} \hat{X}^{2}(i,\tau)}} \tag{12}$$

where *min*_{j∈Θ}(*D*^{2}(*i,j*)) is the minimum squared Euclidean distance between the vertex *i* and any vertex *j* located inside the simulated ‘generator’ (Θ), $\hat{X}^{2}(i,\tau)$ is the power of the amplitude of the reconstructed time course on vertex *i* at time *τ*, and *K* is the total number of vertices within the reconstruction field of view.
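The SD described above can be read as a power-weighted root mean of the minimum squared distances to the generator. The sketch below computes it under that reading; the function name and the list-based inputs are hypothetical and are not taken from the original implementation.

```python
import math


def spatial_dispersion(positions, amplitudes, generator):
    """Power-weighted spatial dispersion of a reconstructed map.

    positions: list of (x, y, z) coordinates of the K vertices in the
        reconstruction field of view.
    amplitudes: reconstructed amplitude of each vertex at time tau.
    generator: indices of the vertices inside the simulated generator.
    """
    gen = list(generator)
    if not gen:
        raise ValueError("generator must contain at least one vertex")

    weighted_sum = 0.0
    total_power = 0.0
    for p, a in zip(positions, amplitudes):
        # Minimum squared Euclidean distance from this vertex to the generator
        # (0 for vertices that belong to the generator).
        d2 = min(
            sum((pc - qc) ** 2 for pc, qc in zip(p, positions[j])) for j in gen
        )
        power = a * a
        weighted_sum += d2 * power
        total_power += power

    if total_power == 0.0:
        raise ValueError("reconstructed map has zero power")
    return math.sqrt(weighted_sum / total_power)
```

For example, a map with all of its power inside the generator gives SD = 0, while a map with all of its power on a single vertex located 2 distance units from the generator gives SD = 2.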

**Shape error (SE)** evaluated the temporal accuracy of the reconstruction. Reconstructed time courses within the simulated ‘generator’ were averaged and normalized. The root mean square of the difference between this time course and the normalized theoretical time course was denoted as SE (Eq.13), as introduced in (Chowdhury et al., 2013):

$$SE = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(X_{th}(t) - \bar{X}_{rec}(t)\right)^{2}} \tag{13}$$

where *T* is the length of the time course, *X_{th}*(*t*) is the normalized theoretical time course of the simulation, and $\bar{X}_{rec}(t)$ is the normalized average of the reconstructed time courses within the ‘generator’.
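The SE computation can be sketched as follows. The normalization is not specified in this section, so the sketch assumes that each time course is divided by its peak absolute amplitude. That choice, and the function names, are illustrative assumptions rather than the original implementation.

```python
import math


def normalize(time_course):
    """Scale a time course by its peak absolute amplitude (assumed convention)."""
    peak = max(abs(v) for v in time_course)
    if peak == 0.0:
        raise ValueError("cannot normalize an all-zero time course")
    return [v / peak for v in time_course]


def shape_error(reconstructed, theoretical):
    """Root mean square difference between normalized time courses.

    reconstructed: list of time courses (one per generator vertex),
        each of length T.
    theoretical: simulated time course of length T.
    """
    n_samples = len(theoretical)
    n_vertices = len(reconstructed)
    # Average the reconstructed time courses over the generator vertices.
    mean_tc = [
        sum(tc[t] for tc in reconstructed) / n_vertices for t in range(n_samples)
    ]
    rec = normalize(mean_tc)
    th = normalize(theoretical)
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(th, rec)) / n_samples)
```

With this convention, a reconstruction whose average time course is a scaled copy of the theoretical one yields SE = 0, so the metric is sensitive to shape and not to amplitude.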

### S2. Effects of depth weighting on MNE

We first investigated the effect of the depth weighting factor *ω* on depth weighted MNE. To do so, we evaluated the spatial and temporal performance of DOT reconstruction. As presented in Fig.S1, we compared depth weighted MNE using *ω* = 0, 0.1, 0.3, 0.5, 0.7 and 0.9 for superficial seeds. Overall, *ω* = 0.3 and 0.5 provided the most accurate results (i.e. median AUC > 0.8 and *Dmin* = 0 mm). For focal generators (i.e. *Se* = 3, 5), *ω* = 0.3 performed better than *ω* = 0.5, since it provided significantly lower SD. However, for extended generators (i.e. *Se* = 7, 9), reconstructions with *ω* = 0.5 were more accurate, showing significantly positive AUC differences (0.05 and 0.08, *p* < 0.001) and significantly positive SD differences (2.24 and 2.06, *p* < 0.001). *ω* = 0 and 0.1 only provided AUC higher than 0.8 for *Se* = 3, whereas *ω* = 0.7 and 0.9 failed in all cases, with significantly larger Dmin (median values around 2-3 cm). Based on these results, we considered only *ω* = 0.3 and 0.5 for depth weighted MNE in the comparisons with MEM.

### S3. MEM vs. MNE with realistic simulations involving middle and deep seeds

Fig.S2 and Table.S1 show the comparison of MEM and MNE for middle seeds. First, we found that more depth compensation was required to provide good reconstructions in all scenarios. Thus, MEM(0.5, 0.5) was compared to the best-performing MNE, MNE(0.5). No significant AUC or Dmin differences were found between them. However, MEM(0.5, 0.5) provided significantly lower SD than MNE(0.5), with median SD differences of −5.33, −4.80, −5.00 and −4.95 (*p* < 0.001) for *Se* = 3, 5, 7, 9 respectively. Fig.S3 and Table.S2 present the same comparison for deep seeds. Similarly, no significant AUC or Dmin differences were found, and MEM(0.5, 0.5) provided significantly lower SD than MNE(0.5), with median SD differences of −6.39, −6.33, −6.97 and −5.52 (*p* < 0.001) for *Se* = 3, 5, 7, 9 respectively. Regarding temporal performance in these two cases, similar to Fig.3, MNE(0.5) gave significantly lower SE than MEM (−0.01 or −0.02, *p* < 0.001) when *Se* = 3, 5, although the difference was small. No significant SE difference was found for *Se* = 7, 9.