A Framework to Incorporate D-trace Loss into Compositional Data Analysis

The development of high-throughput sequencing technologies for 16S rRNA gene profiling provides higher quality compositional data for microbe communities. Inferring the direct interaction network under a specific condition and understanding how the network structure changes between two different environmental or genetic conditions are two important topics in biological studies. However, the compositional nature and high dimensionality of the data are challenging in the context of network and differential network recovery. To address this problem in the present paper, we proposed a framework to incorporate the data transformations developed for compositional data analysis into D-trace loss for network and differential network estimation, respectively. The sparse matrix estimators are defined as the minimizer of the corresponding lasso penalized loss. This framework is characterized by its straightforward application based on the ADMM algorithm for numerical solution. Simulations show that the proposed method outperforms other state-of-the-art methods in network and differential network inference under different scenarios. Finally, as an illustration, our method is applied to a mouse skin microbiome data. Author summary Inferring the direct interactions among microbes and how these interactions change under different conditions are important to understand community-wide dynamics. The compositional nature and high dimensionality are two distinctive features of microbial data, which invalidate traditional correlation analysis and challenge interaction network estimation. In this study, we set up a framework that combines data transformation with D-trace loss to infer the direct interaction network and differential network from compositional data. Simulations and real data analysis show that our proposed methods lead to results with higher accuracy and stability.

hierarchical model called MInt. Kurtzet al. [18] proposed a method called SPIEC-EASI, 34 which combines centered log-ratio (clr) transformation [1] for compositional data with 35 the neighborhood selection approach [20] or graphical lasso [13] to estimate the 36 precision matrix. Similar to the idea of Yuan and Lin [29], Fang et al. [10] first derived 37 likelihood with compositional data for Gaussian graphical models and then estimated 38 the precision matrix with a lasso penalized maximum likelihood method called gCoda. 39 In this paper, we proposed a framework to incorporate clr transformation [1] into 40 D-trace loss [30] to estimate the precision matrix from compositional data. 41 Biological networks often vary according to different environmental or genetic 42 conditions [3]. Understanding how networks change and estimating differential networks 43 are important tasks in biological studies. In recent years, researchers have actively 44 sought methods of estimating differential networks for absolute abundance data. 45 Chiquet et al. [6], Guo et al. [16] and Danaher et al. [7] estimated the precision matrices 46 and their differences jointly by penalizing the joint log-likelihood with different 47 penalties. Zhao et al. [31] developed a 1 -minimization method for direct estimation of 48 differential networks, which does not require sparsity of precision matrices or their 49 separate estimation. Yuan et al. [28] proposed a new loss function called DTL based on 50 D-trace loss [30] to estimate the precision matrix difference directly. In this paper, we 51 also extended our framework to incorporate clr transformation [1] into DTL [28] to 52 estimate the differential network from compositional data. 53 The remainder of the paper is organized as follows. In Section 2, we introduce our 54 framework to incorporate clr transformations for compositional data analysis into 55 D-trace loss, thereby enabling us to estimate both direct interaction network and 56 differential direct interaction networks from compositional data, respectively. In 57 Section 3, the performance of our method was evaluated and compared with other 58 state-of-the-art methods under various simulation scenarios. In Section 4, the proposed 59 November 1, 2018 2/14 methods are illustrated with an application to a mouse skin microbiome data. 60 2 Materials and methods 61

62
We begin with some notations and definitions for convenience. For a p × p matrix 63 X = (X ij ) ∈ R p×p , its transposition, trace and determinant are denoted as X T , tr(X) 64 and det X, respectively. Let X F = ( i,j X 2 ij ) 1/2 , X ∞ = max i j |X ij |, 65 |X| 1 = max j i |X ij |, X 1 = i,j |X ij |, and X 1,off = i =j |X ij | be the Frobenius 66 norm, ∞-norm, 1-norm, 1 -norm and off-diagonal 1 -norm of X. Denote by vec(X) the 67 p 2 -vector from stacking the columns of X, and X 0 means that X is positive definite. 68 For two matrices X, Y ∈ R p×p , let X ⊗ Y be the Kronecker product of X and Y . We 69 use X, Y to denote tr(XY T ) throughout this paper.

70
Suppose that there are p microbe species and that their absolute abundances are 71 z = (z 1 , z 2 , . . . , z p ) respectively. However, instead of absolute abundances, it is often the 72 case that only the relative abundances (or compositions)  Log-ratios ln xi ln xj [1] are commonly used in compositional data analysis, since the 81 simple equality ln xi ln xj = ln zi ln zj serves as the bridge between the compositions and the 82 unobserved absolute abundances [18]. Aitchison [1] also proposed a statistically 83 equivalent centered log-ratio (clr) transformation. The clr transformation matrix is 84 G = I − 1 p 1 p 1 T p , where 1 p is a p-dimensional all-ones vector and I is identity matrix.

85
Applying the clr transformation and using ln x = ln z − 1 p ln s and G1 p = 0 p , it follows 86 that Denoted by Σ ln x the covariance matrix of the log-transformed relative abundances, we 88 have Similarly, equation (2) and (3) establish a bridge between the observed relative 90 abundances and the unobserved absolute abundances. SPIEC-EASI [18] assumes that

91
GΣ ln x G serves as a good approximation of Σ since G − I ≈ 0 when p 0, and apply 92 the neighborhood selection approach [20] or graphical lasso [13] to the clr-transformed 93 relative abundances for precision matrix estimation. From the empirical loss minimization perspective, SPIEC-EASI is not the most natural 96 and concise because of the approximation and the log-determinant term in graphical 97 lasso [13]. In this section, we introduce an innovative framework to estimate the direct 98 interaction network from compositional data with D-trace loss. The new D-trace loss for 99 compositional data (CDTr loss) is proposed as We can view the CDTr loss as an analogue of the D-trace [30]   It is easy to check that CDTr loss can be written as To ensure that Σ −1 minimizes L CD , namely Σ 1/2 GΘ − Σ −1/2 G = 0 when Θ = Σ −1 , we 112 need the following exchangeable condition: Denote by σ ij and ρ ij the covariance and correlation between ln z i and ln z j , 114 respectively. Then, the exchangeable condition is equivalent to l σ il = l σ jl for all 115 i, j = 1, 2, . . . , p, which is similar to the assumption l =i σ il = 0, i = 1, 2, . . . , p in 116 SparCC [14]. If the variances σ ii , i = 1, 2, . . . , p are all the same, then the exchangeable 117 condition simplifies to l =i ρ il , i = 1, 2, . . . , p are all the same, which implies that the 118 average correlation with other species is nearly the same for each specie. Analogously, 119 the assumption in SparCC simplifies to l =i ρ il = 0, i = 1, 2, . . . , p, which implies that 120 the average correlations are very small. In the numerical experiments of section 3, we 121 show that CDTr still performs well, even when the exchangeable condition does not 122 hold.

123
In practical applications, we use the empirical version of CDTr loss as Since most species do not interact directly when the number of species p is large, we 125 further assume that the direct interaction network, or Θ, is sparse, which also helps to 126 solve the under-determined problem caused by compositionality and 127 dimensionality [9,10,29]. We employ the commonly used 1 penalty [25,29,30] to handle 128 the sparse assumption, and our sparse estimator of the precision matrix Θ is proposed as 129 where λ ≥ 0 is the tuning parameter for the tradeoff between the model fitting and the 130 sparsity ofΘ D . Following the idea of Zhao et al. [31], the tuning parameter is selected 131 by minimizing the Bayesian Information Criterion (BIC) [22] as where |Θ| 0 is the number of non-zero elements in the upper-triangle of Θ, and n is the 133 sample size. 134 Zhang and Zou [30] developed an efficient algorithm based on alternating direction 135 methods [21] for the solution of penalized D-trace loss estimator. We can simply replace 136 Σ and I in D-trace loss with GΣ ln x G and G in our CDTr loss and use the algorithm of 137 Zhang and Zhou [30] for the numerical solution of (8). Following the idea of Zhang and 138 Zhou [30] and Scheinberg et al. [21], we introduce two new matrices, Θ 0 and Θ 1 . The 139 augmented Lagrangian function of (8) are considered, and Λ 0 , Λ 1 , ρ are Lagrangian

150
A straightforward approach to estimate ∆ is to estimate Σ −1 and Σ * −1 separately 151 and then subtract the estimates under the key assumption that both precision matrices 152 are sparse. However, a more reasonable assumption is that the difference between the 153 precision matrices are sparse, not that both matrices are sparse, since direct interactions 154 may not be sparse while the changes under different conditions are often sparse [28].

155
Therefore, we proposed a new loss function for differential network estimation with 156 compositional data (DCDTr loss) to estimate ∆ directly, under the assumption that the 157 differential network ∆ is sparse. The DCDTr loss is proposed as another perspective, we can arrive at our DCDTr loss (10) by substituting the 164 approximation Σ ≈ GΣ ln x G, Σ * ≈ GΣ ln x * G into DTL loss. In the numerical 165 experiments of section 3, we also investigated the performance of procedures which 166 combine the approximation Σ ≈ GΣ ln x G, Σ * ≈ GΣ ln x * G with other methods for 167 differential network estimation, including the 1 -minimization method [31] for direct 168 estimation of differential networks and joint graphical lasso (FGL, GGL) [7] for joint 169 estimation of precision matrices. The detailed formulas are left in ??.

170
Under the exchangeable condition GΣ = ΣG and GΣ * = Σ * G, it is easy to check Obviously, ∆ = Σ * −1 − Σ −1 is a minimizer of our DCDTr loss L DCDT r . In practical 173 applications, we incorporate the finite sample estimators of Σ, Σ * and 1 penalty into 174 DCDTr loss, and our sparse estimator for the differential network ∆ is proposed as The tuning parameter λ is selected by minimizing the Bayesian Information Criterion 176 (BIC) [22,28,31] as where |∆| 0 is the number of non-zero elements in the upper-triangle of ∆, and n and n * 178 are the sample size.

179
Taking advantage of the algorithm developed by Yuan et al. [28] for the numerical 180 solution of lasso penalized DTL loss estimator, the algorithm for the numerical solution 181 of (12) is straightforward, essentially because we can simply replaceΣ andΣ * in DTL 182 loss with GΣ ln x G and GΣ ln x * G in our DCDTr loss. Following the idea of Yuan et 183 al. [28], we introduce three new matrices ∆ 1,2,3 and Lagrangian multipliers Λ 1,2,3 , ρ for 184 the solution of (12). The steps of the ADMM algorithm for the lasso penalized DCDTr 185 loss estimator are presented in Algorithm 2. The definitions of matrix operators K(X) 186 and S(X) are listed in S1 Appendix. (c) ∆ k+1 3 ) and Λ k+1

197
To investigate the performance of CDTr loss and the influence of the exchangeable 198 condition, we considered the following network structures for Θ. 199 1. Band graph: otherwise.  SPIEC(GL) [18]. We further consider an approximation method called aCDTr, which 217 approximates Σ with GΣ ln x G [18] and employs D-trace loss to estimate Θ = Σ −1 .

218
Specifically, the estimator of aCDTr is The true positive rate and true negative rate are evaluated at different tuning 220 parameters and used to generate the receiver operating characteristic (ROC) curve. We 221 use the area under the curve (AUC) to quantify the ability to recover the true 222 underlying network.

223
In Table 2, we present the mean AUC scores of the above-mentioned methods under 224 different settings. The mean AUC scores of CDTr and aCDTr are superior to the other 225 three methods in all cases, even when the exchangeable condition does not hold exactly, 226 which implies that CDTr and aCDTr outperform other methods in direct interaction 227 network recovery. Moreover, the mean AUC of CDTr is slightly higher than that of 228 aCDTr, except for the cluster graph and sample size n = 50. With increasing deviation, 229 the performance of CDTr and aCDTr decreases, which is reasonable if the exchangeable 230 condition does not exactly hold. Interestingly, the performance for the other three 231 methods also decreases with increasing deviation. For all network structures and 232 methods, the mean AUC scores increase as the sample size increases.   4. Scale-free graph: A scale-free graph is produced, following the B-A algorithm [4].

243
The initial graph has two connected nodes, and each new node is connected to only 244 one node in the existing graph with the probability proportional to the degree of the 245 each node in the existing graph. This results in p edges in the graph, and the  Table 3.

265
We investigate the performance of DCDTr loss with some experiments in this section.

266
The first precision matrix Θ is generated as follows:   Table 5. Therefor, 289 the differential matrix ∆ is Θ * − Θ. The two precision matrices Θ and Θ * are used to 290 generate data separately.   Table 6 presents the mean AUC scores of different methods under different settings. 296 We see that no method is generally better than the others in all cases. DCDTr performs 297 better than other methods in random graph, neighbor graph and block graph, while 298 GGL achieves higher AUC in scale-free and hub graph. With the increase of sample size, 299 the advantage of DCDTr becomes increasingly significant. Generally speaking, our  In this section, we illustrate our proposed method with an application to mouse skin 303 microbiome data [23]. A total of 261 mice were divided into 3 groups: 78  OTU counts by 0.5 to avoid zero counts and normalized the data to compositional data. 311 Since the the underlying true direct interaction networks were not available and the 312 accuracy of estimated networks was unobtainable, we evaluated the performance of the 313 proposed methods with reproducibility as suggested by Fang et al. [10] and Kurtz et 314 al. [18] suggusted. More specifically, we first constructed a reference network est 1 315 (precision matrix or differential matrix) with all data for each group and method. We 316 then selected half of the samples randomly to estimate the precision matrix or 317 differential matrix (denoted by est 2 ) again. The reproducibility was measured by the 318 fraction of overlapping edges shared by est 1 and est 2 in the reference network est 1 .

319
For each group and each method of precision matrix estimation, the procedure 320 stated above was repeated 20 times. The mean reproducibility is summarized in Table 7. 321 CDTr and aCDTr outperformed the other three methods in terms of reproducibility in 322 all three groups, implying that CDTr and aCDTr are more stable and accurate in direct 323 interaction estimation. We also estimated the differential network for the 324 Control-Healthy and Control-EBA groups, and the evaluation procedure was also 325 repeated 20 times. The mean reproducibility is listed in   Finally, we employed all methods to build a candidate microbiome association 329 network from the unified dataset for each group and group pairs. In Fig ??, we present 330 the number of shared edges for direct interaction networks recovered from various 331 methods via Venn diagrams. We can see that the direct interaction network from CDTr 332 is close to that of aCDTr, while the network from SPIEC(GL) and SPIEC(MB) are 333 more similar. A total of 28, 46 and 24 edges are shared by all candidate networks for 334 control, healthy and EBA groups, respectively, comprising the core interaction network 335 among OTUs. Moreover, almost all direct interactions discovered by CDTr and aCDTr 336 are in this core interaction network, while SPIEC(GL), SPIEC(MB) and gCoda discover 337 some eccentric interactions. The number of shared edges for differential networks are 338 shown in Fig ??. The situation for differential networks is much more complicated. 339 1 -M discovered many eccentric differential edges in both groups, but these were not 340 confirmed by other methods. The differential edges from GGL and FGL are almost the 341 same for both groups, and are more than the edges from DCDTr. Most differential 342 edges from DCDTr were verified by both GGL and FGL for both groups, implying that 343 DCDTr is good at inferring the crucial differential edges without mixing nonessential 344 edges. 345

346
Inferring the direct interactions among microbial species and understanding how the 347 network structure changes are important in the study of ecology and medicine. In this 348 paper, we propose a framework to estimate the direct interaction network and