Optimizing network propagation for multi-omics data integration

Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, protein function prediction, inferring condition-specifically altered sub-networks, and prioritizing disease genes. Despite the popularity of network propagation, there is a lack of comparative analyses of different algorithms on real data and little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for the identification of optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a ‘topology bias’ caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that minimizing the bias-variance tradeoff can be utilized for selecting optimal parameters. The application to real multi-omics data demonstrated that optimal parameters could also be obtained by either maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we exemplified the utility and robustness of network propagation on multi-omics datasets for identifying ageing-associated genes in brain and liver tissues of rats and for elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and it presents strategies for how to use network propagation algorithms to optimally address a specific research question at hand.


INTRODUCTION
For formal reasons, the optimal choice of methods and parameters often depends on the research question and the available data. Despite the recent popularity of network propagation and the many published applications, it is incompletely understood how the choice of the network normalization method, the propagation algorithm, and the tuning parameters influences the result, and there is a need for guidance on how to optimize the analysis scheme.
In order to address these questions, we focus on two of the most popular network propagation algorithms: RWR and HD. For the algorithm description, we assume that the vector F0 contains the initial node scores that are to be propagated, such as fold changes of transcripts or phenotypes measured per gene. These node scores are 'mapped' onto a pre-defined network, which may or may not be weighted. This network is represented by a (commonly normalized) matrix W.
Random Walk on (unweighted) graphs is an iterative procedure in which a walker starts from some node and at each time step transitions randomly to one of the neighboring nodes of the current node. In a weighted network, the transition probability to adjacent nodes is proportional to the edge weights. The 'time' that the walker spends on each node determines the 'sharing' of information with that node. A problem of the plain Random Walk is that it does not converge to a solution that reflects the input data (i.e. the vector F0); instead, the final solution depends solely on the network topology. Random Walk with Restart is different in this respect: it is a variant of Random Walk with the option to transition back to the starting node at every iteration with restart probability (1 − α) (0 ≤ α ≤ 1). In the network propagation framework, the restart probability determines the magnitude of signal retained at the query nodes, i.e. it can be interpreted as a dampening factor on long walks. Thus, a small α will keep the node scores close to the initial scores, whereas a large α will average the scores of connected nodes more strongly. RWR is an iterative procedure: the node scores are propagated on the network until they converge (Vanunu et al. 2010), assuming that the convergence conditions are satisfied. Node scores at the i-th iteration are updated according to

Fi = α W Fi−1 + (1 − α) F0

where α is the spreading coefficient representing the fraction of spread, W is a matrix representing the normalized network, F0 is the vector containing the initial node scores, and Fi is the vector of propagated scores. Convergence is assessed by calculating the norm of Fi − Fi−1, and the iteration is stopped when this value falls below a certain threshold (e.g. below 10^−6) (Hofree et al. 2013). If F∞ denotes the limit of the sequence {Fi}, then

F∞ = (1 − α)(I − α W)^−1 F0

where I is the identity matrix.
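The update rule and its closed-form limit can be sketched in a few lines of Python (a minimal sketch; the function names and the toy matrix below are ours, not from the study):

```python
import numpy as np

def rwr(W, F0, alpha=0.5, tol=1e-6, max_iter=10000):
    """Iterate F_i = alpha*W @ F_{i-1} + (1-alpha)*F0 until the norm of
    successive iterates drops below `tol`. W is assumed to be normalized
    such that alpha*W has spectral radius < 1 (e.g. a row-stochastic W)."""
    F = F0.copy()
    for _ in range(max_iter):
        F_new = alpha * (W @ F) + (1 - alpha) * F0
        if np.linalg.norm(F_new - F) < tol:
            return F_new
        F = F_new
    return F

def rwr_limit(W, F0, alpha=0.5):
    """Closed-form limit F_inf = (1-alpha) * (I - alpha*W)^{-1} @ F0."""
    n = W.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * W, F0)
```

On small toy networks the iterative and closed-form solutions agree to within the convergence tolerance.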
Heat Diffusion (HD; also called the diffusion kernel) is a continuous-time analogue of RWR and models a fluid flow over a fixed amount of time, defined by a time parameter t that controls the spreading of signal (Cowen et al. 2017). The amount of 'fluid' that ends up at the network nodes is computed according to

Ft = e^{tW} F0

where t ∈ [0, ∞) is the diffusion time, a tunable parameter, and Ft is the vector of propagated node scores. When t is set to zero, the node scores are not propagated to adjacent nodes (i.e. Ft = F0), and as t approaches infinity the solution depends exclusively on W, i.e. the network topology, and the information on the initial node scores is lost (Hancock, Wilson, and Bai 2005).
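In the same spirit, HD reduces to a matrix exponential; a minimal sketch (the three-node path graph and function name are illustrative only, and W is assumed to be a Laplacian-type matrix with zero row sums, e.g. W = A − D):

```python
import numpy as np
from scipy.linalg import expm

def heat_diffusion(W, F0, t=0.5):
    """F_t = expm(t*W) @ F0. With zero row sums in W, t = 0 leaves F0
    unchanged, while large t drives all scores toward a common value."""
    return expm(t * W) @ F0

# Path graph on three nodes; all initial signal on the first node.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
W = A - np.diag(A.sum(axis=1))
F0 = np.array([3.0, 0.0, 0.0])
print(heat_diffusion(W, F0, t=0.0))   # unchanged: [3. 0. 0.]
print(heat_diffusion(W, F0, t=50.0))  # nearly uniform: ~[1. 1. 1.]
```

Because W = A − D is symmetric with zero row (and column) sums, the total signal is conserved during diffusion, so the large-t limit is the mean of F0 at every node.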
In this study we have systematically analyzed the performance of these two algorithms using Rattus norvegicus ageing mRNA and protein abundance data from two different metabolically active tissues: brain and liver at two age groups (Ori et al. 2015) as well as mRNA and protein data from prostate cancer patients of different grade groups (Charmpi et al. 2020). We focused on two questions: (1) how does the graph normalization (i.e. the computation of W) determine the influence of network topology on the final result? (2) Which strategies can be used for finding optimal choices for the smoothing parameters α and t?
Firstly, we are demonstrating that an inappropriately chosen graph normalization method can lead to a topology bias, i.e. a biased increase or reduction of node scores that is exclusively due to the network structure and independent of the input vector F0. Secondly, we used the mean squared error and its decomposition into bias and variance as a first criterion to select optimal parameters. Thirdly, transcriptome replicates were utilized to illustrate the effect of changing the spreading parameters in order to increase the consistency between replicate measurements. Fourthly, we employ RWR and HD to increase the consistency between transcriptomics and proteomics data. These analyses underlined again the differences and similarities between the methods and emphasized the importance of choosing optimal spreading parameters. Finally, we used network propagation to impute missing values and uncover ageing-associated genes as well as to identify sub-networks distinguishing prostate tumors of different grades.
A network of p nodes can be represented by its adjacency matrix A, whose entry aij contains the weight of the edge between nodes i and j (aij ∈ {0, 1} if the network is unweighted). In case of undirected networks A is symmetric. Network propagation algorithms do not usually operate directly on A (i.e. setting W = A), because this would strongly bias the results in favor of nodes with many neighbors and because convergence cannot be guaranteed for all values of the smoothing parameter. Different graph normalization methods have been proposed; here we focus on three popular forms: the Laplacian transformation, the normalized Laplacian, and the degree-normalized adjacency matrix (Table 1). The Laplacian is defined by

WL = A − D

where D is a diagonal matrix containing the node degrees, such that the entry dii is the sum of the i-th row of A. The eigenvalues of the matrix WL are not bounded by 1 in absolute value, and therefore RWR is not guaranteed to converge for all 0 ≤ α ≤ 1 (Chung 1996; Ma, Chen, and Sun 2014). Thus, this matrix is only used for HD and is not further considered for RWR.
The Laplacian WL accounts for node degrees only on its diagonal, whereas the actual edge weights (off-diagonal elements) remain unchanged. The normalized Laplacian W̃ also normalizes the off-diagonal elements using the degrees of the interacting nodes i and j:

W̃ = D^{−1/2} A D^{−1/2} − I

For RWR, we additionally set the diagonal of W̃ to zero (see Material and Methods):

W̃RWR = W̃ + I = D^{−1/2} A D^{−1/2}

where I is the identity matrix.
Whereas the Laplacian is based on subtracting the degree matrix D, the degree row-normalized adjacency matrix is obtained by multiplying the inverse of D with A. We normalize the matrix row-wise (Yoon, Jin, and Kang 2018; Lovász 1993; Sonnenberg 2009) and define the degree row-normalized adjacency matrix as

Wdeg = D^{−1} A

Using the degree row-normalized adjacency matrix with HD leads to an exponential increase of the propagated node scores by the factor e^t. To avoid numerical instabilities for very large t, the identity matrix I was subtracted from the degree row-normalized adjacency matrix to ensure that the row sums are equal to zero:

Wdeg,HD = Wdeg − I

The adjacency matrix can also be normalized column-wise (Cowen et al. 2017).
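The three normalizations can be written compactly as follows (a sketch under the sign conventions above; the function names are ours):

```python
import numpy as np

def laplacian(A):
    """W_L = A - D (used for HD only); row sums are zero."""
    return A - np.diag(A.sum(axis=1))

def normalized_laplacian(A):
    """W~ = D^{-1/2} A D^{-1/2} - I; shown in the text to induce a topology bias."""
    d = A.sum(axis=1)
    return A / np.sqrt(np.outer(d, d)) - np.eye(len(A))

def degree_row_normalized(A, for_hd=False):
    """W_deg = D^{-1} A; for HD, I is subtracted so that row sums are zero."""
    W = A / A.sum(axis=1, keepdims=True)
    return W - np.eye(len(A)) if for_hd else W
```

The row-sum properties (zero for the HD variants, one for the row-stochastic RWR variant) are what the convergence arguments above rely on.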
To illustrate the effect of different graph normalization methods, we used an undirected network for the rat Rattus norvegicus, because the same network was subsequently used to investigate the effect of the smoothing parameters. We retrieved the network from the protein interaction database STRING version 10.5 (Szklarczyk et al. 2015) and filtered for high confidence edges (probability of an interaction ≥ 0.9) and the largest connected network component (9,388 nodes). However, the conclusions in this section regarding the choice of W are independent of any particular network.
The core idea of network propagation is that regions in a network enriched with high-scoring nodes get 'amplified', whereas regions with many scores close to or varying around zero get 'dampened'.
Thus, the result should depend both on the structure of the network (its topology) and on the initial, non-random distribution of node scores. The network topology alone, i.e. independent of the node scores, should not lead to increased node scores after network propagation. However, inappropriately chosen graph normalization methods can lead to a topology bias. To demonstrate this effect, we computed for an undirected network the propagated results for an input vector of ones, i.e. setting all entries of F0 to 1. Without a topology bias, all nodes should have identical scores after network propagation, because there is no initial 'clustering' of high-scoring nodes in any sub-network. Note that this is equivalent to the expected outcome for an infinite number of random input vectors with mean 1.
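This check is easy to reproduce on a toy network. In the sketch below (the star graph and variable names are illustrative), the unit vector stays flat under the degree row-normalized adjacency matrix but not under the symmetrically normalized matrix used for RWR:

```python
import numpy as np

# Star graph: node 0 is a hub connected to three leaves.
A = np.zeros((4, 4))
A[0, 1:] = A[1:, 0] = 1.0
ones = np.ones(4)

def rwr_limit(W, F0, alpha=0.5):
    """Closed-form RWR limit (1-alpha)(I - alpha*W)^{-1} F0."""
    return (1 - alpha) * np.linalg.solve(np.eye(len(W)) - alpha * W, F0)

# Degree row-normalized adjacency: no topology bias, the unit vector is a fixed point.
W_row = A / A.sum(axis=1, keepdims=True)
print(rwr_limit(W_row, ones))   # all four scores equal 1

# Symmetrically normalized adjacency D^{-1/2} A D^{-1/2} (the normalized-Laplacian
# variant used for RWR): the hub ends up with a higher score than the leaves.
d = A.sum(axis=1)
W_sym = A / np.sqrt(np.outer(d, d))
print(rwr_limit(W_sym, ones))   # hub score > leaf scores
```

Since W_row is row-stochastic, W_row applied to the unit vector returns the unit vector, so every power series term of the closed form leaves it unchanged; the symmetric normalization lacks this property.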
The normalized Laplacian leads to a topology bias for both RWR and HD, i.e. hub nodes get higher scores than non-hub nodes (Fig. 1). However, using either the Laplacian (HD only) or the row-normalized adjacency matrix (HD & RWR) prevents a topology bias, i.e. after propagating the unit vector all nodes have identical scores (see Material and Methods; Table 1). Since the column-normalized adjacency matrix (Cowen et al. 2017) also leads to a topology bias (the limiting vector being proportional to the vector of node degrees, for instance, for RWR and α = 1), we decided to use the row-normalized adjacency matrix in this analysis. All subsequent analyses employed the degree row-normalized adjacency matrix in order to avoid any topology bias. With this choice of graph normalization, each input vector will converge to a vector with the same value for all elements as α approaches one and t goes to infinity; in other words, there will be no variability between the nodes for α = 1 and t large enough. More details regarding the presence or absence of a topology bias using different graph normalization and propagation approaches are described in the section Material and Methods.

Using network propagation to maximize within-dataset consistency

Spreading coefficients are tunable parameters of HD and RWR, and they control the amount and distance of signal spread on the interaction network. For RWR, the parameter α defines the fraction of node scores that is propagated to adjacent nodes at each iteration, and it ranges between 0 and 1. Likewise, in HD, the parameter t corresponds to the amount of diffusion and ranges from 0 to infinity. Whereas previous studies have often used arbitrarily chosen spreading parameters, we wanted to know to what extent final results depend on the chosen spreading parameter, and we wanted to test optimization criteria for selecting them. If the network topology is informative of gene-gene relationships, i.e.
if it relates to the covariance structure in the data, one would expect that network propagation would drive noisy measurements closer to their (true) population means.
Here, we demonstrate that this notion can be used to find optimal choices for the spreading parameters. If network propagation enhances true biological signal in the data, it should improve the consistency between replicate measurements. However, excessive 'smoothing' might yield node scores that are similar between replicates but independent of the input data, depending solely on the network structure. We demonstrate the utility of this idea by comparing individual replicate measurements to the mean of all replicates (before propagation), thus assuming that the average over all replicates is closer to the true population mean.
The spreading parameters determine how close the final solution is to the initial vector F0 versus all node scores converging to a common value. It has therefore been suggested that network propagation can be understood as a regularization approach (Zhou et al. 2003). Here we show that network propagation entails a bias-variance tradeoff, using the following intuition: smoothing/'averaging' the data tends to make it less noisy, but this information sharing also introduces biases (on the mean values). In other words, the propagated scores/values (e.g. log2 fold changes) of the genes will vary less across the replicates/samples, but will be distributed around a mean (the propagated mean) that differs from the true mean. More rigorously, let F0 = (F01, F02, …, F0p)ᵀ be the random vector of p node scores (e.g. fold changes) with population mean vector μ, and let S denote the linear smoothing operator applied by the propagation, i.e. S = (1 − α)(I − αW)^−1 for RWR and S = e^{tW} for HD, so that the propagated vector is SF0. Using the propagated vector SF0 as estimate for μ, the mean squared error (MSE) of gene j is

MSEj = E[((SF0)j − μj)^2] = E[((SF0)j − E[(SF0)j])^2] + (E[(SF0)j] − μj)^2

This formula corresponds to the classical bias-variance decomposition of the MSE, where the first term corresponds to the variance, the second to the bias^2, and the cross (error) term is equal to zero since E[(SF0)j − E[(SF0)j]] = 0. Computing an average MSE over the genes we get

MSE = (1/p) Σj MSEj

where p is the number of nodes (i.e. genes or proteins); the first term then corresponds to an average variance and the second to an average bias^2. When α = t = 0, the matrix S is the identity matrix, so the average bias^2 is equal to zero, but it is expected to increase with increasing spreading parameters, since the propagation (with the row-normalized adjacency matrix) eventually 'forces' all genes to a common value. On the other hand, the average variance between replicates is expected to decrease with more aggressive smoothing, since smoothed node scores assume a weighted average (SF0) using information from more and more genes (i.e. the effective sample size is increased and the noise terms tend to cancel each other out). Here, we propose to choose α or t such that they minimize the average MSE, i.e. minimize the sum of bias^2 and variance.
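The decomposition can be verified numerically. In the sketch below (the simulated ring network, noise level, and variable names are our own, not the rat data), the average MSE equals variance plus bias^2 exactly when the expectation is estimated from the replicate mean, the bias^2 vanishes at α = 0, and the variance shrinks with stronger smoothing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ring network with degree row-normalized adjacency.
n, n_rep = 30, 3
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)

signal = rng.normal(size=n)                       # unknown 'true' scores
reps = signal + rng.normal(scale=0.5, size=(n_rep, n))
mu_hat = reps.mean(axis=0)                        # replicate mean as stand-in for mu

def smoother(W, alpha):
    """S = (1-alpha)(I - alpha*W)^{-1}: the linear operator applied by RWR."""
    return (1 - alpha) * np.linalg.inv(np.eye(len(W)) - alpha * W)

for alpha in (0.0, 0.3, 0.6, 0.9):
    S = smoother(W, alpha)
    smoothed = reps @ S.T                         # propagate each replicate
    var = ((smoothed - S @ mu_hat) ** 2).mean()   # average variance
    bias2 = ((S @ mu_hat - mu_hat) ** 2).mean()   # average squared bias
    mse = ((smoothed - mu_hat) ** 2).mean()       # average MSE
    assert np.isclose(mse, var + bias2)           # exact decomposition
    print(f"alpha={alpha:.1f}  var={var:.4f}  bias2={bias2:.4f}  mse={mse:.4f}")
```

The cross term vanishes exactly here because the mean of the propagated replicates equals the propagated replicate mean, mirroring the estimation scheme described below.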
Likewise, we computed a fold change for each young sample versus the average of all old samples, creating another set of three fold change estimates per gene (mean(old) versus young1, mean(old) versus young2, mean(old) versus young3). Thus, this scheme results in exactly one fold-change estimate per sample. We refrained from directly comparing expression levels (i.e. not computing fold changes) for two reasons: first, comparing absolute expression levels would be dominated by highly expressed genes.
Second, estimating fold changes is more relevant for the majority of research questions.
First, we confirmed empirically that increasingly aggressive network propagation reduces the within-replicate variance (node scores become more similar) and increases the between-gene correlation (see Supplementary Fig. 3).
Next, we asked if network propagation could help to denoise the data. Since we do not know the ground truth fold changes, we used the following notion: we assumed that the fold changes of the averaged replicate measurements are closer to the true fold changes than fold changes resulting from the individual replicates. Hence, we reasoned that if network propagation performed on individual replicates 'moves' fold changes closer to the across-replicate average (before network propagation), this would reflect a noise reduction. In order to test this notion, we performed network propagation on the log2 fold changes of each replicate/tumor sample as well as on the average log2 fold changes, with varying spreading parameters α and t.
Here, we estimated the expectation E[SF0] of the propagated vector SF0 (where S denotes the smoothing operator of RWR or HD and F0 is the vector of fold changes of a replicate/sample) by the propagated average log2 fold changes, and the mean vector μ by the initial average log2 fold changes. Using these estimates we computed the average MSE for each sample in each dataset with varying spreading parameters for HD and RWR. Likewise, we estimated the variance term of the bias-variance decomposition per sample, approximating the expectation E[((SF0)j − E[(SF0)j])^2] with the corresponding replicate value ((SF0)j − (SF̄0)j)^2, where F̄0 denotes the initial average log2 fold changes, and the average bias^2 by computing the squared difference between the propagated average log2 fold changes and the initial average log2 fold changes.
Network propagation could reduce the MSE for the rat brain data (Fig. 2A-B), whereas it did not reduce the MSE of the liver data (Fig. 2C-D). We noted that the initial absolute fold changes in the liver were on average much larger than in the brain (Supplementary Fig. 4), indicating that there was a stronger signal in the liver from the outset. Hence, it is possible that the relatively strong signal in the liver data could not be further improved through network propagation (see Discussion for more details).
In order to further test the idea that minimizing the MSE could be used as an optimization criterion for the spreading parameters, we also tested a dataset of 66 prostate cancer (PCa) samples belonging to four different grade groups (G1, G2, G3, G4/5) of increasing aggressiveness and 39 benign samples from 39 patients (Charmpi et al. 2020). We quantified mRNA and protein log2 fold changes for the tumor samples (versus benign) as described in (Charmpi et al. 2020), as well as average (mRNA and protein) log2 fold changes across the tumor samples. In the PCa cohort, moderate smoothing minimized the MSE on the mRNA layer, while more aggressive smoothing was required for the protein data (Fig. 2E-H). Next, we performed the same analysis individually for each tumor grade group, as opposed to combining all tumors in one analysis. The amount of network smoothing required to minimize the MSEs in this case was almost the same as for the global analysis (Supplementary Fig. 5).
Subsequently, we tested an alternative way to quantify the inter-replicate consistency, using the correlation between replicate fold change estimates and average fold changes. The average inter-replicate fold change correlations of the rat transcriptomes before smoothing were 0.693 and 0.877 for brain and liver, respectively. In the case of the brain samples, network propagation could improve the inter-replicate consistency, at least when using HD (Fig. 3A-B). However, for the liver the already high consistency could only marginally be increased by network propagation (Fig. 3C-D), an observation consistent with the MSE-based analysis. Importantly, more aggressive smoothing with α and t greater than the optimum first reduced the inter-replicate consistency, suggesting that biological signal was diminished. Even further smoothing then led to larger correlation scores again. However, this 'over smoothing' yields node scores that eventually converge to a common value (see also Material and Methods). An important feature of the analysis scheme presented here is that it provides guidance on how to prevent 'over smoothing': spreading parameters should be selected either at the minimum of the bias-variance tradeoff (i.e. minimizing the MSE) or well before the first dip of the inter-replicate correlation. Importantly, the optimal smoothing parameters resulting from the two criteria were different. Thus, the choice of the optimization criterion will influence the outcome.
Some of the patients in the PCa cohort (27 out of 39) had two tumor regions analyzed (TA1 and TA2).
Although prostate cancer tumors can be spatially heterogeneous, we had previously shown that the intra-patient variation is on average smaller than the inter-patient variation (Charmpi et al. 2020). Hence, within limits, the two samples from the same tumor can be treated like replicate measurements. We therefore tested whether network propagation would further increase the intra-patient similarity. For that, we computed the correlation between the propagated mRNA log2 fold changes of TA1 and the corresponding propagated mRNA log2 fold changes of the paired TA2 for each such patient, with the two propagation algorithms and varying spreading parameters. For most patients, the intra-patient similarity increased with network smoothing (Fig. 3E,F). For four patients it decreased with increasing network propagation; however, those cases corresponded mostly to tumors with already low initial intra-patient similarity (Fig. 3E,F). Hence, these patients might correspond to cases with two independent tumors (clones) (Charmpi et al. 2020). Although these observations were supported by both propagation algorithms, a local maximum for the average within-patient similarity was attained only with HD, suggesting that HD might be more appropriate for this dataset.

Network propagation improves between-dataset consistency
As yet another optimization criterion, we used the correlation between mRNA and matching proteomics data. Although protein levels are determined by factors that are independent of their coding mRNA levels (such as variable translation rates or protein turnover), one typically observes a significant correlation between steady-state mRNA fold changes and the respective protein fold changes (Liu, Beyer, and Aebersold 2016). To test the utility of this criterion, we additionally used protein data from the Ori et al. study. In this case, we used average log2 fold changes between old and young rats in the two layers. Protein and mRNA levels were available for 1,772 common genes in brain and 1,670 in liver, which were used for the subsequent analyses. Without network propagation we observed small but significant correlations between mRNA and protein log2 fold changes in both brain and liver (brain: corr = 0.236, p < 2.2×10^-16; liver: corr = 0.308, p < 2.2×10^-16).
Network propagation slightly improved the protein-mRNA correlations in the brain (HD_Cor = 0.244 at t = 0.7, RWR_Cor = 0.245 at α = 0.5; Fig. 4A,B). Here, network propagation was conducted independently on the mRNA and protein data, so it would be possible to use different spreading parameters for the two types of data. However, using different spreading parameters for the two data types did not further increase the protein-mRNA correlation (Supplementary Fig. 2). In the case of the liver data, network propagation did not further increase the protein-mRNA correlation (Fig. 4C,D). This lack of improvement is consistent with the inter-replicate analysis above, which showed that the signal-to-noise ratio was much larger in the liver than in the brain samples (Supplementary Fig. 4). Also, the initial protein-mRNA correlation was higher in the liver than in the brain, suggesting that network propagation had a greater potential for removing noise from the brain data than from the liver data.
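A grid search over the spreading parameter, maximizing the mRNA-protein correlation, can be sketched as follows (the smooth toy signal, ring network, and grid are our own; with real data the two layers would be the measured fold-change vectors):

```python
import numpy as np

rng = np.random.default_rng(1)

# Ring network; a signal that is smooth on the network, observed twice with
# independent noise, as a toy stand-in for matched mRNA and protein fold changes.
n = 40
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)
signal = 2.0 * np.sin(2 * np.pi * np.arange(n) / n)
mrna = signal + rng.normal(size=n)
prot = signal + rng.normal(size=n)

def rwr_limit(W, F0, alpha):
    """Closed-form RWR limit (1-alpha)(I - alpha*W)^{-1} F0."""
    return (1 - alpha) * np.linalg.solve(np.eye(len(W)) - alpha * W, F0)

# Scan alpha (the same value for both layers) and keep the best correlation.
grid = np.linspace(0.0, 0.95, 20)
corrs = [np.corrcoef(rwr_limit(W, mrna, a), rwr_limit(W, prot, a))[0, 1]
         for a in grid]
best = int(np.argmax(corrs))
print(f"correlation {corrs[0]:.2f} without smoothing, "
      f"max {corrs[best]:.2f} at alpha = {grid[best]:.2f}")
```

Since the grid includes α = 0, the selected correlation can never fall below the unsmoothed baseline; on real data the curve's interior maximum (or its absence) is the informative outcome.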
We also computed the correlation between the propagated mRNA and protein log2 fold changes of each PCa tumor sample with the two propagation algorithms for varying spreading parameters. For most of the samples, the between-layer similarity increased with smoothing (Fig. 4E,F). The few cases where the network propagation did not improve the correlation corresponded mostly to tumor samples with low initial correlations (Fig. 4E,F). Despite the differences in the obtained curves, these observations were supported by both propagation algorithms. However, with HD a local maximum for the average between-layer similarity was attained suggesting that HD might be more informative (as before for the within-patient similarity) for this dataset.
Importantly, this analysis again exposed the risk of 'over smoothing': too aggressive network propagation led to equal scores across the nodes at both the mRNA and the protein layer (Fig. 4).

Functional interpretation of network propagation results
One possible application of network propagation is the imputation of missing values (van Dijk et al. 2017). For example, shotgun proteomics measurements often do not quantify all proteins in a sample.
One might use network propagation to impute those missing values by utilizing measured protein levels of neighboring proteins in the network. In order to test this idea and in order to further validate the plausibility of network propagation results, we set initial protein log2 fold changes of 7,149 missing proteins for brain and 7,318 for liver to 0, while using observed log2 fold changes (old versus young) for all other 2,239 proteins for brain and 2,070 for liver. Then, we performed network propagation on the log2 fold changes, assuming that unobserved proteins in network regions with large log2 fold changes will likely also be strongly affected by ageing. Likewise, we performed the same procedure on mRNA log2 fold changes (2,835 genes with missing transcript levels for brain and 3,222 for liver, 6,553 genes for brain and 6,166 for liver with measured transcript levels). To validate the imputed log2 fold changes, we compared log2 fold changes of known ageing-associated genes/proteins (taken from JenAge; http://agefactdb.jenage.de/, on December 4, 2017) to log2 fold changes of random genes/proteins. We expected that ageing-associated genes/proteins should on average have greater absolute log2 fold changes than random genes/proteins. This was indeed the case (Fig. 5A-D), suggesting that network propagation could be used to impute missing values and to identify new candidates of ageing-associated genes and proteins.
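The imputation scheme, zero-filling unobserved nodes and propagating the measured fold changes into them, can be sketched as follows (function and mask names are ours):

```python
import numpy as np

def impute_by_propagation(W, scores, measured, alpha=0.5):
    """Set unmeasured node scores to 0, run RWR (closed form), and return the
    propagated vector; entries at unmeasured nodes are the imputed values."""
    F0 = np.where(measured, scores, 0.0)
    n = len(W)
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * W, F0)

# Path graph a-b-c-d-e; the middle node c is unmeasured but flanked by nodes
# with large fold changes, so it receives a sizeable imputed value.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
W = A / A.sum(axis=1, keepdims=True)
scores = np.array([2.0, 2.0, 0.0, 0.1, 0.1])
measured = np.array([True, True, False, True, True])
imputed = impute_by_propagation(W, scores, measured)
print(imputed[2])   # > 0: inherited from the high-scoring neighbors
```

Nodes in network regions with large measured fold changes thus receive large imputed values, which is exactly the assumption used for the ageing analysis above.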
To further test the utility of network propagation for revealing molecular mechanisms, we performed differential expression (DE) analysis on the mRNA layer, contrasting highly aggressive versus less aggressive PCa tumors for varying spreading parameters. Network propagation was performed for each sample independently. Thus, if network propagation were merely 'spreading noise' within the samples, we would not expect increasing numbers of DE genes, since DE calls require consistent differences between multiple samples. However, the number of DE genes increased with increasing network smoothing (Fig. 5E,F), and a maximum number of DE genes was obtained prior to complete smoothing (i.e. before 'over smoothing'), especially when correcting for multiple testing. It is noteworthy that the majority of DE genes identified through this procedure are known cancer genes. Network propagation identified four additional DE genes (RHOC, LZTS1, ARHGDIG, PDIA2) after multiple testing correction by both algorithms (HD and RWR) at their optimum. Of those four genes, three (RHOC, LZTS1, ARHGDIG) are known cancer genes (Lang et al. 2017; Cabeza-Arvelaiz et al. 2001; Meng et al. 2020; de León-Bautista et al. 2016). This further supports the notion that network propagation enhances the signal of genes in relevant network regions while reducing the noise at the same time. Interestingly, the parameter values corresponding to the maximal number of DE genes were not the same as those from the MSE criterion. One potential explanation is that, in order to maximize the number of genes differentiating highly from lowly aggressive tumors, more aggressive smoothing was needed to further decrease the gene variance. For the imputation analysis (Fig. 5A-D), we used the optimal parameters found by the between-dataset consistency analysis. Afterwards, we subset the unmeasured genes that were recovered using network propagation and identified ageing-associated genes within these imputed genes.
In the case of the protein log2 fold changes in brain and liver smoothed with HD, the median absolute propagated log2 fold change of the ageing-associated genes within the imputed genes is significantly higher than the median absolute propagated log2 fold change of random imputed genes.

DISCUSSION
Our investigation of two popular network propagation algorithms, Random Walk with Restart (RWR) and Heat Diffusion (HD), has revealed specific characteristics of the algorithms, demonstrated potential artifacts resulting from the use of inappropriate network normalization methods, and serves as a guide for choosing optimal model parameters. Firstly, we tested different graph normalization techniques for network propagation, which revealed that the degree-normalized adjacency representation of the network does not induce a topology bias. In contrast, the normalized Laplacian led to the accumulation of higher scores on hub nodes with both tested algorithms (HD and RWR).
Further, we have shown that optimizing the spreading parameters can be understood as a minimization problem of the bias-variance tradeoff, a well-known machine learning concept. We have assessed the impact of the spreading parameters on two multi-omics datasets with different approaches, using the bias-variance tradeoff as a first criterion, the transcriptome replicate consistency as a second and the mRNA and protein measurement consistency as a third criterion.
Especially the last criterion is important in view of recent efforts for multi-omics data integration (Charmpi et al. 2020; Selevsek et al. 2020). Interestingly, in the case of the rat liver data, neither the intra- nor the inter-dataset consistency could be further increased. In principle, two explanations exist for this observation: either the network used for the propagation was better suited for the brain data than for the liver data (Supplementary Fig. 4), or the quality of the liver data was already at a limit that could not be further improved. The fact that both consistency measures were higher in liver than in brain already before network propagation suggests that the latter explanation is more likely.
However, we wish to emphasize that the higher intrinsic consistency of the liver datasets may not only be due to technical reasons, but also due to a larger tissue homogeneity and smaller biological variability compared to the brain. For the prostate cancer dataset, more aggressive smoothing was needed for the protein compared to the mRNA layer in order to minimize the bias-variance tradeoff.
This observation might reflect technical differences between protein and mRNA quantification, or it might be caused by buffering of mRNA changes at the level of proteins (Liu, Beyer, and Aebersold 2016). Note that buffered fold changes would lead to a smaller increase in the bias^2 curve, which is what we observed in the case of the PCa protein data (Fig. 2G-H) compared to the mRNA fold changes (Fig. 2E-F). Additionally, the within-patient as well as the between-layer similarity increased for most cases (patients or samples) with smoothing, while the few exceptions had low initial correlations.
Although mathematically we did not expect that RWR and HD necessarily lead to the same results, here we observed that with the empirical data at hand both methods lead to very similar conclusions.
Interestingly, the correlation between log2 fold changes propagated with HD and RWR is very high (Supplementary Fig. 1). Although RWR is expected to diffuse into larger regions, whereas HD remains more focused in a network sub-region (Ma, Chen, and Sun 2014; Kloster and Gleich 2014), we observed a high similarity between the optimal results of both algorithms. This can be explained by the fact that the overall distances in the networks we used were small (mean distance for the rat network: 4.5; for the human network: 3.8); we expect that differences between the algorithms would be more pronounced in networks with larger distances. Most biological networks have an average distance between 4 and 5 (Xu et al. 2011). We assume that the more we learn about these networks, the shorter the distances will become, because more knowledge usually means adding edges and only very rarely results in removing edges. As a consequence, we expect that the effective differences between RWR and HD on biological networks will be small for many applications. However, the correlation starts dropping when the impact of the network information, which is determined by the spreading parameters t for HD and α for RWR, differs between the two propagation processes. Also, depending on the specific application and the network used, differences between RWR and HD may be more relevant and should then be considered.
Network propagation with optimal spreading parameters imputed missing values and recovered known ageing-associated genes. Furthermore, it emphasized expression differences between tumors of different grades. These analyses confirmed the biological relevance of network smoothing.
To our knowledge, our study is the first to assess the impact of spreading parameters α and t in RWR and HD systematically with several different approaches. Thus, our study serves as a template for how to identify optimal parameters, depending on the available data and tailored towards the research question.

Data used
For this study, we used the published transcriptome and proteome datasets of young and old tissues from Rattus norvegicus (Ori et al. 2015). The accession numbers are GEO: GSE66715 for the transcriptome and ProteomeXchange: PXD002467 for the proteome. The data were derived from young (6 months) and old (24 months) rat liver and brain samples, with three biological replicates per age. The transcriptome profiling covered the entire tissues, whereas the proteomics measurements were obtained from four subcellular fractions (nuc: nuclei, pn1: mitochondrial, pn2: cytoplasmic membrane, sol: soluble cytosolic proteins). Thus, to collate the mRNA and protein abundance changes, the protein log2 fold changes from the four subcellular fractions of a tissue were combined by calculating weighted means.

Weighted mean log2 fold change_protein = Σ_f (w_f · log2 fold change_f) / Σ_f w_f, where f indexes the four subcellular fractions and w_f denotes the weight assigned to fraction f.
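As a sketch of this combination step, with hypothetical per-fraction weights w_f (the actual weighting follows the quantification scheme of the original study):

```python
import numpy as np

# Hypothetical per-fraction protein log2 fold changes for one protein
# (order: nuc, pn1, pn2, sol); NaN marks a fraction without quantification.
log2fc = np.array([0.8, 0.5, np.nan, 0.6])
# Illustrative fraction weights w_f (assumed values, not from the study)
w = np.array([10.0, 4.0, 7.0, 9.0])

# Weighted mean over the fractions in which the protein was quantified
mask = ~np.isnan(log2fc)
combined = np.sum(w[mask] * log2fc[mask]) / np.sum(w[mask])
print(round(combined, 3))  # 0.67
```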
We also used the published transcriptome and proteome data of a multi-omics PCa study (Charmpi et al. 2020), consisting of 66 tumor samples belonging to four different grade groups (G1, G2, G3, G4/5) and 39 benign samples from 39 patients. For some of the patients (27 out of 39), two tumor regions (TA1 and TA2) had been analyzed. Due to degradation issues on the mRNA layer, three tumor samples (corresponding to two patients) were removed from that layer. We quantified mRNA and protein log2 fold changes for the tumor samples (versus benign) as described in (Charmpi et al. 2020), resulting in matrices of dimension 14,281 × 63 and 2,371 × 66, respectively. Average (mRNA and protein) log2 fold changes were computed across all tumor samples available at the corresponding layer.

STRING functional interaction network
The functional interaction network for Rattus norvegicus was retrieved from the STRING database, version 10.5 (Szklarczyk et al. 2015). The generic network was filtered with a combined score threshold of 0.9 to retain only high-confidence interactions. This filtering step was essential to avoid the flow of signal along false-positive edges. Additionally, to ensure convergence of the propagation process, only the largest connected network component was used. The filtered network had 9,388 nodes and 439,121 interactions; this network was employed for diffusing the transcriptome and proteome signals. For PCa, we used the STRING interaction network for Homo sapiens (version 10). Applying the same filtering criteria, the resulting network consisted of 10,729 nodes and 118,647 edges.
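The filtering and largest-component extraction can be sketched as follows. The edge list is an illustrative stand-in for a STRING protein.links file (note that STRING reports combined scores on a 0-999 scale, so they would be rescaled to [0, 1] first):

```python
from collections import defaultdict, deque

# Toy edge list: (protein_a, protein_b, rescaled combined score)
edges = [("A", "B", 0.95), ("B", "C", 0.92), ("C", "D", 0.40),
         ("D", "E", 0.97), ("X", "Y", 0.99)]

# Keep only high-confidence interactions (combined score > 0.9)
adj = defaultdict(set)
for a, b, score in edges:
    if score > 0.9:
        adj[a].add(b)
        adj[b].add(a)

def components(adj):
    """All connected components of the filtered graph, via BFS."""
    seen, comps = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        comp, q = {start}, deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    q.append(v)
        comps.append(comp)
    return comps

# The largest component is kept so that the propagation converges
largest = max(components(adj), key=len)
print(sorted(largest))  # ['A', 'B', 'C']
```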

Graph normalization methods
Let G = (V, E) be a simple, connected, undirected and unweighted graph, where V is the set of nodes and E the set of edges. For the network matrix W, different graph normalizations can be employed.
The simplest representation of a graph is its associated adjacency matrix A = [a_{i,j}], with entries defined by a_{i,j} = 1 if (v_i, v_j) ∈ E and a_{i,j} = 0 otherwise. The adjacency matrix can be normalized row-wise by the degrees of the nodes. In this case, the entries are (W_D)_{i,j} = a_{i,j}/d_i, where d_i denotes the degree of the vertex v_i. Additionally, the Laplacian of the graph can be used. It is defined by W_L = D − A, where D is the diagonal matrix of the node degrees, so that (W_L)_{i,j} = d_i if i = j, (W_L)_{i,j} = −1 if (v_i, v_j) ∈ E, and 0 otherwise. The Laplacian can also be normalized by the degrees of the interacting nodes, yielding the normalized Laplacian W_L̂ = D^{−1/2} W_L D^{−1/2}, with entries (W_L̂)_{i,j} = 1 if i = j, (W_L̂)_{i,j} = −1/√(d_i d_j) if (v_i, v_j) ∈ E, and 0 otherwise.
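These matrices can be computed directly from the adjacency matrix; a minimal numpy sketch on a toy star graph:

```python
import numpy as np

# Toy star graph: node 0 is a hub connected to three leaves
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
d = A.sum(axis=1)                        # node degrees
D = np.diag(d)

W_D = A / d[:, None]                     # degree row-normalized adjacency
W_L = D - A                              # graph Laplacian
Dis = np.diag(1.0 / np.sqrt(d))
W_Lhat = Dis @ W_L @ Dis                 # normalized Laplacian

print(W_D.sum(axis=1))                   # rows sum to 1
print(W_L.sum(axis=1))                   # rows sum to 0
print(W_Lhat.sum(axis=1))                # row sums differ across nodes
```

The differing row sums of the normalized Laplacian are what gives rise to the topology bias discussed below.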

Topology bias
Below we examine the presence or absence of a topology bias for different combinations of graph normalization and propagation approaches. In case of RWR, using the degree row-normalized adjacency matrix does not lead to a topology bias. The steady (limiting) vector p_∞ must satisfy

p_∞ = (1 − α) p_0 + α W_D p_∞.    (11)

Denote by 𝟙 the vector of all ones (1, 1, …, 1)ᵀ. When p_0 = 𝟙, then in order not to have a topology bias, the steady vector should also be the vector 𝟙 (the definition of (no) topology bias); stated differently, the vector 𝟙 should satisfy Equation (11). To show this, we substitute 𝟙 into the right-hand side and get (1 − α) 𝟙 + α W_D 𝟙 = (1 − α) 𝟙 + α 𝟙 = 𝟙, i.e. 𝟙 satisfies Equation (11) for all 0 ≤ α ≤ 1. For the second equality, we have used that W_D 𝟙 = (Σ_j W_D(1, j), Σ_j W_D(2, j), …)ᵀ = 𝟙, since the row sums of W_D are all equal to 1.
For the normalized Laplacian in case of RWR, the network matrix W is set to D^{−1/2} A D^{−1/2} (= I − W_L̂) to ensure that the diagonal elements of W are zero. Substituting 𝟙 into the right-hand side of Equation (11) similarly yields (1 − α) 𝟙 + α W 𝟙. But the row sums of the matrix with Laplacian normalization, Σ_j a_{i,j}/√(d_i d_j), are not equal across rows, so the elements of the resulting vector will differ. Thus the vector 𝟙 does not satisfy Equation (11), and consequently there will be some bias for α > 0. Since the hub nodes are visited more often in the random walk, they are expected to gather higher scores than the non-hub nodes. In particular, when α = 1 the limiting vector will be proportional to the vector of square roots of the node degrees (von Luxburg 2007).
In case of HD, the Laplacian and the degree row-normalized adjacency matrix do not lead to a topology bias. First, for the Laplacian: since each row of the matrix W_L = D − A sums to 0, each row of the matrix exp(−t W_L) sums to 1 for all t ≥ 0 (Glasser, Horn, and Meidan 1980). As a consequence, exp(−t W_L) 𝟙 = 𝟙. In other words, there will be no topology bias.
Next, we study the degree row-normalized adjacency matrix W_D. To avoid that the algorithm runs into numerical instabilities for very large t, without loss of generality the shifted matrix W̃ (= W_D − I) was used instead to propagate with HD, noting that exp(t W̃) = exp(−t (I − W_D)). We have exp(t W̃) 𝟙 = 𝟙, since the row sums of W̃ = W_D − I are all equal to 0 and thus the row sums of exp(t W̃) are all equal to 1 for each t ≥ 0, as before. Hence, using the matrix W̃ and an input vector filled with 1's, the output vector will again be a vector of 1's, and there will be no topology bias.
For the normalized Laplacian, i.e. W = W_L̂, the row sums of the matrix exp(−t W_L̂) are not equal, so the elements of the propagated vector will differ for t > 0 and will further depend on the node degree: as in RWR, hub nodes are expected to obtain higher scores, since they are visited more often than non-hub nodes.
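The presence or absence of a topology bias can be verified numerically. A minimal sketch using the closed-form RWR steady state p_∞ = (1 − α)(I − αW)^{−1} p_0 on a toy star graph (the graph is hypothetical):

```python
import numpy as np

# Toy star graph: node 0 is a hub with degree 3, leaves have degree 1
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
d = A.sum(axis=1)
W_D = A / d[:, None]                      # row-normalized: no bias expected
W_sym = A / np.sqrt(np.outer(d, d))       # D^-1/2 A D^-1/2: bias expected

def rwr_steady(W, p0, alpha):
    # Closed-form steady state of p = (1 - alpha) * p0 + alpha * W @ p
    n = len(p0)
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * W, p0)

ones = np.ones(4)
print(rwr_steady(W_D, ones, 0.5))     # stays [1. 1. 1. 1.]
print(rwr_steady(W_sym, ones, 0.5))   # hub node 0 gets the highest score
```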

Mean squared error computation
For each of the two tissues (brain and liver) and each replicate, we computed the MSE as the average (across genes) squared difference between the smoothed replicate log2 fold changes for a given value of the spreading parameter and the initial average log2 fold changes. This was done for all values of the spreading parameters, using the two propagation algorithms (RWR and HD). Additionally, we computed the average (across genes) squared difference between the smoothed replicate log2 fold changes and the corresponding smoothed average log2 fold changes (the variance, see section below), as well as the average (across genes) squared difference between the smoothed and the initial average log2 fold changes (the bias², see section below), with varying spreading parameters. For computing these quantities (MSE, bias² and variance), only the measured network nodes were used. The latter two quantities (bias² and replicate variance) corresponding to the same value of the spreading parameter were also summed up. These computations resulted in six values for the MSE, the variance and the sum (corresponding to the six replicates) for each value of the spreading parameter (per tissue and propagation algorithm). These six values were log-transformed each time, and their average and SD were computed in the transformed space. With these, a lower and an upper bound were computed as average − SD and average + SD, respectively. Finally, the average and the lower and upper bounds were transformed back to the original space (i.e. the average corresponds to a geometric mean). The transformation was performed to avoid negative lower bounds (because average − SD could fall below zero in the original space). The same approach was applied to the PCa study, both on the mRNA and the protein layer.
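A sketch of the bias²/variance decomposition on simulated data (all values are hypothetical stand-ins for smoothed log2 fold changes at one value of the spreading parameter):

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_reps = 200, 6

# Hypothetical data: 'initial_avg' stands in for the unsmoothed average
# log2 fold changes of the measured network nodes, 'smoothed_avg' for the
# propagated averages, and 'smoothed_reps' for the propagated replicates.
initial_avg = rng.normal(size=n_genes)
smoothed_avg = 0.8 * initial_avg
smoothed_reps = smoothed_avg[:, None] + rng.normal(scale=0.3,
                                                   size=(n_genes, n_reps))

# bias^2: smoothed vs initial averages (one value)
bias2 = np.mean((smoothed_avg - initial_avg) ** 2)
# variance: smoothed replicates vs smoothed averages (one value per replicate)
variance = np.mean((smoothed_reps - smoothed_avg[:, None]) ** 2, axis=0)
# MSE: smoothed replicates vs initial averages (one value per replicate)
mse = np.mean((smoothed_reps - initial_avg[:, None]) ** 2, axis=0)

# Per replicate, the MSE decomposes into bias^2 + variance up to a small
# cross term that vanishes with many genes
print(np.abs(mse - (bias2 + variance)).max() < 0.05)  # True
```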
The use of large spreading parameters leads to the loss of the biological signal
As the spreading coefficients increase, we tend to lose the biological signal. With the degree row-normalized adjacency matrix, the variance between the node scores decreases (Supplementary Fig. 3) and each input vector eventually converges to a vector with the same value for all elements as α approaches one and t goes to infinity. Although computing the Pearson correlation between constant vectors is not possible, we were able to do so (numerically) due to small numerical fluctuations in the smoothed node scores at the end of the propagation (α = 1 for RWR and t = 1,000 for HD). However, the correlation results for such large spreading parameters (α = 1 and t = 1,000) are not biologically meaningful.
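This flattening effect can be reproduced numerically; a sketch using the closed-form RWR steady state on a toy star graph (input node scores are hypothetical):

```python
import numpy as np

# Toy star graph (node 0 is a hub) and hypothetical input node scores
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
W_D = A / A.sum(axis=1)[:, None]
p0 = np.array([2.0, 0.0, -1.0, 1.0])

def rwr_steady(W, p0, alpha):
    # Closed-form steady state of p = (1 - alpha) * p0 + alpha * W @ p
    return (1 - alpha) * np.linalg.solve(np.eye(len(p0)) - alpha * W, p0)

# The variance across node scores shrinks as alpha approaches 1:
# the smoothed vector approaches a constant vector and the input
# signal is lost.
for alpha in (0.3, 0.7, 0.999):
    print(alpha, float(np.var(rwr_steady(W_D, p0, alpha))))
```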

Imputation of ageing-associated genes
After network propagation, we extracted the unmeasured genes that were recovered by network propagation (n = 2,835 for RNA brain; n = 3,222 for RNA liver; n = 7,149 for protein brain; n = 7,318 for protein liver). Within these imputed genes, ageing-associated genes were identified based on the JenAge Ageing Factor Database (Hühne, Thalheim, and Sühnel 2013), filtered for the organism Rattus norvegicus (n = 102 for RNA brain; n = 111 for RNA liver; n = 332 for protein brain; n = 349 for protein liver; see Supplementary Table 1). Additionally, we tested whether the median absolute log2 fold change of the ageing-associated genes within the imputed genes is higher than the median absolute log2 fold change of all imputed genes, using an unpaired one-sided Wilcoxon rank-sum test. The p-values were adjusted with the Benjamini-Hochberg correction (Benjamini and Hochberg 1995).
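A sketch of this test, using a hand-rolled one-sided rank-sum test (normal approximation, assuming no ties) and a Benjamini-Hochberg adjustment instead of a statistics library; the fold changes are simulated:

```python
import numpy as np
from math import erfc, sqrt

def ranksum_greater(x, y):
    """One-sided Wilcoxon rank-sum test (normal approximation, no ties):
    H1: values in x tend to be larger than values in y."""
    n1, n2 = len(x), len(y)
    ranks = np.argsort(np.argsort(np.concatenate([x, y]))) + 1
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2        # Mann-Whitney U for x
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * erfc(z / sqrt(2))                  # upper-tail p-value

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    adj = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty_like(p)
    out[order] = np.clip(adj, 0, 1)
    return out

rng = np.random.default_rng(1)
# Simulated absolute log2 fold changes: the 'ageing' subset is shifted up
imputed = np.abs(rng.normal(0.0, 0.5, size=1000))
ageing = np.abs(rng.normal(0.0, 0.8, size=100))

p = ranksum_greater(ageing, imputed)
print(p < 0.05)                                     # True
print(bh_adjust([0.01, 0.02, 0.03, 0.5]))           # [0.04 0.04 0.04 0.5]
```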

Differential expression analysis on the PCa study
For each of the network nodes measured on the mRNA layer and each value of the spreading parameter, we applied a t-test comparing its propagated (with each of the two propagation algorithms) mRNA log2 fold changes in the group G4/5 with those in the combined group G1-2 (consisting of the samples in the groups G1 and G2). To correct for multiple hypothesis testing, we used the Benjamini-Yekutieli method (Benjamini and Yekutieli 2001), applied at each spreading parameter separately. The significance threshold was set to 0.05 before and 0.1 after adjustment of the p-values.

Figure S1: High similarity between smoothed scores of RWR and HD. The smoothed log2 fold changes for the proteome of brain (A) and liver (C) tissue and the transcriptome of brain (B) and liver (D) tissue, computed with RWR (varying α between 0 and 1) and HD (varying t between 0 and 1,000), are correlated. All 9,388 genes present in the network were included in this analysis.

We also applied the following randomization approach: we permuted the average log2 fold changes of the transcriptome for the brain and liver and computed bias² curves based on the permuted vectors. Bias² curves are shown for 10 permutations (orange). The computation of the bias² was based on the measured network nodes only; for the randomization approach, we only permuted the measured network nodes, while the value of the unmeasured network nodes was always initially set to 0. E: Average log2 fold changes (of the measured network nodes) of the transcriptome for the brain (initial: dark blue; final (RWR, α = 1): light blue) and liver (initial: dark red; final (RWR, α = 1): pink), sorted in decreasing order.