STREAMLINE: Structural and Topological Performance Analysis of Algorithms for the Inference of Gene Regulatory Networks from Single-Cell Transcriptomic Data

In recent years, many algorithms for inferring gene regulatory networks from single-cell transcriptomic data have been published. Several studies have evaluated their accuracy in estimating the presence of an interaction between pairs of genes. However, these benchmarking analyses do not quantify the algorithms’ ability to capture structural properties of networks, which are fundamental, for example, for inferring the robustness of a gene network to external perturbations. Here, we devise a three-step benchmarking pipeline called STREAMLINE that quantifies the ability of algorithms to capture topological properties of networks and identify hubs. To this aim, we use data simulated from different types of networks as well as experimental data from three different organisms. We apply our benchmarking pipeline to four algorithms and provide guidance on which algorithm should be used depending on the global network property of interest.


Introduction
Single-cell transcriptomics techniques allow probing patterns of gene expression on an increasingly larger scale, with recent studies including millions of cells and thousands of genes [1]. Such rapid progress in expanding the scale of available data makes single-cell datasets more appealing for tasks like the inference of gene regulatory networks (GRNs), with the goal of achieving a mechanistic understanding of the systems at hand and going beyond purely descriptive characterizations [2]-[4]. However, GRN inference from single-cell data entails many computational challenges, such as high levels of technical noise in the data [5], the extreme sparsity of the ground truth network to be inferred [6] and the increasing scale of gene expression data [7]. For this reason, many algorithms for GRN inference from single-cell data have been published in the last few years. The increasingly large number of such algorithms demands benchmarking studies that can guide the user in the choice of the methods that perform best under various conditions [8]-[10]. While these studies offer some guidance for users, they are affected by important limitations. First, the quantification of the performance is obtained from a limited number and variety of networks. Moreover, the available benchmarking studies mostly focus on the ability of the GRN algorithms to predict local features of networks, like the interactions between pairs of genes, using, for example, area under the curve metrics, or the presence of specific sub-graphs (network motifs). Nevertheless, these metrics do not assess the algorithms' ability to infer the structural properties of the GRN, which quantify important features like the robustness to perturbations [11] and the presence of network hubs representing master regulators. These structural properties can be quantified by topological measurements [12], including, for instance, the network efficiency and the assortativity. So far, the performance of GRN inference algorithms on the estimation of topological properties has been assessed only with bulk RNA-seq data, and employing a limited number of synthetic networks [13], which makes it hard to reach robust conclusions, especially when applying the methods to single-cell data. In this work, we developed STREAMLINE, a three-step benchmarking framework to score the performance of GRN inference algorithms in estimating structural properties of networks from single-cell RNA-seq (scRNA-seq) datasets. Such properties quantify the network's robustness to perturbations and the presence of hubs. We use data simulated from hundreds of networks belonging to four classes with different structural properties that have been shown to be biologically relevant [14], [15], as well as from a set of curated networks extracted from real GRNs [8]. In addition to simulated data, we also use real datasets from yeast, mouse, and human [16]. We apply STREAMLINE to four GRN inference algorithms chosen amongst the top-performing ones [8]. Our benchmarking analysis provides guidance in the choice of the algorithm for the prediction of network robustness and the identification of hubs. Moreover, our results point to systematic biases in some algorithms, which could indicate ways of improving them. To facilitate the use of our benchmarking framework, we have made it compatible with an existing pipeline (BEELINE [8]), and we make all the code available in a GitHub repository (github.com/ScialdoneLab/STREAMLINE).

Overview of STREAMLINE
The steps involved in STREAMLINE are schematically represented in Figure 1. We consider two types of datasets: simulated and real. With the simulated datasets, we generate scRNA-seq data in silico from four classes of networks with well-defined and different structural properties: Random, Small-World, Scale-Free, and Semi-Scale-Free networks. Random or Erdös-Renyi (ER) networks include a set of nodes in which each node pair has the same probability of being connected by an edge [17]. In Scale-Free (SF) networks, the edges are drawn such that the degree distribution follows a power law [14]. In Semi-Scale-Free (SSF) graphs, only the out-degree distribution follows a power law, while the in-degree distribution is uniform. Such networks were introduced by Ouma et al. [14] to model real GRNs. Small-World (SW) networks have the property that the neighbors of any given node are likely to be neighbors of each other [15]. Additionally, we included four Curated (Cur.) networks that consist of sub-networks of known GRNs [8]. Networks from each class are defined by a set of parameters. To make our results independent of specific instances of networks, we sampled 20 networks from each class with different combinations of parameters and two sizes: a smaller (15 nodes and 50 edges) and a larger (25 nodes and 100 edges) size. All results shown below are averaged over all the instances of networks generated for a given class. Details about the network classes and the parameters used for network sampling are provided in the Methods section, Supplementary Section S1 and Table S1. From each of these networks, we simulated scRNA-seq datasets using BoolODE, a recently developed software based on ordinary differential equations [8] (see Methods section). In addition to simulated datasets, we also considered four real scRNA-seq datasets generated from different organisms and cell types: yeast [18], mouse dendritic cells (mDC) [19], mouse embryonic stem cells (mESC) [20] and human embryonic stem cells (hESC) [21]. These datasets were used in a previous benchmarking study [16], where the authors also provide estimations of ground truth networks. The second step of our pipeline involves running the algorithms to infer GRNs from each of the datasets. We chose the four top-performing algorithms according to a recent study where the accuracy in predicting gene-gene interactions was evaluated [8]: PIDC [22], PPCOR [23], SINCERITIES [24] and GRNBoost2 [25]. Two of these methods (PIDC and PPCOR) output undirected networks, while SINCERITIES and GRNBoost2 provide directed networks. A brief description of each algorithm is provided in the Methods section. In addition to scoring each method's ability to predict edges as a binary classification problem, similarly to what has been done in previous benchmarking analyses (see [8], [9], Supplementary Section S2 and Figure S1), we analyzed the ability of each method to predict global properties of networks. Specifically, we computed topological properties that quantify how efficiently information is exchanged in the network and the tendency of networks to include hubs. The efficiency of information exchange in a network can be a measure of how robust networks are to perturbations [26]. To quantify this, we calculated three topological measures: the Global Efficiency, the Local Efficiency and the Average Shortest Path Length (see Methods). Another biologically important property of networks is the presence of hubs, i.e., nodes that have a degree much larger than the average. In GRNs, hubs are genes that regulate the expression levels of many other genes and can represent master regulators of a given biological process. We computed the Centralization, the Assortativity and the Clustering Coefficient, which are topological measures linked to the presence of hubs in a network. In addition to quantifying the tendency of networks to possess hubs, it is important to correctly identify them. Hence, we tested the GRN inference algorithms for their ability to predict which nodes constitute hubs. To do so, we computed four metrics used to detect hubs - Page Rank Centrality, Betweenness Centrality, Out-Centrality and Radiality (see Methods) - and we compared the values obtained from the ground truth networks with those calculated from the inferred networks. Below, we describe the detailed results of each of these benchmarking analyses.
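As a toy illustration of this comparison step (not STREAMLINE's actual implementation), the sketch below computes a few topological measures on a stand-in ground-truth network and a stand-in inferred network with networkx and takes their signed differences; the graphs, seeds, and measure choices are illustrative assumptions.

```python
# Illustrative sketch (hypothetical toy graphs, not the pipeline's code):
# compare topological measures between a "ground truth" and an "inferred" network.
import networkx as nx

truth = nx.erdos_renyi_graph(15, 0.3, seed=0)     # stand-in ground-truth network
inferred = nx.erdos_renyi_graph(15, 0.3, seed=1)  # stand-in inferred GRN

measures = {
    "global_efficiency": nx.global_efficiency,
    "local_efficiency": nx.local_efficiency,
    "clustering_coefficient": nx.average_clustering,
}
signed_errors = {name: fn(inferred) - fn(truth) for name, fn in measures.items()}
print(signed_errors)  # positive values indicate overestimation
```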

Estimation of information exchange efficiency
As mentioned above, to quantify topological aspects of the efficiency of information exchange, we evaluated the Average Shortest Path Length, the Global Efficiency, and the Local Efficiency (Figure 2A) of the inferred and ground truth networks. In particular, we estimated the Mean Signed Error (MSE) between these quantities computed on the ground truth networks and on the networks inferred by each of the algorithms (see Methods section). First, we considered the simulated datasets generated from different classes of networks. Figure S2A displays the distributions of the ground truth values of all the topological measures we considered in this stage. These plots show how the different structural properties of each class of networks are reflected by different values of these topological measures. For example, the SW networks are characterized by a larger Average Shortest Path Length and lower Global and Local Efficiency, as expected based on their properties [15]. In Figure 2B, we report the MSE for all the topological measures computed on the simulated datasets. Overall, we found that the accuracy of the predictions depends on the type of network, in addition to the algorithm. For instance, all the algorithms we tested tend to overestimate the Local Efficiency (MSE > 0), except for SINCERITIES, which underestimates it in SSF and Cur networks. The best predictions (corresponding to MSE ≈ 0) are obtained on SF and SSF networks. Similarly, the Global Efficiency tends to be overestimated by all algorithms except for GRNBoost2 (Figure 2B), which underestimates it. These biases imply that the best-performing algorithm depends on the type of network: for example, GRNBoost2 provides an accurate estimation of the Global Efficiency in SW networks, which is lower than in other network classes (Figure 2B and Figure S2). The best estimations of the Average Shortest Path Length (Figure 2B) are provided by the PIDC and PPCOR algorithms, especially in the ER, SF, and SSF networks. In the SW and the Cur networks, for which the Average Shortest Path Length is greater than for SSF graphs (Figure S2A), all algorithms underestimate this property, particularly GRNBoost2 and SINCERITIES. We performed the same analysis on four real scRNA-seq datasets from three species (Figure 2C). The corresponding ground truth networks have lower Global and Local Efficiency and a larger Average Shortest Path Length compared to the synthetic networks we considered (Figure S2B). These differences likely affect the values of the MSE we computed, which are reported in Figure 2C. For example, unlike with the synthetic networks, we found an overall tendency of all algorithms to underestimate the Local Efficiency and overestimate the Global Efficiency. GRNBoost2 and PPCOR provide the most accurate predictions of the Global and Local Efficiency, respectively. As for the Average Shortest Path Length, the MSE is mostly positive, indicating an overestimation, and it is smallest for SINCERITIES, which is the best-performing algorithm in this case.

Hub analysis
One important downstream analysis on GRNs is the identification of genes with a number of links much larger than the average. These are known as network hubs, and they can play key roles in differentiation and reprogramming [27]. Moreover, GRN hubs have been identified as potential disease regulators or drug targets [28]. The presence of such nodes depends on several topological properties that are influenced by the type of network. In particular, we expect the hubs in SF and SSF networks to be more easily identifiable due to their node degree distribution. This is reflected, for example, by the higher Centralization values of SF and SSF networks compared to other classes of networks (Figure S2).
Here, we first analysed how well each algorithm predicts the value of topological measures that quantify the tendency of networks to include hubs. Then, we tested how well the inferred GRNs preserve hub identities.

Hub-related topological quantities
The topological measures we chose to quantify the presence of hubs are Assortativity, Clustering Coefficient and Centralization (Figure 3A). In disassortative networks (negative Assortativity), nodes with lower degrees tend to be linked to nodes that feature a higher degree; hence, in these networks, hubs tend to be present and clearly identifiable. Networks with a large Clustering Coefficient feature groups of nodes with high interconnectivity that, thus, have similar node degrees. In this situation, hubs are less dispersed. The Centralization quantifies how centralized a graph is around a small number of nodes, which will have a large number of links and will therefore tend to be strong and clearly identifiable hubs.
Among the networks we simulated data from, the Assortativity of SW and ER networks is relatively closer to 0 on average compared to other networks, while their Clustering Coefficient is lowest (Figure S2A). Conversely, SF and SSF networks have lower (negative) Assortativity and larger Clustering Coefficients, together with the Cur networks. This is in agreement with the known properties of each class of networks [14]. As previously mentioned, the Cur networks are a group of four small subnetworks of real GRNs; therefore, their properties are influenced by their small size and individual peculiarities rather than general characteristics. The three network properties assessed during this step uncover multiple algorithm-specific behaviours (Figure 3B). First, the networks inferred by SINCERITIES have lower Assortativity and larger Centralization than the corresponding ground truth graphs. However, the estimations of the Clustering Coefficient obtained with SINCERITIES tend to be more accurate than those coming from other algorithms.
As in the previous section, we repeated the analysis on real datasets (Figure 3C). The corresponding networks have Assortativity values that are lower than those of most synthetic networks and thus most comparable to SF and SSF networks. On average, the Clustering Coefficients are similar to those of ER or SW networks (Figure S2B). The distribution of the Assortativity values is in line with the Scale-Free hypothesis for real GRNs [14]. In summary, the gold standards display a higher tendency to contain hubs that are more clustered together.
Similarly to what happens with the synthetic datasets, here the Centralization is overestimated by SINCERITIES (Figure 3C). In contrast, the Assortativity is now preserved better by SINCERITIES rather than being underestimated. GRNBoost2, PIDC and PPCOR show similar performances. These three algorithms overestimate the Assortativity as well as the Centralization, albeit less than SINCERITIES. Furthermore, the Clustering Coefficient is underestimated by all algorithms.
The difference in performance between the real and the synthetic data might be due to a number of factors. First, the gold standards provide only estimates of the GRNs of the three organisms, while the synthetic data are simulated from fully specified networks. Furthermore, we found that the algorithms tend to output networks that feature similar topological properties, regardless of whether the data are experimentally or artificially generated. This might explain, for example, the opposite trends in the estimations of the Assortativity and the Clustering Coefficient with the synthetic versus the real datasets.

Hub identification
While hubs are loosely defined as nodes having degrees higher than average, there is no consensus on the best metric to identify them. For this reason, here we compute four centrality measures that have been previously adopted to find hubs in GRNs [29]: the Betweenness [30], the Out-Centrality [31], the Radiality [32], and the Page Rank [33] (see Methods). We verify how accurately these measures are estimated by the four inference algorithms introduced above.
More specifically, we select the set of top 10% of nodes according to the centrality measure computed in the ground truth network, Ω_true, and in the inferred network, Ω_inferred. Then, we quantify the similarity between the two sets of nodes with the Jaccard similarity index J [34] (see Methods). Finally, we compute the ratio between J and J_rand, i.e., the Jaccard index between Ω_true and a set of randomly selected nodes Ω_rand (see Methods). Hence, the ratio J/J_rand shown in Figure 4 represents the improvement in the prediction of hubs in the inferred network compared to a random guess.
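This selection-and-comparison procedure can be sketched as follows (toy hub-rich networks and an assumed Page Rank ranking, not the paper's data):

```python
# Toy sketch of the hub-identification score: select the top-10% nodes by a
# centrality measure in each network and compare the two sets with the Jaccard index.
import networkx as nx

def top_fraction(centrality, frac=0.10):
    k = max(1, int(len(centrality) * frac))
    return set(sorted(centrality, key=centrality.get, reverse=True)[:k])

def jaccard(a, b):
    return len(a & b) / len(a | b)

truth = nx.barabasi_albert_graph(50, 2, seed=0)     # hub-rich stand-in network
inferred = nx.barabasi_albert_graph(50, 2, seed=1)  # stand-in inferred network

omega_true = top_fraction(nx.pagerank(truth))
omega_inf = top_fraction(nx.pagerank(inferred))
j = jaccard(omega_true, omega_inf)
print(j)  # 1.0 would mean perfect hub recovery
```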
In general, we find that better scores are achieved on networks featuring stronger hubs (i.e., with larger values of Centralization, like the GRN associated with hESC; see Figure S2). However, the scores depend on the algorithm as well as the network type. In real networks, there is a wide range of Centralization values, with the hESC networks featuring higher values, which indicate the presence of more clearly identifiable hubs. Conversely, the yeast GRN is characterized by lower values of Centralization and, thus, by fewer hubs or hubs that are less isolated (Figure S2). As a possible consequence, the best performance is achieved on hESC networks. Similarly, the performance on synthetic data is best on SF networks, which are likely to possess hubs that are easier to identify due to their degree distribution. The Betweenness estimates the influence that a node has on the information exchange in a graph based on path lengths. The Out-Centrality is the out-degree of a node in directed networks or its overall degree in undirected networks. The Radiality assigns high centrality values to nodes that have a short distance to all vertices in their reachable neighborhood compared to the graph diameter. The Page Rank is a generalization of the degree centrality based on the leading eigenvector of a modified adjacency matrix. Detailed definitions of the hub metrics are provided in the Methods section.

Discussion
Here, we significantly extended existing benchmarking analyses to evaluate how well GRN inference algorithms can estimate the structural properties of networks. More specifically, we quantified the ability of the algorithms to infer the networks' robustness to perturbations as well as the presence and identity of network hubs. For this purpose, we computed six topological measures and tested four metrics for hub identification. Moreover, we considered scRNA-seq data simulated from different types of networks as well as real data collected from different organisms. The results of the benchmarking are summarized in Table 1. Overall, we observed that there is no single best-performing algorithm; rather, the performance depends on the properties of the ground truth network and the topological metric being considered. For example, with real datasets, GRNBoost2 achieves the best performance in the estimation of the Global Efficiency, whereas SINCERITIES produces the most accurate estimations of the Local Efficiency and the Average Shortest Path Length in almost all the real datasets (Figure 2). A similar situation emerged for the metrics quantifying hub presence (Figure 3), where the estimations of the Assortativity obtained with SINCERITIES are the most accurate, while PPCOR performed best when estimating the Clustering Coefficient and the Centralization in almost all the real datasets.
The situation for hub identification is more complex, as the best-performing algorithm depends more strongly on the specific dataset analyzed (Figure 4). However, overall, all algorithms achieve good performance with all datasets and metrics. A notable exception is the yeast dataset (Figure 4B), where the accuracy of predictions for all algorithms (and especially SINCERITIES) was relatively low. This might be linked to the ground truth network of yeast having less clearly defined hubs, as, e.g., its Centralization value suggests (Figure S2).
The benchmarking done with synthetic networks allowed us to check the performance of algorithms with networks having specific and tunable properties.
In some cases, this has brought to light specific biases present in the networks estimated by each algorithm. For example, SINCERITIES produces more disassortative and centralized networks (i.e., networks with relatively low Assortativity and high Centralization), which causes an underestimation of the Assortativity and an overestimation of the Centralization for all types of synthetic networks (Figure 3). Similar observations can be made, for example, with GRNBoost2, which tends to generate networks with lower Global Efficiency (Figure 2). While these biases are likely to derive from the specific strategies that each algorithm follows to estimate GRNs, knowledge of these biases could be used to improve the algorithms: for example, by guiding the design of objective functions that could lead to networks with global properties closer to those of real GRNs. This approach can be justified by the observation that GRNs tend to share certain topological features, such as a scale-free node-degree distribution [35], which could be assumed as prior knowledge for the inference process. Finally, topological quantities can also be used to optimize community-based inference schemes. Currently, consensus networks are derived from the outputs of different methods by taking into account only their performance in the estimation of single edges. Instead, new strategies could be devised that also consider the estimation of the network's global properties.

Synthetic networks
We use parameter-controlled networks from four different classes as well as the Curated GRNs that have been used in BEELINE [8].The output of the network samplers is a graph G with n nodes and m edges.
Random networks Random networks were created by the Erdös-Renyi G(n, p) model, which outputs a graph with n nodes where each pair is connected with probability p [17].We set p so that the expected number of edges equals m.
Scale-Free networks Networks with a degree distribution that follows a power law are classified as Scale-Free [36]. Given the parameter α, the expected degree distribution follows $P(k) \propto k^{-\alpha}$. For directed networks, the in-degree distribution and the out-degree distribution can feature different parameters $\alpha_{in}$ and $\alpha_{out}$. We applied combinations of different in- and out-degree parameters; the exact values can be found in Table S1.
Semi-Scale-Free networks Following the analysis of the degree distributions in known GRNs [14], we sample Semi-Scale-Free networks, which feature an out-degree distribution that follows a power law but a uniform in-degree distribution. Additionally, only 50% of the nodes have outgoing edges.
Small-World networks We use the Watts-Strogatz model to sample networks that feature Small-World topology [15].The algorithm starts with n nodes with degree k in a regular lattice and then rewires edges with probability p.
Curated networks Curated networks are four known GRNs that were used in BEELINE to evaluate the statistical performance of the GRN inference algorithms [8].These networks are simple models for mammalian cortical area development, ventral spinal cord development, hematopoietic stem cell differentiation and gonadal sex determination.We include them for comparison.
Network sampling We use the Julia package LightGraphs.jl [37] to sample the networks described above. The parameters were chosen such that a large variety of structurally different networks is covered. We simulate single-cell RNA-sequencing data using BoolODE [8]. For every parameter set, we sampled data from 100 cells for 10 smaller and 10 larger networks. The exact specifications and number of networks can be found in Table S1.
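For readers who prefer Python, the four synthetic classes can be approximated with networkx generators as in the sketch below; the generators and parameters are illustrative stand-ins, not the LightGraphs.jl samplers or the Table S1 settings.

```python
# Hedged sketch of the four network classes with networkx stand-ins.
import random
import networkx as nx

n, m = 15, 50  # the smaller network size used in the paper

er = nx.gnm_random_graph(n, m, seed=0)               # Random (Erdos-Renyi), exactly m edges
sw = nx.watts_strogatz_graph(n, k=6, p=0.1, seed=0)  # Small-World (Watts-Strogatz)
sf = nx.scale_free_graph(n, seed=0)                  # directed Scale-Free stand-in

# Semi-Scale-Free stand-in: heavy-tailed out-degrees, roughly uniform in-degrees,
# with only half of the nodes acting as regulators.
rng = random.Random(0)
ssf = nx.DiGraph()
ssf.add_nodes_from(range(n))
for src in range(n // 2):                              # regulators only
    out_deg = min(n - 1, int(rng.paretovariate(1.5)))  # power-law-like out-degree
    targets = rng.sample([v for v in range(n) if v != src], out_deg)
    ssf.add_edges_from((src, dst) for dst in targets)
print(er.number_of_edges(), sf.number_of_nodes(), ssf.number_of_nodes())
```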

Experimental networks
For the benchmarking of GRN inference on experimental single-cell RNA-sequencing data, we select four datasets from human [21], mouse [19], [20], and yeast [18] and compare the output networks to different types of gold standard networks collected by Stone et al. [16]. The properties of the networks and the number of corresponding gold standards can be found in Table S2. For a detailed description of the preprocessing, we refer to [16].

Inference algorithms
We select the four top-performing algorithms from BEELINE [8] and examine the results using our three-step benchmarking pipeline.
GRNBoost2: GRNBoost2 [25] infers a GRN independently for each gene, by identifying the most important regulators using a regression model.It is an alternative to GENIE3, which uses a similar inference scheme but does not scale to larger datasets due to its runtime.
SINCERITIES: SINCERITIES [24] is a causality-based method that computes temporal changes in the expression of each gene. The GRN is inferred by solving a specifically formulated ridge regression problem.

PIDC: The PIDC inference scheme [22] is based on partial information decomposition, a multivariate information-theoretic measure for triplets of random variables. Since the measure is symmetric, the resulting network is undirected.
PPCOR: PPCOR [23] calculates the partial and semi-partial correlation coefficients for every possible pair of genes. Edges are ranked according to these values. Using the sign of the correlation, it is possible to assign a direction to the interactions, but we found better topological performance with the undirected version.
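As a hedged illustration of the partial-correlation idea behind PPCOR (not the actual R implementation), partial correlations can be read off the inverse covariance matrix of the expression data; the toy matrix and planted dependency below are assumptions for demonstration.

```python
# Toy sketch of partial-correlation-based edge ranking.
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 5))       # toy data: 200 cells x 5 genes
expr[:, 1] += 0.8 * expr[:, 0]         # plant a dependency between genes 0 and 1

prec = np.linalg.inv(np.cov(expr, rowvar=False))  # precision matrix
d = np.sqrt(np.diag(prec))
pcorr = -prec / np.outer(d, d)         # partial correlation matrix
np.fill_diagonal(pcorr, 1.0)

# Rank undirected gene pairs by the magnitude of their partial correlation.
pairs = [(i, j) for i in range(5) for j in range(5) if i < j]
ranked = sorted(pairs, key=lambda e: abs(pcorr[e]), reverse=True)
print(ranked[0])  # the planted pair should rank first
```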
Quantities used for benchmarking

Binary Edge Detection
In order to statistically benchmark the binary classification problem of correctly retrieving edges, we followed previous studies [8], [9] by calculating the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) on synthetic data. For the experimental datasets, we use the early precision (EPr) among the top k edges, where k is the number of interactions in the corresponding gold standard. The EPr is better suited to assessing classification accuracy on large datasets where the reference network does not represent the entire ground truth.
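The EPr can be sketched in a few lines of plain Python (toy edge lists, not the benchmark data):

```python
# Minimal sketch of early precision: precision among the top-k predicted edges,
# with k equal to the size of the gold standard.
def early_precision(ranked_edges, gold_standard):
    k = len(gold_standard)
    return sum(e in gold_standard for e in ranked_edges[:k]) / k

gold = {("g1", "g2"), ("g2", "g3"), ("g1", "g4")}
predicted = [("g1", "g2"), ("g3", "g4"), ("g2", "g3"), ("g1", "g4")]  # ranked by score
print(early_precision(predicted, gold))  # 2 of the top 3 edges are true -> 2/3
```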

Topological graph properties related to information exchange
For our topological analysis, we select six different properties that allow for a meaningful structural characterisation of a network and group them according to whether they relate to the exchange of information or to the presence of hubs. We assume a graph with n nodes and m edges.
Average shortest path length: The Average Shortest Path Length measures by how many links two random nodes are connected on average: $\langle d \rangle = \frac{1}{n(n-1)} \sum_{v \neq w} d(v, w)$, where $d(v, w)$ denotes the distance between two nodes $v$ and $w$.
Global efficiency: The Global Efficiency estimates how efficiently information is exchanged in the network on a global scale [26]. It is given by $E_{glob}(G) = \frac{E(G)}{E_{max}}$, where $E(G) = \frac{1}{n(n-1)} \sum_{v \neq w} \frac{1}{d(v, w)}$ and $E_{max}$ is the maximum $E_{glob}$, attained by a fully connected graph with the same number of nodes. Since gene regulation can be interpreted as information exchange between nodes in a GRN, $E_{glob}$ is a meaningful quantity to estimate.
Local efficiency: The Local Efficiency describes the resistance of the network to perturbation on a small scale [26]. It is defined as $E_{loc} = \frac{1}{n} \sum_{v \in G} E_{glob}(G_v)$, where $G_v$ is the graph that only consists of $v$ and its immediate neighbors. In practice, there can be many factors that perturb gene regulation; thus, the local efficiency is an important score to preserve, especially on experimental data.
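All three measures have ready-made counterparts in networkx (shown here on an assumed toy Small-World graph; networkx normalizes the efficiencies as in Latora and Marchiori):

```python
# Sketch: the three information-exchange measures on a toy graph.
import networkx as nx

G = nx.connected_watts_strogatz_graph(20, 4, 0.1, seed=0)  # toy Small-World graph

aspl = nx.average_shortest_path_length(G)  # Average Shortest Path Length
e_glob = nx.global_efficiency(G)           # Global Efficiency
e_loc = nx.local_efficiency(G)             # Local Efficiency
print(aspl, e_glob, e_loc)
```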

Topological graph properties related to the presence of hubs
Assortativity: The preference for a network's nodes to attach to others that have a similar degree is captured by the Assortativity [38]. It is quantified by the Assortativity Coefficient r, which is the Pearson correlation coefficient between the degree k of a node and the average degree of its neighbors ⟨k_nn⟩ when treating them as random variables.
Networks with a negative Assortativity Coefficient are called disassortative; networks with a positive r are called assortative. Disassortative networks have a higher tendency to possess hubs, which is an important feature of GRNs that we examine in the hub analysis above.

Centralization: The goal of the Centralization H is to provide an estimate of how centralized a graph is around the node $v^*$ which has the highest degree [31]. It is defined as $H = \frac{\sum_{v \in G} \left[ k(v^*) - k(v) \right]}{(n-1)(n-2)}$, where $k(v)$ denotes the degree of node $v$. A highly centralized network is focused around a small number of nodes, which could be identified as biologically important.
Clustering Coefficient: The Clustering Coefficient measures the extent to which nodes in a graph tend to cluster together. It is quantified by the local clustering coefficient $C_v = \frac{2 L_v}{k_v (k_v - 1)}$, where $L_v$ represents the number of links between the $k_v$ neighbors of node $v$, and by the global clustering coefficient [15] $C = \frac{1}{n} \sum_{v \in G} C_v$.
For our analysis, we focused on the global clustering coefficient, since it captures clustering on a global scale. A network with a larger clustering coefficient is more interconnected, which can result in more complex gene regulation.
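A minimal sketch of the three hub-presence measures, assuming a toy hub-rich graph; networkx provides the Assortativity and the average Clustering Coefficient, while the degree Centralization is computed by hand following Freeman's formula:

```python
# Sketch: hub-presence measures on a toy graph.
import networkx as nx

G = nx.barabasi_albert_graph(25, 2, seed=0)  # hub-rich toy graph

r = nx.degree_assortativity_coefficient(G)   # Assortativity
C = nx.average_clustering(G)                 # global Clustering Coefficient

degrees = [deg for _, deg in G.degree()]
n = G.number_of_nodes()
H = sum(max(degrees) - deg for deg in degrees) / ((n - 1) * (n - 2))  # Centralization
print(r, C, H)
```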

Evaluation score
Since we are particularly interested in whether certain topological features are over- or underestimated, we employ the Mean Signed Error (MSE) as evaluation quantity. For a property $P$ analyzed on networks $G_1, G_2, \ldots, G_n$, it is computed as $\mathrm{MSE}(P) = \frac{1}{n} \sum_{i=1}^{n} \left[ P(G_i^{\mathrm{inferred}}) - P(G_i^{\mathrm{true}}) \right]$, so that a positive MSE indicates overestimation.
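In plain Python, the MSE amounts to averaging the signed per-network differences (the property values below are made up for illustration):

```python
# Sketch of the Mean Signed Error: positive values mean overestimation.
def mean_signed_error(inferred_values, true_values):
    diffs = [p_inf - p_true for p_inf, p_true in zip(inferred_values, true_values)]
    return sum(diffs) / len(diffs)

true_eff = [0.50, 0.40, 0.45]      # a property measured on ground-truth networks
inferred_eff = [0.60, 0.50, 0.40]  # the same property on the inferred networks
print(mean_signed_error(inferred_eff, true_eff))  # ~0.05 -> overall overestimation
```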

Hub identification
Degree centrality: The degree centrality purely evaluates the degree of a node in a network. In a directed graph, there are two different degree centralities: the in-degree centrality and the out-degree centrality. In the directed case we use the out-degree centrality; in the undirected case we use the overall degree centrality.
Betweenness centrality: The Betweenness centrality describes the extent to which nodes stand between each other. It was formally defined by Freeman [30] as $b(v) = \sum_{u \neq v \neq w} \frac{\eta_v(u, w)}{\eta(u, w)}$, where $\eta(u, w)$ is the number of shortest paths from $u$ to $w$ and $\eta_v(u, w)$ is the number of shortest paths from $u$ to $w$ passing through $v$. If the graph is not connected, the betweenness centrality is evaluated for each connected component.
Page Rank centrality: The Page Rank centrality is the output of the Page Rank algorithm, which is focused on link analysis. The output is a distribution that models the likelihood of reaching any particular node when moving randomly along edges. Details of the algorithm can be found in [33].
Radiality: The Radiality [32] considers the global structure of the network and indicates how well connected an individual node is within the entire network. The definition makes use of the reverse distance matrix $RD = \left( \max_{u,z \in G} d(u, z) + 1 - d(v_i, v_j) \right)_{i,j}$, where $d(v, w)$ describes the shortest path length from $v$ to $w$. The Radiality of a node $v_i$ is then defined as $\mathrm{Rad}(v_i) = \frac{1}{n-1} \sum_{j \neq i} RD_{i,j}$.
Evaluation score: We rank the nodes of each network according to the centrality measures defined above; in networks with more than 50 nodes we classify the top 10% as hubs, while in smaller networks we select the top 20%.
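The four centralities can be sketched with networkx (the Radiality is implemented by hand from the reverse-distance matrix, since networkx has no built-in for it; the toy graph and the top-2 cutoff are assumptions):

```python
# Sketch: hub-detection centralities on a toy connected graph.
import networkx as nx

G = nx.connected_watts_strogatz_graph(12, 4, 0.2, seed=0)

betweenness = nx.betweenness_centrality(G)
page_rank = nx.pagerank(G)
degree_centrality = dict(G.degree())  # overall degree (out-degree for directed graphs)

def radiality(G):
    dist = dict(nx.all_pairs_shortest_path_length(G))
    diam = max(max(row.values()) for row in dist.values())
    n = G.number_of_nodes()
    return {v: sum(diam + 1 - dist[v][w] for w in G if w != v) / (n - 1) for v in G}

rad = radiality(G)
hubs = sorted(G, key=page_rank.get, reverse=True)[:2]  # top nodes by Page Rank
print(hubs)
```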
Our goal is to analyze the set similarity between the hubs in the ground truth or gold standard network, Ω_true, and those in the inferred network, Ω_inf. Therefore, we compute the average Jaccard coefficient [34] for every network class, which is given by $J = \frac{|\Omega_{\mathrm{true}} \cap \Omega_{\mathrm{inf}}|}{|\Omega_{\mathrm{true}} \cup \Omega_{\mathrm{inf}}|}$. To quantify the performance of the different centrality measures, we use as evaluation score the ratio between the Jaccard coefficient based on the inferred networks and the expected coefficient for a random predictor, which can be calculated explicitly from the number of nodes and edges in the network [39].
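The evaluation score can be sketched as follows; here the random baseline J_rand is estimated by Monte Carlo rather than with the closed-form expectation of [39], and the node sets are made up for illustration:

```python
# Sketch of the J / J_rand score for hub identification.
import random

def jaccard(a, b):
    return len(a & b) / len(a | b)

nodes = list(range(50))
omega_true = set(range(5))   # top-10% hubs in the ground-truth network
omega_inf = {0, 1, 2, 7, 9}  # top-10% hubs in the inferred network

rng = random.Random(0)
j_rand = sum(jaccard(omega_true, set(rng.sample(nodes, 5)))
             for _ in range(10_000)) / 10_000
score = jaccard(omega_true, omega_inf) / j_rand
print(score)  # values above 1 mean better than a random guess
```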

Code availability
STREAMLINE is available at github.com/ScialdoneLab/STREAMLINE.

S3. Topological properties of the reference networks
Figure S2 depicts the topological properties of the sampled ground truth networks as well as the gold standards for the experimental datasets.

Figure 1 :
Figure 1: Schematic overview of STREAMLINE. STREAMLINE consists of three steps: first, synthetic scRNA-seq data are generated from different classes of networks. Then, GRN inference methods are applied to synthetic as well as real data. Finally, the methods' performance on the prediction of edges and of structural network properties (quantifying the network robustness and hub presence) is evaluated.

Figure 2 :
Figure 2: Results of the topological benchmarking of GRN inference algorithms with respect to information exchange, both on synthetic and experimental scRNA-seq datasets. (A) Schematic representations of the three topological measures we computed (see Methods). Global Efficiency quantifies how well information can be distributed in the entire network. Local Efficiency measures how robust the network is to perturbation on a small scale. The Average Shortest Path Length specifies how many links are necessary to go from one node to another on average. (B) Barplots showing the Mean Signed Error (MSE) for the estimations of the topological properties written at the top in different types of synthetic networks (indicated on the x-axis) and for different algorithms (marked by colors). (C) Same as B, for networks estimated from real scRNA-seq datasets (indicated on the x-axis). The error bars display the standard deviations.

Figure 3:
Figure 3: Results for the topological benchmarking of GRN inference assessing the presence of hubs. (A) Schematic representation of the three topological measures considered here (see Methods). The Assortativity quantifies the tendency of nodes in the networks to attach to others with similar degrees. The Clustering Coefficient reflects how much the nodes in a graph tend to cluster together. The Centralization indicates how strongly the network is arranged around a single centre. (B) Barplots showing the Mean Signed Error (MSE) for the estimations of the topological properties written at the top in different types of synthetic networks (indicated on the x-axis) and for different algorithms (marked by colors). (C) Same as B, for networks estimated from real scRNA-seq datasets (indicated on the x-axis). The error bars display the standard deviations.

Figure 4:


Figure S1 :
Figure S1: Benchmarking of edge detection of GRN inference algorithms on synthetic and experimental scRNA-seq datasets.The heatmaps report the median AUPRC and AUROC for synthetic data (panels A and B, respectively) and the median EPr for experimental data (panel C).Rows and columns correspond to GRN inference algorithms and network types, respectively.

Table 1 :
The above table summarizes which algorithms performed best in each step of our benchmarking pipeline, subdivided by dataset and property type. The top-performing methods were selected according to the lowest absolute value of the Mean Signed Error (MSE).

Table S3 :
The table above lists the best-performing algorithms for the task of edge detection, treated as a binary classification problem against the ground truth networks. The best-performing algorithms were selected according to the highest EPr scores for the experimental datasets or the highest AUPRC and AUROC scores for the synthetic datasets.