Topological Data Analysis of Human Brain Networks Through Order Statistics

Understanding the common topological characteristics of the human brain network across a population is central to understanding brain functions. The abstraction of human connectome as a graph has been pivotal in gaining insights on the topological properties of the brain network. The development of group-level statistical inference procedures in brain graphs while accounting for the heterogeneity and randomness still remains a difficult task. In this study, we develop a robust statistical framework based on persistent homology using the order statistics for analyzing brain networks. The use of order statistics greatly simplifies the computation of the persistent barcodes. We validate the proposed methods using comprehensive simulation studies and subsequently apply to the resting-state functional magnetic resonance images. We found a statistically significant topological difference between the male and female brain networks.


Introduction
Modeling the human brain connectome as graphs has become the cornerstone of neuroscience, enabling an efficient abstraction of the brain regions and their interactions [1,2]. Graphs offer the simplistic construct with only a set of nodes and edges to describe the connectivity of the brain network [3]. The generalizability of graph representation allows one to obtain quantitative measures across multiple spatio-temporal scales ranging from the node level up to the whole network level [4,5]. To build the graph representation of brain networks, the whole brain is usually parcellated into hundreds of disjoint regions, which serves as nodes and the edges are associated with weights that indicate the strength of connection between the brain regions [6]. The graph theory based models provide reliable measures such as small-worldness, modularity, centrality and hubs [7][8][9]. However, these measures are often affected by the choice of arbitrary thresholds on the edge weights and thus make comparisons across networks difficult [10,11]. To overcome this issue, the topological data analysis (TDA) has emerged to be a powerful method to systematically extract information from hierarchical layers of abstraction [12][13][14][15].
Persistent homology (PH), one of the TDA techniques, provides a coherent framework for obtaining topological invariants or features such as connected components and cycles at different spatial resolutions [16][17][18][19]. These topological invariants are often used to provide robust quantification measures to assess the topological similarity between networks [6]. Mostly the persistent barcodes are represented as persistent landscapes or diagrams and their distributions are used to compute a topological distance measure [20]. The PH based topological distances are found to consistently outperform traditional graph based metrics [21]. The main idea of using PH to brain networks is to generate a sequence of nested networks over every possible threshold through a graph filtration, which builds the hierarchical structure of the brain networks at multiple scales [10,[22][23][24].
In the graph filtration, a series of nested graphs containing topological information at different scales are produced. During the graph filtration, some topological features may live longer, whereas others die quickly. The filtration process tracks the birth, death and persistence of the topological features. The lifespans or persistence of these features are directly related to the topological properties of networks. The collection of intervals from births to deaths that defines persistences are called the barcodes which characterizes the topology of an underlying dataset [14]. The persistent diagram displays the paired births and deaths as scatter points [16,20,25,26,[26][27][28][29]. The Betti curves, which counts the number of such features over filtrations, provide comprehensive visualizations of these intervals [6]. Thus, it is instructive to develop a statistical inference procedure using the persistent barcodes in order to compare across different groups and achieve meaningful inferences. This requires the statistical version of TDA [27,30]. [30] worked on computing confidence bands through bootstraps. [27] introduced persistent landscapes which lies in a vector space, where the sample mean and variance can be computed and thus enable a proper statistical inference. [31] worked on the hypothesis testing on the Gaussian kernel smoothing on persistent diagrams Analogous to the persistent barcodes, we also have their stochastic versions referred as the expected persistent barcodes. However, computing them requires complex theoretical constructs and they are generally approximated [32][33][34].
Since the real brain networks are often affected by heterogeneity and intrinsic randomness [35,36], it is challenging to build a coherent statistical framework to transform these topological features as quantitative measures to compare across different brain networks by averaging or matching [37]. The brain networks are inherently noisy which makes it even harder to establish similarity across networks. Thus, there is a need to develop a statistical model that accounts for the randomness and provides consistent results across networks. The statistical models based on the distributions are expected to be more robust and less affected by the presence of outliers. To this end, we use the concept of random graph to analyze brain networks across a population.
The random graphs have been investigated by many authors [38][39][40][41][42]. A graph whose features related to nodes and edges are determined in a random fashion is called a random graph. The theory of random graphs lies at the intersection between graph theory and probability theory. They are usually described using a probability distribution or a stochastic process that generates them [43,44]. The homology of random graphs have been studied studied by Kahle in particular. [45] investigated the connectivity of neighborhood complex of a random graph. [46] studied the expected topological properties of Rips complexes built on randomly distributed points in R d . [47] worked on the central limit theorem for Betti numbers of random simplical complexes. Random graphs are often encountered in graphical models, which build probabilistic models on the conditional dependency structures between nodes [48,49]. However, topology is rarely investigated in graphical models.
In this paper, we propose a more adaptable random graph model for brain networks. We consider a random complete graph, where all the nodes are connected with its edge weights randomly drawn from a continuous distribution. The consideration of a complete graph model simplifies building graph filtration straightforward [22,37]. We then compute the expected 0D and 1D barcodes through the order statistics [50][51][52][53][54][55]. The use of order statistics in computing persistent homology features such as persistent barcodes and Betti numbers can drastically speed up the computation. Further, we propose the expected topological loss (ETL), which quantifies the 0D and 1D barcodes obtained through order statistics. We use the ETL as a test statistic to determine the topological similarity and dissimilarity between networks. The proposed random graph model and corresponding ETL methods are validated using extensive simulation studies with the ground truths. Subsequently, the method is applied to the resting-state functional magnetic resonance images (rs-fMRI) of the human brain.

Data
We considered a resting-state fMRI dataset collected as part of the Human Connectome Project (HCP) [56,57]. The dataset consisted of fMRI scans of 400 subjects (168 males and 232 females) over approximately 14.5 minutes using a gradient-echoplanar imaging sequence with 1200 time points [24,37]. Informed consent was obtained from all participants by the Washington University in St. Louis institutional review board [58]. The ethics approval for using the HCP data was obtained from the local ethics committee of University of Wisconsin-Madison.
The human brain can be viewed as a weighted network with its neurons as nodes. However, considering a high number of neurons ∼ 10 12 in a human brain, the traditional brain imaging studies parcellate the brain into a manageable number of mutually exclusive regions [59][60][61]. These regions are then considered as nodes while the strength of connectivity between these regions are edges. For the considered dataset, the Automated Anatomical Labeling (AAL) template was employed to parcellate the brain volume into 116 non-overlapping anatomical regions [62] and the fMRI across voxels within each brain parcellation were averaged. This resulted 116 average fMRI time series with 1200 time points for each subject. Further, we removed fMRI volumes with significant head movements [63] because such movements are shown to produce spatial artifacts in functional connectivity [64][65][66].

Simplicial complex
A simplex is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions. A 0-simplex is a point, a 1-simplex is a line segment, and a 2-simplex is a triangle. In general, a k-simplex S k is a convex hull of k + 1 affinely independent points u 0 , u 1 , . . . , u k ∈ R k : Whereas, a simplicial complex K is a set of simplices that satisfies the following two conditions. (1) Every face of a simplex from K is also in K. intersection of any two simplices S 1 , S 2 ∈ K is a shared face [67]. We call a simplicial complex consisting of up to k-simplices a k-skeleton. Since graphs are a collection of nodes (0-simplices) and edges (1-simplices), they are 1-skeleton simplicial complexes. In a 1-skeleton, 0-dimensional (0D) holes are connected components while the 1-dimensional (1D) holes are cycles. A cycle or loop in a graph is a path that starts and ends at the same node but no other nodes in the path are overlapping. There is no higher dimensional homology beyond dimensions 0 and 1 in 1-skeleton [37].

Graph filtration
The brain networks are traditionally represented and analyzed as a graph, a 1-skeleton consisting of only nodes and edges [59][60][61][62]68]. The main focus of functional brain network analysis is quantifying and modeling the pairwise interaction between brain regions, which is usually called the effective connectivity [69][70][71][72]. Thus, we limited our algebraic representation of brain networks to graphs. Compared to the vast body of studies analyzing brain networks as graphs, modeling them as higher order simplicial complexes are only few [17,[73][74][75]. We used the graph filtration, which iteratively builds nested subgraphs of the original graph in a hierarchical manner [22]. Currently, this is the most often used filtration in analyzing brain networks due to its simplicity. Consider a weighted graph G(p, w), where p is the number of nodes and w = (w 1 , . . . , w q ) is a q-dimensional vector of edge weights with q = (p 2 − p)/2. The binary graph G (p, w ) of G(p, w) has binary edge weight w ,i : The binary network G (p, w ) is a 1-skeleton. In 1-skeleton, 0-dimensional (0D) holes are connected components while the 1-dimensional (1D) holes are cycles. There is no higher dimensional homology beyond dimensions 0 and 1. The number of connected components and the number of independent cycles in a graph are referred to as the 0th Betti number (β 0 ) and 1st Betti number (β 1 ) respectively. For 1-skeletons, there is an efficient 1D filtration method called the graph filtration, which filters at the edge weights from −∞ to ∞ in a sequentially increasing manner [6,37]. The graph filtration of G is defined as a collection of nested binary networks G 0 ⊃ G 1 ⊃ · · · ⊃ G k over increasing filtration values 0 < 1 < · · · < k . We used the edge weights as the filtration values to make the graph filtration unique [6]. During the graph filtration, edges are deleted one at a time from the lowest edge weight to the highest. The deletion of an edge disconnect the graph into at most two. Thus, the number of connected components (β 0 ) stays the same or increases at most by one. Euler characteristic χ of the graph is given by [76] Thus the change of Euler characteristic ∆χ over the filtration is given by where the change of β 0 is ∆β 0 = 0 or 1. Subsequently the change of β 0 is ∆β 1 = −1 or 0. The number of cycles decrease at most by 1 [6,77].

Birth-death decomposition
When we increase the filtration value , either one new connected component appears or one cycle disappears [6]. Once a connected component is born, it never dies implying an infinite death value. On the other hand, all the cycles are considered to be born at −∞. Therefore, we simply ignore the infinite death values of the connected components and the negative infinite birth values of the cycles and build the computation framework based on only the birth (death) values of the connected components (cycles) [37]. Also, the number of connected components (or cycles) is non-decreasing (or non-increasing) as increases. Subsequently, the 0D barcode is given by a set of increasing birth values: and the 1D barcode is given by a set of increasing death values: By tracing the birth values of connected components and the death values of cycles together, we can characterize the topology of the graph. The above 0D and 1D barcodes summarize the persistences of connected components and cycles and are often visualized using persistent diagrams [16,[25][26][27] and Betti curves. The Betti curves plot the Betti numbers with respect to the filtration values. Since the Betti numbers are monotonic, the Betti curve is a step function with a one-unit jump (or drop) at every birth (or death) values. The total number of finite birth values of connected components and the total number of death values of cycles are respectively [37]. The number of connected components (β 0 ) and cycles (β 1 ) in the complete graph G −∞ are 1 and m 1 respectively. We note that every edge weight must be in either 0D barcode or 1D barcode as summarized in the following theorem [37]. The schematic of graph filtration and birth-death decomposition for a random graph is presented in Figure 1. The cycles we identify using the birth-death decomposition algebraically independent of each other and hence form a basis for cycles [24,37]. In binary graph G W3 in Figure 1, there is one cycle consisting of edge weights W (4) , W (5) and W (6) . The cycle can be algebraically represented as [W (4) ] + [W (5) ] + [W (6) ] with the convention of putting clockwise orientation along the edges. In the complete graph G −∞ , there are three independent cycles All other cycles can be represented as a linear combination of C 1 , C 2 and C 2 . For instance, Schematic of graph filtration and persistent barcodes computation. We consider a random weighted graph with p = 4 nodes, where the number of edges is q = p(p − 1)/2 = 6. The random edge weights are {W 1 , W 2 , . . . , W 6 }. We order them using the order statistic as W (1) < W (2) < · · · < W (6) . We remove each edge of the random graph one at a time in the graph filtration and construct the random birth and death sets of the connected components and cycles, respectively. The Betti-0 (lower right) and Betti-1 (lower left) curves are drawn using the birth and death sets. The blue and green shaded areas represent the areas under Betti-0 and Betti-1 curves. Later, we will consider the area under Betti-0 curve to quantify the curve and construct a test statistic to discriminate between two groups of networks.
The total number of algebraically independent cycles is the 1st Betti number β 1 , which is equivalent to the number of death values of cycles. For a complete graph with p nodes, the total number of edges is p(p − 1)/2. In a graph filtration, the total number of birth values of connected components equals to the number of edges p − 1 in the maximum spanning tree. From the birth-death decomposition, the remaining edges contribute to the death values of cycles. The remaining number of edges are [37] The connected components characterize the modular structure or shape of a network whereas the cycles are loops in a network [24,78]. In [24], the authors focused specifically on cycles in a brain network as they embed higher order signal transmission paths to provide insights of the functioning of the brain. The presence of more cycles in a network indicates a dense connection with stronger connectivity. The cycles in the brain network not only determines the propagation of information but also controls the feedback [79]. The connected components and cycles provide dependent but complementary information about the network.

Wasserstein distance on barcodes
Since there is no higher dimensional homology beyond dimensions 0 and 1 in 1-skeleton, the 0D and 1D barcodes together can characterize the topology of a network [14]. Therefore, the topological similarity between two such networks can be quantified using a distance metric between the corresponding 0D or 1D barcodes [80]. Often used metric is the Wasserstein distance [23,[81][82][83]. Let G 1 (p, u) and G 2 (p, v) be two networks and the corresponding barcodes (or persistent diagrams) be P 1 and P 2 . Then, the 2-Wasserstein distance on barcodes is given by x − τ (x) 2 1/2 over every possible bijection τ between P 1 and P 2 [23,84,85]. For graph filtrations, barcodes are 1D scatter points. Therefore, the bijection τ can be simplified to the L 2 norm between the sorted birth values of connected components or the sorted death values of cycles [23].
Theorem 2 Let G 1 and G 2 be two networks with p nodes and be the birth and death sets of the network G k , k = 1, 2. Then, the 2-Wasserstein distance between the 0D barcodes for graph filtration is given by , and the 2-Wasserstein distance between the 1D barcodes is

Expected persistent barcodes of random graph
Consider random graph G(p, W ), where its edge weights are drawn independent and identically from a distribution function F W . p is the number of nodes and W = (W 1 , . . . , W q ) is a q dimensional vector of random weights with q = (p 2 − p)/2. The considered graph is complete and its edge weights are drawn randomly from a continuous distribution. To be mathematically precise, the considered random graph is almost surely complete. Since we the edge weights are drawn from a continuous distribution, the probability of having zero edge weight is nil. Figure 2 displays weighted brain networks randomly drawn from Beta distributions. If we apply a graph filtration on the random weighted graph G(p, W W W ), we have a set of random birth values of connected components (or random 0D barcode) and a set of random death values of cycles (or random 1D barcode). Since the notions of random birth and death values are abstract, it is important to turn them into deterministic topological descriptors. As often, one of the simplest way to turn a random object into a deterministic summary is to consider its average behavior. To that end, we study the expected birth and death values (or expected persistent barcodes) as follows.
Let G(p, W ) be a random graph and its sorted random edge weights be where the subscript (i) indicates the ith smallest edge weight. For instance, Order statistics can be formulated by modeling indices (i) using random permutations while the actual edge weights are fixed nonrandom quantity. However, in order statistics, the indices (i) themselves are not considered as random but fixed [86,87]. They simply indicates the order the random variables are indexed. Only what is in the ith variable is considered as random. In this study, we will follow the traditional convention in order statistics. Let the random birth and death values of the connected components and cycles be where m 0 = p − 1 and m 1 = (p − 1)(p − 2)/2. Then, the expected birth and death values are given by where E indicates the standard expectation operator on a random weight. In order to compute the expected birth and death values, we provide an explicit expression for E W (k) , for k = 1, . . . , q, through Theorem 3 below.
Theorem 3 Let the edge weights W = {W 1 , W 2 , . . . , W q } of a random graph G(p, W ) be drawn from a distribution with cumulative distribution function (cdf) F W and probability density function (pdf) f W . Then, the expectation of the kth edge wight W (k) can be approximated by Proof Since the edge weights W 1 , W 2 , . . . , W q are drawn from a distribution with a cdf F W and a pdf f W , the pdf of the kth order statistic W (k) is given by W (k) does not follow a well-known distribution and, therefore, the computation of its mean and variance is difficult. However, [88] showed that the rth sample quantile of {W 1 , W 2 , . . . , W q } is asymptotically normally distributed: for large q, where AN stands for asymptotic normal distribution. Thus, the approximate mean and variance of W (k) can be found from (2) by letting r = k/(q + 1): The variance will be later used in computing confidence intervals. Now, we use Theorem 3 and provide expressions for the expected birth and death values in Theorem 4 below.
Theorem 4 Let G(p, W ) be a random graph, where its edge weights are drawn from a cdf F W . Then, the expected birth values of the connected components of G(p, W ) are given by and the expected death values of the cycles of G(p, W ) are given by The proof follows Theorem 3. As a special case of Theorem 4, we show that if the edge weights follow uniform distribution in [0, 1], the expected birth and death values have a more simplified and an exact form. The expected birth values of the connected components are given by and the expected death values of the cycles are given by Then the distribution of the kth order statistic simplifies to which is the distribution of the Beta distribution with parameters k and q + 1 − k. Since the mean of a Beta(k, q + 1 − k) distribution has an exact form of k/(q + 1), we have October 14, 2022 9/29 Once the expected birth and death values are computed, we can use them to plot Betti curves. In Figure 3 example, we generated two random graphs with p = 150 nodes and their edge weights drawn from Beta(2, 2) and Beta(2, 3) distributions. We observe that a slight change in distribution significantly affects the topology of a network.

Confidence bands on birth and death values
Given a set of n samples from a random graph G(p, W ), we show how to compute the confidence bands on the expected birth and death values. The method is later validated using a simulation study.
Let the random weights of the n sampled graphs be w 1 , w 2 , . . . , w n , where w i = (w i1 , w i2 , . . . , w iq ) and w ij ∼F W . From the previous section, we know that W (k) follows a asymptotic Gaussian distribution with its mean and variance being The density f W is estimated by averaging the n graphs with respect to their weights using Gaussian kernel density estimates (KDE). Let the average weight vector be w = 1 n n i=1 w i = (w 1 , w 2 , . . . , w q ) . Then, the KDE of the pdf f W is given by [89] f where K is the Gaussian kernel with bandwidth h. To estimate F −1 W , we first find the empirical cdf of F W based on the averaged weight vector w as Here, I A is an indicator function takeing value 1 if the event A is true and 0 otherwise. The inverse cumulative distribution of F W (x) is then given by Once f W and F −1 W are estimated, we plug-in the corresponding estimates in µ k and σ 2 k and obtain the estimates µ k and σ 2 k . Finally, we calculate the α% confidence intervals as where z α is such that P (Z ≥ z α ) = α/2 for standard normal Z ∼ N (0, 1). For α = 95, we have z α = 1.96.

Inference on expected birth and death values
Since a graph can be topologically characterized by 0D and 1D barcodes, the topological similarity and dissimilarity between two graphs can be measured using the differences of such barcodes. To quantify these differences, we propose the expected topological loss (ETL) as follows.
Let G 1 (p, U ) and G 2 (p, V ) be two random graphs, where the random weights U = {U 1 , . . . , U q } and V = {V 1 , . . . , V q } are drawn from distribution functions F U and F V , respectively. Further, let the expected birth and death values of G 1 (p, U ) be Similarly, let the expected birth and death values of G 2 (p, V ) be Then, the ETL is given by In most applications, the distribution functions F U and F V are unknown. In such scenarios, we plug-in the corresponding empirical distribution function estimates F −1 U and F −1 V in (3). The ETL is a function of expected 0D and 1D barcodes. The expected 0D barcode can also be viewed as the expected heights of branching in a random merge tree [90][91][92][93][94][95].

Application of ETL in discriminating networks
The ETL can be used to topologically discriminate between two groups of brain networks. Let Ω = {Ω 1 , . . . , Ω m } and Ψ = {Ψ 1 , . . . , Ψ n } be two sets consisting of m and n complete networks each comprising p number of nodes. Let the empirical distribution functions of the edge weights of the graphs in group Ω be F Ω1 , . . . , F Ωm and the expected birth and death values for Ω i be which will be simply denoted as u Ω 1i < · · · < u Ω m0i and v Ω 1i < · · · < v Ω m1i .
October 14, 2022 11/29 The average of expected birth and death values within the group Ω are given by and v Ω 1 < · · · < v Ω m1 .
Similarly, for the second group Ψ, let the average of expected birth and death values be The test statistic based on ETL to discriminate between groups is then given by Large L(Ω, Ψ) indicates a significant topological difference between the two groups whereas a small value suggests that there is no significant topological group difference.
Considering the probability distribution of the test statistic L(Ω, Ψ) is unknown, we use the permutation test [96][97][98][99]. In this study, we use 100000 permutations for small scale simulation studies and half million permutations for large-scale real data. A similar widely-used statistic is the maximum gap statistics. On a similar line to L(Ω, Ψ), the statistic is given by: We will use L 1 (Ω, Ψ) to compare with the ETL statistic L(Ω, Ψ) in the simulation study section.

Area under Betti curves in discriminating networks
The difference of β 0 curves can be also quantified using the area under the curve (AUC) [100,101]. The AUC for Ω i of the group Ω and for Ψ j of the group Ψ are given by We compute the AUC by summing up the areas of rectangular blocks formed by the expected persistent barcodes. For example, in Figure 1, the area under the Betti-0 curve is 2 W (5) − W (3) + 3 W (6) − W (5) .
To determine if AUC is significantly different between groups Ω and Ψ, we use the Wilcoxon rank sum test [102]. The Wilcoxon rank sum test is a nonparametric test of the null hypothesis, For randomly selected values X and Y from two populations, the probability of X being greater than Y is equal to the probability of Y being greater than X. This is unlike the previous situation, where we considered a ETL or a max-gap statistic. In those cases, since we are considering the distance between increasing births or deaths of two networks, the consideration of L 1 or L 2 norm in the statistic is meaningful. In contrast to distance-based methods, AUC offers an alternate area based topological inference procedure. The method can be equally applicable for area under Betti-1 curves as well.

Simulation study
Since there is no ground truth in real brain network data, we performed extensive simulation studies with known ground truth. The Matlab codes for simulation study is provided in https://github.com/laplcebeltrami/orderstat.

Validation of birth and death value estimates
We validate the method to estimate expected birth and death values. We generate the ground truth graph G(p, w) with given edge weights and calculate its birth and death values using the birth-death decomposition on p = 10 number of nodes [37]. The weight vector w is of dimension q = p(p − 1)/2 = 45. We drew the q variate random weights w from the Uniform(0, 1) distribution.
We then simulate n = 15 vectors of q-variate Gaussian noises and add them to the edge weights w of G(p, w) to have a set of n graphs G(p, w i ) = G(p, w + i ), i = 1, . . . , n, where i ∼ N q (0, σ 2 I) with σ = 0.02. The produced set of graphs {G(p, w 1 ), . . . , G(p, w n )} is considered as a realizations from a random graph G(p, W ) and apply the proposed method to calculate the expected birth and death values along with their corresponding confidence bands. Then, we compare them with the initially calculated birth and death values of G(p, w).

Analyzing topological similarity between networks
We provide a toy example to illustrate whether the topological similarity or dissimilarity of two groups of networks, drawn from two different distributions or the same distribution, can be identified using the ETL statistic (4). We used the Beta(a, b) distribution, which are all defined in interval (0, 1). The shape parameters a and b allow for the variety of shapes including the shape of a uniform distribution Uniform(0, 1) when a = b = 1. We considered three different distributions ( Figure 6).
We generated two groups of networks. For the both groups, we simulated n = 6, 8, 10, and 12 networks. We investigated the performance of both small (p = 10) and large (p = 100) network settings. Small networks may not yield complex cyclic structures often present in large networks. However, the overall conclusions are the same regardless of the size of networks. For the permutation test, we considered 100000 permutations and repeated that 10 times to compute the average p-values. Table 1 tabulates the p-values for small and large network settings. In all the scenarios, networks drawn from the same distribution produced large p-values and networks drawn from different distributions had p-values smaller than 0.01. Therefore, we conclude that the proposed ETL statistic, based on expected birth and death values, can discriminate networks drawn from different distributions at 99% confidence level. The middle panel of Figure 6 plots the histograms of the ETL test statistic and the corresponding observed test statistics (in dotted red) for two specific scenarios: (i) Beta(1, 1) vs. Beta(5, 2) (left) and (ii) Beta(1, 1) vs. Beta(1, 1) (right) with 12 networks in each group.

Comparison of ETL against baselines
We compared the proposed ETL with several other widely-used baseline topological distances such as bottleneck, Gromov-Hausdorff (GH), and Kolmogorov-Smirnov (KS) distances [21,103,104]. We also compared the results with the maximum gap statistic defined earlier in (5). In all the scenarios, we considered two groups of networks each of size n = 6. The remaining simulation setting is similar to the above. The corresponding p-values are presented in Table 2 for small (p = 10) and large network (p = 100) settings. From the table, we observe that the ETL performs well in most scenarios. In particular, we note that the KS based methods do not perform well whereas the maximum gap based method is quite competitive. Further, for testing no network differences, all the distances perform well. Since the maximum gap based method exhibits a competitive performance with the ETL based method, we plot the histograms of the maximum gap statistics obtained over different permutations and the corresponding observed test statistics (in dotted red) for two specific scenarios: (i) Beta(1, 1) vs. Beta(5, 2) (left) and (ii) Beta(1, 1) vs. Beta(1, 1) (right) with 6 networks in each group; see the bottom panel of Figure 6. Although both the methods (ETL and maximum-gap) perform well, the ETL generally produces better results (i.e., its p-value is closer to 0 when there is a network difference and closer to 1 when there is no network difference).

Area under Betti curves
We also conducted a simulation study for the method based on the area under β 0 curve. The considered simulation layout was the same as before. The obtained p-values are tabulated in Table 3 for small networks (p = 10) and large networks (p = 100). As shown in Figure 3, a slight change in distribution significantly changes the topology of the network and, therefore, the area under β 0 curve varies significantly. This change is more visible especially for large networks, which incases AUC. However, the Wilcoxon rank sum test places ranks to the aggregated sample that combined the first and second sample, then considers the sum of ranks for the both samples. This makes the test statistic fairly robust even if the distributions are varied. This makes the p-value computation extremely stable for large networks. For example Table 3 shows the Table 1. The average p-values obtained using the ETL statistic for various pairs of distributions for small (p = 10) and large (p = 100) network settings. Here, the columns 6 networks, 8 networks, 10 networks, and 12 networks indicate the number of networks that we considered for both the groups. The p-values smaller than 0.01 indicate that our method can identify network differences at a 99% confidence level.

Two-sample test using ETL statistic
Given the 400 correlation matrices of 168 males and 232 females, we aim to check whether the proposed ETL statistic can determine the difference between the groups of males and females. We assume that the male and female edge weights are coming from distributions with cdfs F U and F V , respectively. However, these distribution functions are unknown. Therefore, we need to estimate them because the ETL statistic is constructed using these cdfs. To estimate the cdf, we average the male (female) correlation matrices across 168 subjects (232 subjects) and find the empirical cdf based on the averaged 6670 edge weights. The empirical cdfs of the average edge weights of females (in solid black line) and males (in dashed red line) are presented in the left panel of Figure 8. We observe that the empirical cdf corresponding to female is slightly higher than that of male. This suggests a relatively more number of edge weights with smaller values for the female, and a relatively more number of edge weights with bigger values for the male. In other words, the distribution of the female edge weights is slightly positively skewed than the male edge weights. Figure 9 plots the β 0 and β 1 curves of the average female and male networks (calculated in the standard way) and their corresponding estimated counterparts (computed using the expected birth and death values). We observe that the estimated Betti curves well approximate the original Right panel: The β 0 (top) and β 1 (bottom) curves for female (in solid black) and male (in dashed red) brain networks. For a better visualization, we consider a threshold value of 0.5 while plotting the brain networks so that they contain fewer number of edges. Betti curves.
To conduct the test, we used the permutation test with 500000 random permutations. The observed test statistic is 4.9372 and the p-value is 0.0134. The histogram of test statistic is plotted in the right panel of Figure 8. We conclude that, although the weight distributions of the males and females are very close, the proposed ETL statistic can still discriminate them at a 95% confidence level.

Two-sample test using AUC statistic
We conducted a two-sample test using the method based on the area under β 0 curve. The observed value of the Wilcoxon rank-sum statistic is 48374. The statistic corresponds to the p-value of 0.1036. That is, the test fails to discriminate male and female subjects if we use the traditional values of α, the level of significance, to be 0.05 or 0.1. However, if we relax this assumption a bit, the test can discriminate males and females at a confidence level of 89.5%.
The sex differences of resting state functional networks were previously investigated. There is known sex difference in the parietal region involved in spatial ability [105]. [106] reported sex differences in the left parietal, precentral and postcentral regions. The sex difference is also reported in the left rolandic operculum [107]. The previous rs-fMRI studies mainly focused on brain region specific analysis and not topological. Our topological methods are different. The use of the order statistic in quantifying topological difference between males and females is novel. This method identifies the impact of distribution differences in topological features. These specific results have not been observed before to best of our knowledge.

Discussion
The concept of random graphs was first proposed in mid-twentieth century [39] and has been of many researchers' interest ever since [108][109][110][111][112]. The concepts of TDA tools such as persistent barcodes were extended to handle stochastic cases, which triggered the computation of expected persistent barcodes. However, such computation may require complex theoretical constructs. In this article, we considered a random graph model for which the computation of expected persistent barcodes became simplified by using the order statistic.
[37] formulated a topological loss based on the birth and death values of connected components and cycles of a network that provided an optimal matching and alignment at the edge level. In this article, we extended this formulation to a random graph scenario and proposed the expected topological loss (ETL) based on the expected birth and death values. We use the ETL as a test statistic to discriminate between two groups of networks. We validated this method using a simulation study. We showed that the ETL can identify group differences at a 99% confidence level whereas it produces large p-values when there is no network differences. We compared the proposed approach with baseline approaches and established an overall superior performance of the proposed method. Further, we considered the area under the Betti curves [101]. This resulted a scalar quantification of the curves which was used to discriminate between the groups. A respective simulation study showed its successful discriminative ability whenever there are network differences. We also applied the developed tools in a resting-state brain fMRI dataset and showed that they can differentiate male and female brain networks.
To calculate the expected persistent barcodes, we computed the unknown distribution using the nonparametric empirical distribution function. However, one may also consider hierarchical or Bayesian parametric models for the edge weights instead. For example, one may consider the edge weights to be drawn from a N (µ, σ 2 ) distribution, where the location parameter µ and the dispersion parameter σ 2 have a Gaussian and an inverse gamma conjugate prior, respectively. The parameters of the prior distributions will allow flexibility while we can still enjoy the advantages of a parametric model. This direction can be pursued in the future.
We can also use different filtration schemes such as relative filtration [113] or normalized filtration that scales filtration values between 0 and 1 [114]. As long as the order of sorted edge weights are not changed, they will not affect the statistical results. The Wasserstein distance we used is defined on the sorted edge weights. As long as we do not change the value of edge weights, the statistical results will not change.
Our methodology is based on the graph filtration, which gives both 0D and 1D persistence as monotonic 1D functions of birth and death values only. On the other hand, the clique filtration [115], does not produce monotone persistence or barcodes and the proposed method is not directly applicable [116][117][118][119][120][121]. Our method is applicable to any filtration that provides monotone persistence or Betti curves. The proposed graph filtration computes the barcodes in O(p log p), which is significantly faster than Rips filtrations. In traditional Rips filtrations, the computational complexity grows rapidly with the number of simplices [122]. With p nodes, the size of the k-skeleton grows as p k+1 and the computational run time is O(p 3k+3 ) [123,124]. Compared to the graph filtration, the Rips filtration constructed using Ripser package [125] is about 8 times slower in a computer. On top of that, we also need to compute the Wasserstein distance between persistent diagrams [126]. The Wasserstein distance computation requires expensive optimization process involving O(p 6 ) run-time [127,128].
Our algorithm exploits the geometric structure of the graph filtration, resulting in the persistence diagram representation in the form of 1-dimensional sorted scalar values. Thus, the proposed method computes he Wasserstein distance in O(p log p) bypassing multitude of computational bottlenecks. For resampling based statistical inference such as the permutation test, the computational bottleneck is caused by repreatedlby computing the test statistic over the random permutations of group labels at least half million times [129]. This is impractical if the Wasserstein distance for the Rips filtrations has to be used for 400 networks and then the whole computation has to be done repeatedly half million times. The development of scalable computation will have a significant impact in resampling based statistical inference.
Graph filtrations produce the monotone birth and death values over filtrations. Since our birth and death values are exactly edge weights, the slight changes in edge weight distribution will correspondingly change the birth and death values slightly. Since the expected topological loss (ETL) is sum of differences of birth and death The average Betti curves of obtained from the graph filtration on correlation matrices computed separately for the inhibition (go) and initiation (no-go) blocks of fMRI time series in cognitive aging study [130,131].
values, it will also change correspondingly. This is the very reason which assures that our method can successfully discriminate topological group differences whenever there is a difference in edge distributions between two groups.
The monotone Betti curves usually follow S-shaped curves similar to the pattern of cumulative distribution functions. The pattern of S-shaped Betti curves do not change drastically even if different dataset is used as long as they are weighted graphs. To demonstrate this, we plotted Betti curves for task-fMRI. Figure 10 shows the Betti curves for task-fMRI, where cognitive inhibition was measured using go/no-go paradigm in 144 subjects on 100 brain regions [130,131]. The correlation matrices were computed separately for the inhibition (go) and initiation (no-go) blocks of fMRI time series. The monotonic Betti curves show almost identical pattern of Betti curves in rs-fMRI networks in our study.