Supplementary information for ‘maxnodf: an R package for fair and fast comparisons of nestedness between networks’

Christoph Hoeppke and Benno I. Simmons

1 Conservation Science Group, Department of Zoology, University of Cambridge, The David Attenborough Building, Pembroke Street, Cambridge, CB2 3QZ, UK
2 Faculty of Mathematics, Wilberforce Road, Cambridge CB3 0WA, UK
3 Mathematical Institute, Andrew Wiles Building, Woodstock Road, Oxford, OX2 6GG, UK
4 Centre for Ecology and Conservation, College of Life and Environmental Sciences, University of Exeter, Cornwall Campus, Penryn, UK


Appendix S1
Throughout the manuscript we made the statement that the extended NODF-maximisation problem is NP-hard. We prove this by first reducing the NODF-maximisation problem to an equivalent mixed integer program (MIP). The class of mixed integer programs that NODF-maximisation falls into is hard in general. We establish this by reducing the NP-complete vertex cover problem to the generalised NODF-maximisation problem.
Theorem 1 (NODF maximisation problem properties). Given three integers N, M, L ≥ 1, with N, M > 1 and N + M + 1 ≤ L ≤ N × M, the NODF maximisation problem can be written as a MIP. The resulting generalised NODF-maximisation problem is NP-hard.
Proof of Theorem 1. To formalise the NODF maximisation problem we first consider the set of bi-adjacency matrices we are optimising over. It is sufficient to consider these bi-adjacency matrices, as each undirected and unweighted bipartite network is uniquely described by its bi-adjacency matrix.
To be considered in the NODF maximisation process, a bi-adjacency matrix A needs to:

1. be a binary matrix of the correct shape, A ∈ {0, 1}^{N×M},
2. have the correct total number of links, Σ_{i,j} A_{i,j} = L,
3. have no empty rows or columns (every species has at least one link).
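The three conditions above can be summarised in a single display. This is a sketch consistent with the constraints as stated; the symbol for the feasible set is our own shorthand, not notation from the main text:

```latex
\mathcal{A} \;=\; \Big\{\, A \in \{0,1\}^{N \times M} \;:\;
    \sum_{i=1}^{N}\sum_{j=1}^{M} A_{i,j} = L,\quad
    \sum_{j=1}^{M} A_{i,j} \geq 1 \;\;\forall i,\quad
    \sum_{i=1}^{N} A_{i,j} \geq 1 \;\;\forall j \,\Big\}
```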
For a given matrix A ∈ {0, 1}^{N×M} we denote the rows and columns by the column vectors r_1, r_2, . . ., r_N and c_1, c_2, . . ., c_M respectively, such that (3) holds. We can thus write the NODF maximisation problem as the maximisation (4) over this feasible set, where 1 denotes the ones vector.

In the following we establish that the NODF maximisation problem (4) can be equivalently formulated as a linear mixed integer program. Linear mixed integer programs contain both continuous variables x ∈ [a, b] ⊂ R and discrete variables y ∈ Ω ⊂ N; however, all optimisation variables enter the constraints and the objective function only linearly. The NODF maximisation problem as written in (4), while having only linear constraints, contains the nonlinear expression (r_i^T r_i)^{-1} (r_i^T r_j).

We begin by introducing binary variables to account for the decreasing-fill conditions [r_i^T 1 > r_j^T 1], which evaluate to 1 if, and only if, the i-th row has higher fill than the j-th row. The conditions [c_i^T 1 > c_j^T 1] for column fills are treated equivalently. Let δ^r_{i,j}, δ^c_{i,j} ∈ {0, 1} be as defined in (5). As a consequence of δ^r_{i,j}, δ^c_{i,j} ∈ {0, 1}, we can enforce the conditions (5) with linear constraints.

Next we introduce continuous variables w^r_{i,j}, w^c_{i,j} ∈ [0, 1] to represent the paired-overlap expressions. In the following we focus primarily on the derivation of the conditions for w^r_{i,j}, as the conditions for w^c_{i,j} follow by analogous reasoning. Let us first consider the subexpression r_i^T r_j, which is quadratic in the entries of the underlying bi-adjacency matrix A ∈ {0, 1}^{N×M}. The fact that we restrict ourselves to binary variables in A allows us to reformulate this quadratic expression using binary, linearly constrained variables. We introduce q^r_{i,j;l}, q^c_{i,j;l} ∈ {0, 1} as in (13). In the optimisation problem q^r_{i,j;l}, q^c_{i,j;l} are only multiplied with non-negative variables; consequently we may assume that they always attain their maximal value at the global optimum. The conditions introduced in (13), given that A is a binary matrix, therefore ensure that q^r_{i,j;l} = A_{i,l} A_{j,l} and q^c_{i,j;l} = A_{l,i} A_{l,j}.
This lets us write the expression for w in a form that is linear in the q-variables. The remaining non-linearity in equations (16) and (17) is eliminated by introducing binary variables z^r_k ∈ {0, 1} and z^c_k ∈ {0, 1}, with the constraints given in equations (18) and (19). From Figure S4 we can see that these constraints on z ensure the required behaviour.

Figure S4: Plot of the constraints on z-variables given in equations (18) and (19).
Using the binary z-variables we employ the so-called big-M trick (Parker & Rardin 2014) to implement constraints that are active only if the corresponding z-variable is 0, i.e. false.
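As a generic illustration of the big-M trick (not the specific constraints (21) and (22), which involve the w- and q-variables), a constraint x ≤ c that should bind only when a binary variable z equals 1 can be written linearly as

```latex
x \;\leq\; c + M_{\mathrm{big}}\,(1 - z), \qquad z \in \{0,1\},
```

where M_big is any constant upper bound on x − c. For z = 1 the constraint reduces to x ≤ c, while for z = 0 it is vacuous.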
The constraints enforce the conditions (16) and (17), as progressively more restrictive conditions are enforced as z-variables are true for higher values of k. We note at this point that the constraints (21) and (22) are linear in all programming variables. As a last step we avoid the multiplication of w- and δ-variables by including additional linear constraints. To shorten the notation for writing the complete linear mixed integer program for NODF maximisation, we introduce the shorthand (δ, q, w) ∈ K, where K is the feasible set of all auxiliary variables. The linear mixed integer program form of the NODF maximisation problem is therefore given as (27). For an appropriately chosen matrix B and vectors v_x, v_y and b, problem (27) can be written in a more general form.

We prove NP-hardness for this problem by following the well-known reduction of the NP-complete minimum vertex cover problem to the above problem. Given a graph G = (V, E), where V is a finite set of graph vertices and E ⊂ V × V is a finite set of edges, the minimum vertex cover problem is to find the smallest vertex set V′ ⊆ V such that for every (u, v) ∈ E at least one of u ∈ V′ or v ∈ V′ holds. The minimum vertex cover problem was proven to be NP-hard in Karp (1972). NP-hardness of the generalised NODF-maximisation problem then follows by the linear-time reduction of the minimum vertex cover problem to this class of programs.

Appendix S2

Improving NODF calculation efficiency

The NODF_c metric relies on evaluating NODF_n = NODF / max(NODF), where max(NODF) denotes the maximum nestedness that can be obtained by networks in the same class as the original network. We say that two networks are in the same class if they share the same number of species in both classes and the total number of links in the network is identical.
We can compute the nestedness of a single network with reasonable efficiency; however, approximating max(NODF) requires the application of optimisation algorithms. During the course of such algorithms, NODF is often re-evaluated repeatedly. The speed at which we can re-evaluate NODF thus becomes the determining factor for the performance of optimisation algorithms.
To assess the computational cost of NODF, we revisit the definition provided in Almeida-Neto et al. (2008) and provide an algorithm for its evaluation. Note that other definitions of nestedness exist, with different advantages and disadvantages. We choose to focus on the original definition of NODF here, as it is the most widely used and has been extensively tested by Song et al. (2017), but developing and testing methods for normalising other definitions of nestedness (e.g. Fortuna et al. 2019) is an important area for future work.

The computation of NODF relies on evaluating, for all row and column pairs, the paired nestedness values N_paired^{r,i,j} and N_paired^{c,i,j}. Consider two row indices r_1, r_2 ≥ 1. If r_2 ≤ r_1 we set N_paired^{r,r_1,r_2} = 0. Next we check that the number of links decreases from row r_1 to row r_2. If the number of links is non-decreasing, we again set N_paired^{r,r_1,r_2} = 0; otherwise we compute the partial overlap between rows r_1 and r_2. The partial overlap is defined as the proportion of links in row r_2 that have matching links in row r_1. The paired nestedness for columns is defined analogously.

For a network A ∈ {0, 1}^{N×M} with N rows and M columns, computing the partial overlap between two rows or two columns requires 2M or 2N operations respectively. During the computation of NODF we need to compute O(N²) paired nestedness values for rows and O(M²) paired nestedness values for columns. This results in a computational cost of O(N² × M + M² × N) for one NODF evaluation. This cost can also be seen in Algorithm 1, where it results from summation inside nested for loops. As a result, evaluating NODF for a network of twice the size, Ã ∈ {0, 1}^{2N×2M}, requires 8 times as many operations as evaluating NODF for the original network A. We thus say that the computational cost of NODF grows cubically with network size.
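The definition above can be written out directly in code. The following is an illustrative Python sketch of one full NODF evaluation; maxnodf itself is an R package with a compiled backend, and the function and variable names here are our own:

```python
import numpy as np

def nodf(A):
    """One full NODF evaluation, O(N^2 * M + M^2 * N) as derived above."""
    A = np.asarray(A)
    N, M = A.shape
    row_fill = A.sum(axis=1)
    col_fill = A.sum(axis=0)
    total = 0.0
    # Row pairs contribute only under strictly decreasing fill.
    for i in range(N):
        for j in range(i + 1, N):
            if 0 < row_fill[j] < row_fill[i]:
                # Partial overlap: share of row j's links matched in row i.
                total += (A[i] @ A[j]) / row_fill[j]
    # Column pairs, treated analogously.
    for i in range(M):
        for j in range(i + 1, M):
            if 0 < col_fill[j] < col_fill[i]:
                total += (A[:, i] @ A[:, j]) / col_fill[j]
    # Normalise by the number of row and column pairs.
    return total / (N * (N - 1) / 2 + M * (M - 1) / 2)
```

On this normalisation a perfectly nested matrix attains the maximum value 1, matching the 0 to 1 scale used elsewhere in this supplement.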
Algorithm 1 NODF evaluation
1: procedure NODF(A) (Computes the NODF value of a network A)
2: NODF_r ← 0
3: for r_1 ∈ {1, . . ., N} do (Compute the row contribution to NODF)
4:   for r_2 ∈ {r_1 + 1, . . ., N} do
5:     if fill(r_2) < fill(r_1) and fill(r_2) > 0 then
6:       NODF_r ← NODF_r + overlap(r_1, r_2) / fill(r_2)
7:     end if
8:   end for
9: end for
10: NODF_c ← 0
11: for c_1 ∈ {1, . . ., M} do (Compute the column contribution to NODF)
12:   for c_2 ∈ {c_1 + 1, . . ., M} do
13:     if fill(c_2) < fill(c_1) and fill(c_2) > 0 then
14:       NODF_c ← NODF_c + overlap(c_1, c_2) / fill(c_2)
15:     end if
16:   end for
17: end for
18: return the normalised sum of NODF_r and NODF_c
19: end procedure

During optimisation we often perform minor modifications which alter the network in only a single position. We can use the knowledge that all but one entry of the network remain unchanged to accelerate the computation of NODF. Consider a modification of network A ∈ {0, 1}^{N×M} in position (i, j). We only need to consider the N values A(1, j), . . ., A(N, j) to update NODF_r and the M values A(i, 1), . . ., A(i, M) to update NODF_c. This reduces the total complexity from O(N² × M + N × M²) to O(N + M). We can thus recompute the NODF value for a network that has been modified in a single position in linear rather than cubic time. These efficiency improvements enable the application of more advanced algorithms, like hill climbing or simulated annealing, to the NODF maximisation problem.
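The single-position update can be sketched as follows. This is an illustrative Python sketch of the caching idea only: row and column fills and pairwise overlap counts are kept up to date with O(N + M) work per flipped cell. The class and its names are ours; maxnodf's actual implementation additionally updates the NODF sum itself incrementally rather than re-reading the cached pair values.

```python
import numpy as np

class NodfCache:
    """Cache row/column fills and pairwise overlaps of a binary matrix."""

    def __init__(self, A):
        self.A = np.asarray(A).copy()
        self.row_fill = self.A.sum(axis=1)
        self.col_fill = self.A.sum(axis=0)
        self.row_overlap = self.A @ self.A.T   # N x N shared-link counts
        self.col_overlap = self.A.T @ self.A   # M x M shared-link counts

    def flip(self, i, j):
        """Toggle entry (i, j); cache updates touch only O(N + M) values."""
        d = 1 - 2 * self.A[i, j]               # +1 when adding, -1 when removing
        self.A[i, j] += d
        self.row_fill[i] += d
        self.col_fill[j] += d
        # Only overlaps involving row i change, and only through column j.
        self.row_overlap[i, :] += d * self.A[:, j]
        self.row_overlap[:, i] = self.row_overlap[i, :]
        self.row_overlap[i, i] = self.row_fill[i]
        # Only overlaps involving column j change, and only through row i.
        self.col_overlap[j, :] += d * self.A[i, :]
        self.col_overlap[:, j] = self.col_overlap[j, :]
        self.col_overlap[j, j] = self.col_fill[j]

    def nodf(self):
        """NODF from the cached quantities (no overlap recomputation)."""
        N, M = self.A.shape
        total = 0.0
        for fill, overlap, n in ((self.row_fill, self.row_overlap, N),
                                 (self.col_fill, self.col_overlap, M)):
            for i in range(n):
                for j in range(i + 1, n):
                    if 0 < fill[j] < fill[i]:
                        total += overlap[i, j] / fill[j]
        return total / (N * (N - 1) / 2 + M * (M - 1) / 2)
```

After a flip at (i, j), only the NODF terms for pairs involving row i or column j change, and each such term can be re-read from the caches in constant time, giving the O(N + M) re-evaluation described above.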

Package functions
The R package maxnodf contains methods for maximising the NODF metric for ecological networks, subject to the constraint that all species have at least one link. Different applications will require a different trade-off between optimisation quality and performance. To address these different preferences we offer three quality settings in the maxnodf function: 0, 1 and 2, where 0 is the lowest quality and 2 is the highest. These settings build upon each other, guaranteeing that the NODF values obtained at a higher quality setting will never be below those obtained at a lower one. The medium quality setting (quality 1) first runs the low quality algorithm (quality 0) before applying further optimisation. Similarly, the high quality setting (quality 2) first runs the medium quality algorithm (quality 1) before applying further optimisation. As each algorithm takes, as its starting point, the solution of the next-lowest quality algorithm, higher quality algorithms can never produce worse optima than lower quality algorithms. We note that networks with N rows, M columns and a total of L links, where L ≤ N + M, are necessarily compartmentalised (contain multiple disconnected subnetworks). In such cases nestedness computations should be performed on the individual compartments of the network. maxnodf then displays a warning message, and the maximisation is instead performed on the set of networks with L = N + M + 1 links.

Quality 0: Greedy algorithm
Calling maxnodf with a quality parameter of 0 invokes a greedy optimiser, which is the least accurate but fastest optimiser offered in maxnodf. We first construct a network whose first row and first column are completely filled, i.e. A(1, j) = 1 for j = 1, . . ., M and A(i, 1) = 1 for i = 1, . . ., N. For the case N = M = 5 the initial network is given by

1 1 1 1 1
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0

Note that by filling the first row and first column of the matrix, our constraint that all species must have at least one link is satisfied in the initial network. We denote the network and its components at step k by A^(k) and a^(k)_{i,j} respectively. In each step we consider potential new link positions which border links in the current network (at no point are links in the first row or column modified, ensuring that all species always have at least one link). We evaluate NODF for each of the potential modifications of A^(k−1) and introduce a new link at the NODF-maximising index pair (i*, j*). We successively add links until the desired number of links is reached. A description of this procedure is given below:

Algorithm 2 Greedy optimisation
1: procedure maxnodf(N, M, L, quality=0) (Greedy maximisation)
2:   A ← initial network with full first row and first column
3:   while number of links in A < L do
4:     for all empty positions (i, j), i, j ≥ 2, bordering a link do
5:       A(i, j) ← 1
6:       evaluate NODF(A)
7:       A(i, j) ← 0
8:     end for
9:     A(i*, j*) ← 1 (NODF-maximising position)
10:  end while
11:  return A
12: end procedure

Quality 1: Greedy algorithm with hill climbing

Calling maxnodf with the quality parameter 1 first invokes the same greedy algorithm as quality 0 does. Starting further optimisation from this network ensures that quality 1 will never perform worse than quality 0.
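The greedy construction (quality 0) that the higher settings build on can be sketched as follows. This is an illustrative Python sketch, not the package's R implementation; the nodf helper follows the definition in Appendix S2, and all names are our own:

```python
import numpy as np

def nodf(A):
    """NODF in [0, 1] (see Appendix S2 for the definition)."""
    N, M = A.shape
    total = 0.0
    for X, n in ((A, N), (A.T, M)):      # rows, then columns
        fill = X.sum(axis=1)
        for i in range(n):
            for j in range(i + 1, n):
                if 0 < fill[j] < fill[i]:
                    total += (X[i] @ X[j]) / fill[j]
    return total / (N * (N - 1) / 2 + M * (M - 1) / 2)

def greedy_maxnodf(N, M, L):
    """Greedily add links to a full first-row/first-column template."""
    A = np.zeros((N, M), dtype=int)
    A[0, :] = 1                          # fill first row and first column so
    A[:, 0] = 1                          # every species starts with >= 1 link
    while A.sum() < L:
        best_val, best_pos = -1.0, None
        for i in range(1, N):
            for j in range(1, M):
                if A[i, j]:
                    continue
                # Candidate cells must border an existing link.
                if (A[i - 1, j] or A[i, j - 1]
                        or (i + 1 < N and A[i + 1, j])
                        or (j + 1 < M and A[i, j + 1])):
                    A[i, j] = 1          # trial insertion
                    val = nodf(A)
                    A[i, j] = 0          # undo
                    if val > best_val:
                        best_val, best_pos = val, (i, j)
        A[best_pos] = 1                  # commit the NODF-maximising link
    return A
```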
Improvements over the greedy algorithm can be obtained with an algorithm called hill climbing. In hill climbing we perform local modifications to a network by moving a link from its original position to a neighbouring position. Algorithm (3) describes how neighbour positions are computed: the neighbours of a focal link are the empty (0) cells immediately above, below, left and right of the focal link. For all link positions (i, j) with i, j ≥ 2 and A(i, j) = 1 we test moving the link to all possible neighbour positions, and for each of the resulting networks we compute NODF. In each hill climbing step, the network is modified by executing the NODF-maximising move of a link to a neighbouring position. Links located in the first row and first column are excluded from this procedure, ensuring all species always have at least one link and avoiding compartmentalisation effects during the optimisation (i.e. we ensure the network always remains a single connected component). We say that the matrices obtained by moving a link to a neighbouring position are neighbours of the previous matrix. The hill climbing algorithm terminates once no such local modification achieves a strictly higher NODF value. This ensures that we always terminate at a local maximum. Algorithm (4) shows how the hill climbing procedure is used to improve on a previous result.
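The procedure just described can be sketched as follows. This is an illustrative Python sketch (not the package's R implementation); the nodf helper follows the definition in Appendix S2, and the restriction of both source and destination cells to indices ≥ 2 keeps the first row and column intact, as required above:

```python
import numpy as np

def nodf(A):
    """NODF in [0, 1] (see Appendix S2)."""
    N, M = A.shape
    total = 0.0
    for X, n in ((A, N), (A.T, M)):
        fill = X.sum(axis=1)
        for i in range(n):
            for j in range(i + 1, n):
                if 0 < fill[j] < fill[i]:
                    total += (X[i] @ X[j]) / fill[j]
    return total / (N * (N - 1) / 2 + M * (M - 1) / 2)

def neighbours(A, i, j):
    """Empty cells directly above, below, left and right of the focal link,
    restricted so the first row and column are never touched."""
    N, M = A.shape
    for u, v in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 1 <= u < N and 1 <= v < M and A[u, v] == 0:
            yield u, v

def hill_climb(A):
    """Move single links to neighbouring cells while NODF strictly improves."""
    A = A.copy()
    N, M = A.shape
    while True:
        best_val, best_move = nodf(A), None
        for i in range(1, N):
            for j in range(1, M):
                if A[i, j] != 1:
                    continue
                for u, v in neighbours(A, i, j):
                    A[i, j], A[u, v] = 0, 1      # trial move
                    val = nodf(A)
                    A[i, j], A[u, v] = 1, 0      # undo
                    if val > best_val:
                        best_val, best_move = val, ((i, j), (u, v))
        if best_move is None:                    # local maximum reached
            return A
        (i, j), (u, v) = best_move
        A[i, j], A[u, v] = 0, 1                  # execute the best move
```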

Algorithm 3 Neighbours
1: procedure neighbours(A, i, j) (Empty cells bordering the focal link (i, j))
2:   return the positions among (i − 1, j), (i + 1, j), (i, j − 1), (i, j + 1) with value 0
3: end procedure

Algorithm 4 Hill climbing
1: procedure hillclimb(A)
2:   while a NODF-improving move of a link (i, j), i, j ≥ 2, to a neighbour position exists do
3:     execute the NODF-maximising move: A(i, j) ← 0, A(i′, j′) ← 1
4:   end while
5:   return A
6: end procedure

Quality 2: Greedy algorithm with hill climbing and simulated annealing

For the quality parameter of 2, we first apply the same greedy and hill climbing algorithms as in quality settings 0 and 1. Saving a copy of the starting network, and updating it upon finding a new best network, ensures that quality 2 will never perform worse than 0 or 1. Applying the hill climbing algorithm after a new optimal network is found ensures that the algorithm always terminates in a local optimum. As a result, moving a link to a neighbouring position cannot improve the NODF value. To further improve on such networks we require algorithms that are able to escape from local optima. Such algorithms necessarily have the property that worse results must be accepted in certain situations.
One such algorithm, simulated annealing, was proposed by Kirkpatrick et al. (1983). In simulated annealing we accept network modifications which reduce the NODF value with a certain probability. This probability is determined by a temperature parameter T; the reduction in NODF introduced by a move is denoted by Δ_NODF. We then accept the move according to the Kirkpatrick acceptance probability exp(−Δ_NODF / T). After each step of the simulated annealing algorithm, the temperature is reduced by setting T^(k) = α T^(k−1), where 0 < α < 1. As a result the algorithm is more likely to accept large drops in NODF when T is large, resulting in a fast scan across the space of possible networks at the start of the algorithm, followed by a more concentrated search in the area where we expect the true optimal network once T is low. Simulated annealing is an established and reliable method within network research, having been used, for example, to obtain the network partition which maximises modularity (Guimera & Amaral 2005). The core of our simulated annealing algorithm follows the same ideas as these past examples. However, our novelty comes from a highly efficient implementation, which results in fast computation speeds, as well as from combining simulated annealing with other algorithms (greedy and hill climbing) to produce bespoke algorithms for NODF maximisation, as detailed below.
We combine simulated annealing with hill climbing by starting a search for local optima whenever the simulated annealing algorithm reaches a new NODF-maximising network. This condition guarantees that the hill climbing algorithm does not reset the algorithm to the previous local optimum. The algorithm used by maxnodf at quality 2 is given below in pseudocode. Note that neighbour solutions are computed as shown in Algorithm (3), and, as in the previous algorithms, the first row and column of the network are never changed, ensuring all species always have at least one link.
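The acceptance rule and cooling schedule described above can be sketched as follows. This is an illustrative Python sketch; the function names and any parameter values are our own, not maxnodf's defaults:

```python
import math
import random

def accept_move(delta_nodf, T, rng=random.random):
    """Kirkpatrick acceptance rule: improvements (delta_nodf <= 0) are always
    accepted; a drop of size delta_nodf is accepted with prob. exp(-delta/T)."""
    if delta_nodf <= 0:
        return True
    return rng() < math.exp(-delta_nodf / T)

def cooling_schedule(T0, alpha, steps):
    """Geometric cooling: T^(k) = alpha * T^(k-1), with 0 < alpha < 1."""
    T = T0
    for _ in range(steps):
        yield T
        T *= alpha
```

At high T the exponential is close to 1 even for large drops, so the search moves freely; as T decays geometrically the algorithm increasingly behaves like pure hill climbing.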

Analysis S1
To test if it might ever be misleading to use the lower quality greedy algorithm, we correlate the maximum NODF achieved by the greedy (quality 0) and simulated annealing (quality 2) algorithms for the 135 networks in our dataset.
We find that the simulated annealing algorithm and the greedy algorithm produce almost identical results when the maximum nestedness is high, but the simulated annealing algorithm produces increasingly better maxima than the greedy algorithm when the maximum nestedness is low (Figure S5). Specifically, we find that the percentage increase in maximum NODF achieved by the simulated annealing algorithm remains below 5% when the maximum NODF in the network (assessed by the greedy algorithm) is greater than 0.6 (Figure S5c). When maximum NODF is below 0.6, the simulated annealing algorithm sometimes produces maxima that represent more than a 5% increase relative to the greedy algorithm (Figure S5c).
We therefore recommend that, if time or computational constraints prevent running all networks through the simulated annealing algorithm, best practice might involve running all networks in a dataset through the greedy algorithm and then running any networks with a maximum nestedness below 0.6 through the simulated annealing algorithm.
We believe that the greedy algorithm is the best choice for most questions and think that the sacrifice in quality would rarely change conclusions qualitatively. This is supported by a recent re-analysis which found that using higher-quality simulated annealing algorithms to calculate nestedness did not qualitatively change conclusions (Song et al. 2019). Higher-quality algorithms should therefore only be used if computational time is not a constraint. The highest quality algorithm does technically produce more accurate results and, for a small number of networks, large percentage changes in maximum NODF, but in aggregate this is unlikely to make a qualitative difference to results, unless all networks in the dataset have unusually low maximum nestedness values (which can be assessed beforehand using the greedy algorithm).

Analysis S2
In their seminal paper on nestedness, Bascompte et al. (2003) normalized nestedness values as (E − N)/N, where E is the observed nestedness of a network and N is the mean nestedness across an ensemble of null networks. This is in contrast to z-scores, which are normalized as (E − N)/σ, where σ is the standard deviation of nestedness across an ensemble of null networks.
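The two normalisations can be written out as follows (an illustrative Python sketch; the function names are ours):

```python
import statistics

def null_corrected(E, null_values):
    """Bascompte et al. (2003) normalisation: (E - mean) / mean."""
    m = statistics.mean(null_values)
    return (E - m) / m

def z_score(E, null_values):
    """z-score normalisation: (E - mean) / standard deviation."""
    m = statistics.mean(null_values)
    return (E - m) / statistics.stdev(null_values)
```

The two measures can rank the same networks very differently: the first scales the departure from the null expectation by its magnitude, the second by its spread.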
We tested for a correlation between null model measures and the NODF_c measure presented in this paper. For the null model measure we normalised by dividing by the average of the null replicates rather than their standard deviation. We ran the analysis on the same 135 networks as used in the benchmarking analysis. We used the 'curveball' algorithm to generate null networks, which preserves degree distributions and connectance (Strona et al. 2014). We found no strong association between our measure NODF_c and the null model measure (correlation coefficient 0.17).
There was no significant relationship between the two measures. Although the P value approached significance (0.0512), the R² was low (0.02). Visually, the two measures appear highly unrelated (Figure S6). Two outliers whose Cook's distance exceeded 4/n, where n is the number of data points, were removed from this analysis.

Figure S1: Comparison of the time taken to run, and the maximum NODF achieved, for the greedy algorithm in the maxnodf package (quality 0) and the original algorithm proposed by Song et al. (2017). All algorithms were run on an identical set of 135 empirical pollination networks from the Web of Life dataset. Small points represent individual networks; large points represent medians. Ellipses represent 95% confidence intervals.

Figure S5: Comparison of the maximum nestedness achieved by the greedy (maxnodf 0) and simulated annealing (maxnodf 2) algorithms. Network data were the same 135 networks as in Figure 1. (a) Relationship between the raw nestedness values produced by the two algorithms. (b) Relationship between the ranks of maximum nestedness values produced by the two algorithms. (c) Relationship between the maximum nestedness calculated by the greedy algorithm and the percentage increase in maximum nestedness achieved by using the simulated annealing algorithm over the greedy algorithm. As long as the greedy algorithm reports a maximum nestedness above 0.6, there is less than a 5% increase in maximum nestedness to be gained by using a simulated annealing algorithm. Thus, if the greedy algorithm reports a maximum nestedness of >0.6, it may not be necessary to use the simulated annealing algorithm.

Figure S6: Association between NODF_c and null-corrected NODF, where null-corrected NODF is defined as (E − N)/N, with E the observed nestedness of a network and N the mean nestedness across an ensemble of null networks.