Main

The informatics revolution has made it possible for a wide range of large-scale, rapidly evolving networks, such as transportation, technological, social and biological networks, to be analysed1,2,3,4,5. Although these networks are extremely different in their function and attributes, the analysis of their fabric provides evidence of several shared regularities, suggesting general and common self-organizing principles beyond the specific details of the individual systems. In this context, the statistical physics approach has been exploited as a very convenient strategy because of its deep connection with statistical graph theory and because of its power to quantitatively characterize macroscopic phenomena in terms of the microscopic dynamics of the various systems1,2,3,4,9. As an initial discriminant of structural ordering, attention has been focused on the networks’ degree distribution; that is, the probability P(k) that any given node in the network shares an edge with k neighbouring nodes. This function is, however, only one of the many statistics characterizing the structural and hierarchical ordering of a network; a full account of the connectivity pattern calls for the detailed study of the multipoint degree correlation functions and/or an opportune combination of these.

In this letter, we address a key structural property of complex networks, the so-called ‘rich-club’ phenomenon. This property has been discussed in several instances in both social and computer sciences and refers to the tendency of high-degree nodes, the hubs of the network, to be very well-connected to each other. Essentially, nodes with a large number of links, usually referred to as rich nodes, are much more likely to form tight and well-interconnected subgraphs (clubs) than low-degree nodes. A first quantitative definition of the rich-club phenomenon is given by the rich-club coefficient φ, introduced in the context of the Internet in ref. 7. Denoting by E>k the number of edges among the N>k nodes having degree higher than a given value k, the rich-club coefficient is expressed as

\[
\phi(k) = \frac{2E_{>k}}{N_{>k}\,(N_{>k}-1)} \qquad (1)
\]

where N>k(N>k−1)/2 represents the maximum possible number of edges among the N>k nodes. Therefore, φ(k) measures the fraction of edges actually connecting those nodes out of the maximum number of edges they might possibly share. The behaviour of the rich-club coefficient as a function of the degree k is a probe for the topological correlations in a complex network, and it yields important information about its underlying architecture. Structural properties, in turn, have immediate consequences on a network’s features and tasks, such as robustness, performance of biological functions or selection of traffic backbones, depending on the system at hand. In a social context, for example, a rich-club coefficient increasing with the degree k indicates the dominance of an ‘oligarchy’ of highly connected and mutually communicating individuals, as opposed to a structure consisting of many loosely connected and relatively independent subcommunities. In the Internet, such a feature would point to an architecture in which important hubs are much more densely interconnected than peripheral nodes to provide the transit backbone of the network7. It is also worth stressing that the rich-club phenomenon is not trivially related to the mixing properties of networks, which enable the distinction between assortative networks, where high-degree nodes preferentially attach to high-degree nodes, and disassortative networks, showing the opposite tendency10,11,12. Indeed, the rich-club phenomenon is not necessarily associated with assortative mixing. In the top panel of Fig. 1, we sketch a simple construction in which a disassortative network is showing the rich-club phenomenon. In other words, the rich-club phenomenon and the mixing properties express different features that are not trivially related or derived from each other (the technical discussion of this point is reported in the Methods section).
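The definition in equation (1) can be evaluated directly from an edge list. The following is a minimal sketch, assuming an undirected simple graph (no self-loops, no repeated edges); the function name `rich_club_coefficient` and the toy graph are illustrative choices, not taken from the paper. The toy graph mimics the schematic idea of four hubs forming a clique while otherwise linking only to low-degree nodes; the number of leaves per hub is an assumption.

```python
from collections import defaultdict


def rich_club_coefficient(edges):
    """Return {k: phi(k)} for an undirected simple graph given as (u, v) pairs.

    phi(k) = 2 * E_{>k} / (N_{>k} * (N_{>k} - 1)), where N_{>k} counts nodes
    with degree strictly larger than k and E_{>k} counts edges among them.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    phi = {}
    for k in range(max(degree.values())):
        club = {node for node, d in degree.items() if d > k}
        n_club = len(club)
        if n_club < 2:
            continue  # phi(k) is undefined for fewer than two club members
        e_club = sum(1 for u, v in edges if u in club and v in club)
        phi[k] = 2.0 * e_club / (n_club * (n_club - 1))
    return phi


# Toy graph (assumed layout): 4 hubs in a clique, each with 3 private leaves.
hubs = range(4)
toy_edges = [(i, j) for i in hubs for j in hubs if i < j]
leaf = 4
for h in hubs:
    for _ in range(3):
        toy_edges.append((h, leaf))
        leaf += 1

toy_phi = rich_club_coefficient(toy_edges)
```

This loop costs O(kmax·E) and is meant for small illustrations only; for networks of the size analysed here a sorted, cumulative implementation would be preferable.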

Figure 1: Schematic of the rich-club phenomenon and rich-club spectrum φ(k) for real networks.

At the top, a conceptual example of a disassortative network exhibiting the rich-club phenomenon is shown. Disassortative mixing is given by the tendency of hubs to be, on average, more likely to be connected to low-degree nodes. However, the four rich nodes represented in the schematic show a clear rich-club behaviour by forming a fully connected clique within the club. At the bottom, results for the four real-world networks and the three analysed models are shown. The computer-generated networks, ER, MR and BA, have size N=10⁵ and average degree 〈k〉=6. ER refers to the Erdős–Rényi graph, MR is constructed from the Molloy–Reed algorithm with a given degree distribution P(k)∼k⁻³ and the Barabási–Albert (BA) model is generated by growing a network with preferential attachment that produces a scale-free graph with power-law degree sequence with exponent γ=3. Results are averaged over n=10² different realizations for each model. All networks share a monotonically increasing behaviour of φ(k), independent of the nature of the degree distribution characterizing the network and of the possible presence of underlying structural organization principles. Random networks, either having a Poissonian degree distribution (such as ER) or a heavy-tailed P(k) (such as MR and BA), also show a rich-club spectrum increasing with increasing values of the degree. This common trend is due to an intrinsic feature of every network structure, for which hubs simply have a larger probability of being more interconnected than low-degree nodes.

In Fig. 1, we report the behaviour of the rich-club coefficient as a function of the degree in a variety of real-world networks drawn from the biological, social and technological world. In Table 1, we summarize the basic topological features of these networks and the datasets used. We also consider three standard network models: the Erdős–Rényi (ER) graph13, the generalized random network having a heavy-tailed degree distribution obtained with the Molloy–Reed (MR) algorithm14 and the Barabási–Albert (BA) model15. In the ER graph, N nodes are connected by E edges randomly chosen with probability p out of the N(N−1)/2 possible pairs of nodes. The MR network is obtained starting from a given degree sequence P(k) (in our case P(k)∼k^(−γ) with γ=3) by randomly connecting nodes subject to the constraint of avoiding self-loops and multiple edges. The BA model is generated by using the growing algorithm of ref. 15 that produces a scale-free graph with a power-law degree sequence with exponent γ=3. In all cases, the generated networks have N=10⁵ vertices and an average degree 〈k〉=6.
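As an illustration of the simplest of these models, an ER-type graph with a prescribed average degree can be sampled as follows. This is only a sketch under stated assumptions: it fixes the number of edges at E=N〈k〉/2 and draws distinct node pairs uniformly at random, which is one common variant of the ER construction; the paper does not specify its implementation, and the name `er_graph` is hypothetical.

```python
import random


def er_graph(n, mean_degree, seed=0):
    """Sample an undirected simple graph with n nodes and round(n*mean_degree/2) edges.

    Node pairs are drawn uniformly at random; self-loops and repeated
    pairs are rejected, so every edge is distinct.
    """
    target = round(n * mean_degree / 2)
    if target > n * (n - 1) // 2:
        raise ValueError("requested more edges than there are node pairs")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < target:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))  # store each pair in canonical order
    return sorted(edges)


# Small example: 1,000 nodes with average degree 6 gives 3,000 edges.
sample = er_graph(1000, 6)
```

Rejection sampling of pairs is efficient here because the graph is sparse, so collisions with existing edges are rare.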

Table 1: Basic topological properties of the analysed datasets. We considered four real-world networks: (a) the protein interaction network12,22 of the yeast Saccharomyces cerevisiae collected with different experimental techniques and documented at the database of interacting proteins (http://dip.doe-mbi.ucla.edu/); (b) the scientific collaboration network23,24 extracted from the electronic database e-print archive in the area of condensed matter physics (http://xxx.lanl.gov/archive/cond-mat/), from 1995 to 1998, in which nodes represent scientists and a connection exists if they coauthored at least one paper in the archive; (c) the network of worldwide air transportation25,26 representing the International Air Transport Association (http://www.iata.org/) database of airport pairs connected by direct flights for the year 2002; (d) the Internet network at the autonomous system level4,10,27,28,29 from data collected by the Oregon Route Views project (http://www.routeviews.org/) in May 2001, in which nodes represent Internet service providers and edges represent connections among them. The sizes of the networks in number of nodes and edges are shown, along with the average degree 〈k〉 and the maximum degree value kmax. We also give the value for the corresponding structural cutoff, ks, in the uncorrelated case16.

As is evident from Fig. 1, the monotonic increase of φ(k) is a feature shared by all of the analysed datasets. This behaviour is claimed to provide evidence of the rich-club phenomenon as φ(k) progressively increases in vertices with increasing degree (see, for example, ref. 7 for the Internet case, where a different representation of the function is adopted with φ defined in terms of the rank r of nodes sorted by decreasing degree values). However, a monotonic increase of φ(k) does not necessarily imply the presence of the rich-club phenomenon. Indeed, even in the case of the ER graph, a completely random network, we find an increasing rich-club coefficient. This implies that the increase of φ(k) is a natural consequence of the fact that vertices with high degree have a larger probability of sharing edges than low-degree vertices. This feature is therefore imposed by construction and does not represent a signature of any particular organizing principle or structure, as is clear in the ER case. A simple inspection of the φ(k) trend is therefore potentially misleading in the discrimination of the rich-club phenomenon.

To find appropriate baselines for the detection of the rich-club phenomenon, we focus on the theoretical analysis of φ(k). In the Methods section, we derive an expression for the rich-club coefficient as a function of the convolution of the two-vertex degree correlation function P(k,k′). Interestingly, it is possible to obtain an explicit expression for the rich-club coefficient of random uncorrelated networks. In this case, the two-vertex degree correlation function is a simple function of the degree distribution, yielding the following behaviour for uncorrelated large-size networks at high degrees:

\[
\phi_{\mathrm{unc}}(k) \underset{k,\,k_{\max}\to\infty}{\sim} \frac{k^{2}}{\langle k\rangle N} \qquad (2)
\]

where kmax is the maximum degree present in the network. Equation (2) shows unequivocally that the rich-club coefficient is also a monotonically increasing function for uncorrelated networks, so that, to assess the presence of rich-club structural ordering, it is necessary to compare it with that obtained from the appropriate null model with the same degree distribution, thus providing a suitable normalization of φ(k).
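The uncorrelated baseline can also be estimated from a degree sequence alone. The sketch below assumes a discrete analogue of the uncorrelated expression, in which the expected number of edges between two nodes i and j is k_i k_j/(〈k〉N), so that φunc(k) ≈ (Σ k_i)²/[〈k〉N·N>k(N>k−1)] with the sum running over nodes of degree larger than k. This discretization and the names `phi_unc` and `rho_unc` are illustrative assumptions, not the authors' code.

```python
def phi_unc(degrees):
    """Estimate the uncorrelated rich-club coefficient from a degree sequence.

    Assumes the expected number of edges between nodes i and j is
    k_i * k_j / (<k> * N); terms with i == j are not removed, which is a
    small correction when the club is large.
    """
    n = len(degrees)
    mean_k = sum(degrees) / n
    out = {}
    for k in range(max(degrees)):
        club = [d for d in degrees if d > k]
        if len(club) < 2:
            continue  # undefined for fewer than two club members
        total = sum(club)
        out[k] = total * total / (mean_k * n * len(club) * (len(club) - 1))
    return out


def rho_unc(phi_observed, degrees):
    """Ratio phi(k) / phi_unc(k) for every k where both are defined."""
    baseline = phi_unc(degrees)
    return {k: phi_observed[k] / baseline[k]
            for k in phi_observed if k in baseline and baseline[k] > 0}


# Example: a 4-regular degree sequence on 20 nodes.
regular_baseline = phi_unc([4] * 20)
```

For heavy-tailed degree sequences this estimate can exceed 1 at the highest degrees, which is one way to see that a fully uncorrelated network is not always realizable at finite size.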

From the discussion above, a possible choice for the normalization of the rich-club coefficient is provided by the ratio ρunc(k)=φ(k)/φunc(k), where φunc(k) is analytically calculated by inserting the network’s degree distribution P(k) into equation (4), as reported in the Methods section. A ratio larger than one is the actual evidence for the presence of a rich-club phenomenon leading to an increase in the interconnectivity of high-degree nodes in a more pronounced way than in the random case. In contrast, a ratio ρunc(k)<1 is a signature of an opposite organizing principle that leads to a lack of interconnectivity among high-degree nodes. On the other hand, a completely degree–degree uncorrelated network with finite size is not always realizable owing to structural constraints. Indeed, any finite-size random network presents a structural cutoff value ks over which the requirement of the lack of dangling edges introduces the presence of multiple and self-connections, and/or degree–degree correlations16,17. Networks with bounded degree distributions and finite second moment 〈k²〉 present a kmax that is below the structural value ks. In this situation, φunc(k) is properly defined for all degrees and is representative of the network topology. However, in networks with heavy-tailed degree distribution (for example, scale-free degree distributions with 2<γ≤3, as observed in many real systems), this is no longer the case and ks is generally smaller than kmax. In fact, structural degree–degree correlations and higher-order effects, such as the emergence of large cliques18, set in even in completely random networks. The normalization of φ(k) that takes these effects into account is provided by the expression ρran(k)=φ(k)/φran(k), where φran(k) is the rich-club coefficient of the maximally random network with the same degree distribution P(k) as the network under study2,12. Operationally, the maximally random network can be thought of as the stationary ensemble of networks visited by a process that, at any time step, randomly selects a pair of links of the original network and exchanges two of their end points (automatically preserving the degree distribution). Also in this case, an actual rich-club ordering is denoted by a ratio ρran(k)>1. Therefore, whereas ρunc(k) provides information about the overall rich-club ordering in the network with respect to an ideally uncorrelated graph, ρran(k) is a normalized measure that discounts the structural correlations owing to unavoidable finite-size effects, providing a better discrimination of the actual presence of the rich-club phenomenon due to the ordering principles shaping the network.
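The link-exchange process described above can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the number of swaps per edge, the number of randomized samples, the rejection of swaps that would create self-loops or multiple edges, and the names `phi`, `rewire` and `rho_ran` are all assumptions made for the example.

```python
import random
from collections import Counter


def phi(edges):
    """Rich-club coefficient {k: phi(k)} of an undirected simple graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    out = {}
    for k in range(max(deg.values())):
        club = {n for n, d in deg.items() if d > k}
        if len(club) < 2:
            continue
        e_club = sum(1 for u, v in edges if u in club and v in club)
        out[k] = 2.0 * e_club / (len(club) * (len(club) - 1))
    return out


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization: swap end points of random edge pairs.

    Edges (a, b) and (c, d) become (a, d) and (c, b). Swaps that would
    create a self-loop or a duplicate edge are rejected.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick the orientation of the second edge at random
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def rho_ran(edges, n_samples=10, swaps_per_edge=10, seed=0):
    """Ratio phi(k) / <phi_ran(k)>, averaged over n_samples randomizations."""
    rng = random.Random(seed)
    observed = phi(edges)
    totals = {k: 0.0 for k in observed}
    for _ in range(n_samples):
        phi_r = phi(rewire(edges, swaps_per_edge * len(edges), rng))
        for k in totals:
            totals[k] += phi_r[k]  # same keys: node degrees are unchanged
    return {k: observed[k] / (totals[k] / n_samples)
            for k in observed if totals[k] > 0}
```

Because node degrees are unchanged by each swap, the club membership at every k is identical in the original and randomized graphs, so any deviation of the ratio from 1 reflects only how the edges are arranged among the club members.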

In Fig. 2, we report the ratios ρran(k) for the real-world and the simulated networks. The analysis clearly discriminates between networks with or without rich-club ordering. In particular, we identify a strong rich-club ordering in the scientific collaboration network, providing support to the idea that the elite formed by more influential scientists tends to form collaborative groups within specific domains. This also supports the view that the rich-club phenomenon is a natural tendency in many social networks. We find a clearly opposite result in the decreasing behaviour of the rich-club spectrum for the protein interaction network and the Internet map at the autonomous system level. In both cases, this evidence provides interesting information about the system structure and function.

Figure 2: Assessment for the presence of the rich-club phenomenon in the networks under study.

φ(k) is compared with the null hypothesis provided by the maximally random network, φran(k). The ratio ρran(k)=φ(k)/φran(k) is plotted as a function of the degree k and compared with the baseline value equal to 1. If ρran(k)>1 (<1), the network shows the presence (absence) of the rich-club phenomenon with respect to the random case. The protein interaction network, the Internet map at the autonomous system level and the scientific collaboration network show clear behaviours as explained in the main text. The worldwide air transportation network shows a mild rich-club ordering with ρran(k)>1. The ER and MR network models show a ratio ρran(k)=1 for all k, as expected, whereas the BA model shows a mixed behaviour with values above 1 for very high degrees.

The lack of rich-club ordering in the protein interaction network indicates that proteins with large numbers of interactions are presiding over different functions and thus, in general, are coordinating specific functional modules (whose detailed analysis requires specific tools19). Figure 3 shows portions of the protein interaction network and the scientific collaboration network including the club of N>k nodes (N>k=29 and N>k=35 for the protein interactions, N>k=30 and N>k=36 for the scientific collaboration) and the connections among them. The network representations clearly show the presence of a rich-club phenomenon in the scientific collaboration network, where most of the rich nodes are highly interconnected forming tight subgraphs, in contrast to the protein interaction network case, where only a few links seem to connect rich nodes, the rest linking to lower-degree vertices.

Figure 3: Graph representations of the rich clubs.

Progressively smaller clubs of N>k rich nodes in the protein interaction network (top) and in the scientific collaboration network (bottom) are shown together with the E>k connections among them. Here N>k=35, E>k=37 (top left) and N>k=29, E>k=21 (top right) for the protein interaction network; N>k=36, E>k=62 (bottom left) and N>k=30, E>k=54 (bottom right) for the scientific collaboration network. The two graph representations for each network show progressively smaller clubs made of N>k rich nodes for increasing values of the degree k. The links connecting the rich nodes to the rest of the network are not represented for the sake of simplicity. The protein interaction network shows a club whose hubs are relatively independent, being loosely connected among each other, leaving the remaining links to coordinate specific functional modules. A different picture is observed in the scientific collaboration network case, where most of the hubs form cliques and tightly interconnected subgraphs, thus revealing the tendency of scientists to form densely interconnected collaborative groups. The graphs have been produced with the Pajek software (http://vlado.fmf.uni-lj.si/pub/networks/pajek/).

In the case of the Internet, the appropriate analysis of the rich-club phenomenon shows that, in contrast to previous claims7, the structure at the autonomous system level lacks rich-club ordering. This might seem counter-intuitive. It is reasonable to imagine that the Internet backbone is made of interconnected transit providers that are also local hubs. This, however, is not the case and an explanation can be easily found in the fact that we are just considering topological properties. Indeed, the backbone hubs are identified more in terms of their bandwidth and traffic capacity than in terms of the sole number of connections. The present result suggests that high-degree hubs provide connectivity to the local region of the Internet and are not tightly interconnected. The backbone of interconnected transit providers is instead identified by high-traffic links, which play a crucial role in terms of traffic capacities but whose number might represent a small fraction of the total possible number of interconnections.

This discussion points out that, in some cases, the concept of rich-club ordering should be generalized to evaluate the richness of vertices not just in terms of their degree but in terms of the actual traffic or intensity of interactions handled. In this case, we have to consider a weighted network representation of the system where a weight w_ij representing the traffic or intensity of interaction is associated with each edge between the vertices i and j. Here too, the study of the weighted rich-club coefficient alone does not discriminate the actual presence of the rich-club effect (see Methods section). Given the entanglement of the weight and degree correlations, however, the appropriate null hypothesis is more complicated to define and a detailed account of the evaluation of the weighted rich-club effect will be provided elsewhere.

The analysis presented here provides the baseline functions for the detection of the rich-club phenomenon and its effect on the structure of large-scale networks. This allows the measurement of this effect in a wide range of systems, finally enabling a quantitative discussion of various claims such as ‘high centrality’ backbones in technological networks and ‘elitarian’ clubs in social systems.

Methods

Analytical expression of the rich-club coefficient

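The basic analytical understanding of the rich-club phenomenon starts by considering the quantity \(E_{kk'}\), representing the total number of edges between vertices of degree k and of degree k′ for k≠k′, and twice the number of edges between vertices in the same degree class. We can express the numerator of φ(k) in equation (1) as \(2E_{>k} = \int_{k}^{k_{\max}} dk' \int_{k}^{k_{\max}} dk''\, E_{k'k''}\), where kmax is the maximum degree present in the network and where, for the sake of simplicity, the variable k is thought of as continuous. In turn, the quantity \(E_{kk'}\) can be expressed as a function of the joint degree probability distribution11,16,20,21 by means of the identity \(N\langle k\rangle P(k,k') = E_{kk'}\), yielding

\[
\phi(k) = \frac{N\langle k\rangle \int_{k}^{k_{\max}} dk' \int_{k}^{k_{\max}} dk''\, P(k',k'')}{\left[N\int_{k}^{k_{\max}} dk'\, P(k')\right]\left[N\int_{k}^{k_{\max}} dk'\, P(k') - 1\right]} \qquad (3)
\]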

From equation (3), it is clear that φ(k) is also a measure of correlations in the network, although it represents a different projection of P(k,k′) as compared with other degree–degree correlation measures. At the same time, it is possible to see that the rich-club coefficient expresses a property that is not trivially related to the usual indicators of assortative behaviour, such as the Pearson’s correlation coefficient11 or the average nearest-neighbour degree10. Note that these assortativity measures quantify two-point correlations and so account for quasi-local properties of the nodes in the network, whereas the rich-club phenomenon is computed as a global feature within a restricted subset. The double integral is indeed a convolution of the correlation function that allows the presence of different combinations of the assortative and rich-club features in the same network.

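Only in the case of random uncorrelated networks3,4,21 does the joint degree distribution P(k,k′) factorize and take the simple form \(P_{\mathrm{unc}}(k,k') = k\,k'\,P(k)\,P(k')/\langle k\rangle^{2}\). By inserting this expression into equation (3), we obtain φ(k) for uncorrelated networks as

\[
\phi_{\mathrm{unc}}(k) = \frac{1}{N\langle k\rangle}\,\frac{\left[\int_{k}^{k_{\max}} dk'\, k'\,P(k')\right]^{2}}{\left[\int_{k}^{k_{\max}} dk'\, P(k')\right]\left[\int_{k}^{k_{\max}} dk'\, P(k') - \frac{1}{N}\right]} \underset{k,\,k_{\max}\to\infty}{\sim} \frac{k^{2}}{\langle k\rangle N} \qquad (4)
\]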

where we have applied L’Hôpital’s rule to derive the behaviour for large-size networks and high degrees.
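The limiting step can be spelled out as follows. This is a sketch of the argument rather than a reproduction of the authors' derivation: it assumes that the 1/N correction in the denominator is negligible compared with the fraction of nodes in the club, and it introduces the shorthand A(k) and B(k), which does not appear in the paper.

```latex
% Shorthand (introduced here for illustration):
%   A(k) = \int_k^{k_{\max}} dk' \, k' P(k'),   B(k) = \int_k^{k_{\max}} dk' \, P(k')
\begin{align*}
\phi_{\mathrm{unc}}(k)
  &= \frac{1}{N\langle k\rangle}\,
     \frac{A(k)^{2}}{B(k)\left[B(k)-\tfrac{1}{N}\right]}
   \;\approx\; \frac{1}{N\langle k\rangle}\left[\frac{A(k)}{B(k)}\right]^{2}, \\[4pt]
\lim_{k\to k_{\max}} \frac{A(k)}{B(k)}
  &= \lim_{k\to k_{\max}} \frac{A'(k)}{B'(k)}
   = \lim_{k\to k_{\max}} \frac{-k\,P(k)}{-P(k)} = k
   \quad\text{(L'H\^opital, since } A, B \to 0\text{)}, \\[4pt]
\phi_{\mathrm{unc}}(k)
  &\sim \frac{k^{2}}{\langle k\rangle N}.
\end{align*}
```

Read this way, the ratio A(k)/B(k) is simply the mean degree of the club members, which approaches k when the club contains only the highest-degree nodes.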

Rich-club coefficient for weighted networks

If the rich club is defined as the set of nodes having a strength larger than a given value s, a possible definition of the weighted rich-club coefficient can be expressed as

\[
\phi^{w}(s) = \frac{2W_{>s}}{\sum_{i\,|\,s_i>s} s_i} \qquad (5)
\]

where W>s represents the sum of the weights on the links connecting two nodes in the club and the normalization is given by the sum of the strengths of the rich nodes.
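A minimal sketch of this weighted coefficient is given below. It assumes the form φw(s)=2W>s/Σ s_i, where the sum runs over club members and the factor 2 reflects that each intra-club link contributes its weight to the strength of both end points, so that a club with no external links gives a value of 1. The function name `weighted_rich_club` and the toy weights are illustrative assumptions.

```python
from collections import defaultdict


def weighted_rich_club(weighted_edges, s):
    """Weighted rich-club coefficient for the club of nodes with strength > s.

    weighted_edges is a list of (u, v, w) triples for an undirected graph.
    Returns None when the club is empty.
    """
    strength = defaultdict(float)
    for u, v, w in weighted_edges:
        strength[u] += w
        strength[v] += w

    club = {node for node, st in strength.items() if st > s}
    total_strength = sum(strength[node] for node in club)
    if total_strength == 0:
        return None
    # W_{>s}: total weight on links whose two end points are both in the club
    w_in = sum(w for u, v, w in weighted_edges if u in club and v in club)
    return 2.0 * w_in / total_strength


# Toy example (assumed weights): a heavy triangle plus one light peripheral link.
toy = [("a", "b", 5.0), ("b", "c", 5.0), ("a", "c", 5.0), ("a", "x", 1.0)]
club_value = weighted_rich_club(toy, 1.0)   # club = {a, b, c}
whole_value = weighted_rich_club(toy, 0.0)  # club = all four nodes
```

As noted in the main text, this quantity alone does not establish a weighted rich-club effect; it would still need to be compared against a suitable null model.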