Elsevier

Social Networks

Volume 21, Issue 3, July 1999, Pages 239-267
The interaction of size and density with graph-level indices

https://doi.org/10.1016/S0378-8733(99)00011-8

Abstract

The size and density of graphs interact powerfully and subtly with other graph-level indices (GLIs), thereby complicating their interpretation. Here we examine these interactions by plotting changes in the distributions of several popular graph measures across graphs of varying sizes and densities. We provide a generalized framework for hypothesis testing as a means of controlling for size and density effects, and apply this method to several well-known sets of social network data; implications of our findings for methodology and substantive theory are discussed.

Introduction

In the study of social networks, positional (or nodal) indices are often employed in order to understand particular features of social positions; likewise, higher-level measures like centralization are useful for gaining an understanding of social networks in their entirety. The need to quantify phenomena at the network level has given rise to graph-level indices (GLIs) such as degree centralization, connectedness, and hierarchicalization. These measures quantify various features of graphs. For instance, in measuring Krackhardt hierarchicalization, one obtains the fraction of connected pairs that are asymmetric in their ability to reach one another. This quantity by itself is informative and useful in that one gains insight into a graph's structure by knowing it; furthermore, in various substantive contexts the value of such a measure may have theoretical significance for some phenomenon of interest per se (the spread of rumors, for instance). Problems arise, however, when one is interested in constructing a sociological interpretation of a graph-level measure. Rather than being concerned with what the GLI is for a particular graph, one may wonder why it takes on a particular value. For example, examining a very sparse relation, such as “x is the mentor of y”, will usually result in the observation of an extremely high hierarchicalization value. One can certainly conclude that the network is very hierarchical. However, is this hierarchicalization the result of the nature of “mentorship” or merely of the network's sparseness? Similarly, we may run into difficulties when attempting to use this measure as a predictive or classificatory variable.
If we examine the hierarchicalization values of mentorship in a variety of populations and find little variance, does this suggest that there is something inherent to mentorship per se which makes it uniformly hierarchical in a wide range of cases, or is it simply the case that such is a necessary result of studying any sparse relation? If we attempt to predict other variables from hierarchicalization and find positive results, should we assume that it is the hierarchy which matters, or the sparseness of the relation? The problem is an important one, and affects our research whether our use of the GLI is independent or dependent, classificatory or motivated by substantive theory.

The reason for this particular difficulty, as we have suggested, lies in the subtleties of the distributions of GLIs across the space of possible structures. In the case described above, problems arise because sparse digraphs have disproportionately high hierarchicalization scores; this follows from the fact that a far greater proportion of reachability relations are asymmetric within this set of graphs than within the set of all directed graphs. When faced with one or more observations of “high” hierarchicalization, then, one cannot immediately distinguish between the possibility that the observations follow from the mathematical necessity of sparseness, and the possibility that the observations reflect a network formation process which is biased towards hierarchicalization per se. Without a baseline model (Mayhew, 1984) that can tell one what to expect from the most basic parameters of graph structure, one is therefore quite limited in the conclusions one can draw from the GLI values alone.
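The hierarchicalization measure described above can be illustrated with a minimal sketch. It assumes a digraph given as a node count and a list of arcs, treats a pair as "connected" when at least one node can reach the other, and returns None when no pair is connected; the function names and that last convention are assumptions made for this illustration, not part of the original study.

```python
def reachable_sets(n, arcs):
    """For each node, return the set of other nodes reachable by a directed path."""
    successors = {v: set() for v in range(n)}
    for u, v in arcs:
        successors[u].add(v)
    reach = {}
    for source in range(n):
        seen = set()
        stack = [source]
        while stack:  # iterative depth-first search
            u = stack.pop()
            for w in successors[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        seen.discard(source)  # ignore a node reaching itself through a cycle
        reach[source] = seen
    return reach


def krackhardt_hierarchy(n, arcs):
    """Fraction of connected pairs whose reachability is one-directional."""
    reach = reachable_sets(n, arcs)
    connected = 0
    asymmetric = 0
    for i in range(n):
        for j in range(i + 1, n):
            i_reaches_j = j in reach[i]
            j_reaches_i = i in reach[j]
            if i_reaches_j or j_reaches_i:
                connected += 1
                if i_reaches_j != j_reaches_i:
                    asymmetric += 1
    if connected == 0:
        return None  # assumed convention: undefined when no pair is connected
    return asymmetric / connected


# A sparse chain of "mentorship" arcs versus a closed directed cycle.
chain = [(0, 1), (1, 2), (2, 3)]
cycle = [(0, 1), (1, 2), (2, 0)]
print(krackhardt_hierarchy(4, chain))  # every connected pair is asymmetric
print(krackhardt_hierarchy(3, cycle))  # every connected pair is mutual
```

The chain, which has no return paths, scores at the top of the range, while the cycle scores at the bottom; sparse relations tend to resemble the former simply because few return paths exist.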

The hierarchy example demonstrates that density can be a powerful covariate of GLIs, and that failure to carefully consider its effects can lead to difficulties in the analysis of network data. Unfortunately, many, if not most, GLIs are quite sensitive to graph size as well. How can we take such factors into account when analyzing network data containing GLIs? Here, we shall attempt to characterize GLI behavior with respect to these most basic of structural parameters, and to suggest how these behaviors impact network theory and methodology. In order to control for size and density effects, hypothesis tests using baseline models on size and density are also described. As an illustration of the use of the technique, simple null hypotheses will be tested against 20 of the social networks found in the database of the network software UCINET IV (Borgatti et al., 1991) for six common GLIs. Finally, some implications of our findings regarding the interaction of GLIs with size and density for network theory and methodology will be discussed, along with directions for future research.

Although size and density clearly affect GLIs, efforts to control their effects have often been limited to adjusting the GLIs themselves. To partially remove the effects of size, for instance, measures are often normalized by the maximum attainable value for a graph of a given size. Below is an example of one common normalized measure of graph degree centralization (Freeman, 1979):

$$C_D = \frac{\sum_{i=1}^{g}\left[C_D(n^*) - C_D(n_i)\right]}{\max \sum_{i=1}^{g}\left[C_D(n^*) - C_D(n_i)\right]} = \frac{\sum_{i=1}^{g}\left[C_D(n^*) - C_D(n_i)\right]}{(g-1)(g-2)}$$

The $C_D(n_i)$ in the numerator are the individual sums of each of the g actors' in- and out-degrees, or links, while $C_D(n^*)$ is the largest of these values among all actors. The denominator contains the normalizing term, which is equal to the maximum possible value of the numerator (occurring when the graph has a star configuration). This type of normalization, while limiting the measure to the range of 0–1, does not usefully control for size. As can be seen, the maximum (non-normalized) degree centralization increases at a rate proportional to the square of g. However, there is no reason to believe that raw degree centralization in real social networks will increase at the same rate, or even that the median normalized degree centralization over the population of all possible graphs will fall at the 0.5 point.
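As a rough illustration of this normalization, the sketch below computes degree centralization for an undirected graph, taking each actor's degree as its number of links so that a star attains the (g−1)(g−2) maximum. The function name and the edge-list input format are assumptions made for this example.

```python
def degree_centralization(g, edges):
    """Normalized degree centralization of an undirected graph on g >= 3 nodes.

    `edges` is an iterable of unordered node pairs (u, v) with 0 <= u, v < g.
    """
    degree = [0] * g
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    largest = max(degree)
    # Numerator: summed shortfall of every actor relative to the most central one.
    numerator = sum(largest - d for d in degree)
    # Denominator: the numerator's maximum, attained by a star configuration.
    return numerator / ((g - 1) * (g - 2))


star = [(0, i) for i in range(1, 5)]           # one hub linked to four spokes
ring = [(i, (i + 1) % 5) for i in range(5)]    # every actor has degree 2
print(degree_centralization(5, star))  # 1.0
print(degree_centralization(5, ring))  # 0.0

# The normalizing term itself grows roughly with the square of g.
print([(g - 1) * (g - 2) for g in (5, 10, 20, 40)])  # [12, 72, 342, 1482]
```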

Nor does maximum-value normalization control for density. It is often the case that the maximum GLI scores are not even attainable for all densities. For instance, the degree centralization for a sparse network with fewer links than are needed for a star configuration can never equal (g−1)(g−2); the same, of course, holds true for extremely dense networks. Recognizing that many GLIs attain maximum values on a very small set of special case graphs which are unlike the larger population of graphs in a variety of respects, one is given to wonder whether other statistics on these measures are similarly skewed. As we shall see presently, this is in fact the case for a number of commonly employed GLIs.
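The attainability point can be checked by brute force on a small case. The sketch below enumerates every undirected graph on five nodes and records the largest normalized degree centralization reachable at each edge count; the undirected setting, the function names, and the choice of five nodes are assumptions made to keep the enumeration small (1024 graphs).

```python
from itertools import combinations


def degree_centralization(g, edges):
    """Normalized degree centralization of an undirected graph on g >= 3 nodes."""
    degree = [0] * g
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    largest = max(degree)
    return sum(largest - d for d in degree) / ((g - 1) * (g - 2))


def max_centralization_by_edge_count(g):
    """Map each edge count m to the highest centralization over all graphs with m edges."""
    pairs = list(combinations(range(g), 2))
    best = {}
    for m in range(len(pairs) + 1):
        best[m] = max(
            degree_centralization(g, edge_set)
            for edge_set in combinations(pairs, m)
        )
    return best


best = max_centralization_by_edge_count(5)
for m, value in best.items():
    print(m, round(value, 3))
```

In this small case only the edge count of a star (four links) reaches the normalized maximum of 1; every sparser or denser graph is bounded strictly below it, and the empty and complete graphs are pinned at 0.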

The revelation that density is interwoven with other GLIs is not a new one. Friedkin (1981) showed that, in the set of GLIs he examined, attempts to control for graph size encounter problems of non-linearity and heteroscedasticity. In the same study, Friedkin also found that density has a strong effect, though his conclusions primarily concerned the merits of the measure of density itself. This paper uses a similar approach to examine the distributions of several other structural measures, and also examines the usefulness of a simple hypothesis test in controlling for both size and density.

A great deal of research in a different vein has gone into controlling not only for density and size, but also for the number of mutual, asymmetric, and null ties (Holland and Leinhardt, 1970), and the in- and out-degrees of individual nodes (Snijders, 1991). In general, no analytical methods are known for deriving either means or variances for GLIs under these conditions, much less their distributions. Monte Carlo sampling methods are used by Snijders to control for in- and out-degrees, and a similar approach is taken here for the simpler case of controlling only density and size. This study diverges in purpose from that of Snijders by focussing on the use of actual GLIs and directly illustrating their distributions and the usage of their distributions in hypothesis testing.
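A Monte Carlo test that conditions only on size and density might be sketched as follows. The sketch assumes a baseline that is uniform over all digraphs with the observed number of nodes and arcs; the function names, the default number of draws, and the example statistic (maximum out-degree) are hypothetical choices for illustration and are not taken from the study.

```python
import random


def sample_digraph(n, m, rng):
    """Draw m distinct arcs uniformly at random from the n * (n - 1) possible arcs."""
    possible = [(i, j) for i in range(n) for j in range(n) if i != j]
    return rng.sample(possible, m)


def max_out_degree(n, arcs):
    """Hypothetical stand-in GLI: the largest out-degree in the digraph."""
    out_degree = [0] * n
    for u, _ in arcs:
        out_degree[u] += 1
    return max(out_degree)


def monte_carlo_tails(gli, n, m, observed, draws=1000, seed=0):
    """Estimate P(GLI <= observed) and P(GLI >= observed) under the baseline."""
    rng = random.Random(seed)
    values = [gli(n, sample_digraph(n, m, rng)) for _ in range(draws)]
    lower = sum(v <= observed for v in values) / draws
    upper = sum(v >= observed for v in values) / draws
    return lower, upper


# An observed digraph with 10 nodes, 15 arcs, and one actor sending 6 ties.
lower, upper = monte_carlo_tails(max_out_degree, n=10, m=15, observed=6)
print(lower, upper)
```

A small upper-tail estimate would suggest that the observed value is unusually high for graphs of that size and density, rather than a by-product of those two parameters alone.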

Graph-level indices

The six GLIs examined are either in common use or illustrate distributions of theoretical interest. Though not included in the list which follows, it must be emphasized that both size and density are also GLIs. Size is defined here as the number of nodes in the graph, and density is given as the average number of links per node.

Methodological applications

In Section 2, we considered a number of GLIs, showed a simple framework for examining GLI distributions, and examined GLI behavior across graphs of varying sizes and densities. In this section, we present some simple applications of these findings to network methodology. (Quantiles for use with the null hypothesis testing procedure described below have been included in Appendix A.)

Discussion

As we have seen, both size and density have powerful and complex interactions with other GLIs. These interactions stem from fundamental constraints on the space of graphs, constraints that severely limit the combinations of GLI values which can be realized on a given graph. Across the space of graphs, such constraints further alter GLI distributions, causing some values to be vastly more common than others and generally affecting the ranges of realizations which are possible in particular

Conclusion

Using a simplifying assumption regarding graph distribution, this study shows that size and density strongly interact with all graph-level measures (GLIs) examined. Graphs with different sizes and/or densities will often have dramatically different probability distributions for the same GLI, and thus their GLI values will have different interpretations. Because of this, it may be difficult for the researcher to know whether a specific GLI value is the result of a direct structural social

Acknowledgements

This work was supported in part by the Center for the Computational Analysis of Social and Organizational Systems, the Institute for Complex Engineered Systems, and by grant no. N00014-97-1-0037 from the Office of Naval Research (ONR), United States Navy.

References (45)

  • Bernard, H.R., et al., 1991. Estimating the size of an average personal network and of an event subpopulation: some empirical results. Social Science Research.
  • Borgatti, S.P., Everett, M.G., Freeman, L.C., 1991. UCINET, Version IV. Analytic Technology, Columbia,...
  • Burt, R.S., 1987. Social contagion and innovation, cohesion versus structural equivalence. American Journal of Sociology.
  • Butts, C., 1999. The complexity of social networks: theoretical and empirical findings. CASOS Working Paper. Carnegie...
  • Butts, C., Carley, K.M., 1998. Canonical labeling to facilitate graph comparison. CASOS Working Paper. Carnegie Mellon...
  • Carley, K.M., 1992. Organizational learning and personnel turnover. Organization Science.
  • Cohen, A.M., 1962. Changing small-group communication networks. Administrative Science Quarterly.
  • Cook, K.S., et al., 1983. The distribution of power in exchange networks. American Journal of Sociology.
  • Durkheim, E., 1893. The Division of Labor in Society. Free Press, New...
  • Durkheim, E., 1897. Suicide. Free Press, New...
  • Emerson, R.M., 1972. Exchange theory: Part II. Exchange relations and network structures. In: Berger, J., Zelditch, M.,...
  • Erdős, P., et al., 1960. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences.
1 This material was based upon work supported under a National Science Foundation Graduate Fellowship.
