Surprise (networks)

Surprise (denoted S) is a measure of community structure in complex networks. The name Surprise derives from the fact that its maximization finds the most surprising partition into communities of the network, that is, the most unlikely one. S accurately evaluates, in a global manner, the quality of a partition using a cumulative hypergeometric distribution.

Definition

Formula

Given a partition into communities, Surprise compares the number of links within and between communities in that partition with the expected number of them in a random network with the same distribution of nodes per communities. In this manner, S evaluates, at the same time, both the number of nodes and links.

The formula of Surprise for a given partition is :

$$S = -\log \sum_{j=p}^{\min(M,n)}\frac{\binom{M}{j}\binom{F - M}{n - j}}{\binom{F}{n}}$$ where F is the maximum possible number of links in the network for the number of nodes k

$F = \frac{k(k-1)}{2}$

M is the maximum possible number of intra-community links. Let C be the number of communities, then

$M = \sum_{i=1}^{C} \frac{k_i(k_i-1)}{2}$

n is the actual number of links in the network and p is the actual number of intra-community links of that partition.

Properties

  • S values range between zero and plus infinity
  • S becomes zero in the following two cases:

::*All nodes are joined into a single community

::*Each node has its own community

A new paradigm in community detection

The analysis of community structure in networks is of great importance in many scientific fields and, therefore, how to optimally detect these communities has been object of research in recent years. In 2004, Newman and Girvan proposed the modularity Q as a benefit function to evaluate the quality of the partition into communities of a network. Q has been widely used and cited in hundreds of articles. However, a few years later, Fortunato and Barthélemy demonstrated that Q suffers a resolution limit. That is, the measure is unable to detect communities smaller than a certain threshold, which is determined by the size of the network. Although several attempts to avoid this limitation have been proposed, none of them solves it in a general manner.

In 2011, Aldecoa and Marín presented Surprise and showed how its maximization finds qualitatively better partitions than those obtained by maximizing Q. Besides its empirical superior performance, S has obvious theoretical advantages over Q and Q-based indexes. While these measures evaluate the quality of the partition based on the density of links, regardless of the number of nodes; S takes into account both the number of nodes and links. Moreover, Q computes the quality of the partition by summing the individual qualities of the different communities. On the contrary, S evaluates the quality of the whole partition at the same time.

See also

  • Complex network
  • Community structure