igraph Reference Manual

For using the igraph C library

Chapter 25. Detecting community structure

1. Common functions related to community structure
2. Community structure based on statistical mechanics
3. Community structure based on eigenvectors of matrices
4. Walktrap: Community structure based on random walks
5. Edge betweenness based community detection
6. Community structure based on the optimization of modularity
7. Fluid communities
8. Label propagation
9. The InfoMAP algorithm
10. Voronoi communities

Community detection is concerned with clustering the vertices of networks into tightly connected subgraphs called "communities". The following references provide a good introduction to the topic of community detection:

S. Fortunato: "Community Detection in Graphs". Physics Reports 486, no. 3–5 (2010): 75–174. https://doi.org/10.1016/j.physrep.2009.11.002.

S. Fortunato and D. Hric: "Community Detection in Networks: A User Guide". Physics Reports 659 (2016): 1–44. https://doi.org/10.1016/j.physrep.2016.09.002.

1. Common functions related to community structure

1.1. igraph_modularity — Calculates the modularity of a graph with respect to some clusters or vertex types.
1.2. igraph_modularity_matrix — Calculates the modularity matrix.
1.3. igraph_community_optimal_modularity — Calculate the community structure with the highest modularity value.
1.4. igraph_community_to_membership — Cut a dendrogram after a given number of merges.
1.5. igraph_reindex_membership — Makes the IDs in a membership vector contiguous.
1.6. igraph_compare_communities — Compares community structures using various metrics.
1.7. igraph_split_join_distance — Calculates the split-join distance of two community structures.

1.1. `igraph_modularity` — Calculates the modularity of a graph with respect to some clusters or vertex types.

igraph_error_t igraph_modularity(const igraph_t *graph,
                      const igraph_vector_int_t *membership,
                      const igraph_vector_t *weights,
                      const igraph_real_t resolution,
                      const igraph_bool_t directed,
                      igraph_real_t *modularity);

The modularity of a graph with respect to some clustering of the vertices (or assignment of vertex types) measures how strongly separated the different clusters are from each other compared to a random null model. It is defined as

Q = 1/(2m) sum_ij (A_ij - γ k_i k_j / (2m)) δ(c_i,c_j),

where m is the number of edges, A_ij is the adjacency matrix, k_i is the degree of vertex i, c_i is the cluster that vertex i belongs to (or its vertex type), δ(i,j)=1 if i=j and 0 otherwise, and the sum goes over all i, j pairs of vertices. Note that in this formula, the diagonal of the adjacency matrix contains twice the number of self-loops.

The resolution parameter γ allows weighting the random null model, which might be useful when finding partitions with a high modularity. Maximizing modularity with higher values of the resolution parameter typically results in more, smaller clusters when finding partitions with a high modularity. Lower values typically results in fewer, larger clusters. The original definition of modularity is retrieved when setting γ = 1.

Modularity can also be calculated on directed graphs. This only requires a relatively modest change,

Q = 1/m sum_ij (A_ij - γ k^out_i k^in_j / m) δ(c_i,c_j),

where k^out_i is the out-degree of node i and k^in_j is the in-degree of node j.

Modularity on weighted graphs is also meaningful. When taking edge weights into account, A_ij equals the weight of the corresponding edge (or 0 if there is no edge), k_i is the strength (i.e. the weighted degree) of vertex i, with similar counterparts for a directed graph, and m is the total weight of all edges.

Note that the modularity is not well-defined for graphs with no edges. igraph returns NaN for graphs with no edges; see https://github.com/igraph/igraph/issues/1539 for a detailed discussion.

For the original definition of modularity, see Newman, M. E. J., and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E 69, 026113. https://doi.org/10.1103/PhysRevE.69.026113

For the directed definition of modularity, see Leicht, E. A., and Newman, M. E. J. (2008). Community Structure in Directed Networks. Physical Review Letters 100, 118703. https://doi.org/10.1103/PhysRevLett.100.118703

For the introduction of the resolution parameter γ, see Reichardt, J., and Bornholdt, S. (2006). Statistical mechanics of community detection. Physical Review E 74, 016110. https://doi.org/10.1103/PhysRevE.74.016110

Arguments:

`graph`:	The input graph.
`membership`:	Numeric vector of integer values which gives the type of each vertex, i.e. the cluster to which it belongs. It does not have to be consecutive, i.e. empty communities are allowed. For better performance, ensure that community indices are nonnegative and smaller than the vertex count. This can be ensured using `igraph_reindex_membership()`.
`weights`:	Weight vector or `NULL` if no weights are specified.
`resolution`:	The resolution parameter `γ`. Must not be negative. Set it to 1 to use the classical definition of modularity.
`directed`:	Whether to use the directed or undirected version of modularity. Ignored for undirected graphs.
`modularity`:	Pointer to a real number, the result will be stored here.

Returns:

Error code.

See also:

igraph_modularity_matrix()

Time complexity: O(|V|+|E|), the number of vertices plus the number of edges, assuming that community indices are nonnegative and smaller than the vertex count. Otherwise, O(|V| log |V| + |E|).

1.2. `igraph_modularity_matrix` — Calculates the modularity matrix.

igraph_error_t igraph_modularity_matrix(const igraph_t *graph,
                             const igraph_vector_t *weights,
                             const igraph_real_t resolution,
                             igraph_matrix_t *modmat,
                             igraph_bool_t directed);

This function returns the modularity matrix, which is defined as

B_ij = A_ij - γ k_i k_j / (2m)

for undirected graphs, where A_ij is the adjacency matrix, γ is the resolution parameter, k_i is the degree of vertex i, and m is the number of edges in the graph. When there are no edges, or the weights add up to zero, the result is undefined.

For directed graphs the modularity matrix is changed to

B_ij = A_ij - γ k^out_i k^in_j / m

where k^out_i is the out-degree of node i and k^in_j is the in-degree of node j.

Note that self-loops in undirected graphs are multiplied by 2 in this implementation. If weights are specified, the weighted counterparts of the adjacency matrix and degrees are used.

Arguments:

`graph`:	The input graph.
`weights`:	Edge weights, pointer to a vector. If this is a null pointer then every edge is assumed to have a weight of 1.
`resolution`:	The resolution parameter `γ`. Must not be negative. Default is 1. Lower values favor fewer, larger communities; higher values favor more, smaller communities.
`modmat`:	Pointer to an initialized matrix in which the modularity matrix is stored.
`directed`:	For directed graphs: if the edges should be treated as undirected. For undirected graphs this is ignored.

Returns:

Error code.

See also:

igraph_modularity()

1.3. `igraph_community_optimal_modularity` — Calculate the community structure with the highest modularity value.

igraph_error_t igraph_community_optimal_modularity(const igraph_t *graph,
                                                   const igraph_vector_t *weights,
                                                   const igraph_real_t resolution,
                                                   igraph_real_t *modularity,
                                                   igraph_vector_int_t *membership);

This function calculates the optimal community structure for a graph, in terms of maximal modularity score. Both undirected and directed graphs are supported.

The calculation is done by transforming the modularity maximization into an integer programming problem, and then calling the GLPK library to solve that. Please see Ulrik Brandes et al.: On Modularity Clustering, IEEE Transactions on Knowledge and Data Engineering 20(2):172-188, 2008 https://doi.org/10.1109/TKDE.2007.190689.

Note that exact modularity optimization is an NP-complete problem, and all known algorithms for it have exponential time complexity. This means that you probably don't want to run this function on larger graphs. Graphs with up to fifty vertices should be fine, graphs with a couple of hundred vertices might be possible.

Arguments:

`graph`:	The input graph. It may be undirected or directed.
`weights`:	Vector giving the weights of the edges. If it is `NULL` then each edge is supposed to have the same weight.
`resolution`:	Resolution parameter. Must be greater than or equal to 0. Lower values favor fewer, larger communities; higher values favor more, smaller communities. Set it to 1 to use the classical definition of modularity.
`modularity`:	Pointer to a real number, or a null pointer. If it is not a null pointer, then a optimal modularity value is returned here.
`membership`:	Pointer to a vector, or a null pointer. If not a null pointer, then the membership vector of the optimal community structure is stored here.

Returns:

Error code. When GLPK is not available, IGRAPH_UNIMPLEMENTED is returned.

See also:

igraph_modularity(), igraph_community_fastgreedy() for an algorithm that finds a local optimum in a greedy way.

Time complexity: exponential in the number of vertices.

Example 25.1. File examples/simple/igraph_community_optimal_modularity.c

#include <igraph.h>

int main(void) {
    igraph_t graph;
    igraph_vector_int_t membership;
    igraph_real_t modularity;
    igraph_error_handler_t *handler;
    igraph_error_t errcode;

    /* Initialize the library. */
    igraph_setup();

    igraph_small(&graph, 9, IGRAPH_UNDIRECTED,
                 0, 3, 0, 4, 0, 1, 0, 2, 0, 5, 0, 6, 1, 3, 3, 5, 4, 7, 7, 8, 2, 4, 1, 2, 6, 8,
                 -1);

    igraph_vector_int_init(&membership, 0);

    /* Temporarily ignore errors so we can check if the functionality is available. */
    handler = igraph_set_error_handler(&igraph_error_handler_printignore);

    errcode = igraph_community_optimal_modularity(&graph, NULL, 1, &modularity, &membership);
    if (errcode == IGRAPH_UNIMPLEMENTED) {
        /* igraph was not build with GLPK; functionality unavailable, so we quit */
        return 77;
    }

    /* Restore the previous error handler. */
    igraph_set_error_handler(handler);

    printf("Highest possible modularity: %g\n", modularity);
    printf("Membership: ");
    igraph_vector_int_print(&membership);

    igraph_destroy(&graph);

    igraph_vector_int_destroy(&membership);

    return 0;
}

1.4. `igraph_community_to_membership` — Cut a dendrogram after a given number of merges.

igraph_error_t igraph_community_to_membership(const igraph_matrix_int_t *merges,
                                   igraph_int_t nodes,
                                   igraph_int_t steps,
                                   igraph_vector_int_t *membership,
                                   igraph_vector_int_t *csize);

This function creates a membership vector from a dendrogram whose leaves are individual vertices by cutting it at the specified level. It produces a membership vector that contains for each vertex its cluster ID, numbered from zero. This is the same membership vector format that is produced by igraph_connected_components(), as well as all community detection functions in igraph.

It takes as input the number of vertices n, and a merges matrix encoding the dendrogram, in the format produced by hierarchical clustering functions such as igraph_community_edge_betweenness(), igraph_community_walktrap() or igraph_community_fastgreedy(). The matrix must have two columns and up to n - 1 rows. Each row represents merging two dendrogram nodes into their parent node. The leaf nodes of the dendrogram are indexed from 0 to n - 1 and are identical to the vertices of the graph that is being partitioned into communities. Row i contains the children of dendrogram node with index n + i.

This function performs steps merge operations as prescribed by the merges matrix and returns the resulting partitioning into n - steps communities.

If merges is not a complete dendrogram, it is possible to take steps steps if steps is not bigger than the number lines in merges.

Arguments:

`merges`:	The two-column matrix containing the merge operations.
`nodes`:	The number of leaf nodes in the dendrogram.
`steps`:	Integer constant, the number of steps to take.
`membership`:	Pointer to an initialized vector, the membership results will be stored here, if not `NULL`. The vector will be resized as needed.
`csize`:	Pointer to an initialized vector, or `NULL`. If not `NULL` then the sizes of the components will be stored here, the vector will be resized as needed.

Returns:

Error code.

See also:

igraph_community_walktrap(), igraph_community_edge_betweenness(), igraph_community_fastgreedy() for community structure detection algorithms producing merge matrices in this format; igraph_le_community_to_membership() to perform merges starting from a given cluster assignment.

Time complexity: O(|V|), the number of vertices in the graph.

1.5. `igraph_reindex_membership` — Makes the IDs in a membership vector contiguous.

igraph_error_t igraph_reindex_membership(
        igraph_vector_int_t *membership,
        igraph_vector_int_t *new_to_old,
        igraph_int_t *nb_clusters);

This function reindexes component IDs in a membership vector in a way that the new IDs start from zero and go up to C-1, where C is the number of unique component IDs in the original vector.

Arguments:

`membership`:	Numeric vector which gives the type of each vertex, i.e. the component to which it belongs. The vector will be altered in-place.
`new_to_old`:	Pointer to a vector which will contain the old component ID for each new one, or `NULL`, in which case it is not returned. The vector will be resized as needed.
`nb_clusters`:	Pointer to an integer for the number of distinct clusters. If not `NULL`, this will be updated to reflect the number of distinct clusters found in membership.

Returns:

Error code.

Time complexity: Let n be the length of the membership vector. O(n) if cluster indices are within 0..n-1, and O(n log(n)) otherwise.

1.6. `igraph_compare_communities` — Compares community structures using various metrics.

igraph_error_t igraph_compare_communities(const igraph_vector_int_t *comm1,
                               const igraph_vector_int_t *comm2, igraph_real_t* result,
                               igraph_community_comparison_t method);

This function assesses the distance between two community structures using the variation of information (VI) metric of Meila (2003), the normalized mutual information (NMI) of Danon et al (2005), the split-join distance of van Dongen (2000), the Rand index of Rand (1971) or the adjusted Rand index of Hubert and Arabie (1985).

Some of these measures are defined based on the entropy of a discrete random variable associated with a given clustering C of vertices. Let p_i be the probability that a randomly picked vertex would be part of cluster i. Then the entropy of the clustering is

H(C) = - sum_i p_i log p_i

Similarly, we can define the joint entropy of two clusterings C_1 and C_2 based on the probability p_ij that a random vertex is part of cluster i in the first clustering and cluster j in the second one:

H(C_1, C_2) = - sum_ii p_ij log p_ij

The mutual information of C_1 and C_2 is then MI(C_1, C_2) = H(C_1) + H(C_2) - H(C_1, C_2) >= 0 . A large mutual information indicates a high overlap between the two clusterings. The normalized mutual information, as computed by igraph, is

NMI(C_1, C_2) = 2 MI(C_1, C_2) / (H(C_1) + H(C_2)).

It takes its value from the interval (0, 1], with 1 achieved when the two clusterings coincide.

The variation of information is defined as VI(C_1, C_2) = [H(C_1) - MI(C_1, C_2)] + [H(C_2) - MI(C_1, C_2)]. Lower values of the variation of information indicate a smaller difference between the two clusterings, with VI = 0 achieved precisely when they coincide. igraph uses natural units for the variation of information, i.e. it uses the natural logarithm when computing entropies.

The Rand index is defined as the probability that the two clusterings agree about the cluster memberships of a randomly chosen vertex pair. All vertex pairs are considered, and the two clusterings are considered to be in agreement about the memberships of a vertex pair if either the two vertices are in the same cluster in both clusterings, or they are in different clusters in both clusterings. The Rand index is then the number of vertex pairs in agreement, divided by the total number of vertex pairs. A Rand index of zero means that the two clusterings disagree about the membership of all vertex pairs, while 1 means that the two clusterings are identical.

The adjusted Rand index is similar to the Rand index, but it takes into account that agreement between the two clusterings may also occur by chance even if the two clusterings are chosen completely randomly. The adjusted Rand index therefore subtracts the expected fraction of agreements from the value of the Rand index, and divides the result by one minus the expected fraction of agreements. The maximum value of the adjusted Rand index is still 1 (similarly to the Rand index), indicating maximum agreement, but the value may be less than zero if there is less agreement between the two clusterings than what would be expected by chance.

For an explanation of the split-join distance, see igraph_split_join_distance().

References:

Meilă M: Comparing clusterings by the variation of information. In: Schölkopf B, Warmuth MK (eds.). Learning Theory and Kernel Machines: 16th Annual Conference on Computational Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA. Lecture Notes in Computer Science, vol. 2777, Springer, 2003. ISBN: 978-3-540-40720-1. https://doi.org/10.1007/978-3-540-45167-9_14

Danon L, Diaz-Guilera A, Duch J, Arenas A: Comparing community structure identification. J Stat Mech P09008, 2005. https://doi.org/10.1088/1742-5468/2005/09/P09008

van Dongen S: Performance criteria for graph clustering and Markov cluster experiments. Technical Report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000. https://ir.cwi.nl/pub/4461

Rand WM: Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846-850, 1971. https://doi.org/10.2307/2284239

Hubert L and Arabie P: Comparing partitions. Journal of Classification 2:193-218, 1985. https://doi.org/10.1007/BF01908075

Arguments:

`comm1`:	the membership vector of the first community structure
`comm2`:	the membership vector of the second community structure
`result`:	the result is stored here.
`method`:	the comparison method to use. `IGRAPH_COMMCMP_VI` selects the variation of information (VI) metric of Meila (2003), `IGRAPH_COMMCMP_NMI` selects the normalized mutual information measure proposed by Danon et al (2005), `IGRAPH_COMMCMP_SPLIT_JOIN` selects the split-join distance of van Dongen (2000), `IGRAPH_COMMCMP_RAND` selects the unadjusted Rand index (1971) and `IGRAPH_COMMCMP_ADJUSTED_RAND` selects the adjusted Rand index.

Returns:

Error code.

See also:

igraph_split_join_distance().

Time complexity: O(n log(n)).

1.7. `igraph_split_join_distance` — Calculates the split-join distance of two community structures.

igraph_error_t igraph_split_join_distance(const igraph_vector_int_t *comm1,
                               const igraph_vector_int_t *comm2, igraph_int_t *distance12,
                               igraph_int_t *distance21);

The split-join distance between partitions A and B is the sum of the projection distance of A from B and the projection distance of B from A. The projection distance is an asymmetric measure and it is defined as follows:

First, each set in partition A is evaluated against all sets in partition B. For each set in partition A, the best matching set in partition B is found and the overlap size is calculated. (Matching is quantified by the size of the overlap between the two sets). Then, the maximal overlap sizes for each set in A are summed together and subtracted from the number of elements in A.

The split-join distance will be returned in two arguments, distance12 will contain the projection distance of the first partition from the second, while distance21 will be the projection distance of the second partition from the first. This makes it easier to detect whether a partition is a subpartition of the other, since in this case, the corresponding distance will be zero.

Reference:

Arguments:

`comm1`:	the membership vector of the first community structure
`comm2`:	the membership vector of the second community structure
`distance12`:	pointer to an `igraph_int_t`, the projection distance of the first community structure from the second one will be returned here.
`distance21`:	pointer to an `igraph_int_t`, the projection distance of the second community structure from the first one will be returned here.

Returns:

Error code.

See also:

igraph_compare_communities() with the IGRAPH_COMMCMP_SPLIT_JOIN method if you are not interested in the individual distances but only the sum of them.

Time complexity: O(n log(n)).

2. Community structure based on statistical mechanics

2.1. igraph_community_spinglass — Community detection based on statistical mechanics.
2.2. igraph_community_spinglass_single — Community of a single node based on statistical mechanics.

2.1. `igraph_community_spinglass` — Community detection based on statistical mechanics.

igraph_error_t igraph_community_spinglass(const igraph_t *graph,
                               const igraph_vector_t *weights,
                               igraph_real_t *modularity,
                               igraph_real_t *temperature,
                               igraph_vector_int_t *membership,
                               igraph_vector_int_t *csize,
                               igraph_int_t spins,
                               igraph_bool_t parupdate,
                               igraph_real_t starttemp,
                               igraph_real_t stoptemp,
                               igraph_real_t coolfact,
                               igraph_spincomm_update_t update_rule,
                               igraph_real_t gamma,
                               igraph_spinglass_implementation_t implementation,
                               igraph_real_t gamma_minus);

This function implements the community structure detection algorithm proposed by Joerg Reichardt and Stefan Bornholdt. The algorithm is described in their paper: Statistical Mechanics of Community Detection, http://arxiv.org/abs/cond-mat/0603718 .

From version 0.6, igraph also supports an extension to the algorithm that allows negative edge weights. This is described in V. A. Traag and Jeroen Bruggeman: Community detection in networks with positive and negative links, http://arxiv.org/abs/0811.2329 .

Arguments:

`graph`:	The input graph, it may be directed but the direction of the edges is ignored by the algorithm.
`weights`:	The vector giving the edge weights, it may be `NULL`, in which case all edges are weighted equally. The edge weights must be positive unless using the `IGRAPH_SPINCOMM_IMP_NEG` implementation.
`modularity`:	Pointer to a real number, if not `NULL` then the modularity score of the solution will be stored here. This is the gereralized modularity, taking into account the resolution parameter `gamma`. See `igraph_modularity()` for details.
`temperature`:	Pointer to a real number, if not `NULL` then the temperature at the end of the algorithm will be stored here.
`membership`:	Pointer to an initialized vector or `NULL`. If not `NULL` then the result of the clustering will be stored here. For each vertex, the number of its cluster is given, with the first cluster numbered zero. The vector will be resized as needed.
`csize`:	Pointer to an initialized vector or `NULL`. If not `NULL` then the sizes of the clusters will stored here in cluster number order. The vector will be resized as needed.
`spins`:	Integer giving the number of spins, i.e. the maximum number of clusters. Even if the number of spins is high the number of clusters in the result might be small.
`parupdate`:	A Boolean constant, whether to update all spins in parallel. It is not implemented in the `IGRAPH_SPINCOMM_INP_NEG` implementation.
`starttemp`:	Real number, the temperature at the start. A reasonable default is 1.0.
`stoptemp`:	Real number, the algorithm stops at this temperature. A reasonable default is 0.01.
`coolfact`:	Real number, the cooling factor for the simulated annealing. A reasonable default is 0.99.
`update_rule`:	The type of the update rule. Possible values: `IGRAPH_SPINCOMM_UPDATE_SIMPLE` and `IGRAPH_SPINCOMM_UPDATE_CONFIG`. Basically this parameter defines the null model based on which the actual clustering is done. If this is `IGRAPH_SPINCOMM_UPDATE_SIMPLE` then the random graph (i.e. G(n,p)), if it is `IGRAPH_SPINCOMM_UPDATE` then the configuration model is used. The configuration means that the baseline for the clustering is a random graph with the same degree distribution as the input graph.
`gamma`:	Real number. The gamma parameter of the algorithm, acting as a resolution parameter. Smaller values typically lead to larger clusters, larger values typically lead to smaller clusters.
`implementation`:	Constant, chooses between the two implementations of the spin-glass algorithm that are included in igraph. `IGRAPH_SPINCOMM_IMP_ORIG` selects the original implementation, this is faster, `IGRAPH_SPINCOMM_INP_NEG` selects an implementation that allows negative edge weights.
`gamma_minus`:	Real number. Parameter for the `IGRAPH_SPINCOMM_IMP_NEG` implementation. This acts as a resolution parameter for the negative part of the network. Smaller values of `gamma_minus` leads to fewer negative edges within clusters. If this argument is set to zero, the algorithm reduces to a graph coloring algorithm when all edges have negative weights, using the number of spins as the number of colors.

Returns:

Error code.

See also:

igraph_community_spinglass_single() for calculating the community of a single vertex.

Time complexity: TODO.

2.2. `igraph_community_spinglass_single` — Community of a single node based on statistical mechanics.

igraph_error_t igraph_community_spinglass_single(const igraph_t *graph,
                                      const igraph_vector_t *weights,
                                      igraph_int_t vertex,
                                      igraph_vector_int_t *community,
                                      igraph_real_t *cohesion,
                                      igraph_real_t *adhesion,
                                      igraph_real_t *inner_links,
                                      igraph_real_t *outer_links,
                                      igraph_int_t spins,
                                      igraph_spincomm_update_t update_rule,
                                      igraph_real_t gamma);

This function implements the community structure detection algorithm proposed by Joerg Reichardt and Stefan Bornholdt. It is described in their paper: Statistical Mechanics of Community Detection, http://arxiv.org/abs/cond-mat/0603718 .

This function calculates the community of a single vertex without calculating all the communities in the graph.

Arguments:

`graph`:	The input graph, it may be directed but the direction of the edges is not used in the algorithm.
`weights`:	Pointer to a vector with the weights of the edges. Alternatively `NULL` can be supplied to have the same weight for every edge.
`vertex`:	The vertex ID of the vertex of which this community is calculated.
`community`:	Pointer to an initialized vector, the result, the IDs of the vertices in the community of the input vertex will be stored here. The vector will be resized as needed.
`cohesion`:	Pointer to a real variable, if not `NULL` the cohesion index of the community will be stored here.
`adhesion`:	Pointer to a real variable, if not `NULL` the adhesion index of the community will be stored here.
`inner_links`:	Pointer to a real, if not `NULL` the number of edges within the community (or the sum of their weights) is stored here.
`outer_links`:	Pointer to a real, if not `NULL` the number of edges between the community and the rest of the graph (or the sum of their weights) will be stored here.
`spins`:	The number of spins to use, this can be higher than the actual number of clusters in the network, in which case some clusters will contain zero vertices.
`update_rule`:	The type of the update rule. Possible values: `IGRAPH_SPINCOMM_UPDATE_SIMPLE` and `IGRAPH_SPINCOMM_UPDATE_CONFIG`. Basically this parameter defined the null model based on which the actual clustering is done. If this is `IGRAPH_SPINCOMM_UPDATE_SIMPLE` then the random graph (ie. G(n,p)), if it is `IGRAPH_SPINCOMM_UPDATE` then the configuration model is used. The configuration means that the baseline for the clustering is a random graph with the same degree distribution as the input graph.
`gamma`:	Real number. The gamma parameter of the algorithm. This defined the weight of the missing and existing links in the quality function for the clustering. The default value in the original code was 1.0, which is equal weight to missing and existing edges. Smaller values make the existing links contibute more to the energy function which is minimized in the algorithm. Bigger values make the missing links more important. (If my understanding is correct.)

Returns:

Error code.

See also:

igraph_community_spinglass() for the traditional version of the algorithm.

Time complexity: TODO.

3. Community structure based on eigenvectors of matrices

3.1. igraph_community_leading_eigenvector — Leading eigenvector community finding (proper version).
3.2. igraph_community_leading_eigenvector_callback_t — Callback for the leading eigenvector community finding method.
3.3. igraph_le_community_to_membership — Cut an incomplete dendrogram after a given number of merges, starting with an initial cluster assignment.

The function documented in these section implements the “leading eigenvector” method developed by Mark Newman and published in MEJ Newman: Finding community structure using the eigenvectors of matrices, Phys Rev E 74:036104 (2006).

The heart of the method is the definition of the modularity matrix B = A - P, A being the adjacency matrix of the (undirected) network, and P contains the probability that certain edges are present according to the “configuration model”. In other words, a P_ij element of P is the probability that there is an edge between vertices i and j in a random network in which the degrees of all vertices are the same as in the input graph. See igraph_modularity_matrix() for more details.

The leading eigenvector method works by calculating the eigenvector of the modularity matrix for the largest positive eigenvalue and then separating vertices into two community based on the sign of the corresponding element in the eigenvector. If all elements in the eigenvector are of the same sign that means that the network has no underlying community structure. Check Newman's paper to understand why this is a good method for detecting community structure.

The leading eigenvector community structure detection method is implemented in igraph_community_leading_eigenvector(). After the initial split, the following splits are done in a way to optimize modularity regarding to the original network. Note that any further refinement, for example using Kernighan-Lin, as proposed in Section V.A of Newman (2006), is not implemented here.

Example 25.2. File examples/simple/igraph_community_leading_eigenvector.c

#include <igraph.h>
#include <stdio.h>

int main(void) {

    igraph_t g;
    igraph_matrix_int_t merges;
    igraph_vector_int_t membership;

    /* Initialize the library. */
    igraph_setup();

    /* Zachary Karate club */
    igraph_small(&g, 0, IGRAPH_UNDIRECTED,
                 0,  1,  0,  2,  0,  3,  0,  4,  0,  5,
                 0,  6,  0,  7,  0,  8,  0, 10,  0, 11,
                 0, 12,  0, 13,  0, 17,  0, 19,  0, 21,
                 0, 31,  1,  2,  1,  3,  1,  7,  1, 13,
                 1, 17,  1, 19,  1, 21,  1, 30,  2,  3,
                 2,  7,  2,  8,  2,  9,  2, 13,  2, 27,
                 2, 28,  2, 32,  3,  7,  3, 12,  3, 13,
                 4,  6,  4, 10,  5,  6,  5, 10,  5, 16,
                 6, 16,  8, 30,  8, 32,  8, 33,  9, 33,
                 13, 33, 14, 32, 14, 33, 15, 32, 15, 33,
                 18, 32, 18, 33, 19, 33, 20, 32, 20, 33,
                 22, 32, 22, 33, 23, 25, 23, 27, 23, 29,
                 23, 32, 23, 33, 24, 25, 24, 27, 24, 31,
                 25, 31, 26, 29, 26, 33, 27, 33, 28, 31,
                 28, 33, 29, 32, 29, 33, 30, 32, 30, 33,
                 31, 32, 31, 33, 32, 33,
                 -1);

    igraph_matrix_int_init(&merges, 0, 0);
    igraph_vector_int_init(&membership, 0);

    igraph_community_leading_eigenvector(&g, /*weights=*/ NULL,
                                         &merges, &membership,
                                         /*steps=*/ 1,
                                         /*options=*/ NULL,
                                         /*modularity=*/ NULL,
                                         /*start=*/ NULL,
                                         /*eigenvalues=*/ NULL,
                                         /*eigenvectors=*/ NULL,
                                         /*history=*/ NULL,
                                         /*callback=*/ NULL,
                                         /*callback_extra=*/ NULL);

    igraph_matrix_int_print(&merges);
    igraph_vector_int_print(&membership);

    printf("\n");

    /* Make all the steps */
    igraph_community_leading_eigenvector(&g, /*weights=*/ NULL,
                                         &merges, &membership,
                                         /*steps=*/ igraph_vcount(&g),
                                         /*options=*/ NULL,
                                         /*modularity=*/ NULL,
                                         /*start=*/ NULL,
                                         /*eigenvalues=*/ NULL,
                                         /*eigenvectors=*/ NULL,
                                         /*history=*/ NULL,
                                         /*callback=*/ NULL,
                                         /*callback_extra=*/ NULL);

    igraph_matrix_int_print(&merges);
    igraph_vector_int_print(&membership);

    igraph_vector_int_destroy(&membership);
    igraph_matrix_int_destroy(&merges);
    igraph_destroy(&g);

    return 0;
}

3.1. `igraph_community_leading_eigenvector` — Leading eigenvector community finding (proper version).

igraph_error_t igraph_community_leading_eigenvector(
        const igraph_t *graph,
        const igraph_vector_t *weights,
        igraph_matrix_int_t *merges,
        igraph_vector_int_t *membership,
        igraph_int_t steps,
        igraph_arpack_options_t *options,
        igraph_real_t *modularity,
        igraph_bool_t start,
        igraph_vector_t *eigenvalues,
        igraph_vector_list_t *eigenvectors,
        igraph_vector_int_t *history,
        igraph_community_leading_eigenvector_callback_t *callback,
        void *callback_extra);

Newman's leading eigenvector method for detecting community structure. This is the proper implementation of the recursive, divisive algorithm: each split is done by maximizing the modularity regarding the original network, see MEJ Newman: Finding community structure in networks using the eigenvectors of matrices, Phys Rev E 74:036104 (2006). https://doi.org/10.1103/PhysRevE.74.036104

Arguments:

graph:

The input graph. Edge directions will be ignored.

weights:

The weights of the edges, or NULL for unweighted graphs.

merges:

The result of the algorithm, a matrix containing the information about the splits performed. The matrix is built in the opposite way however, it is like the result of an agglomerative algorithm. Unlike with most other hierarchical community detection functions in igraph, the integers in this matrix represent community indices, not vertex indices. If at the end of the algorithm (after steps steps was done) there are “p” communities, then these are numbered from zero to p-1. The first line of the matrix contains the first “merge” (which is in reality the last split) of two communities into community p, the merge in the second line forms community p+1, etc. The matrix should be initialized before calling and will be resized as needed. This argument is ignored if it is NULL.

membership:

The membership of the vertices after all the splits were performed will be stored here. The vector must be initialized before calling and will be resized as needed. This argument is ignored if it is NULL. This argument can also be used to supply a starting configuration for the community finding, in the format of a membership vector. In this case the start argument must be set to true.

steps:

The maximum number of steps to perform. It might happen that some component (or the whole network) has no underlying community structure and no further steps can be done. If you want as many steps as possible then supply the number of vertices in the network here.

options:

The options for ARPACK. Supply NULL here to use the defaults. n is always overwritten. ncv is set to at least 4.

modularity:

If not a null pointer, then it must be a pointer to a real number and the modularity score of the final division is stored here.

start:

Boolean, whether to use the community structure given in the membership argument as a starting point.

eigenvalues:

Pointer to an initialized vector or a null pointer. If not a null pointer, then the eigenvalues calculated along the community structure detection are stored here. The non-positive eigenvalues, that do not result a split, are stored as well.

eigenvectors:

If not a null pointer, then the eigenvectors that are calculated in each step of the algorithm are stored here, in a list of vectors. Each eigenvector is stored in an igraph_vector_t object.

history:

Pointer to an initialized vector or a null pointer. If not a null pointer, then a trace of the algorithm is stored here, encoded numerically. The various operations:

`IGRAPH_LEVC_HIST_START_FULL`	Start the algorithm from an initial state where each connected component is a separate community.
`IGRAPH_LEVC_HIST_START_GIVEN`	Start the algorithm from a given community structure. The next value in the vector contains the initial number of communities.
`IGRAPH_LEVC_HIST_SPLIT`	Split a community into two communities. The id of the splitted community is given in the next element of the history vector. The id of the first new community is the same as the id of the splitted community. The id of the second community equals to the number of communities before the split.
`IGRAPH_LEVC_HIST_FAILED`	Tried to split a community, but it was not worth it, as it does not result in a bigger modularity value. The id of the community is given in the next element of the vector.

callback:

A null pointer or a function of type igraph_community_leading_eigenvector_callback_t. If given, this callback function is called after each eigenvector/eigenvalue calculation. If the callback returns IGRAPH_STOP, then the community finding algorithm stops. If it returns IGRAPH_SUCCESS, the algorithm continues normally. Any other return value is considered an igraph error code and will terminete the algorithm with the same error code. See the arguments passed to the callback at the documentation of igraph_community_leading_eigenvector_callback_t.

callback_extra:

Extra argument to pass to the callback function.

Returns:

Error code.

See also:

igraph_community_walktrap() and igraph_community_spinglass() for other community structure detection methods.

Time complexity: O(|E|+|V|^2*steps), |V| is the number of vertices, |E| the number of edges, “steps” the number of splits performed.

3.2. `igraph_community_leading_eigenvector_callback_t` — Callback for the leading eigenvector community finding method.

typedef igraph_error_t igraph_community_leading_eigenvector_callback_t(
    const igraph_vector_int_t *membership,
    igraph_int_t comm,
    igraph_real_t eigenvalue,
    const igraph_vector_t *eigenvector,
    igraph_arpack_function_t *arpack_multiplier,
    void *arpack_extra,
    void *extra);

The leading eigenvector community finding implementation in igraph is able to call a callback function, after each eigenvalue calculation. This callback function must be of igraph_community_leading_eigenvector_callback_t type. The following arguments are passed to the callback:

Arguments:

`membership`:	The actual membership vector, before recording the potential change implied by the newly found eigenvalue.
`comm`:	The id of the community that the algorithm tried to split in the last iteration. The community IDs are indexed from zero here!
`eigenvalue`:	The eigenvalue the algorithm has just found.
`eigenvector`:	The eigenvector corresponding to the eigenvalue the algorithm just found.
`arpack_multiplier`:	A function that was passed to `igraph_arpack_rssolve()` to solve the last eigenproblem.
`arpack_extra`:	The extra argument that was passed to the ARPACK solver.
`extra`:	Extra argument that as passed to `igraph_community_leading_eigenvector()`.

3.3. `igraph_le_community_to_membership` — Cut an incomplete dendrogram after a given number of merges, starting with an initial cluster assignment.

igraph_error_t igraph_le_community_to_membership(const igraph_matrix_int_t *merges,
                                                 igraph_int_t steps,
                                                 igraph_vector_int_t *membership,
                                                 igraph_vector_int_t *csize);

This function takes a dendrogram whose leaves are cluster IDs given in an initial cluster assignment provided in membership. Then it updates the cluster assignment by performing the specified number of mergers, as given by the dendrogram encoded in merges. It is a more general version of igraph_community_to_membership(), which assumes that the dendrogram leaves are singleton clusters corresponding to individual vertices.

This dendrogram format is suitable for divise hierarchical community detection algorithms that stop before dividing the graph into individual vertices, such as igraph_community_leading_eigenvector().

Initially, membership is expected to contain m contiguous cluster indices, numbered from zero. These correspond to the leaf nodes of the dendrogram. Row i of the two-column merges matrix contains the IDs of clusters that are merged together into dendrogram node m + i. It may have up to m - 1 rows.

This function performs steps merge operations as prescribed by the merges matrix and updates membership to the resulting partitioning into m - steps communities.

Arguments:

`merges`:	The two-column matrix containing the merge operations. See `igraph_community_leading_eigenvector()` for the detailed syntax. This is usually from the output of the leading eigenvector community structure detection routines.
`steps`:	The number of steps to make according to `merges`.
`membership`:	Initially the starting membership vector, on output the resulting membership vector, after performing `steps` merges.
`csize`:	Optionally the sizes of the communities are stored here, if this is not a null pointer, but an initialized vector.

Returns:

Error code.

See also:

igraph_community_to_membership() for a simpler interface that starts by merging individual vertices.

Time complexity: O(|V|), the number of vertices.

4. Walktrap: Community structure based on random walks

4.1. igraph_community_walktrap — Community finding using a random walk based similarity measure.

4.1. `igraph_community_walktrap` — Community finding using a random walk based similarity measure.

igraph_error_t igraph_community_walktrap(const igraph_t *graph,
                              const igraph_vector_t *weights,
                              igraph_int_t steps,
                              igraph_matrix_int_t *merges,
                              igraph_vector_t *modularity,
                              igraph_vector_int_t *membership);

This function is the implementation of the Walktrap community finding algorithm, see Pascal Pons, Matthieu Latapy: Computing communities in large networks using random walks, https://arxiv.org/abs/physics/0512106

Currently the original C++ implementation is used in igraph, see https://www-complexnetworks.lip6.fr/~latapy/PP/walktrap.html We are grateful to Matthieu Latapy and Pascal Pons for providing this source code.

In contrast to the original implementation, isolated vertices are allowed in the graph and they are assumed to have a single incident loop edge with weight 1.

Arguments:

`graph`:	The input graph, edge directions are ignored.
`weights`:	Numeric vector giving the weights of the edges. If it is a NULL pointer then all edges will have equal weights. The weights are expected to be positive.
`steps`:	Integer constant, the length of the random walks. Typically, good results are obtained with values between 3-8 with 4-5 being a reasonable default.
`merges`:	Pointer to a matrix, the merges performed by the algorithm will be stored here (if not `NULL`). Each merge is a row in a two-column matrix and contains the IDs of the merged clusters. Clusters are numbered from zero and cluster numbers smaller than the number of nodes in the network belong to the individual vertices as singleton clusters. In each step a new cluster is created from two other clusters and its id will be one larger than the largest cluster id so far. This means that before the first merge we have `n` clusters (the number of vertices in the graph) numbered from zero to `n - 1`. The first merge creates cluster `n`, the second cluster `n + 1`, etc.
`modularity`:	Pointer to a vector. If not `NULL` then the modularity score of the current clustering is stored here after each merge operation.
`membership`:	Pointer to a vector. If not a `NULL` pointer, then the membership vector corresponding to the maximal modularity score is stored here.

Returns:

Error code.

Time complexity: O(|E||V|^2) in the worst case, O(|V|^2 log|V|) typically, |V| is the number of vertices, |E| is the number of edges.

Example 25.3. File examples/simple/walktrap.c

#include <igraph.h>

int main(void) {
    igraph_t g;
    igraph_matrix_int_t merges;
    igraph_vector_t modularity;
    igraph_int_t no_of_nodes;
    igraph_int_t i;

    /* Initialize the library. */
    igraph_setup();

    igraph_small(&g, 5, IGRAPH_UNDIRECTED,
                 0, 1, 0, 2, 0, 3, 0, 4, 1, 2, 1, 3, 1, 4, 2, 3, 2, 4, 3, 4,
                 5, 6, 5, 7, 5, 8, 5, 9, 6, 7, 6, 8, 6, 9, 7, 8, 7, 9, 8, 9, 0, 5, 4, 9, -1);
    igraph_vector_init(&modularity, 0);
    igraph_matrix_int_init(&merges, 0, 0);

    igraph_community_walktrap(&g,
                              NULL /* no weights */,
                              4 /* steps */,
                              &merges, &modularity,
                              /* membership=*/ NULL);

    no_of_nodes = igraph_vcount(&g);
    printf("Merges:\n");
    for (i = 0; i < igraph_matrix_int_nrow(&merges); i++) {
        printf("%2.1" IGRAPH_PRId " + %2." IGRAPH_PRId " -> %2." IGRAPH_PRId " (modularity %4.2f)\n",
               MATRIX(merges, i, 0), MATRIX(merges, i, 1),
               no_of_nodes + i, VECTOR(modularity)[i]);
    }

    igraph_destroy(&g);

    igraph_matrix_int_destroy(&merges);
    igraph_vector_destroy(&modularity);

    return 0;
}

5. Edge betweenness based community detection

5.1. igraph_community_edge_betweenness — Community finding based on edge betweenness.
5.2. igraph_community_eb_get_merges — Calculating the merges, i.e. the dendrogram for an edge betweenness community structure.

5.1. `igraph_community_edge_betweenness` — Community finding based on edge betweenness.

igraph_error_t igraph_community_edge_betweenness(const igraph_t *graph,
                                      igraph_vector_int_t *removed_edges,
                                      igraph_vector_t *edge_betweenness,
                                      igraph_matrix_int_t *merges,
                                      igraph_vector_int_t *bridges,
                                      igraph_vector_t *modularity,
                                      igraph_vector_int_t *membership,
                                      igraph_bool_t directed,
                                      const igraph_vector_t *weights,
                                      const igraph_vector_t *lengths);

Community structure detection based on the betweenness of the edges in the network. This method is also known as the Girvan-Newman algorithm.

The idea behind this method is that the betweenness of the edges connecting two communities is typically high, as many of the shortest paths between vertices in separate communities pass through them. The algorithm successively removes edges with the highest betweenness, recalculating betweenness values after each removal. This way eventually the network splits into two components, then one of these components splits again, and so on, until all edges are removed. The resulting hierarhical partitioning of the vertices can be encoded as a dendrogram.

In directed graphs, when directed is set to true, the directed version of betweenness and modularity are used, however, only splits into weakly connected components are detected.

When edge weights are given, the ratio of betweenness and weight values is used to choose which edges to remove first, as described in M. E. J. Newman: Analysis of Weighted Networks (2004), Section C. Thus, edges with large weights are treated as strong connections, and will be removed later than weak connections having similar betweenness. Weights are also used for calculating modularity.

If lengths are given, they will be considered for shortest path length calculations while computing betweenness values.

Note: In igraph 0.10, this function interpreted weights in a different, erroneous way, and issued a warning when weights were used. Please see https://github.com/igraph/igraph/issues/2229 for additional details.

References:

M. Girvan and M. E. J. Newman, Community Structure in Social and Biological Networks, PNAS 99, 7821 (2002). https://doi.org/10.1073/pnas.122653799

M. E. J. Newman, Analysis of Weighted Networks, Phys. Rev. E 70, 9 (2004). https://doi.org/10.1103/PhysRevE.70.056131

Arguments:

`graph`:	The input graph.
`removed_edges`:	Pointer to an initialized integer vector, which will be resized as needed. The IDs of the removed edges in the order of their removal will be stored here. This vector is suitable as input to `igraph_community_eb_get_merges()`. This parameter may be `NULL` if the edge IDs are not needed by the caller.
`edge_betweenness`:	Pointer to an initialized vector or `NULL`. In the former case the edge betweenness of the removed edges is stored here. The vector will be resized as needed. Note that the betweenness values stored here are not divided by weights.
`merges`:	Pointer to an initialized matrix or `NULL`. If not `NULL` then merges performed by the algorithm are stored here. Even if this is a divisive algorithm, we can replay it backwards and note which two clusters were merged. Clusters are numbered from zero. See `igraph_community_to_membership()` for details. The matrix will be resized as needed.
`bridges`:	Pointer to an initialized vector of `NULL`. If not `NULL` then the indices into `result` of all edges which caused one of the `merges` will be put here. This is equivalent to all edge removals which separated the network into more components, in reverse order.
`modularity`:	If not a null pointer, then the modularity values of the different divisions are stored here, in the order corresponding to the merge matrix. The modularity values will take weights into account if `weights` is not null.
`membership`:	If not a null pointer, then the membership vector, corresponding to the highest modularity value, is stored here.
`directed`:	Boolean constant. Controls whether to calculate directed betweenness (i.e. directed paths) for directed graphs, and whether to use the directed version of modularity. It is ignored for undirected graphs.
`weights`:	An optional vector containing edge weights. If not `NULL`, the weights will be used to divide the edge betweenness scores, as well as for the calculation of modularity.
`lengths`:	An optional vector containing edge lengths. If not `NULL`, path lengths used in the betweenness calculation will take these lengths into account.

Returns:

Error code.

Time complexity: O(|V||E|^2), as the betweenness calculation requires O(|V||E|) and we do it |E|-1 times.

Example 25.4. File examples/simple/igraph_community_edge_betweenness.c

#include <igraph.h>

int main(void) {
    igraph_t graph;
    igraph_vector_int_t edges;
    igraph_vector_t eb, weights;
    igraph_matrix_int_t merges;
    igraph_vector_int_t bridges;

    /* Initialize the library. */
    igraph_setup();

    igraph_small(&graph, 5, IGRAPH_UNDIRECTED,
                 0, 1, 1, 2, 0, 2, 0, 3, 1, 3, 1, 4,
                 -1);
    igraph_vector_init_range(&weights, 1, igraph_ecount(&graph)+1);
    igraph_vector_init(&eb, 0);
    igraph_vector_int_init(&edges, 0);
    igraph_matrix_int_init(&merges, 0, 0);
    igraph_vector_int_init(&bridges, 0);
    igraph_community_edge_betweenness(&graph, &edges, &eb, &merges,
                                      &bridges, /*modularity*/ NULL,
                                      /*membership*/ NULL,
                                      IGRAPH_UNDIRECTED,
                                      &weights,
                                      NULL);
    printf("edges:\n");
    igraph_vector_int_print(&edges);
    printf("edge betweenness:\n");
    igraph_vector_print(&eb);
    printf("merges:\n");
    igraph_matrix_int_print(&merges);
    printf("bridges:\n");
    igraph_vector_int_print(&bridges);

    igraph_destroy(&graph);
    igraph_vector_int_destroy(&edges);
    igraph_vector_destroy(&eb);
    igraph_vector_destroy(&weights);
    igraph_matrix_int_destroy(&merges);
    igraph_vector_int_destroy(&bridges);

    return 0;
}

5.2. `igraph_community_eb_get_merges` — Calculating the merges, i.e. the dendrogram for an edge betweenness community structure.

igraph_error_t igraph_community_eb_get_merges(const igraph_t *graph,
                                   const igraph_bool_t directed,
                                   const igraph_vector_int_t *edges,
                                   const igraph_vector_t *weights,
                                   igraph_matrix_int_t *res,
                                   igraph_vector_int_t *bridges,
                                   igraph_vector_t *modularity,
                                   igraph_vector_int_t *membership);

This function is handy if you have a sequence of edges which are gradually removed from the network and you would like to know how the network falls apart into separate components. The edge sequence may come from the igraph_community_edge_betweenness() function, but this is not necessary. Note that igraph_community_edge_betweenness() can also calculate the dendrogram, via its merges argument. Merges happen when the edge removal process is run backwards and two components become connected.

Arguments:

`graph`:	The input graph.
`directed`:	Whether to use the directed or undirected version of modularity. Will be ignored for undirected graphs.
`edges`:	Vector containing the edges to be removed from the network, all edges are expected to appear exactly once in the vector.
`weights`:	An optional vector containing edge weights. If null, the unweighted modularity scores will be calculated. If not null, the weighted modularity scores will be calculated. Ignored if both `modularity` and `membership` are `NULL` pointers.
`res`:	Pointer to an initialized matrix, if not `NULL` then the dendrogram will be stored here, in the same form as for the `igraph_community_walktrap()` function: the matrix has two columns and each line is a merge given by the IDs of the merged components. The component IDs are numbered from zero and component IDs smaller than the number of vertices in the graph belong to individual vertices. The non-trivial components containing at least two vertices are numbered from `n`, where `n` is the number of vertices in the graph. So if the first line contains `a` and `b` that means that components `a` and `b` are merged into component `n`, the second line creates component `n + 1`, etc. The matrix will be resized as needed.
`bridges`:	Pointer to an initialized vector of `NULL`. If not `NULL` then the indices into `edges` of all edges which caused one of the merges will be put here. This is equal to all edge removals which separated the network into more components, in reverse order.
`modularity`:	If not a null pointer, then the modularity values for the different divisions, corresponding to the merges matrix, will be stored here.
`membership`:	If not a null pointer, then the membership vector for the best division (in terms of modularity) will be stored here.

Returns:

Error code.

See also:

igraph_community_edge_betweenness().

Time complexity: O(|E|+|V|log|V|), |V| is the number of vertices, |E| is the number of edges.

6. Community structure based on the optimization of modularity

6.1. igraph_community_fastgreedy — Finding community structure by greedy optimization of modularity.
6.2. igraph_community_multilevel — Finding community structure by multi-level optimization of modularity (Louvain).
6.3. igraph_community_leiden — Finding community structure using the Leiden algorithm.
6.4. igraph_community_leiden_simple — Finding community structure using the Leiden algorithm, simple interface.

6.1. `igraph_community_fastgreedy` — Finding community structure by greedy optimization of modularity.

igraph_error_t igraph_community_fastgreedy(const igraph_t *graph,
                                const igraph_vector_t *weights,
                                igraph_matrix_int_t *merges,
                                igraph_vector_t *modularity,
                                igraph_vector_int_t *membership);

This function implements the fast greedy modularity optimization algorithm for finding community structure, see A Clauset, MEJ Newman, C Moore: Finding community structure in very large networks, http://www.arxiv.org/abs/cond-mat/0408187 for the details.

Some improvements proposed in K Wakita, T Tsurumi: Finding community structure in mega-scale social networks, http://www.arxiv.org/abs/cs.CY/0702048v1 have also been implemented.

Arguments:

`graph`:	The input graph. It must be a graph without multiple edges. This is checked and an error message is given for graphs with multiple edges.
`weights`:	Potentially a numeric vector containing edge weights. Supply a null pointer here for unweighted graphs. The weights are expected to be non-negative.
`merges`:	Pointer to an initialized matrix or `NULL`, the result of the computation is stored here as a merges matrix representing a dendrogram. The matrix has two columns and each merge corresponds to one merge, the IDs of the two merged components are stored. The component IDs are numbered from zero and the first `n` components are the individual vertices, `n` is the number of vertices in the graph. Component `n` is created in the first merge, component `n+1` in the second merge, etc. The matrix will be resized as needed. If this argument is `NULL` then it is ignored completely.
`modularity`:	Pointer to an initialized vector or `NULL` pointer, in the former case the modularity scores along the stages of the computation are recorded here. The vector will be resized as needed.
`membership`:	Pointer to a vector. If not a null pointer, then the membership vector corresponding to the best split (in terms of modularity) is stored here.

Returns:

Error code.

See also:

igraph_community_to_membership() to cut the dendrogram at an arbitrary number of steps.

Time complexity: O(|E||V|log|V|) in the worst case, O(|E|+|V|log^2|V|) typically, |V| is the number of vertices, |E| is the number of edges.

Example 25.5. File examples/simple/igraph_community_fastgreedy.c

#include <igraph.h>

int main(void) {
    igraph_t graph;
    igraph_vector_t weights;
    igraph_matrix_int_t merges;

    /* Initialize the library. */
    igraph_setup();

    igraph_matrix_int_init(&merges, 0, 0);
    igraph_vector_init_int(&weights, 8, 10, 10, 1, 1, 1, 1, 1, 1);

    igraph_small(&graph, 6, IGRAPH_UNDIRECTED,
                 0,1, 1,2, 2,3, 2,4, 2,5, 3,4, 3,5, 4,5, -1);
    igraph_community_fastgreedy(&graph, &weights, &merges,
                                /*modularity*/ NULL,
                                /*membership=*/ NULL);
    igraph_matrix_int_print(&merges);
    igraph_destroy(&graph);
    igraph_vector_destroy(&weights);
    igraph_matrix_int_destroy(&merges);

    return 0;
}

6.2. `igraph_community_multilevel` — Finding community structure by multi-level optimization of modularity (Louvain).

igraph_error_t igraph_community_multilevel(const igraph_t *graph,
                                           const igraph_vector_t *weights,
                                           const igraph_real_t resolution,
                                           igraph_vector_int_t *membership,
                                           igraph_matrix_int_t *memberships,
                                           igraph_vector_t *modularity);

This function implements a multi-level modularity optimization algorithm for finding community structure, sometimes known as the Louvain algorithm.

The algorithm is based on the modularity measure and a hierarchical approach. Initially, each vertex is assigned to a community on its own. In every step, vertices are re-assigned to communities in a local, greedy way: in a random order, each vertex is moved to the community with which it achieves the highest contribution to modularity. When no vertices can be reassigned, each community is considered a vertex on its own, and the process starts again with the merged communities. The process stops when there is only a single vertex left or when the modularity cannot be increased any more in a step.

The resolution parameter γ allows finding communities at different resolutions. Higher values of the resolution parameter typically result in more, smaller communities. Lower values typically result in fewer, larger communities. The original definition of modularity is retrieved when setting γ=1. Note that the returned modularity value is calculated using the indicated resolution parameter. See igraph_modularity() for more details.

The original version of this function was contributed by Tom Gregorovic.

Reference:

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 10008(10), 6 (2008). https://doi.org/10.1088/1742-5468/2008/10/P10008

Arguments:

`graph`:	The input graph. It must be an undirected graph.
`weights`:	Numeric vector containing edge weights. If `NULL`, every edge has equal weight. The weights are expected to be non-negative.
`resolution`:	Resolution parameter. Must be greater than or equal to 0. Lower values favor fewer, larger communities; higher values favor more, smaller communities. Set it to 1 to use the classical definition of modularity.
`membership`:	The membership vector, the result is returned here. For each vertex it gives the ID of its community. The vector must be initialized and it will be resized accordingly.
`memberships`:	Numeric matrix that will contain the membership vector after each level, if not `NULL`. It must be initialized and it will be resized accordingly.
`modularity`:	Numeric vector that will contain the modularity score after each level, if not `NULL`. It must be initialized and it will be resized accordingly.

Returns:

Error code.

Time complexity: in average near linear on sparse graphs.

Example 25.6. File examples/simple/igraph_community_multilevel.c

#include <igraph.h>

void show_results(igraph_t *g, igraph_vector_int_t *membership, igraph_matrix_int_t *memberships, igraph_vector_t *modularity, FILE* f) {
    igraph_int_t i, j, no_of_nodes = igraph_vcount(g);

    j = igraph_vector_which_max(modularity);
    for (i = 0; i < igraph_vector_int_size(membership); i++) {
        if (VECTOR(*membership)[i] != MATRIX(*memberships, j, i)) {
            fprintf(f, "WARNING: best membership vector element %" IGRAPH_PRId " does not match the best one in the membership matrix\n", i);
        }
    }

    fprintf(f, "Modularities:\n");
    igraph_vector_print(modularity);

    for (i = 0; i < igraph_matrix_int_nrow(memberships); i++) {
        for (j = 0; j < no_of_nodes; j++) {
            fprintf(f, "%" IGRAPH_PRId " ", MATRIX(*memberships, i, j));
        }
        fprintf(f, "\n");
    }

    fprintf(f, "\n");
}

int main(void) {
    igraph_setup();
    igraph_t g;
    igraph_vector_t modularity;
    igraph_vector_int_t edges;
    igraph_vector_int_t membership;
    igraph_matrix_int_t memberships;
    igraph_int_t i, j, k;

    /* Initialize the library. */
    igraph_setup();

    igraph_vector_init(&modularity, 0);
    igraph_vector_int_init(&membership, 0);
    igraph_matrix_int_init(&memberships, 0, 0);

    igraph_rng_seed(igraph_rng_default(), 42);

    /* Unweighted test graph from the paper of Blondel et al */
    igraph_small(&g, 16, IGRAPH_UNDIRECTED,
                 0, 2, 0, 3, 0, 4, 0, 5,
                 1, 2, 1, 4, 1, 7,
                 2, 4, 2, 5, 2, 6,
                 3, 7,
                 4, 10,
                 5, 7, 5, 11,
                 6, 7, 6, 11,
                 8, 9, 8, 10, 8, 11, 8, 14, 8, 15,
                 9, 12, 9, 14,
                 10, 11, 10, 12, 10, 13, 10, 14,
                 11, 13,
                 -1);
    igraph_community_multilevel(&g, 0, 1, &membership, &memberships, &modularity);
    show_results(&g, &membership, &memberships, &modularity, stdout);

    /* Higher resolution */
    igraph_community_multilevel(&g, 0, 1.5, &membership, &memberships, &modularity);
    show_results(&g, &membership, &memberships, &modularity, stdout);

    igraph_destroy(&g);

    /* Ring of 30 cliques */
    igraph_vector_int_init(&edges, 0);
    for (i = 0; i < 30; i++) {
        for (j = 0; j < 5; j++) {
            for (k = j + 1; k < 5; k++) {
                igraph_vector_int_push_back(&edges, i * 5 + j);
                igraph_vector_int_push_back(&edges, i * 5 + k);
            }
        }
    }
    for (i = 0; i < 30; i++) {
        igraph_vector_int_push_back(&edges, i * 5 % 150);
        igraph_vector_int_push_back(&edges, (i * 5 + 6) % 150);
    }
    igraph_create(&g, &edges, 150, 0);
    igraph_community_multilevel(&g, 0, 1, &membership, &memberships, &modularity);
    show_results(&g, &membership, &memberships, &modularity, stdout);
    igraph_destroy(&g);

    igraph_vector_destroy(&modularity);
    igraph_vector_int_destroy(&membership);
    igraph_vector_int_destroy(&edges);
    igraph_matrix_int_destroy(&memberships);

    return 0;
}

6.3. `igraph_community_leiden` — Finding community structure using the Leiden algorithm.

igraph_error_t igraph_community_leiden(
        const igraph_t *graph,
        const igraph_vector_t *edge_weights,
        const igraph_vector_t *vertex_out_weights,
        const igraph_vector_t *vertex_in_weights,
        igraph_real_t resolution,
        igraph_real_t beta,
        igraph_bool_t start,
        igraph_int_t n_iterations,
        igraph_vector_int_t *membership,
        igraph_int_t *nb_clusters,
        igraph_real_t *quality);

This function implements the Leiden algorithm for finding community structure.

It is similar to the multilevel algorithm, often called the Louvain algorithm, but it is faster and yields higher quality solutions. It can optimize both modularity and the Constant Potts Model, which does not suffer from the resolution-limit (see Traag, Van Dooren & Nesterov).

The Leiden algorithm consists of three phases: (1) local moving of vertices, (2) refinement of the partition and (3) aggregation of the network based on the refined partition, using the non-refined partition to create an initial partition for the aggregate network. In the local move procedure in the Leiden algorithm, only vertices whose neighborhood has changed are visited. Only moves that strictly improve the quality function are made. The refinement is done by restarting from a singleton partition within each cluster and gradually merging the subclusters. When aggregating, a single cluster may then be represented by several vertices (which are the subclusters identified in the refinement).

The Leiden algorithm provides several guarantees. The Leiden algorithm is typically iterated: the output of one iteration is used as the input for the next iteration. At each iteration all clusters are guaranteed to be (weakly) connected and well-separated. After an iteration in which nothing has changed, all vertices and some parts are guaranteed to be locally optimally assigned. Note that even if a single iteration did not result in any change, it is still possible that a subsequent iteration might find some improvement. Each iteration explores different subsets of vertices to consider for moving from one cluster to another. Finally, asymptotically, all subsets of all clusters are guaranteed to be locally optimally assigned. For more details, please see Traag, Waltman & van Eck (2019).

The objective function being optimized is

1 / 2m sum_ij (A_ij - γ n_i n_j) δ(s_i, s_j)

in the undirected case and

1 / m sum_ij (A_ij - γ n^out_i n^in_j) δ(s_i, s_j)

in the directed case. Here m is the total edge weight, A_ij is the weight of edge (i, j), γ is the so-called resolution parameter, n_i is the vertex weight of vertex i (separate out- and in-weights are used with directed graphs), s_i is the cluster of vertex i and δ(x, y) = 1 if and only if x = y and 0 otherwise.

By setting n_i = k_i, the degree of vertex i, and dividing γ by 2m (by m in the directed case), we effectively obtain an expression for modularity. Hence, the standard modularity will be optimized when you supply the degrees (out- and in-degrees with directed graphs) as the vertex weights and by supplying as a resolution parameter 1/(2m) (1/m with directed graphs). Use the igraph_community_leiden_simple() convenience function to compute vertex weights automatically for modularity maximization.

References:

V. A. Traag, L. Waltman, N. J. van Eck: From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1), 5233 (2019). http://dx.doi.org/10.1038/s41598-019-41695-z

V. A. Traag, P. Van Dooren, and Y. Nesterov: Narrow scope for resolution-limit-free community detection. Phys. Rev. E 84, 016114 (2011). https://doi.org/10.1103/PhysRevE.84.016114

Arguments:

`graph`:	The input graph.
`edge_weights`:	Numeric vector containing edge weights. If `NULL`, every edge has equal weight of 1. The weights need not be non-negative.
`vertex_out_weights`:	Numeric vector containing vertex weights, or vertex out-weights for directed graphs. If `NULL`, every vertex has equal weight of 1.
`vertex_in_weights`:	Numeric vector containing vertex in-weights for directed graphs. If set to `NULL`, in-weights are assumed to be the same as out-weights, which effectively ignores edge directions. Must be `NULL` for undirected graphs.
`n_iterations`:	Iterate the core Leiden algorithm the indicated number of times. If this is a negative number, it will continue iterating until an iteration did not change the clustering. Two iterations are often sufficient, thus 2 is a reasonable default.
`beta`:	The randomness used in the refinement step when merging. A small amount of randomness (`beta` = 0.01) typically works well.
`start`:	Start from membership vector. If this is true, the optimization will start from the provided membership vector. If this is false, the optimization will start from a singleton partition.
`n_iterations`:	Iterate the core Leiden algorithm for the indicated number of times. If this is a negative number, it will continue iterating until an iteration did not change the clustering.
`membership`:	The membership vector. This is both used as the initial membership from which optimisation starts and is updated in place. It must hence be properly initialized. When finding clusters from scratch it is typically started using a singleton clustering. This can be achieved using `igraph_vector_int_init_range()`.
`nb_clusters`:	The number of clusters contained in the final `membership`. If `NULL`, the number of clusters will not be returned.
`quality`:	The quality of the partition, in terms of the objective function as included in the documentation. If `NULL` the quality will not be calculated.

Returns:

Error code.

Time complexity: near linear on sparse graphs.

See also:

igraph_community_leiden_simple() for a simplified interface that allows specifying an objective function directly and does not require vertex weights.

Example 25.7. File examples/simple/igraph_community_leiden.c

#include <igraph.h>
#include <stdio.h>

int main(void) {
    igraph_t graph;
    igraph_vector_int_t membership;
    igraph_vector_t vertex_weights;
    igraph_vector_t vertex_out_weights, vertex_in_weights;
    igraph_int_t nb_clusters;
    igraph_real_t quality;

    /* Initialize the library. */
    igraph_setup();

    /* Set default seed to get reproducible results */
    igraph_rng_seed(igraph_rng_default(), 0);

    /* UNDIRECTED EXAMPLE */

    /* Simple unweighted undirected graph */
    igraph_small(&graph, 10, IGRAPH_UNDIRECTED,
                 0, 1, 0, 2, 0, 3, 0, 4, 1, 2, 1, 3, 1, 4, 2, 3, 2, 4, 3, 4,
                 5, 6, 5, 7, 5, 8, 5, 9, 6, 7, 6, 8, 6, 9, 7, 8, 7, 9, 8, 9,
                 0, 5, -1);

    /* Perform Leiden algorithm using CPM for 1 iteration */
    igraph_vector_int_init(&membership, igraph_vcount(&graph));
    igraph_community_leiden(&graph, NULL, NULL, NULL,
                            /* resolution */ 0.05,
                            /* beta */ 0.01,
                            /* start */ false,
                            /* iterations */ 1,
                            &membership, &nb_clusters, &quality);

    printf("Leiden found %" IGRAPH_PRId " clusters using CPM (resolution parameter 0.05), quality is %.4f.\n", nb_clusters, quality);
    printf("Membership: ");
    igraph_vector_int_print(&membership);
    printf("\n");

    /* Start from existing membership for 10 iterations to improve it further */
    igraph_community_leiden(&graph, NULL, NULL, NULL,
                            /* resolution */ 0.05,
                            /* beta */ 0.01,
                            /* start */ true,
                            /* iterations */ 10,
                            &membership, &nb_clusters, &quality);

    printf("Iterated Leiden, using CPM (resolution parameter 0.05), quality is %.4f.\n", quality);
    printf("Membership: ");
    igraph_vector_int_print(&membership);
    printf("\n");

    /* Use degrees as vertex weights for optimizing modularity */
    igraph_vector_init(&vertex_weights, igraph_vcount(&graph));
    igraph_strength(&graph, &vertex_weights, igraph_vss_all(), IGRAPH_ALL, IGRAPH_LOOPS, NULL);

    /* Perform Leiden algorithm using modularity until stable iteration */
    igraph_community_leiden(&graph, NULL, &vertex_weights, NULL,
                            /* resolution */ 1.0 / (2 * igraph_ecount(&graph)),
                            /* beta */ 0.01,
                            /* start */ false,
                            /* iterations */ -1,
                            &membership, &nb_clusters, &quality);

    printf("Leiden found %" IGRAPH_PRId " clusters using modularity, quality is %.4f.\n", nb_clusters, quality);
    printf("Membership: ");
    igraph_vector_int_print(&membership);
    printf("\n");

    igraph_vector_destroy(&vertex_weights);
    igraph_vector_int_destroy(&membership);
    igraph_destroy(&graph);

    /* DIRECTED EXAMPLE */

    /* Simple unweighted directed graph */
    igraph_small(&graph, 6, IGRAPH_DIRECTED,
                 0, 1, 0, 3, 0, 5, 1, 3, 1, 4, 2, 3, 4, 1, 5, 2, 5, 4,
                 -1);

    igraph_vector_int_init(&membership, igraph_vcount(&graph));
    igraph_vector_init(&vertex_out_weights, igraph_vcount(&graph));
    igraph_vector_init(&vertex_in_weights, igraph_vcount(&graph));
    igraph_strength(&graph, &vertex_out_weights, igraph_vss_all(), IGRAPH_OUT, IGRAPH_LOOPS, NULL);
    igraph_strength(&graph, &vertex_in_weights, igraph_vss_all(), IGRAPH_IN, IGRAPH_LOOPS, NULL);

    /* Perform Leiden algorithm using modularity for two iterations,
     * which are usually sufficient to achieve stability.
     * Note that in the directed case, we divide by the edge count m, not 2*m. */
    igraph_community_leiden(&graph, NULL, &vertex_out_weights, &vertex_in_weights,
                            /* resolution */ 1.0 / igraph_ecount(&graph),
                            /* beta */ 0.01,
                            /* start */ false,
                            /* iterations */ 2,
                            &membership, &nb_clusters, &quality);

    printf("Leiden found %" IGRAPH_PRId " clusters using modularity, quality is %.4f.\n", nb_clusters, quality);
    printf("Membership: ");
    igraph_vector_int_print(&membership);
    printf("\n");

    igraph_vector_destroy(&vertex_in_weights);
    igraph_vector_destroy(&vertex_out_weights);
    igraph_vector_int_destroy(&membership);
    igraph_destroy(&graph);

    return 0;
}

6.4. `igraph_community_leiden_simple` — Finding community structure using the Leiden algorithm, simple interface.

igraph_error_t igraph_community_leiden_simple(
        const igraph_t *graph,
        const igraph_vector_t *weights,
        igraph_leiden_objective_t objective,
        igraph_real_t resolution,
        igraph_real_t beta,
        igraph_bool_t start,
        igraph_int_t n_iterations,
        igraph_vector_int_t *membership,
        igraph_int_t *nb_clusters,
        igraph_real_t *quality);

This is a simplified interface to igraph_community_leiden() for convenience purposes. Instead of requiring vertex weights, it allows choosing from a set of objective functions to maximize. It implements these objective functions by passing suitable vertex weights to igraph_community_leiden(), as explained in the documentation of that function.

Arguments:

graph:

The input graph. May be directed or undirected.

weights:

The edge weights. If NULL, all weights are assumed to be 1.

objective:

The objective function to maximize.

`IGRAPH_LEIDEN_OBJECTIVE_MODULARITY`	Use the generalized modularity, defined as `Q = 1/(2m) sum_ij (A_ij - γ k_i k_j / (2m)) δ(c_i, c_j)` for undirected graphs and as `Q = 1/m sum_ij (A_ij - γ k^out_i k^in_j / m) δ(c_i, c_j)` for directed graphs. This effectively uses a multigraph configuration model as the null model. Edge weights must not be negative.
`IGRAPH_LEIDEN_OBJECTIVE_CPM`	Use the constant Potts model, whose objective function is defined as `Q = 1/(2m) sum_ij (A_ij - γ) δ(c_i, c_j)` for undirected graphs and as `Q = 1/m sum_ij (A_ij - γ) δ(c_i, c_j)` for directed graphs. Edge weights are allowed to be negative. Edge directions have no impact on the result.
`IGRAPH_LEIDEN_OBJECTIVE_ER`	Use an objective function based on the multigraph Erdős-Rényi G(n,p) null model, defined as `Q = 1/(2m) sum_ij (A_ij - γ p) δ(c_i, c_j)` for undirected graphs and as `Q = 1/m sum_ij (A_ij - γ p) δ(c_i, c_j)` for directed graphs. `p` is the weighted density, i.e. the average link strength between all vertex pairs (whether adjacent or not). Edge weights must not be negative. Edge directions have no impact on the result.

In the above formulas, A is the adjacency matrix, m is the total edge weight, k are the (out- and in-) degrees, γ is the resolution parameter, and δ(c_i, c_j) is 1 if vertices i and j are in the same community and 0 otherwise. Edge directions are only relevant with IGRAPH_LEIDEN_OBJECTIVE_MODULARITY. The other two objective functions are equivalent between directed and undirected graphs: the formal difference is due to each edge being included twice in undirected (symmetric) adjacency matrices.

resolution:

The resolution parameter, which is represented by γ in the objective functions detailed above.

beta:

The randomness used in the refinement step when merging. A small amount of randomness (beta = 0.01) typically works well.

start:

Start from membership vector. If this is true, the optimization will start from the provided membership vector. If this is false, the optimization will start from a singleton partition.

n_iterations:

Iterate the core Leiden algorithm the indicated number of times. If this is a negative number, it will continue iterating until an iteration did not change the clustering. Two iterations are often sufficient, thus 2 is a reasonable default.

membership:

The membership vector. If start is set to false, it will be resized appropriately. If start is true, it must be a valid membership vector for the given graph.

nb_clusters:

The number of clusters contained in the final membership. If NULL, the number of clusters will not be returned.

quality:

The quality of the partition, in terms of the objective function selected by objective. If NULL the quality will not be calculated.

Returns:

Error code.

Time complexity: near linear on sparse graphs.

See also:

igraph_community_leiden() for a more flexible interface that allows specifying raw vertex weights.

7. Fluid communities

7.1. igraph_community_fluid_communities — Community detection based on fluids interacting on the graph.

7.1. `igraph_community_fluid_communities` — Community detection based on fluids interacting on the graph.

igraph_error_t igraph_community_fluid_communities(
        const igraph_t *graph,
        igraph_int_t no_of_communities,
        igraph_vector_int_t *membership);

The algorithm is based on the simple idea of several fluids interacting in a non-homogeneous environment (the graph topology), expanding and contracting based on their interaction and density. Weighted graphs are not supported.

This function implements the community detection method described in: Parés F, Gasulla DG, et. al. (2018) Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm. In: Complex Networks & Their Applications VI: Proceedings of Complex Networks 2017 (The Sixth International Conference on Complex Networks and Their Applications), Springer, vol 689, p 229. https://doi.org/10.1007/978-3-319-72150-7_19

Arguments:

`graph`:	The input graph. The graph must be simple and connected. Edge directions will be ignored.
`no_of_communities`:	The number of communities to be found. Must be greater than 0 and fewer than number of vertices in the graph.
`membership`:	The result vector mapping vertices to the communities they are assigned to.

Returns:

Error code.

Time complexity: O(|E|)

8. Label propagation

8.1. igraph_community_label_propagation — Community detection based on label propagation.

8.1. `igraph_community_label_propagation` — Community detection based on label propagation.

igraph_error_t igraph_community_label_propagation(const igraph_t *graph,
        igraph_vector_int_t *membership,
        igraph_neimode_t mode,
        const igraph_vector_t *weights,
        const igraph_vector_int_t *initial,
        const igraph_vector_bool_t *fixed,
        igraph_lpa_variant_t lpa_variant);

This function implements the label propagation-based community detection algorithm described by Raghavan, Albert and Kumara (2007). This version extends the original method by the ability to take edge weights into consideration and also by allowing some labels to be fixed. In addition, it implements the fast label propagation alternative introduced by Traag and Šubelj (2023).

The algorithm works by iterating over nodes and updating the label of a node based on the labels of its neighbors. The labels that are most frequent among the neighbors are said to be dominant labels. The label of a node is always updated to a dominant label. The algorithm guarantees that the label for each is dominant when it terminates.

There are several variants implemented, which work slightly differently with the dominance of labels. Nodes with a dominant label might no longer have a dominant label if one of their neighbors change label. In IGRAPH_LPA_DOMINANCE an additional iteration over all nodes is made after updating all labels to double check whether all nodes indeed have a dominant label. When updating the label of a node, labels are always sampled from among all dominant labels. The algorithm stops when all nodes have dominant labels. In IGRAPH_LPA_RETENTION instead labels are only updated when they are not dominant. That is, they retain their current label whenever the current label is already dominant. The algorithm does not make an additional iteration to check for dominance. Instead, it simply keeps track whether a label has been updated, and terminates if no updates have been made. In IGRAPH_LPA_FAST labels are sampled from among all dominant labels, similar to IGRAPH_LPA_DOMINANCE. Instead of iterating over all nodes, it keeps track of a queue of nodes that should be considered. Nodes are popped from the queue when they are considered for update. When the label of a node is updated, the node's neighbors are added to the queue again (if they weren't already in the queue). The algorithm terminates when the queue is empty. All variants guarantee that the labels for all nodes are dominant.

Weights are taken into account as follows: when the new label of node i is determined, the algorithm iterates over all edges incident on node i and calculate the total weight of edges leading to other nodes with label 0, 1, 2, ..., k - 1 (where k is the number of possible labels). The new label of node i will then be the label whose edges (among the ones incident on node i) have the highest total weight.

For directed graphs, it is important to know that labels can circulate freely only within the strongly connected components of the graph and may propagate in only one direction (or not at all) between strongly connected components. You should treat directed edges as directed only if you are aware of the consequences.

References:

Raghavan, U.N. and Albert, R. and Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76, 036106 (2007). https://doi.org/10.1103/PhysRevE.76.036106

Šubelj, L.: Label propagation for clustering. Chapter in "Advances in Network Clustering and Blockmodeling" edited by P. Doreian, V. Batagelj & A. Ferligoj (Wiley, New York, 2018). https://doi.org/10.1002/9781119483298.ch5 https://arxiv.org/abs/1709.05634

Traag, V. A., and Šubelj, L.: Large network community detection by fast label propagation. Scientific Reports, 13:1, (2023). https://doi.org/10.1038/s41598-023-29610-z https://arxiv.org/abs/2209.13338

Arguments:

graph:

The input graph. Note that the algorithm was originally defined for undirected graphs. You are advised to set mode to IGRAPH_ALL if you pass a directed graph here to treat it as undirected.

membership:

The membership vector, the result is returned here. For each vertex it gives the ID of its community (label).

mode:

Whether to consider edge directions for the label propagation, and if so, which direction the labels should propagate. Ignored for undirected graphs. IGRAPH_ALL means to ignore edge directions (even in directed graphs). IGRAPH_OUT means to propagate labels along the natural direction of the edges. IGRAPH_IN means to propagate labels backwards (i.e. from head to tail). It is advised to set this to IGRAPH_ALL unless you are specifically interested in the effect of edge directions.

weights:

The weight vector, it should contain a positive weight for all the edges.

initial:

The initial state. If NULL, every vertex will have a different label at the beginning. Otherwise it must be a vector with an entry for each vertex. Non-negative values denote different labels, negative entries denote vertices without labels. Unlabeled vertices which are not reachable from any labeled ones will remain unlabeled at the end of the label propagation process, and will be labeled in an additional step to avoid returning negative values in membership. In undirected graphs, this happens when entire connected components are unlabeled. Then, each unlabeled component will receive its own separate label. In directed graphs, the outcome of the additional labeling should be considered undefined and may change in the future; please do not rely on it.

fixed:

Boolean vector denoting which labels are fixed. Of course this makes sense only if you provided an initial state, otherwise this element will be ignored. Note that vertices without labels cannot be fixed. The fixed status will be ignored for these with a warning. Also note that label numbers by themselves have no meaning, and igraph may renumber labels. However, co-membership constraints will be respected: two vertices can be fixed to be in the same or in different communities.

lpa_variant:

Which variant of the label propagation algorithm to run.

`IGRAPH_LPA_DOMINANCE`	check for dominance of all nodes after each iteration.
`IGRAPH_LPA_RETENTION`	keep current label if among dominant labels, only check if labels changed.
`IGRAPH_LPA_FAST`	sample from dominant labels, only check neighbors.

Returns:

Error code.

Time complexity: O(m+n)

Example 25.8. File examples/simple/igraph_community_label_propagation.c

#include <igraph.h>
#include <stdio.h>

int main(void) {
    igraph_t graph;
    igraph_vector_int_t membership;
    igraph_real_t modularity;

    /* Initialize the library. */
    igraph_setup();

    igraph_famous(&graph, "Zachary"); /* We use Zachary's karate club network. */

    /* Label propagation is a stochastic method; the result will depend on the random seed. */
    igraph_rng_seed(igraph_rng_default(), 123);

    /* All igraph functions that returns their result in an igraph_vector_t must be given
       an already initialized vector. */
    igraph_vector_int_init(&membership, 0);
    igraph_community_label_propagation(
        &graph, &membership, /* mode = */ IGRAPH_ALL,
        /* weights= */ NULL, /* initial= */ NULL, /* fixed= */ NULL,
        IGRAPH_LPA_DOMINANCE);

    /* Also calculate the modularity of the partition */
    igraph_modularity(
        &graph, &membership, /* weights= */ NULL, /* resolution = */ 1,
        /* directed= */ false, &modularity);

    printf("%" IGRAPH_PRId " communities found; modularity score is %g.\n",
           igraph_vector_int_max(&membership) + 1, modularity);

    printf("Communities membership: ");
    igraph_vector_int_print(&membership);

    /* Destroy data structures at the end. */
    igraph_vector_int_destroy(&membership);
    igraph_destroy(&graph);

    return 0;
}

9. The InfoMAP algorithm

9.1. igraph_community_infomap — Community structure that minimizes the expected description length of a random walker trajectory.

9.1. `igraph_community_infomap` — Community structure that minimizes the expected description length of a random walker trajectory.

igraph_error_t igraph_community_infomap(
        const igraph_t *graph,
        const igraph_vector_t *edge_weights,
        const igraph_vector_t *vertex_weights,
        igraph_int_t nb_trials,
        igraph_bool_t is_regularized,
        igraph_real_t regularization_strength,
        igraph_vector_int_t *membership,
        igraph_real_t *codelength);

Implementation of the Infomap community detection algorithm of Martin Rosvall and Carl T. Bergstrom. This algorithm takes edge directions into account. For more details, see the visualization of the math and the map generator at https://www.mapequation.org.

Infomap is based on a random walker model similar to PageRank: the walker either chooses out-edges to follow with probabilities proportional to edge weights, or teleports to a random vertex with probability 0.15. Vertex weights can be given to control the probability of choosing different vertices as the target of the teleportation. In addition, Infomap can be regularized to account for potential missing links.

As of igraph 1.0, the Infomap library written by Daniel Edler, Anton Holmgren and Martin Rosvall is used. See https://github.com/mapequation/infomap/.

If you want to specify a random seed (as in the original implementation) you can use igraph_rng_seed().

References:

M. Rosvall and C. T. Bergstrom: Maps of information flow reveal community structure in complex networks, PNAS 105, 1118 (2008). https://dx.doi.org/10.1073/pnas.0706851105, https://arxiv.org/abs/0707.0609

M. Rosvall, D. Axelsson, and C. T. Bergstrom: The map equation, Eur. Phys. J. Special Topics 178, 13 (2009). https://dx.doi.org/10.1140/epjst/e2010-01179-1, https://arxiv.org/abs/0906.1405

Smiljanić, Jelena, Daniel Edler, and Martin Rosvall: Mapping Flows on Sparse Networks with Missing Links. Phys Rev E 102 (1–1): 012302 (2020). https://doi.org/10.1103/PhysRevE.102.012302, https://arxiv.org/abs/2106.14798

Arguments:

`graph`:	The input graph. Edge directions are taken into account.
`edge_weights`:	Numeric vector giving the weights of the edges. The random walker will favour edges with high weights over edges with low weights; the probability of picking a particular outbound edge from a node is directly proportional to its weight. If it is `NULL` then all edges will have equal weights. The weights are expected to be non-negative.
`vertex_weights`:	Numeric vector giving the weights of the vertices. Vertices with higher weights are favoured by the random walker when it teleports to a new vertex. The probability of picking a vertex when the random walker teleports is directly proportional to the weight of the vertex. If this argument is `NULL` then all vertices will have equal weights. Weights are expected to be positive.
`nb_trials`:	The number of attempts to partition the network (can be any integer value equal to or larger than 1).
`is_regularized`:	If true, adds a fully connected Bayesian prior network to avoid overfitting due to missing links.
`regularization_strength`:	Adjust relative strength of the Bayesian prior network used for regularization. This multiplies the default strength, a parameter of 1 hence uses the default regularization strength. Ignored when `is_regularized` is set to `false`.
`membership`:	Pointer to a vector. The membership vector is stored here. `NULL` means that the caller is not interested in the membership vector.
`codelength`:	Pointer to a real. If not `NULL` the code length of the partition is stored here.

Returns:

Error code. When Infomap is not available, IGRAPH_UNIMPLEMENTED is returned.

Time complexity: TODO.

10. Voronoi communities

10.1. igraph_community_voronoi — Finds communities using Voronoi partitioning.

10.1. `igraph_community_voronoi` — Finds communities using Voronoi partitioning.

igraph_error_t igraph_community_voronoi(
        const igraph_t *graph,
        igraph_vector_int_t *membership, igraph_vector_int_t *generators,
        igraph_real_t *modularity,
        const igraph_vector_t *lengths, const igraph_vector_t *weights,
        igraph_neimode_t mode, igraph_real_t r);

Warning

This function is experimental and its signature is not considered final yet. We reserve the right to change the function signature without changing the major version of igraph. Use it at your own risk.

This function finds communities using a Voronoi partitioning of vertices based on the given edge lengths divided by the edge clustering coefficient (igraph_ecc()). The generator vertices are chosen to be those with the largest local relative density within a radius r, with the local relative density of a vertex defined as s m / (m + k), where s is the strength of the vertex, m is the number of edges within the vertex's first order neighborhood, while k is the number of edges with only one endpoint within this neighborhood.

References:

Deritei et al., Community detection by graph Voronoi diagrams, New Journal of Physics 16, 063007 (2014) https://doi.org/10.1088/1367-2630/16/6/063007

Molnár et al., Community Detection in Directed Weighted Networks using Voronoi Partitioning, Scientific Reports 14, 8124 (2024) https://doi.org/10.1038/s41598-024-58624-4

Arguments:

`graph`:	The input graph. It must be simple.
`membership`:	If not `NULL`, the membership of each vertex is returned here.
`generators`:	If not `NULL`, the generator points used for Voronoi partitioning are returned here.
`modularity`:	If not `NULL`, the modularity score of the partitioning is returned here.
`lengths`:	Edge lengths, or `NULL` to consider all edges as having unit length. Voronoi partitioning will use edge lengths equal to lengths / ECC where ECC is the edge clustering coefficient.
`weights`:	Edge weights, or `NULL` to consider all edges as having unit weight. Weights are used when selecting generator points, as well as for computing modularity.
`mode`:	If `IGRAPH_OUT`, distances from generator points to all other nodes are considered. If `IGRAPH_IN`, the reverse distances are used. If `IGRAPH_ALL`, edge directions are ignored. This parameter is ignored for undirected graphs.
`r`:	The radius/resolution to use when selecting generator points. The larger this value, the fewer partitions there will be. Pass in a negative value to automatically select the radius that maximizes modularity.

Returns:

Error code.

See also:

igraph_voronoi(), igraph_ecc().

Time complexity: TODO.

igraph Reference Manual

Chapter 25. Detecting community structure

1. Common functions related to community structure

1.1. igraph_modularity — Calculates the modularity of a graph with respect to some clusters or vertex types.

1.2. igraph_modularity_matrix — Calculates the modularity matrix.

1.3. igraph_community_optimal_modularity — Calculate the community structure with the highest modularity value.

1.4. igraph_community_to_membership — Cut a dendrogram after a given number of merges.

1.5. igraph_reindex_membership — Makes the IDs in a membership vector contiguous.

1.6. igraph_compare_communities — Compares community structures using various metrics.

1.7. igraph_split_join_distance — Calculates the split-join distance of two community structures.

2. Community structure based on statistical mechanics

2.1. igraph_community_spinglass — Community detection based on statistical mechanics.

2.2. igraph_community_spinglass_single — Community of a single node based on statistical mechanics.

3. Community structure based on eigenvectors of matrices

3.1. igraph_community_leading_eigenvector — Leading eigenvector community finding (proper version).

3.2. igraph_community_leading_eigenvector_callback_t — Callback for the leading eigenvector community finding method.

3.3. igraph_le_community_to_membership — Cut an incomplete dendrogram after a given number of merges, starting with an initial cluster assignment.

4. Walktrap: Community structure based on random walks

4.1. igraph_community_walktrap — Community finding using a random walk based similarity measure.

5. Edge betweenness based community detection

5.1. igraph_community_edge_betweenness — Community finding based on edge betweenness.

5.2. igraph_community_eb_get_merges — Calculating the merges, i.e. the dendrogram for an edge betweenness community structure.

6. Community structure based on the optimization of modularity

6.1. igraph_community_fastgreedy — Finding community structure by greedy optimization of modularity.

6.2. igraph_community_multilevel — Finding community structure by multi-level optimization of modularity (Louvain).

6.3. igraph_community_leiden — Finding community structure using the Leiden algorithm.

6.4. igraph_community_leiden_simple — Finding community structure using the Leiden algorithm, simple interface.

7. Fluid communities

7.1. igraph_community_fluid_communities — Community detection based on fluids interacting on the graph.

8. Label propagation

8.1. igraph_community_label_propagation — Community detection based on label propagation.

9. The InfoMAP algorithm

9.1. igraph_community_infomap — Community structure that minimizes the expected description length of a random walker trajectory.

10. Voronoi communities

10.1. igraph_community_voronoi — Finds communities using Voronoi partitioning.

Warning

1.1. `igraph_modularity` — Calculates the modularity of a graph with respect to some clusters or vertex types.

1.2. `igraph_modularity_matrix` — Calculates the modularity matrix.

1.3. `igraph_community_optimal_modularity` — Calculate the community structure with the highest modularity value.

1.4. `igraph_community_to_membership` — Cut a dendrogram after a given number of merges.

1.5. `igraph_reindex_membership` — Makes the IDs in a membership vector contiguous.

1.6. `igraph_compare_communities` — Compares community structures using various metrics.

1.7. `igraph_split_join_distance` — Calculates the split-join distance of two community structures.

2.1. `igraph_community_spinglass` — Community detection based on statistical mechanics.

2.2. `igraph_community_spinglass_single` — Community of a single node based on statistical mechanics.

3.1. `igraph_community_leading_eigenvector` — Leading eigenvector community finding (proper version).

3.2. `igraph_community_leading_eigenvector_callback_t` — Callback for the leading eigenvector community finding method.

3.3. `igraph_le_community_to_membership` — Cut an incomplete dendrogram after a given number of merges, starting with an initial cluster assignment.

4.1. `igraph_community_walktrap` — Community finding using a random walk based similarity measure.

5.1. `igraph_community_edge_betweenness` — Community finding based on edge betweenness.

5.2. `igraph_community_eb_get_merges` — Calculating the merges, i.e. the dendrogram for an edge betweenness community structure.

6.1. `igraph_community_fastgreedy` — Finding community structure by greedy optimization of modularity.

6.2. `igraph_community_multilevel` — Finding community structure by multi-level optimization of modularity (Louvain).

6.3. `igraph_community_leiden` — Finding community structure using the Leiden algorithm.

6.4. `igraph_community_leiden_simple` — Finding community structure using the Leiden algorithm, simple interface.

7.1. `igraph_community_fluid_communities` — Community detection based on fluids interacting on the graph.

8.1. `igraph_community_label_propagation` — Community detection based on label propagation.

9.1. `igraph_community_infomap` — Community structure that minimizes the expected description length of a random walker trajectory.

10.1. `igraph_community_voronoi` — Finds communities using Voronoi partitioning.