python-igraph manual

For using igraph from Python

   Home       Trees       Indices       Help   
Package igraph :: Module clustering
[hide private]

Module clustering

source code

Classes related to graph clustering.


License: Copyright (C) 2006-2012 Tamás Nepusz <ntamas@gmail.com> Pázmány Péter sétány 1/a, 1117 Budapest, Hungary This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Classes [hide private]
  Clustering
Class representing a clustering of an arbitrary ordered set.
  VertexClustering
The clustering of the vertex set of a graph.
  Dendrogram
The hierarchical clustering (dendrogram) of some dataset.
  VertexDendrogram
The dendrogram resulting from the hierarchical clustering of the vertex set of a graph.
  Cover
Class representing a cover of an arbitrary ordered set.
  VertexCover
The cover of the vertex set of a graph.
  CohesiveBlocks
The cohesive block structure of a graph.
Functions [hide private]
 
compare_communities(comm1, comm2, method='vi', remove_none=False)
Compares two community structures using various distance measures.
source code
 
split_join_distance(comm1, comm2, remove_none=False)
Calculates the split-join distance between two community structures.
source code
Variables [hide private]
  __package__ = 'igraph'

Imports: deepcopy, izip, pi, StringIO, community_to_membership, property, Configuration, UniqueIdGenerator, ClusterColoringPalette, Histogram, _get_wrapper_for_width, str_to_orientation


Function Details [hide private]

compare_communities(comm1, comm2, method='vi', remove_none=False)

source code 

Compares two community structures using various distance measures.

Parameters:
  • comm1 - the first community structure as a membership list or as a Clustering object.
  • comm2 - the second community structure as a membership list or as a Clustering object.
  • method - the measure to use. "vi" or "meila" means the variation of information metric of Meila (2003), "nmi" or "danon" means the normalized mutual information as defined by Danon et al (2005), "split-join" means the split-join distance of van Dongen (2000), "rand" means the Rand index of Rand (1971), "adjusted_rand" means the adjusted Rand index of Hubert and Arabie (1985).
  • remove_none - whether to remove None entries from the membership lists. This is handy if your Clustering object was constructed using VertexClustering.FromAttribute using an attribute which was not defined for all the vertices. If remove_none is False, a None entry in either comm1 or comm2 will result in an exception. If remove_none is True, None values are filtered away and only the remaining lists are compared.
Returns:
the calculated measure.
Reference:
  • Meila M: Comparing clusterings by the variation of information. In: Scholkopf B, Warmuth MK (eds). Learning Theory and Kernel Machines: 16th Annual Conference on Computational Learning Theory and 7th Kernel Workship, COLT/Kernel 2003, Washington, DC, USA. Lecture Notes in Computer Science, vol. 2777, Springer, 2003. ISBN: 978-3-540-40720-1.
  • Danon L, Diaz-Guilera A, Duch J, Arenas A: Comparing community structure identification. J Stat Mech P09008, 2005.
  • van Dongen D: Performance criteria for graph clustering and Markov cluster experiments. Technical Report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000.
  • Rand WM: Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846-850, 1971.
  • Hubert L and Arabie P: Comparing partitions. Journal of Classification 2:193-218, 1985.

split_join_distance(comm1, comm2, remove_none=False)

source code 

Calculates the split-join distance between two community structures.

The split-join distance is a distance measure defined on the space of partitions of a given set. It is the sum of the projection distance of one partition from the other and vice versa, where the projection number of A from B is if calculated as follows:

  1. For each set in A, find the set in B with which it has the maximal overlap, and take note of the size of the overlap.
  2. Take the sum of the maximal overlap sizes for each set in A.
  3. Subtract the sum from n, the number of elements in the partition.

Note that the projection distance is asymmetric, that's why it has to be calculated in both directions and then added together. This function returns the projection distance of comm1 from comm2 and the projection distance of comm2 from comm1, and returns them in a pair. The actual split-join distance is the sum of the two distances. The reason why it is presented this way is that one of the elements being zero then implies that one of the partitions is a subpartition of the other (and if it is close to zero, then one of the partitions is close to being a subpartition of the other).

Parameters:
  • comm1 - the first community structure as a membership list or as a Clustering object.
  • comm2 - the second community structure as a membership list or as a Clustering object.
  • remove_none - whether to remove None entries from the membership lists. This is handy if your Clustering object was constructed using VertexClustering.FromAttribute using an attribute which was not defined for all the vertices. If remove_none is False, a None entry in either comm1 or comm2 will result in an exception. If remove_none is True, None values are filtered away and only the remaining lists are compared.
Returns:
the projection distance of comm1 from comm2 and vice versa in a tuple. The split-join distance is the sum of the two.

Reference: van Dongen D: Performance criteria for graph clustering and Markov cluster experiments. Technical Report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000.

See Also: compare_communities() with method = "split-join" if you are not interested in the individual projection distances but only the sum of them.


   Home       Trees       Indices       Help