Cooperative-based graph clustering

Cooperative Clustering on Graphs (CC/G) is a graph-based clustering approach that combines multiple software clustering algorithms with a cooperative strategy to produce a measurably better and more stable results than any of those algorithms does individually.
CC/G is useful for addressing many graph mining research challenges. It can identify anomalies in graph-based datasets, which is applicable in fields including but not limited to network intrusion, fraud detection, health monitoring, and sensor networks. It can also be used as a technique for subgraph matching in the context of dynamic graphs and datasets that represent the temporal evolution of relationships between entities.
Description
Cooperative clustering on graphs is an unsupervised learning algorithm for clustering a graph networks into k partitions based on intra-cluster density and inter-cluster sparsity. The main idea is to apply different clustering algorithms on the node-edge network. Each algorithm will provide different k clusters. A common agreement among those clusterings is then found. This agreement identifies the minimum number of k clusters that the graph network should have. The last step is to merge the remaining graph elements that exhibited disagreement in-between the clusters that were initially determined using optimum intra-cluster density and inter-cluster sparsity. These steps are illustrated by the following figure.
The algorithm makes no assumptions about the size or the number of clusters. Besides this, the algorithm can make use of multiple clustering criteria functions. Also, It overcomes the drawbacks of clustering algorithms that the result will not vary greatly unless the data change in the same amount when using different clustering criteria.
Illustrative Example
We used CC/G to partition a small mobile game application, written in Java. We chose this mobile game because it is a nicely designed system with a few number of classes. The figure shows the dependency graph for this mobile game. A dependency graph is a directed graph that represents the dependencies of several software artifacts one another. The mobile game application consists of 9 classes. The classes have been given letters for simplicity.
We will start by, partitioning the application dependency graph using three different clustering algorithms: Nearest Ascent Hill Climbing (NAHC), Steepest Ascend Hill Climbing (SAHC) and Bertin’s reoderable matrix. Then, we are going to measure the modularization quality using TurboMQ by Brian S. Mitchell.
The second step, we are going to find the a common agreement among those clusterings. In the last step, we are going to merge the remaining graph elements that exhibited disagreement in-between the clusters that were initially determined.
Apparently, the three clustering algorithms provide different solutions with different modularization quality. Also, they have a disagreement about the position of element e. Here, CC/G, provides a solution for those two problems. CC/G gives a better modularization quality and resolve the disagreement between clustering solutions.
 
< Prev   Next >