An Ensemble Learning Strategy for Graph Clustering, by Ovelg¨ onne and Geyer-
Schulz, describes the heuristic CGGCi RG, whose main idea is to combine several
weak classifiers into a strong classifier. From the maximal overlap of clusterings
computed by weak classifiers, the algorithm searches for a solution with high quality.
This way difficult choices are deferred after easy decisions have been fixed, which
leads to a high quality due to a better control of the search space traversal. It turns
out that the quality of the initial clusterings is of minor importance for the quality
of the final result given enough iterations.
While graph partitioning is rooted in the parallel computing community, the
picture appears to be different for graph clustering as only two clustering papers
employ significant parallelism. The agglomerative algorithm in Parallel Community
Detection for Massive Graphs, by Riedy, Meyerhenke, Ediger, and Bader, starts
out with each vertex as its own cluster. In each following iteration, beneficial
cluster merges improving the objective function value are identified and performed
in parallel by means of weighted matchings. The implementation is capable of
clustering graphs with a few billion edges in less than 10 minutes on a standard
Intel-based server.
The second paper that uses considerable parallelism to accelerate the solution
process is Graph Coarsening and Clustering on the GPU, by Fagginger Auer and
Bisseling. This paper also uses an agglomerative approach with matchings. It
alleviates the problem of small matchings due to star subgraphs by merging siblings,
i. e., neighbors of neighbors that do not share an edge. High performance is achieved
by careful algorithm design, optimizing the interplay of the CPU and the employed
graphics hardware.
5. Directions for Further Research
In the field of graph partitioning, important directions for further research
mentioned at the workshop are the widespread handling of directed graphs (or un-
symmetric matrices in case of matrix partitioning) and an improved consideration of
the objective function maximum communication volume. One possible approach—
also presented at the workshop—is to use hypergraphs instead of graphs. But this
seems to come at the price of worse performance and/or worse edge cut quality. For
the related problem of repartitioning with migration minimization, highly scalable
tools with a good solution quality are sought.
An active graph clustering research area is the development of objective func-
tions whose optimization leads to realistic and meaningful clusterings. While mod-
ularity has been very popular over recent years, current studies show that its de-
ficiencies can be severe and hard to avoid. The analysis of massive graphs for
clustering purposes is still in its infancy. Only two submissions for the graph clus-
tering challenge made use of significant parallelism. And only one of them was able
to process the largest graph in the challenge core benchmark, a web graph with 3.3
billion edges. Considering the size of today’s online social networks and WWW (to
name a few), there is a need to scale the analysis algorithms to larger input sizes.
Previous Page Next Page