5. DIRECTIONS FOR FURTHER RESEARCH xiii

An Ensemble Learning Strategy for Graph Clustering, by Ovelg¨ onne and Geyer-

Schulz, describes the heuristic CGGCi RG, whose main idea is to combine several

weak classifiers into a strong classifier. From the maximal overlap of clusterings

computed by weak classifiers, the algorithm searches for a solution with high quality.

This way diﬃcult choices are deferred after easy decisions have been fixed, which

leads to a high quality due to a better control of the search space traversal. It turns

out that the quality of the initial clusterings is of minor importance for the quality

of the final result given enough iterations.

While graph partitioning is rooted in the parallel computing community, the

picture appears to be different for graph clustering as only two clustering papers

employ significant parallelism. The agglomerative algorithm in Parallel Community

Detection for Massive Graphs, by Riedy, Meyerhenke, Ediger, and Bader, starts

out with each vertex as its own cluster. In each following iteration, beneficial

cluster merges improving the objective function value are identified and performed

in parallel by means of weighted matchings. The implementation is capable of

clustering graphs with a few billion edges in less than 10 minutes on a standard

Intel-based server.

The second paper that uses considerable parallelism to accelerate the solution

process is Graph Coarsening and Clustering on the GPU, by Fagginger Auer and

Bisseling. This paper also uses an agglomerative approach with matchings. It

alleviates the problem of small matchings due to star subgraphs by merging siblings,

i. e., neighbors of neighbors that do not share an edge. High performance is achieved

by careful algorithm design, optimizing the interplay of the CPU and the employed

graphics hardware.

5. Directions for Further Research

In the field of graph partitioning, important directions for further research

mentioned at the workshop are the widespread handling of directed graphs (or un-

symmetric matrices in case of matrix partitioning) and an improved consideration of

the objective function maximum communication volume. One possible approach—

also presented at the workshop—is to use hypergraphs instead of graphs. But this

seems to come at the price of worse performance and/or worse edge cut quality. For

the related problem of repartitioning with migration minimization, highly scalable

tools with a good solution quality are sought.

An active graph clustering research area is the development of objective func-

tions whose optimization leads to realistic and meaningful clusterings. While mod-

ularity has been very popular over recent years, current studies show that its de-

ficiencies can be severe and hard to avoid. The analysis of massive graphs for

clustering purposes is still in its infancy. Only two submissions for the graph clus-

tering challenge made use of significant parallelism. And only one of them was able

to process the largest graph in the challenge core benchmark, a web graph with 3.3

billion edges. Considering the size of today’s online social networks and WWW (to

name a few), there is a need to scale the analysis algorithms to larger input sizes.