Research (Open Access)

Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods

László A Zahoránszky (1), Gyula Y Katona (1), Péter Hári (2), András Málnási-Csizmadia (3), Katharina A Zweig (4) and Gergely Zahoránszky-Köhalmi (2,3)

1. Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, Hungary

2. DELTA Informatika Zrt, Budapest, Hungary

3. Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary

4. Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary


Algorithms for Molecular Biology 2009, 4:12. doi:10.1186/1748-7188-4-12

Published: 19 October 2009

Abstract

Background

Hierarchical clustering methods such as Ward's method have been used for decades to understand biological and chemical data sets. To obtain a partition of the data set, an optimal level of the hierarchy must be chosen by a so-called level selection algorithm. In 2005, Palla et al. introduced a new kind of hierarchical clustering method that differs from Ward's method in two ways: it can be used on data for which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., it allows an item to belong to several clusters. These features are well suited to biological and chemical data sets, but until now no level selection algorithm has been published for this method.
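To make the idea of overlapping clusters concrete, here is a minimal Python sketch of k-clique percolation, the technique usually associated with the 2005 method of Palla et al.: two k-cliques are adjacent if they share k-1 nodes, and a community is the union of a chain of adjacent k-cliques. The function names and the toy graph are illustrative assumptions, not taken from the article, and the brute-force clique enumeration is only practical for very small graphs.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build {node: set(neighbours)} from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """Enumerate all k-cliques by brute force (small graphs only)."""
    cliques = []
    for combo in combinations(sorted(adj), k):
        if all(v in adj[u] for u, v in combinations(combo, 2)):
            cliques.append(frozenset(combo))
    return cliques


def k_clique_communities(adj, k):
    """Merge k-cliques that share k-1 nodes; return sorted node lists."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy graph: a 4-clique {a, b, c, f} and a triangle {c, d, e}
# that meet only in node c.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("a", "f"), ("b", "f"), ("c", "f"),
         ("c", "d"), ("d", "e"), ("c", "e")]
adj = build_adjacency(edges)

print(k_clique_communities(adj, 3))  # node c belongs to both communities
print(k_clique_communities(adj, 4))  # only the 4-clique survives
```

In this toy graph node c is a member of two communities at k = 3, and raising k to 4 leaves only the more tightly connected group. Note that only the edge list is needed as input; no full similarity matrix is required.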

Results

In this article we provide a general selection scheme, the level-independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph-theoretic notion of cohesive clusters. We present results of our method on two data sets: a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method yields a clustering with very good sensitivity and specificity values with respect to a given reference clustering. Moreover, for the PPI data set we show that our graph-theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. Finally, we discuss how the method can be generalized to other hierarchical clustering methods to allow for level-independent cluster selection.
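The abstract does not say how sensitivity and specificity are computed against the reference clustering. One common convention, assumed here purely for illustration, is pair counting: a pair of items is "positive" if some cluster contains both, which also works when clusters overlap. The function names and the example data below are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of item pairs that share at least one cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


def pair_sensitivity_specificity(items, predicted, reference):
    """Pair-counting sensitivity and specificity (assumed convention)."""
    all_pairs = set(combinations(sorted(items), 2))
    pred = co_clustered_pairs(predicted)
    ref = co_clustered_pairs(reference)

    tp = len(pred & ref)              # together in both clusterings
    fn = len(ref - pred)              # together only in the reference
    fp = len(pred - ref)              # together only in the prediction
    tn = len(all_pairs - pred - ref)  # together in neither

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical example: the reference has two clusters overlapping in item 3;
# the predicted clustering misses that overlap.
items = [1, 2, 3, 4, 5]
reference = [{1, 2, 3}, {3, 4, 5}]
predicted = [{1, 2, 3}, {4, 5}]

sens, spec = pair_sensitivity_specificity(items, predicted, reference)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Here the prediction recovers four of the six co-clustered reference pairs and introduces no spurious pair, so sensitivity is 2/3 and specificity is 1.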

Conclusion

Combining our new cluster selection method with the method of Palla et al. yields a new and interesting clustering mechanism that can compute overlapping clusters, which is especially valuable for biological and chemical data sets.

