Clustering plays an important role in a multitude of bioinformatics applications, including protein function prediction, population genetics, and gene expression analysis. The results of most clustering algorithms are sensitive to variations in the input data, in the clustering algorithm and its parameters, and across individual datasets. Consensus clustering (CC) extends clustering algorithms by constructing a robust result from those clustering features that remain invariant under these sources of variation. As part of CC, stability scores indicate how reliable the resulting clustering is. Here, we present a review that organizes the CC approaches in the literature into three principal types, introduce and illustrate the concept of stability scores, and demonstrate the use of CC on simulated and real-world gene expression datasets.
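To make the general idea concrete, here is a minimal sketch in Python. It is not the package's implementation or API, and every function name in it is hypothetical. It assumes k-means as the base clustering algorithm and 80% subsampling as the source of variation. It builds a consensus matrix, i.e. the fraction of runs in which two points land in the same cluster, whenever both were sampled. It then derives a final clustering by running k-means on the rows of that matrix. As one simple stability score, it reports the mean within-cluster consensus. These are illustrative choices, not the specific methods reviewed in the manuscript.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (hypothetical helper); returns integer labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def consensus_matrix(X, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair of points co-clusters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # times i and j were placed in the same cluster
    sampled = np.zeros((n, n))   # times i and j were both in the subsample
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        same = labels[:, None] == labels[None, :]
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

def cluster_stability(M, labels):
    """Mean off-diagonal consensus within each cluster (1.0 = fully stable)."""
    scores = {}
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            scores[int(c)] = float("nan")
            continue
        block = M[np.ix_(idx, idx)]
        scores[int(c)] = float(block[~np.eye(len(idx), dtype=bool)].mean())
    return scores

# Usage on two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(8, 1, (20, 2))])
M = consensus_matrix(X, k=2)
final = kmeans(M, 2, np.random.default_rng(0))  # cluster the consensus profiles
print(cluster_stability(M, final))
```

On clearly separated data, both clusters should score close to 1. Lower scores would flag clusters whose membership changes across subsamples.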
Easy-to-use tutorial: Tutorial.ipynb
See the package on CRAN: link
Please cite the following manuscript:
- Behnam Yousefi, Benno Schwikowski, "Consensus Clustering for Robust Bioinformatics Analysis," bioRxiv (2024).