pandemonium: GUI for cluster analysis with interactive visualization
Cluster analysis searches for groups of similar observations in a dataset and can be used to uncover hidden patterns. Many different algorithms have been developed, and they generally come with tuning parameters that can substantially change the clustering solution; exploring these choices can provide different insights into the data. In this project we want to use cluster analysis to understand patterns for observations that are represented in two distinct multivariate spaces. For example, in physics we might be interested in how regions of a multivariate parameter space for a theoretical model map onto patterns in the observable space, which is defined by the model predictions computed for different measurements. Another example is in hydrology, where we may be interested in the connection between biotic (e.g. distribution of landscapes, ecosystems, communities) and abiotic (e.g. topography, climatic variables, extent of surface water) variables. Finally, we can also imagine applications in the exploration of machine learning models, for example exploring the connection between the input variable space and a latent layer in an autoencoder neural network.
Exploring cluster solutions found in one space across the other space will be used to gain insights, and an interactive interface will let the user vary the algorithm settings for a detailed exploration. This will be implemented as an R Shiny app, making use of different strategies for multivariate data visualization in connection with cluster analysis.
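As a rough illustration of what exploring a cluster solution "across the other space" can mean, the sketch below clusters hypothetical toy data in a parameter space and then inspects those clusters in an observable space, both through per-cluster means and by cross-tabulating against an independent clustering of the observables. It uses Python with NumPy and SciPy rather than the R tooling the project targets; the data, the nonlinear map from parameters to observables, and the helper `cluster` are all made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: 60 observations described in two spaces.
rng = np.random.default_rng(1)
n = 60
# "Parameter space" (2 variables): three well-separated groups.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
group = np.repeat([0, 1, 2], n // 3)
params = centers[group] + rng.normal(scale=0.5, size=(n, 2))
# "Observable space" (3 variables): a hypothetical nonlinear map of the parameters plus noise.
obs = np.column_stack([
    params[:, 0] ** 2,
    params[:, 0] + params[:, 1],
    np.sin(params[:, 1]),
]) + rng.normal(scale=0.1, size=(n, 3))

def cluster(X, k, method="ward", metric="euclidean"):
    """Hierarchical clustering of standardized X, cut into at most k groups (labels 1..k)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    tree = linkage(Z, method=method, metric=metric)
    return fcluster(tree, t=k, criterion="maxclust")

# Clusters defined in parameter space.
labels = cluster(params, k=3)

# View the parameter-space clusters from the observable side: per-cluster observable means.
for c in np.unique(labels):
    print(c, obs[labels == c].mean(axis=0).round(2))

# Cross-tabulate against an independent clustering of the observable space.
labels_obs = cluster(obs, k=3)
table = np.zeros((3, 3), dtype=int)
for a, b in zip(labels, labels_obs):
    table[a - 1, b - 1] += 1
print(table)
```

A table with one dominant entry per row would suggest the two spaces induce similar groupings; spread-out rows point to parameter-space clusters that split or merge in observable space.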
A prototype of such a Shiny app has been implemented in the pandemonium R package, available via GitHub (https://github.com/uschiLaa/pandemonium), and is tailored to one specific physics application (see https://doi.org/10.1140/epjp/s13360-021-02310-1 for details). It uses hierarchical clustering, allows interactive selection of tuning parameters such as the distance metric or the linkage method, and includes eight tabs that use different approaches to investigate the cluster solution across the two spaces, for example parallel coordinate plots, tour visualization and non-linear dimension reduction.
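To see why the linkage choice matters, the following Python sketch (SciPy's `linkage`/`fcluster`, not pandemonium code) clusters the same hypothetical data, two tight blobs plus one distant point, with single and with Ward linkage and compares the two partitions with a simple Rand index. The data and the helpers `cut` and `rand_index` are illustrative assumptions; a different distance metric can be passed through `metric=` for methods other than Ward, which SciPy only supports with Euclidean distance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two tight blobs plus one distant point.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(30, 2))
blob_b = rng.normal(loc=(4.0, 0.0), scale=0.3, size=(30, 2))
outlier = np.array([[2.0, 8.0]])
X = np.vstack([blob_a, blob_b, outlier])

def cut(X, method, metric="euclidean", k=2):
    """Hierarchical clustering with the given linkage and metric, cut into k groups."""
    return fcluster(linkage(X, method=method, metric=metric), t=k, criterion="maxclust")

def rand_index(a, b):
    """Fraction of observation pairs on which two partitions agree (together vs apart)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[upper] == same_b[upper]))

# Single linkage joins the blobs via their closest points and leaves the distant point alone.
single = cut(X, "single")
# Ward linkage absorbs the distant point into one blob and splits the two blobs instead.
ward = cut(X, "ward")

print("single cluster sizes:", np.bincount(single)[1:])
print("ward cluster sizes:  ", np.bincount(ward)[1:])
print("Rand index:", round(rand_index(single, ward), 3))
```

Both settings return two clusters, yet they agree on fewer than half of all observation pairs, which is the kind of contrast the interactive parameter selection is meant to expose.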
Starting from the prototype, the aim will be to turn this tailor-made app into a general-purpose tool that can be applied to any setting where cluster analysis can be used to understand connections between two representation spaces of the same observations. In addition, we will further explore which visualizations are useful, in particular the options of including interactive visualizations and slice tour visualizations. The app will be developed as an R package and should be released on CRAN at the end of the project.
By making the exploration of clustering solutions across connected data spaces available in an interactive app, the package will make tools from cluster analysis and multivariate data visualization accessible in a wide range of applications.
Contributors, please contact the mentors listed below after completing at least one of the tests below.
- EVALUATING MENTOR: Ursula Laa ([email protected]) has written the prototype app and has extensive experience with multivariate data visualization and the development of R packages. She has mentored a successful GSoC project in 2024.
- Co-mentor: German Valencia ([email protected]) will provide expertise for physics applications and how clustering and multivariate visualization can be applied in those settings.
- Easy: install the prototype app pandemonium from GitHub and run the included example. Find two clustering settings that produce very different results.
- Medium: Use the detourr R package to generate two different tour paths for any example data and show both tours side by side, with linked brushing between the two displays.
- Hard: Embed the linked display from detourr in the pandemonium app and submit a pull request.
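For orientation on the Medium test, the Python sketch below shows at the data level what linked brushing between two tour displays amounts to: two different 2-D orthonormal projections of the same rows, a rectangular brush in one view that selects row indices, and the same rows highlighted in the other view. It is not detourr code (detourr is an R htmlwidget that handles selection interactively); the data, the brush limits, and the helpers `random_frame` and `brush` are hypothetical, and a real tour path would interpolate smoothly between many such frames rather than show a single one.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))  # hypothetical 5-D example data

def random_frame(p, d=2):
    """Random p x d orthonormal projection basis, like one target frame of a grand tour."""
    q, _ = np.linalg.qr(rng.normal(size=(p, d)))
    return q

A1, A2 = random_frame(5), random_frame(5)
view1, view2 = X @ A1, X @ A2  # two different 2-D projections of the same rows

def brush(view, xlim, ylim):
    """Row indices of points falling inside a rectangular brush drawn on a 2-D view."""
    x, y = view[:, 0], view[:, 1]
    inside = (x >= xlim[0]) & (x <= xlim[1]) & (y >= ylim[0]) & (y <= ylim[1])
    return np.flatnonzero(inside)

# Brushing in the first display selects rows ...
selected = brush(view1, xlim=(0.0, 1.5), ylim=(0.0, 1.5))
# ... and linking means the same rows are highlighted in the second display.
highlight = np.zeros(len(X), dtype=bool)
highlight[selected] = True
print(len(selected), "points brushed; highlighted in view 2:", view2[highlight].shape)
```

The key design point is that the link is carried by row identity, not by screen coordinates, so any number of displays can share one selection mask.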
Contributors, please post a link to your test results here.
- EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.
- Divendra Yadav, GitHub Profile, Easy Solution, Medium Solution, Hard Solution
- Gabriel Mccoy, GitHub Profile, Easy Solution, Medium Solution, Hard Solution
- Jiayi Qian, GitHub link