Compute Astronomical Dendrograms in Heat #2271
brownbaerchen started this conversation in Student projects
Background
Dendrograms are a form of hierarchical clustering commonly used in astrophysics to identify structures in observational data. See the astrodendro documentation for more information about what a dendrogram is and how it is currently computed.
Astrodendro is an excellent piece of software for computing and analyzing dendrograms, but it is slow for large data sets. The algorithm loops through every individual data point and assigns it to its respective structure. As observational data sets grow, Astrodendro is becoming a bottleneck. Speeding this up with Heat's distributed and GPU capabilities would have a large impact on astronomy.
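To make the per-point loop concrete, here is a minimal sketch of a sequential dendrogram computation on a 1D array. It is not Astrodendro's code: the function name `toy_dendrogram_1d` is made up for illustration, and the sketch assumes points are visited from brightest to faintest and ignores any pruning of insignificant structures.

```python
import numpy as np

def toy_dendrogram_1d(data):
    """Toy sequential dendrogram on a 1D array (hypothetical helper, not Astrodendro)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    order = np.argsort(-data, kind="stable")  # brightest point first
    label = [-1] * n   # structure id assigned to each point, -1 = not visited yet
    parent = {}        # structure id -> parent structure id (None for a root)
    next_id = 0

    def root(s):
        # Follow parent links up to the structure that is currently the root.
        while parent[s] is not None:
            s = parent[s]
        return s

    for i in order:  # the explicit loop over every data point
        i = int(i)
        neighbours = {root(label[j]) for j in (i - 1, i + 1)
                      if 0 <= j < n and label[j] >= 0}
        if not neighbours:
            # No visited neighbour: this point is a local maximum and starts a leaf.
            parent[next_id] = None
            label[i] = next_id
            next_id += 1
        elif len(neighbours) == 1:
            # Exactly one adjacent structure: the point joins it.
            label[i] = neighbours.pop()
        else:
            # The point connects two structures: create a branch above them.
            parent[next_id] = None
            for s in neighbours:
                parent[s] = next_id
            label[i] = next_id
            next_id += 1
    return label, parent

labels, parents = toy_dendrogram_1d([1, 3, 2, 4, 1])
```

For this input the two peaks become leaves 0 and 1, and the point between them creates branch 2, which also absorbs the two faint end points. Every iteration depends on the labels written by earlier iterations, which is what makes the loop hard to parallelize as is.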
This project has two steps:
Develop an algorithm for merging local dendrograms into a global one
To use today's multi-core hardware efficiently, we must compute local dendrograms independently of each other and then construct the global one from them. This is non-trivial because a dendrogram is an inherently global clustering: the tree structure of each local dendrogram depends on the structure of all the others. So you need to find a clever way of splitting local structures and merging them to recover the correct global structure.
Develop a vectorized implementation to speed up local dendrogram computation
Vectorized implementations that build on the optimized routines in NumPy or PyTorch dramatically outperform explicit Python loops. However, this requires reworking the core algorithm from the ground up. A vectorized implementation can also run efficiently on GPUs, which promises a large additional speedup.
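As a small illustration of both steps, the sketch below finds local maxima (the seeds of dendrogram leaves) in a 1D array, once with an explicit Python loop and once vectorized with NumPy. This is only one ingredient of a dendrogram, not the full algorithm, and it uses plain NumPy rather than Heat. The function names and the `left`/`right` halo parameters are assumptions made for this example.

```python
import numpy as np

def local_maxima_loop(values):
    """Explicit Python loop: visit every point and compare it with its neighbours."""
    found = []
    n = len(values)
    for i in range(n):
        left = values[i - 1] if i > 0 else -np.inf
        right = values[i + 1] if i < n - 1 else -np.inf
        if values[i] > left and values[i] > right:
            found.append(i)
    return found

def local_maxima_vectorized(values, left=-np.inf, right=-np.inf):
    """Same result with whole-array operations.

    `left` and `right` are the values just outside the array (halo values);
    they default to -inf, which treats the array edges as open boundaries.
    """
    values = np.asarray(values, dtype=float)
    padded = np.concatenate(([left], values, [right]))
    is_max = (values > padded[:-2]) & (values > padded[2:])
    return np.flatnonzero(is_max)

data = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 2.0])
global_maxima = local_maxima_vectorized(data)

# Split the data into two chunks, as a distributed run would.
chunk_a, chunk_b = data[:3], data[3:]
naive_a = local_maxima_vectorized(chunk_a)                     # no neighbour information
halo_a = local_maxima_vectorized(chunk_a, right=chunk_b[0])    # one halo value from chunk_b
```

Without neighbour information, the first chunk reports its last point as a peak even though the global data keeps rising there, while a single halo value removes that spurious peak. Merging full tree structures across chunks is harder than this, since a branch can depend on data far from the chunk boundary.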
Who is this for?
This project is suitable for a Master's thesis. You need to be comfortable with Python and should ideally have some prior experience with distributed computing. If you want to tackle the algorithm rework, you should not be afraid of applied maths.
If you are looking for a challenge that will actually have a meaningful impact, this is the right project for you.
We are looking forward to hearing back from you via a comment on this discussion.