Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leiden not working for Zacharys Karate Club dataset #43

Open
nicarq opened this issue Jul 13, 2024 · 2 comments
Open

Leiden not working for Zacharys Karate Club dataset #43

nicarq opened this issue Jul 13, 2024 · 2 comments
Assignees

Comments

@nicarq
Copy link

nicarq commented Jul 13, 2024

I tried using Leiden for the classic dataset of Zacharys Karate Club but it's not working. I wonder what I'm doing wrong. It didn't cluster anything at all.

This is what I get

Loaded network with 34 nodes
Loaded compact network with 34 nodes
Leiden algorithm did not improve the clustering.
Cluster 11: [11]
Cluster 33: [33]
Cluster 12: [12]
Cluster 18: [18]
Cluster 31: [31]
Cluster 27: [27]
Cluster 21: [21]
Cluster 25: [25]
Cluster 28: [28]
Cluster 29: [29]
Cluster 14: [14]
Cluster 26: [26]
Cluster 6: [6]
Cluster 4: [4]
Cluster 2: [2]
Cluster 8: [8]
Cluster 19: [19]
Cluster 1: [1]
Cluster 15: [15]
Cluster 24: [24]
Cluster 5: [5]
Cluster 10: [10]
Cluster 13: [13]
Cluster 22: [22]
Cluster 9: [9]
Cluster 32: [32]
Cluster 30: [30]
Cluster 7: [7]
Cluster 3: [3]
Cluster 0: [0]
Cluster 16: [16]
Cluster 17: [17]
Cluster 23: [23]
Cluster 20: [20]

the code is here https://github.com/nicarq/example-leiden-rust/tree/main fwiw I may had done something incorrect when using the library because it's strange that it didn't work

@daxpryce
Copy link
Contributor

daxpryce commented Jan 7, 2025

@nicarq I just now have looked at this - I'm sorry for not seeing it sooner! After MSFT stopped actively committing to graspologic-org, I didn't get any notifications about this until just now. I'll take a look at this soon and get back to you, if only so we both know what went wrong, even if it isn't something you're actively looking into any longer

@daxpryce
Copy link
Contributor

daxpryce commented Jan 7, 2025

Rather than go through the way your code did, I really quickly slammed this together in python (but we can make some educated assessments from there):

Rough code, syntax not guaranteed to be accurate:

mkdir quicktest
cd quicktest
uv init
uv add graspologic # I'm too tired to look up the actual version of networkx, so I made graspologic pull it for me. 
uv add ipython
uv run ipython
import graspologic_native as gpn
import networkx as nx
karate = nx.karate_club_graph()
edges = [(str(source), str(target), data["weight"]) for source, target, data in karate.edges(data=True)]
gpn.leiden(edges)

Returned:

(0.4449035812672175,
 {'32': 3,
  '0': 0,
  '4': 1,
  '11': 0,
  '30': 3,
  '9': 3,
  '10': 1,
  '16': 1,
  '7': 0,
  '20': 3,
  '2': 0,
  '17': 0,
  '5': 1,
  '13': 0,
  '33': 3,
  '12': 0,
  '1': 0,
  '18': 3,
  '14': 3,
  '19': 0,
  '22': 3,
  '23': 2,
  '29': 3,
  '24': 2,
  '3': 0,
  '21': 0,
  '15': 3,
  '31': 2,
  '6': 1,
  '25': 2,
  '8': 3,
  '27': 2,
  '26': 3,
  '28': 2})

so, we know it's clustering - the modularity is kinda poor and there's only 4 clusters, but that's still more right than the that you were getting in your example. nobody wants something that useless, so we need to figure out what's up.

My theory is the karate club edge list you are loading is possibly a: repeating edges and effectively overwriting their weights, or b: not summing weights when it should have (it's kinda unclear which it is)

The edges that networkx packages up for karate club are, in (source, target, weight) tuple3's:

In [21]: edges
Out[21]:
[('0', '1', 4),
 ('0', '2', 5),
 ('0', '3', 3),
 ('0', '4', 3),
 ('0', '5', 3),
 ('0', '6', 3),
 ('0', '7', 2),
 ('0', '8', 2),
 ('0', '10', 2),
 ('0', '11', 3),
 ('0', '12', 1),
 ('0', '13', 3),
 ('0', '17', 2),
 ('0', '19', 2),
 ('0', '21', 2),
 ('0', '31', 2),
 ('1', '2', 6),
 ('1', '3', 3),
 ('1', '7', 4),
 ('1', '13', 5),
 ('1', '17', 1),
 ('1', '19', 2),
 ('1', '21', 2),
 ('1', '30', 2),
 ('2', '3', 3),
 ('2', '7', 4),
 ('2', '8', 5),
 ('2', '9', 1),
 ('2', '13', 3),
 ('2', '27', 2),
 ('2', '28', 2),
 ('2', '32', 2),
 ('3', '7', 3),
 ('3', '12', 3),
 ('3', '13', 3),
 ('4', '6', 2),
 ('4', '10', 3),
 ('5', '6', 5),
 ('5', '10', 3),
 ('5', '16', 3),
 ('6', '16', 3),
 ('8', '30', 3),
 ('8', '32', 3),
 ('8', '33', 4),
 ('9', '33', 2),
 ('13', '33', 3),
 ('14', '32', 3),
 ('14', '33', 2),
 ('15', '32', 3),
 ('15', '33', 4),
 ('18', '32', 1),
 ('18', '33', 2),
 ('19', '33', 1),
 ('20', '32', 3),
 ('20', '33', 1),
 ('22', '32', 2),
 ('22', '33', 3),
 ('23', '25', 5),
 ('23', '27', 4),
 ('23', '29', 3),
 ('23', '32', 5),
 ('23', '33', 4),
 ('24', '25', 2),
 ('24', '27', 3),
 ('24', '31', 2),
 ('25', '31', 7),
 ('26', '29', 4),
 ('26', '33', 2),
 ('27', '33', 4),
 ('28', '31', 2),
 ('28', '33', 2),
 ('29', '32', 4),
 ('29', '33', 2),
 ('30', '32', 3),
 ('30', '33', 3),
 ('31', '32', 4),
 ('31', '33', 4),
 ('32', '33', 5)]

Looking at your gml / csv file, it does not have any weights, and it also doesn't have a multigraph that can be described as a weighted undirected graph, and that will be problematic. It looks like your graph in the gml is behaving as a directed graph without weights, and ... while I can't say it's wrong in it's response, I do know that leiden is not really intended to be done on any graph that isn't an Undirected, Weighted, Non-Multigraph (meaning there is precisely one edge from a to b, so if you do in fact have a multigraph, you can sum the weights (or count the edges) and collapse it into a single a <-> b relationship with an aggregate weight)

Again, my apologies for looking at this so late. I'll make sure I put a watch on this repo so I don't miss anything further.

Cheers!

@daxpryce daxpryce self-assigned this Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants