chore: bump dependencies #310
|
Do you have an example of this (the image) that you could add as a comment? |
|
Also, could you mention what computer you are using? (There was something weird with Windows 10 and scikit-learn that we were dealing with 2 years ago but never fixed: #135.) |
|
NixOS 25.05.20250108.bffc22e (Warbler) on x86_64, though I do think this issue is reproducing on CI, so I doubt this is platform dependent. I'll reproduce it. |
|
Lines 83 to 95 in 54bd449
|
|
From what I'm reading, bumping the version is a good idea because it improves PCA in a meaningful way. Previously (in version 1.2), component signs were chosen by looking at the transformed (projected) data, and each solver decided the signs in its own way, so changing the solver could unexpectedly flip component signs. The results still seem to be mathematically correct, but the flips cause confusion. The new approach in version 1.7 chooses signs by inspecting each component vector directly, which makes the choice independent of the solver, so we should get consistent component signs no matter which solver is used. We don't allow users to pick a solver; it is set to auto at the moment. |
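To make the "signs from the component vector" idea concrete, here is a minimal sketch. It is an illustration of the convention described above, not scikit-learn's actual code; the helper name `normalize_component_sign` is hypothetical, and the rule assumed here is "flip the vector so that its largest-magnitude entry is positive".

```python
def normalize_component_sign(component):
    """Flip a component vector so its largest-magnitude entry is positive.

    Hypothetical helper for illustration only: a PCA component v and its
    negation -v describe the same axis, so a deterministic rule based only
    on the vector's own values picks one of the two.
    """
    pivot = max(component, key=abs)  # entry with the largest absolute value
    sign = -1.0 if pivot < 0 else 1.0
    return [sign * x for x in component]


# Both orientations of the same axis normalize to the same vector.
v = [0.6, -0.8]
v_flipped = [-x for x in v]
assert normalize_component_sign(v) == normalize_component_sign(v_flipped)
```

Because the rule never looks at the projected data or at which solver produced the vector, two solvers that return `v` and `-v` end up with the same orientation.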
untested; need to open my devcontainer
|
Let me know if this commit is the right way to approach that. To my understanding, this does mess with our sampling test because our component vectors differ per run; hopefully, the aforementioned commit addresses that. |
|
Blocked by #265 - we should discuss removing GraphSpace support in today's meeting. |
For that specific test, we want the output to be the same despite shuffling the rows and columns. So having 2 different outputs is not what we would want. |
Hm, do we roll back |
|
Maybe we need to rethink the test then. Is it okay if the values are the same but the signs change? Or should the values always be the same no matter the data? (Just some questions to ask.) I will think through what we can do for this test case. |
Can you help me get caught up on the core question? It seems valid to have a transformation of the PCA solution where the signs are flipped. |
The essence of the test that was broken here was that shuffling preserves PCA output (the column-shuffled case is shown):
dataframe_shuffled = dataframe.sample(frac=1, axis=1)  # permute the columns
ml.pca(dataframe_shuffled, OUT_DIR + 'pca-shuffled-columns.png', OUT_DIR + 'pca-shuffled-columns-variance.txt',
       OUT_DIR + 'pca-shuffled-columns-coordinates.tsv')
coord = pd.read_table(OUT_DIR + 'pca-shuffled-columns-coordinates.tsv')
coord = coord.round(5)  # round values to 5 digits to account for numeric differences across machines
coord.sort_values(by='algorithm', ignore_index=True, inplace=True)
This now needs an assertion check for two possible signings: assert coord.equals(expected) or coord.equals(expected_other) |
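One possible alternative to keeping two expected files is to compare the coordinates column by column and accept either sign per component. The sketch below is only an illustration under that assumption: `equal_up_to_column_signs` is a hypothetical helper, it works on plain lists of rows rather than the DataFrames used in the test, and the tolerance mirrors the 5-digit rounding above.

```python
import math


def equal_up_to_column_signs(actual, expected, abs_tol=1e-5):
    """Return True if every column of `actual` equals the matching column
    of `expected` either as-is or with all of its entries negated."""
    if len(actual) != len(expected):
        return False
    if any(len(a) != len(e) for a, e in zip(actual, expected)):
        return False
    n_cols = len(expected[0]) if expected else 0
    for j in range(n_cols):
        col_a = [row[j] for row in actual]
        col_e = [row[j] for row in expected]
        same = all(math.isclose(a, e, abs_tol=abs_tol) for a, e in zip(col_a, col_e))
        flipped = all(math.isclose(a, -e, abs_tol=abs_tol) for a, e in zip(col_a, col_e))
        if not (same or flipped):
            return False
    return True


expected_rows = [[1.0, 2.0], [-3.0, 0.5]]
# Second component flipped as a whole: still the same PCA solution.
assert equal_up_to_column_signs([[1.0, -2.0], [-3.0, -0.5]], expected_rows)
# Only one entry flipped: a genuinely different result.
assert not equal_up_to_column_signs([[1.0, -2.0], [-3.0, 0.5]], expected_rows)
```

This treats each component independently, so it also covers the case where both components flip, which a pair of expected files would not.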
|
|
I think so? I'll take a better look at the blame to find the exact commit, if that's helpful. (Regardless, we probably should stop committing artifacts to |
|
If it is the ensembling one, I added a data.pickle containing a toy dataset network that ensembling needs in general; we use it to ensure we calculate recall correctly. I'm not sure why that would need to be updated (if it is this pickled dataset), because it has nothing to do with the PCA. |
This seems worthwhile to discuss in a new issue as a testing strategy change if it affects multiple tests. |
👍 It was #212 (origin commit). I'll prepare another PR to make that test follow #339. |
Pickled objects contain internal representations of the object, and are subject to change throughout package versions. In this case, |
|
#340 should fix this. |
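On the pickling point above, a version-independent fixture could be stored as plain text instead. This is only a sketch of that idea and not what #340 does: the toy edge list, the column names, and the helpers `dump_edges` and `load_edges` are all made up for illustration.

```python
import csv
import io


def dump_edges(edges):
    """Serialize (source, target, weight) rows as TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    writer.writerows(edges)
    return buffer.getvalue()


def load_edges(text):
    """Rebuild the edge list from TSV text, independent of library versions."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row["source"], row["target"], float(row["weight"])) for row in reader]


# Hypothetical toy network; the round trip does not depend on any
# package's internal object layout.
toy_edges = [("A", "B", 1.0), ("B", "C", 0.5)]
assert load_edges(dump_edges(toy_edges)) == toy_edges
```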
|
@ntalluri there seem to be sorting issues with the new commits that make the PCA test flaky (see this commit, which addresses that) - the tests only worked occasionally on my machine before that commit. |
ntalluri
left a comment
Code looks good to me. However, I am going to test the environment updates locally before approving.
|
By the way, some dependencies may be a little outdated (e.g. this PR outlived a pandas release cycle) - I can update them, but this PR was mostly intended to avoid all of the dependency errors that were causing me problems with pixi. |
|
Okay - sorry about this terribly long hill of a PR for how small it is. That should be the last commit which fixes the tests from the latest merge 👍 |
agitter
left a comment
I ran the TestML tests locally and they passed
pre-commit is too outdated to run typos, we need to bump pre-commit
Closes #210. Closes #134.