-
Notifications
You must be signed in to change notification settings - Fork 39
Speedup For EDistance-like distances #880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #880 +/- ##
==========================================
- Coverage 73.54% 71.83% -1.71%
==========================================
Files 48 48
Lines 5613 5734 +121
==========================================
- Hits 4128 4119 -9
- Misses 1485 1615 +130
🚀 New features to boost your workflow:
|
|
Ok I found out why the test fail. I checked myself and when I directly run this cell on the main branch I get import matplotlib.pyplot as plt
import numpy as np
import pertpy as pt
import scanpy as sc
from seaborn import clustermap
adata = pt.dt.distance_example()
obs_key = "perturbation" # defines groups to testTurns out the notebook was giving this invalid if f"{self.obsm_key}_{self.cell_wise_metric}_predistances" not in adata.obsp:
self.precompute_distances(adata, n_jobs=n_jobs, **kwargs)while it hits in newer implementation because it doesn't precompute the whole distance matrices in this cell anymore distance = pt.tl.Distance(metric="euclidean", obsm_key="X_pca")
df = distance.pairwise(adata, groupby=obs_key)So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating. That's why I am against kwargs and even if they are used all of the items in there should be checked if they are being passed or not. |
Yes, I agree with you. We can happily change that.
Are you willing to make this change or do you want me to do it? Ideally, we'd include the update commit in this PR. Thank you! |
Zethson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very very much for your work!
I really appreciate that you took the time to make the code much more explicit and clearer.
| for i in prange(n_samples_X): | ||
| for j in range(n_samples_Y): | ||
| # Compute euclidean distance | ||
| dist_sq = 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part here is duplicated with the one above, right? I'm okay with it because it's explicit here.
| X: First array of shape (n_samples_X, n_features). | ||
| Y: Second array of shape (n_samples_Y, n_features). If None, computes within-group | ||
| distances (X to X). | ||
| metric: Distance metric to use. Currently only "euclidean" is optimized with numba. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK whether we need to specify the optimization here because that's an implementation detail and doesn't need to be user facing.
| Args: | ||
| X: First array of shape (n_samples_X, n_features). | ||
| Y: Second array of shape (n_samples_Y, n_features). If None, computes within-group | ||
| distances (X to X). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this up into the line above and/or move the "if None, computes" part into it's own line.
| """ | ||
| if metric == "euclidean": | ||
| if len(kwargs) > 0: | ||
| # warn that kwargs are not used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # warn that kwargs are not used |
| # pairwise distances for each group separately. Other metrics are not | ||
| # able to handle precomputed distances such as the PseudobulkDistance. | ||
| if self.metric_fct.accepts_precomputed: | ||
| # Check if metric supports value caching (within/between distances) - more efficient than precomputed matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clear comments, thank you!
| (like mean pairwise distances within groups, between groups) to avoid redundant computation. | ||
| This mode is incompatible with bootstrap since cached values would be invalid. | ||
| Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I think this function is simple enough to not have to document the return this detailed.
No description provided.