Speedup For EDistance-like distances #880

Open

selmanozleyen wants to merge 9 commits into scverse:main from selmanozleyen:speedup/edistance

Member

selmanozleyen commented Nov 12, 2025

No description provided.


          init

9a50824

selmanozleyen changed the title ~~init~~ Speedup For EDistance like Distances


          use a kernel

777d3d6

selmanozleyen changed the title ~~Speedup For EDistance like Distances~~ Speedup For EDistance-like distances

codecov-commenter commented Nov 12, 2025 (edited)

Codecov Report

❌ Patch coverage is 66.91729% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.83%. Comparing base (12897e1) to head (fd1965a).
⚠️ Report is 12 commits behind head on main.

Files with missing lines	Patch %	Lines
pertpy/tools/_distances/_distances.py	66.91%	44 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #880      +/-   ##
==========================================
- Coverage   73.54%   71.83%   -1.71%     
==========================================
  Files          48       48              
  Lines        5613     5734     +121     
==========================================
- Hits         4128     4119       -9     
- Misses       1485     1615     +130

Files with missing lines	Coverage Δ
pertpy/tools/_distances/_distances.py	`85.87% <66.91%> (-4.70%)`	⬇️

... and 7 files with indirect coverage changes

Zethson marked this pull request as draft

November 12, 2025 10:46

selmanozleyen and others added 3 commits

November 29, 2025 23:46


          Merge branch 'main' into speedup/edistance


          kwargs for pairwisedistance

81667dd


          fix n_pairs blunder

825b4d6

Member Author

selmanozleyen commented Nov 30, 2025

OK, I found out why the tests fail. I checked it myself: when I run this cell directly on the main branch, I get "Distance.precompute_distances() got an unexpected keyword argument 'verbose'".

import matplotlib.pyplot as plt
import numpy as np
import pertpy as pt
import scanpy as sc
from seaborn import clustermap
adata = pt.dt.distance_example()
obs_key = "perturbation"  # defines groups to test

It turns out the notebook was passing this invalid kwarg, but it didn't fail because the older implementation would already have precomputed the distances, so this branch was never hit:

if f"{self.obsm_key}_{self.cell_wise_metric}_predistances" not in adata.obsp:
     self.precompute_distances(adata, n_jobs=n_jobs, **kwargs)

The newer implementation does hit it, because this cell no longer precomputes the whole distance matrices:

distance = pt.tl.Distance(metric="euclidean", obsm_key="X_pca")
df = distance.pairwise(adata, groupby=obs_key)

So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating. That's why I am against kwargs, and even if they are used, all of the items in there should be checked for whether they are being passed or not.
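The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not pertpy's code: `precompute`, `pairwise_lazy`, `pairwise_checked`, and `ALLOWED` are hypothetical names, and the dict `cache` merely stands in for `adata.obsp`.

```python
def precompute(data, n_jobs=1):
    """Stand-in for an expensive step with a fixed signature (hypothetical)."""
    return {"n": len(data), "n_jobs": n_jobs}


def pairwise_lazy(data, cache, **kwargs):
    # kwargs are only forwarded when the cache is empty, so an invalid
    # keyword goes unnoticed whenever the cached branch is taken.
    if "pre" not in cache:
        cache["pre"] = precompute(data, **kwargs)
    return cache["pre"]


ALLOWED = {"n_jobs"}  # assumption: the only option the sketch supports


def pairwise_checked(data, cache, **kwargs):
    # Validate every keyword up front, independent of which branch runs.
    unknown = set(kwargs) - ALLOWED
    if unknown:
        raise TypeError(f"unexpected keyword argument(s): {sorted(unknown)}")
    if "pre" not in cache:
        cache["pre"] = precompute(data, **kwargs)
    return cache["pre"]
```

With a warm cache, `pairwise_lazy(data, cache, verbose=True)` returns silently, while `pairwise_checked` raises a `TypeError` regardless of the cache state.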

selmanozleyen marked this pull request as ready for review

November 30, 2025 08:43

Member

Zethson commented Nov 30, 2025

That's why I am against kwargs, and even if they are used, all of the items in there should be checked for whether they are being passed or not.

Yes, I agree with you. We can happily change that.

So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating.

Are you willing to make this change or do you want me to do it? Ideally, we'd include the update commit in this PR.

Thank you!

selmanozleyen mentioned this pull request

remove verbose option in pairwise function scverse/pertpy-tutorials#56

Merged


          update subproject commit

98f3c64

Zethson approved these changes


Member

Zethson left a comment

Thank you very very much for your work!

I really appreciate that you took the time to make the code much more explicit and clearer.

pertpy/tools/_distances/_distances.py (outdated, resolved)

pertpy/tools/_distances/_distances.py

    
    for i in prange(n_samples_X):
        for j in range(n_samples_Y):
            # Compute euclidean distance
            dist_sq = 0.0

Member

Zethson Dec 5, 2025

This part here is duplicated with the one above, right? I'm okay with it because it's explicit here.
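For readers without the diff at hand, the loop structure under review can be sketched as follows. This is a plain-Python illustration under stated assumptions: the function name `mean_pairwise_euclidean` is hypothetical, the outer loop uses `range` where the PR uses numba's `prange`, and returning the mean (rather than filling a matrix) is an assumption based on the "mean pairwise distances" wording elsewhere in the review.

```python
import math


def mean_pairwise_euclidean(X, Y):
    """Mean Euclidean distance over all (x, y) pairs, without storing the matrix."""
    n_features = len(X[0])
    total = 0.0
    for i in range(len(X)):  # the PR's kernel uses numba's prange here
        for j in range(len(Y)):
            # Compute euclidean distance
            dist_sq = 0.0
            for k in range(n_features):
                diff = X[i][k] - Y[j][k]
                dist_sq += diff * diff
            total += math.sqrt(dist_sq)
    return total / (len(X) * len(Y))
```

Accumulating a running sum keeps memory at O(1) instead of O(n_samples_X × n_samples_Y), which is the point of not precomputing the full distance matrix.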

pertpy/tools/_distances/_distances.py

    
    X: First array of shape (n_samples_X, n_features).
    Y: Second array of shape (n_samples_Y, n_features). If None, computes within-group
       distances (X to X).
    metric: Distance metric to use. Currently only "euclidean" is optimized with numba.

Member

Zethson Dec 5, 2025

IDK whether we need to specify the optimization here because that's an implementation detail and doesn't need to be user facing.

pertpy/tools/_distances/_distances.py

    
    Args:
        X: First array of shape (n_samples_X, n_features).
        Y: Second array of shape (n_samples_Y, n_features). If None, computes within-group
           distances (X to X).

Member

Zethson Dec 5, 2025

Please move this up into the line above and/or move the "If None, computes" part onto its own line.

pertpy/tools/_distances/_distances.py

    
    """
    if metric == "euclidean":
        if len(kwargs) > 0:
            # warn that kwargs are not used

Member

Zethson Dec 5, 2025

Suggested change

# warn that kwargs are not used
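One way to act on the "warn that kwargs are not used" placeholder is the standard-library `warnings` module. The sketch below is an assumption about what such a branch could look like, not the code merged in the PR; `pairwise_distances_sketch` and its string return values are hypothetical placeholders.

```python
import warnings


def pairwise_distances_sketch(X, Y=None, metric="euclidean", **kwargs):
    """Hypothetical wrapper: the euclidean fast path takes no extra options."""
    if metric == "euclidean":
        if len(kwargs) > 0:
            # Tell the caller which options were dropped instead of ignoring them silently.
            warnings.warn(
                f"Ignoring unused keyword arguments for metric='euclidean': {sorted(kwargs)}",
                UserWarning,
                stacklevel=2,
            )
        return "fast-path"  # placeholder for the optimized computation
    return "generic-path"  # placeholder for the fallback that consumes kwargs
```

`stacklevel=2` attributes the warning to the caller's line, which is usually where the stray keyword needs fixing.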

pertpy/tools/_distances/_distances.py

    
    # pairwise distances for each group separately. Other metrics are not
    # able to handle precomputed distances such as the PseudobulkDistance.
    if self.metric_fct.accepts_precomputed:
    # Check if metric supports value caching (within/between distances) - more efficient than precomputed matrix

Member

Zethson Dec 5, 2025

Very clear comments, thank you!

pertpy/tools/_distances/_distances.py

    
    (like mean pairwise distances within groups, between groups) to avoid redundant computation.
    This mode is incompatible with bootstrap since cached values would be invalid.
    Returns:

Member

Zethson Dec 5, 2025

Nitpick: I think this function is simple enough that we don't have to document the return value in this much detail.
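The value caching mentioned in the docstring can be illustrated with a small sketch. It assumes an energy-distance-style statistic of the form 2·between − within(a) − within(b) over plain Euclidean distances; pertpy's exact formula and metric may differ, and `mean_dist` and `pairwise_edistance_like` are hypothetical names.

```python
import math
from itertools import combinations


def mean_dist(X, Y):
    """Mean Euclidean distance over all pairs (x, y); requires Python 3.8+."""
    return sum(math.dist(x, y) for x in X for y in Y) / (len(X) * len(Y))


def pairwise_edistance_like(groups):
    # Cache: each within-group mean is computed once, not once per pair.
    within = {name: mean_dist(pts, pts) for name, pts in groups.items()}
    result = {}
    for a, b in combinations(groups, 2):
        between = mean_dist(groups[a], groups[b])
        result[(a, b)] = 2.0 * between - within[a] - within[b]
    return result
```

For G groups this computes G within-group terms instead of G·(G−1). It also shows why the mode cannot be combined with bootstrapping: resampling a group changes its points, so a cached within-group value would no longer describe the resampled data.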

selmanozleyen and others added 3 commits

December 5, 2025 11:03


          Merge branch 'main' into speedup/edistance

326c64e


          Update pertpy/tools/_distances/_distances.py

88a1239

Co-authored-by: Lukas Heumos <[email protected]>


          Merge branch 'main' into speedup/edistance

fd1965a
