@selmanozleyen
Member

No description provided.

@selmanozleyen selmanozleyen changed the title init Speedup For EDistance like Distances Nov 12, 2025
@selmanozleyen selmanozleyen changed the title Speedup For EDistance like Distances Speedup For EDistance-like distances Nov 12, 2025
@codecov-commenter
codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 66.91729% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.83%. Comparing base (12897e1) to head (fd1965a).
⚠️ Report is 12 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pertpy/tools/_distances/_distances.py | 66.91% | 44 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #880      +/-   ##
==========================================
- Coverage   73.54%   71.83%   -1.71%     
==========================================
  Files          48       48              
  Lines        5613     5734     +121     
==========================================
- Hits         4128     4119       -9     
- Misses       1485     1615     +130     

| Files with missing lines | Coverage Δ |
|---|---|
| pertpy/tools/_distances/_distances.py | 85.87% <66.91%> (-4.70%) ⬇️ |

... and 7 files with indirect coverage changes

@Zethson Zethson marked this pull request as draft November 12, 2025 10:46
@selmanozleyen
Member Author

OK, I found out why the tests fail. When I run this cell directly on the main branch, I get: Distance.precompute_distances() got an unexpected keyword argument 'verbose'

import matplotlib.pyplot as plt
import numpy as np
import pertpy as pt
import scanpy as sc
from seaborn import clustermap
adata = pt.dt.distance_example()
obs_key = "perturbation"  # defines groups to test

It turns out the notebook was passing this invalid kwarg, but it never failed because the older implementation had already precomputed the distances, so the following branch was never reached:

if f"{self.obsm_key}_{self.cell_wise_metric}_predistances" not in adata.obsp:
    self.precompute_distances(adata, n_jobs=n_jobs, **kwargs)

In the newer implementation the branch is reached, because this cell no longer precomputes the whole distance matrices:

distance = pt.tl.Distance(metric="euclidean", obsm_key="X_pca")
df = distance.pairwise(adata, groupby=obs_key)

So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating. That's why I am against kwargs, and even when they are used, every item in them should be checked for whether it is actually passed on.
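
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not pertpy code: `precompute`, `pairwise_lazy`, `pairwise_strict`, and `ALLOWED` are hypothetical names chosen for illustration. It shows how a bad keyword argument stays hidden while a cached result exists, and how validating up front makes the error independent of which branch runs.

```python
# Hypothetical stand-ins for illustration; none of these names are pertpy API.
ALLOWED = {"n_jobs"}


def precompute(data, n_jobs=1):
    """Accepts only known keyword arguments."""
    return {"n": len(data), "n_jobs": n_jobs}


def pairwise_lazy(data, cache, **kwargs):
    """Forwards kwargs only on a cache miss, so a bad kwarg can go unnoticed."""
    if "pre" not in cache:
        cache["pre"] = precompute(data, **kwargs)
    return cache["pre"]


def pairwise_strict(data, cache, **kwargs):
    """Validates kwargs up front, regardless of which branch runs."""
    unknown = set(kwargs) - ALLOWED
    if unknown:
        raise TypeError(f"unexpected keyword argument(s): {sorted(unknown)}")
    if "pre" not in cache:
        cache["pre"] = precompute(data, **kwargs)
    return cache["pre"]
```

With a warm cache, `pairwise_lazy(data, cache, verbose=True)` returns silently; with an empty cache the same call raises `TypeError`. `pairwise_strict` raises in both cases.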

@selmanozleyen selmanozleyen marked this pull request as ready for review November 30, 2025 08:43
@Zethson
Member

Zethson commented Nov 30, 2025

That's why I am against kwargs, and even when they are used, every item in them should be checked for whether it is actually passed on.

Yes, I agree with you. We can happily change that.

So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating.

Are you willing to make this change or do you want me to do it? Ideally, we'd include the update commit in this PR.

Thank you!

Member

@Zethson Zethson left a comment

Thank you very very much for your work!

I really appreciate that you took the time to make the code much more explicit and clearer.

for i in prange(n_samples_X):
    for j in range(n_samples_Y):
        # Compute euclidean distance
        dist_sq = 0.0
Member

This part here is duplicated with the one above, right? I'm okay with it because it's explicit here.
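
For readers without the diff at hand, the loop under discussion has roughly the following shape. This is a sketch under assumptions, not the code from the PR: the function name `mean_euclidean` and the accumulation into a mean are illustrative guesses, and only the `prange`/`range` nesting and the `dist_sq` accumulator come from the snippet above. The import falls back to plain Python when numba is not installed, so the sketch runs either way.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python so the sketch still runs
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func

        return wrap


@njit(parallel=True)
def mean_euclidean(X, Y):
    """Mean euclidean distance over all (x, y) pairs, without storing the full matrix."""
    n_x = X.shape[0]
    n_y = Y.shape[0]
    n_features = X.shape[1]
    total = 0.0
    for i in prange(n_x):
        row_sum = 0.0
        for j in range(n_y):
            # Compute euclidean distance between row i of X and row j of Y
            dist_sq = 0.0
            for k in range(n_features):
                diff = X[i, k] - Y[j, k]
                dist_sq += diff * diff
            row_sum += np.sqrt(dist_sq)
        total += row_sum  # scalar reduction across the parallel loop
    return total / (n_x * n_y)
```

Only a running sum is kept, so memory use stays constant in the number of samples instead of growing with `n_samples_X * n_samples_Y`.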

X: First array of shape (n_samples_X, n_features).
Y: Second array of shape (n_samples_Y, n_features). If None, computes within-group
distances (X to X).
metric: Distance metric to use. Currently only "euclidean" is optimized with numba.
Member

I'm not sure we need to mention the optimization here; it's an implementation detail and doesn't need to be user-facing.

Args:
X: First array of shape (n_samples_X, n_features).
Y: Second array of shape (n_samples_Y, n_features). If None, computes within-group
distances (X to X).
Member

Please move this up into the line above and/or move the "If None, computes" part onto its own line.

"""
if metric == "euclidean":
    if len(kwargs) > 0:
        # warn that kwargs are not used
Member

Suggested change
# warn that kwargs are not used

# pairwise distances for each group separately. Other metrics are not
# able to handle precomputed distances such as the PseudobulkDistance.
if self.metric_fct.accepts_precomputed:
    # Check if metric supports value caching (within/between distances) - more efficient than precomputed matrix
Member

Very clear comments, thank you!
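
The idea of caching within-group and between-group values can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `mean_pairwise` and `edistance_like_table` are hypothetical names, plain euclidean distances are used, and self-pairs are included in the within-group mean. The combined quantity, twice the between-group mean minus the two within-group means, is the usual energy-distance form.

```python
from itertools import combinations

import numpy as np


def mean_pairwise(X, Y):
    """Mean euclidean distance over all pairs of rows from X and Y."""
    diffs = X[:, None, :] - Y[None, :, :]
    return float(np.sqrt((diffs**2).sum(axis=-1)).mean())


def edistance_like_table(groups):
    """Energy-distance-style value for every pair of groups.

    groups maps a group name to an array of shape (n_cells, n_features).
    """
    # Each within-group mean is computed once and reused for every pair.
    within = {name: mean_pairwise(X, X) for name, X in groups.items()}
    result = {}
    for a, b in combinations(groups, 2):
        between = mean_pairwise(groups[a], groups[b])
        result[(a, b)] = 2.0 * between - within[a] - within[b]
    return result
```

For `g` groups this computes `g` within-group means instead of recomputing two of them for each of the `g * (g - 1) / 2` pairs, and it never needs the full cell-by-cell distance matrix.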

(like mean pairwise distances within groups, between groups) to avoid redundant computation.
This mode is incompatible with bootstrap since cached values would be invalid.
Returns:
Member

Nitpick: I think this function is simple enough that the return value doesn't need to be documented in this much detail.

3 participants