Skip to content

Conversation

@LucaMarconato
Copy link
Member

Temporary PR to contribute to #328 (for which I have no push permission).
Instead of merging the current PR, we can merge this into the remote fork from which #328 is opened from. Merge #328 and close this PR.

# Cell Segmentation keys
CELL_SEG_KEY_HD = 'cell_segmentations'
NUCLEUS_SEG_KEY_HD = 'nucleus_segmentations'
CELL_SEG_KEY_HD = "cell_segmentations"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and there you will see some code formatting changes done automatically from the pre-commit. Running pre-commit install on your machine should enable pre-commit and automatically change the code upon committing it.

Comment on lines +78 to +82
load_nucleus_segmentations
If `True` and nucleus segmentation files are present, load nucleus segmentation polygons and the corresponding
nucleus-filtered count table. The counts are aggregated from the 2 µm binned matrix using the provided
barcode mappings so that only bins under segmented nuclei contribute to each cell’s counts. Requires all of:
nucleus segmentation GeoJSON, barcode_mappings.parquet, and the 2 µm filtered_feature_bc_matrix.h5.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter description was missing.

Comment on lines 788 to 799
def _extract_geometries_from_geojson(
adata: AnnData,
geojson_path: Path,
) -> GeoDataFrame:
"""Extract geometries and create a GeoDataFrame from a GeoJSON features map.
Parameters
----------
cell_adata : anndata.AnnData
cell_adata
AnnData object containing cell data.
geojson_features_map : dict[str, Any]
Dictionary mapping cell IDs to GeoJSON features.
geojson_path
Path to the GeoJSON file containing cell segmentation geometries.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged the functions _extract_geometries_from_geojson() and _make_geojson_features_map() by removing the second function. In fact, the return of that function was used exclusively by the first function.

@LucaMarconato
Copy link
Member Author

LucaMarconato commented Oct 27, 2025

@stephenwilliams22 I finished the code review. My changes to your PR can be seen in this diff view https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c.

In general the PR looks great, thanks for the contribution! My comments above describe my code changes and there is only one missing thing (I cannot reproduce the tests), described in this comment here: https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c#r2467148123.

To proceed, please merge the branch of this PR directly to the branch of your PR as I cannot write to yours (since it's a fork from the 10x Genomics org and not a personal fork). I will then merge back your branch to this branch.

@stephenwilliams22
Copy link
Collaborator

Thanks for going over this @LucaMarconato ! For my unit tests, how do you all typically run them? Is there built in infrastructure? My testing bundle is certainly smaller than a normal HD run but is not "tiny" at close to 300Mb. I don't want this to mess up any automated testing you might have.

@stephenwilliams22
Copy link
Collaborator

@stephenwilliams22 I finished the code review. My changes to your PR can be seen in this diff view https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c.

In general the PR looks great, thanks for the contribution! My comments above describe my code changes and there is only one missing thing (I cannot reproduce the tests), described in this comment here: https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c#r2467148123.

To proceed, please merge the branch of this PR directly to the branch of your PR as I cannot write to yours (since it's a fork from the 10x Genomics org and not a personal fork). I will then merge back your branch to this branch.

@LucaMarconato I have merged this branch into my branch. I'll have a look at tests and hopefully we'll be good to go after that.

@LucaMarconato
Copy link
Member Author

For my unit tests, how do you all typically run them?

We have 2 types of tests. The tests for the "tiny" datasets are here in GitHub. As you observed, ideally the datasets would be smaller, but since we don't have many of them, we can proceed with them even if they are 300MB.

Then we have a second set of tests, with larger datasets, that we run only right before making a release. They currently run on a AirFlow Pipeline that is only accessible to us. We are looking into way to change this so that we can also have that public, but for the moment the current setup is the most cost effective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants