temp PR: Visium HD 4.0 #333

LucaMarconato · 2025-10-22T13:02:38Z

Temporary PR to contribute to #328 (for which I have no push permission).
Instead of merging this PR, we can merge it into the branch of the fork from which #328 was opened, then merge #328 and close this PR.


LucaMarconato · 2025-10-27T21:11:20Z

src/spatialdata_io/_constants/_constants.py

    # Cell Segmentation keys
-    CELL_SEG_KEY_HD = 'cell_segmentations'
-    NUCLEUS_SEG_KEY_HD = 'nucleus_segmentations'
+    CELL_SEG_KEY_HD = "cell_segmentations"


Here and there you will see some code-formatting changes made automatically by pre-commit. Running `pre-commit install` on your machine enables the pre-commit hooks, which then reformat the code automatically when you commit.


LucaMarconato · 2025-10-27T21:14:50Z

src/spatialdata_io/readers/visium_hd.py

+    load_nucleus_segmentations
+        If `True` and nucleus segmentation files are present, load nucleus segmentation polygons and the corresponding
+        nucleus-filtered count table. The counts are aggregated from the 2 µm binned matrix using the provided
+        barcode mappings so that only bins under segmented nuclei contribute to each cell’s counts. Requires all of:
+        nucleus segmentation GeoJSON, barcode_mappings.parquet, and the 2 µm filtered_feature_bc_matrix.h5.


The parameter description was missing.
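The aggregation described in the new docstring can be illustrated with a minimal, dependency-free sketch. It assumes that `barcode_mappings.parquet` has been read into rows with `square_002um`, `cell_id` and `in_nucleus` columns (the column names are an assumption here; the `in_nucleus` flag matches the "use only in_nucleus" commit), and that the 2 µm counts are available as a plain `{bin: {gene: count}}` dict. `aggregate_nucleus_counts` is a hypothetical helper, not the reader's actual code; a real implementation would more likely operate on the sparse count matrix.

```python
from collections import defaultdict


def aggregate_nucleus_counts(bin_counts, barcode_mappings):
    """Hypothetical sketch: sum 2 µm bin counts into per-nucleus counts.

    bin_counts: {bin_barcode: {gene: count}} from the 2 µm filtered matrix.
    barcode_mappings: iterable of row dicts with the assumed keys
        "square_002um", "cell_id" and "in_nucleus".
    """
    nucleus_counts = defaultdict(lambda: defaultdict(int))
    for row in barcode_mappings:
        # Only bins flagged as lying under a segmented nucleus contribute.
        if not row["in_nucleus"] or row["cell_id"] is None:
            continue
        counts = bin_counts.get(row["square_002um"])
        if counts is None:
            # Bin not present in the filtered 2 µm matrix: nothing to add.
            continue
        for gene, n in counts.items():
            nucleus_counts[row["cell_id"]][gene] += n
    # Convert the nested defaultdicts to plain dicts.
    return {cell: dict(genes) for cell, genes in nucleus_counts.items()}


if __name__ == "__main__":
    # Toy data, made up for illustration.
    bin_counts = {
        "bin_1": {"GeneA": 2, "GeneB": 1},
        "bin_2": {"GeneA": 3},
        "bin_3": {"GeneB": 5},
    }
    mappings = [
        {"square_002um": "bin_1", "cell_id": "cell_1", "in_nucleus": True},
        {"square_002um": "bin_2", "cell_id": "cell_1", "in_nucleus": False},
        {"square_002um": "bin_3", "cell_id": "cell_2", "in_nucleus": True},
    ]
    print(aggregate_nucleus_counts(bin_counts, mappings))
    # prints {'cell_1': {'GeneA': 2, 'GeneB': 1}, 'cell_2': {'GeneB': 5}}
```

In this sketch, the `in_nucleus` check is what makes the table nucleus-specific: `bin_2` is assigned to `cell_1` but lies outside its nucleus, so its counts are skipped.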


LucaMarconato · 2025-10-27T21:31:36Z

src/spatialdata_io/readers/visium_hd.py

+def _extract_geometries_from_geojson(
+    adata: AnnData,
+    geojson_path: Path,
+) -> GeoDataFrame:
    """Extract geometries and create a GeoDataFrame from a GeoJSON features map.

    Parameters
    ----------
-    cell_adata : anndata.AnnData
+    cell_adata
        AnnData object containing cell data.
-    geojson_features_map : dict[str, Any]
-        Dictionary mapping cell IDs to GeoJSON features.
+    geojson_path
+        Path to the GeoJSON file containing cell segmentation geometries.


I merged the functions `_extract_geometries_from_geojson()` and `_make_geojson_features_map()` by removing the second function, since its return value was used exclusively by the first one.
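A minimal sketch of what the merged function does, under stated assumptions: the GeoJSON is a `FeatureCollection` of `Polygon` features carrying a `cell_id` property whose values match the identifiers passed in (in the real reader, the AnnData cell IDs may need converting first). `extract_polygon_rings` is hypothetical; it keeps only each polygon's exterior ring and returns plain coordinate lists instead of a `GeoDataFrame`, so it runs with the standard library alone.

```python
import json
from pathlib import Path


def extract_polygon_rings(geojson_path, cell_ids):
    """Hypothetical sketch: return the exterior ring of each cell's polygon,
    ordered like `cell_ids`.

    Assumes a GeoJSON FeatureCollection whose features carry a "cell_id"
    property and Polygon geometries.
    """
    with Path(geojson_path).open() as f:
        collection = json.load(f)

    # Inlined cell_id -> feature map (the role the removed
    # _make_geojson_features_map() helper played, per its docstring).
    features_by_id = {
        feature["properties"]["cell_id"]: feature
        for feature in collection["features"]
    }

    rings = []
    for cell_id in cell_ids:
        geometry = features_by_id[cell_id]["geometry"]  # KeyError if the cell is missing
        if geometry["type"] != "Polygon":
            raise ValueError(f"cell {cell_id}: expected Polygon, got {geometry['type']}")
        # GeoJSON Polygon coordinates are [exterior_ring, *holes]; keep the exterior.
        rings.append([tuple(point) for point in geometry["coordinates"][0]])
    return rings
```

In a full implementation, each ring would typically be wrapped with `shapely.geometry.Polygon` and collected into a `geopandas.GeoDataFrame` indexed like `adata.obs`. Building the features map inline also means the GeoJSON is parsed in only one place.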


LucaMarconato · 2025-10-27T21:39:47Z

@stephenwilliams22 I finished the code review. My changes to your PR can be seen in this diff view https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c.

In general the PR looks great, thanks for the contribution! My comments above describe my code changes, and there is only one open item (I cannot reproduce the tests), described in this comment: https://github.com/scverse/spatialdata-io/pull/333/files/0e891ec7c58d0583cae50a376024ad87b7bf7832..955a9a64058f42e62af308f16e1a48ae7b05823c#r2467148123.

To proceed, please merge the branch of this PR directly into the branch of your PR, since I cannot push to yours (it's a fork owned by the 10x Genomics organization, not a personal fork). I will then merge your branch back into this branch.

stephenwilliams22 · 2025-10-29T16:07:58Z

Thanks for going over this, @LucaMarconato! How do you typically run the unit tests? Is there built-in infrastructure? My test bundle is much smaller than a normal HD run, but at close to 300 MB it is not exactly "tiny". I don't want it to interfere with any automated testing you might have.

stephenwilliams22 · 2025-10-29T16:18:35Z

@LucaMarconato I have merged this branch into my branch. I'll look at the tests, and hopefully we'll be good to go after that.

…aldata-io into stephen/spaceranger4.0

LucaMarconato · 2025-11-03T13:57:49Z

> For my unit tests, how do you all typically run them?

We have two types of tests. The tests for the "tiny" datasets run here on GitHub. As you noted, ideally the datasets would be smaller, but since we don't have many of them, we can proceed with them even at 300 MB.

Then we have a second set of tests, with larger datasets, that we run only right before making a release. They currently run on an Airflow pipeline that is only accessible to us. We are looking into ways to make it public as well, but for the moment the current setup is the most cost-effective.

stephenwilliams22 and others added 16 commits September 24, 2025 13:24

feat: support for spaceranger4.0

b1e6d06

make nuc segmentation and cut dup code

a6d7e9f

use only in_nucleus

31379c8

make local functions

34ac722

ruff

b7848fa

typing

ae43fe7

nuc seg cli

31c76fd

tests

9134c94

add vignette

42028ce

update BARCODE_MAPPINGS_PATH and calc centroids

b269d32

delete vignette. moved to new PR

0e891ec

Merge branch 'main' into stephen/spaceranger4.0

c9674ca

[pre-commit.ci] auto fixes from pre-commit.com hooks

49d3e7e

for more information, see https://pre-commit.ci

wip code review visium hd

30e4076

wip code review

cc9dc68

initial review pass

e5c2d16

LucaMarconato mentioned this pull request Oct 27, 2025

Add visium hd test datasets #336

Merged

LucaMarconato added 4 commits October 27, 2025 20:10

Merge branch 'main' into stephen/spaceranger4.0

ab3611b

pin pyarrow

8cecbf3

fix docstring

4b8a41f

code review simplification

39ffde5