fix/embedding unique obs names #476

Draft
srivarra wants to merge 2 commits into main from fix/embedding-unique-obs-names

@srivarra srivarra commented Jun 23, 2026

Contributor

No description provided.

srivarra and others added 2 commits June 23, 2026 14:34
EmbeddingWriter wrote stores with duplicate obs_names: write_on_epoch_end
concatenated per-batch index frames without ignore_index, so the positional
index restarted each batch (0,1,..,0,1,..) and those labels became obs_names.
Consumers worked around this with scattered obs_names_make_unique() calls,
which only dedupe a single store and break once N stores are ad.concat'd.
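
The index behaviour described above can be reproduced with plain pandas. This is a minimal sketch, not the EmbeddingWriter code: the frames and their column names (`fov_name`, `track_id`) are hypothetical stand-ins for the per-batch index frames, and a plain `pd.concat` stands in for concatenating stores.

```python
import pandas as pd

# Hypothetical per-batch index frames; the column names are illustrative only.
batches = [
    pd.DataFrame({"fov_name": ["A", "A"], "track_id": [0, 1]}),
    pd.DataFrame({"fov_name": ["B", "B"], "track_id": [0, 1]}),
]

# Default concat keeps each frame's own positional index, so labels repeat.
stacked = pd.concat(batches)
print(stacked.index.tolist())   # [0, 1, 0, 1]
print(stacked.index.is_unique)  # False

# ignore_index=True discards the per-frame labels and renumbers from 0.
renumbered = pd.concat(batches, ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
print(renumbered.index.is_unique)  # True

# Positional labels are only unique within one store: two stores written
# separately both start at 0, so combining them collides again.
two_stores = pd.concat([renumbered, renumbered.copy()])
print(two_stores.index.is_unique)  # False
```

The last step is why a per-store fix is not enough on its own: any label scheme derived from position within a store repeats as soon as a second store is added.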

obs_names is never referenced directly (cell identity lives in the obs
columns), so make it an opaque, globally-unique handle: a random uuid4 per
observation, assigned at the single creation site. Stores stay unique within
and across any number of concatenated stores. Also pass ignore_index=True at
the per-batch concat to drop the duplicate intermediate index.
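
A rough sketch of the uuid4 approach, again in plain pandas rather than the actual EmbeddingWriter code. The helper `assign_uuid_index` and the column names are hypothetical and exist only for illustration; the real change assigns the names at the writer's single creation site.

```python
import uuid

import pandas as pd


def assign_uuid_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace the index with one random uuid4 string per row."""
    out = frame.reset_index(drop=True)
    out.index = pd.Index([str(uuid.uuid4()) for _ in range(len(out))])
    return out


# Two hypothetical stores whose identifying columns happen to be identical.
store_a = assign_uuid_index(pd.DataFrame({"fov_name": ["A", "A"], "track_id": [0, 1]}))
store_b = assign_uuid_index(pd.DataFrame({"fov_name": ["A", "A"], "track_id": [0, 1]}))

# Labels are unique within each store and remain unique after combining them.
combined = pd.concat([store_a, store_b])
print(combined.index.is_unique)  # True (uuid4 collisions are astronomically unlikely)

# Cell identity is untouched: it still lives in the columns, not in the index.
print(combined[["fov_name", "track_id"]].to_dict("list"))
```

Because the labels carry no positional meaning, nothing downstream needs to know how many stores were combined or in what order.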

Refs #475

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

EmbeddingWriter now assigns unique uuid obs_names at the source, so the
per-consumer obs_names_make_unique() calls are dead no-ops. Remove all nine
across the mmd, linear-classifier, and pseudotime code paths.

Refs #475

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@srivarra srivarra linked an issue Jun 23, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

EmbeddingWriter produces AnnData with duplicate obs_names