Add metadata only stage to repodata generation by soapy1 · Pull Request #284 · conda/conda-index

soapy1 · 2026-04-16T22:30:17Z

Description

This introduces a new upstream stage called md. The existing stages in conda-index are:

fs - the artifact is now available in the set of packages; by default it is assumed to be on the local filesystem
indexed - the entry already exists in the database (same filename, same timestamp, same hash), and its package metadata has been extracted to index_json etc.

This PR adds md, which is like fs in that it represents an artifact that is now available, but one that is assumed not to be available on the local filesystem. md is a sibling to the indexed stage. This is helpful for indexing artifacts that cannot be represented on the local filesystem, for example PyPI packages.

To include metadata-sourced packages in repodata, be sure to pass include_stages and package_extensions in the cache_kwargs of ChannelIndex when instantiating it. For example:

channel_index = ChannelIndex(
    tmp_path,
    "haswheels",  # channel name if different than last segment of tmp_path
    repodata_v3=True,
    cache_kwargs={
        "package_extensions": CONDA_PACKAGE_EXTENSIONS + (".whl",),
        "include_stages": ["md"],
    },
)

And inject the metadata using the store_md_state function:

cache = channel_index.cache_for_subdir("noarch")
. . .
cache.store_md_state(listdir_like())

See tests/test_demonstrate_wheel.py for how this changes the API for injecting wheel data into repodata.

ref: conda/conda-pypi#276 (comment)

Other notable changes

- fixed import errors when running from conda_index.postgres.cache import PsqlCache
  - this caused test failures in tests/test_psql.py; it looks like these tests have been getting skipped
  - fixed those tests
- updated conda_index.index.cache.BaseCondaIndexCache.database_path to only prepend the database prefix if the prefix is not already included
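
The database_path change amounts to making the prefixing idempotent. A minimal sketch of that idea, assuming a plain string prefix; the function name with_prefix and the example prefix are hypothetical and this is not the actual conda-index implementation:

```python
def with_prefix(path: str, prefix: str) -> str:
    """Return path with prefix prepended, unless it already starts with it."""
    if prefix and path.startswith(prefix):
        # Already prefixed: applying the function again must not double it.
        return path
    return prefix + path


# Hypothetical prefix, for illustration only.
once = with_prefix("noarch/demo-1.0-py3-none-any.whl", "haswheels/")
twice = with_prefix(once, "haswheels/")
```

With this shape, callers such as a listdir_like generator can pass either a bare path or an already-prefixed one and get the same database key.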

Checklist - did you ...

Add a file to the news directory (using the template) for the next release's release notes?
Add / update necessary tests?
Add / update outdated documentation?

soapy1 · 2026-04-17T16:21:45Z

        self.created_at = now_dt.strftime("%Y-%m-%dT%H:%M:%SZ")

-    def cache_for_subdir(self, subdir):
+    def cache_for_subdir(self, subdir, stage: str | None = None):


A different approach might be to make the cache layer aware of all the desired upstream stages, instead of getting a cache by subdir and stage.
By moving the responsibility of keeping track of the upstream stages to the cache layer, we can avoid the merging logic below. This could be replaced by accounting for the multiple stages in the query phase.

The database layer is the right place to merge stages. It could be accounted for in cache_kwargs when creating ChannelIndex, or in indexed_packages() when performing the query.

Let's go for including this in the cache_kwargs. We'll define a new attribute include_stages which accepts a list of stages to consider in addition to upstream_stage.

dholth · 2026-04-22T19:28:09Z

            WITH
            fs AS
-                ( SELECT path, mtime, size, sha256, md5 FROM stat WHERE stage = :upstream_stage ),
+                ( SELECT path, mtime, size, sha256, md5 FROM stat WHERE stage IN ({stages_placeholders}) ),


We'll have to check what happens when path is in fs and md at the same time. We would get the same index_json twice, maybe, but it would also overwrite itself in the output dict.

yes! Added a test in tests/test_index.py::test_index_noarch_with_wheels. This will put wheels and noarch conda packages in the same noarch subdir.
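
The overlap question above can be tried out against an in-memory SQLite database. This is only a sketch: the stat table here is reduced to two columns, and the helper name packages_in_stages and the sample paths are made up for illustration. It shows one way to build one named placeholder per stage for the IN (...) clause, and what happens when the same path appears in two of the selected stages.

```python
import sqlite3


def packages_in_stages(conn, stages):
    """Select rows from a simplified stat table for any of the given stages."""
    # One named placeholder per stage; values are bound, not interpolated.
    names = [f"stage_{i}" for i in range(len(stages))]
    placeholders = ", ".join(f":{name}" for name in names)
    query = (
        "SELECT path, stage FROM stat "
        f"WHERE stage IN ({placeholders}) ORDER BY path, stage"
    )
    return conn.execute(query, dict(zip(names, stages))).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stat (stage TEXT, path TEXT, PRIMARY KEY (stage, path))")
conn.executemany(
    "INSERT INTO stat VALUES (?, ?)",
    [
        ("indexed", "noarch/a-1.0-0.conda"),
        ("md", "noarch/a-1.0-0.conda"),  # same path in two stages
        ("md", "noarch/b-1.0-py3-none-any.whl"),
        ("fs", "noarch/c-1.0-0.conda"),  # not selected below
    ],
)

rows = packages_in_stages(conn, ["indexed", "md"])
# Keying by path collapses the duplicate: the later row overwrites the earlier.
by_path = {path: stage for path, stage in rows}
```

In this toy setup the duplicated path comes back as two rows but ends up as a single dictionary entry, which matches the "overwrite itself in the output dict" expectation in the comment above.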

dholth · 2026-04-23T13:58:24Z

Do you think md is a sibling to the indexed phase, not the upstream fs phase? We wouldn't join md with indexed to find a list of packages whose metadata needs to be added; instead, md implies that the metadata has already been loaded into the index_json table. Then we only consider md+indexed when pulling data out of the database.

soapy1 · 2026-04-27T21:53:51Z

hmmm, I was thinking of md and fs as siblings. But I think you are right, it should be md and indexed. Will re-jig it!
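
A toy model of the arrangement agreed on in this thread, using plain dicts in place of database tables. The function names and sample records are hypothetical; the sketch only assumes what the comments state: changed packages are found by comparing fs against indexed, and md entries go straight into the output alongside indexed ones.

```python
def needs_extraction(upstream: dict, indexed: dict) -> set:
    """Paths whose upstream (fs) record is missing from or differs from indexed."""
    return {path for path, stat in upstream.items() if indexed.get(path) != stat}


def repodata_paths(indexed: dict, md: dict) -> set:
    """Paths to emit: everything indexed plus the metadata-only entries."""
    return set(indexed) | set(md)


fs = {"noarch/a-1.0-0.conda": {"mtime": 10, "size": 100}}
indexed = {}
md = {"noarch/b-1.0-py3-none-any.whl": {"mtime": 1, "size": 200}}

to_extract = needs_extraction(fs, indexed)  # md never takes part in this step
indexed.update({path: fs[path] for path in to_extract})
emitted = repodata_paths(indexed, md)
```

Because md is never compared against indexed, a metadata-only wheel is not queued for extraction, yet it still appears in the emitted set.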

soapy1 · 2026-05-05T20:39:06Z

+                stat["size"],
+                stat["mtime"],
+                {},
+                stat["repodata"],


This means that when a user is creating a function like listdir_like they need to also add this repodata key. That isn't very listdir_stat like.....

For example, something like

def listdir_like():
    for path, repodata in wheels.items():
        assert "sha256" in repodata
        if "md5" not in repodata:
            repodata["md5"] = None
        yield {
            "path": cache.database_path(path),
            "size": repodata["size"],
            "mtime": repodata.get("timestamp", 1),
            "repodata": repodata,
        }

This is maybe a bit too much of a hack. @dholth what do you think?

soapy1 · 2026-05-05T20:47:34Z

    "msgpack",
+    "psycopg2",
    "ruamel.yaml",
+    "sqlalchemy",


hmmm, I expect these dependencies were not included for some reason. After fixing the psql cache import errors in the tests, I got new errors saying that these dependencies are missing. Is there a strong incentive to keep these out of the main package? An alternative is to add them as optional dependencies.

dholth · 2026-05-05T20:48:06Z

This program is written to work with "low dependencies", so it's necessary for sqlalchemy to be only an optional dependency for conda-index. This should help users who install conda-index alongside other programs or incidentally as a conda, conda-pypi, or conda-build dependency.
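
One common way to keep such packages optional is to probe for them without importing them at module load time. A minimal sketch using only the standard library; the names has_module, HAVE_POSTGRES_DEPS, and require_postgres_deps are hypothetical, and how conda-index actually declares or checks its optional dependencies is not shown in this thread.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if the named top-level module can be found on this interpreter."""
    return importlib.util.find_spec(name) is not None


# Evaluated once; nothing is imported if the packages are absent.
HAVE_POSTGRES_DEPS = has_module("sqlalchemy") and has_module("psycopg2")


def require_postgres_deps() -> None:
    """Fail with a clear message when the PostgreSQL backend is requested."""
    if not HAVE_POSTGRES_DEPS:
        raise RuntimeError(
            "PostgreSQL cache support needs the optional "
            "sqlalchemy and psycopg2 packages"
        )
```

The same flag could drive a conditional skip in the test suite, so that the postgres tests run only where both packages are installed.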

conda-bot added this to 🔎 Review Apr 16, 2026

github-project-automation Bot moved this to 🆕 New in 🔎 Review Apr 16, 2026

conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Apr 16, 2026

dholth self-requested a review April 17, 2026 12:18

soapy1 commented Apr 17, 2026

Comment thread conda_index/index/__init__.py Outdated

soapy1 commented Apr 17, 2026

soapy1 force-pushed the cache-keep-around branch from e095641 to cd05ea6 Compare April 17, 2026 21:18

dholth reviewed Apr 17, 2026

Comment thread conda_index/index/cache.py Outdated

soapy1 commented Apr 22, 2026

Comment thread conda_index/index/sqlitecache.py Outdated

dholth reviewed Apr 22, 2026

danyeaw added this to conda Roadmap and Sprint Planning Apr 22, 2026

github-project-automation Bot moved this to Sorting ⚙️ in conda Roadmap and Sprint Planning Apr 22, 2026

danyeaw moved this from Sorting ⚙️ to In Progress 🏗️ in conda Roadmap and Sprint Planning Apr 22, 2026

dholth mentioned this pull request Apr 23, 2026

store_pypi_metadata causes spurious re-extraction of virtual wheel packages conda/conda-pypi#340

Closed

2 tasks

soapy1 self-assigned this Apr 27, 2026

danyeaw assigned dholth Apr 29, 2026

soapy1 force-pushed the cache-keep-around branch from bf866b1 to 55367f2 Compare May 4, 2026 17:47

soapy1 changed the title ~~experiment: Add metadata only stage to repodata generation~~ Add metadata only stage to repodata generation May 4, 2026

dholth reviewed May 4, 2026

Comment thread conda_index/index/sqlitecache.py Outdated

soapy1 force-pushed the cache-keep-around branch 4 times, most recently from d243c9f to 88fa138 Compare May 5, 2026 18:11

soapy1 added 5 commits May 5, 2026 11:12

Add md stage for metadata only cache items

4877af7

store_fs_state should operate on its own stage?

56af5d6

Apply multiple upstream stage to postgres

778d964

Fix tests

a6afe28

Add upstream stages enum

32efad0

soapy1 added 9 commits May 5, 2026 11:12

Extract constant for 'indexed' state

b2c0fe6

Moce UpstreamStages enum to the cache module

5497f73

Add enum for indexed stages

042ef2b

Force metadata state into indexed packages

4be89b1

Back out changes to postgres module

2e6dcc9

Update test cache and postgres cache for new Cache abstract methods

7538d31

Fix typo

002f5a1

Add test for wheels and conda packages together

6748ebb

Add ability to spcify additional stages to pull index data from

c538a50

soapy1 force-pushed the cache-keep-around branch from 88fa138 to c538a50 Compare May 5, 2026 18:12

soapy1 added 5 commits May 5, 2026 11:42

Parameterize index tests for sqlite and psql backends

0eda38c

Changed packages should just compare to the indexed state

bc665a6

Apply include_stages setting to postgres

6c51404

Fix psql tests

abccc39

Ensure shards also pick up the right packages to include in repodata

1c01b91

soapy1 force-pushed the cache-keep-around branch from 802ef0d to 1c01b91 Compare May 5, 2026 20:28

soapy1 commented May 5, 2026

soapy1 force-pushed the cache-keep-around branch from 37d0d0a to d72e5fe Compare May 5, 2026 20:44

soapy1 commented May 5, 2026

soapy1 force-pushed the cache-keep-around branch from b55eae9 to bdd5b90 Compare May 5, 2026 21:14

soapy1 commented May 5, 2026

Comment thread tests/conftest.py Outdated

soapy1 force-pushed the cache-keep-around branch 3 times, most recently from 5fb569e to 0a51a31 Compare May 5, 2026 22:10

Skip postgres tests conditionally

365ee48

soapy1 force-pushed the cache-keep-around branch from 0a51a31 to 365ee48 Compare May 5, 2026 22:19

soapy1 marked this pull request as ready for review May 5, 2026 23:39

soapy1 moved this from In Progress 🏗️ to In review 🔍 in conda Roadmap and Sprint Planning May 5, 2026

soapy1 requested a review from dholth May 20, 2026 17:58

soapy1 wants to merge 23 commits into conda:main from soapy1:cache-keep-around
4 participants