Add metadata only stage to repodata generation #284
| self.created_at = now_dt.strftime("%Y-%m-%dT%H:%M:%SZ") | ||
|
|
||
| def cache_for_subdir(self, subdir): | ||
| def cache_for_subdir(self, subdir, stage: str | None = None): |
A different approach might make the cache layer know about what all the desired upstream stages are instead of getting a cache by subdir and stage.
By moving the responsibility of keeping track of the upstream stages to the cache layer, we can avoid the merging logic below. This could be replaced by accounting for the multiple stages in the query phase.
The database layer is the right place to merge stages. It could be accounted for in cache_kwargs when creating ChannelIndex, or in indexed_packages() when performing the query.
Let's go for including this in the cache_kwargs. We'll define a new attribute include_stages which accepts a list of stages to consider in addition to upstream_stage.
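A minimal sketch of how a cache could combine upstream_stage with include_stages, assuming both arrive through cache_kwargs. StageAwareCache and its stages property are hypothetical names used only for illustration; they are not the conda-index implementation.

```python
from __future__ import annotations


class StageAwareCache:
    """Hypothetical stand-in for a cache class (not the conda-index API)."""

    def __init__(self, upstream_stage: str = "fs", include_stages: list[str] | None = None):
        self.upstream_stage = upstream_stage
        self.include_stages = list(include_stages or [])

    @property
    def stages(self) -> list[str]:
        # upstream_stage comes first, then the extra stages.
        # dict.fromkeys drops duplicates while preserving order.
        return list(dict.fromkeys([self.upstream_stage, *self.include_stages]))


# Assumed shape of the kwargs a caller would pass along.
cache_kwargs = {"upstream_stage": "fs", "include_stages": ["md"]}
cache = StageAwareCache(**cache_kwargs)
print(cache.stages)
```

With this shape the query layer only has to ask the cache for one list of stages, so no merging is needed after the query.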
e095641 to
cd05ea6
| WITH | ||
| fs AS | ||
| ( SELECT path, mtime, size, sha256, md5 FROM stat WHERE stage = :upstream_stage ), | ||
| ( SELECT path, mtime, size, sha256, md5 FROM stat WHERE stage IN ({stages_placeholders}) ), |
We'll have to check what happens when path is in fs and md at the same time. We would get the same index_json twice, maybe, but it would also overwrite itself in the output dict.
yes! Added a test in tests/test_index.py::test_index_noarch_with_wheels. This will put wheels and noarch conda packages in the same noarch subdir.
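To make the concern concrete, here is a small sketch using an in-memory SQLite table. The table layout, the sample rows, and the positional "?" placeholders are assumptions for illustration; the real query may bind its parameters differently. It shows one placeholder per stage being generated, and a path present in both fs and md coming back twice and then collapsing to one entry in the output dict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE stat (stage TEXT, path TEXT, mtime INTEGER, size INTEGER, sha256 TEXT, md5 TEXT)"
)
conn.executemany(
    "INSERT INTO stat VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("fs", "noarch/a-1.0-0.conda", 10, 100, "aa", None),
        ("md", "noarch/a-1.0-0.conda", 20, 100, "aa", None),  # same path in both stages
        ("md", "noarch/b-2.0-py3-none-any.whl", 30, 200, "bb", None),
        ("indexed", "noarch/c-3.0-0.conda", 40, 300, "cc", None),  # not selected
    ],
)

stages = ["fs", "md"]
# One "?" per stage. Only the placeholders are interpolated into the SQL;
# the stage values themselves are bound as parameters.
stages_placeholders = ", ".join("?" for _ in stages)
query = (
    "SELECT path, mtime, size, sha256, md5 FROM stat "
    f"WHERE stage IN ({stages_placeholders}) ORDER BY path, stage"
)
rows = conn.execute(query, stages).fetchall()

packages = {}
for row in rows:
    # A later row for the same path silently overwrites the earlier one.
    packages[row["path"]] = dict(row)

print(len(rows), len(packages))
```

Which row wins depends on row order, so an explicit ORDER BY (or a uniqueness rule on path) is worth deciding on rather than leaving to chance.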
|
Do you think |
|
hmmm, I was thinking of |
d243c9f to
88fa138
| stat["size"], | ||
| stat["mtime"], | ||
| {}, | ||
| stat["repodata"], |
This means that when a user writes a function like listdir_like, they also need to add this repodata key. That isn't very listdir_stat-like.
For example, something like
def listdir_like():
    for path, repodata in wheels.items():
        assert "sha256" in repodata
        if "md5" not in repodata:
            repodata["md5"] = None
        yield {
            "path": cache.database_path(path),
            "size": repodata["size"],
            "mtime": repodata.get("timestamp", 1),
            "repodata": repodata,
        }
This is maybe a bit too much of a hack. @dholth what do you think?
| "msgpack", | ||
| "psycopg2", | ||
| "ruamel.yaml", | ||
| "sqlalchemy", |
hmmm, I expect these dependencies were not included for some reason. After fixing the psql cache import errors in the tests, I got new errors saying that these dependencies are missing. Is there a strong incentive to keep these out of the main package? An alternative is to add them as optional dependencies.
|
This program is written to work with "low dependencies", so it's necessary for |
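If these stay optional, one way to keep the base install light is to check for them only when the postgres cache is actually requested. This is a sketch under that assumption; is_available, require_postgres_deps, and the "postgres" extra name in the error message are hypothetical.

```python
import importlib.util


def is_available(name: str) -> bool:
    """Return True if the module can be found without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names (e.g. "ruamel.yaml") when the parent is missing.
        return False


def require_postgres_deps(modules=("psycopg2", "sqlalchemy")) -> None:
    """Fail early with a readable message instead of a late ImportError."""
    missing = [name for name in modules if not is_available(name)]
    if missing:
        raise ImportError(
            "postgres cache needs optional dependencies: "
            + ", ".join(missing)
            + " (hypothetical extra: pip install 'conda-index[postgres]')"
        )
```

Tests that need the postgres cache could call is_available to skip explicitly, which makes a skipped test visible rather than silent.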
5fb569e to
0a51a31
Description
This introduces a new upstream stage called md. The existing stages in conda-index are:
fs - means that the artifact is now available in the set of packages and is assumed by default to be on the local filesystem
indexed - means that the entry already exists in the database (same filename, same timestamp, same hash), and its package metadata has been extracted to index_json etc.
This PR adds md, which is like fs in that it represents an artifact that is now available, but it is assumed not to be available on the local filesystem. md is a sibling to the indexed stage. This is helpful for indexing artifacts that cannot be represented on the local filesystem, for example PyPI packages.
To include metadata-sourced packages in repodata, be sure to add include_stages and package_extensions to ChannelIndex.cache_kwargs during its instantiation. For example:
And inject the metadata using the store_md_state function:
See tests/test_demonstrate_wheel.py for how this changes the API for injecting wheel data into repodata.
ref: conda/conda-pypi#276 (comment)
Other notable changes
from conda_index.postgres.cache import PsqlCache tests/test_psql.py - looks like these tests have been getting skipped
conda_index.conda_index.cache.BaseCondaIndexCache.database_path to only prepend the database prefix if the prefix is not already included
Checklist - did you ...
news directory (using the template) for the next release's release notes?
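For reference, the database_path change listed under the other notable changes can be sketched as a standalone function. The signature here is an assumption made for illustration; in conda-index this is a method on the cache class and the prefix comes from the cache itself.

```python
def database_path(path: str, prefix: str) -> str:
    """Prepend prefix to path unless path already starts with it."""
    if prefix and path.startswith(prefix):
        return path  # already prefixed, leave untouched
    return f"{prefix}{path}"


once = database_path("a-1.0-0.conda", "noarch/")
twice = database_path(once, "noarch/")
print(once, twice)
```

Making the call idempotent means a listdir_like function can pass either a bare filename or an already-prefixed path without producing a doubled prefix.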