Add support for locking kernels #10

Merged
merged 12 commits into from
Jan 21, 2025

danieldk
Member

@danieldk danieldk commented Jan 20, 2025

  • hf-kernels lock . locks the kernels specified in a project's pyproject.toml. Building a package with the setuptools build backend, which is the default for pyproject.toml, will add the lock file to the package's metadata. A kernel can then be downloaded and loaded at the locked version using get_locked_kernel.
  • hf-kernels download . downloads the locked kernels to the HF cache directory. The kernel can then be loaded using load_kernel (which is a small wrapper for get_locked_kernel).

PR uploaded as hf-kernels 0.1.1 to PyPI for testing.

In the future, we want to be able to specify versions in pyproject.toml (which then get locked), but that's for a later PR.

This change allows Python projects that use kernels to lock the
kernel revisions on a per-project basis. For this to work, the user
only has to include `hf-kernels` as a build dependency. During
the build, a lock file is written to the package's pkg-info.
At runtime, we read it back and use the corresponding
revision. When the kernel is not locked, the revision that is provided
as an argument is used.
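
A minimal sketch of the runtime lookup described above, assuming the lock data has already been read from the package metadata into a plain dict that maps a repo id to a commit SHA. The names `resolve_revision` and `locks` and the placeholder SHA are hypothetical and are not taken from hf-kernels:

```python
from typing import Mapping, Optional


def resolve_revision(
    locks: Mapping[str, str], repo_id: str, revision: Optional[str] = None
) -> str:
    """Pick the revision to load for `repo_id`.

    A locked SHA wins; otherwise fall back to the revision argument.
    """
    locked_sha = locks.get(repo_id)
    if locked_sha is not None:
        return locked_sha
    if revision is None:
        raise ValueError(
            f"Kernel `{repo_id}` is not locked and no revision was given"
        )
    return revision


# Hypothetical lock data; the SHA is a placeholder, not a real commit.
locks = {"kernels-community/activation": "0123abc"}
print(resolve_revision(locks, "kernels-community/activation", "main"))
print(resolve_revision(locks, "some-org/unlocked-kernel", "main"))
```

The first call returns the locked SHA even though `main` was passed; the second has no lock entry, so the argument is used.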

if locked_sha is None:
    raise ValueError(f"Kernel `{repo_id}` is not locked")

package_name, package_path = install_kernel(
Collaborator

load_kernel is supposed to skip the download entirely. This reintroduces it.

hf_hub_download(..., local_files_only=True) is the only public API I found to IGNORE downloading.

Member Author

If I don't have the model cached and use load_kernel, I get the following error:

 % python -c 'import kernel_test; kernel_test.run()'
[...]
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/daniel/.cache/huggingface/hub/models--kernels-community--activation/snapshots/a71853ecbdd899526f9810cc558ee24081a6302e/build/torch25-cxx98-cu124-x86_64-linux/activation/__init__.py'

The error could be better, but it doesn't seem to download?

I did forget to pass through local_files_only to get_metadata. Pushing a fix for that now. Then it fails even earlier:

 % python -c 'import kernel_test; kernel_test.run()'
[...]
huggingface_hub.errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

Collaborator

What I mean is: let's keep the actual local_files_only code separate. Otherwise it's super easy to screw up and reintroduce the internet connection. (Better yet if we could sidestep that bad API altogether.)

Member Author

Ok, made separate again.
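
One way to keep the offline path free of any download code is to resolve the cached snapshot directory with the standard library only. This is a rough sketch, not the code in this PR: the helper name `cached_snapshot_dir` is hypothetical, and the directory layout is assumed from the path shown in the traceback above.

```python
import tempfile
from pathlib import Path


def cached_snapshot_dir(cache_dir: Path, repo_id: str, revision: str) -> Path:
    """Return the snapshot directory of a cached repo without any network access."""
    # Layout assumed from the traceback: models--<org>--<name>/snapshots/<sha>
    repo_dir = "models--" + repo_id.replace("/", "--")
    snapshot = cache_dir / repo_dir / "snapshots" / revision
    if not snapshot.is_dir():
        raise FileNotFoundError(
            f"Kernel `{repo_id}` at revision {revision} is not in the local "
            f"cache ({snapshot}); run `hf-kernels download .` first."
        )
    return snapshot


# Usage against a throwaway cache directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    sha = "a71853ecbdd899526f9810cc558ee24081a6302e"
    (cache / "models--kernels-community--activation" / "snapshots" / sha).mkdir(
        parents=True
    )
    print(cached_snapshot_dir(cache, "kernels-community/activation", sha).name)
    try:
        cached_snapshot_dir(cache, "kernels-community/activation", "missing")
    except FileNotFoundError as err:
        print(err)
```

Because this function has no code path that can download, a later refactor cannot reintroduce network access by accident, and the error names the missing kernel instead of a deep `__init__.py` path.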


file_locks = []
for sibling in r.siblings:
    if sibling.rfilename.startswith("build/torch"):
Collaborator

Shouldn't we filter by version here too? (Maybe in a subsequent PR?)

Member Author

The lockfile should contain all build variants, because we don't know what Torch/CUDA version a downstream user will have.

Or did you mean kernel version? If so, one particular commit is only supposed to have one version.

Collaborator

Yes I meant kernel version/version range.
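
To illustrate why every build variant ends up in the lock file, here is a small sketch that groups repo files by variant directory. It assumes the repo listing is available as plain path strings; `group_build_variants` is a hypothetical name, and the second variant in the sample data is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_build_variants(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group files under build/torch* by their variant directory."""
    variants: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        if not name.startswith("build/torch"):
            continue  # not part of a build variant
        parts = name.split("/")
        if len(parts) < 3:
            continue  # expect build/<variant>/<file...>
        variants[parts[1]].append(name)
    return dict(variants)


# Sample listing; the cxx11 variant is hypothetical.
files = [
    "README.md",
    "build/torch25-cxx98-cu124-x86_64-linux/activation/__init__.py",
    "build/torch25-cxx11-cu124-x86_64-linux/activation/__init__.py",
]
for variant, paths in group_build_variants(files).items():
    print(variant, len(paths))
```

All variants are kept because the Torch/CUDA combination of a downstream user is only known at load time.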

return

lock_path = cwd / "hf-kernels.lock"
if not lock_path.exists():
Collaborator

Isn't that always true?

How can the lockfile exist before it's created?

Member Author

@danieldk danieldk Jan 20, 2025

I think it could happen in an editable install:

  • Create the lockfile.
  • Do an editable install. (the lockfile gets written into the package info)
  • Remove the lockfile.
  • Do an editable install.

Though I still need to check whether this is the case.

Member Author

Forgot to add: this checks for the lock file in the project's source directory, where it has to exist before running the install if you want to lock the versions.
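
A sketch of that check in isolation, assuming the build hook receives the project's source directory. `read_project_lock` is a hypothetical helper, and returning `None` for a missing lock file is an assumption; the diff excerpt does not show what the real code does in that branch.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_project_lock(project_dir: Path) -> Optional[str]:
    """Return the contents of hf-kernels.lock, or None if the project has none."""
    lock_path = project_dir / "hf-kernels.lock"
    if not lock_path.exists():
        # Assumption: without a lock file there is nothing to write to the metadata.
        return None
    return lock_path.read_text()


# Usage with a throwaway project directory.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    print(read_project_lock(project))  # no lock file yet
    (project / "hf-kernels.lock").write_text("[]")
    print(read_project_lock(project))
```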

@danieldk danieldk merged commit 544354c into main Jan 21, 2025
3 checks passed
@danieldk danieldk deleted the kernels-lock branch January 21, 2025 15:08