initial make auto cache plugin #2912

Draft: wants to merge 3 commits into base: master

@dansola (Contributor) commented Nov 7, 2024

Why are the changes needed?

Make caching easier to use in flytekit by reducing the cognitive burden of specifying cache versions.

What changes were proposed in this pull request?

Adds an AutoCache protocol to flytekit with a salt argument and a get_version method. The task decorator passes the task function to get_version to determine a cache version, and that version then follows the same path as a user-created cache version.

Creates an auto_cache plugin that will contain implementations of AutoCache. The first is CacheFunctionBody, which looks only at the function body and ignores formatting and comments.
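For illustration only, here is a minimal sketch of how a protocol with a salt and a get_version method could look, with one implementation that is insensitive to formatting and comments. It is not the code in this PR: the helper name version_from_source, the choice of SHA-256, and the AST-based normalization are all assumptions. Note that in this sketch docstrings and the function name are still part of the parsed tree, so changing either changes the version.

```python
import ast
import hashlib
import inspect
import textwrap
from typing import Any, Callable, Protocol


class AutoCache(Protocol):
    """Assumed shape of the protocol: a salt plus a get_version method."""

    salt: str

    def get_version(self, func: Callable[..., Any]) -> str: ...


class CacheFunctionBody:
    """Illustrative implementation that hashes the parsed function, not its raw text."""

    def __init__(self, salt: str = "") -> None:
        self.salt = salt

    def get_version(self, func: Callable[..., Any]) -> str:
        # dedent so nested or indented definitions still parse on their own
        source = textwrap.dedent(inspect.getsource(func))
        return self.version_from_source(source)

    def version_from_source(self, source: str) -> str:
        # Hypothetical helper: parsing drops comments and whitespace, and
        # ast.dump (without position attributes) gives a canonical string.
        canonical = ast.dump(ast.parse(source))
        return hashlib.sha256((canonical + self.salt).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    hasher = CacheFunctionBody(salt="example-salt")
    original = "def f(x):\n    return x + 1\n"
    reformatted = "def f(x):\n    # a comment\n    return x  +  1\n"
    changed = "def f(x):\n    return x + 2\n"
    print(hasher.version_from_source(original) == hasher.version_from_source(reformatted))
    print(hasher.version_from_source(original) == hasher.version_from_source(changed))
```

Hashing the AST dump rather than the source text is one way to get the "ignores formatting and comments" behavior; other normalizations would work too.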

How was this patch tested?

Adds unit tests that verify versions are consistent and change when expected. Because changing the function name changes the hash, the dummy functions live in a separate directory and are imported, so the name stays the same while the tests confirm that the hash changes when the contents change.
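A rough sketch of that testing idea, not the actual test suite: each variant of a function named dummy is written to its own directory, imported from there, and hashed, so the name stays fixed while the body varies. The helpers body_hash, load_function, and hash_variants are hypothetical names, and the temporary directories stand in for the checked-in dummy modules.

```python
import ast
import hashlib
import importlib.util
import inspect
import tempfile
import textwrap
from pathlib import Path
from typing import Any, Callable


def body_hash(func: Callable[..., Any]) -> str:
    """Hash the parsed source of func, ignoring comments and formatting."""
    source = textwrap.dedent(inspect.getsource(func))
    return hashlib.sha256(ast.dump(ast.parse(source)).encode("utf-8")).hexdigest()


def load_function(path: Path, module_name: str, func_name: str) -> Callable[..., Any]:
    """Import a module from an explicit file path and return one of its functions."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func_name)


def hash_variants(bodies: list[str]) -> list[str]:
    """Write each body as `dummy` in its own directory, import it, and hash it."""
    hashes = []
    with tempfile.TemporaryDirectory() as tmp:
        for index, body in enumerate(bodies):
            directory = Path(tmp) / f"v{index}"
            directory.mkdir()
            path = directory / "dummy.py"
            path.write_text(f"def dummy(x):\n    {body}\n")
            func = load_function(path, f"dummy_v{index}", "dummy")
            # Hash while the file still exists, since getsource reads it from disk.
            hashes.append(body_hash(func))
    return hashes


if __name__ == "__main__":
    same, commented, different = hash_variants(
        ["return x + 1", "return x + 1  # note", "return x + 2"]
    )
    print(same == commented, same == different)
```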

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

@dansola (Contributor, Author) commented Nov 7, 2024

@eapolinario @cosmicBboy Here is a draft PR doing roughly what we talked about. It would be great to get your opinions before I implement more hashing methods. A couple of questions:

  1. Is flytekit/core/task.py the right place to hash the task function body? That is, in the decorator, right before we pass the cache version to TaskMetadata.
  2. Do we still like the idea of using protocols and having a different implementation for each concern, such as the function body, imports, imagespec, external packages, etc.? Right now cache takes a list[AutoCache] and we combine the hashes (we can hash the combined result later so the string isn't long), but is that the most user-friendly option compared to, for example, a dataclass config with a series of boolean arguments?
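To make the second question concrete, here is a small sketch of combining the versions from a list of AutoCache objects into one short string by hashing them together. The names combined_version and ConstantVersion are made up for this example, and length-prefixing each part is just one assumed way to avoid ambiguous concatenations.

```python
import hashlib
from typing import Any, Callable, Protocol, Sequence


class AutoCache(Protocol):
    """Assumed minimal protocol surface for this example."""

    def get_version(self, func: Callable[..., Any]) -> str: ...


class ConstantVersion:
    """Stand-in implementation that always returns a fixed version string."""

    def __init__(self, value: str) -> None:
        self.value = value

    def get_version(self, func: Callable[..., Any]) -> str:
        return self.value


def combined_version(func: Callable[..., Any], policies: Sequence[AutoCache]) -> str:
    """Fold every policy's version into one fixed-length hex digest."""
    digest = hashlib.sha256()
    for policy in policies:
        part = policy.get_version(func).encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def my_task(x: int) -> int:
    return x + 1


if __name__ == "__main__":
    policies = [ConstantVersion("body-v1"), ConstantVersion("imports-v1")]
    print(combined_version(my_task, policies))
```

One thing to decide with this shape is whether order should matter: as written, reordering the list changes the combined version, which a boolean-flag dataclass would avoid by construction.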
