-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Danielsola/se 187 experiment with auto cache versioning v2 #2750
base: master
Are you sure you want to change the base?
Danielsola/se 187 experiment with auto cache versioning v2 #2750
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any overhead when hashing all the modules? If so, I think we need
- Move
hash_module_recursive
,hash_set
, and other methods out from AutoCache, and a@lru_cache
for each function, so we won't hash same module twice if there are multiple tasks in the workflow. - only calculate the hash at compilation time.
If I install flytekit from a local directory (pip install -e .
). will it generate a new hash whenever I change the flytekit code?
imported_module = importlib.import_module(alias.name) | ||
imported_modules.append(imported_module) | ||
|
||
def visit_ImportFrom(self, node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def visit_ImportFrom(self, node): | |
def visit_import_from(self, node): |
from flytekit.auto_cache.auto_cache import AutoCacheConfig | ||
|
||
|
||
@task(auto_cache_config=AutoCacheConfig(check_task=True, check_modules=True, check_packages=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a default AutoCacheConfig to the task and set all the parameters to true, so users don't need to specify it by default. However, I'm not sure if it's too aggressive.
This is pretty cool. There's enough interest in a feature like this that we have at least 3 different PRs tackling this problem at different levels (See the linked PRs in the original issue). @dansola , do you want to take a stab at writing an RFC before deciding on one approach? |
Why are the changes needed?
Automatic cache versioning might make UX easier in that cache versions don't need to be bumped manually each time a task is changed. Forgetting to manually change the cache version could result in confusing behavior as Flyte will hit the cache and not execute new code.
This code was thrown together to prototype the idea - its a bit messy and could be optimized a lot. Would like to get thoughts and opinions from the community here. There're a lot of considerations including non-python dependencies but this might be a starting point for scoping out this feature.
What changes were proposed in this pull request?
Added a new argument to the task decorator with is a
AutoCacheConfig
:This let's you determine what is checked when automatically generating a cache version for a task.
check_task
-> this just checks the code contents of the task ignoring any comments or formatting changescheck_packages
-> this traverses the imports in the task and checks the package version of any externally imported packagescheck_modules
-> this traverses the imports in the task and checks the contents of any modules that were imported and are defined in the current repoimage_spec
-> this checks the tag of an imagespecHow was this patch tested?
Test case has this structure:
Some example of cache version given different conditions:
auto_cache_config = AutoCacheConfig()
:No matter the change to any module, the cache version doesn't change.
auto_cache_config = AutoCacheConfig(check_task=True)
:No matter the change to
something_to_import.py
orsomething_to_import_2.py
the config version doesn't change. The following changes to the task affect the cache version:config version =
346f4b191fa0471e397496636b1b5df04cbbb3332eb9847edc9aa5b435883109
config version =
346f4b191fa0471e397496636b1b5df04cbbb3332eb9847edc9aa5b435883109
(same as original)config version =
5b6d0622517b6d68bd162bc49ac0eeede688579eaa8b1c5e9230e7d06d14ad73
(different than original)The above behavior will be consistent no matter the other arguments to
AutoCacheConfig
.auto_cache_config = AutoCacheConfig(check_modules=True)
:Changes to something_to_import.py and something_to_import_2.py will affect cache version (formatting does have an affect right now).
config version =
30ad245f8785b25188f303a199d5155c552ab4b08274c8f9dc858dab12f04513
config version =
e07532a55732b01cd71821d6731c5596660d88ce40ecb68505a17c8c5fc30275
(different than original)config version =
d6747baae014c66c14d320e8d02d374409668df25da97df79c144f5687a055b6
(different than original)The above behavior will be consistent no matter the other arguments to
AutoCacheConfig
.auto_cache_config = AutoCacheConfig(check_packages=True)
:Changes to imported packages affect cache version:
config version =
0fc6300d289f6cfbc66179dc07c48fe5ede106f9e29455a83733892e2b7981cc
config version =
e3d453686c0ea4f60a1839a026f4613b5946f8518ec8e5c8ac7c219d727bc369
(different than original)config version =
0fc6300d289f6cfbc66179dc07c48fe5ede106f9e29455a83733892e2b7981cc
(different than original)The above behavior will be consistent no matter the other arguments to
AutoCacheConfig
.auto_cache_config = AutoCacheConfig(image_spec=True)
:config version =
04f01bb5e35b47eee2a928c125f1ccb8ab65ca17a61859b38b64e5bfde17e823
config version =
336a800c4d1c16726b6e1de10c28e60264d1c58d587302c3a5f2b918da53f3b0
(different than original)The above behavior will be consistent no matter the other arguments to
AutoCacheConfig
.See the
auto_cache_examples
dir for the examples used here.Screenshots
Check all the applicable boxes
Related PRs
Docs link