[Feature] Datacatalog cache deletion #4655

Status: Open. 69 commits to be merged into base branch master.

@pvditt pvditt commented Dec 30, 2023

Tracking issue

#2867

Why are the changes needed?

Flyte does not support deleting cached task executions. Adding cache eviction gives Flyte users more control over their data, which helps with GDPR compliance and with reducing cache size.

This PR supports evicting the cache of single task executions only, because invalidating an entire workflow's cache can run into timeouts while traversing nested nodes.

What changes were proposed in this pull request?

  • add flyteadmin endpoint to support the deletion of single task executions
  • add catalog endpoint to support the deletion of single task executions

How was this patch tested?

  1. Unit tests

  2. Ran a workflow that caches task executions, then evicted the cached task execution through the new endpoint:

grpcurl -plaintext -d '{
  "task_execution_id": {
    "node_execution_id": {
      "execution_id": {"project": "flytesnacks", "domain": "development", "name": "fbcf25e4303104b4b81e"},
      "node_id": "n0"
    },
    "task_id": {"project": "flytesnacks", "domain": "development", "name": "basics.cache_test.basic_cache", "version": "98XuDRSQW4hVWKYQdQfd1Q=="},
    "retry_attempt": 0
  }
}' localhost:8089 flyteidl.service.CacheService.EvictTaskExecutionCache
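
For repeated manual testing, the request body above can be assembled programmatically. The following is a minimal Python sketch and not part of this PR: `build_evict_request` and `build_grpcurl_command` are hypothetical helper names, and it assumes the task lives in the same project and domain as the execution, as in the example call. It only prints the command and does not run it.

```python
import json
import shlex


def build_evict_request(project, domain, execution_name, node_id,
                        task_name, task_version, retry_attempt=0):
    """Build the JSON body shown in the grpcurl call above (hypothetical helper)."""
    return {
        "task_execution_id": {
            "node_execution_id": {
                "execution_id": {
                    "project": project,
                    "domain": domain,
                    "name": execution_name,
                },
                "node_id": node_id,
            },
            # Assumption: the task shares the execution's project and domain.
            "task_id": {
                "project": project,
                "domain": domain,
                "name": task_name,
                "version": task_version,
            },
            "retry_attempt": retry_attempt,
        }
    }


def build_grpcurl_command(payload, address="localhost:8089"):
    """Return the grpcurl argument list for the eviction endpoint (hypothetical helper)."""
    return [
        "grpcurl", "-plaintext",
        "-d", json.dumps(payload),
        address,
        "flyteidl.service.CacheService.EvictTaskExecutionCache",
    ]


payload = build_evict_request(
    project="flytesnacks",
    domain="development",
    execution_name="fbcf25e4303104b4b81e",
    node_id="n0",
    task_name="basics.cache_test.basic_cache",
    task_version="98XuDRSQW4hVWKYQdQfd1Q==",
)
command = build_grpcurl_command(payload)

# shlex.join quotes the JSON body so the line can be pasted into a shell.
print(shlex.join(command))
```

Building the command as an argument list keeps the JSON body as a single argument, so it needs no manual quoting if it is later passed to `subprocess.run`.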

Setup process

        enum CatalogCacheStatus {
            CACHE_DISABLED = 0,
            CACHE_MISS = 1,
            CACHE_HIT = 2,
            CACHE_POPULATED = 3,
            CACHE_LOOKUP_FAILURE = 4,
            CACHE_PUT_FAILURE = 5,
            CACHE_SKIPPED = 6,
            CACHE_EVICTED = 7
        }
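
When checking the result in a script, the status values listed above can be mirrored locally. This is a minimal Python sketch, assuming the numeric values match the listing above. It is not the generated flyteidl code, and `was_evicted` is a hypothetical helper.

```python
from enum import IntEnum


class CatalogCacheStatus(IntEnum):
    """Local mirror of the status values listed above (not generated code)."""
    CACHE_DISABLED = 0
    CACHE_MISS = 1
    CACHE_HIT = 2
    CACHE_POPULATED = 3
    CACHE_LOOKUP_FAILURE = 4
    CACHE_PUT_FAILURE = 5
    CACHE_SKIPPED = 6
    CACHE_EVICTED = 7


def was_evicted(status):
    """Return True if a raw status value corresponds to CACHE_EVICTED."""
    return CatalogCacheStatus(status) is CatalogCacheStatus.CACHE_EVICTED


# A raw value of 7 maps to the status added for eviction.
print(CatalogCacheStatus(7).name, was_evicted(7))
```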

(NOTE: a follow-up PR will add the new status to console, since console does not yet support the CACHE_EVICTED CatalogCacheStatus.)

  • The eviction can also be verified by checking datacatalog's artifact_data and artifacts tables.

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Nick Müller and others added 26 commits December 15, 2022 19:17
Implemented new datacatalog functionality required for cache eviction
Updated to latest unreleased version of flyteidl and flyteplugins

Signed-off-by: Nick Müller <[email protected]>
Allows for fields to be explicitly set/updated to nil

Signed-off-by: Nick Müller <[email protected]>
Allows for re-use by cache manager

Signed-off-by: Nick Müller <[email protected]>
Added endpoint for evicting execution cache
Added endpoint for evicting task execution cache

Signed-off-by: Nick Müller <[email protected]>
Extended reservation retrieval to allow querying via artifact tag in catalog client interface

Signed-off-by: Nick Müller <[email protected]>
Added method to delete catalog artifact by ID

Signed-off-by: Nick Müller <[email protected]>
…epropeller and flytestdlib

Signed-off-by: Nick Müller <[email protected]>
Added new CacheEvictionError message representing an error encountered during eviction of stored data
Added new UpdateTaskExecution endpoint for updating task executions, currently only supporting cache eviction

Signed-off-by: Nick Müller <[email protected]>
Ran go mod tidy

Signed-off-by: Nick Müller <[email protected]>
grpc-gateway parsing of URL params does not work for joined endpoint at the moment - fixed in major version upgrade
Added extra CacheEvictionErrorCode enum entries

Signed-off-by: Nick Müller <[email protected]>
…artifacts to datacatalog

Signed-off-by: Nick Müller <[email protected]>
Implement deleting of artifacts as bulk operation

Signed-off-by: Nick Müller <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>

codecov bot commented Dec 30, 2023

Codecov Report

Attention: Patch coverage is 67.56098%, with 133 lines in your changes missing coverage. Please review.

Project coverage is 59.07%. Comparing base (e07084c) to head (9e322a8).

| Files | Patch % | Lines |
|---|---|---|
| flyteadmin/pkg/manager/impl/cache_manager.go | 68.18% | 65 Missing and 5 partials ⚠️ |
| datacatalog/pkg/repositories/gormimpl/artifact.go | 75.92% | 9 Missing and 4 partials ⚠️ |
| flytestdlib/catalog/client.go | 0.00% | 11 Missing ⚠️ |
| flytestdlib/catalog/datacatalog/transformer.go | 0.00% | 10 Missing ⚠️ |
| flytestdlib/catalog/noop_catalog.go | 0.00% | 10 Missing ⚠️ |
| ...atacatalog/pkg/manager/impl/reservation_manager.go | 0.00% | 5 Missing and 2 partials ⚠️ |
| datacatalog/pkg/manager/impl/artifact_manager.go | 88.00% | 5 Missing and 1 partial ⚠️ |
| flytestdlib/catalog/datacatalog/datacatalog.go | 87.23% | 4 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4655      +/-   ##
==========================================
+ Coverage   59.00%   59.07%   +0.07%     
==========================================
  Files         645      647       +2     
  Lines       55578    55972     +394     
==========================================
+ Hits        32792    33065     +273     
- Misses      20194    20301     +107     
- Partials     2592     2606      +14     

| Flag | Coverage Δ |
|---|---|
| unittests | 59.07% <67.56%> (+0.07%) ⬆️ |
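
The figures in this report can be cross-checked by hand. The sketch below is a minimal Python check that uses only numbers quoted in the report. It assumes that coverage is hits divided by total lines, rounded to two decimals, and that the 133 uncovered lines count both missing and partial lines. `coverage_percent` is a hypothetical helper.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, rounded to two decimals (assumed display rule)."""
    return round(100.0 * hits / lines, 2)


# Totals copied from the coverage diff in the report.
base = {"hits": 32792, "misses": 20194, "partials": 2592, "lines": 55578}
head = {"hits": 33065, "misses": 20301, "partials": 2606, "lines": 55972}

# Hits, misses, and partials should add up to the line count on each side.
for side in (base, head):
    assert side["hits"] + side["misses"] + side["partials"] == side["lines"]

base_cov = coverage_percent(base["hits"], base["lines"])
head_cov = coverage_percent(head["hits"], head["lines"])
delta = round(head_cov - base_cov, 2)

# Missing plus partial lines per file, in the order the report lists them.
uncovered_per_file = [65 + 5, 9 + 4, 11, 10, 10, 5 + 2, 5 + 1, 4 + 2]
total_uncovered = sum(uncovered_per_file)

print(base_cov, head_cov, delta, total_uncovered)
```

Under these assumptions the computed values agree with the reported 59.00%, 59.07%, +0.07%, and 133 uncovered lines.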

Flags with carried forward coverage won't be shown.

…on-past-executions' into feature/datacatalog-cache-deletion
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>
@pvditt pvditt mentioned this pull request Jan 9, 2024
3 tasks
@pvditt pvditt marked this pull request as ready for review January 31, 2024 20:07
@dosubot dosubot bot added the size:XXL and enhancement labels Jan 31, 2024
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>

@hamersaw hamersaw left a comment

Let's clean up imports and resolve merge conflicts, and then merge!

flyteidl/protos/flyteidl/service/cache.proto (outdated, resolved)
flytepropeller/pkg/controller/nodes/array/handler_test.go (outdated, resolved)
flytepropeller/pkg/controller/nodes/executor_test.go (outdated, resolved)
flytepropeller/pkg/controller/workflow/executor_test.go (outdated, resolved)
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Paul Dittamo <[email protected]>
@pvditt pvditt requested a review from hamersaw February 23, 2024 04:43

pvditt commented Mar 12, 2024

Put up the change for console: flyteorg/flyteconsole#851

Labels: enhancement (New feature or request), size:XXL (This PR changes 1000+ lines, ignoring generated files.)