[Core feature] Allow flyteadmin to start even if OIDC is unavailable (Improve flyteadmin startup resiliency) #5701

ddl-rliu · 2024-08-28T19:47:31Z

Motivation: Why do you think this is important?

Today, the flyteadmin pod cannot start until the OIDC provider is healthy and available; until then, the pod is stuck in the Error state. In some Kubernetes configurations, this erroring pod can cause deployment-wide issues. The current behavior could be made more resilient.

(Note that this applies to configurations using useAuth=true.)

Goal: What should the final outcome look like, ideally?

A better approach in these configurations is to allow flyte to start up, even if the OIDC provider is unavailable. Then, try to re-initialize the OIDC provider later in the deployment lifespan. This is a more resilient approach, and it can be made configurable.
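The start-then-retry idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not flyteadmin's actual code and not the change proposed in the linked PR: `LazyProvider`, `InitFunc`, `TryInit`, `RetryUntilReady`, and `flakyInit` are hypothetical names, the provider handle is a plain string placeholder, and the retry interval stands in for whatever configuration knob an implementation might expose.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// InitFunc stands in for whatever call contacts the OIDC provider.
// Hypothetical: this is not flyteadmin's real API.
type InitFunc func() (string, error)

// LazyProvider holds an OIDC handle that may not be initialized yet.
type LazyProvider struct {
	mu       sync.RWMutex
	initFn   InitFunc
	provider string // placeholder for a real provider handle
	ready    bool
}

func NewLazyProvider(initFn InitFunc) *LazyProvider {
	return &LazyProvider{initFn: initFn}
}

// Ready reports whether initialization has succeeded.
func (l *LazyProvider) Ready() bool {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.ready
}

// TryInit makes one initialization attempt. It is a no-op once ready.
func (l *LazyProvider) TryInit() error {
	if l.Ready() {
		return nil
	}
	p, err := l.initFn() // called outside the lock so readers are not blocked
	if err != nil {
		return err
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	l.provider, l.ready = p, true
	return nil
}

// RetryUntilReady retries at a fixed interval until an attempt succeeds
// or ctx is cancelled.
func (l *LazyProvider) RetryUntilReady(ctx context.Context, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := l.TryInit(); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// flakyInit simulates a provider that is down for the first n attempts.
func flakyInit(failures int) InitFunc {
	attempts := 0
	return func() (string, error) {
		attempts++
		if attempts <= failures {
			return "", errors.New("oidc provider unavailable")
		}
		return "oidc-provider", nil
	}
}

func main() {
	lp := NewLazyProvider(flakyInit(2))

	// Startup: a failed first attempt is reported, not fatal.
	if err := lp.TryInit(); err != nil {
		fmt.Println("starting without OIDC:", err)
	}

	// A real server would likely run this loop in a background goroutine.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	if err := lp.RetryUntilReady(ctx, 10*time.Millisecond); err != nil {
		fmt.Println("gave up:", err)
		return
	}
	fmt.Println("OIDC ready:", lp.Ready())
}
```

In a sketch like this, request handlers that need authentication would check `Ready()` and reject requests until the provider has been initialized, rather than the process exiting at startup.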

Describe alternatives you've considered

A workaround is to disable Flyte until the OIDC provider is available.

Propose: Link/Inline OR Additional context

Proposed fix here: #5702

Are you sure this issue hasn't been raised already?

Yes

Have you read the Code of Conduct?

Yes

runllm · 2024-08-28T19:47:46Z

The issue of FlyteAdmin being blocked from starting up until the OIDC provider is healthy is known. According to Flyte's documentation, FlyteAdmin requires an OIDC provider for authentication when useAuth=true. This can cause deployment issues if the OIDC provider is unavailable.

A proposed solution is to allow FlyteAdmin to start even if the OIDC provider is unavailable and re-initialize the OIDC provider later. This approach would improve resiliency and can be made configurable. Currently, a workaround is to disable Flyte until the OIDC provider is available.

Would you like more details on how to implement this or any other specific aspect?

Sources:

Flyte's documentation

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Sovietaced · 2024-10-30T05:27:42Z

> A better approach in these configurations is to allow flyte to start up, even if the OIDC provider is unavailable. Then, try to re-initialize the OIDC provider later in the deployment lifespan. This is a more resilient approach, and it can be made configurable.

This is a double-edged sword. If I recall correctly, there was some recent work to indefinitely cache something related to OIDC on boot. If the OIDC provider is down during a normal rolling deployment, you could end up in a worse state: pods that were previously working get replaced by broken ones.

ddl-rliu added the enhancement (New feature or request) and untriaged (This issues has not yet been looked at by the Maintainers) labels Aug 28, 2024

ddl-rliu mentioned this issue Aug 28, 2024

[Core feature] Allow flyteadmin to start even if OIDC is unavailable (Improve flyteadmin startup resiliency) #5702

Open

3 tasks

eapolinario self-assigned this Sep 5, 2024

eapolinario removed the untriaged (This issues has not yet been looked at by the Maintainers) label Sep 5, 2024
