This issue was moved to a discussion.

Criticality Discussion/ Improvements #213

Closed
kfswain opened this issue Jan 21, 2025 · 2 comments

Comments

@kfswain
Collaborator

kfswain commented Jan 21, 2025

Criticality has been discussed quite a bit, e.g. here and here, as well as in the Inf-GW weekly meeting.

We recognize that this field may be imperfect, but without user feedback it's currently difficult to iterate in the right direction. To centralize discussion, we are creating this issue.

@shaneutt
Member

One of my questions is about how Criticality functions during the entire lifecycle of models:

In practice, will operators have to update large groups of InferenceModels to accommodate the criticality of new InferenceModels? For example, let's say I have 60 models out there: 10 of them Critical, 30 Default, and 20 Sheddable. I have 20 new InferenceModels to deploy, which now need to be the only Critical ones. So do I have to "downgrade" other existing models in order to achieve this? Is this example a reasonable one, and am I understanding correctly?

More essentially: Criticality is a specification of relationships across multiple resources to "rank" them, and any kind of cross-resource relationship in Kubernetes has the potential to introduce complexity. Do we see potential for ways (like the above) in which this could become painful for operators on clusters?

@nirrozenbaum
Contributor

I agree with @shaneutt.
I was also trying to understand: what is this field trying to model?
I was expecting to see prioritization of requests according to the requesting user or something equivalent.
What is the use case where this field is used?

@kubernetes-sigs kubernetes-sigs locked and limited conversation to collaborators Feb 6, 2025
@kfswain kfswain converted this issue into discussion #297 Feb 6, 2025
