Autoscaling Feature spec #86
# Application Autoscaling

## Overview
Autoscaling is a critical capability for modern cloud applications, enabling them to dynamically adjust resources based on demand. Platform engineers and developers need the ability to optimize resource utilization when deploying applications with Radius across different runtime environments. This document outlines the design and user experience for configuring autoscaling policies in Radius applications.
## Autoscaling in the cloud-native ecosystem

The following are the most common autoscaling mechanisms available in the cloud-native ecosystem.
**Kubernetes**
1. **Horizontal Pod Autoscaler (HPA)** - The Kubernetes-native autoscaling mechanism that scales the number of pods in a deployment based on observed CPU utilization, memory usage, and other custom metrics. This is the most common autoscaling mechanism in the Kubernetes ecosystem.
2. **Vertical Pod Autoscaler (VPA)** - A Kubernetes-native autoscaling mechanism that automatically adjusts the CPU and memory requests of pods. This is the least common autoscaling mechanism in the Kubernetes ecosystem, as it requires restarting pods to apply the new resource requests.
3. **KEDA** - Kubernetes Event-driven Autoscaling (KEDA) is an open-source component that enables autoscaling of Kubernetes workloads based on external metrics. KEDA operates on top of the HPA and triggers scaling based on metrics from various sources, such as message queues, databases, or observability platforms.
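To make the HPA mechanism above concrete, the following is an illustrative manifest (the target deployment name `myapp` is an assumption) that scales between 1 and 10 replicas at 50% average CPU utilization:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp        # assumed name of the deployment backing the workload
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```

Any Radius autoscaling abstraction targeting Kubernetes would ultimately render into a manifest of roughly this shape.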

**Serverless container platforms**

1. **Azure Container Instances and Azure Container Apps** - Azure Container Instances doesn't provide a built-in way to scale automatically. Azure Container Apps provides scaling based on HTTP traffic and other event-driven triggers (via KEDA). For web apps, the preferred scaling mechanism is HTTP traffic; for event-driven workloads, the preferred scaling mechanism is KEDA.
2. **Amazon ECS and AWS App Runner** - Amazon ECS (including tasks running on Fargate) provides service auto scaling based on CPU, memory, and other CloudWatch metrics. App Runner provides autoscaling based on concurrent HTTP requests.
3. **Google Cloud Run** - Google Cloud Run provides autoscaling based on incoming HTTP request concurrency.
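For reference, Azure Container Apps expresses HTTP-based scaling as a scale rule on the app's template. A minimal sketch of that section (usable with `az containerapp create --yaml`; the rule name `http-rule` is illustrative):

```yaml
properties:
  template:
    scale:
      minReplicas: 1
      maxReplicas: 10
      rules:
        - name: http-rule
          http:
            metadata:
              concurrentRequests: "50"   # scale out above 50 concurrent requests per replica
```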
**Serverless functions**
Azure Functions, AWS Lambda, and Google Cloud Functions provide autoscaling based on the number of incoming requests and other event-driven triggers. They also offer options to keep a number of warm instances to reduce cold starts.
## Opportunity for Radius

The main opportunity for Radius is to provide a simple abstraction for configuring autoscaling policies, enabling separation of concerns between platform engineers and developers while leveraging the autoscaling mechanisms available in the underlying platforms. The platform engineer should be able to configure autoscaling policies in the environment configuration, and the developer should be able to inherit or override those policies in the application definition.
## Goals

1. **Platform-agnostic scaling** - Enable a platform-agnostic way to configure autoscaling policies for applications deployed using Radius.

2. **Unified scaling model** - Enable a unified and consistent scaling policy model that works with different autoscaling mechanisms.
## Out of scope

The following might be out of scope for the initial release but can be considered for future releases.

**KEDA integration** - Integrate with KEDA to enable event-driven autoscaling for applications deployed using Radius.
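For context on what such an integration would map to, a minimal KEDA `ScaledObject` for a RabbitMQ queue might look like the following (names such as `myapp`, `myqueue`, and `rabbitmq-auth` are illustrative; connection details would come from a separate `TriggerAuthentication` resource):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
spec:
  scaleTargetRef:
    name: myapp          # deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: myqueue
        mode: QueueLength  # scale on the number of messages waiting in the queue
        value: "10"
      authenticationRef:
        name: rabbitmq-auth
```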
## Personas

1. **Platform engineer** - The platform engineer is responsible for setting up the environment where applications are deployed and for ensuring that applications run optimally.

2. **Application developer** - The application developer is responsible for building and deploying applications using Radius. The developer should be able to leverage the autoscaling policies configured by the platform engineer and override them based on their workloads.
## User experience

### Configuring autoscaling in the container resource type

A developer specifies a simple autoscaling configuration (e.g., HPA) in the container resource definition. Below are the options for configuring autoscaling in the container resource type.
#### Option 1: Adding autoscaling to the compute configuration

This enables configuring compute-specific autoscaling policies. For example, if the platform is Kubernetes, the default HPA config can be added to the runtime configuration; for a serverless platform, the default HTTP-based autoscaling config can be added to the runtime configuration.

E.g.: Kubernetes HPA config
```bicep
resource container 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    ports: {}
    runtimes: {
      kubernetes: {
        autoscaling: {
          minReplicas: 1
          maxReplicas: 10
          cpuUtilization: 50
          memoryLimit: 50
        }
      }
    }
  }
}
```

E.g.: ACA HTTP-based autoscaling config

```bicep
resource container 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    ports: {}
    runtimes: {
      serverlessPlatforms: {
        autoscaling: {
          minReplicas: 1
          maxReplicas: 10
          httpConcurrency: 50
        }
      }
    }
  }
}
```
Pros:
1. Easy to understand and configure.
1. The autoscaling config is platform-specific and doesn't have to be standardized across platforms, since different platforms have different autoscaling mechanisms.
1. Enterprises pick platforms based on their application workloads, and the underlying platform provides default autoscaling policies for those workloads. Radius provides a clear abstraction to leverage the default autoscaling mechanisms of the underlying platform.

Cons:
1. More configuration to manage across different platforms.
2. The developer needs to know the underlying platform and the autoscaling policies it requires.
#### Option 2: Adding it as part of extensions

`extensions` have been the way to provide punch-through capabilities that do not change the behavior of the resource. The following is an example of adding autoscaling as part of the extensions.
```bicep
resource container 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    extensions: [
      {
        kind: 'autoscaling'
        minReplicas: 1
        maxReplicas: 10
        cpuUtilization: 50
        memoryLimit: 50
        httpConcurrency: 50
      }
    ]
  }
}
```

Pros:
1. Consistent with the existing manual scaling configuration.

Cons:
1. Not straightforward or intuitive, as the user doesn't know which autoscaling config is required for the platform; it needs a platform-specific discriminator. Arguably, this is not a developer problem and should be handled by the platform engineer.
#### Option 3: Adding it as a top-level common property
```bicep
resource container 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    autoscaling: {
      minReplicas: 1
      maxReplicas: 10
      hpa: {
        cpuUtilization: 50
        memoryLimit: 50
      }
      http: {
        httpConcurrency: 50
      }
    }
  }
}
```

Cons:
1. Same as above: not straightforward or intuitive, as the user doesn't know which autoscaling config is required for the platform. Arguably, this is not a developer problem and should be handled by the platform engineer.
### Configuring autoscaling in the Radius environment

A platform engineer sets up the Radius environment into which applications are deployed. Today, a Radius environment contains the configuration for the container platform, identity provider, and secret store. Since scaling is a platform concern, the platform engineer would also need the ability to configure scaling policies in the environment configuration.
```bicep
resource environment 'Applications.Core/environments@2023-10-01-preview' = {
  name: 'myenv'
  properties: {
    compute: {
      kind: 'kubernetes'
      namespace: 'default'
      identity: {
        kind: 'azure.com.workload'
        oidcIssuer: oidcIssuer
      }
      autoscaling: {
        minReplicas: 1
        maxReplicas: 10
        cpuUtilization: 50
        memoryLimit: 50
      }
    }
  }
}
```
Pros:
1. Autoscaling is an infrastructure concern; the platform engineer can configure the autoscaling policies in the environment configuration, separating the concerns of the platform engineer and the developer.
1. Dev and test environments wouldn't require autoscaling policies, and the platform engineer has the flexibility to configure scaling policies per environment.
1. The platform engineer can configure the scaling policies once, and all applications deployed in the environment will inherit them.
1. The developer can focus on application modelling and not worry about scaling policies. If needed, the developer can override the scaling policies in the application definition, for example to scale based on events from a Kafka or RabbitMQ queue modelled in the application definition.
### Configuring autoscaling as a core resource type

Since autoscaling is a core capability of any underlying platform, it can also be abstracted as a core resource type. The following is an example of configuring autoscaling as a core resource type.
```bicep
resource autoscaling 'Applications.Autoscaling@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    environment: 'myenv'
    minReplicas: 1
    maxReplicas: 10
    hpa: {
      cpuUtilization: 50
      memoryLimit: 50
    }
    http: {
      httpConcurrency: 50
    }
    keda: {
      queue: rabbitmq.properties.queueName
      queueLength: 10
    }
  }
}

resource rabbitmq 'Applications.Messaging/rabbitmqQueues@2023-10-01-preview' = {
  name: 'rabbitmq'
  properties: {
    queueName: 'myqueue'
    // ...
  }
}
```

Pros:
1. Provides a declarative way to manage autoscaling policies across different platforms, consistent with how Azure, AWS, and GCP provide declarative autoscaling configuration for their compute services.
1. Decouples the autoscaling configuration from the container resource type, making it easier to update autoscaling policies without updating the container resource type.
1. Gives flexibility, as the resource type can be modelled within an environment or an application, catering to different enterprise needs. Platform engineers can also control whether their developers manage the autoscaling policies.
1. For advanced autoscaling using KEDA, the developer can declaratively connect to the messaging service (RabbitMQ, Kafka) and configure the autoscaling policies in the application definition.

Cons:
1. Another core resource type for Radius to manage.
## Feature summary

TBD