Conversation
Signed-off-by: Reshma Abdul Rahim <reshmarahim.abdul@microsoft.com>
> #### Option 2: Adding it as part of extensions
> `extensions` have been the way to provide punch-through capabilities which do not change the behaviors of the resource. The following is an example of adding autoscaling as part of the extensions.

Minor correction: `runtimes` is the punch-through mechanism, not `extensions`.
> Cons: not straightforward or intuitive, as the user doesn't know which autoscale config is required for the platform; needs a platform-specific discriminator.
> Maybe this is not a developer problem and should be handled by the platform operator.
> #### Option 3: Adding it as a top-level common property

I think Option 3 makes the most sense IF HPA and/or HTTP autoscaling and their corresponding properties are common across container platforms.
> Cons: same as above, not straightforward or intuitive, as the user doesn't know which autoscale config is required for the platform. Maybe this is not a developer problem and should be handled by the platform operator.
> ### Configuring Autoscaling in Radius Environment

Would autoscaling policies specified in the container override the autoscaling policies set for the environment?

> 1. The platform engineer can configure the scaling policies once, and all the applications deployed in the environment will inherit the scaling policies.
> 1. The developer can focus on the application modelling and not worry about the scaling policies. If needed, the developer can override the scaling policies in the application resource definition; for example, if the developer wants to scale based on events from Kafka or RabbitMQ modelled in the application definition.
> ### Configuring Autoscaling as a core resource type

I don't follow this completely. I see Radius resources as objects it needs to provision or create at deploy time, but autoscaling is not an object; it is a configuration. I think this kind of pattern might be better handled through Bicep params instead?
> ## Overview
>
> Autoscaling is a critical capability for modern cloud applications, enabling them to dynamically adjust resources based on demand. Platform engineers and developers need the ability to optimize resource utilization when deploying applications using Radius across different runtime environments. This document outlines the design and the user experience for configuring autoscaling policies in Radius applications.
Suggested change:

> Autoscaling is a critical capability required for cloud-native applications to provide a performant experience for their users. Platform engineers and developers must collaborate to ensure the application behaves well as it scales up and scales down, and that the platform maximizes utilization of the available computing resources. Neither persona can successfully configure autoscaling without the other. This document outlines the design and the user experience for configuring autoscaling policies in Radius applications.
> The following are the most common autoscaling mechanisms available in the cloud-native ecosystem.
>
> **Kubernetes**
> 1. **Horizontal Pod Autoscaler (HPA)** - Kubernetes-native autoscaling mechanism that scales the number of pods in a deployment based on observed CPU utilization, memory, and other custom metrics. This is the most common autoscaling mechanism used in the Kubernetes ecosystem.
Suggested change:

> 1. **Horizontal Pod Autoscaler (HPA)** - Kubernetes-native autoscaling mechanism that scales the number of pods in a deployment based on resource metrics (CPU and memory utilization) or custom metrics. This is the most common autoscaling mechanism used in the Kubernetes ecosystem.
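For concreteness, a minimal HPA manifest of the kind described above might look like the following sketch (the `demo` names are illustrative, not from the document):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa                # illustrative name
spec:
  scaleTargetRef:               # the workload being scaled
    apiVersion: apps/v1
    kind: Deployment
    name: demo
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```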
> 2. **Vertical Pod Autoscaler (VPA)** - Kubernetes-native autoscaling mechanism that automatically adjusts the CPU and memory requests of the pods. This is the least common autoscaling mechanism used in the Kubernetes ecosystem, as it requires restarting the pods to apply the new resource requests.
Suggested change:

> 2. **Vertical Pod Autoscaler (VPA)** - Kubernetes-native autoscaling mechanism that automatically adjusts the CPU and memory requests (the minimum) of the pods up to the limit (the maximum). This is the least common autoscaling mechanism used in the Kubernetes ecosystem, as it requires restarting the pods to apply the new resource requests.
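A sketch of a VPA object illustrating the request bounds mentioned in the suggestion (names and values are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-vpa                # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo
  updatePolicy:
    updateMode: "Auto"          # pods are evicted and recreated to apply new requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:             # floor for the adjusted requests
          cpu: 100m
          memory: 128Mi
        maxAllowed:             # ceiling for the adjusted requests
          cpu: "1"
          memory: 1Gi
```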
> 3. **KEDA** - Kubernetes Event-driven Autoscaling (KEDA) is an open-source component that enables autoscaling of Kubernetes workloads based on external metrics. KEDA operates on top of the HPA and triggers scaling based on metrics from various sources, such as message queues, databases, or observability platforms.
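As an illustration of the event-driven model, a KEDA `ScaledObject` scaling on RabbitMQ queue length might look like this (the queue and secret names are hypothetical):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: demo-scaler             # illustrative name
spec:
  scaleTargetRef:
    name: demo                  # Deployment to scale
  minReplicaCount: 0            # KEDA can scale to zero
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders       # hypothetical queue
        mode: QueueLength
        value: "20"             # target messages per replica
      authenticationRef:
        name: rabbitmq-auth     # TriggerAuthentication holding the connection string
```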
> **Serverless Container platforms**
> 1. **Azure Container Instances and Apps** - Azure Container Instances doesn't provide an inbuilt solution to scale automatically. Container Apps provides scaling based on HTTP traffic and other event-driven triggers (KEDA). For web apps, the preferred scaling mechanism is based on HTTP traffic. For event-driven workloads, the preferred scaling mechanism is based on KEDA.
Suggested change:

> 1. **Azure Container Instances** - Azure Container Instances today only supports manual scaling, using the `desiredCount` property on the NGroup. When autoscaling is available for NGroups, this section will be updated.
> 2. **Azure Container Apps** - Azure Container Apps provides scaling based on HTTP traffic and other event-driven triggers (KEDA). For web apps, the preferred scaling mechanism is based on HTTP traffic. For event-driven workloads, the preferred scaling mechanism is based on KEDA.
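For reference, the HTTP-based scaling described for Container Apps lives in the app's template, e.g. in the YAML accepted by `az containerapp create --yaml` (the values shown are illustrative):

```yaml
properties:
  template:
    scale:
      minReplicas: 1
      maxReplicas: 10
      rules:
        - name: http-rule
          http:
            metadata:
              concurrentRequests: "100"   # scale out above 100 concurrent requests per replica
```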
> 2. **AWS Fargate and App Runner** - AWS Fargate provides autoscaling based on CPU, memory, and CloudWatch metrics. App Runner provides autoscaling based on HTTP traffic.
This should be ECS, not Fargate. And ignore App Runner; usage is almost nil.
> A developer specifies a simple HPA autoscaling configuration in the container resource definition. Below are the various options to configure autoscaling in the container resource type.
> #### Option 1: Adding autoscaling in the compute configuration
For consistency with the manualScaling extension, let's assume option 2. I don't see any reason we would do option 1 or 3. I also think having these options here takes away from your broader options. The options we should be debating aren't these; they're:
- Option 1 - Autoscaling specified as part of the environment configuration, with applications able to opt in via properties in the container resource definition
- Option 2 - Autoscaler as a resource type analogous to a gateway, which enables platform engineers to delegate to developers via RBAC
> 1. More configuration to manage across different runtimes.
> 2. The developer needs to know the underlying platform and the autoscaling policies required for the platform.

> #### Option 2: Adding it as part of extensions
I don't see this as a viable approach. If the maxReplicas is hard-coded in the application definition, there would be no difference based on the environment. That doesn't meet the requirements. If that's the case, we're wasting our time considering this as an option. Is there some way to make this environment-responsive? I don't see how.

> Pros:
> 1. Autoscaling is an infrastructure problem; the platform engineer has the ability to configure the autoscaling policies in the environment configuration, thus separating the concerns of the platform engineer and the developer.
> 1. Dev and test environments wouldn't require autoscaling policies, and the platform engineer has more flexibility to configure the scaling policies based on the environment.
> 1. The platform engineer can configure the scaling policies once, and all the applications deployed in the environment will inherit the scaling policies.
This is a bad thing. There must be a way for developers to opt their container into autoscaling, and there must be a method for operations engineers to tune autoscaling on a per-container basis.

> 1. The developer can focus on the application modelling and not worry about the scaling policies. If needed, the developer can override the scaling policies in the application resource definition; for example, if the developer wants to scale based on events from Kafka or RabbitMQ modelled in the application definition.
> ### Configuring Autoscaling as a core resource type
The biggest advantage here is the ability to delegate via RBAC; that's missing from your pros. But I still don't quite see how to make this environment-specific.
> # Application Autoscaling
>
> ## Overview
It may be better to list user stories. Here's a first cut:
- As a platform engineer, I need to enable autoscaling in an environment. I want my developers to be able to opt one of their container services into autoscaling and specify the container's scaling metric.
- As a developer, I need to configure autoscaling for my container. Some of my containers use resource metrics and others use a custom metric provided by my application.
- As an operations engineer, I need to tune each container to maximize application performance and resource utilization.
|
This pull request has been automatically marked as stale because it has been inactive for 90 days. Remove stale label or comment or this will be closed in 7 days. |
|
This pull request has been closed due to inactivity. Feel free to reopen if you are still working on it. |