File: `architecture/2025-02-application-autoscaling.md` (239 additions)
# Application Autoscaling

## Overview
> **Reviewer comment:** It may be better to list user stories. Here's a first cut:
>
> 1. As a platform engineer, I need to enable autoscaling in an environment. I want my developers to be able to opt in one of their container services to autoscaling and specify the container's scaling metric.
> 2. As a developer, I need to configure autoscaling for my container. Some of my containers use resource metrics and others use a custom metric provided by my application.
> 3. As an operations engineer, I need to tune each container to maximize application performance and resource utilization.


Autoscaling is a critical capability for modern cloud applications, enabling them to dynamically adjust resources based on demand. Platform engineers and developers need the ability to optimize resource utilization when deploying applications using Radius across different runtime environments. This document outlines the design and the user experience for configuring autoscaling policies in Radius applications.
> **Suggested change:**
>
> Autoscaling is a critical capability required for cloud-native applications to provide a performant experience for their users. Platform engineers and developers must collaborate to ensure the application behaves well as it scales up and down, and that the platform maximizes utilization of the available computing resources. It is not possible for only one persona to successfully configure autoscaling without the other. This document outlines the design and the user experience for configuring autoscaling policies in Radius applications.


## Autoscaling in the cloud-native ecosystem

The following are the most common autoscaling mechanisms available in the cloud-native ecosystem.

**Kubernetes**
1. **Horizontal Pod Autoscaler (HPA)** - Kubernetes native autoscaling mechanism that scales the number of pods in a deployment based on observed CPU utilization, memory usage, and other custom metrics. This is the most common autoscaling mechanism used in the Kubernetes ecosystem.
> **Suggested change:**
>
> 1. **Horizontal Pod Autoscaler (HPA)** - Kubernetes native autoscaling mechanism that scales the number of pods in a deployment based on resource metrics (CPU and memory utilization) or custom metrics. This is the most common autoscaling mechanism used in the Kubernetes ecosystem.

2. **Vertical Pod Autoscaler (VPA)** - Kubernetes native autoscaling mechanism that automatically adjusts the CPU and memory requests of the pods. This is the least common autoscaling mechanism used in the Kubernetes ecosystem as it requires restarting the pods to apply the new resource requests.
> **Suggested change:**
>
> 2. **Vertical Pod Autoscaler (VPA)** - Kubernetes native autoscaling mechanism that automatically adjusts the CPU and memory requests (the minimum) of the pods up to the limit (the maximum). This is the least common autoscaling mechanism used in the Kubernetes ecosystem as it requires restarting the pods to apply the new resource requests.

3. **KEDA** - Kubernetes Event-driven Autoscaling (KEDA) is an open-source component that enables autoscaling of Kubernetes workloads based on external metrics. KEDA operates on top of the HPA and triggers scaling based on metrics from various sources, such as message queues, databases, or observability platforms.
> **Suggested change:**
>
> 3. **KEDA** - Kubernetes Event-driven Autoscaling (KEDA) is an open-source component that enables autoscaling of Kubernetes workloads based on external metrics. KEDA operates on top of the HPA and triggers scaling based on metrics from various sources, such as message queues, databases, or observability platforms. KEDA is emerging as the more popular autoscaling mechanism.
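For reference, the following is a minimal sketch of the HorizontalPodAutoscaler object these mechanisms center on and that Radius would need to generate on Kubernetes. The target name and thresholds are illustrative, chosen to mirror the `myapp` examples later in this document:

```yaml
# Illustrative HPA for a hypothetical 'myapp' Deployment: scales between
# 1 and 10 replicas to keep average CPU utilization near 50%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```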

> **Reviewer comment:** I'd move this to #2 since it's more popular than VPA.


**Serverless Container platforms**
> **Suggested change:**
>
> **Serverless Container Platforms**

1. **Azure Container Instances and Apps** - Azure Container Instances doesn't provide a built-in solution for automatic scaling. Container Apps provides scaling based on HTTP traffic and other event-driven triggers (KEDA). For web apps, the preferred scaling mechanism is based on HTTP traffic. For event-driven workloads, the preferred scaling mechanism is based on KEDA.
> **Suggested change:**
>
> 1. **Azure Container Instances** - Azure Container Instances today only has the ability to scale manually using the `desiredCount` property on the NGroup. When autoscaling is available for NGroups, this section will be updated.
> 2. **Azure Container Apps** - Azure Container Apps provides scaling based on HTTP traffic and other event-driven triggers (KEDA). For web apps, the preferred scaling mechanism is based on HTTP traffic. For event-driven workloads, the preferred scaling mechanism is based on KEDA.

2. **AWS Fargate and App Runner** - AWS Fargate provides autoscaling based on CPU, memory, and CloudWatch metrics. App Runner provides autoscaling based on HTTP traffic.
> **Reviewer comment:** This should be ECS, not Fargate. And ignore App Runner; usage is almost nil.

3. **Google Cloud Run** - Google Cloud Run provides autoscaling based on HTTP traffic.
> **Reviewer comment:** When you say HTTP traffic, do you mean HTTP requests per second?


**Serverless Functions**
Azure Functions, AWS Lambda, and Google Cloud Functions provide autoscaling based on the number of incoming requests and other event-driven triggers. They also offer options to keep some number of warm instances to reduce cold starts.

## Opportunity for Radius

The main opportunity for Radius is to provide a simple abstraction for configuring autoscaling policies, enabling separation of concerns for the platform engineering and application teams while leveraging the autoscaling mechanisms available in the underlying runtimes. The platform operator should have the ability to configure autoscaling policies in the environment configuration, and the developer should be able to inherit or override those policies in the application resource definition.
> **Reviewer comment:** We should not use the term runtime. A container runtime is a specific thing: containerd, Docker Engine, CRI-O, Podman. You are referring to a container platform. I know Radius uses this term, but it is incorrect and we should not propagate the usage.

> **Reviewer comment:** I'd recommend using consistent persona names. Application teams → developers. Platform operator → platform engineer. Same for application resource definition → application definition.

> **Reviewer comment:** Regarding "the developer should be able to inherit or override the autoscaling policies in the application resource definition": we seem to be making some broad assumptions with this statement. I'm not convinced. Without strong user feedback, I do not believe this is a requirement.


## Goals
> **Reviewer comment:** The two goals below do not resonate with me. In my head it's something along the lines of:
>
> 1. Establish a method for developers to express when their application expects to be autoscaled
> 2. Enable platform engineers to configure environment-specific autoscaling behavior, taking input from the developers
> 3. Enable platform engineers and SREs to tune autoscaling for a specific application
> 4. Accomplish the above goals in a container platform agnostic manner


1. **Runtime Agnostic Scaling** - Enable a runtime agnostic way to configure autoscaling policies for applications deployed using Radius.

2. **Unified Scaling Model** - Enable a unified and consistent scaling policy model that works with different autoscaling mechanisms.

## Out of Scope

This might be out of scope for the initial release but can be considered for future releases.
> **Reviewer comment:** Are you saying KEDA is out of scope for this document? I would encourage you to think beyond just the first release and document the end-to-end feature spec. Then prioritize what is implemented in what order. It's fine to say we'll do KEDA in a future feature spec, but are we sure we shouldn't be doing KEDA first?


**KEDA Integration** - Integrate with KEDA to enable event-driven autoscaling for applications deployed using Radius.
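To make the deferred scope concrete, a KEDA integration would ultimately need to produce something like the following ScaledObject. This is a sketch only; the deployment name, queue, and authentication secret are illustrative:

```yaml
# Illustrative KEDA ScaledObject: scales a hypothetical 'myapp' Deployment
# based on the depth of a RabbitMQ queue.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: myqueue
        mode: QueueLength
        value: '10'
      authenticationRef:
        name: rabbitmq-auth
```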

## Personas

1. Platform engineer - The platform engineer is responsible for setting up the environment where applications are deployed and ensuring that the applications are running optimally.
> **Suggested change:**
>
> 1. Platform engineer - The platform engineer is responsible for setting up the environment where applications are deployed and configuring autoscaling rules for each environment.
> 2. SRE / operations engineer - The operations engineer is responsible for monitoring the application to ensure availability and that the application is performing as required. Once applications are deployed in production, they are responsible for tuning autoscaling behavior, balancing performance and resource usage in real time.

> **Reviewer comment:** I think in a lot of my comments I even refer to platform engineers when I really mean operations engineers.


2. Application developer - The application developer is responsible for building and deploying applications using Radius. The developer should be able to leverage the autoscaling policies configured by the platform engineer and override them based on the workloads.
> **Reviewer comment:** See previous comment about overriding rules.


## User experience

### Configuring Autoscaling in container resource type
> **Reviewer comment:** Check casing on headings. Follow either sentence casing such as "User experience" and "Configure autoscaling in container resource type", or title casing such as "User Experience" and "Configure Autoscaling in Container Resource Type". Check doc for consistency.


A developer specifies a simple HPA autoscaling configuration in the container resource definition. Below are the various options to configure autoscaling in the container resource type.

#### Option 1: Adding autoscaling in the compute configuration
> **Reviewer comment:** For consistency with the manualScaling extension, let's assume option 2. I don't see any reason we would do option 1 or 3. I also think having these options here takes away from your broader options. The options we should be debating aren't these; they are:
>
> - Option 1 - Autoscaling specified as part of the environment configuration, with applications able to opt in via properties in the container resource definition
> - Option 2 - Autoscaler as a resource type analogous to a gateway, which enables platform engineers to delegate to developers via RBAC

This enables configuring compute-specific autoscaling policies. For example: if the platform is Kubernetes, the default HPA config can be added to the runtime configuration, and for a serverless platform, the default HTTP-based autoscaling config can be added to the runtime configuration.

Example: Kubernetes HPA config

```bicep
resource container 'Applications.Containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    ports: {
    }
    runtimes: {
      kubernetes: {
        autoscaling: {
          minReplicas: 1
          maxReplicas: 10
          cpuUtilization: 50
          memoryLimit: 50
        }
      }
    }
  }
}
```

Example: ACA HTTP-based autoscaling config

```bicep
resource container 'Applications.Containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    ports: {
    }
    runtimes: {
      'serverless-platforms': {
        autoscaling: {
          minReplicas: 1
          maxReplicas: 10
          'http-concurrency': 50
        }
      }
    }
  }
}
```

Pros:
1. Easy to understand and configure.
1. Autoscale config is runtime-specific and doesn't have to be standardized across runtimes, as different platforms have different autoscaling mechanisms.
1. Enterprises pick platforms based on their application workloads and the underlying platform provides default autoscaling policies based on the workloads. Radius provides a clear abstraction to leverage default autoscaling mechanisms from the underlying platform.

Cons:
1. More configuration to manage across different runtimes.
2. The developer needs to know the underlying platform and the autoscaling policies required for the platform.

#### Option 2: Adding it as part of extensions
> **Reviewer comment:** I don't see this as a viable approach. If the maxReplicas is hard coded in the application definition, there would be no difference based on the environment. That doesn't meet the requirements. If that's the case, we're wasting our time considering this as an option. Is there some way to make this environment-responsive? I don't see how.


`extensions` have been the way to provide punch-through capabilities which do not change the behaviors of the resource. The following is an example of adding autoscaling as part of the extensions.
> **Reviewer comment:** Minor correction: `runtimes` is the punch-through.


```bicep
resource container 'Applications.Containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    extensions: [
      {
        kind: 'autoscaling'
        minReplicas: 1
        maxReplicas: 10
        cpuUtilization: 50
        memoryLimit: 50
        httpConcurrency: 50
      }
    ]
  }
}
```

Pros: consistent with the existing manual scaling config.

Cons: not straightforward or intuitive, as the user doesn't know which autoscale config is required for the platform; it needs a platform-specific discriminator. Maybe this is not a developer problem and should be handled by the platform operator.

#### Option 3: Adding it as a top-level common property
> **Reviewer comment:** I think Option 3 makes the most sense IF hpa and/or http auto-scaling and their corresponding properties are common across container platforms.


```bicep
resource container 'Applications.Containers@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    image: 'myregistry.azurecr.io/myapp:latest'
    autoscaling: {
      minReplicas: 1
      maxReplicas: 10
      hpa: {
        cpuUtilization: 50
        memoryLimit: 50
      }
      http: {
        httpConcurrency: 50
      }
    }
  }
}
```

Cons: same as above; not straightforward or intuitive, as the user doesn't know which autoscale config is required for the platform. Maybe this is not a developer problem and should be handled by the platform operator.

### Configuring Autoscaling in Radius Environment
> **Reviewer comment:** Would autoscaling policies specified in the container override the autoscaling policies set for the environment?


A platform engineer or operator sets up the Radius environment for the applications to be deployed. Today, a Radius environment contains the configurations for the container runtime, identity provider, and secrets store. Since scaling is a runtime concern, the platform operator also needs the ability to configure scaling policies in the environment configuration.

```bicep
resource environment 'Applications.Core/environments@2023-10-01-preview' = {
  name: 'myenv'
  properties: {
    compute: {
      kind: 'kubernetes'
      namespace: 'default'
      identity: {
        kind: 'azure.com.workload'
        oidcIssuer: oidcIssuer
      }
      autoscaling: {
        minReplicas: 1
        maxReplicas: 10
        cpuUtilization: 50
        memoryLimit: 50
      }
    }
  }
}
```

Pros:
1. Autoscaling is an infrastructure problem; the platform engineer has the ability to configure autoscaling policies in the environment configuration, thus separating the concerns of the platform engineer and the developer.
1. Dev and test environments wouldn't require autoscaling policies, and the platform engineer has more flexibility to configure the scaling policies based on the environment.
1. The platform engineer can configure the scaling policies once and all the applications deployed in the environment will inherit the scaling policies.
> **Reviewer comment:** This is a bad thing. There must be a way for developers to opt their container into autoscaling, and there must be a method for operations engineers to tune autoscaling on a per container basis.

1. The developer can focus on the application modelling and not worry about the scaling policies. If needed, the developer can override the scaling policies in the application resource definition. An example override scenario: the developer wants to scale based on events from Kafka or RabbitMQ modelled in the application definition.

### Configuring Autoscaling as a core resource type
> **Reviewer comment:** I don't follow this completely - I see Radius resources as objects it needs to provision or create at deploy time, but autoscaling is not an object, but rather a configuration. I think this kind of pattern might be better handled through bicep params instead?

> **Reviewer comment:** The biggest advantage here is the ability to delegate via RBAC; that's missing from your pros. But I still don't quite see how to make this environment-specific.


Since autoscaling is a core functionality of any underlying runtime, it can also be abstracted as a core resource type. The following is an example of configuring autoscaling as a core resource type.

```bicep
resource autoscaling 'Applications.Autoscaling@2023-10-01-preview' = {
  name: 'myapp'
  environment: 'myenv'
  properties: {
    minReplicas: 1
    maxReplicas: 10
    hpa: {
      cpuUtilization: 50
      memoryLimit: 50
    }
    http: {
      httpConcurrency: 50
    }
    keda: {
      queue: rabbitmq.QueueName
      queueLength: 10
    }
  }
}

resource rabbitmq 'Applications.Messaging/rabbitmqQueues@2023-10-01-preview' = {
  name: 'rabbitmq'
  properties: {
    queueName: 'myqueue'
    ....
  }
}
```

Pros:
1. Provides a declarative way to manage the autoscaling policies across different runtimes. Consistent with how Azure, AWS, and GCP provide a declarative way to manage the autoscaling config for their compute services.
1. Decouples the autoscaling configuration from the container resource type. Makes it easier to manage updates to autoscaling policies without having to update the container resource type.
1. Gives flexibility as the resource type can be modelled within an environment or an application catering to different enterprise needs. Platform engineers can also control if their developers need to manage the autoscaling policies.
1. For advanced autoscaling using KEDA, the developer can declaratively connect to the messaging service (RabbitMQ, Kafka) and configure the autoscaling policies in the application resource definition.

Cons:
1. Another core resource type to manage for Radius

## Feature Summary
TBD
