Feature Spec: Serverless Container Runtimes #77
base: main
Conversation
Signed-off-by: Will Tsai <[email protected]>
**Risk: platform-specific features that cannot be implemented in Radius** - There might be platform-specific features that cannot (or should not) be implemented in Radius. For example, Kubernetes has features for taints and tolerations that are not common across compute platforms and thus should not be implemented in Radius. This risk can be mitigated by providing mechanisms to punch through the Radius abstraction and use platform-specific features directly, like the [Kubernetes customization options](https://docs.radapp.io/guides/author-apps/containers/overview/#kubernetes) currently available in Radius.
**Risk: Differing platforms between Radius control plane and application runtime** - A Kubernetes cluster must be deployed and maintained to host the Radius control plane even if the application is completely serverless, and this might not work for some customers (e.g. those who absolutely do not want to use Kubernetes). For now we will accept this risk and consider alternative hosting platforms for the Radius control plane to be out of scope.
Why can't we run Radius on just Docker? I believe Dapr does that with `dapr init` vs `dapr init -k`.
It requires a lot of code changes but would that be a good investment? 🧐
@ytimocin Do you mean running Radius on Docker within an ACI compute cluster?
I mean running Radius on containers (https://docs.dapr.io/getting-started/install-dapr-selfhost/) so that the user wouldn't need a Kubernetes cluster to do a deployment of Serverless resources.
Co-authored-by: Zach Casper <[email protected]> Signed-off-by: Will <[email protected]>
🚀 Nice document 💯
> Note: We assume that the `sku: 'Confidential'` property for ACI (and other comparable properties across platforms like ECS) is scoped to the individual container and not the entire application or environment. We will need to validate this assumption as a part of the implementation.
> Note: To make the app definition portable, we must allow several `runtimes` to be declared per container, and these should all be optional for cases where the user wants to punch through the Radius abstraction. If they declare a `runtimes` property that doesn't match the targeted deployment environment's compute, we should simply ignore that property for that deployment. If they haven't declared any `runtimes` property that matches the compute of the targeted deployment environment, then we should deploy their container as if no `runtimes` property was provided, and thus no "punch-through" behavior will be applied.
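As a sketch of what this could look like in an app definition (the `aci` runtimes key and its `sku` property are proposals from this spec, not an existing schema; only the `kubernetes` runtime exists in Radius today, and the pod patch shown is illustrative):

```bicep
param application string

resource frontend 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'frontend'
  properties: {
    application: application
    container: {
      image: 'myregistry.azurecr.io/frontend:latest'
    }
    runtimes: {
      // applied only when the target environment's compute is Kubernetes
      kubernetes: {
        pod: {
          serviceAccountName: 'frontend-sa' // illustrative pod patch
        }
      }
      // hypothetical key: applied only when the target environment's compute is ACI
      aci: {
        sku: 'Confidential'
      }
    }
  }
}
```

Deploying this definition to an ACI environment would ignore the `kubernetes` block, and vice versa.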
Does this also mean that an application could specify a punch through that is not used when deployed to a different environment? For example, could a user specify a K8S punch through and still deploy the app to ACI, in which case the K8S punch through would be ignored? If yes, then I agree. 👍
Maybe a good future feature would be the ability to add additional platform-specific properties in downstream environments, similar to what a GitOps workflow would allow. For example, the dev specifies a K8S property and the operations person overrides that in production with a different value.
> For example, could a user specify a K8S punch through and still deploy the app to ACI, in which case the K8S punch through would be ignored?

Yes, this is exactly what I'm proposing, glad you agree :)

> For example, the dev specifies a K8S property and the operations person overrides that in production with a different value.

For this one, are you saying that we should add the ability to apply universal container configurations at the environment level that would override the container configurations defined within individual applications? If so, I can see a use case for this, but I wonder if that's something RBAC (when that feature is available) can take care of? For example, the `sku` of the container compute is specifiable in the environment, and then RBAC rules prevent developers from overriding `sku` configurations from within their application definitions.
After successful deployment, the user can view the serverless containers in the CLI using the `rad app graph` command or via the Application Graph in the Radius Dashboard.

## Key investments
Should we add a key investment for Radius managing external environments? This could be a separately deployable feature that has value on its own, but it is definitely required for ACI support.
I added a note in the risks section (see *Risk: Deploying to multiple clusters*) where I call out that we might have to partially implement (or at least take into consideration) the Inter-cluster app deployment and management feature. I hesitate to add this to the explicit set of features we plan to deliver, as that is itself a separate (and possibly large) project that we probably don't want to fully implement as a subproject of serverless. In other words, we should only build enough functionality for Radius managing external environments to satisfy our serverless features. Let me know if you disagree and we can discuss further.
### Step 1: Define and deploy a Radius Environment for serverless

The user defines a new Radius Environment for serverless compute by creating an Environment definition file (e.g. `env.bicep`), specifying the necessary settings for the serverless platform, and then deploying the environment using `rad deploy env.bicep`.
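A minimal sketch of what such an `env.bicep` might contain, assuming the `compute.kind: 'aci'` and `providers` properties proposed elsewhere in this spec (exact property names and shapes are still under discussion):

```bicep
resource env 'Applications.Core/environments@2023-10-01-preview' = {
  name: 'aci-env'
  properties: {
    compute: {
      kind: 'aci'          // proposed: selects the serverless compute platform
      namespace: 'default' // currently required; may become optional for non-Kubernetes platforms
    }
    providers: {
      azure: {
        // the Azure scope into which ACI resources are deployed
        scope: '/subscriptions/.../resourceGroups/myrg'
      }
    }
  }
}
```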
Today, the rad CLI can only speak to a Radius instance running in Kubernetes.
We also have to capture the CLI-side changes that would be needed to speak with serverless platforms. We might have to extend the config.yaml we have today to capture the kind of Radius instance; "kind" and "context" should support the new feature.
```yaml
workspaces:
  default: dev
  items:
    default:
      connection:
        context: kind-kind
        kind: kubernetes
      environment: /planes/radius/local/resourceGroups/default/providers/Applications.Core/environments/default
      scope: /planes/radius/local/resourceGroups/default
```
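A hypothetical sketch of what an additional workspace entry for a serverless instance might look like (the `aci` kind and its `context` value are illustrative, not an agreed design):

```yaml
workspaces:
  default: aci-dev
  items:
    aci-dev:
      connection:
        kind: aci               # hypothetical new workspace kind
        context: my-aci-context # platform-specific context, illustrative
      environment: /planes/radius/local/resourceGroups/default/providers/Applications.Core/environments/aci-env
      scope: /planes/radius/local/resourceGroups/default
```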
```diff
+ kind: 'aci'
+ // This is currently required, but may be made optional if
+ // it doesn't apply to all compute platforms
+ namespace: 'default'
```
Does this mean all tasks (whatever is equivalent to a pod) would always be deployed in the default namespace?
🚀
<!-- At the most basic level, what are we trying to accomplish? -->
1. Genericize the Radius platform to support deployment of applications on serverless (and other) compute platforms beyond Kubernetes.
1. Enable developers to deploy and manage serverless containers in a way that is consistent with the existing Radius environment and application model.
1. Make all Radius features (Recipes, Connections, App Graph, UDT, etc.) available to engineers building applications on Radius-supported serverless platforms.
Are we saying that 100% of Radius features that are available on Kubernetes will also be available on serverless? Should we add a caveat to say that all serverless platforms may not support all Radius features?
We should also add a note about "failing gracefully" in cases where something is not supported on the platform (e.g. Dapr).
**Assumption**: As a result of the implementation, the refactoring done to the core Radius codebase to support serverless container runtimes should also allow for easier extensibility of Radius to support other new platforms going forward. In other words, the work done for serverless should be genericized within the Radius code and not resemble custom implementations for Kubernetes and select serverless platforms.

**Assumption**: If there was an app that the platform team wanted to run on serverless (e.g. ECS, ACI) and another on Kubernetes (e.g. EKS, AKS), they would configure two different environments. Semantically, an environment is a single place, so we should not have multiple places (i.e. different container platforms) within a single environment. We think this assumption is reasonable given that multiple compute targets would require cross-platform container service discovery, which users are unlikely to want to set up. We'll validate this assumption with feedback from potential users.
Do we think that it would be difficult for a Radius instance to manage multiple environments at the same time? If the deployments specify which environment to deploy to, would the internal support of multiple environments be difficult?
```diff
+   // all the other container properties should also be implemented (e.g. `env`, `readinessProbe`, `livenessProbe`, etc.)
  }
  extensions: [
+   {
```
Would we need to specify a runtime under `extensions`, or are we planning to match element names? If we are matching element names, we could run into naming conflicts where the same name means different things on different platforms.
**Risk: Significant refactoring work anticipated** - There will be a significant amount of work to refactor the Radius code as a part of implementation, especially since the current codebase is heavily Kubernetes-centric. While this is an unavoidable risk, we will ensure that the refactoring work is done with future extensibility in mind so that it becomes easier to add support for additional compute platforms in the future.

## Key assumptions to test and questions to answer
Should we add an assumption that Radius support of GitOps will not be affected by whether the platform is serverless or Kubernetes?
The GitOps integrations that Radius has implemented (Flux) or will implement (ArgoCD) are Kubernetes-specific, which means deployment to ACI via GitOps should work as long as the Radius control plane runs in Kubernetes.
### Non-goals (out of scope)
<!-- What are we explicitly not trying to accomplish? -->
1. Hosting the Radius control plane on serverless (and other) compute platforms outside of Kubernetes. This is a separate project tracked in the roadmap: https://github.com/radius-project/roadmap/issues/39.
1. The ability to run the Radius control plane separately from the compute cluster to which it is deploying applications. This is a separate project tracked in the roadmap: https://github.com/radius-project/roadmap/issues/42. However, given that the initial requirement is for a Kubernetes-hosted Radius control plane to deploy applications to a serverless compute platform, we might need to partially implement, or at least take into consideration, the separation of the Radius control plane cluster from the target deployment cluster as a part of the serverless implementation.
We are kind of achieving this with this feature actually, right? Deploying to Serverless from a Radius that is running on Kubernetes.
> Must be sure to include serverless platform-specific customizations via a `runtimes` property, similar to how Kubernetes patching was implemented for [containers](https://docs.radapp.io/reference/resource-schema/core-schema/container-schema/#runtimes).

Allow for Radius abstraction of a [container](https://docs.radapp.io/reference/resource-schema/core-schema/container-schema/) resource that can be deployed to serverless compute platforms. Must be sure to include serverless platform-specific configurations via [connections](https://docs.radapp.io/reference/resource-schema/core-schema/container-schema/#connections), [extensions](https://docs.radapp.io/reference/resource-schema/core-schema/container-schema/#extensions), and routing to other resources via [gateways](https://docs.radapp.io/reference/resource-schema/core-schema/gateway/).
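As an illustration of the experience this implies (the container and gateway shapes below follow today's Kubernetes-targeted schemas; making them deploy to serverless platforms is what this spec proposes, and the resource names are illustrative):

```bicep
param application string

// an existing database resource, e.g. provisioned by a Recipe
resource database 'Applications.Datastores/sqlDatabases@2023-10-01-preview' existing = {
  name: 'database'
}

resource backend 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'backend'
  properties: {
    application: application
    container: {
      image: 'myregistry.azurecr.io/backend:latest'
      ports: {
        web: {
          containerPort: 8080
        }
      }
    }
    connections: {
      db: {
        source: database.id
      }
    }
  }
}

resource gateway 'Applications.Core/gateways@2023-10-01-preview' = {
  name: 'gateway'
  properties: {
    application: application
    routes: [
      {
        path: '/'
        destination: 'http://backend:8080'
      }
    ]
  }
}
```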
> One future idea to explore is whether we can build extensibility via UDT and Recipes - i.e. allow Recipes to deploy predefined Container types, which themselves can be serverless containers. In other words, developers can deploy a UDT container resource using a Recipe such that all the user has to do when declaring a container is specify the bare minimum (e.g. name, application) and let the predefined Recipe handle the rest. This will require the implementation of UDTs (e.g. defining specific types of containers as individual resource types).
This would require us to add something like UDE (User-Defined Engine) since we will need the necessary engine (or driver) for that runtime.
And then the UDT can choose the driver that it needs to use.
## Topic Summary
<!-- A paragraph or two to summarize the topic area. Just define it in summary form so we all know what it is. -->
Given the importance of serverless infrastructure in the modern application landscape, it is a priority for Radius to expand beyond Kubernetes and support additional container platforms with lower operational overhead. The initial expansion will focus on support for Azure Container Instances, then AWS Elastic Container Service including AWS Fargate. This will be followed by more feature-rich platforms including Azure Container Apps and eventually Google Cloud Run.
> The initial expansion will focus on support for Azure Container Instances, then AWS Elastic Container Service including AWS Fargate. This will be followed by more feature-rich platforms including Azure Container Apps and eventually Google Cloud Run.
Just curious - how did we decide the priority of different platforms?
Have we considered directly building in recipe support natively into Radius Container resource rather than through UDT?
### Scenario 2: Define and create core resources (i.e. Gateway, Secret Store, Extender) for use in serverless containers
<!-- One or two sentence summary -->
Enable the ability to define and create core Radius resources (i.e. Gateway, Secret Store, Extender) for use in serverless containers. Radius would leverage the solution available on the hosting platform to create and manage these resources. For example, if the serverless compute platform has a built-in secret store, Radius would use that instead of creating its own. This would allow for a consistent experience across different serverless compute platforms while still leveraging the unique features of each platform.
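For example, a developer-facing secret store definition could stay identical across platforms while Radius maps it to a platform-native store (mapping `generic` to, say, Azure Key Vault on ACI is this spec's proposal, not current behavior):

```bicep
param application string

@secure()
param dbPassword string

resource secrets 'Applications.Core/secretStores@2023-10-01-preview' = {
  name: 'secrets'
  properties: {
    application: application
    type: 'generic'
    data: {
      // on Kubernetes this becomes a Kubernetes Secret; on a serverless platform
      // it could map to a platform-native store (assumption)
      'db-password': {
        value: dbPassword
      }
    }
  }
}
```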
`Extender` is a very open-ended resource type, so I don't think it fits the category this scenario captures.
> Note: For the Kubernetes implementation, Radius installs and configures Contour as the default ingress controller to enable Gateway features. For serverless platforms, we will need to evaluate the default ingress/gateway for each platform and implement it in a way that allows for flexibility in the future to support other ingress/gateway options.
Would we consider Recipes for gateways, so that a compatible gateway could be deployed based on the environment?
```diff
+ // one provider per environment
+ providers: {
+   // if the compute platform for this environment is ACI, then the provider must be Azure
+   azure: {
+     scope: '/subscriptions/.../resourceGroups/myrg'
+   }
+   // if the compute platform for this environment is ECS, then the provider must be AWS
+   aws: {
+     scope: '/planes/aws/aws/accounts/${account}/regions/${region}'
+   }
+ }
```
I'd get user feedback on this. There may not be a strong case for multiple compute types in one environment, but storage across platforms might be a valid scenario for some. That being said, this is not a one-way door, so we can iteratively add multi-provider support from here.
```diff
  }
```

> Note: The `providers` property in the environment is necessary to specify the scope of the environment and is different from the provider credentials that were registered using `rad init`. The provider credentials are used to authenticate with the underlying compute provider, while the `providers` environment property is used to specify the scope of the environment. Thus, we might consider renaming the Environment `providers` property to avoid confusion.
Are we planning to restrict credentials to map 1-1 with the compute platform as well?
```diff
  name: 'secrets'
  properties: {
    application: application
    type: 'generic'
```
Should this type be updated as well to point to the compute platform or the specific secret manager being targeted?
add feature spec for Serverless Container Runtimes