Feature Spec: Radius Control Plane Upgrades #84

Open
wants to merge 1 commit into
base: main
73 changes: 73 additions & 0 deletions tools/2025-02-control-plane-upgrades-feature-spec.md
@@ -0,0 +1,73 @@
# Topic: Radius Control Plane Upgrades

* **Author**: Will Tsai (@WillTsai)

## Topic Summary
The Radius control plane upgrades feature aims to provide a seamless and efficient process for upgrading the control plane components of the Radius project. This feature will ensure that users can easily upgrade their Radius control plane without downtime, leveraging best practices from similar projects like Dapr, Crossplane, and Istio.

### Top level goals
- Enable zero-downtime upgrades for the Radius control plane.
- Provide clear and concise documentation for the upgrade process.
- Ensure compatibility with existing Radius deployments and configurations.

### Non-goals (out of scope)
- Upgrading data plane components.
- Handling custom user configurations beyond the default settings.

## User profile and challenges
The primary users of this feature are DevOps engineers and system administrators responsible for managing Radius deployments. They face challenges in maintaining high availability and minimizing downtime during control plane upgrades.

### User persona(s)
- DevOps engineers at medium to large enterprises.
- System administrators managing Kubernetes clusters with Radius deployments.

### Challenge(s) faced by the user
Users experience pain points related to downtime and complexity during control plane upgrades. Current offerings may not provide a seamless upgrade process, leading to potential disruptions in service.

### Positive user outcome
By delivering this feature, users will be able to upgrade the Radius control plane with minimal disruption, ensuring high availability and reliability of their deployments.

## Key scenarios
### Scenario 1: Zero-downtime upgrade
Users can perform a control plane upgrade without any downtime, ensuring continuous availability of their Radius deployment.

### Scenario 2: Rollback capability
Users can easily roll back to a previous version of the control plane if any issues arise during the upgrade process.
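
As a rough illustration of this scenario, the sketch below picks a rollback target from release history and builds the matching `helm rollback` invocation without running it. It assumes the control plane is installed as a Helm release and that history is available as JSON records with `revision` and `status` fields, as `helm history -o json` is expected to emit. The function names and the choice of "known good" statuses are hypothetical and not part of the spec.

```python
import json
from typing import List, Optional

# Statuses treated as known-good rollback targets (an assumption of this sketch).
GOOD_STATUSES = {"deployed", "superseded"}


def pick_rollback_revision(history_json: str) -> Optional[int]:
    """Return the newest known-good revision older than the latest one, or None."""
    history = json.loads(history_json)
    if not history:
        return None
    latest = max(entry["revision"] for entry in history)
    candidates = [
        entry["revision"]
        for entry in history
        if entry["revision"] < latest and entry["status"] in GOOD_STATUSES
    ]
    return max(candidates) if candidates else None


def rollback_command(release: str, namespace: str, revision: int) -> List[str]:
    """Build (but do not execute) a helm rollback command line."""
    return [
        "helm", "rollback", release, str(revision),
        "--namespace", namespace,
        "--wait",  # block until the rolled-back resources report ready
    ]
```

Returning the command as a list rather than executing it keeps the sketch side-effect free; a caller could pass it to `subprocess.run` once the target revision has been reviewed.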

### Scenario 3: Compatibility checks
The upgrade process includes compatibility checks to ensure that the new version is compatible with existing configurations and deployments.
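
One simple form such a check could take is a version-skew gate that runs before anything is changed. The sketch below is only an illustration: the policy it enforces (no downgrades, no major-version jumps, at most one minor version per upgrade) is an assumption and not a rule stated in this spec. It handles plain `major.minor.patch` strings only, and it does not attempt deeper checks such as data migrations or breaking API changes.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'v0.42.1' or '0.42.1' into (major, minor, patch)."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def check_upgrade(current: str, target: str) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    cur, tgt = parse_version(current), parse_version(target)
    problems: List[str] = []
    if tgt < cur:
        problems.append(f"{target} is older than {current}; this is a downgrade")
    elif tgt == cur:
        problems.append(f"{target} is already installed")
    elif tgt[0] != cur[0]:
        problems.append(f"{current} -> {target} crosses a major version")
    elif tgt[1] - cur[1] > 1:
        problems.append(f"{current} -> {target} skips minor versions")
    return problems
```

Collecting problems in a list, instead of raising on the first one, leaves room for further checks to be appended and reported together.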

## Key dependencies and risks
- **Kubernetes cluster** – The upgrade process relies on a functioning Kubernetes cluster.
- **Helm** – The upgrade process uses Helm for managing the control plane components.
- **Risk: Incompatible configurations** – Mitigation plan: Implement compatibility checks and provide clear documentation for handling custom configurations.

## Key assumptions to test and questions to answer
- Assumption: Users have a basic understanding of Kubernetes and Helm.
- Question: How can we ensure compatibility with custom user configurations?
- Plan: Conduct user research and gather feedback during the beta testing phase.

## Current state
The Radius control plane upgrade process is currently manual and may involve downtime. There is no standardized process for performing upgrades.

## Details of user problem
When I try to upgrade the Radius control plane, I face challenges related to downtime and complexity. The current process is manual and error-prone, leading to potential disruptions in service. These issues result in increased operational overhead and reduced reliability of my Radius deployment.

## Desired user experience outcome
After this scenario is implemented, I can perform control plane upgrades seamlessly and without downtime. As a result, I can maintain high availability and reliability of my Radius deployment, reducing operational overhead and ensuring a smooth upgrade process.

### Detailed user experience
1. User initiates the upgrade process using Helm.
2. The system performs compatibility checks to ensure the new version is compatible with existing configurations.
> **Review comment (Contributor):** I think the two most important things to check are:
>
> - Whether there is a DB migration and, if so, applying it (not sure how this would work in reverse, as in downgrades)
> - Whether there is a change in one of the APIs that could break the user's resources in production

3. The control plane components are upgraded without downtime.
4. User verifies the upgrade and confirms successful completion.
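
The Helm-facing parts of this flow (steps 1, 3, and 4) might look roughly like the sketch below, which only builds the command lines and does not run them. The flags are Helm 3 flags. The release name, chart reference, and namespace shown in the usage note are placeholders, and the function names are hypothetical.

```python
from typing import List


def upgrade_command(release: str, chart: str, namespace: str,
                    version: str, timeout: str = "10m") -> List[str]:
    """Build (but do not execute) a helm upgrade command line."""
    return [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--version", version,  # chart version to upgrade to
        "--atomic",            # roll back automatically if the upgrade fails
        "--wait",              # block until resources report ready
        "--timeout", timeout,
    ]


def verify_command(release: str, namespace: str) -> List[str]:
    """Build the command a user could run to inspect the release afterwards."""
    return ["helm", "status", release, "--namespace", namespace]
```

For example, `upgrade_command("radius", "example/radius", "radius-system", "0.42.0")` yields an argument list that could be passed to `subprocess.run`; all four values there are illustrative only.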

## Key investments
### Feature 1
Zero-downtime upgrade capability for the Radius control plane.

### Feature 2
Rollback capability to revert to a previous version in case of issues.

### Feature 3
Compatibility checks to ensure seamless upgrades with existing configurations.