88 changes: 88 additions & 0 deletions docs/proposals/20240807-in-place-updates-implementation-notes.md
@@ -0,0 +1,88 @@
# In-place updates in Cluster API - Implementation notes

This document is a collection of notes about implementation details for the in-place update proposal.

Once the implementation is complete, some of the notes in this document will be moved back
into the proposal or into the user-facing documentation for this feature.

## Notes about in-place update implementation for machine deployments

- In-place updates are always considered potentially disruptive
  - In-place updates must respect maxUnavailable
    - If maxUnavailable is zero, a new machine must be created first; then, as soon as there is a “buffer” for in-place, the in-place update can proceed.
    - When in-place is possible, the system should try to update as many machines in place as possible.
  - maxSurge is not fully used (it is used only to scale up by one when maxUnavailable is 0)

- No in-place updates are performed when using the OnDelete rollout strategy.
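
The budget rules above can be sketched as a small decision function. This is a minimal illustration in Python, not the controller code (Cluster API itself is written in Go). The `RolloutState` fields, the `plan_step` name, and the action labels are hypothetical, and the sketch assumes that an in-place update makes a machine count as unavailable for its duration.

```python
from dataclasses import dataclass


@dataclass
class RolloutState:
    desired_replicas: int    # replicas requested on the MachineDeployment
    total_machines: int      # machines across oldMS and newMS
    available_machines: int  # machines currently available
    max_unavailable: int
    strategy: str = "RollingUpdate"  # hypothetical label; "OnDelete" disables in-place


def plan_step(state: RolloutState) -> tuple[str, int]:
    """Return (action, count) for the next rollout step under the rules above."""
    if state.strategy == "OnDelete":
        # No in-place updates are performed with the on-delete strategy.
        return ("none", 0)

    # In-place is treated as disruptive, so it spends the maxUnavailable budget.
    min_available = state.desired_replicas - state.max_unavailable
    buffer = state.available_machines - min_available
    if buffer > 0:
        # Update as many machines in place as the buffer allows.
        return ("in-place", buffer)

    # With maxUnavailable == 0 there is no buffer until one extra machine exists.
    if state.max_unavailable == 0 and state.total_machines <= state.desired_replicas:
        return ("create", 1)

    # Otherwise wait for machines to become available again.
    return ("wait", 0)
```

For example, with three replicas and maxUnavailable set to zero, the first step is to create one machine; once that machine is available, one in-place update fits in the buffer.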

- The implementation respects the existing set of responsibilities of each controller:
  - MD controller manages MS:
    - MD controller enforces maxUnavailable, maxSurge
    - MD controller decides when to scale up newMS, when to scale down oldMS
    - When scaling down, the MD controller checks whether the operation can be performed in-place instead of delete/recreate. If in-place is possible:
      - oldMS is instructed to move machines to the newMS, and newMS is informed that it will receive machines from oldMS.
  - MS controller manages a subset of Machines:
    - When scaling down the oldMS, if required to move, the MS controller is responsible for moving a Machine to newMS.
    - When reconciling the newMS, the MS controller takes ownership of the moved machine and begins the actual in-place upgrade.

- Orchestration of in-place upgrades between MD controller, MS controller, and Machine controller is implemented using annotations.
  The following diagrams provide an overview of how the new annotations are used.

Workflow #1: The MD controller detects that an in-place update is possible and informs oldMS and newMS how to perform the operation

```mermaid
sequenceDiagram
autonumber
participant MD Controller
participant RX
participant MS1 (OldMS)
participant MS2 (NewMS)
MD Controller-->>+RX: Can you update in-place from MS1 (OldMS) to MS2 (NewMS)?
RX-->>-MD Controller: Yes!
MD Controller->>MS1 (OldMS): Apply annotation ".../move-machines-to-machineset": "MS2"
MD Controller->>MS2 (NewMS): Apply annotation ".../receive-machines-from-machinesets": "MS1"
```
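
The exchange in Workflow #1 can be sketched as follows. This is an illustrative Python model, not the real controller code: MachineSets are plain dictionaries, `can_update_in_place` is a hypothetical stand-in for the question asked of RX, and the annotation keys keep the elided `.../` prefix exactly as written in the diagram.

```python
from typing import Callable

# Annotation keys as they appear in the diagram (prefix elided in the notes).
MOVE_TO = ".../move-machines-to-machineset"
RECEIVE_FROM = ".../receive-machines-from-machinesets"


def plan_in_place_move(
    old_ms: dict,
    new_ms: dict,
    can_update_in_place: Callable[[dict, dict], bool],
) -> bool:
    """Annotate both MachineSets if an in-place update is reported as possible."""
    # Steps 1-2: ask whether oldMS machines can be updated in place to newMS.
    if not can_update_in_place(old_ms, new_ms):
        return False  # fall back to delete/recreate; nothing is annotated

    # Step 3: tell oldMS where to move its machines.
    old_ms.setdefault("annotations", {})[MOVE_TO] = new_ms["name"]
    # Step 4: tell newMS which MachineSet it will receive machines from.
    new_ms.setdefault("annotations", {})[RECEIVE_FROM] = old_ms["name"]
    return True
```

Writing both annotations only after a positive answer keeps the two MachineSets consistent: either both are instructed, or neither is.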

Workflow #2: MS controller, when reconciling oldMS, moves machines to the newMS

```mermaid
sequenceDiagram
autonumber
participant MS Controller as MS Controller<br/>when reconciling<br/>MS1 (OldMS)
participant MS1 (OldMS)
participant MS2 (NewMS)
participant M1 as M1<br/>controlled by<br/>MS1 (OldMS),<br/>selected to be moved to MS2 (NewMS)
MS Controller-->>MS1 (OldMS): Are you scaling down?
MS1 (OldMS)-->>MS Controller: Yes!
MS Controller-->>MS1 (OldMS): Do you have the ".../move-machines-to-machineset" annotation?
MS1 (OldMS)-->>MS Controller: Yes, I'm instructed to move machines to MS2!
MS Controller-->>MS2 (NewMS): Do you have the ".../receive-machines-from-machinesets" annotation?
MS2 (NewMS)-->>MS Controller: Yes, I'm instructed to receive machines from MS1!
MS Controller->>M1: Move M1 to MS2 (NewMS)<br/>Apply annotation ".../pending-acknowledge-move": ""<br/>Apply annotation ".../update-in-progress": ""
```
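
A minimal Python sketch of the checks in Workflow #2 follows. The dictionary fields (`desired_replicas`, `current_replicas`, `owner`) and the function name are hypothetical. Treating the `receive-machines-from-machinesets` value as a comma-separated list is an assumption suggested only by the plural name.

```python
# Annotation keys as they appear in the diagram (prefix elided in the notes).
MOVE_TO = ".../move-machines-to-machineset"
RECEIVE_FROM = ".../receive-machines-from-machinesets"
PENDING_ACK = ".../pending-acknowledge-move"
UPDATE_IN_PROGRESS = ".../update-in-progress"


def try_move_machine(old_ms: dict, new_ms: dict, machine: dict) -> bool:
    """Move one machine from oldMS to newMS if both sides agree on the move."""
    # "Are you scaling down?": oldMS holds more machines than it should.
    if old_ms["current_replicas"] <= old_ms["desired_replicas"]:
        return False
    # oldMS must be instructed to move machines to exactly this newMS.
    if old_ms["annotations"].get(MOVE_TO) != new_ms["name"]:
        return False
    # newMS must be instructed to receive machines from this oldMS
    # (comma-separated value is an assumption of this sketch).
    sources = new_ms["annotations"].get(RECEIVE_FROM, "").split(",")
    if old_ms["name"] not in sources:
        return False

    # Hand the machine over and mark it as awaiting acknowledgement.
    machine["owner"] = new_ms["name"]
    machine["annotations"][PENDING_ACK] = ""
    machine["annotations"][UPDATE_IN_PROGRESS] = ""
    old_ms["current_replicas"] -= 1
    return True
```

Checking both annotations before touching the machine means a move happens only when oldMS and newMS have received matching instructions.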

Workflow #3: MD controller recognizes that a Machine has been moved to the new MachineSet and scales up the new MachineSet to acknowledge the operation.

```mermaid
sequenceDiagram
autonumber
participant MD Controller
participant MS2 (NewMS)
participant M1 as M1<br/>now controlled by<br/>MS2 (NewMS)
MD Controller-->>M1: Are you pending acknowledge?
M1-->>MD Controller: Yes!
MD Controller->>MS2 (NewMS): Scale up to acknowledge receipt of M1<br/>Apply annotation ".../acknowledged-move": "M1"
```
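
The acknowledgement step in Workflow #3 might look like the Python sketch below. The field names and the function are hypothetical. Joining several machine names with commas in the `acknowledged-move` value is an assumption, since the diagram only shows a single machine. The sketch also skips machines that were already acknowledged so that repeated reconciles do not scale up twice, a property the notes do not spell out.

```python
# Annotation keys as they appear in the diagram (prefix elided in the notes).
PENDING_ACK = ".../pending-acknowledge-move"
ACKNOWLEDGED = ".../acknowledged-move"


def acknowledge_moves(new_ms: dict, machines: list) -> list:
    """Scale up newMS for moved machines that have not been acknowledged yet."""
    recorded = new_ms["annotations"].get(ACKNOWLEDGED, "")
    already = [name for name in recorded.split(",") if name]

    newly = [
        m["name"]
        for m in machines
        if m["owner"] == new_ms["name"]       # now controlled by newMS
        and PENDING_ACK in m["annotations"]   # "Are you pending acknowledge?"
        and m["name"] not in already          # do not acknowledge twice
    ]
    if newly:
        # Scaling up makes room for the moved machines in newMS.
        new_ms["desired_replicas"] += len(newly)
        new_ms["annotations"][ACKNOWLEDGED] = ",".join(already + newly)
    return newly
```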

Workflow #4: MS controller, when reconciling newMS, detects that a machine has been acknowledged; it cleans up annotations on the machine, allowing the in-place upgrade to begin.

```mermaid
sequenceDiagram
autonumber
participant MS Controller as MS Controller<br/>when reconciling<br/>MS2 (NewMS)
participant MS2 (NewMS)
participant M1 as M1<br/>now controlled by<br/>MS2 (NewMS)
MS Controller-->>MS2 (NewMS): Are there any newly acknowledged replicas?
MS2 (NewMS)-->>MS Controller: Yes, M1!
MS Controller->>M1: Remove annotation ".../pending-acknowledge-move": ""
```
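
The cleanup in Workflow #4 can be sketched in Python as follows. The function name and dictionary layout are hypothetical, and reading the `acknowledged-move` value as a comma-separated list of machine names is an assumption. Only the annotation named in the diagram is removed; `.../update-in-progress` is left untouched in this sketch.

```python
# Annotation keys as they appear in the diagrams (prefix elided in the notes).
PENDING_ACK = ".../pending-acknowledge-move"
ACKNOWLEDGED = ".../acknowledged-move"
UPDATE_IN_PROGRESS = ".../update-in-progress"


def release_acknowledged(new_ms: dict, machines: list) -> list:
    """Remove the pending marker from machines that newMS lists as acknowledged."""
    recorded = new_ms["annotations"].get(ACKNOWLEDGED, "")
    acknowledged = {name for name in recorded.split(",") if name}

    released = []
    for machine in machines:
        if machine["name"] in acknowledged and PENDING_ACK in machine["annotations"]:
            # Dropping the marker is what unblocks the in-place upgrade.
            del machine["annotations"][PENDING_ACK]
            released.append(machine["name"])
    return released
```

Machines that were moved but not yet acknowledged keep the pending marker, so they stay blocked until the MD controller has scaled up newMS for them.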