@fabriziopandini
Member

What this PR does / why we need it:
Start collecting some notes about the ongoing work for in-place updates.

Which issue(s) this PR fixes:
Fixes #

Part of #12291

/area documentation

@k8s-ci-robot k8s-ci-robot added the area/documentation Issues or PRs related to documentation label Oct 21, 2025
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 21, 2025
Member

@furkatgofurov7 furkatgofurov7 left a comment

Looks great overall!
Only left a couple of non-blocking comments for typos and minor text/diagram wording.

@@ -0,0 +1,88 @@
# In-place updates in Cluster API - Implementations notes

This document is an collection of notes about implementation details for the in-place update proposal.
Member

Suggested change
This document is an collection of notes about implementation details for the in-place update proposal.
This document is a collection of notes about implementation details for the in-place update proposal.

MS Controller->>M1: Move M1 to MS2 (NewMS)<br/>Apply annotation ".../pending-acknowledge-move": ""<br/>Apply annotation ".../update-in-progress": ""
```

Workflow #3: MD controller recongnizes the newMS being moved to the newMS and it scales up newMS to acknowledge the operation
Member

Suggested change
Workflow #3: MD controller recongnizes the newMS being moved to the newMS and it scales up newMS to acknowledge the operation
Workflow #3: MD controller recognizes that a Machine has been moved to the new MachineSet and scales up the new MachineSet to acknowledge the operation.

autonumber
participant MS Controller as MS Controller<br/>when reconciling<br/>MS2 (NewMS)
participant MS2 (NewMS)
participant M1 as M1<br/>now controller by<br/>MS2 (NewMS)
Member

Suggested change
participant M1 as M1<br/>now controller by<br/>MS2 (NewMS)
participant M1 as M1<br/>now controlled by<br/>MS2 (NewMS)

MD Controller->>MS2 (NewMS): Scale up to acknowledge M1<br/>Apply annotation ".../acknowledged-move": "M1"
```

Workflow #4: MS controller, when reconciling newMS, detects a machine has been acknoledged; it cleanups annotation on the machine and this unblocks the in-place upgrade to start
Member

Suggested change
Workflow #4: MS controller, when reconciling newMS, detects a machine has been acknoledged; it cleanups annotation on the machine and this unblocks the in-place upgrade to start
Workflow #4: MS controller, when reconciling newMS, detects that a machine has been acknowledged; it cleans up annotations on the machine, allowing the in-place upgrade to begin.

participant M1 as M1<br/>now controlled by<br/>MS2 (NewMS)
MD Controller-->>M1: Are you pending acknowledge?
M1-->>MD Controller: Yes!
MD Controller->>MS2 (NewMS): Scale up to acknowledge M1<br/>Apply annotation ".../acknowledged-move": "M1"
Member

Suggested change
MD Controller->>MS2 (NewMS): Scale up to acknowledge M1<br/>Apply annotation ".../acknowledged-move": "M1"
MD Controller->>MS2 (NewMS): Scale up to acknowledge receipt of M1<br/>Apply annotation ".../acknowledged-move": "M1"
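
For readers following the thread, the annotation handshake described in workflows #2 to #4 can be sketched roughly as below. This is only an illustration, not the code from the PR: the `Object` type and the three function names are hypothetical, the annotation prefix `example.invalid/` is a placeholder for the prefix that the diagrams elide as `.../`, and the choice to remove only the pending-acknowledge annotation during cleanup is an assumption.

```go
package main

import "fmt"

// Placeholder annotation keys. The source elides the real prefix (".../"),
// so "example.invalid/" is purely hypothetical.
const (
	pendingAckMove   = "example.invalid/pending-acknowledge-move"
	updateInProgress = "example.invalid/update-in-progress"
	acknowledgedMove = "example.invalid/acknowledged-move"
)

// Object is a toy stand-in for a Machine or a MachineSet (hypothetical type).
type Object struct {
	Name        string
	Owner       string // owning MachineSet name (machines only)
	Replicas    int    // replica count (MachineSets only)
	Annotations map[string]string
}

// moveMachine models workflow #2: the MS controller, reconciling the old
// MachineSet, moves the machine to newMS and marks it as pending acknowledgement.
func moveMachine(m, newMS *Object) {
	m.Owner = newMS.Name
	m.Annotations[pendingAckMove] = ""
	m.Annotations[updateInProgress] = ""
}

// acknowledgeMove models workflow #3: the MD controller sees a machine that is
// pending acknowledgement and scales up newMS to acknowledge it.
func acknowledgeMove(m, newMS *Object) bool {
	_, pending := m.Annotations[pendingAckMove]
	if !pending || m.Owner != newMS.Name {
		return false
	}
	if newMS.Annotations[acknowledgedMove] == m.Name {
		return false // already acknowledged, do not scale up twice
	}
	newMS.Replicas++
	newMS.Annotations[acknowledgedMove] = m.Name
	return true
}

// cleanupAfterAck models workflow #4: the MS controller, reconciling newMS,
// removes the pending annotation once the move has been acknowledged.
// Which annotations are removed is an assumption of this sketch.
func cleanupAfterAck(m, newMS *Object) bool {
	if newMS.Annotations[acknowledgedMove] != m.Name {
		return false
	}
	delete(m.Annotations, pendingAckMove)
	return true
}

func main() {
	ms2 := &Object{Name: "ms2", Annotations: map[string]string{}}
	m1 := &Object{Name: "m1", Owner: "ms1", Annotations: map[string]string{}}

	moveMachine(m1, ms2)
	fmt.Println("acknowledged:", acknowledgeMove(m1, ms2), "replicas:", ms2.Replicas)
	fmt.Println("cleaned up:", cleanupAfterAck(m1, ms2))
}
```

The point of the sketch is the ordering: the machine is never picked up for in-place upgrade until the MD controller has accounted for it by scaling up the new MachineSet.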

- MS controller, when reconciling the newMS, will take over the moved machine and start the actual in-place upgrade operation

- Orchestration of in-place upgrades between MD controller, MS controller, and Machine controller is implemented using annotations.
Following schemas provide a overview of how new annotation are used
Member

Suggested change
Following schemas provide a overview of how new annotation are used
Following schemas provide an overview of how new annotations are used.

- Old MS will be informed to move machines to the newMS, and newMS will be informed it will receive machines from oldMS.
- MS controller manages a subset of Machines
- When scaling down the old MS, if required to move, MS controller is responsible for moving a Machine to newMS
- MS controller, when reconciling the newMS, will take over the moved machine and start the actual in-place upgrade operation
Member

Suggested change
- MS controller, when reconciling the newMS, will take over the moved machine and start the actual in-place upgrade operation
- When reconciling the new MachineSet, the MS controller takes ownership of the moved machine and begins the actual in-place upgrade.

Comment on lines 18 to 23
- The implementation respects the existing set of responsibilities of each controller
- MD controller manages MS
- MD controller enforces maxUnavailable, maxSurge
- MD controller decides when to scale up newMS, when to scale down oldMS
- When there is a decision to scale down, MD controller should check if this can be done via in-place vs delete/recreate. If in-place is possible:
- Old MS will be informed to move machines to the newMS, and newMS will be informed it will receive machines from oldMS.
Member

Suggested change
- The implementation respects the existing set of responsibilities of each controller
- MD controller manages MS
- MD controller enforces maxUnavailable, maxSurge
- MD controller decides when to scale up newMS, when to scale down oldMS
- When there is a decision to scale down, MD controller should check if this can be done via in-place vs delete/recreate. If in-place is possible:
- Old MS will be informed to move machines to the newMS, and newMS will be informed it will receive machines from oldMS.
- The implementation respects the existing set of responsibilities of each controller:
- MD controller manages MS:
- MD controller enforces maxUnavailable, maxSurge
- MD controller decides when to scale up newMS, when to scale down oldMS
- When scaling down, the MD controller checks whether the operation can be performed in-place instead of delete/recreate. If in-place is possible:
- Old MS is instructed to move machines to the newMS, and newMS is informed to receive machines from oldMS.


- In place is always considered as potentially disruptive
- in place must respect maxUnavailable
- if maxUnavailable is zero, a new machine must be created, then as soon as there is “buffer” for in-place, in-place update is done
Member

Suggested change
- if maxUnavailable is zero, a new machine must be created, then as soon as there is “buffer” for in-place, in-place update is done
- if maxUnavailable is zero, a new machine must be created first, then as soon as there is a “buffer” for in-place, the in-place update can proceed.

Contributor

Suggested change
- if maxUnavailable is zero, a new machine must be created, then as soon as there is “buffer” for in-place, in-place update is done
- if maxUnavailable is zero, a new machine must be created, then as soon as there is “buffer” for in-place, in-place update can proceed

Just to avoid interpretation of "done" as "finished".
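
To make the "buffer" wording concrete, here is a minimal sketch of the arithmetic, assuming the usual rolling-update rule that available machines must not drop below `desired - maxUnavailable`. The function name `inPlaceBudget` and its parameters are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// inPlaceBudget returns how many machines could be updated in-place right now
// without dropping below the minimum number of available machines.
// Hypothetical helper: in-place is treated as potentially disruptive, so each
// in-place update consumes one unit of the maxUnavailable budget.
func inPlaceBudget(desired, available, maxUnavailable int) int {
	minAvailable := desired - maxUnavailable
	if budget := available - minAvailable; budget > 0 {
		return budget
	}
	return 0
}

func main() {
	// maxUnavailable is zero and no extra machine exists yet: no buffer.
	fmt.Println(inPlaceBudget(3, 3, 0)) // 0
	// A new machine has been created first: one in-place update can proceed.
	fmt.Println(inPlaceBudget(3, 4, 0)) // 1
	// maxUnavailable is one: an in-place update can proceed immediately.
	fmt.Println(inPlaceBudget(3, 3, 1)) // 1
}
```

With `maxUnavailable: 0` the budget only becomes positive after a new machine is available, which matches the "can proceed" wording suggested above.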

This document is an collection of notes about implementation details for the in-place update proposal.

As soon as the implementation will be completed, some of the notes in this document will be moved back
into the proposal or moved to the user facing documentation about this feature.
Member

Suggested change
into the proposal or moved to the user facing documentation about this feature.
into the proposal or into the user-facing documentation for this feature.

@k8s-triage-robot

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 21, 2025
@fabriziopandini fabriziopandini added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 24, 2025
@fabriziopandini
Member Author

@lentzi90 @furkatgofurov7 thanks for feedback, everything should be addressed now

Member

@furkatgofurov7 furkatgofurov7 left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 72a05da489f4d135173f986fa6ad8a30c72a2f97

@sbueringer
Member

/assign

participant RX
participant MS1 (OldMS)
participant MS2 (NewMS)
MD Controller-->>+RX: Can you update in-place from MS1 (OldMS) to MD2 (NewMS)?
Member

Suggested change
MD Controller-->>+RX: Can you update in-place from MS1 (OldMS) to MD2 (NewMS)?
MD Controller-->>+RX: Can you update in-place from MS1 (OldMS) to MS2 (NewMS)?

participant MS Controller as MS Controller<br/>when reconciling<br/>MS1 (OldMS)
participant MS1 (OldMS)
participant MS2 (NewMS)
participant M1 as M1<br/>controlled by<br/>MS1 (OldMS),<br/>selected to be moved to MS2
Member

Suggested change
participant M1 as M1<br/>controlled by<br/>MS1 (OldMS),<br/>selected to be moved to MS2
participant M1 as M1<br/>controlled by<br/>MS1 (OldMS),<br/>selected to be moved to MS2 (NewMS)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2025
@sbueringer
Member

Thx!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 912e9ef1456764b16e97aca8738a85e0a1f80af6

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 28, 2025
@k8s-ci-robot k8s-ci-robot merged commit ee6cfc6 into kubernetes-sigs:main Oct 28, 2025
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Oct 28, 2025