Use vtbackup in scheduled backups #658

Merged into main on Mar 6, 2025 (18 commits)
Conversation

@frouioui (Member) commented Jan 28, 2025

Context

This Pull Request modifies the scheduled backup feature introduced in #553. The implementation in the original PR was a deliberately simple "v1". It introduced the VitessBackupSchedule CRD, which gives end users the option to configure different VitessBackupScheduleStrategy entries that tell the vitess-operator which keyspace/shard to back up. All the strategies in a VitessBackupSchedule were merged together to form a single vtctldclient command, executed in a Kubernetes Job following the schedule set in the VitessBackupSchedule CRD.

While this approach works, it has a limitation: it uses vtctldclient Backup, which temporarily takes a serving tablet out of the serving pool and uses it to take the backup. While this limitation is documented in the release notes, we want to encourage people to use vtbackup as an alternative.

Proposed Change

This Pull Request modifies the VitessBackupSchedule controller to create one Kubernetes Job per VitessBackupScheduleStrategy defined in the VitessBackupSchedule by the user, instead of merging all strategies into a single Job. Each Job now creates a vtbackup pod, instead of a pod in which a vtctldclient command was executed. A rough sketch of this fan-out follows.
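
As an illustration only - the helper and field names (createJobsPerStrategy, buildVtbackupJob, Spec.Strategies) are assumptions, not the operator's actual identifiers:

```go
import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createJobsPerStrategy sketches the new behavior: one Kubernetes Job per
// strategy instead of a single merged Job. VitessBackupSchedule is the
// operator's CRD type; buildVtbackupJob and Spec.Strategies are
// illustrative names.
func createJobsPerStrategy(ctx context.Context, c client.Client, schedule *VitessBackupSchedule) error {
	for _, strategy := range schedule.Spec.Strategies {
		// Each Job's pod runs vtbackup against the strategy's keyspace/shard.
		job := buildVtbackupJob(schedule, strategy)
		if err := c.Create(ctx, job); err != nil && !apierrors.IsAlreadyExists(err) {
			return err
		}
	}
	return nil
}
```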

Changes

  • Since vtbackup can only back up a single keyspace/shard, we need one Job per strategy: each VitessBackupSchedule now manages multiple Job resources (one per strategy), which was not the case before.

  • The previous logic relied heavily on the CRD's Status to schedule the next runs. That logic has been removed so that the Status is purely informative, for end users only, not for the vitess-operator. Moreover, the previous logic updated the resource's Status on almost every request, without checking whether an update was actually needed. The Status is now updated only when needed, and update conflicts are ignored (see the first sketch after this list).

  • The reconcile loop of VitessBackupSchedule was, at times, not returning the result and error expected by the controller-runtime TypedReconciler interface, leading to unnecessary re-queues or missed re-queues. The logic has been modified to return correct values (also shown in the first sketch after this list). A periodic resync of the controller has also been added to safeguard against potential issues and to ensure that resources (Jobs, PVCs, etc.) are cleaned up correctly no matter what.

  • The upgrade test has been modified to also run the 401_scheduled_backups.yaml manifest.

  • The Jobs' default concurrency policy has been changed from Allow to Forbid. Since the new strategies require far more resources and time to complete, this stricter default makes sense. The value is still configurable by end users.

  • Previously, the vtbackup pod created with every VitessShard always attempted an initial backup (--initial_backup), which would fail if the VitessBackup definition was added after the VitessShard started running. That logic has been adjusted so that the vitess-operator checks the current state of the VitessShard before creating the vtbackup pod. If the shard already has a primary, it is not an initial backup but an --allow_first_backup backup - we only create this pod when there is no backup for the shard yet. The VitessBackupSchedule relies on the same logic to determine which type of backup to run: initial, allow-first, or normal (see the second sketch after this list). Depending on the schedule, a race between the initial vtbackup pod and the first scheduled run of a VitessBackupSchedule is likely; in that case, both vtbackup pods come up with --initial_backup set to true, and one of the two simply fails early.
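
First sketch (for the Status and reconcile-loop items above): the "update Status only on change, tolerate conflicts, requeue on a fixed resync period" pattern, written against controller-runtime's public API. The helper names and the resync value are assumptions, not the operator's actual code:

```go
import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/api/equality"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// resyncPeriod is an assumed value; the operator may use a different one.
const resyncPeriod = 1 * time.Minute

// updateStatusIfNeeded writes Status only when it actually changed, and
// treats update conflicts as benign: the periodic resync will run again.
func updateStatusIfNeeded(ctx context.Context, c client.Client, observed, desired *VitessBackupSchedule) error {
	if equality.Semantic.DeepEqual(observed.Status, desired.Status) {
		return nil // skip the write entirely when nothing changed
	}
	if err := c.Status().Update(ctx, desired); err != nil && !apierrors.IsConflict(err) {
		return err
	}
	return nil
}

// successResult shows the shape of a correct return value from Reconcile:
// no error, and a requeue after the resync period.
func successResult() (ctrl.Result, error) {
	return ctrl.Result{RequeueAfter: resyncPeriod}, nil
}
```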
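
Second sketch (for the last item above): the backup-type decision as a pure function. Only the "empty" value (TypeFirstBackup) is confirmed by the review conversation below; the other constant values and all names here are assumptions:

```go
// Illustrative backup types. Only "empty" (TypeFirstBackup) is confirmed
// by the review conversation; the other values are assumptions.
const (
	typeInitialBackup = "initial" // shard is empty: vtbackup runs with --initial_backup
	typeFirstBackup   = "empty"   // a primary exists but no backup: --allow_first_backup
	typeNormalBackup  = "update"  // backups already exist: a regular backup
)

// chooseBackupType sketches the decision shared by the VitessShard init pod
// and the VitessBackupSchedule Jobs.
func chooseBackupType(hasBackups, hasPrimary bool) string {
	switch {
	case hasBackups:
		return typeNormalBackup
	case hasPrimary:
		return typeFirstBackup
	default:
		return typeInitialBackup
	}
}
```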

Related PRs

@mattlord (Contributor) left a comment:

LGTM! ❤️ I only had a few minor comments/questions.

@deepthi (Collaborator) left a comment:

I only had a few comments on the diff in this PR, but it's unclear to me how AllowFirstBackup works. Is that functionality that was already present, and hence not showing up as a diff in this PR? If I'm just missing it, please point me to the file where I can see it.

```diff
-"Allow" (default): allows CronJobs to run concurrently;
-"Forbid": forbids concurrent runs, skipping next run if previous run hasn't finished yet;
+"Allow": allows CronJobs to run concurrently;
+"Forbid" (default): forbids concurrent runs, skipping next run if previous run hasn't finished yet;
```
A collaborator replied:
👍

```diff
@@ -2602,7 +2592,8 @@ <h3 id="planetscale.com/v2.VitessBackupScheduleStatus">VitessBackupScheduleStatu
 </td>
 <td>
 <em>(Optional)</em>
-<p>A list of pointers to currently running jobs.</p>
+<p>A list of pointers to currently running jobs.
+This field is deprecated and no longer used in &gt;= v2.15. It will be removed in a future release.</p>
```
Collaborator:
Is this file generated? I'm not repeating my comments from api.md here.

@frouioui (Member, Author) replied:
docs/api.md and docs/api/index.html are both generated using the same source.

@frouioui (Member, Author) commented Mar 5, 2025

@deepthi AllowFirstBackup is new code; the flag is set here:

```go
"allow_first_backup": backupSpec.AllowFirstBackup,
```

on the vtbackup pod. The decision whether or not to set the flag is made in vtbackupSpec.

@frouioui (Member, Author) commented Mar 5, 2025

We have this constant:

```go
TypeFirstBackup = "empty"
```

It is used in both places, once for the vtbackup init pod and once for the scheduled backup:

```go
backupType := vitessbackup.TypeFirstBackup
```

This type is used when we have no backup and a primary is already running in the shard.

@deepthi (Collaborator) commented Mar 5, 2025, quoting the comment above:

Based on an out-of-band discussion, we should not allow people to take "first" backups on a long-running cluster. If they have not been using VitessBackupSchedules and they want to start using them, there must be a usable backup that can serve as the basis for the first scheduled backup.

frouioui added 2 commits March 5, 2025 15:52
@frouioui merged commit 2a9fe95 into main on Mar 6, 2025; 12 checks passed. The use-vtbackup-in-scheduled-backups branch was deleted on March 6, 2025 at 17:30.