[Bug] Fix accidental table deletion during restore job #48820
Open: wubiaoi wants to merge 1 commit into apache:master from wubiaoi:bugfix/ccr-metadata-sync-consistency
run buildall
TPC-H: Total hot run time: 32541 ms
w41ter approved these changes on Mar 7, 2025
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-DS: Total hot run time: 185444 ms
ClickBench: Total hot run time: 30.93 s
@wubiaoi Could you add a test case under regression-suites/backup_restore?
wyxxxcat approved these changes on Mar 7, 2025
What problem does this PR solve?
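When a database-level sync task is configured with CCR and the target database already contains tables with the same names, tables can be deleted by mistake, and table metadata on the FE master and followers can become inconsistent.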
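How a CCR task handles tables that already exist in the target database
On the FE side, if a table being restored already exists, FE checks whether the schema and other properties of the new table match those of the existing table. If they do not match, FE throws an exception (Table {} already exists but with different schema, "+ "local table: {}, remote table: {}) and the Restore job fails. The ccr-syncer service catches this exception, renames the table with an alias (__ccr_tablename_timestamp), and sends a new Restore request to FE. If that Restore succeeds, the syncer runs replace table (swap=false) to replace the table and complete the sync.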
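The flow above can be sketched as a small in-memory model. This is only an illustration of the intended retry behaviour, not code from Doris or ccr-syncer: the class `RestoreRetrySketch`, its methods, and the use of a plain string as a "schema" are all hypothetical, and the alias suffix uses an attempt counter where the real alias uses a timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model; none of these names are Doris FE or ccr-syncer APIs.
class RestoreRetrySketch {
    // Target database: table name -> schema signature.
    final Map<String, String> targetDb = new LinkedHashMap<>();

    // Stand-in for an FE restore. Returns the first backup table whose target
    // name already exists with a different schema, or null on success.
    String restore(Map<String, String> backup, Map<String, String> aliases) {
        for (Map.Entry<String, String> e : backup.entrySet()) {
            String target = aliases.getOrDefault(e.getKey(), e.getKey());
            String local = targetDb.get(target);
            if (local != null && !local.equals(e.getValue())) {
                return e.getKey(); // only one conflict is reported per attempt
            }
        }
        for (Map.Entry<String, String> e : backup.entrySet()) {
            targetDb.put(aliases.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return null;
    }

    // Stand-in for the syncer: alias the reported table, retry, then replace.
    int sync(Map<String, String> backup) {
        Map<String, String> aliases = new LinkedHashMap<>();
        int attempts = 0;
        while (true) {
            attempts++;
            String conflict = restore(backup, aliases);
            if (conflict == null) {
                break;
            }
            // The source describes a timestamp suffix; a counter keeps this deterministic.
            aliases.put(conflict, "__ccr_" + conflict + "_" + attempts);
        }
        // Models replace table (swap=false): the alias takes over the original name.
        for (Map.Entry<String, String> a : aliases.entrySet()) {
            targetDb.put(a.getKey(), targetDb.remove(a.getValue()));
        }
        return attempts;
    }
}
```

With two conflicting tables, this model needs three Restore attempts (one failure per conflicting table, then a success), which matches the one-conflict-per-attempt behaviour described in this PR.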
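Current FE handling logic
A for loop checks each table to be restored. If an existing table's schema differs from that of the table being synced, the loop returns a failure immediately and cancels the Restore job. When several tables conflict, each Restore reports only one of them, so the syncer keeps issuing Restore requests until every conflicting table has been given an alias.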
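Problems in the FE handling logic
Because the table is restored under its alias, it goes through the table-does-not-exist path: a table object is built from the backup's table schema, and its name is changed to the alias only at the end. The crux is that adding tables to restoredTables and checking schema consistency happen in the same loop. The first table is handled as an aliased table and added to restoredTables, but if the loop then reaches a second table whose schema does not match, it returns with an exception right away, before the first table's name has been set to the alias. In effect, the source database's table name has been added to restoredTables. When the restore job then fails, the cancel cleanup logic drops the tables in restoredTables on the assumption that they are newly created alias tables. The name recorded there is not the alias but the real table name, so the real table is dropped.
After repeated Restore attempts the syncer has aliased every table, and the restore job finally succeeds. But when the syncer runs replace table for each table, the original table no longer exists on the master, so the operation fails and the task can never recover.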
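The failure mode can be reproduced in a few lines. The sketch below is a simplified model of the loop as described here, not the actual Doris RestoreJob code: `RestoreCancelSketch`, `Tbl`, `restoreBuggy`, and `restoreChecked` are hypothetical names, and `restoreChecked` shows just one possible repair (validate every table before staging any), which is not necessarily what this PR changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the loop described above; not the Doris RestoreJob code.
class RestoreCancelSketch {
    static final class Tbl {
        String name;
        final String schema;
        Tbl(String name, String schema) { this.name = name; this.schema = schema; }
    }

    // Local database: table name -> schema signature.
    final Map<String, String> db = new LinkedHashMap<>();

    // Cancel cleanup: drop whatever names the staged tables currently carry.
    private void cancel(List<Tbl> restored) {
        for (Tbl t : restored) {
            db.remove(t.name);
        }
    }

    // Final step of a successful restore: rename staged tables to their alias.
    private void commit(List<Tbl> restored, Map<String, String> aliases) {
        for (Tbl t : restored) {
            t.name = aliases.getOrDefault(t.name, t.name);
            db.put(t.name, t.schema);
        }
    }

    // Buggy shape: staging and schema checks share one loop; renaming comes later.
    boolean restoreBuggy(Map<String, String> backup, Map<String, String> aliases) {
        List<Tbl> restored = new ArrayList<>();
        for (Map.Entry<String, String> e : backup.entrySet()) {
            String target = aliases.getOrDefault(e.getKey(), e.getKey());
            String local = db.get(target);
            if (local == null) {
                restored.add(new Tbl(e.getKey(), e.getValue())); // still the source name
            } else if (!local.equals(e.getValue())) {
                cancel(restored); // drops by source name, removing the real table
                return false;
            }
        }
        commit(restored, aliases);
        return true;
    }

    // One possible repair (an assumption): check all schemas before staging.
    boolean restoreChecked(Map<String, String> backup, Map<String, String> aliases) {
        for (Map.Entry<String, String> e : backup.entrySet()) {
            String local = db.get(aliases.getOrDefault(e.getKey(), e.getKey()));
            if (local != null && !local.equals(e.getValue())) {
                return false; // nothing staged yet, so nothing to clean up
            }
        }
        List<Tbl> restored = new ArrayList<>();
        for (Map.Entry<String, String> e : backup.entrySet()) {
            if (!db.containsKey(aliases.getOrDefault(e.getKey(), e.getKey()))) {
                restored.add(new Tbl(e.getKey(), e.getValue()));
            }
        }
        commit(restored, aliases);
        return true;
    }
}
```

In this model, a second attempt in which `t1` is already aliased but `t2` still conflicts makes `restoreBuggy` drop the pre-existing `t1`, while `restoreChecked` fails without touching the database.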
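Why does table metadata differ between the FE master and followers?
While processing a restore job, the master persists the restore job object to BDB only in the download, commit, finished, and cancel states. After the first table throws its exception the job is still in the pending state, so it is not synced to the followers. Once a later restore succeeds, the tables carry alias names, so the followers never replay a drop table operation for the original table and keep the metadata of the original, manually created table indefinitely.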
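A rough way to picture the divergence is a master that mutates its in-memory catalog but writes a journal entry only when the job is in one of the persisted states. The model below is a deliberate simplification under that assumption: `JournalDivergenceSketch`, its `State` enum, and the string-based journal are hypothetical and do not mirror the real Doris edit log or the full set of RestoreJob states.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model; the real Doris edit log and job states are more involved.
class JournalDivergenceSketch {
    enum State { PENDING, DOWNLOAD, COMMIT, FINISHED, CANCELLED }

    // States in which the job is persisted, as listed in the description above.
    static final Set<State> JOURNALED =
            EnumSet.of(State.DOWNLOAD, State.COMMIT, State.FINISHED, State.CANCELLED);

    final Set<String> masterTables = new TreeSet<>();
    final Set<String> followerTables = new TreeSet<>();
    final List<String> journal = new ArrayList<>();

    // A normal DDL: applied on the master and always journaled.
    void createTable(String name) {
        masterTables.add(name);
        journal.add("CREATE " + name);
    }

    // A drop performed as a side effect of a restore job in the given state.
    void dropDuringRestore(State jobState, String name) {
        masterTables.remove(name);
        if (JOURNALED.contains(jobState)) {
            journal.add("DROP " + name);
        }
    }

    // The follower rebuilds its view purely from the journal.
    void replayOnFollower() {
        followerTables.clear();
        for (String op : journal) {
            String[] parts = op.split(" ", 2);
            if (parts[0].equals("CREATE")) {
                followerTables.add(parts[1]);
            } else {
                followerTables.remove(parts[1]);
            }
        }
    }
}
```

In this model, dropping a table while the job is PENDING removes it from the master only; after replay the follower still lists the table, which is the kind of inconsistency this PR reports.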
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)