The backup cluster may do an incomplete learn with duplication #2107

Open
ninsmiracle opened this issue Aug 23, 2024 · 1 comment
Labels
type/bug This issue reports a bug.

Comments

@ninsmiracle
Contributor

Bug Report

Currently, when the backup cluster handles a dup RPC, the multiple requests carried by that dup mutation are written to RocksDB in several separate writes rather than in a single one.
Each of these writes also records the decree of the dup mutation. If the backup cluster takes a checkpoint between two of these writes, the checkpoint already carries the decree even though the data for that decree has not been completely written to RocksDB.
If a learner in the backup cluster starts learning from this checkpoint, it will request plog starting from decree+1 once the checkpoint has been learned. As a result, the remaining dup requests of that decree are never learned, and that data is lost.

int pegasus_write_service::duplicate(int64_t decree,
                                     const dsn::apps::duplicate_request &requests,
                                     dsn::apps::duplicate_response &resp)
{
    // If a checkpoint is taken before this `for` loop has completed, the checkpoint
    // may not include all of the data, because these requests share the same decree.
    // In other words, this creates an inconsistency.
    for (const auto &request : requests.entries) {
        // ...
    }
}
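
To make the window concrete, here is a minimal toy model of the problem. It is only a sketch: `ToyStore`, `Entry`, `apply_one_by_one` and `lost_entries` are hypothetical names I made up for illustration, and an in-memory map stands in for RocksDB. It is not the Pegasus code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the storage engine: user data plus the last
// decree that has been recorded alongside it.
struct ToyStore {
    std::map<std::string, std::string> data;
    int64_t last_decree = 0;
};

using Entry = std::pair<std::string, std::string>;

// Applies the entries of one dup mutation one write at a time. Every write
// also records the shared decree. A snapshot (the "checkpoint") is taken
// after `checkpoint_after` writes and returned to the caller.
ToyStore apply_one_by_one(ToyStore &store,
                          int64_t decree,
                          const std::vector<Entry> &entries,
                          std::size_t checkpoint_after)
{
    ToyStore checkpoint = store;
    std::size_t written = 0;
    for (const auto &entry : entries) {
        store.data[entry.first] = entry.second;
        store.last_decree = decree; // decree is visible before the loop ends
        if (++written == checkpoint_after) {
            checkpoint = store; // checkpoint taken in the middle of the loop
        }
    }
    return checkpoint;
}

// Number of entries a learner would miss if it restored `checkpoint` and then
// resumed from last_decree + 1, in this toy model.
std::size_t lost_entries(const ToyStore &checkpoint, const ToyStore &primary)
{
    return primary.data.size() - checkpoint.data.size();
}
```

With three entries at decree 5 and a checkpoint taken after the first write, the checkpoint reports decree 5 but holds only one of the three entries, so a learner resuming from decree 6 would miss the other two.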
@ninsmiracle added the type/bug label Aug 23, 2024
@ninsmiracle
Contributor Author

However, we can set duplicate_log_batch_bytes = 0 to work around this problem.
So I'm not sure whether I should fix this 'bug'.
If I should fix it, should executing a dup request write all of its requests, together with the single decree, as one write_batch?
@acelyc111 @empiredan
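
A rough sketch of what I mean by the write_batch idea, again as a toy model and not the real code: `ToyStore`, `ToyBatch`, `build_batch` and `commit` are hypothetical names, and an in-memory map stands in for RocksDB. The assumption is that the real fix would rely on a RocksDB WriteBatch, which is applied atomically, so a checkpoint would see either none or all of the writes for a decree.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the storage engine.
struct ToyStore {
    std::map<std::string, std::string> data;
    int64_t last_decree = 0;
};

using Entry = std::pair<std::string, std::string>;

// Hypothetical batch: buffers every put of one dup mutation plus its decree.
// Nothing touches the store until commit() is called.
struct ToyBatch {
    std::vector<Entry> puts;
    int64_t decree = 0;
};

// Collects all entries of the dup mutation into a single batch.
ToyBatch build_batch(int64_t decree, const std::vector<Entry> &entries)
{
    ToyBatch batch;
    batch.decree = decree;
    for (const auto &entry : entries) {
        batch.puts.push_back(entry);
    }
    return batch;
}

// Applies the whole batch as one step. In this toy model a checkpoint can
// only be taken before or after commit(), never between two puts.
void commit(ToyStore &store, const ToyBatch &batch)
{
    for (const auto &put : batch.puts) {
        store.data[put.first] = put.second;
    }
    store.last_decree = batch.decree;
}
```

In this model, a snapshot taken before `commit` has neither the data nor the decree, and a snapshot taken after has both, so the decree never gets ahead of the data.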
