Currently, when the backup-cluster handles a dup rpc, the multiple requests carried by one dup mutation are written to rocksdb in several separate writes rather than in a single write.
Each of these writes also records the decree of the dup mutation. If the backup-cluster takes a checkpoint between two of these writes, the checkpoint already carries the decree but may contain only part of that decree's data.
If a learner of the backup-cluster starts learning from this checkpoint, it requests plog from decree+1 once the checkpoint is learned. As a result, the remaining dup requests of that decree are never learned, and their data is lost.
int pegasus_write_service::duplicate(int64_t decree,
                                     const dsn::apps::duplicate_request &requests,
                                     dsn::apps::duplicate_response &resp)
{
    // Each iteration writes one entry to rocksdb together with `decree`.
    // If a checkpoint is taken before the `for` loop has completed, the
    // checkpoint may not include all of the data, because every entry
    // shares the same decree.
    // In other words, the checkpoint is inconsistent with its own decree.
    for (const auto &request : requests.entries) {
        // ...
    }
}
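To make the failure concrete, here is a minimal simulation of the scenario described above. It is only a sketch and does not use Pegasus or rocksdb: `MockStore`, `apply_per_entry` and `lost_entries` are hypothetical names, and the store is a plain in-memory map standing in for a replica's rocksdb instance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a replica's rocksdb instance.
struct MockStore {
    std::map<std::string, std::string> data;
    int64_t last_flushed_decree = 0;
};

using Entry = std::pair<std::string, std::string>;

// Applies the entries of one dup mutation one write at a time, recording the
// decree with every write (the behavior described in this issue).
// Returns a copy of the store taken after `checkpoint_after` writes, which
// models a checkpoint created in the middle of the loop.
MockStore apply_per_entry(MockStore &store,
                          int64_t decree,
                          const std::vector<Entry> &entries,
                          std::size_t checkpoint_after)
{
    assert(checkpoint_after <= entries.size());
    MockStore checkpoint = store; // checkpoint_after == 0: taken before any write
    std::size_t written = 0;
    for (const auto &entry : entries) {
        store.data[entry.first] = entry.second;
        store.last_flushed_decree = decree;
        if (++written == checkpoint_after) {
            checkpoint = store;
        }
    }
    return checkpoint;
}

// Counts the entries of the mutation that a learner starting from `checkpoint`
// would never receive, assuming it replays plog from last_flushed_decree + 1.
std::size_t lost_entries(const MockStore &checkpoint,
                         int64_t decree,
                         const std::vector<Entry> &entries)
{
    if (checkpoint.last_flushed_decree < decree) {
        return 0; // the learner will replay this decree from plog
    }
    std::size_t lost = 0;
    for (const auto &entry : entries) {
        if (checkpoint.data.count(entry.first) == 0) {
            ++lost;
        }
    }
    return lost;
}
```

For example, applying three entries at decree 7 with a checkpoint after the first write gives a checkpoint that already reports decree 7 but holds one entry, so `lost_entries` reports the other two as lost.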
However, setting duplicate_log_batch_bytes = 0 works around this problem.
So I'm not sure whether I should fix this 'bug'.
If I should fix it, should executing a dup request write its multiple requests and the single decree together as one write_batch? @acelyc111 @empiredan
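A rough sketch of what the write_batch idea could look like, under the assumption that all entries of a dup request and its decree can be staged first and then committed in one atomic step. This is not the Pegasus implementation: `SketchStore`, `SketchBatch`, `commit` and `duplicate_batched` are hypothetical names, and the in-memory map only mimics what an atomic batch write (such as `rocksdb::WriteBatch` applied through `DB::Write`) would provide.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a replica's rocksdb instance.
struct SketchStore {
    std::map<std::string, std::string> data;
    int64_t last_flushed_decree = 0;
};

// Hypothetical batch: staged puts plus the single decree they belong to.
struct SketchBatch {
    std::vector<std::pair<std::string, std::string>> puts;
    int64_t decree = 0;

    void put(std::string key, std::string value)
    {
        puts.emplace_back(std::move(key), std::move(value));
    }
};

// Applies the whole batch in one step. In this single-threaded sketch no
// checkpoint can observe the store between the first put and the decree update.
void commit(SketchStore &store, const SketchBatch &batch)
{
    for (const auto &kv : batch.puts) {
        store.data[kv.first] = kv.second;
    }
    store.last_flushed_decree = batch.decree;
}

// Stages every entry of the dup request, then commits them with the decree.
void duplicate_batched(SketchStore &store,
                       int64_t decree,
                       const std::vector<std::pair<std::string, std::string>> &entries)
{
    // Assumption for this sketch: decrees are applied in increasing order.
    assert(decree > store.last_flushed_decree);

    SketchBatch batch;
    batch.decree = decree;
    for (const auto &entry : entries) {
        batch.put(entry.first, entry.second); // the store is not touched yet
    }
    commit(store, batch);
}
```

With this shape, a checkpoint sees either none of the decree's entries together with the previous decree, or all of them together with the new decree, so a learner replaying from decree+1 would not skip any data.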