3. Why?
The root cause is that the master cluster sent a write RPC to the backup cluster whose request body exceeded the `max_allowed_write_size` configured on the backup cluster. The master cluster produces this oversized request because of how mutations are packaged for sending:
the master cluster traverses the writes received within a time window;
it checks whether all writes have been traversed, or whether the bytes accumulated in the current batch already exceed the `duplicate_log_batch_bytes` set in the cluster hot-standby configuration (config.ini);
if the batch is not yet over that size, the next mutation is merged into the current batch.
This can happen when the value-length distribution of the table is very wide.
For example:
The length of the first mutation A is 200 bytes, so it is naturally combined with the next one.
But the length of the next mutation B is 1,048,376 bytes (1048576 - 200).
At this point A and B are already in one batch and the RPC is sent out normally, but the standby cluster cannot accept such a large write and throws an ERR error; duplication on the master cluster is delayed, also throwing an ERR error.
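The batching check described above can be sketched as follows (a minimal illustration, not the actual Pegasus code; only `duplicate_log_batch_bytes` comes from the config, all other names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the batching flaw: the size check runs *before*
// the next mutation is appended, so a single large mutation can push a
// batch far past the limit after the check has already passed.
std::vector<std::vector<uint64_t>> batch_mutations(
    const std::vector<uint64_t> &mutation_sizes,
    uint64_t duplicate_log_batch_bytes)
{
    std::vector<std::vector<uint64_t>> batches;
    std::vector<uint64_t> current;
    uint64_t current_bytes = 0;
    for (uint64_t size : mutation_sizes) {
        // The check only looks at the bytes accumulated so far...
        if (current_bytes >= duplicate_log_batch_bytes && !current.empty()) {
            batches.push_back(current);
            current.clear();
            current_bytes = 0;
        }
        // ...so this append can overshoot the limit by an arbitrary amount.
        current.push_back(size);
        current_bytes += size;
    }
    if (!current.empty())
        batches.push_back(current);
    return batches;
}
```

With mutation A of 200 bytes and B of 1,048,376 bytes, the check passes (200 is well under the limit), so both land in one batch totaling the full 1,048,576 bytes, which is large enough for the backup cluster to reject.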
…ax_allowed_write_size` (#1841)
#1840
Add config `dup_max_allowed_write_size` to restrict the size of a duplication request, in case the request is rejected by the remote cluster.
New configuration is added:
```diff
 [replication]
+dup_max_allowed_write_size = 1048576
```
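One way the new cap could be enforced on the sender side is to flush a batch before an append would exceed it, and to refuse any single mutation larger than the cap outright (a sketch of the idea only, not the actual patch; every name except `dup_max_allowed_write_size` is hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct BatchResult {
    std::vector<std::vector<uint64_t>> batches;
    std::vector<uint64_t> rejected; // mutations larger than the cap
};

// Hypothetical size-capped batcher: unlike the flawed flow, the limit is
// checked *before* appending, so no batch can ever exceed the cap.
BatchResult batch_with_cap(const std::vector<uint64_t> &mutation_sizes,
                           uint64_t dup_max_allowed_write_size)
{
    BatchResult r;
    std::vector<uint64_t> current;
    uint64_t current_bytes = 0;
    for (uint64_t size : mutation_sizes) {
        if (size > dup_max_allowed_write_size) {
            // Too large to ever fit in one request; report it instead.
            r.rejected.push_back(size);
            continue;
        }
        // Flush the batch first if appending would push it over the cap.
        if (current_bytes + size > dup_max_allowed_write_size) {
            r.batches.push_back(current);
            current.clear();
            current_bytes = 0;
        }
        current.push_back(size);
        current_bytes += size;
    }
    if (!current.empty())
        r.batches.push_back(current);
    return r;
}
```

Under this scheme a 200-byte mutation followed by a 1,048,576-byte one would be split into two requests instead of one oversized batch.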
Bug Report
What did you do?
In an online production environment, some nodes in the master cluster could not send duplication data to the backup cluster.
What version of Pegasus are you using?
Pegasus 2.4