Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(duplication): dup request is rejected by remote cluster due to max_allowed_write_size #1841

Merged
merged 14 commits into from
Mar 27, 2024

Conversation

ninsmiracle
Copy link
Contributor

@ninsmiracle ninsmiracle commented Jan 9, 2024

What problem does this PR solve?

#1840

What is changed and how does it work?

Add config dup_max_allowed_write_size to restrict the size of dup request in case that the request is reject by remote cluster.

Side effects
  • master cluster send message by batch or not

New configuration is added:

[replication]
+ dup_max_allowed_write_size = 1048576

@github-actions github-actions bot added the cpp label Jan 9, 2024
@acelyc111
Copy link
Member

What's the relationship between the newly added config and duplicate_log_batch_bytes?

@ninsmiracle
Copy link
Contributor Author

ninsmiracle commented Jan 9, 2024

What's the relationship between the newly added config and duplicate_log_batch_bytes?

duplicate_log_batch_bytes is the batch size you can set,always set to 4096.
The 'newly' added config max_allowed_write_size is the max size of a write mutation that a cluster could received. Not only duplication write,normal write will be limit by this config also.

If we only consider duplicate_log_batch_bytes,master will send some mutations that backup cluster can not receive. Backup cluster will log:

E2024-01-04 11:42:02.88 (1704339722088300009 6004) replica.replica7.0400174218c587bd: replica_stub.cpp:1645:response_client(): [email protected]:42801: write fail: client = xx.xx.xx.xxx:42801, code = RPC_RRDB_RRDB_DUPLICATE, timeout = 236, status = replication::partition_status::PS_PRIMARY, error = ERR_INVALID_DATA

W2024-01-04 11:47:33.656 (1704340053656585710 6004) replica.replica7.0400174218c58cd1: replica_2pc.cpp:77:on_client_write(): [22.53@xxxx:42801] client from xxxx:42801 write request body size exceed threshold, request = [default], request_body_size = 1048716, max_allowed_write_size = 1048576, it will be rejected!

We can see it clearly that request_body_size(master cluster send to backup cluster) is greater than max_allowed_write_size.

@acelyc111
Copy link
Member

acelyc111 commented Jan 9, 2024

What's the relationship between the newly added config and duplicate_log_batch_bytes?

duplicate_log_batch_bytes is the batch size you can set,always set to 4096. The 'newly' added config max_allowed_write_size is the max size of a write mutation that a cluster could received. Not only duplication write,normal write will be limit by this config also.

If we only consider duplicate_log_batch_bytes,master will send some mutations that backup cluster can not receive. Backup cluster will log:

E2024-01-04 11:42:02.88 (1704339722088300009 6004) replica.replica7.0400174218c587bd: replica_stub.cpp:1645:response_client(): [email protected]:42801: write fail: client = xx.xx.xx.xxx:42801, code = RPC_RRDB_RRDB_DUPLICATE, timeout = 236, status = replication::partition_status::PS_PRIMARY, error = ERR_INVALID_DATA

W2024-01-04 11:47:33.656 (1704340053656585710 6004) replica.replica7.0400174218c58cd1: replica_2pc.cpp:77:on_client_write(): [22.53@xxxx:42801] client from xxxx:42801 write request body size exceed threshold, request = [default], request_body_size = 1048716, max_allowed_write_size = 1048576, it will be rejected!

We can see it clearly that request_body_size(master cluster send to backup cluster) is greater than max_allowed_write_size.

  1. duplicate_log_batch_bytes is optional, how to comprehend "always" set to 4096?
  2. I only saw dup_max_allowed_write_size is used in duplication module, how it take effect on "normal write"?
  3. What's the config value of duplicate_log_batch_bytes in your environment, if it's less than the config value of max_allowed_write_size in backup cluster (i.e. 1048576), how can this situaction happen?

@ninsmiracle
Copy link
Contributor Author

What's the relationship between the newly added config and duplicate_log_batch_bytes?

duplicate_log_batch_bytes is the batch size you can set,always set to 4096. The 'newly' added config max_allowed_write_size is the max size of a write mutation that a cluster could received. Not only duplication write,normal write will be limit by this config also.
If we only consider duplicate_log_batch_bytes,master will send some mutations that backup cluster can not receive. Backup cluster will log:

E2024-01-04 11:42:02.88 (1704339722088300009 6004) replica.replica7.0400174218c587bd: replica_stub.cpp:1645:response_client(): [email protected]:42801: write fail: client = xx.xx.xx.xxx:42801, code = RPC_RRDB_RRDB_DUPLICATE, timeout = 236, status = replication::partition_status::PS_PRIMARY, error = ERR_INVALID_DATA

W2024-01-04 11:47:33.656 (1704340053656585710 6004) replica.replica7.0400174218c58cd1: replica_2pc.cpp:77:on_client_write(): [22.53@xxxx:42801] client from xxxx:42801 write request body size exceed threshold, request = [default], request_body_size = 1048716, max_allowed_write_size = 1048576, it will be rejected!

We can see it clearly that request_body_size(master cluster send to backup cluster) is greater than max_allowed_write_size.

  1. duplicate_log_batch_bytes is optional, how to comprehend "always" set to 4096?
  2. I only saw dup_max_allowed_write_size is used in duplication module, how it take effect on "normal write"?
  3. What's the config value of duplicate_log_batch_bytes in your environment, if it's less than the config value of max_allowed_write_size in backup cluster (i.e. 1048576), how can this situaction happen?

1.Becauseduplicate_log_batch_bytes default value is 4096
2.I saw max_allowed_write_size in replica_2pc_cpp.Here it can effect on "normal write"
3.It's not easy for master cluster get the value of duplicate_log_batch_bytes config by backup cluster. I don't think it's necessary for master cluster to get this value,because when we doing duplication ,we usually keep the similarity config between two clusters.

I'm sorry that what I said above caused ambiguity.

@ninsmiracle ninsmiracle changed the title Fix(duplication):fix some nodes can not send dup message cause by combine raw write message fix(duplication):fix some nodes can not send dup message cause by combine raw write message Jan 16, 2024
@ninsmiracle ninsmiracle changed the title fix(duplication):fix some nodes can not send dup message cause by combine raw write message fix : (duplication) fix some nodes can not send dup message cause by combine raw write message Jan 26, 2024
@acelyc111 acelyc111 changed the title fix : (duplication) fix some nodes can not send dup message cause by combine raw write message fix(duplication): fix some nodes can not send dup message cause by combine raw write message Jan 29, 2024
@acelyc111
Copy link
Member

Is it necessary to add the new config? Can the issue be resolved by adjust duplicate_log_batch_bytes on master cluster or max_allowed_write_size on backup cluster?

@acelyc111
Copy link
Member

acelyc111 commented Jan 29, 2024

3.It's not easy for master cluster get the value of duplicate_log_batch_bytes config by backup cluster. I don't think it's necessary for master cluster to get this value,because when we doing duplication ,we usually keep the similarity config between two clusters.

In this patch, a new config dup_max_allowed_write_size is added, it face the same issue.

An alternative way is to set DSN_TAG_VARIABLE(max_allowed_write_size, FT_MUTABLE), then you can change the value by HTTP (e.g. ip:port/updateConfig?max_allowed_write_size=2097152) without restart the server.

@ninsmiracle
Copy link
Contributor Author

Is it necessary to add the new config? Can the issue be resolved by adjust duplicate_log_batch_bytes on master cluster or max_allowed_write_size on backup cluster?

If the maintainer of the duplication clusters config max_allowed_write_size to zero , the problem will be resolved. However , if maintainer config it to zero for every duplication cluster , when dup service send mutatioin, packaging functionality will no longer supported . I think it's a unsuitable design.

@acelyc111
Copy link
Member

Is it necessary to add the new config? Can the issue be resolved by adjust duplicate_log_batch_bytes on master cluster or max_allowed_write_size on backup cluster?

If the maintainer of the duplication clusters config max_allowed_write_size to zero , the problem will be resolved. However , if maintainer config it to zero for every duplication cluster , when dup service send mutatioin, packaging functionality will no longer supported . I think it's a unsuitable design.

How about limiting the message size separately for common client write operations and duplication write operations, you can distinguish them by rpc code, then use corresponding limits?

@ninsmiracle
Copy link
Contributor Author

Is it necessary to add the new config? Can the issue be resolved by adjust duplicate_log_batch_bytes on master cluster or max_allowed_write_size on backup cluster?

The same purpose can be achieved through the configuration of operation and maintenance personnel. But I think it is better to provide some support on the Pegasus server side.

@ninsmiracle
Copy link
Contributor Author

ponding limits?

good idea ! I will resubmit this pr in rpc related logic later~

acelyc111
acelyc111 previously approved these changes Mar 1, 2024
src/server/pegasus_mutation_duplicator.cpp Outdated Show resolved Hide resolved
@acelyc111
Copy link
Member

@ninsmiracle Please try to rebase the master branch.

@ninsmiracle ninsmiracle force-pushed the fix_duplication_combine_two_write branch from 5acd593 to dea5a69 Compare March 4, 2024 08:24
acelyc111
acelyc111 previously approved these changes Mar 7, 2024
@empiredan empiredan changed the title fix(duplication): fix some nodes can not send dup message cause by combine raw write message fix(duplication): dup messages are rejected by remote cluster due to max_allowed_write_size Mar 27, 2024
@empiredan empiredan added the type/config-change Added or modified configuration that should be noted on release note of new version. label Mar 27, 2024
@empiredan empiredan changed the title fix(duplication): dup messages are rejected by remote cluster due to max_allowed_write_size fix(duplication): dup request is rejected by remote cluster due to max_allowed_write_size Mar 27, 2024
@empiredan empiredan merged commit 97f7c30 into apache:master Mar 27, 2024
72 of 74 checks passed
@empiredan empiredan added component/duplication cluster duplication type/bug-fix This PR fixes a bug. labels Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/duplication cluster duplication cpp type/bug-fix This PR fixes a bug. type/config-change Added or modified configuration that should be noted on release note of new version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants