
feat(duplication): make the task code for incremental loading from private logs configurable #2184

Merged: 11 commits merged into apache:master on Mar 7, 2025

Conversation

@ninsmiracle (Contributor) commented on Jan 20, 2025

What problem does this PR solve?

#2183

What is changed and how does it work?

We can make the task code configurable, allowing the thread priority of
incremental loading from private logs to be adjusted from LOW to COMMON,
thereby enabling support for low-latency real-time duplication.
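
For illustration, here is a minimal sketch of what such a configurable task code could look like, written against rDSN-style task-code and flag macros. The flag name, task-code names, and thread-pool choice are assumptions for illustration, not the actual diff of this PR:

```cpp
#include <cstring>

// Illustrative sketch only: the names below are assumptions modeled on
// rDSN-style task codes and configuration flags, not this PR's actual diff.

// Two predefined task codes that differ only in thread priority.
DEFINE_TASK_CODE(LPC_DUPLICATION_LOAD_MUTATIONS_LOW,
                 TASK_PRIORITY_LOW,
                 THREAD_POOL_DEFAULT)
DEFINE_TASK_CODE(LPC_DUPLICATION_LOAD_MUTATIONS_COMMON,
                 TASK_PRIORITY_COMMON,
                 THREAD_POOL_DEFAULT)

// Hypothetical config entry selecting which task code to use.
DSN_DEFINE_string(replication,
                  load_from_private_log_task_priority,
                  "LOW",
                  "thread priority (LOW or COMMON) of the task that "
                  "incrementally loads mutations from private logs");

// Resolve the configured priority to a task code once at startup.
inline dsn::task_code get_load_from_private_log_task_code()
{
    return strcmp(FLAGS_load_from_private_log_task_priority, "COMMON") == 0
               ? LPC_DUPLICATION_LOAD_MUTATIONS_COMMON
               : LPC_DUPLICATION_LOAD_MUTATIONS_LOW;
}
```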

Performance Testing

I ran the following test cases.
In these test cases, I first applied 8k QPS of write traffic to the master cluster (traffic at which my test cluster does not build up a dup log backlog) to verify the effect of the priority change. I then applied 20k QPS of write traffic to the master cluster (traffic at which my test cluster does build up some dup log backlog) to verify the effect again.

| load/ship task priority | load_from_private_log task priority | QPS | duplicate_log_batch_bytes | plog maximum backlog | master cluster write delay p99 | master/slave dup delay |
|---|---|---|---|---|---|---|
| LOW | LOW | 8K | 4096 | 9k | 1ms | p95 127ms / p99 27473ms |
| LOW | COMMON | 8K | 4096 | 200 | 1ms | p95 101ms / p99 109ms |
| HIGH | HIGH | 8K | 4096 | 150 | 1ms | p95 107ms / p99 115ms |
| LOW | LOW | 20K | 4096 | 61K | 1.5ms | p95 139ms / p99 20506ms |
| LOW | COMMON | 20K | 4096 | 42K | 1.5ms | p95 126ms / p99 18127ms |
| LOW | COMMON | 20K | 0 | continues to increase over time | 1.5ms | p95 10618ms / p99 303519ms |

As you can see, changing the priority of load_from_private_log from LOW to COMMON does not increase the online write delay, and raising the priority further from LOW to HIGH brings no additional benefit for speeding up duplication.

So based on the above experimental results, I think this issue's argument is valid.

@github-actions bot added the cpp label on Jan 20, 2025
@acelyc111 closed this on Jan 20, 2025
@acelyc111 reopened this on Jan 20, 2025
@acelyc111 previously approved these changes on Feb 12, 2025
@empiredan changed the title from "feat: raise load_from_private_log priority from LOW to COMMON" to "feat(duplication): make the task code for incremental loading from private logs configurable" on Mar 7, 2025
@empiredan merged commit cb9a1d3 into apache:master on Mar 7, 2025 (95 checks passed)
@ninsmiracle (Contributor, Author) commented on Mar 7, 2025

Some additional information about dup sending delay:

We conducted multiple controlled experiments on the test cluster with duplicate_log_batch_bytes set to 0, 4096, and 8192. It is clear that a larger duplicate_log_batch_bytes improves the cluster's dup consumption capacity: in the table below, with duplicate_log_batch_bytes set to 8192 the cluster can still keep up with writes at 40k write QPS, whereas with duplicate_log_batch_bytes set to 0 it loses the ability to keep up at 20k write QPS. However, as long as the cluster's dup can keep up with incoming writes, the larger the duplicate_log_batch_bytes, the longer the delay for duplicating a single piece of data between the master and slave clusters.
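
To make this trade-off concrete, here is a simplified, self-contained sketch (hypothetical class and names, not Pegasus's actual shipping code) of the flush decision that duplicate_log_batch_bytes controls: a threshold of 0 ships every mutation immediately (lowest per-record delay, least throughput), while a larger threshold accumulates mutations before shipping them in one batch (higher throughput, higher per-record delay):

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Simplified sketch of the semantics of duplicate_log_batch_bytes;
// not Pegasus's actual shipping code, just the trade-off it controls.
class dup_batcher
{
public:
    explicit dup_batcher(uint64_t batch_bytes) : _batch_bytes(batch_bytes) {}

    // Buffer one mutation; returns true when the batch should ship now.
    bool add(const std::string &mutation)
    {
        _buffer.push_back(mutation);
        _buffered_bytes += mutation.size();
        // batch_bytes == 0 means "ship every mutation immediately":
        // lowest per-record dup delay, but the least batching throughput.
        return _batch_bytes == 0 || _buffered_bytes >= _batch_bytes;
    }

    // Hand the accumulated batch to the shipper and reset the buffer.
    std::vector<std::string> take_batch()
    {
        _buffered_bytes = 0;
        return std::exchange(_buffer, {});
    }

private:
    const uint64_t _batch_bytes; // duplicate_log_batch_bytes
    uint64_t _buffered_bytes = 0;
    std::vector<std::string> _buffer;
};
```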

I should also explain the 4th and 5th columns of the table below. When the delay between the master and slave clusters is very small, the delay shown by monitoring is inaccurate because of the counter reporting granularity, so we wrote a program that reads and writes corresponding keys on both sides to measure the precise delay (a sketch is shown after the table). However, when the delay between the master and slave clusters is very large, reading and writing every shard takes too long and the delay is sometimes hard to compute, so in the large-delay scenario we mainly rely on monitoring data to compare the experimental results.

| QPS | plog maximum backlog | duplicate_log_batch_bytes | master/slave dup delay p99 (monitoring delay avg) | master/slave dup delay (program test) |
|---|---|---|---|---|
| 0 | 3 | 0 | | p95 105ms / p99 108ms |
| 0 | 3 | 4096 | | p95 106ms / p99 108ms |
| 0 | 3 | 8192 | | p95 127ms / p99 150ms |
| 8k | 13K | 0 | 120ms | p95 106ms / p99 137ms |
| 8K | 17k | 4096 | 3.7s | p95 119ms / p99 1673ms |
| 8K | 17.2k | 8192 | 6s | p95 138ms / p99 20s |
| 20K | continues to increase | 0 | continues to increase | difficult to observe |
| 20K | 75k | 4096 | 25s | difficult to observe |
| 20K | 70k | 8192 | 25s | difficult to observe |
| 30K | 120k | 8192 | 26s | difficult to observe |
| 40K | 24k | 8192 | 28s | difficult to observe |
| 45K | continues to increase | 8192 | continues to increase | difficult to observe |
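
For reference, here is a rough sketch of the kind of probe program mentioned above, written against the Pegasus C++ client API as I understand it (the cluster, app, and key names are hypothetical, and error handling and per-partition coverage are omitted): it writes a timestamped value to the master cluster, polls the slave cluster until the value arrives, and reports the difference as the dup delay:

```cpp
#include <pegasus/client.h>

#include <chrono>
#include <cstdio>
#include <string>
#include <thread>

int main()
{
    using clock = std::chrono::steady_clock;

    // Client-factory initialization from a config file is omitted for
    // brevity; cluster and app names here are hypothetical.
    pegasus::pegasus_client *master =
        pegasus::pegasus_client_factory::get_client("master_cluster", "probe_app");
    pegasus::pegasus_client *slave =
        pegasus::pegasus_client_factory::get_client("slave_cluster", "probe_app");

    const std::string hash_key = "dup_probe";
    const std::string sort_key = "k";
    // Use the current timestamp as a unique value for this probe round.
    const std::string value = std::to_string(
        std::chrono::duration_cast<std::chrono::milliseconds>(
            clock::now().time_since_epoch()).count());

    const auto start = clock::now();
    master->set(hash_key, sort_key, value);

    // Poll the slave until the freshly written value has been duplicated.
    std::string read_value;
    while (!(slave->get(hash_key, sort_key, read_value) == 0 &&
             read_value == value)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    const auto delay = std::chrono::duration_cast<std::chrono::milliseconds>(
        clock::now() - start);
    std::printf("master->slave dup delay: %lld ms\n",
                static_cast<long long>(delay.count()));
    return 0;
}
```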

==================================================

And here is the effect of adjusting this parameter on one of our online clusters:

| cluster name | duplicate_log_batch_bytes = 4096 | duplicate_log_batch_bytes = 0 |
|---|---|---|
| c3srv-online | p95 1008ms / p99 1327ms | p95 100ms / p99 108ms |
