There are some use cases where parsing the binlog is the bottleneck, such as syncing from several hours or days ago after dumping a big table that is not of interest. For us, throughput in production is around 40 to 50 thousand records per second, which exhausts one CPU core. If we could parse the binlog in parallel, I think much higher throughput could be reached in this scenario.
To make this possible, we have to break the sequential assumption (within one input stream) between the input and the sliding window. One possible solution is to add a `prepare` step before `submit` in the scheduler. Sequence-sensitive logic such as ID allocation should be done before `prepare`; parsing then runs in parallel, and finally the result is submitted as before.