Description
The current shard version verification implementation is imperfect and has the following problems:
* The shard versions in CeresMeta and CeresDB are independent of each other; when they become inconsistent, the only recovery is to restart the CeresDB node.
* Shard version synchronization is chaotic and prone to unexpected version inconsistencies.
* The shard version verification logic limits concurrent DDL: only one DDL can succeed on a shard at a time (see the sketch below).
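A minimal sketch (in-memory state and hypothetical names, not the actual ceresmeta code) of why only one DDL can win: each DDL carries the shard version it last observed, and the version bump is effectively a compare-and-swap, so concurrent DDLs that read the same version cannot both succeed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrStaleShardVersion = errors.New("stale shard version")

// ShardState is a hypothetical stand-in for the per-shard metadata.
type ShardState struct {
	mu      sync.Mutex
	version uint64
}

// ApplyDDL bumps the version only if the caller's expected version matches,
// so two concurrent DDLs on the same shard cannot both succeed.
func (s *ShardState) ApplyDDL(expected uint64) (uint64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != expected {
		return s.version, ErrStaleShardVersion
	}
	s.version++
	return s.version, nil
}

func main() {
	shard := &ShardState{version: 7}
	// Both "DDLs" observed version 7; only the first compare-and-swap wins.
	if v, err := shard.ApplyDDL(7); err == nil {
		fmt.Println("first DDL ok, new version:", v) // 8
	}
	if _, err := shard.ApplyDDL(7); err != nil {
		fmt.Println("second DDL rejected:", err)
	}
}
```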
Proposal
Redesign and reimplement the shard-version-related logic.
Additional context
Some current thoughts:
How to synchronize the meta version with ceresdb?
* Return the latest version in the response of creating and deleting tables (I prefer this solution; see the sketch after this list)
* Synchronize the latest version through the heartbeat
* Meta pulls the latest version through an interface provided by ceresdb
Who should persist the shard version information?
* Keep it as is: meta persists it, and ceresdb synchronizes the version from meta when opening a shard (I prefer this solution)
* Ceresdb maintains the persisted version; when opening a shard, ceresdb synchronizes it to meta through the response.
How to handle the version when operating on shards concurrently?
* Leave it as is: only one operation succeeds and the others fail.
* When batching create table and drop table operations, decide how to increment the version:
  * Per batch: version +1
  * Per operation within the batch: version +1
Are version inconsistencies allowed within a certain range?
* Not allowed; versions must be completely consistent (current method).
* Allowed: record the operations on the shard and tolerate version inconsistency within a certain range when operating on it.
How to recover when versions are inconsistent?
* Manually restart the node (current method, not acceptable).
* Automatic error correction and recovery:
  * Meta regularly inspects all shard versions and, for inconsistent ones, initiates repair operations to ceresdb.
  * Ceresdb is responsible for error correction: when receiving a request with an inconsistent version, ceresdb initiates a repair operation to ceresmeta.
How exactly to correct the error, and what needs to be done before synchronizing to a consistent version?
* Try to rebuild or delete the table so that the failed procedure can be executed successfully.
* Ignore it directly and force version synchronization.
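A minimal sketch (all names hypothetical, not the actual ceresmeta/ceresdb APIs) of the two preferred options above working together: ceresdb returns the latest shard version in the create/drop table response, and meta, as the side persisting the version, syncs from that response and force-repairs on an unexpected gap instead of requiring a node restart. For batch DDL, the per-batch vs per-operation choice only changes how far the returned version may advance in a single response.

```go
package main

import "fmt"

// DDLResponse is a hypothetical shape of what ceresdb would return for a
// create/drop table call.
type DDLResponse struct {
	LatestShardVersion uint64
}

// Meta is a hypothetical stand-in for the side that persists shard versions.
type Meta struct {
	persisted map[uint64]uint64 // shardID -> persisted shard version
}

// OnDDLResponse syncs meta's persisted version from the response. If ceresdb
// reports a version meta does not expect, meta force-syncs to ceresdb's value
// (the "ignore it directly and force version synchronization" option).
func (m *Meta) OnDDLResponse(shardID uint64, resp DDLResponse) {
	cur := m.persisted[shardID]
	switch {
	case resp.LatestShardVersion == cur+1:
		m.persisted[shardID] = resp.LatestShardVersion // normal single-step bump
	case resp.LatestShardVersion == cur:
		// no-op, e.g. an idempotent retry
	default:
		// Inconsistency detected: repair by adopting ceresdb's version
		// rather than leaving the shard unusable.
		fmt.Printf("shard %d inconsistent: meta=%d ceresdb=%d, repairing\n",
			shardID, cur, resp.LatestShardVersion)
		m.persisted[shardID] = resp.LatestShardVersion
	}
}

func main() {
	m := &Meta{persisted: map[uint64]uint64{1: 4}}
	m.OnDDLResponse(1, DDLResponse{LatestShardVersion: 5}) // normal bump
	m.OnDDLResponse(1, DDLResponse{LatestShardVersion: 9}) // triggers repair
	fmt.Println(m.persisted[1])                            // 9
}
```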
## Rationale
For details, see: #263
This PR adds check-and-repair logic for the shard version.
## Detailed Changes
* Add `MayCorrectShardVersion` in `RegisterNode`; it corrects the shard version when it is inconsistent between ceresmeta and ceresdb. A rough sketch of the flow follows.
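The real implementation lives in ceresmeta's `RegisterNode` handling; the following is only a rough sketch of the correction flow described above, with hypothetical types for the shard info and cluster metadata:

```go
package cluster

import "log"

// ShardInfo is a hypothetical view of what a ceresdb node reports
// for each shard it serves.
type ShardInfo struct {
	ShardID uint64
	Version uint64
}

// ClusterMetadata is a hypothetical stand-in for meta's record of shards.
type ClusterMetadata struct {
	shardVersions map[uint64]uint64 // shardID -> version recorded by meta
}

// MayCorrectShardVersion compares the versions reported by the node against
// meta's records and overwrites meta's copy whenever they diverge, so an
// inconsistency no longer requires restarting the node.
func (c *ClusterMetadata) MayCorrectShardVersion(reported []ShardInfo) {
	for _, info := range reported {
		recorded, ok := c.shardVersions[info.ShardID]
		if !ok || recorded == info.Version {
			continue
		}
		log.Printf("correct shard %d version: meta=%d node=%d",
			info.ShardID, recorded, info.Version)
		c.shardVersions[info.ShardID] = info.Version
	}
}
```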
## Test Plan
I created some local shard version inconsistency scenarios to verify the repair ability.
## Rationale
Refer to this issue: #263
## Detailed Changes
* Rework the create/drop table process so that the shard version update depends on CeresDB
## Test Plan
Pass existing unit tests and integration tests.
ShiKaiWi added a commit to apache/horaedb that referenced this issue on Nov 9, 2023:
## Rationale
For details, see: apache/incubator-horaedb-meta#263
## Detailed Changes
* Modify the return value of `CreateTableOnShard` & `DropTableOnShard`
to return the latest shard version.
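A hedged sketch of how a caller such as ceresmeta might consume the new return value; the response field and client interface below are illustrative, not the actual ceresdbproto definitions:

```go
package client

import "context"

// CreateTableOnShardResponse mirrors the idea of the change: the response
// now carries the shard's latest version after the DDL is applied.
type CreateTableOnShardResponse struct {
	LatestShardVersion uint64
}

// MetaEventClient is a hypothetical client for the shard DDL RPCs.
type MetaEventClient interface {
	CreateTableOnShard(ctx context.Context, shardID uint64, table string) (*CreateTableOnShardResponse, error)
}

// createTable applies the DDL and returns the authoritative version reported
// by ceresdb, letting meta update its record from the response instead of
// guessing the next version itself.
func createTable(ctx context.Context, c MetaEventClient, shardID uint64, table string) (uint64, error) {
	resp, err := c.CreateTableOnShard(ctx, shardID, table)
	if err != nil {
		return 0, err
	}
	return resp.LatestShardVersion, nil
}
```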
## Test Plan
Pass all unit tests and integration tests.
---------
Co-authored-by: xikai.wxk <[email protected]>
Co-authored-by: WEI Xikai <[email protected]>