BUG(go client):when go client is writing to one partition and the replica node core dump, go client will finish after timeout without updating the configuration. #1856

lengyuexuexuan · 2024-01-16T12:30:27Z

In the code, when a replica core dumps, the function loopForResponse() returns "nil".

The process is then blocked in the function CallWithGpid() until the timeout expires.

Why not update the table configuration and retry the previous operation when this situation occurs?
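To make the blocking behavior concrete, here is a minimal sketch and not the actual go-client code. It assumes the caller waits on a response channel that the reader loop never writes to once the replica is gone, so only the context deadline can end the wait. The names callWithTimeout, simulateDeadReplica, and timedOut are hypothetical stand-ins, not functions from the client.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// response is a hypothetical stand-in for an RPC reply.
type response struct{ payload string }

// callWithTimeout waits for a reply on respCh or for ctx to expire.
// If nothing is ever delivered on respCh, only the ctx branch can fire.
func callWithTimeout(ctx context.Context, respCh <-chan response) (response, error) {
	select {
	case r := <-respCh:
		return r, nil
	case <-ctx.Done():
		return response{}, ctx.Err()
	}
}

// simulateDeadReplica models a replica that died: the channel is never written to,
// so the call returns only after timeoutMs milliseconds.
func simulateDeadReplica(timeoutMs int) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	respCh := make(chan response) // assumption: the reader loop exited without sending
	_, err := callWithTimeout(ctx, respCh)
	return err
}

// timedOut reports whether err is (or wraps) context.DeadlineExceeded.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := simulateDeadReplica(50)
	fmt.Println("timed out:", timedOut(err))
}
```

In this sketch the caller learns nothing about the dead replica until the deadline passes, which matches the symptom reported above.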

acelyc111 · 2024-01-29T15:09:23Z

@lengyuexuexuan Thanks for the feedback, could you please submit a patch to fix it?

lengyuexuexuan · 2024-01-30T07:15:58Z

> @lengyuexuexuan Thanks for the feedback, could you please submit a patch to fix it?

OK. No problem.

…to primary meta server if it was changed (#1916)

#1880 #1856

As for #1856: when the go client is writing to one partition and the replica node core dumps, the go client finishes after the timeout without updating the configuration. In this case, the only way to solve the problem is to restart the go client. In this PR, the client updates the table configuration automatically when a replica core dumps.

After testing, we found that the replica error is "context.DeadlineExceeded" (incubator-pegasus/go-client/pegasus/table_connector.go) when the replica core dumps. Therefore, when the client meets this error, it updates the configuration automatically. Besides, the request is not retried, because the configuration is updated only after the timeout occurs; a retry before then would still fail. There is also the risk of infinite retries. Therefore, it is better to return the request error directly to the user and let the user try again.

As for #1880: when the client sends the RPC message "RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX" to a meta server that isn't the primary, the response that forwards to the primary meta server is returned. Based on this, even if the client does not have the primary meta server configured, it can connect to the primary meta server in this way.

About tests:
1. Start onebox, with the primary meta server not added to the go client configuration.
2. The go client writes data to a certain partition, and then the replica process is killed.
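The timeout handling described for #1856 could be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in the PR: tableConfig, refresh, handleReplicaError, and the helper functions are hypothetical names. The only details taken from the description are that context.DeadlineExceeded triggers a configuration update and that the error is returned to the caller without a retry.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// tableConfig is a hypothetical holder for the table's partition configuration.
type tableConfig struct {
	refreshes int // counts how often a refresh was requested
}

// refresh stands in for re-querying the meta server for the partition configuration.
func (c *tableConfig) refresh() { c.refreshes++ }

// handleReplicaError requests a configuration refresh when the request timed out,
// then returns the original error unchanged. It never retries the request.
func handleReplicaError(cfg *tableConfig, err error) error {
	if errors.Is(err, context.DeadlineExceeded) {
		cfg.refresh()
	}
	return err
}

// refreshesAfter reports how many refreshes a single error causes on a fresh config.
func refreshesAfter(err error) int {
	cfg := &tableConfig{}
	_ = handleReplicaError(cfg, err)
	return cfg.refreshes
}

// deadlineErr builds a wrapped timeout error, as a failed write might surface it.
func deadlineErr() error {
	return fmt.Errorf("write failed: %w", context.DeadlineExceeded)
}

// otherErr builds an error unrelated to timeouts.
func otherErr() error {
	return errors.New("some other failure")
}

func main() {
	fmt.Println("refreshes after timeout:", refreshesAfter(deadlineErr()))
	fmt.Println("refreshes after other error:", refreshesAfter(otherErr()))
}
```

Returning the error instead of retrying keeps the retry policy with the caller, which is the trade-off the description argues for.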

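The forwarding behavior described for #1880 can be illustrated with a small in-memory model. Everything here is assumed for the sake of the example: queryResult, metaCluster, the addresses, and the hop limit are hypothetical and do not reflect the real RPC types. The sketch only shows the idea that a non-primary meta server answers with a forward address, and the client follows it to reach the primary.

```go
package main

import (
	"errors"
	"fmt"
)

// queryResult is a hypothetical reply: either a forward address or a configuration.
type queryResult struct {
	forwardTo string // non-empty when the queried meta server is not the primary
	config    string
}

// metaCluster maps a meta server address to the reply it would give (in-memory stand-in).
type metaCluster map[string]queryResult

// queryPartitionConfig follows forward replies until a server answers with a
// configuration, giving up after maxHops forwards.
func queryPartitionConfig(cluster metaCluster, addr string, maxHops int) (string, string, error) {
	for hop := 0; hop <= maxHops; hop++ {
		res, ok := cluster[addr]
		if !ok {
			return "", "", fmt.Errorf("meta server %s unreachable", addr)
		}
		if res.forwardTo == "" {
			return addr, res.config, nil
		}
		addr = res.forwardTo
	}
	return "", "", errors.New("too many forwards")
}

// demoCluster has one non-primary server (meta-a) that forwards to the primary (meta-b).
func demoCluster() metaCluster {
	return metaCluster{
		"meta-a": {forwardTo: "meta-b"},
		"meta-b": {config: "partition config"},
	}
}

// resolvePrimary returns the address that finally answered, or "" on failure.
func resolvePrimary(start string) string {
	addr, _, err := queryPartitionConfig(demoCluster(), start, 3)
	if err != nil {
		return ""
	}
	return addr
}

func main() {
	fmt.Println("primary reached from meta-a:", resolvePrimary("meta-a"))
}
```

The hop limit is a design choice of this sketch to guard against forwarding loops; the source does not say how the client bounds forwards.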

lengyuexuexuan added the type/bug label Jan 16, 2024

This was referenced Feb 19, 2024

fix(go-client): update config once replica server failed and forward to primary meta server if it was changed #1909

Closed

fix(go-client): update config once replica server failed and forward to primary meta server if it was changed #1916

Merged
