ConfigService get_config Blocking Main Thread #241

Open
451846939 opened this issue Aug 12, 2024 · 10 comments

Comments

@451846939
Contributor

When ConfigService's get_config method is called on the main Tokio thread and the server address cannot be reached, the service fails to start.

I have a local default configuration, but I prefer to fetch configuration from the remote source first. If the remote fetch fails, the application should fall back to the local configuration.

However, ConfigService get_config blocks the main thread, so the service never gets past that call.
The cause is that NacosGrpcConnection's poll_ready keeps retrying indefinitely. I propose adding a configurable maximum retry limit that defaults to None, which preserves the current behavior.

Users could set this limit as needed to control the retry behavior.
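A minimal sketch of the proposed limit, assuming a hypothetical `RetryConfig` with a `max_retries: Option<u32>` field and a hypothetical `connect_with_retry` helper (neither is part of the real nacos-sdk-rust API). `None` retries forever, as today; `Some(n)` gives up after the first attempt plus `n` retries:

```rust
/// Hypothetical retry settings for illustration; not the actual
/// nacos-sdk-rust API. `max_retries: None` keeps the current behavior
/// of retrying forever.
#[derive(Clone, Copy, Debug, Default)]
pub struct RetryConfig {
    pub max_retries: Option<u32>,
}

#[derive(Debug, PartialEq)]
pub enum ConnectError {
    /// Retry budget used up; `attempts` counts the first try plus all retries.
    RetriesExhausted { attempts: u32 },
}

/// Calls `attempt` (one connection attempt to the server) until it succeeds
/// or the optional retry limit is reached.
pub fn connect_with_retry<T, E>(
    config: RetryConfig,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, ConnectError> {
    let mut attempts: u32 = 0;
    loop {
        attempts += 1;
        if let Ok(conn) = attempt() {
            return Ok(conn);
        }
        // `attempts - 1` retries have been used so far.
        if let Some(max) = config.max_retries {
            if attempts > max {
                return Err(ConnectError::RetriesExhausted { attempts });
            }
        }
        // A real client would back off here (sleep or await a timer)
        // before the next attempt.
    }
}
```

With `RetryConfig { max_retries: Some(3) }` and an unreachable server, the helper returns `RetriesExhausted { attempts: 4 }`, and the caller can fall back to its local configuration instead of hanging.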

@451846939
Contributor Author

I’m attempting to fix this issue.🙋
#242

@CherishCai
Collaborator

Wow, that's a great proposal.

@CherishCai
Collaborator

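But the retries here mainly happen while the connection is being initialized; it shouldn't be get_config itself that blocks. If it does block, could it be that build() never connected to the server properly?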

@451846939
Copy link
Contributor Author

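> But the retries here mainly happen while the connection is being initialized; it shouldn't be get_config itself that blocks. If it does block, could it be that build() never connected to the server properly?

Yes, because network problems may prevent the client from connecting to the server.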

@CherishCai
Collaborator

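Setting a maximum number of connection attempts would avoid this.
But let me ask the opposite question: shouldn't the network problem be fixed instead? Isn't blocking application startup actually a good thing here?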

@451846939
Contributor Author

The call chain is:
get_config
-> get_config_inner_async
-> remote_client.send_request
-> self.send_request.send_request(grpc_request)
-> future::poll_fn(|cx| svc.poll_ready(cx))
The blocking is caused by poll_ready at this last step.
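To illustrate why the last step of that chain can hang, here is a self-contained sketch. It assumes a hypothetical `ReconnectingService` standing in for the gRPC service behind `svc.poll_ready`, and a tiny `block_on` executor standing in for Tokio; neither is the SDK's real code. `std::future::poll_fn(|cx| svc.poll_ready(cx))` only resolves once `poll_ready` returns `Ready`, so a service that keeps reconnecting and answering `Pending` keeps the awaiting task stuck:

```rust
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Minimal single-threaded executor (a stand-in for Tokio): polls the
/// future and parks the thread until the waker unparks it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical service: while the server is down it "reconnects" on every
/// poll and reports Pending, optionally giving up after `max_retries`.
struct ReconnectingService {
    server_up: bool,
    attempts: u32,
    max_retries: Option<u32>, // None = retry forever (current behavior)
}

impl ReconnectingService {
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        if self.server_up {
            return Poll::Ready(Ok(()));
        }
        self.attempts += 1;
        if let Some(max) = self.max_retries {
            if self.attempts > max {
                return Poll::Ready(Err(format!(
                    "server unreachable after {} attempts",
                    self.attempts
                )));
            }
        }
        // Ask to be polled again (a real client would wait on a timer or a
        // connection event) and stay Pending.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

async fn send_request(svc: &mut ReconnectingService) -> Result<&'static str, String> {
    // Mirrors `future::poll_fn(|cx| svc.poll_ready(cx))`: this await only
    // completes once poll_ready returns Ready.
    poll_fn(|cx| svc.poll_ready(cx)).await?;
    Ok("response")
}
```

In this sketch, `block_on(send_request(&mut svc))` with `server_up: false` and `max_retries: None` never returns, which matches the hang described in the issue; with `max_retries: Some(3)` it returns an error after four attempts.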

@451846939
Contributor Author

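> Setting a maximum number of connection attempts would avoid this. But let me ask the opposite question: shouldn't the network problem be fixed instead? Isn't blocking application startup actually a good thing here?

The goal is simply that when the nacos server is unreachable over the network, the application can still start without a hard dependency on nacos. Blocking the thread like this prevents the rest of the startup code from running.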

@CherishCai
Collaborator

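> The goal is simply that when the nacos server is unreachable over the network, the application can still start without a hard dependency on nacos. Blocking the thread like this prevents the rest of the startup code from running.

That still seems a bit odd to me, but having the SDK return an Error after a maximum number of retries is fine.
The current PR's implementation may also be unfriendly to reconnection limits at runtime: once failover has failed a few retries, it stops trying altogether. Users who don't need the limit can simply keep the default, though.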

@451846939
Contributor Author

Perhaps we could add a timeout to FailoverConnection's send_request: if the request has not succeeded within the specified time, return an error for that call.
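The suggestion to give send_request a time limit could look roughly like this synchronous sketch. `send_with_deadline` and `SendError` are hypothetical names, and the closure stands in for one send attempt through FailoverConnection; the real SDK is async, where wrapping the request future in `tokio::time::timeout` would be the natural equivalent:

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum SendError {
    /// No attempt succeeded before the deadline.
    Timeout { attempts: u32 },
}

/// Hypothetical sketch: keep retrying `attempt` until it succeeds or
/// `timeout` elapses, then return a single error instead of blocking forever.
fn send_with_deadline<T, E>(
    timeout: Duration,
    retry_interval: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, SendError> {
    let deadline = Instant::now() + timeout;
    let mut attempts = 0;
    loop {
        attempts += 1;
        if let Ok(resp) = attempt() {
            return Ok(resp);
        }
        let now = Instant::now();
        if now >= deadline {
            return Err(SendError::Timeout { attempts });
        }
        // Sleep until the next retry, but never past the deadline.
        thread::sleep(retry_interval.min(deadline - now));
    }
}
```

Unlike a retry count, a deadline bounds how long a single call can stall regardless of how quickly each attempt fails, which may suit runtime reconnection better than a fixed `max_retry`.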

@CherishCai
Collaborator

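> Perhaps we could add a timeout to FailoverConnection's send_request: if the request has not succeeded within the specified time, return an error for that call.

I'm sure you'll run into this problem with max_retry as well. Try it out in practice and see which change works best.

Having the SDK block a thread at runtime because the remote nacos-server is unavailable is indeed something we hadn't considered before.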

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants