feat: As a user, I want kubernetes service discovery to support more configuration items #8311

Open
tangzhenhuang opened this issue Nov 11, 2022 · 14 comments
Labels
good first issue Good for newcomers


@tangzhenhuang
Contributor

Description

Recently, we deployed APISIX on several clouds and used its Kubernetes service discovery. The problem is that on different clouds, the proxy layer (LB) in front of the apiserver has different idle timeouts. In APISIX's Kubernetes service discovery, however, the duration of a watch is fixed. When there are no endpoints events in the cluster for a long time, the server side times out instead of the client, and service discovery then restarts the list-watch after a fixed 40 seconds. Could you add some configuration items, such as the duration of a watch and the retry interval or strategy? Thank you!

@tokers
Contributor

tokers commented Nov 11, 2022

The current watch timeout is hard-coded, computed by a simple built-in algorithm. I think we can add a new field that lets users configure the watch timeout.

@spacewander spacewander added the good first issue Good for newcomers label Nov 11, 2022
@zhixiongdu027
Contributor

I think the goal is to avoid the "re list-watch", and the "40 seconds" is not what causes it.

@tokers
Contributor

tokers commented Nov 13, 2022

> I think the goal is to avoid the "re list-watch", and the "40 seconds" is not what causes it.

Any suggestions?

@zhixiongdu027
Contributor

zhixiongdu027 commented Nov 14, 2022

To solve the problem, maybe we can generate events by making mock endpoints changes in a specific namespace, which would keep the TCP connection active.
@crazyMonkey1995 @tokers

@tangzhenhuang
Contributor Author

> To solve the problem, maybe we can generate events by making mock endpoints changes in a specific namespace, which would keep the TCP connection active. @crazyMonkey1995 @tokers

How about making the timeout a configurable parameter? Users know the timeout of their target apiserver (or its proxy).

@zhixiongdu027
Contributor

A watchSeconds value that is too short will produce many "re list-watch" cycles.
A watchSeconds value that is too long will cause the proxy to terminate the connection early.

Do we have to use a proxy in front of the apiserver?

@tangzhenhuang
Contributor Author

> A watchSeconds value that is too short will produce many "re list-watch" cycles. A watchSeconds value that is too long will cause the proxy to terminate the connection early.
>
> Do we have to use a proxy in front of the apiserver?

In real deployments on Alibaba Cloud, AWS, Azure, and similar platforms, the apiserver sits behind a proxy.

@tzssangglass
Member

> A watchSeconds value that is too short will produce many "re list-watch" cycles. A watchSeconds value that is too long will cause the proxy to terminate the connection early.
>
> Do we have to use a proxy in front of the apiserver?

In fact, if you use resty.http or ngx.socket.tcp, a default timeout applies even if you do not set one; it is 60 s, as I remember.

@zhixiongdu027
Contributor

zhixiongdu027 commented Nov 16, 2022

> In fact, if you use resty.http or ngx.socket.tcp, a default timeout applies even if you do not set one; it is 60 s, as I remember.

That is not where the problem lies; the code already sets the timeouts with httpc:set_timeouts:

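local function watch(httpc, apiserver, informer)
    local watch_times = 8
    for _ = 1, watch_times do
        -- watch duration: 1800 s plus 9 to 999 s of random jitter
        local watch_seconds = 1800 + math.random(9, 999)
        informer.overtime = watch_seconds
        -- HTTP read timeout: the watch duration plus 120 s of slack
        local http_seconds = watch_seconds + 120
        -- connect, send, and read timeouts, all in milliseconds
        httpc:set_timeouts(2000, 3000, http_seconds * 1000)
        -- ... (rest of the function omitted)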

The problem arises in a network topology like the following:
discovery --(1)--> proxy --(2)--> apiserver

The timeout policy at position (1) does not match the one at position (2).
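
To make the mismatch concrete, the following is a minimal Python sketch, not APISIX code. It treats each party on an idle watch as a timer and reports which one fires first. The function name `first_to_close` and the 600 s proxy idle timeout are assumptions for illustration; the 1800 s base and the 9 s minimum jitter come from the snippet above.

```python
def first_to_close(timeouts):
    """Return (name, seconds) of the party whose timeout expires first
    on a watch that carries no events."""
    name = min(timeouts, key=timeouts.get)
    return name, timeouts[name]


# Hypothetical values for an idle watch.
idle_watch = {
    "discovery_watch_seconds": 1800 + 9,  # smallest value the snippet can draw
    "proxy_idle_timeout": 600,            # assumed cloud LB idle timeout
}

print(first_to_close(idle_watch))  # ('proxy_idle_timeout', 600)
```

With these numbers the proxy closes the connection long before the watch expires, which is the situation described in this issue.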

@tzssangglass

@zhixiongdu027
Contributor

@crazyMonkey1995 @tokers @tzssangglass

I would like to open a PR for "support configuring watchSeconds and retryInterval" later.

@tzssangglass
Member

> The problem arises in a network topology like the following:
> discovery --(1)--> proxy --(2)--> apiserver

We can make the three values 2000, 3000, and http_seconds * 1000 in the call httpc:set_timeouts(2000, 3000, http_seconds * 1000) configurable by the user.

> How about making the timeout a configurable parameter? Users know the timeout of their target apiserver (or its proxy).

As described here, the user needs to configure the timeout to be smaller than the proxy's timeout.
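
As a rough illustration of that constraint, here is a small Python sketch (a hypothetical helper, not part of APISIX) that caps a requested watch duration below the proxy's idle timeout. The name `safe_watch_seconds` and the 30 s default margin are assumptions.

```python
def safe_watch_seconds(requested, proxy_idle_timeout, margin=30):
    """Cap the requested watch duration so that the client ends the watch
    before the proxy's idle timeout can fire."""
    limit = proxy_idle_timeout - margin
    if limit <= 0:
        raise ValueError("proxy idle timeout must exceed the safety margin")
    return min(requested, limit)


# A 1800 s watch behind a proxy with an assumed 600 s idle timeout.
print(safe_watch_seconds(1800, 600))  # 570
```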

@zhixiongdu027
Contributor

> I would like to open a PR for "support configuring watchSeconds and retryInterval" later.

I am leaning toward a config in the following format. Any other suggestions?

kubernetes:
    service: ...
    client: ...
    retry_interval: 30
    min_watch: 1800
    max_watch: 2000
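
One way such a min_watch/max_watch pair could be consumed is sketched below in Python (illustration only; APISIX itself is written in Lua). It draws a watch duration uniformly from the configured range, similar in spirit to the existing `1800 + math.random(9, 999)`, and derives the HTTP read timeout by adding the 120 s of slack seen in the current code. The function name `next_watch_timeouts` and its parameters are hypothetical.

```python
import random


def next_watch_timeouts(min_watch, max_watch, http_slack=120, rng=random):
    """Pick a watch duration in [min_watch, max_watch] seconds and return
    it with the matching HTTP read timeout in milliseconds."""
    if min_watch > max_watch:
        raise ValueError("min_watch must not exceed max_watch")
    watch_seconds = rng.randint(min_watch, max_watch)  # inclusive on both ends
    read_timeout_ms = (watch_seconds + http_slack) * 1000
    return watch_seconds, read_timeout_ms


watch_seconds, read_timeout_ms = next_watch_timeouts(1800, 2000)
print(watch_seconds, read_timeout_ms)
```

Randomizing within a range keeps many discovery instances from reconnecting at the same moment, while max_watch lets the user stay below the proxy's idle timeout.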

@crazyMonkey1995 @tokers @tzssangglass @spacewander

@tzssangglass
Member

> kubernetes:
>     service: ...
>     client: ...
>     retry_interval: 30
>     min_watch: 1800
>     max_watch: 2000

What about:

kubernetes:
    service: ...
    client: ...
    retry_interval: 30
    watch:
        connect:
        send:
        read:

@ro4i7

ro4i7 commented Mar 12, 2023

Hello @spacewander @tokers @tzssangglass @crazyMonkey1995

If this issue is still open, please assign it to me.
Please give feedback on the following solution:

To solve this issue, we can add some configuration items to the Kubernetes service discovery such as the duration of a watch, retry time, or strategy, as shown below:

service:
  client:
    retry_interval: 30
  watch:
    duration: 60
    retry_strategy: exponential_backoff

In this configuration, the duration of a watch is set to 60 seconds, and the retry strategy is set to exponential backoff. The retry interval is set to 30 seconds, which means that the client will retry connecting to the service after 30 seconds if the initial connection attempt fails.
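
For reference, an exponential backoff schedule could look like the Python sketch below. This is an assumption about what `retry_strategy: exponential_backoff` might mean; the thread does not define it. Treating `retry_interval` as the base delay, doubling on each failure, and capping at 300 s are all illustrative choices, and `backoff_intervals` is a hypothetical name.

```python
def backoff_intervals(base, attempts, factor=2, cap=None):
    """Return the delay in seconds before each of `attempts` retries,
    multiplying by `factor` each time and optionally capping at `cap`."""
    intervals = []
    delay = base
    for _ in range(attempts):
        intervals.append(delay if cap is None else min(delay, cap))
        delay *= factor
    return intervals


print(backoff_intervals(30, 5, cap=300))  # [30, 60, 120, 240, 300]
```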
