Userspace auth incorrectly deny connection when Kmesh restart. #1265

Open
YaoZengzeng opened this issue Mar 5, 2025 · 1 comment · May be fixed by #1275
Labels
kind/bug Something isn't working
Milestone

Comments

@YaoZengzeng
Member

What happened:

While trying to fix #1192, I found a number of 503 errors:

    restart_test.go:66: Minimum success threshold, 1.000000, was not met. 8590/8593 (0.999651) requests failed: 3 errors occurred:
                * request 5681: failed calling enrolled-to-kmesh (cluster=cluster-0)->'http://service-with-waypoint-at-service-granularity.echo-1-36558.svc.cluster.local:80': call failed from enrolled-to-kmesh (cluster=cluster-0) to http://service-with-waypoint-at-service-granularity.echo-1-36558.svc.cluster.local:80 (using http): response[0]: expected response code `200`, got "503". Response: RawContent:       [0] Url=http://service-with-waypoint-at-service-granularity.echo-1-36558.svc.cluster.local:80
        [0] SourceIP=fd00:10:244:1::6
        [0] Latency=228.355062ms
        [0] ActiveRequests=1
        [0] StatusCode=503
        [0] ResponseHeader=Content-Length:95
        [0] ResponseHeader=Content-Type:text/plain
        [0] ResponseHeader=Date:Wed, 05 Mar 2025 07:50:04 GMT
        [0] ResponseHeader=Server:envoy
        [0] ResponseHeader=X-Envoy-Decorator-Operation:service-with-waypoint-at-service-granularity.echo-1-36558.svc.cluster.local:80/*
        [0] ResponseHeader=X-Envoy-Upstream-Service-Time:209
        [0 body] upstream connect error or disconnect/reset before headers. reset reason: connection termination
        
        ID:               
        Method:           
        Protocol:         
        Alpn:             
        URL:              
        Version:          
        Port:             
        Code:             503
        Host:             
        Hostname:         
        Cluster:          
        IstioVersion:     
        IP:               fd00:10:244:1::6
        Request Headers:  map[]
        Response Headers: map[Content-Length:[95] Content-Type:[text/plain] Date:[Wed, 05 Mar 2025 07:50:04 GMT] Server:[envoy] X-Envoy-Decorator-Operation:[service-with-waypoint-at-service-granularity.echo-1-36558.svc.cluster.local:80/*] X-Envoy-Upstream-Service-Time:[209]]

A packet capture showed that the waypoint received a TCP RST. After investigation, the cause turned out to be a Kmesh RBAC rule being triggered.

Although I didn't deploy any auth policy, XDP still passes the packet to userspace for checking.

If Kmesh has just been restarted and has not yet synced with Istio, this check may result in an RBAC deny, ref: https://github.com/kmesh-net/kmesh/blob/main/pkg/auth/rbac.go#L184
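
To make the failure mode concrete, here is a minimal sketch of how a fail-closed lookup can deny traffic while the cache is still empty. It is not the code in `pkg/auth/rbac.go`: `WorkloadCache`, `Conn`, `Authorize`, and the IP used in the usage note are hypothetical names chosen for illustration, and the sketch assumes the deny comes from the destination workload not being found.

```go
package main

// Decision is the outcome of a userspace authorization check.
type Decision int

const (
	Deny Decision = iota
	Allow
)

// Conn identifies a connection by destination IP only, to keep the sketch small.
type Conn struct {
	DstIP string
}

// WorkloadCache is a hypothetical stand-in for a workload cache filled from xDS.
// Right after a restart it is empty until the first sync completes.
type WorkloadCache struct {
	byIP map[string]string // destination IP -> workload name
}

// Lookup reports whether a workload is known for the given IP.
// Reading from a nil map is safe in Go and simply reports "not found".
func (c *WorkloadCache) Lookup(ip string) (string, bool) {
	name, ok := c.byIP[ip]
	return name, ok
}

// Authorize fails closed: an unknown destination workload is denied,
// even though no auth policy exists anywhere in this sketch.
func Authorize(cache *WorkloadCache, conn Conn) Decision {
	if _, ok := cache.Lookup(conn.DstIP); !ok {
		return Deny
	}
	// No policies are modelled here, so a known workload is always allowed.
	return Allow
}
```

With an empty `WorkloadCache{}`, `Authorize` returns `Deny` for any connection; once the same destination IP is present in the cache, it returns `Allow`. That matches the symptom above: connections that should pass are reset only during the window between restart and sync.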

What you expected to happen:

If no auth policy is configured, shouldn't XDP skip sending the packet to userspace for the RBAC check?

We should also handle the case where Kmesh has been restarted and has not yet fully synced with Istio, but auth processing is already triggered.

cc @supercharge-xsy @weli-l

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kmesh version:
  • Kmesh mode (Kmesh has Kernel-Native Mode and Dual-Engine Mode):
  • Istio version:
  • Kernel version:
  • Others:
@YaoZengzeng YaoZengzeng added the kind/bug Something isn't working label Mar 5, 2025
@hzxuzhonghu
Member

According to our meeting discussion, we should do the following to tackle this problem:

  • Wait until all xDS caches are synced before starting the RBAC module. This does not solve the problem directly, but it prevents data inconsistency after a restart.
  • Add a flag indicating whether a workload has a bound auth policy, and only call userspace when it does.
  • For the immediate fix, should we allow the packet to pass if no destination workload is found, or keep denying it?

@hzxuzhonghu hzxuzhonghu added this to the v1.1 milestone Mar 6, 2025
@weli-l weli-l linked a pull request Mar 13, 2025 that will close this issue

2 participants