ai-token-ratelimit plugin: configured rate limit does not take effect #1826

Open
1 task done
Colstuwjx opened this issue Feb 27, 2025 · 11 comments
@Colstuwjx

If you are reporting any crash or any potential security issue, do not open an issue in this repo. Please report the issue via ASRC (Alibaba Security Response Center), where the issue will be triaged appropriately.

  • I have searched the issues of this repository and believe that this is not a duplicate.

Ⅰ. Issue Description

I configured a ratelimit rule following the documentation, but actual requests are not rate limited at the tokens-per-minute level.

Ⅱ. Describe what happened

Redis is installed in the default namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
spec:
  ports:
  - port: 6379
    targetPort: 6379
  selector:
    app: redis

In the McpBridge, I configured DNS entries for Redis and for the Azure OpenAI endpoint:

apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
  - domain: redis.default.svc.cluster.local # Kubernetes Service
    name: redis
    type: dns
    port: 6379
  - domain: openai1-xxx.openai.azure.com
    name: llm-gpt-4o-01.internal
    port: 443
    protocol: https
    type: dns

I also configured an Ingress for Redis:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/destination: redis.dns
    higress.io/ignore-path-case: "false"
  labels:
    higress.io/resource-definer: higress
  name: redis
spec:
  ingressClassName: higress
  rules:
  - http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix

wasm plugin:

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-token-ratelimit
  namespace: higress-system
spec:
  defaultConfig:
    rule_name: default_limit_by_header_beartoken
    rule_items:
    - limit_by_header: Authorization
      limit_keys:
      - key: "*"
        token_per_minute: 1
    redis:
      # By default, to reduce load on the data plane, Higress sets global.onlyPushRouteCluster to true, which means Kubernetes Services are not discovered automatically.
      # To use a Kubernetes Service for service discovery, set global.onlyPushRouteCluster to false;
      # service_name can then point directly at the Kubernetes Service, with no need to create a McpBridge and an Ingress route for Redis.
      # service_name: redis.default.svc.cluster.local
      service_name: redis.dns
      service_port: 6379
  url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ai-token-ratelimit:1.0.0
  phase: UNSPECIFIED_PHASE
  priority: 600

Ingress for the business service (there are too many WasmPlugins, so I won't paste them):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/destination: llm-gpt-4o-01.internal.dns:443
  labels:
    higress.io/domain_gateway.local: "true"
    # higress.io/internal: "true"
    higress.io/resource-definer: higress
  name: ai-route-test-ui-gpt4o.internal
  namespace: higress-system
spec:
  ingressClassName: higress
  rules:
  - host: gateway.local
    http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /ui/
        pathType: Prefix

Requests sent from my local machine show no rate limiting:

# Returns normally; rate limiting has no effect
curl -sv http://gateway.local/ui/v1/chat/completions \
    -X POST \
    -H 'Authorization: Bear sk-test12345' \
    -H 'Content-Type: application/json' \
    -d \
'{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}'
*   Trying 127.0.0.1:80...
* Connected to gateway.local (127.0.0.1) port 80
> POST /ui/v1/chat/completions HTTP/1.1
> Host: gateway.local
> User-Agent: curl/8.9.1
> Accept: */*
> Authorization: Bear sk-test12345
> Content-Type: application/json
> Content-Length: 104
>
* upload completely sent off: 104 bytes
< HTTP/1.1 200 OK
< x-content-type-options: nosniff
< x-accel-buffering: no
< apim-request-id: b8f40f54-de96-458d-997e-2fb06f74c6a6
< x-ratelimit-remaining-tokens: 1399358
< x-request-id: fa6b77c8-d89e-4b1e-9bfd-2d81be9d8b4c
< x-ms-client-request-id: b8f40f54-de96-458d-997e-2fb06f74c6a6
< resp-start-time: 1740626724406
< cmp-upstream-response-duration: 1196
< x-aml-cluster: hyena-westus3-02
< ms-azureml-model-time: 1245
< azureml-model-session: v20250212-2-160127854
< req-cost-time: 2521
< req-arrive-time: 1740626721885
< x-ratelimit-remaining-requests: 1399
< content-type: application/json
< x-envoy-upstream-service-time: 2514
< x-ms-rai-invoked: true
< date: Thu, 27 Feb 2025 03:25:24 GMT
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< x-ratelimit-limit-requests: 1400
< x-ms-region: East US
< x-ratelimit-limit-tokens: 1400000
< server: istio-envoy
< transfer-encoding: chunked
< 

If I send the request without the Authorization header, the ratelimit plugin prints a "header not found" log line. That shows the plugin is actually running, yet it does not rate limit:

curl -sv http://gateway.local/ui/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d \
'{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
}'

# In the gateway log
2025-02-27T02:36:41.309061Z	debug	envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1392	wasm log higress-system.ai-token-ratelimit: [ai-token-ratelimit] [81a65718-35a0-4472-a7f3-6b07a56ad0da] failed to get request header Authorization: error status returned by host: not found	thread=45

Ⅲ. Describe what you expected to happen

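I expect the behavior described in the documentation: the response should be 429 once the token limit is reached, and while under the limit, x-ratelimit-remaining-tokens should reflect the number of tokens remaining, which would make later troubleshooting easier.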
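
The expected behavior can be illustrated with a minimal sketch of a fixed-window, per-key token limiter. This is not the plugin's actual implementation (the plugin keeps its counters in Redis); the class name `TokenWindowLimiter`, the in-memory dict, and the rule that a request is rejected only once the window's budget is already used up are assumptions made for illustration.

```python
import time


class TokenWindowLimiter:
    """Hypothetical in-memory fixed-window limiter keyed by header value."""

    def __init__(self, tokens_per_minute, clock=time.time):
        self.limit = tokens_per_minute
        self.clock = clock
        self.windows = {}  # key -> (window_start, tokens_used)

    def _current(self, key):
        now = self.clock()
        start, used = self.windows.get(key, (now, 0))
        if now - start >= 60:  # window expired: start a fresh one
            start, used = now, 0
        self.windows[key] = (start, used)
        return start, used

    def check(self, key):
        """Return (status, remaining) before a request is forwarded."""
        _, used = self._current(key)
        remaining = max(self.limit - used, 0)
        return (429 if remaining == 0 else 200), remaining

    def record(self, key, tokens):
        """Add the token usage reported by the upstream response."""
        start, used = self._current(key)
        self.windows[key] = (start, used + tokens)
```

In this model, with a budget of 1 token per minute, the first request passes, and once its usage is recorded every further request with the same key in that minute gets 429.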

Ⅳ. How to reproduce it (as minimally and precisely as possible)

See above.

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • Higress version: 2.0.7
  • OS : macOS, kind
  • Others:
@cr7258 cr7258 self-assigned this Feb 27, 2025
@johnlanni
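Collaborator

Rate limiting currently depends on the token usage information in the fields returned by the vendor. The Azure integration missed this; it has been implemented in this PR: https://github.com/alibaba/higress/pull/1818/files

The latest tag of the ai-proxy plugin image has already been updated. The fix will be synced to the 1.0.0 tag later.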
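
To make that dependency concrete: an OpenAI-style chat completion response carries a `usage` object, and a limiter that counts tokens has nothing to add when that object is absent. The sketch below is an assumption about the general shape of such logic, not code taken from the plugin; `extract_total_tokens` is a hypothetical helper.

```python
import json


def extract_total_tokens(body):
    """Return usage.total_tokens from a JSON response body, or None if absent."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        return None  # not JSON at all
    usage = data.get("usage") if isinstance(data, dict) else None
    if not isinstance(usage, dict):
        return None  # the vendor returned no usage block
    total = usage.get("total_tokens")
    return total if isinstance(total, int) else None
```

A caller that receives `None` has no token count to record, which in this model means the counter never moves.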

@johnlanni
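Collaborator

Azure previously did not need the stream_options parameter and returned usage information by default; the API was probably updated later. Please update the ai-proxy plugin image first and check whether that fixes the problem.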
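
For streaming requests, the OpenAI chat completions API reports usage only when the request sets `stream_options.include_usage`. A minimal sketch of adding that field to a request body follows. The helper name `with_usage_reporting` is hypothetical, and whether ai-proxy performs an equivalent rewrite for Azure is not confirmed in this thread.

```python
import copy


def with_usage_reporting(request_body):
    """Return a copy of a chat request that asks for usage on streamed responses."""
    body = copy.deepcopy(request_body)
    if body.get("stream"):
        # Only streaming requests take stream_options.
        options = body.setdefault("stream_options", {})
        options["include_usage"] = True
    return body
```

Non-streaming requests are returned unchanged, since `stream_options` applies only when `stream` is true.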

@Colstuwjx
Author

I tried it and it had no effect? I updated the url in the WasmPlugin config: url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ai-token-ratelimit:latest

@johnlanni
Collaborator

johnlanni commented Feb 27, 2025

2025-02-27T02:36:41.309061Z debug envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1392 wasm log higress-system.ai-token-ratelimit: [ai-token-ratelimit] [81a65718-35a0-4472-a7f3-6b07a56ad0da] failed to get request header Authorization: error status returned by host: not found thread=45

I see this error. Isn't this from a request sent without the Authorization request header?

@Colstuwjx
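Author

I see this error. Isn't this from a request sent without the Authorization request header?

That was only for verification. Since there are few logs to debug with, I deliberately removed the Authorization header; in that case the token-ratelimit plugin prints a log line, which indirectly shows that it is working. The remaining problem is that with the Authorization header present, nothing is actually counted. I logged in with redis-cli and ran keys *, and no key had been set. If there were a problem connecting to Redis, the plugin should at least log an error, so this is rather strange.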

@cr7258
Collaborator

cr7258 commented Feb 27, 2025

@Colstuwjx I just built the latest ai-proxy image. You can change ai-proxy to higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ai-proxy:1.0.0 and try again.

@Colstuwjx
Author

higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ai-proxy:1.0.0

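It was already this version. If I restart higress-gateway, will it pull the latest version of the plugin again? Or does the ai-proxy config also need to be deleted and recreated?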

@CH3CHO
Collaborator

CH3CHO commented Feb 28, 2025

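It was already this version. If I restart higress-gateway, will it pull the latest version of the plugin again? Or does the ai-proxy config also need to be deleted and recreated?

Make any change to the ai-proxy config and save it; that is enough to trigger the update.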

@Colstuwjx
Author

Colstuwjx commented Feb 28, 2025

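Make any change to the ai-proxy config and save it; that is enough to trigger the update.

I tried it, and it still does not seem to work; the ai-token-ratelimit plugin still does not function. Also: the ai-quota plugin did not work locally, but it did work with a cloud Redis + McpBridge 😂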

@johnlanni
Collaborator

johnlanni commented Feb 28, 2025

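Are you sure it is connected to Redis? You can exec into the container and run curl localhost:15000/logging?wasm=debug -X POST

If it is connected to Redis, you will see every RESP command sent to Redis.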
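
For reference when reading such output, a Redis client sends each command as a RESP array of bulk strings. The sketch below encodes a command that way; `encode_resp` is a hypothetical helper, and the exact form in which the gateway log prints these commands is not shown in this thread.

```python
def encode_resp(*args):
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of elements
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # bulk string: byte length, CRLF, payload, CRLF
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)
```

For example, `encode_resp("GET", "foo")` yields `*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n`.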

@Colstuwjx
Author

I tried redis-cli -h <elasticache-domain> MONITOR and found that ai-quota requests do issue queries:

1740723415.104454 [0 CLIENT_IP_XXX:40426] "get" "chat_quota:gpt-4o-consumer-01"

But ratelimit does not seem to send any requests at all, and I found no related logs in the gateway, even though it is configured:

(base) root@ip-<CLIENT_IP_XXX>:~/higress# kubectl get wasmplugin ai-token-ratelimit-1.0.0 -n higress-system -o yaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  annotations:
    higress.io/wasm-plugin-description: Implement token rate limiting based on specific
      keys, where the key source can be URL parameters, HTTP request headers, client
      IP addresses, etc.
    higress.io/wasm-plugin-icon: https://img.alicdn.com/imgextra/i1/O1CN018iKKih1iVx287RltL_!!6000000004419-2-tps-42-42.png
    higress.io/wasm-plugin-title: AI Token Rate Limit
  creationTimestamp: "2025-02-28T02:06:07Z"
  generation: 10
  labels:
    higress.io/resource-definer: higress
    higress.io/wasm-plugin-built-in: "true"
    higress.io/wasm-plugin-category: ai
    higress.io/wasm-plugin-name: ai-token-ratelimit
    higress.io/wasm-plugin-version: 1.0.0
  name: ai-token-ratelimit-1.0.0
  namespace: higress-system
  resourceVersion: "959873"
  uid: 47f0841a-5ca8-4921-9b6b-aecd4a25af34
spec:
  defaultConfigDisable: true
  failStrategy: FAIL_OPEN
  matchRules:
  - config:
      redis:
        password: ""
        service_name: redis.dns
        service_port: 6379
        username: ""
      rule_items:
      - limit_by_header: Authorization
        limit_keys:
        - key: '*'
          token_per_minute: 3
      rule_name: default_rule
    configDisable: true
    ingress:
    - ai-route-gpt-4o.internal
  phase: UNSPECIFIED_PHASE
  priority: 600
  url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/ai-token-ratelimit:1.0.0
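
One way to cross-check a dump like this is to list which rules in the spec are marked disabled and which Ingress names the remaining rules target. The sketch below works on the spec as a plain dict. It assumes, from the field names alone, that `configDisable: true` means the rule's config is not applied; `enabled_targets` is a hypothetical helper, not part of Higress.

```python
def enabled_targets(spec):
    """Return Ingress names covered by match rules that are not marked disabled."""
    targets = []
    for rule in spec.get("matchRules", []):
        if rule.get("configDisable", False):
            continue  # assumption: a disabled rule is not applied
        targets.extend(rule.get("ingress", []))
    return targets


# Relevant fields copied from the dump above.
spec = {
    "defaultConfigDisable": True,
    "matchRules": [
        {"configDisable": True, "ingress": ["ai-route-gpt-4o.internal"]},
    ],
}
print(enabled_targets(spec))  # → []
```

Under that assumption this dump has no active rule. It may also be worth comparing the `ingress` entry (`ai-route-gpt-4o.internal`) with the name of the Ingress that actually serves the request.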
