8 changes: 6 additions & 2 deletions models/alibaba/qwen3-235b-a22b-fp8/metadata.yaml
@@ -3,12 +3,16 @@ kind: ModelSpec
 metadata:
   name: qwen3-235b-a22b-fp8
 spec:
+  config:
+    maxTokens: 40960
Comment on lines +6 to +7
medium

Adding maxTokens makes the model's context window explicit. Please confirm that 40960 does not exceed the model's maximum supported context length, since a value above that limit could cause unexpected behavior or errors. It is also worth validating that the model functions correctly with this specific maxTokens value.
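
One way to automate the check described above is a small pre-merge validation. The following is a minimal sketch, not part of this PR: the function name `check_max_tokens` and the `supported_context_length` argument are hypothetical, the supported limit must come from the model's own documentation, and the `spec.config.maxTokens` nesting is assumed from the diff.

```python
def check_max_tokens(model_spec: dict, supported_context_length: int) -> int:
    """Return maxTokens if it is a positive integer within the supported limit.

    model_spec is the parsed metadata.yaml as a dict (assumed layout:
    spec -> config -> maxTokens). supported_context_length is supplied by
    the caller; this sketch does not know the model's real limit.
    """
    max_tokens = model_spec["spec"]["config"]["maxTokens"]

    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
        raise ValueError(f"maxTokens must be an integer, got {max_tokens!r}")
    if max_tokens <= 0:
        raise ValueError(f"maxTokens must be positive, got {max_tokens}")
    if max_tokens > supported_context_length:
        raise ValueError(
            f"maxTokens {max_tokens} exceeds the supported context length "
            f"{supported_context_length}"
        )
    return max_tokens


# Hypothetical usage: the dict mirrors the fields touched in this diff.
example_spec = {
    "metadata": {"name": "qwen3-235b-a22b-fp8"},
    "spec": {"config": {"maxTokens": 40960}},
}
```

Calling `check_max_tokens(example_spec, limit)` with the documented limit for the model would fail fast in CI instead of at serving time.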

   deployments:
-  - customRuntimeArgs: []
+  - customRuntimeArgs:
+    - --enable_reasoning
+    - --reasoning_parser=deepseek_r1
     resourceRequirements:
       cpu: 16
       gpuCount: 8
-      gpuType: nvidia-vgpu
+      gpuType: vgpu
       memory: 640
       perGPUMemoryGB: 80
     runtime: vllm