We are very pleased to announce the official release of vLLM Kunlun v0.10.1.1!

Going forward, depending on demand, we will continue to publish patch and feature releases, and will periodically share the latest features and models supported by vLLM Kunlun. Stay tuned.

0.10.1.1 Release

Highlights✨

Multimodal support has been broadly enhanced and now covers 5+ multimodal model series, with overall inference throughput reaching up to 90% of the Axx platform.
Sampling performance has improved substantially: the Top-K sorting bottleneck is eliminated, and when the new sampling path is enabled, end-to-end throughput can improve by up to 10× compared to the native implementation.
Quantized inference is now production-ready: AWQ / GPTQ quantization is supported for dense models, with the following gains over FP16:
- Significantly reduced GPU memory usage.
- Doubled compute throughput.
Support for multi-LoRA inference.
Support for Piecewise CUDA Graph, significantly reducing scheduling and kernel launch overhead.
Support for the vLLM V1 inference engine.
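To put the memory reduction from AWQ / GPTQ mentioned above into rough numbers, here is a back-of-the-envelope sketch. It only estimates weight storage and rests on assumptions that are not stated in these notes: a hypothetical 7B-parameter dense model, 4-bit weights, a group size of 128, and one FP16 scale plus one 4-bit zero point per group. Activations, KV cache, and any layers left unquantized are ignored, so real savings will differ.

```python
def int4_bits_per_weight(group_size=128, scale_bits=16, zero_point_bits=4):
    """Effective bits per weight for group-wise 4-bit quantization.

    Each group of `group_size` weights is assumed to share one scale and
    one zero point, so their cost is amortized over the group.
    """
    return 4 + (scale_bits + zero_point_bits) / group_size


def weight_memory_gib(num_params, bits_per_weight):
    """Weight storage in GiB for a given parameter count and bit width."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical model size, chosen only for illustration.
num_params = 7_000_000_000

fp16_gib = weight_memory_gib(num_params, 16)
int4_gib = weight_memory_gib(num_params, int4_bits_per_weight())

print(f"FP16 weights: {fp16_gib:.2f} GiB")
print(f"INT4 weights: {int4_gib:.2f} GiB")
print(f"Reduction:    {fp16_gib / int4_gib:.2f}x")
```

Under these assumptions the per-group metadata adds well under half a bit per weight, so weight memory shrinks by a little less than 4×.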

Supported models

Qwen2.5
Qwen2.5-VL
Qwen3
Qwen3-MoE
GLM4.1v
GLM4.5
GLM4.5Air
GLM4.5v
InternVL25
InternVL35
QiFanVL

Operator updates🚀

KLX xtorch_ops operator library
- Added FlashInfer Top-K / Top-P sampling operators. Compared to the original sorting-based logic, the sampling stage is tens to hundreds of times faster.
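To illustrate how Top-K / Top-P sampling can avoid a full sort, the sketch below uses rejection sampling with a moving pivot, which is the general idea behind sorting-free samplers. It is a plain-Python illustration under our own assumptions, not the xtorch_ops kernel: the function names are hypothetical, and a real operator would run the reductions in parallel on the device.

```python
import random


def _sample_above(probs, pivot, rng):
    """Inverse-transform sample an index among entries with probs[i] > pivot."""
    total = sum(p for p in probs if p > pivot)
    u = rng.random() * total
    acc = 0.0
    last = None
    for i, p in enumerate(probs):
        if p > pivot:
            acc += p
            last = i
            if u < acc:
                return i
    return last  # fallback for floating-point round-off


def _rejection_sample(probs, accept, rng):
    """Resample above a rising pivot until the drawn token passes `accept`."""
    pivot = 0.0
    while True:
        idx = _sample_above(probs, pivot, rng)
        pivot = probs[idx]
        if accept(pivot):
            return idx
        # Rejected: the next round only considers strictly more likely tokens.


def sample_top_k(probs, k, rng):
    """Accept a token if fewer than k tokens are strictly more likely (k >= 1)."""
    return _rejection_sample(
        probs, lambda pivot: sum(1 for p in probs if p > pivot) < k, rng
    )


def sample_top_p(probs, top_p, rng):
    """Accept a token if the mass of strictly more likely tokens is below top_p."""
    return _rejection_sample(
        probs, lambda pivot: sum(p for p in probs if p > pivot) < top_p, rng
    )


rng = random.Random(0)
probs = [0.05, 0.4, 0.1, 0.3, 0.15]
print(sample_top_k(probs, 2, rng), sample_top_p(probs, 0.8, rng))
```

Each round needs only sums and counts over the vocabulary, never a sort, and the pivot rises strictly after every rejection, so the loop terminates. Tokens tied in probability at the cutoff are all kept in this sketch.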

Bug fixes❤️‍🩹

Fixed YaRN positional encoding issues that caused garbled output in some models once the input exceeded the native context length.
Fixed Rotary Positional Encoding (RoPE) precision issues.
Fixed errors raised when repetition_penalty > 1.
Fixed XPU INT4 data layout issues, significantly improving the performance of AWQ / GPTQ-related operators on XPU.
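For readers unfamiliar with the repetition_penalty setting referenced in the fixes above, the sketch below shows the commonly used formulation: logits of tokens that have already appeared are divided by the penalty when positive and multiplied by it when negative, so a value greater than 1 always makes a repeat less likely. This is a plain-Python illustration with a hypothetical function name; it does not reproduce the Kunlun operator or the specific bug that was fixed.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of `logits` with the penalty applied to seen tokens.

    logits:          list of floats, one per vocabulary entry
    seen_token_ids:  token ids that already appeared in the prompt or output
    penalty:         values > 1 discourage repeats, 1 leaves logits unchanged
    """
    out = list(logits)
    for tok in set(seen_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty  # shrink positive logits toward zero
        else:
            out[tok] *= penalty  # push non-positive logits further down
    return out


logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=2.0)
print(penalized)
```

Splitting on the sign matters: dividing a negative logit by a penalty above 1 would raise it and make the repeated token more likely, which is the opposite of the intent.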

Known issues⚠️

Errors may occur when invoking xgrammar in Function Call scenarios.
- Cause: The relevant operators are not yet supported.
- Future: Support will be gradually added in upcoming releases.
