fix: improve vLLM plugin compatibility and NCCL receive handling #109
chaokunyang merged 1 commit into inclusionAI:main from
Conversation
Test results (docker image):
Code Review
This pull request implements compatibility for newer vLLM versions by dynamically importing components and patching the build_app function to register Awex routes. It also refines PyTorch version checks and adds support for non-contiguous tensors in weight synchronization tests. Review feedback identifies a security vulnerability regarding unauthenticated endpoints and suggests a more direct method for retrieving the PyTorch version.
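The non-contiguous tensor case mentioned above can be illustrated with a small PyTorch snippet (a hedged sketch; the PR's actual test code is not shown here). Views produced by operations like transpose are non-contiguous, and NCCL-style send/recv paths typically expect a contiguous buffer:

```python
import torch

# A transpose returns a view whose memory layout is no longer contiguous;
# collective send/recv paths generally require contiguous buffers.
t = torch.arange(12).reshape(3, 4)
v = t.t()
assert not v.is_contiguous()

# Making a contiguous copy before communication preserves the values.
c = v.contiguous()
assert c.is_contiguous()
assert torch.equal(c, v)
```

A weight-sync path that accepts arbitrary views either needs to call `.contiguous()` before transmission or copy the received bytes back into the original strided layout.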
```python
@router.post("/areal_awex_init")
async def awex_init(request: AwexInitRequest, raw_request: Request):
```
The new endpoints /areal_awex_init and /areal_awex_update are registered without any explicit authentication or authorization dependencies. Since these endpoints can trigger significant state changes (like re-initializing the NCCL group or updating model weights), they could be exploited if the vLLM server is exposed. Consider ensuring these routes are protected by the same security mechanisms (e.g., API key checks) used for the standard OpenAI-compatible endpoints.
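As a minimal sketch of the suggested mitigation (the key name and loading mechanism here are hypothetical, not the server's actual configuration), a guard could compare the supplied bearer token against the configured API key before allowing the state-changing endpoints to run:

```python
import hmac

# Hypothetical: real servers would load this from configuration, not a literal.
EXPECTED_KEY = "secret-token"

def authorize(headers: dict) -> bool:
    """Return True only if the Authorization header carries the expected key."""
    supplied = headers.get("Authorization", "").removeprefix("Bearer ")
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(supplied, EXPECTED_KEY)

assert authorize({"Authorization": "Bearer secret-token"})
assert not authorize({"Authorization": "Bearer wrong"})
assert not authorize({})
```

In a FastAPI-based server the equivalent check is usually attached as a router-level dependency, so every route registered on that router inherits it.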
```diff
 pg_options_param_name = (
-    "backend_options" if str(torch.__version__) >= "2.6" else "pg_options"
+    "backend_options"
+    if Version(version("torch")) >= Version("2.6")
```
While `packaging.version.Version` correctly fixes the string-comparison bug, `importlib.metadata.version("torch")` is less direct and potentially less robust than the `__version__` attribute already available on the imported `torch` module. The metadata query can fail in some environments (e.g., non-standard installations) even when the module itself loads successfully.
```diff
-    if Version(version("torch")) >= Version("2.6")
+    if Version(torch.__version__) >= Version("2.6")
```
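The underlying bug being fixed is that string comparison of versions is lexicographic, so a release like `"2.10"` sorts *before* `"2.6"`. `packaging.version.Version` compares release segments numerically:

```python
from packaging.version import Version

# Lexicographic string comparison gets multi-digit components wrong:
# '1' < '6' at the third character, so "2.10" sorts below "2.6".
assert not ("2.10" >= "2.6")

# packaging.version compares release segments as integers: 10 >= 6.
assert Version("2.10") >= Version("2.6")
assert Version("2.6.0") >= Version("2.6")
```

This is why `str(torch.__version__) >= "2.6"` would misclassify a hypothetical PyTorch 2.10 as older than 2.6.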

Summary
- Update `awex/vllm_plugin.py` for newer vLLM OpenAI protocol/router changes, and ensure Awex routes are attached via `build_app` patching when a shared router is absent.
- Switch PyTorch version comparison to `packaging.version`, add a registry sharding strategy accessor, and apply formatting/cleanup updates.
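The `build_app` patching approach can be sketched in isolation (a hedged illustration using a stub module in place of vLLM's API server; names like `patch_build_app` are hypothetical): wrap the server's app factory so every app it constructs gets the extra routes registered.

```python
import types

def patch_build_app(server_module, extra_routes):
    """Wrap server_module.build_app so each constructed app gains extra_routes."""
    original_build_app = server_module.build_app

    def build_app(*args, **kwargs):
        app = original_build_app(*args, **kwargs)
        for path, handler in extra_routes.items():
            # Stand-in for app.include_router(...) in a real FastAPI app.
            app.routes[path] = handler
        return app

    server_module.build_app = build_app

# Demo with a stub standing in for the vLLM API server module.
class App:
    def __init__(self):
        self.routes = {}

stub = types.SimpleNamespace(build_app=lambda: App())
patch_build_app(stub, {"/areal_awex_init": lambda request: {"status": "ok"}})

app = stub.build_app()
assert "/areal_awex_init" in app.routes
```

Patching the factory rather than a module-level router instance is what makes this work even when no shared router object is importable from the installed vLLM version.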