Added vLLM plugin post for clean non-upstreamable changes #119
base: main
Conversation
Signed-off-by: Dhruvil Bhatt <[email protected]>
Force-pushed from cdfe231 to 4866bfa
Signed-off-by: Dhruvil Bhatt <[email protected]>
Signed-off-by: youkaichao <[email protected]>
> This pattern has proven effective in production environments and scales from experimental prototypes to multi-model production deployments.
> If you're interested in plugin-based architectures for inference systems or want to explore how to structure runtime patching in a clean way, feel free to reach out. Always happy to chat about scalable LLM deployment and design patterns.
> feel free to reach out
how?
> ### Key takeaways:
> - ✅ Use `VLLMPatch[TargetClass]` for surgical, class-level modifications
> - ✅ Register via `vllm.general_plugins` entry point in `setup.py`
> - ✅ Control patches with `VLLM_CUSTOM_PATCHES` environment variable
please clarify that `VLLM_CUSTOM_PATCHES` is not a standard vLLM environment variable. Every user may need to choose their own env var name.
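A minimal sketch of the reviewer's point: gate patches behind a package-specific opt-in env var rather than implying `VLLM_CUSTOM_PATCHES` is built in. The env var name `MY_VLLM_PATCHES`, the helper, and the patch name below are hypothetical illustrations, not part of vLLM:

```python
import os

def parse_enabled_patches(env_value):
    """Parse a comma-separated opt-in list from a custom env var."""
    if not env_value:
        return set()
    return {name.strip() for name in env_value.split(",") if name.strip()}

def register_patches():
    # Exposed via the "vllm.general_plugins" entry point in setup.py,
    # so vLLM calls it in every process at startup. The env var name
    # is this package's own choice, not a standard vLLM variable.
    enabled = parse_enabled_patches(os.environ.get("MY_VLLM_PATCHES"))
    if "scheduler_fix" in enabled:  # hypothetical patch name
        pass  # apply the patch here
```

The matching `setup.py` declaration would be along the lines of `entry_points={"vllm.general_plugins": ["my_patches = my_pkg:register_patches"]}`.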
youkaichao left a comment
thanks for contributing! looks good in general, please fix the two comments.
> - ❌ **Every vLLM upgrade breaks your patch** - because you replaced full files, not just the individual lines of interest
> - ❌ **Debugging becomes painful** - is the bug in your patch? In unchanged vanilla code? Or because monkey patching rewired behavior unexpectedly?
> - ❌ **Operational complexity grows over time** - every vLLM release forces you to diff and re-sync your copied files - exactly the same problem as maintaining a fork, just disguised inside your Python package
another disadvantage we hit sometimes: monkey patching doesn't work for some modules. For example, monkey patching the scheduler module usually doesn't work: the scheduler is called by the EngineCore process, so a monkey patch applied in the main process usually behaves in unexpected ways. You usually need to monkey patch EngineCore instead.
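The cross-process pitfall the comment describes can be demonstrated with plain Python, using `json.dumps` as a stand-in for a vLLM internal: a monkey patch applied in one process is invisible to a freshly started child process, which is why modifications targeting EngineCore must go through a plugin that every process loads:

```python
import json
import subprocess
import sys

# Monkey patch json.dumps in the current process only.
original_dumps = json.dumps
json.dumps = lambda obj, **kwargs: "PATCHED"

parent_view = json.dumps({})  # the patch is visible here

# A child process re-imports json from scratch and never sees the
# patch, just as vLLM's EngineCore process never sees a patch applied
# only in the API server process.
child_view = subprocess.run(
    [sys.executable, "-c", "import json; print(json.dumps({}))"],
    capture_output=True, text=True,
).stdout.strip()

json.dumps = original_dumps  # restore the original

print(parent_view)  # PATCHED
print(child_view)   # {}
```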
> This approach keeps the operational overhead minimal while maintaining long-term flexibility - something both small teams and large platform groups will appreciate.
> ---
can you mention that there are 4 kinds of plugins supported by vLLM? They're called and used in different cases and processes.
Since the arch.png is for the platform plugin while the case in the article is for the generic plugin, I think it's good to explain more.
Original Medium blog post published here: https://medium.com/@dhruvilbhattlm10/building-clean-maintainable-vllm-modifications-using-the-plugin-system-e80df0f62861