Added vLLM plugin post for clean non-upstreamable changes #119
base: main
Conversation
Signed-off-by: Dhruvil Bhatt <[email protected]>
Force-pushed from cdfe231 to 4866bfa
Signed-off-by: Dhruvil Bhatt <[email protected]>
Signed-off-by: youkaichao <[email protected]>
> This pattern has proven effective in production environments and scales from experimental prototypes to multi-model production deployments.
> If you're interested in plugin-based architectures for inference systems or want to explore how to structure runtime patching in a clean way, feel free to reach out. Always happy to chat about scalable LLM deployment and design patterns.
> feel free to reach out
how?
> ### Key takeaways:
> - ✅ Use `VLLMPatch[TargetClass]` for surgical, class-level modifications
> - ✅ Register via `vllm.general_plugins` entry point in `setup.py`
> - ✅ Control patches with `VLLM_CUSTOM_PATCHES` environment variable
please clarify that `VLLM_CUSTOM_PATCHES` is not a standard vLLM environment variable. Every user may need to choose their own env var name.
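A minimal sketch of the reviewer's point: gate patches behind a package-specific opt-in env var rather than implying `VLLM_CUSTOM_PATCHES` is built in. The env var name `MY_VLLM_PATCHES`, the helper, and the patch name below are hypothetical illustrations, not part of vLLM:

```python
import os

def parse_enabled_patches(env_value):
    """Parse a comma-separated opt-in list from a custom env var."""
    if not env_value:
        return set()
    return {name.strip() for name in env_value.split(",") if name.strip()}

def register_patches():
    # Exposed via the "vllm.general_plugins" entry point in setup.py,
    # so vLLM calls it in every process at startup. The env var name
    # is this package's own choice, not a standard vLLM variable.
    enabled = parse_enabled_patches(os.environ.get("MY_VLLM_PATCHES"))
    if "scheduler_fix" in enabled:  # hypothetical patch name
        pass  # apply the patch here
```

The matching `setup.py` declaration would be along the lines of `entry_points={"vllm.general_plugins": ["my_patches = my_pkg:register_patches"]}`.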
youkaichao left a comment
thanks for contributing! looks good in general, please fix the two comments.
> - ❌ **Every vLLM upgrade breaks your patch** - because you replaced full files, not just the individual lines of interest
> - ❌ **Debugging becomes painful** - is the bug in your patch? In unchanged vanilla code? Or because monkey patching rewired behavior unexpectedly?
> - ❌ **Operational complexity grows over time** - every vLLM release forces you to diff and re-sync your copied files - exactly the same problem as maintaining a fork, just disguised inside your Python package
another disadvantage we hit sometimes: monkey patching doesn't work for some modules. For example, monkey patching the scheduler module usually doesn't work: the scheduler is called by the EngineCore process, so a monkey patch applied in the main process usually behaves in unexpected ways. You usually need to monkey patch EngineCore instead.
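The cross-process pitfall the comment describes can be demonstrated with plain Python, using `json.dumps` as a stand-in for a vLLM internal: a monkey patch applied in one process is invisible to a freshly started child process, which is why modifications targeting EngineCore must go through a plugin that every process loads:

```python
import json
import subprocess
import sys

# Monkey patch json.dumps in the current process only.
original_dumps = json.dumps
json.dumps = lambda obj, **kwargs: "PATCHED"

parent_view = json.dumps({})  # the patch is visible here

# A child process re-imports json from scratch and never sees the
# patch, just as vLLM's EngineCore process never sees a patch applied
# only in the API server process.
child_view = subprocess.run(
    [sys.executable, "-c", "import json; print(json.dumps({}))"],
    capture_output=True, text=True,
).stdout.strip()

json.dumps = original_dumps  # restore the original

print(parent_view)  # PATCHED
print(child_view)   # {}
```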
> This approach keeps the operational overhead minimal while maintaining long-term flexibility - something both small teams and large platform groups will appreciate.
> ---
can you mention that there are 4 kinds of plugins supported by vLLM? They're called and used in different cases and processes.
Since the arch.png is for the platform plugin while the case in the article is for the generic plugin, I think it's good to explain more.
Original Medium blog post published here: https://medium.com/@dhruvilbhattlm10/building-clean-maintainable-vllm-modifications-using-the-plugin-system-e80df0f62861