RFC: Slim runtime support for agent_strategy and tool plugin invocation #102

@BenjaminX

SlimRuntime currently invokes the dify-plugin-daemon-slim binary for model-only plugin actions (invoke_llm, invoke_text_embedding, invoke_rerank, invoke_tts, invoke_speech2text, invoke_moderation). The slim binary already supports two more plugin classes — agent_strategy and tool — but Graphon does not surface them. As a result:

  • BuiltinNodeTypes.AGENT is registered in the enum and BUILT_IN_NODE_TYPES, but no AgentNode class exists.
  • ToolNode requires a ToolNodeRuntimeProtocol injection at construction time and has no in-tree default implementation; downstream integrators must reimplement plugin-daemon plumbing themselves.

This RFC proposes extending SlimRuntime with two new invocation surfaces, adding the corresponding model_runtime protocols, providing a default ToolNodeRuntimeProtocol implementation backed by Slim, and shipping a first-class AgentNode.

The work splits into three separately reviewable PRs that land in sequence; PR-2 and PR-3 build on PR-1.


Motivation

Dify chatflows exported as DSL today routinely include agent and tool nodes (the official langgenius/agent strategy plugin and the entire Dify Tool plugin marketplace). Any downstream Graphon integrator that wants to run those DSLs must:

  1. Re-implement the slim subprocess protocol in user code, in two places (agent + tool).
  2. Re-implement ToolNodeRuntimeProtocol from scratch.
  3. Build their own AgentNode against private contracts.

This duplicates work, drifts from upstream, and forces every integrator to learn slim's stdin/stdout JSON format. Bringing these two plugin classes into SlimRuntime keeps the existing "model_runtime is the unified plugin invocation layer" invariant, makes Graphon a complete runtime for Dify-exported DSLs, and unlocks AgentNode support.

The Slim binary itself already exposes the routes — dify-plugin-daemon's pkg/slim/remote.go registers invoke_agent_strategy → /agent_strategy/invoke and invoke_tool → /tool/invoke, and cmd/slim/main.go accepts these as -action values. The capability gap is purely on the Graphon side.


Current state (verified)

| Surface | Status | Reference |
| --- | --- | --- |
| SlimRuntime.invoke_llm | ✅ implemented | src/graphon/model_runtime/slim/runtime.py:318 |
| SlimRuntime.invoke_text_embedding / _rerank / _tts / _speech_to_text / _moderation | ✅ implemented | same module |
| SlimRuntime.invoke_agent_strategy | ❌ missing | n/a |
| SlimRuntime.invoke_tool | ❌ missing | n/a |
| SlimRuntime.validate_tool_provider_credentials / get_tool_runtime_parameters | ❌ missing | n/a |
| model_runtime/protocols/agent_strategy_runtime.py | ❌ missing | directory contains only model-class protocols |
| model_runtime/protocols/tool_runtime.py | ❌ missing | same |
| ToolNodeRuntimeProtocol | ✅ defined, no in-tree implementation | src/graphon/nodes/runtime.py:21-57 (TODO comment at lines 68-70 explicitly anticipates a default adapter) |
| BuiltinNodeTypes.AGENT = "agent" | ✅ enum + BUILT_IN_NODE_TYPES | src/graphon/enums.py:47, 75 |
| nodes/agent/ directory + AgentNode class | ❌ missing | n/a |
| node_events/agent.py (AgentLogEvent) and graph_events/agent.py (NodeRunAgentLogEvent) | ✅ already wired into base node event dispatch | src/graphon/nodes/base/node.py:845 |

The event side is already prepared for an AgentNode to emit AgentLogEvents — only the node class and its runtime adapter are missing.


Proposed design

A. Extend SlimRuntime with four new actions

Introduce four new action strings (two streaming, two unary), mirroring the established pattern of _invoke_unary_action() / _invoke_streaming_action():

| Method | Slim action | Streaming? | Mirrors |
| --- | --- | --- | --- |
| invoke_agent_strategy(...) | "invoke_agent_strategy" | yes (NDJSON) | invoke_llm (streaming) |
| invoke_tool(...) | "invoke_tool" | yes (NDJSON) | invoke_llm (streaming) |
| validate_tool_provider_credentials(...) | "validate_tool_provider_credentials" | no | validate_provider_credentials |
| get_tool_runtime_parameters(...) | "get_tool_runtime_parameters" | no | get_model_schema |

Both streaming endpoints yield typed message objects:

  • AgentInvokeMessage (text / log / tool_call / tool_call_error / final). Aligns with dify_plugin.entities.agent.AgentInvokeMessage on the plugin side.
  • ToolInvokeMessage (text / link / image / file / json / variable / blob_chunk). Aligns with dify_plugin.entities.tool.ToolInvokeMessage.
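
To make the streaming contract concrete, here is a minimal sketch of turning NDJSON output (one JSON object per line) into typed messages. The `type` / `data` field names and the dataclass shape are assumptions made for illustration; they are not slim's confirmed wire format, and the real entity would mirror dify_plugin.entities.agent.AgentInvokeMessage.

```python
import json
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Any

# Message kinds listed in this RFC for agent strategy streams.
KNOWN_TYPES = {"text", "log", "tool_call", "tool_call_error", "final"}


@dataclass(frozen=True)
class AgentInvokeMessage:
    """Hypothetical stand-in for the entity PR-1 would introduce."""

    type: str
    data: dict[str, Any]


def parse_ndjson(lines: Iterable[str]) -> Iterator[AgentInvokeMessage]:
    """Yield one typed message per non-empty NDJSON line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        payload = json.loads(line)
        msg_type = payload.get("type")
        if msg_type not in KNOWN_TYPES:
            raise ValueError(f"unknown message type: {msg_type!r}")
        yield AgentInvokeMessage(type=msg_type, data=payload.get("data", {}))


# Example input: two records separated by a blank line.
stream = [
    '{"type": "text", "data": {"text": "hel"}}',
    "",
    '{"type": "final", "data": {"text": "hello"}}',
]
messages = list(parse_ndjson(stream))
```

Because the parser is a generator, a caller that wants the whole result at once can wrap it in `list(...)`, as the last line does.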

B. New model_runtime/protocols/ files

model_runtime/protocols/
├── agent_strategy_runtime.py   # AgentStrategyRuntime(ModelProviderRuntime, Protocol)
└── tool_runtime.py             # ToolRuntime(ModelProviderRuntime, Protocol)

Each protocol is decorated with @runtime_checkable, exposes the corresponding methods from section A, and follows the same conventions as LLMModelRuntime (keyword-only arguments; provider / credentials / structured params). The two protocols are independent capabilities: implementations may opt into either or both.

ModelRuntime (the aggregate Protocol in model_runtime/protocols/runtime.py) remains backwards-compatible: the new protocols are not mixed into the aggregate to preserve the "split by capability" invariant established in #57. Consumers depend on the narrow protocol they actually need.
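
The "depend on the narrow protocol" idea might look roughly like the sketch below. The `invoke_tool` signature is hypothetical, and the sketch omits the ModelProviderRuntime base that the proposed protocol would extend, so that it stays self-contained.

```python
from collections.abc import Iterator, Mapping
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ToolRuntime(Protocol):
    """Narrow capability protocol (hypothetical keyword-only signature)."""

    def invoke_tool(
        self,
        *,
        provider: str,
        tool_name: str,
        credentials: Mapping[str, Any],
        tool_parameters: Mapping[str, Any],
    ) -> Iterator[Any]: ...


class ModelOnlyRuntime:
    """A runtime that only knows how to invoke models."""

    def invoke_llm(self, *, provider: str, model: str) -> str:
        return "ok"


class ToolCapableRuntime(ModelOnlyRuntime):
    """A runtime that additionally opts into the tool capability."""

    def invoke_tool(self, *, provider, tool_name, credentials, tool_parameters):
        yield {"type": "text", "text": f"{provider}/{tool_name}"}


def require_tool_runtime(runtime: object) -> ToolRuntime:
    """Fail early if the configured runtime lacks the tool capability."""
    if not isinstance(runtime, ToolRuntime):
        raise TypeError("runtime does not support tool invocation")
    return runtime


tool_runtime = require_tool_runtime(ToolCapableRuntime())
```

Note that `isinstance` against a `@runtime_checkable` protocol only checks that the method exists, not that its signature matches, so static type checking remains the stronger guarantee.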

C. New AgentNode and AgentNodeRuntimeProtocol

Add a node directory mirroring nodes/tool/:

nodes/agent/
├── __init__.py
├── agent_node.py     # class AgentNode(Node[AgentNodeData])
├── entities.py       # AgentNodeData, AgentParameter, etc.
└── exc.py

AgentNodeData reflects the DSL shape produced by Dify Studio v1.7+:

class AgentNodeData(BaseNodeData):
    type: NodeType = BuiltinNodeTypes.AGENT
    agent_strategy_provider_name: str   # e.g. "langgenius/agent/agent"
    agent_strategy_name: str             # e.g. "function_calling"
    agent_parameters: Mapping[str, AgentParameterValue]
    plugin_unique_identifier: str
    output_schema: Mapping[str, Any] = Field(default_factory=dict)
    tool_node_version: str = "2"

AgentParameterValue is the {type: constant|variable|...; value: ...} typed-input wrapper Dify Studio emits.
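
As an illustration of how such a wrapper could be resolved into plain values, the sketch below handles only the `constant` and `variable` kinds. It assumes a `variable` value is a selector list such as ["start", "query"] and uses a plain dict as a stand-in for the variable pool; both are assumptions, not the confirmed DSL or Graphon API.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AgentParameterValue:
    """Hypothetical typed-input wrapper: a kind tag plus its raw value."""

    type: str  # "constant" or "variable" in this sketch
    value: Any


def resolve_parameters(
    params: Mapping[str, AgentParameterValue],
    variables: Mapping[tuple[str, ...], Any],
) -> dict[str, Any]:
    """Turn typed wrappers into plain values, looking variables up by selector."""
    resolved: dict[str, Any] = {}
    for name, param in params.items():
        if param.type == "constant":
            resolved[name] = param.value
        elif param.type == "variable":
            # Assumed selector shape: a list of path segments.
            resolved[name] = variables[tuple(param.value)]
        else:
            raise ValueError(f"unsupported parameter type {param.type!r} for {name!r}")
    return resolved


params = {
    "query": AgentParameterValue("variable", ["start", "query"]),
    "max_iterations": AgentParameterValue("constant", 5),
}
variables = {("start", "query"): "What is Graphon?"}
resolved = resolve_parameters(params, variables)
```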

Add AgentNodeRuntimeProtocol next to ToolNodeRuntimeProtocol in nodes/runtime.py:

class AgentNodeRuntimeProtocol(Protocol):
    def get_strategy_handle(...) -> AgentStrategyRuntimeHandle: ...
    def invoke(...) -> Generator[AgentInvokeMessage, None, None]: ...
    def get_usage(...) -> LLMUsage: ...

AgentNode._run() dispatches AgentInvokeMessage to graphon's existing AgentLogEvent and StreamChunkEvent channels (the dispatch helper at nodes/base/node.py:845 is already wired).
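
A rough sketch of that dispatch step follows. The event classes are simplified stand-ins that only borrow the names of Graphon's AgentLogEvent and StreamChunkEvent, and the routing rule (text and final go to the stream channel, everything else becomes a log event) is an assumption for illustration.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class AgentInvokeMessage:  # stand-in for the PR-1 entity
    type: str
    data: dict[str, Any]


@dataclass(frozen=True)
class AgentLogEvent:  # simplified stand-in, not Graphon's real class
    label: str
    data: dict[str, Any]


@dataclass(frozen=True)
class StreamChunkEvent:  # simplified stand-in, not Graphon's real class
    chunk: str
    is_final: bool = False


Event = Union[AgentLogEvent, StreamChunkEvent]


def dispatch(messages: Iterable[AgentInvokeMessage]) -> Iterator[Event]:
    """Route each plugin message to the matching event channel."""
    for message in messages:
        if message.type == "text":
            yield StreamChunkEvent(chunk=message.data["text"])
        elif message.type == "final":
            yield StreamChunkEvent(chunk="", is_final=True)
        else:
            # log, tool_call and tool_call_error are surfaced as log events here.
            yield AgentLogEvent(label=message.type, data=message.data)


events = list(
    dispatch(
        [
            AgentInvokeMessage("log", {"step": "plan"}),
            AgentInvokeMessage("text", {"text": "hi"}),
            AgentInvokeMessage("final", {}),
        ]
    )
)
```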

D. Default ToolNodeRuntimeProtocol implementation backed by SlimRuntime

Provide a concrete SlimToolNodeRuntime (and SlimAgentNodeRuntime) under, e.g., model_runtime/slim/node_adapters.py:

class SlimToolNodeRuntime(ToolNodeRuntimeProtocol):
    def __init__(self, runtime: SlimRuntime, *, file_factory: ...): ...
    def invoke(self, ...) -> Generator[ToolRuntimeMessage, None, None]: ...
    # ... maps SlimRuntime.invoke_tool() messages → ToolRuntimeMessage

This makes ToolNode "just work" out of the box once a SlimRuntime is configured, and supplies the default adapter anticipated by the "# TODO: Make runtime optional once Graphon provides a default tool runtime adapter" comment at nodes/runtime.py:68-70.
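
The adapter's core job, mapping one message type onto another while preserving streaming, can be sketched with a fake runtime in place of the slim binary. Every class and field name below is a hypothetical stand-in; the real ToolInvokeMessage and ToolRuntimeMessage shapes are not reproduced here.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolInvokeMessage:  # stand-in for the PR-1 entity
    type: str
    payload: Any


@dataclass(frozen=True)
class ToolRuntimeMessage:  # stand-in for what ToolNode consumes
    kind: str
    content: Any


class FakeSlimRuntime:
    """Plays the role of SlimRuntime without spawning a subprocess."""

    def invoke_tool(
        self, *, provider: str, tool_name: str, tool_parameters: dict[str, Any]
    ) -> Iterator[ToolInvokeMessage]:
        yield ToolInvokeMessage("text", f"{tool_name} called")
        yield ToolInvokeMessage("json", dict(tool_parameters))


class SlimToolNodeRuntimeSketch:
    """Adapter: forwards the call and re-wraps each streamed message."""

    def __init__(self, runtime: FakeSlimRuntime) -> None:
        self._runtime = runtime

    def invoke(
        self, *, provider: str, tool_name: str, tool_parameters: dict[str, Any]
    ) -> Iterator[ToolRuntimeMessage]:
        for message in self._runtime.invoke_tool(
            provider=provider, tool_name=tool_name, tool_parameters=tool_parameters
        ):
            yield ToolRuntimeMessage(kind=message.type, content=message.payload)


adapter = SlimToolNodeRuntimeSketch(FakeSlimRuntime())
out = list(adapter.invoke(provider="p", tool_name="search", tool_parameters={"q": "x"}))
```

The same fake-runtime pattern is what the integration-style tests in section E could build on, since it exercises the mapping without a real binary.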

E. Test surface

  • Unit tests for each new SlimRuntime method using the existing slim test harness pattern (tests/model_runtime/slim/test_runtime.py).
  • Integration-style tests for AgentNode and SlimToolNodeRuntime using fake-slim fixtures (no real binary needed).
  • One end-to-end smoke test exercising AgentNode → SlimRuntime.invoke_agent_strategy → fake slim to validate the full event dispatch path.

Implementation plan (3 PRs)

PR-1 — feat(slim): support agent_strategy and tool plugin invocation

Scope:

  • New action strings + _invoke_unary_action / _invoke_streaming_action plumbing.
  • New methods on SlimRuntime: invoke_agent_strategy, invoke_tool, validate_tool_provider_credentials, get_tool_runtime_parameters.
  • New entities: AgentInvokeMessage, ToolInvokeMessage, request/response DTOs (mirror dify-plugin-daemon's pkg/entities/requests/agent.go and tool.go).
  • New protocols: model_runtime/protocols/agent_strategy_runtime.py, model_runtime/protocols/tool_runtime.py.
  • Unit tests (slim subprocess fakes).

Non-goals: AgentNode, ToolNode adapter — both deferred to PR-2/3.

PR-2 — feat(nodes): default ToolNodeRuntimeProtocol implementation backed by SlimRuntime

Scope:

  • SlimToolNodeRuntime adapter (depends on PR-1).
  • Update ToolNode constructor TODO comment.
  • Integration tests.
  • Optional: examples/ snippet showing tool-only DSL run.

Non-goals: AgentNode.

PR-3 — feat(nodes): introduce AgentNode and SlimAgentNodeRuntime

Scope:

  • nodes/agent/ directory + AgentNode class + entities.
  • AgentNodeRuntimeProtocol in nodes/runtime.py.
  • SlimAgentNodeRuntime adapter.
  • DSL parsing of typed-input wrappers (agent_parameters).
  • Event dispatch verification (existing AgentLogEvent plumbing should require minimal touch).
  • Integration tests using langgenius/agent plugin fixtures.

Backwards compatibility

  • All additions are net-new — no existing public method signatures change.
  • ModelRuntime aggregate Protocol is unchanged: the two new capability protocols stand alongside it, consistent with the "split by capability" direction of #57 (refactor(runtime)!: split model runtime protocols by capability).
  • ToolNode's constructor still requires runtime: ToolNodeRuntimeProtocol; PR-2 only ships an adapter, it does not change the contract. Whether runtime becomes optional with a Slim-backed default is a separate decision and is out of scope here.
  • No DSL format assumptions that aren't already present in Dify Studio v1.7+ exports.

Open questions

  1. AgentNode data-model fidelity. Dify Studio's agent_parameters uses a typed-input wrapper ({type: constant|variable|selector; value: ...}). Should AgentNodeData keep this shape (clean DSL round-trip) or pre-resolve to plain dicts (cleaner internal API)? PR-3 currently proposes "keep the wrapper, resolve at _run() boundary."

  2. ToolNodeRuntimeProtocol default constructor argument. Once SlimToolNodeRuntime exists, should ToolNode.__init__ accept runtime=None and fall back to a global SlimRuntime set on GraphInitParams.run_context, or stay strictly explicit? Maintainers' call.

  3. Streaming protocol for invoke_tool. Slim's tool action emits ToolInvokeMessage chunks. Should SlimRuntime.invoke_tool() return Generator[ToolInvokeMessage, None, None] directly, or pre-buffer non-streaming tools? Proposal: always stream, and let callers buffer if they need to.

  4. Plugin daemon vs slim local mode. The agent_strategy plugin (langgenius/agent) calls back into the daemon during a single invocation (e.g. to invoke tools and LLMs). In strict local-slim mode without a remote daemon, this nested-call pattern may be unsupported. Should this RFC require remote-daemon mode for AgentNode, or is local-slim sufficient? Needs clarification from maintainers familiar with pkg/slim/local.go vs pkg/slim/remote.go semantics.

  5. Versioning of plugin schemas. Dify DSL emits tool_node_version: '2' on agent nodes. Should AgentNode validate the supported versions explicitly and reject unknown ones, or be permissive?
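
For question 5, the two options could share one helper with a strictness switch, as in the sketch below. The supported-version set, the exception name, and the function name are all hypothetical; only the value "2" comes from the DSL described above.

```python
# Assumption: "2" is the only version this sketch treats as supported.
SUPPORTED_TOOL_NODE_VERSIONS = frozenset({"2"})


class UnsupportedToolNodeVersionError(ValueError):
    """Raised in strict mode when an agent node declares an unknown version."""


def check_tool_node_version(version: str, *, strict: bool = True) -> bool:
    """Return True if supported; raise (strict) or return False (permissive) otherwise."""
    if version in SUPPORTED_TOOL_NODE_VERSIONS:
        return True
    if strict:
        supported = ", ".join(sorted(SUPPORTED_TOOL_NODE_VERSIONS))
        raise UnsupportedToolNodeVersionError(
            f"tool_node_version {version!r} is not supported (supported: {supported})"
        )
    return False


ok = check_tool_node_version("2")
permissive = check_tool_node_version("3", strict=False)
```

In permissive mode the caller decides what to do with a `False` result, for example logging a warning and continuing.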


Acceptance criteria

  • PR-1 merged: SlimRuntime exposes invoke_agent_strategy and invoke_tool with passing tests.
  • PR-2 merged: a Graphon integrator can construct ToolNode using only SlimRuntime + the shipped adapter, no custom protocol code.
  • PR-3 merged: a Dify-exported chatflow YAML containing an agent node executes end-to-end via Graph.init() + GraphEngine.run() without external implementations of agent/tool runtime.
  • No regressions in existing slim tests or model_runtime protocols.
  • make tc and make test pass.

Prior art / references

  • dify-plugin-daemon request schemas: pkg/entities/requests/agent.go, pkg/entities/requests/tool.go.
  • dify-plugin-daemon HTTP route table: internal/server/controllers/definitions/definitions.go.
  • dify-plugin-daemon slim entrypoint: cmd/slim/main.go, pkg/slim/local.go, pkg/slim/remote.go.
  • dify_plugin Python SDK Agent template: cmd/commandline/plugin/templates/python/agent_strategy.py.
  • Existing Graphon pattern to mirror: SlimRuntime.invoke_llm (src/graphon/model_runtime/slim/runtime.py:318+) and LLMModelRuntime (src/graphon/model_runtime/protocols/llm_runtime.py).

Looking for feedback on

  • Whether maintainers consider AgentNode in scope for the core repo or prefer it stays in a downstream package.
  • Reactions to splitting the work as PR-1 → PR-2 → PR-3 vs a single bundled PR.
  • Open questions 1–5 above.
  • Whether anyone is already working on this so I can avoid duplication.

I have a working prototype against my fork that I am preparing to extract into PRs once the design direction is agreed.
