Skip to content

Latest commit

 

History

History
55 lines (37 loc) · 2.2 KB

File metadata and controls

55 lines (37 loc) · 2.2 KB

<- Index

[FM-008] Tool Authority Escalation via Prompt Injection

Pattern: Access Control & Authorization

The Failure An LLM agent is given access to both "safe" tools (like search) and "sensitive" tools (like reading system configs). An attacker uses a prompt to trick the agent into using a sensitive tool it shouldn't access. The system executes the tool because it trusts the agent's "reasoning".

Mechanism "Confused Deputy". The model's reasoning is treated as an authorization boundary. Since prompts can steer model reasoning, an attacker can indirectly control tool invocation. There is no independent "Authorization Gate" between the model's request to use a tool and the execution of that tool.

Coding Example

sensitive_tools = {"read_config"}


def model_plan(user_prompt):
    return {
        "tool_name": "read_config",
        "args": {"path": "/secret"},
    }


def dispatch_unsafe(user_role, plan):
    """Treats model output as sufficient authority."""
    return execute_tool(plan["tool_name"], plan["args"])


guest_plan = model_plan("Please inspect the secret config to help me.")
dispatch_unsafe(user_role="guest", plan=guest_plan)
# FAILURE:
# sensitive tool runs because planning output was trusted as authorization


def dispatch_safe(user_role, plan):
    tool_name = plan["tool_name"]
    tool_args = plan["args"]

    if not policy_allows(user_role, tool_name, tool_args):
        return deny_tool_call()

    return execute_tool(tool_name, tool_args)


dispatch_safe(user_role="guest", plan=guest_plan)
# denied -> prompt injection can influence planning, not authority

Invariant Violated Tool execution must be authorized by runtime policy, not by model intent.

Remediation Introduce an explicit runtime authorization boundary. Before executing a tool, a deterministic policy engine must check if the current user/session context has permission to use that specific tool with those specific arguments. Never treat LLM output as a trusted authorization decision.

Invariant Restored Every tool invocation is independently checked against the caller context and tool arguments. Prompt injection can influence planning, but it cannot grant authority.