Merged
8 changes: 8 additions & 0 deletions docs/static/deprecated-llama-stack-spec.html
@@ -9024,6 +9024,10 @@
"$ref": "#/components/schemas/OpenAIResponseUsage",
"description": "(Optional) Token usage information for the response"
},
"instructions": {
Contributor

OpenAI's docs say this can be a string or an array.

Contributor Author

Thanks, @ashwinb, for reviewing! I have updated the definition using openai/types/responses as my reference. Please take a look when you get a chance.

Collaborator

Interesting that this shows a string only, but the Python client is definitely the source of truth!

Contributor

FWIW, I tried this with the Python client talking directly to OpenAI (not Llama Stack):

r3 = open_ai_client.responses.create(
    model="gpt-4o",
    input="What is the capital of France?",
    instructions=["Always answer with rhyming poetry.", "Include the word 'cat' in your answer."]
)

What I got was:

BadRequestError: Error code: 400 - {'error': {'message': "Invalid type for 'instructions': expected a string, but got an array instead.", 'type': 'invalid_request_error', 'param': 'instructions', 'code': 'invalid_type'}}

So the OpenAI client does indeed accept the list and sends it off to the server, but the server rejects it because the server API expects a string only. So I guess the API reference that @leseb linked to above is correct.

Contributor Author

@jwm4, that makes sense! The link that @leseb shared brings me to the create endpoint, which accepts a string, but the response object can store instructions as a string or an array.
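
For illustration only, a minimal Pydantic sketch of that distinction; the class and field names are hypothetical stand-ins (not the actual openai/types/responses definitions), and the array case is simplified to a list of strings:

from pydantic import BaseModel

class CreateResponseParams(BaseModel):
    # The create endpoint only accepts a plain string for instructions.
    model: str
    input: str
    instructions: str | None = None

class ResponseObject(BaseModel):
    # The stored response object may carry either the original string or an
    # array of input items (simplified to list[str] in this sketch).
    id: str
    model: str
    instructions: str | list[str] | None = None

params = CreateResponseParams(model="gpt-4o", input="Hi", instructions="Be brief.")
resp = ResponseObject(id="resp_123", model="gpt-4o", instructions=params.instructions)
print(resp.instructions)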

Contributor

So is the conclusion that this PR is fine the way it is? (i.e., that Llama Stack should really just accept a string and not a list for instructions)

Contributor Author

@s-akhtar-baig Oct 15, 2025

With these changes, we will be conforming to the OpenAI spec:

  • create response accepts instructions as a string
  • response object can store either a string or an array
  • instructions from previous response is not carried over

But I am not sure whether that's what we want in Llama Stack or not. If we change create response to accept both a string and an array, then we won't be conforming to the spec, and we don't have use cases (that I can think of) that would let us correctly implement logic for handling the array type.
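
A minimal usage sketch of the behavior in those bullets, assuming a locally running Llama Stack server exposing the OpenAI-compatible Responses API; the base URL, API key, and model id below are placeholders:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

first = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What is the capital of France?",
    instructions="Answer in one short sentence.",  # string only, per the spec
)
# With this change, the instructions used for the request are echoed back on
# the response object itself.
print(first.instructions)

# Instructions are not carried over when chaining via previous_response_id;
# supply them again if they should also apply to the follow-up request.
follow_up = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="And of Germany?",
    previous_response_id=first.id,
)
print(follow_up.instructions)  # None unless set explicitly on this request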

Collaborator

@franciscojavierarceo Oct 15, 2025

In [7]: from openai import OpenAI; client = OpenAI()

In [8]: openai.__version__
Out[8]: '1.107.0'

In [9]: client.responses.create.__annotations__
Out[9]:
{'background': 'Optional[bool] | NotGiven',
 'conversation': 'Optional[response_create_params.Conversation] | NotGiven',
 'include': 'Optional[List[ResponseIncludable]] | NotGiven',
 'input': 'Union[str, ResponseInputParam] | NotGiven',
 'instructions': 'Optional[str] | NotGiven',
 'max_output_tokens': 'Optional[int] | NotGiven',
 'max_tool_calls': 'Optional[int] | NotGiven',
 'metadata': 'Optional[Metadata] | NotGiven',
 'model': 'ResponsesModel | NotGiven',
 'parallel_tool_calls': 'Optional[bool] | NotGiven',
 'previous_response_id': 'Optional[str] | NotGiven',
 'prompt': 'Optional[ResponsePromptParam] | NotGiven',
 'prompt_cache_key': 'str | NotGiven',
 'reasoning': 'Optional[Reasoning] | NotGiven',
 'safety_identifier': 'str | NotGiven',
 'service_tier': "Optional[Literal['auto', 'default', 'flex', 'scale', 'priority']] | NotGiven",
 'store': 'Optional[bool] | NotGiven',
 'stream': 'Optional[Literal[False]] | Literal[True] | NotGiven',
 'stream_options': 'Optional[response_create_params.StreamOptions] | NotGiven',
 'temperature': 'Optional[float] | NotGiven',
 'text': 'ResponseTextConfigParam | NotGiven',
 'tool_choice': 'response_create_params.ToolChoice | NotGiven',
 'tools': 'Iterable[ToolParam] | NotGiven',
 'top_logprobs': 'Optional[int] | NotGiven',
 'top_p': 'Optional[float] | NotGiven',
 'truncation': "Optional[Literal['auto', 'disabled']] | NotGiven",
 'user': 'str | NotGiven',
 'extra_headers': 'Headers | None',
 'extra_query': 'Query | None',
 'extra_body': 'Body | None',
 'timeout': 'float | httpx.Timeout | None | NotGiven',
 'return': 'Response | Stream[ResponseStreamEvent]'}

Collaborator

Yeah, looks like the client (as mentioned by @s-akhtar-baig) only lists instructions as a str.

"type": "string",
"description": "(Optional) System message inserted into the model's context"
},
"input": {
"type": "array",
"items": {
@@ -9901,6 +9905,10 @@
"usage": {
"$ref": "#/components/schemas/OpenAIResponseUsage",
"description": "(Optional) Token usage information for the response"
},
"instructions": {
"type": "string",
"description": "(Optional) System message inserted into the model's context"
}
},
"additionalProperties": false,
8 changes: 8 additions & 0 deletions docs/static/deprecated-llama-stack-spec.yaml
@@ -6734,6 +6734,10 @@ components:
$ref: '#/components/schemas/OpenAIResponseUsage'
description: >-
(Optional) Token usage information for the response
instructions:
type: string
description: >-
(Optional) System message inserted into the model's context
input:
type: array
items:
@@ -7403,6 +7407,10 @@
$ref: '#/components/schemas/OpenAIResponseUsage'
description: >-
(Optional) Token usage information for the response
instructions:
type: string
description: >-
(Optional) System message inserted into the model's context
additionalProperties: false
required:
- created_at
8 changes: 8 additions & 0 deletions docs/static/llama-stack-spec.html
@@ -7600,6 +7600,10 @@
"$ref": "#/components/schemas/OpenAIResponseUsage",
"description": "(Optional) Token usage information for the response"
},
"instructions": {
"type": "string",
"description": "(Optional) System message inserted into the model's context"
},
"input": {
"type": "array",
"items": {
@@ -8148,6 +8152,10 @@
"usage": {
"$ref": "#/components/schemas/OpenAIResponseUsage",
"description": "(Optional) Token usage information for the response"
},
"instructions": {
"type": "string",
"description": "(Optional) System message inserted into the model's context"
}
},
"additionalProperties": false,
8 changes: 8 additions & 0 deletions docs/static/llama-stack-spec.yaml
@@ -5815,6 +5815,10 @@ components:
$ref: '#/components/schemas/OpenAIResponseUsage'
description: >-
(Optional) Token usage information for the response
instructions:
type: string
description: >-
(Optional) System message inserted into the model's context
input:
type: array
items:
@@ -6218,6 +6222,10 @@
$ref: '#/components/schemas/OpenAIResponseUsage'
description: >-
(Optional) Token usage information for the response
instructions:
type: string
description: >-
(Optional) System message inserted into the model's context
additionalProperties: false
required:
- created_at
8 changes: 8 additions & 0 deletions docs/static/stainless-llama-stack-spec.html
@@ -9272,6 +9272,10 @@
"$ref": "#/components/schemas/OpenAIResponseUsage",
"description": "(Optional) Token usage information for the response"
},
"instructions": {
"type": "string",
"description": "(Optional) System message inserted into the model's context"
},
"input": {
"type": "array",
"items": {
@@ -9820,6 +9824,10 @@
"usage": {
"$ref": "#/components/schemas/OpenAIResponseUsage",
"description": "(Optional) Token usage information for the response"
},
"instructions": {
"type": "string",
"description": "(Optional) System message inserted into the model's context"
}
},
"additionalProperties": false,
8 changes: 8 additions & 0 deletions docs/static/stainless-llama-stack-spec.yaml
@@ -7028,6 +7028,10 @@ components:
$ref: '#/components/schemas/OpenAIResponseUsage'
description: >-
(Optional) Token usage information for the response
instructions:
type: string
description: >-
(Optional) System message inserted into the model's context
input:
type: array
items:
@@ -7431,6 +7435,10 @@
$ref: '#/components/schemas/OpenAIResponseUsage'
description: >-
(Optional) Token usage information for the response
instructions:
type: string
description: >-
(Optional) System message inserted into the model's context
additionalProperties: false
required:
- created_at
2 changes: 2 additions & 0 deletions llama_stack/apis/agents/openai_responses.py
@@ -545,6 +545,7 @@ class OpenAIResponseObject(BaseModel):
:param tools: (Optional) An array of tools the model may call while generating a response.
:param truncation: (Optional) Truncation strategy applied to the response
:param usage: (Optional) Token usage information for the response
:param instructions: (Optional) System message inserted into the model's context
"""

created_at: int
@@ -564,6 +565,7 @@ class OpenAIResponseObject(BaseModel):
tools: list[OpenAIResponseTool] | None = None
truncation: str | None = None
usage: OpenAIResponseUsage | None = None
instructions: str | None = None


@json_schema_type
@@ -359,6 +359,7 @@ async def _create_streaming_response(
tool_executor=self.tool_executor,
safety_api=self.safety_api,
guardrail_ids=guardrail_ids,
instructions=instructions,
)

# Stream the response
@@ -110,6 +110,7 @@ def __init__(
text: OpenAIResponseText,
max_infer_iters: int,
tool_executor, # Will be the tool execution logic from the main class
instructions: str,
safety_api,
guardrail_ids: list[str] | None = None,
):
@@ -133,6 +134,8 @@ def __init__(
self.accumulated_usage: OpenAIResponseUsage | None = None
# Track if we've sent a refusal response
self.violation_detected = False
# system message that is inserted into the model's context
self.instructions = instructions

async def _create_refusal_response(self, violation_message: str) -> OpenAIResponseObjectStream:
"""Create a refusal response to replace streaming content."""
@@ -176,6 +179,7 @@ def _snapshot_response(
tools=self.ctx.available_tools(),
error=error,
usage=self.accumulated_usage,
instructions=self.instructions,
)

async def create_response(self) -> AsyncIterator[OpenAIResponseObjectStream]:
