Open
Description
Checklist
- I searched related issues but found no solution.
- The bug persists in the latest version.
- Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- Please use English. Otherwise, it will be closed.
Describe the bug
Multiple tool calls with Qwen3-Next-80B-A3B-Thinking do not work as expected when speculative decoding is enabled. With tool_choice="auto", the tool calls are not parsed and are returned as plain text in content. With tool_choice="required", only the first tool call is returned. Single tool calls work correctly in both modes.
Without speculative decoding, multiple tool calls are still problematic: the output is correct only with tool_choice="auto" and stream=False. Every other configuration returns only the first tool call.
This behaviour is observed in 0.5.8. The last working version seems to be 0.5.3rc0, but I didn't check it with speculative decoding enabled.
Reproduction
Running command:
containers:
  command:
    - python3
    - "-m"
    - sglang.launch_server
  args:
    - "--served-model-name"
    - qwen3-next-80b-a3b-thinking-fp8
    - "--file-storage-path"
    - /models
    - "--model-path"
    - >-
      /models/models--Qwen--Qwen3-Next-80B-A3B-Thinking-FP8--1a28d48a94e799860201879be67616b9e21c4bd2/
    - "--host"
    - 0.0.0.0
    - "--port"
    - "8080"
    - "--max-running-requests"
    - "32"
    - "--dp-size"
    - "1"
    - "--tp-size"
    - "1"
    - "--tool-call-parser"
    - qwen25
    - "--reasoning-parser"
    - deepseek-r1
    - "--speculative-algo"
    - EAGLE
    - "--speculative-num-steps"
    - "3"
    - "--speculative-eagle-topk"
    - "1"
    - "--speculative-num-draft-tokens"
    - "4"
    - "--enable-metrics"
    - "--log-level"
    - debug
Request template:
curl --request POST \
--url http://localhost:30000/v1/chat/completions \
--header 'authorization: Bearer {{api_key}}' \
--header 'content-type: application/json' \
--data '{
"model": "qwen3-next-80b-a3b-thinking",
"messages": [
{
"role": "user",
"content": "What is the weather today in New York City, Baltimore and Minneapolis? What about Los Angeles?"
}
],
"temperature": 0,
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city to find the weather for, e.g. '\''San Francisco'\''"
},
"state": {
"type": "string",
"description": "the two-letter abbreviation for the state that the city is in, e.g. '\''CA'\'' which would mean '\''California'\''"
},
"unit": {
"type": "string",
"description": "The unit to fetch the temperature in",
"enum": [
"celsius",
"fahrenheit"
]
}
},
"required": [
"city",
"state",
"unit"
]
}
}
}
]
}'
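The same request can be scripted so that the tool_choice and stream values discussed in this report are easy to switch. The following is a minimal sketch using only the Python standard library. The helper names build_payload, count_tool_calls and send are illustrative, the URL and model name are copied from the curl template above, the parameter descriptions are shortened, and the authorization header is left out (add it if your deployment needs one).

```python
import json
import urllib.request

URL = "http://localhost:30000/v1/chat/completions"  # copied from the curl template
MODEL = "qwen3-next-80b-a3b-thinking"  # copied from the curl template

# Same tool as in the curl template, with the parameter descriptions omitted.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "state": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "state", "unit"],
        },
    },
}

PROMPT = (
    "What is the weather today in New York City, Baltimore and "
    "Minneapolis? What about Los Angeles?"
)


def build_payload(tool_choice, stream=False):
    """Build the request body for one tool_choice/stream combination."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": 0,
        "tools": [WEATHER_TOOL],
        "tool_choice": tool_choice,
        "stream": stream,
    }


def count_tool_calls(response):
    """Return how many parsed tool calls the first choice carries."""
    message = response["choices"][0]["message"]
    return len(message.get("tool_calls") or [])


def send(payload):
    """POST a non-streaming payload and decode the JSON response."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

For example, count_tool_calls(send(build_payload("required"))) gives the number of tool calls returned for tool_choice="required"; four are expected for this prompt.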
- Response with speculative decoding enabled and tool_choice="auto":
{
"id": "f39dc9af9dd5425a85f92a0d5d66b3f7",
"object": "chat.completion",
"created": 1770018233,
"model": "qwen3-next-80b-a3b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n</tools>\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}}\n</tools>\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}}\n</tools>\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}}\n</tools>",
"reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to get the weather for one location at a time. Wait, the function parameters require city, state, and unit. So each city needs to be specified with its state abbreviation.\n\nFirst, I need to figure out the state for each city. Let's check:\n\n- New York City: The state is NY (New York). But wait, sometimes people might confuse it with New York State, but NYC is in NY.\n- Baltimore: That's in Maryland, so MD.\n- Minneapolis: Minnesota, so MN.\n- Los Angeles: California, CA.\n\nBut the user didn't specify the states, so I need to assume the correct state abbreviations for each city. The tool requires the state as a two-letter abbreviation, so I have to make sure I get those right.\n\nNow, the function call for each city. The user is asking for all four cities, but the tool can only handle one at a time. So I need to make four separate tool calls. But the problem is, the assistant can only call one function at a time, right? Wait, the instructions say \"for each function call, return a json object...\" So maybe I need to make multiple tool calls, each for a different city.\n\nBut how does the system handle multiple tool calls? The user's question has four cities, so I need to generate four separate tool calls. Let me check the tool definition again. The function get_current_weather requires city, state, and unit. The unit is required, but the user didn't specify the unit. So maybe I should default to one, like celsius or fahrenheit. But the problem is the user didn't specify, so perhaps the assistant should choose a default. However, the tool's parameters have an enum for unit: celsius or fahrenheit. So maybe the assistant needs to pick one. But the user didn't specify, so perhaps the assistant should ask for clarification. 
Wait, but the problem says \"you may call one or more functions to assist with the user query\". So maybe the assistant should proceed with a default unit, like celsius, or maybe the user expects fahrenheit since it's common in the US. But the problem is, the user didn't specify, so perhaps the assistant should choose a default. However, the tool requires the unit, so the assistant must provide it. So for each city, I need to specify the unit. Let's assume the user wants Fahrenheit, as it's commonly used in the US for weather. So unit: \"fahrenheit\".\n\nSo for each city:\n\n1. New York City: city \"New York City\", state \"NY\", unit \"fahrenheit\"\n2. Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n3. Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n4. Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nBut wait, the city parameter is a string. For New York City, the city name is \"New York City\", but sometimes the tool might expect just \"New York\" but the user said \"New York City\", so probably \"New York City\" is correct. However, some weather APIs might require just \"New York\" but the tool's description says \"e.g. 'San Francisco'\", so the city name as given. So \"New York City\" is okay.\n\nBut let's check: for Baltimore, the state is MD. Minneapolis is MN. Los Angeles is CA. Correct.\n\nSo the assistant needs to make four separate tool calls. 
Each with the respective city, state, and unit.\n\nSo the first tool call for New York City:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"New York City\",\n \"state\": \"NY\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nThen Baltimore:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"Baltimore\",\n \"state\": \"MD\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nThen Minneapolis:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"Minneapolis\",\n \"state\": \"MN\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nThen Los Angeles:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"Los Angeles\",\n \"state\": \"CA\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nBut the problem is, the user's question is asking for all four cities, so the assistant should call each one. However, the system might expect multiple tool calls in the response. But how does the assistant structure the response? The instructions say \"for each function call, return a json object within <tools> tags\". So for each call, a separate <tools> block. But in the example, the user might have multiple tool calls. So the assistant should output each tool call in sequence.\n\nBut the user's question is a single query, so the assistant needs to make four separate function calls. So the response should have four tool_call blocks.\n\nWait, but the problem says \"You may call one or more functions to assist with the user query.\" So yes, multiple calls are allowed.\n\nSo the correct approach is to generate four separate tool calls, each for a different city.\n\nBut let me check if the city names are correctly formatted. For example, \"New York City\" as the city parameter. Some APIs might require \"New York\" but the tool's example uses \"San Francisco\", so probably \"New York City\" is acceptable. 
Similarly, \"Los Angeles\" is correct.\n\nYes, so each city's state is correctly identified.\n\nSo the assistant should output four tool calls, each with the respective city, state, and unit (assuming fahrenheit as default since the user didn't specify, and it's common in the US).\n\nAlternatively, maybe the unit should be specified as part of the parameters. Since the user didn't specify, but the tool requires it, the assistant has to choose. So perhaps the default is fahrenheit. Alternatively, maybe the assistant should ask for clarification, but the problem says \"you may call one or more functions\", so perhaps the assistant should proceed with a default.\n\nYes, so proceed with fahrenheit.\n\nSo the four tool calls as above.\n\nNow, how to format the response. The user's instruction says: for each function call, return a json object within <tools> tags.\n\nSo the response should have four <tools> blocks, each containing the JSON for a function call.\n\nBut in the example given in the problem statement, the tools are wrapped in <tools> and </tools> tags, and inside there's a JSON object. Wait, looking back:\n\n\"For each function call, return a json object with function name and arguments within <tools></tools> XML tags:\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\"\n\nWait, the example shows that each tool call is inside its own <tools> tags. So for each call, a separate <tools> block.\n\nSo the assistant's response should have four separate <tools> blocks, each containing the JSON for a single function call.\n\nSo the correct response would be four instances of:\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n</tools>\n\nand similarly for the others.\n\nYes. So the assistant needs to output four such blocks.\n\nBut let me check the exact syntax. 
The example shows:\n\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\n\nSo each tool call is a separate XML block with the JSON inside.\n\nTherefore, the correct response is four separate <tools> blocks, each with the respective JSON.\n\nSo for New York City, Baltimore, Minneapolis, Los Angeles.\n\nYes.\n\nSo the assistant should generate four tool calls, each for one city, with the correct state and unit.\n\nTherefore, the answer is to make four separate function calls.\n",
"tool_calls": null
},
"logprobs": null,
"finish_reason": "stop",
"matched_stop": 151645
}
],
"usage": {
"prompt_tokens": 263,
"total_tokens": 2118,
"completion_tokens": 1855,
"prompt_tokens_details": null,
"reasoning_tokens": 0
},
"metadata": {
"weight_version": "default"
}
}
- Response with speculative decoding enabled and tool_choice="required":
{
"id": "3ee8d4c6fb4a4efb995a384e878f6a2d",
"object": "chat.completion",
"created": 1770018422,
"model": "qwen3-next-80b-a3b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to get the weather for one location at a time. Wait, the function parameters require city, state, and unit. So each city needs to be specified with its state abbreviation.\n\nFirst, I need to figure out the state for each city. Let's check:\n\n- New York City: The state is NY (New York). But wait, sometimes people might confuse it with New York State, but NYC is in NY.\n- Baltimore: That's in Maryland, so MD.\n- Minneapolis: Minnesota, so MN.\n- Los Angeles: California, CA.\n\nBut the user didn't specify the states, so I need to assume the correct state abbreviations for each city. The tool requires the state as a two-letter abbreviation, so I have to make sure I get those right.\n\nNow, the function call for each city. The user is asking for all four cities, but the tool can only handle one at a time. So I need to make four separate tool calls. But the problem is, the assistant can only call one function at a time, right? Wait, the instructions say \"for each function call, return a json object...\" So maybe I need to make multiple tool calls, each for a different city.\n\nBut how does the system handle multiple tool calls? The user's question has four cities, so I need to generate four separate tool calls. Let me check the tool definition again. The function get_current_weather requires city, state, and unit. The unit is required, but the user didn't specify the unit. So maybe I should default to one, like celsius or fahrenheit. But the problem is the user didn't specify, so perhaps the assistant should choose a default. However, the tool's parameters have an enum for unit: celsius or fahrenheit. So maybe the assistant needs to pick one. But the user didn't specify, so perhaps the assistant should ask for clarification. 
Wait, but the problem says \"you may call one or more functions to assist with the user query\". So maybe the assistant should proceed with a default unit, like celsius, or maybe the user expects fahrenheit since it's common in the US. But the problem is, the user didn't specify, so perhaps the assistant should choose a default. However, the tool requires the unit, so the assistant must provide it. So for each city, I need to specify the unit. Let's assume the user wants Fahrenheit, as it's commonly used in the US for weather. So unit: \"fahrenheit\".\n\nSo for each city:\n\n1. New York City: city \"New York City\", state \"NY\", unit \"fahrenheit\"\n2. Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n3. Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n4. Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nBut wait, the city parameter is a string. For New York City, the city name is \"New York City\", but sometimes the tool might expect just \"New York\" but the user said \"New York City\", so probably \"New York City\" is correct. However, some weather APIs might require just \"New York\" but the tool's description says \"e.g. 'San Francisco'\", so the city name as given. So \"New York City\" is okay.\n\nBut let's check: for Baltimore, the state is MD. Minneapolis is MN. Los Angeles is CA. Correct.\n\nSo the assistant needs to make four separate tool calls. 
Each with the respective city, state, and unit.\n\nSo the first tool call for New York City:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"New York City\",\n \"state\": \"NY\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nThen Baltimore:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"Baltimore\",\n \"state\": \"MD\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nThen Minneapolis:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"Minneapolis\",\n \"state\": \"MN\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nThen Los Angeles:\n\n{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"Los Angeles\",\n \"state\": \"CA\",\n \"unit\": \"fahrenheit\"\n }\n}\n\nBut the problem is, the user's question is asking for all four cities, so the assistant should call each one. However, the system might expect multiple tool calls in the response. But how does the assistant structure the response? The instructions say \"for each function call, return a json object within <tools> tags\". So for each call, a separate <tools> block. But in the example, the user might have multiple tool calls. So the assistant should output each tool call in sequence.\n\nBut the user's question is a single query, so the assistant needs to make four separate function calls. So the response should have four tool_call blocks.\n\nWait, but the problem says \"You may call one or more functions to assist with the user query.\" So yes, multiple calls are allowed.\n\nSo the correct approach is to generate four separate tool calls, each for a different city.\n\nBut let me check if the city names are correctly formatted. For example, \"New York City\" as the city parameter. Some APIs might require \"New York\" but the tool's example uses \"San Francisco\", so probably \"New York City\" is acceptable. 
Similarly, \"Los Angeles\" is correct.\n\nYes, so each city's state is correctly identified.\n\nSo the assistant should output four tool calls, each with the respective city, state, and unit (assuming fahrenheit as default since the user didn't specify, and it's common in the US).\n\nAlternatively, maybe the unit should be specified as part of the parameters. Since the user didn't specify, but the tool requires it, the assistant has to choose. So perhaps the default is fahrenheit. Alternatively, maybe the assistant should ask for clarification, but the problem says \"you may call one or more functions\", so perhaps the assistant should proceed with a default.\n\nYes, so proceed with fahrenheit.\n\nSo the four tool calls as above.\n\nNow, how to format the response. The user's instruction says: for each function call, return a json object within <tools> tags.\n\nSo the response should have four <tools> blocks, each containing the JSON for a function call.\n\nBut in the example given in the problem statement, the tools are wrapped in <tools> and </tools> tags, and inside there's a JSON object. Wait, looking back:\n\n\"For each function call, return a json object with function name and arguments within <tools></tools> XML tags:\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\"\n\nWait, the example shows that each tool call is inside its own <tools> tags. So for each call, a separate <tools> block.\n\nSo the assistant's response should have four separate <tools> blocks, each containing the JSON for a single function call.\n\nSo the correct response would be four instances of:\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n</tools>\n\nand similarly for the others.\n\nYes. So the assistant needs to output four such blocks.\n\nBut let me check the exact syntax. 
The example shows:\n\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\n\nSo each tool call is a separate XML block with the JSON inside.\n\nTherefore, the correct response is four separate <tools> blocks, each with the respective JSON.\n\nSo for New York City, Baltimore, Minneapolis, Los Angeles.\n\nYes.\n\nSo the assistant should generate four tool calls, each for one city, with the correct state and unit.\n\nTherefore, the answer is to make four separate function calls.\n",
"tool_calls": [
{
"id": "call_430ce9387eb54338a820502b",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}"
}
}
]
},
"logprobs": null,
"finish_reason": "tool_calls",
"matched_stop": null
}
],
"usage": {
"prompt_tokens": 263,
"total_tokens": 2004,
"completion_tokens": 1741,
"prompt_tokens_details": null,
"reasoning_tokens": 0
},
"metadata": {
"weight_version": "default"
}
}
- Response with speculative decoding disabled and tool_choice="auto":
{
"id": "dce0a8d770b248228cbb3d571f032267",
"object": "chat.completion",
"created": 1770021123,
"model": "qwen3-next-80b-a3b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to call the get_current_weather function once per request, right? Wait, the problem says I can call one or more functions. So maybe I need to make multiple tool calls for each city.\n\nWait, but the parameters for the function require city, state, and unit. So for each city, I need to know the state abbreviation. Let's check each city.\n\nNew York City is in New York state, so state is NY. Baltimore is in Maryland, so MD. Minneapolis is in Minnesota, MN. Los Angeles is in California, CA. The unit isn't specified, but the user didn't mention it, so maybe default to celsius or fahrenheit? The tool requires unit, so I need to choose one. The problem says the unit parameter has enum celsius or fahrenheit. Since the user didn't specify, maybe I should pick one. But the problem says the required parameters are city, state, unit. So I have to include unit. Maybe the default is celsius? Or maybe the user expects fahrenheit since it's US cities. But the tool requires it, so I need to specify. Let's assume the user wants fahrenheit, as that's common in the US. Or maybe the system defaults to a specific unit. But the problem doesn't say, so perhaps I should just pick one. Let's go with fahrenheit for all.\n\nSo for each city, I need to make a separate tool call. Let's list them:\n\n1. New York City, NY, fahrenheit\n2. Baltimore, MD, fahrenheit\n3. Minneapolis, MN, fahrenheit\n4. Los Angeles, CA, fahrenheit\n\nWait, but the user said \"What is the weather today in New York City, Baltimore and Minneapolis? What about Los Angeles?\" So four cities total. So I need to make four separate function calls.\n\nBut the tool call format requires each to be in its own XML tag. So for each city, generate a tool_call with the respective parameters.\n\nLet me check the parameters again. 
The function requires city, state, unit. So for each city:\n\n- New York City: city is \"New York City\", state \"NY\", unit \"fahrenheit\"\n- Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n- Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n- Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nYes. So each of these needs to be a separate tool call. So the assistant should output four tool_call blocks.\n\nWait, but the user might not know the states, but the problem says the state is required. So the assistant has to know the correct state abbreviations. For example, New York City is in NY, Baltimore in MD, Minneapolis in MN, Los Angeles in CA. That's correct.\n\nSo the correct approach is to make four separate function calls, each with the respective city, state, and unit.\n\nSo the response should have four tool_call XML blocks, each with the parameters for each city.\n\nLet me structure them.\n\nFirst for New York City:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n\nThen Baltimore:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}}\n\nMinneapolis:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}}\n\nLos Angeles:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}}\n\nYes. So each of these should be in their own tool_call tags.\n\nI need to make sure that the city names are correctly spelled. New York City is correct. Baltimore, Minneapolis, Los Angeles all correct. States are NY, MD, MN, CA.\n\nUnit is fahrenheit. Alternatively, maybe the user expects celsius, but since the problem doesn't specify, and the US typically uses fahrenheit, it's safer to use fahrenheit. 
But the problem says the unit is required, so the assistant has to choose one. So fahrenheit is appropriate here.\n\nSo the assistant should output four tool calls.\n",
"tool_calls": [
{
"id": "call_9efe8481e0af40bcba5caf95",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}"
}
},
{
"id": "call_a276a95b3ce440bbb5b3ebca",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}"
}
},
{
"id": "call_03dd5c2a7dab44eb809f4d44",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}"
}
},
{
"id": "call_5e09c970131f4ceab97724d1",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}"
}
}
]
},
"logprobs": null,
"finish_reason": "tool_calls",
"matched_stop": null
}
],
"usage": {
"prompt_tokens": 263,
"total_tokens": 1342,
"completion_tokens": 1079,
"prompt_tokens_details": null,
"reasoning_tokens": 0
},
"metadata": {
"weight_version": "default"
}
}
- Response with speculative decoding disabled and tool_choice="required":
{
"id": "c6c6af595b474582bd7bb01f0c4c53dc",
"object": "chat.completion",
"created": 1770021170,
"model": "qwen3-next-80b-a3b-thinking",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to call the get_current_weather function once per request, right? Wait, the problem says I can call one or more functions. So maybe I need to make multiple tool calls for each city.\n\nWait, but the parameters for the function require city, state, and unit. So for each city, I need to know the state abbreviation. Let's check each city.\n\nNew York City is in New York state, so state is NY. Baltimore is in Maryland, so MD. Minneapolis is in Minnesota, MN. Los Angeles is in California, CA. The unit isn't specified, but the user didn't mention it, so maybe default to celsius or fahrenheit? The tool requires unit, so I need to choose one. The problem says the unit parameter has enum celsius or fahrenheit. Since the user didn't specify, maybe I should pick one. But the problem says the required parameters are city, state, unit. So I have to include unit. Maybe the default is celsius? Or maybe the user expects fahrenheit since it's US cities. But the tool requires it, so I need to specify. Let's assume the user wants fahrenheit, as that's common in the US. Or maybe the system defaults to a specific unit. But the problem doesn't say, so perhaps I should just pick one. Let's go with fahrenheit for all.\n\nSo for each city, I need to make a separate tool call. Let's list them:\n\n1. New York City, NY, fahrenheit\n2. Baltimore, MD, fahrenheit\n3. Minneapolis, MN, fahrenheit\n4. Los Angeles, CA, fahrenheit\n\nWait, but the user said \"What is the weather today in New York City, Baltimore and Minneapolis? What about Los Angeles?\" So four cities total. So I need to make four separate function calls.\n\nBut the tool call format requires each to be in its own XML tag. So for each city, generate a tool_call with the respective parameters.\n\nLet me check the parameters again. 
The function requires city, state, unit. So for each city:\n\n- New York City: city is \"New York City\", state \"NY\", unit \"fahrenheit\"\n- Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n- Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n- Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nYes. So each of these needs to be a separate tool call. So the assistant should output four tool_call blocks.\n\nWait, but the user might not know the states, but the problem says the state is required. So the assistant has to know the correct state abbreviations. For example, New York City is in NY, Baltimore in MD, Minneapolis in MN, Los Angeles in CA. That's correct.\n\nSo the correct approach is to make four separate function calls, each with the respective city, state, and unit.\n\nSo the response should have four tool_call XML blocks, each with the parameters for each city.\n\nLet me structure them.\n\nFirst for New York City:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n\nThen Baltimore:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}}\n\nMinneapolis:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}}\n\nLos Angeles:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}}\n\nYes. So each of these should be in their own tool_call tags.\n\nI need to make sure that the city names are correctly spelled. New York City is correct. Baltimore, Minneapolis, Los Angeles all correct. States are NY, MD, MN, CA.\n\nUnit is fahrenheit. Alternatively, maybe the user expects celsius, but since the problem doesn't specify, and the US typically uses fahrenheit, it's safer to use fahrenheit. 
But the problem says the unit is required, so the assistant has to choose one. So fahrenheit is appropriate here.\n\nSo the assistant should output four tool calls.\n",
"tool_calls": [
{
"id": "call_8b571dc2ba4e41e9b715c5c0",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}"
}
}
]
},
"logprobs": null,
"finish_reason": "tool_calls",
"matched_stop": null
}
],
"usage": {
"prompt_tokens": 263,
"total_tokens": 1237,
"completion_tokens": 974,
"prompt_tokens_details": null,
"reasoning_tokens": 0
},
"metadata": {
"weight_version": "default"
}
}
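To compare the four responses above mechanically, a small checker can count the parsed tool calls and look for tool-call JSON left in content. This is only a sketch and not part of SGLang: unparsed_calls, diagnose and the returned labels are hypothetical, and the `<tools>` tag is taken from the first response as it appears in this report (the raw model output may use a different tag).

```python
import json
import re

# Tag as it appears in the "content" field of the first response in this report.
TOOLS_BLOCK = re.compile(r"<tools>\s*(\{.*?\})\s*</tools>", re.DOTALL)


def unparsed_calls(content):
    """Extract tool-call JSON objects left as plain text in message content."""
    calls = []
    for match in TOOLS_BLOCK.finditer(content or ""):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip blocks that do not hold valid JSON.
            continue
    return calls


def diagnose(response, expected):
    """Label a chat completion by how its tool calls were returned."""
    message = response["choices"][0]["message"]
    parsed = message.get("tool_calls") or []
    leftover = unparsed_calls(message.get("content"))
    if len(parsed) == expected:
        return "ok"
    if leftover:
        return "unparsed: %d call(s) left in content" % len(leftover)
    return "incomplete: %d of %d call(s) returned" % (len(parsed), expected)
```

With expected=4, the responses above should be labelled unparsed, incomplete, ok and incomplete, in that order.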
Environment
Python: 3.12.3 (main, Jan 8 2026, 11:30:50) [GCC 13.3.0]
CUDA available: True
GPU 0: NVIDIA H200
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 575.57.08
PyTorch: 2.9.1+cu129
sglang: 0.5.8
sgl_kernel: 0.3.21
flashinfer_python: 0.6.1
flashinfer_cubin: 0.6.1
flashinfer_jit_cache: 0.6.1+cu129
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.4.1
aiohttp: 3.13.3
fastapi: 0.128.0
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.34.0
orjson: 3.11.5
outlines: 0.1.11
packaging: 26.0
psutil: 7.2.1
pydantic: 2.12.5
python-multipart: 0.0.21
pyzmq: 27.1.0
uvicorn: 0.40.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.76.0
litellm: Module Not Found
decord2: 3.0.0
NVIDIA Topology:
GPU0 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 NIC10 NIC11 NIC12 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NODE PIX NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS 0-23,96-119 0 N/A
NIC0 NODE X NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC1 PIX NODE X NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC2 NODE NODE NODE X SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS
NIC3 SYS SYS SYS SYS X SYS SYS SYS SYS SYS SYS SYS SYS NODE
NIC4 SYS SYS SYS SYS SYS X NODE NODE SYS SYS SYS SYS SYS SYS
NIC5 SYS SYS SYS SYS SYS NODE X NODE SYS SYS SYS SYS SYS SYS
NIC6 SYS SYS SYS SYS SYS NODE NODE X SYS SYS SYS SYS SYS SYS
NIC7 SYS SYS SYS SYS SYS SYS SYS SYS X PIX PXB PXB NODE SYS
NIC8 SYS SYS SYS SYS SYS SYS SYS SYS PIX X PXB PXB NODE SYS
NIC9 SYS SYS SYS SYS SYS SYS SYS SYS PXB PXB X PIX NODE SYS
NIC10 SYS SYS SYS SYS SYS SYS SYS SYS PXB PXB PIX X NODE SYS
NIC11 SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE NODE NODE X SYS
NIC12 SYS SYS SYS SYS NODE SYS SYS SYS SYS SYS SYS SYS SYS X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_5
NIC4: mlx5_6
NIC5: mlx5_7
NIC6: mlx5_8
NIC7: mlx5_9
NIC8: mlx5_10
NIC9: mlx5_11
NIC10: mlx5_12
NIC11: mlx5_13
NIC12: mlx5_bond_0
ulimit soft: 1073741816