[Bug] Broken multiple tool calls in Qwen3-Next-80B-A3B-Thinking with Speculative Decoding

### Checklist

- [x] I searched related issues but found no solution.
- [x] The bug persists in the latest version.
- [x] Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- [x] If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [x] Please use English. Otherwise, it will be closed.

### Describe the bug

Requests that should produce multiple tool calls do not work as expected with Qwen3-Next-80B-A3B-Thinking when speculative decoding is enabled. With `tool_choice="auto"`, the tool calls are not parsed and are returned as raw text in `content` (`tool_calls` is `null`). With `tool_choice="required"`, only the first tool call is returned. Requests that need a single tool call work correctly in both modes.

Without speculative decoding, multiple tool calls are still problematic: the output is correct only with `tool_choice="auto"` and `stream=False`. Every other configuration returns only the first tool call.

This behaviour is observed in 0.5.8. The last working version appears to be 0.5.3rc0, but I did not test that version with speculative decoding enabled.

### Reproduction

Server launch configuration (container spec):
```
containers:
  command:
    - python3
    - "-m"
    - sglang.launch_server
  args:
    - "--served-model-name"
    - qwen3-next-80b-a3b-thinking-fp8
    - "--file-storage-path"
    - /models
    - "--model-path"
    - >-
      /models/models--Qwen--Qwen3-Next-80B-A3B-Thinking-FP8--1a28d48a94e799860201879be67616b9e21c4bd2/
    - "--host"
    - 0.0.0.0
    - "--port"
    - "8080"
    - "--max-running-requests"
    - "32"
    - "--dp-size"
    - "1"
    - "--tp-size"
    - "1"
    - "--tool-call-parser"
    - qwen25
    - "--reasoning-parser"
    - deepseek-r1
    - "--speculative-algo"
    - EAGLE
    - "--speculative-num-steps"
    - "3"
    - "--speculative-eagle-topk"
    - "1"
    - "--speculative-num-draft-tokens"
    - "4"
    - "--enable-metrics"
    - "--log-level"
    - debug
```

Request template (`tool_choice` varied as noted for each response below):
```
curl --request POST \
  --url http://localhost:30000/v1/chat/completions \
  --header 'authorization: Bearer {{api_key}}' \
  --header 'content-type: application/json' \
  --data '{
  "model": "qwen3-next-80b-a3b-thinking",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather today in New York City, Baltimore and Minneapolis? What about Los Angeles?"
    }
  ],
  "temperature": 0,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "The city to find the weather for, e.g. '\''San Francisco'\''"
            },
            "state": {
              "type": "string",
              "description": "the two-letter abbreviation for the state that the city is in, e.g. '\''CA'\'' which would mean '\''California'\''"
            },
            "unit": {
              "type": "string",
              "description": "The unit to fetch the temperature in",
              "enum": [
                "celsius",
                "fahrenheit"
              ]
            }
          },
          "required": [
            "city",
            "state",
            "unit"
          ]
        }
      }
    }
  ]
}'
```
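
The curl template above does not show the `tool_choice` and `stream` fields that the bug description varies. As a rough sketch, the request bodies for all four combinations could be generated like this. The helper names `build_payload` and `all_payloads` are hypothetical and only for illustration; the tool schema, prompt, and model name are copied from the template.

```python
import json
from itertools import product

# Tool schema copied from the curl request template.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for, e.g. 'San Francisco'",
                },
                "state": {
                    "type": "string",
                    "description": "the two-letter abbreviation for the state that the city is in, "
                                   "e.g. 'CA' which would mean 'California'",
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["city", "state", "unit"],
        },
    },
}

PROMPT = ("What is the weather today in New York City, Baltimore and Minneapolis? "
          "What about Los Angeles?")


def build_payload(tool_choice, stream=False, model="qwen3-next-80b-a3b-thinking"):
    """Hypothetical helper: request body for one tool_choice/stream combination."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": 0,
        "tools": [WEATHER_TOOL],
        "tool_choice": tool_choice,
        "stream": stream,
    }


def all_payloads():
    """One payload per (tool_choice, stream) pair mentioned in the bug description."""
    return {
        (choice, stream): build_payload(choice, stream)
        for choice, stream in product(("auto", "required"), (False, True))
    }


if __name__ == "__main__":
    for key, payload in all_payloads().items():
        # Print a short preview; each payload can be sent as the --data body.
        print(key, json.dumps(payload)[:60], "...")
```

Each generated dictionary can be serialized with `json.dumps` and used as the `--data` body of the curl command.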

1) Response with speculative decoding enabled and `tool_choice="auto"`:
```
{
  "id": "f39dc9af9dd5425a85f92a0d5d66b3f7",
  "object": "chat.completion",
  "created": 1770018233,
  "model": "qwen3-next-80b-a3b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n</tools>\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}}\n</tools>\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}}\n</tools>\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}}\n</tools>",
        "reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to get the weather for one location at a time. Wait, the function parameters require city, state, and unit. So each city needs to be specified with its state abbreviation.\n\nFirst, I need to figure out the state for each city. Let's check:\n\n- New York City: The state is NY (New York). But wait, sometimes people might confuse it with New York State, but NYC is in NY.\n- Baltimore: That's in Maryland, so MD.\n- Minneapolis: Minnesota, so MN.\n- Los Angeles: California, CA.\n\nBut the user didn't specify the states, so I need to assume the correct state abbreviations for each city. The tool requires the state as a two-letter abbreviation, so I have to make sure I get those right.\n\nNow, the function call for each city. The user is asking for all four cities, but the tool can only handle one at a time. So I need to make four separate tool calls. But the problem is, the assistant can only call one function at a time, right? Wait, the instructions say \"for each function call, return a json object...\" So maybe I need to make multiple tool calls, each for a different city.\n\nBut how does the system handle multiple tool calls? The user's question has four cities, so I need to generate four separate tool calls. Let me check the tool definition again. The function get_current_weather requires city, state, and unit. The unit is required, but the user didn't specify the unit. So maybe I should default to one, like celsius or fahrenheit. But the problem is the user didn't specify, so perhaps the assistant should choose a default. However, the tool's parameters have an enum for unit: celsius or fahrenheit. So maybe the assistant needs to pick one. But the user didn't specify, so perhaps the assistant should ask for clarification. 
Wait, but the problem says \"you may call one or more functions to assist with the user query\". So maybe the assistant should proceed with a default unit, like celsius, or maybe the user expects fahrenheit since it's common in the US. But the problem is, the user didn't specify, so perhaps the assistant should choose a default. However, the tool requires the unit, so the assistant must provide it. So for each city, I need to specify the unit. Let's assume the user wants Fahrenheit, as it's commonly used in the US for weather. So unit: \"fahrenheit\".\n\nSo for each city:\n\n1. New York City: city \"New York City\", state \"NY\", unit \"fahrenheit\"\n2. Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n3. Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n4. Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nBut wait, the city parameter is a string. For New York City, the city name is \"New York City\", but sometimes the tool might expect just \"New York\" but the user said \"New York City\", so probably \"New York City\" is correct. However, some weather APIs might require just \"New York\" but the tool's description says \"e.g. 'San Francisco'\", so the city name as given. So \"New York City\" is okay.\n\nBut let's check: for Baltimore, the state is MD. Minneapolis is MN. Los Angeles is CA. Correct.\n\nSo the assistant needs to make four separate tool calls. 
Each with the respective city, state, and unit.\n\nSo the first tool call for New York City:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"New York City\",\n    \"state\": \"NY\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nThen Baltimore:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"Baltimore\",\n    \"state\": \"MD\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nThen Minneapolis:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"Minneapolis\",\n    \"state\": \"MN\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nThen Los Angeles:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"Los Angeles\",\n    \"state\": \"CA\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nBut the problem is, the user's question is asking for all four cities, so the assistant should call each one. However, the system might expect multiple tool calls in the response. But how does the assistant structure the response? The instructions say \"for each function call, return a json object within <tools> tags\". So for each call, a separate <tools> block. But in the example, the user might have multiple tool calls. So the assistant should output each tool call in sequence.\n\nBut the user's question is a single query, so the assistant needs to make four separate function calls. So the response should have four tool_call blocks.\n\nWait, but the problem says \"You may call one or more functions to assist with the user query.\" So yes, multiple calls are allowed.\n\nSo the correct approach is to generate four separate tool calls, each for a different city.\n\nBut let me check if the city names are correctly formatted. For example, \"New York City\" as the city parameter. Some APIs might require \"New York\" but the tool's example uses \"San Francisco\", so probably \"New York City\" is acceptable. 
Similarly, \"Los Angeles\" is correct.\n\nYes, so each city's state is correctly identified.\n\nSo the assistant should output four tool calls, each with the respective city, state, and unit (assuming fahrenheit as default since the user didn't specify, and it's common in the US).\n\nAlternatively, maybe the unit should be specified as part of the parameters. Since the user didn't specify, but the tool requires it, the assistant has to choose. So perhaps the default is fahrenheit. Alternatively, maybe the assistant should ask for clarification, but the problem says \"you may call one or more functions\", so perhaps the assistant should proceed with a default.\n\nYes, so proceed with fahrenheit.\n\nSo the four tool calls as above.\n\nNow, how to format the response. The user's instruction says: for each function call, return a json object within <tools> tags.\n\nSo the response should have four <tools> blocks, each containing the JSON for a function call.\n\nBut in the example given in the problem statement, the tools are wrapped in <tools> and </tools> tags, and inside there's a JSON object. Wait, looking back:\n\n\"For each function call, return a json object with function name and arguments within <tools></tools> XML tags:\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\"\n\nWait, the example shows that each tool call is inside its own <tools> tags. So for each call, a separate <tools> block.\n\nSo the assistant's response should have four separate <tools> blocks, each containing the JSON for a single function call.\n\nSo the correct response would be four instances of:\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n</tools>\n\nand similarly for the others.\n\nYes. So the assistant needs to output four such blocks.\n\nBut let me check the exact syntax. 
The example shows:\n\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\n\nSo each tool call is a separate XML block with the JSON inside.\n\nTherefore, the correct response is four separate <tools> blocks, each with the respective JSON.\n\nSo for New York City, Baltimore, Minneapolis, Los Angeles.\n\nYes.\n\nSo the assistant should generate four tool calls, each for one city, with the correct state and unit.\n\nTherefore, the answer is to make four separate function calls.\n",
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "matched_stop": 151645
    }
  ],
  "usage": {
    "prompt_tokens": 263,
    "total_tokens": 2118,
    "completion_tokens": 1855,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
```

2) Response with speculative decoding enabled and `tool_choice="required"`:
```
{
  "id": "3ee8d4c6fb4a4efb995a384e878f6a2d",
  "object": "chat.completion",
  "created": 1770018422,
  "model": "qwen3-next-80b-a3b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to get the weather for one location at a time. Wait, the function parameters require city, state, and unit. So each city needs to be specified with its state abbreviation.\n\nFirst, I need to figure out the state for each city. Let's check:\n\n- New York City: The state is NY (New York). But wait, sometimes people might confuse it with New York State, but NYC is in NY.\n- Baltimore: That's in Maryland, so MD.\n- Minneapolis: Minnesota, so MN.\n- Los Angeles: California, CA.\n\nBut the user didn't specify the states, so I need to assume the correct state abbreviations for each city. The tool requires the state as a two-letter abbreviation, so I have to make sure I get those right.\n\nNow, the function call for each city. The user is asking for all four cities, but the tool can only handle one at a time. So I need to make four separate tool calls. But the problem is, the assistant can only call one function at a time, right? Wait, the instructions say \"for each function call, return a json object...\" So maybe I need to make multiple tool calls, each for a different city.\n\nBut how does the system handle multiple tool calls? The user's question has four cities, so I need to generate four separate tool calls. Let me check the tool definition again. The function get_current_weather requires city, state, and unit. The unit is required, but the user didn't specify the unit. So maybe I should default to one, like celsius or fahrenheit. But the problem is the user didn't specify, so perhaps the assistant should choose a default. However, the tool's parameters have an enum for unit: celsius or fahrenheit. So maybe the assistant needs to pick one. But the user didn't specify, so perhaps the assistant should ask for clarification. 
Wait, but the problem says \"you may call one or more functions to assist with the user query\". So maybe the assistant should proceed with a default unit, like celsius, or maybe the user expects fahrenheit since it's common in the US. But the problem is, the user didn't specify, so perhaps the assistant should choose a default. However, the tool requires the unit, so the assistant must provide it. So for each city, I need to specify the unit. Let's assume the user wants Fahrenheit, as it's commonly used in the US for weather. So unit: \"fahrenheit\".\n\nSo for each city:\n\n1. New York City: city \"New York City\", state \"NY\", unit \"fahrenheit\"\n2. Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n3. Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n4. Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nBut wait, the city parameter is a string. For New York City, the city name is \"New York City\", but sometimes the tool might expect just \"New York\" but the user said \"New York City\", so probably \"New York City\" is correct. However, some weather APIs might require just \"New York\" but the tool's description says \"e.g. 'San Francisco'\", so the city name as given. So \"New York City\" is okay.\n\nBut let's check: for Baltimore, the state is MD. Minneapolis is MN. Los Angeles is CA. Correct.\n\nSo the assistant needs to make four separate tool calls. 
Each with the respective city, state, and unit.\n\nSo the first tool call for New York City:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"New York City\",\n    \"state\": \"NY\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nThen Baltimore:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"Baltimore\",\n    \"state\": \"MD\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nThen Minneapolis:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"Minneapolis\",\n    \"state\": \"MN\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nThen Los Angeles:\n\n{\n  \"name\": \"get_current_weather\",\n  \"arguments\": {\n    \"city\": \"Los Angeles\",\n    \"state\": \"CA\",\n    \"unit\": \"fahrenheit\"\n  }\n}\n\nBut the problem is, the user's question is asking for all four cities, so the assistant should call each one. However, the system might expect multiple tool calls in the response. But how does the assistant structure the response? The instructions say \"for each function call, return a json object within <tools> tags\". So for each call, a separate <tools> block. But in the example, the user might have multiple tool calls. So the assistant should output each tool call in sequence.\n\nBut the user's question is a single query, so the assistant needs to make four separate function calls. So the response should have four tool_call blocks.\n\nWait, but the problem says \"You may call one or more functions to assist with the user query.\" So yes, multiple calls are allowed.\n\nSo the correct approach is to generate four separate tool calls, each for a different city.\n\nBut let me check if the city names are correctly formatted. For example, \"New York City\" as the city parameter. Some APIs might require \"New York\" but the tool's example uses \"San Francisco\", so probably \"New York City\" is acceptable. 
Similarly, \"Los Angeles\" is correct.\n\nYes, so each city's state is correctly identified.\n\nSo the assistant should output four tool calls, each with the respective city, state, and unit (assuming fahrenheit as default since the user didn't specify, and it's common in the US).\n\nAlternatively, maybe the unit should be specified as part of the parameters. Since the user didn't specify, but the tool requires it, the assistant has to choose. So perhaps the default is fahrenheit. Alternatively, maybe the assistant should ask for clarification, but the problem says \"you may call one or more functions\", so perhaps the assistant should proceed with a default.\n\nYes, so proceed with fahrenheit.\n\nSo the four tool calls as above.\n\nNow, how to format the response. The user's instruction says: for each function call, return a json object within <tools> tags.\n\nSo the response should have four <tools> blocks, each containing the JSON for a function call.\n\nBut in the example given in the problem statement, the tools are wrapped in <tools> and </tools> tags, and inside there's a JSON object. Wait, looking back:\n\n\"For each function call, return a json object with function name and arguments within <tools></tools> XML tags:\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\"\n\nWait, the example shows that each tool call is inside its own <tools> tags. So for each call, a separate <tools> block.\n\nSo the assistant's response should have four separate <tools> blocks, each containing the JSON for a single function call.\n\nSo the correct response would be four instances of:\n\n<tools>\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n</tools>\n\nand similarly for the others.\n\nYes. So the assistant needs to output four such blocks.\n\nBut let me check the exact syntax. 
The example shows:\n\n<tools>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tools>\n\nSo each tool call is a separate XML block with the JSON inside.\n\nTherefore, the correct response is four separate <tools> blocks, each with the respective JSON.\n\nSo for New York City, Baltimore, Minneapolis, Los Angeles.\n\nYes.\n\nSo the assistant should generate four tool calls, each for one city, with the correct state and unit.\n\nTherefore, the answer is to make four separate function calls.\n",
        "tool_calls": [
          {
            "id": "call_430ce9387eb54338a820502b",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "matched_stop": null
    }
  ],
  "usage": {
    "prompt_tokens": 263,
    "total_tokens": 2004,
    "completion_tokens": 1741,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
```

3) Response with speculative decoding disabled and `tool_choice="auto"`:
```
{
  "id": "dce0a8d770b248228cbb3d571f032267",
  "object": "chat.completion",
  "created": 1770021123,
  "model": "qwen3-next-80b-a3b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to call the get_current_weather function once per request, right? Wait, the problem says I can call one or more functions. So maybe I need to make multiple tool calls for each city.\n\nWait, but the parameters for the function require city, state, and unit. So for each city, I need to know the state abbreviation. Let's check each city.\n\nNew York City is in New York state, so state is NY. Baltimore is in Maryland, so MD. Minneapolis is in Minnesota, MN. Los Angeles is in California, CA. The unit isn't specified, but the user didn't mention it, so maybe default to celsius or fahrenheit? The tool requires unit, so I need to choose one. The problem says the unit parameter has enum celsius or fahrenheit. Since the user didn't specify, maybe I should pick one. But the problem says the required parameters are city, state, unit. So I have to include unit. Maybe the default is celsius? Or maybe the user expects fahrenheit since it's US cities. But the tool requires it, so I need to specify. Let's assume the user wants fahrenheit, as that's common in the US. Or maybe the system defaults to a specific unit. But the problem doesn't say, so perhaps I should just pick one. Let's go with fahrenheit for all.\n\nSo for each city, I need to make a separate tool call. Let's list them:\n\n1. New York City, NY, fahrenheit\n2. Baltimore, MD, fahrenheit\n3. Minneapolis, MN, fahrenheit\n4. Los Angeles, CA, fahrenheit\n\nWait, but the user said \"What is the weather today in New York City, Baltimore and Minneapolis? What about Los Angeles?\" So four cities total. So I need to make four separate function calls.\n\nBut the tool call format requires each to be in its own XML tag. So for each city, generate a tool_call with the respective parameters.\n\nLet me check the parameters again. 
The function requires city, state, unit. So for each city:\n\n- New York City: city is \"New York City\", state \"NY\", unit \"fahrenheit\"\n- Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n- Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n- Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nYes. So each of these needs to be a separate tool call. So the assistant should output four tool_call blocks.\n\nWait, but the user might not know the states, but the problem says the state is required. So the assistant has to know the correct state abbreviations. For example, New York City is in NY, Baltimore in MD, Minneapolis in MN, Los Angeles in CA. That's correct.\n\nSo the correct approach is to make four separate function calls, each with the respective city, state, and unit.\n\nSo the response should have four tool_call XML blocks, each with the parameters for each city.\n\nLet me structure them.\n\nFirst for New York City:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n\nThen Baltimore:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}}\n\nMinneapolis:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}}\n\nLos Angeles:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}}\n\nYes. So each of these should be in their own tool_call tags.\n\nI need to make sure that the city names are correctly spelled. New York City is correct. Baltimore, Minneapolis, Los Angeles all correct. States are NY, MD, MN, CA.\n\nUnit is fahrenheit. Alternatively, maybe the user expects celsius, but since the problem doesn't specify, and the US typically uses fahrenheit, it's safer to use fahrenheit. 
But the problem says the unit is required, so the assistant has to choose one. So fahrenheit is appropriate here.\n\nSo the assistant should output four tool calls.\n",
        "tool_calls": [
          {
            "id": "call_9efe8481e0af40bcba5caf95",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}"
            }
          },
          {
            "id": "call_a276a95b3ce440bbb5b3ebca",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}"
            }
          },
          {
            "id": "call_03dd5c2a7dab44eb809f4d44",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}"
            }
          },
          {
            "id": "call_5e09c970131f4ceab97724d1",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "matched_stop": null
    }
  ],
  "usage": {
    "prompt_tokens": 263,
    "total_tokens": 1342,
    "completion_tokens": 1079,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
```

4) Response with speculative decoding disabled and `tool_choice="required"`:
```
{
  "id": "c6c6af595b474582bd7bb01f0c4c53dc",
  "object": "chat.completion",
  "created": 1770021170,
  "model": "qwen3-next-80b-a3b-thinking",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "reasoning_content": "Okay, let's see. The user is asking for the weather in multiple cities: New York City, Baltimore, Minneapolis, and Los Angeles. Hmm, but the tool provided only allows me to call the get_current_weather function once per request, right? Wait, the problem says I can call one or more functions. So maybe I need to make multiple tool calls for each city.\n\nWait, but the parameters for the function require city, state, and unit. So for each city, I need to know the state abbreviation. Let's check each city.\n\nNew York City is in New York state, so state is NY. Baltimore is in Maryland, so MD. Minneapolis is in Minnesota, MN. Los Angeles is in California, CA. The unit isn't specified, but the user didn't mention it, so maybe default to celsius or fahrenheit? The tool requires unit, so I need to choose one. The problem says the unit parameter has enum celsius or fahrenheit. Since the user didn't specify, maybe I should pick one. But the problem says the required parameters are city, state, unit. So I have to include unit. Maybe the default is celsius? Or maybe the user expects fahrenheit since it's US cities. But the tool requires it, so I need to specify. Let's assume the user wants fahrenheit, as that's common in the US. Or maybe the system defaults to a specific unit. But the problem doesn't say, so perhaps I should just pick one. Let's go with fahrenheit for all.\n\nSo for each city, I need to make a separate tool call. Let's list them:\n\n1. New York City, NY, fahrenheit\n2. Baltimore, MD, fahrenheit\n3. Minneapolis, MN, fahrenheit\n4. Los Angeles, CA, fahrenheit\n\nWait, but the user said \"What is the weather today in New York City, Baltimore and Minneapolis? What about Los Angeles?\" So four cities total. So I need to make four separate function calls.\n\nBut the tool call format requires each to be in its own XML tag. So for each city, generate a tool_call with the respective parameters.\n\nLet me check the parameters again. 
The function requires city, state, unit. So for each city:\n\n- New York City: city is \"New York City\", state \"NY\", unit \"fahrenheit\"\n- Baltimore: city \"Baltimore\", state \"MD\", unit \"fahrenheit\"\n- Minneapolis: city \"Minneapolis\", state \"MN\", unit \"fahrenheit\"\n- Los Angeles: city \"Los Angeles\", state \"CA\", unit \"fahrenheit\"\n\nYes. So each of these needs to be a separate tool call. So the assistant should output four tool_call blocks.\n\nWait, but the user might not know the states, but the problem says the state is required. So the assistant has to know the correct state abbreviations. For example, New York City is in NY, Baltimore in MD, Minneapolis in MN, Los Angeles in CA. That's correct.\n\nSo the correct approach is to make four separate function calls, each with the respective city, state, and unit.\n\nSo the response should have four tool_call XML blocks, each with the parameters for each city.\n\nLet me structure them.\n\nFirst for New York City:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}}\n\nThen Baltimore:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Baltimore\", \"state\": \"MD\", \"unit\": \"fahrenheit\"}}\n\nMinneapolis:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Minneapolis\", \"state\": \"MN\", \"unit\": \"fahrenheit\"}}\n\nLos Angeles:\n\n{\"name\": \"get_current_weather\", \"arguments\": {\"city\": \"Los Angeles\", \"state\": \"CA\", \"unit\": \"fahrenheit\"}}\n\nYes. So each of these should be in their own tool_call tags.\n\nI need to make sure that the city names are correctly spelled. New York City is correct. Baltimore, Minneapolis, Los Angeles all correct. States are NY, MD, MN, CA.\n\nUnit is fahrenheit. Alternatively, maybe the user expects celsius, but since the problem doesn't specify, and the US typically uses fahrenheit, it's safer to use fahrenheit. 
But the problem says the unit is required, so the assistant has to choose one. So fahrenheit is appropriate here.\n\nSo the assistant should output four tool calls.\n",
        "tool_calls": [
          {
            "id": "call_8b571dc2ba4e41e9b715c5c0",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"city\": \"New York City\", \"state\": \"NY\", \"unit\": \"fahrenheit\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "matched_stop": null
    }
  ],
  "usage": {
    "prompt_tokens": 263,
    "total_tokens": 1237,
    "completion_tokens": 974,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
```
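
The mismatch in the response above (the reasoning plans four calls, `tool_calls` contains one) can be checked programmatically. This is a minimal sketch, assuming the OpenAI-compatible response shape shown above. The helper `summarize_tool_calls` and the `EXPECTED_CITIES` list are hypothetical names introduced for illustration and are not part of sglang. The `response` dict is a trimmed copy of the output above.

```python
import json


def summarize_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (function name, parsed arguments) for each tool call in the first choice."""
    message = response["choices"][0]["message"]
    # "tool_calls" may be absent or null when nothing was parsed
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


# Trimmed copy of the response above (reasoning content and usage omitted)
response = {
    "choices": [
        {
            "message": {
                "tool_calls": [
                    {
                        "id": "call_8b571dc2ba4e41e9b715c5c0",
                        "index": 0,
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            "arguments": '{"city": "New York City", "state": "NY", "unit": "fahrenheit"}',
                        },
                    }
                ]
            },
            "finish_reason": "tool_calls",
        }
    ]
}

# Cities the prompt asked about (assumption: taken from the reasoning text)
EXPECTED_CITIES = ["New York City", "Baltimore", "Minneapolis", "Los Angeles"]

calls = summarize_tool_calls(response)
returned = [args["city"] for _, args in calls]
missing = [city for city in EXPECTED_CITIES if city not in returned]
print(f"{len(calls)} of {len(EXPECTED_CITIES)} expected tool calls returned; missing: {missing}")
```

Running the same check against each combination of `tool_choice`, `stream`, and speculative decoding should make it easier to see which configurations drop calls.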

### Environment

```
Python: 3.12.3 (main, Jan  8 2026, 11:30:50) [GCC 13.3.0]
CUDA available: True
GPU 0: NVIDIA H200
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 575.57.08
PyTorch: 2.9.1+cu129
sglang: 0.5.8
sgl_kernel: 0.3.21
flashinfer_python: 0.6.1
flashinfer_cubin: 0.6.1
flashinfer_jit_cache: 0.6.1+cu129
triton: 3.5.1
transformers: 4.57.1
torchao: 0.9.0
numpy: 2.4.1
aiohttp: 3.13.3
fastapi: 0.128.0
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.34.0
orjson: 3.11.5
outlines: 0.1.11
packaging: 26.0
psutil: 7.2.1
pydantic: 2.12.5
python-multipart: 0.0.21
pyzmq: 27.1.0
uvicorn: 0.40.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.76.0
litellm: Module Not Found
decord2: 3.0.0
NVIDIA Topology: 
        GPU0    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    NIC9    NIC10   NIC11   NIC12   CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    PIX     NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     0-23,96-119     0               N/A
NIC0    NODE     X      NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC1    PIX     NODE     X      NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC2    NODE    NODE    NODE     X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC3    SYS     SYS     SYS     SYS      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE
NIC4    SYS     SYS     SYS     SYS     SYS      X      NODE    NODE    SYS     SYS     SYS     SYS     SYS     SYS
NIC5    SYS     SYS     SYS     SYS     SYS     NODE     X      NODE    SYS     SYS     SYS     SYS     SYS     SYS
NIC6    SYS     SYS     SYS     SYS     SYS     NODE    NODE     X      SYS     SYS     SYS     SYS     SYS     SYS
NIC7    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX     PXB     PXB     NODE    SYS
NIC8    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X      PXB     PXB     NODE    SYS
NIC9    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB      X      PIX     NODE    SYS
NIC10   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB     PIX      X      NODE    SYS
NIC11   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     NODE    NODE    NODE    NODE     X      SYS
NIC12   SYS     SYS     SYS     SYS     NODE    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_5
  NIC4: mlx5_6
  NIC5: mlx5_7
  NIC6: mlx5_8
  NIC7: mlx5_9
  NIC8: mlx5_10
  NIC9: mlx5_11
  NIC10: mlx5_12
  NIC11: mlx5_13
  NIC12: mlx5_bond_0


ulimit soft: 1073741816
```
