common : add GLM-4.5 tool calling support #15186


Open
wants to merge 1 commit into base: master

Conversation

dhandhalyabhavik

@dhandhalyabhavik dhandhalyabhavik commented Aug 8, 2025

  • Add COMMON_CHAT_FORMAT_GLM_4_5 format enum
  • Implement GLM-4.5 tool call parser for <tool_call><arg_key><arg_value> format
  • Add template detection based on <arg_key> and <arg_value> tags
  • Fix null content handling in message parsing and serialization
  • Ensure GLM-4.5 detection runs before Hermes to avoid misidentification

This enables tool calling functionality for GLM-4.5 models when using --jinja flag. The parser handles GLM-4.5's XML-like tool call format with key-value argument pairs.
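For illustration, here is a minimal standalone Python sketch of how a parser might extract the function name and key/value argument pairs from this format. This is an approximation for readers, not the actual C++ implementation in the PR:

```python
import re

def parse_glm45_tool_call(text: str):
    """Extract function name and argument pairs from a GLM-4.5 style
    <tool_call>...</tool_call> block (illustrative approximation)."""
    m = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if not m:
        return None
    body = m.group(1)
    # The function name is everything before the first <arg_key> tag.
    name = body.split("<arg_key>", 1)[0].strip()
    # Collect the alternating <arg_key>/<arg_value> pairs.
    keys = re.findall(r"<arg_key>(.*?)</arg_key>", body, re.DOTALL)
    values = re.findall(r"<arg_value>(.*?)</arg_value>", body, re.DOTALL)
    args = {k.strip(): v.strip() for k, v in zip(keys, values)}
    return {"name": name, "arguments": args}

example = (
    "<tool_call>get_weather\n"
    "<arg_key>city</arg_key>\n"
    "<arg_value>Paris</arg_value>\n"
    "</tool_call>"
)
```

The real parser also has to handle streaming and partial output, which this sketch ignores.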

Personally verified working in the Cherry Studio Windows app with functions enabled as an option.

Initially this was not working with the OpenAI SDK, because the Jinja template expected the tool arguments as a dict while the OpenAI SDK sends them as JSON.

Update: it now works with the OpenAI SDK too. The issue above is fixed by the corrected Jinja template. The template also works great with Cline; I tested it extensively.

Corrected Jinja template:

[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}
<arg_key>{arg-key-1}</arg_key>
<arg_value>{arg-value-1}</arg_value>
<arg_key>{arg-key-2}</arg_key>
<arg_value>{arg-value-2}</arg_value>
...
</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
    {%- if content is string -%}
        {{- content }}
    {%- elif content is iterable and content is not mapping -%}
        {%- for item in content -%}
            {%- if item is mapping and item.type == 'text' -%}
                {{- item.text }}
            {%- elif item is string -%}
                {{- item }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{- content }}
    {%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
    {%- if m.role == 'user' %}
        {% set ns.last_user_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>
{%- set user_content = visible_text(m.content) -%}
{{ user_content }}
{%- if enable_thinking is defined and not enable_thinking -%}
{%- if not user_content.endswith("/nothink") -%}
{{- '/nothink' -}}
{%- endif -%}
{%- endif -%}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
    {%- set reasoning_content = m.reasoning_content %}
{%- else %}
    {%- if '</think>' in content %}
        {%- set think_parts = content.split('</think>') %}
        {%- if think_parts|length > 1 %}
            {%- set before_end_think = think_parts[0] %}
            {%- set after_end_think = think_parts[1] %}
            {%- set think_start_parts = before_end_think.split('<think>') %}
            {%- if think_start_parts|length > 1 %}
                {%- set reasoning_content = think_start_parts[-1].lstrip('\n') %}
            {%- endif %}
            {%- set content = after_end_think.lstrip('\n') %}
        {%- endif %}
    {%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_user_index and reasoning_content -%}
{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '\n<think></think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ '\n' + content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
    {%- set tc = tc.function %}
{%- endif %}
{{ '\n<tool_call>' + tc.name }}
{% set _args = tc.arguments %}
{% for k, v in _args.items() %}
<arg_key>{{ k }}</arg_key>
<arg_value>{{ v | tojson if v is not string else v }}</arg_value>
{% endfor %}
</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
    {{- '<|observation|>' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- m.content }}
{{- '\n</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}

<tool_response>
{{ tr.output if tr.output is defined else tr }}
</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>
{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    <|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
{%- endif -%}
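Note the serialization rule on the `<arg_value>` line of the template: string arguments are emitted verbatim, while non-string values go through `tojson`. A Python equivalent of that filter expression, shown purely for reference (an approximation of the template's behavior, not llama.cpp code):

```python
import json

def render_arg_value(v):
    # Mirrors the template's `{{ v | tojson if v is not string else v }}`:
    # strings pass through unchanged, everything else is JSON-encoded.
    return v if isinstance(v, str) else json.dumps(v)
```

This matters because the parser on the other side must decide whether an `<arg_value>` payload is a raw string or JSON to decode.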

@ajunca

ajunca commented Aug 9, 2025

I tried the PR, and it fixes tool calling on GLM 4.5 Air (the unsloth version); tools now get called correctly.
However, this other problem, #15046, then arises.

@dhandhalyabhavik
Author

I tried the PR, and it fixes tool calling on GLM 4.5 Air (the unsloth version); tools now get called correctly. However, this other problem, #15046, then arises.

But that's a Qwen tool calling issue, right? I think once the other pending PRs are merged you should not see it.

@ajunca

ajunca commented Aug 9, 2025

Yeah, I don't think it's related to this specific PR, but the problem is shared with that Qwen tool calling issue.

@dhandhalyabhavik
Author

Cline

Works great now with Cline 💪.


Cherry studio with MCP

Works great with MCP settings too 🔥.


@TNohSam

TNohSam commented Aug 10, 2025

Hey, quick thought — I might be misunderstanding this, but it looks like this PR will parse GLM’s XML-style tool calls and turn them into JSON tool_calls before they reach the client.

If that’s the case, projects like Roo Code (which currently only know how to handle XML tool calls) might suddenly stop recognizing the output from GLM models when running through llama.cpp.

Am I right about this?
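For context on what this conversion means for clients: after parsing, the server returns tool calls in the OpenAI-style `tool_calls` shape rather than the raw XML. A schematic example of such an assistant message (all field values here are made up for illustration):

```python
# Schematic OpenAI-style assistant message, as produced after the server
# parses GLM-4.5's XML tool-call format (illustrative values only).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                # In the OpenAI schema, arguments are a JSON-encoded string.
                "arguments": '{"city": "Paris"}',
            },
        }
    ],
}
```

So a client expecting the model's raw XML in `content` would indeed see something different once the server parses it.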

@jfgonsalves

Does this template parse the thinking tags correctly? I'm getting my responses inline instead of in the reasoning_content field.

@bfroemel

Very nice!

#15162 aims to achieve the same for Qwen3 Coder, and it seems more mature/higher quality (it uses minja and lets it handle quoting/escaping of argument strings, stores the Jinja template in ./models/templates, and has test cases in ./tests/test-chat.cpp). Maybe @ochafik and @dhandhalyabhavik can sync up/collaborate and bring both PRs forward in a consistent way?

@dhandhalyabhavik
Author

dhandhalyabhavik commented Aug 11, 2025

Hello everyone, thanks for the insightful comments. Let me answer each of you.

@TNohSam There are two ways to implement tool calling:
(1) use an instruction-following template, write parsing code, and parse the output manually;
(2) OpenAI-compatible tool calling, where functions or tools are part of the chat request object <--- this is what people mean when they say a model supports tool calling.

I have just tested Roo Code and it works fine. Both types of tool calling work with the current PR.
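As a concrete example of option (2), an OpenAI-compatible request carries the tool schemas in the `tools` field of the chat payload. A minimal sketch (the tool name and parameters here are hypothetical):

```python
# Minimal OpenAI-compatible chat payload with one tool definition.
# The tool (get_weather) is hypothetical; the server renders the schemas
# into the template's <tools> section and parses any <tool_call> the
# model emits back into structured tool_calls.
payload = {
    "model": "glm-4.5",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```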

@jfgonsalves you can enable reasoning_content via a llama-server flag; check the available flags.

@bfroemel sure. @ochafik, can you please review my changes and help me merge this PR? I would really appreciate it. Thank you.

@dhandhalyabhavik
Author

@jfgonsalves

You can enable reasoning_content via a server flag.

There is parser logic common for all models that will do this job. Check out the code here

This PR has nothing to do with it. Thank you for pointing it out though.

