🕵️♂️ Agent training #4300
Open

qgallouedec wants to merge 216 commits into `main` from `tool-call-finally` (+2,136 −121).
Changes from 193 of 216 commits.
Commits are authored by qgallouedec unless noted otherwise.

- 552e899 Refactor image handling: replace `image_split_sizes` with `image_grid…`
- 449ef07 simpler
- c8933aa gfpo
- 229c554 multi-image grpo
- 3ca6ad5 log with wandb
- dcf4b92 no vlm reward models
- 30ad7ca rloo
- 86cc30b gfpo
- 088897b fix
- d2adc63 test peft
- f4c82bf fix gfpo
- 1257796 rloo test
- 099a39b peft rloo
- 529add6 oops
- fc6b11f update test
- ae1f497 generate method
- f998432 debug
- fa73876 skip failing test
- 52d8bd9 Merge branch 'main' into drop-image_split_sizes
- dfc0d38 Merge branch 'drop-image_split_sizes' into multi-image-support
- fc52e68 test fixed!
- 4d12aeb Merge branch 'multi-image-support' into generate-method
- 4fc2b5b gfpo
- b628744 rm vllm
- d3a769f fix doc
- e17ec42 Merge branch 'main' into drop-image_split_sizes
- efbb03a Merge branch 'drop-image_split_sizes' into multi-image-support
- 562c662 Merge branch 'main' into multi-image-support
- 485781c Merge branch 'main' into multi-image-support
- 05270f8 update layers to ignore
- 1c53094 clarify image column desc
- 9b6652e rm VLM x RM warning
- c500440 Merge branch 'multi-image-support' into generate-method
- a6a8c44 Merge branch 'main' into generate-method
- d8665e1 Merge branch 'main' into generate-method
- 365d501 Merge branch 'main' into generate-method
- cdb4c76 Merge branch 'main' into generate-method
- c83e710 same for rloo
- ec6ad25 nits style and align
- b4cadde Merge branch 'main' into generate-method
- b0dceb9 restart
- ebe32c2 progress
- 0213662 progress continues
- 8b3a724 progress again again
- c1ae6aa back to working point
- 1a66b43 revert chage data utils
- 2dc69a6 Merge branch 'main' into generate-method
- 9435a94 refactor in grpo
- d3f1d3c Merge branch 'main' into refactor_generate
- 3d8ea27 wrong merge commit
- 27dc958 fix num_input_tokens_seen
- 53772ef getting closer
- 8766fa5 consistent naming
- 236b78b better
- 9da4830 simplify a bit + comment
- b3bd0b0 another one
- d79b9e1 get prompt ids from generation
- 8d34d54 remove pad token removal
- e770efe Merge branch 'refactor_generate' into refactor_generate_2
- 0e2ae34 rely on generator for prompt truncation
- 46d8eb7 revert
- 11acc75 rm enforce eager
- acee7d8 rm truncate_with_protected_tokens
- 0b5865e ensure proper truncation and side
- d8af003 rm useless comment
- fc263a3 rm imports
- 35f99fd requires padding
- 8149d05 rm truncation test
- 9925199 move forward_kwargs outside of generate
- 48a1c30 don't re-prepare data
- 15c6620 refactor: update prepare_multimodal_messages to accept images directl…
- 55a2480 rloo + doc
- c8041e1 Merge branch 'refactor_generate' into refactor_generate_2
- b8c0c9b Merge branch 'refactor_generate_2' into refactor_generate_3
- 7b7a11d test and doc
- c5064d6 gfpo
- effb41b Merge branch 'main' into refactor_generate
- e82bfb4 Merge branch 'main' into refactor_generate
- 4b9c126 Merge branch 'refactor_generate' into refactor_generate_2
- 3f02702 Merge branch 'refactor_generate_2' into refactor_generate_3
- b0e0279 Merge branch 'refactor_generate_3' into refactor_generate_4
- a01b9ca Merge branch 'refactor_generate_4' into refactor_generate_5
- 6bc15a3 wip
- f11759e Merge branch 'main' into refactor_generate_2
- e7aa945 fix vllm client server
- e164ec5 repicate all_prompt_ids
- 49577ad Same for RLOO
- 5fca5b8 fix normal generation path
- 5cc6af5 Merge branch 'refactor_generate_2' into refactor_generate_3
- 4dce145 remove vision tokens
- ddfd3b5 same for rloo
- c434fa2 truncation_side=left
- 377b081 rm test_training_vlm_and_prompt_truncation
- d599c20 Merge branch 'main' into refactor_generate_2
- e82db74 🔣 Fix test: replace `trainer.tokenizer` by `trainer.processing_class`…
- 192deb3 Fix CI ImportError: FlashAttention2 and decorator order for all param… (albertvillanova)
- cf9d8e7 Hotfix wrong formatting of docstrings with blockquote tips (#4187) (albertvillanova)
- f9c3c3c 🌡️ Have vLLM return processed (temperature scaled) log probs (#4163) (YonatanGideoni)
- 6489479 Replace remaining trainer.tokenizer with trainer.processing_class in … (albertvillanova)
- 21a67fc [DOCS] Lora without regret (#4181) (burtenshaw)
- c1e7ad2 [DOCS/FIX] lora without regrets - fix lr (#4207) (burtenshaw)
- 5d34144 Remove custome_container for building the docs (#4198) (albertvillanova)
- ae2a0e7 Remove tokenizer creation from `sft` example script (#4197) (sergiopaniego)
- 6543f51 Hotfix: Exclude transformers 4.57.0 for Python 3.9 (#4209) (albertvillanova)
- 8319ce0 Replace unittest with pytest (#4188) (albertvillanova)
- 4fdaa4c Updated vLLM integration guide (#4162) (sergiopaniego)
- d258e36 Remove `Optional` from `processing_class` in `PPOTrainer` (#4212) (sergiopaniego)
- 7f5b499 Replace setup with pyproject and fix packaging unintended modules (#4… (albertvillanova)
- df386f9 Merge branch 'main' into refactor_generate_2
- 5b9a6ab Merge branch 'main' into refactor_generate_2
- 766bbce Merge branch 'refactor_generate_2' into refactor_generate_3
- ac2717f Merge branch 'refactor_generate_3' into refactor_generate_4
- 4a274d5 Merge branch 'main' into refactor_generate_2
- db552be Merge branch 'refactor_generate_2' into refactor_generate_3
- 2c012dc Merge branch 'refactor_generate_3' into refactor_generate_4
- cb1d420 Merge branch 'refactor_generate_4' into refactor_generate_5
- a84325c style
- 34034e7 Merge branch 'refactor_generate_3' into refactor_generate_4
- 2ce6c1f token_type_ids and RLOO
- ddf3405 gfpo
- e3c679c style
- ee03478 remove test case for prompt truncation
- ed54e2a Merge branch 'refactor_generate_3' into refactor_generate_4
- 5e4a026 Merge branch 'refactor_generate_4' into refactor_generate_5
- 45290c9 Merge branch 'main' into refactor_generate_3
- a0ee1e6 Merge branch 'refactor_generate_3' into refactor_generate_4
- f6e7c20 Merge branch 'refactor_generate_4' into refactor_generate_5
- 919ff5b Merge branch 'main' into refactor_generate_5
- fe11512 dedup and some fixes
- c0c8807 fix style
- ba8b938 rloo
- 7a2936e style
- 1a6f040 test
- b5c0078 Merge branch 'refactor_generate_5' into tool-call-finally
- 26ffb04 style
- ced5450 safe prepare_multimodal_messages_vllm
- 23d13f9 oops
- f98fe13 Merge branch 'refactor_generate_5' into tool-call-finally
- 5f87ee9 fix return-dict
- 89cff94 Merge branch 'refactor_generate_5' into tool-call-finally
- 0dac326 Merge branch 'main' into tool-call-finally
- 14afe75 Merge branch 'main' into tool-call-finally
- ddcbbae Merge branch 'main' into tool-call-finally
- cb16cab Merge branch 'main' into tool-call-finally
- 9102ba3 Merge branch 'main' into tool-call-finally
- 2d945f2 move extraction to util + doc
- 65ad930 using response parser
- 67e8f29 backward compat
- a4eac3c fixes
- 1e32b0a don't truncate prompt
- e816ef4 remove max_length
- 400bee4 move to chat template utils
- b86483c tool mask
- 93c7999 hard coded chat template
- 24ea4a4 almost done!!
- 5edee5c Merge branch 'main' into tool-call-finally
- 9dfc511 fix chat template
- 2542320 just report error (not the traceback
- 1db53c1 style
- f31996a deprecate max_length + chat utils doc
- 6f2524d test chat template utils
- eb9eca9 test
- 19fa924 remove max_prompt_length
- 278703e better doc
- 6828ba2 doc example and skip version below dev
- ae653d8 fix overlong case
- 96387b3 test parse
- 714b9ea example in the doc
- 3a1c7fb comment in test
- a1ebcba version.parse -> Version
- c340f52 comment chat template for vllm
- d338c84 qol
- f8444df use chat template arg instead of ugly patch
- 6ac02e0 refactor: simplify response parsing in tokenizer and trainer
- b8125bf why it doesn't render well?
- be255df Merge branch 'main' into tool-call-finally
- 37d77ba raw
- a136592 style
- e63a46c fix: update xfail reason for tool parsing in TestParseResponse
- d082309 revert rloo for now
- 0707baa grpo with replay buffer
- 753d70d jmespath dep
- 06414f2 is_jmespath_available
- 21792da style
- 850a9eb new section
- 438b586 ignore TestParseResponse for transformers<5
- 1c026ce fix qwen schema
- c54bf4f another fix
- 9f0aa3d remove unsused schemas
- fbb625f rename processor to tokenizer in add_response_schema function
- ce6341b deprecate max_prompt_length argument and add warning for future removal
- 493881f Apply suggestions from code review
- 4d6a064 nit simplification
- 5a9bb20 Docs updated (sergiopaniego)
- 90a1ed1 Add monkey-patch for vLLM compatibility with TRL
- a584e42 VLLM_LOGGING_LEVEL", "ERROR
- fb4c694 Merge branch 'main' into tool-call-finally
- aa2615a Merge branch 'main' into tool-call-finally
- c36ea41 Merge branch 'main' into tool-call-finally
- caf1ad2 flip tool mask
- 94c2ff2 isolate tool call loop
- 3cbb28e Add example script (sergiopaniego)
- 6074ade code quality (sergiopaniego)
- fc3d759 Update to more strict reward funcs (sergiopaniego)
- e37508d Update steps (sergiopaniego)
- af749c1 Clarify token counting in reward metrics and adjust completion length…
- 988efc1 Updated example script with elaborated reward funcs (sergiopaniego)
- ce7d607 Add example notebook and update docs (sergiopaniego)
- 6f65553 Merge branch 'main' into tool-call-finally
- 797da51 Merge branch 'main' into tool-call-finally
- f0d7972 Apply suggestions from code review
- faf9667 minor version fixes
- bd8905d vllm_max_model_length
- bfec5f1 update comment with new vllm range
- 70680db style
- 2f814e9 fix version and vllm
New documentation file (`@@ -0,0 +1,17 @@`):

```markdown
# Chat template utilities

## add_response_schema

[[autodoc]] chat_template_utils.add_response_schema

## is_chat_template_prefix_preserving

[[autodoc]] chat_template_utils.is_chat_template_prefix_preserving

## get_training_chat_template

[[autodoc]] chat_template_utils.get_training_chat_template

## parse_response

[[autodoc]] chat_template_utils.parse_response
```
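The "prefix-preserving" property that `is_chat_template_prefix_preserving` checks can be illustrated with a toy, self-contained sketch (hypothetical helpers, not TRL's implementation): a renderer is prefix-preserving if rendering the first k messages always yields a string prefix of rendering the full conversation, so appending a turn never rewrites earlier text.

```python
# Toy sketch of the prefix-preserving property (hypothetical, not TRL's code).

def render(messages):
    # Minimal ChatML-style renderer: each message is appended verbatim.
    return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)

def render_last_special(messages):
    # Non-preserving renderer: the final message is rendered differently,
    # analogous to templates that keep the <think> block only for the last turn.
    out = []
    for i, m in enumerate(messages):
        tag = "FINAL" if i == len(messages) - 1 else m["role"]
        out.append(f"<|im_start|>{tag}\n{m['content']}<|im_end|>\n")
    return "".join(out)

def is_prefix_preserving(render_fn, messages):
    # True iff every partial render is a string prefix of the full render.
    full = render_fn(messages)
    return all(full.startswith(render_fn(messages[:k])) for k in range(1, len(messages)))

msgs = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "bye"},
]
print(is_prefix_preserving(render, msgs))              # True
print(is_prefix_preserving(render_last_special, msgs)) # False
```

This matters for training because token masks computed over a partial rendering stay valid only when later turns never alter earlier text.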
New test file (`@@ -0,0 +1,171 @@`):

```python
# Copyright 2020-2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import textwrap

import pytest
import transformers
from packaging.version import Version
from transformers import AutoTokenizer

from trl.chat_template_utils import (
    add_response_schema,
    get_training_chat_template,
    is_chat_template_prefix_preserving,
    parse_response,
)


class TestAddResponseSchema:
    @pytest.mark.xfail(
        condition=Version(transformers.__version__) < Version("5.0.0.dev0"),
        reason="Response parsing is not supported in transformers versions below 5.0.0.dev0",
        strict=True,
    )
    def test_add_response_schema(self):
        tokenizer = AutoTokenizer.from_pretrained("trl-internal-testing/tiny-Qwen3MoeForSequenceClassification")
        tokenizer = add_response_schema(tokenizer)
        assistant_text = '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n</tool_call><|im_end|>'
        parsed = tokenizer.parse_response(assistant_text)
        expected = {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"type": "function", "function": {"name": "multiply", "arguments": {"a": 3, "b": 4}}}],
        }
        assert parsed == expected


class TestIsChatTemplatePrefixPreserving:
    def test_prefix_preserving_template(self):
        tokenizer = AutoTokenizer.from_pretrained("trl-internal-testing/tiny-Qwen3MoeForSequenceClassification")
        tokenizer.chat_template = textwrap.dedent(r"""
        {%- for message in messages %}
        {%- if message.role == 'user' %}
        {{- '<|im_start|>user\n' + message.content + '<|im_end|>\n' }}
        {%- elif message.role == 'assistant' %}
        {{- '<|im_start|>assistant\n' + message.content + '<|im_end|>\n' }}
        {%- endif %}
        {%- endfor %}
        {%- if add_generation_prompt %}
        {{- '<|im_start|>assistant\n' }}
        {%- endif %}""")
        assert is_chat_template_prefix_preserving(tokenizer) is True

    def test_non_prefix_preserving_template(self):
        tokenizer = AutoTokenizer.from_pretrained("trl-internal-testing/tiny-Qwen3MoeForSequenceClassification")
        # The following template is quite typical of models like Qwen3 and GPT-OSS, where the thinking part is
        # only present for the last assistant message, which makes it non-prefix-preserving.
        # docstyle-ignore
        tokenizer.chat_template = textwrap.dedent(r"""
        {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
        {%- endif %}
        {%- set ns = namespace(last_query_index=messages|length - 1) %}
        {%- for message in messages[::-1] %}
        {%- set index = (messages|length - 1) - loop.index0 %}
        {%- if message.role == "user" and message.content is string %}
        {%- set ns.last_query_index = index %}
        {%- break %}
        {%- endif %}
        {%- endfor %}
        {%- for message in messages %}
        {%- set content = message.content if message.content is string else '' %}
        {%- if message.role == "user" or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>\n' }}
        {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
        {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
        {%- if '</think>' in content %}
        {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
        {%- set content = content.split('</think>')[-1].lstrip('\n') %}
        {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
        {%- if loop.last or (not loop.last and reasoning_content) %}
        {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
        {%- else %}
        {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- else %}
        {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {{- '<|im_end|>\n' }}
        {%- endif %}
        {%- endfor %}
        {%- if add_generation_prompt %}
        {{- '<|im_start|>assistant\n' }}
        {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
        {%- endif %}
        {%- endif %}""")
        assert is_chat_template_prefix_preserving(tokenizer) is False


class TestGetTrainingChatTemplate:
    def test_qwen3(self):
        tokenizer = AutoTokenizer.from_pretrained("trl-internal-testing/tiny-Qwen3MoeForSequenceClassification")
        assert is_chat_template_prefix_preserving(tokenizer) is False
        tokenizer.chat_template = get_training_chat_template(tokenizer)
        assert is_chat_template_prefix_preserving(tokenizer) is True


@pytest.mark.xfail(
    condition=Version(transformers.__version__) < Version("5.0.0.dev0"),
    reason="Tool parsing is not supported in transformers versions below 5.0.0.dev0",
    strict=True,
)
class TestParseResponse:
    def test_parse_response(self):
        tokenizer = AutoTokenizer.from_pretrained("trl-internal-testing/tiny-Qwen3MoeForSequenceClassification")
        tokenizer = add_response_schema(tokenizer)
        text = '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n</tool_call><|im_end|>'
        assistant_text = tokenizer(text)["input_ids"]
        parsed = parse_response(tokenizer, assistant_text)
        expected = {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"type": "function", "function": {"name": "multiply", "arguments": {"a": 3, "b": 4}}}],
        }
        assert parsed == expected

    def test_parse_response_no_tool_call(self):
        tokenizer = AutoTokenizer.from_pretrained("trl-internal-testing/tiny-Qwen3MoeForSequenceClassification")
        tokenizer = add_response_schema(tokenizer)
        text = "Here is the answer to your question.<|im_end|>"
        assistant_text = tokenizer(text)["input_ids"]
        parsed = parse_response(tokenizer, assistant_text)
        expected = {
            "role": "assistant",
            "content": "Here is the answer to your question.",
        }
        assert parsed == expected

    def test_parse_response_malformed_tool_call(self):
        tokenizer = AutoTokenizer.from_pretrained("trl-internal-testing/tiny-Qwen3MoeForSequenceClassification")
        tokenizer = add_response_schema(tokenizer)
        text = '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}\n</tool_call><|im_end|>'
        assistant_text = tokenizer(text)["input_ids"]
        parsed = parse_response(tokenizer, assistant_text)
        expected = {
            "role": "assistant",
            "content": '<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}\n</tool_call>',
        }
        assert parsed == expected
```
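The behavior the `TestParseResponse` cases exercise — extract a `<tool_call>` block, parse its JSON, and fall back to raw text when the JSON is malformed — can be sketched with a small self-contained parser (a hypothetical stand-in, not TRL's or transformers' implementation):

```python
import json
import re

# Hypothetical simplified parser mirroring the tested behavior: a well-formed
# <tool_call> becomes a structured tool call; malformed JSON is kept as content.
TOOL_CALL_RE = re.compile(r"<tool_call>\n(.*?)\n</tool_call>", re.DOTALL)

def parse_assistant_text(text):
    text = text.removesuffix("<|im_end|>")
    match = TOOL_CALL_RE.search(text)
    if match:
        try:
            call = json.loads(match.group(1))
            return {
                "role": "assistant",
                "content": text[: match.start()] + text[match.end():],
                "tool_calls": [{"type": "function", "function": call}],
            }
        except json.JSONDecodeError:
            pass  # malformed JSON: fall through and keep the raw text as content
    return {"role": "assistant", "content": text}

ok = parse_assistant_text('<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}}\n</tool_call><|im_end|>')
bad = parse_assistant_text('<tool_call>\n{"name": "multiply", "arguments": {"a": 3, "b": 4}\n</tool_call><|im_end|>')
print(ok["tool_calls"][0]["function"]["name"])  # multiply
print("tool_calls" in bad)                      # False
```

Falling back to raw content rather than raising keeps training robust: an overlong or malformed completion still yields a valid assistant message, and a reward function can penalize it instead of crashing the run.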