Chat template: update for processor #35953

Open
wants to merge 8 commits into main

Conversation

@zucchini-nlp (Member) commented Jan 29, 2025

What does this PR do?

Prerequisite: we need support for images passed as batched lists in all VLMs, which is now done thanks to Yoni 💛

This PR adds:

  • Support for batched inputs, which I forgot to add last time
  • A tiny hack for add_special_tokens, as per @Rocketknight1's request
  • More tests for chat templates in the Mixin, so we don't rewrite the same test for every model. Note that for these tests to run, the processor needs its real chat_template, not a dummy one
  • Video loading that can use video_fps, which is claimed to work better than sampling N frames uniformly, especially for longer videos. NOTE: this results in videos with different numbers of frames and can cause errors when batching, so it should be used only when bs=1. We don't support batching videos with different frame counts yet. See the usage sketch after this list
  • Ability to load a video from a list of frames saved as image files
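
For illustration, a minimal sketch of the one-step usage this enables. The checkpoint, video path, and message schema are placeholders/assumptions; the processor kwargs mirror the ones exercised in the tests added in this PR:

from transformers import AutoProcessor

# Hypothetical video-capable VLM checkpoint; any processor with a real chat_template works the same way.
processor = AutoProcessor.from_pretrained("llava-hf/LLaVA-NeXT-Video-7B-hf")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "my_video.mp4"},  # placeholder local path
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

# Sample frames at a fixed rate instead of a fixed count.
# `video_fps` and `num_frames` are mutually exclusive, so pass only one of them.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    video_fps=1,
)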

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qubvel (Member) left a comment


Thanks! Looks good to me. I was not very thorough with the chat-template code, so I'm leaving that part to @Rocketknight1.

Comment on lines +541 to +544
    Number of frames to sample uniformly. Should be passed only when `fps=None`.
    If not specified and `fps==None`, all frames are sampled.
fps (`int`, *optional*):
    Number of frames to sample per second. Should be passed only when `num_frames=None`.
Member

Is it worth adding a check that just one of these args is provided?

@zucchini-nlp (Member, Author) Jan 31, 2025

I think neither of them has special priority, and users should choose only one sampling method. Otherwise we assume it was user error, as we can't infer their actual intention.

Right, that was a question hehe. Agreed and added, yes. I'll look again at where exactly it fits best :)

Comment on lines +724 to +726
if fps is not None and num_frames is not None:
    raise ValueError("`num_frames` and `fps` are mutually exclusive arguments, please use only one!")

Member

ahh, I see you are doing the check here... probably better to delegate it to the function itself, but it's up to you
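
For illustration, delegating the check could look roughly like this; the function name and signature are assumptions for the sketch, not the actual helper touched in this PR:

def load_video(video, num_frames=None, fps=None):
    # Hypothetical loader-level validation: reject ambiguous sampling requests early,
    # so every caller gets the same error instead of each processor re-implementing it.
    if fps is not None and num_frames is not None:
        raise ValueError("`num_frames` and `fps` are mutually exclusive arguments, please use only one!")
    ...  # frame sampling and decoding would follow here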

Comment on lines +821 to +830
# Load with `video_fps` and `num_frames` args, should raise an error
with self.assertRaises(ValueError):
    out_dict_with_video = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        video_fps=video_fps,
        num_frames=num_frames,
    )
Member

Thanks for adding a test for the error raised!

src/transformers/models/aria/modular_aria.py (conversation resolved)
Comment on lines -832 to -833
if chat_template is not None:
    setattr(processor, "chat_template", chat_template)
Member Author

btw, this was moved to get_processor_dict, as it left room for hacky manipulations. Example: mllama forgot to pass chat_template to MLlamaProcessor.__init__, but still magically ended up with a valid template. It took me some time to figure out, so let's not leave options for users to hack around.

if videos:
    batch_videos.append(videos)

# Tokenizer's `apply_chat_template` never adds special tokens when tokenizing
Member

It always adds them, no?

Member Author

oops, misleading comment, it always adds them, yes

Member

Suggested change
# Tokenizer's `apply_chat_template` never adds special tokens when tokenizing
# Tokenizer's `apply_chat_template` adds special tokens when tokenizing


# Tokenizer's `apply_chat_template` never adds special tokens when tokenizing
# But processor's `apply_chat_template` didn't have an option to tokenize, so users had to format the prompt
# and pass it to the processor. Users thus never worried about special tokens relying on processor hadnling
Member

Suggested change
# and pass it to the processor. Users thus never worried about special tokens relying on processor hadnling
# and pass it to the processor. Users thus never worried about special tokens relying on processor handling

Actually, users had to be careful to use add_special_tokens=False when tokenizing the rendered template. Is this not the case for some VLMs?

@zucchini-nlp (Member, Author) Feb 6, 2025

They had to, but we never documented it. For example, in llava we didn't add the BOS in the template, while idefics copied llava and did add it, which caused the problem of duplicated special tokens.

Probably we need to surface this quirk in the model docs, e.g. as a small comment in the demo inference code.
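
For context, a rough sketch of the two-step pattern being discussed. The checkpoint and image are placeholders, and whether add_special_tokens=False is actually needed depends on whether the model's chat template already renders BOS (per the discussion above, idefics does and llava does not):

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")  # placeholder BOS-in-template VLM
image = Image.open("my_image.png")  # placeholder image

messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is shown here?"}]}
]

# Step 1: render the chat template to a plain string (no tokenization).
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Step 2: tokenize with the processor. If the template already rendered BOS, the default
# add_special_tokens=True would prepend a second one, hence the explicit flag here.
inputs = processor(text=prompt, images=image, add_special_tokens=False, return_tensors="pt")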

Member

I think the standard is for chat templates to handle special tokens, so idefics is doing it "right" and llava is an exception (that we have to provide BC for, of course). Perhaps we could default to the expected behaviour in our reference code, and treat exceptional cases based on some other property?

@zucchini-nlp (Member, Author) Feb 6, 2025

yeah, I think we'll need to:

  1. Update the demo inference code for tokenize=False, and let users know why we pass add_special_tokens=False
  2. Slowly check whether updating the templates in llava draws complaints from users, e.g. after >5 minor releases. It might still be a bad idea until we make tokenize=True the go-to default for everyone, since most users are stuck on older versions of transformers
  3. For any new model, make sure the BOS/EOS are in the template and not hacked in from within the processing code!

Comment on lines +1282 to +1283
if self.tokenizer.bos_token is not None and single_prompt.startswith(self.tokenizer.bos_token):
    kwargs["add_special_tokens"] = False
Member

❤️! Yes, I think this is the right solution - it's a little hacky, but it should cover all the cases I can think of correctly.
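
For illustration, a small hedged demo of the duplicated-BOS issue this check avoids; the tokenizer checkpoint is just an example of a BOS-adding tokenizer, not something touched by this PR:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")  # example tokenizer with add_bos_token=True

rendered = tokenizer.bos_token + "USER: hi ASSISTANT:"  # pretend the chat template already inserted BOS

duplicated = tokenizer(rendered).input_ids                              # default tokenization prepends a second BOS
deduplicated = tokenizer(rendered, add_special_tokens=False).input_ids  # what the check above opts into

print(duplicated[:2])    # two BOS ids in a row
print(deduplicated[:1])  # a single BOS id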

@zucchini-nlp (Member, Author)

Cool, I will merge it then. Core modeling is not touched, and a code-owners review should be enough.

@pcuenca (Member) commented Feb 7, 2025

Looking forward to this landing in a release so we can show one-step processor snippets for chat VLMs! 🔥
