Closed
Description
Hello! 👋
I have a chat template and tokenization problem when the input text contains emoji.
When I decode the token IDs returned by tokenizer.applyChatTemplate, I get the following result:
<|eot_id|><|start_header_id|>user<|end_header_id|>
🥳🥳🥳<|e<|eot_id|><|start_header_id|>istant<|e<|end_header_id|>
But it should be:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
🥳🥳🥳<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Every other emoji I tried causes the same problem after encoding.
I get the same problem when I use a different model (Qwen3-0.6B).
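One observation that might help narrow it down: in the decoded output the special tokens look shifted by three characters ("<|e" leaks in front of <|eot_id|>, and "ass" is missing from "assistant"), which equals the difference between the UTF-16 length and the Character count of 🥳🥳🥳. The sketch below only illustrates that kind of offset mismatch in plain Swift. It is a guess at the cause, not something verified against the library's code, and the names `text`, `wrong`, and `right` are made up for the illustration.

```swift
import Foundation

// The user message from the report above.
let content = "🥳🥳🥳"

// Swift counts grapheme clusters; NSString/NSRange count UTF-16 code units.
let characterCount = content.count        // 3
let utf16Count = content.utf16.count      // 6, each emoji is a surrogate pair
print(characterCount, utf16Count, utf16Count - characterCount)

// Hypothetical illustration (not the library's code): locate a special token
// with an NSRange, then slice the Swift String in two different ways.
let text = content + "<|eot_id|>"
let nsRange = NSString(string: text).range(of: "<|eot_id|>")

// Wrong: treat the UTF-16 location as a Character offset.
let wrongStart = text.index(text.startIndex,
                            offsetBy: nsRange.location,
                            limitedBy: text.endIndex) ?? text.endIndex
let wrong = String(text[wrongStart...])

// Right: convert the NSRange into a Range<String.Index> first.
let right = Range(nsRange, in: text).map { String(text[$0]) } ?? ""

print(wrong)  // "ot_id|>", the leading "<|e" is skipped
print(right)  // "<|eot_id|>"
```

If something similar happens while the special tokens are being split out of the rendered template, it would explain why every emoji triggers the problem and why it is independent of the model.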
Any comments or help would be greatly appreciated :)
danny980521