Use llama-8b-instruct tokenizer.json and tokenizer_config.json for llama3_8b_fp16 dataset #953

Merged on Feb 11, 2025 (3 commits)

stbaione
Contributor

The tokenizers specified for this dataset are for llama3_8b_fp16, while the model is llama3_8b_fp16_instruct. The eos_token values for 8b and 8b-instruct differ:

8b:

<|begin_of_text|> {generated_text} <|end_of_text|>

<|end_of_text|> - 128001

8b-Instruct:

<|begin_of_text|> {generated_text} <|eot_id|>


<|eot_id|> - 128009
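
As a rough illustration of how the stop-token ID follows from the tokenizer files, the sketch below resolves `eos_token` to an ID. The dictionaries are simplified in-memory stand-ins for `tokenizer_config.json` and `tokenizer.json`, and `resolve_eos_id` is a hypothetical helper; none of this is the server's actual loading code.

```python
# Simplified stand-ins for the tokenizer files (assumed shapes; the real
# files contain many more fields and tokens).
ADDED_TOKENS = [
    {"id": 128001, "content": "<|end_of_text|>"},
    {"id": 128009, "content": "<|eot_id|>"},
]
BASE_CONFIG = {"eos_token": "<|end_of_text|>"}   # 8b
INSTRUCT_CONFIG = {"eos_token": "<|eot_id|>"}    # 8b-Instruct


def resolve_eos_id(tokenizer_config, added_tokens):
    """Return the token ID of the configured eos_token (hypothetical helper)."""
    eos = tokenizer_config["eos_token"]
    for tok in added_tokens:
        if tok["content"] == eos:
            return tok["id"]
    raise KeyError(f"eos_token {eos!r} not found in added tokens")


print(resolve_eos_id(BASE_CONFIG, ADDED_TOKENS))      # 128001
print(resolve_eos_id(INSTRUCT_CONFIG, ADDED_TOKENS))  # 128009
```

Pairing the 8b config with the Instruct model therefore yields 128001 as the stop token, even though the model ends its turns with 128009.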

Using the wrong config causes Llama to output text forever. Our model generated 128009 (<|eot_id|>), but the server, configured with the 8b files, expects 128001 (<|end_of_text|>), so it does not recognize 128009 as the stop token and keeps requesting more generations.
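
The failure mode can be sketched with a toy decode loop. This is only an illustration under stated assumptions: `generate` and `fake_instruct_model` are hypothetical names, the model is faked as a fixed token stream, and a `max_tokens` cap is added so the example terminates.

```python
from itertools import chain, repeat

END_OF_TEXT_ID = 128001  # <|end_of_text|>, eos_token for 8b
EOT_ID = 128009          # <|eot_id|>, eos_token for 8b-Instruct


def generate(token_stream, eos_id, max_tokens=16):
    """Collect tokens until eos_id appears or max_tokens is reached."""
    out = []
    for tok in token_stream:
        if tok == eos_id:
            break
        out.append(tok)
        if len(out) >= max_tokens:
            break
    return out


def fake_instruct_model():
    # Hypothetical output: three text tokens, then <|eot_id|> repeatedly.
    return chain([11, 22, 33], repeat(EOT_ID))


# Correct config: stops at the first <|eot_id|>.
print(len(generate(fake_instruct_model(), eos_id=EOT_ID)))          # 3
# Wrong config: <|eot_id|> is never matched, so only the cap stops it.
print(len(generate(fake_instruct_model(), eos_id=END_OF_TEXT_ID)))  # 16
```

Without a cap, the second call would never return, which matches the unbounded output described above.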

More details are in a comment on #934.

stbaione requested a review from renxida on February 11, 2025, 16:30
stbaione merged commit fbc69de into nod-ai:main on Feb 11, 2025
30 of 34 checks passed
monorimet pushed a commit that referenced this pull request Feb 13, 2025
…lama3_8b_fp16` dataset (#953)
