
Use llama chat tags in example requests #951

Merged: 2 commits into nod-ai:main on Feb 11, 2025

Conversation

stbaione (Contributor) commented:
Use llama chat tags in example requests.

More details on this can be found here

- "text": "Name the capital of the United States.",
+ "text": "<|begin_of_text|>Name the capital of the United States.<|eot_id|>",
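As an illustration, a minimal helper (hypothetical, not part of shortfin) that produces the tagged form shown in the diff above:

```python
def apply_llama_tags(prompt: str) -> str:
    """Wrap a plain prompt in the llama chat tags used in the example request."""
    return f"<|begin_of_text|>{prompt}<|eot_id|>"

# Build the request body the updated docs show.
payload = {"text": apply_llama_tags("Name the capital of the United States.")}
```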
Member commented:
Should the server be inserting these itself? Maybe for a specific endpoint? Right now we just have "generate"?

Other projects have endpoints like chat/completions that don't include these tags:

https://github.com/sgl-project/sglang/blob/bb418ced802c6dbb6b0ae0d65218327129148769/python/sglang/srt/conversation.py#L49-L279

Are we expecting shortfin to sit under a frontend like sglang that will insert the tags for us? Or should we be putting similar logic in our apps too?

Contributor Author commented:

Yeah, they're called ChatTemplates. It's a feature that we should add. For now, our SGLang serving docs use the llama tag and apply it through the sglang frontend language.

By default, if a chat_template is not specified, they apply it based on model name/path: https://github.com/sgl-project/sglang/blob/main/python/sglang/lang/backend/runtime_endpoint.py#L48

Shouldn't be a hard thing to add, but it's not related to this PR, which is aimed at cleaning up the user doc experience.
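A rough sketch of that name/path-based default (function and template names here are hypothetical; see the linked sglang code for the real matching logic):

```python
def guess_chat_template(model_path: str) -> str:
    """Pick a chat template name from the model path, sglang-style.

    Hypothetical sketch: only the llama3 case from this discussion is shown.
    """
    path = model_path.lower()
    if "llama-3" in path or "llama3" in path:
        return "llama3"
    return "none"  # unknown model: send text through untouched
```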

Member commented:

But do we expect a user to actually send the tags themselves in requests? I'm fine with the quick fix if it reduces confusion, but I want to understand our architecture/strategy more.

Contributor Author commented:

I don't expect that, I'll need to create a formal issue for this. The way that I see it is basically replicating what SGLang does. We have chat templates for the model variants that we support (just "llama3" for now).

If a user starts a server with a llama model, it'll use the chat templates by default. Otherwise, they can start the server with a specific template (i.e. "none"), and it'll use those.

So, by default, the llama tags will be applied if you just spin up the server; you would have to explicitly specify otherwise.
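That default-on behavior could look something like the following sketch (`TEMPLATES` and `format_request` are hypothetical names, not shortfin APIs):

```python
# Hypothetical sketch of the proposed behavior: llama tags applied by
# default, with an explicit "none" template to opt out.
TEMPLATES = {
    "llama3": lambda text: f"<|begin_of_text|>{text}<|eot_id|>",
    "none": lambda text: text,
}

def format_request(text: str, template: str = "llama3") -> str:
    # The default applies the llama template; pass template="none"
    # to send the raw text with no tags inserted.
    return TEMPLATES[template](text)
```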

Contributor Author commented:

I see that as an issue to add to #921

@ScottTodd (Member) left a comment:

Good enough for now (though see the discussion)

@stbaione stbaione merged commit b78f901 into nod-ai:main Feb 11, 2025
31 of 34 checks passed
@stbaione stbaione mentioned this pull request Feb 11, 2025
monorimet pushed a commit that referenced this pull request Feb 13, 2025
Use llama chat tags in example requests.

More details on this can be found
[here](#934 (comment))