Use llama chat tags in example requests #951
Conversation
"text": "Name the capital of the United States.", | ||
"text": "<|begin_of_text|>Name the capital of the United States.<|eot_id|>", |
Should the server be inserting these itself? Maybe for a specific endpoint? Right now we just have `generate`.
Other projects have endpoints like `chat/completions` that don't include these tags (see the sketch after this list):
- https://docs.sglang.ai/start/send_request.html
- https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-completions-api-with-vllm

Are we expecting shortfin to sit under a frontend like sglang that will insert the tags for us, or should we be putting similar logic in our apps too?
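For comparison, a hedged sketch of the `chat/completions` shape from the quickstarts linked above, where the server applies the chat template and the client sends untagged messages; the host, port, and model id are placeholders:

```python
import requests

# OpenAI-style chat endpoint (as in the vLLM/SGLang quickstarts): the server
# applies the model's chat template, so the client sends plain messages
# without <|begin_of_text|>/<|eot_id|> tags.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",  # placeholder host/port
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
        "messages": [
            {"role": "user", "content": "Name the capital of the United States."}
        ],
    },
)
print(response.json())
```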
Yeah, they're called ChatTemplates. It's a feature that we should add. For now, our SGLang serving docs use the llama tags and apply them through the SGLang frontend language.
By default, if a chat_template is not specified, they apply it based on model name/path: https://github.com/sgl-project/sglang/blob/main/python/sglang/lang/backend/runtime_endpoint.py#L48
It shouldn't be a hard thing to add, just not related to this PR, which is aimed at cleaning up the user doc experience.
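A rough sketch of the kind of default selection the linked SGLang code performs; the function name and template ids here are hypothetical, not shortfin's actual API:

```python
# Illustrative only: choose a chat template from the model name/path when
# none is specified, mirroring the SGLang default linked above.
def guess_chat_template(model_path: str) -> str | None:
    name = model_path.lower()
    if "llama-3" in name or "llama3" in name:
        return "llama3"
    return None  # unknown model: apply no template
```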
But do we expect a user to actually send the tags themselves in requests? I'm fine with the quick fix if it reduces confusion, but I want to understand our architecture/strategy more.
I don't expect that; I'll need to create a formal issue for this. The way I see it is basically replicating what SGLang does: we have chat templates for the model variants that we support (just "llama3" for now).
If a user starts a server with a llama model, it'll use the llama3 chat template by default. Otherwise, they can start the server with a specific template (e.g. "none"), and it'll use that.
So, by default, the llama tags will be applied if you were to just spin up the server; you'd have to explicitly specify otherwise. A rough sketch of that default behavior follows.
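A minimal sketch of that default, assuming just the two tags used in this PR's examples (a full llama3 chat template also adds header tags); all names here are hypothetical:

```python
# Minimal sketch: "llama3" wraps the prompt in the tags from this PR's
# examples; "none" leaves the prompt untouched.
TEMPLATES = {
    "llama3": "<|begin_of_text|>{prompt}<|eot_id|>",
    "none": "{prompt}",
}

def apply_chat_template(prompt: str, template: str = "llama3") -> str:
    return TEMPLATES[template].format(prompt=prompt)

# Default: llama tags applied; pass template="none" to opt out explicitly.
print(apply_chat_template("Name the capital of the United States."))
```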
I see that as an issue to add to #921
Good enough for now (though see the discussion)
Use llama chat tags in example requests. More details on this can be found [here](#934 (comment)).