
Commit

Use llama chat tags in example requests (#951)
Use llama chat tags in example requests.

More details can be found in the discussion on #934.
stbaione authored and eagarvey-amd committed Feb 13, 2025
1 parent 6621138 commit f7c0b3f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/shortfin/llm/user/llama_serving.md
````diff
@@ -262,7 +262,7 @@ Next, let's send a generation request:
 curl http://localhost:8000/generate \
   -H "Content-Type: application/json" \
   -d '{
-    "text": "Name the capital of the United States.",
+    "text": "<|begin_of_text|>Name the capital of the United States.<|eot_id|>",
     "sampling_params": {"max_completion_tokens": 50}
   }'
 ```
@@ -281,7 +281,7 @@ port = 8000 # Change if running on a different port
 generate_url = f"http://localhost:{port}/generate"
 def generation_request():
-    payload = {"text": "Name the capital of the United States.", "sampling_params": {"max_completion_tokens": 50}}
+    payload = {"text": "<|begin_of_text|>Name the capital of the United States.<|eot_id|>", "sampling_params": {"max_completion_tokens": 50}}
     try:
         resp = requests.post(generate_url, json=payload)
         resp.raise_for_status()  # Raises an HTTPError for bad responses
````
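To illustrate the change, here is a minimal sketch of building the tagged payload programmatically. The tag strings and the payload shape come from the diff above; the helper names (`tag_prompt`, `build_payload`) are hypothetical conveniences, not part of the shortfin API:

```python
# Sketch: wrap a raw prompt in the Llama chat special tokens used in the
# updated examples. <|begin_of_text|> opens the sequence and <|eot_id|>
# marks the end of the turn; both strings are taken from the diff above.
BEGIN_OF_TEXT = "<|begin_of_text|>"
EOT_ID = "<|eot_id|>"


def tag_prompt(text: str) -> str:
    """Return `text` wrapped in the chat tags shown in the examples."""
    return f"{BEGIN_OF_TEXT}{text}{EOT_ID}"


def build_payload(text: str, max_tokens: int = 50) -> dict:
    """Build the JSON body sent to the /generate endpoint."""
    return {
        "text": tag_prompt(text),
        "sampling_params": {"max_completion_tokens": max_tokens},
    }
```

A payload built this way can then be POSTed with `requests.post(generate_url, json=build_payload("Name the capital of the United States."))`, matching the updated `generation_request` example.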
