add sglang engine demo for /v1/chat/completions #58366
base: master
New file (`@@ -0,0 +1,51 @@`, shown in part):

```python
import ray
import requests
from fastapi import FastAPI
from ray import serve
from ray.serve.handle import DeploymentHandle


app = FastAPI()


@serve.deployment
@serve.ingress(app)
class MyFastAPIDeployment:
    def __init__(self, engine_handle: DeploymentHandle):
        self._engine_handle = engine_handle

    @app.get("/v1/chat/completions/{in_prompt}")  # serve as OPENAI /v1/chat/completions endpoint
    async def root(self, in_prompt: str):  # make endpoint async
        # Asynchronous call to a method of that deployment (executes remotely)
        res = await self._engine_handle.chat.remote(in_prompt)
        return "useing Llama-3.1-8B-Instruct for your !", in_prompt, res
```

Suggested change:

```python
    @app.get("/v1/chat/completions/{in_prompt}")  # serve as an OpenAI-like /v1/chat/completions endpoint
    async def chat_completions(self, in_prompt: str):
        res = await self._engine_handle.chat.remote(in_prompt)
        return {"model": "Llama-3.1-8B-Instruct", "prompt": in_prompt, "response": res}
```
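The suggested dict return also changes the shape of the JSON the endpoint emits. A minimal sketch of the difference (plain Python, no Serve or FastAPI needed; the prompt/response values are hypothetical stand-ins):

```python
import json

# Hypothetical example values standing in for a real prompt/response pair.
prompt, res = "hello", "hi there"

# The original handler returns a tuple, which is serialized as a
# positional JSON array -- clients must rely on field order.
tuple_body = ("useing Llama-3.1-8B-Instruct for your !", prompt, res)

# The suggested handler returns a dict, which becomes a self-describing
# JSON object with named fields.
dict_body = {"model": "Llama-3.1-8B-Instruct", "prompt": prompt, "response": res}

print(json.dumps(tuple_body))  # JSON array
print(json.dumps(dict_body))   # JSON object keyed by "model"/"prompt"/"response"
```

A client can then read `body["response"]` by name instead of unpacking positions.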
This block has a couple of issues:

- Hardcoded path: The `model_path` is hardcoded. This makes the demo not portable and it will fail on other machines. It's better to use an environment variable. You'll need to add `import os` at the top of the file.
- Style: The dictionary initialization for `engine_kwargs` has some style issues. According to PEP 8, there should be no spaces around the equals sign for keyword arguments. Also, when using the `dict()` constructor, keys are identifiers and should not be quoted as strings.

Current:

```python
self.engine_kwargs = dict(
    model_path = "/scratch2/huggingface/hub/meta-llama/Llama-3.1-8B-Instruct/",
    mem_fraction_static = 0.8,
    tp_size = 8,
)
```

Suggested change:

```python
self.engine_kwargs = dict(
    model_path=os.environ.get("MODEL_PATH", "/scratch2/huggingface/hub/meta-llama/Llama-3.1-8B-Instruct/"),
    mem_fraction_static=0.8,
    tp_size=8,
)
```
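The `os.environ.get` fallback pattern from the suggestion can be checked in isolation. A minimal sketch (the `MODEL_PATH` name and default path are just the values from the suggested change):

```python
import os

# Fall back to the demo's default path when MODEL_PATH is not set.
DEFAULT_PATH = "/scratch2/huggingface/hub/meta-llama/Llama-3.1-8B-Instruct/"
model_path = os.environ.get("MODEL_PATH", DEFAULT_PATH)
print(model_path)

# When the variable is set, it takes precedence over the default.
os.environ["MODEL_PATH"] = "/tmp/my-model"
print(os.environ.get("MODEL_PATH", DEFAULT_PATH))  # → /tmp/my-model
```

This keeps the demo runnable on machines where the default path does not exist, without editing the source.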
Using `print` for logging is generally discouraged in library code. It's better to use the `logging` module, which allows for configurable log levels, formatting, and output streams. You can add `import logging` and `logger = logging.getLogger(__name__)` at the top of the file.

Current:

```python
print('In SGLangServer CHAT with message', message)
```

Suggested change:

```python
logger.info(f'In SGLangServer CHAT with message: {message}')
```
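A self-contained sketch of the `logging` setup the comment describes (the string-buffer handler and format are illustrative, not part of the demo). Note that passing the message as a `%s` argument instead of an f-string defers formatting until the record is actually emitted:

```python
import io
import logging

# Module-level logger, as suggested in the review comment.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Capture output in a string buffer so this sketch is self-contained.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

message = "hello"
# %-style lazy formatting: the string is only built if INFO is enabled.
logger.info("In SGLangServer CHAT with message: %s", message)

print(buffer.getvalue().strip())
# → INFO In SGLangServer CHAT with message: hello
```

In the actual deployment you would drop the `StringIO` handler and let Serve's default logging configuration route the records.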
This can be ignored.
I think the change to logger is reasonable
There are a few style issues here that go against PEP 8 guidelines:

- Variable names `sglangServer` and `my_App` should be `snake_case` (e.g., `sglang_server`, `my_app`).
- There should not be a space around the equals sign in `blocking = True`.

Current:

```python
sglangServer = SGLangServer.bind()
my_App = MyFastAPIDeployment.bind(sglangServer)
handle: DeploymentHandle = serve.run(my_App, blocking = True)
```

Suggested change:

```python
sglang_server = SGLangServer.bind()
my_app = MyFastAPIDeployment.bind(sglang_server)
handle: DeploymentHandle = serve.run(my_app, blocking=True)
```
The `requests` module is imported but never used. It should be removed to keep the code clean.