Dlinfer readme #3938 (Merged)
16 commits, all by jinminxi104:

- 1e70ab7 Revise Ascend guide
- 2c8fba9 update ascend readme
- 89c75b2 Revise Huawei Ascend guide
- cd14a39 update dlinfer-related docs
- 165ffc7 update dlinfer-related docs
- a7c6090 lint
- 664b2b7 Update Docker pull commands for Ascend models
- cec33a0 Update Docker pull commands for Ascend models
- b1a3bc0 Update device type from 'ascend' to 'camb'
- 0c90cd2 Update Docker commands for Ascend model services
- c81e205 Change Docker image tags to 'a2-latest'
- 0514653 Update introduction in Get Started guide for Ascend
- 3d28648 Clarify supported models for Huawei Ascend
- 648906a Update get_started.md with multi-card inference info
- 55bc0cd Update get_started.md with Ray startup instructions
- 89f381b lit
# Cambricon

The usage of lmdeploy on a Cambricon device is almost the same as on CUDA, via lmdeploy's PytorchEngine backend.
Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

Here is the [supported model list](../../supported_models/supported_models.md#PyTorchEngine-on-Other-Platforms).

> \[!IMPORTANT\]
> We have uploaded a docker image to Aliyun.
> Please pull the image with the following command:
>
> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest`

> \[!IMPORTANT\]
> Currently, launching multi-device inference on Cambricon accelerators requires starting Ray manually.
>
> Below is an example for a two-device setup:
>
> ```shell
> export MLU_VISIBLE_DEVICES=0,1
> ray start --head --resources='{"MLU": 2}'
> ```

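After Ray is up, a tensor-parallel pipeline can be created as usual. A minimal sketch, assuming the two MLUs exported above and the same `pipeline` API as in the single-device examples below:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# tp=2 spreads the model across the two MLUs registered with Ray above.
pipe = pipeline("internlm/internlm2_5-7b-chat",
                backend_config=PytorchEngineConfig(tp=2, device_type="camb"))
print(pipe(["Shanghai is"]))
```
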
## Offline batch inference

### LLM inference

Set `device_type="camb"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
                backend_config=PytorchEngineConfig(tp=1, device_type="camb"))
questions = ["Shanghai is", "Please introduce China", "How are you?"]
response = pipe(questions)
print(response)
```

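Generation behavior can be tuned the same way as on CUDA. A hedged sketch, assuming lmdeploy's standard `GenerationConfig` applies unchanged on the `camb` backend:

```python
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
                backend_config=PytorchEngineConfig(tp=1, device_type="camb"))
# Sampling parameters are device-agnostic; only device_type differs from CUDA.
gen_config = GenerationConfig(max_new_tokens=256, top_p=0.8, temperature=0.7)
response = pipe(["Please introduce China"], gen_config=gen_config)
print(response)
```
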
### VLM inference

Set `device_type="camb"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='camb'))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

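The pipeline also accepts a batch of (prompt, image) pairs, as it does on CUDA. A minimal sketch under that assumption:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='camb'))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
# Each element is a (prompt, image) tuple; responses come back in order.
prompts = [('describe this image', image), ('what animal is this?', image)]
for r in pipe(prompts):
    print(r.text)
```
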
## Online serving

### Serve an LLM model

Add `--device camb` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device camb internlm/internlm2_5-7b-chat
```

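Once the server is up, it exposes lmdeploy's OpenAI-compatible API (port 23333 by default). A hedged client sketch, assuming the server runs on the same host; the model name should match what `GET /v1/models` reports:

```python
from openai import OpenAI

# 23333 is the default api_server port; adjust if --server-port was passed.
client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='none')
resp = client.chat.completions.create(
    model='internlm/internlm2_5-7b-chat',
    messages=[{'role': 'user', 'content': 'Please introduce China'}])
print(resp.choices[0].message.content)
```
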
Run the following command to launch a docker container for lmdeploy LLM serving:

```bash
docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \
    bash -i -c "lmdeploy serve api_server --backend pytorch --device camb internlm/internlm2_5-7b-chat"
```

### Serve a VLM model

Add `--device camb` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device camb OpenGVLab/InternVL2-2B
```

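The VLM server speaks the same OpenAI-compatible protocol, with images passed as `image_url` content parts. A hedged example request, assuming the default port 23333:

```bash
curl http://0.0.0.0:23333/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "OpenGVLab/InternVL2-2B",
      "messages": [{
        "role": "user",
        "content": [
          {"type": "text", "text": "describe this image"},
          {"type": "image_url",
           "image_url": {"url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}}
        ]
      }]
    }'
```
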
Run the following command to launch a docker container for lmdeploy VLM serving:

```bash
docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \
    bash -i -c "lmdeploy serve api_server --backend pytorch --device camb OpenGVLab/InternVL2-2B"
```

## Inference with Command Line Interface

Add `--device camb` to the chat command.

```bash
lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device camb
```

Run the following command to start an lmdeploy chat session after launching the container:

```bash
docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \
    bash -i -c "lmdeploy chat --backend pytorch --device camb internlm/internlm2_5-7b-chat"
```
# MetaX-tech

The usage of lmdeploy on a MetaX-tech device is almost the same as on CUDA, via lmdeploy's PytorchEngine backend.
Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

Here is the [supported model list](../../supported_models/supported_models.md#PyTorchEngine-on-Other-Platforms).

> \[!IMPORTANT\]
> We have uploaded a docker image to Aliyun.
> Please pull the image with the following command:
>
> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest`

## Offline batch inference

### LLM inference

Set `device_type="maca"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
                backend_config=PytorchEngineConfig(tp=1, device_type="maca"))
questions = ["Shanghai is", "Please introduce China", "How are you?"]
response = pipe(questions)
print(response)
```

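Each element of the returned list corresponds to one prompt. A hedged sketch of reading out the generated text, assuming the response objects carry a `text` field as they do on CUDA:

```python
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
                backend_config=PytorchEngineConfig(tp=1, device_type="maca"))
questions = ["Shanghai is", "Please introduce China", "How are you?"]
# Responses come back in prompt order; .text holds the generated string.
for q, r in zip(questions, pipe(questions)):
    print(f"{q} -> {r.text}")
```
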
### VLM inference

Set `device_type="maca"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='maca'))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

## Online serving

### Serve an LLM model

Add `--device maca` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device maca internlm/internlm2_5-7b-chat
```

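lmdeploy also ships a terminal client for a running api_server. A sketch, assuming the default port:

```bash
# Interactive terminal client for the server started above
# (23333 is the default api_server port).
lmdeploy serve api_client http://0.0.0.0:23333
```
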
Run the following command to launch a docker container for lmdeploy LLM serving:

```bash
docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \
    bash -i -c "lmdeploy serve api_server --backend pytorch --device maca internlm/internlm2_5-7b-chat"
```

### Serve a VLM model

Add `--device maca` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device maca OpenGVLab/InternVL2-2B
```

Run the following command to launch a docker container for lmdeploy VLM serving:

```bash
docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \
    bash -i -c "lmdeploy serve api_server --backend pytorch --device maca OpenGVLab/InternVL2-2B"
```

## Inference with Command Line Interface

Add `--device maca` to the chat command.

```bash
lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device maca
```

Run the following command to start an lmdeploy chat session after launching the container:

```bash
docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \
    bash -i -c "lmdeploy chat --backend pytorch --device maca internlm/internlm2_5-7b-chat"
```