6 changes: 6 additions & 0 deletions docs/en/get_started/index.rst
@@ -6,3 +6,9 @@ On Other Platforms
:caption: NPU(Huawei)

ascend/get_started.md

.. toctree::
:maxdepth: 1
:caption: PPU

ppu/get_started.md
74 changes: 74 additions & 0 deletions docs/en/get_started/ppu/get_started.md
@@ -0,0 +1,74 @@
# Get Started with PPU

Using LMDeploy on a PPU device is almost the same as using it on CUDA with the PytorchEngine backend.
Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

## Installation

Collaborator: But this link describes installation for the Ascend platform.
cc @jinminxi104

Please refer to [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).

Collaborator: Don't we need to set up the device software toolkit?

## Offline batch inference

> \[!TIP\]
> Graph mode is supported on PPU.
> Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable graph mode.

### LLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=False))
question = ['Hi, pls intro yourself', 'Shanghai is']
response = pipe(question)
print(response)
```
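The pipeline returns one response object per prompt. If you only need the generated text, a minimal sketch (assuming the response objects expose a `text` attribute, as they do with the CUDA backend) is:

```python
# Print only the generated text for each prompt (assumes a `text` attribute on the responses)
for q, r in zip(question, response):
    print(f"{q} -> {r.text}")
```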

### VLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=False))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
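Batched VLM inference works the same way as on CUDA; a minimal sketch, assuming the pipeline accepts a list of `(prompt, image)` tuples as it does with the CUDA backend:

```python
# Ask several questions about the same image in one batched call
prompts = ['describe this image', 'what animal is shown in the picture?']
responses = pipe([(p, image) for p in prompts])
print(responses)
```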

## Online serving

> \[!TIP\]
> Graph mode is supported on PPU.
> Graph mode is enabled by default in online serving. Users can add `--eager-mode` to disable graph mode.

### Serve an LLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
```
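Once the server is up, it exposes an OpenAI-compatible API. A minimal client sketch, assuming the default host/port (`0.0.0.0:23333`) and the `openai` Python package installed:

```python
# Query the OpenAI-compatible endpoint served by `lmdeploy serve api_server`
from openai import OpenAI

client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id  # the served model
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hi, please introduce yourself"}],
    temperature=0.8,
)
print(response.choices[0].message.content)
```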

### Serve a VLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
```
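For the VLM server, an image can be passed by URL using the OpenAI vision message format; a minimal sketch under the same default host/port assumption:

```python
# Send a text + image_url message to the VLM server (OpenAI vision-style payload)
from openai import OpenAI

client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this image"},
            {"type": "image_url",
             "image_url": {"url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```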

## Inference with the command line interface

Add `--device ppu` to the chat command.

```bash
lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
```
20 changes: 20 additions & 0 deletions docs/en/supported_models/supported_models.md
@@ -141,3 +141,23 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - |
| GLM4V | 9B | MLLM | Yes | No | - | - | - |

## PyTorchEngine on PPU

| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) |
| :------------: | :-------: | :--: | :--------------: | :--------------: |
| Llama2 | 7B - 70B | LLM | Yes | Yes |
| Llama3 | 8B | LLM | Yes | Yes |
| Llama3.1 | 8B | LLM | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes |
| InternLM2.5 | 7B - 20B | LLM | Yes | Yes |
| InternLM3 | 8B | LLM | Yes | Yes |
| Mixtral | 8x7B | LLM | Yes | Yes |
| QWen1.5-MoE | A2.7B | LLM | Yes | Yes |
| QWen2(.5) | 7B | LLM | Yes | Yes |
| QWen2-MoE | A14.57B | LLM | Yes | Yes |
| QWen3 | 0.6B-235B | LLM | Yes | Yes |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes |
| InternVL2.5 | 1B-78B | MLLM | Yes | Yes |
| InternVL3 | 1B-78B | MLLM | Yes | Yes |
2 changes: 1 addition & 1 deletion docs/zh_cn/get_started/ascend/get_started.md
@@ -1,6 +1,6 @@
# Huawei Ascend (Atlas 800T A2 & Atlas 300I Duo)

We have added support for Huawei Ascend devices on top of LMDeploy's PytorchEngine, so using LDMeploy on Huawei Ascend is almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.
We have added support for Huawei Ascend devices on top of LMDeploy's PytorchEngine, so using LMDeploy on Huawei Ascend is almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

The list of supported models is available [here](../../supported_models/supported_models.md#PyTorchEngine-华为昇腾平台).

6 changes: 6 additions & 0 deletions docs/zh_cn/get_started/index.rst
@@ -6,3 +6,9 @@
:caption: NPU(Huawei)

ascend/get_started.md

.. toctree::
:maxdepth: 1
:caption: PPU

ppu/get_started.md
72 changes: 72 additions & 0 deletions docs/zh_cn/get_started/ppu/get_started.md
@@ -0,0 +1,72 @@
# Get Started on Alibaba T-Head (PPU)

We have added support for Alibaba T-Head devices on top of LMDeploy's PytorchEngine, so using LMDeploy on T-Head is almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

## Installation

For installation, please refer to the [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).

## Offline batch inference

> \[!TIP\]
> Graph mode is supported on PPU.
> Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable graph mode.

### LLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`.

```python
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
question = ["Shanghai is", "Please introduce China", "How are you?"]
response = pipe(question)
print(response)
```

### VLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`.

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

## Online serving

> \[!TIP\]
> Graph mode is supported on PPU.
> Graph mode is enabled by default in online serving. Users can add `--eager-mode` to disable graph mode.

### Serve an LLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
```

### Serve a VLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
```

## Inference with the command line interface

Add `--device ppu` to the chat command.

```bash
lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
```
20 changes: 20 additions & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -141,3 +141,23 @@
| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - |
| GLM4V | 9B | MLLM | Yes | No | - | - | - |

## PyTorchEngine on the Alibaba T-Head (PPU) Platform

| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) |
| :------------: | :-------: | :--: | :--------------: | :--------------: |
| Llama2 | 7B - 70B | LLM | Yes | Yes |
| Llama3 | 8B | LLM | Yes | Yes |
| Llama3.1 | 8B | LLM | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes |
| InternLM2.5 | 7B - 20B | LLM | Yes | Yes |
| InternLM3 | 8B | LLM | Yes | Yes |
| Mixtral | 8x7B | LLM | Yes | Yes |
| QWen1.5-MoE | A2.7B | LLM | Yes | Yes |
| QWen2(.5) | 7B | LLM | Yes | Yes |
| QWen2-MoE | A14.57B | LLM | Yes | Yes |
| QWen3 | 0.6B-235B | LLM | Yes | Yes |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes |
| InternVL2.5 | 1B-78B | MLLM | Yes | Yes |
| InternVL3 | 1B-78B | MLLM | Yes | Yes |