6 changes: 6 additions & 0 deletions docs/en/get_started/index.rst
@@ -6,3 +6,9 @@ On Other Platforms
:caption: NPU(Huawei)

ascend/get_started.md

.. toctree::
:maxdepth: 1
:caption: PPU

ppu/get_started.md
74 changes: 74 additions & 0 deletions docs/en/get_started/ppu/get_started.md
@@ -0,0 +1,74 @@
# Get Started with PPU

Using LMDeploy on a PPU device is almost the same as using it on CUDA with the PytorchEngine backend.
Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

## Installation

Collaborator: But this link describes installation for the Ascend platform.
cc @jinminxi104

Please refer to [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).

Collaborator: Don't we need to set up the device software toolkit?

## Offline batch inference

> \[!TIP\]
> Graph mode is supported on PPU.
> Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable graph mode.

### LLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=False))
question = ['Hi, pls intro yourself', 'Shanghai is']
response = pipe(question)
print(response)
```
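The pipeline returns one response object per prompt. If you only need the generated text, a minimal sketch (assuming the response objects expose a `text` attribute, as they do with the CUDA backend) is:

```python
# Print only the generated text for each prompt (assumes a `text` attribute on the responses)
for q, r in zip(question, response):
    print(f"{q} -> {r.text}")
```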

### VLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=False))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
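Batched VLM inference works the same way as on CUDA; a minimal sketch, assuming the pipeline accepts a list of `(prompt, image)` tuples as it does with the CUDA backend:

```python
# Ask several questions about the same image in one batched call
prompts = ['describe this image', 'what animal is shown in the picture?']
responses = pipe([(p, image) for p in prompts])
print(responses)
```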

## Online serving

> \[!TIP\]
> Graph mode is supported on PPU.
> Graph mode is enabled by default in online serving. Users can add `--eager-mode` to disable graph mode.

### Serve an LLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
```
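Once the server is up, it exposes an OpenAI-compatible API. A minimal client sketch, assuming the default host/port (`0.0.0.0:23333`) and the `openai` Python package installed:

```python
# Query the OpenAI-compatible endpoint served by `lmdeploy serve api_server`
from openai import OpenAI

client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id  # the served model
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hi, please introduce yourself"}],
    temperature=0.8,
)
print(response.choices[0].message.content)
```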

### Serve a VLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
```
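For the VLM server, an image can be passed by URL using the OpenAI vision message format; a minimal sketch under the same default host/port assumption:

```python
# Send a text + image_url message to the VLM server (OpenAI vision-style payload)
from openai import OpenAI

client = OpenAI(api_key="none", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this image"},
            {"type": "image_url",
             "image_url": {"url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```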

## Inference with the command line interface

Add `--device ppu` to the chat command.

```bash
lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
```
20 changes: 20 additions & 0 deletions docs/en/supported_models/supported_models.md
@@ -141,3 +141,23 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - |
| GLM4V | 9B | MLLM | Yes | No | - | - | - |

## PyTorchEngine on PPU

| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) |
| :------------: | :-------: | :--: | :--------------: | :--------------: |
| Llama2 | 7B - 70B | LLM | Yes | Yes |
| Llama3 | 8B | LLM | Yes | Yes |
| Llama3.1 | 8B | LLM | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes |
| InternLM2.5 | 7B - 20B | LLM | Yes | Yes |
| InternLM3 | 8B | LLM | Yes | Yes |
| Mixtral | 8x7B | LLM | Yes | Yes |
| QWen1.5-MoE | A2.7B | LLM | Yes | Yes |
| QWen2(.5) | 7B | LLM | Yes | Yes |
| QWen2-MoE | A14.57B | LLM | Yes | Yes |
| QWen3 | 0.6B-235B | LLM | Yes | Yes |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes |
| InternVL2.5 | 1B-78B | MLLM | Yes | Yes |
| InternVL3 | 1B-78B | MLLM | Yes | Yes |
2 changes: 1 addition & 1 deletion docs/zh_cn/get_started/ascend/get_started.md
@@ -1,6 +1,6 @@
# Huawei Ascend (Atlas 800T A2 & Atlas 300I Duo)

We have added support for Huawei Ascend devices on top of LMDeploy's PytorchEngine, so using LDMeploy on Huawei Ascend is almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.
We have added support for Huawei Ascend devices on top of LMDeploy's PytorchEngine, so using LMDeploy on Huawei Ascend is almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

The list of supported models is available [here](../../supported_models/supported_models.md#PyTorchEngine-华为昇腾平台).

6 changes: 6 additions & 0 deletions docs/zh_cn/get_started/index.rst
@@ -6,3 +6,9 @@
:caption: NPU(Huawei)

ascend/get_started.md

.. toctree::
:maxdepth: 1
:caption: PPU

ppu/get_started.md
72 changes: 72 additions & 0 deletions docs/zh_cn/get_started/ppu/get_started.md
@@ -0,0 +1,72 @@
# Get Started on Alibaba T-Head (PPU)

We have added support for Alibaba T-Head devices on top of LMDeploy's PytorchEngine, so using LMDeploy on T-Head is almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

## Installation

For installation, please refer to the [dlinfer installation guide](https://github.com/DeepLink-org/dlinfer#%E5%AE%89%E8%A3%85%E6%96%B9%E6%B3%95).

## Offline batch inference

> \[!TIP\]
> Graph mode is supported on PPU.
> Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable graph mode.

### LLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`.

```python
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

pipe = pipeline("internlm/internlm2_5-7b-chat",
backend_config=PytorchEngineConfig(tp=1, device_type="ppu", eager_mode=True))
question = ["Shanghai is", "Please introduce China", "How are you?"]
response = pipe(question)
print(response)
```

### VLM inference

Set `device_type="ppu"` in the `PytorchEngineConfig`.

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-2B',
                backend_config=PytorchEngineConfig(tp=1, device_type='ppu', eager_mode=True))
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

## Online serving

> \[!TIP\]
> Graph mode is supported on PPU.
> Graph mode is enabled by default in online serving. Users can add `--eager-mode` to disable graph mode.

### Serve an LLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode internlm/internlm2_5-7b-chat
```

### Serve a VLM model

Add `--device ppu` to the serve command.

```bash
lmdeploy serve api_server --backend pytorch --device ppu --eager-mode OpenGVLab/InternVL2-2B
```

## Inference with the command line interface

Add `--device ppu` to the chat command.

```bash
lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ppu --eager-mode
```
20 changes: 20 additions & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -141,3 +141,23 @@
| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes |
| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - |
| GLM4V | 9B | MLLM | Yes | No | - | - | - |

## PyTorchEngine on the Alibaba T-Head (PPU) Platform

| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) |
| :------------: | :-------: | :--: | :--------------: | :--------------: |
| Llama2 | 7B - 70B | LLM | Yes | Yes |
| Llama3 | 8B | LLM | Yes | Yes |
| Llama3.1 | 8B | LLM | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes |
| InternLM2.5 | 7B - 20B | LLM | Yes | Yes |
| InternLM3 | 8B | LLM | Yes | Yes |
| Mixtral | 8x7B | LLM | Yes | Yes |
| QWen1.5-MoE | A2.7B | LLM | Yes | Yes |
| QWen2(.5) | 7B | LLM | Yes | Yes |
| QWen2-MoE | A14.57B | LLM | Yes | Yes |
| QWen3 | 0.6B-235B | LLM | Yes | Yes |
| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes |
| InternVL2 | 1B-40B | MLLM | Yes | Yes |
| InternVL2.5 | 1B-78B | MLLM | Yes | Yes |
| InternVL3 | 1B-78B | MLLM | Yes | Yes |