Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) of text generation.

* Generate Markdown output for [GALACTICA](https://github.com/paperswithcode/galai), including LaTeX support.
* Support for [Pygmalion](https://huggingface.co/models?search=pygmalionai/pygmalion) and custom characters in JSON or TavernAI Character Card formats ([FAQ](https://github.com/oobabooga/text-generation-webui/wiki/Pygmalion-chat-model-FAQ)).
* Advanced chat features (send images, get audio responses with TTS).
* Stream the text output in real time very efficiently.
* Load parameter presets from text files.
* Load large models in 8-bit mode.
* Split large models across your GPU(s), CPU, and disk.
* Get responses via API, [with](https://github.com/oobabooga/text-generation-webui/blob/main/api-example-streaming.py) or [without](https://github.com/oobabooga/text-generation-webui/blob/main/api-example.py) streaming.
* [LLaMA model, including 4-bit mode](https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model).
* [RWKV model](https://github.com/oobabooga/text-generation-webui/wiki/RWKV-model).
* [Works on Google Colab](https://github.com/oobabooga/text-generation-webui/wiki/Running-on-Colab).

## Installation

The recommended installation methods are the following:

* Linux and macOS: using conda natively.
* Windows: using conda on WSL ([WSL installation guide](https://github.com/oobabooga/text-generation-webui/wiki/Windows-Subsystem-for-Linux-(Ubuntu)-Installation-Guide)).

Conda can be downloaded here: https://docs.conda.io/en/latest/miniconda.html

On Linux or WSL, it can be automatically installed with these two commands:
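
A minimal sketch of what those two commands might look like, assuming the official Miniconda installer for Linux x86_64 (verify the URL and checksum on the Miniconda page above before running):

```bash
# Download the Miniconda installer and run it interactively.
curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
bash Miniconda3.sh
```
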
> 1. If you are on Windows, it may be easier to run the commands above in a WSL environment. The performance may also be better.
> 2. For a more detailed, user-contributed guide, see: [Installation instructions for human beings](https://github.com/oobabooga/text-generation-webui/wiki/Installation-instructions-for-human-beings).
>
> For bitsandbytes and `--load-in-8bit` to work on Linux/WSL, this dirty fix is currently necessary: https://github.com/oobabooga/text-generation-webui/issues/400#issuecomment-1474876859

### Alternative: native Windows installation

As an alternative to the recommended WSL method, you can install the web UI natively on Windows by following this guide: [Installation instructions for human beings](https://github.com/oobabooga/text-generation-webui/wiki/Installation-instructions-for-human-beings). It will be a lot harder, and the performance may be slower.

This method lags behind the newest developments and does not support 8-bit mode on Windows without additional setup: https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1456040134, https://github.com/oobabooga/text-generation-webui/issues/20#issuecomment-1411650652

## Downloading models

Models should be placed inside the `models` folder.

[Hugging Face](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads) is the main place to download models.

If you want to download a model manually, note that all you need are the json, txt, and pytorch\*.bin (or model\*.safetensors) files. The remaining files are not necessary.
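
For instance, a manual download with Git and Git LFS might look like the following sketch (using [GPT-J 6B](https://huggingface.co/EleutherAI/gpt-j-6B) purely as an illustrative example; note that a plain clone fetches every file in the repository, not just the required ones):

```bash
# Make sure Git LFS is set up so the large weight files are actually downloaded.
git lfs install
# Clone the model repository from Hugging Face directly into the models folder.
git clone https://huggingface.co/EleutherAI/gpt-j-6B models/gpt-j-6B
```
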

### GPT-4chan

[GPT-4chan](https://huggingface.co/ykilcher/gpt-4chan) has been removed from Hugging Face, so you need to download it elsewhere. You have two options:

Optionally, you can use the following command-line flags when launching the web UI with `python server.py`:

| Flag | Description |
|------------------|-------------|
|`-h`, `--help`| show this help message and exit |
|`--model MODEL`| Name of the model to load by default. |
|`--lora LORA`| Name of the LoRA to apply to the model by default. |
|`--notebook`| Launch the web UI in notebook mode, where the output is written to the same text box as the input. |
|`--chat`| Launch the web UI in chat mode.|
|`--cai-chat`| Launch the web UI in chat mode with a style similar to Character.AI's. If the file `img_bot.png` or `img_bot.jpg` exists in the same folder as server.py, this image will be used as the bot's profile picture. Similarly, `img_me.png` or `img_me.jpg` will be used as your profile picture. |
|`--cpu`| Use the CPU to generate text.|
|`--load-in-8bit`| Load the model with 8-bit precision.|
|`--load-in-4bit`| DEPRECATED: use `--gptq-bits 4` instead. |
|`--gptq-bits GPTQ_BITS`| Load a pre-quantized model with the specified precision. 2, 3, 4 and 8 bits are supported. Currently only works with LLaMA and OPT. |
|`--gptq-model-type MODEL_TYPE`| Model type of the pre-quantized model. Currently only LLaMA and OPT are supported. |
|`--bf16`| Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU. |
|`--auto-devices`| Automatically split the model across the available GPU(s) and CPU.|
|`--disk`| If the model is too large for your GPU(s) and CPU combined, send the remaining layers to the disk. |
|`--disk-cache-dir DISK_CACHE_DIR`| Directory to save the disk cache to. Defaults to `cache/`. |
|`--gpu-memory GPU_MEMORY [GPU_MEMORY ...]`| Maximum GPU memory in GiB to be allocated per GPU. Example: `--gpu-memory 10` for a single GPU, `--gpu-memory 10 5` for two GPUs. |
|`--cpu-memory CPU_MEMORY`| Maximum CPU memory in GiB to allocate for offloaded weights. Must be an integer number. Defaults to 99.|
|`--flexgen`| Enable the use of FlexGen offloading. |
|`--percent PERCENT [PERCENT ...]`| FlexGen: allocation percentages. Must be 6 numbers separated by spaces (default: 0, 100, 100, 0, 100, 0). |
|`--compress-weight`| FlexGen: whether to compress weights (default: False).|
|`--pin-weight [PIN_WEIGHT]`| FlexGen: whether to pin weights (setting this to False reduces CPU memory by 20%). |
|`--deepspeed`| Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration. |
|`--nvme-offload-dir NVME_OFFLOAD_DIR`| DeepSpeed: Directory to use for ZeRO-3 NVME offloading. |
|`--local_rank LOCAL_RANK`| DeepSpeed: Optional argument for distributed setups. |
|`--rwkv-strategy RWKV_STRATEGY`| RWKV: The strategy to use while loading the model. Examples: "cpu fp32", "cuda fp16", "cuda fp16i8". |
|`--rwkv-cuda-on`| RWKV: Compile the CUDA kernel for better performance. |
|`--no-stream`| Don't stream the text output in real time. |
|`--settings SETTINGS_FILE`| Load the default interface settings from this json file. See `settings-template.json` for an example. If you create a file called `settings.json`, this file will be loaded by default without the need to use the `--settings` flag.|
|`--extensions EXTENSIONS [EXTENSIONS ...]`| The list of extensions to load. If you want to load more than one extension, write the names separated by spaces. |
|`--listen`| Make the web UI reachable from your local network.|
|`--listen-port LISTEN_PORT`| The listening port that the server will use. |
|`--share`| Create a public URL. This is useful for running the web UI on Google Colab or similar. |
|`--auto-launch`| Open the web UI in the default browser upon launch. |
|`--verbose`| Print the prompts to the terminal. |
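
For example, several of these flags can be combined in a single launch command. The values below are purely illustrative and should be adjusted for your hardware:

```bash
# Launch in chat mode, split the model automatically across the GPU(s) and CPU,
# and cap GPU memory usage at roughly 10 GiB.
python server.py --chat --auto-devices --gpu-memory 10
```
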
Out of memory errors? [Check this guide](https://github.com/oobabooga/text-generation-webui/wiki/Low-VRAM-guide).
## Credits

- Gradio dropdown menu refresh button, code for reloading the interface: https://github.com/AUTOMATIC1111/stable-diffusion-webui
- Verbose preset: Anonymous 4chan user.
- NovelAI and KoboldAI presets: https://github.com/KoboldAI/KoboldAI-Client/wiki/Settings-Presets
- Pygmalion preset, code for early stopping in chat mode, code for some of the sliders, --chat mode colors: https://github.com/PygmalionAI/gradio-ui/