@gafda gafda commented Aug 26, 2025

This pull request introduces significant updates to the Docker setup for Stable Diffusion web UI services, with a focus on improving hardware compatibility, updating dependencies, and refactoring service definitions. The changes include upgrading base images and software versions, adding support for AMD ROCm devices, and restructuring the docker-compose.yml file to better organize service configurations for CUDA, ROCm, and CPU environments.

Hardware compatibility and service configuration:

  • Added ROCm (AMD GPU) support with a new auto-rocm service and corresponding Dockerfile-rocm, including device, environment, and build settings for ROCm hardware.
  • Refactored docker-compose.yml to split service definitions by hardware type (CUDA, ROCm, CPU) and grouped CLI arguments for each, improving maintainability and clarity.

Dependency and version updates:

  • Updated base images and Python dependencies for all services, including the upgrade of PyTorch to 2.5.1 (CUDA) and 2.6.0 (ROCm), and bumping the Stable Diffusion web UI to version 1.10.1.
  • Upgraded the Alpine Git and Bash images used for downloads and utility containers to newer versions for improved security and compatibility.

General improvements and bug fixes:

  • Improved robustness of Gradio patching by searching for routes.py files dynamically and applying changes, with added error handling.
  • Updated image tags and build contexts to reflect new versions and configurations for each service profile.
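
To illustrate the Gradio patching item above, here is a minimal Python sketch of locating routes.py files dynamically and patching them with error handling. It is not the code from this pull request: the function name `patch_routes_files`, the search root, and the pattern and replacement strings are all hypothetical, and the actual change applied to Gradio is not shown in this conversation.

```python
import re
from pathlib import Path


def patch_routes_files(root, pattern, replacement):
    """Patch every routes.py found under root; return the paths that changed.

    Hypothetical helper: pattern and replacement are supplied by the caller,
    because the real patch applied in this PR is not shown in the thread.
    """
    patched = []
    # Search recursively instead of hard-coding one site-packages path.
    for path in sorted(Path(root).rglob("routes.py")):
        try:
            text = path.read_text(encoding="utf-8")
            new_text, count = re.subn(pattern, replacement, text)
            if count:
                path.write_text(new_text, encoding="utf-8")
                patched.append(path)
        except (OSError, UnicodeDecodeError) as exc:
            # Skip unreadable or unwritable files instead of aborting the build.
            print(f"warning: could not patch {path}: {exc}")
    if not patched:
        print(f"warning: no routes.py under {root} matched {pattern!r}")
    return patched
```

Searching recursively keeps the patch step working when the Python version or install prefix in the base image changes, which is presumably the motivation for the dynamic lookup.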

…grades

Refactor Dockerfile and docker-compose for ROCm support and improved build context
Refactor docker-compose.yml for improved service configuration and organization
group_add:
- video
deploy:
# resources:
I had to uncomment this to run the profile.

Author

Good catch!

@pcting

pcting commented Sep 11, 2025

I'm getting this error on my machine:

$ docker compose --profile auto-rocm up --build -d 
...
[+] Running 1/2
 ✔ sd-auto-rocm:79                     Built                                                                              0.0s 
 ⠙ Container webui-docker-auto-rocm-1  Starting                                                                           0.1s 
DEBU[0000] otel error                                    error="<nil>"
DEBU[0000] otel error                                    error="<nil>"
Error response from daemon: could not select device driver "amd" with capabilities: [[gpu]]

any thoughts?

@gafda
Author

gafda commented Sep 12, 2025

That is a weird error. I'll try to look into it.

Meanwhile, did you update the env vars to match your system?

    - ROCm_VERSION=6.4
    - HIP_VISIBLE_DEVICES=0
    - HSA_OVERRIDE_GFX_VERSION=11.0.0
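
As a quick sanity check for the settings above, the following Python sketch reports which of these variables are unset and whether the usual ROCm device nodes are visible. The helper names are hypothetical, and the assumption that the container needs `/dev/kfd` and `/dev/dri` comes from common ROCm container setups, not from this PR's compose file. It does not diagnose the "could not select device driver" error itself.

```python
import os
from pathlib import Path

# Variable names as quoted in this thread; the right values depend on the host GPU.
ROCM_ENV_VARS = ("ROCm_VERSION", "HIP_VISIBLE_DEVICES", "HSA_OVERRIDE_GFX_VERSION")
# Assumption: device nodes that ROCm containers normally need mapped in from the host.
ROCM_DEVICE_NODES = ("/dev/kfd", "/dev/dri")


def missing_env_vars(env=None, names=ROCM_ENV_VARS):
    """Return the names that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def missing_device_nodes(paths=ROCM_DEVICE_NODES):
    """Return the device paths that do not exist on this system."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    for name in missing_env_vars():
        print(f"unset or empty: {name}")
    for node in missing_device_nodes():
        print(f"not found: {node}")
```

Running it on the host and again inside the container can show whether the devices or variables are being lost at the compose layer.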
