
Conversation

dimitribarbot
Contributor

What does this PR do?

Add ControlNet (InstantX/Qwen-Image-ControlNet-Union) support for Qwen-Image-Edit.

This pipeline takes two input images that are encoded into latents: one for Qwen-Image-Edit and another for Qwen-Image-ControlNet-Union, providing greater control over the result.

Inference

import torch
from diffusers import QwenImageControlNetModel, QwenImageEditControlNetPipeline
from diffusers.utils import load_image

base_model = "Qwen/Qwen-Image-Edit"
controlnet_model = "InstantX/Qwen-Image-ControlNet-Union"

# Load the union ControlNet separately; it is passed to the edit pipeline below.
controlnet = QwenImageControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)

pipe = QwenImageEditControlNetPipeline.from_pretrained(
    base_model, controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

# Image to edit.
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/living_room.png"
).convert("RGB")
# Depth map used as the ControlNet conditioning image.
control_image = load_image(
    "https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union/resolve/main/conds/depth.png"
)
prompt = (
    "Anime style of a swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. "
    "A beige couch with white and beige cushions sits on a wooden floor, with a matching coffee table in front. "
    "The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. "
    "Sunlight pours through the leaves outside, casting cool shadows on the floor."
)
image = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=" ",
    control_image=control_image,
    controlnet_conditioning_scale=1.5,
    width=control_image.size[0],
    height=control_image.size[1],
    num_inference_steps=30,
    true_cfg_scale=2.5,
).images[0]
image.save("qwenimage_edit_controlnet.png")

N.B.1. If this PR and image location are accepted, I will upload the living_room.png file to the documentation-images repository.
N.B.2. To achieve the desired result, set controlnet_conditioning_scale to a value greater than 1. A good starting point is 1.5.

Examples

Depth

Input image:

[image: living_room]

Control image:

[image: depth]

prompt = (
    "Anime style of a swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. "
    "A beige couch with white and beige cushions sits on a wooden floor, with a matching coffee table in front. "
    "The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. "
    "Sunlight pours through the leaves outside, casting cool shadows on the floor."
)

Result:

[image: living_room_edited]

Pose

Input image:

[image: depth]

Control image:

[image: depth]

prompt = (
    "Make this man sit on a concrete ledge in front of a large circular window, with a cityscape reflected in the glass. "
    "The wall is cream-colored, and the sky is clear blue. His shadow is cast on the wall."
)

Result:

[image: pose_with_controlnet]

For comparison, the result if we don't use the controlnet:

[image: pose_without_controlnet]

N.B. All examples were created using the Nunchaku version of the transformer.


Who can review?

@yiyixuxu
@asomoza

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks! i left some comments
didn't know the InstantX controlnet is compatible with Qwen-Image-Edit :)

@@ -639,7 +639,9 @@ def forward(
         if controlnet_block_samples is not None:
             interval_control = len(self.transformer_blocks) / len(controlnet_block_samples)
             interval_control = int(np.ceil(interval_control))
-            hidden_states = hidden_states + controlnet_block_samples[index_block // interval_control]
+            sample = controlnet_block_samples[index_block // interval_control]
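As a side note for readers of the diff above, the interval logic spreads the ControlNet block samples evenly across the transformer blocks, so each sample is reused for a contiguous run of blocks. A minimal sketch of the indexing, with illustrative block and sample counts:

import numpy as np

num_blocks = 60       # illustrative transformer block count
num_samples = 5       # illustrative number of ControlNet block samples

# Each ControlNet sample covers a contiguous run of ceil(60 / 5) = 12 blocks.
interval_control = int(np.ceil(num_blocks / num_samples))

mapping = [index_block // interval_control for index_block in range(num_blocks)]
# Blocks 0-11 reuse sample 0, blocks 12-23 reuse sample 1, ..., blocks 48-59 reuse sample 4.
assert mapping[0] == 0 and mapping[12] == 1 and mapping[59] == 4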
Collaborator

can we adjust inputs in pipeline instead?

Contributor Author

@dimitribarbot dimitribarbot Sep 18, 2025

I added sample padding to the pipeline and rolled back that change. Let me know if that works for you.
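For context, a minimal sketch of what the pipeline-side padding might look like (the helper name and shapes are illustrative, not the exact merged code), assuming each ControlNet sample is a (batch, seq_len, dim) tensor whose sequence length is shorter than that of hidden_states:

import torch

def pad_controlnet_block_samples(controlnet_block_samples, target_seq_len):
    # Zero-pad each sample along the sequence dimension so it matches the
    # transformer's hidden_states length, letting the transformer add the
    # samples without special-casing in its forward.
    padded = []
    for sample in controlnet_block_samples:
        pad_len = target_seq_len - sample.size(1)
        if pad_len > 0:
            zeros = sample.new_zeros(sample.size(0), pad_len, sample.size(2))
            sample = torch.cat([sample, zeros], dim=1)
        padded.append(sample)
    return padded

With the padding done pipeline-side, the transformer can keep its original hidden_states = hidden_states + controlnet_block_samples[index_block // interval_control] line unchanged.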

@dimitribarbot
Contributor Author

Thank you for your code review.

In fact, this PR is essentially a merge of the existing pipeline_qwenimage_edit.py and pipeline_qwenimage_controlnet.py pipelines. The only things I added were:

  • the sample code at the beginning of the pipeline,
  • the documentation for the new arguments specific to controlnet,
  • a partial update of the hidden_states in the forward of the Qwen-Image transformer.

Your feedback is all valid, and I can address it in my PR. However, this feedback also applies to the other Qwen-Image pipelines. Given that there is already a complete refactoring PR #12322, what would you prefer me to do:

  • Take your comments into account only in the new pipeline I added,
  • Take them into account for all the pipelines concerned, or
  • Not take them into account for now and integrate them into the refactoring PR?

@yiyixuxu
Collaborator

yiyixuxu commented Sep 17, 2025

hi @dimitribarbot
about this, which feedback are you talking about?

However, this feedback also applies to the other Qwen-Image pipelines

if it is referring to the removal of the enable_vae_slicing methods and such, that's because this is a new pipeline and we do not need to add methods that we have already deprecated

my review does not take PR #12322 into consideration at all; your PR will probably be merged first, and we can resolve any conflicts with the other PR if there are any

@dimitribarbot force-pushed the qwen-image-edit-controlnet branch from a603407 to 195bddd on September 17, 2025 at 19:01
@dimitribarbot
Contributor Author

hi @yiyixuxu

about this, which feedback you are talking about?

The removal of the enable_vae_slicing methods and other deprecated functions is also needed in the pipeline_qwenimage_edit.py and pipeline_qwenimage_controlnet.py files (and other qwenimage pipelines as well). Should I remove these functions from these files too, or only from my new pipeline?

For your comment in the transformer_qwenimage.py file, what do you mean exactly by:

can we adjust inputs in pipeline instead?

Would you prefer me to revert the changes to transformer_qwenimage.py and instead pad the controlnet_block_samples with zeros, from sample.size(1) up to hidden_states.size(1), in the pipeline_qwenimage_edit_controlnet.py pipeline? Or would you like me to do something else? (Changing the code in transformer_qwenimage.py might affect the pipeline_qwenimage_controlnet.py pipeline.)

@sayakpaul
Member

@dimitribarbot thanks for this PR. You can completely disregard #12322 for now. We're still brainstorming it (hence it's WIP).

@dimitribarbot force-pushed the qwen-image-edit-controlnet branch from 9866325 to 5b1c134 on September 20, 2025 at 18:03