
Conversation

dimitribarbot
Contributor

What does this PR do?

Add ControlNet (InstantX/Qwen-Image-ControlNet-Union) support for Qwen-Image-Edit.

This pipeline takes two input images that are encoded into latents: one for Qwen-Image-Edit and another for Qwen-Image-ControlNet-Union, providing greater control over the result.

Inference

import torch
from diffusers import QwenImageControlNetModel, QwenImageEditControlNetPipeline
from diffusers.utils import load_image

base_model = "Qwen/Qwen-Image-Edit"
controlnet_model = "InstantX/Qwen-Image-ControlNet-Union"

# Load the union ControlNet separately; it is passed to the edit pipeline below.
controlnet = QwenImageControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)

pipe = QwenImageEditControlNetPipeline.from_pretrained(
    base_model, controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

# Image to edit.
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/living_room.png"
).convert("RGB")
# Depth map used as the ControlNet conditioning image.
control_image = load_image(
    "https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union/resolve/main/conds/depth.png"
)
prompt = (
    "Anime style of a swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. "
    "A beige couch with white and beige cushions sits on a wooden floor, with a matching coffee table in front. "
    "The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. "
    "Sunlight pours through the leaves outside, casting cool shadows on the floor."
)
image = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=" ",
    control_image=control_image,
    controlnet_conditioning_scale=1.5,
    width=control_image.size[0],
    height=control_image.size[1],
    num_inference_steps=30,
    true_cfg_scale=2.5,
).images[0]
image.save("qwenimage_edit_controlnet.png")

N.B.1. If this PR and image location are accepted, I will upload the living_room.png file to the documentation-images repository.
N.B.2. To achieve the desired result, set controlnet_conditioning_scale to a value greater than 1. A good starting point is 1.5.

Examples

Depth

Input image:

[image: living_room]

Control image:

[image: depth]

prompt = (
    "Anime style of a swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. "
    "A beige couch with white and beige cushions sits on a wooden floor, with a matching coffee table in front. "
    "The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. "
    "Sunlight pours through the leaves outside, casting cool shadows on the floor."
)

Result:

[image: living_room_edited]

Pose

Input image:

[image: depth]

Control image:

[image: depth]

prompt = (
    "Make this man sit on a concrete ledge in front of a large circular window, with a cityscape reflected in the glass. "
    "The wall is cream-colored, and the sky is clear blue. His shadow is cast on the wall."
)

Result:

[image: pose_with_controlnet]

For comparison, the result if we don't use the controlnet:

[image: pose_without_controlnet]

N.B. All examples were created using the Nunchaku version of the transformer.


Who can review?

@yiyixuxu
@asomoza

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks! i left some comments
didn't know the InstantX controlnet is compatible with Qwen-Image-Edit :)

@@ -639,7 +639,9 @@ def forward(
         if controlnet_block_samples is not None:
             interval_control = len(self.transformer_blocks) / len(controlnet_block_samples)
             interval_control = int(np.ceil(interval_control))
-            hidden_states = hidden_states + controlnet_block_samples[index_block // interval_control]
+            sample = controlnet_block_samples[index_block // interval_control]
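As a side note for readers of the diff above, the interval logic spreads the ControlNet block samples evenly across the transformer blocks, so each sample is reused for a contiguous run of blocks. A minimal sketch of the indexing, with illustrative block and sample counts:

import numpy as np

num_blocks = 60       # illustrative transformer block count
num_samples = 5       # illustrative number of ControlNet block samples

# Each ControlNet sample covers a contiguous run of ceil(60 / 5) = 12 blocks.
interval_control = int(np.ceil(num_blocks / num_samples))

mapping = [index_block // interval_control for index_block in range(num_blocks)]
# Blocks 0-11 reuse sample 0, blocks 12-23 reuse sample 1, ..., blocks 48-59 reuse sample 4.
assert mapping[0] == 0 and mapping[12] == 1 and mapping[59] == 4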
Collaborator

can we adjust inputs in pipeline instead?

Contributor Author

@dimitribarbot dimitribarbot Sep 18, 2025

I added sample padding to the pipeline and rolled back that change. Let me know if that works for you.
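For context, a minimal sketch of what the pipeline-side padding might look like (the helper name and shapes are illustrative, not the exact merged code), assuming each ControlNet sample is a (batch, seq_len, dim) tensor whose sequence length is shorter than that of hidden_states:

import torch

def pad_controlnet_block_samples(controlnet_block_samples, target_seq_len):
    # Zero-pad each sample along the sequence dimension so it matches the
    # transformer's hidden_states length, letting the transformer add the
    # samples without special-casing in its forward.
    padded = []
    for sample in controlnet_block_samples:
        pad_len = target_seq_len - sample.size(1)
        if pad_len > 0:
            zeros = sample.new_zeros(sample.size(0), pad_len, sample.size(2))
            sample = torch.cat([sample, zeros], dim=1)
        padded.append(sample)
    return padded

With the padding done pipeline-side, the transformer can keep its original hidden_states = hidden_states + controlnet_block_samples[index_block // interval_control] line unchanged.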

@dimitribarbot
Contributor Author

Thank you for your code review.

In fact, this PR is essentially a merge of the existing pipeline_qwenimage_edit.py and pipeline_qwenimage_controlnet.py pipelines. The only things I added were:

  • the sample code at the beginning of the pipeline,
  • the documentation for the new arguments specific to controlnet,
  • a partial update of the hidden_states in the forward of the Qwen-Image transformer.

Your feedback is all valid, and I can address it in my PR. However, this feedback also applies to the other Qwen-Image pipelines. Given that there is already a complete refactoring PR #12322, what would you prefer me to do:

  • Take your comments into account only in the new pipeline I added,
  • Take them into account for all the pipelines concerned, or
  • Not take them into account for now and integrate them into the refactoring PR?

@yiyixuxu
Collaborator

yiyixuxu commented Sep 17, 2025

hi @dimitribarbot
about this, which feedback are you talking about?

However, this feedback also applies to the other Qwen-Image pipelines

if it is referring to the removal of the enable_vae_slicing methods and such, that's because this is a new pipeline and we do not need to add methods that we have already deprecated

my review does not take PR #12322 into consideration at all; your PR will probably be merged first, and we can resolve any conflicts with the other PR if there are any

@dimitribarbot force-pushed the qwen-image-edit-controlnet branch from a603407 to 195bddd on September 17, 2025 at 19:01
@dimitribarbot
Contributor Author

hi @yiyixuxu

about this, which feedback you are talking about?

The removal of the enable_vae_slicing methods and other deprecated functions is also needed in the pipeline_qwenimage_edit.py and pipeline_qwenimage_controlnet.py files (and other qwenimage pipelines as well). Should I remove these functions from these files too, or only from my new pipeline?

For your comment in the transformer_qwenimage.py file, what do you mean exactly by:

can we adjust inputs in pipeline instead?

Would you prefer me to revert the changes to transformer_qwenimage.py and instead pad the controlnet_block_samples with zeros, from sample.size(1) up to hidden_states.size(1), in the pipeline_qwenimage_edit_controlnet.py pipeline? Or would you like me to do something else? (Changing the code in transformer_qwenimage.py might affect the pipeline_qwenimage_controlnet.py pipeline.)

@sayakpaul
Member

@dimitribarbot thanks for this PR. You can completely disregard #12322 for now. We're still brainstorming it (hence it's WIP).

@dimitribarbot force-pushed the qwen-image-edit-controlnet branch from 9866325 to 5b1c134 on September 20, 2025 at 18:03