Support ControlNet for Qwen-Image-Edit #12325
thanks! i left some comments
didn't know the InstantX ControlNet is compatible with Qwen Edit :)
```python
def enable_vae_slicing(self):
    r"""
    Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to
    compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.
    """
    depr_message = f"Calling `enable_vae_slicing()` on a `{self.__class__.__name__}` is deprecated and this method will be removed in a future version. Please use `pipe.vae.enable_slicing()`."
    deprecate(
        "enable_vae_slicing",
        "0.40.0",
        depr_message,
    )
    self.vae.enable_slicing()
```
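The pattern above (warn, then delegate to the VAE) can be sketched with the stdlib `warnings` module standing in for diffusers' `deprecate()` helper; `DummyVAE` and `Pipeline` below are illustrative stand-ins, not real diffusers classes:

```python
import warnings

class DummyVAE:
    """Illustrative stand-in for the pipeline's VAE (not a real diffusers class)."""
    def __init__(self):
        self.use_slicing = False

    def enable_slicing(self):
        self.use_slicing = True

class Pipeline:
    """Illustrative stand-in showing the deprecation-wrapper pattern."""
    def __init__(self):
        self.vae = DummyVAE()

    def enable_vae_slicing(self):
        # Warn that the pipeline-level method is deprecated, then delegate to
        # the VAE, mirroring what diffusers' `deprecate()` helper arranges.
        warnings.warn(
            f"Calling `enable_vae_slicing()` on a `{self.__class__.__name__}` is deprecated; "
            "use `pipe.vae.enable_slicing()` instead.",
            FutureWarning,
        )
        self.vae.enable_slicing()

pipe = Pipeline()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.enable_vae_slicing()  # warns, then slicing is enabled on the VAE
```

The wrapper keeps old call sites working while steering users to the VAE-level API, which is why the review suggests dropping these pipeline methods outright.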
```python
def disable_vae_slicing(self):
    r"""
    Disable sliced VAE decoding. If `enable_vae_slicing` was previously enabled, this method will go back to
    computing decoding in one step.
    """
    depr_message = f"Calling `disable_vae_slicing()` on a `{self.__class__.__name__}` is deprecated and this method will be removed in a future version. Please use `pipe.vae.disable_slicing()`."
    deprecate(
        "disable_vae_slicing",
        "0.40.0",
        depr_message,
    )
    self.vae.disable_slicing()
```
```python
def enable_vae_tiling(self):
    r"""
    Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to
    compute decoding and encoding in several steps. This is useful for saving a large amount of memory and to allow
    processing larger images.
    """
    depr_message = f"Calling `enable_vae_tiling()` on a `{self.__class__.__name__}` is deprecated and this method will be removed in a future version. Please use `pipe.vae.enable_tiling()`."
    deprecate(
        "enable_vae_tiling",
        "0.40.0",
        depr_message,
    )
    self.vae.enable_tiling()
```
```python
def disable_vae_tiling(self):
    r"""
    Disable tiled VAE decoding. If `enable_vae_tiling` was previously enabled, this method will go back to
    computing decoding in one step.
    """
    depr_message = f"Calling `disable_vae_tiling()` on a `{self.__class__.__name__}` is deprecated and this method will be removed in a future version. Please use `pipe.vae.disable_tiling()`."
    deprecate(
        "disable_vae_tiling",
        "0.40.0",
        depr_message,
    )
    self.vae.disable_tiling()
```
```python
def enable_vae_slicing(self):
    r"""
    Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to
    compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.
    """
    depr_message = f"Calling `enable_vae_slicing()` on a `{self.__class__.__name__}` is deprecated and this method will be removed in a future version. Please use `pipe.vae.enable_slicing()`."
    deprecate(
        "enable_vae_slicing",
        "0.40.0",
        depr_message,
    )
```
```python
        return prompt_embeds, encoder_attention_mask

    def encode_prompt(
```
Missing a `# Copied from` comment?
```python
        raise AttributeError("Could not access latents of provided encoder_output")


def calculate_dimensions(target_area, ratio):
```
`# Copied from`?
```python
        return latents

    def _encode_vae_image(self, image: torch.Tensor, generator: torch.Generator):
```
copied from?
```diff
@@ -639,7 +639,9 @@ def forward(
         if controlnet_block_samples is not None:
             interval_control = len(self.transformer_blocks) / len(controlnet_block_samples)
             interval_control = int(np.ceil(interval_control))
-            hidden_states = hidden_states + controlnet_block_samples[index_block // interval_control]
+            sample = controlnet_block_samples[index_block // interval_control]
```
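The block-to-sample mapping in the hunk above can be checked numerically. The counts below (60 transformer blocks, 5 ControlNet outputs) are made up for illustration, not taken from the Qwen-Image architecture:

```python
import math

num_blocks = 60   # assumed number of transformer blocks (illustrative)
num_samples = 5   # assumed number of controlnet_block_samples (illustrative)

# Same computation as in forward(): the ceiling of the ratio gives the
# number of consecutive blocks that reuse one ControlNet residual.
interval_control = math.ceil(num_blocks / num_samples)
mapping = [i // interval_control for i in range(num_blocks)]
```

Each ControlNet residual is reused for `interval_control` consecutive transformer blocks, so every block receives a control signal even when the ControlNet emits fewer outputs than there are blocks.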
can we adjust inputs in pipeline instead?
Thank you for your code review. In fact, for my PR, I only merged the
Your feedback is all valid, and I can address it in my PR. However, this feedback also applies to the other Qwen-Image pipelines. Given that there is already a complete refactoring PR #12322, what would you prefer me to do:
What does this PR do?
Add ControlNet (InstantX/Qwen-Image-ControlNet-Union) support for Qwen-Image-Edit.
This pipeline enables two latent images to be used as inputs: one for Qwen-Image-Edit and another for Qwen-Image-ControlNet-Union. This provides greater control over the expected results.
Inference
N.B.1. If this PR and image location are accepted, I will upload the `living_room.png` file to the `documentation-images` repository.

N.B.2. To achieve the desired result, set `controlnet_conditioning_scale` to a value greater than 1. A good starting point is 1.5.

Examples
Depth
Input image:
Control image:
Result:
Pose
Input image:
Control image:
Result:
Whereas if we don't use controlnet:
N.B. All examples were created using the Nunchaku version of the transformer.
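As a rough picture of why raising `controlnet_conditioning_scale` strengthens the control signal: in typical ControlNet implementations the per-block residuals are scaled by this factor before being added to the hidden states. The numbers below are toy values, not real activations:

```python
controlnet_conditioning_scale = 1.5   # starting point suggested above
residual = [0.2, -0.4, 0.1]           # toy ControlNet residual for one block
hidden = [1.0, 1.0, 1.0]              # toy hidden states

# Residuals are amplified before being added, so a scale above 1 pushes
# the output harder toward the control image's structure.
updated = [h + controlnet_conditioning_scale * r for h, r in zip(hidden, residual)]
```

With a scale below 1 the same formula instead attenuates the control signal, which is why the sweet spot depends on how strictly the result should follow the control image.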
Before submitting
See the documentation guidelines and the tips on formatting docstrings.
Who can review?
@yiyixuxu
@asomoza