
Conversation

@contentis (Contributor) commented Sep 1, 2025

Refactor the Linear op to allow for mixed-precision checkpoints and for easy expansion to new quantized types, like NVFP4 in the future. The logic for selecting the right data type and forward call is now driven by the state_dict. Using a class factory should also slightly reduce the code duplication for manual_cast ops.

I tried keeping most of the logic as it currently is, but there is a solid chance of oversights - happy for any feedback!

Changes:

  • Move Q/DQ logic to a separate file
  • Make operator selection based on the state dict
  • Add a class factory to dynamically build the operator set (sketched below)
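
A minimal sketch of the factory and selection ideas, assuming hypothetical names (ScaledFP8Linear, make_ops, pick_operations, and the scaled_fp8 key are illustrative, not necessarily the identifiers used in this PR):

```python
import torch
import torch.nn as nn

class ScaledFP8Linear(nn.Linear):
    """Stand-in for a Linear that dequantizes scaled FP8 weights at call
    time; in the PR, the real Q/DQ logic lives in the new separate file."""
    def forward(self, x):
        weight = self.weight.to(x.dtype)  # cast/dequantize on the fly
        return nn.functional.linear(x, weight, self.bias)

def make_ops(linear_cls):
    """Class factory: wrap a Linear variant into an operator set, so each
    quantized type does not need its own hand-written manual_cast class."""
    class Ops:
        Linear = linear_cls
        # Conv2d, LayerNorm, ... would be attached the same way.
    return Ops

def pick_operations(state_dict, prefix=""):
    """Choose the operator set from what the checkpoint actually contains."""
    if f"{prefix}scaled_fp8" in state_dict:  # marker for scaled FP8 weights
        return make_ops(ScaledFP8Linear)
    return make_ops(nn.Linear)               # plain manual-cast path
```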

I've tested these changes on:

  • T5 XXL
  • Qwen VL
  • umt5 XXL
  • Qwen Image
  • FLUX
  • WAN 2.2
  • SD3.5
  • SD15
  • SDXL

each with scaled and non-scaled variants. For the backbones, I also tested with load_dtype e4m3 and e5m2.

@blepping (Contributor) commented Sep 1, 2025

This seems like a good idea in general. One thing I noticed is that these changes will definitely break GGUF as it subclasses the existing comfy.ops.manual_cast: https://github.com/city96/ComfyUI-GGUF/blob/d247022e3fa66851c5084cc251b076aab816423d/ops.py#L227

Not sure if there's a way to make it compatible somehow; a lot of people use and depend on GGUF, though. If compatibility isn't possible, it might be worth reaching out to city96 if this pull looks like it's going to get merged.
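
For context, a simplified view of the subclassing in the linked ops.py that this refactor could break:

```python
import comfy.ops

# ComfyUI-GGUF builds its ops on top of ComfyUI's manual_cast class;
# if the refactor removes or renames comfy.ops.manual_cast, this line
# raises AttributeError at import time.
class GGMLOps(comfy.ops.manual_cast):
    ...
```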

@contentis (Contributor, Author) commented
Thank you for the hint! I've exposed comfy.ops.manual_cast globally, which should ensure backward compatibility. So far I've only tested with FLUX GGUF, but that seems to work as expected.
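
A sketch of the kind of backward-compatibility shim described here; make_ops is the hypothetical factory from the earlier sketch, inlined so the snippet stands alone:

```python
import torch.nn as nn

def make_ops(linear_cls):  # hypothetical factory, as sketched earlier
    class Ops:
        Linear = linear_cls
    return Ops

# comfy/ops.py (sketch): keep the old public name bound to a factory-built
# class so `comfy.ops.manual_cast` stays importable and subclassable by
# downstream projects such as ComfyUI-GGUF.
manual_cast = make_ops(nn.Linear)

# Downstream code then keeps working unchanged, e.g.:
#   class GGMLOps(comfy.ops.manual_cast): ...
```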

@contentis marked this pull request as ready for review on September 8, 2025, 11:11
```python
import torch
import logging
import comfy.model_management
from comfy.cli_args import args, PerformanceFeature
```
A Contributor commented on the diff above:

This import is needed by existing code in master:

```python
if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
```

The pull will merge cleanly, but it crashes on that line at runtime, at least with the options I am using.
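
The straightforward fix is to keep the import next to the code that uses it; a sketch (the guard is the line quoted above from master, the body is illustrative):

```python
import torch
from comfy.cli_args import args, PerformanceFeature

# Existing guard in master that depends on the PerformanceFeature import:
if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
    torch.backends.cudnn.benchmark = True  # illustrative body
```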

@blepping (Contributor) commented Sep 11, 2025

I had a chance to briefly test this. I tested on a Q4_K GGUF Chroma Radiance model, and after fixing the issue mentioned above it runs successfully. One thing I noticed is that this does seem to change the generation noticeably, and the difference seemed consistent. I verified this by running four tests, twice with your changes and twice without, restarting ComfyUI between each attempt, and I got consistent results. I am not sure if that is expected or not.

I'm not sure if it makes a difference, but I tested by applying your changes on top of this branch: https://github.com/blepping/ComfyUI/tree/radiance_dct_scaling
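
For anyone wanting to reproduce the comparison, a minimal sketch of the check described above; the file names are assumptions:

```python
# Render the same seed/prompt on master and on this PR, save both
# images, then diff them to distinguish a real numeric change from
# run-to-run noise.
import numpy as np
from PIL import Image

before = np.asarray(Image.open("output_master.png")).astype(np.float32)
after = np.asarray(Image.open("output_pr.png")).astype(np.float32)

diff = np.abs(before - after)
print(f"max abs pixel diff: {diff.max():.1f}, mean: {diff.mean():.4f}")
# A nonzero diff that stays identical across restarts points to a
# deterministic numeric change (e.g. a different cast/dequant order).
```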
