Linear refactor #9662
base: master
Conversation
This seems like a good idea in general. One thing I noticed is that these changes will definitely break GGUF, as it subclasses the existing ops classes. Not sure if there's a way to make it compatible somehow; a lot of people use/depend on GGUF, though. If compatibility isn't possible, maybe it would be worth reaching out to city96 if this pull seems like it's going to get merged.
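To illustrate the compatibility concern, here is a minimal sketch (not the actual ComfyUI-GGUF source) of the kind of downstream subclassing that a rename or restructure of the existing op classes would break:

```python
import comfy.ops

# Hypothetical downstream extension in the style of ComfyUI-GGUF: it inherits
# from an existing op class, so removing or renaming that class breaks it.
class GGUFStyleLinear(comfy.ops.manual_cast.Linear):
    def forward(self, input):
        # a real GGUF layer would dequantize its packed weight here first
        return super().forward(input)
```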
Thank you for the hint! I've exposed it.
```python
import torch
import logging
import comfy.model_management
from comfy.cli_args import args, PerformanceFeature
```
This import is needed by existing code in master (Line 55 in e01e99d):

```python
if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
```
The pull will merge, but it crashes on that line, at least with the options I am using.
I had a chance to briefly test this. I tested a Q4_K GGUF Chroma Radiance model, and after fixing the issue mentioned above it does run successfully. One thing I noticed is that this does seem to change the generation noticeably, and the difference seemed consistent. I verified this by running four tests, restarting ComfyUI between each attempt: twice with your changes and twice without, and I got consistent results each time. I am not sure if that is expected or not. I'm not sure if it makes a difference, but I tested by applying your changes on top of this branch: https://github.com/blepping/ComfyUI/tree/radiance_dct_scaling
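For what it's worth, a quick way to quantify this kind of before/after difference (my own sketch, not the procedure used above) is to dump the output latent or image tensor from each run with torch.save and compare them:

```python
import torch

# Compare two saved run outputs and report the largest elementwise difference;
# identical seeds and settings are assumed for both runs.
def compare_runs(path_a: str, path_b: str, atol: float = 1e-6) -> None:
    a = torch.load(path_a).float()
    b = torch.load(path_b).float()
    diff = (a - b).abs().max().item()
    print(f"max abs difference: {diff}")
    print("identical within tolerance" if diff <= atol else "outputs differ")
```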
Refactor the Linear op to allow for mixed-precision checkpoints and to allow easy expansion to new quantized types, like NVFP4 in the future. This moves the logic of selecting the right dtype and forward call to be based on the state_dict. Using a class factory, this should also slightly reduce the code duplication for the manual_cast ops.
I tried keeping most of the logic as it currently is, but there is a solid chance of oversights - happy for any feedback!
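A rough sketch of the class-factory idea described above; the names and the exact keys (the `scale_weight` tensor, the `manual_cast` flag) are assumptions for illustration, not the PR's actual API:

```python
import torch
import torch.nn as nn

def make_linear(manual_cast: bool):
    """Build a Linear op class; the forward path is chosen from the state_dict."""

    class _Linear(nn.Linear):
        scale_weight = None  # set during loading if the checkpoint is fp8-scaled

        def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
            # Inspect the incoming tensors to decide how forward should behave,
            # e.g. scaled-fp8 checkpoints ship an extra per-layer scale tensor.
            scale = state_dict.pop(prefix + "scale_weight", None)
            if scale is not None:
                self.scale_weight = scale
            super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)

        def forward(self, x):
            weight, bias = self.weight, self.bias
            if manual_cast:
                # cast parameters to the activation dtype/device at call time
                weight = weight.to(dtype=x.dtype, device=x.device)
                bias = None if bias is None else bias.to(dtype=x.dtype, device=x.device)
            if self.scale_weight is not None:
                # apply the per-layer scale before the matmul
                weight = weight.to(x.dtype) * self.scale_weight.to(x.dtype)
            return torch.nn.functional.linear(x, weight, bias)

    return _Linear

# two variants generated from the same factory instead of two duplicated classes
DirectLinear = make_linear(manual_cast=False)
ManualCastLinear = make_linear(manual_cast=True)
```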
Changes:
I've tested these changes on:
each with scaled and non-scaled variants. For the backbones, I also tested with load_dtype e4m3 and e5m2.
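As a side note on the fp8 variants mentioned above, a mapping like the following (illustrative only, not the PR's actual option handling) is one way to resolve a load_dtype string to the corresponding torch dtype:

```python
import torch

# Illustrative mapping from a load_dtype string to a torch dtype; requires a
# PyTorch build with float8 support (>= 2.1).
LOAD_DTYPES = {
    "e4m3": torch.float8_e4m3fn,
    "e5m2": torch.float8_e5m2,
    "fp16": torch.float16,
    "bf16": torch.bfloat16,
}

def resolve_load_dtype(name: str) -> torch.dtype:
    try:
        return LOAD_DTYPES[name]
    except KeyError:
        raise ValueError(f"unsupported load_dtype: {name!r}")
```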