
Conversation


@Kosinkadink Kosinkadink commented Aug 30, 2025

This PR adds support for choosing the attention implementation used during sampling via optimized_attention_override in transformer_options for all natively supported models. This opens the door to attention tricks, attention scheduling, and using different attention in different blocks, without resorting to massive hacks or monkey patches.
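
For illustration, here is a minimal sketch of how an override might be wired in. The exact signature of the override and the way it is attached are assumptions for this example (the real PR may expose a dedicated helper); it only assumes the override is a callable stored under the optimized_attention_override key of transformer_options:

```python
import torch
from comfy.ldm.modules.attention import optimized_attention  # ComfyUI's standard attention helper

# Hypothetical override: mirrors the usual attention call, with **kwargs to
# absorb any extra context passed through transformer_options.
def my_attention_override(q, k, v, heads, mask=None, **kwargs):
    # A real override could pick a different backend per block, log statistics,
    # or rescale attention; here we simply defer to the default implementation.
    return optimized_attention(q, k, v, heads, mask=mask)

# Attach the override on a cloned ModelPatcher-style model, assuming
# transformer_options lives inside model_options as in current ComfyUI.
model = model.clone()
model.model_options.setdefault("transformer_options", {})
model.model_options["transformer_options"]["optimized_attention_override"] = my_attention_override
```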

All available attention types are also registered so they can be tracked and chosen easily in the future. Flash attention or sage attention being unavailable only causes an immediate exit if the corresponding --use-flash-attention or --use-sage-attention argument was passed; otherwise, those attention types are still registered but only used when explicitly requested in code.
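
As a rough illustration of that registry idea (the names and structure below are assumptions, not the PR's actual API): every backend is registered by name with an availability flag, and an unavailable backend only becomes an error when something actually asks for it.

```python
# Hypothetical attention-backend registry; the real registration API may differ.
REGISTERED_ATTENTION_FUNCTIONS = {}

def register_attention_function(name, func, available=True):
    # Register every known backend, even if its package failed to import,
    # so the name can still be listed; 'available' gates actual use.
    REGISTERED_ATTENTION_FUNCTIONS[name] = {"func": func, "available": available}

def get_attention_function(name):
    entry = REGISTERED_ATTENTION_FUNCTIONS.get(name)
    if entry is None or not entry["available"]:
        raise ValueError(f"Attention backend '{name}' is not available.")
    return entry["func"]

# Example: sage attention is registered either way, but only usable if its
# optional dependency imported; startup would only hard-fail when
# --use-sage-attention was explicitly passed.
try:
    from sageattention import sageattn  # optional dependency
    register_attention_function("sage", sageattn, available=True)
except ImportError:
    register_attention_function("sage", None, available=False)
```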

More changes are coming in the future regarding models tracking the current block index in transformer_options.

Changes went through two rounds of QA + my personal testing. No observable slowdowns were noticed.

…down all the code paths where transformer_options would need to be added
…o load SageAttention and FlashAttention if not enabled so that they can be marked as available or not, create registry for available attention
…ve a dropdown with available attention (this is a test node only)
@Kosinkadink Kosinkadink added the Core (Core team dependency) label on Aug 30, 2025