@kylesayrs kylesayrs commented Sep 11, 2025

Background

In the past, it was noticed that value[:, perm] would throw an error for the FP8 dtype: RuntimeError: "index_cuda" not implemented for 'Float8_e4m3fn'. safe_permute was used to avoid calling index_cuda for the FP8 dtype.

However, it has been discovered that Tensor.index_select achieves the same effect without needing to call index_cuda. Tensor.index_select is superior because it supports negative values for dim, unlike safe_permute.
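As a minimal sketch of the equivalence described above, the snippet below compares Tensor.index_select with a negative dim against value[:, perm] on a small CPU tensor. The helper name permute_last_dim and the sample values are hypothetical, and float32 is used so the comparison runs anywhere; it does not reproduce the FP8 CUDA error itself.

```python
import torch


def permute_last_dim(value: torch.Tensor, perm: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: reorder entries along the last dimension."""
    # dim=-1 addresses the last dimension regardless of tensor rank
    return value.index_select(-1, perm)


value = torch.arange(12, dtype=torch.float32).reshape(3, 4)
perm = torch.tensor([2, 0, 3, 1])

permuted = permute_last_dim(value, perm)

# For a 2D tensor this matches advanced indexing on the column axis
assert torch.equal(permuted, value[:, perm])

# argsort of a permutation yields its inverse, which restores the original order
restored = permute_last_dim(permuted, torch.argsort(perm))
assert torch.equal(restored, value)
```

Because the dimension is given as -1, the same call applies unchanged to tensors with extra leading dimensions, such as batched activations.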

Purpose

  • Enable group activation quantization by allowing permutations to be applied to the last dimension; safe_permute did not support negative dim values
  • Remove unnecessary code (I implemented this function before I was aware that Tensor.index_select would achieve the same effect)

Changes

  • Deprecate safe_permute in favor of Tensor.index_select
  • Update test to also cover the meta device and to be easier to read
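A rough sketch of what a meta-device check could look like follows. This is not the test from the PR; the shapes are made up, and it assumes the installed PyTorch provides a meta implementation of index_select. Meta tensors carry no data, so only shape and device can be checked.

```python
import torch

# Meta tensors have shape and dtype but no storage
value = torch.empty(2, 3, 8, device="meta")

# The index values are irrelevant on meta; only the index length matters
perm = torch.arange(8, device="meta")

out = value.index_select(-1, perm)

# The permutation keeps the shape and stays on the meta device
assert out.shape == value.shape
assert out.device.type == "meta"
```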

Testing

  • Regression and new tests pass
  • Tested safe permute with torch==2.7.1 (2.7.0 is the lowest version supported by LLM Compressor and vLLM)

Signed-off-by: Kyle Sayers <[email protected]>

@shanjiaz shanjiaz left a comment

Lgtm!

@kylesayrs kylesayrs merged commit 42363c3 into main Sep 12, 2025
2 checks passed
@kylesayrs kylesayrs deleted the kylesayrs/deprecate-safe-permute branch September 12, 2025 14:37