@kylesayrs kylesayrs commented Sep 9, 2025

Purpose

  • Support group quantization of activations
    • This will be used for KV cache / attention quantization experiments
    • This could be used for KV cache / attention quantization kernels
    • This could be used in the future for fused activation/group quantization kernels
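
As a rough illustration of what group quantization of an activation tensor involves, the sketch below computes symmetric per-group scales along the last dimension and applies a quantize/dequantize round trip. The function names `compute_group_scales` and `fake_quantize`, the absmax scale rule, and the 8-bit default are assumptions made for this example; they are not taken from the PR, and the sketch does not cover how scales would be calibrated or shared across batches.

```python
import torch


def compute_group_scales(x: torch.Tensor, group_size: int, num_bits: int = 8) -> torch.Tensor:
    """Hypothetical helper: symmetric absmax scale per group of the last dim."""
    if x.shape[-1] % group_size != 0:
        raise ValueError("last dimension must be divisible by group_size")
    q_max = 2 ** (num_bits - 1) - 1
    # (..., hidden) -> (..., num_groups, group_size); works for any leading dims
    grouped = x.unflatten(-1, (-1, group_size))
    absmax = grouped.abs().amax(dim=-1)
    # Clamp so all-zero groups do not produce a zero scale
    return absmax.clamp(min=1e-8) / q_max


def fake_quantize(x: torch.Tensor, scales: torch.Tensor, group_size: int, num_bits: int = 8) -> torch.Tensor:
    """Hypothetical helper: quantize to the integer grid, then dequantize."""
    q_max = 2 ** (num_bits - 1) - 1
    grouped = x.unflatten(-1, (-1, group_size))
    s = scales.unsqueeze(-1)  # broadcast one scale over each group
    q = torch.round(grouped / s).clamp(-q_max, q_max)
    return (q * s).flatten(-2)  # back to the original shape


torch.manual_seed(0)
activations = torch.randn(2, 3, 8)  # e.g. (batch, tokens, hidden)
scales = compute_group_scales(activations, group_size=4)
dequantized = fake_quantize(activations, scales, group_size=4)
```

Because the grouping only reshapes the last dimension, the same code handles 2-D weights and higher-rank activations alike.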

Prerequisites

Changes

  • Remove hard-coding around 2d weight shapes for group quantization
  • Remove safe_permute, which didn't support dim < 0 as an argument
    • Deprecate safe_permute in favor of Tensor.index_select
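
For reference, a minimal sketch of permuting along a negative dimension with `Tensor.index_select`, which is the replacement named above. The tensor shape and permutation are arbitrary values chosen for illustration.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)
perm = torch.tensor([3, 1, 0, 2])

# index_select accepts a negative dim, here the last dimension
permuted = x.index_select(-1, perm)

# Equivalent to advanced indexing along the last dimension
assert torch.equal(permuted, x[..., perm])
```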

Testing

  • Updated safe permute test with more coverage
  • Tested safe permute with torch==2.7.1 (2.7.0 is the lowest version supported by LLM Compressor and vLLM)
  • TODO: test e2e with LC
  • TODO: add more tests

@kylesayrs force-pushed the kylesayrs/group-activation-quantization branch from ce45326 to 26d449c on September 9, 2025 01:04
@kylesayrs changed the base branch from main to kylesayrs/deprecated-update on September 9, 2025 01:09
@kylesayrs force-pushed the kylesayrs/group-activation-quantization branch from 26d449c to d06d093 on September 9, 2025 01:09
@kylesayrs changed the title from "[Quantization] Static group activation quantization" to "[Quantization] Static group activation quantization, Deprecate safe permute" on Sep 9, 2025
@kylesayrs force-pushed the kylesayrs/group-activation-quantization branch from 3e69d69 to ac9d74e on September 9, 2025 13:13
@kylesayrs force-pushed the kylesayrs/deprecated-update branch from f7801f2 to a716206 on September 9, 2025 13:15
@kylesayrs force-pushed the kylesayrs/group-activation-quantization branch from ac9d74e to 35f8d46 on September 9, 2025 13:15
@fynnsu fynnsu left a comment

Nice catch with safe_permute. Added a couple questions below.

Base automatically changed from kylesayrs/deprecated-update to kylesayrs/loguru on September 10, 2025 13:34
Base automatically changed from kylesayrs/loguru to main on September 11, 2025 18:25
@kylesayrs force-pushed the kylesayrs/group-activation-quantization branch 2 times, most recently from fde779c to 1c217e4, on September 11, 2025 19:13
@kylesayrs changed the base branch from main to kylesayrs/deprecate-safe-permute on September 11, 2025 19:13
@kylesayrs changed the title from "[Quantization] Static group activation quantization, Deprecate safe permute" to "[Quantization] Refactor initialize for readability, Static group activation quantization" on Sep 11, 2025
Base automatically changed from kylesayrs/deprecate-safe-permute to main on September 12, 2025 14:37
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs force-pushed the kylesayrs/group-activation-quantization branch from d53ba36 to 3de48fe on September 17, 2025 15:37