Skip to content

Conversation

jeejeelee
Copy link
Collaborator

@jeejeelee jeejeelee commented Sep 12, 2025

Purpose

See: #24112 (comment)
The tuned configuration will include the triton_version, like:

{   "triton_version": "3.4.0",
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 32,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 4
    },
    "2": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 3
    },
    "4": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 3
    },
}

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <[email protected]>
@jeejeelee jeejeelee requested a review from simon-mo September 12, 2025 17:58
@mergify mergify bot added the performance Performance-related issues label Sep 12, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds the Triton version to the tuned MoE configuration files. The changes involve updating the benchmark script to save the version and modifying the MoE layer to handle this new metadata when loading configurations. The version of the Triton placeholder is also updated.

My review identifies a potential robustness issue in how the configuration is loaded. The current approach of explicitly removing the triton_version key is brittle. I've suggested a more resilient method that filters for numeric keys, which will prevent crashes if other metadata is added to the configuration files in the future.

@@ -720,7 +720,10 @@ def get_moe_configs(
logger.info("Using configuration from %s for MoE layer.",
config_file_path)
# If a configuration has been found, return it
return {int(key): val for key, val in json.load(f).items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The proposed change to explicitly pop triton_version is brittle. If other metadata is added to the configuration file in the future, this code will fail with a ValueError when trying to convert a non-integer string key to an integer.

A more robust approach is to filter for keys that are valid integers. By checking key.isdigit(), you can automatically ignore any non-numeric keys, including triton_version and any future metadata, making the code more resilient to future changes.

Suggested change
return {int(key): val for key, val in json.load(f).items()}
return {int(key): val for key, val in json.load(f).items() if key.isdigit()}

Copy link
Collaborator

@simon-mo simon-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should also be similar tuning done for fp8 gemm? I'm not super sure though but I recall the config is different. cc @mgoin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants