[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe #24750
Conversation
Code Review

This pull request addresses a CI failure in `test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe` by ensuring that the dequantized reference weight tensors are moved to the correct CUDA device. The change is correct and resolves the device mismatch issue. I have provided suggestions to consolidate the tensor conversion and device placement calls for improved code clarity and efficiency.
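As a sketch of that consolidation suggestion (assuming the variable names from the test shown in the diff below, plus the `device` variable the fix introduces), the float32 cast and the device move can be folded into a single `Tensor.to` call:

```python
# Hypothetical consolidated form: one .to() call performs both the
# float32 cast and the device placement before the final reshape.
w2_ref = dequant_mxfp4_batches(
    w2_q.view(torch.uint8),
    w2_scale.view(torch.uint8).reshape(-1)).to(
        device=device, dtype=torch.float32).reshape(
            num_experts, hidden_size, intermediate_size)
```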
Thanks for the work!
```diff
 w2_ref = dequant_mxfp4_batches(
     w2_q.view(torch.uint8),
     w2_scale.view(torch.uint8).reshape(-1)).to(torch.float32).reshape(
-        num_experts, hidden_size, intermediate_size)
+        num_experts, hidden_size, intermediate_size).to(device)
```
What will happen if we don't add the `.to(device)` here?
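Presumably (per the review summary above, which describes this change as fixing a device mismatch) the dequantized reference would stay on a different device than the kernel output, and the comparison in the test would fail. A minimal standalone sketch of that failure mode, assuming a CUDA-capable machine:

```python
import torch

# Illustration of the assumed failure mode: a reference tensor left on
# the CPU cannot be compared elementwise against a CUDA tensor.
ref = torch.randn(4, 4)                 # reference stays on the CPU
out = torch.randn(4, 4, device="cuda")  # kernel output lives on the GPU

try:
    torch.allclose(out, ref)
except RuntimeError as e:
    # PyTorch refuses cross-device elementwise operations.
    print(f"Device mismatch: {e}")
```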
Purpose

Fix the CI failure in `test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe` by moving the dequantized reference weight tensors to the correct CUDA device.

This also possibly identifies the culprit for the failing Blackwell cutlass MLA test: https://buildkite.com/vllm/ci/builds/30554/steps/canvas?jid=01993edf-720e-4749-81eb-da58099b7c78
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist

- (Optional) Documentation update, e.g. `supported_models.md` and `examples` for a new model.