
Conversation

mgoin
Member

@mgoin mgoin commented Sep 12, 2025

Purpose

Fix the failing `test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe` CI test by moving the dequantized reference tensors to the correct CUDA device.

Also possibly identified the culprit for the failing Blackwell CUTLASS MLA test: https://buildkite.com/vllm/ci/builds/30554/steps/canvas?jid=01993edf-720e-4749-81eb-da58099b7c78

E       RuntimeError: _C::sm100_cutlass_mla_decode() expected at most 9 argument(s) but received 10 argument(s). Declaration: _C::sm100_cutlass_mla_decode(Tensor($0! -> ) out, Tensor q_nope, Tensor q_pe, Tensor kv_c_and_k_pe_cache, Tensor seq_lens, Tensor page_table, Tensor workspace, float scale, int num_kv_splits) -> ()

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a CI failure in test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe by ensuring that the dequantized reference weight tensors are moved to the correct CUDA device. The change is correct and resolves the device mismatch issue. I have provided suggestions to consolidate the tensor conversion and device placement calls for improved code clarity and efficiency.
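The consolidation the bot suggests can be sketched as follows (illustrative tensor names and shapes, not the actual test code): a single `Tensor.to(device, dtype=...)` call performs the dtype cast and the device move together, instead of chaining two separate `.to()` calls.

```python
import torch

# Pick whatever device the test targets; fall back to CPU so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
w = torch.randint(0, 256, (2, 4, 8), dtype=torch.uint8)  # stand-in for a quantized weight

# Chained calls: cast to float32 first, then move to the target device.
a = w.to(torch.float32).to(device)
# Consolidated call: cast and move in one step.
b = w.to(device, dtype=torch.float32)

assert a.dtype == b.dtype == torch.float32
assert a.device == b.device
assert torch.equal(a.cpu(), b.cpu())
```

Both forms are equivalent in result; the consolidated form simply avoids a second dispatch and reads more clearly.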

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and ci-failure (Issue about an unexpected test failure in CI) labels on Sep 12, 2025
Collaborator

@yewentao256 yewentao256 left a comment


Thanks for the work!

    w2_ref = dequant_mxfp4_batches(
        w2_q.view(torch.uint8),
        w2_scale.view(torch.uint8).reshape(-1)).to(torch.float32).reshape(
-           num_experts, hidden_size, intermediate_size)
+           num_experts, hidden_size, intermediate_size).to(device)

What will happen if we don't add the to(device) here?
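For illustration (hypothetical tensor names, not the test's actual code): without the `.to(device)`, the dequantized reference stays on the CPU while the kernel output lives on the GPU, so comparing the two raises a device-mismatch `RuntimeError`. A minimal sketch of the pattern:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
ref = torch.randn(4, 8)   # reference computed on the CPU
out = ref.to(device)      # stand-in for a kernel output on the target device

# On a CUDA machine, torch.testing.assert_close(out, ref) with a CPU-resident
# `ref` raises: RuntimeError: Expected all tensors to be on the same device...
# Moving the reference first keeps the comparison on a single device.
ref = ref.to(device)
torch.testing.assert_close(out, ref)
assert ref.device == out.device
```

On a CPU-only machine `.to(device)` is a no-op, which is why the bug only surfaces in the CUDA CI lane.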

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 13, 2025 07:10
@DarkLight1337 DarkLight1337 merged commit 59d7ffc into vllm-project:main Sep 13, 2025
81 of 82 checks passed
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Sep 14, 2025
shyeh25 pushed a commit to shyeh25/vllm that referenced this pull request Sep 15, 2025
845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
…to loader

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (123 commits)
  [Hybrid Allocator] Support Pipeline Parallel (vllm-project#23974)
  [Spec Decoding]Support Spec Decoding Metrics in DP Mode (vllm-project#24049)
  [Chore] Remove ipex_ops warning (vllm-project#24835)
  Force use C++17 globally to avoid compilation error (vllm-project#24823)
  [Benchmarks] Throw usage error when using dataset-name random and dataset-path together (vllm-project#24819)
  fix type of sampling rate for encode_base64 (vllm-project#24826)
  [Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (vllm-project#24783)
  [Misc] Improve `s3_utils` type hints with `BaseClient` (vllm-project#24825)
  [Multi Modal][Performance] Fused Q,K's apply_rope into one (vllm-project#24511)
  [Chore] Minor simplification for non-PP path (vllm-project#24810)
  [Minor] Simplify duplicative device check for cuda (vllm-project#24793)
  Remove redundant assignment in xfer_buffers, This is a little fix (vllm-project#24732)
  [CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (vllm-project#24771)
  [Doc]: fix typos in various files (vllm-project#24798)
  [Misc] Correct an outdated comment. (vllm-project#24765)
  [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (vllm-project#24750)
  [Core][Multimodal] Cache `supports_kw` (vllm-project#24773)
  [Kernels][DP/EP] Optimize Silu Kernel for R1 (vllm-project#24054)
  [Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (vllm-project#24757)
  [Doc]: Remove 404 hyperlinks (vllm-project#24785)
  ...
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Sep 15, 2025
Labels
ci-failure: Issue about an unexpected test failure in CI
ready: ONLY add when PR is ready to merge/full CI is needed
Projects
Status: Done
3 participants