Skip to content

Conversation

WoosukKwon
Copy link
Collaborator

@WoosukKwon WoosukKwon commented Sep 13, 2025

Currently, broadcast_pp_output complicates the logic in the common non-PP path.
This PR simplifies the logic a bit.

@mergify mergify bot added the v1 label Sep 13, 2025
@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 13, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request provides a nice simplification to the execute_model method in GPUModelRunner. By moving the broadcast_pp_output flag to the __init__ method and restructuring the logic to handle the common (non-broadcast) and rare (broadcast) paths separately, the code becomes much clearer and easier to follow. This refactoring also correctly handles kv_connector_output for pooling models and appears to fix a latent bug where logits were not updated on non-last pipeline parallel ranks after being broadcast. The changes are well-contained and improve both readability and correctness. I have no further suggestions.

model_output_broadcast_data = {
"logits": logits.contiguous(),
} if logits is not None else {}
else:
Copy link
Member

@njhill njhill Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a thought, could we put the else logic in a different function to be even less intrusive to the common path?

@WoosukKwon WoosukKwon merged commit 3e903b6 into main Sep 14, 2025
58 checks passed
@WoosukKwon WoosukKwon deleted the woosuk/minor-simpl-pool branch September 14, 2025 00:41
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Sep 14, 2025
845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
…to loader

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (123 commits)
  [Hybrid Allocator] Support Pipeline Parallel (vllm-project#23974)
  [Spec Decoding]Support Spec Decoding Metrics in DP Mode (vllm-project#24049)
  [Chore] Remove ipex_ops warning (vllm-project#24835)
  Force use C++17 globally to avoid compilation error (vllm-project#24823)
  [Benchmarks] Throw usage error when using dataset-name random and dataset-path together (vllm-project#24819)
  fix type of sampling rate for encode_base64 (vllm-project#24826)
  [Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (vllm-project#24783)
  [Misc] Improve `s3_utils` type hints with `BaseClient` (vllm-project#24825)
  [Multi Modal][Performance] Fused Q,K's apply_rope into one (vllm-project#24511)
  [Chore] Minor simplification for non-PP path (vllm-project#24810)
  [Minor] Simplify duplicative device check for cuda (vllm-project#24793)
  Remove redundant assignment in xfer_buffers, This is a little fix (vllm-project#24732)
  [CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (vllm-project#24771)
  [Doc]: fix typos in various files (vllm-project#24798)
  [Misc] Correct an outdated comment. (vllm-project#24765)
  [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (vllm-project#24750)
  [Core][Multimodal] Cache `supports_kw` (vllm-project#24773)
  [Kernels][DP/EP] Optimize Silu Kernel for R1 (vllm-project#24054)
  [Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (vllm-project#24757)
  [Doc]: Remove 404 hyperlinks (vllm-project#24785)
  ...
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Sep 15, 2025
cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025
cboss6 pushed a commit to cboss6/vllm that referenced this pull request Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants