D82005826: [vllm][gptoss] pass toolcall turn to kv cache mgr #24787
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request introduces a toolcall_turn parameter to track turn numbers in tool-calling conversations, enabling more detailed prefix cache statistics. The parameter is correctly passed through the necessary layers to the KV cache manager, where it's used to collect stats for subsequent tool-call turns. The implementation is sound. I have one minor style suggestion to align with Python best practices.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Juechen Liu <[email protected]>
@@ -56,6 +56,7 @@ def generate(
     lora_request: Optional[LoRARequest] = None,
     trace_headers: Optional[Mapping[str, str]] = None,
     priority: int = 0,
+    toolcall_turn: Optional[int] = None,
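For context, here is a minimal sketch of how a KV cache manager could use such a turn number to split prefix-cache hit stats between the first and subsequent tool-call turns. All names here (`PrefixCacheStats`, `KVCacheManagerSketch`, `record_hit`) are hypothetical illustrations, not the actual vLLM implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefixCacheStats:
    # Hypothetical counters mirroring the stats described in this PR.
    hits: int = 0
    toolcall_non_1st_turn_hits: int = 0


class KVCacheManagerSketch:
    def __init__(self) -> None:
        self.prefix_cache_stats = PrefixCacheStats()

    def record_hit(self, num_cached_tokens: int,
                   toolcall_turn: Optional[int] = None) -> None:
        # Count every prefix-cache hit; additionally bucket hits that
        # occur on tool-call turns after the first one (turn > 0).
        self.prefix_cache_stats.hits += num_cached_tokens
        if toolcall_turn is not None and toolcall_turn > 0:
            self.prefix_cache_stats.toolcall_non_1st_turn_hits += num_cached_tokens


mgr = KVCacheManagerSketch()
mgr.record_hit(128, toolcall_turn=0)  # first turn: not bucketed separately
mgr.record_hit(912, toolcall_turn=3)  # later turn: bucketed
```

The key point is that only the manager needs the turn number; requests without one (`toolcall_turn=None`) behave exactly as before.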
I don't think it's a good idea to expose "toolcall" as a concept to the engine; it's better to let the engine focus on single-turn scheduling while we influence its scheduling behavior through setting priority.
Is it possible to prevent this? Like introducing some API-server-level stats?
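One way to realize this alternative, sketched with hypothetical names (not actual vLLM code): keep a per-conversation turn counter at the API-server layer and aggregate the cache stats there, so the engine never sees a "toolcall" concept at all:

```python
from collections import defaultdict


class ApiServerToolcallStats:
    """Hypothetical API-server-level tracker; an illustrative sketch only."""

    def __init__(self) -> None:
        self._turns: dict[str, int] = defaultdict(int)
        self.non_1st_turn_hits = 0

    def on_request(self, conversation_id: str, cached_tokens: int) -> int:
        # Each request for a conversation advances its turn counter;
        # cache hits from turn > 0 are aggregated server-side.
        turn = self._turns[conversation_id]
        self._turns[conversation_id] += 1
        if turn > 0:
            self.non_1st_turn_hits += cached_tokens
        return turn


stats = ApiServerToolcallStats()
stats.on_request("conv-1", cached_tokens=0)    # turn 0: not counted
stats.on_request("conv-1", cached_tokens=912)  # turn 1: counted
```

This keeps the engine API unchanged and confines the multi-turn bookkeeping to the serving layer, at the cost of the server needing a way to attribute cached-token counts back to conversations.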
Makes sense, thanks! Let me close this PR for now and have more discussions.
Summary:
Pass the tool-call turn to the engine so that we can collect cache stats for turn > 0 in following changes.
Test Plan:
Run vLLM locally with debug logging; we can see that the turn is successfully passed in:
(EngineCore_0 pid=910425) INFO 09-09 00:55:16 [kv_cache_manager.py:185] ======kingsmad: current turn is request.toolcall_turn=3
(EngineCore_0 pid=910425) INFO 09-09 00:55:16 [kv_cache_manager.py:186] ======kingsmad: writing stats as hits self.prefix_cache_stats.toolcall_non_1st_turn_hits=912
Reviewers: @yeqcharlotte
Subscribers:
Tasks:
Tags: