
Conversation

kingsmad

@kingsmad kingsmad commented Sep 12, 2025

Summary:

Pass the tool-call turn number to the engine so that cache stats for turns > 0 can be collected in follow-up changes.

Test Plan:
Run vLLM locally with debug logging; the log shows the turn number is successfully passed in:

(EngineCore_0 pid=910425) INFO 09-09 00:55:16 [kv_cache_manager.py:185] ======kingsmad: current turn is request.toolcall_turn=3
(EngineCore_0 pid=910425) INFO 09-09 00:55:16 [kv_cache_manager.py:186] ======kingsmad: writing stats as hits self.prefix_cache_stats.toolcall_non_1st_turn_hits=912
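Based on the log lines above, the stats update in `kv_cache_manager.py` presumably looks something like the following sketch. Only `toolcall_turn` and `toolcall_non_1st_turn_hits` come from the log; the class shape, method names, and the assumption that the first turn is 0 are illustrative guesses, not the actual diff:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefixCacheStats:
    # Hypothetical container mirroring the counters named in the debug log.
    hits: int = 0
    toolcall_non_1st_turn_hits: int = 0


class KVCacheManager:
    """Minimal sketch of turn-aware hit accounting; not the real vLLM class."""

    def __init__(self) -> None:
        self.prefix_cache_stats = PrefixCacheStats()

    def record_hits(self, num_hits: int, toolcall_turn: Optional[int]) -> None:
        # Count every prefix-cache hit.
        self.prefix_cache_stats.hits += num_hits
        # For tool-call turns after the first (assuming the first turn is 0),
        # also bump the per-turn counter from the log line above.
        if toolcall_turn is not None and toolcall_turn > 0:
            self.prefix_cache_stats.toolcall_non_1st_turn_hits += num_hits
```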

Reviewers: @yeqcharlotte

Subscribers:

Tasks:

Tags:


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which executes a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added frontend gpt-oss Related to GPT-OSS models v1 labels Sep 12, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a toolcall_turn parameter to track turn numbers in tool-calling conversations, enabling more detailed prefix cache statistics. The parameter is correctly passed through the necessary layers to the KV cache manager, where it's used to collect stats for subsequent tool-call turns. The implementation is sound. I have one minor style suggestion to align with Python best practices.

kingsmad and others added 2 commits September 12, 2025 16:18
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Juechen Liu <[email protected]>
@@ -56,6 +56,7 @@ def generate(
lora_request: Optional[LoRARequest] = None,
trace_headers: Optional[Mapping[str, str]] = None,
priority: int = 0,
toolcall_turn: Optional[int] = None,
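The hunk above only shows the new `toolcall_turn` parameter on `generate`. A hypothetical caller-side view of how an orchestration layer might supply it per tool-call round is sketched below; `FakeEngine`, `run_tool_loop`, and the `TOOL_CALL` sentinel are stand-ins invented for illustration, not vLLM APIs:

```python
from typing import Optional


class FakeEngine:
    """Stand-in for the patched engine; records the turn numbers it receives."""

    def __init__(self) -> None:
        self.seen_turns: list[Optional[int]] = []

    def generate(self, prompt: str, request_id: str, priority: int = 0,
                 toolcall_turn: Optional[int] = None) -> str:
        self.seen_turns.append(toolcall_turn)
        # Pretend the model requests a tool on the first two turns only.
        return "TOOL_CALL" if toolcall_turn is not None and toolcall_turn < 2 else "final answer"


def run_tool_loop(engine: FakeEngine, prompt: str, request_id: str,
                  max_turns: int = 5) -> str:
    """Drive a multi-turn tool-calling conversation, tagging each engine
    request with its turn index so per-turn cache stats can be attributed."""
    output = ""
    for turn in range(max_turns):
        output = engine.generate(prompt, request_id=f"{request_id}-t{turn}",
                                 toolcall_turn=turn)
        if output != "TOOL_CALL":
            return output
        prompt += "\n[tool result]"  # append the (fake) tool output and retry
    return output
```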
Collaborator


I don't think it's a good idea to expose "toolcall" as a concept to the engine. It's better to let the engine focus on single-turn scheduling, while we influence its scheduling behavior through setting priority.

Is it possible to avoid this, e.g., by introducing some API-server-level stats?
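One way the reviewer's suggestion could look, as a sketch rather than anything from this PR: keep the engine turn-agnostic and aggregate per-turn prefix-cache hits at the API-server layer, where the tool-call loop already knows the turn number. All names here are hypothetical:

```python
from collections import defaultdict


class ApiServerCacheStats:
    """Hypothetical API-server-side aggregator: the engine reports raw hit
    counts per request, and the server maps them to tool-call turns itself."""

    def __init__(self) -> None:
        self.hits_by_turn: dict[int, int] = defaultdict(int)

    def record(self, toolcall_turn: int, num_hits: int) -> None:
        # Called by the tool-call loop after each engine request completes,
        # using a hit count the engine already exposes per request.
        self.hits_by_turn[toolcall_turn] += num_hits

    def non_first_turn_hits(self) -> int:
        # Same quantity the PR logged as toolcall_non_1st_turn_hits, computed
        # without the engine ever learning about turns (first turn assumed 0).
        return sum(h for turn, h in self.hits_by_turn.items() if turn > 0)
```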

Author


Makes sense, thanks! Let me close this PR for now and have more discussion.

@github-project-automation github-project-automation bot moved this from Backlog to In progress in gpt-oss Issues & Enhancements Sep 14, 2025
@kingsmad kingsmad closed this Sep 14, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in gpt-oss Issues & Enhancements Sep 14, 2025