[Bug] Fix gpt-oss missing tool content #24954
Conversation
Code Review
This pull request fixes a bug in gpt-oss model tool calls by adding 'analysis' content and with_recipient. The changes are logical and align with the stated purpose. However, I've identified a high-severity issue: the handling of 'analysis' content is not robust for multimodal inputs, which could lead to a runtime crash. I have provided a code suggestion to address this.
vllm/entrypoints/harmony_utils.py
The logic to extract content is not robust. The content of an assistant message can be a list of parts (e.g., for multimodal inputs), not just a string. The current implementation, content = chat_msg.get("content") or "", will cause a runtime error if content is a non-empty list, because Message.from_role_and_content expects a string. The code should handle the case where content is a list by extracting the text parts, similar to how it's handled for other message types in this file.
content = chat_msg.get("content")
if isinstance(content, list):
# Extract text from multimodal content
content = "\n".join(
p.get("text", "") for p in content
if isinstance(p, dict) and p.get("type") == "text")
elif not isinstance(content, str):
content = ""
analysis_msg = Message.from_role_and_content(Role.ASSISTANT, content)
analysis_msg = analysis_msg.with_channel("analysis")
msgs.append(analysis_msg)
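For illustration, the extraction logic from the suggestion above can be exercised as a standalone helper (the function name here is hypothetical; the real patch performs this inline in harmony_utils.py):

```python
def extract_text_content(content):
    """Normalize an OpenAI-style message `content` field to a plain string.

    Assistant message content may be a string, a list of multimodal
    parts, or None; only the {"type": "text"} parts are kept.
    """
    if isinstance(content, list):
        # Join the text of all {"type": "text", ...} parts.
        return "\n".join(
            p.get("text", "") for p in content
            if isinstance(p, dict) and p.get("type") == "text")
    if isinstance(content, str):
        return content
    return ""  # None or unexpected types fall back to an empty string


# String content passes through unchanged.
print(extract_text_content("thinking about the tool call"))
# Multimodal list content keeps only the text parts.
print(extract_text_content([
    {"type": "text", "text": "step 1"},
    {"type": "image_url", "image_url": {"url": "http://example/img.png"}},
    {"type": "text", "text": "step 2"},
]))
# None (missing content) becomes the empty string.
print(repr(extract_text_content(None)))
```

This is what prevents the crash: a list never reaches Message.from_role_and_content as-is.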
Force-pushed from c3780bf to eed8815
@alecsolder
The changes include adding 'with_recipient' and the Assistant's 'analysis' content. Without adding this content, there was an issue where the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools. Signed-off-by: kyt <[email protected]>
Force-pushed from eed8815 to 39d6e8e
In your test script, I see you're using the streaming completions endpoint, which I don't think uses the harmony_utils method you modified? I just want to double-check I'm reading it right.
Also, thanks for mentioning that huggingface tokenizer change, I hadn't seen it and my snapshot was out of date!
Thank you for checking. I've double-checked the part you mentioned.
Purpose
This PR requires the following PRs to be merged first: #24768 and the harmony lib PR (openai/harmony#76).
The changes include adding 'with_recipient' and the Assistant's 'analysis' content.
Without adding this content, there was an issue where the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.
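As a sketch of the pattern described above, the assistant's prior reasoning is re-emitted on the 'analysis' channel and each tool call is addressed to a recipient. Plain dicts stand in for the openai_harmony objects here (the real code uses Message.from_role_and_content, with_channel, and with_recipient), so this is illustrative only:

```python
def expand_assistant_tool_call(chat_msg):
    """Illustrative sketch: expand an assistant tool-call chat message
    into harmony-style messages. Dicts stand in for openai_harmony's
    Message objects; channel/recipient conventions follow the harmony
    format for gpt-oss."""
    msgs = []
    content = chat_msg.get("content") or ""
    if content:
        # Re-emit the assistant's reasoning on the 'analysis' channel so
        # the model sees its prior chain-of-thought before the tool call.
        msgs.append({"role": "assistant", "channel": "analysis",
                     "content": content})
    for call in chat_msg.get("tool_calls", []):
        fn = call["function"]
        # The tool call goes on the 'commentary' channel, addressed to
        # the tool via a recipient such as "functions.get_weather".
        msgs.append({"role": "assistant", "channel": "commentary",
                     "recipient": f"functions.{fn['name']}",
                     "content": fn["arguments"]})
    return msgs


example = {
    "role": "assistant",
    "content": "I should look up the weather first.",
    "tool_calls": [{"function": {"name": "get_weather",
                                 "arguments": '{"city": "Seoul"}'}}],
}
for m in expand_assistant_tool_call(example):
    print(m)
```

Without the 'analysis' message, the rendered prompt omits the reasoning that preceded the tool call, which is the gap this PR closes.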
Test Plan
gpt-oss_test.py
messages.txt
Run python3 gpt-oss_test.py about 10 times.
Test Result
(before) [screenshot]
(after applying the harmony lib changes) [screenshot]