
Conversation


@levunet levunet commented Sep 16, 2025

Purpose

This PR requires the following PRs to be merged first: #24768 and the harmony lib PR (openai/harmony#76).

The changes add 'with_recipient' and the assistant's 'analysis' content.
Without this content, the gpt-oss model had a higher probability of emitting abnormal tokens when calling tools.
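For context, the two pieces this change adds render roughly as follows in the harmony prompt format. This is a simplified string sketch for illustration only; the channel names and the "to=" header follow the published harmony format, but the actual encoding is done by the openai/harmony library, not by string formatting like this.

```python
def render_analysis(text: str) -> str:
    # The assistant's reasoning goes on the "analysis" channel; this is
    # the prior-turn content the PR now replays back to the model.
    return f"<|start|>assistant<|channel|>analysis<|message|>{text}<|end|>"

def render_tool_call(recipient: str, args_json: str) -> str:
    # with_recipient is what produces the "to=<tool>" header segment,
    # so replayed tool calls match the shape the model was trained on.
    return (f"<|start|>assistant<|channel|>commentary to={recipient} "
            f"<|constrain|>json<|message|>{args_json}<|call|>")

print(render_analysis("User wants the weather; call the tool."))
print(render_tool_call("functions.get_weather", '{"city": "Seoul"}'))
```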

Test Plan

gpt-oss_test.py
messages.txt

Run python3 gpt-oss_test.py about 10 times.

Test Result

(before)
[screenshot]

(after applying the harmony lib changes)
[screenshot]


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a bug in gpt-oss model tool calls by adding 'analysis' content and with_recipient. The changes are logical and align with the stated purpose. However, I've identified a high-severity issue where the handling of 'analysis' content is not robust for multimodal inputs, which could lead to a runtime crash. I have provided a code suggestion to address this.

Comment on lines +216 to +219


Severity: high

The logic to extract content is not robust. The content of an assistant message can be a list of parts (e.g., for multimodal inputs), not just a string. The current implementation content = chat_msg.get("content") or "" will cause a runtime error if content is a non-empty list, as Message.from_role_and_content expects a string. The code should handle the case where content is a list by extracting the text parts, similar to how it's handled for other message types in this file.

        content = chat_msg.get("content")
        if isinstance(content, list):
            # Extract text from multimodal content
            content = "\n".join(
                p.get("text", "") for p in content
                if isinstance(p, dict) and p.get("type") == "text")
        elif not isinstance(content, str):
            content = ""

        analysis_msg = Message.from_role_and_content(Role.ASSISTANT, content)
        analysis_msg = analysis_msg.with_channel("analysis")
        msgs.append(analysis_msg)
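The suggested extraction logic can be exercised on its own. Below is a self-contained sketch; the message shapes are hypothetical examples of the string and list-of-parts forms the OpenAI chat schema allows, not taken from the PR's test data.

```python
# Two shapes an assistant message's "content" can take: a plain string,
# or a list of typed parts (e.g. for multimodal inputs).
string_msg = {"role": "assistant", "content": "Checking the weather now."}
multimodal_msg = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Checking the weather now."},
        {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
        {"type": "text", "text": "Done."},
    ],
}

def extract_text(chat_msg):
    """Mirror of the suggested fix: collapse list-of-parts content to a
    string, keeping only text parts, and fall back to "" otherwise."""
    content = chat_msg.get("content")
    if isinstance(content, list):
        content = "\n".join(
            p.get("text", "") for p in content
            if isinstance(p, dict) and p.get("type") == "text")
    elif not isinstance(content, str):
        content = ""
    return content

print(extract_text(string_msg))      # Checking the weather now.
print(extract_text(multimodal_msg))  # text parts joined with "\n"
```

With the original `content = chat_msg.get("content") or ""`, the multimodal case would have passed a list straight into Message.from_role_and_content, which is the crash this comment flags.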

@levunet levunet force-pushed the feat/gptoss-tool-fix branch 2 times, most recently from c3780bf to eed8815 on September 16, 2025 09:19

levunet commented Sep 16, 2025

@alecsolder
In the harmony library, the ' to' value should have been encoded as id 316, but occasionally it was incorrectly encoded as a mixture of 220+935 id values, which caused the model to output incorrect tokens.
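The mismatch can be illustrated with a toy greedy tokenizer. The three-entry vocab below is a deliberate simplification that reuses the IDs from this comment; it is not the real o200k_harmony table.

```python
# Toy vocab using the IDs quoted above (hypothetical simplification).
VOCAB = {" to": 316, " ": 220, "to": 935}

def encode_longest_match(text, vocab):
    """Greedy longest-match encoding: always prefer the longest piece,
    so " to" becomes the single token 316."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

def encode_presplit(pieces, vocab):
    # Mimics the buggy path: the text arrives already split, so the
    # " " + "to" pair can never merge back into the single " to" token.
    return [vocab[p] for p in pieces]

print(encode_longest_match(" to", VOCAB))   # [316]
print(encode_presplit([" ", "to"], VOCAB))  # [220, 935]
```

A 220+935 sequence in a tool-call header is one the model rarely saw in training, which is consistent with the abnormal outputs described in this PR.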

The changes include adding 'with_recipient' and the Assistant's 'analysis' content.
Without adding this content, there was an issue where the gpt-oss model had a higher probability of outputting abnormal tokens when calling tools.

Signed-off-by: kyt <[email protected]>

alecsolder commented Sep 18, 2025

In your test script, I see you're using the streaming completions endpoint, which I don't think uses the harmony_utils method you modified. I just want to double-check that I'm reading it right.

@alecsolder

Also, thanks for mentioning that huggingface tokenizer change, I hadn't seen it and my snapshot was out of date!


levunet commented Sep 19, 2025

Thank you for checking. I've double-checked the part you mentioned.
I can confirm that the parse_chat_input I modified in harmony_utils.py is used to parse request.messages through the flow /v1/chat/completions -> create_chat_completion -> _make_request_with_harmony -> parse_chat_input.
The streaming endpoint also goes through this modified method.

Labels: frontend, gpt-oss (Related to GPT-OSS models)
Projects: Status: To Triage