@qandrew qandrew commented Sep 12, 2025

Purpose

Test Plan

curl http://localhost:20001/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "user",
            "content": "Hello."
        }
    ],
    "temperature": 0.7,
    "max_output_tokens": 256,
    "stream": true
}'
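
The streamed reply to a request like the one above arrives as `event:` / `data:` line pairs. A minimal sketch of reading such a stream in Python follows; `parse_sse` is a hypothetical helper (not part of vLLM or this PR), and it assumes each `data:` payload fits on a single line, as it does in the output pasted in this description.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from 'event:' / 'data:' lines.

    Assumption: every data payload is one JSON object on one line.
    """
    event_name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            # Fall back to the payload's own "type" if no event line preceded it.
            yield event_name or payload.get("type", ""), payload
            event_name = None


# One event copied from the test result in this description.
sample = (
    "event: response.output_text.delta\n"
    'data: {"content_index":1,"delta":"Hello",'
    '"item_id":"msg_c8b805f6ff3b42948a1debd89a2961eb","logprobs":[],'
    '"output_index":1,"sequence_number":25,'
    '"type":"response.output_text.delta"}\n'
)

parsed = list(parse_sse(sample.splitlines()))
for name, payload in parsed:
    print(name, payload["item_id"], repr(payload["delta"]))
```

Feeding the helper the lines of a live HTTP response instead of `sample.splitlines()` should work the same way, provided the server keeps each payload on one line.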

Test Result

Note: the item id and content_index change as expected when the stream moves from the reasoning item to the message item.

event: response.output_item.added
data: {"item":{"id":"msg_81a71aae8e5f4af0a7c5d2f0e6bc6324","summary":[],"type":"reasoning","content":null,"encrypted_content":null,"status":"in_progress"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}

event: response.content_part.added
data: {"content_index":0,"item_id":"msg_81a71aae8e5f4af0a7c5d2f0e6bc6324","output_index":0,"part":{"annotations":[],"text":"","type":"output_text","logprobs":[]},"sequence_number":3,"type":"response.content_part.added"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"The","item_id":"msg_81a71aae8e5f4af0a7c5d2f0e6bc6324","output_index":0,"sequence_number":4,"type":"response.reasoning_text.delta"}

event: response.output_item.done
data: {"item":{"id":"msg_81a71aae8e5f4af0a7c5d2f0e6bc6324","summary":[],"type":"reasoning","content":[{"text":"The user says \"Hello.\" Likely they want a greeting. We respond politely.","type":"reasoning_text"}],"encrypted_content":null,"status":"completed"},"output_index":1,"sequence_number":22,"type":"response.output_item.done"}

event: response.output_item.added
data: {"item":{"id":"msg_c8b805f6ff3b42948a1debd89a2961eb","content":[],"role":"assistant","status":"in_progress","type":"message"},"output_index":1,"sequence_number":23,"type":"response.output_item.added"}

event: response.content_part.added
data: {"content_index":1,"item_id":"msg_c8b805f6ff3b42948a1debd89a2961eb","output_index":1,"part":{"annotations":[],"text":"","type":"output_text","logprobs":[]},"sequence_number":24,"type":"response.content_part.added"}

event: response.output_text.delta
data: {"content_index":1,"delta":"Hello","item_id":"msg_c8b805f6ff3b42948a1debd89a2961eb","logprobs":[],"output_index":1,"sequence_number":25,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":1,"delta":"!","item_id":"msg_c8b805f6ff3b42948a1debd89a2961eb","logprobs":[],"output_index":1,"sequence_number":26,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":1,"delta":" How","item_id":"msg_c8b805f6ff3b42948a1debd89a2961eb","logprobs":[],"output_index":1,"sequence_number":27,"type":"response.output_text.delta"}

...
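
Because every delta event carries an `item_id`, a client can rebuild the text of each output item separately. The sketch below illustrates that idea with a hypothetical `collect_text` helper (an assumption for illustration, not code from this PR); the events are trimmed copies of the ones shown above, keeping only the fields the helper reads.

```python
from collections import defaultdict

# Event types whose "delta" field carries text, as seen in the output above.
DELTA_TYPES = {"response.output_text.delta", "response.reasoning_text.delta"}

REASONING_ID = "msg_81a71aae8e5f4af0a7c5d2f0e6bc6324"
MESSAGE_ID = "msg_c8b805f6ff3b42948a1debd89a2961eb"


def collect_text(events):
    """Concatenate delta strings per item_id, preserving arrival order."""
    text_by_item = defaultdict(str)
    for event in events:
        if event.get("type") in DELTA_TYPES:
            text_by_item[event["item_id"]] += event["delta"]
    return dict(text_by_item)


events = [
    {"type": "response.reasoning_text.delta", "item_id": REASONING_ID, "delta": "The"},
    {"type": "response.output_text.delta", "item_id": MESSAGE_ID, "delta": "Hello"},
    {"type": "response.output_text.delta", "item_id": MESSAGE_ID, "delta": "!"},
    {"type": "response.output_text.delta", "item_id": MESSAGE_ID, "delta": " How"},
]

texts = collect_text(events)
print(texts[MESSAGE_ID])  # Hello! How
```

Without a distinct id per item, the reasoning text and the final answer could not be told apart this way.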

OAI example:

ResponseCreatedEvent(response=Response(id='resp_68bf41ccb7f881a3b89e1bbc39dca02008013d49ada9bc0a', created_at=1757364684.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-2025-08-07', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort='medium', generate_summary=None, summary='detailed'), safety_identifier=None, service_tier='auto', status='in_progress', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs=0, truncation='disabled', usage=None, user=None, store=True), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_68bf41ccb7f881a3b89e1bbc39dca02008013d49ada9bc0a', created_at=1757364684.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-2025-08-07', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort='medium', generate_summary=None, summary='detailed'), safety_identifier=None, service_tier='auto', status='in_progress', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs=0, truncation='disabled', usage=None, user=None, store=True), sequence_number=1, type='response.in_progress')
ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseReasoningSummaryPartAddedEvent(item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, part=Part(text='', type='summary_text'), sequence_number=3, summary_index=0, type='response.reasoning_summary_part.added')
ResponseReasoningSummaryTextDeltaEvent(delta='**Multip', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=4, summary_index=0, type='response.reasoning_summary_text.delta', obfuscation='gunpSbbS')
ResponseReasoningSummaryTextDeltaEvent(delta='lying', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=5, summary_index=0, type='response.reasoning_summary_text.delta', obfuscation='Y5Doun05WUq')
ResponseReasoningSummaryTextDeltaEvent(delta=' with', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=6, summary_index=0, type='response.reasoning_summary_text.delta', obfuscation='fIXeF6P3dpv')
ResponseReasoningSummaryTextDeltaEvent(delta=' precision', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=7, summary_index=0, type='response.reasoning_summary_text.delta', obfuscation='EXuwuv')
...
ResponseReasoningSummaryTextDoneEvent(item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=105, summary_index=0, text='**Multiplying with precision**\n\nI need to multiply 3.2342 by 233.1123123. The user likely meant "multiply" when they wrote "multiple," so I’ll compute the product accurately. \n\nI\'ll start by using standard multiplication methods and breaking down the calculations into parts. First, I\'ll calculate b * 3 and then b * 0.2342. After summing up everything, I arrive at approximately 753.9318. Now I should double-check my work to ensure the accuracy of this result.', type='response.reasoning_summary_text.done')
ResponseReasoningSummaryPartDoneEvent(item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, part=Part(text='**Multiplying with precision**\n\nI need to multiply 3.2342 by 233.1123123. The user likely meant "multiply" when they wrote "multiple," so I’ll compute the product accurately. \n\nI\'ll start by using standard multiplication methods and breaking down the calculations into parts. First, I\'ll calculate b * 3 and then b * 0.2342. After summing up everything, I arrive at approximately 753.9318. Now I should double-check my work to ensure the accuracy of this result.', type='summary_text'), sequence_number=106, summary_index=0, type='response.reasoning_summary_part.done')
ResponseReasoningSummaryPartAddedEvent(item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, part=Part(text='', type='summary_text'), sequence_number=107, summary_index=1, type='response.reasoning_summary_part.added')
ResponseReasoningSummaryTextDeltaEvent(delta='**Ver', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=108, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='fAEej8BrrXI')
ResponseReasoningSummaryTextDeltaEvent(delta='ifying', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=109, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='y3VTtzA8kO')
ResponseReasoningSummaryTextDeltaEvent(delta=' multiplication', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=110, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='O')
ResponseReasoningSummaryTextDeltaEvent(delta=' accuracy', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=111, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='8EOQcwh')
ResponseReasoningSummaryTextDeltaEvent(delta='**\n\nI', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=112, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='PNZJtO5JRJd')
ResponseReasoningSummaryTextDeltaEvent(delta='’m', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=113, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='bD9H4GV0ImvosW')
ResponseReasoningSummaryTextDeltaEvent(delta=' considering', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=114, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='q7tS')
ResponseReasoningSummaryTextDeltaEvent(delta=' using', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=115, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='ozdIa1egB2')
ResponseReasoningSummaryTextDeltaEvent(delta=' Python', item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, sequence_number=116, summary_index=1, type='response.reasoning_summary_text.delta', obfuscation='ZqZS6kqzp')
...
ResponseReasoningSummaryPartDoneEvent(item_id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', output_index=0, part=Part(text='**Determining significant digits**\n\nI can confirm that the multiplication of \\( 3.2342 \\) (4 decimal places) and \\( 233.1123123 \\) (7 decimal places) results in a number with at most 11 decimal places. My computed result, \\( 753.93184044066 \\), indeed has 11 digits after the decimal. \n\nI should correct the user’s phrasing of "multiple" to "multiply." It’s good to be concise, so I’ll present the product clearly, maybe with a brief explanation or a one-liner to keep it straightforward. Let\'s go ahead and show that computed result!', type='summary_text'), sequence_number=371, summary_index=2, type='response.reasoning_summary_part.done')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='rs_68bf41cd40c081a3a772340ccea427bf08013d49ada9bc0a', summary=[Summary(text='**Multiplying with precision**\n\nI need to multiply 3.2342 by 233.1123123. The user likely meant "multiply" when they wrote "multiple," so I’ll compute the product accurately. \n\nI\'ll start by using standard multiplication methods and breaking down the calculations into parts. First, I\'ll calculate b * 3 and then b * 0.2342. After summing up everything, I arrive at approximately 753.9318. Now I should double-check my work to ensure the accuracy of this result.', type='summary_text'), Summary(text='**Verifying multiplication accuracy**\n\nI’m considering using Python for high-precision calculations of \\( a \\times b \\). First, I’ll confirm the previous result: \\( 3.2342 \\times 233.1123123 \\). I recalled that \\( 233.1123123 \\times 3 \\) equals approximately \\( 699.3369369 \\), which checks out. \n\nNow, I’ll look closely at calculating \\( 0.2342 \\times b \\). A direct computation gives around \\( 54.59490354066 \\), and it matches my previous sum. The total is accurately \\( 753.93184044066 \\), which I’ll present as the final rounded result.', type='summary_text'), Summary(text='**Determining significant digits**\n\nI can confirm that the multiplication of \\( 3.2342 \\) (4 decimal places) and \\( 233.1123123 \\) (7 decimal places) results in a number with at most 11 decimal places. My computed result, \\( 753.93184044066 \\), indeed has 11 digits after the decimal. \n\nI should correct the user’s phrasing of "multiple" to "multiply." It’s good to be concise, so I’ll present the product clearly, maybe with a brief explanation or a one-liner to keep it straightforward. Let\'s go ahead and show that computed result!', type='summary_text')], type='reasoning', content=None, encrypted_content=None, status=None), output_index=0, sequence_number=372, type='response.output_item.done')
ResponseOutputItemAddedEvent(item=ResponseOutputMessage(id='msg_68bf41e9d4f481a3a52a8532f944df3808013d49ada9bc0a', content=[], role='assistant', status='in_progress', type='message'), output_index=1, sequence_number=373, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='msg_68bf41e9d4f481a3a52a8532f944df3808013d49ada9bc0a', output_index=1, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=374, type='response.content_part.added')
ResponseTextDeltaEvent(content_index=0, delta='3', item_id='msg_68bf41e9d4f481a3a52a8532f944df3808013d49ada9bc0a', logprobs=[], output_index=1, sequence_number=375, type='response.output_text.delta', obfuscation='5UpTHBZr8ovDHaN')
ResponseTextDeltaEvent(content_index=0, delta='.', item_id='msg_68bf41e9d4f481a3a52a8532f944df3808013d49ada9bc0a', logprobs=[], output_index=1, sequence_number=376, type='response.output_text.delta', obfuscation='M130VTwll5a8aEc')
ResponseTextDeltaEvent(content_index=0, delta='234', item_id='msg_68bf41e9d4f481a3a52a8532f944df3808013d49ada9bc0a', logprobs=[], output_index=1, sequence_number=377, type='response.output_text.delta', obfuscation='An2UEHoufSxRq')
ResponseTextDeltaEvent(content_index=0, delta='2', item_id='msg_68bf41e9d4f481a3a52a8532f944df3808013d49ada9bc0a', logprobs=[], output_index=1, sequence_number=378, type='response.output_text.delta', obfuscation='Lt6gv4l7d2FzIxv')
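
One rough way to compare the shape of the vLLM stream with the OpenAI events above is to collapse each stream into runs of event types. The sketch below is only an illustration: `collapse_runs` and `event_type` are hypothetical helpers, and the sample list is a hand-abbreviated version of the vLLM output in this description (the elided events are not reconstructed).

```python
from itertools import groupby


def event_type(event):
    """Read the type from a dict payload or from an object with a .type attribute."""
    return event["type"] if isinstance(event, dict) else event.type


def collapse_runs(events):
    """Return [(type, count), ...] with consecutive repeats merged."""
    types = (event_type(e) for e in events)
    return [(t, sum(1 for _ in group)) for t, group in groupby(types)]


# Event types in the order they appear in the pasted vLLM output.
vllm_events = [
    {"type": "response.output_item.added"},
    {"type": "response.content_part.added"},
    {"type": "response.reasoning_text.delta"},
    {"type": "response.output_item.done"},
    {"type": "response.output_item.added"},
    {"type": "response.content_part.added"},
    {"type": "response.output_text.delta"},
    {"type": "response.output_text.delta"},
    {"type": "response.output_text.delta"},
]

shape = collapse_runs(vllm_events)
for name, count in shape:
    print(f"{count:3d} x {name}")
```

The same function should accept the typed event objects printed above, since it only reads their `type` attribute.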

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison of the results, or e2e results.
  • (Optional) Any necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Sep 12, 2025
Signed-off-by: Andrew Xia <[email protected]>
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-ids branch from 4beb7e5 to e366b57 Compare September 12, 2025 23:43
@qandrew qandrew marked this pull request as ready for review September 12, 2025 23:56
Signed-off-by: Andrew Xia <[email protected]>
)

current_item_id = ""
current_content_index = -1
Collaborator

This test case doesn't capture multiple subsequent streaming items?

Contributor Author

Hmm, I'm not sure I understand your question. I haven't enabled tool calling for streaming yet, so currently we're only testing the reasoning output -> final output item sequence.
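
The reasoning -> final output check discussed in this thread can be sketched as a small state tracker that mirrors the `current_item_id` and `current_content_index` variables in the snippet under review. This is an illustrative assumption about what such a test might look like, not the test added in this PR; `check_ids` is a hypothetical name, and the events are trimmed copies of the ones in the PR description.

```python
def check_ids(events):
    """Raise AssertionError if an event's item_id or content_index drifts
    from the most recently added item or content part."""
    current_item_id = ""
    current_content_index = -1
    for event in events:
        kind = event["type"]
        if kind == "response.output_item.added":
            # A new output item starts: remember its id, reset the part index.
            current_item_id = event["item"]["id"]
            current_content_index = -1
        elif kind == "response.content_part.added":
            assert event["item_id"] == current_item_id, kind
            current_content_index = event["content_index"]
        elif kind == "response.output_item.done":
            assert event["item"]["id"] == current_item_id, kind
        else:
            if "item_id" in event:
                assert event["item_id"] == current_item_id, kind
            if "content_index" in event:
                assert event["content_index"] == current_content_index, kind
    return current_item_id, current_content_index


R = "msg_81a71aae8e5f4af0a7c5d2f0e6bc6324"  # reasoning item id from the description
M = "msg_c8b805f6ff3b42948a1debd89a2961eb"  # message item id from the description

events = [
    {"type": "response.output_item.added", "item": {"id": R}},
    {"type": "response.content_part.added", "item_id": R, "content_index": 0},
    {"type": "response.reasoning_text.delta", "item_id": R, "content_index": 0},
    {"type": "response.output_item.done", "item": {"id": R}},
    {"type": "response.output_item.added", "item": {"id": M}},
    {"type": "response.content_part.added", "item_id": M, "content_index": 1},
    {"type": "response.output_text.delta", "item_id": M, "content_index": 1},
]

final_state = check_ids(events)
print(final_state)
```

A tool-calling stream would add further item types, which this sketch does not attempt to cover.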

Collaborator

@chaunceyjiang chaunceyjiang left a comment

Could you provide a sample response from the OpenAI online service?

Contributor Author

qandrew commented Sep 15, 2025

Could you provide a sample response from the OpenAI online service?

Yep, added to the description.

@qandrew qandrew changed the title from "[gpt-oss] streaming add item id, content id" to "[gpt-oss][1b] streaming add item id, content id" Sep 15, 2025
Collaborator

@chaunceyjiang chaunceyjiang left a comment

Thanks~

It looks consistent with OpenAI’s format.

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Sep 16, 2025
@chaunceyjiang chaunceyjiang self-assigned this Sep 16, 2025
@chaunceyjiang chaunceyjiang added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 16, 2025
@zou3519 zou3519 enabled auto-merge (squash) September 16, 2025 17:07
@zou3519 zou3519 merged commit f4d6eb9 into vllm-project:main Sep 16, 2025
44 checks passed
frank-wei pushed a commit to frank-wei/vllm that referenced this pull request Sep 23, 2025
langc23 pushed a commit to zte-riscv/vllm that referenced this pull request Sep 23, 2025
Labels
frontend, gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)
Projects
Status: Done
4 participants