Add OpenAiClient stream handling, update types #230
Conversation
Some small initial comments, thanks for the addition!
The stream error handling probably needs more work; we can do this as part of the orchestrator chat implementation (one possible shape is sketched below).
Let's make sure to note this as part of #192 or another appropriate issue?
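To make the follow-up concrete, here is a minimal sketch of one way mid-stream errors could be kept in-band; `ClientError`, `ChatCompletionChunk`, and `into_typed_stream` are illustrative names for this sketch, not the PR's actual types:

```rust
use futures::{future, stream, Stream, StreamExt};
use serde::Deserialize;

/// Placeholder error type; the real client would have its own.
#[derive(Debug)]
enum ClientError {
    Deserialization(serde_json::Error),
}

/// Minimal stand-in for a streaming chunk; only `id` is modeled here.
#[derive(Debug, Deserialize)]
struct ChatCompletionChunk {
    id: String,
}

/// Turn raw SSE `data:` payloads into typed chunks, terminating at
/// `[DONE]` and keeping deserialization failures in-band as `Err` items
/// so the consumer sees a typed error instead of a truncated stream.
fn into_typed_stream(
    raw: impl Stream<Item = String>,
) -> impl Stream<Item = Result<ChatCompletionChunk, ClientError>> {
    raw.take_while(|data| future::ready(data.trim() != "[DONE]"))
        .map(|data| {
            serde_json::from_str::<ChatCompletionChunk>(&data)
                .map_err(ClientError::Deserialization)
        })
}

#[tokio::main]
async fn main() {
    let raw = stream::iter([
        r#"{"id":"chatcmpl-123"}"#.to_string(),
        "not json".to_string(), // surfaces as an Err item, not a panic
        "[DONE]".to_string(),
    ]);
    let mut typed = Box::pin(into_typed_stream(raw));
    while let Some(item) = typed.next().await {
        println!("{item:?}");
    }
}
```

Yielding `Result` items would let the orchestrator chat implementation decide whether to abort or surface a partial response.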
Force-pushed from a4986e4 to 5d9b9bf.
Force-pushed from f8d7402 to 5d5a0f0.
LGTM with nit
```rust
pub tool_calls: Vec<ToolCall>,
/// The role of the author of this message.
#[serde(skip_serializing_if = "Option::is_none")]
pub role: Option<String>,
```
nit: is `role` actually optional? I didn't happen to see an "or null" on https://platform.openai.com/docs/api-reference/chat/streaming
I made it `Option` for the `ChatCompletionDelta` object (used in streaming responses), since their documentation has examples without it included, e.g. the second one below.

```json
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o-mini", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o-mini", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
```
I just remembered. I believe this is optional (although not properly documented) for the streaming responses because the chunk is part of the assistant's response. In the example above, `{"content":"Hello"}` is part of the assistant's response, so omitting `{"role":"assistant"}` from each chunk keeps the messages smaller. The sketch below shows the round trip.
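A minimal sketch of the behavior under discussion, using an assumed two-field subset of the delta rather than the full `ChatCompletionDelta`: a missing `role` deserializes to `None`, and `skip_serializing_if` keeps it out of re-serialized output:

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Delta {
    #[serde(skip_serializing_if = "Option::is_none")]
    role: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    content: Option<String>,
}

fn main() -> Result<(), serde_json::Error> {
    // First chunk's delta: "role" is present.
    let first: Delta = serde_json::from_str(r#"{"role":"assistant","content":""}"#)?;
    assert_eq!(first.role.as_deref(), Some("assistant"));

    // Later chunk's delta: "role" is omitted and deserializes to None.
    let later: Delta = serde_json::from_str(r#"{"content":"Hello"}"#)?;
    assert!(later.role.is_none());

    // None round-trips without emitting a "role" key.
    assert_eq!(serde_json::to_string(&later)?, r#"{"content":"Hello"}"#);
    Ok(())
}
```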
This adds handling to return a stream response when `stream: true`, and applies type updates based on initial testing with vLLM & llama-3-8b-instruct. A hedged sketch of the streaming call shape follows below.

Notes:
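A minimal sketch of what a streaming chat completions call along these lines could look like; the function name, URL path, request body, and line-splitting logic are assumptions for illustration, not the PR's actual code (reqwest's `bytes_stream` requires its `stream` feature):

```rust
use futures::{Stream, TryStreamExt};
use serde_json::json;

/// Hypothetical helper: POST a chat completions request with
/// `stream: true` and return the SSE `data:` payloads as a stream.
async fn chat_completions_stream(
    client: &reqwest::Client,
    base_url: &str,
) -> Result<impl Stream<Item = Result<String, reqwest::Error>>, reqwest::Error> {
    let response = client
        .post(format!("{base_url}/v1/chat/completions"))
        .json(&json!({
            "model": "llama-3-8b-instruct",
            "messages": [{"role": "user", "content": "Hello"}],
            // ask the server to respond with server-sent events
            "stream": true
        }))
        .send()
        .await?
        .error_for_status()?;

    // Naive framing: real code would buffer across chunk boundaries
    // instead of assuming each network chunk holds whole `data:` lines.
    Ok(response.bytes_stream().map_ok(|bytes| {
        String::from_utf8_lossy(&bytes)
            .lines()
            .filter_map(|line| line.strip_prefix("data: "))
            .map(str::to_owned)
            .collect::<Vec<_>>()
            .join("\n")
    }))
}
```

Terminating at `[DONE]` and parsing into typed chunks could then reuse something like the error-handling sketch earlier in the thread.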