
Conversation

@declark1 (Collaborator) commented Oct 15, 2024

This adds handling to return a streaming response when `stream: true`, and applies type updates based on initial testing with vLLM & llama-3-8b-instruct.

Notes:

  • The stream error handling probably needs more work; we can do this as part of the orchestrator chat implementation (a sketch of the streaming branch itself is below)
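For context, a minimal sketch of the branching this adds, assuming an axum-style handler; the handler shape, types, and JSON bodies here are illustrative placeholders, not the PR's actual code:

```rust
// Sketch only. Branches on `stream: true`, returning server-sent events
// instead of a single JSON body. Assumes axum + futures; the request/response
// bodies are placeholders, not the PR's actual types.
use std::convert::Infallible;

use axum::response::{
    sse::{Event, Sse},
    IntoResponse, Json, Response,
};
use futures::stream;

async fn chat_completions(Json(request): Json<serde_json::Value>) -> Response {
    // OpenAI treats a missing `stream` field as false.
    let streaming = request
        .get("stream")
        .and_then(|v| v.as_bool())
        .unwrap_or(false);

    if streaming {
        // Each event carries one `chat.completion.chunk` body; a real handler
        // would forward chunks from the backing model server (e.g. vLLM).
        let chunks = stream::iter([Ok::<_, Infallible>(
            Event::default().data(r#"{"object":"chat.completion.chunk"}"#),
        )]);
        Sse::new(chunks).into_response()
    } else {
        Json(serde_json::json!({ "object": "chat.completion" })).into_response()
    }
}
```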

@evaline-ju (Collaborator) left a comment:

some small initial comments, thanks for the addition!

> The stream error handling probably needs more work; we can do this as part of the orchestrator chat implementation

Let’s make sure to note this as part of #192 or whichever issue is appropriate?

@evaline-ju (Collaborator) left a comment:

LGTM with nit

```rust
/// The tool calls generated by the model, if any.
pub tool_calls: Vec<ToolCall>,
/// The role of the author of this message.
#[serde(skip_serializing_if = "Option::is_none")]
pub role: Option<String>,
```

nit: is `role` actually optional? I didn't happen to see an "or null" on https://platform.openai.com/docs/api-reference/chat/streaming

@declark1 (Collaborator, Author) commented Oct 30, 2024:

I made it an `Option` for the `ChatCompletionDelta` object (used in streaming responses), since their documentation has examples without it included, e.g. the second one below.

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o-mini", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o-mini", "system_fingerprint": "fp_44709d6fcb", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

@declark1 (Collaborator, Author) commented Oct 30, 2024:

I just remembered: I believe this is optional (although not clearly documented) for streaming responses because each chunk is part of the assistant's response. In the example above, `{"content":"Hello"}` is part of the assistant's response, so omitting `{"role":"assistant"}` from every chunk keeps the messages smaller.
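To make that concrete, a hypothetical client-side sketch (not part of the PR) of folding deltas into a single assistant message, picking up `role` from the first chunk that carries it:

```rust
// Hypothetical client-side sketch, not part of the PR: deltas fold into one
// assistant message, so only the first chunk needs to carry `role`.
#[derive(Default)]
struct AssembledMessage {
    role: Option<String>, // taken from the first chunk that carries it
    content: String,      // concatenation of every chunk's content
}

fn fold(mut msg: AssembledMessage, role: Option<&str>, content: Option<&str>) -> AssembledMessage {
    if msg.role.is_none() {
        msg.role = role.map(str::to_owned);
    }
    if let Some(chunk) = content {
        msg.content.push_str(chunk);
    }
    msg
}

fn main() {
    // The deltas from the two example chunks above, plus one more.
    let deltas = [
        (Some("assistant"), Some("")),
        (None, Some("Hello")),
        (None, Some(" there")),
    ];
    let msg = deltas
        .into_iter()
        .fold(AssembledMessage::default(), |m, (r, c)| fold(m, r, c));

    assert_eq!(msg.role.as_deref(), Some("assistant"));
    assert_eq!(msg.content, "Hello there");
}
```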

@declark1 merged commit 521c80f into foundation-model-stack:main on Oct 30, 2024
2 checks passed