
Conversation

@richiejp (Collaborator) commented Sep 10, 2025

Description

Add enough realtime API features to allow talking with an LLM using only audio.

Presently the realtime API only supports transcription, which is a minor use-case for it. This PR should allow it to be used as a basic voice assistant.

This PR ignores many of the options and edge cases. For example, it relies solely on server-side VAD to commit conversation items.
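As a sketch of what relying on server-side VAD looks like from the client's side, the snippet below builds a `session.update` event that enables `server_vad` turn detection, following the OpenAI-compatible realtime event shape. The exact fields LocalAI accepts (e.g. `silence_duration_ms`) are assumptions here, not confirmed by this PR.

```python
import json

# Hypothetical client-side sketch: enable server-side VAD so the server
# decides when a turn of audio is committed to the conversation, instead
# of the client sending an explicit commit. Event shape follows the
# OpenAI realtime API; the fields LocalAI honours may differ.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["audio", "text"],
        "turn_detection": {
            "type": "server_vad",        # server commits audio on silence
            "silence_duration_ms": 500,  # assumed tunable threshold
        },
    },
}

# This JSON text would be sent over the realtime websocket connection.
payload = json.dumps(session_update)
print(payload)
```

With this in place the client only has to stream audio; detecting end-of-turn and committing the buffered audio becomes the server's job.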

Notes for Reviewers

  • Configure a model pipeline or use a multi-modal model.
  • Commit client audio to the conversation
  • Generate a text response (optional)
  • Generate an audio response
  • Interrupt generation on voice detection?
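The checklist above can be sketched as a sequence of client events, again assuming OpenAI-compatible realtime event names (`input_audio_buffer.append`, `response.create`, `response.cancel`); whether LocalAI handles each of these is exactly what this PR is adding, so treat this as illustrative only.

```python
import base64
import json

# Hypothetical sketch of the client event flow described in the
# reviewer notes. With server_vad enabled, committing audio is
# automatic; interruption maps to cancelling the in-flight response.

def append_audio_event(pcm_bytes: bytes) -> str:
    """Stream a chunk of microphone audio to the server; with server-side
    VAD the server commits it to the conversation on detected silence."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    })

def response_create_event() -> str:
    """Ask the server to generate a response (text and/or audio)."""
    return json.dumps({
        "type": "response.create",
        "response": {"modalities": ["audio", "text"]},
    })

def cancel_event() -> str:
    """Interrupt generation, e.g. when the server reports
    input_audio_buffer.speech_started while a response is playing."""
    return json.dumps({"type": "response.cancel"})

# Example: 320 bytes of placeholder PCM audio streamed as one chunk.
chunk = append_audio_event(b"\x00\x01" * 160)
```

Each of these strings would be sent as a websocket text frame over the realtime connection; the server replies with its own events (audio deltas, transcripts, etc.).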

Fixes: #3714 (but we'll need follow-up issues)

Signed commits

  • Yes, I signed my commits.


netlify bot commented Sep 10, 2025

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 2eae0d9
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/68c3c9475b886b00083272f3
😎 Deploy Preview https://deploy-preview-6245--localai.netlify.app

@mudler mudler added the roadmap label Sep 11, 2025
@richiejp (Collaborator, Author) commented:
It's not clear to me if we have audio support in llama.cpp: ggml-org/llama.cpp#15194

@richiejp (Collaborator, Author) commented:
ggml-org/llama.cpp#13759

@richiejp (Collaborator, Author) commented:
ggml-org/llama.cpp#13784

Development

Successfully merging this pull request may close these issues:

Add support for realtime API