Collaborator

@richiejp richiejp commented Sep 2, 2025

Description

This exposes Whisper's tdrz (tinydiarize) option, which enables speaker segmentation. It only works with the small.en-tdrz Whisper model.

When enabled, it produces output like the following (taken from whisper.cpp's README):

[00:00:00.000 --> 00:00:03.800]   Okay Houston, we've had a problem here. [SPEAKER_TURN]
[00:00:03.800 --> 00:00:06.200]   This is Houston. Say again please. [SPEAKER_TURN]
[00:00:06.200 --> 00:00:08.260]   Uh Houston we've had a problem.
[00:00:08.260 --> 00:00:11.320]   We've had a main beam up on a volt. [SPEAKER_TURN]
[00:00:11.320 --> 00:00:13.820]   Roger main beam interval. [SPEAKER_TURN]
[00:00:13.820 --> 00:00:15.100]   Uh uh [SPEAKER_TURN]
[00:00:15.100 --> 00:00:18.020]   So okay stand, by thirteen we're looking at it. [SPEAKER_TURN]
[00:00:18.020 --> 00:00:25.740]   Okay uh right now uh Houston the uh voltage is uh is looking good um.
[00:00:27.620 --> 00:00:29.940]   And we had a a pretty large bank or so.

It can be enabled by passing the -d or --diarize flag to the CLI, or by setting diarize: true in JSON HTTP requests.
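
For anyone consuming this output, here is a minimal sketch of grouping the timestamped segments into speaker turns. It assumes the plain-text format shown in the sample above, with [SPEAKER_TURN] appearing at the end of the last segment of a turn; whether the HTTP response carries the marker in the same way is an assumption. The names group_turns and Turn are hypothetical helpers, not part of LocalAI or whisper.cpp.

```python
import re
from dataclasses import dataclass, field

# Matches one transcript line in the sample format: "[start --> end]   text"
LINE_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)
TURN_MARKER = "[SPEAKER_TURN]"


@dataclass
class Turn:
    """Hypothetical container: one uninterrupted stretch by a single speaker."""
    start: str
    end: str
    texts: list = field(default_factory=list)


def group_turns(transcript: str) -> list:
    """Group segments into turns, closing a turn after each marker."""
    turns = []
    current = None
    for raw in transcript.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        start, end, text = match.groups()
        text = text.rstrip()
        ends_turn = text.endswith(TURN_MARKER)
        if ends_turn:
            text = text[: -len(TURN_MARKER)]
        if current is None:
            current = Turn(start=start, end=end)
            turns.append(current)
        current.end = end
        current.texts.append(text.strip())
        if ends_turn:
            current = None  # the next segment starts a new turn
    return turns


# First four lines of the sample output above.
SAMPLE = """
[00:00:00.000 --> 00:00:03.800]   Okay Houston, we've had a problem here. [SPEAKER_TURN]
[00:00:03.800 --> 00:00:06.200]   This is Houston. Say again please. [SPEAKER_TURN]
[00:00:06.200 --> 00:00:08.260]   Uh Houston we've had a problem.
[00:00:08.260 --> 00:00:11.320]   We've had a main beam up on a volt. [SPEAKER_TURN]
"""

if __name__ == "__main__":
    for index, turn in enumerate(group_turns(SAMPLE)):
        print(f"turn {index}: {turn.start} -> {turn.end}: {' '.join(turn.texts)}")
```

The turn index here is only a counter. The marker says that the speaker changed, not who is speaking, so two turns with different indices may still belong to the same person.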

Notes for Reviewers

Fixes #3374

Signed commits

  • Yes, I signed my commits.

netlify bot commented Sep 2, 2025

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 0e7e34e
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/68c00ac6608c390008ed0d6c
😎 Deploy Preview https://deploy-preview-6184--localai.netlify.app
@richiejp richiejp force-pushed the feat/whisper-tdrz branch 2 times, most recently from a6e9c9d to 2d276a0 September 8, 2025 19:40
@richiejp richiejp marked this pull request as ready for review September 9, 2025 11:06
Collaborator Author

richiejp commented Sep 9, 2025

I removed the fixes for #1648 because that ticket mentions a bunch of stuff that is way more feature-rich than tinydiarize. This only tells us that the speaker has changed; we don't get an embedding identifying the speaker or anything like that.

@mudler mudler merged commit 37f5e4f into mudler:master Sep 10, 2025
38 checks passed
Development

Successfully merging this pull request may close these issues.

whisper-diarization