Conversation

kozistr (Contributor) commented Jun 17, 2025

What does this PR do?

Fixes #642

Changes

  • left padding (with the pad token id)
  • causal attention mask
  • fix last-token pooling for the batch case (see the sketch after this list)
  • Currently, batch inference works fine, but single-query inference has a problem.
  • (updated) I've investigated this further and found that it's due to the attention logic.
    • self.config._attn_implementation in the Qwen3 Embedding config defaults to sdpa; if we change this value to eager, the naive implementation of self-attention, the output matches this PR as expected.
    • but sdpa seems to do something different; this needs more investigation.
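
A minimal sketch of the left-padding plus last-token-pooling idea described above, written against the transformers Python API rather than the TEI code in this PR; the model id and usage are illustrative assumptions:

```python
# Illustrative sketch (transformers API, not the TEI implementation from this PR).
# With left padding, the last non-padded token of every sequence sits at the
# final position, so last-token pooling for a batch is simply hidden[:, -1].
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "Qwen/Qwen3-Embedding-0.6B"  # example checkpoint, assumed

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModel.from_pretrained(model_id)
model.eval()

texts = ["What is the capital of France?", "Paris is the capital of France."]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    # Decoder-only model: causal attention plus the padding mask from the tokenizer.
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Left padding makes the batch case trivial: position -1 is always a real token.
embeddings = F.normalize(hidden[:, -1], dim=-1)
print(embeddings.shape)
```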

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil @alvarobartt

alvarobartt (Member) left a comment

Thanks for the PR @kozistr, and apologies I missed those points! LGTM

kozistr (Contributor, Author) commented Jun 18, 2025

@alvarobartt @Narsil thanks for the quick check!

I've investigated this issue further (single-query inference producing odd results) and found that it's due to the attention logic: self.config._attn_implementation in the Qwen3 Embedding config defaults to sdpa, and if we change this value to eager, the naive implementation of self-attention, the output matches this PR as expected.

It seems sdpa does something different; I need to investigate this further. A minimal way to compare the two is sketched below.
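
A minimal sketch of that comparison, assuming the checkpoint can be loaded through transformers (the model id is an example, not taken from this PR):

```python
# Illustrative sketch: compare eager vs. sdpa attention outputs for a single query.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Qwen/Qwen3-Embedding-0.6B"  # example checkpoint, assumed

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
inputs = tokenizer(["What is the capital of France?"], return_tensors="pt")

last_token = {}
for impl in ("eager", "sdpa"):
    model = AutoModel.from_pretrained(model_id, attn_implementation=impl)
    model.eval()
    with torch.no_grad():
        last_token[impl] = model(**inputs).last_hidden_state[:, -1]

# If the two attention implementations agree, this should be within float tolerance.
print(torch.max(torch.abs(last_token["eager"] - last_token["sdpa"])))
```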

I'll let you know when I find the root cause :) Thanks!

Narsil merged commit 1193add into huggingface:main on Jun 18, 2025
2 of 13 checks passed
kozistr deleted the fix/qwen3 branch on June 18, 2025 at 08:28

Successfully merging this pull request may close these issues.

Qwen3-Embedding models: embeddings from TEI differ sharply from Sentence-Transformers reference