
Conversation

desmondcheongzx (Collaborator)

Changes Made

Adds vLLM as a provider for text embedding.

import daft
from daft.ai.provider import load_provider
from daft.functions.ai import embed_text

provider = load_provider("vllm")
model = "Qwen/Qwen3-Embedding-0.6B"

(
    daft.read_huggingface("Open-Orca/OpenOrca")
    .with_column("embedding", embed_text(daft.col("response"), provider=provider, model=model))
    .show()
)

@github-actions github-actions bot added the feat label Sep 4, 2025
@desmondcheongzx desmondcheongzx marked this pull request as ready for review September 4, 2025 07:57

@greptile-apps greptile-apps bot left a comment


Greptile Summary

This PR adds vLLM as a new text embedding provider to Daft's AI framework, following the established provider pattern used by existing providers like OpenAI and SentenceTransformers. The implementation includes four main components:

  1. Provider Class (daft/ai/vllm/provider.py): Implements VLLMProvider with the standard constructor pattern and a get_text_embedder method that returns a vLLMTextEmbedderDescriptor.

  2. Protocol Implementation (daft/ai/vllm/protocols/text_embedder.py): Contains the core embedding logic with vLLMTextEmbedderDescriptor and vLLMTextEmbedder classes. The descriptor handles model configuration and dimension detection using AutoConfig.from_pretrained(), while the embedder wraps a vLLM LLM instance for actual text embedding operations (see the first sketch below).

  3. Provider Registry Update (daft/ai/provider.py): Adds a load_vllm() function and registers it in the PROVIDERS dictionary, enabling users to instantiate the provider via load_provider("vllm") (see the second sketch below).

  4. Test Coverage (tests/ai/test_vllm.py): Provides comprehensive unit tests using mocking to validate the provider interface and embedding functionality without requiring actual vLLM dependencies.
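
For concreteness, here is a minimal sketch of the descriptor and embedder from item 2, assuming vLLM's LLM(task="embed") entry point and transformers' AutoConfig for dimension detection. Names and details are inferred from this summary, not copied from the PR:

from typing import Any

from transformers import AutoConfig


class vLLMTextEmbedderDescriptor:
    def __init__(self, model: str, options: dict[str, Any]):
        self.model = model
        self.options = options

    def get_dimensions(self) -> int:
        # Read the embedding width from the model config without loading weights.
        # hidden_size is the usual field for transformer models (an assumption here).
        return AutoConfig.from_pretrained(self.model).hidden_size

    def instantiate(self) -> "vLLMTextEmbedder":
        return vLLMTextEmbedder(self.model, **self.options)


class vLLMTextEmbedder:
    def __init__(self, model: str, **options: Any):
        from vllm import LLM  # lazy import; fails clearly if vLLM is not installed

        self.model = LLM(model=model, task="embed", **options)

    def embed_text(self, text: list[str]) -> list[list[float]]:
        # vLLM returns one EmbeddingRequestOutput per input prompt.
        outputs = self.model.embed(text)
        return [o.outputs.embedding for o in outputs]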

The implementation integrates seamlessly with Daft's existing AI infrastructure, allowing users to leverage vLLM's high-performance inference capabilities for text embedding tasks. vLLM is known for its optimized inference engine for large language models, making this addition valuable for production environments requiring fast embedding generation. The code follows established patterns for lazy dependency loading, consistent interfaces, and proper error handling with ProviderImportError when vLLM is not installed.
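
And a sketch of item 3's registry hook, assuming PROVIDERS maps provider names to loader functions; the ProviderImportError constructor signature is an assumption, though the summary names the error type:

# Sketch of the additions to daft/ai/provider.py (surrounding code elided).

def load_vllm(name: str | None = None, **options: Any) -> Provider:
    try:
        from daft.ai.vllm.provider import VLLMProvider
    except ImportError as e:
        raise ProviderImportError("vllm") from e  # signature assumed
    return VLLMProvider(name, **options)

PROVIDERS = {
    # ...existing entries such as "openai"...
    "vllm": load_vllm,
}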

Confidence score: 4/5

  • This PR is safe to merge with only minor issues that don't affect core functionality
  • Score reflects solid implementation following established patterns, but with a few inconsistencies in option handling and some missing validation
  • Pay close attention to daft/ai/vllm/provider.py for the unused _options parameter and daft/ai/vllm/protocols/text_embedder.py for potential error handling improvements

5 files reviewed, 2 comments



def __init__(self, name: str | None = None, **options: Any):
    self._name = name if name else "vllm"
    self._options = options
Contributor
style: The _options are stored but never used in get_text_embedder. Consider passing them to the descriptor like OpenAIProvider does, or remove if truly unused.
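
A sketch of that suggestion, forwarding the stored provider options into the descriptor; whether call-site options should take precedence over provider options is an assumption here:

def get_text_embedder(self, model: str | None = None, **options: Any) -> TextEmbedderDescriptor:
    from daft.ai.vllm.protocols.text_embedder import vLLMTextEmbedderDescriptor

    # Merge provider-level options with call-site options; call-site wins.
    return vLLMTextEmbedderDescriptor(
        model or "sentence-transformers/all-MiniLM-L6-v2",
        {**self._options, **options},
    )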

Comment on lines +55 to +58
def embed_text(self, text: list[str]) -> list[Embedding]:
    outputs = self.model.embed(text)
    embeddings = torch.tensor([o.outputs.embedding for o in outputs])
    return embeddings.cpu().numpy()
Contributor
logic: Return type annotation says list[Embedding] but returns numpy array. Should return embeddings.cpu().numpy().tolist() or update annotation.

Suggested change
  def embed_text(self, text: list[str]) -> list[Embedding]:
      outputs = self.model.embed(text)
      embeddings = torch.tensor([o.outputs.embedding for o in outputs])
-     return embeddings.cpu().numpy()
+     return embeddings.cpu().numpy().tolist()

Contributor

+1

The Embedding type is currently an alias to np.ndarray, but these protocols should return a list[np.ndarray].
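
Assuming Embedding remains an alias for np.ndarray, one shape that satisfies both the annotation and this comment (a sketch, not the merged code; the float32 dtype is an assumption):

import numpy as np

def embed_text(self, text: list[str]) -> list[Embedding]:
    outputs = self.model.embed(text)
    # One 1-D np.ndarray per input string instead of a single stacked 2-D array.
    return [np.asarray(o.outputs.embedding, dtype=np.float32) for o in outputs]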


codecov bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 87.71930% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.92%. Comparing base (81c2e1a) to head (1567b08).
⚠️ Report is 12 commits behind head on main.

Files with missing lines                   Patch %   Lines
daft/ai/provider.py                        16.66%    5 Missing ⚠️
daft/ai/vllm/protocols/text_embedder.py    97.22%    1 Missing ⚠️
daft/ai/vllm/provider.py                   93.33%    1 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5136      +/-   ##
==========================================
- Coverage   76.54%   71.92%   -4.63%     
==========================================
  Files         953      955       +2     
  Lines      130653   130581      -72     
==========================================
- Hits       100006    93918    -6088     
- Misses      30647    36663    +6016     
Files with missing lines                   Coverage Δ
daft/ai/vllm/protocols/text_embedder.py    97.22% <97.22%> (ø)
daft/ai/vllm/provider.py                   93.33% <93.33%> (ø)
daft/ai/provider.py                        47.16% <16.66%> (-8.39%) ⬇️

... and 155 files with indirect coverage changes



Comment on lines +24 to +27
def get_text_embedder(self, model: str | None = None, **options: Any) -> TextEmbedderDescriptor:
    from daft.ai.vllm.protocols.text_embedder import vLLMTextEmbedderDescriptor

    return vLLMTextEmbedderDescriptor(model or "sentence-transformers/all-MiniLM-L6-v2", options)
Contributor

We should include the provider name and options in the descriptor. Please see OpenAITextEmbedderDescriptor for an example. The provider name answers "who produced this descriptor?", and the options are useful when the provider requires late initialization of 'something', which for OpenAI is the client; the provider takes the client options, but ultimately they must be plumbed into the descriptor->model instantiation.
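
A sketch of that plumbing, modeled on the OpenAITextEmbedderDescriptor behavior described here; method and field names are assumptions:

class vLLMTextEmbedderDescriptor(TextEmbedderDescriptor):
    def __init__(self, provider_name: str, provider_options: dict[str, Any], model: str, model_options: dict[str, Any]):
        self.provider_name = provider_name        # answers "who produced this descriptor?"
        self.provider_options = provider_options  # kept for late initialization
        self.model = model
        self.model_options = model_options

    def get_provider(self) -> str:
        return self.provider_name

    def instantiate(self) -> TextEmbedder:
        # Options flow provider -> descriptor -> model instantiation.
        return vLLMTextEmbedder(self.model, **{**self.provider_options, **self.model_options})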

        self.options = options

    def embed_text(self, text: list[str]) -> list[Embedding]:
        outputs = self.model.embed(text)
Contributor

That easy huh? 😄

Do you know if we need to do any batching on our own? I'm fine with just firing away to vLLM until we learn more, mostly asking out of curiosity.

Collaborator Author

vLLM does batching internally. Tbf there's a lot more that can be done here, but I want to move on and support endpoints like gemini (and maybe hack together something for ev)
