[Core][Multimodal] Cache supports_kw
#24773
Conversation
Signed-off-by: Lukas Geiger <[email protected]>
Code Review

This pull request introduces a performance optimization by caching the results of the `supports_kw` utility function using `@lru_cache`. This is a good change, as `inspect.signature` can be slow. However, I've identified a potential regression: applying `@lru_cache` directly can cause a `TypeError` if the function is called with an unhashable callable. I've left a comment with a suggestion to make the implementation more robust.
@@ -2082,6 +2082,7 @@ async def _run_task_with_lock(task: Callable, lock: asyncio.Lock, *args,
         return await task(*args, **kwargs)


+@lru_cache
Applying `@lru_cache` directly to `supports_kw` can lead to a `TypeError` if it's called with an unhashable `callable` argument. While common callables like functions and methods are hashable, this isn't guaranteed for all callables (e.g., a class instance with `__call__` and `__hash__ = None`). This change could introduce a regression, causing runtime errors where the code previously worked.

To make this more robust, consider handling the `TypeError` and falling back to an uncached call for unhashable arguments. This could be done by wrapping the cached function call in a `try...except` block.
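The suggested fallback pattern can be sketched as follows. This is a minimal illustration of the reviewer's suggestion, not vLLM's actual implementation: the real `supports_kw` takes additional parameters, and the helper names here are hypothetical.

```python
import inspect
from functools import lru_cache


@lru_cache
def _supports_kw_cached(callable_obj, kw_name: str) -> bool:
    # Cached fast path; raises TypeError if callable_obj is unhashable.
    return kw_name in inspect.signature(callable_obj).parameters


def supports_kw(callable_obj, kw_name: str) -> bool:
    """Check whether `callable_obj` accepts a keyword named `kw_name`."""
    try:
        return _supports_kw_cached(callable_obj, kw_name)
    except TypeError:
        # Unhashable callables (e.g. instances with __hash__ = None)
        # cannot be used as lru_cache keys; inspect the signature directly.
        return kw_name in inspect.signature(callable_obj).parameters
```

With this wrapper, hashable callables hit the cache while unhashable ones silently fall back to the uncached path.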
Force-pushed the branch from 3e4beff to 1a0052e
Makes sense, thanks
Signed-off-by: Lukas Geiger <[email protected]>
…to loader

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (123 commits)
  [Hybrid Allocator] Support Pipeline Parallel (vllm-project#23974)
  [Spec Decoding] Support Spec Decoding Metrics in DP Mode (vllm-project#24049)
  [Chore] Remove ipex_ops warning (vllm-project#24835)
  Force use C++17 globally to avoid compilation error (vllm-project#24823)
  [Benchmarks] Throw usage error when using dataset-name random and dataset-path together (vllm-project#24819)
  fix type of sampling rate for encode_base64 (vllm-project#24826)
  [Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (vllm-project#24783)
  [Misc] Improve `s3_utils` type hints with `BaseClient` (vllm-project#24825)
  [Multi Modal][Performance] Fused Q,K's apply_rope into one (vllm-project#24511)
  [Chore] Minor simplification for non-PP path (vllm-project#24810)
  [Minor] Simplify duplicative device check for cuda (vllm-project#24793)
  Remove redundant assignment in xfer_buffers, This is a little fix (vllm-project#24732)
  [CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (vllm-project#24771)
  [Doc]: fix typos in various files (vllm-project#24798)
  [Misc] Correct an outdated comment. (vllm-project#24765)
  [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (vllm-project#24750)
  [Core][Multimodal] Cache `supports_kw` (vllm-project#24773)
  [Kernels][DP/EP] Optimize Silu Kernel for R1 (vllm-project#24054)
  [Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (vllm-project#24757)
  [Doc]: Remove 404 hyperlinks (vllm-project#24785)
  ...
`supports_kw` is surprisingly slow due to `inspect.signature`. Since it's called multiple times on every request (every time `get_hf_processor` is called), I think it makes sense to cache it. Let me know if this could lead to problems.
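To illustrate why caching pays off here: `inspect.signature` rebuilds a `Signature` object on every call, so repeated keyword checks on the same callable do redundant work that `lru_cache` can absorb. A minimal, standalone demonstration (not code from the PR; `check` and `f` are illustrative names):

```python
import inspect
import timeit
from functools import lru_cache


def check(fn, name: str) -> bool:
    # Uncached: re-inspects the signature on every call.
    return name in inspect.signature(fn).parameters


# Cached variant: signature inspection runs once per (fn, name) pair.
cached_check = lru_cache(check)


def f(a, b):
    pass


uncached_time = timeit.timeit(lambda: check(f, "a"), number=1000)
cached_time = timeit.timeit(lambda: cached_check(f, "a"), number=1000)
# After the first call, cached_check is just a dict lookup, so it is
# typically much faster than re-running inspect.signature each time.
```

`cached_check.cache_info()` can be used to confirm hits after warm-up, which is useful when verifying the cache is actually being exercised per request.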