[Core][Multimodal] Cache supports_kw
#24773
Conversation
Signed-off-by: Lukas Geiger <[email protected]>
Code Review

This pull request introduces a performance optimization by caching the results of the `supports_kw` utility function using `@lru_cache`. This is a good change, as `inspect.signature` can be slow. However, I've identified a potential regression: applying `@lru_cache` directly can cause a `TypeError` if the function is called with an unhashable callable. I've left a comment with a suggestion to make the implementation more robust.
@@ -2082,6 +2082,7 @@ async def _run_task_with_lock(task: Callable, lock: asyncio.Lock, *args,
         return await task(*args, **kwargs)


+@lru_cache
Applying `@lru_cache` directly to `supports_kw` can lead to a `TypeError` if it's called with an unhashable `callable` argument. While common callables like functions and methods are hashable, this isn't guaranteed for all callables (e.g., a class instance with `__call__` and `__hash__ = None`). This change could introduce a regression, causing runtime errors where the code previously worked.

To make this more robust, consider handling the `TypeError` and falling back to an uncached call for unhashable arguments. This could be done by wrapping the cached function call in a `try...except` block.
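The suggested fallback pattern can be sketched as follows. This is a minimal illustration of the reviewer's suggestion, not vLLM's actual implementation: the real `supports_kw` takes additional parameters, and the helper names here are hypothetical.

```python
import inspect
from functools import lru_cache


@lru_cache
def _supports_kw_cached(callable_obj, kw_name: str) -> bool:
    # Cached fast path; raises TypeError if callable_obj is unhashable.
    return kw_name in inspect.signature(callable_obj).parameters


def supports_kw(callable_obj, kw_name: str) -> bool:
    """Check whether `callable_obj` accepts a keyword named `kw_name`."""
    try:
        return _supports_kw_cached(callable_obj, kw_name)
    except TypeError:
        # Unhashable callables (e.g. instances with __hash__ = None)
        # cannot be used as lru_cache keys; inspect the signature directly.
        return kw_name in inspect.signature(callable_obj).parameters
```

With this wrapper, hashable callables hit the cache while unhashable ones silently fall back to the uncached path.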
Force-pushed the branch from 3e4beff to 1a0052e
Makes sense, thanks
Signed-off-by: Lukas Geiger <[email protected]>
…to loader

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (123 commits)
  [Hybrid Allocator] Support Pipeline Parallel (vllm-project#23974)
  [Spec Decoding] Support Spec Decoding Metrics in DP Mode (vllm-project#24049)
  [Chore] Remove ipex_ops warning (vllm-project#24835)
  Force use C++17 globally to avoid compilation error (vllm-project#24823)
  [Benchmarks] Throw usage error when using dataset-name random and dataset-path together (vllm-project#24819)
  fix type of sampling rate for encode_base64 (vllm-project#24826)
  [Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (vllm-project#24783)
  [Misc] Improve `s3_utils` type hints with `BaseClient` (vllm-project#24825)
  [Multi Modal][Performance] Fused Q,K's apply_rope into one (vllm-project#24511)
  [Chore] Minor simplification for non-PP path (vllm-project#24810)
  [Minor] Simplify duplicative device check for cuda (vllm-project#24793)
  Remove redundant assignment in xfer_buffers, This is a little fix (vllm-project#24732)
  [CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (vllm-project#24771)
  [Doc]: fix typos in various files (vllm-project#24798)
  [Misc] Correct an outdated comment. (vllm-project#24765)
  [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (vllm-project#24750)
  [Core][Multimodal] Cache `supports_kw` (vllm-project#24773)
  [Kernels][DP/EP] Optimize Silu Kernel for R1 (vllm-project#24054)
  [Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (vllm-project#24757)
  [Doc]: Remove 404 hyperlinks (vllm-project#24785)
  ...
`supports_kw` is surprisingly slow due to `inspect.signature`. Since it's called multiple times on every request (every time `get_hf_processor` is called), I think it makes sense to cache it. Let me know if this could lead to problems.
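To illustrate why caching pays off here: `inspect.signature` rebuilds a `Signature` object on every call, so repeated keyword checks on the same callable do redundant work that `lru_cache` can absorb. A minimal, standalone demonstration (not code from the PR; `check` and `f` are illustrative names):

```python
import inspect
import timeit
from functools import lru_cache


def check(fn, name: str) -> bool:
    # Uncached: re-inspects the signature on every call.
    return name in inspect.signature(fn).parameters


# Cached variant: signature inspection runs once per (fn, name) pair.
cached_check = lru_cache(check)


def f(a, b):
    pass


uncached_time = timeit.timeit(lambda: check(f, "a"), number=1000)
cached_time = timeit.timeit(lambda: cached_check(f, "a"), number=1000)
# After the first call, cached_check is just a dict lookup, so it is
# typically much faster than re-running inspect.signature each time.
```

`cached_check.cache_info()` can be used to confirm hits after warm-up, which is useful when verifying the cache is actually being exercised per request.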