
Conversation

@lgeiger (Contributor) commented on Sep 12, 2025

`supports_kw` is surprisingly slow due to `inspect.signature`.

[Screenshot 2025-09-12 19:38:55]

Since it's called multiple times on every request (every time `get_hf_processor` is called), I think it makes sense to cache it. Let me know if this could lead to problems.
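A rough sketch of the idea (the exact signature and internals of `supports_kw` in vLLM may differ; this version is only illustrative):

```python
from functools import lru_cache
import inspect
from typing import Callable


@lru_cache
def supports_kw(callable_obj: Callable, kw_name: str) -> bool:
    # inspect.signature is comparatively expensive, so memoize the result
    # keyed on the (callable, keyword) pair.
    params = inspect.signature(callable_obj).parameters
    if kw_name in params:
        return True
    # Also treat callables that accept **kwargs as supporting the keyword.
    return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values())
```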

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a performance optimization by caching the results of the `supports_kw` utility function using `@lru_cache`. This is a good change, as `inspect.signature` can be slow. However, I've identified a potential regression: applying `@lru_cache` directly can cause a `TypeError` if the function is called with an unhashable callable. I've left a comment with a suggestion to make the implementation more robust.

```diff
@@ -2082,6 +2082,7 @@ async def _run_task_with_lock(task: Callable, lock: asyncio.Lock, *args,
     return await task(*args, **kwargs)


+@lru_cache
```

Severity: high

Applying `@lru_cache` directly to `supports_kw` can lead to a `TypeError` if it's called with an unhashable callable argument. While common callables like functions and methods are hashable, this isn't guaranteed for all callables (e.g., an instance of a class that defines `__call__` but sets `__hash__ = None`). This change could introduce a regression, causing runtime errors where the code previously worked.

To make this more robust, consider handling the `TypeError` and falling back to an uncached call for unhashable arguments. This could be done by wrapping the cached function call in a `try...except` block.
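For illustration, one way to structure that fallback (the helper names here are hypothetical, not the actual vLLM code):

```python
from functools import lru_cache
import inspect
from typing import Callable


def _supports_kw_uncached(callable_obj: Callable, kw_name: str) -> bool:
    # Plain, uncached signature check (stands in for the real logic).
    params = inspect.signature(callable_obj).parameters
    return kw_name in params or any(
        p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


_supports_kw_cached = lru_cache(_supports_kw_uncached)


def supports_kw(callable_obj: Callable, kw_name: str) -> bool:
    try:
        # Fast path: hashable callables get memoized results.
        return _supports_kw_cached(callable_obj, kw_name)
    except TypeError:
        # Unhashable callables (e.g. an instance whose class sets
        # __hash__ = None) cannot be used as cache keys, so fall back
        # to the uncached check instead of raising.
        return _supports_kw_uncached(callable_obj, kw_name)
```

This keeps the cached fast path for the common case while preserving the previous behavior for unhashable callables.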

@DarkLight1337 (Member) left a comment

Makes sense, thanks

@DarkLight1337 enabled auto-merge (squash) on September 13, 2025 05:54
@github-actions bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Sep 13, 2025
@DarkLight1337 merged commit 1da0f14 into vllm-project:main on Sep 13, 2025
52 checks passed
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Sep 14, 2025
845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
…to loader

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (123 commits)
  [Hybrid Allocator] Support Pipeline Parallel (vllm-project#23974)
  [Spec Decoding]Support Spec Decoding Metrics in DP Mode (vllm-project#24049)
  [Chore] Remove ipex_ops warning (vllm-project#24835)
  Force use C++17 globally to avoid compilation error (vllm-project#24823)
  [Benchmarks] Throw usage error when using dataset-name random and dataset-path together (vllm-project#24819)
  fix type of sampling rate for encode_base64 (vllm-project#24826)
  [Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (vllm-project#24783)
  [Misc] Improve `s3_utils` type hints with `BaseClient` (vllm-project#24825)
  [Multi Modal][Performance] Fused Q,K's apply_rope into one (vllm-project#24511)
  [Chore] Minor simplification for non-PP path (vllm-project#24810)
  [Minor] Simplify duplicative device check for cuda (vllm-project#24793)
  Remove redundant assignment in xfer_buffers, This is a little fix (vllm-project#24732)
  [CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (vllm-project#24771)
  [Doc]: fix typos in various files (vllm-project#24798)
  [Misc] Correct an outdated comment. (vllm-project#24765)
  [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (vllm-project#24750)
  [Core][Multimodal] Cache `supports_kw` (vllm-project#24773)
  [Kernels][DP/EP] Optimize Silu Kernel for R1 (vllm-project#24054)
  [Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (vllm-project#24757)
  [Doc]: Remove 404 hyperlinks (vllm-project#24785)
  ...