feat: nanobind bindings #5961

Linda-Stadter · 2025-07-11T09:37:58Z

feat: nanobind bindings

This PR contains:

Enabling nanobind build in CI + nanobind unit tests in CI (commit: feat: nanobind bindings)
Nanobind bindings (commit: nanobind bindings)

Description

This migration is not complete and still contains some errors that will be resolved in upcoming PRs.

I also enabled unit tests for the nanobind build in the CI. Since the bindings still contain some errors, there are some unit tests failing. I am skipping these tests for nanobind until the binding is complete. (95 passed, 28 skipped)

The migration from pybind11 to nanobind contains many files but is very repetitive. I believe it is therefore easiest to directly compare the difference between the pybind and nanobind version for alle the files, as they usually follow the same pattern. The difference is shown in this commit nanobind bindings.

I tried to list the main differences between pybind11 and nanobind for our code base here. They are sorted approx. by how often they occur.

Simple renamings

pybind11	nanobind
`PYBIND11_MODULE(..)`	`NB_MODULE(..)`
`PYBIND11_OVERRIDE_*(..)`	`NB_OVERRIDE_*(..)`
`reinterpret_borrow<T>(x)`	`borrow<T>(x)`
`reinterpret_steal<T>(x)`	`steal<T>(x)`
`.def_readwrite(..)`	`.def_rw(..)`
`.def_readonly(..)`	`.def_ro(..)`
`.def_property(..)`	`.def_prop_rw(..)`
`.def_property_readonly(..)`	`.def_prop_ro(..)`
`register_exception<T>`	`exception<T>`
`classh`	`class_`

Type caster headers

pybind11	nanobind
Uses just one include <pybind11/stl.h>for all type casters	uses specific headers for type casters <nanobind/stl/...>. See documentation
supports type caster for `std::deque`	No support for `std::deque`. Created a custom caster instead and reached out to owners for support of `std:deque`

Custom constructors

pybind11	nanobind
Could be specified as a lambda function returning an instance of the desired type: `.def(py::init([](int) { return MyType(...); }));`	Changed to in-place construction: `.def("__init__", [](MyType *t) { new (t) MyType(...); });` See documentation

Replacements

pybind11	nanobind
`object.cast<Type>()`	standalone function: `nb::cast<Type>(object)`
`py::arg_v("param_name", default_value, "doc_string_description")`	no argument docstring description: `nb::arg("param_name") = default_value`

Replaced numpy array class py::array with nb::ndarray<>. See documentation
This class does not support nonstandard arithmetic types. I overloaded nanobind::detail::dtype_traits to support datatype half. See documentation

Trampoline classes

pybind11	nanobind
`PYBIND11_OVERRIDE_*(..)` with base type and return value arguments	`NB_OVERRIDE_*()` without these arguments. The class also requires one `NB_TRAMPOLINE(parent, size)` declaration. See documentation

Pickling

pybind11	nanobind
Use `py::pickle()` to bind `__getstate__` and `__setstate__` functions	expose the `__getstate__` and `__setstate__` methods. Changed `__setstate__` construction to in-place. See documentation

There is a special case in set state function executorConfigSetState: The class ExecutorConfig allows to have dynamic attributes (attributes added in Python code). The set state function has to restore the C++ data as well as the attributes from Python. Nanobind then has to be informed inform the instance object is fully constructed via nb::inst_mark_ready()

None arguments
Nanobind refuses conversions from None to nullptr by default. In order to except None in an argument binding: Either call .none() of nb::arg or specify an nb::none() default arguments. See documentation
Custom type casters
See documentation

pybind11	nanobind
`load()`	`from_python()` with different signature
`cast()`	`from_cpp()` with different signature
	Created custom caster for `at::Tensor` and `std::deque`

Exceptions

For the exceptions PeftTaskNotCachedException and LoraCacheFullException, I created custom exceptions with nb::exception<T>. See documentation

Test Coverage

I am running the same unit tests that we have for pybind in tests/unittests/bindings. However, since the migration is incomplete, I am skipping the failing tests for nanobind.

Current state:
95 passed, 28 skipped

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

New Features
- Introduced extensive Python bindings for TensorRT-LLM C++ runtime components using nanobind, covering batch management, executor, runtime, testing, user buffers, and MOE load balancing.
- Added new build and test configurations for nanobind in Jenkins pipeline and integration tests.
- Added Python bindings for advanced model configuration, executor requests/responses, runtime buffers, cache management, and decoding workflows.
- Enabled Python subclassing and overrides for key runtime interfaces and cache transceiver classes.
Bug Fixes
- Fixed deserialization of extended runtime performance configuration to correctly restore all fields.
- Improved enum parsing and argument handling for model configuration options, replacing direct enum construction with robust from_string methods.
- Adjusted method bindings to clarify argument units and improve timeout handling in cache event retrieval.
Tests
- Added nanobind-specific test skips for unsupported unit tests.
- Updated test timeout argument types for cache event stream tests.
Style
- Standardized enum conversion interfaces across Python and C++ bindings by replacing constructor-based parsing with named static methods.

Linda-Stadter · 2025-07-11T09:38:14Z

/bot run

DomBrown

Approving as suggested changes very minor.

Caveat that maybe someone else should also skim this. Lots of files make it easy to miss something when reviewing.

cpp/tensorrt_llm/nanobind/bindings.cpp

cpp/tensorrt_llm/nanobind/executor/executor.cpp

Linda-Stadter · 2025-07-11T14:51:32Z

/bot run

tensorrt-cicd · 2025-07-11T14:56:50Z

PR_Github #11665 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-11T17:24:49Z

PR_Github #11665 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8637 completed with status: 'FAILURE'

Linda-Stadter · 2025-07-13T13:29:34Z

/bot run

tensorrt-cicd · 2025-07-13T13:35:02Z

PR_Github #11733 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-13T16:13:32Z

PR_Github #11733 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8688 completed with status: 'FAILURE'

Linda-Stadter · 2025-07-14T08:06:10Z

/bot run

tensorrt-cicd · 2025-07-14T08:07:57Z

PR_Github #11783 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-14T08:11:07Z

PR_Github #11784 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-14T11:58:54Z

PR_Github #11784 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8730 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

ixlmar

LGTM, except for the details I commented on in the review.

Some remarks not pertaining to any particular code location:

Are all the #includes in the headers needed? I was wondering if, e.g., including "tensorrt_llm/nanobind/common/customCasters.h" could be avoided in the header files. This might give some iota improvement in compilation time.
Some of the license headers only cover the time period until 2024. We might want to update the dates there.

jenkins/L0_Test.groovy

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp

cpp/tensorrt_llm/nanobind/bindings.cpp

cpp/tensorrt_llm/nanobind/executor/request.cpp

cpp/tensorrt_llm/nanobind/runtime/bindings.cpp

cpp/tensorrt_llm/pybind/bindings.cpp

tests/unittest/bindings/test_executor_bindings.py

cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp

coderabbitai · 2025-07-16T17:24:23Z

Walkthrough

This update introduces a comprehensive nanobind-based Python binding layer for the TensorRT-LLM C++ runtime, executor, batch manager, and related components. It adds new C++ source and header files for nanobind bindings, updates CMake and build/test infrastructure to support nanobind builds, and adapts Python and example code to use new enum parsing methods. Numerous classes, enums, and functions are now exposed to Python via nanobind, with detailed type casters and container bindings for seamless integration.

Changes

File(s)	Change Summary
CMake/build infra cpp/CMakeLists.txt, jenkins/Build.groovy, jenkins/L0_Test.groovy	Added nanobind build option, new build/test configs and stages for nanobind, updated pybind11 inclusion logic.
Nanobind core bindings cpp/tensorrt_llm/nanobind/bindings.cpp, cpp/tensorrt_llm/nanobind/common/bindTypes.h, cpp/tensorrt_llm/nanobind/common/customCasters.h	Introduced main nanobind module and utility headers for STL container binding and custom type casters (including PyTorch tensor, filesystem path, optional ref, etc.).
Executor nanobind bindings cpp/tensorrt_llm/nanobind/executor/bindings.cpp, cpp/tensorrt_llm/nanobind/executor/bindings.h, cpp/tensorrt_llm/nanobind/executor/executor.cpp, cpp/tensorrt_llm/nanobind/executor/executor.h, cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp, cpp/tensorrt_llm/nanobind/executor/executorConfig.h, cpp/tensorrt_llm/nanobind/executor/request.cpp, cpp/tensorrt_llm/nanobind/executor/request.h	Added nanobind Python bindings for executor classes, enums, requests, configs, and statistics, with pickling and serialization support.
Runtime nanobind bindings cpp/tensorrt_llm/nanobind/runtime/bindings.cpp, cpp/tensorrt_llm/nanobind/runtime/bindings.h, cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp, cpp/tensorrt_llm/nanobind/runtime/moeBindings.h	Added nanobind bindings for runtime classes, decoder, buffer manager, CUDA streams, MOE load balancer, and related enums/functions.
Batch manager nanobind bindings cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp, cpp/tensorrt_llm/nanobind/batch_manager/bindings.h, cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp, cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h, cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp, cpp/tensorrt_llm/nanobind/batch_manager/buffers.h, cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp, cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h, cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp, cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h, cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp, cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h	Added Python bindings for batch manager requests, buffers, algorithms, cache managers, cache transceivers, and request conversion logic.
Testing and userbuffers nanobind bindings cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp, cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h, cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp, cpp/tensorrt_llm/nanobind/userbuffers/bindings.h	Added nanobind bindings for testing ModelSpec and user buffer management.
Pybind/legacy bindings cpp/tensorrt_llm/pybind/bindings.cpp, cpp/tensorrt_llm/pybind/executor/bindings.cpp	Changed enum string conversion from constructor to static method; updated event manager timeout binding to use milliseconds.
Python API and examples tensorrt_llm/builder.py, tensorrt_llm/commands/build.py, tensorrt_llm/runtime/model_runner.py, examples/models/core/llama/summarize_long.py, examples/models/core/qwen2audio/run.py, examples/models/core/qwenvl/run.py	Updated enum parsing to use `.from_string()` method for `KVCacheType`, added argparse helper for enums.
Testing tests/integration/test_lists/test-db/l0_a10.yml, tests/unittest/bindings/test_bindings_ut.py, tests/unittest/bindings/test_executor_bindings.py	Added nanobind-specific test list, skipped unsupported tests for nanobind, adjusted event timeout argument.
Batch manager runtimeBuffers ownership change cpp/include/tensorrt_llm/batch_manager/runtimeBuffers.h, cpp/tensorrt_llm/batch_manager/runtimeBuffers.cpp	Changed `transformerBuffers` member from `unique_ptr` to `shared_ptr` and updated instantiation accordingly.
ExecutorConfig pickle fix cpp/tensorrt_llm/pybind/executor/executorConfig.cpp	Fixed deserialization of `ExtendedRuntimePerfKnobConfig` pickle state to correctly cast the fourth tuple element.

Sequence Diagram(s)

sequenceDiagram
    participant Python
    participant NanobindModule
    participant CppRuntime
    participant Executor
    participant BatchManager

    Python->>NanobindModule: import tensorrt_llm.bindings
    NanobindModule->>CppRuntime: Expose classes/enums/functions
    Python->>NanobindModule: Executor(config)
    NanobindModule->>Executor: Construct with config
    Python->>NanobindModule: enqueueRequest(request)
    NanobindModule->>Executor: enqueueRequest(request)
    Executor->>BatchManager: Prepare batch/request
    BatchManager-->>Executor: Return status
    Executor-->>NanobindModule: Return request ID
    NanobindModule-->>Python: Return request ID
    Python->>NanobindModule: awaitResponses(timeout)
    NanobindModule->>Executor: awaitResponses(timeout)
    Executor->>BatchManager: Get responses
    BatchManager-->>Executor: Return responses
    Executor-->>NanobindModule: Return responses
    NanobindModule-->>Python: Return responses

Poem

🐇
A nanobind bridge now leaps and bounds,
Bringing C++ to Python grounds.
Executors, buffers, requests galore—
All now exposed, and so much more!
With enums parsed and tests anew,
The bunny cheers: "Bindings breakthrough!"
🥕✨

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Linda-Stadter · 2025-07-16T17:25:39Z

/bot run

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (14)

cpp/tensorrt_llm/nanobind/executor/request.h (1)

20-23: Same alias-leak concern as in executorConfig.h

See previous note: move or drop the global nb alias to avoid polluting the including TU.

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1)

20-23: Header-wide nb alias again

Same reasoning as earlier headers—consider scoping the alias to reduce global namespace noise.

cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1)

20-23: Global alias leakage

Same concern about namespace nb = nanobind; in a public header.

cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)

20-23: Global alias leakage

Same remark regarding namespace nb = nanobind;.
cpp/tensorrt_llm/pybind/bindings.cpp (1)
173-173: Consider the long-term strategy for pybind11 vs nanobind bindings.

The change from constructor-based to static method-based string conversion is consistent with the codebase changes. However, there's a past review comment questioning whether this should be removed for consistency with nanobind bindings.

Given that this PR introduces nanobind bindings, what's the long-term plan for these pybind11 bindings? If nanobind is the target, maintaining parallel implementations might create maintenance overhead.
#!/bin/bash
# Description: Check if there are corresponding nanobind bindings for KVCacheType
# Expected: Find nanobind bindings that might replace this pybind11 code

# Search for KVCacheType in nanobind files
rg -A 5 -B 5 "KVCacheType" --type cpp | grep -E "(nanobind|\.def.*from_string)"
tests/unittest/bindings/test_executor_bindings.py (1)

2239-2239: Verify API consistency between pybind11 and nanobind bindings.

The change from datetime.timedelta(seconds=1) to 1000 suggests different API signatures between the two binding types. This contradicts the past review comment about maintaining consistent behavior between pybind11 and nanobind bindings.

The relevant code snippet shows get_latest_events(timeout_ms: Optional[float] = 0) which expects numeric milliseconds, supporting the numeric value. However, this creates an inconsistency where the same test code cannot work with both binding types.

Consider either:

Updating the pybind11 bindings to accept numeric milliseconds consistently

Making the nanobind bindings accept timedelta objects like pybind11

Adding a conditional check similar to the test skipping pattern

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1)

104-104: Consider maintaining consistency with pybind11 parameter naming

The parameter name medusa_buffers appears to differ from the C++ naming convention (medusaBuffers). This was previously flagged in the pybind11 bindings.

Also applies to: 111-111, 122-122, 129-129
cpp/tensorrt_llm/nanobind/executor/executor.cpp (4)
55-56: Address the TODO comment with specific improvements needed

The TODO comment should specify what improvements are planned for FP8 and BF16 support.

77-80: Avoid using default: case per coding guidelines

Consider explicitly handling all enum cases instead of using a default case.

116-116: Reserve vector capacity before populating

Add dims.reserve(array.ndim()); to avoid potential reallocations during the loop.

153-158: Use structured bindings for clarity

Replace the loop variable with structured bindings as previously suggested:
for (auto const& [rawName, rawArray] : managedWeights.value())
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (1)

178-178: Consider using a more descriptive name for the vector binding.

As suggested previously, "RequestVector" would be more descriptive than the generic bind_vector naming.
cpp/tensorrt_llm/nanobind/bindings.cpp (2)
419-419: Use descriptive name for vector binding.

As previously suggested, "SamplingConfigVector" would be more descriptive.

365-365: Replace assert with proper exception handling and avoid magic numbers.

As previously noted:

The magic number 19 should be a named constant

assert is removed in release builds and calls abort(), making it inappropriate for input validation
-        assert(t.size() == 19);
+        constexpr size_t SAMPLING_CONFIG_STATE_SIZE = 19;
+        if (t.size() != SAMPLING_CONFIG_STATE_SIZE) {
+            throw std::runtime_error("Invalid SamplingConfig state size: expected " + 
+                                   std::to_string(SAMPLING_CONFIG_STATE_SIZE) + 
+                                   ", got " + std::to_string(t.size()));
+        }

🧹 Nitpick comments (14)

cpp/tensorrt_llm/nanobind/executor/executorConfig.h (2)

20-23: Global namespace nb alias in a public header may leak into all translation units

Placing namespace nb = nanobind; at the top level of a header exposes the alias to every TU that includes this file, increasing the risk of ODR or name-collision issues when different third-party projects define the same shortcut.
Consider moving the alias inside the tensorrt_llm::nanobind namespace (or omitting it and writing nanobind::module_ explicitly in headers) and keeping the global alias only in the .cpp implementation files.

27-28: Comment wording is misleading

// Register bindings for executor API. does not reflect that this header is specifically for config bindings. Rename to avoid confusion with the higher-level executor bindings already present elsewhere.

cpp/tensorrt_llm/nanobind/executor/request.h (1)

26-27: Consider aligning naming with the rest of the codebase

Most other nanobind headers expose an initBindings function. Using initRequestBindings is perfectly fine, but the inconsistency may surprise contributors skimming the source. Either:

Rename to initBindings, or

Add a brief comment explaining the naming rationale (e.g., because executor exports multiple sub-modules).

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1)

24-29: Missing namespace-closing comment

Most headers in this repo end the namespace block with a comment (} // namespace …). Adding it here would keep style consistent.

cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1)

24-29: Style consistency

Other nanobind headers wrap the closing brace with a comment. Add } // namespace tensorrt_llm::nanobind::testing for uniformity.
cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)
25-30: Prefer free function over static method for module initialisation

Every other component exposes a free initBindings function; introducing a UserBufferBindings class solely to host a static method adds an extra indirection without clear benefit. A free function keeps the API surface consistent and avoids an otherwise empty class.
-namespace tensorrt_llm::kernels::userbuffers
-{
-class UserBufferBindings
-{
-public:
-    static void initBindings(nb::module_& m);
-};
-} // namespace tensorrt_llm::kernels::userbuffers
+namespace tensorrt_llm::kernels::userbuffers
+{
+void initBindings(nb::module_& m);
+} // namespace tensorrt_llm::kernels::userbuffers
cpp/tensorrt_llm/nanobind/CMakeLists.txt (1)
49-49: Inconsistent compile definition name for nanobind module.

The compile definition PYBIND11_DETAILED_ERROR_MESSAGES=1 references pybind11 but this is a nanobind module. Consider renaming to NANOBIND_DETAILED_ERROR_MESSAGES=1 or verify if this is intentional for compatibility.
-                             PYBIND11_DETAILED_ERROR_MESSAGES=1)
+                             NANOBIND_DETAILED_ERROR_MESSAGES=1)
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1)

70-131: Consider using a builder pattern for improved maintainability

The toTrtLlm() method correctly handles all 49 parameters, but this large parameter count could benefit from a builder pattern to improve readability and maintainability. This would also make it easier to add new parameters in the future.
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (1)
346-369: Consider extracting repeated tensor conversion pattern

The pattern of converting ITensor::SharedPtr to at::Tensor wrapped in std::optional is repeated in multiple methods. Consider extracting this into a helper function to reduce code duplication.
// Add helper function at the top of the file
static std::optional<at::Tensor> tensor_to_optional_torch(tensorrt_llm::runtime::ITensor::SharedPtr const& tensor)
{
    if (tensor)
    {
        return tr::Torch::tensor(tensor);
    }
    return std::nullopt;
}

// Then simplify the methods:
.def("get_block_pool_pointers",
    [](tbk::BaseKVCacheManager& self)
    {
        return tensor_to_optional_torch(self.getBlockPoolPointers());
    })
.def("get_layer_to_pool_mapping",
    [](tbk::BaseKVCacheManager& self)
    {
        return tensor_to_optional_torch(self.getLayerToPoolMapping());
    })
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)

608-609: Consider aligning property name with underlying methods.

The property "mm_embedding_offloading" maps to methods getPromptTableOffloading/setPromptTableOffloading. This naming inconsistency could confuse users. Consider using "prompt_table_offloading" for the property name to match the C++ API.
cpp/tensorrt_llm/nanobind/bindings.cpp (1)
81-81: Consider keeping leak warnings enabled during development.

Disabling leak warnings with nb::set_leak_warnings(false) could hide memory management issues. Consider enabling them in debug builds or documenting why they need to be disabled.
+#ifdef NDEBUG
     nb::set_leak_warnings(false);
+#endif
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)
42-43: Remove commented code for cleaner implementation.

The commented using declaration is not needed and should be removed.
-    // using BaseCacheTransceiver::BaseCacheTransceiver; // Inherit constructors
     NB_TRAMPOLINE(tb::BaseCacheTransceiver, 6);
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)
192-250: Comprehensive GenericLlmRequest bindings with proper optional handling.

The bindings thoroughly expose the interface with appropriate property access. The lambda functions correctly handle optional types and pointer conversions.

Consider extracting the repeated pattern for optional vector properties into a helper template to reduce code duplication:
template<typename T, typename Getter>
auto makeOptionalVectorProperty(Getter getter) {
    return [getter](GenLlmReq& self) {
        std::optional<T> result = std::nullopt;
        if (auto opt = (self.*getter)()) {
            result = *opt.value();
        }
        return result;
    };
}

// Usage:
.def_prop_ro("multimodal_hashes", 
    makeOptionalVectorProperty<std::vector<std::vector<GenLlmReq::SizeType32>>>(&GenLlmReq::getMultimodalHashes))
253-355: Complex but well-structured constructor binding.

The 49-parameter constructor is handled well with the makeOptionalTensor helper reducing duplication. All parameters have appropriate defaults.

Given the complexity, consider documenting the parameter groupings or creating builder methods in the future to improve usability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c673e9 and 1a3e07c.

📒 Files selected for processing (46)

cpp/CMakeLists.txt (2 hunks)
cpp/tensorrt_llm/nanobind/CMakeLists.txt (2 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h (1 hunks)
cpp/tensorrt_llm/nanobind/bindings.cpp (2 hunks)
cpp/tensorrt_llm/nanobind/common/bindTypes.h (1 hunks)
cpp/tensorrt_llm/nanobind/common/customCasters.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/bindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executor.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executor.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executorConfig.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/request.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/request.h (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/bindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1 hunks)
cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1 hunks)
cpp/tensorrt_llm/pybind/bindings.cpp (1 hunks)
cpp/tensorrt_llm/pybind/executor/bindings.cpp (1 hunks)
examples/models/core/llama/summarize_long.py (1 hunks)
examples/models/core/qwen2audio/run.py (1 hunks)
examples/models/core/qwenvl/run.py (1 hunks)
jenkins/Build.groovy (4 hunks)
jenkins/L0_Test.groovy (3 hunks)
tensorrt_llm/builder.py (1 hunks)
tensorrt_llm/commands/build.py (2 hunks)
tensorrt_llm/runtime/model_runner.py (1 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (1 hunks)
tests/unittest/bindings/test_bindings_ut.py (4 hunks)
tests/unittest/bindings/test_executor_bindings.py (9 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (11)

examples/models/core/qwen2audio/run.py (2)

tensorrt_llm/runtime/generation.py (1)

kv_cache_type (1169-1170)

tests/integration/defs/cpp/test_multi_gpu.py (1)

KVCacheType (23-27)

cpp/tensorrt_llm/nanobind/executor/executorConfig.h (5)

cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)

m (27-27)

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)

m (27-27)

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)

m (28-28)

m (37-37)

cpp/tensorrt_llm/nanobind/executor/executor.h (1)

m (123-123)

cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)

m (28-28)

examples/models/core/qwenvl/run.py (1)

tests/integration/defs/cpp/test_multi_gpu.py (1)

KVCacheType (23-27)

tensorrt_llm/commands/build.py (1)

tests/integration/defs/cpp/test_multi_gpu.py (1)

KVCacheType (23-27)

cpp/tensorrt_llm/nanobind/executor/request.h (5)

cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)

m (27-27)

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)

m (27-27)

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)

m (28-28)

m (37-37)

cpp/tensorrt_llm/nanobind/executor/executor.h (1)

m (123-123)

cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)

m (28-28)

cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (5)

cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)

m (27-27)

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)

m (27-27)

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)

m (28-28)

m (37-37)

cpp/tensorrt_llm/nanobind/executor/executor.h (1)

m (123-123)

cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)

m (28-28)

cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (2)

cpp/tensorrt_llm/testing/modelSpec.h (1)

ModelSpec (53-59)

tests/integration/defs/cpp/test_multi_gpu.py (1)

KVCacheType (23-27)

examples/models/core/llama/summarize_long.py (2)

tensorrt_llm/runtime/generation.py (1)

kv_cache_type (1169-1170)

tests/integration/defs/cpp/test_multi_gpu.py (1)

KVCacheType (23-27)

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (5)

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (4)

initBindings (290-448)

initBindings (290-290)

initBindings (450-477)

initBindings (450-450)

cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)

initBindings (55-523)

initBindings (55-55)

cpp/tensorrt_llm/nanobind/executor/bindings.cpp (2)

initBindings (48-261)

initBindings (48-48)

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)

m (27-27)

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)

m (28-28)

m (37-37)

tests/unittest/bindings/test_executor_bindings.py (2)

cpp/include/tensorrt_llm/batch_manager/llmRequest.h (2)

reason (1661-1664)

reason (1661-1661)

tensorrt_llm/_torch/pyexecutor/resource_manager.py (1)

get_latest_events (562-563)

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (3)

cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)

initBindings (55-523)

initBindings (55-55)

cpp/tensorrt_llm/nanobind/common/customCasters.h (2)

tensor (321-342)

tensor (321-321)

cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (2)

from_torch (43-50)

from_torch (43-43)

🔇 Additional comments (83)

examples/models/core/qwen2audio/run.py (1)

125-126: LGTM! Consistent enum parsing improvement.

The change from direct constructor to KVCacheType.from_string() improves consistency with the broader codebase refactoring. This factory method approach provides more robust parsing of string values to enum values and aligns with similar updates in other model examples and core runtime components.

cpp/CMakeLists.txt (2)

201-201: LGTM! Extends pybind11 support for nanobind migration.

The addition of BUILD_DEEP_EP condition appropriately extends pybind11 subdirectory inclusion to support the nanobind migration while maintaining backward compatibility.

220-220: LGTM! Consistent header inclusion logic.

The parallel change to include pybind11 headers when BUILD_DEEP_EP is enabled maintains consistency with the subdirectory inclusion logic above and supports the nanobind migration requirements.

examples/models/core/llama/summarize_long.py (1)

100-100: LGTM! Consistent enum parsing standardization.

The change to use KVCacheType.from_string() follows the same pattern as other model examples and improves the robustness of enum parsing from configuration strings.

tensorrt_llm/runtime/model_runner.py (1)

89-89: LGTM! Core runtime enum parsing standardization.

The change to use KVCacheType.from_string() in the core runtime module is particularly important for maintaining consistency across the codebase. This factory method approach provides more robust parsing of configuration strings to enum values and aligns with the nanobind migration effort.

cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (1)

1-30: LGTM! Well-structured header for MOE bindings.

The header file follows established patterns in the codebase:

Proper Apache 2.0 license header

Standard #pragma once header guard

Consistent namespace structure (tensorrt_llm::nanobind::runtime)

Function signature matches the pattern used in other binding files (initBindings(nb::module_& m))

This addition properly integrates into the nanobind migration effort and follows the project's coding standards.

cpp/tensorrt_llm/nanobind/executor/bindings.h (1)

1-29: LGTM! Well-structured nanobind bindings header.

The header file follows standard practices with proper license headers, include guards, namespace organization, and a clean function declaration for initializing executor bindings.

examples/models/core/qwenvl/run.py (1)

121-122: LGTM! Consistent enum parsing improvement.

The change from direct KVCacheType construction to KVCacheType.from_string() aligns with the broader codebase migration for more robust enum parsing. This improves consistency and error handling.

tensorrt_llm/commands/build.py (2)

41-56: LGTM! Excellent enum argument parser implementation.

The enum_type helper function is well-designed with proper error handling, clear error messages, and support for both enum instances and string values. This improves the robustness of command-line argument parsing.

151-151: LGTM! Consistent with the enum parsing migration.

The change to use enum_type(KVCacheType) aligns with the broader codebase migration for improved enum argument parsing consistency.

tests/integration/test_lists/test-db/l0_a10.yml (1)

193-207: LGTM! Appropriate test configuration for nanobind bindings.

The new l0_a10_nanobind test entry is well-configured for testing nanobind bindings on A10 hardware. The conditions, wildcards, and test selection are appropriate for validating the nanobind migration.

cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1)

1-29: LGTM! Consistent nanobind bindings header structure.

The header file follows the same excellent practices as other nanobind headers in the codebase, with proper license headers, include guards, namespace organization, and a clean function declaration. The consistency across nanobind headers is commendable.

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)

1-30: LGTM! Well-structured nanobind binding header.

The header follows consistent patterns with other nanobind binding files in the project, including proper license header, include guards, namespace structure, and the standard initBindings method signature.

cpp/tensorrt_llm/nanobind/runtime/bindings.h (1)

1-31: LGTM! Consistent nanobind binding header structure.

The header follows the established patterns for nanobind binding files. The presence of both initBindings and initBindingsEarly functions suggests proper handling of initialization order dependencies in the runtime bindings.

cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)

1-30: LGTM! Consistent nanobind binding header structure.

The header follows the established patterns for nanobind binding files with proper license header, include guards, namespace structure, and the standard initBindings method signature.

tensorrt_llm/builder.py (1)

575-576: LGTM! Correct adaptation to the new KVCacheType API.

The change from direct constructor call to KVCacheType.from_string() is consistent with the binding API change and aligns with the coordinated updates across the codebase mentioned in the AI summary.

cpp/tensorrt_llm/pybind/executor/bindings.cpp (1)

247-257: LGTM! API improvement with proper timeout handling.

The change from direct method pointer binding to a lambda wrapper improves the API by:

Making timeout units explicit (timeout_ms instead of timeout)

Properly handling optional parameter conversion from double to std::chrono::milliseconds

Converting floating-point milliseconds to integer as required by std::chrono::milliseconds

The implementation correctly checks for the presence of timeout_ms and handles both cases appropriately.

cpp/tensorrt_llm/nanobind/CMakeLists.txt (4)

6-22: LGTM! Comprehensive source files expansion for nanobind migration.

The expanded source files list properly includes all the new nanobind binding modules across batch_manager, executor, runtime, testing, and userbuffers directories. This aligns with the comprehensive nanobind migration described in the PR.

33-36: LGTM! Proper conditional linking for NVSHMEM.

The conditional NVSHMEM linking is correctly implemented, ensuring the libraries are only linked when the feature is enabled.

46-46: LGTM! CUDA NVML library addition.

Adding ${CUDA_NVML_LIB} to the target link libraries is appropriate for the expanded nanobind functionality.

56-56: LGTM! Enhanced linker rpath configuration.

The addition of CUDA toolkit stubs to the rpath is appropriate for the expanded nanobind functionality that may require CUDA runtime libraries.

tests/unittest/bindings/test_bindings_ut.py (4)

8-8: LGTM! Proper pytest import addition.

Adding pytest import is necessary for the @pytest.mark.skipif decorators used in the test functions.

313-314: LGTM! Appropriate test skipping for nanobind migration.

The conditional test skipping is well-implemented:

Uses proper pytest.mark.skipif syntax

Checks the binding_type correctly

Provides clear reason for skipping

Allows CI to pass during the incomplete migration

This approach is consistent with the PR's goal of enabling nanobind while maintaining test stability.

424-425: LGTM! Consistent test skipping pattern.

The skipif decorator follows the same pattern as other nanobind-incompatible tests, maintaining consistency across the test suite.

505-506: LGTM! Consistent test skipping for pickle functionality.

The test skipping is appropriately applied to pickle-related functionality that may have different behavior between pybind11 and nanobind.

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (4)

1-16: LGTM! Proper copyright header with updated date range.

The copyright header is correctly formatted with the updated 2022-2025 date range and proper SPDX license identifiers.

18-21: LGTM! Proper nanobind integration.

The header correctly includes nanobind and establishes the namespace alias following nanobind conventions.

23-30: LGTM! Well-organized KV cache manager bindings class.

The KVCacheManagerBindings class is properly structured:

Placed in the appropriate nested namespace

Uses static method pattern for binding initialization

Clear and descriptive class name

32-39: LGTM! Consistent PEFT cache manager bindings structure.

The BasePeftCacheManagerBindings class follows the same pattern as KVCacheManagerBindings:

Proper namespace organization

Static initialization method

Consistent naming convention

jenkins/L0_Test.groovy (4)

67-68: LGTM: Nanobind configuration constant follows established pattern.

The NANOBIND_CONFIG constant is properly defined with the @Field annotation, consistent with other build configuration constants in the file.

77-77: LGTM: Nanobind build configuration properly added.

The nanobind configuration entry in BUILD_CONFIGS follows the established pattern with an appropriate tarball name.

1718-1718: LGTM: A10-Nanobind test job properly configured.

The new test job entry follows the established pattern with appropriate platform, test list, and GPU configuration for nanobind testing.

1795-1797: LGTM: Nanobind configuration selection logic is correct.

The conditional logic to set NANOBIND_CONFIG for jobs containing "Nanobind" follows the established pattern used for other build configurations.

cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (3)

1-29: LGTM: Includes and namespace setup are correct.

The file includes the necessary headers and properly sets up namespace aliases and using declarations for the nanobind bindings.

35-44: LGTM: Enum bindings are properly implemented.

Both QuantMethod and OutputContentType enums are correctly bound with appropriate documentation strings and the nb::is_arithmetic() flag for arithmetic operations.

46-84: LGTM: ModelSpec class binding is comprehensive and correct.

The class binding properly exposes all member functions with appropriate return value policies. The use of nb::rv_policy::reference_internal for methods returning references and the __copy__ method for Python-side copying are correctly implemented.

cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (3)

1-26: LGTM: Includes and namespace setup are correct.

The file includes the necessary headers and properly sets up namespace aliases for the nanobind bindings.

32-36: LGTM: UBBuffer class binding is safely implemented.

The binding properly exposes buffer metadata with read-only properties and safely converts the raw pointer address to intptr_t for Python consumption.

38-45: LGTM: User buffer management functions are properly bound.

The function bindings correctly handle pointer conversions and use appropriate lambda wrappers for type safety. The symmetric conversion between intptr_t and void* for addresses is handled correctly.

tests/unittest/bindings/test_executor_bindings.py (8)

17-17: LGTM: Import necessary for nanobind conditional logic.

The import enables checking the binding type to conditionally skip tests, which is appropriate for the migration strategy.

488-489: LGTM: Appropriate test skipping for nanobind migration.

The conditional skip decorator correctly excludes this test from running under nanobind bindings during the migration phase.

694-695: LGTM: Consistent test skipping pattern.

The decorator maintains consistency with the nanobind migration strategy.

1120-1121: LGTM: Consistent test skipping pattern.

The decorator maintains consistency with the nanobind migration strategy.

1159-1160: LGTM: Consistent test skipping pattern.

The decorator maintains consistency with the nanobind migration strategy.

1507-1508: LGTM: Consistent test skipping pattern.

The decorator maintains consistency with the nanobind migration strategy.

1881-1882: LGTM: Consistent test skipping pattern.

The decorator maintains consistency with the nanobind migration strategy.

2157-2158: LGTM: Consistent test skipping pattern.

The decorator maintains consistency with the nanobind migration strategy.

jenkins/Build.groovy (1)

50-55: Build configuration for nanobind bindings looks correct

The new nanobind build configurations follow the established pattern and correctly specify:

The --binding_type nanobind parameter to enable nanobind builds

Distinct tarball names to differentiate from pybind11 builds

Appropriate wheel architectures for each platform

Also applies to: 65-69, 85-89, 540-541

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1)

55-178: Algorithm bindings implementation looks good

The nanobind bindings properly expose all batch management algorithm classes with:

Appropriate constructors with default arguments

Callable operators wrapped in lambdas for tensor type conversions

Consistent name method implementations

cpp/tensorrt_llm/nanobind/common/bindTypes.h (1)

24-100: Well-implemented container bindings with comprehensive Python interface

The bindList and bindSet template functions provide excellent Python-like interfaces for STL containers:

Proper bounds checking with index_error exceptions

Iterator support with appropriate lifetime management

Pickling support for sets enables serialization

The use of nb::overload_cast for disambiguating the erase method is correct

cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (2)

88-90: Pointer handling in property is correct

The use of .get() to return a raw pointer without transferring ownership is appropriate here, as confirmed in the previous review discussion. The setter properly takes ownership with reset().

41-109: Buffer bindings are comprehensive and well-structured

Both TransformerBuffers and RuntimeBuffers bindings expose all necessary:

Constructors with appropriate parameters

Methods for reshaping, copying, and buffer management

Read-write properties for all public data members

cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1)

43-68: Helper functions correctly handle type conversions

The from_torch helper and callbackAdapter properly manage:

Optional tensor conversions between PyTorch and internal formats

Callback wrapping with appropriate tensor type conversions

CUDA stream unwrapping for Python callbacks

cpp/tensorrt_llm/nanobind/executor/executor.cpp (1)

189-198: Good GIL handling in shutdown method

The GIL release before calling mExecutor->shutdown() is correct and necessary to prevent deadlocks. The detailed comment explaining the rationale is helpful.

cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (2)

33-47: Good input validation in wrapper functions

The validation checks ensuring expert_count matches the expert_load_factor vector size prevent potential buffer overruns and provide clear error messages.

58-64: Well-formatted repr method for debugging

The __repr__ implementation provides a clear, informative string representation of the MoeWeight object, which is helpful for debugging.
cpp/tensorrt_llm/nanobind/executor/request.cpp (1)

556-583: Fix state deserialization mismatch in Request.setstate

The deserialization has a parameter mismatch. The state tuple contains 33 elements, but the Request constructor call includes a hardcoded 1 between state[29] and state[30], suggesting either:

A missing field in the serialized state

An incorrect parameter mapping

This could cause incorrect object restoration from pickled data.
 auto requestSetstate = [](tle::Request& request, nb::tuple const& state)
 {
-    if (state.size() != 33)
+    if (state.size() != 34)  // Update expected size if adding missing field
     {
         throw std::runtime_error("Invalid Request state!");
     }
     new (&request) tle::Request(nb::cast<VecTokens>(state[0]), nb::cast<SizeType32>(state[1]),
         // ... other parameters ...
         nb::cast<std::optional<tle::ContextPhaseParams>>(state[26]),
         nb::cast<std::optional<tle::Tensor>>(state[27]), nb::cast<std::optional<SizeType32>>(state[28]),
-        nb::cast<std::optional<tle::Tensor>>(state[29]), 1, nb::cast<std::optional<tle::EagleConfig>>(state[30]),
+        nb::cast<std::optional<tle::Tensor>>(state[29]), nb::cast<SizeType32>(state[30]), // Or retrieve from state
+        nb::cast<std::optional<tle::EagleConfig>>(state[31]),
-        nb::cast<std::optional<tle::Tensor>>(state[31]),
+        nb::cast<std::optional<tle::Tensor>>(state[32]),
-        nb::cast<std::optional<tle::GuidedDecodingParams>>(state[32]));
+        nb::cast<std::optional<tle::GuidedDecodingParams>>(state[33]));
 };
Likely an incorrect or invalid review comment.
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (5)

1-60: LGTM! Well-organized header includes.

The file structure follows good practices with proper copyright header and logically grouped includes.

61-98: Trampoline class correctly implements virtual methods.

The PyIGptDecoder trampoline class properly exposes all 5 virtual methods from the IGptDecoder interface using nanobind's override mechanism.

307-319: Verify proper handling of raw pointers in lamport functions.

The functions use intptr_t and reinterpret_cast for raw pointer manipulation. Ensure that:

The pointer sizes are validated

The buffer pointers are valid before dereferencing

Consider adding null checks for safety

323-327: Validate stream pointer extraction from PyTorch object.

The code assumes the PyTorch stream object has a cuda_stream attribute that can be cast to int64_t. This is fragile and could break with PyTorch API changes.

Consider adding error handling:
-            auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream"));
-            cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
+            if (!nb::hasattr(py_stream, "cuda_stream")) {
+                throw std::runtime_error("Stream object missing cuda_stream attribute");
+            }
+            auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream"));
+            cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
364-388: Clean implementation of SpeculativeDecodingMode bindings.

The static factory methods and read-only properties provide a well-structured API for mode configuration.
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (2)

1-46: Headers and namespace setup look good.

Proper organization of includes and namespace aliases.

445-514: Excellent implementation of ExecutorConfig serialization with dynamic attributes.

The implementation correctly handles both C++ state and Python dynamic attributes, ensuring complete serialization/deserialization support. The use of nb::inst_mark_ready after restoration is crucial for proper nanobind instance management.

cpp/tensorrt_llm/nanobind/bindings.cpp (1)

123-497: Well-structured module organization with comprehensive bindings.

The file provides extensive bindings for TensorRT-LLM components with good organization through submodules. The integration of various binding modules is clean and maintainable.

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (4)

1-35: LGTM!

The copyright header and includes are properly structured for nanobind bindings.

79-85: Correct implementation of abstract base class bindings.

The BaseCacheTransceiver binding properly uses the trampoline class and exposes all pure virtual methods with appropriate Python naming conventions.

87-95: Enum bindings are correctly implemented.

Both enum bindings follow the standard nanobind pattern and properly expose the enum values to Python.

97-110: Well-structured class bindings with proper parameter documentation.

The CacheTransceiver and CacheTransBufferManager bindings correctly specify inheritance, use named arguments for clarity, and properly handle optional parameters.

cpp/tensorrt_llm/nanobind/executor/bindings.cpp (6)

1-46: LGTM!

The file header and template function are properly structured. The template approach for KVCacheEventDiff is a good design choice for reusability.

48-54: Proper module initialization and enum binding.

The module version is correctly set and the ModelType enum follows standard nanobind patterns.

56-88: Well-implemented DecodingMode binding with proper serialization support.

The custom pickle support correctly handles state serialization/deserialization with appropriate error checking. The use of placement new in setstate is the correct approach for reconstructing the object.

90-209: Comprehensive statistics bindings with JSON serialization support.

All statistics classes are properly bound with read-write properties and JSON serialization methods where appropriate. The implementation is consistent and follows nanobind best practices.

210-256: Well-designed KV cache submodule with proper timeout handling.

The submodule structure is clean, and the get_latest_events method correctly handles optional timeout conversion from Python float (milliseconds) to C++ chrono types.

258-261: Good modular organization of bindings.

Separating request, config, and executor bindings into their own initialization functions promotes better code organization and maintainability.

cpp/tensorrt_llm/nanobind/executor/executor.h (4)

1-45: Well-designed header with flexible initialization options.

The multiple constructors provide good flexibility for different initialization scenarios, properly handling model paths, engine buffers, and encoder/decoder configurations.

46-59: Proper context manager implementation and delegation pattern.

The context manager support enables clean Python with statement usage, and the delegation to the underlying executor is correctly implemented.

61-86: Excellent GIL management for blocking operations.

All awaitResponses methods correctly release the GIL before blocking operations, which is crucial for proper Python threading behavior and prevents deadlocks.

88-127: Clean interface with proper resource management.

The remaining methods correctly delegate to the underlying executor, use [[nodiscard]] appropriately, and manage the executor lifetime with std::unique_ptr.

cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (4)

55-81: Proper exception handling setup for nanobind.

The exception registration and translator implementation correctly handle custom exceptions and follow nanobind best practices with no-capture lambdas.

356-428: Well-structured bindings for request methods and buffer management classes.

The methods are properly exposed, and the tensor conversion for new_output_tokens_host correctly handles the Python interface requirements.

429-443: Safe utility function with proper validation.

The function correctly validates input sizes and provides clear documentation about the expected token ordering.

445-522: Complex but correct implementation of decoding batch input creation.

The function properly handles context/generation request separation, beam width tiling for context requests, and constructs the decoding input with appropriate logits views and slot management.

cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h

cpp/tensorrt_llm/nanobind/common/customCasters.h

cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp

cpp/tensorrt_llm/nanobind/bindings.cpp

Linda-Stadter · 2025-07-16T21:42:25Z

/bot run

tensorrt-cicd · 2025-07-16T21:48:03Z

PR_Github #12115 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (6)

cpp/tensorrt_llm/nanobind/common/customCasters.h (1)
294-305: Remove unsafe const_cast in ITensor::SharedConstPtr conversion

The reinterpret_cast from SharedConstPtr to SharedPtr& violates const-correctness and could lead to undefined behavior if the underlying tensor is modified.
     static handle from_cpp(
         tensorrt_llm::runtime::ITensor::SharedConstPtr const& src, rv_policy /* policy */, cleanup_list* /* cleanup */)
     {
         if (src == nullptr)
         {
             return none().release();
         }
-        return THPVariable_Wrap(tensorrt_llm::runtime::Torch::tensor(
-            reinterpret_cast<tensorrt_llm::runtime::ITensor::SharedPtr const&>(src)));
+        // Create a const tensor view or copy the data to maintain const-correctness
+        auto const_tensor = tensorrt_llm::runtime::Torch::tensor(src);
+        return THPVariable_Wrap(const_tensor);
     }
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (2)
178-178: Consider renaming for consistency.

The vector binding RequestVector could be renamed to RequestVector for better consistency with naming conventions.

307-329: Validate stream pointer casting for safety.

The unchecked cast from Python object to CUDA stream pointer could be unsafe. Consider adding validation to prevent undefined behavior from invalid pointer casts.

Consider adding validation like:
-        [](int64_t delay_micro_secs, nb::object py_stream)
-        {
-            // Get the raw stream handle from PyTorch stream object
-            auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream"));
-            cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
+        [](int64_t delay_micro_secs, nb::object py_stream)
+        {
+            // Get the raw stream handle from PyTorch stream object
+            auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream"));
+            if (!stream_ptr) {
+                throw std::runtime_error("Invalid stream pointer");
+            }
+            cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
cpp/tensorrt_llm/nanobind/bindings.cpp (3)
117-117: Add validation for stream pointer casting.

The unchecked cast from Python object to CUDA stream pointer is unsafe and could lead to undefined behavior.

Consider adding validation:
-                cudaStream_t stream = reinterpret_cast<cudaStream_t>(nb::cast<uintptr_t>(py_stream));
+                auto stream_ptr = nb::cast<uintptr_t>(py_stream);
+                if (!stream_ptr) {
+                    throw std::runtime_error("Invalid stream pointer");
+                }
+                cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
355-388: Consider replacing magic number with named constant.

The magic number 19 in the assertion could be problematic if the tuple size changes. Consider using a named constant.
+    static constexpr size_t SAMPLING_CONFIG_TUPLE_SIZE = 19;
     auto SamplingConfigSetState = [](tr::SamplingConfig& self, nb::tuple t) -> tr::SamplingConfig
     {
-        assert(t.size() == 19);
+        assert(t.size() == SAMPLING_CONFIG_TUPLE_SIZE);
365-365: Consider using exception instead of assert for validation.

Since this handles unvalidated input data and assert may be removed in release builds, throwing an exception would be more appropriate.
-        assert(t.size() == 19);
+        if (t.size() != 19) {
+            throw std::runtime_error("Invalid tuple size for SamplingConfig deserialization");
+        }

🧹 Nitpick comments (4)

cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1)

253-355: Consider using a builder pattern or configuration object for the complex constructor.

The LlmRequest constructor has 49 parameters, which significantly impacts maintainability and usability. Consider refactoring to use a builder pattern or a configuration object to improve the API.

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)

93-95: Consider moving AttentionType enum binding to executor module.

The AttentionType enum from executor::kv_cache namespace is being bound in the batch manager module. This creates a cross-module dependency that might be better organized by binding this enum in the executor bindings instead.
cpp/tensorrt_llm/nanobind/executor/bindings.cpp (1)
246-256: Consider adding timeout validation.

The get_latest_events method converts a float timeout to milliseconds without validating that the value is non-negative. Consider adding validation to prevent potential issues with negative timeouts.
"get_latest_events",
[](tle::KVCacheEventManager& self, std::optional<double> timeout_ms = std::nullopt)
{
    if (timeout_ms)
    {
+       if (*timeout_ms < 0)
+       {
+           throw std::invalid_argument("timeout_ms must be non-negative");
+       }
        return self.getLatestEvents(std::chrono::milliseconds(static_cast<int64_t>(*timeout_ms)));
    }
    return self.getLatestEvents(std::nullopt);
},
nb::arg("timeout_ms") = std::nullopt);
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1)
172-178: UpdateDecoderBuffers binding uses string literal instead of class name.

The binding uses "UpdateDecoderBuffers" as a string literal instead of the class's static name member like other bindings. For consistency with the other bindings in this file, consider using the class name member if available.
-    nb::class_<UpdateDecoderBuffers>(m, "UpdateDecoderBuffers")
+    nb::class_<UpdateDecoderBuffers>(m, UpdateDecoderBuffers::name)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a3e07c and 44a0288.

📒 Files selected for processing (45)

cpp/CMakeLists.txt (2 hunks)
cpp/tensorrt_llm/nanobind/CMakeLists.txt (2 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h (1 hunks)
cpp/tensorrt_llm/nanobind/bindings.cpp (2 hunks)
cpp/tensorrt_llm/nanobind/common/bindTypes.h (1 hunks)
cpp/tensorrt_llm/nanobind/common/customCasters.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/bindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executor.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executor.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/executorConfig.h (1 hunks)
cpp/tensorrt_llm/nanobind/executor/request.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/executor/request.h (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/bindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (1 hunks)
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1 hunks)
cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (1 hunks)
cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1 hunks)
cpp/tensorrt_llm/pybind/bindings.cpp (1 hunks)
cpp/tensorrt_llm/pybind/executor/bindings.cpp (1 hunks)
examples/models/core/llama/summarize_long.py (1 hunks)
examples/models/core/qwen2audio/run.py (1 hunks)
examples/models/core/qwenvl/run.py (1 hunks)
jenkins/Build.groovy (4 hunks)
jenkins/L0_Test.groovy (3 hunks)
tensorrt_llm/builder.py (1 hunks)
tensorrt_llm/commands/build.py (2 hunks)
tensorrt_llm/runtime/model_runner.py (1 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (1 hunks)
tests/unittest/bindings/test_bindings_ut.py (4 hunks)

✅ Files skipped from review due to trivial changes (2)

cpp/tensorrt_llm/nanobind/common/bindTypes.h
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h

🚧 Files skipped from review as they are similar to previous changes (35)

examples/models/core/qwen2audio/run.py
examples/models/core/qwenvl/run.py
tensorrt_llm/runtime/model_runner.py
examples/models/core/llama/summarize_long.py
tensorrt_llm/commands/build.py
cpp/tensorrt_llm/nanobind/runtime/moeBindings.h
cpp/CMakeLists.txt
cpp/tensorrt_llm/nanobind/executor/bindings.h
cpp/tensorrt_llm/nanobind/executor/request.h
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h
tests/integration/test_lists/test-db/l0_a10.yml
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h
cpp/tensorrt_llm/nanobind/userbuffers/bindings.h
cpp/tensorrt_llm/nanobind/batch_manager/bindings.h
cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
cpp/tensorrt_llm/nanobind/runtime/bindings.h
cpp/tensorrt_llm/pybind/executor/bindings.cpp
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h
cpp/tensorrt_llm/pybind/bindings.cpp
tensorrt_llm/builder.py
tests/unittest/bindings/test_bindings_ut.py
cpp/tensorrt_llm/nanobind/CMakeLists.txt
cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp
cpp/tensorrt_llm/nanobind/executor/executorConfig.h
cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp
jenkins/Build.groovy
cpp/tensorrt_llm/nanobind/executor/executor.cpp
cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp
cpp/tensorrt_llm/nanobind/executor/executor.h
cpp/tensorrt_llm/nanobind/executor/request.cpp
jenkins/L0_Test.groovy
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp

🧰 Additional context used

🧬 Code Graph Analysis (1)

cpp/tensorrt_llm/nanobind/executor/bindings.cpp (6)

cpp/tensorrt_llm/nanobind/executor/executor.h (1)

m (123-123)

cpp/tensorrt_llm/nanobind/executor/request.h (1)

initRequestBindings (27-27)

cpp/tensorrt_llm/nanobind/executor/request.cpp (2)

initRequestBindings (53-933)

initRequestBindings (53-53)

cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (2)

initConfigBindings (46-614)

initConfigBindings (46-46)

cpp/tensorrt_llm/nanobind/executor/bindings.h (1)

initBindings (27-27)

cpp/tensorrt_llm/nanobind/executor/executor.cpp (1)

initBindings (200-239)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (25)

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (6)

1-51: LGTM! Well-structured includes and namespace setup.

The includes are comprehensive and properly organized. The namespace aliases enhance readability throughout the file.

54-61: LGTM! Clean helper function for optional tensor conversion.

The helper function properly handles the optional conversion from PyTorch tensors to TensorRT-LLM tensors.

63-237: LGTM! Comprehensive trampoline class for KV cache manager.

The PyKvCacheManager trampoline class properly implements all 28 virtual methods from the base class, enabling Python subclassing. The use of NB_OVERRIDE_PURE is appropriate for pure virtual methods.

240-287: LGTM! Well-implemented trampoline class for PEFT cache manager.

The PyBasePeftCacheManager trampoline class correctly implements all 8 virtual methods, following the same pattern as the KV cache manager trampoline.

290-448: LGTM! Comprehensive KV cache manager bindings with proper tensor handling.

The bindings are well-structured and provide extensive functionality. The lambda functions for PyTorch tensor conversions are appropriate, and the parameter handling is correct.

Note the TODO comment at line 239 about deduplicating executor bindings for KvCacheStats - this could be addressed in a future refactor.

450-477: LGTM! Well-designed PEFT cache manager bindings with proper GIL handling.

The bindings are properly implemented with appropriate use of nb::gil_scoped_release for the ensure_batch method, which allows for better concurrency during potentially long-running operations.

cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (4)

1-57: LGTM! Comprehensive includes and proper namespace setup.

The includes are well-organized and cover all necessary components for runtime bindings. The namespace aliases enhance code readability.

61-97: LGTM! Well-implemented trampoline class for IGptDecoder.

The PyIGptDecoder trampoline class properly implements all 5 virtual methods from the base interface, enabling Python subclassing of the decoder interface.

105-178: LGTM! Comprehensive bindings for runtime configuration classes.

The bindings for TaskLayerModuleConfig, BufferManager, TllmRuntime, and Request are well-structured with appropriate parameter handling and property exposure.

360-388: LGTM! Well-structured early bindings initialization.

The initBindingsEarly function properly initializes SpeculativeDecodingMode with static factory methods and read-only properties, following good nanobind practices.

cpp/tensorrt_llm/nanobind/bindings.cpp (4)

2-2: LGTM! Copyright year updated appropriately.

The copyright year has been updated to 2025, which is appropriate for this new nanobind implementation.

70-74: LGTM! Clean helper function for sampling config creation.

The makeSamplingConfig helper function provides a clean interface for creating sampling configurations from vectors.

177-182: LGTM! Improved KVCacheType binding with static method.

The use of from_string as a static method instead of a constructor is a good design choice that provides better control over string parsing and error handling.

470-497: LGTM! Comprehensive initialization of all binding modules.

The initialization sequence properly coordinates all the various binding modules (runtime, testing, batch manager, etc.) and provides appropriate NVLS functionality.

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (11)

1-49: Well-structured nanobind binding file with proper includes and namespaces.

The file follows good practices with comprehensive includes, proper licensing, and clear namespace organization. The nanobind and STL container includes are appropriate for the binding patterns used.

57-65: CapacityScheduler binding looks correct with proper default arguments.

The constructor binding with named arguments and sensible defaults is well-implemented. The call method binding correctly handles optional parameters.

67-75: MicroBatchScheduler binding properly handles optional parameters.

The use of std::nullopt for optional parameters and the comprehensive argument binding is appropriate for this scheduler class.

77-83: PauseRequests binding correctly handles multiple optional cache managers.

The binding properly exposes the pause functionality with appropriate optional parameters for different cache manager types.

85-89: AssignReqSeqSlots binding is clean and straightforward.

Simple binding with no constructor arguments and clear call method parameters.

91-95: AllocateKvCache binding properly handles optional cross cache manager.

The binding correctly exposes the KV cache allocation functionality with optional cross cache manager parameter.

97-112: HandleContextLogits binding uses lambda for tensor view conversion.

The lambda wrapper correctly converts at::Tensor to tr::TorchView using tr::TorchView::of(logits). This is a proper approach for handling tensor type conversions in nanobind.

114-130: HandleGenerationLogits binding correctly handles void return type.

The lambda wrapper properly handles the void return type by not returning anything, and the tensor conversion is consistent with the context logits handler.

132-137: MakeDecodingBatchInputOutput binding is straightforward and correct.

Direct binding of the operator with appropriate optional parameters for runtime buffers.

139-144: LogitsPostProcessor binding properly handles optional batched processor.

The binding correctly exposes the logits post-processing functionality with optional batched processor parameter.

146-170: CreateNewDecoderRequests binding uses structured bindings effectively.

The lambda wrapper correctly handles the tuple return type by using structured bindings and converting the tensor result with runtime::Torch::tensor(batchSlots). This is a good pattern for handling complex return types in nanobind.

tensorrt-cicd · 2025-07-16T22:03:26Z

PR_Github #12115 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9000 completed with status: 'FAILURE'

Signed-off-by: Linda-Stadter <[email protected]>

This reverts commit 5bff317. Signed-off-by: Iman Tabrizian <[email protected]>

Signed-off-by: Iman Tabrizian <[email protected]>

This reverts commit b75e53a.

This reverts commit b75e53a. Signed-off-by: Linda-Stadter <[email protected]>

Signed-off-by: Linda-Stadter <[email protected]>

Signed-off-by: Iman Tabrizian <[email protected]>

Signed-off-by: Iman Tabrizian <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>

MartinMarciniszyn assigned Linda-Stadter Jul 11, 2025

MartinMarciniszyn requested review from MartinMarciniszyn, dcampora and DomBrown July 11, 2025 10:22

DomBrown approved these changes Jul 11, 2025

View reviewed changes

cpp/tensorrt_llm/nanobind/bindings.cpp Show resolved Hide resolved

cpp/tensorrt_llm/nanobind/executor/executor.cpp Outdated Show resolved Hide resolved

ixlmar self-requested a review July 14, 2025 09:29

ixlmar reviewed Jul 15, 2025

View reviewed changes

coderabbitai bot reviewed Jul 16, 2025

View reviewed changes

Linda-Stadter force-pushed the nanobind-bindings branch from 1a3e07c to 44a0288 Compare July 16, 2025 21:42

coderabbitai bot reviewed Jul 16, 2025

View reviewed changes

Linda-Stadter added 4 commits July 17, 2025 02:13

feat: nanobind bindings

7fe8286

Signed-off-by: Linda-Stadter <[email protected]>

copied pybind folder for review

2652173

Signed-off-by: Linda-Stadter <[email protected]>

nanobind bindings

0243026

Signed-off-by: Linda-Stadter <[email protected]>

Addressing comments and rebase

655cbc3

Signed-off-by: Linda-Stadter <[email protected]>

Tabrizian added a commit to Tabrizian/TensorRT-LLM that referenced this pull request Jul 17, 2025

Revert "feat: nanobind bindings (NVIDIA#5961)"

ee27a95

This reverts commit 5bff317. Signed-off-by: Iman Tabrizian <[email protected]>

Tabrizian added a commit to Tabrizian/TensorRT-LLM that referenced this pull request Jul 17, 2025

Revert "feat: nanobind bindings (NVIDIA#5961)"

d643668

This reverts commit 5bff317. Signed-off-by: Iman Tabrizian <[email protected]>

coderabbitai bot mentioned this pull request Jul 17, 2025

Revert "feat: nanobind bindings (#5961)" #6160

Merged

chzblych pushed a commit that referenced this pull request Jul 18, 2025

Revert "feat: nanobind bindings (#5961)" (#6160)

b75e53a

Signed-off-by: Iman Tabrizian <[email protected]>

DomBrown mentioned this pull request Jul 18, 2025

refactor: Enhanced handling of decoder requests and logits within the batch manager #6055

Merged

Linda-Stadter pushed a commit to Linda-Stadter/TensorRT-LLM that referenced this pull request Jul 18, 2025

Reapply "feat: nanobind bindings (NVIDIA#5961)"

bdd88cc

This reverts commit b75e53a.

coderabbitai bot mentioned this pull request Jul 18, 2025

feat: nanobind bindings #6185

Merged

Linda-Stadter added a commit to Linda-Stadter/TensorRT-LLM that referenced this pull request Jul 18, 2025

Reapply "feat: nanobind bindings (NVIDIA#5961)"

25890bf

This reverts commit b75e53a. Signed-off-by: Linda-Stadter <[email protected]>

Linda-Stadter added a commit to Linda-Stadter/TensorRT-LLM that referenced this pull request Jul 18, 2025

Reapply "feat: nanobind bindings (NVIDIA#5961)"

bc9828b

This reverts commit b75e53a. Signed-off-by: Linda-Stadter <[email protected]>

Linda-Stadter added a commit to Linda-Stadter/TensorRT-LLM that referenced this pull request Jul 20, 2025

Reapply "feat: nanobind bindings (NVIDIA#5961)"

ce00314

This reverts commit b75e53a. Signed-off-by: Linda-Stadter <[email protected]>

reasonsolo pushed a commit to reasonsolo/TensorRT-LLM that referenced this pull request Jul 21, 2025

feat: nanobind bindings (NVIDIA#5961)

07087c6

Signed-off-by: Linda-Stadter <[email protected]>

reasonsolo pushed a commit to reasonsolo/TensorRT-LLM that referenced this pull request Jul 21, 2025

Revert "feat: nanobind bindings (NVIDIA#5961)" (NVIDIA#6160)

fde179a

Signed-off-by: Iman Tabrizian <[email protected]>

This was referenced Jul 21, 2025

[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory #5034

Merged

fix: bindings unit tests for nanobind #6221

Merged

timlee0212 pushed a commit to timlee0212/TensorRT-LLM that referenced this pull request Jul 21, 2025

Revert "feat: nanobind bindings (NVIDIA#5961)" (NVIDIA#6160)

8203dd6

Signed-off-by: Iman Tabrizian <[email protected]>

coderabbitai bot mentioned this pull request Jul 22, 2025

Draft: Nanobind integration tests #6203

Closed

NVShreyas pushed a commit to NVShreyas/TensorRT-LLM that referenced this pull request Jul 28, 2025

Revert "feat: nanobind bindings (NVIDIA#5961)" (NVIDIA#6160)

f850f7e

Signed-off-by: Iman Tabrizian <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>

This was referenced Aug 11, 2025

[None][feat] KV Cache Connector API #6488

Closed

[None][feat] Add Request specific exception #6931

Merged

This was referenced Aug 18, 2025

[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path #6348

Merged

Draft:[TRTLLM-7078][chore] optimal kvcache transfer for VWSA #6861

Closed

This was referenced Aug 26, 2025

feat: RayExecutor #7240

Closed

[None][feat] Nixl support for GDS #5488

Merged

[None][feat] KV Cache Connector API #7228

Merged

coderabbitai bot mentioned this pull request Sep 4, 2025

[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse #7106

Merged

feat: nanobind bindings #5961

feat: nanobind bindings #5961

Uh oh!

Conversation

Linda-Stadter commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

feat: nanobind bindings

Description

Test Coverage

GitHub Bot Help

kill

skip

reuse-pipeline

Summary by CodeRabbit

Uh oh!

Linda-Stadter commented Jul 11, 2025

Uh oh!

DomBrown left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Linda-Stadter commented Jul 11, 2025

Uh oh!

tensorrt-cicd commented Jul 11, 2025

Uh oh!

tensorrt-cicd commented Jul 11, 2025

Uh oh!

Linda-Stadter commented Jul 13, 2025

Uh oh!

tensorrt-cicd commented Jul 13, 2025

Uh oh!

tensorrt-cicd commented Jul 13, 2025

Uh oh!

Linda-Stadter commented Jul 14, 2025

Uh oh!

tensorrt-cicd commented Jul 14, 2025

Uh oh!

tensorrt-cicd commented Jul 14, 2025

Uh oh!

tensorrt-cicd commented Jul 14, 2025

Uh oh!

ixlmar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

Linda-Stadter commented Jul 16, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Linda-Stadter commented Jul 16, 2025

Uh oh!

tensorrt-cicd commented Jul 16, 2025

Uh oh!

Linda-Stadter commented Jul 11, 2025 •

edited

Loading

coderabbitai bot commented Jul 16, 2025 •

edited

Loading