-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: nanobind bindings #5961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: nanobind bindings #5961
Conversation
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving as suggested changes very minor.
Caveat that maybe someone else should also skim this. Lots of files make it easy to miss something when reviewing.
/bot run |
PR_Github #11665 [ run ] triggered by Bot |
PR_Github #11665 [ run ] completed with state |
/bot run |
PR_Github #11733 [ run ] triggered by Bot |
PR_Github #11733 [ run ] completed with state |
/bot run |
PR_Github #11783 [ run ] triggered by Bot |
PR_Github #11784 [ run ] triggered by Bot |
PR_Github #11784 [ run ] completed with state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except for the details I commented on in the review.
Some remarks not pertaining to any particular code location:
-
Are all the
#include
s in the headers needed? I was wondering if, e.g., including "tensorrt_llm/nanobind/common/customCasters.h" could be avoided in the header files. This might give some iota improvement in compilation time. -
Some of the license headers only cover the time period until 2024. We might want to update the dates there.
WalkthroughThis update introduces a comprehensive nanobind-based Python binding layer for the TensorRT-LLM C++ runtime, executor, batch manager, and related components. It adds new C++ source and header files for nanobind bindings, updates CMake and build/test infrastructure to support nanobind builds, and adapts Python and example code to use new enum parsing methods. Numerous classes, enums, and functions are now exposed to Python via nanobind, with detailed type casters and container bindings for seamless integration. Changes
Sequence Diagram(s)sequenceDiagram
participant Python
participant NanobindModule
participant CppRuntime
participant Executor
participant BatchManager
Python->>NanobindModule: import tensorrt_llm.bindings
NanobindModule->>CppRuntime: Expose classes/enums/functions
Python->>NanobindModule: Executor(config)
NanobindModule->>Executor: Construct with config
Python->>NanobindModule: enqueueRequest(request)
NanobindModule->>Executor: enqueueRequest(request)
Executor->>BatchManager: Prepare batch/request
BatchManager-->>Executor: Return status
Executor-->>NanobindModule: Return request ID
NanobindModule-->>Python: Return request ID
Python->>NanobindModule: awaitResponses(timeout)
NanobindModule->>Executor: awaitResponses(timeout)
Executor->>BatchManager: Get responses
BatchManager-->>Executor: Return responses
Executor-->>NanobindModule: Return responses
NanobindModule-->>Python: Return responses
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (14)
cpp/tensorrt_llm/nanobind/executor/request.h (1)
20-23
: Same alias-leak concern as inexecutorConfig.h
See previous note: move or drop the global
nb
alias to avoid polluting the including TU.cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1)
20-23
: Header-widenb
alias againSame reasoning as earlier headers—consider scoping the alias to reduce global namespace noise.
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1)
20-23
: Global alias leakageSame concern about
namespace nb = nanobind;
in a public header.cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)
20-23
: Global alias leakageSame remark regarding
namespace nb = nanobind;
.cpp/tensorrt_llm/pybind/bindings.cpp (1)
173-173
: Consider the long-term strategy for pybind11 vs nanobind bindings.The change from constructor-based to static method-based string conversion is consistent with the codebase changes. However, there's a past review comment questioning whether this should be removed for consistency with nanobind bindings.
Given that this PR introduces nanobind bindings, what's the long-term plan for these pybind11 bindings? If nanobind is the target, maintaining parallel implementations might create maintenance overhead.
#!/bin/bash # Description: Check if there are corresponding nanobind bindings for KVCacheType # Expected: Find nanobind bindings that might replace this pybind11 code # Search for KVCacheType in nanobind files rg -A 5 -B 5 "KVCacheType" --type cpp | grep -E "(nanobind|\.def.*from_string)"tests/unittest/bindings/test_executor_bindings.py (1)
2239-2239
: Verify API consistency between pybind11 and nanobind bindings.The change from
datetime.timedelta(seconds=1)
to1000
suggests different API signatures between the two binding types. This contradicts the past review comment about maintaining consistent behavior between pybind11 and nanobind bindings.The relevant code snippet shows
get_latest_events(timeout_ms: Optional[float] = 0)
which expects numeric milliseconds, supporting the numeric value. However, this creates an inconsistency where the same test code cannot work with both binding types.Consider either:
- Updating the pybind11 bindings to accept numeric milliseconds consistently
- Making the nanobind bindings accept timedelta objects like pybind11
- Adding a conditional check similar to the test skipping pattern
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1)
104-104
: Consider maintaining consistency with pybind11 parameter namingThe parameter name
medusa_buffers
appears to differ from the C++ naming convention (medusaBuffers
). This was previously flagged in the pybind11 bindings.Also applies to: 111-111, 122-122, 129-129
cpp/tensorrt_llm/nanobind/executor/executor.cpp (4)
55-56
: Address the TODO comment with specific improvements neededThe TODO comment should specify what improvements are planned for FP8 and BF16 support.
77-80
: Avoid usingdefault:
case per coding guidelinesConsider explicitly handling all enum cases instead of using a default case.
116-116
: Reserve vector capacity before populatingAdd
dims.reserve(array.ndim());
to avoid potential reallocations during the loop.
153-158
: Use structured bindings for clarityReplace the loop variable with structured bindings as previously suggested:
for (auto const& [rawName, rawArray] : managedWeights.value())cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (1)
178-178
: Consider using a more descriptive name for the vector binding.As suggested previously, "RequestVector" would be more descriptive than the generic bind_vector naming.
cpp/tensorrt_llm/nanobind/bindings.cpp (2)
419-419
: Use descriptive name for vector binding.As previously suggested, "SamplingConfigVector" would be more descriptive.
365-365
: Replace assert with proper exception handling and avoid magic numbers.As previously noted:
- The magic number 19 should be a named constant
assert
is removed in release builds and callsabort()
, making it inappropriate for input validation- assert(t.size() == 19); + constexpr size_t SAMPLING_CONFIG_STATE_SIZE = 19; + if (t.size() != SAMPLING_CONFIG_STATE_SIZE) { + throw std::runtime_error("Invalid SamplingConfig state size: expected " + + std::to_string(SAMPLING_CONFIG_STATE_SIZE) + + ", got " + std::to_string(t.size())); + }
🧹 Nitpick comments (14)
cpp/tensorrt_llm/nanobind/executor/executorConfig.h (2)
20-23
: Globalnamespace nb
alias in a public header may leak into all translation unitsPlacing
namespace nb = nanobind;
at the top level of a header exposes the alias to every TU that includes this file, increasing the risk of ODR or name-collision issues when different third-party projects define the same shortcut.
Consider moving the alias inside thetensorrt_llm::nanobind
namespace (or omitting it and writingnanobind::module_
explicitly in headers) and keeping the global alias only in the.cpp
implementation files.
27-28
: Comment wording is misleading
// Register bindings for executor API.
does not reflect that this header is specifically for config bindings. Rename to avoid confusion with the higher-level executor bindings already present elsewhere.cpp/tensorrt_llm/nanobind/executor/request.h (1)
26-27
: Consider aligning naming with the rest of the codebaseMost other nanobind headers expose an
initBindings
function. UsinginitRequestBindings
is perfectly fine, but the inconsistency may surprise contributors skimming the source. Either:
- Rename to
initBindings
, or- Add a brief comment explaining the naming rationale (e.g., because executor exports multiple sub-modules).
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1)
24-29
: Missing namespace-closing commentMost headers in this repo end the namespace block with a comment (
} // namespace …
). Adding it here would keep style consistent.cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1)
24-29
: Style consistencyOther nanobind headers wrap the closing brace with a comment. Add
} // namespace tensorrt_llm::nanobind::testing
for uniformity.cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)
25-30
: Prefer free function over static method for module initialisationEvery other component exposes a free
initBindings
function; introducing aUserBufferBindings
class solely to host a static method adds an extra indirection without clear benefit. A free function keeps the API surface consistent and avoids an otherwise empty class.-namespace tensorrt_llm::kernels::userbuffers -{ -class UserBufferBindings -{ -public: - static void initBindings(nb::module_& m); -}; -} // namespace tensorrt_llm::kernels::userbuffers +namespace tensorrt_llm::kernels::userbuffers +{ +void initBindings(nb::module_& m); +} // namespace tensorrt_llm::kernels::userbufferscpp/tensorrt_llm/nanobind/CMakeLists.txt (1)
49-49
: Inconsistent compile definition name for nanobind module.The compile definition
PYBIND11_DETAILED_ERROR_MESSAGES=1
references pybind11 but this is a nanobind module. Consider renaming toNANOBIND_DETAILED_ERROR_MESSAGES=1
or verify if this is intentional for compatibility.- PYBIND11_DETAILED_ERROR_MESSAGES=1) + NANOBIND_DETAILED_ERROR_MESSAGES=1)cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1)
70-131
: Consider using a builder pattern for improved maintainabilityThe
toTrtLlm()
method correctly handles all 49 parameters, but this large parameter count could benefit from a builder pattern to improve readability and maintainability. This would also make it easier to add new parameters in the future.cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (1)
346-369
: Consider extracting repeated tensor conversion patternThe pattern of converting
ITensor::SharedPtr
toat::Tensor
wrapped instd::optional
is repeated in multiple methods. Consider extracting this into a helper function to reduce code duplication.// Add helper function at the top of the file static std::optional<at::Tensor> tensor_to_optional_torch(tensorrt_llm::runtime::ITensor::SharedPtr const& tensor) { if (tensor) { return tr::Torch::tensor(tensor); } return std::nullopt; } // Then simplify the methods: .def("get_block_pool_pointers", [](tbk::BaseKVCacheManager& self) { return tensor_to_optional_torch(self.getBlockPoolPointers()); }) .def("get_layer_to_pool_mapping", [](tbk::BaseKVCacheManager& self) { return tensor_to_optional_torch(self.getLayerToPoolMapping()); })cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)
608-609
: Consider aligning property name with underlying methods.The property "mm_embedding_offloading" maps to methods
getPromptTableOffloading/setPromptTableOffloading
. This naming inconsistency could confuse users. Consider using "prompt_table_offloading" for the property name to match the C++ API.cpp/tensorrt_llm/nanobind/bindings.cpp (1)
81-81
: Consider keeping leak warnings enabled during development.Disabling leak warnings with
nb::set_leak_warnings(false)
could hide memory management issues. Consider enabling them in debug builds or documenting why they need to be disabled.+#ifdef NDEBUG nb::set_leak_warnings(false); +#endifcpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)
42-43
: Remove commented code for cleaner implementation.The commented using declaration is not needed and should be removed.
- // using BaseCacheTransceiver::BaseCacheTransceiver; // Inherit constructors NB_TRAMPOLINE(tb::BaseCacheTransceiver, 6);
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)
192-250
: Comprehensive GenericLlmRequest bindings with proper optional handling.The bindings thoroughly expose the interface with appropriate property access. The lambda functions correctly handle optional types and pointer conversions.
Consider extracting the repeated pattern for optional vector properties into a helper template to reduce code duplication:
template<typename T, typename Getter> auto makeOptionalVectorProperty(Getter getter) { return [getter](GenLlmReq& self) { std::optional<T> result = std::nullopt; if (auto opt = (self.*getter)()) { result = *opt.value(); } return result; }; } // Usage: .def_prop_ro("multimodal_hashes", makeOptionalVectorProperty<std::vector<std::vector<GenLlmReq::SizeType32>>>(&GenLlmReq::getMultimodalHashes))
253-355
: Complex but well-structured constructor binding.The 49-parameter constructor is handled well with the
makeOptionalTensor
helper reducing duplication. All parameters have appropriate defaults.Given the complexity, consider documenting the parameter groupings or creating builder methods in the future to improve usability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (46)
cpp/CMakeLists.txt
(2 hunks)cpp/tensorrt_llm/nanobind/CMakeLists.txt
(2 hunks)cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/bindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h
(1 hunks)cpp/tensorrt_llm/nanobind/bindings.cpp
(2 hunks)cpp/tensorrt_llm/nanobind/common/bindTypes.h
(1 hunks)cpp/tensorrt_llm/nanobind/common/customCasters.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/bindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executor.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executor.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executorConfig.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/request.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/request.h
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/bindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/moeBindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h
(1 hunks)cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/userbuffers/bindings.h
(1 hunks)cpp/tensorrt_llm/pybind/bindings.cpp
(1 hunks)cpp/tensorrt_llm/pybind/executor/bindings.cpp
(1 hunks)examples/models/core/llama/summarize_long.py
(1 hunks)examples/models/core/qwen2audio/run.py
(1 hunks)examples/models/core/qwenvl/run.py
(1 hunks)jenkins/Build.groovy
(4 hunks)jenkins/L0_Test.groovy
(3 hunks)tensorrt_llm/builder.py
(1 hunks)tensorrt_llm/commands/build.py
(2 hunks)tensorrt_llm/runtime/model_runner.py
(1 hunks)tests/integration/test_lists/test-db/l0_a10.yml
(1 hunks)tests/unittest/bindings/test_bindings_ut.py
(4 hunks)tests/unittest/bindings/test_executor_bindings.py
(9 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (11)
examples/models/core/qwen2audio/run.py (2)
tensorrt_llm/runtime/generation.py (1)
kv_cache_type
(1169-1170)tests/integration/defs/cpp/test_multi_gpu.py (1)
KVCacheType
(23-27)
cpp/tensorrt_llm/nanobind/executor/executorConfig.h (5)
cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)
m
(27-27)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)
m
(27-27)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)
m
(28-28)m
(37-37)cpp/tensorrt_llm/nanobind/executor/executor.h (1)
m
(123-123)cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)
m
(28-28)
examples/models/core/qwenvl/run.py (1)
tests/integration/defs/cpp/test_multi_gpu.py (1)
KVCacheType
(23-27)
tensorrt_llm/commands/build.py (1)
tests/integration/defs/cpp/test_multi_gpu.py (1)
KVCacheType
(23-27)
cpp/tensorrt_llm/nanobind/executor/request.h (5)
cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)
m
(27-27)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)
m
(27-27)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)
m
(28-28)m
(37-37)cpp/tensorrt_llm/nanobind/executor/executor.h (1)
m
(123-123)cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)
m
(28-28)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (5)
cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)
m
(27-27)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)
m
(27-27)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)
m
(28-28)m
(37-37)cpp/tensorrt_llm/nanobind/executor/executor.h (1)
m
(123-123)cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)
m
(28-28)
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (2)
cpp/tensorrt_llm/testing/modelSpec.h (1)
ModelSpec
(53-59)tests/integration/defs/cpp/test_multi_gpu.py (1)
KVCacheType
(23-27)
examples/models/core/llama/summarize_long.py (2)
tensorrt_llm/runtime/generation.py (1)
kv_cache_type
(1169-1170)tests/integration/defs/cpp/test_multi_gpu.py (1)
KVCacheType
(23-27)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (5)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (4)
initBindings
(290-448)initBindings
(290-290)initBindings
(450-477)initBindings
(450-450)cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)
initBindings
(55-523)initBindings
(55-55)cpp/tensorrt_llm/nanobind/executor/bindings.cpp (2)
initBindings
(48-261)initBindings
(48-48)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)
m
(27-27)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)
m
(28-28)m
(37-37)
tests/unittest/bindings/test_executor_bindings.py (2)
cpp/include/tensorrt_llm/batch_manager/llmRequest.h (2)
reason
(1661-1664)reason
(1661-1661)tensorrt_llm/_torch/pyexecutor/resource_manager.py (1)
get_latest_events
(562-563)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (3)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)
initBindings
(55-523)initBindings
(55-55)cpp/tensorrt_llm/nanobind/common/customCasters.h (2)
tensor
(321-342)tensor
(321-321)cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (2)
from_torch
(43-50)from_torch
(43-43)
🔇 Additional comments (83)
examples/models/core/qwen2audio/run.py (1)
125-126
: LGTM! Consistent enum parsing improvement.The change from direct constructor to
KVCacheType.from_string()
improves consistency with the broader codebase refactoring. This factory method approach provides more robust parsing of string values to enum values and aligns with similar updates in other model examples and core runtime components.cpp/CMakeLists.txt (2)
201-201
: LGTM! Extends pybind11 support for nanobind migration.The addition of
BUILD_DEEP_EP
condition appropriately extends pybind11 subdirectory inclusion to support the nanobind migration while maintaining backward compatibility.
220-220
: LGTM! Consistent header inclusion logic.The parallel change to include pybind11 headers when
BUILD_DEEP_EP
is enabled maintains consistency with the subdirectory inclusion logic above and supports the nanobind migration requirements.examples/models/core/llama/summarize_long.py (1)
100-100
: LGTM! Consistent enum parsing standardization.The change to use
KVCacheType.from_string()
follows the same pattern as other model examples and improves the robustness of enum parsing from configuration strings.tensorrt_llm/runtime/model_runner.py (1)
89-89
: LGTM! Core runtime enum parsing standardization.The change to use
KVCacheType.from_string()
in the core runtime module is particularly important for maintaining consistency across the codebase. This factory method approach provides more robust parsing of configuration strings to enum values and aligns with the nanobind migration effort.cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (1)
1-30
: LGTM! Well-structured header for MOE bindings.The header file follows established patterns in the codebase:
- Proper Apache 2.0 license header
- Standard
#pragma once
header guard- Consistent namespace structure (
tensorrt_llm::nanobind::runtime
)- Function signature matches the pattern used in other binding files (
initBindings(nb::module_& m)
)This addition properly integrates into the nanobind migration effort and follows the project's coding standards.
cpp/tensorrt_llm/nanobind/executor/bindings.h (1)
1-29
: LGTM! Well-structured nanobind bindings header.The header file follows standard practices with proper license headers, include guards, namespace organization, and a clean function declaration for initializing executor bindings.
examples/models/core/qwenvl/run.py (1)
121-122
: LGTM! Consistent enum parsing improvement.The change from direct
KVCacheType
construction toKVCacheType.from_string()
aligns with the broader codebase migration for more robust enum parsing. This improves consistency and error handling.tensorrt_llm/commands/build.py (2)
41-56
: LGTM! Excellent enum argument parser implementation.The
enum_type
helper function is well-designed with proper error handling, clear error messages, and support for both enum instances and string values. This improves the robustness of command-line argument parsing.
151-151
: LGTM! Consistent with the enum parsing migration.The change to use
enum_type(KVCacheType)
aligns with the broader codebase migration for improved enum argument parsing consistency.tests/integration/test_lists/test-db/l0_a10.yml (1)
193-207
: LGTM! Appropriate test configuration for nanobind bindings.The new
l0_a10_nanobind
test entry is well-configured for testing nanobind bindings on A10 hardware. The conditions, wildcards, and test selection are appropriate for validating the nanobind migration.cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1)
1-29
: LGTM! Consistent nanobind bindings header structure.The header file follows the same excellent practices as other nanobind headers in the codebase, with proper license headers, include guards, namespace organization, and a clean function declaration. The consistency across nanobind headers is commendable.
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)
1-30
: LGTM! Well-structured nanobind binding header.The header follows consistent patterns with other nanobind binding files in the project, including proper license header, include guards, namespace structure, and the standard
initBindings
method signature.cpp/tensorrt_llm/nanobind/runtime/bindings.h (1)
1-31
: LGTM! Consistent nanobind binding header structure.The header follows the established patterns for nanobind binding files. The presence of both
initBindings
andinitBindingsEarly
functions suggests proper handling of initialization order dependencies in the runtime bindings.cpp/tensorrt_llm/nanobind/batch_manager/buffers.h (1)
1-30
: LGTM! Consistent nanobind binding header structure.The header follows the established patterns for nanobind binding files with proper license header, include guards, namespace structure, and the standard
initBindings
method signature.tensorrt_llm/builder.py (1)
575-576
: LGTM! Correct adaptation to the new KVCacheType API.The change from direct constructor call to
KVCacheType.from_string()
is consistent with the binding API change and aligns with the coordinated updates across the codebase mentioned in the AI summary.cpp/tensorrt_llm/pybind/executor/bindings.cpp (1)
247-257
: LGTM! API improvement with proper timeout handling.The change from direct method pointer binding to a lambda wrapper improves the API by:
- Making timeout units explicit (
timeout_ms
instead oftimeout
)- Properly handling optional parameter conversion from
double
tostd::chrono::milliseconds
- Converting floating-point milliseconds to integer as required by
std::chrono::milliseconds
The implementation correctly checks for the presence of
timeout_ms
and handles both cases appropriately.cpp/tensorrt_llm/nanobind/CMakeLists.txt (4)
6-22
: LGTM! Comprehensive source files expansion for nanobind migration.The expanded source files list properly includes all the new nanobind binding modules across batch_manager, executor, runtime, testing, and userbuffers directories. This aligns with the comprehensive nanobind migration described in the PR.
33-36
: LGTM! Proper conditional linking for NVSHMEM.The conditional NVSHMEM linking is correctly implemented, ensuring the libraries are only linked when the feature is enabled.
46-46
: LGTM! CUDA NVML library addition.Adding
${CUDA_NVML_LIB}
to the target link libraries is appropriate for the expanded nanobind functionality.
56-56
: LGTM! Enhanced linker rpath configuration.The addition of CUDA toolkit stubs to the rpath is appropriate for the expanded nanobind functionality that may require CUDA runtime libraries.
tests/unittest/bindings/test_bindings_ut.py (4)
8-8
: LGTM! Proper pytest import addition.Adding pytest import is necessary for the
@pytest.mark.skipif
decorators used in the test functions.
313-314
: LGTM! Appropriate test skipping for nanobind migration.The conditional test skipping is well-implemented:
- Uses proper pytest.mark.skipif syntax
- Checks the binding_type correctly
- Provides clear reason for skipping
- Allows CI to pass during the incomplete migration
This approach is consistent with the PR's goal of enabling nanobind while maintaining test stability.
424-425
: LGTM! Consistent test skipping pattern.The skipif decorator follows the same pattern as other nanobind-incompatible tests, maintaining consistency across the test suite.
505-506
: LGTM! Consistent test skipping for pickle functionality.The test skipping is appropriately applied to pickle-related functionality that may have different behavior between pybind11 and nanobind.
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (4)
1-16
: LGTM! Proper copyright header with updated date range.The copyright header is correctly formatted with the updated 2022-2025 date range and proper SPDX license identifiers.
18-21
: LGTM! Proper nanobind integration.The header correctly includes nanobind and establishes the namespace alias following nanobind conventions.
23-30
: LGTM! Well-organized KV cache manager bindings class.The
KVCacheManagerBindings
class is properly structured:
- Placed in the appropriate nested namespace
- Uses static method pattern for binding initialization
- Clear and descriptive class name
32-39
: LGTM! Consistent PEFT cache manager bindings structure.The
BasePeftCacheManagerBindings
class follows the same pattern asKVCacheManagerBindings
:
- Proper namespace organization
- Static initialization method
- Consistent naming convention
jenkins/L0_Test.groovy (4)
67-68
: LGTM: Nanobind configuration constant follows established pattern.The
NANOBIND_CONFIG
constant is properly defined with the@Field
annotation, consistent with other build configuration constants in the file.
77-77
: LGTM: Nanobind build configuration properly added.The nanobind configuration entry in
BUILD_CONFIGS
follows the established pattern with an appropriate tarball name.
1718-1718
: LGTM: A10-Nanobind test job properly configured.The new test job entry follows the established pattern with appropriate platform, test list, and GPU configuration for nanobind testing.
1795-1797
: LGTM: Nanobind configuration selection logic is correct.The conditional logic to set
NANOBIND_CONFIG
for jobs containing "Nanobind" follows the established pattern used for other build configurations.cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (3)
1-29
: LGTM: Includes and namespace setup are correct.The file includes the necessary headers and properly sets up namespace aliases and using declarations for the nanobind bindings.
35-44
: LGTM: Enum bindings are properly implemented.Both
QuantMethod
andOutputContentType
enums are correctly bound with appropriate documentation strings and thenb::is_arithmetic()
flag for arithmetic operations.
46-84
: LGTM: ModelSpec class binding is comprehensive and correct.The class binding properly exposes all member functions with appropriate return value policies. The use of
nb::rv_policy::reference_internal
for methods returning references and the__copy__
method for Python-side copying are correctly implemented.cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (3)
1-26
: LGTM: Includes and namespace setup are correct.The file includes the necessary headers and properly sets up namespace aliases for the nanobind bindings.
32-36
: LGTM: UBBuffer class binding is safely implemented.The binding properly exposes buffer metadata with read-only properties and safely converts the raw pointer address to
intptr_t
for Python consumption.
38-45
: LGTM: User buffer management functions are properly bound.The function bindings correctly handle pointer conversions and use appropriate lambda wrappers for type safety. The symmetric conversion between
intptr_t
andvoid*
for addresses is handled correctly.tests/unittest/bindings/test_executor_bindings.py (8)
17-17
: LGTM: Import necessary for nanobind conditional logic.The import enables checking the binding type to conditionally skip tests, which is appropriate for the migration strategy.
488-489
: LGTM: Appropriate test skipping for nanobind migration.The conditional skip decorator correctly excludes this test from running under nanobind bindings during the migration phase.
694-695
: LGTM: Consistent test skipping pattern.The decorator maintains consistency with the nanobind migration strategy.
1120-1121
: LGTM: Consistent test skipping pattern.The decorator maintains consistency with the nanobind migration strategy.
1159-1160
: LGTM: Consistent test skipping pattern.The decorator maintains consistency with the nanobind migration strategy.
1507-1508
: LGTM: Consistent test skipping pattern.The decorator maintains consistency with the nanobind migration strategy.
1881-1882
: LGTM: Consistent test skipping pattern.The decorator maintains consistency with the nanobind migration strategy.
2157-2158
: LGTM: Consistent test skipping pattern.The decorator maintains consistency with the nanobind migration strategy.
jenkins/Build.groovy (1)
50-55
: Build configuration for nanobind bindings looks correctThe new nanobind build configurations follow the established pattern and correctly specify:
- The
--binding_type nanobind
parameter to enable nanobind builds- Distinct tarball names to differentiate from pybind11 builds
- Appropriate wheel architectures for each platform
Also applies to: 65-69, 85-89, 540-541
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1)
55-178
: Algorithm bindings implementation looks goodThe nanobind bindings properly expose all batch management algorithm classes with:
- Appropriate constructors with default arguments
- Callable operators wrapped in lambdas for tensor type conversions
- Consistent
name
method implementationscpp/tensorrt_llm/nanobind/common/bindTypes.h (1)
24-100
: Well-implemented container bindings with comprehensive Python interfaceThe
bindList
andbindSet
template functions provide excellent Python-like interfaces for STL containers:
- Proper bounds checking with index_error exceptions
- Iterator support with appropriate lifetime management
- Pickling support for sets enables serialization
- The use of
nb::overload_cast
for disambiguating theerase
method is correctcpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp (2)
88-90
: Pointer handling in property is correctThe use of
.get()
to return a raw pointer without transferring ownership is appropriate here, as confirmed in the previous review discussion. The setter properly takes ownership withreset()
.
41-109
: Buffer bindings are comprehensive and well-structuredBoth
TransformerBuffers
andRuntimeBuffers
bindings expose all necessary:
- Constructors with appropriate parameters
- Methods for reshaping, copying, and buffer management
- Read-write properties for all public data members
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1)
43-68
: Helper functions correctly handle type conversionsThe
from_torch
helper andcallbackAdapter
properly manage:
- Optional tensor conversions between PyTorch and internal formats
- Callback wrapping with appropriate tensor type conversions
- CUDA stream unwrapping for Python callbacks
cpp/tensorrt_llm/nanobind/executor/executor.cpp (1)
189-198
: Good GIL handling in shutdown methodThe GIL release before calling
mExecutor->shutdown()
is correct and necessary to prevent deadlocks. The detailed comment explaining the rationale is helpful.cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (2)
33-47
: Good input validation in wrapper functionsThe validation checks ensuring
expert_count
matches theexpert_load_factor
vector size prevent potential buffer overruns and provide clear error messages.
58-64
: Well-formatted repr method for debuggingThe
__repr__
implementation provides a clear, informative string representation of the MoeWeight object, which is helpful for debugging.cpp/tensorrt_llm/nanobind/executor/request.cpp (1)
556-583
: Fix state deserialization mismatch in Request.setstateThe deserialization has a parameter mismatch. The state tuple contains 33 elements, but the Request constructor call includes a hardcoded
1
betweenstate[29]
andstate[30]
, suggesting either:
- A missing field in the serialized state
- An incorrect parameter mapping
This could cause incorrect object restoration from pickled data.
auto requestSetstate = [](tle::Request& request, nb::tuple const& state) { - if (state.size() != 33) + if (state.size() != 34) // Update expected size if adding missing field { throw std::runtime_error("Invalid Request state!"); } new (&request) tle::Request(nb::cast<VecTokens>(state[0]), nb::cast<SizeType32>(state[1]), // ... other parameters ... nb::cast<std::optional<tle::ContextPhaseParams>>(state[26]), nb::cast<std::optional<tle::Tensor>>(state[27]), nb::cast<std::optional<SizeType32>>(state[28]), - nb::cast<std::optional<tle::Tensor>>(state[29]), 1, nb::cast<std::optional<tle::EagleConfig>>(state[30]), + nb::cast<std::optional<tle::Tensor>>(state[29]), nb::cast<SizeType32>(state[30]), // Or retrieve from state + nb::cast<std::optional<tle::EagleConfig>>(state[31]), - nb::cast<std::optional<tle::Tensor>>(state[31]), + nb::cast<std::optional<tle::Tensor>>(state[32]), - nb::cast<std::optional<tle::GuidedDecodingParams>>(state[32])); + nb::cast<std::optional<tle::GuidedDecodingParams>>(state[33])); };Likely an incorrect or invalid review comment.
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (5)
1-60
: LGTM! Well-organized header includes.The file structure follows good practices with proper copyright header and logically grouped includes.
61-98
: Trampoline class correctly implements virtual methods.The PyIGptDecoder trampoline class properly exposes all 5 virtual methods from the IGptDecoder interface using nanobind's override mechanism.
307-319
: Verify proper handling of raw pointers in lamport functions.The functions use
intptr_t
andreinterpret_cast
for raw pointer manipulation. Ensure that:
- The pointer sizes are validated
- The buffer pointers are valid before dereferencing
- Consider adding null checks for safety
323-327
: Validate stream pointer extraction from PyTorch object.The code assumes the PyTorch stream object has a
cuda_stream
attribute that can be cast toint64_t
. This is fragile and could break with PyTorch API changes.Consider adding error handling:
- auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream")); - cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr); + if (!nb::hasattr(py_stream, "cuda_stream")) { + throw std::runtime_error("Stream object missing cuda_stream attribute"); + } + auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream")); + cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
364-388
: Clean implementation of SpeculativeDecodingMode bindings.The static factory methods and read-only properties provide a well-structured API for mode configuration.
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (2)
1-46
: Headers and namespace setup look good.Proper organization of includes and namespace aliases.
445-514
: Excellent implementation of ExecutorConfig serialization with dynamic attributes.The implementation correctly handles both C++ state and Python dynamic attributes, ensuring complete serialization/deserialization support. The use of
nb::inst_mark_ready
after restoration is crucial for proper nanobind instance management.cpp/tensorrt_llm/nanobind/bindings.cpp (1)
123-497
: Well-structured module organization with comprehensive bindings.The file provides extensive bindings for TensorRT-LLM components with good organization through submodules. The integration of various binding modules is clean and maintainable.
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (4)
1-35
: LGTM!The copyright header and includes are properly structured for nanobind bindings.
79-85
: Correct implementation of abstract base class bindings.The BaseCacheTransceiver binding properly uses the trampoline class and exposes all pure virtual methods with appropriate Python naming conventions.
87-95
: Enum bindings are correctly implemented.Both enum bindings follow the standard nanobind pattern and properly expose the enum values to Python.
97-110
: Well-structured class bindings with proper parameter documentation.The CacheTransceiver and CacheTransBufferManager bindings correctly specify inheritance, use named arguments for clarity, and properly handle optional parameters.
cpp/tensorrt_llm/nanobind/executor/bindings.cpp (6)
1-46
: LGTM!The file header and template function are properly structured. The template approach for KVCacheEventDiff is a good design choice for reusability.
48-54
: Proper module initialization and enum binding.The module version is correctly set and the ModelType enum follows standard nanobind patterns.
56-88
: Well-implemented DecodingMode binding with proper serialization support.The custom pickle support correctly handles state serialization/deserialization with appropriate error checking. The use of placement new in setstate is the correct approach for reconstructing the object.
90-209
: Comprehensive statistics bindings with JSON serialization support.All statistics classes are properly bound with read-write properties and JSON serialization methods where appropriate. The implementation is consistent and follows nanobind best practices.
210-256
: Well-designed KV cache submodule with proper timeout handling.The submodule structure is clean, and the
get_latest_events
method correctly handles optional timeout conversion from Python float (milliseconds) to C++ chrono types.
258-261
: Good modular organization of bindings.Separating request, config, and executor bindings into their own initialization functions promotes better code organization and maintainability.
cpp/tensorrt_llm/nanobind/executor/executor.h (4)
1-45
: Well-designed header with flexible initialization options.The multiple constructors provide good flexibility for different initialization scenarios, properly handling model paths, engine buffers, and encoder/decoder configurations.
46-59
: Proper context manager implementation and delegation pattern.The context manager support enables clean Python
with
statement usage, and the delegation to the underlying executor is correctly implemented.
61-86
: Excellent GIL management for blocking operations.All
awaitResponses
methods correctly release the GIL before blocking operations, which is crucial for proper Python threading behavior and prevents deadlocks.
88-127
: Clean interface with proper resource management.The remaining methods correctly delegate to the underlying executor, use
[[nodiscard]]
appropriately, and manage the executor lifetime withstd::unique_ptr
.cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (4)
55-81
: Proper exception handling setup for nanobind.The exception registration and translator implementation correctly handle custom exceptions and follow nanobind best practices with no-capture lambdas.
356-428
: Well-structured bindings for request methods and buffer management classes.The methods are properly exposed, and the tensor conversion for
new_output_tokens_host
correctly handles the Python interface requirements.
429-443
: Safe utility function with proper validation.The function correctly validates input sizes and provides clear documentation about the expected token ordering.
445-522
: Complex but correct implementation of decoding batch input creation.The function properly handles context/generation request separation, beam width tiling for context requests, and constructs the decoding input with appropriate logits views and slot management.
1a3e07c
to
44a0288
Compare
/bot run |
PR_Github #12115 [ run ] triggered by Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (6)
cpp/tensorrt_llm/nanobind/common/customCasters.h (1)
294-305
: Remove unsafe const_cast in ITensor::SharedConstPtr conversionThe reinterpret_cast from
SharedConstPtr
toSharedPtr&
violates const-correctness and could lead to undefined behavior if the underlying tensor is modified.static handle from_cpp( tensorrt_llm::runtime::ITensor::SharedConstPtr const& src, rv_policy /* policy */, cleanup_list* /* cleanup */) { if (src == nullptr) { return none().release(); } - return THPVariable_Wrap(tensorrt_llm::runtime::Torch::tensor( - reinterpret_cast<tensorrt_llm::runtime::ITensor::SharedPtr const&>(src))); + // Create a const tensor view or copy the data to maintain const-correctness + auto const_tensor = tensorrt_llm::runtime::Torch::tensor(src); + return THPVariable_Wrap(const_tensor); }cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (2)
178-178
: Consider renaming for consistency.The vector binding
RequestVector
could be renamed toRequestVector
for better consistency with naming conventions.
307-329
: Validate stream pointer casting for safety.The unchecked cast from Python object to CUDA stream pointer could be unsafe. Consider adding validation to prevent undefined behavior from invalid pointer casts.
Consider adding validation like:
- [](int64_t delay_micro_secs, nb::object py_stream) - { - // Get the raw stream handle from PyTorch stream object - auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream")); - cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr); + [](int64_t delay_micro_secs, nb::object py_stream) + { + // Get the raw stream handle from PyTorch stream object + auto stream_ptr = nb::cast<int64_t>(py_stream.attr("cuda_stream")); + if (!stream_ptr) { + throw std::runtime_error("Invalid stream pointer"); + } + cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);cpp/tensorrt_llm/nanobind/bindings.cpp (3)
117-117
: Add validation for stream pointer casting.The unchecked cast from Python object to CUDA stream pointer is unsafe and could lead to undefined behavior.
Consider adding validation:
- cudaStream_t stream = reinterpret_cast<cudaStream_t>(nb::cast<uintptr_t>(py_stream)); + auto stream_ptr = nb::cast<uintptr_t>(py_stream); + if (!stream_ptr) { + throw std::runtime_error("Invalid stream pointer"); + } + cudaStream_t stream = reinterpret_cast<cudaStream_t>(stream_ptr);
355-388
: Consider replacing magic number with named constant.The magic number 19 in the assertion could be problematic if the tuple size changes. Consider using a named constant.
+ static constexpr size_t SAMPLING_CONFIG_TUPLE_SIZE = 19; auto SamplingConfigSetState = [](tr::SamplingConfig& self, nb::tuple t) -> tr::SamplingConfig { - assert(t.size() == 19); + assert(t.size() == SAMPLING_CONFIG_TUPLE_SIZE);
365-365
: Consider using exception instead of assert for validation.Since this handles unvalidated input data and
assert
may be removed in release builds, throwing an exception would be more appropriate.- assert(t.size() == 19); + if (t.size() != 19) { + throw std::runtime_error("Invalid tuple size for SamplingConfig deserialization"); + }
🧹 Nitpick comments (4)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1)
253-355
: Consider using a builder pattern or configuration object for the complex constructor.The LlmRequest constructor has 49 parameters, which significantly impacts maintainability and usability. Consider refactoring to use a builder pattern or a configuration object to improve the API.
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)
93-95
: Consider moving AttentionType enum binding to executor module.The
AttentionType
enum fromexecutor::kv_cache
namespace is being bound in the batch manager module. This creates a cross-module dependency that might be better organized by binding this enum in the executor bindings instead.cpp/tensorrt_llm/nanobind/executor/bindings.cpp (1)
246-256
: Consider adding timeout validation.The
get_latest_events
method converts a float timeout to milliseconds without validating that the value is non-negative. Consider adding validation to prevent potential issues with negative timeouts."get_latest_events", [](tle::KVCacheEventManager& self, std::optional<double> timeout_ms = std::nullopt) { if (timeout_ms) { + if (*timeout_ms < 0) + { + throw std::invalid_argument("timeout_ms must be non-negative"); + } return self.getLatestEvents(std::chrono::milliseconds(static_cast<int64_t>(*timeout_ms))); } return self.getLatestEvents(std::nullopt); }, nb::arg("timeout_ms") = std::nullopt);cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1)
172-178
: UpdateDecoderBuffers binding uses string literal instead of class name.The binding uses
"UpdateDecoderBuffers"
as a string literal instead of the class's static name member like other bindings. For consistency with the other bindings in this file, consider using the class name member if available.- nb::class_<UpdateDecoderBuffers>(m, "UpdateDecoderBuffers") + nb::class_<UpdateDecoderBuffers>(m, UpdateDecoderBuffers::name)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (45)
cpp/CMakeLists.txt
(2 hunks)cpp/tensorrt_llm/nanobind/CMakeLists.txt
(2 hunks)cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/bindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h
(1 hunks)cpp/tensorrt_llm/nanobind/bindings.cpp
(2 hunks)cpp/tensorrt_llm/nanobind/common/bindTypes.h
(1 hunks)cpp/tensorrt_llm/nanobind/common/customCasters.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/bindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executor.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executor.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/executorConfig.h
(1 hunks)cpp/tensorrt_llm/nanobind/executor/request.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/executor/request.h
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/bindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/runtime/moeBindings.h
(1 hunks)cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h
(1 hunks)cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp
(1 hunks)cpp/tensorrt_llm/nanobind/userbuffers/bindings.h
(1 hunks)cpp/tensorrt_llm/pybind/bindings.cpp
(1 hunks)cpp/tensorrt_llm/pybind/executor/bindings.cpp
(1 hunks)examples/models/core/llama/summarize_long.py
(1 hunks)examples/models/core/qwen2audio/run.py
(1 hunks)examples/models/core/qwenvl/run.py
(1 hunks)jenkins/Build.groovy
(4 hunks)jenkins/L0_Test.groovy
(3 hunks)tensorrt_llm/builder.py
(1 hunks)tensorrt_llm/commands/build.py
(2 hunks)tensorrt_llm/runtime/model_runner.py
(1 hunks)tests/integration/test_lists/test-db/l0_a10.yml
(1 hunks)tests/unittest/bindings/test_bindings_ut.py
(4 hunks)
✅ Files skipped from review due to trivial changes (2)
- cpp/tensorrt_llm/nanobind/common/bindTypes.h
- cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h
🚧 Files skipped from review as they are similar to previous changes (35)
- examples/models/core/qwen2audio/run.py
- examples/models/core/qwenvl/run.py
- tensorrt_llm/runtime/model_runner.py
- examples/models/core/llama/summarize_long.py
- tensorrt_llm/commands/build.py
- cpp/tensorrt_llm/nanobind/runtime/moeBindings.h
- cpp/CMakeLists.txt
- cpp/tensorrt_llm/nanobind/executor/bindings.h
- cpp/tensorrt_llm/nanobind/executor/request.h
- cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h
- cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h
- tests/integration/test_lists/test-db/l0_a10.yml
- cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h
- cpp/tensorrt_llm/nanobind/userbuffers/bindings.h
- cpp/tensorrt_llm/nanobind/batch_manager/bindings.h
- cpp/tensorrt_llm/nanobind/batch_manager/buffers.h
- cpp/tensorrt_llm/nanobind/runtime/bindings.h
- cpp/tensorrt_llm/pybind/executor/bindings.cpp
- cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h
- cpp/tensorrt_llm/pybind/bindings.cpp
- tensorrt_llm/builder.py
- tests/unittest/bindings/test_bindings_ut.py
- cpp/tensorrt_llm/nanobind/CMakeLists.txt
- cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp
- cpp/tensorrt_llm/nanobind/executor/executorConfig.h
- cpp/tensorrt_llm/nanobind/batch_manager/buffers.cpp
- cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp
- cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp
- jenkins/Build.groovy
- cpp/tensorrt_llm/nanobind/executor/executor.cpp
- cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp
- cpp/tensorrt_llm/nanobind/executor/executor.h
- cpp/tensorrt_llm/nanobind/executor/request.cpp
- jenkins/L0_Test.groovy
- cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
🧰 Additional context used
🧬 Code Graph Analysis (1)
cpp/tensorrt_llm/nanobind/executor/bindings.cpp (6)
cpp/tensorrt_llm/nanobind/executor/executor.h (1)
m
(123-123)cpp/tensorrt_llm/nanobind/executor/request.h (1)
initRequestBindings
(27-27)cpp/tensorrt_llm/nanobind/executor/request.cpp (2)
initRequestBindings
(53-933)initRequestBindings
(53-53)cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (2)
initConfigBindings
(46-614)initConfigBindings
(46-46)cpp/tensorrt_llm/nanobind/executor/bindings.h (1)
initBindings
(27-27)cpp/tensorrt_llm/nanobind/executor/executor.cpp (1)
initBindings
(200-239)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (25)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (6)
1-51
: LGTM! Well-structured includes and namespace setup.The includes are comprehensive and properly organized. The namespace aliases enhance readability throughout the file.
54-61
: LGTM! Clean helper function for optional tensor conversion.The helper function properly handles the optional conversion from PyTorch tensors to TensorRT-LLM tensors.
63-237
: LGTM! Comprehensive trampoline class for KV cache manager.The
PyKvCacheManager
trampoline class properly implements all 28 virtual methods from the base class, enabling Python subclassing. The use ofNB_OVERRIDE_PURE
is appropriate for pure virtual methods.
240-287
: LGTM! Well-implemented trampoline class for PEFT cache manager.The
PyBasePeftCacheManager
trampoline class correctly implements all 8 virtual methods, following the same pattern as the KV cache manager trampoline.
290-448
: LGTM! Comprehensive KV cache manager bindings with proper tensor handling.The bindings are well-structured and provide extensive functionality. The lambda functions for PyTorch tensor conversions are appropriate, and the parameter handling is correct.
Note the TODO comment at line 239 about deduplicating executor bindings for
KvCacheStats
- this could be addressed in a future refactor.
450-477
: LGTM! Well-designed PEFT cache manager bindings with proper GIL handling.The bindings are properly implemented with appropriate use of
nb::gil_scoped_release
for theensure_batch
method, which allows for better concurrency during potentially long-running operations.cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (4)
1-57
: LGTM! Comprehensive includes and proper namespace setup.The includes are well-organized and cover all necessary components for runtime bindings. The namespace aliases enhance code readability.
61-97
: LGTM! Well-implemented trampoline class for IGptDecoder.The
PyIGptDecoder
trampoline class properly implements all 5 virtual methods from the base interface, enabling Python subclassing of the decoder interface.
105-178
: LGTM! Comprehensive bindings for runtime configuration classes.The bindings for
TaskLayerModuleConfig
,BufferManager
,TllmRuntime
, andRequest
are well-structured with appropriate parameter handling and property exposure.
360-388
: LGTM! Well-structured early bindings initialization.The
initBindingsEarly
function properly initializesSpeculativeDecodingMode
with static factory methods and read-only properties, following good nanobind practices.cpp/tensorrt_llm/nanobind/bindings.cpp (4)
2-2
: LGTM! Copyright year updated appropriately.The copyright year has been updated to 2025, which is appropriate for this new nanobind implementation.
70-74
: LGTM! Clean helper function for sampling config creation.The
makeSamplingConfig
helper function provides a clean interface for creating sampling configurations from vectors.
177-182
: LGTM! Improved KVCacheType binding with static method.The use of
from_string
as a static method instead of a constructor is a good design choice that provides better control over string parsing and error handling.
470-497
: LGTM! Comprehensive initialization of all binding modules.The initialization sequence properly coordinates all the various binding modules (runtime, testing, batch manager, etc.) and provides appropriate NVLS functionality.
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (11)
1-49
: Well-structured nanobind binding file with proper includes and namespaces.The file follows good practices with comprehensive includes, proper licensing, and clear namespace organization. The nanobind and STL container includes are appropriate for the binding patterns used.
57-65
: CapacityScheduler binding looks correct with proper default arguments.The constructor binding with named arguments and sensible defaults is well-implemented. The call method binding correctly handles optional parameters.
67-75
: MicroBatchScheduler binding properly handles optional parameters.The use of
std::nullopt
for optional parameters and the comprehensive argument binding is appropriate for this scheduler class.
77-83
: PauseRequests binding correctly handles multiple optional cache managers.The binding properly exposes the pause functionality with appropriate optional parameters for different cache manager types.
85-89
: AssignReqSeqSlots binding is clean and straightforward.Simple binding with no constructor arguments and clear call method parameters.
91-95
: AllocateKvCache binding properly handles optional cross cache manager.The binding correctly exposes the KV cache allocation functionality with optional cross cache manager parameter.
97-112
: HandleContextLogits binding uses lambda for tensor view conversion.The lambda wrapper correctly converts
at::Tensor
totr::TorchView
usingtr::TorchView::of(logits)
. This is a proper approach for handling tensor type conversions in nanobind.
114-130
: HandleGenerationLogits binding correctly handles void return type.The lambda wrapper properly handles the void return type by not returning anything, and the tensor conversion is consistent with the context logits handler.
132-137
: MakeDecodingBatchInputOutput binding is straightforward and correct.Direct binding of the operator with appropriate optional parameters for runtime buffers.
139-144
: LogitsPostProcessor binding properly handles optional batched processor.The binding correctly exposes the logits post-processing functionality with optional batched processor parameter.
146-170
: CreateNewDecoderRequests binding uses structured bindings effectively.The lambda wrapper correctly handles the tuple return type by using structured bindings and converting the tensor result with
runtime::Torch::tensor(batchSlots)
. This is a good pattern for handling complex return types in nanobind.
PR_Github #12115 [ run ] completed with state |
Signed-off-by: Linda-Stadter <[email protected]>
Signed-off-by: Linda-Stadter <[email protected]>
Signed-off-by: Linda-Stadter <[email protected]>
Signed-off-by: Linda-Stadter <[email protected]>
This reverts commit 5bff317. Signed-off-by: Iman Tabrizian <[email protected]>
This reverts commit 5bff317. Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
This reverts commit b75e53a.
This reverts commit b75e53a. Signed-off-by: Linda-Stadter <[email protected]>
This reverts commit b75e53a. Signed-off-by: Linda-Stadter <[email protected]>
This reverts commit b75e53a. Signed-off-by: Linda-Stadter <[email protected]>
Signed-off-by: Linda-Stadter <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>
feat: nanobind bindings
This PR contains:
Description
This migration is not complete and still contains some errors that will be resolved in upcoming PRs.
I also enabled unit tests for the nanobind build in the CI. Since the bindings still contain some errors, there are some unit tests failing. I am skipping these tests for nanobind until the binding is complete. (95 passed, 28 skipped)
The migration from pybind11 to nanobind contains many files but is very repetitive. I believe it is therefore easiest to directly compare the difference between the pybind and nanobind version for alle the files, as they usually follow the same pattern. The difference is shown in this commit nanobind bindings.
I tried to list the main differences between pybind11 and nanobind for our code base here. They are sorted approx. by how often they occur.
PYBIND11_MODULE(..)
NB_MODULE(..)
PYBIND11_OVERRIDE_*(..)
NB_OVERRIDE_*(..)
reinterpret_borrow<T>(x)
borrow<T>(x)
reinterpret_steal<T>(x)
steal<T>(x)
.def_readwrite(..)
.def_rw(..)
.def_readonly(..)
.def_ro(..)
.def_property(..)
.def_prop_rw(..)
.def_property_readonly(..)
.def_prop_ro(..)
register_exception<T>
exception<T>
classh
class_
std::deque
std::deque
. Created a custom caster instead and reached out to owners for support ofstd:deque
.def(py::init([](int) { return MyType(...); }));
.def("__init__", [](MyType *t) { new (t) MyType(...); });
See documentationobject.cast<Type>()
nb::cast<Type>(object)
py::arg_v("param_name", default_value, "doc_string_description")
nb::arg("param_name") = default_value
Replaced numpy array class
py::array
withnb::ndarray<>
. See documentationThis class does not support nonstandard arithmetic types. I overloaded
nanobind::detail::dtype_traits
to support datatypehalf
. See documentationPYBIND11_OVERRIDE_*(..)
with base type and return value argumentsNB_OVERRIDE_*()
without these arguments. The class also requires oneNB_TRAMPOLINE(parent, size)
declaration. See documentationpy::pickle()
to bind__getstate__
and__setstate__
functions__getstate__
and__setstate__
methods. Changed__setstate__
construction to in-place. See documentationThere is a special case in set state function
executorConfigSetState
: The classExecutorConfig
allows to have dynamic attributes (attributes added in Python code). The set state function has to restore the C++ data as well as the attributes from Python. Nanobind then has to be informed inform the instance object is fully constructed via nb::inst_mark_ready()None arguments
Nanobind refuses conversions from
None
tonullptr
by default. In order to exceptNone
in an argument binding: Either call.none()
ofnb::arg
or specify annb::none()
default arguments. See documentationCustom type casters
See documentation
load()
from_python()
with different signaturecast()
from_cpp()
with different signatureat::Tensor
andstd::deque
For the exceptions
PeftTaskNotCachedException
andLoraCacheFullException
, I created custom exceptions withnb::exception<T>
. See documentationTest Coverage
I am running the same unit tests that we have for pybind in
tests/unittests/bindings
. However, since the migration is incomplete, I am skipping the failing tests for nanobind.Current state:
95 passed, 28 skipped
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message.See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/tests/infra failures.--skip-test
(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.--stage-list "A10-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.--add-multi-gpu-test
(OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.md
.kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test"
is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
New Features
Bug Fixes
from_string
methods.Tests
Style