-
-
Notifications
You must be signed in to change notification settings - Fork 10.2k
Remove source compilation of nixl dependency #24749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the installation of the nixl
dependency by removing the custom source compilation script (install_nixl.sh
) and adding nixl
as a PyPI dependency in requirements/common.txt
. This simplifies the build process. However, I have a critical concern regarding whether the nixl
pip package can fully replace the complex installation logic from the removed script, which included handling dependencies like UCX
and gdrcopy
with hardware-specific optimizations. My review includes a detailed comment on this potential issue.
I think Gemini is right in the sense that |
@cjackal Do you think it suffice if we are just installing prebuild .debs? https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.8/ubuntu22_04/x64/ |
pulling in precombiled |
But in terms of maintenance cost installing a pre-compiled deb package which depends on a specific CUDA version is not so favorable I think. Say vllm HEAD is using CUDA runtime 12.9 by default and 12.6 as optional in CI, but pre-compiled gdrcopy packages in nvidia compute repo has neither CUDA 12.9 nor 12.6 variants. Maybe a viewpoint from vllm maintainer is needed, cc @DarkLight1337 |
No ciflow labels are configured for this repo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only left a few minor comments.
docker/Dockerfile
Outdated
@@ -105,6 +105,7 @@ RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \ | |||
&& curl -sS ${GET_PIP_URL} | python${PYTHON_VERSION} \ | |||
&& python3 --version && python3 -m pip --version | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it clearer with the extra lines?
@@ -191,11 +190,9 @@ For production deployments requiring strict SLA guarantees for time-to-first-tok | |||
|
|||
### Setup Steps | |||
|
|||
1. **Install KV Connector**: Install NIXL using the [installation script](gh-file:tools/install_nixl.sh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we mention one should install kv_connectors.txt requirements somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I'll add that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can replace kf_connectors.txt
entirely be including these as an extra in setup.py
?
setup(
...
extras_require={
...
"kv-connector": ["lmcache", "nixl>=0.5.1"],
...
)
Then you just install vLLM using uv pip install vllm[kv-connector]
or uv pip install -e .[kv-connector]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure can do, the only issue is that you still need to do the manual install of libgdrapi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am expecting this list to grow but that is an option.
Also, we may need to separate those into separate extras eg it will be common to avoid nixl if using nccl, but still want lmcache as additional connector for offloading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now let's stick with txt files as that's what we have for everything. In future it would be nice to use the extras_requires
feature properly for all cases like this
No ciflow labels are configured for this repo. |
No ciflow labels are configured for this repo. |
No ciflow labels are configured for this repo. |
No ciflow labels are configured for this repo. |
No ciflow labels are configured for this repo. |
Looks like the git history is broken and everyone got pinged. @bbartels could you please make a new PR? |
Yep, sorry tried to do the signoff and royally failed |
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.