Conversation

bbartels
Contributor

@bbartels bbartels commented Sep 12, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@bbartels bbartels requested a review from hmellor as a code owner September 12, 2025 15:09
@mergify mergify bot added documentation Improvements or additions to documentation ci/build labels Sep 12, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the installation of the nixl dependency by removing the custom source compilation script (install_nixl.sh) and adding nixl as a PyPI dependency in requirements/common.txt. This simplifies the build process. However, I have a critical concern regarding whether the nixl pip package can fully replace the complex installation logic from the removed script, which included handling dependencies like UCX and gdrcopy with hardware-specific optimizations. My review includes a detailed comment on this potential issue.
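
For context, the change under discussion boils down to a single requirements entry roughly like the following; the exact version pin is an assumption, with nixl>=0.5.1 being the constraint mentioned later in this review.

# requirements/common.txt (sketch; the exact pin in the PR may differ)
nixl >= 0.5.1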

@cjackal
Contributor

cjackal commented Sep 12, 2025

I think Gemini is right in the sense that install_nixl.sh installs libgdrapi, which is a library nixl needs in order to use gdrcopy. We may need to bring the gdrcopy install step back from install_nixl.sh.

@bbartels
Contributor Author

@cjackal Do you think it would suffice if we just install the prebuilt .debs? https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.8/ubuntu22_04/x64/
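
For context, pulling the prebuilt gdrcopy packages from that listing would look roughly like the sketch below; the exact package filename and version are assumptions and have to be taken from the directory listing itself.

# Hypothetical sketch: install the prebuilt libgdrapi runtime from the NVIDIA redist listing above.
# The filename/version is a placeholder; check the listing for the actual artifact.
GDRCOPY_REPO="https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.8/ubuntu22_04/x64"
wget "${GDRCOPY_REPO}/libgdrapi_<version>_amd64.Ubuntu22_04.deb"
apt-get install -y ./libgdrapi_*_amd64.Ubuntu22_04.deb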


@cjackal
Contributor

cjackal commented Sep 12, 2025

@cjackal Do you think it would suffice if we just install the prebuilt .debs? https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.8/ubuntu22_04/x64/

I hadn't tested it, but it was likely to work since gdrcopy is not very sensitive to release versions. Yes, it works: I have now tested the nixl P/D tutorial code.

But in terms of maintenance cost, installing a pre-compiled .deb package that depends on a specific CUDA version is not ideal, I think. Say vLLM HEAD uses CUDA runtime 12.9 by default and 12.6 as an optional variant in CI, but the pre-compiled gdrcopy packages in the NVIDIA compute repo have neither a CUDA 12.9 nor a CUDA 12.6 build. A viewpoint from a vLLM maintainer may be needed, cc @DarkLight1337

@DarkLight1337
Member

cc @youkaichao @simon-mo

@pytorch-bot pytorch-bot bot removed the ci/build label Sep 15, 2025
@mergify mergify bot added the ci/build label Sep 15, 2025

pytorch-bot bot commented Sep 15, 2025

No ciflow labels are configured for this repo.
For information on how to enable CIFlow bot see this wiki

@hmellor hmellor requested a review from NickLucche September 15, 2025 08:30
Collaborator

@NickLucche NickLucche left a comment


LGTM, only left a few minor comments.

@@ -105,6 +105,7 @@ RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \
&& curl -sS ${GET_PIP_URL} | python${PYTHON_VERSION} \
&& python3 --version && python3 -m pip --version


Collaborator


nit: is it clearer with the extra lines?

@@ -191,11 +190,9 @@ For production deployments requiring strict SLA guarantees for time-to-first-tok

### Setup Steps

1. **Install KV Connector**: Install NIXL using the [installation script](gh-file:tools/install_nixl.sh)
Collaborator


do we mention anywhere else that one should install the kv_connectors.txt requirements?

Contributor Author


Nope, I'll add that :)

Contributor Author


Done :)
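
For reference, the documented step presumably amounts to installing the connector requirements on top of vLLM, roughly like this; the requirements/ path is an assumption based on the layout of common.txt mentioned above.

# Hypothetical sketch: pull in the optional KV-connector dependencies (nixl, lmcache, ...).
uv pip install -r requirements/kv_connectors.txt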

Member


Perhaps we can replace kv_connectors.txt entirely by including these as an extra in setup.py?

setup(
    ...
    extras_require={
        ...
        "kv-connector": ["lmcache", "nixl>=0.5.1"],
        ...
    },
)

Then you just install vLLM using uv pip install vllm[kv-connector] or uv pip install -e .[kv-connector]?

Contributor Author


Sure, can do. The only issue is that you still need to manually install libgdrapi.

Collaborator

@NickLucche NickLucche Sep 15, 2025


I am expecting this list to grow, but that is an option.
Also, we may need to split these into separate extras: e.g., it will be common to avoid nixl when using nccl, but still want lmcache as an additional connector for offloading.
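
Illustratively, with split extras a user could opt into only the connector they need; the extra names below are hypothetical.

# Hypothetical extras, assuming the split described above:
uv pip install "vllm[nixl]"      # NIXL-based P/D transfer only
uv pip install "vllm[lmcache]"   # lmcache offloading connector only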

Member


For now let's stick with the txt files, as that's what we have for everything else. In the future it would be nice to use the extras_require feature properly for all cases like this.


@mergify mergify bot added multi-modality Related to multi-modality (#4194) new-model Requests to new models performance Performance-related issues qwen Related to Qwen models speculative-decoding v1 tpu Related to Google TPUs labels Sep 15, 2025
@hmellor
Member

hmellor commented Sep 15, 2025

Looks like the git history is broken and everyone got pinged. @bbartels could you please make a new PR?

@hmellor hmellor closed this Sep 15, 2025
@bbartels
Contributor Author

Yep, sorry, I tried to do the sign-off and royally failed.
