Remove source compilation of nixl dependency
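
With this change NIXL is no longer built by `tools/install_nixl.sh`; it is pulled in through the `nixl >= 0.5.1` pin in `requirements/common.txt`. The snippet below is a minimal sketch, not part of this commit, of how an environment could be checked against that pin. The helper names `parse_release`, `meets_minimum`, and `check_distribution` are hypothetical, and the version comparison is deliberately simplified: it looks only at the leading numeric release components and ignores pre-release or local suffixes.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components of a version string.

    Simplified on purpose: "0.5.1rc1" becomes (0, 5, 1); suffixes are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding both to the same length with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_distribution(name: str, minimum: str) -> Optional[bool]:
    """Return None if the distribution is absent, else whether it meets the minimum."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    # "nixl" and "0.5.1" are taken from the requirements pin in this change.
    print(check_distribution("nixl", "0.5.1"))
```

A result of `None` means the package is not installed in the current environment, which is distinct from an installed but too-old version (`False`).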

bbartels · bbartels · commit b5a317eaaacb · 2025-09-12T15:09:24.000Z
diff --git a/docker/Dockerfile b/docker/Dockerfile
@@ -441,11 +441,9 @@ RUN --mount=type=cache,target=/root/.cache/uv \
 
 # Install EP kernels(pplx-kernels and DeepEP), NixL
 COPY tools/ep_kernels/install_python_libraries.sh install_python_libraries.sh
-COPY tools/install_nixl.sh install_nixl.sh
 ENV CUDA_HOME=/usr/local/cuda
 RUN export TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST:-9.0a+PTX}" \
-    && bash install_python_libraries.sh \
-    && bash install_nixl.sh --force
+    && bash install_python_libraries.sh
 
 #################### vLLM installation IMAGE ####################
 
diff --git a/docs/serving/expert_parallel_deployment.md b/docs/serving/expert_parallel_deployment.md
@@ -10,7 +10,6 @@ Before using EP, you need to install the necessary dependencies. We are actively
 
 1. **Install DeepEP and pplx-kernels**: Set up host environment following vLLM's guide for EP kernels [here](gh-file:tools/ep_kernels).
 2. **Install DeepGEMM library**: Follow the [official instructions](https://github.com/deepseek-ai/DeepGEMM#installation).
-3. **For disaggregated serving**: Install UCX and NIXL following the [script](gh-file:tools/install_nixl.sh).
 
 ### Backend Selection Guide
 
@@ -191,11 +190,9 @@ For production deployments requiring strict SLA guarantees for time-to-first-tok
 
 ### Setup Steps
 
-1. **Install KV Connector**: Install NIXL using the [installation script](gh-file:tools/install_nixl.sh)
+1. **Configure Both Instances**: Add this flag to both prefill and decode instances: `--kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}`
 
-2. **Configure Both Instances**: Add this flag to both prefill and decode instances `--kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}`
-
-3. **Client Orchestration**: Use the client-side script below to coordinate prefill/decode operations. We are actively working on routing solutions.
+2. **Client Orchestration**: Use the client-side script below to coordinate prefill/decode operations. We are actively working on routing solutions.
 
 ### Client Orchestration Example
 
diff --git a/requirements/common.txt b/requirements/common.txt
@@ -49,3 +49,4 @@ pybase64 # fast base64 implementation
 cbor2 # Required for cross-language serialization of hashable objects
 setproctitle # Used to set process names for better debugging and monitoring
 openai-harmony >= 0.0.3  # Required for gpt-oss
+nixl >= 0.5.1 # Required for disaggregated prefill
diff --git a/tools/install_nixl.sh b/tools/install_nixl.sh