Skip to content

Conversation

dkoshkin
Copy link
Contributor

@dkoshkin dkoshkin commented Sep 3, 2025

What problem does this PR solve?:
This PR enables Cilium's kube-proxy replacement feature automatically when clusters disable kube-proxy installation during upgrades.
The Cilium handler will apply the new configuration, wait for the DaemonSet to be rolled out and then delete the kube-proxy DaemonSet and its ConfigMap.

Which issue(s) this PR fixes:
Fixes #

How Has This Been Tested?:

Special notes for your reviewer:

Stacked on #1288

@dkoshkin dkoshkin force-pushed the dkoshkin/feat-cilium-kube-proxy-replacement branch 4 times, most recently from 9c4f374 to 18206bd Compare September 3, 2025 22:46
@dkoshkin dkoshkin force-pushed the dkoshkin/feat-cilium-kube-proxy-replacement branch 2 times, most recently from 0524c7c to 314c6a0 Compare September 3, 2025 23:58
@dkoshkin dkoshkin changed the base branch from main to dkoshkin/feat-cilium-kube-proxy-replacement-new-clusters September 3, 2025 23:58
@github-actions github-actions bot added feature and removed feature labels Sep 3, 2025
@dkoshkin dkoshkin force-pushed the dkoshkin/feat-cilium-kube-proxy-replacement-new-clusters branch 2 times, most recently from 742615f to 7d79c35 Compare September 4, 2025 01:14
@dkoshkin dkoshkin force-pushed the dkoshkin/feat-cilium-kube-proxy-replacement branch from 314c6a0 to c654ac0 Compare September 4, 2025 01:17
jimmidyson
jimmidyson previously approved these changes Sep 4, 2025
Copy link
Member

@jimmidyson jimmidyson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great!

@dkoshkin dkoshkin force-pushed the dkoshkin/feat-cilium-kube-proxy-replacement branch from c654ac0 to 9fa1a65 Compare September 4, 2025 15:37
Base automatically changed from dkoshkin/feat-cilium-kube-proxy-replacement-new-clusters to main September 4, 2025 16:32
@jimmidyson jimmidyson dismissed their stale review September 4, 2025 16:32

The base branch was changed.

@github-actions github-actions bot added feature and removed feature labels Sep 4, 2025
This value is only needed when kubeProxyReplacement is set.
Its also there to force restart the Cilium Pods.
@dkoshkin dkoshkin force-pushed the dkoshkin/feat-cilium-kube-proxy-replacement branch from 9fa1a65 to ba3979f Compare September 4, 2025 17:23
@github-actions github-actions bot removed the feature label Sep 4, 2025
Copy link
Contributor

@yanhua121 yanhua121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR is on stack of #1288, and I see in the PR #1288 (merged) the annotation "skip-kube-proxy" is not added to the file examples/capi-quick-start/nutanix-cluster-failuredomain-cilium-helm-addon.yaml, but added to the file examples/capi-quick-start/nutanix-cluster-cilium-helm-addon.yaml. Is this intentionally? Are these example files only used for e2e tests?

@dkoshkin
Copy link
Contributor Author

dkoshkin commented Sep 8, 2025

Are these example files only used for e2e tests?

Yes its mainly used in e2e tests and local dev testing, these are not published.

is not added to the file examples/capi-quick-start/nutanix-cluster-failuredomain-cilium-helm-addon.yaml, but added to the file examples/capi-quick-start/nutanix-cluster-cilium-helm-addon.yaml. Is this intentionally

Not really, I will have a follow up to fix it in a separate PR.

Copy link
Contributor

@yanhua121 yanhua121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have e2e test for upgrade? Can we add the e2e test for this PR changes when upgrading the cluster?

@dkoshkin
Copy link
Contributor Author

dkoshkin commented Sep 8, 2025

Do we have e2e test for upgrade? Can we add the e2e test for this PR changes when upgrading the cluster?

No we don't have upgrade tests here yet and its a bigger effort to do it. It won't be possible to add it as part of this, but hoping the integration tests are enough.

(I've also tested manually)

{{- with .ControlPlane }}
{{- range $key, $val := .metadata.annotations }}
{{- if eq $key "controlplane.cluster.x-k8s.io/skip-kube-proxy" }}
k8sServiceHost: auto
Copy link
Contributor

@dlipovetsky dlipovetsky Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe only when kubeProxyReplacement is true? I thought it was always safe.

Copy link
Contributor Author

@dkoshkin dkoshkin Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it back because unfortunately just setting kubeProxyReplacement does not trigger a Pod rollout.
In a follow up I will remove it if the rest of the config for IP preservation does result in new Pods.

Although kubeProxyReplacement is not really necessary with the current config without kube-proxy replacement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like this - it's too opaque as to why the necessary pods are being rolled out. We should have a better defined trigger for this.

Copy link
Contributor

@dlipovetsky dlipovetsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our plan is to represent kube-proxy replacement in the API. But this makes sense in the meantime.

@dkoshkin dkoshkin enabled auto-merge (squash) September 9, 2025 22:59
@dkoshkin dkoshkin merged commit 6fe03f8 into main Sep 9, 2025
25 checks passed
@dkoshkin dkoshkin deleted the dkoshkin/feat-cilium-kube-proxy-replacement branch September 9, 2025 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants