Contributor

@rouke-broersma rouke-broersma commented Aug 18, 2025

Improves custom action result normalization so that it also checks the apiVersion group. This reduces the chance that a Kind with the same name but a different API group is accidentally normalized in the resource action tests.
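A minimal sketch of the group check described above, written in Python for illustration only and not taken from the code in this PR. The function names `api_group` and `should_normalize` are hypothetical.

```python
def api_group(api_version: str) -> str:
    """Return the group part of an apiVersion, or "" for the core group ("v1")."""
    group, sep, _version = api_version.partition("/")
    return group if sep else ""


def should_normalize(obj: dict, group: str, kind: str) -> bool:
    """Normalize only when both the API group and the Kind match."""
    return api_group(obj.get("apiVersion", "")) == group and obj.get("kind") == kind
```

With this check, a Cluster from another API group is left alone, while a postgresql.cnpg.io Cluster is normalized.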

Adds CloudNativePG-specific actions that trigger the operator to execute operational tasks that should not be governed by GitOps. The following actions are added:

  • Reload - Instructs the CloudNativePG operator to check that all Cluster child resources are still up to date
  • Restart - Effectively kubectl rollout restart, but for the Cluster CRD
  • Promote - Starts promoting one of the healthy standby replicas to primary. The target can be given in one of three ways: the full replica pod name, the replica instance number, or any, which automatically selects the next available instance number
  • Suspend - Suspends resource reconciliation by the operator
  • Resume - Resumes resource reconciliation by the operator
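The three ways of choosing a promotion target can be sketched as below. This is a Python illustration under stated assumptions, not the action code from this PR: `resolve_promotion_target` is a hypothetical helper, pods are assumed to be named `<cluster>-<number>` as in the status examples of this description, and "any" is assumed to mean the healthy replica with the lowest instance number.

```python
def resolve_promotion_target(cluster_name, healthy, current_primary, instance):
    """Map a user-supplied instance choice to the name of the pod to promote."""
    # Only healthy instances other than the current primary can be promoted.
    candidates = [name for name in healthy if name != current_primary]
    if not candidates:
        raise ValueError("no healthy replica available to promote")
    if instance == "any":
        # Assumption: choose the candidate with the lowest instance number.
        return min(candidates, key=lambda name: int(name.rsplit("-", 1)[1]))
    # A bare number is expanded to a pod name; anything else is taken as a pod name.
    target = f"{cluster_name}-{instance}" if instance.isdigit() else instance
    if target not in candidates:
        raise ValueError(f"{target} is not a healthy replica")
    return target
```

For a cluster named test-cluster with healthy instances test-cluster-1 (primary) and test-cluster-2, the inputs "any", "2" and "test-cluster-2" all resolve to test-cluster-2.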

The CloudNativePG health check is extended to detect suspended reconciliation.
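A rough sketch of how such a check could look, in Python for illustration. It assumes that suspension is expressed through the `cnpg.io/reconciliationLoop` annotation with the value `disabled`; that annotation name and value are assumptions to verify against the CloudNativePG documentation, and the actual health check in this PR may detect suspension differently. The function names are hypothetical.

```python
# Assumption: annotation and value used to disable the operator's reconciliation loop.
RECONCILIATION_ANNOTATION = "cnpg.io/reconciliationLoop"
DISABLED_VALUE = "disabled"


def is_reconciliation_suspended(cluster: dict) -> bool:
    """Return True when the Cluster carries the 'reconciliation disabled' annotation."""
    annotations = (cluster.get("metadata") or {}).get("annotations") or {}
    return annotations.get(RECONCILIATION_ANNOTATION) == DISABLED_VALUE


def suspension_health(cluster: dict):
    """Return a Suspended health result, or None to fall through to other checks."""
    if is_reconciliation_suspended(cluster):
        return {"status": "Suspended", "message": "Cluster reconciliation is suspended"}
    return None
```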

The promotion action is disabled if no healthy replicas are available to promote:

image

The promotion action is enabled if there are healthy replicas:

image
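The enable/disable rule for the promotion action can be sketched as follows. This is a Python illustration, not the discovery code from this PR; `promote_enabled` is a hypothetical name, and the field names follow the Cluster status shown in this description.

```python
def promote_enabled(status: dict) -> bool:
    """Enable promotion only if a healthy instance other than the primary exists."""
    healthy = (status.get("instancesStatus") or {}).get("healthy") or []
    primary = status.get("currentPrimary")
    return any(name != primary for name in healthy)
```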

Status before any promotion:

targetPrimary: test-cluster-1
targetPrimaryTimestamp: "2025-08-18T16:34:03Z"
currentPrimary: test-cluster-1
currentPrimaryTimestamp: "2025-08-18T16:34:08.311908Z"
instancesStatus:
  healthy:
    - test-cluster-1
    - test-cluster-2

Status during promotion:

targetPrimary: test-cluster-2
targetPrimaryTimestamp: "2025-08-18T16:41:38Z"
currentPrimary: test-cluster-1
currentPrimaryTimestamp: "2025-08-18T16:34:08.311908Z"
instancesStatus:
  healthy:
    - test-cluster-2
  replicating:
    - test-cluster-1

Status after promotion:

targetPrimary: test-cluster-2
targetPrimaryTimestamp: "2025-08-18T16:41:38Z"
currentPrimary: test-cluster-2
currentPrimaryTimestamp: "2025-08-18T16:41:42.441994Z"
instancesStatus:
  healthy:
    - test-cluster-1
    - test-cluster-2
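The three status snapshots above differ mainly in whether targetPrimary and currentPrimary agree. A small Python sketch of that observation follows; `promotion_in_progress` is a hypothetical helper, not something defined by this PR or by the operator.

```python
def promotion_in_progress(status: dict) -> bool:
    """A promotion is under way while the target primary differs from the current one."""
    return status.get("targetPrimary") != status.get("currentPrimary")
```

Applied to the snapshots above, this is false before the promotion, true during it, and false again once test-cluster-2 has become the current primary.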

The reload action triggers the operator to reconcile the cluster resources:

{"level":"info","ts":"2025-08-18T17:16:06.310914875Z","logger":"cluster-resource","msg":"Defaulting for Cluster","version":"v1","name":"test-cluster","namespace":"test"}
{"level":"info","ts":"2025-08-18T17:16:06.346904494Z","logger":"cluster-resource","msg":"Validation for Cluster upon update","version":"v1","name":"test-cluster","namespace":"test"}
{"level":"info","ts":"2025-08-18T17:17:03.739202478Z","logger":"cluster-resource","msg":"Defaulting for Cluster","version":"v1","name":"test-cluster","namespace":"test"}
{"level":"info","ts":"2025-08-18T17:17:03.769107314Z","logger":"cluster-resource","msg":"Validation for Cluster upon update","version":"v1","name":"test-cluster","namespace":"test"}
{"level":"info","ts":"2025-08-18T17:17:14.529150851Z","logger":"cluster-resource","msg":"Defaulting for Cluster","version":"v1","name":"test-cluster","namespace":"test"}
{"level":"info","ts":"2025-08-18T17:17:14.564404404Z","logger":"cluster-resource","msg":"Validation for Cluster upon update","version":"v1","name":"test-cluster","namespace":"test"}
{"level":"info","ts":"2025-08-18T17:17:19.913762103Z","logger":"cluster-resource","msg":"Defaulting for Cluster","version":"v1","name":"test-cluster","namespace":"test"}
{"level":"info","ts":"2025-08-18T17:17:19.945209921Z","logger":"cluster-resource","msg":"Validation for Cluster upon update","version":"v1","name":"test-cluster","namespace":"test"}

And the restart action triggers a rolling restart without a primary failover:

{"level":"info","ts":"2025-08-18T17:18:52.646611677Z","msg":"Pod rollout required","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"bc77ac22-9ae4-40ef-bef9-19110d3c64da","podName":"test-cluster-1","reason":"cluster has been explicitly restarted via annotation"}
{"level":"info","ts":"2025-08-18T17:18:52.667735942Z","msg":"Cluster has become unhealthy","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"bc77ac22-9ae4-40ef-bef9-19110d3c64da"}
{"level":"info","ts":"2025-08-18T17:18:52.667806284Z","msg":"Recreating instance pod","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"bc77ac22-9ae4-40ef-bef9-19110d3c64da","pod":"test-cluster-1","to":"ghcr.io/cloudnative-pg/postgresql:17.5","reason":"Restarting instance test-cluster-1, because: cluster has been explicitly restarted via annotation"}
{"level":"info","ts":"2025-08-18T17:18:54.254567272Z","msg":"Creating new Pod to reattach a PVC","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"f51265aa-3a73-402b-968f-0e8c8bc58648","pod":"test-cluster-1","pvc":"test-cluster-1"}
{"level":"info","ts":"2025-08-18T17:18:59.843779751Z","msg":"Setting replica label","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"fa372546-a72a-49ae-a5cb-aaed83e78097","pod":"test-cluster-1"}
{"level":"info","ts":"2025-08-18T17:19:07.404906306Z","msg":"Pod rollout required","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"119e45d3-1c8e-4fab-bd63-95b48bc54760","podName":"test-cluster-2","reason":"cluster has been explicitly restarted via annotation"}
{"level":"info","ts":"2025-08-18T17:19:07.404979398Z","msg":"Restarting primary instance without a switchover first","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"119e45d3-1c8e-4fab-bd63-95b48bc54760","primaryPod":"test-cluster-2","reason":"cluster has been explicitly restarted via annotation"}
{"level":"info","ts":"2025-08-18T17:19:07.426877123Z","msg":"Recreating instance pod","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"119e45d3-1c8e-4fab-bd63-95b48bc54760","pod":"test-cluster-2","to":"ghcr.io/cloudnative-pg/postgresql:17.5","reason":"cluster has been explicitly restarted via annotation"}
{"level":"info","ts":"2025-08-18T17:19:16.746789739Z","msg":"Setting primary label","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"a98b9a04-46e3-4c3f-b526-ba1725633422","pod":"test-cluster-2"}
{"level":"info","ts":"2025-08-18T17:19:16.786118784Z","msg":"Waiting for the Kubelet to refresh the readiness probe","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"a98b9a04-46e3-4c3f-b526-ba1725633422","mostAdvancedInstanceName":"test-cluster-2","hasHTTPStatus":true,"isPodReady":false}
{"level":"info","ts":"2025-08-18T17:19:23.475217292Z","msg":"All instances ready, will proceed","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"b2f95c7a-0da3-4992-87ad-ce777aa666fc","currentPrimary":"test-cluster-2","targetPrimary":"test-cluster-2"}
{"level":"info","ts":"2025-08-18T17:19:23.511452375Z","msg":"Cluster has become healthy","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"test-cluster","namespace":"test"},"namespace":"test","name":"test-cluster","reconcileID":"b2f95c7a-0da3-4992-87ad-ce777aa666fc"}
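To read the restart order out of operator logs like the ones above, a short Python sketch can filter the JSON lines for the "Recreating instance pod" message. The helper name `recreated_pods` is hypothetical; the message text and the `pod` field are taken from the log lines in this description.

```python
import json


def recreated_pods(log_lines):
    """Return pod names in the order the operator recreated them."""
    pods = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between log entries
        entry = json.loads(line)
        if entry.get("msg") == "Recreating instance pod":
            pods.append(entry["pod"])
    return pods
```

For the log above, this yields test-cluster-1 followed by test-cluster-2, matching a rolling restart that recreates the replica before the primary.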

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the "Title of the PR" guidelines.
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

Signed-off-by: Rouke Broersma <[email protected]>

bunnyshell bot commented Aug 18, 2025

🔴 Preview Environment stopped on Bunnyshell

Signed-off-by: Rouke Broersma <[email protected]>

codecov bot commented Aug 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (master@1268dd9).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #24192   +/-   ##
=========================================
  Coverage          ?   60.33%           
=========================================
  Files             ?      350           
  Lines             ?    60035           
  Branches          ?        0           
=========================================
  Hits              ?    36223           
  Misses            ?    20904           
  Partials          ?     2908           


@rouke-broersma rouke-broersma marked this pull request as ready for review August 19, 2025 11:41
@rouke-broersma rouke-broersma requested review from a team as code owners August 19, 2025 11:41
Signed-off-by: Rouke Broersma <[email protected]>
Signed-off-by: Rouke Broersma <[email protected]>
Signed-off-by: Rouke Broersma <[email protected]>
@rouke-broersma rouke-broersma changed the title feat(actions): Add cloudnativepg reload, restart and promote actions feat(actions): Add cloudnativepg reload, restart, promote, suspend and resume actions Sep 8, 2025