OCPEDGE-1916: fix: add exception for two node fencing #30218
@eggfoobar: This pull request references OCPEDGE-1916 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/37c2f790-8c72-11f0-889b-f227d4e7d021-0
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: eggfoobar The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5c464ce0-8cac-11f0-8150-467409da5b51-0
81364bb to 758a627
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5afce150-8cca-11f0-86cb-573ed793600f-0
Add a two node fencing exception to the etcd operator state transition check during upgrade. In two node fencing, the etcd operator goes unavailable while the two pods are updated and the etcd fencing job runs via pacemaker. This is expected behavior given the limitations of two node deployments.
Signed-off-by: ehila <[email protected]>
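The idea in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the PR's actual code: the `Transition` type, the `TWO_NODE_FENCING` label, and both function names are hypothetical, and the sketch assumes the check receives a flat list of operator condition transitions observed during the upgrade.

```python
from dataclasses import dataclass

# Hypothetical topology label; the real identifier is not given in this PR text.
TWO_NODE_FENCING = "two-node-fencing"


@dataclass(frozen=True)
class Transition:
    """One observed operator condition change (hypothetical shape)."""
    operator: str   # e.g. "etcd"
    condition: str  # e.g. "Available", "Degraded"
    status: str     # "True" or "False"


def is_expected(t: Transition, topology: str) -> bool:
    """Return True when the transition is tolerated during upgrade.

    Assumption: only etcd going Available=False on a two node fencing
    cluster is excused; everything else is still reported.
    """
    return (
        topology == TWO_NODE_FENCING
        and t.operator == "etcd"
        and t.condition == "Available"
        and t.status == "False"
    )


def unexpected_transitions(transitions, topology):
    """Keep only the transitions that should still fail the check."""
    return [t for t in transitions if not is_expected(t, topology)]
```

Keeping the exception as a narrow predicate means other operators, and etcd on other topologies, are still flagged as before.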
758a627 to 43e6915
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade
@eggfoobar: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c36908a0-8ce7-11f0-943f-b163f5f7e932-0
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f097ce90-8d0c-11f0-8fcc-7944b029ffea-0
Job Failure Risk Analysis for sha: 43e6915
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c83827d0-8d32-11f0-8cac-2c537bd949d8-0
After some issues with the Equinix account, the latest job run is successful, and we no longer see the etcd events trigger the test. This is good to go.
I want us to be more targeted with this exception. This should only happen during initial deployment. IIUC, the setup job shouldn't be re-run after initial deployment unless we're doing a control-plane node replacement.
Currently, we are too aggressive in TNF with regard to "unavailable". Etcd should only be unavailable from the start of the setup job to its completion. Other jobs should only affect the progressing status, with "degraded" being set if we give up.
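One way to read the rule proposed above is as a small mapping from job state to operator conditions. The sketch below is illustrative only and written in Python: the function name, the job kinds, and the state strings (`"running"`, `"succeeded"`, `"gave_up"`) are hypothetical, and it assumes that giving up on any job sets "degraded" without changing availability.

```python
def etcd_conditions(job: str, state: str) -> dict:
    """Map a TNF job and its state to operator conditions.

    job:   "setup" or any other job kind (hypothetical labels)
    state: "running", "succeeded", or "gave_up" (hypothetical labels)
    """
    running = state == "running"
    return {
        # Unavailable only while the setup job is in flight.
        "Available": not (job == "setup" and running),
        # Any running job counts as progressing.
        "Progressing": running,
        # Degraded is set only once we give up on a job.
        "Degraded": state == "gave_up",
    }
```

Under this reading, a non-setup job during an upgrade would show up as `Progressing` rather than as an availability drop.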
I can tighten up that catch. As it currently stands, there are 4 conditions this captures,
Closing in favor of openshift/cluster-etcd-operator#1481