akrzos
Member

@akrzos akrzos commented Mar 11, 2025

No description provided.

openshift-ci bot commented Mar 11, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@akrzos
Member Author

akrzos commented Mar 11, 2025

/test deploy-5nodes

@akrzos
Member Author

akrzos commented Mar 11, 2025

/test deploy-5nodes

@josecastillolema
Member

The test failed because the timeout was too short. I suggest adjusting it to match the timeouts that we have in Prow.

@akrzos
Member Author

akrzos commented Mar 12, 2025

> The test failed because the timeout was too short. I suggest adjusting it to match the timeouts that we have in Prow.

Got it. I'll try those defaults here, with the goal of seeing how long it normally takes a cluster to become stable. I don't really like the idea of adding up to 20m to the length of the playbook, so I might default the action to false.

@akrzos
Member Author

akrzos commented Mar 12, 2025

/test deploy-5nodes

@josecastillolema
Member

> > The test failed because the timeout was too short. I suggest adjusting it to match the timeouts that we have in Prow.

> Got it. I'll try those defaults here, with the goal of seeing how long it normally takes a cluster to become stable. I don't really like the idea of adding up to 20m to the length of the playbook, so I might default the action to false.

Having a 20-minute timeout doesn't mean it will always add 20 minutes to the total time. If the cluster stays healthy for 2 minutes before then, the wait won't take the full timeout.
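A minimal sketch of that early-exit behavior, in Python. It is not the playbook's or Prow's implementation: the function name, its parameters, the 10-second poll interval, and the `FakeClock` helper are all hypothetical and exist only to illustrate the point that the timeout is an upper bound, not a fixed cost.

```python
import time
from typing import Callable, Optional, Tuple


def wait_until_stable(
    is_healthy: Callable[[], bool],
    min_stable_s: float = 120.0,  # assumed 2-minute stable window
    timeout_s: float = 1200.0,    # assumed 20-minute upper bound
    poll_s: float = 10.0,         # hypothetical poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, float]:
    """Poll until the cluster has been healthy for min_stable_s, or time out.

    Returns (stable, elapsed_seconds).
    """
    start = clock()
    healthy_since: Optional[float] = None
    while True:
        now = clock()
        if is_healthy():
            if healthy_since is None:
                healthy_since = now  # start of the current healthy streak
            if now - healthy_since >= min_stable_s:
                return True, now - start  # early exit, well before the timeout
        else:
            healthy_since = None  # any unhealthy poll resets the streak
        if now - start >= timeout_s:
            return False, now - start  # only an unstable cluster pays the full cost
        sleep(poll_s)


class FakeClock:
    """Simulated clock so the example runs instantly (illustration only)."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    fake = FakeClock()
    # Hypothetical cluster that becomes healthy 300 simulated seconds in.
    result = wait_until_stable(
        lambda: fake.now >= 300, clock=fake.time, sleep=fake.sleep
    )
    print(result)
```

In this simulation the cluster turns healthy at 300 seconds, so the wait returns after 420 seconds (300 plus the 2-minute stable window) instead of the full 1200. A cluster that never becomes healthy is the only case that runs to the timeout.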

@akrzos
Member Author

akrzos commented Mar 12, 2025

> > > The test failed because the timeout was too short. I suggest adjusting it to match the timeouts that we have in Prow.

> > Got it. I'll try those defaults here, with the goal of seeing how long it normally takes a cluster to become stable. I don't really like the idea of adding up to 20m to the length of the playbook, so I might default the action to false.

> Having a 20-minute timeout doesn't mean it will always add 20 minutes to the total time. If the cluster stays healthy for 2 minutes before then, the wait won't take the full timeout.

Understood, but in theory it could also take up to 20m just to tell you that the cluster is not stable. Some folks don't bother to play with something until they see the current playbook etc. run to completion. It could also make CI jobs run up to 20m longer just to tell you about an unstable cluster. Having run many tests on jetlag MNO clusters before, I actually wasn't always aware that some amount of cluster operator work was still occurring after the assisted installer reported that the cluster had completed installing.

@akrzos
Member Author

akrzos commented Mar 12, 2025

The most recent run added 7m 40s to an MNO deployment, but it should ensure that the cluster is stable for day-2 operations post deployment:

mno-post-cluster-install : Wait until cluster is stable --------------- 460.44s

@akrzos akrzos marked this pull request as ready for review March 12, 2025 14:48
@openshift-ci openshift-ci bot requested review from radez and rsevilla87 March 12, 2025 14:48
@akrzos
Member Author

akrzos commented Mar 12, 2025

/test deploy-sno

@akrzos
Member Author

akrzos commented Mar 13, 2025

/test deploy-5nodes

@akrzos
Member Author

akrzos commented Mar 13, 2025

On my own SNO, I observed a 5m 30s wait for the cluster to become stable.

sno-post-cluster-install : Wait until cluster is stable ------------------------------------------ 330.39s

@josecastillolema
Member

/lgtm

@akrzos
Member Author

akrzos commented Mar 13, 2025

/test deploy-sno

@akrzos
Member Author

akrzos commented Mar 13, 2025

/approve

openshift-ci bot commented Mar 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akrzos

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit ea12e85 into redhat-performance:main Mar 13, 2025
3 checks passed
@akrzos akrzos deleted the wait_to_stable branch March 13, 2025 19:16