Collaborator

@radez radez commented Feb 20, 2025

@josecastillolema
Member

/test ?

openshift-ci bot commented Feb 20, 2025

@josecastillolema: The following commands are available to trigger required jobs:

/test deploy-5nodes
/test deploy-5nodes-dev
/test deploy-sno
/test deploy-sno-dev

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@josecastillolema
Member

/test deploy-sno

@akrzos
Member

akrzos commented Feb 24, 2025

/test deploy-5nodes

@josecastillolema
Member

/test deploy-5nodes

The test failed because of:

   * could not run steps: step deploy-5nodes failed: failed to create credentials: could not read source credential: secrets "perfscale-metal-bastion" not found

Let me take care of this tomorrow morning; we need to update the secrets.
P.S. Even with the updated secrets, we will still hit the route issue :/
cc @akrzos

@akrzos
Member

akrzos commented Feb 24, 2025

OK, I was also going to look into the route issue more this afternoon.

@josecastillolema
Member

Should be fixed when openshift/release#62015 merges

@akrzos
Member

akrzos commented Feb 25, 2025

/test deploy-5nodes

Member

@akrzos akrzos left a comment

I'm wondering if we could put some sort of "limit" on the initial deployment of a cluster. Say someone is given a 200-node allocation and runs create-inventory: they would end up with 196 workers in the worker section. We could make mno-deploy deploy only, say, 120 of those workers, and you'd have to use the mno-scale-out.yml playbook to raise the worker count above that initial threshold. Do you think that is worth implementing in mno-deploy?
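
A rough sketch of the threshold idea, written in Python purely for illustration (Jetlag itself is Ansible, and nothing below exists in the repo). The function `split_workers`, the parameter `initial_worker_limit`, and the `worker001`-style hostnames are all hypothetical. It only shows how the inventory's worker list could be cut into the part mno-deploy installs and the remainder left for mno-scale-out.yml:

```python
def split_workers(workers, initial_worker_limit):
    """Split inventory workers into an initial-deploy batch and a scale-out batch.

    Both the function and `initial_worker_limit` are hypothetical names,
    not existing Jetlag variables.
    """
    if initial_worker_limit < 0:
        raise ValueError("initial_worker_limit must be non-negative")
    initial = workers[:initial_worker_limit]
    remaining = workers[initial_worker_limit:]
    return initial, remaining


# Numbers from the comment above: a 200-node allocation yields 196 workers,
# and only 120 of them are installed by the initial deployment.
workers = [f"worker{i:03d}" for i in range(1, 197)]  # placeholder hostnames
initial, remaining = split_workers(workers, 120)
print(len(initial), len(remaining))
```

With 196 workers and a limit of 120, this leaves 76 workers for the scale-out playbook. If the allocation is smaller than the limit, the second list is simply empty and no scale-out step is needed.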

@akrzos
Member

akrzos commented Feb 27, 2025

/test deploy-5nodes

openshift-ci bot commented Feb 27, 2025

@radez: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name              Commit   Details  Required  Rerun command
ci/prow/deploy-sno     7bb5d2b  link     true      /test deploy-sno
ci/prow/deploy-5nodes  7bb5d2b  link     true      /test deploy-5nodes

@akrzos
Member

akrzos commented Feb 27, 2025

This needs to be rebased to pick up the fix in #619 for CI to work

@radez
Collaborator Author

radez commented Mar 18, 2025

Turns out we do need all.yml: there's a config directory var that I use to hold the generated ISO.

- Add nodes to the worker inventory section and update the vars in scaleout.yml to add nodes to the existing cluster.
- https://docs.openshift.com/container-platform/4.17/nodes/nodes/nodes-nodes-adding-node-iso.html

@akrzos
Member

akrzos commented Mar 19, 2025

Was able to run an initial deployment with 3 nodes and then scale up to 6 nodes:

# oc get no
NAME               STATUS   ROLES                  AGE     VERSION
e38-h02-000-r650   Ready    control-plane,master   35m     v1.31.6
e38-h03-000-r650   Ready    control-plane,master   52m     v1.31.6
e38-h06-000-r650   Ready    control-plane,master   52m     v1.31.6
vm00001            Ready    worker                 38m     v1.31.6
vm00002            Ready    worker                 38m     v1.31.6
vm00003            Ready    worker                 38m     v1.31.6
vm00004            Ready    worker                 4m37s   v1.31.6
vm00005            Ready    worker                 4m39s   v1.31.6
vm00006            Ready    worker                 4m40s   v1.31.6

Basic process was:

  1. Deploy with worker_node_count: 0 and hybrid_worker_count: 3
  2. Rerun create-inventory playbook with hybrid_worker_count: 6
  3. Copy the sample scale up vars and edit ansible/vars/scale_out.yml
  4. Run mno-scale-out.yml

The playbook ran for 10m 28s for the 3-node scale-up.
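
For anyone wanting to check the result of a scale-out automatically, here is a minimal Python sketch that counts Ready workers in `oc get no` text output and compares the count with the expected `hybrid_worker_count`. The helper `count_ready_workers` is hypothetical (not part of Jetlag), and it assumes the default column layout shown above:

```python
def count_ready_workers(oc_output):
    """Count nodes whose STATUS is exactly 'Ready' and whose ROLES include 'worker'.

    Hypothetical helper; assumes the default `oc get no` columns
    NAME, STATUS, ROLES, AGE, VERSION.
    """
    count = 0
    for line in oc_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full node row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        status, roles = fields[1], fields[2].split(",")
        # Nodes reported as e.g. "Ready,SchedulingDisabled" are not counted.
        if status == "Ready" and "worker" in roles:
            count += 1
    return count


# Node listing copied from the comment above.
SAMPLE = """\
NAME               STATUS   ROLES                  AGE     VERSION
e38-h02-000-r650   Ready    control-plane,master   35m     v1.31.6
e38-h03-000-r650   Ready    control-plane,master   52m     v1.31.6
e38-h06-000-r650   Ready    control-plane,master   52m     v1.31.6
vm00001            Ready    worker                 38m     v1.31.6
vm00002            Ready    worker                 38m     v1.31.6
vm00003            Ready    worker                 38m     v1.31.6
vm00004            Ready    worker                 4m37s   v1.31.6
vm00005            Ready    worker                 4m39s   v1.31.6
vm00006            Ready    worker                 4m40s   v1.31.6
"""

expected_workers = 6  # hybrid_worker_count after rerunning create-inventory
ready = count_ready_workers(SAMPLE)
print(f"{ready}/{expected_workers} workers Ready")
```

For the listing above this reports all 6 workers as Ready; the three control-plane nodes are ignored because their roles do not include `worker`.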

Member

@akrzos akrzos left a comment

Was able to scale the cluster to 156 nodes, works well!!

@openshift-ci openshift-ci bot added the lgtm label Mar 19, 2025

openshift-ci bot commented Mar 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akrzos

@openshift-merge-bot openshift-merge-bot bot merged commit 938cef9 into redhat-performance:main Mar 19, 2025
1 check passed
@josecastillolema
Member

I think it would be nice to have a Prow test for this feature in the Jetlag CI:
it could deploy a 3+1 cluster and then scale out by one node.
