Adding Scale Out functionality #613

radez · 2025-02-20T13:04:51Z

Add nodes to worker inventory section and update vars in scaleout.yml to add nodes to the existing cluster.
https://docs.openshift.com/container-platform/4.17/nodes/nodes/nodes-nodes-adding-node-iso.html

josecastillolema · 2025-02-20T15:24:05Z

/test ?

openshift-ci · 2025-02-20T15:24:09Z

@josecastillolema: The following commands are available to trigger required jobs:

/test deploy-5nodes

/test deploy-5nodes-dev

/test deploy-sno

/test deploy-sno-dev


josecastillolema · 2025-02-20T15:24:17Z

/test deploy-sno

akrzos · 2025-02-24T19:27:16Z

/test deploy-5nodes

josecastillolema · 2025-02-24T19:42:35Z

/test deploy-5nodes

The test failed because of:

   * could not run steps: step deploy-5nodes failed: failed to create credentials: could not read source credential: secrets "perfscale-metal-bastion" not found

Let me take care of this tomorrow morning; we need to update the secrets.
P.S. Even with the updated secrets, we will still hit the route issue :/
cc @akrzos

akrzos · 2025-02-24T19:47:25Z

Ok I was going to look into the route issue more this afternoon also.

josecastillolema · 2025-02-25T11:45:49Z

Should be fixed when openshift/release#62015 merges

akrzos · 2025-02-25T19:51:50Z

/test deploy-5nodes

akrzos

I'm wondering if we could include some sort of "limit" on the initial deployment of a cluster. Say someone was given a 200-node allocation and ran create inventory; they would have 196 workers in the worker section. I am thinking we could make mno-deploy deploy only, say, 120 of the worker nodes, and you'd have to use the mno-scale-out.yml playbook to increase the worker count above this initial threshold. Do you think it is worth implementing that in mno-deploy?
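To make the idea concrete, here is a minimal sketch of such a split, assuming the inventory's worker section can be treated as a flat list of names. The function `split_workers`, the `initial_limit` parameter, and the `worker-NNN` names are hypothetical and do not exist in the playbooks; the 196 and 120 figures are the ones from the example above.

```python
def split_workers(workers, initial_limit):
    """Split a worker list into an initial-deploy batch and a scale-out batch.

    Hypothetical helper for illustration only; not part of mno-deploy.
    """
    if initial_limit < 0:
        raise ValueError("initial_limit must be non-negative")
    # The first `initial_limit` workers go in with the initial deployment;
    # everything after that is left for a later scale-out run.
    return workers[:initial_limit], workers[initial_limit:]


# Hypothetical 196-worker inventory (placeholder names).
workers = [f"worker-{i:03d}" for i in range(1, 197)]
initial, deferred = split_workers(workers, initial_limit=120)
print(len(initial), len(deferred))  # 120 76
```

With a limit larger than the worker count, the second list is simply empty, so small allocations would deploy exactly as they do today.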

ansible/roles/boot-iso/tasks/main.yml

ansible/mno-scale-out.yml

ansible/roles/mno-scale-out-csr/tasks/check_nodes_joined.yml

akrzos · 2025-02-27T20:31:17Z

/test deploy-5nodes

openshift-ci · 2025-02-27T20:54:11Z

@radez: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/deploy-sno | `7bb5d2b` | link | true | `/test deploy-sno` |
| ci/prow/deploy-5nodes | `7bb5d2b` | link | true | `/test deploy-5nodes` |


akrzos · 2025-02-27T21:30:33Z

This needs to be rebased to pick up the fix in #619 for CI to work

radez · 2025-03-18T14:07:22Z

Turns out we do need all.yml; there's a config directory var that I used to hold the generated ISO in.


akrzos · 2025-03-19T19:54:15Z

Was able to run an initial deployment with 3 nodes and then scale up to 6 nodes:

# oc get no
NAME               STATUS   ROLES                  AGE     VERSION
e38-h02-000-r650   Ready    control-plane,master   35m     v1.31.6
e38-h03-000-r650   Ready    control-plane,master   52m     v1.31.6
e38-h06-000-r650   Ready    control-plane,master   52m     v1.31.6
vm00001            Ready    worker                 38m     v1.31.6
vm00002            Ready    worker                 38m     v1.31.6
vm00003            Ready    worker                 38m     v1.31.6
vm00004            Ready    worker                 4m37s   v1.31.6
vm00005            Ready    worker                 4m39s   v1.31.6
vm00006            Ready    worker                 4m40s   v1.31.6
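A readiness check like the one above can also be scripted. The following is a minimal sketch, assuming the column layout of the `oc get no` output shown above; the helper `not_ready_nodes` is hypothetical and is not the logic in check_nodes_joined.yml.

```python
def not_ready_nodes(oc_output, expected):
    """Return the expected node names that are missing or not plainly Ready.

    Hypothetical helper: parses the text of `oc get no`, where the first
    column is the node name and the second is its status.
    """
    status = {}
    for line in oc_output.strip().splitlines():
        fields = line.split()
        # Skip a pasted shell prompt line, the header row, and short lines.
        if len(fields) < 2 or fields[0] in ("#", "NAME"):
            continue
        status[fields[0]] = fields[1]
    # A compound status such as "Ready,SchedulingDisabled" is reported too,
    # because only an exact "Ready" counts as joined here.
    return [name for name in expected if status.get(name) != "Ready"]


# Sample rows copied from the output in this thread.
SAMPLE = """
# oc get no
NAME               STATUS   ROLES                  AGE     VERSION
vm00003            Ready    worker                 38m     v1.31.6
vm00004            Ready    worker                 4m37s   v1.31.6
vm00005            Ready    worker                 4m39s   v1.31.6
vm00006            Ready    worker                 4m40s   v1.31.6
"""

new_workers = ["vm00004", "vm00005", "vm00006"]
missing = not_ready_nodes(SAMPLE, new_workers)
print(missing)  # []
```

An empty list means every newly added worker shows up as Ready; a node that has not joined yet would be returned by name.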

Basic process was:

1. Deploy with `worker_node_count: 0` and `hybrid_worker_count: 3`
2. Rerun create-inventory playbook with `hybrid_worker_count: 6`
3. Copy the sample scale up vars and edit ansible/vars/scale_out.yml
4. Run mno-scale-out.yml

The playbook ran for 10m 28s for the 3-node scale-up.

akrzos

Was able to scale the cluster to 156 nodes, works well!!

openshift-ci · 2025-03-19T21:17:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akrzos


Needs approval from an approver in each of these files:

~~OWNERS~~ [akrzos]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

josecastillolema · 2025-03-20T08:05:23Z

I think it would be nice to have a Prow test for this feature in the Jetlag CI.
It could deploy a 3+1 cluster and then scale out by one node.

openshift-ci bot requested review from akrzos and rsevilla87 February 20, 2025 13:04

openshift-ci bot added the approved label Feb 20, 2025

akrzos requested changes Feb 25, 2025

ansible/roles/boot-iso/tasks/main.yml (outdated)

ansible/mno-scale-out.yml

openshift-ci bot assigned akrzos Feb 25, 2025

akrzos reviewed Feb 25, 2025

ansible/roles/mno-scale-out-csr/tasks/check_nodes_joined.yml (outdated)

radez force-pushed the scale_out branch from 7bb5d2b to e9d7fbd March 4, 2025 13:30

radez requested a review from akrzos March 4, 2025 13:32

radez force-pushed the scale_out branch from e9d7fbd to 7a84dc5 March 18, 2025 13:57

openshift-ci bot removed the approved label Mar 18, 2025

Adding Scale Out functionality

0e84d5e

- Add nodes to worker inventory section and update vars in scaleout.yml to add nodes to the existing cluster.
- https://docs.openshift.com/container-platform/4.17/nodes/nodes/nodes-nodes-adding-node-iso.html

radez force-pushed the scale_out branch from 7a84dc5 to 0e84d5e March 18, 2025 15:25

akrzos approved these changes Mar 19, 2025

openshift-ci bot added the lgtm label Mar 19, 2025

openshift-ci bot added the approved label Mar 19, 2025

openshift-merge-bot bot merged commit 938cef9 into redhat-performance:main Mar 19, 2025
1 check passed

akrzos mentioned this pull request Jun 6, 2025

Research and develop method to scale up an existing cluster #449

Closed
