Add NodePool STS Permissions Test for Day 2 Operations #3001
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: Amarthya-v. The full list of commands accepted by this bot can be found here.
@Amarthya-v Could you remove the trailing whitespace in the file? I use https://github.com/bitc/vim-bad-whitespace for this.
I'm wondering if "should create nodepool and provision EC2 instances with proper STS permissions" could be split up into smaller tests. For example, would it make sense to have a test just to create the nodepool and then a second to do the rest?
Could you add more tests? Also, can you test the bad paths as well as the happy paths?
Force-pushed from 2fc11fd to 3e9ef47
Force-pushed from 3e9ef47 to 5e1aef9
/test all
@Amarthya-v looks like you have some failing checks at the moment.
@Amarthya-v: The following tests failed.
Force-pushed from ad2b5db to f1274bb
pkg/e2e/verify/nodepool.go
Outdated
	ginkgo.GinkgoLogr.Error(err, "Failed to cleanup test NodePool", "name", testNodePoolName)
}

ginkgo.By("Cleaning up test pods")
Wouldn't deleting the nodepool also delete its pods?
Force-pushed from d9b4fe5 to 37a587c
pkg/e2e/verify/nodepool.go
Outdated
np, err := h.Dynamic().Resource(nodePoolGVR).Namespace(clusterNamespace).
	Get(ctx, testNodePoolName, metav1.GetOptions{})
if err != nil {
	return false, err
}
ready, _, _ := unstructured.NestedInt64(np.Object, "status", "readyReplicas")
Don't we want to check for an error here?
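For example, a sketch of the polling callback that surfaces both the lookup error and the not-found case (desiredReplicas here is only illustrative):

```go
ready, found, err := unstructured.NestedInt64(np.Object, "status", "readyReplicas")
if err != nil {
	return false, fmt.Errorf("reading status.readyReplicas: %w", err)
}
if !found {
	// Status not populated yet; keep polling.
	return false, nil
}
return ready >= desiredReplicas, nil
```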
Force-pushed from 37a587c to d0df745
Force-pushed from d0df745 to bc48a6d
pkg/e2e/verify/nodepool.go
Outdated
	return false
}

func failWithSTSError(err error, operation, permissions string) {
Why not combine this with isSTSError and call it something like failIfSTSError?
Done
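For reference, a minimal sketch of what the combined helper could look like; the pattern list and message text are placeholders rather than the PR's exact code:

```go
func failIfSTSError(err error, operation, permissions string) {
	if err == nil {
		return
	}
	msg := strings.ToLower(err.Error())
	// Placeholder patterns; the real list lives in the PR.
	patterns := []string{"accessdenied", "unauthorizedoperation", "is not authorized to perform"}
	for _, pattern := range patterns {
		if strings.Contains(msg, pattern) {
			ginkgo.Fail(fmt.Sprintf(
				"STS_PERMISSION_ERROR: %s failed due to missing AWS permissions (%s): %v",
				operation, permissions, err))
		}
	}
}
```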
pkg/e2e/verify/nodepool.go
Outdated
func getClusterNamespace(h *helper.H) (string, error) {
	if name := viper.GetString(config.Cluster.Name); name != "" {
		return "clusters-" + name, nil
You're using fmt.Sprintf in other functions but not here. It would probably be good to be consistent.
Fixed, now using fmt.Sprintf consistently.
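For example, roughly:

```go
func getClusterNamespace(h *helper.H) (string, error) {
	if name := viper.GetString(config.Cluster.Name); name != "" {
		return fmt.Sprintf("clusters-%s", name), nil
	}
	return "", fmt.Errorf("could not determine cluster namespace")
}
```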
pkg/e2e/verify/nodepool.go
Outdated
ginkgo.BeforeAll(func() {
	cloudProvider := viper.GetString(config.CloudProvider.CloudProviderID)
	if cloudProvider != "aws" {
How about using a constant here for the "aws" string and anywhere else you have "aws"? (like line 274)
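Something along these lines — the constant name and skip message are only suggestions:

```go
const awsProviderID = "aws"

// ...
cloudProvider := viper.GetString(config.CloudProvider.CloudProviderID)
if cloudProvider != awsProviderID {
	ginkgo.Skip("NodePool STS tests only run on AWS")
}
```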
pkg/e2e/verify/nodepool.go
Outdated
})

ginkgo.It("rejects operations on non-existent NodePools", func(ctx context.Context) {
	if clusterNamespace == "" || !canAccessNodePoolAPI(h, clusterNamespace) {
Once here, clusterNamespace should never be an empty string, should it? And, can the canAccessNodePoolAPI test be done before any tests are run?
Moved the API access check to BeforeAll and removed the redundant namespace checks.
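A rough sketch of that arrangement, reusing the helper names from the snippets above (the skip message is illustrative):

```go
ginkgo.BeforeAll(func() {
	var err error
	clusterNamespace, err = getClusterNamespace(h)
	expect.NoError(err, "failed to determine cluster namespace")
	if !canAccessNodePoolAPI(h, clusterNamespace) {
		ginkgo.Skip("NodePool API is not accessible, skipping NodePool STS tests")
	}
})
```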
pkg/e2e/verify/nodepool.go
Outdated
})

ginkgo.It("rejects duplicate NodePool creation", func(ctx context.Context) {
	if clusterNamespace == "" || !canAccessNodePoolAPI(h, clusterNamespace) {
Once here, clusterNamespace should never be an empty string, should it? And, can the canAccessNodePoolAPI test be done before any tests are run?
pkg/e2e/verify/nodepool.go
Outdated
})

ginkgo.AfterAll(func() {
	if createdNodePool && testNodePoolName != "" && clusterNamespace != "" {
Is this conditional necessary? Are any of the tests doing any node pool cleanup? And, if they are, should they be?
pkg/e2e/verify/nodepool.go
Outdated
})

ginkgo.It("creates new NodePool successfully", func(ctx context.Context) {
	if clusterNamespace == "" {
Once here, clusterNamespace should never be an empty string, should it? And, can the canAccessNodePoolAPI test be done before any tests are run?
Note, this isn't consistent with the others.
pkg/e2e/verify/nodepool.go
Outdated
}

expect.NoError(err, "NodePool creation failed")
createdNodePool = true
How about putting any test that is dependent upon createdNodePool being true in the same block? Otherwise, it's kind of kludgy.
Grouped the NodePool creation and provisioning tests in the same Describe block.
pkg/e2e/verify/nodepool.go
Outdated
expect.NoError(err, "Failed to find ready NodePool nodes") | ||
Expect(len(nodePoolNodes)).To(BeNumerically(">", 0), "No ready NodePool nodes available") | ||
|
||
targetNode := &nodePoolNodes[0] |
What if there are several nodepool nodes in this slice? Would it matter?
Added logic to prefer ready nodes instead of just taking the first
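A sketch of that preference logic; the isNodeReady helper is illustrative rather than the PR's exact code:

```go
targetNode := &nodePoolNodes[0]
for i := range nodePoolNodes {
	if isNodeReady(&nodePoolNodes[i]) {
		targetNode = &nodePoolNodes[i]
		break
	}
}

// isNodeReady reports whether the node has a Ready condition set to True.
func isNodeReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}
```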
pkg/e2e/verify/nodepool.go
Outdated
expect.NoError(err, "Should create pod with invalid selector") | ||
defer client.Delete(ctx, pod) | ||
|
||
time.Sleep(30 * time.Second) |
Why this arbitrary sleep?
Replaced with proper wait condition using wait.For
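One possible shape for that wait, sketched with k8s.io/apimachinery's PollUntilContextTimeout and a controller-runtime-style client (crclient = sigs.k8s.io/controller-runtime/pkg/client); the PR's actual wait.For helper may differ. The condition treats an Unschedulable pod as the expected outcome for an invalid node selector:

```go
err = wait.PollUntilContextTimeout(ctx, 5*time.Second, 2*time.Minute, true,
	func(ctx context.Context) (bool, error) {
		current := &corev1.Pod{}
		if err := client.Get(ctx, crclient.ObjectKeyFromObject(pod), current); err != nil {
			return false, err
		}
		// Done waiting once the scheduler records the pod as Unschedulable.
		for _, cond := range current.Status.Conditions {
			if cond.Type == corev1.PodScheduled &&
				cond.Status == corev1.ConditionFalse &&
				cond.Reason == corev1.PodReasonUnschedulable {
				return true, nil
			}
		}
		return false, nil
	})
expect.NoError(err, "pod with invalid selector was never marked Unschedulable")
```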
pkg/e2e/verify/nodepool.go
Outdated
expect.NoError(err, "Failed to find ready NodePool nodes")
Expect(len(nodePoolNodes)).To(BeNumerically(">", 0), "No ready NodePool nodes available")

targetNode := &nodePoolNodes[0]
Same comment as above about the possibility of several nodepool nodes.
pkg/e2e/verify/nodepool.go
Outdated
	}
}

func createPodWithSelector(namespace string, nodeSelector map[string]string) *corev1.Pod {
It looks like this function is virtually equivalent to createTestPod?
Created a single generic createPod function to eliminate duplication.
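Roughly the shape such a generic helper could take; field names, image, and command below are placeholders rather than the PR's exact code:

```go
type podConfig struct {
	name         string
	namespace    string
	nodeName     string            // empty: let the scheduler place the pod
	nodeSelector map[string]string // nil: no selector
	cpuLimit     string            // empty: no resource limits
	memLimit     string
}

func createPod(cfg podConfig) *corev1.Pod {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: cfg.name, Namespace: cfg.namespace},
		Spec: corev1.PodSpec{
			NodeName:     cfg.nodeName,
			NodeSelector: cfg.nodeSelector,
			Containers: []corev1.Container{{
				Name:    "test",
				Image:   "registry.access.redhat.com/ubi9/ubi-minimal", // placeholder image
				Command: []string{"sleep", "300"},
			}},
		},
	}
	if cfg.cpuLimit != "" && cfg.memLimit != "" {
		pod.Spec.Containers[0].Resources = corev1.ResourceRequirements{
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse(cfg.cpuLimit),
				corev1.ResourceMemory: resource.MustParse(cfg.memLimit),
			},
		}
	}
	return pod
}
```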
pkg/e2e/verify/nodepool.go
Outdated
	}
}

func createResourceIntensivePod(namespace, nodeName string) *corev1.Pod {
You could probably also roll this into the same generic createPod function.
pkg/e2e/verify/nodepool.go
Outdated
ginkgo.AfterAll(func() {
	if createdNodePool && testNodePoolName != "" && clusterNamespace != "" {
		cleanupNodePool(context.Background(), h, clusterNamespace, testNodePoolName)
Since this is only done once, do you think you need a separate function to do it? What about inlining it here?
…etter namespace lookup
})

ginkgo.AfterAll(func() {
	if createdNodePool {
Wouldn't you want to move this entire block into the one that starts on line 95?
})

ginkgo.Describe("NodePool creation and provisioning", func() {
	ginkgo.It("creates new NodePool successfully", func(ctx context.Context) {
Wouldn't you want to create the nodepool in a BeforeAll? Then you could get rid of the createdNodePool boolean.
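Roughly what that could look like; createNodePool is a hypothetical helper standing in for the creation logic already in the spec, and the spec names are illustrative:

```go
ginkgo.Describe("NodePool creation and provisioning", ginkgo.Ordered, func() {
	ginkgo.BeforeAll(func(ctx context.Context) {
		// Create the NodePool once for every spec in this ordered block.
		err := createNodePool(ctx, h, clusterNamespace, testNodePoolName)
		expect.NoError(err, "NodePool creation failed")
	})

	ginkgo.AfterAll(func(ctx context.Context) {
		// Always clean up; no createdNodePool flag needed.
		cleanupNodePool(ctx, h, clusterNamespace, testNodePoolName)
	})

	ginkgo.It("provisions EC2 instances with proper STS permissions", func(ctx context.Context) {
		// ... provisioning and scheduling assertions from the existing specs ...
	})
})
```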
Expect(apierrors.IsNotFound(err)).To(BeTrue(), "Should return NotFound error")
})

ginkgo.It("rejects duplicate NodePool creation", func(ctx context.Context) {
Why not move this into the Describe block above for the nodepool?
return "", fmt.Errorf("could not determine cluster namespace") | ||
} | ||
|
||
func canAccessNodePoolAPI(h *helper.H, namespace string) bool { |
Would you want to fail the entire suite if this cannot be accessed and exit early? You could also then remove all of the canAccessAPI checks.
for _, pattern := range patterns {
	if strings.Contains(msg, pattern) {
		ginkgo.Fail(fmt.Sprintf(
			"STS_PERMISSION_ERROR: %s failed due to missing AWS permissions (%s): %v. "+
This should be cleaned up. There's no need to concatenate strings within a fmt.Sprintf call.
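For example, keeping the whole message, including any trailing guidance text elided above, in a single format string:

```go
ginkgo.Fail(fmt.Sprintf(
	"STS_PERMISSION_ERROR: %s failed due to missing AWS permissions (%s): %v",
	operation, permissions, err))
```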
	pod.Spec.NodeSelector = config.nodeSelector
}

if config.cpuLimit != "" && config.memLimit != "" {
Do you really need resource limits? Aren't these short-lived pods? Are you anticipating resource spikes that would need to be throttled?
	},
}

if config.nodeName != "" {
Do you need this check and the one below it? Can't it just be defined in the above struct?
This test validates the STS permissions required for ROSA HCP NodePool operations.
It performs day 2 operations by creating a NodePool, waiting for nodes to
provision, and scheduling workloads. Test failures block releases until the
STS permissions are fixed, ensuring proper AWS access for cluster operations.
Fixes: https://issues.redhat.com/browse/OSD-28896