
NFD worker pod in CrashLoopBackOff on GPU node with SELinux enabled #2155

@RangaSamudrala

Description

The NFD worker pod logs show the following entry:

```
failed to get self pod, cannot inherit ownerReference for NodeFeature. Get https://10.43.0.1:443/api/v1/namespaces/gpu-operator/pods/gpu-operator-node-feature-discovery-worker-xxxx. dial tcp 10.43.0.1:443 I/O timeout
```
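The `dial tcp ... timeout` part means the worker could not complete a TCP connection to the API server address `10.43.0.1:443` (presumably the `kubernetes` Service ClusterIP in the RKE2 default service range) before the client gave up. One way to narrow this down is to run a plain TCP check from a debug pod scheduled on the same GPU node. The following is a minimal diagnostic sketch, not part of NFD or the GPU Operator; the helper name `check_api_reachable` is hypothetical, and it assumes Python 3.9+ is available in the debug pod. It reads the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` variables that kubelet injects into pods and falls back to the address from the log.

```python
import os
import socket


def check_api_reachable(host: str, port: int, timeout: float = 5.0) -> tuple[bool, str]:
    """Try a plain TCP connection to host:port (hypothetical diagnostic helper).

    Returns (True, "ok") if the TCP handshake completes, otherwise
    (False, <error description>). This only checks TCP reachability;
    it does not perform TLS or authenticate against the API server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "ok"
    except socket.timeout:
        # Caught before OSError because socket.timeout is an OSError subclass.
        return False, f"dial tcp {host}:{port}: i/o timeout"
    except OSError as exc:
        # e.g. connection refused or no route to host
        return False, f"dial tcp {host}:{port}: {exc}"


if __name__ == "__main__":
    # Assumption: run inside a pod, where kubelet sets these variables.
    # Falls back to the ClusterIP seen in the NFD worker log otherwise.
    host = os.environ.get("KUBERNETES_SERVICE_HOST", "10.43.0.1")
    port = int(os.environ.get("KUBERNETES_SERVICE_PORT", "443"))
    ok, detail = check_api_reachable(host, port)
    print(f"{host}:{port} reachable={ok} ({detail})")
```

If the check times out from a pod on the GPU node but succeeds from a pod on another node, that would suggest a problem with pod-to-Service networking on the GPU node rather than with NFD itself.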

The machine on which the GPU Operator is deployed runs RHEL 9.5 with SELinux enabled but set to `permissive` mode.
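To confirm the SELinux mode actually in effect on the node, the kernel exposes it through selinuxfs: the `enforce` file contains `1` for enforcing and `0` for permissive, which is the same value `getenforce` reports. A minimal sketch, assuming selinuxfs is mounted at its usual location `/sys/fs/selinux`; `selinux_mode` is a hypothetical helper name:

```python
from pathlib import Path


def selinux_mode(enforce_path: str = "/sys/fs/selinux/enforce") -> str:
    """Return 'enforcing', 'permissive', or 'disabled' (hypothetical helper).

    Assumption: selinuxfs is mounted at /sys/fs/selinux. A missing file is
    treated as 'disabled' (SELinux off, or selinuxfs not mounted there).
    """
    path = Path(enforce_path)
    if not path.exists():
        return "disabled"
    value = path.read_text().strip()
    if value == "1":
        return "enforcing"
    if value == "0":
        return "permissive"
    raise ValueError(f"unexpected value in {enforce_path}: {value!r}")


if __name__ == "__main__":
    print(selinux_mode())
```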

  • NFD version: v0.16.6
  • GPU Operator Version: v25.3.0
  • OS: RHEL 9.5
  • Kernel Version: 5.14.0-503.35.1.el9_5.x86_64
  • Container Runtime Version: v1.7.23-k3s2
  • Kubernetes Distro and Version: Rancher RKE2 v1.31.4

Metadata


Labels

lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
