
Operator restarts OpenSearch nodes after a k8s node is removed by a spot request termination #312

@mleklund

Description


I have been experimenting with OpenSearch on spot nodes. When a spot k8s node is reclaimed, the OpenSearch node gets rescheduled as expected. What is unusual is that the controller then issues a rolling restart, and every OpenSearch node with a lower index than the rescheduled node is restarted as well. I expected the OpenSearch node to be rescheduled and go about its normal business instead.

This happened in rapid succession:

content-data-3                                            1/1     Terminating   0          107m
content-data-0                                            1/1     Terminating   0          95m
content-data-1                                            1/1     Terminating   0          93m
content-data-2                                            1/1     Terminating   0          92m

Here, the k8s node (on EKS) running content-data-3 had been issued an interruption request.

I have tried turning off smartScaler and setting a PodDisruptionBudget, without success.
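For reference, a PodDisruptionBudget of the kind described might look like the sketch below, built as a plain `policy/v1` manifest and printed as JSON (which `kubectl apply -f -` accepts). The PDB name, namespace, and label key `example.com/node-pool` are hypothetical placeholders, not the labels the operator actually sets. Note that a PDB is enforced only for evictions made through the Eviction API (for example `kubectl drain`); a pod deleted directly bypasses it. If the operator performs its rolling restart by deleting pods directly (not verified here), that could explain why the PDB had no effect.

```python
import json


def build_pdb(name: str, namespace: str, match_labels: dict, max_unavailable: int = 1) -> dict:
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict.

    All names and labels passed in are hypothetical; match them to the
    labels actually present on the OpenSearch node-pool pods.
    """
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # At most this many matching pods may be unavailable due to
            # voluntary evictions (Eviction API); direct deletes are not limited.
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": match_labels},
        },
    }


if __name__ == "__main__":
    # Hypothetical label selector for the "content-data" pods.
    pdb = build_pdb("content-data-pdb", "default", {"example.com/node-pool": "content-data"})
    # Usage: python pdb.py | kubectl apply -f -
    print(json.dumps(pdb, indent=2))
```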

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working), help wanted (Extra attention is needed)

    Type

    No type

    Projects

    Status

    📦 Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
