Description
Problem?
cadvisor has an EventChannel to which 4 types of events are currently pushed. Specifically, an EventType.EventContainerDeletion event is pushed when the manager detects that a container has been deleted; it currently captures only the timestamp of the deletion and the name of the container.
However, since a container can be deleted in many failure scenarios, the current information is not sufficient to determine why it was deleted. There is a specific case for out-of-memory kills, but rather than accommodating every such scenario individually, it would be a good idea to also include the exit code of the container's main process.
We have a specific use case: we want to emit a metric when a container is killed with a non-zero exit code, and we use cadvisor as our main container observability tool. We tried using metrics from orchestrators (k8s), but they are not reliable in our environment. Subscribing to EventContainerDeletion in the event channel looked like a good approach for us, but the event lacks the exit code. We do have a workaround, but it would be better to include this information at the source.
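As a rough illustration of that use case, the sketch below counts deletion events that report a non-zero exit code. It does not import cadvisor: `Event`, `EventType`, and `countFailedDeletions` are simplified local stand-ins, and the `ExitCode` field is hypothetical, since deletion events do not carry it today.

```go
package main

import (
	"fmt"
	"time"
)

// EventType and Event are simplified local stand-ins for cadvisor's event
// types; they are not imported from cadvisor.
type EventType string

const EventContainerDeletion EventType = "containerDeletion"

// Event mirrors the fields discussed above. ExitCode is hypothetical:
// deletion events do not carry it today.
type Event struct {
	ContainerName string
	Timestamp     time.Time
	EventType     EventType
	ExitCode      int
}

// countFailedDeletions drains the channel and returns how many deletion
// events reported a non-zero exit code.
func countFailedDeletions(events <-chan Event) int {
	failed := 0
	for ev := range events {
		if ev.EventType == EventContainerDeletion && ev.ExitCode != 0 {
			failed++ // a real consumer would increment a metric here
		}
	}
	return failed
}

func main() {
	events := make(chan Event, 3)
	events <- Event{ContainerName: "/docker/a", Timestamp: time.Now(), EventType: EventContainerDeletion, ExitCode: 0}
	events <- Event{ContainerName: "/docker/b", Timestamp: time.Now(), EventType: EventContainerDeletion, ExitCode: 137}
	events <- Event{ContainerName: "/docker/c", Timestamp: time.Now(), EventType: "oom"}
	close(events)

	fmt.Println("failed deletions:", countFailedDeletions(events))
}
```

With the exit code on the event itself, the consumer needs no extra call to the container runtime after the container is already gone.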
Almost every container runtime provides a way to get the exit code of an exited container: Docker has client.ContainerInspect, and the containerd API also provides a way to get the exit status using the taskClient.
Possible Solution
This can be achieved by adding a new field here
    type EventData struct {
        // Information about an OOM kill event.
        OomKill *OomKillEventData `json:"oom,omitempty"`
        // Exit code of the container's main process.
        ContainerExitCode int `json:"container_exit_code,omitempty"`
    }
and then updating the logic here
Lines 1042 to 1052 in 5adb1c3
    contRef, err := cont.handler.ContainerReference()
    if err != nil {
        return err
    }
    newEvent := &info.Event{
        ContainerName: contRef.Name,
        Timestamp:     time.Now(),
        EventType:     info.EventContainerDeletion,
    }
    err = m.eventHandler.AddEvent(newEvent)
to
    contRef, err := cont.handler.ContainerReference()
    if err != nil {
        return err
    }
    // GetExitCode is a new method on the ContainerHandler interface.
    contExitCode, err := cont.handler.GetExitCode()
    if err != nil {
        return err
    }
    newEvent := &info.Event{
        ContainerName: contRef.Name,
        Timestamp:     time.Now(),
        EventType:     info.EventContainerDeletion,
        EventData: info.EventData{
            ContainerExitCode: contExitCode,
        },
    }
Then each container runtime can individually implement its way of getting the exit code.
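A minimal, self-contained sketch of how this could fit together is shown below. Everything in it is an assumption for illustration: `ExitCodeGetter` stands in for the proposed addition to the ContainerHandler interface, `fakeHandler` stands in for a runtime-specific handler (Docker, containerd, ...), and `Event`/`EventData` are trimmed local copies rather than cadvisor's real types. The exit code is an `int` here because exit codes range up to 255.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// EventData and Event are trimmed local copies for illustration only.
type EventData struct {
	ContainerExitCode int `json:"container_exit_code,omitempty"`
}

type Event struct {
	ContainerName string
	Timestamp     time.Time
	EventType     string
	EventData     EventData
}

// ExitCodeGetter is the hypothetical method proposed for ContainerHandler.
type ExitCodeGetter interface {
	GetExitCode() (int, error)
}

// errNoExitStatus is a made-up error used to simulate a failing runtime.
var errNoExitStatus = errors.New("exit status not available")

// fakeHandler stands in for a runtime-specific handler; a real one would
// ask its runtime for the exit status instead of returning fixed values.
type fakeHandler struct {
	code int
	err  error
}

func (h fakeHandler) GetExitCode() (int, error) { return h.code, h.err }

// newDeletionEvent builds a deletion event that carries the exit code,
// mirroring the updated manager logic above.
func newDeletionEvent(name string, h ExitCodeGetter) (*Event, error) {
	code, err := h.GetExitCode()
	if err != nil {
		return nil, fmt.Errorf("getting exit code for %s: %w", name, err)
	}
	return &Event{
		ContainerName: name,
		Timestamp:     time.Now(),
		EventType:     "containerDeletion",
		EventData:     EventData{ContainerExitCode: code},
	}, nil
}

func main() {
	ev, err := newDeletionEvent("/docker/abc", fakeHandler{code: 137})
	fmt.Println(ev.EventData.ContainerExitCode, err)

	_, err = newDeletionEvent("/docker/def", fakeHandler{err: errNoExitStatus})
	fmt.Println(err)
}
```

One design point this raises: as written, a failure to read the exit code suppresses the deletion event entirely, so an implementation might prefer to log the error and still emit the event without the code.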