Problem:
The Grafana dashboards defined in grafana-dashboardDefinitions.yaml include graphs for memory consumption per pod. The memory consumption query currently used is:
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container)",
"legendFormat": "__auto"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum(\n kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
"legendFormat": "requests"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"expr": "sum(\n kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
"legendFormat": "limits"
}
],
"title": "Memory Usage (WSS)",
"type": "timeseries"
},
When a pod is restarted, cAdvisor briefly exports working-set series for both the old and the new container instance (they differ only in the id label), and the query sums them together. This can produce temporary spikes in the displayed memory consumption, so the dashboard may show usage that exceeds the container's memory limit even though neither instance actually exceeded it.
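For illustration, with hypothetical values and id paths: suppose the old and new container instances overlap for one scrape interval:

container_memory_working_set_bytes{container="app", id=".../old-container-id"}   # 900 MiB (hypothetical)
container_memory_working_set_bytes{container="app", id=".../new-container-id"}   # 850 MiB (hypothetical)

The sum(...) by (container) query then reports 1750 MiB for that interval and can cross a memory limit that neither instance actually breached.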
Steps to Reproduce:
- Trigger a pod restart (e.g. an OOM kill or eviction).
- Compare the graph whose expression groups by only the container field with a graph whose expression groups by both container and id:
"expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container, id)"