Incorrect Container Memory Consumption Graph Behavior When Pod is Restarted #2522

## Problem:
The Grafana dashboards defined in `grafana-dashboardDefinitions.yaml` include graphs for memory consumption per pod. The "Memory Usage (WSS)" panel currently uses the following queries:

https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/grafana-dashboardDefinitions.yaml#L8300
```
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container)",
                          "legendFormat": "__auto"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(\n    kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
                          "legendFormat": "requests"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(\n    kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
                          "legendFormat": "limits"
                      }
                  ],
                  "title": "Memory Usage (WSS)",
                  "type": "timeseries"
              },
```

When a pod's container is restarted, the first query sums the memory usage of both the old and the new container instance for as long as both are reported, because it groups only by `container`. This produces a temporary spike in the displayed memory consumption. As a result, the dashboard can show memory usage that exceeds the container's memory limit, even though the actual consumption of each container instance stays within the limit.

![Screenshot 2024-09-17 at 14 20 49](https://github.com/user-attachments/assets/32fc1100-2671-497f-ae5d-424b1cfa4596)
![Screenshot 2024-09-17 at 14 22 55 (1)](https://github.com/user-attachments/assets/edfc8706-1bdd-46dc-a670-b7c04f973193)
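The double counting described above can be reproduced offline with a small Python sketch. It is not how Prometheus evaluates the query; it only mimics `sum(...) by (...)` over two made-up samples, assuming that both the old and the new container series are returned for the same timestamp. The `id` values, byte counts, and the 1 GiB limit are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples at one timestamp shortly after a restart.
# Each tuple is (container, id, working_set_bytes); ids and values are made up.
samples = [
    ("app", "/kubepods/pod-x/old-instance", 900 * 1024**2),  # old container instance
    ("app", "/kubepods/pod-x/new-instance", 300 * 1024**2),  # new container instance
]
LIMIT_BYTES = 1024**3  # assumed 1 GiB memory limit


def sum_by(samples, keys):
    """Mimic PromQL `sum(...) by (keys)` over (container, id, value) tuples."""
    index = {"container": 0, "id": 1}
    totals = defaultdict(int)
    for sample in samples:
        group = tuple(sample[index[k]] for k in keys)
        totals[group] += sample[2]
    return dict(totals)


# Grouping by container only merges both instances into one value.
by_container = sum_by(samples, ["container"])
# Grouping by container and id keeps the instances apart.
by_container_id = sum_by(samples, ["container", "id"])

print(by_container)
print(by_container_id)
```

With these sample values, the `container`-only grouping reports 1200 MiB, which is above the assumed limit, while each instance on its own stays below it.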

## Steps to Reproduce:

* Trigger a pod restart (e.g., through an OOM kill or an eviction).
* Compare the graph whose expression groups only by `container` with a graph whose expression groups by both `container` and `id`:
```
"expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container, id)"
```
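To set up the comparison, the second expression can be derived from the dashboard's expression instead of being retyped. The following is a minimal sketch; `regroup` is a hypothetical helper, not part of kube-prometheus or Grafana, and it assumes the expression ends in a single `by (...)` clause.

```python
import re

# The expression currently used by the dashboard panel.
DASHBOARD_EXPR = (
    'sum(container_memory_working_set_bytes{job="kubelet", '
    'metrics_path="/metrics/cadvisor", cluster="$cluster", '
    'namespace="$namespace", pod="$pod", container!="", image!=""}) by (container)'
)


def regroup(expr, labels):
    """Replace the trailing `by (...)` clause with the given labels (hypothetical helper)."""
    replacement = "by (" + ", ".join(labels) + ")"
    new_expr, count = re.subn(r"by \([^)]*\)\s*$", replacement, expr)
    if count != 1:
        raise ValueError("expression has no trailing `by (...)` clause")
    return new_expr


per_instance_expr = regroup(DASHBOARD_EXPR, ["container", "id"])
print(per_instance_expr)
```

Placing both expressions in panels that share the same time range makes the spike in the `container`-only graph easy to line up with the two separate series in the other graph.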



