Open
Description
We have a lot of SLOs, and a large share of them belong to a single system class, so the generated rules file for those SLOs is massive. Unfortunately, this breaks in Kubernetes when our ConfigMap(s) exceed 4MiB, which causes our custom SlothSLOGenerationFailure alert to fire. The alert is defined with the following expression:
expr: "sum(rate(kooper_controller_processed_event_duration_seconds_count{job=\"prometheus/sloth-kube-prometheus\",success=\"false\"}[30m])) > 0"
Ideally, we should be able to detect when a rule file is larger than 4MiB and, if so, split it across multiple files (ConfigMaps).
The limit seems to be hardcoded in the prometheus-operator: https://github.com/prometheus-operator/prometheus-operator/blob/370a2ea18a48000e2ea4bc05acb093502915f5c9/pkg/operator/rules.go#L55-L59
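
To make the request concrete, here is a rough sketch of the splitting I have in mind. It is not Sloth or prometheus-operator code: `split_rule_groups`, the `MAX_BYTES` constant, and the idea of keying each serialized rule group by a file name are all assumptions for illustration, and the 4MiB value is just the figure from this issue. It greedily packs groups into chunks so that each chunk could become the `data` of one ConfigMap.

```python
from typing import Dict, List

# Assumed limit, taken from the figure quoted in this issue; adjust to
# whatever your cluster and operator actually enforce.
MAX_BYTES = 4 * 1024 * 1024


def split_rule_groups(
    groups: Dict[str, str], max_bytes: int = MAX_BYTES
) -> List[Dict[str, str]]:
    """Greedily pack serialized rule groups into chunks of at most max_bytes.

    `groups` maps a hypothetical file key (for example "my-slo.yaml") to its
    serialized YAML text. Size is counted as key bytes plus body bytes.
    """
    chunks: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    current_size = 0

    for key, body in groups.items():
        size = len(key.encode("utf-8")) + len(body.encode("utf-8"))
        if size > max_bytes:
            # A single group that is too large cannot be fixed by splitting
            # between groups; surface it instead of silently truncating.
            raise ValueError(
                f"{key} alone is {size} bytes, over the {max_bytes} byte limit"
            )
        if current and current_size + size > max_bytes:
            # Adding this group would overflow the current chunk: close it.
            chunks.append(current)
            current, current_size = {}, 0
        current[key] = body
        current_size += size

    if current:
        chunks.append(current)
    return chunks
```

Each returned dict would then be written as its own ConfigMap (with some index suffix in the name). A real implementation would probably also want headroom for object metadata rather than filling each chunk right up to the limit.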