-
Notifications
You must be signed in to change notification settings - Fork 6
feat: adjustable rule_group interval #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This allows for more expensive SLO recording rules by letting users create a custom Prometheus `rule_group.interval`. https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule_group Now instead of all rule groups using the default global evaluation interval, a custom interval can be set on all sets of recording rules for an SLO. If no `interval` is set then the global default will be assumed, matching current Sloth behavior. Resolves: slok#367
Related to how the unit test coverage works, this will make sure RuleGroupInterval is not appended to RuleGroups unless it is defined and not empty. Not requring the `yaml` file does an okay job of this but expliclity writing the rules without RuleGroupInterval is safer. It's a bit ugly and repetitive but I think that's just how Go works..
This updates tests include `interval`'s and also includes a test to make sure the rules render correctly when interval is not included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good! Since Sloth produces 3 rules groups, I think it'd be nice to support specifying separate intervals for each of them, though. (sloth-slo-meta-recordings
, sloth-slo-alerts
, and sloth-slo-sli-recordings
are the 3 prefixes). For example, you may want an interval of 2m
on the SLI recording rules but still 1m
on the alerts
.
Instead of a singular global default, now a rule_group interval can be set for every individual type of rule_group Sloth generates. The generic, `interval:all` rule will also stay and can "fill in" any missing per-rule group defaults. Along with the default behavior of doing nothing if no `interval` is specified.
Specific ones and general/specific combined
eb73e69
to
dc3994b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving as it looks good as-is, but left some comments on form.
internal/prometheus/storage.go
Outdated
// 0s is default empty string value for time.Duration | ||
if slo.SLO.RuleGroupInterval.String() != "0s" || slo.SLO.SLIErrorRulesInterval.String() != "0s" { | ||
var ruleGroupIntervalDuration prommodel.Duration | ||
var err error | ||
|
||
// if we have a valid meta rule rule_group interval, use that first and overwrite any generic ones | ||
if slo.SLO.SLIErrorRulesInterval.String() != "0s" { | ||
ruleGroupIntervalDuration, err = prommodel.ParseDuration(slo.SLO.SLIErrorRulesInterval.String()) | ||
} else { | ||
ruleGroupIntervalDuration, err = prommodel.ParseDuration(slo.SLO.RuleGroupInterval.String()) | ||
} | ||
|
||
if err != nil { | ||
return fmt.Errorf("could not parse rule_group interval duration %w", err) | ||
} | ||
|
||
ruleGroups.Groups = append(ruleGroups.Groups, ruleGroupYAMLv2{ | ||
Name: fmt.Sprintf("sloth-slo-sli-recordings-%s", slo.SLO.ID), | ||
RuleGroupInterval: ruleGroupIntervalDuration, | ||
Rules: slo.Rules.SLIErrorRecRules, | ||
}) | ||
} else { | ||
ruleGroups.Groups = append(ruleGroups.Groups, ruleGroupYAMLv2{ | ||
Name: fmt.Sprintf("sloth-slo-sli-recordings-%s", slo.SLO.ID), | ||
Rules: slo.Rules.SLIErrorRecRules, | ||
}) | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can simplify this with a switch
statement like:
group := ruleGroupYAMLv2{
Name: fmt.Sprintf("sloth-slo-sli-recordings-%s", slo.SLO.ID),
Rules: slo.Rules.SLIErrorRecRules,
}
switch {
case AlertRulesInterval.String() != "0s":
ruleGroupIntervalDuration, err = time.ParseDuration(AlertRulesInterval.String())
if err != nil {
return fmt.Errorf("could not parse rule_group interval duration for alerts %w", err)
} else {
group.RuleGroupInterval = ruleGroupIntervalDuration
}
case RuleGroupInterval.String() != "0s":
ruleGroupIntervalDuration, err = time.ParseDuration(RuleGroupInterval.String())
if err != nil {
return fmt.Errorf("could not parse default ("all") rule_group interval duration %w", err)
} else {
group.RuleGroupInterval = ruleGroupIntervalDuration
}
}
ruleGroups.Groups = append(ruleGroups.Groups, group)
Co-authored-by: Will Hegedus <[email protected]>
…issues Dennisme/bump versions fix build issues
This allows for more expensive SLO recording rules by letting users create a custom Prometheus
rule_group.interval
.https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule_group
Now instead of all rule groups using the default global evaluation interval, a custom interval can be set on all sets of recording rules for an SLO.
If no
interval
is set then the global default will be assumed, matching current Sloth behavior.Resolves: slok#367