-
-
Notifications
You must be signed in to change notification settings - Fork 117
Description
Describe the bug
The UptimeRobot API sometimes fails with this error:
{"level":"error","ts":1750345098.7199364,"logger":"http-client","msg":"","error":"Post \"https://api.uptimerobot.com/v2/getMonitors\": unexpected EOF","stacktrace":"github.com/stakater/IngressMonitorController/v2/pkg/http.(*HttpClient).RequestWithHeaders\n\t/workspace/pkg/http/httpClient.go:52\ngithub.com/stakater/IngressMonitorController/v2/pkg/http.(*HttpClient).PostUrlEncodedFormBody\n\t/workspace/pkg/http/httpClient.go:91\ngithub.com/stakater/IngressMonitorController/v2/pkg/monitors/uptimerobot.(*UpTimeMonitorService).GetByName\n\t/workspace/pkg/monitors/uptimerobot/uptime-monitor.go:53\ngithub.com/stakater/IngressMonitorController/v2/pkg/monitors.(*MonitorServiceProxy).GetByName\n\t/workspace/pkg/monitors/monitor-proxy.go:105\ngithub.com/stakater/IngressMonitorController/v2/pkg/controllers.findMonitorByName\n\t/workspace/pkg/controllers/endpointmonitor_util.go:10\ngithub.com/stakater/IngressMonitorController/v2/pkg/controllers.(*EndpointMonitorReconciler).Reconcile\n\t/workspace/pkg/controllers/endpointmonitor_controller.go:92\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"}
I'm not aware of the exact root cause. I was able to reproduce consistently when requesting the API several times per seconds.
Unfortunately, the consequence is that the controller panics a few line later.
The issue is that code at https://github.com/stakater/IngressMonitorController/blob/master/pkg/http/httpClient.go#L52 does not return when it's failing. As a consequence, at https://github.com/stakater/IngressMonitorController/blob/master/pkg/http/httpClient.go#L58, the code tries to dereference response
to access response.StatusCode
. This panics:
{"level":"info","ts":1750345098.720687,"msg":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"endpointmonitor","controllerGroup":"endpointmonitor.stakater.com","controllerKind":"EndpointMonitor","endpointMonitor":{"name":"redacted","namespace":"redacted"},"namespace":"redacted","name":"redacted","reconcileID":"redacted"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x181e1df]
goroutine 289 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118 +0x1da
panic({0x1a3e740?, 0x2eb02d0?})
/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/stakater/IngressMonitorController/v2/pkg/http.(*HttpClient).RequestWithHeaders(0xc00063b728, {0x1d0ed0d, 0x4}, {0xc0000e0c80, 0x7f, 0x80}, 0xc00063b788)
/workspace/pkg/http/httpClient.go:58 +0x33f
github.com/stakater/IngressMonitorController/v2/pkg/http.(*HttpClient).PostUrlEncodedFormBody(...)
/workspace/pkg/http/httpClient.go:91
github.com/stakater/IngressMonitorController/v2/pkg/monitors/uptimerobot.(*UpTimeMonitorService).GetByName(0xc00079e410, {0xc0001621b0, 0x2a})
/workspace/pkg/monitors/uptimerobot/uptime-monitor.go:53 +0x1de
github.com/stakater/IngressMonitorController/v2/pkg/monitors.(*MonitorServiceProxy).GetByName(...)
/workspace/pkg/monitors/monitor-proxy.go:105
github.com/stakater/IngressMonitorController/v2/pkg/controllers.findMonitorByName(0x0?, {0xc0001621b0?, 0x1?})
/workspace/pkg/controllers/endpointmonitor_util.go:10 +0x22
github.com/stakater/IngressMonitorController/v2/pkg/controllers.(*EndpointMonitorReconciler).Reconcile(0xc00079e460, {0x30?, 0xc00007e800?}, {{{0xc000686ae0, 0x10}, {0xc000686aa0, 0x7}}})
/workspace/pkg/controllers/endpointmonitor_controller.go:92 +0x4c5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1fb42e8?, {0x1fb42b0?, 0xc002810780?}, {{{0xc000686ae0?, 0x1c330a0?}, {0xc000686aa0?, 0xc00063bda8?}}})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00048c460, {0x1fb42e8, 0xc00079e2d0}, {0x1ad6120?, 0xc000632280?})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320 +0x318
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00048c460, {0x1fb42e8, 0xc00079e2d0})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 264
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:230 +0x565
To Reproduce
We were able to reproduce by creating and deleting an EndpointMonitor
every second.
Expected behavior
I would expect the controller to gracefully handle the error from the API.
Environment
- Operator Version: Current last version (2.2.3)
- Kubernetes/OpenShift Version: 1.32.2
Additional context
We implemented a patch locally that force this function to return a StatusCode ServerError:
if err != nil {
log.Error(err, "")
+ return HttpResponse{
+ StatusCode: http.StatusInternalServerError,
+ Header: make(map[string][]string),
+ }
} else if response == nil {
log.Error(nil, "got empty response")
+ return HttpResponse{
+ StatusCode: http.StatusInternalServerError,
+ Header: make(map[string][]string),
+ }
}
This works perfectly. However, this is not ideal.
I'm not fluent in Golang, but I guess in a perfect world this function should return an Error. Because this is a core library, we would need to patch a lot of calls to handle the errors.
The patch we did does not have this issue, and should break a little less the API. However, returning http.StatusInternalServerError
is quite arbitrary.
What do you think? Do you want me to open a PR with this changes?