fix #5152: expanding the error detection #5153

shawkins · 2023-05-19T03:11:50Z

Description

As seen in #5152, some JDK client implementations call onError with connection-related errors (my local one calls onClose instead). The proposed change further differentiates protocol errors from other failures, so that only select exceptions are treated as terminal. Similar logic would apply to the HTTP scenario as well, but no changes were made there.
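A minimal sketch of that classification follows. It assumes, as a later comment on this PR describes, that each HttpClient implementation normalizes protocol failures to `java.net.ProtocolException`. The `WatchErrorClassifier` class and its `isTerminal` method are hypothetical, not part of the fabric8 API. The sketch searches the cause chain for a `ProtocolException` and treats anything else as retryable, including a connection reset surfaced through onError.

```java
import java.net.ProtocolException;

// Hypothetical helper illustrating the idea; not the fabric8 API.
final class WatchErrorClassifier {

  // Bound the cause-chain walk in case of a pathological cyclic chain.
  private static final int MAX_CAUSE_DEPTH = 32;

  private WatchErrorClassifier() {}

  /**
   * Returns true only when the failure looks like a protocol error, i.e. a
   * java.net.ProtocolException appears somewhere in the cause chain.
   * Connection-level failures (resets, EOFs, timeouts) return false, so the
   * caller reconnects instead of terminating the watch.
   */
  static boolean isTerminal(Throwable failure) {
    Throwable current = failure;
    for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
      if (current instanceof ProtocolException) {
        return true;
      }
      current = current.getCause();
    }
    return false;
  }
}
```

For example, `isTerminal(new IOException("reset", new ProtocolException("bad frame")))` returns true, while a bare `EOFException` returns false.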

Another possible issue I noticed while making these changes is the behavior of WatchConnectionManager.start: if a websocket has already been established, a subsequent connection attempt that fails with a 200 or 503 can short-circuit before scheduling another attempt. That short-circuit should apply only to the first attempt, where we fall back to an HTTP watch instead. There is probably some improvement to be had in the HTTP case as well; #4624 was also never addressed.
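The intended rule might be sketched as follows. `WatchStartPolicy`, its `Action` enum, and the `websocketPreviouslyEstablished` flag are hypothetical names. For brevity, every other failure status is mapped to a reconnect, which may not match what WatchConnectionManager.start actually does with those statuses.

```java
// Hypothetical sketch; the names below are illustrative, not the fabric8 API.
final class WatchStartPolicy {

  enum Action { FALL_BACK_TO_HTTP_WATCH, SCHEDULE_RECONNECT }

  private WatchStartPolicy() {}

  /**
   * Decide what to do when a websocket upgrade attempt fails.
   *
   * @param status HTTP status code of the failed upgrade response
   * @param websocketPreviouslyEstablished whether an earlier attempt on this watch succeeded
   */
  static Action onUpgradeFailure(int status, boolean websocketPreviouslyEstablished) {
    boolean looksUnsupported = status == 200 || status == 503;
    if (looksUnsupported && !websocketPreviouslyEstablished) {
      // Only the very first attempt reads 200/503 as "websocket watches are
      // not available here" and falls back to an HTTP watch.
      return Action.FALL_BACK_TO_HTTP_WATCH;
    }
    // Once a websocket has worked, the same statuses are treated as transient.
    return Action.SCHEDULE_RECONNECT;
  }
}
```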

However, the Go client only retries a watch when the connection is refused or when there have been too many requests (https://sourcegraph.com/github.com/kubernetes/client-go@2a5f18df73b70cb85c26a3785b06162f3d513cf5/-/blob/tools/cache/reflector.go?L418), so it would seem to have a similar issue if a 200 or 503 were returned.

Type of change

Bug fix (non-breaking change which fixes an issue)
Feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Chore (non-breaking change which doesn't affect codebase; test, version modification, documentation, etc.)

Checklist

Code contributed by me aligns with current project license: Apache 2.0
I added a CHANGELOG entry regarding this change
I have implemented unit tests to cover my changes
I have added/updated the javadocs and other documentation accordingly
No new bugs, code smells, etc. in SonarCloud report
I tested my code in Kubernetes
I tested my code in OpenShift

shawkins · 2023-05-19T15:51:24Z

@manusa @scholzj this refines the changes in #5047 to address #5047 (comment), which was observed in #5152. We'll specifically look for a ProtocolException as the only case in which to assume we should proactively terminate, with the expectation that the HttpClient implementations will have normalized protocol errors to that exception. For everything else there will be some logging based on the inferred severity and the number of times we've seen that termination since the watch last made progress.
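One way to express "log based on inferred severity and the number of terminations since the watch last made progress" is a small counter that resets whenever an event arrives. This is a sketch only. The class name, the use of `java.util.logging` levels (fabric8 itself logs through SLF4J), and the thresholds 1 and 5 are all assumptions chosen for illustration.

```java
import java.util.logging.Level;

// Hypothetical tracker; thresholds are arbitrary illustrations.
final class ReconnectLogPolicy {

  private int failuresSinceProgress;

  /** Call whenever the watch receives an event, i.e. makes progress. */
  synchronized void onProgress() {
    failuresSinceProgress = 0;
  }

  /**
   * Record an abnormal watch termination and return the level to log it at.
   * Terminal (protocol) errors are always SEVERE. A single failure after
   * progress is routine (FINE); repeated failures with no progress in
   * between escalate to INFO and then WARNING.
   */
  synchronized Level onTermination(boolean terminal) {
    if (terminal) {
      return Level.SEVERE;
    }
    failuresSinceProgress++;
    if (failuresSinceProgress <= 1) {
      return Level.FINE;
    }
    if (failuresSinceProgress <= 5) {
      return Level.INFO;
    }
    return Level.WARNING;
  }
}
```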

The calls to scheduleReconnect have been consolidated to ensure consistent handling of exceptions. The HTTP watch exceptions have not been fully normalized, but at worst this will result in additional info logs or looping on unresolvable protocol errors.
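A consolidated reconnect path could look like the sketch below. Every caller goes through one `scheduleReconnect`, which uses exponential backoff unless the server's Status carried a `retryAfterSeconds` value (one of the branch's commit messages mentions obeying it). `ReconnectScheduler`, its constructor parameters, and the backoff formula are assumptions for illustration, not the client's actual implementation.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical single entry point for all reconnects; names are illustrative.
final class ReconnectScheduler {

  private final ScheduledExecutorService executor;
  private final long baseMillis;
  private final long maxMillis;
  private int attempt;

  ReconnectScheduler(ScheduledExecutorService executor, long baseMillis, long maxMillis) {
    this.executor = executor;
    this.baseMillis = baseMillis;
    this.maxMillis = maxMillis;
  }

  /** Honor a server-supplied retryAfterSeconds, else back off exponentially up to maxMillis. */
  synchronized long nextDelayMillis(Integer retryAfterSeconds) {
    int current = attempt++;
    if (retryAfterSeconds != null && retryAfterSeconds > 0) {
      return TimeUnit.SECONDS.toMillis(retryAfterSeconds);
    }
    // Cap the shift so the long never overflows.
    long backoff = baseMillis << Math.min(current, 20);
    return Math.min(maxMillis, backoff);
  }

  /** Call once the watch makes progress again. */
  synchronized void resetOnProgress() {
    attempt = 0;
  }

  /** The only place a reconnect is scheduled, so every failure path is treated alike. */
  ScheduledFuture<?> scheduleReconnect(Runnable reconnect, Integer retryAfterSeconds) {
    return executor.schedule(reconnect, nextDelayMillis(retryAfterSeconds), TimeUnit.MILLISECONDS);
  }
}
```

With a single entry point, the backoff state and any retry-after hint are applied the same way whether the websocket or the HTTP watch failed. A server-provided delay is used as-is rather than capped at `maxMillis`. That is a design choice for this sketch, not something the PR specifies.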

scholzj · 2023-05-19T16:32:30Z

Thanks @shawkins

shawkins · 2023-05-22T12:06:53Z

@manusa at least for the JDK client this causes pretty bad behavior with watches / informers, so another 6.6 release may be in order. For the other clients it shouldn't be as impactful.

shawkins · 2023-05-23T17:18:38Z

@manusa @scholzj After updating to master, I also hardened the websocket implementation contracts. Returning a boolean from send and sendClose is a little problematic, because after a non-terminal write error any additional writes would be invalid. To ensure the expected behavior, and to let us know if something is wrong, the callbacks will log and even terminate the websocket if possible. Similar logic and logging were added, or made more robust, for sendClose, with a delayed termination if a remote close has not been received.
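The sendClose behavior described above (log on failure, and force termination if the remote close never arrives) can be sketched with a `CompletableFuture` callback and a scheduled timeout. `SimpleWebSocket`, `CloseHelper`, and the close code/reason are hypothetical stand-ins, not the fabric8 `WebSocket` interface. `abort()` represents whatever forced-termination hook an implementation provides, which, as noted below, the Vert.x websocket does not currently expose.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical minimal websocket abstraction; not the fabric8 WebSocket interface. */
interface SimpleWebSocket {
  CompletableFuture<Void> sendClose(int code, String reason);

  void abort(); // forcibly terminate the underlying connection
}

/** Sketch: a close that is logged on failure and force-terminated if the peer never answers. */
final class CloseHelper {

  private static final Logger LOG = Logger.getLogger(CloseHelper.class.getName());
  private final AtomicBoolean remoteCloseReceived = new AtomicBoolean();

  /** Call from the listener's onClose callback. */
  void onRemoteClose() {
    remoteCloseReceived.set(true);
  }

  void closeGracefully(SimpleWebSocket ws, ScheduledExecutorService timer, long timeoutMillis) {
    // 1000 is the RFC 6455 "normal closure" code.
    ws.sendClose(1000, "closing").whenComplete((ignored, error) -> {
      if (error != null) {
        // A failed close write leaves the socket in an unknown state: log and terminate.
        LOG.log(Level.WARNING, "sendClose failed, aborting websocket", error);
        ws.abort();
        return;
      }
      // Close frame written; give the peer time to reply before forcing termination.
      timer.schedule(() -> {
        if (!remoteCloseReceived.get()) {
          LOG.fine("no remote close received in time, aborting websocket");
          ws.abort();
        }
      }, timeoutMillis, TimeUnit.MILLISECONDS);
    });
  }
}
```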

There is currently no method exposed for the Vert.x websocket to force a termination, so if anyone hits one of these error logs, we'll need to add one or fix that specific error.

scholzj

Thanks a lot for the effort you put into this @shawkins!

sonarqubecloud · 2023-05-29T14:34:26Z

SonarCloud Quality Gate failed.

2 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

59.0% Coverage
0.0% Duplication

shawkins force-pushed the iss5152 branch from 6794387 to dcb1ae1 May 19, 2023 13:44

shawkins marked this pull request as ready for review May 19, 2023 13:44

shawkins requested review from manusa, oscerd, rohanKanojia and sunix as code owners May 19, 2023 13:44

shawkins force-pushed the iss5152 branch 4 times, most recently from 47111a2 to 77f1319 May 19, 2023 15:48

shawkins force-pushed the iss5152 branch from 77f1319 to d0bfa4e May 19, 2023 17:35

fix fabric8io#5152: consolidating logic for watch ending with improved logging also obeying the Status retryAfterSeconds if provided

c3f95f1

shawkins force-pushed the iss5152 branch from d0bfa4e to c3f95f1 May 19, 2023 22:14

shawkins added 2 commits May 20, 2023 10:27

4.4.2 includes a fix to address demand with ping/pong

8b418c6

fix fabric8io#5152 adding a fail-safe check that the watch gets restarted

5d48cf3

shawkins force-pushed the iss5152 branch from 29c938c to 5d48cf3 May 21, 2023 14:10

shawkins added 2 commits May 23, 2023 13:02

fix fabric8io#5152: ensuring ws errors are logged and expanding close handling

623b68d

Merge branch 'master' of github.com:fabric8io/kubernetes-client into iss5152

050af68

scholzj approved these changes May 23, 2023

shawkins force-pushed the iss5152 branch from 9bf604c to 1b36bd9 May 24, 2023 01:42

fix fabric8io#5152: correcting the jetty ws close expectation

949ba8b

also adding logging and refining termination

shawkins force-pushed the iss5152 branch from 1b36bd9 to 949ba8b May 24, 2023 11:40

Merge branch 'master' into iss5152

605babb

shawkins added this to the 6.7.0 milestone May 29, 2023

Merge branch 'master' into iss5152

34ad01c

manusa approved these changes May 29, 2023

rohanKanojia approved these changes May 29, 2023

manusa merged commit 1c3baf9 into fabric8io:master May 29, 2023

shawkins mentioned this pull request May 31, 2023

Kubernetes websocket watches silently dying #5189

Closed
