What happened?
Description
Under bursty traffic, the peer can stall or deadlock when a handler sends many requests and synchronously waits for their responses. The synchronous Peer.process() loop stops draining reader, causing InnerConnection.loop() to block while forwarding inbound frames.
The problem is not limited to handlers that wait for their own responses. A slow handler will not deadlock the peer, but it can still cause periodic stalls or reconnects and lost messages.
Details
1. Peer.process() calls d.handler(...) inline. While the handler is blocked waiting on its responses, process() is not draining d.reader.
2. InnerConnection.loop() attempts output <- data (where output is Peer.reader) and blocks because the reader channel is unbuffered.
3. The response the handler is waiting for gets stuck behind the blocked send, creating a stall/deadlock.
4. Even if the handler is not waiting for its own responses, a slow handler still leaves reader undrained while it runs. The connection loop then blocks, pausing ping/pong and increasing latency, which causes messages to time out and risks reconnects if it blocks long enough.
Root Cause
- Tight coupling: Peer.process() executes handlers inline, so it stops draining the reader channel while the handler runs.
- Backpressure: InnerConnection.loop() forwards inbound frames into reader. When process() is inside a handler, the channel is not drained, so the connection loop blocks on the send.
- The problem manifests at higher concurrency (message rate), once the implicit buffers (socket, websocket, internal channel) are exhausted.
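The backpressure point can be illustrated by counting how many frames a channel accepts while nothing is receiving from it. A minimal sketch: sendsBeforeBlocking is a hypothetical helper, and only the Go channel buffer is modelled, not the socket or websocket buffers.

```go
package main

import "fmt"

// sendsBeforeBlocking models the connection loop forwarding frames while the
// consumer (process() busy in a handler) is not receiving. It returns how many
// frames were accepted before the next send would block.
func sendsBeforeBlocking(capacity, frames int) int {
	reader := make(chan int, capacity) // nobody receives from this channel
	accepted := 0
	for i := 0; i < frames; i++ {
		select {
		case reader <- i:
			accepted++
		default:
			// A plain `reader <- i` would block here; this is where the
			// connection loop (and its ping/pong handling) stalls.
			return accepted
		}
	}
	return accepted
}

func main() {
	for _, capacity := range []int{0, 16, 256} {
		// prints 0, 16 and 256 frames respectively
		fmt.Printf("capacity %d: %d frames accepted before blocking\n",
			capacity, sendsBeforeBlocking(capacity, 1000))
	}
}
```

With capacity 0 (the unbuffered reader) the very first send would block. A larger buffer only postpones the block until it fills, which is consistent with the issue appearing at higher message rates.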
Proposed Fixes
- Decouple handler execution from process(): run d.handler(...) in a goroutine or via a bounded worker pool.
- Bound handler concurrency: use a semaphore to limit in-flight work to a safe number, preventing sustained backpressure.
- Buffer the peer's inbound channel: change reader to a buffered channel (tune the size).
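The first three fixes can be combined in one loop. This is a minimal sketch, assuming messages can be handled independently of each other: dispatch and run are hypothetical names, not part of the SDK, and the buffer size 64 and limit 4 are arbitrary values.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// dispatch is a hypothetical stand-in for a decoupled Peer.process(): it keeps
// draining reader and runs each handler on its own goroutine, while a counting
// semaphore caps how many handlers execute at once.
func dispatch(reader <-chan int, maxInFlight int, handle func(int)) {
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup
	for msg := range reader {
		wg.Add(1)
		go func(m int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; waits here, not in the read loop
			defer func() { <-sem }() // release the slot when the handler returns
			handle(m)
		}(msg)
	}
	wg.Wait() // wait for in-flight handlers once reader is closed
}

// run feeds total messages through dispatch and reports how many were handled
// and the highest number of handlers observed running at the same time.
func run(total, maxInFlight int) (handled, peak int64) {
	reader := make(chan int, 64) // buffered inbound channel; 64 is arbitrary
	var running int64

	go func() {
		for i := 0; i < total; i++ {
			reader <- i
		}
		close(reader)
	}()

	dispatch(reader, maxInFlight, func(int) {
		n := atomic.AddInt64(&running, 1)
		for { // record the peak with a compare-and-swap loop
			p := atomic.LoadInt64(&peak)
			if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
				break
			}
		}
		time.Sleep(time.Millisecond) // simulate a slow handler
		atomic.AddInt64(&running, -1)
		atomic.AddInt64(&handled, 1)
	})
	return atomic.LoadInt64(&handled), atomic.LoadInt64(&peak)
}

func main() {
	handled, peak := run(100, 4)
	fmt.Println("handled:", handled, "peak concurrent handlers:", peak)
}
```

The semaphore is acquired inside the handler goroutine rather than in the read loop on purpose: if the loop itself blocked on a full semaphore while every running handler waited for a response, reader would stop being drained and the original stall would return. The trade-off is that this bounds concurrent handler execution, not the number of parked goroutines, so sustained overload still needs a limit elsewhere.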
Which network(s) did you face the problem on?
Dev
Twin ID/s
No response
Version
No response
Node ID/s
No response
Farm ID/s
No response
Contract ID/s
No response
Relevant log output
NA