Skip to content

Conversation

maddeleine
Copy link
Contributor

Release Summary:

Fixed bug that caused a hang on Windows platforms.

Resolved issues:

resolves #2220

Background:

s2n-quic has a bug that causes a hang when receiving data on s2n-quic in Windows. The root cause of this issue is that the consumer of the shared ring buffer resets each payload to be its original length without syncing this change in between primary and secondary memory regions. This causes some payloads to remain very small, making it so that the Producer can only make tiny writes to the un-synced payloads. In Windows eventually this causes a hang.
Notably, this issue was not present on unix as the msghdr struct accesses payload lengths through pointers, meaning that primary and secondary memory regions do not need to be synced for payload length changes.

Another interesting point is that this problem could be seen in our integration tests, even though it did not cause them to fail. I opened #2805 to see if we can fix this.

Description of changes:

I changed the consumer to sync only the payload lengths for each entry that the producer touched. The reason why I did not go with the solution in this PR #2786 is that I am worried about adding another copy call to a hot codepath. I also considered #2806, but that feels like a larger change than this fix.

Huge credit to BiagioFesta for helping with the investigation of this bug as well as helping with a fix. This PR is mostly code pulled from PR #2786.

Call-outs:

Note that the MTU amount of data sent had to be increased as we are not dropping packets in these tests anymore. As a result, the test speeds up and we don't have enough RTTs to finish the MTU search.

Testing:

Includes a new test for the consumer modifying the data. I also no longer see DecryptionFailed events or truncated packets in the MTU tests after this change.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Traffic stops after many streams on Windows
2 participants