`std::atomic<std::shared_ptr>::wait` should compare control blocks #3655

fsb4000 · 2023-04-15T14:05:17Z

Fixes #3602
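A minimal sketch of why the stored pointer alone is not enough to decide whether two `shared_ptr`s are equivalent. The helper `equivalent` and the function `check_equivalence_cases` are hypothetical names for illustration, not STL internals. The sketch assumes the rule quoted in this PR's commit messages: the two must store the same pointer and either share ownership or both be empty.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper (not part of the STL): same stored pointer, and either
// shared ownership or both empty. owner_before() orders by ownership, so two
// pointers are owner-equivalent when neither is ordered before the other.
template <class T>
bool equivalent(const std::shared_ptr<T>& a, const std::shared_ptr<T>& b) {
    const bool same_owner = !a.owner_before(b) && !b.owner_before(a);
    return a.get() == b.get() && same_owner;
}

// Exercises the cases discussed in this PR.
inline void check_equivalence_cases() {
    std::shared_ptr<int> empty;                                  // no control block
    std::shared_ptr<int> owns_null(static_cast<int*>(nullptr));  // control block, stores null

    assert(empty == owns_null);             // operator== looks at get() only
    assert(empty.use_count() == 0);
    assert(owns_null.use_count() == 1);
    assert(!equivalent(empty, owns_null));  // ownership differs

    static int shared_target = 0;
    auto owner_a = std::make_shared<int>(1);
    auto owner_b = std::make_shared<int>(2);
    std::shared_ptr<int> alias_a(owner_a, &shared_target);  // aliasing constructor
    std::shared_ptr<int> alias_b(owner_b, &shared_target);

    assert(alias_a == alias_b);             // same stored pointer
    assert(!equivalent(alias_a, alias_b));  // different control blocks
    assert(equivalent(alias_a, std::shared_ptr<int>(alias_a)));  // a copy shares ownership
}
```

A `wait` that compared only `get()` would treat `alias_a` and `alias_b` as unchanged values, which is the behavior reported in #3602.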

StephanTLavavej · 2024-02-12T21:18:59Z

After hiding from this PR for a year 😹, I've finally looked at it and I believe I'm comfortable with the change (after pushing a fix/test for "owning the null pointer"). I'm definitely happy that @AlexGuteniev's suggested test case has been added, I don't think that this poses significant regression risks, and I don't think that there's any vNext interaction (this is just a behavioral change; mix-and-match scenarios shouldn't be harmed beyond maybe not getting the fix).

Edit: I've been convinced on Discord that we are, in fact, doomed. ☠️

AlexGuteniev · 2024-02-12T22:03:03Z

I'm afraid the PR doesn't fully fix the hang in any of the scenarios. It only makes hangs unlikely, but still possible, thus introducing inconsistency.

AlexGuteniev · 2024-02-13T09:22:42Z

I don't want to provide an example anymore, as there's a consensus that the problem is real even though it reproduces very infrequently.

@StephanTLavavej suggested that we can have a timeout for correctness, and @BillyONeal suggested that exponential back-off is an acceptable implementation of atomic wait.

We can try our best and implement atomic wait with an exponentially growing timeout. This is not ABI-breaking, as we still wait on the same address. I think that the issue should be closed by this PR with exponential back-off added.

Another issue can be opened to consider something better for vNext. Note that SRWLOCK+CONDITION_VARIABLE isn't necessarily better due to space overhead and extra atomic ops.
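A minimal sketch of a wait loop with a growing timeout, so that a missed wake-up only delays the waiter instead of hanging it. This is a portable stand-in built on `std::condition_variable`, not the `__std_atomic_wait_direct` call used in the PR. The starting value (16 ms) and the cap (1,000,000 ms) come from commit titles in this PR. The doubling factor and the names `next_timeout_ms` and `wait_with_backoff` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Values taken from commit titles in this PR: start at 16 ms, cap at 1,000,000 ms.
constexpr unsigned long initial_timeout_ms = 16;
constexpr unsigned long max_timeout_ms     = 1'000'000;

// Assumption: the timeout doubles each round; the thread only says "growing" / "exponential".
constexpr unsigned long next_timeout_ms(unsigned long current) {
    return std::min(current * 2, max_timeout_ms);
}

// Hypothetical portable stand-in for a timed wait on an address.
// `changed` is re-evaluated under the mutex after every wake-up or timeout,
// so a lost notification is eventually noticed when the timeout expires.
template <class Pred>
void wait_with_backoff(std::mutex& m, std::condition_variable& cv, Pred changed) {
    std::unique_lock<std::mutex> lock(m);
    unsigned long timeout = initial_timeout_ms;
    while (!changed()) {
        cv.wait_for(lock, std::chrono::milliseconds(timeout));
        timeout = next_timeout_ms(timeout);
    }
}
```

The trade-off is the one raised in the thread: the timeout bounds how long a missed notification can go unnoticed, at the cost of occasional spurious re-checks.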

StephanTLavavej · 2024-02-15T23:09:14Z

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

CaseyCarter · 2024-02-16T20:08:35Z

stl/inc/memory

        for (;;) {
            auto _Rep   = _Repptr._Lock_and_load();
-            bool _Equal = _Ptr.load(memory_order_relaxed) == _Old;
+            bool _Equal = _Ptr.load(memory_order_relaxed) == _Old_ptr && _Rep == _Old_rep;


No change requested; we should do this in a followup if at all to avoid resetting testing, since it's not a correctness issue. Should this be reordered as:

Suggested change

- bool _Equal = _Ptr.load(memory_order_relaxed) == _Old_ptr && _Rep == _Old_rep;
+ bool _Equal = _Rep == _Old_rep && _Ptr.load(memory_order_relaxed) == _Old_ptr;

so we short-circuit when the core-local _Rep == _Old_rep is false before potentially loading _Ptr's cache line from some other core's data cache? (I suspect I understand cache coherence protocols just well enough to be dangerous.) Would any difference just be noise compared to the expense of _Repptr._Lock_and_load()?

CaseyCarter · 2024-02-16

(Pulling in @AlexGuteniev to render an opinion and/or tell me how wrong I am 😄.)

StephanTLavavej · 2024-02-16

My vague understanding is that a relaxed load has no barriers, so it has no special costs. It would be fine to reorder, though, since the .load is at least a debug mode function call.

Alcaro · 2024-02-16

Aren't _Repptr and _Ptr in the same cache line? Sure, whatever they point to can be anywhere, but the pointers themselves are (as far as I can see) right beside each other.

(STL's comment sounds accurate, though.)

AlexGuteniev · 2024-02-16

I agree that they are always in the same cache line (it really is always, not just most of the time, because we over-align to alignas(2 * sizeof(void*))). So from the cache perspective it does not matter.

I also agree that eliminating the debug mode call can potentially be beneficial.

Still, I prefer the original order, because we compare _Rep == _Old_rep as the very last thing, as close to the wait as possible, which reduces the probability of resorting to the timeout.
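A minimal sketch of the over-alignment argument: a pair of pointers aligned to its own size cannot straddle a cache line whose size is a multiple of that alignment. The struct `pointer_pair` is a hypothetical stand-in for the adjacent `_Ptr` and `_Repptr` members, and the 64-byte line size is an assumption about the target hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two adjacent pointer members.
struct alignas(2 * sizeof(void*)) pointer_pair {
    void* ptr = nullptr;
    void* rep = nullptr;
};

static_assert(sizeof(pointer_pair) == 2 * sizeof(void*), "no padding expected");
static_assert(alignof(pointer_pair) == 2 * sizeof(void*), "over-aligned to its own size");

// Assumption: 64-byte cache lines.
constexpr std::size_t assumed_cache_line = 64;

// True when both addresses fall into the same assumed cache line.
inline bool on_same_cache_line(const void* first, const void* second) {
    const auto a = reinterpret_cast<std::uintptr_t>(first);
    const auto b = reinterpret_cast<std::uintptr_t>(second);
    return a / assumed_cache_line == b / assumed_cache_line;
}

// True when the first and last byte of the pair share one cache line.
inline bool fits_in_one_line(const pointer_pair& p) {
    const auto* begin = reinterpret_cast<const unsigned char*>(&p);
    return on_same_cache_line(begin, begin + sizeof(pointer_pair) - 1);
}
```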

Alcaro · 2024-02-16

Considering _Rep is just a stack-local and the real load is on the line above, I can't quite follow your reasoning.

AlexGuteniev · 2024-02-16

> Considering _Rep is just a stack-local and the real load is on the line above, I can't quite follow your reasoning.

Hm, you are right, we are done loading it into a register by that point anyway.

Then saving the debug mode call could be the main deciding factor here.

CaseyCarter · 2024-02-16

Ah, I'd forgotten that we overalign - my cache coherence point is invalid. +1 to avoiding the call in debug compiles, though. We really want to minimize the time between load and check.
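A minimal sketch of what the reordering buys: with `&&`, the right-hand operand is evaluated only when the left-hand one is true, so putting the local comparison first skips the `load` call whenever the control blocks already differ. The type `counted_atomic_ptr` and both function names are hypothetical; the counter exists only to make the skipped call observable.

```cpp
#include <atomic>

// Hypothetical wrapper that counts how often the atomic pointer is loaded.
struct counted_atomic_ptr {
    std::atomic<const void*> value{nullptr};
    mutable int load_calls = 0;

    const void* load(std::memory_order order) const {
        ++load_calls;
        return value.load(order);
    }
};

// Order in the diff under review: load the pointer first, then compare control blocks.
inline bool equal_ptr_first(const counted_atomic_ptr& ptr, const void* old_ptr,
                            const void* rep, const void* old_rep) {
    return ptr.load(std::memory_order_relaxed) == old_ptr && rep == old_rep;
}

// Suggested order: compare the already-loaded control block first,
// so the load call is skipped when the control blocks differ.
inline bool equal_rep_first(const counted_atomic_ptr& ptr, const void* old_ptr,
                            const void* rep, const void* old_rep) {
    return rep == old_rep && ptr.load(std::memory_order_relaxed) == old_ptr;
}
```

Both orders return the same result; only the number of `load` calls differs, which matters mainly in debug builds where the call is not inlined.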

StephanTLavavej · 2024-02-16T21:05:13Z

Thanks for fixing this runtime correctness bug - what a rollercoaster of despair and elation! 📉 📈 😹 🎉

std::atomic<std::shared_ptr>::wait should compare control blocks

3009cb0

fsb4000 requested a review from a team as a code owner April 15, 2023 14:05

AlexGuteniev approved these changes Apr 15, 2023

StephanTLavavej added the bug (Something isn't working) label Apr 15, 2023

CaseyCarter self-assigned this May 3, 2023

fsb4000 and others added 3 commits May 6, 2023 11:57

Merge branch 'main' into fix3602

de27d34

add a test case

d25fc84

Co-authored-by: Alex Guteniev <[email protected]>

P1135R6_atomic_wait_vista hangs

e1422f6

StephanTLavavej added 7 commits February 12, 2024 11:19

Merge branch 'main' into fix3602

8e9aa09

Resolved trivial adjacent-edit merge conflict in tests/std/include/test_atomic_wait.hpp.

_Old_Rep => _Old_rep

bd5181b

_Old => _Old_ptr for clarity.

9aeb9a0

Bugfix: Must share ownership or both be empty, even when owning the null pointer.

506e941

Add const to the level parameter.

9716763

Drop unnecessary braces.

50943d9

Also test shared_ptrs that own the null pointer.

dc0a582

StephanTLavavej reviewed Feb 12, 2024

StephanTLavavej approved these changes Feb 12, 2024

StephanTLavavej unassigned CaseyCarter Feb 12, 2024

StephanTLavavej mentioned this pull request Feb 12, 2024

std::atomic<std::shared_ptr>::wait does not seem to care about control block difference. Is this a bug? #3602

Closed

StephanTLavavej added the blocked (Something is preventing work on this) and decision needed (We need to choose something before working on this) labels Feb 12, 2024

use growing timeout with __std_atomic_wait_direct

f837e8a

AlexGuteniev approved these changes Feb 13, 2024

starting timeout = 16

014a8c7

StephanTLavavej removed the blocked (Something is preventing work on this) and decision needed (We need to choose something before working on this) labels Feb 13, 2024

fsb4000 and others added 2 commits February 13, 2024 22:50

limit timeout to 1 million milliseconds (~17 minutes)

07a6903

Comment units.

fc399ef

StephanTLavavej reviewed Feb 13, 2024

StephanTLavavej approved these changes Feb 13, 2024

StephanTLavavej assigned CaseyCarter and StephanTLavavej Feb 14, 2024

CaseyCarter approved these changes Feb 16, 2024

CaseyCarter removed their assignment Feb 16, 2024

StephanTLavavej merged commit dc1e003 into microsoft:main Feb 16, 2024

fsb4000 deleted the fix3602 branch February 19, 2024 11:11

AlexGuteniev mentioned this pull request Mar 24, 2025

<atomic>: reconsider atomic wait on shared_ptr #5356

Open
