Skip to content

Conversation

fsb4000
Copy link
Contributor

@fsb4000 fsb4000 commented Apr 15, 2023

Fixes #3602

@fsb4000 fsb4000 requested a review from a team as a code owner April 15, 2023 14:05
@AlexGuteniev

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej added the bug Something isn't working label Apr 15, 2023
@StephanTLavavej

This comment was marked as resolved.

@AlexGuteniev

This comment was marked as resolved.

@AlexGuteniev

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

@AlexGuteniev

This comment was marked as resolved.

@CaseyCarter CaseyCarter self-assigned this May 3, 2023
@zwhfly

This comment was marked as resolved.

@StephanTLavavej
Copy link
Member

StephanTLavavej commented Feb 12, 2024

After hiding from this PR for a year 😹, I've finally looked at it and I believe I'm comfortable with the change (after pushing a fix/test for "owning the null pointer"). I'm definitely happy that @AlexGuteniev's suggested test case has been added, I don't think that this poses significant regression risks, and I don't think that there's any vNext interaction (this is just a behavioral change; mix-and-match scenarios shouldn't be harmed beyond maybe not getting the fix).

Edit: I've been convinced on Discord that we are, in fact, doomed. ☠️

@AlexGuteniev
Copy link
Contributor

I'm afraid the PR doesn't fully fix hang in any of the scenario - it only makes hangs unlikely, but still possible, thus introducing inconsistency.

@StephanTLavavej StephanTLavavej added blocked Something is preventing work on this decision needed We need to choose something before working on this labels Feb 12, 2024
@AlexGuteniev
Copy link
Contributor

I don't want to provide an example anymore as there's a consensus that the problem is very infrequently reproducible, yet existent.

@StephanTLavavej suggested that we can have timeout for correctness, @BillyONeal suggested that exponential back-off is an acceptable implementation of atomic wait.

We can try our best and make atomic wait with exponential timeout. This is not abi-breaking, as we still wait on the same address. I think that the PR should be closed with this PR with exponential back-out added.

Another issue can be open to consider something better for vNext. Note that SRWLOCK+CONDITION_VARIABLE isn't necessarily better due to space overhead and extra atomic ops.

@StephanTLavavej StephanTLavavej removed blocked Something is preventing work on this decision needed We need to choose something before working on this labels Feb 13, 2024
@StephanTLavavej
Copy link
Member

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

for (;;) {
auto _Rep = _Repptr._Lock_and_load();
bool _Equal = _Ptr.load(memory_order_relaxed) == _Old;
bool _Equal = _Ptr.load(memory_order_relaxed) == _Old_ptr && _Rep == _Old_rep;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change requested; we should do this in a followup if at all to avoid resetting testing, since it's not a correctness issue. Should this be reordered as:

Suggested change
bool _Equal = _Ptr.load(memory_order_relaxed) == _Old_ptr && _Rep == _Old_rep;
bool _Equal = _Rep == _Old_rep && _Ptr.load(memory_order_relaxed) == _Old_ptr;

so we short-circuit when the core-local _Rep == _Old_rep is false before potentially loading _Ptr's cache line from some other core's data cache? (I suspect I understand cache coherence protocols just well enough to be dangerous.) Would any difference just be noise compared to the expense of _Repptr._Lock_and_load()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Pulling in @AlexGuteniev to render an opinion and/or tell me how wrong I am 😄.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vague understanding is that a relaxed load has no barriers, so it has no special costs. It would be fine to reorder, though, since the .load is at least a debug mode function call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't _Repptr and _Ptr in the same cache line? Sure, whatever they point to can be anywhere, but the pointers themselves are (as far as I can see) right beside each other.

(STL's comment sounds accurate, though.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that they are always in the same cache line (it is really always, not just most of the time, because we over align to alignas(2 * sizeof(void*))). So from the cache perspective it does not matter.

I agree also that eliminating debug mode call can be potentially beneficial.

Still I more like the original order, because we compare _Rep == _Old_rep; the very last thing, as close to wait as possible, so reduce the probability of resorting to timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering _Rep is just a stack-local and the real load is on the line above, I can't quite follow your reasoning.

Copy link
Contributor

@AlexGuteniev AlexGuteniev Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering _Rep is just a stack-local and the real load is on the line above, I can't quite follow your reasoning.

Hm, you are right, we done loading it into register at the same time anyway.

Then debug mode call saving could be the main decision factor here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I'd forgotten that we overalign - my cache coherence point is invalid. +1 to avoiding the call in debug compiles, though. We really want to minimize the time between load and check.

@CaseyCarter CaseyCarter removed their assignment Feb 16, 2024
@StephanTLavavej StephanTLavavej merged commit dc1e003 into microsoft:main Feb 16, 2024
@StephanTLavavej
Copy link
Member

Thanks for fixing this runtime correctness bug - what a rollercoaster of despair and elation! 📉 📈 😹 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

std::atomic<std::shared_ptr>::wait does not seem to care about control block difference. Is this a bug?
6 participants