AlexGuteniev
Contributor

I'm not completely sure what exactly ARM64EC is (#2740).
But the upstream memchr / wmemchr are apparently properly optimized for it.

I didn't benchmark or test this change. It seems to compile though.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner June 16, 2025 06:31
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Jun 16, 2025
@YexuanXiao
Contributor

Here is a technical note written in 2024 about ARM64EC and x64 emulation. For writing C++ code, the key points are: ARM64EC is AArch64 code, but during preprocessing its macros make it masquerade as x64 rather than ARM64. Therefore, anyone who wants to optimize ARM64EC performance only needs to be careful not to use AMD64-specific vector intrinsics, as these are provided by softintrin.lib.
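
A minimal sketch of what the macro masquerading means for detection code, assuming MSVC's documented predefined macros (`_M_ARM64EC`, `_M_ARM64`, `_M_X64`). `target_arch` is a hypothetical helper for illustration only, not something from this PR or the STL; the `__aarch64__` / `__x86_64__` checks are added just so the sketch also compiles with GCC or Clang.

```cpp
#include <cassert>

// Hypothetical helper (not part of the STL): names the architecture that the
// predefined macros describe. The order of the checks matters: an ARM64EC
// build also defines _M_X64, so _M_ARM64EC has to be tested first or the
// build would be classified as plain x64.
constexpr const char* target_arch() noexcept {
#if defined(_M_ARM64EC)
    return "ARM64EC";
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "ARM64";
#elif defined(_M_X64) || defined(__x86_64__)
    return "x64";
#else
    return "other";
#endif
}
```

Code that only checks `_M_X64` would take its x64 path on ARM64EC, which is how x64 vector intrinsics can end up being used there unintentionally.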

@AlexGuteniev
Contributor Author

Here is a technical note written in 2024 about ARM64EC and x64 emulation

Thanks, I now see that ARM64 intrinsics are the way to go, and that no AVX+ intrinsics are expected.

Still, I'm not sure how bad the emulation is, so I'm not sure whether emulated SSE4.2 is necessarily worse than native scalar code.

@StephanTLavavej StephanTLavavej added performance Must go faster ARM64 Related to the ARM64 architecture labels Jun 16, 2025
@StephanTLavavej StephanTLavavej self-assigned this Jun 16, 2025
@StephanTLavavej StephanTLavavej requested a review from Copilot June 27, 2025 14:25

@Copilot Copilot AI left a comment

Pull Request Overview

This PR aims to optimize the find algorithms for the ARM64EC platform by using the optimized C-library functions memchr and wmemchr.

  • Introduces an ARM64EC-specific branch in __std_find_trivial_1 using memchr.
  • Introduces an ARM64EC-specific branch in __std_find_trivial_2 using wmemchr with type casts and a modified element count.
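
The two branches summarized above could look roughly like the sketch below. This is not the code from the PR: `find_trivial_1` and `find_trivial_2` are hypothetical stand-ins, their `const void*` signatures are an assumption, and the `WCHAR_MAX` check with a scalar fallback is added only so the sketch also behaves correctly where `wchar_t` is wider than 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>

// Hypothetical 1-byte find: returns a pointer to the first match, or `last`.
const void* find_trivial_1(const void* first, const void* last, std::uint8_t value) noexcept {
    const auto* f = static_cast<const unsigned char*>(first);
    const auto* l = static_cast<const unsigned char*>(last);
    // memchr takes a byte count and returns nullptr when nothing is found.
    const void* result = std::memchr(first, value, static_cast<std::size_t>(l - f));
    return result ? result : last;
}

// Hypothetical 2-byte find: returns a pointer to the first match, or `last`.
const void* find_trivial_2(const void* first, const void* last, std::uint16_t value) noexcept {
    const auto* f = static_cast<const std::uint16_t*>(first);
    const auto* l = static_cast<const std::uint16_t*>(last);
#if WCHAR_MAX <= 0xFFFF
    // 16-bit wchar_t (as on Windows): wmemchr counts elements, not bytes,
    // so the count is the distance between uint16_t pointers.
    const wchar_t* result =
        std::wmemchr(static_cast<const wchar_t*>(first), value, static_cast<std::size_t>(l - f));
    return result ? result : last;
#else
    // Wider wchar_t: wmemchr would read the wrong element size, so scan by hand.
    for (; f != l; ++f) {
        if (*f == value) {
            return f;
        }
    }
    return last;
#endif
}
```

For example, searching `{10, 20, 30}` for `20` with `find_trivial_2` returns a pointer to the second element, and searching for a value that is absent returns `last`.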

Also, we don't need to static_cast from uint16_t to wchar_t. That's a value-preserving integral conversion.
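
To illustrate the point about the cast, here is a small sketch under the assumption that `wchar_t` can represent every `uint16_t` value (checked by the `static_assert`). `to_wide` is a hypothetical helper, not a function from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The implicit conversion below is value-preserving only if wchar_t can hold
// every uint16_t value; this holds for a 16-bit unsigned wchar_t and for
// wider wchar_t types.
static_assert(std::numeric_limits<wchar_t>::max() >= std::numeric_limits<std::uint16_t>::max(),
              "wchar_t must be able to represent every uint16_t value");

// Hypothetical helper: converts without a static_cast.
constexpr wchar_t to_wide(std::uint16_t value) noexcept {
    return value; // implicit integral conversion, no cast required
}
```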
@StephanTLavavej StephanTLavavej removed their assignment Jun 27, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jun 27, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Jul 14, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 7464e9a into microsoft:main Jul 15, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Jul 15, 2025
@StephanTLavavej
Member

Thanks for noticing and improving this performance (hopefully!). 🚀 🐈 🐈‍⬛

@AlexGuteniev AlexGuteniev deleted the arm64ec-memchr branch July 15, 2025 19:00