
@muellerj2 commented May 15, 2025

Towards #5468. This is a small change that greatly speeds up searches for regexes like a+ or (bibe)+, which begin with a letter, string, or character class followed by a + quantifier (or any other quantifier that requires at least one repetition). Because such a loop must be matched at least once, the search can look inside the repeated subpattern and skip directly to the first position where the letter, string, or character class at its start can match.
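
A minimal sketch of the idea, written against the public <regex> API rather than the STL's internal matcher (which is what this PR actually changes). The function name search_skipping_to_prefix and its literal_prefix parameter are hypothetical; the sketch assumes the caller already knows a literal that every match must begin with, which holds for (bibe)+ and a+ precisely because the loop body has to be entered at least once.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not the <regex> internals. Precondition (assumed, not checked):
// every match of `re` begins with `literal_prefix`. That holds for "(bibe)+" with
// prefix "bibe", because the loop body must be entered at least once.
bool search_skipping_to_prefix(const std::string& text, const std::regex& re,
                               const std::string& literal_prefix, std::smatch& m) {
    // Only positions where the literal occurs can start a match.
    for (std::size_t pos = text.find(literal_prefix); pos != std::string::npos;
         pos = text.find(literal_prefix, pos + 1)) {
        // match_continuous: the match must begin exactly at `pos`.
        std::regex_constants::match_flag_type flags = std::regex_constants::match_continuous;
        if (pos != 0) {
            // Characters before `pos` exist; this matters for ^ and \b.
            flags |= std::regex_constants::match_prev_avail;
        }
        const auto start = text.cbegin() + static_cast<std::ptrdiff_t>(pos);
        if (std::regex_search(start, text.cend(), m, re, flags)) {
            return true;
        }
    }
    return false; // the literal never occurs, so no match is possible
}
```

The "at least one repetition" requirement is what makes the skip valid: for (bibe)* the empty string already matches at position 0, so jumping ahead to the first "bibe" would change the result. Note also that m.position() and m.prefix() in the results above are relative to the attempt position, so compare m[0].first against the original string's iterators instead.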

While working on this, I noticed that I hadn't thought the implementation of test_regex::should_search_match_capture_groups() through well enough: I designed it to express the expected submatches in coordinates relative to the start of the match, but that doesn't help when a test needs to ensure that the whole match is at a particular position. So I changed the implementation to use absolute coordinates, measured from the start of the searched string. Luckily, no test seems to have relied on the previous behavior: in all of them, the match simply started at the beginning of the input string, where relative and absolute coordinates coincide.
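
To make the difference between the two conventions concrete, here is a small sketch using only the public std::smatch interface. The helpers absolute_spans and relative_spans and the spans alias are hypothetical, not the test harness's actual code, and reporting unmatched groups as (-1, -1) is an assumed convention. For the pattern (a)(b) searched in "xxab", absolute coordinates place the whole match at [2, 4) and the groups at [2, 3) and [3, 4), while match-relative coordinates give [0, 2), [0, 1), [1, 2), which hide where the match actually is.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: one (begin, end) pair per submatch, including group 0.
using spans = std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>>;

// Absolute coordinates: offsets from the start of the searched string.
// Unmatched groups are reported as (-1, -1); that convention is an assumption here.
spans absolute_spans(const std::string& subject, const std::smatch& m) {
    spans result;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m[i].matched) {
            result.emplace_back(m[i].first - subject.cbegin(), m[i].second - subject.cbegin());
        } else {
            result.emplace_back(-1, -1);
        }
    }
    return result;
}

// Relative coordinates: offsets from the start of the whole match m[0].
spans relative_spans(const std::smatch& m) {
    spans result;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m[i].matched) {
            result.emplace_back(m[i].first - m[0].first, m[i].second - m[0].first);
        } else {
            result.emplace_back(-1, -1);
        }
    }
    return result;
}
```

When the match starts at offset 0, both helpers return identical spans, which is why the existing tests were unaffected by the switch.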

Benchmark

Running on my machine:

| Benchmark | Before (ns) | After (ns) | Speedup |
| --- | ---: | ---: | ---: |
| bm_lorem_search/"bibe"/2 | 38504 | 39237 | 0.98 |
| bm_lorem_search/"bibe"/3 | 76730 | 76730 | 1.00 |
| bm_lorem_search/"bibe"/4 | 153460 | 153460 | 1.00 |
| bm_lorem_search/"(bibe)+"/2 | 4814680 | 92076 | 52.29 |
| bm_lorem_search/"(bibe)+"/3 | 9521484 | 204041 | 46.66 |
| bm_lorem_search/"(bibe)+"/4 | 18158784 | 401088 | 45.27 |
| bm_lorem_search/"(?:bibe)+"/2 | 4743304 | 97656 | 48.57 |
| bm_lorem_search/"(?:bibe)+"/3 | 9521484 | 192540 | 49.45 |
| bm_lorem_search/"(?:bibe)+"/4 | 19531250 | 384976 | 50.73 |

@muellerj2 requested a review from a team as a code owner May 15, 2025 20:31
@github-project-automation bot moved this to Initial Review in STL Code Reviews May 15, 2025
@StephanTLavavej added the performance and regex labels May 15, 2025
@StephanTLavavej self-assigned this May 15, 2025
@StephanTLavavej removed their assignment May 16, 2025
@StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews May 16, 2025
@StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews May 16, 2025
@StephanTLavavej

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej

I resolved a trivial adjacent-add conflict with #5494 in VSO_0000000_regex_use.

@StephanTLavavej merged commit 2391e5e into microsoft:main May 17, 2025
40 checks passed
@github-project-automation bot moved this from Merging to Done in STL Code Reviews May 17, 2025
@StephanTLavavej

➕ 🚀 ⏱️

@muellerj2 deleted the regex-improve-regex_search-performance branch May 31, 2025 21:44