<regex>
: Move saved match state to the heap
#5682
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is the first minor step on the journey of making the matcher non-recursive to deal with #997 and #1528.
This moves all instances of
_Tgt_state
or_Bt_state
from the stack to the heap (into avector
). These objects store the current match state and are used to restore it when backtracking. In functions, the_Tgt_state
objects in the vector are referenced using their indices (and not using pointers because the vector might reallocate).Note that we have to pop the frames from the stack whenever we return during backtracking, but we don't have to worry about exceptions: If an exception occurs, the whole
_Matcher2
object will be destructed.Since these objects are large, this slightly reduces stack pressure: Copying #997 to a test and running it with the usual matrix, the first asan configs now start failing after 258 instead of 249 repetitions.
This PR adds no additional tests because all existing regex tests already provide coverage for this change and there is no obvious gap where further tests are needed. It does add internal checks, though, to verify that we don't accidentally fail to pop from the stack somewhere when backtracking.
The impact on performance is mixed: On the one hand, the change can improve performance when regex matching involves a substantial amount of repeated backtracking, because it avoids reallocations (and
regex_search
usually backtracks completely at least once because the initial match fails). On the other hand, the new vector performs additional allocations, which noticeably worsens performance when individual_Tgt_state/_Bt_state
objects don't allocate at all after #5518, such as the patterns(:?bibe)+
and((?!lorem)bibe)
in theregex_search
benchmark. (In the long run, we probably want to use a vector variant implementing small vector optimization here.)Benchmark for
regex_search
: