<regex>
: Process positive lookahead assertions non-recursively
#5714
+232
−187
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Towards #997 and #1528. While the PR title describes the main observable effect, the actual main change is the implementation of manual unwinding of stack frames in
_Match_pat
. But at least one recursive call had to be replaced by the manual stack management to validate the main change, and positive assertions turned out be the easiest option.Previously,
_Match_pat
mainly consisted of an NFA interpreter loop. After this PR,_Match_pat
consists of two main loops: An NFA interpreter loop and a stack unwinding loop (joined together by a loop surrounding both). As before this PR, the interpreter loop in_Match_pat
processes the nodes in the NFA. When this loop is done because some final node of the NFA has been reached or matching along a trajectory failed, a second loop is entered that manually unwinds the explicit state stack. If that second loop leads to some interpretable position in the NFA again (i.e., if_Nx
becomes not null), the unwinding loop is exited and the interpreter loop is engaged again. If not and the stack has been fully unwound,_Match_pat
is exited.Because the NFA node following the node
_Nx
in the interpreter loop might have to be some node other than_Nx->_Next
from now on, a new local variable_Next
now stores the next node to process, which gets assigned to the correct next node for a each node type in theswitch
if it's not_Nx->_Next
. In this PR specifically, this is used to makestatic_cast<_Node_assert*>(_Nx)->_Child
follow a node of type_N_assert
in the interpreter loop. (Additionally, final nodes in the interpreter loop that do not result in failure assignnullptr
to_Next
rather than_Nx
now.)The stack unwinding uses its own set of operation codes, which are interpreted by the unwinding loop. The operation codes are stored in the stack frames at the time the new frame is pushed to the stack. (After this PR, there is only one code, but there will soon be more.)
Because the matcher is currently semi-recursive, the stack counts as unwound in
_Match_pat
if it has been unwound up to its size at the time the_Match_pat
call started. Further unwinding will happen in a surrounding_Matcher_pat
call. (We can simplify this when the matcher is finally fully non-recursive.)The stack frames on the heap are now represented by objects of type
_Rx_state_frame_t
. Currently, this is a very inefficient structure and it will become even worse in the next few PRs before it starts getting better.For now, I opted to exactly preserve the situations when
regex_error
s witherror_stack
orerror_complexity
get thrown. But I moved the related code to their own member functions to avoid unnecessary code duplication. We can think about changing this after the matcher has been made fully non-recursive.