-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[RFC] Named macro capture groups #3649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Co-authored-by: Jacob Lifshay <[email protected]>
Aren't there still some subtle questions around nesting one has to worry about? Specifically, the RFC doesn't seem to say anything about nested cases like |
Although I cannot remember the exact use-case, I remember wanting something like this to name an optional trailing comma so that I could forward it on to another macro as part of the expansion. 👍 |
Great point, the scope part is tricky - I think what you said is correct. It is probably best if I just add an appendix of examples that do and don't work with reasoning, especially given your comments on Zulip about the existing macro mental model not being very clear (I completely agree). |
A bunch of examples would be great, but they are not substitute for also describing the general rules that they all follow from.
|
( $group1:( $a:ident ),+ ) => { | ||
$group1( println!("{}", $a); )+ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you write this I thought you are defining a metavariable $group1
which matches a comma-separated list of $a:ident
, and using $group1
will expand to foo,bar,baz,etc
.
but no the $group1
is just a label 🤔
the rationale mentioned using :
is to be "to be more consistent with existing fragment specifiers" but IMO similarity with fragment specifiers is exactly why this RFC should not use the current syntax.
i prefer getting rid of :
(my main source of confusion) and use a label syntax like
macro_rules! make_functions {
(
names: [ $'names $($name:ident),+ ],
// ^~~~~~~~~
$'greetings $( greeting: $greeting:literal, )?
) => {
$'names $(
fn $name() {
println!("function {} called", stringify!($name));
$'greetings $(println!("{}", $greeting) )?
}
)+
}
}
macro_rules! innermost1 {
( $'outer_rep $($a:ident: $'inner_rep $($b:literal),* );+ ) => {
[$'outer_rep $( $'inner_rep $( ${index($'outer_rep)}, )* )+]
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metavariables matching some parts of the input are an interesting idea in general.
If we also introduce "match exactly once repetitions", then we'll also be able to capture just arbitrary token streams.
It would be nice to also be able to bind a name to "repeated exactly one" groups aka just artibtrary substreams in the input.
// macro LHS, capture "exactly once" repetition
$my_tokens:(a b $var c d)①
// macro RHS, reemit `a b value_for_var c d`
$my_tokens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ideas - I updated the default syntax to be $group1(...)
(no colon) and added the label syntax as a possibility.
It would be nice to also be able to bind a name to "repeated exactly one" groups aka just artibtrary substreams in the input.
Is there a standard kleene for a single capture, or could we get by with no kleene at all? I could probably include a proposal for single captures in this RFC, to be accepted as part of it or rejected separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In regexes, a "capture exactly once" group is just (...)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After writing some more examples, it seems like "capture once" and "emit the entire capture" would be pretty useful, so I just included them as part of the proposal. Especially for code that just needs to be validated and passed elsewhere, or matching and reemitting optional keywords/tts like mut
, &
or pub(crate)
(just allowing capture groups that don't contain metavars is enough for that, but $pub_crate
in the expansion is nicer to read than $pub_crate(pub(crate))
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One advantage of the reemitting is that it will use spans from the input, instead of spans from the macro body.
One example I recently observed in rust-lang/rust#119412 is the mir!
macro in standard library.
It will match on something like let $expr = $expr;
and then reemit $expr = $expr;
as an assignment expression.
If =
is emitted manually from the macro it will get the macro span and may mess up the combined span of the assignment expression.
If =
is taken from the input, then the expression will get a precise span from the input, which is good for any kind of DSLs.
cc @markbt |
Drop the `:`. Also add an alternative proposed syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed this: I feel positive on this direction.
…r-e2024, r=petrochenkov Make `missing_fragment_specifier` an error in edition 2024 `missing_fragment_specifier` has been a future compatibility warning since 2017. Uplifting it to an unconditional hard error was attempted in 2020, but eventually reverted due to fallout. Make it an error only in edition >= 2024, leaving the lint for older editions. This change will make it easier to support more macro syntax that relies on usage of `$`. Fixes <rust-lang#40107> --- It is rather late for the edition but since this change is relatively small, it seems worth at least bringing up. This follows a brief [Zulip discussion](https://rust-lang.zulipchat.com/#narrow/stream/268952-edition/topic/.60.20DBD.20-.3E.20hard.20error) (cc `@tmandry).` Making this an edition-dependent lint has come up before but there was not a strong motivation. I am proposing it at this time because this would simplify the [named macro capture groups](rust-lang/rfcs#3649) RFC, which has had mildly positive response, and makes use of new `$` syntax in the matcher. The proposed syntax currently parses as metavariables without a fragment specifier; this warning is raised, but there are no errors. It is obviously not known that this specific RFC will eventually be accepted, but forbidding `missing_fragment_specifier` should make it easier to support any new syntax in the future that makes use of `$` in different ways. The syntax conflict is also not impossible to overcome, but making it clear that unnamed metavariables are rejected makes things more straightforward and should allow for better diagnostics. `@Mark-Simulacrum` suggested making this forbid-by-default instead of an error at rust-lang#40107 (comment), but I don't think this would allow the same level of syntax flexibility. It is also possible to reconsider making this an unconditional error since four years have elapsed since the previous attempt, but this seems likely to hit the same pitfalls. (Possibly worth a crater run?) Tracking: - rust-lang#128143
…e2024, r=petrochenkov Make `missing_fragment_specifier` an error in edition 2024 `missing_fragment_specifier` has been a future compatibility warning since 2017. Uplifting it to an unconditional hard error was attempted in 2020, but eventually reverted due to fallout. Make it an error only in edition >= 2024, leaving the lint for older editions. This change will make it easier to support more macro syntax that relies on usage of `$`. Fixes <rust-lang#40107> --- It is rather late for the edition but since this change is relatively small, it seems worth at least bringing up. This follows a brief [Zulip discussion](https://rust-lang.zulipchat.com/#narrow/stream/268952-edition/topic/.60.20DBD.20-.3E.20hard.20error) (cc `@tmandry).` Making this an edition-dependent lint has come up before but there was not a strong motivation. I am proposing it at this time because this would simplify the [named macro capture groups](rust-lang/rfcs#3649) RFC, which has had mildly positive response, and makes use of new `$` syntax in the matcher. The proposed syntax currently parses as metavariables without a fragment specifier; this warning is raised, but there are no errors. It is obviously not known that this specific RFC will eventually be accepted, but forbidding `missing_fragment_specifier` should make it easier to support any new syntax in the future that makes use of `$` in different ways. The syntax conflict is also not impossible to overcome, but making it clear that unnamed metavariables are rejected makes things more straightforward and should allow for better diagnostics. `@Mark-Simulacrum` suggested making this forbid-by-default instead of an error at rust-lang#40107 (comment), but I don't think this would allow the same level of syntax flexibility. It is also possible to reconsider making this an unconditional error since four years have elapsed since the previous attempt, but this seems likely to hit the same pitfalls. (Possibly worth a crater run?) Tracking: - rust-lang#128143
``` | ||
|
||
Groups can also be used in the expansion without `(...)` to emit their entire | ||
capture. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this specific point, and have wanted this in the past.
|
||
Since groups are named, it is now possible to write a group name to reemit its | ||
captured contents with no further expansion. This syntax uses the group name | ||
but omits the `(/* expansion pattern */)/* kleene */`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential source of ambiguity:
How would you express "I want to expand the entire named group, followed by some tokens in parentheses"?
I have a suggestion for how to solve that: ${$group}(tokens)
.
- `${ignore($metavar)}`: if this RFC is accepted than `$ignore` can be removed. | ||
It is used to specify which capture group an expansion group belongs to when | ||
no metavariables are used in the expansion; with named groups, however, this | ||
is specified by the group name rather than by contained metavariables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 This is a great sign that this design is a good idea.
@tgross35 I love this approach. This seems cleaner, nicely orthogonal, and would fit very well with macro fragment fields. |
Co-authored-by: Bennet Bleßmann <[email protected]>
|
||
- Variable definition syntax could be `$ident:(/* ... */)` rather than | ||
`$ident(/* ... */)`. Including the `:` is proposed to be more consistent | ||
with existing fragment specifiers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the argument for either $group:(...)
or $group(...)
. The former seems nicely analogous to existing macro fragments ($x:ident $g:(...)
defines two captures with their types), while the latter has less syntax, still parses unambiguously, and has the advantage that the usage of a group matches the input syntax.
Co-authored-by: Josh Triplett <[email protected]>
As a result, `$identifier( /* ... */ ) /* sep and kleene */` will allow naming | ||
a capture group. It can then be used in expansion: | ||
|
||
```rust | ||
$identifier( | ||
/* expansion within group */ | ||
) /* sep and kleene */ | ||
``` | ||
|
||
Group and metavariables share the same namespace; all groups and metavariables | ||
must have a unique name within the macro capture pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Syntax like e.g. $identifier( /* … */ ),
is already valid and useful in macro expansions.
macro_rules! call_many_fuctions {
([$($($function_name:ident),+$(,)?)?]($arg:expr)) => {
[
$($(
$function_name($arg),
)+)?
]
}
}
fn foo(x: i32) -> i32 { x }
fn bar(x: i32) -> i32 { x + 1 }
fn baz(x: i32) -> i32 { x * x }
fn qux(x: i32) -> i32 { x / 2 }
fn main() {
println!("{:?}", call_many_fuctions!([foo, bar, baz, qux](100)));
}
(this concrete example doesn’t follow it up with a *
, +
or ?
, but these aren’t forbidden either, of course)
If this RFC proposes to make the “parsing” of the macro RHS depend on the “type” of the fragment, I think that’s a clear possible disadvantage.
Especially with the (IMO very useful) idea that $group
on its own can already stand in for an expansion of the whole group’s contents, so expanding some identifier $function_name
– or expression, pattern, etc… – doesn’t work different from expanding some group $group
– it seems desirable (to me) if $function_name($arg), *etc…
doesn’t have a severely different meaning from $group($arg), *etc…
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the design that $group
does not expand is primarily why i suggested in #3649 (comment) that it should use a separate syntax $'group
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are great points. I wish we could use the same syntax for metavars and groups, but the $'group
syntax does get us out of the problems here, so I'll use that for the implementation and leave an unresolved question.
``` | ||
|
||
Groups can also be used in the expansion without `(...)` to emit their entire | ||
capture. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you want to use ()
after the group in the expansion? That would be interpreted as replacing the contents of the capture, right? Maybe it could be written as $...group
or something like that?
Capture groups can now take a name by providing an identifier between the `$` | ||
and opening `(`. This group can then be referred to by name in the expansion: | ||
|
||
```rust | ||
macro_rules! foo { | ||
( $group1( $a:ident ),+ ) => { | ||
$group1( println!("{}", $a); )+ | ||
} | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned by other reviewers, the expansion-side macro is already valid.
I'd like to suggest an alternative:
macro_rules! foo {
( $group1( $a:ident ),+ ) => {
${for a in group1} (
println!("{}", $a);
)
}
}
This alternative is more verbose, but it's also much more explicit about what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or for something closer to existing syntax:
${for(group1)} (
println!("{}", $a);
)+
Groups can also be used in the expansion without `(...)` to emit their entire | ||
capture. | ||
|
||
This works well with a new "match exactly once" grouping that takes no kleene | ||
operator (as opposed to matching zero or more times (`*`), matching once or | ||
more (`+`), or matching zero or one times (`?`)). | ||
|
||
```rust | ||
macro_rules! rename_fn { | ||
( | ||
$newname:ident; | ||
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* ); | ||
) => { | ||
$pub fn $newname( $args ); | ||
} | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I like about everything else in the RFC is that capture groups are mostly distinct from their contents, and mostly serve as explicit scope. Being able to do $capture_group
kind of loses that.
I think you should use an explicit syntax for emitting capture groups, maybe a metavariable like ${paste(capture_group)}
:
macro_rules! rename_fn {
(
$newname:ident;
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* );
) => {
$pub fn $newname( ${paste(args)} );
}
}
### "Exactly one" matching and full group emission | ||
|
||
If a group is specified without a kleene operator (`*`, `+`, `?`), it will now | ||
be assumed to match exactly once. This will be most useful with the ability to | ||
emit an entire matched group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If exactly-once groups don't have a kleene operator to terminate the group matcher, how would a macro author define a rule that matches an exactly-once group followed by one of the tokens *
, +
, or ?
?
For example, in the following (admittedly kind of silly) example, I want to specify a multiplication-like syntax for my macro. But I think the *
would be attached to the group instead, silently turning it into a zero-or-more group.
macro_rules! example1 {
( $g1($a:ident[$b:expr]) * $g2($c:ident[$d:expr]) ) => { $g1 $g2 }
}
example1!(x[1] * y[2]); // ERROR: Something something unexpected `*`...
Further, a kleene-looking token could “reach back” across an intermediate token, interpreting it as a separator. I think this would be extra surprising, because an author may edit a macro definition and unintentionally change the the nature of the group at a distance.
For example, the following example would specify an exactly-once group.
macro_rules! example2 {
( $g($a:ident $b:ident) x y + ) => { $g }
}
example2!(foo bar x y +); // Ok
But if I later edited it to remove the y
token, it would silently become a one-or-more group with x
as the separator.
macro_rules! example2 {
( $g($a:ident $b:ident) x + ) => { $g }
}
example2!(foo bar x +); // ERROR: Something about `+` is not an ident and/or unexpected end of input...
For this reason, I don't think it would be a satisfying solution to just add an escape syntax for *
, +
, and ?
. It would work, but it would still lead to gotchas like this where the macro author would have to
- Notice that their macro has broken, since the updated rule is still valid even if it doesn't match what they meant, and simple invocations may still work;
- Then reason out what went wrong from possibly-unhelpful error messages, before eventually slapping their head and adding an escape.
I don't think the compiler could help much here, because again these are all valid matchers. The compiler wouldn't know the author wanted an exactly-once group.
I think it would be better to introduce a new “kleene” operator for this new group type. (Scare quotes since Kleene didn't actually define a regular language operator like that, but he wasn't concerned with saving sub-matches anyway.) There isn't any symbol that I love for the job—maybe !
or =
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think there would be some backwards compatibility concerns with my suggestion if we allowed unnamed group matchers to use the new exactly-once operator. For example, consider the following macro definition.
macro_rules! example {
( $( $a:ident )!* ) => { $( $a ) }
}
Today, the rule matches zero or more idents with !
as the separator. If we made !
the new exactly-once operator, then I think this rule would instead match exactly one ident followed by *
.
Maybe we only allow exactly-once groups with the new named syntax? (That might actually be a good idea regardless what syntax we use, since exactly-once groups are kind of useless without named groups.) But then would it be too surprising if !*
meant different things for named and unnamed groups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized my example in the previous comment wasn't quite right, because the group expansion in the transcriber needs an operator too. My example should have been written as follows, with a *
in the transcriber.
macro_rules! example {
( $( $a:ident )!* ) => { $( $a )* }
}
Then I think expanding an exactly-once group using a zero-or-more operator should be an error. Allowing a new exactly-once operator with unnamed groups would still a backwards-compatibility concern, but it wouldn't be a silent backwards compatibility concern. (Nope. See below.)
I'm actually a bit unclear here, though. The reference for Macros by Example says the following about repetitions:
- A metavariable must appear in exactly the same number, kind, and nesting order of repetitions in the transcriber as it did in the matcher. So for the matcher
$( $i:ident ),*
, the transcribers=> { $i }
,=> { $( $( $i)* )* }
, and=> { $( $i )+ }
are all illegal, …— https://doc.rust-lang.org/reference/macros-by-example.html#r-macro.decl.repetition.fragment
However, it doesn't seem like this is currently enforced. (Playground example mixing *
with +
, and another mixing *
with ?
.) I don't know whether that's just a bug, or whether the reference is out of date.
Edit: Bleh, it's not a bug, but it's not exactly allowed either. It was reported in (at least) rust-lang/rust#103256 and the answer is there's an allow-by-default lint meta_variable_misuse
that would catch it, but they don't want to make it warn- or deny-by-default because there are too many false positives.
I guess this is all kind of moot if we just don't allow exactly-once unnamed groups. And maybe for named groups, at least, things will be unambiguous enough to let us enforce that Kleene operators must match between matcher and transcriber.
|
||
This will be expanded to the following: | ||
|
||
> `$` ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ )<sup>?</sup> ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented elsewhere about concerns with the proposed syntax for exactly-once groups. But with the current proposal, I think the grammar node would need the trailing MacroRepSep-MacroRepOp sequence to be optional.
> `$` ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ )<sup>?</sup> ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_ | |
> `$` ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ )<sup>?</sup> `(` _MacroMatch<sup>+</sup>_ `)` ( _MacroRepSep_<sup>?</sup> _MacroRepOp_ )<sup>?</sup> |
(I also added missing backticks around the literal parentheses tokens that delimit the group body.)
|
||
Macro captures currently include the following grammar node: | ||
|
||
> `$` ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> `$` ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_ | |
> `$` `(` _MacroMatch<sup>+</sup>_ `)` _MacroRepSep_<sup>?</sup> _MacroRepOp_ |
We talked about this in the lang design meeting today. We were enthusiastic about this and in general agreement on the major design points. Given that, @tgross35 is planning to pursue landing an implementation as a lang experiment. The implementation experience should help us to answer the remaining unresolved questions on some details such that this RFC can be updated to represent the likely-final user-facing behavior. With that, we can accept this RFC, and then hopefully stabilize it soon thereafter. Thanks to @tgross35 for joining us in the meeting and for putting forward this well motivated and well written RFC. |
macro_rules! rename_fn { | ||
( | ||
$newname:ident; | ||
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reading, I felt the args repetition was missing a separator, e.g. ,
?
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* ); | |
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty ),* ); |
This RFC proposes optional names for repetition groups in macros:
Rendered
Small Pre-RFC: https://internals.rust-lang.org/t/pre-rfc-named-capture-groups-for-macros/20883