Skip to content

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented May 29, 2024

This RFC proposes optional names for repetition groups in macros:

macro_rules! foo {
    ( $group1( $a:ident ),+ ) => {
        $group1( println!("{}", $a); )+
    }
}

Rendered

Small Pre-RFC: https://internals.rust-lang.org/t/pre-rfc-named-capture-groups-for-macros/20883

@tgross35 tgross35 changed the title Add an initial RFC for named macro capture groups [RFC] Named macro capture groups May 29, 2024
@tgross35

This comment was marked as resolved.

@rustbot rustbot added A-macros Macro related proposals and issues T-lang Relevant to the language team, which will review and decide on the RFC. labels May 29, 2024
@RalfJung
Copy link
Member

RalfJung commented May 29, 2024

Aren't there still some subtle questions around nesting one has to worry about? Specifically, the RFC doesn't seem to say anything about nested cases like ( $group1:( $a:ident [ $group2:( $b:ident ),+ ] ),+ ). Presumably I will only be allowed to use $group2 inside a $group1(...) repetition? But there can be arbitrarily many other repetitions "in between" -- group2 will always refer to the repetition instance in the "current" group1?

@davidhewitt
Copy link
Contributor

Although I cannot remember the exact use-case, I remember wanting something like this to name an optional trailing comma so that I could forward it on to another macro as part of the expansion. 👍

@tgross35
Copy link
Contributor Author

Aren't there still some subtle questions around nesting one has to worry about? Specifically, the RFC doesn't seem to say anything about nested cases like ( $group1:( $a:ident [ $group2:( $b:ident ),+ ] ),+ ). Presumably I will only be allowed to use $group2 inside a $group1(...) repetition? But there can be arbitrarily many other repetitions "in between" -- group2 will always refer to the repetition instance in the "current" group1?

Great point, the scope part is tricky - I think what you said is correct. It is probably best if I just add an appendix of examples that do and don't work with reasoning, especially given your comments on Zulip about the existing macro mental model not being very clear (I completely agree).

@RalfJung
Copy link
Member

RalfJung commented May 29, 2024 via email

Comment on lines 51 to 53
( $group1:( $a:ident ),+ ) => {
$group1( println!("{}", $a); )+
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you write this I thought you are defining a metavariable $group1 which matches a comma-separated list of $a:ident, and using $group1 will expand to foo,bar,baz,etc.

but no the $group1 is just a label 🤔

the rationale mentioned using : is to be "to be more consistent with existing fragment specifiers" but IMO similarity with fragment specifiers is exactly why this RFC should not use the current syntax.

i prefer getting rid of : (my main source of confusion) and use a label syntax like

macro_rules! make_functions {
    ( 
        names: [ $'names $($name:ident),+ ],
//               ^~~~~~~~~
        $'greetings $( greeting: $greeting:literal, )?
    ) => {
        $'names $(
            fn $name() {
                println!("function {} called", stringify!($name));
                $'greetings $(println!("{}", $greeting) )?
            }
        )+
    }
}


macro_rules! innermost1 {
    ( $'outer_rep $($a:ident: $'inner_rep $($b:literal),* );+ ) => {
        [$'outer_rep $( $'inner_rep $( ${index($'outer_rep)}, )* )+]
    };
}

Copy link
Contributor

@petrochenkov petrochenkov May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metavariables matching some parts of the input are an interesting idea in general.
If we also introduce "match exactly once repetitions", then we'll also be able to capture just arbitrary token streams.
It would be nice to also be able to bind a name to "repeated exactly one" groups aka just artibtrary substreams in the input.

// macro LHS, capture "exactly once" repetition
$my_tokens:(a b $var c d)// macro RHS, reemit `a b value_for_var c d`
$my_tokens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ideas - I updated the default syntax to be $group1(...) (no colon) and added the label syntax as a possibility.

It would be nice to also be able to bind a name to "repeated exactly one" groups aka just artibtrary substreams in the input.

Is there a standard kleene for a single capture, or could we get by with no kleene at all? I could probably include a proposal for single captures in this RFC, to be accepted as part of it or rejected separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regexes, a "capture exactly once" group is just (...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After writing some more examples, it seems like "capture once" and "emit the entire capture" would be pretty useful, so I just included them as part of the proposal. Especially for code that just needs to be validated and passed elsewhere, or matching and reemitting optional keywords/tts like mut, & or pub(crate) (just allowing capture groups that don't contain metavars is enough for that, but $pub_crate in the expansion is nicer to read than $pub_crate(pub(crate))).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of the reemitting is that it will use spans from the input, instead of spans from the macro body.

One example I recently observed in rust-lang/rust#119412 is the mir! macro in standard library.
It will match on something like let $expr = $expr; and then reemit $expr = $expr; as an assignment expression.
If = is emitted manually from the macro it will get the macro span and may mess up the combined span of the assignment expression.
If = is taken from the input, then the expression will get a precise span from the input, which is good for any kind of DSLs.

@c410-f3r
Copy link

cc @markbt

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed this: I feel positive on this direction.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 27, 2024
…r-e2024, r=petrochenkov

Make `missing_fragment_specifier` an error in edition 2024

`missing_fragment_specifier` has been a future compatibility warning since 2017. Uplifting it to an unconditional hard error was attempted in 2020, but eventually reverted due to fallout.

Make it an error only in edition >= 2024, leaving the lint for older editions. This change will make it easier to support more macro syntax that relies on usage of `$`.

Fixes <rust-lang#40107>

---

It is rather late for the edition but since this change is relatively small, it seems worth at least bringing up. This follows a brief [Zulip discussion](https://rust-lang.zulipchat.com/#narrow/stream/268952-edition/topic/.60.20DBD.20-.3E.20hard.20error) (cc `@tmandry).`

Making this an edition-dependent lint has come up before but there was not a strong motivation. I am proposing it at this time because this would simplify the [named macro capture groups](rust-lang/rfcs#3649) RFC, which has had mildly positive response, and makes use of new `$` syntax in the matcher. The proposed syntax currently parses as metavariables without a fragment specifier; this warning is raised, but there are no errors.

It is obviously not known that this specific RFC will eventually be accepted, but forbidding `missing_fragment_specifier` should make it easier to support any new syntax in the future that makes use of `$` in different ways. The syntax conflict is also not impossible to overcome, but making it clear that unnamed metavariables are rejected makes things more straightforward and should allow for better diagnostics.

`@Mark-Simulacrum` suggested making this forbid-by-default instead of an error at rust-lang#40107 (comment), but I don't think this would allow the same level of syntax flexibility.

It is also possible to reconsider making this an unconditional error since four years have elapsed since the previous attempt, but this seems likely to hit the same pitfalls. (Possibly worth a crater run?)

Tracking:

- rust-lang#128143
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 27, 2024
…e2024, r=petrochenkov

Make `missing_fragment_specifier` an error in edition 2024

`missing_fragment_specifier` has been a future compatibility warning since 2017. Uplifting it to an unconditional hard error was attempted in 2020, but eventually reverted due to fallout.

Make it an error only in edition >= 2024, leaving the lint for older editions. This change will make it easier to support more macro syntax that relies on usage of `$`.

Fixes <rust-lang#40107>

---

It is rather late for the edition but since this change is relatively small, it seems worth at least bringing up. This follows a brief [Zulip discussion](https://rust-lang.zulipchat.com/#narrow/stream/268952-edition/topic/.60.20DBD.20-.3E.20hard.20error) (cc `@tmandry).`

Making this an edition-dependent lint has come up before but there was not a strong motivation. I am proposing it at this time because this would simplify the [named macro capture groups](rust-lang/rfcs#3649) RFC, which has had mildly positive response, and makes use of new `$` syntax in the matcher. The proposed syntax currently parses as metavariables without a fragment specifier; this warning is raised, but there are no errors.

It is obviously not known that this specific RFC will eventually be accepted, but forbidding `missing_fragment_specifier` should make it easier to support any new syntax in the future that makes use of `$` in different ways. The syntax conflict is also not impossible to overcome, but making it clear that unnamed metavariables are rejected makes things more straightforward and should allow for better diagnostics.

`@Mark-Simulacrum` suggested making this forbid-by-default instead of an error at rust-lang#40107 (comment), but I don't think this would allow the same level of syntax flexibility.

It is also possible to reconsider making this an unconditional error since four years have elapsed since the previous attempt, but this seems likely to hit the same pitfalls. (Possibly worth a crater run?)

Tracking:

- rust-lang#128143
```

Groups can also be used in the expansion without `(...)` to emit their entire
capture.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this specific point, and have wanted this in the past.


Since groups are named, it is now possible to write a group name to reemit its
captured contents with no further expansion. This syntax uses the group name
but omits the `(/* expansion pattern */)/* kleene */`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential source of ambiguity:

How would you express "I want to expand the entire named group, followed by some tokens in parentheses"?

I have a suggestion for how to solve that: ${$group}(tokens).

- `${ignore($metavar)}`: if this RFC is accepted than `$ignore` can be removed.
It is used to specify which capture group an expansion group belongs to when
no metavariables are used in the expansion; with named groups, however, this
is specified by the group name rather than by contained metavariables.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 This is a great sign that this design is a good idea.

@joshtriplett
Copy link
Member

@tgross35 I love this approach. This seems cleaner, nicely orthogonal, and would fit very well with macro fragment fields.

Co-authored-by: Bennet Bleßmann <[email protected]>

- Variable definition syntax could be `$ident:(/* ... */)` rather than
`$ident(/* ... */)`. Including the `:` is proposed to be more consistent
with existing fragment specifiers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the argument for either $group:(...) or $group(...). The former seems nicely analogous to existing macro fragments ($x:ident $g:(...) defines two captures with their types), while the latter has less syntax, still parses unambiguously, and has the advantage that the usage of a group matches the input syntax.

Comment on lines +161 to +171
As a result, `$identifier( /* ... */ ) /* sep and kleene */` will allow naming
a capture group. It can then be used in expansion:

```rust
$identifier(
/* expansion within group */
) /* sep and kleene */
```

Group and metavariables share the same namespace; all groups and metavariables
must have a unique name within the macro capture pattern.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syntax like e.g. $identifier( /* … */ ), is already valid and useful in macro expansions.

macro_rules! call_many_fuctions {
    ([$($($function_name:ident),+$(,)?)?]($arg:expr)) => {
        [
            $($(
                $function_name($arg),
            )+)?
        ]
    }
}
fn foo(x: i32) -> i32 { x }
fn bar(x: i32) -> i32 { x + 1 }
fn baz(x: i32) -> i32 { x * x }
fn qux(x: i32) -> i32 { x / 2 }

fn main() {
    println!("{:?}", call_many_fuctions!([foo, bar, baz, qux](100)));
}

(this concrete example doesn’t follow it up with a *, + or ?, but these aren’t forbidden either, of course)

If this RFC proposes to make the “parsing” of the macro RHS depend on the “type” of the fragment, I think that’s a clear possible disadvantage.

Especially with the (IMO very useful) idea that $group on its own can already stand in for an expansion of the whole group’s contents, so expanding some identifier $function_name – or expression, pattern, etc… – doesn’t work different from expanding some group $group – it seems desirable (to me) if $function_name($arg), *etc… doesn’t have a severely different meaning from $group($arg), *etc… either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the design that $group does not expand is primarily why i suggested in #3649 (comment) that it should use a separate syntax $'group.

Copy link
Contributor Author

@tgross35 tgross35 Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great points. I wish we could use the same syntax for metavars and groups, but the $'group syntax does get us out of the problems here, so I'll use that for the implementation and leave an unresolved question.

```

Groups can also be used in the expansion without `(...)` to emit their entire
capture.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you want to use () after the group in the expansion? That would be interpreted as replacing the contents of the capture, right? Maybe it could be written as $...group or something like that?

Comment on lines +49 to +58
Capture groups can now take a name by providing an identifier between the `$`
and opening `(`. This group can then be referred to by name in the expansion:

```rust
macro_rules! foo {
( $group1( $a:ident ),+ ) => {
$group1( println!("{}", $a); )+
}
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by other reviewers, the expansion-side macro is already valid.

I'd like to suggest an alternative:

macro_rules! foo {
    ( $group1( $a:ident ),+ ) => {

        ${for a in group1} (
            println!("{}", $a);
        )
    }
}

This alternative is more verbose, but it's also much more explicit about what's going on.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or for something closer to existing syntax:

        ${for(group1)} (
            println!("{}", $a);
        )+

Comment on lines +132 to +148
Groups can also be used in the expansion without `(...)` to emit their entire
capture.

This works well with a new "match exactly once" grouping that takes no kleene
operator (as opposed to matching zero or more times (`*`), matching once or
more (`+`), or matching zero or one times (`?`)).

```rust
macro_rules! rename_fn {
(
$newname:ident;
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* );
) => {
$pub fn $newname( $args );
}
}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I like about everything else in the RFC is that capture groups are mostly distinct from their contents, and mostly serve as explicit scope. Being able to do $capture_group kind of loses that.

I think you should use an explicit syntax for emitting capture groups, maybe a metavariable like ${paste(capture_group)}:

macro_rules! rename_fn {
    (
        $newname:ident;
        $pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* );
    ) => {
        $pub fn $newname( ${paste(args)} );
    }
}

Comment on lines +325 to +329
### "Exactly one" matching and full group emission

If a group is specified without a kleene operator (`*`, `+`, `?`), it will now
be assumed to match exactly once. This will be most useful with the ability to
emit an entire matched group.
Copy link

@chrisbouchard chrisbouchard Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If exactly-once groups don't have a kleene operator to terminate the group matcher, how would a macro author define a rule that matches an exactly-once group followed by one of the tokens *, +, or ??

For example, in the following (admittedly kind of silly) example, I want to specify a multiplication-like syntax for my macro. But I think the * would be attached to the group instead, silently turning it into a zero-or-more group.

macro_rules! example1 {
    ( $g1($a:ident[$b:expr]) * $g2($c:ident[$d:expr]) ) => { $g1 $g2 }
}

example1!(x[1] * y[2]);  // ERROR: Something something unexpected `*`...

Further, a kleene-looking token could “reach back” across an intermediate token, interpreting it as a separator. I think this would be extra surprising, because an author may edit a macro definition and unintentionally change the the nature of the group at a distance.

For example, the following example would specify an exactly-once group.

macro_rules! example2 {
    ( $g($a:ident $b:ident) x y + ) => { $g }
}

example2!(foo bar x y +);  // Ok

But if I later edited it to remove the y token, it would silently become a one-or-more group with x as the separator.

macro_rules! example2 {
    ( $g($a:ident $b:ident) x + ) => { $g }
}

example2!(foo bar x +);  // ERROR: Something about `+` is not an ident and/or unexpected end of input...

For this reason, I don't think it would be a satisfying solution to just add an escape syntax for *, +, and ?. It would work, but it would still lead to gotchas like this where the macro author would have to

  • Notice that their macro has broken, since the updated rule is still valid even if it doesn't match what they meant, and simple invocations may still work;
  • Then reason out what went wrong from possibly-unhelpful error messages, before eventually slapping their head and adding an escape.

I don't think the compiler could help much here, because again these are all valid matchers. The compiler wouldn't know the author wanted an exactly-once group.

I think it would be better to introduce a new “kleene” operator for this new group type. (Scare quotes since Kleene didn't actually define a regular language operator like that, but he wasn't concerned with saving sub-matches anyway.) There isn't any symbol that I love for the job—maybe ! or =?

Copy link

@chrisbouchard chrisbouchard Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think there would be some backwards compatibility concerns with my suggestion if we allowed unnamed group matchers to use the new exactly-once operator. For example, consider the following macro definition.

macro_rules! example {
    ( $( $a:ident )!* ) => { $( $a ) }
}

Today, the rule matches zero or more idents with ! as the separator. If we made ! the new exactly-once operator, then I think this rule would instead match exactly one ident followed by *.

Maybe we only allow exactly-once groups with the new named syntax? (That might actually be a good idea regardless what syntax we use, since exactly-once groups are kind of useless without named groups.) But then would it be too surprising if !* meant different things for named and unnamed groups?

Copy link

@chrisbouchard chrisbouchard Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized my example in the previous comment wasn't quite right, because the group expansion in the transcriber needs an operator too. My example should have been written as follows, with a * in the transcriber.

macro_rules! example {
    ( $( $a:ident )!* ) => { $( $a )* }
}

Then I think expanding an exactly-once group using a zero-or-more operator should be an error. Allowing a new exactly-once operator with unnamed groups would still a backwards-compatibility concern, but it wouldn't be a silent backwards compatibility concern. (Nope. See below.)


I'm actually a bit unclear here, though. The reference for Macros by Example says the following about repetitions:

  1. A metavariable must appear in exactly the same number, kind, and nesting order of repetitions in the transcriber as it did in the matcher. So for the matcher $( $i:ident ),*, the transcribers => { $i }, => { $( $( $i)* )* }, and => { $( $i )+ } are all illegal, …

https://doc.rust-lang.org/reference/macros-by-example.html#r-macro.decl.repetition.fragment

However, it doesn't seem like this is currently enforced. (Playground example mixing * with +, and another mixing * with ?.) I don't know whether that's just a bug, or whether the reference is out of date.

Edit: Bleh, it's not a bug, but it's not exactly allowed either. It was reported in (at least) rust-lang/rust#103256 and the answer is there's an allow-by-default lint meta_variable_misuse that would catch it, but they don't want to make it warn- or deny-by-default because there are too many false positives.


I guess this is all kind of moot if we just don't allow exactly-once unnamed groups. And maybe for named groups, at least, things will be unambiguous enough to let us enforce that Kleene operators must match between matcher and transcriber.


This will be expanded to the following:

> `$` ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ )<sup>?</sup> ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented elsewhere about concerns with the proposed syntax for exactly-once groups. But with the current proposal, I think the grammar node would need the trailing MacroRepSep-MacroRepOp sequence to be optional.

Suggested change
> `$` ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ )<sup>?</sup> ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_
> `$` ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ )<sup>?</sup> `(` _MacroMatch<sup>+</sup>_ `)` ( _MacroRepSep_<sup>?</sup> _MacroRepOp_ )<sup>?</sup>

(I also added missing backticks around the literal parentheses tokens that delimit the group body.)


Macro captures currently include the following grammar node:

> `$` ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> `$` ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_
> `$` `(` _MacroMatch<sup>+</sup>_ `)` _MacroRepSep_<sup>?</sup> _MacroRepOp_

@traviscross
Copy link
Contributor

We talked about this in the lang design meeting today. We were enthusiastic about this and in general agreement on the major design points.

Given that, @tgross35 is planning to pursue landing an implementation as a lang experiment. The implementation experience should help us to answer the remaining unresolved questions on some details such that this RFC can be updated to represent the likely-final user-facing behavior. With that, we can accept this RFC, and then hopefully stabilize it soon thereafter.

Thanks to @tgross35 for joining us in the meeting and for putting forward this well motivated and well written RFC.

macro_rules! rename_fn {
(
$newname:ident;
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading, I felt the args repetition was missing a separator, e.g. ,?

Suggested change
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* );
$pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty ),* );

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Macro related proposals and issues T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.