This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit d4b952f

committed Mar 11, 2024
additional changes to links and some text
1 parent 1e6ad9b commit d4b952f

1 file changed

+121
-120
lines changed


‎src/macro-expansion.md

+121-120
@@ -331,9 +331,11 @@ a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an int
331331
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
332332

333333
For built-in `macro`s, we use the context:
334-
`SyntaxContext::empty().apply_mark(expn_id)`, and such `macro`s are considered to
335-
be defined at the hierarchy root. We do the same for `proc macro`s because we
336-
haven't implemented cross-crate hygiene yet.
334+
[`SyntaxContext::empty().apply_mark(expn_id)`], and such `macro`s are
335+
considered to be defined at the hierarchy root. We do the same for `proc
336+
macro`s because we haven't implemented cross-crate hygiene yet.
337+
338+
[`SyntaxContext::empty().apply_mark(expn_id)`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
337339

338340
If the token had context `X` before being produced by a `macro` then after being
339341
produced by the `macro` it has context `X -> macro_id`. Here are some examples:
@@ -346,12 +348,11 @@ macro m() { ident }
346348
m!();
347349
```
348350

349-
Here `ident` originally has context [`SyntaxContext::root`][scr]. `ident` has
351+
Here `ident`, which initially has context [`SyntaxContext::root`][scr], has
350352
context `ROOT -> id(m)` after it's produced by `m`.
351353

352354
[scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root
353355

354-
355356
Example 1:
356357

357358
```rust,ignore
@@ -360,7 +361,8 @@ macro m() { macro n() { ident } }
360361
m!();
361362
n!();
362363
```
363-
In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)`
364+
365+
In this example the `ident` has context `ROOT` initially, then `ROOT -> id(m)`
364366
after the first expansion, then `ROOT -> id(m) -> id(n)`.
365367

366368
Example 2:
@@ -377,11 +379,11 @@ m!(foo);
377379
After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
378380
`ROOT -> id(m) -> id(n)`.
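
To make the `X -> macro_id` notation concrete, here is a toy model that replays Example 1. It is a sketch only, not how `rustc` stores hygiene data: `ToyContext`, its `marks` field, and `render` are hypothetical names, and only `root` and `apply_mark` echo the names of the real `SyntaxContext` methods.

```rust
/// Toy stand-in for a syntax context: the chain of expansions that produced
/// a token. (Assumption: the real `SyntaxContext` is not a plain vector.)
#[derive(Clone, Debug, Default, PartialEq)]
struct ToyContext {
    marks: Vec<String>,
}

impl ToyContext {
    /// The context of a token written directly in the source.
    fn root() -> Self {
        ToyContext::default()
    }

    /// Returns a new context extended by one expansion, i.e. `X -> id(name)`.
    fn apply_mark(&self, macro_name: &str) -> Self {
        let mut marks = self.marks.clone();
        marks.push(macro_name.to_string());
        ToyContext { marks }
    }

    /// Renders the chain in the notation used in this chapter.
    fn render(&self) -> String {
        let mut out = String::from("ROOT");
        for mark in &self.marks {
            out.push_str(&format!(" -> id({})", mark));
        }
        out
    }
}

fn main() {
    // Example 1: `ident` is written inside `n`, which is itself produced by `m`.
    let written = ToyContext::root();
    let after_m = written.apply_mark("m");
    let after_n = after_m.apply_mark("n");
    println!("{}", written.render());
    println!("{}", after_m.render());
    println!("{}", after_n.render());
}
```

Each call to `apply_mark` returns a new value rather than mutating the old one, which mirrors the idea that a token's earlier context stays valid after further expansions.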
379381

380-
Finally, one last thing to mention is that currently, this hierarchy is subject
381-
to the ["context transplantation hack"][hack]. Basically, the more modern (and
382-
experimental) `macro` `macro`s have stronger hygiene than the older MBE system,
383-
but this can result in weird interactions between the two. The hack is intended
384-
to make things "just work" for now.
382+
Currently this hierarchy for tracking `macro` definitions is subject to the
383+
so-called ["context transplantation hack"][hack]. Modern (i.e. experimental)
384+
`macro`s have stronger hygiene than the legacy "Macros By Example" (`MBE`)
385+
system, which can result in weird interactions between the two. The hack is
386+
intended to make things "just work" for now.
385387

386388
[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
387389
[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
@@ -390,7 +392,8 @@ to make things "just work" for now.
390392

391393
The third and final hierarchy tracks the location of `macro` invocations.
392394

393-
In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.
395+
In this hierarchy [`ExpnData::call_site`][callsite] is the `child -> parent`
396+
link.
394397

395398
[callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site
396399

@@ -420,20 +423,22 @@ Above, we saw how the output of a `macro` is integrated into the `AST` for a cra
420423
and we also saw how the hygiene data for a crate is generated. But how do we
421424
actually produce the output of a `macro`? It depends on the type of `macro`.
422425

423-
There are two types of `macro`s in Rust:
424-
`macro_rules!` `macro`s (a.k.a. "Macros By Example" (MBE)) and procedural `macro`s
425-
(or "proc `macro`s"; including custom derives). During the parsing phase, the normal
426-
Rust parser will set aside the contents of `macro`s and their invocations. Later,
427-
`macro`s are expanded using these portions of the code.
426+
There are two types of `macro`s in Rust:
427+
1. `macro_rules!` `macro`s, and
428+
2. procedural `macro`s (`proc macro`s), including custom derives.
429+
430+
During the parsing phase, the normal Rust parser will set aside the contents of
431+
`macro`s and their invocations. Later, `macro`s are expanded using these
432+
portions of the code.
428433

429434
Some important data structures/interfaces here:
430435
- [`SyntaxExtension`] - a lowered `macro` representation, contains its expander
431-
function, which transforms a `TokenStream` or `AST` into another `TokenStream`
432-
or `AST` + some additional data like stability, or a list of unstable features
433-
allowed inside the `macro`.
436+
function, which transforms a [`TokenStream`] or `AST` into another
437+
[`TokenStream`] or `AST` + some additional data like stability, or a list of
438+
unstable features allowed inside the `macro`.
434439
- [`SyntaxExtensionKind`] - expander functions may have several different
435440
signatures (take one token stream, or two, or a piece of `AST`, etc). This is
436-
an enum that lists them.
441+
an `enum` that lists them.
437442
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
438443
`trait`s representing the expander function signatures.
439444

@@ -446,18 +451,15 @@ Some important data structures/interfaces here:
446451

447452
## Macros By Example
448453

449-
MBEs have their own parser distinct from the normal Rust parser. When `macro`s
450-
are expanded, we may invoke the MBE parser to parse and expand a `macro`. The
451-
MBE parser, in turn, may call the normal Rust parser when it needs to bind a
452-
metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
454+
`MBE`s have their own parser distinct from the Rust parser. When `macro`s are
455+
expanded, we may invoke the `MBE` parser to parse and expand a `macro`. The
456+
`MBE` parser, in turn, may call the Rust parser when it needs to bind a
457+
metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
453458
invocation. The code for `macro` expansion is in
454459
[`compiler/rustc_expand/src/mbe/`][code_dir].
455460

456461
### Example
457462

458-
It's helpful to have an example to refer to. For the remainder of this chapter,
459-
whenever we refer to the "example _definition_", we mean the following:
460-
461463
```rust,ignore
462464
macro_rules! printer {
463465
(print $mvar:ident) => {
@@ -470,41 +472,41 @@ macro_rules! printer {
470472
}
471473
```
472474

473-
`$mvar` is called a _metavariable_. Unlike normal variables, rather than
474-
binding to a value in a computation, a metavariable binds _at compile time_ to
475-
a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an
475+
Here `$mvar` is called a _metavariable_. Unlike normal variables, rather than
476+
binding to a value _at runtime_, a metavariable binds _at compile time_ to a
477+
tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an
476478
identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
477-
special tokens, such as `EOF`, which indicates that there are no more tokens.
478-
Token trees resulting from paired parentheses-like characters (`(`...`)`,
479-
`[`...`]`, and `{`...`}`) – they include the open and close and all the tokens
480-
in between (we do require that parentheses-like characters be balanced). Having
481-
`macro` expansion operate on token streams rather than the raw bytes of a source
482-
file abstracts away a lot of complexity. The `macro` expander (and much of the
483-
rest of the compiler) doesn't really care that much about the exact line and
484-
column of some syntactic construct in the code; it cares about what constructs
485-
are used in the code. Using tokens allows us to care about _what_ without
486-
worrying about _where_. For more information about tokens, see the
487-
[Parsing][parsing] chapter of this book.
488-
489-
Whenever we refer to the "example _invocation_", we mean the following snippet:
479+
special tokens, such as `EOF`, which indicates that there are no more
480+
tokens. There are also token trees resulting from paired parentheses-like
481+
characters (`(`...`)`, `[`...`]`, and `{`...`}`) – they include the open and
482+
close and all the tokens in between (Rust requires that parentheses-like
483+
characters be balanced). Having `macro` expansion operate on token streams
484+
rather than the raw bytes of a source file abstracts away a lot of complexity.
485+
The `macro` expander (and much of the rest of the compiler) doesn't consider
486+
the exact line and column of some syntactic construct in the code; it considers
487+
which constructs are used in the code. Using tokens allows us to care about
488+
_what_ without worrying about _where_. For more information about tokens, see
489+
the [Parsing][parsing] chapter of this book.
490490

491491
```rust,ignore
492-
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
492+
printer!(print foo); // `foo` is a variable
493493
```
494494

495495
The process of expanding the `macro` invocation into the syntax tree
496-
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
497-
called _`macro` expansion_, and it is the topic of this chapter.
496+
`println!("{}", foo)` and then expanding the syntax tree into a call to
497+
`Display::fmt` is one common example of _`macro` expansion_.
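
The example definition and invocation can be tried out with a small variant. This is a sketch under one assumption: `printer_lines!` is a hypothetical name, and unlike `printer!` each arm evaluates to the lines it would have printed instead of calling `println!`, so the effect of each arm can be inspected. The two matchers are the ones from the example definition.

```rust
// Hypothetical variant of `printer!`: same two matchers, but each arm
// evaluates to a `Vec<String>` of the lines it would have printed.
macro_rules! printer_lines {
    (print $mvar:ident) => {
        vec![format!("{}", $mvar)]
    };
    (print twice $mvar:ident) => {
        vec![format!("{}", $mvar), format!("{}", $mvar)]
    };
}

fn print_once() -> Vec<String> {
    let foo = 7; // `foo` is a variable, as in the example invocation
    printer_lines!(print foo)
}

fn print_twice() -> Vec<String> {
    let foo = 7;
    printer_lines!(print twice foo)
}

fn main() {
    for line in print_once().into_iter().chain(print_twice()) {
        println!("{}", line);
    }
}
```

In both functions `$mvar` binds to the single token `foo` at compile time; the value `7` only appears when the expanded code runs.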
498498

499499
### The MBE parser
500500

501-
There are two parts to MBE expansion: parsing the definition and parsing the
502-
invocations. Interestingly, both are done by the `macro` parser.
501+
There are two parts to `MBE` expansion done by the `macro` parser:
502+
1. parsing the definition, and,
503+
2. parsing the invocations.
503504

504-
Basically, the MBE parser is like an NFA-based regex parser. It uses an
505-
algorithm similar in spirit to the [Earley parsing
506-
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro` parser is
507-
defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
505+
We think of the `MBE` parser as a nondeterministic finite automaton (NFA)-based
506+
regex parser; it uses an algorithm similar in spirit to the [Earley
507+
parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro`
508+
parser is defined in
509+
[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
508510

509511
The interface of the `macro` parser is as follows (this is slightly simplified):
510512

@@ -518,97 +520,89 @@ fn parse_tt(
518520

519521
We use these items in `macro` parser:
520522

521-
- `parser` is a reference to the state of a normal Rust parser, including the
522-
token stream and parsing session. The token stream is what we are about to
523-
ask the MBE parser to parse. We will consume the raw stream of tokens and
524-
output a binding of metavariables to corresponding token trees. The parsing
525-
session can be used to report parser errors.
526-
- `matcher` is a sequence of `MatcherLoc`s that we want to match
523+
- a `parser` variable is a reference to the state of a normal Rust parser,
524+
including the token stream and parsing session. The token stream is what we
525+
are about to ask the `MBE` parser to parse. We will consume the raw stream of
526+
tokens and output a binding of metavariables to corresponding token trees.
527+
The parsing session can be used to report parser errors.
528+
- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match
527529
the token stream against. They're converted from token trees before matching.
528530

529-
In the analogy of a regex parser, the token stream is the input and we are matching it
530-
against the pattern `matcher`. Using our examples, the token stream could be the stream of
531-
tokens containing the inside of the example invocation `print foo`, while `matcher`
532-
might be the sequence of token (trees) `print $mvar:ident`.
531+
[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html
532+
533+
In the analogy of a regex parser, the token stream is the input and we are
534+
matching it against the pattern defined by `matcher`. Using our examples, the
535+
token stream could be the stream of tokens containing the inside of the example
536+
invocation `print foo`, while `matcher` might be the sequence of token (trees)
537+
`print $mvar:ident`.
533538

534539
The output of the parser is a [`ParseResult`], which indicates which of
535540
three cases has occurred:
536541

537-
- Success: the token stream matches the given `matcher`, and we have produced a binding
538-
from metavariables to the corresponding token trees.
539-
- Failure: the token stream does not match `matcher`. This results in an error message such as
540-
"No rule expected token _blah_".
541-
- Error: some fatal error has occurred _in the parser_. For example, this
542-
happens if there is more than one pattern match, since that indicates
543-
the `macro` is ambiguous.
542+
- **Success**: the token stream matches the given `matcher` and we have produced a
543+
binding from metavariables to the corresponding token trees.
544+
- **Failure**: the token stream does not match `matcher` and results in an error
545+
message such as "No rule expected token ...".
546+
- **Error**: some fatal error has occurred _in the parser_. For example, this
547+
happens if there is more than one pattern match, since that indicates the
548+
`macro` is ambiguous.
544549

545550
The full interface is defined [here][code_parse_int].
546551

547-
The `macro` parser does pretty much exactly the same as a normal regex parser with
548-
one exception: in order to parse different types of metavariables, such as
549-
`ident`, `block`, `expr`, etc., the `macro` parser must sometimes call back to the
550-
normal Rust parser.
551-
552-
As mentioned above, both definitions and invocations of `macro`s are parsed using
553-
the `macro` parser. This is extremely non-intuitive and self-referential. The code
554-
to parse `macro` _definitions_ is in
555-
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
556-
matching for a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
557-
a `macro_rules` definition should have in its body at least one occurrence of a
558-
token tree followed by `=>` followed by another token tree. When the compiler
559-
comes to a `macro_rules` definition, it uses this pattern to match the two token
560-
trees per rule in the definition of the `macro` _using the `macro` parser itself_.
561-
In our example definition, the metavariable `$lhs` would match the patterns of
562-
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
563-
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
564-
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
565-
knowledge around for when it needs to expand a `macro` invocation.
552+
The `macro` parser does pretty much exactly the same as a normal regex parser
553+
with one exception: in order to parse different types of metavariables, such as
554+
`ident`, `block`, `expr`, etc., the `macro` parser must call back to the normal
555+
Rust parser. Both the definition and invocation of `macro`s are parsed using
556+
the `macro` parser, in a process which is non-intuitively self-referential.
557+
558+
The code to parse `macro` _definitions_ is in
559+
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the
560+
pattern for matching a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In
561+
other words, a `macro_rules` definition should have in its body at least one
562+
occurrence of a token tree followed by `=>` followed by another token tree.
563+
When the compiler comes to a `macro_rules` definition, it uses this pattern to
564+
match the two token trees per rule in the definition of the `macro`, _thereby
565+
utilizing the `macro` parser itself_. In our example definition, the
566+
metavariable `$lhs` would match the patterns of both arms: `(print
567+
$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
568+
bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar);
569+
println!("{}", $mvar); }`. The parser keeps this knowledge around for when it
570+
needs to expand a `macro` invocation.
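
The pattern `$( $lhs:tt => $rhs:tt );+` can be exercised from ordinary user code. The sketch below is illustrative only and is not the code in `macro_rules.rs`: `split_rules!` and `example_rules` are hypothetical names. It feeds the two arms of the example definition through that pattern and records what `$lhs` and `$rhs` bound to, using `stringify!` so the captured tokens are never expanded.

```rust
// Hypothetical helper: matches a list of `lhs => rhs` rules with the same
// pattern described above and returns each captured pair as strings.
macro_rules! split_rules {
    ($( $lhs:tt => $rhs:tt );+) => {
        [$( (stringify!($lhs), stringify!($rhs)) ),+]
    };
}

fn example_rules() -> Vec<(&'static str, &'static str)> {
    // Each parenthesized or braced group is a single token tree, so it binds
    // to one `tt` metavariable regardless of what it contains.
    let rules = split_rules!(
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    );
    rules.to_vec()
}

fn main() {
    for (lhs, rhs) in example_rules() {
        println!("lhs: {}\nrhs: {}\n", lhs, rhs);
    }
}
```

The exact spacing produced by `stringify!` is not guaranteed, so the output is best read as a view of the token trees rather than as source text.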
566571

567572
When the compiler comes to a `macro` invocation, it parses that invocation using
568-
the same NFA-based `macro` parser that is described above. However, the matcher
573+
the NFA-based `macro` parser described above. However, the `matcher` variable
569574
used is the first token tree (`$lhs`) extracted from the arms of the `macro`
570575
_definition_. Using our example, we would try to match the token stream `print
571-
foo` from the invocation against the matchers `print $mvar:ident` and `print
572-
twice $mvar:ident` that we previously extracted from the definition. The
576+
foo` from the invocation against the `matcher`s `print $mvar:ident` and `print
577+
twice $mvar:ident` that we previously extracted from the definition. The
573578
algorithm is exactly the same, but when the `macro` parser comes to a place in the
574-
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
579+
current `matcher` where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
575580
it calls back to the normal Rust parser to get the contents of that
576581
non-terminal. In this case, the Rust parser would look for an `ident` token,
577582
which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser
578-
proceeds in parsing as normal. Also, note that exactly one of the matchers from
583+
proceeds in parsing as normal. Also, note that exactly one of the `matcher`s from
579584
the various arms should match the invocation; if there is more than one match,
580585
the parse is ambiguous, while if there are no matches at all, there is a syntax
581586
error.
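
The arm-selection rule described above can be sketched as a toy model. Everything here is an assumption made for illustration: `Piece`, `ToyResult`, `match_arm`, `parse_invocation`, and `printer_arms` are hypothetical names, tokens are plain words, and a metavariable accepts any single word instead of calling back into the Rust parser. It follows the description in this section (no match is a failure, more than one match is ambiguous) and makes no claim about how `rustc` implements it.

```rust
use std::collections::HashMap;

/// One piece of a toy matcher (far simpler than rustc's `MatcherLoc`).
#[derive(Debug)]
enum Piece {
    /// A literal word that must appear verbatim, e.g. `print`.
    Literal(&'static str),
    /// A metavariable such as `$mvar:ident`; binds to exactly one word.
    IdentVar(&'static str),
}

#[derive(Debug, PartialEq)]
enum ToyResult {
    Success { arm: usize, bindings: HashMap<String, String> },
    Failure, // no arm matched
    Error,   // more than one arm matched
}

/// Tries to match one arm's matcher against the invocation's tokens.
fn match_arm(matcher: &[Piece], tokens: &[&str]) -> Option<HashMap<String, String>> {
    if matcher.len() != tokens.len() {
        return None;
    }
    let mut bindings = HashMap::new();
    for (piece, tok) in matcher.iter().zip(tokens) {
        match piece {
            Piece::Literal(word) => {
                if word != tok {
                    return None;
                }
            }
            Piece::IdentVar(name) => {
                bindings.insert(name.to_string(), tok.to_string());
            }
        }
    }
    Some(bindings)
}

/// Matches the tokens against every arm and classifies the outcome.
fn parse_invocation(arms: &[Vec<Piece>], tokens: &[&str]) -> ToyResult {
    let mut matches: Vec<(usize, HashMap<String, String>)> = arms
        .iter()
        .enumerate()
        .filter_map(|(i, m)| match_arm(m, tokens).map(|b| (i, b)))
        .collect();
    match matches.len() {
        0 => ToyResult::Failure,
        1 => {
            let (arm, bindings) = matches.remove(0);
            ToyResult::Success { arm, bindings }
        }
        _ => ToyResult::Error,
    }
}

/// The two matchers extracted from the example definition.
fn printer_arms() -> Vec<Vec<Piece>> {
    vec![
        vec![Piece::Literal("print"), Piece::IdentVar("mvar")],
        vec![Piece::Literal("print"), Piece::Literal("twice"), Piece::IdentVar("mvar")],
    ]
}

fn main() {
    let arms = printer_arms();
    println!("{:?}", parse_invocation(&arms, &["print", "foo"]));
    println!("{:?}", parse_invocation(&arms, &["shout", "foo"]));
}
```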
582587

583588
For more information about the `macro` parser's implementation, see the comments
584589
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
585590

586-
### `macro`s and Macros 2.0
587-
588-
There is an old and mostly undocumented effort to improve the MBE system, give
589-
it more hygiene-related features, better scoping and visibility rules, etc. There
590-
hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
591-
`macro`s use the same machinery as today's MBEs; they just have additional
592-
syntactic sugar and are allowed to be in namespaces.
593-
594591
## Procedural Macros
595592

596-
Procedural `macro`s are also expanded during parsing, as mentioned above.
597-
However, they use a rather different mechanism. Rather than having a parser in
598-
the compiler, procedural `macro`s are implemented as custom, third-party crates.
599-
The compiler will compile the proc `macro` crate and specially annotated
600-
functions in them (i.e. the proc `macro` itself), passing them a stream of tokens.
601-
602-
The proc `macro` can then transform the token stream and output a new token
603-
stream, which is synthesized into the `AST`.
604-
605-
It's worth noting that the token stream type used by proc `macro`s is _stable_,
606-
so `rustc` does not use it internally (since our internal data structures are
607-
unstable). The compiler's token stream is
608-
[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is
609-
converted into the stable [`proc_macro::TokenStream`][stablets] and back in
593+
Procedural `macro`s are also expanded during parsing. However, rather than
594+
having a parser in the compiler, `proc macro`s are implemented as custom,
595+
third-party crates. The compiler will compile the `proc macro` crate and
596+
specially annotated functions in it (i.e. the `proc macro` itself), passing
597+
them a stream of tokens. A `proc macro` can then transform the token stream and
598+
output a new token stream, which is synthesized into the `AST`.
599+
600+
The token stream type used by `proc macro`s is _stable_, so `rustc` does not
601+
use it internally. The compiler's (unstable) token stream is defined in
602+
[`rustc_ast::tokenstream::TokenStream`][rustcts]. This is converted into the
603+
stable [`proc_macro::TokenStream`][stablets] and back in
610604
[`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms].
611-
Because the Rust ABI is unstable, we use the C ABI for this conversion.
605+
Since the Rust ABI is currently unstable, we use the C ABI for this conversion.
612606

613607
[tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html
614608
[rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
@@ -617,10 +611,17 @@ Because the Rust ABI is unstable, we use the C ABI for this conversion.
617611
[pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html
618612
[`ParseResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html
619613

620-
TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
614+
<!-- TODO(rylev): more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->
621615

622616
### Custom Derive
623617

624-
Custom derives are a special type of proc `macro`.
618+
Custom derives are a special type of `proc macro`.
619+
620+
### Macros By Example and Macros 2.0
621+
622+
There is a legacy and mostly undocumented effort to improve the `MBE` system
623+
by giving it more hygiene-related features, better scoping and visibility
624+
rules, etc. Internally these `macro`s use the same machinery as today's `MBE`s,
625+
with some additional syntactic sugar, and they are allowed to be in namespaces.
625626

626-
TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
627+
<!-- TODO(rylev): more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->
