@@ -331,9 +331,11 @@ a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an int
331
331
[ am ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
332
332
333
333
For built-in ` macro ` s, we use the context:
334
- ` SyntaxContext::empty().apply_mark(expn_id) ` , and such ` macro ` s are considered to
335
- be defined at the hierarchy root. We do the same for ` proc macro ` s because we
336
- haven't implemented cross-crate hygiene yet.
334
+ [ ` SyntaxContext::empty().apply_mark(expn_id) ` ] , and such ` macro ` s are
335
+ considered to be defined at the hierarchy root. We do the same for `proc
336
+ macro`s because we haven't implemented cross-crate hygiene yet.
337
+
338
+ [ `SyntaxContext::empty().apply_mark(expn_id)` ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
337
339
338
340
If the token had context ` X ` before being produced by a ` macro ` then after being
339
341
produced by the ` macro ` it has context ` X -> macro_id ` . Here are some examples:
@@ -346,12 +348,11 @@ macro m() { ident }
346
348
m!();
347
349
```
348
350
349
- Here ` ident ` originally has context [ ` SyntaxContext::root ` ] [ scr ] . ` ident ` has
351
+ Here ` ident ` , which initially has context [ ` SyntaxContext::root ` ] [ scr ] , has
350
352
context ` ROOT -> id(m) ` after it's produced by ` m ` .
351
353
352
354
[ scr ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root
353
355
354
-
355
356
Example 1:
356
357
357
358
``` rust,ignore
@@ -360,7 +361,8 @@ macro m() { macro n() { ident } }
360
361
m!();
361
362
n!();
362
363
```
363
- In this example the ` ident ` has context ` ROOT ` originally, then ` ROOT -> id(m) `
364
+
365
+ In this example the ` ident ` has context ` ROOT ` initially, then ` ROOT -> id(m) `
364
366
after the first expansion, then ` ROOT -> id(m) -> id(n) ` .
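
The rule "context `X` becomes `X -> macro_id`" can be sketched with a toy model. This is only an illustration: `ToyContext` and its methods are hypothetical names, and the real `SyntaxContext` in `rustc_span` is an interned index, not a vector of names.

```rust
// Toy model only; not the rustc_span API.
#[derive(Clone, Debug, PartialEq)]
struct ToyContext {
    // Expansion ids applied so far, oldest first (names stand in for ids).
    marks: Vec<&'static str>,
}

impl ToyContext {
    // The context of a token written directly in source code.
    fn root() -> Self {
        ToyContext { marks: Vec::new() }
    }

    // Producing a token from a macro appends that macro's id to the chain.
    fn apply_mark(&self, expn: &'static str) -> Self {
        let mut marks = self.marks.clone();
        marks.push(expn);
        ToyContext { marks }
    }

    // Render the chain in the `ROOT -> id(m)` notation used in this chapter.
    fn render(&self) -> String {
        let mut out = String::from("ROOT");
        for mark in &self.marks {
            out.push_str(" -> id(");
            out.push_str(mark);
            out.push(')');
        }
        out
    }
}

fn main() {
    let ident = ToyContext::root();
    let after_m = ident.apply_mark("m");
    let after_n = after_m.apply_mark("n");
    println!("{}", after_n.render());
}
```

Each call returns a new value rather than mutating in place, which mirrors the way a token's earlier context stays valid for other tokens that share it.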
365
367
366
368
Example 2:
@@ -377,11 +379,11 @@ m!(foo);
377
379
After all expansions, ` foo ` has context ` ROOT -> id(n) ` and ` bar ` has context
378
380
` ROOT -> id(m) -> id(n) ` .
379
381
380
- Finally, one last thing to mention is that currently, this hierarchy is subject
381
- to the [ "context transplantation hack"] [ hack ] . Basically, the more modern (and
382
- experimental) ` macro ` ` macro ` s have stronger hygiene than the older MBE system,
383
- but this can result in weird interactions between the two. The hack is intended
384
- to make things "just work" for now.
382
+ Currently this hierarchy for tracking ` macro ` definitions is subject to the
383
+ so-called [ "context transplantation hack"] [ hack ] . Modern (i.e. experimental)
384
+ ` macro ` s have stronger hygiene than the legacy "Macros By Example" ( ` MBE ` )
385
+ system, which can result in weird interactions between the two. The hack is
386
+ intended to make things "just work" for now.
385
387
386
388
[ `ExpnId` ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
387
389
[ hack ] : https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
@@ -390,7 +392,8 @@ to make things "just work" for now.
390
392
391
393
The third and final hierarchy tracks the location of ` macro ` invocations.
392
394
393
- In this hierarchy [ ` ExpnData::call_site ` ] [ callsite ] is the child -> parent link.
395
+ In this hierarchy [ ` ExpnData::call_site ` ] [ callsite ] is the ` child -> parent `
396
+ link.
394
397
395
398
[ callsite ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site
396
399
@@ -420,20 +423,22 @@ Above, we saw how the output of a `macro` is integrated into the `AST` for a cra
420
423
and we also saw how the hygiene data for a crate is generated. But how do we
421
424
actually produce the output of a ` macro ` ? It depends on the type of ` macro ` .
422
425
423
- There are two types of ` macro ` s in Rust:
424
- ` macro_rules! ` ` macro ` s (a.k.a. "Macros By Example" (MBE)) and procedural ` macro ` s
425
- (or "proc ` macro ` s"; including custom derives). During the parsing phase, the normal
426
- Rust parser will set aside the contents of ` macro ` s and their invocations. Later,
427
- ` macro ` s are expanded using these portions of the code.
426
+ There are two types of ` macro ` s in Rust:
427
+ 1 . ` macro_rules! ` macros, and,
428
+ 2 . procedural ` macro ` s (` proc macro ` s), including custom derives.
429
+
430
+ During the parsing phase, the normal Rust parser will set aside the contents of
431
+ ` macro ` s and their invocations. Later, ` macro ` s are expanded using these
432
+ portions of the code.
428
433
429
434
Some important data structures/interfaces here:
430
435
- [ ` SyntaxExtension ` ] - a lowered ` macro ` representation, contains its expander
431
- function, which transforms a ` TokenStream ` or ` AST ` into another ` TokenStream `
432
- or ` AST ` + some additional data like stability, or a list of unstable features
433
- allowed inside the ` macro ` .
436
+ function, which transforms a [ ` TokenStream ` ] or ` AST ` into another
437
+ [ ` TokenStream ` ] or ` AST ` + some additional data like stability, or a list of
438
+ unstable features allowed inside the ` macro ` .
434
439
- [ ` SyntaxExtensionKind ` ] - expander functions may have several different
435
440
signatures (take one token stream, or two, or a piece of ` AST ` , etc). This is
436
- an enum that lists them.
441
+ an ` enum ` that lists them.
437
442
- [ ` BangProcMacro ` ] /[ ` TTMacroExpander ` ] /[ ` AttrProcMacro ` ] /[ ` MultiItemModifier ` ] -
438
443
` trait ` s representing the expander function signatures.
439
444
@@ -446,18 +451,15 @@ Some important data structures/interfaces here:
446
451
447
452
## Macros By Example
448
453
449
- MBEs have their own parser distinct from the normal Rust parser. When ` macro ` s
450
- are expanded, we may invoke the MBE parser to parse and expand a ` macro ` . The
451
- MBE parser, in turn, may call the normal Rust parser when it needs to bind a
452
- metavariable (e.g. ` $my_expr ` ) while parsing the contents of a ` macro `
454
+ ` MBE ` s have their own parser distinct from the Rust parser. When ` macro ` s are
455
+ expanded, we may invoke the ` MBE ` parser to parse and expand a ` macro ` . The
456
+ ` MBE ` parser, in turn, may call the Rust parser when it needs to bind a
457
+ metavariable (e.g. ` $my_expr ` ) while parsing the contents of a ` macro `
453
458
invocation. The code for ` macro ` expansion is in
454
459
[ ` compiler/rustc_expand/src/mbe/ ` ] [ code_dir ] .
455
460
456
461
### Example
457
462
458
- It's helpful to have an example to refer to. For the remainder of this chapter,
459
- whenever we refer to the "example _ definition_ ", we mean the following:
460
-
461
463
``` rust,ignore
462
464
macro_rules! printer {
463
465
(print $mvar:ident) => {
@@ -470,41 +472,41 @@ macro_rules! printer {
470
472
}
471
473
```
472
474
473
- ` $mvar ` is called a _ metavariable_ . Unlike normal variables, rather than
474
- binding to a value in a computation , a metavariable binds _ at compile time_ to
475
- a tree of _ tokens_ . A _ token_ is a single "unit" of the grammar, such as an
475
+ Here ` $mvar ` is called a _ metavariable_ . Unlike normal variables, rather than
476
+ binding to a value _ at runtime _ , a metavariable binds _ at compile time_ to a
477
+ tree of _ tokens_ . A _ token_ is a single "unit" of the grammar, such as an
476
478
identifier (e.g. ` foo ` ) or punctuation (e.g. ` => ` ). There are also other
477
- special tokens, such as ` EOF ` , which indicates that there are no more tokens.
478
- Token trees resulting from paired parentheses-like characters (` ( ` ...` ) ` ,
479
- ` [ ` ...` ] ` , and ` { ` ...` } ` ) – they include the open and close and all the tokens
480
- in between (we do require that parentheses-like characters be balanced). Having
481
- ` macro ` expansion operate on token streams rather than the raw bytes of a source
482
- file abstracts away a lot of complexity. The ` macro ` expander (and much of the
483
- rest of the compiler) doesn't really care that much about the exact line and
484
- column of some syntactic construct in the code; it cares about what constructs
485
- are used in the code. Using tokens allows us to care about _ what_ without
486
- worrying about _ where_ . For more information about tokens, see the
487
- [ Parsing] [ parsing ] chapter of this book.
488
-
489
- Whenever we refer to the "example _ invocation_ ", we mean the following snippet:
479
+ special tokens, such as ` EOF ` , which itself indicates that there are no more
480
+ tokens. There are token trees resulting from the paired parentheses-like
481
+ characters (` ( ` ...` ) ` , ` [ ` ...` ] ` , and ` { ` ...` } ` ) – they include the open and
482
+ close and all the tokens in between (Rust requires that parentheses-like
483
+ characters be balanced). Having ` macro ` expansion operate on token streams
484
+ rather than the raw bytes of a source file abstracts away a lot of complexity.
485
+ The ` macro ` expander (and much of the rest of the compiler) doesn't consider
486
+ the exact line and column of some syntactic construct in the code; it considers
487
+ which constructs are used in the code. Using tokens allows us to care about
488
+ _ what_ without worrying about _ where_ . For more information about tokens, see
489
+ the [ Parsing] [ parsing ] chapter of this book.
490
490
491
491
``` rust,ignore
492
- printer!(print foo); // Assume `foo` is a variable defined somewhere else...
492
+ printer!(print foo); // `foo` is a variable
493
493
```
494
494
495
495
The process of expanding the ` macro ` invocation into the syntax tree
496
- ` println!("{}", foo) ` and then expanding that into a call to ` Display::fmt ` is
497
- called _ ` macro ` expansion _ , and it is the topic of this chapter .
496
+ ` println!("{}", foo) ` and then expanding the syntax tree into a call to
497
+ ` Display::fmt ` is one common example of _ ` macro ` expansion _ .
498
498
499
499
### The MBE parser
500
500
501
- There are two parts to MBE expansion: parsing the definition and parsing the
502
- invocations. Interestingly, both are done by the ` macro ` parser.
501
+ There are two parts to ` MBE ` expansion done by the ` macro ` parser:
502
+ 1 . parsing the definition, and,
503
+ 2 . parsing the invocations.
503
504
504
- Basically, the MBE parser is like an NFA-based regex parser. It uses an
505
- algorithm similar in spirit to the [ Earley parsing
506
- algorithm] ( https://en.wikipedia.org/wiki/Earley_parser ) . The ` macro ` parser is
507
- defined in [ ` compiler/rustc_expand/src/mbe/macro_parser.rs ` ] [ code_mp ] .
505
+ We think of the ` MBE ` parser as a nondeterministic finite automaton (NFA) based
506
+ regex parser since it uses an algorithm similar in spirit to the [ Earley
507
+ parsing algorithm] ( https://en.wikipedia.org/wiki/Earley_parser ) . The ` macro `
508
+ parser is defined in
509
+ [ ` compiler/rustc_expand/src/mbe/macro_parser.rs ` ] [ code_mp ] .
508
510
509
511
The interface of the ` macro ` parser is as follows (this is slightly simplified):
510
512
@@ -518,97 +520,89 @@ fn parse_tt(
518
520
519
521
We use these items in ` macro ` parser:
520
522
521
- - ` parser ` is a reference to the state of a normal Rust parser, including the
522
- token stream and parsing session. The token stream is what we are about to
523
- ask the MBE parser to parse. We will consume the raw stream of tokens and
524
- output a binding of metavariables to corresponding token trees. The parsing
525
- session can be used to report parser errors.
526
- - ` matcher ` is a sequence of ` MatcherLoc ` s that we want to match
523
+ - a ` parser ` variable is a reference to the state of a normal Rust parser,
524
+ including the token stream and parsing session. The token stream is what we
525
+ are about to ask the ` MBE ` parser to parse. We will consume the raw stream of
526
+ tokens and output a binding of metavariables to corresponding token trees.
527
+ The parsing session can be used to report parser errors.
528
+ - a ` matcher ` variable is a sequence of [ ` MatcherLoc ` ] s that we want to match
527
529
the token stream against. They're converted from token trees before matching.
528
530
529
- In the analogy of a regex parser, the token stream is the input and we are matching it
530
- against the pattern ` matcher ` . Using our examples, the token stream could be the stream of
531
- tokens containing the inside of the example invocation ` print foo ` , while ` matcher `
532
- might be the sequence of token (trees) ` print $mvar:ident ` .
531
+ [ `MatcherLoc` ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html
532
+
533
+ In the analogy of a regex parser, the token stream is the input and we are
534
+ matching it against the pattern defined by ` matcher ` . Using our examples, the
535
+ token stream could be the stream of tokens containing the inside of the example
536
+ invocation ` print foo ` , while ` matcher ` might be the sequence of token (trees)
537
+ ` print $mvar:ident ` .
533
538
534
539
The output of the parser is a [ ` ParseResult ` ] , which indicates which of
535
540
three cases has occurred:
536
541
537
- - Success: the token stream matches the given ` matcher ` , and we have produced a binding
538
- from metavariables to the corresponding token trees.
539
- - Failure: the token stream does not match ` matcher ` . This results in an error message such as
540
- "No rule expected token _ blah _ ".
541
- - Error: some fatal error has occurred _ in the parser_ . For example, this
542
- happens if there is more than one pattern match, since that indicates
543
- the ` macro ` is ambiguous.
542
+ - ** Success** : the token stream matches the given ` matcher ` and we have produced a
543
+ binding from metavariables to the corresponding token trees.
544
+ - ** Failure** : the token stream does not match ` matcher ` and results in an error
545
+ message such as "No rule expected token ... ".
546
+ - ** Error** : some fatal error has occurred _ in the parser_ . For example, this
547
+ happens if there is more than one pattern match, since that indicates the
548
+ ` macro ` is ambiguous.
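
The three outcomes can be sketched with a much simplified matcher. Everything here is a hypothetical stand-in: `ToyResult`, `ToyLoc`, and `toy_parse` are not rustc types, every token is a plain string, a metavariable binds exactly one token, and the `Error` case is triggered by a duplicated binding name rather than by an ambiguous match.

```rust
use std::collections::HashMap;

// Toy stand-ins for `ParseResult` and `MatcherLoc`; not the rustc definitions.
#[derive(Debug, PartialEq)]
enum ToyResult {
    Success(HashMap<String, String>),
    Failure(String),
    Error(String),
}

enum ToyLoc {
    // A literal token that must appear verbatim, e.g. `print`.
    Token(&'static str),
    // A metavariable that binds a single token, e.g. `$mvar`.
    MetaVar(&'static str),
}

fn toy_parse(input: &[&str], matcher: &[ToyLoc]) -> ToyResult {
    let mut bindings = HashMap::new();
    let mut tokens = input.iter();
    for loc in matcher {
        let tok = match tokens.next() {
            Some(tok) => tok,
            None => return ToyResult::Failure("unexpected end of input".to_string()),
        };
        match loc {
            ToyLoc::Token(expected) => {
                if tok != expected {
                    return ToyResult::Failure(format!("no rule expected token `{}`", tok));
                }
            }
            ToyLoc::MetaVar(name) => {
                // Binding the same name twice is treated as a fatal error.
                if bindings.insert(name.to_string(), tok.to_string()).is_some() {
                    return ToyResult::Error(format!("duplicated bind name: {}", name));
                }
            }
        }
    }
    // Leftover tokens mean this matcher did not cover the whole input.
    if let Some(extra) = tokens.next() {
        return ToyResult::Failure(format!("no rule expected token `{}`", extra));
    }
    ToyResult::Success(bindings)
}

fn main() {
    let matcher = [ToyLoc::Token("print"), ToyLoc::MetaVar("mvar")];
    println!("{:?}", toy_parse(&["print", "foo"], &matcher));
    println!("{:?}", toy_parse(&["print", "twice", "foo"], &matcher));
}
```

Keeping `Failure` separate from `Error` lets a caller move on to the next arm after a `Failure` but stop immediately on an `Error`.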
544
549
545
550
The full interface is defined [ here] [ code_parse_int ] .
546
551
547
- The ` macro ` parser does pretty much exactly the same as a normal regex parser with
548
- one exception: in order to parse different types of metavariables, such as
549
- ` ident ` , ` block ` , ` expr ` , etc., the ` macro ` parser must sometimes call back to the
550
- normal Rust parser.
551
-
552
- As mentioned above, both definitions and invocations of ` macro ` s are parsed using
553
- the ` macro ` parser. This is extremely non-intuitive and self-referential. The code
554
- to parse ` macro ` _ definitions _ is in
555
- [ ` compiler/rustc_expand/src/mbe/macro_rules.rs ` ] [ code_mr ] . It defines the pattern for
556
- matching for a ` macro ` definition as ` $( $lhs:tt => $rhs:tt );+ ` . In other words,
557
- a ` macro_rules ` definition should have in its body at least one occurrence of a
558
- token tree followed by ` => ` followed by another token tree. When the compiler
559
- comes to a ` macro_rules ` definition, it uses this pattern to match the two token
560
- trees per rule in the definition of the ` macro ` _ using the ` macro ` parser itself_ .
561
- In our example definition, the metavariable ` $lhs ` would match the patterns of
562
- both arms: ` (print $mvar:ident)` and ` (print twice $mvar:ident) ` . And ` $rhs `
563
- would match the bodies of both arms: ` { println!("{}", $mvar); } ` and `{
564
- println!("{}", $mvar); println!("{}", $mvar); } `. The parser would keep this
565
- knowledge around for when it needs to expand a ` macro ` invocation.
552
+ The ` macro ` parser does pretty much exactly the same as a normal regex parser
553
+ with one exception: in order to parse different types of metavariables, such as
554
+ ` ident ` , ` block ` , ` expr ` , etc., the ` macro ` parser must call back to the normal
555
+ Rust parser. Both the definition and invocation of ` macro ` s are parsed using
556
+ the parser in a process which is non-intuitively self-referential.
557
+
558
+ The code to parse ` macro ` _ definitions _ is in
559
+ [ ` compiler/rustc_expand/src/mbe/macro_rules.rs ` ] [ code_mr ] . It defines the
560
+ pattern for matching a ` macro ` definition as ` $( $lhs:tt => $rhs:tt );+ ` . In
561
+ other words, a ` macro_rules ` definition should have in its body at least one
562
+ occurrence of a token tree followed by ` => ` followed by another token tree.
563
+ When the compiler comes to a ` macro_rules ` definition, it uses this pattern to
564
+ match the two token trees per rule in the definition of the ` macro ` , _ thereby
565
+ utilizing the ` macro ` parser itself_ . In our example definition, the
566
+ metavariable ` $lhs ` would match the patterns of both arms: `(print
567
+ $mvar: ident )` and ` (print twice $mvar: ident )` . And ` $rhs` would match the
568
+ bodies of both arms: ` { println!("{}", $mvar); } ` and `{ println!("{}", $mvar);
569
+ println!("{}", $mvar); } `. The parser keeps this knowledge around for when it
570
+ needs to expand a ` macro ` invocation.
566
571
567
572
When the compiler comes to a ` macro ` invocation, it parses that invocation using
568
- the same NFA-based ` macro ` parser that is described above. However, the matcher
573
+ the NFA-based ` macro ` parser described above. However, the ` matcher ` variable
569
574
used is the first token tree (` $lhs ` ) extracted from the arms of the ` macro `
570
575
_ definition_ . Using our example, we would try to match the token stream `print
571
- foo` from the invocation against the matchers ` print $mvar: ident ` and ` print
572
- twice $mvar: ident ` that we previously extracted from the definition. The
576
+ foo` from the invocation against the ` matcher ` s ` print $mvar: ident ` and ` print
577
+ twice $mvar: ident ` that we previously extracted from the definition. The
573
578
algorithm is exactly the same, but when the ` macro ` parser comes to a place in the
574
- current matcher where it needs to match a _ non-terminal_ (e.g. ` $mvar:ident ` ),
579
+ current ` matcher ` where it needs to match a _ non-terminal_ (e.g. ` $mvar:ident ` ),
575
580
it calls back to the normal Rust parser to get the contents of that
576
581
non-terminal. In this case, the Rust parser would look for an ` ident ` token,
577
582
which it finds (` foo ` ) and returns to the ` macro ` parser. Then, the ` macro ` parser
578
- proceeds in parsing as normal. Also, note that exactly one of the matchers from
583
+ proceeds in parsing as normal. Also, note that exactly one of the ` matcher ` s from
579
584
the various arms should match the invocation; if there is more than one match,
580
585
the parse is ambiguous, while if there are no matches at all, there is a syntax
581
586
error.
582
587
583
588
For more information about the ` macro ` parser's implementation, see the comments
584
589
in [ ` compiler/rustc_expand/src/mbe/macro_parser.rs ` ] [ code_mp ] .
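
To see arm selection from the user's side, here is a small sketch. The macro `describe!` is a hypothetical variant of the example definition that returns a `String` instead of printing, so that the chosen arm can be observed. The helper functions are likewise only for illustration.

```rust
// Hypothetical variant of `printer!`: same matchers, but the bodies build a
// `String` so the selected arm is visible in the result.
macro_rules! describe {
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}{}", $mvar, $mvar)
    };
}

fn expand_once() -> String {
    let foo = "ab";
    // The token stream `print foo` matches the first matcher.
    describe!(print foo)
}

fn expand_twice() -> String {
    let foo = "ab";
    // `print twice foo` fails the first matcher (a token is left over after
    // `$mvar` binds `twice`) and matches the second one.
    describe!(print twice foo)
}

fn main() {
    println!("{}", expand_once());
    println!("{}", expand_twice());
}
```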
585
590
586
- ### ` macro ` s and Macros 2.0
587
-
588
- There is an old and mostly undocumented effort to improve the MBE system, give
589
- it more hygiene-related features, better scoping and visibility rules, etc. There
590
- hasn't been a lot of work on this recently, unfortunately. Internally, ` macro `
591
- ` macro ` s use the same machinery as today's MBEs; they just have additional
592
- syntactic sugar and are allowed to be in namespaces.
593
-
594
591
## Procedural Macros
595
592
596
- Procedural ` macro ` s are also expanded during parsing, as mentioned above.
597
- However, they use a rather different mechanism. Rather than having a parser in
598
- the compiler, procedural ` macro ` s are implemented as custom, third-party crates.
599
- The compiler will compile the proc ` macro ` crate and specially annotated
600
- functions in them (i.e. the proc ` macro ` itself), passing them a stream of tokens.
601
-
602
- The proc ` macro ` can then transform the token stream and output a new token
603
- stream, which is synthesized into the ` AST ` .
604
-
605
- It's worth noting that the token stream type used by proc ` macro ` s is _ stable_ ,
606
- so ` rustc ` does not use it internally (since our internal data structures are
607
- unstable). The compiler's token stream is
608
- [ ` rustc_ast::tokenstream::TokenStream ` ] [ rustcts ] , as previously. This is
609
- converted into the stable [ ` proc_macro::TokenStream ` ] [ stablets ] and back in
593
+ Procedural ` macro ` s are also expanded during parsing. However, rather than
594
+ having a parser in the compiler, ` proc macro ` s are implemented as custom,
595
+ third-party crates. The compiler will compile the ` proc macro ` crate and
596
+ specially annotated functions in them (i.e. the ` proc macro ` itself), passing
597
+ them a stream of tokens. A ` proc macro ` can then transform the token stream and
598
+ output a new token stream, which is synthesized into the ` AST ` .
599
+
600
+ The token stream type used by ` proc macro ` s is _ stable_ , so ` rustc ` does not
601
+ use it internally. The compiler's (unstable) token stream is defined in
602
+ [ ` rustc_ast::tokenstream::TokenStream ` ] [ rustcts ] . This is converted into the
603
+ stable [ ` proc_macro::TokenStream ` ] [ stablets ] and back in
610
604
[ ` rustc_expand::proc_macro ` ] [ pm ] and [ ` rustc_expand::proc_macro_server ` ] [ pms ] .
611
- Because the Rust ABI is unstable, we use the C ABI for this conversion.
605
+ Since the Rust ABI is currently unstable, we use the C ABI for this conversion.
612
606
613
607
[ tsmod ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html
614
608
[ rustcts ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
@@ -617,10 +611,17 @@ Because the Rust ABI is unstable, we use the C ABI for this conversion.
617
611
[ pms ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html
618
612
[ `ParseResult` ] : https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html
619
613
620
- TODO: more here. [ #1160 ] ( https://github.com/rust-lang/rustc-dev-guide/issues/1160 )
614
+ <!-- TODO(rylev) : more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->
621
615
622
616
### Custom Derive
623
617
624
- Custom derives are a special type of proc ` macro ` .
618
+ Custom derives are a special type of ` proc macro ` .
619
+
620
+ ### Macros By Example and Macros 2.0
621
+
622
+ There is a legacy and mostly undocumented effort to improve the ` MBE ` system
623
+ by giving it more hygiene-related features, better scoping and visibility
624
+ rules, etc. Internally this uses the same machinery as today's ` MBE ` s with some
625
+ additional syntactic sugar, and such ` macro ` s are allowed to be in namespaces.
625
626
626
- TODO: more? [ #1160 ] ( https://github.com/rust-lang/rustc-dev-guide/issues/1160 )
627
+ <!-- TODO(rylev) : more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->