Implement pattern-based formatter for rust-code on top of rust-analyzer #1665

Open
@matklad

Goal

We need to implement a Rust code formatter on top of ra_syntax's concrete syntax tree.
Long-term, I am almost sure that we should just port rustfmt, with two modifications:

  • it should be able to format an isolated syntax node (as opposed to formatting the whole module)
  • it should be able to format code even in the presence of severe syntax errors

However, I think it would be valuable to experiment a bit in the short to medium term!

There are two kinds of code formatters out there:

  • pretty printers, which more or less completely rewrite the code, disregarding existing whitespace.
  • pattern-based formatters, which fix whitespace between tokens and indentation, but don't generally introduce or remove newlines.

Rustfmt, JavaScript's prettier and Python's black are examples of the first breed.
Gofmt and IntelliJ formatter are examples of the second kind.

At https://users.rust-lang.org/t/how-are-you-using-rustfmt-and-clippy/31082/20?u=matklad, some folks suggested that they prefer gofmt style to rustfmt.
So I think it makes sense to just implement that on top of rust-analyzer, as an experiment.
To be clear, the goal is not to implement a different code style from rustfmt's: code already formatted by rustfmt should not be changed by ra-fmt.
However, ra-fmt should be more flexible about line breaks, allowing one to lay out code manually in a more readable way than rustfmt's style permits.

Actual Task

So, how does one actually implement a rule-based formatter?

The core idea is to implement formatting as a set of rules like "the => token should be surrounded by exactly one space inside a match arm", then walk through the syntax tree and apply the matching rules to each node.

So, a formatter is typically split into a DSL for defining rules and an engine for applying them.
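
To make the rules/engine split concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: `Kind`, `Node`, `SpacingRule` and `apply_rules` are hypothetical names, and the node is a flat list of tokens rather than a real ra_syntax tree. The rule table plays the role of the DSL, and `apply_rules` plays the role of the engine. Gaps that contain a newline are left alone, since a pattern-based formatter generally does not touch line breaks.

```rust
/// Hypothetical node kinds; a real formatter would use the syntax tree's own kinds.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    MatchArm,
    LetStmt,
}

/// Stand-in for a syntax node: tokens plus the whitespace between them.
/// `gaps[i]` is the whitespace between `tokens[i]` and `tokens[i + 1]`.
struct Node {
    kind: Kind,
    tokens: Vec<String>,
    gaps: Vec<String>,
}

impl Node {
    fn new(kind: Kind, tokens: &[&str], gaps: &[&str]) -> Node {
        assert_eq!(tokens.len(), gaps.len() + 1, "need one gap per token pair");
        Node {
            kind,
            tokens: tokens.iter().map(|t| t.to_string()).collect(),
            gaps: gaps.iter().map(|g| g.to_string()).collect(),
        }
    }

    /// Interleave tokens and gaps back into text.
    fn render(&self) -> String {
        let mut out = String::new();
        for (i, token) in self.tokens.iter().enumerate() {
            out.push_str(token);
            if let Some(gap) = self.gaps.get(i) {
                out.push_str(gap);
            }
        }
        out
    }
}

/// The "DSL" part: inside `parent`, `token` gets this many spaces on each side.
struct SpacingRule {
    parent: Kind,
    token: &'static str,
    before: usize,
    after: usize,
}

/// The "engine" part: apply every rule whose parent kind matches the node.
fn apply_rules(node: &mut Node, rules: &[SpacingRule]) {
    let kind = node.kind;
    for rule in rules.iter().filter(move |r| r.parent == kind) {
        for i in 0..node.tokens.len() {
            if node.tokens[i] != rule.token {
                continue;
            }
            if i > 0 {
                set_gap(&mut node.gaps[i - 1], rule.before);
            }
            if i < node.gaps.len() {
                set_gap(&mut node.gaps[i], rule.after);
            }
        }
    }
}

/// Rewrite a gap only if it stays on one line; line breaks are preserved as-is.
fn set_gap(gap: &mut String, spaces: usize) {
    if !gap.contains('\n') {
        *gap = " ".repeat(spaces);
    }
}
```

With a single rule for `=>` in `Kind::MatchArm`, a node holding `x=>   1` renders as `x => 1`, while the same tokens under `Kind::LetStmt` are left untouched.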

Examples

For example, here's the set of rules for Kotlin:

https://github.com/JetBrains/kotlin/tree/7d173ed3856e429739b55d8c3892e1b85ca41571/idea/formatter/src/org/jetbrains/kotlin/idea/formatter

Here's the DSL definition from IntelliJ:

https://github.com/JetBrains/intellij-community/tree/5c3f54c486856d513c33088a76c7805d312d7ae7/platform/lang-api/src/com/intellij/formatting

And here's IntelliJ's engine:

https://github.com/JetBrains/intellij-community/tree/5c3f54c486856d513c33088a76c7805d312d7ae7/platform/lang-impl/src/com/intellij/formatting/engine

On a smaller scale, all three parts are implemented in nixpkgs-fmt:

https://github.com/nix-community/nixpkgs-fmt

And of course there's a set of rules that IntelliJ Rust is using:

https://github.com/intellij-rust/intellij-rust/tree/ccf1c54db8e77db90764e32651c8503564a05bc0/src/main/kotlin/org/rust/ide/formatter

Fmt Model

While one can apply formatting operations directly to the syntax tree, I think it's a good idea to introduce an intermediate formatting model tree.
The benefits are:

  • whitespace gets an abstract representation
  • empty whitespace is explicitly represented
  • rust-specific: the fmt tree can be made single-threaded and mutable
  • sometimes it's useful to give the fmt tree a slightly different shape than the syntax tree
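
A rough sketch of what such a model could look like follows. All names here (`Whitespace`, `FmtElement`, `FmtBlock`, `from_pieces`) are hypothetical, and the builder takes a flat list of lexed pieces instead of a real syntax tree, so treat it as an illustration of the first three benefits rather than a proposed API. Whitespace is stored as counts instead of text, a `Space` element is inserted between every pair of adjacent tokens even when the source has none, and the tree is a plain owned structure that can be mutated in place.

```rust
/// Abstract whitespace: counts instead of raw text.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Whitespace {
    newlines: usize,
    spaces: usize,
}

impl Whitespace {
    /// Count newlines, and the spaces that follow the last newline.
    fn from_text(text: &str) -> Whitespace {
        let newlines = text.matches('\n').count();
        let last_line = text.rsplit('\n').next().unwrap_or("");
        let spaces = last_line.chars().filter(|c| *c == ' ').count();
        Whitespace { newlines, spaces }
    }

    fn render(&self) -> String {
        format!("{}{}", "\n".repeat(self.newlines), " ".repeat(self.spaces))
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum FmtElement {
    Token(String),
    Space(Whitespace),
    Block(FmtBlock),
}

/// A block of the fmt tree; `kind` is just a label in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FmtBlock {
    kind: &'static str,
    children: Vec<FmtElement>,
}

impl FmtBlock {
    /// Build a flat block from lexed pieces. Whitespace-only pieces become
    /// `Space` elements, and an empty `Space` is inserted where the source
    /// has no whitespace between two tokens.
    fn from_pieces(kind: &'static str, pieces: &[&str]) -> FmtBlock {
        let mut children = Vec::new();
        let mut pending: Option<Whitespace> = None;
        for piece in pieces {
            if piece.chars().all(char::is_whitespace) {
                pending = Some(Whitespace::from_text(piece));
                continue;
            }
            let gap = pending.take().unwrap_or(Whitespace { newlines: 0, spaces: 0 });
            if !children.is_empty() {
                children.push(FmtElement::Space(gap));
            }
            children.push(FmtElement::Token(piece.to_string()));
        }
        FmtBlock { kind, children }
    }

    fn render(&self) -> String {
        self.children
            .iter()
            .map(|child| match child {
                FmtElement::Token(text) => text.clone(),
                FmtElement::Space(ws) => ws.render(),
                FmtElement::Block(block) => block.render(),
            })
            .collect()
    }
}
```

Because every gap exists as an element, a spacing rule never has to insert or delete nodes; it only changes the counts stored in an existing `Space`.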

A good place to look at would be this line:

https://github.com/intellij-rust/intellij-rust/blob/ccf1c54db8e77db90764e32651c8503564a05bc0/src/main/kotlin/org/rust/ide/formatter/RsFormattingModelBuilder.kt#L27

Uncomment it to see what the actual model looks like.

For

fn main() {
    frobnicate()
        .foo().bar()
}

I get

  <block FILE 0:51 <Indent: NONE>>
    <block FUNCTION 0:51 <Indent: NONE>>
      "fn" 0:2 <Indent: NONE>
      spacing: <Spacing: minSpaces=1 maxSpaces=1 minLineFeeds=0>
      "main" 3:7 <Indent: NONE>
      spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
      <block VALUE_PARAMETER_LIST 7:9 <Indent: NONE>>
        "(" 7:8 <Indent: NONE>
        spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
        ")" 8:9 <Indent: NONE>
      spacing: <Spacing: minSpaces=1 maxSpaces=1 minLineFeeds=0>
      <block BLOCK 10:51 <Indent: NONE>>
        "{" 10:11 <Indent: NONE>
        spacing: <DependantSpacing: minSpaces=1 maxSpaces=1 minLineFeeds=0 dep=(10,51)>
        <block DOT_EXPR 16:49 <Indent: NORMAL>>
          <block DOT_EXPR 16:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
            <block CALL_EXPR 16:28 <Indent: CONTINUATION_WITHOUT_FIRST>>
              <block PATH_EXPR 16:26 <Indent: CONTINUATION_WITHOUT_FIRST>>
                <block PATH 16:26 <Indent: CONTINUATION_WITHOUT_FIRST>>
                  "frobnicate" 16:26 <Indent: NONE>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              <block VALUE_ARGUMENT_LIST 26:28 <Indent: CONTINUATION_WITHOUT_FIRST>>
                "(" 26:27 <Indent: NONE>
                spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                ")" 27:28 <Indent: NONE>
            spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
            <block . 37:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
              "." 37:38 <Indent: CONTINUATION_WITHOUT_FIRST>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              <block METHOD_CALL 38:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
                "foo" 38:41 <Indent: NONE>
                spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                <block VALUE_ARGUMENT_LIST 41:43 <Indent: NONE>>
                  "(" 41:42 <Indent: NONE>
                  spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                  ")" 42:43 <Indent: NONE>
          spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
          <block . 43:49 <Indent: CONTINUATION_WITHOUT_FIRST>>
            "." 43:44 <Indent: CONTINUATION_WITHOUT_FIRST>
            spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
            <block METHOD_CALL 44:49 <Indent: CONTINUATION_WITHOUT_FIRST>>
              "bar" 44:47 <Indent: NONE>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              <block VALUE_ARGUMENT_LIST 47:49 <Indent: NONE>>
                "(" 47:48 <Indent: NONE>
                spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                ")" 48:49 <Indent: NONE>
        spacing: <DependantSpacing: minSpaces=1 maxSpaces=1 minLineFeeds=0 dep=(10,51)>
        "}" 50:51 <Indent: NONE>

Note how the stuff after . is split into its own block!

The model in nixpkgs-fmt uses explicit whitespace, but uses existing syntax nodes for blocks.
I feel like for Rust we should also add fmt-blocks, precisely to deal with the chained calls problem.
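
The `spacing:` entries in the dump above can be read as constraints on the existing whitespace rather than as replacement text. The sketch below shows one possible reading; it is my simplification and not IntelliJ's actual semantics. The names `Spacing`, `Gap` and `apply` are hypothetical. A gap that stays on one line has its space count clamped into the allowed range, and a gap that contains a line break keeps its spaces, on the assumption that a later indentation pass owns them.

```rust
/// Constraint between two adjacent elements, mirroring the fields in the dump.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Spacing {
    min_spaces: usize,
    max_spaces: usize,
    min_line_feeds: usize,
}

/// The whitespace actually present between two elements.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Gap {
    line_feeds: usize,
    spaces: usize,
}

impl Spacing {
    /// Adjust an existing gap so that it satisfies the constraint.
    /// Assumes `min_spaces <= max_spaces`; `clamp` panics otherwise.
    fn apply(self, existing: Gap) -> Gap {
        // Existing line breaks are kept; we only add them when required.
        let line_feeds = existing.line_feeds.max(self.min_line_feeds);
        if line_feeds > 0 {
            // Spaces after a line break are indentation, which is left
            // to a separate pass in this sketch.
            Gap { line_feeds, spaces: existing.spaces }
        } else {
            Gap {
                line_feeds: 0,
                spaces: existing.spaces.clamp(self.min_spaces, self.max_spaces),
            }
        }
    }
}
```

Under this reading, the `minSpaces=0 maxSpaces=0` constraint between `frobnicate()` and `.foo()` does not pull `.foo()` back onto the first line, because the existing line break is preserved.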

Passes

After the fmt model is built, you run several passes over it, each of which fixes a particular aspect of formatting.

  • first, spacing between the tokens is enforced
  • second, indentation is fixed
  • optionally, an alignment pass adds whitespace such that stuff on different lines aligns (we probably don't need this for Rust)
  • finally, we render the results of formatting as one of:
    • fully formatted string
    • a diff of some form
    • fully formatted syntax tree
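
The pass structure above can be sketched over a flat model as follows. This is a toy under stated assumptions: `Item`, `wanted_spaces`, `spacing_pass`, `indent_pass` and `render` are hypothetical names, the spacing rules are three hard-coded examples, indentation is driven only by `{` and `}`, and only the "fully formatted string" rendering is shown.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Item {
    Token(String),
    Space { newlines: usize, spaces: usize },
}

fn tok(text: &str) -> Item {
    Item::Token(text.to_string())
}

fn sp(newlines: usize, spaces: usize) -> Item {
    Item::Space { newlines, spaces }
}

/// Hypothetical spacing rules: `None` means "keep whatever is there".
fn wanted_spaces(left: &str, right: &str) -> Option<usize> {
    match (left, right) {
        ("(", _) | (_, ")") => Some(0),
        (_, "{") => Some(1),
        _ => None,
    }
}

/// Pass 1: enforce spacing between tokens that sit on the same line.
fn spacing_pass(items: &mut [Item]) {
    for i in 1..items.len().saturating_sub(1) {
        let wanted = match (&items[i - 1], &items[i + 1]) {
            (Item::Token(left), Item::Token(right)) => wanted_spaces(left, right),
            _ => None,
        };
        if let (Item::Space { newlines: 0, spaces }, Some(n)) = (&mut items[i], wanted) {
            *spaces = n;
        }
    }
}

/// Pass 2: fix indentation after each existing line break.
fn indent_pass(items: &mut [Item], indent_width: usize) {
    let mut depth: usize = 0;
    for i in 0..items.len() {
        // A closing brace is indented one level less than the block body.
        let next_closes = matches!(items.get(i + 1), Some(Item::Token(t)) if t.as_str() == "}");
        match &mut items[i] {
            Item::Token(t) if t.as_str() == "{" => depth += 1,
            Item::Token(t) if t.as_str() == "}" => depth = depth.saturating_sub(1),
            Item::Space { newlines, spaces } if *newlines > 0 => {
                let level = if next_closes { depth.saturating_sub(1) } else { depth };
                *spaces = level * indent_width;
            }
            _ => {}
        }
    }
}

/// Final step: render the model as a fully formatted string.
fn render(items: &[Item]) -> String {
    items
        .iter()
        .map(|item| match item {
            Item::Token(text) => text.clone(),
            Item::Space { newlines, spaces } => {
                format!("{}{}", "\n".repeat(*newlines), " ".repeat(*spaces))
            }
        })
        .collect()
}
```

Neither pass adds or removes a line break: the spacing pass skips gaps that contain one, and the indentation pass only changes the spaces that follow one.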

Impl Plan

The rough plan is:

  • build the testing harness for this. Take a look at the inline tests in nixpkgs-fmt; I find them very valuable
  • build the formatting model
  • build the spacing DSL, a pass that enforces spacing, and a couple of simple spacing rules
  • build the indentation DSL, a pass that fixes indentation, and a couple of simple indentation rules
  • add rules for the full rust language

This should live in the ra_fmt crate and use the syntax tree from the ra_syntax crate.
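
For the first step of the plan, a before/after test helper might look roughly like the sketch below. The names `check` and `trim_trailing_ws` are hypothetical, and `trim_trailing_ws` is only a stand-in for the real formatter, which does not exist yet. The helper does not mirror how nixpkgs-fmt lays out its inline tests. Besides comparing the output with the expectation, it re-formats the result, since already formatted code should not be changed.

```rust
/// Run one before/after case against a formatter and report a readable error.
fn check(formatter: impl Fn(&str) -> String, before: &str, expected: &str) -> Result<(), String> {
    let actual = formatter(before);
    if actual != expected {
        return Err(format!(
            "formatting mismatch\n  before:   {:?}\n  expected: {:?}\n  actual:   {:?}",
            before, expected, actual
        ));
    }
    // Formatting the output again must be a no-op.
    let again = formatter(&actual);
    if again != actual {
        return Err(format!(
            "formatter is not idempotent\n  first:  {:?}\n  second: {:?}",
            actual, again
        ));
    }
    Ok(())
}

/// Stand-in formatter for the sketch: strips trailing whitespace on each line.
fn trim_trailing_ws(source: &str) -> String {
    source
        .lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
}
```

A case is then a one-liner such as `check(trim_trailing_ws, "fn main() {}   ", "fn main() {}")`, which keeps adding a test for every new rule cheap.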

Metadata

    Labels

    A-formatting: formatting of r-a output/formatting on save
    E-hard
    E-has-instructions: Issue has some instructions and pointers to code to get started
    S-actionable: Someone could pick this issue up and work on it right now
    fun: A technically challenging issue with high impact
    good first issue
