Description
Goal
We need to implement a Rust code formatter on top of ra_syntax's concrete syntax tree.
Long-term, I am almost sure that we should just port rustfmt, with two modifications:
- it should be able to format an isolated syntax node (as opposed to formatting the whole module)
- it should be able to format code even in the presence of severe syntax errors
However, I think it would be valuable to experiment a bit short-medium term!
There are two kinds of code formatters out there:
- pretty printers, which more or less completely rewrite the code, disregarding existing whitespace.
- pattern-based formatters, which fix whitespace between tokens and indentation, but don't generally introduce or remove newlines.
Rustfmt, JavaScript's prettier and Python's black are examples of the first breed.
Gofmt and IntelliJ formatter are examples of the second kind.
At https://users.rust-lang.org/t/how-are-you-using-rustfmt-and-clippy/31082/20?u=matklad, some folks suggested that they prefer gofmt style to rustfmt.
So I think it makes sense to just implement that on top of rust-analyzer, as an experiment.
To be clear, the goal is not to implement a different code style for rustfmt: code already formatted by rustfmt should not be changed by ra-fmt.
However, ra-fmt should be more flexible about line breaks, allowing you to lay out code manually in a more readable way than rustfmt's style permits.
Actual Task
So, how does one actually implement a rule-based formatter?
The core idea is to implement formatting as a set of rules like "the => token should be surrounded by exactly one whitespace inside a match arm", then walk through each syntax node and apply the matching rules to each node.
So, a formatter is typically split into a DSL for defining rules and an engine for applying them.
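To make the DSL/engine split concrete, here is a minimal sketch. Everything in it is hypothetical: Kind, Token, SpacingRule, inside, anywhere and apply_rules are made-up names for illustration, not ra_syntax or IntelliJ APIs, and the tokens are a flat list rather than a real syntax tree. The engine only rewrites whitespace that does not contain a newline, which is the pattern-based behaviour described above.

```rust
/// Token and node kinds for this sketch only; a real implementation would
/// presumably reuse the syntax kinds from ra_syntax.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind { MatchArm, FatArrow, Comma, Ident }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Side { Before, After }

/// One rule: "`spaces` spaces on `side` of `token`, optionally only inside `parent`".
#[derive(Clone, Copy, Debug)]
pub struct SpacingRule {
    parent: Option<Kind>, // None matches any parent
    token: Kind,
    side: Side,
    spaces: usize,
}

/// The "DSL": small constructors that read like the rule they describe.
pub struct Scope(Option<Kind>);

pub fn inside(parent: Kind) -> Scope { Scope(Some(parent)) }
pub fn anywhere() -> Scope { Scope(None) }

impl Scope {
    pub fn before(self, token: Kind, spaces: usize) -> SpacingRule {
        SpacingRule { parent: self.0, token, side: Side::Before, spaces }
    }
    pub fn after(self, token: Kind, spaces: usize) -> SpacingRule {
        SpacingRule { parent: self.0, token, side: Side::After, spaces }
    }
}

/// A token together with its parent kind and the whitespace found before it.
#[derive(Clone, Copy, Debug)]
pub struct Token {
    pub kind: Kind,
    pub parent: Kind,
    pub text: &'static str,
    pub leading_ws: &'static str,
}

impl SpacingRule {
    fn matches(&self, side: Side, tok: &Token) -> bool {
        self.side == side
            && self.token == tok.kind
            && self.parent.map_or(true, |p| p == tok.parent)
    }
}

/// The "engine": walk adjacent token pairs and apply the first matching rule.
/// Whitespace containing a newline is left alone, as is whitespace no rule
/// talks about. The whitespace before the first token is dropped.
pub fn apply_rules(tokens: &[Token], rules: &[SpacingRule]) -> String {
    let mut out = String::new();
    for (i, tok) in tokens.iter().enumerate() {
        if i > 0 {
            let prev = &tokens[i - 1];
            let rule = rules
                .iter()
                .find(|r| r.matches(Side::After, prev) || r.matches(Side::Before, tok));
            match rule {
                Some(r) if !tok.leading_ws.contains('\n') => {
                    out.push_str(&" ".repeat(r.spaces))
                }
                _ => out.push_str(tok.leading_ws),
            }
        }
        out.push_str(tok.text);
    }
    out
}
```

With the rules inside(Kind::MatchArm).before(Kind::FatArrow, 1) and inside(Kind::MatchArm).after(Kind::FatArrow, 1), the tokens of a match arm written as "x=>   y" come out as "x => y", while a "=>" that the user put on its own line stays there.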
Examples
For example, here's the set of rules for Kotlin:
Here's the DSL definition from IntelliJ:
And here's IntelliJ's engine:
On a smaller scale, all three parts are implemented in nixpkgs-fmt:
https://github.com/nix-community/nixpkgs-fmt
And of course there's a set of rules that IntelliJ Rust is using:
Fmt Model
While one can apply formatting operations directly to the syntax tree, I think it's a good idea to introduce an intermediate formatting model tree.
The benefits are:
- whitespace gets an abstract representation
- empty whitespace is explicitly represented
- rust-specific: the fmt tree can be made single-threaded and mutable
- sometimes it's useful to make the fmt tree of a slightly different shape than the syntax tree
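As a rough illustration of these points, here is a minimal sketch of such a model. The names (Whitespace, Element, Block, chain_model) are hypothetical and not taken from IntelliJ, nixpkgs-fmt or ra_syntax. Whitespace is stored as counts rather than text, every child carries a whitespace slot even when it is empty, the tree is plain owned data that can be mutated in place, and the DOT_CALL blocks are fmt-only groupings that need not exist in the syntax tree.

```rust
/// Abstract whitespace: a number of newlines followed by a number of spaces.
/// `Whitespace::NONE` is the explicit representation of "no whitespace".
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Whitespace {
    pub newlines: usize,
    pub spaces: usize,
}

impl Whitespace {
    pub const NONE: Whitespace = Whitespace { newlines: 0, spaces: 0 };

    pub fn render(self) -> String {
        format!("{}{}", "\n".repeat(self.newlines), " ".repeat(self.spaces))
    }
}

#[derive(Debug)]
pub enum Element {
    Token(&'static str),
    Block(Block),
}

/// A block owns its children; each child is stored with the whitespace that
/// precedes it, so there is always a slot to edit, even between `)` and `.`.
#[derive(Debug)]
pub struct Block {
    pub name: &'static str,
    pub children: Vec<(Whitespace, Element)>,
}

impl Block {
    pub fn new(name: &'static str) -> Block {
        Block { name, children: Vec::new() }
    }

    pub fn with(mut self, ws: Whitespace, child: Element) -> Block {
        self.children.push((ws, child));
        self
    }

    pub fn render(&self) -> String {
        let mut out = String::new();
        for (ws, child) in &self.children {
            out.push_str(&ws.render());
            match child {
                Element::Token(text) => out.push_str(text),
                Element::Block(block) => out.push_str(&block.render()),
            }
        }
        out
    }
}

/// Hypothetical model for `frobnicate().foo().bar()`: each `.name()` is its
/// own fmt block, regardless of how the syntax tree nests the calls.
pub fn chain_model() -> Block {
    let dot_call = |name: &'static str| {
        Element::Block(
            Block::new("DOT_CALL")
                .with(Whitespace::NONE, Element::Token("."))
                .with(Whitespace::NONE, Element::Token(name))
                .with(Whitespace::NONE, Element::Token("()")),
        )
    };
    Block::new("CHAIN")
        .with(Whitespace::NONE, Element::Token("frobnicate"))
        .with(Whitespace::NONE, Element::Token("()"))
        .with(Whitespace::NONE, dot_call("foo"))
        .with(Whitespace::NONE, dot_call("bar"))
}
```

Because the empty whitespace before each DOT_CALL block is a real value, moving a call to its own line is a single assignment such as model.children[2].0 = Whitespace { newlines: 1, spaces: 8 }, with no tree surgery.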
A good place to look at would be this line:
Uncomment it to see what the actual model looks like.
For
fn main() {
    frobnicate()
        .foo().bar()
}
I get
<block FILE 0:51 <Indent: NONE>>
  <block FUNCTION 0:51 <Indent: NONE>>
    "fn" 0:2 <Indent: NONE>
    spacing: <Spacing: minSpaces=1 maxSpaces=1 minLineFeeds=0>
    "main" 3:7 <Indent: NONE>
    spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
    <block VALUE_PARAMETER_LIST 7:9 <Indent: NONE>>
      "(" 7:8 <Indent: NONE>
      spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
      ")" 8:9 <Indent: NONE>
    spacing: <Spacing: minSpaces=1 maxSpaces=1 minLineFeeds=0>
    <block BLOCK 10:51 <Indent: NONE>>
      "{" 10:11 <Indent: NONE>
      spacing: <DependantSpacing: minSpaces=1 maxSpaces=1 minLineFeeds=0 dep=(10,51)>
      <block DOT_EXPR 16:49 <Indent: NORMAL>>
        <block DOT_EXPR 16:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
          <block CALL_EXPR 16:28 <Indent: CONTINUATION_WITHOUT_FIRST>>
            <block PATH_EXPR 16:26 <Indent: CONTINUATION_WITHOUT_FIRST>>
              <block PATH 16:26 <Indent: CONTINUATION_WITHOUT_FIRST>>
                "frobnicate" 16:26 <Indent: NONE>
            spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
            <block VALUE_ARGUMENT_LIST 26:28 <Indent: CONTINUATION_WITHOUT_FIRST>>
              "(" 26:27 <Indent: NONE>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              ")" 27:28 <Indent: NONE>
          spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
          <block . 37:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
            "." 37:38 <Indent: CONTINUATION_WITHOUT_FIRST>
            spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
            <block METHOD_CALL 38:43 <Indent: CONTINUATION_WITHOUT_FIRST>>
              "foo" 38:41 <Indent: NONE>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              <block VALUE_ARGUMENT_LIST 41:43 <Indent: NONE>>
                "(" 41:42 <Indent: NONE>
                spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
                ")" 42:43 <Indent: NONE>
        spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
        <block . 43:49 <Indent: CONTINUATION_WITHOUT_FIRST>>
          "." 43:44 <Indent: CONTINUATION_WITHOUT_FIRST>
          spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
          <block METHOD_CALL 44:49 <Indent: CONTINUATION_WITHOUT_FIRST>>
            "bar" 44:47 <Indent: NONE>
            spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
            <block VALUE_ARGUMENT_LIST 47:49 <Indent: NONE>>
              "(" 47:48 <Indent: NONE>
              spacing: <Spacing: minSpaces=0 maxSpaces=0 minLineFeeds=0>
              ")" 48:49 <Indent: NONE>
      spacing: <DependantSpacing: minSpaces=1 maxSpaces=1 minLineFeeds=0 dep=(10,51)>
      "}" 50:51 <Indent: NONE>
Note how the stuff after "." is split into its own block!
The model in nixpkgs-fmt uses explicit whitespace, but uses existing syntax nodes for blocks.
I feel like for Rust we should also add fmt-blocks, precisely to deal with the chained calls problem.
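The spacing entries in the dump above are constraints (minSpaces, maxSpaces, minLineFeeds) rather than fixed strings. The following sketch shows one way such a constraint could be applied to the whitespace found in the source. It is my reading of the dump, not IntelliJ's actual implementation: the Spacing and Found types and the apply method are hypothetical, and the choice to keep existing line breaks and leave their indentation to a later pass is an assumption.

```rust
/// A spacing constraint, mirroring the three fields shown in the dump.
/// Assumes `min_spaces <= max_spaces`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Spacing {
    pub min_spaces: usize,
    pub max_spaces: usize,
    pub min_line_feeds: usize,
}

/// Whitespace as found in the source: the number of line feeds, and the
/// number of spaces after the last line feed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Found {
    pub line_feeds: usize,
    pub spaces: usize,
}

impl Spacing {
    /// Adjust existing whitespace just enough to satisfy the constraint.
    pub fn apply(self, found: Found) -> Found {
        // Existing line breaks are kept; missing required ones are added.
        let line_feeds = found.line_feeds.max(self.min_line_feeds);
        if line_feeds > 0 {
            // After a line break the spaces are indentation, which is the
            // job of the indentation pass, so they are left untouched here.
            Found { line_feeds, spaces: found.spaces }
        } else {
            // On the same line, clamp the number of spaces into [min, max].
            let spaces = found.spaces.clamp(self.min_spaces, self.max_spaces);
            Found { line_feeds: 0, spaces }
        }
    }
}
```

Under these assumptions, the constraint minSpaces=0 maxSpaces=0 minLineFeeds=0 in front of the ".foo()" block removes stray spaces when the call is on the same line, but leaves a user-written line break alone. That is the flexibility about line breaks that the goal section asks for.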
Passes
After the fmt model is built, you run several passes over it, each of which fixes a particular aspect of formatting.
- first, spacing between the tokens is enforced
- second, indentation is fixed
- optionally, an alignment pass adds whitespace so that stuff on different lines aligns (we probably don't need this for Rust)
- finally, we render the results of formatting as one of:
  - fully formatted string
  - a diff of some form
  - fully formatted syntax tree
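The passes above can be sketched on a deliberately simplified model: a flat list where tokens alternate with explicit whitespace. This is an illustration under stated assumptions, not a design proposal. Piece and all function names are hypothetical, the spacing rules are hard-coded instead of coming from a DSL, and indentation is derived from brace depth only, so it does not handle continuation indents for chained calls.

```rust
/// One element of a flattened fmt model: tokens alternate with explicit whitespace.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Piece {
    Token(&'static str),
    Ws { newlines: usize, spaces: usize },
}

/// Pass 1: enforce spacing between tokens that sit on the same line.
/// Whitespace containing a newline, or matched by no rule, is left alone.
pub fn spacing_pass(model: &mut [Piece]) {
    for i in 1..model.len().saturating_sub(1) {
        if let (Piece::Token(prev), Piece::Ws { newlines: 0, .. }, Piece::Token(next)) =
            (model[i - 1], model[i], model[i + 1])
        {
            // Hard-coded, hypothetical rules standing in for a rule DSL.
            let spaces = match (prev, next) {
                ("(", _) | (_, "(") | (_, ")") | (".", _) | (_, ".") | (_, ";") => 0,
                ("fn", _) | (")", "{") => 1,
                _ => continue,
            };
            model[i] = Piece::Ws { newlines: 0, spaces };
        }
    }
}

/// Pass 2: fix indentation after every line break, four spaces per brace level.
pub fn indent_pass(model: &mut [Piece]) {
    let mut depth = 0usize;
    for i in 0..model.len() {
        match model[i] {
            Piece::Token("{") => depth += 1,
            Piece::Token("}") => depth = depth.saturating_sub(1),
            Piece::Ws { newlines, .. } if newlines > 0 => {
                // A closing brace is indented one level less than the block body.
                let closes = model.get(i + 1) == Some(&Piece::Token("}"));
                let level = if closes { depth.saturating_sub(1) } else { depth };
                model[i] = Piece::Ws { newlines, spaces: 4 * level };
            }
            _ => {}
        }
    }
}

/// Rendering option 1: the fully formatted string.
pub fn render(model: &[Piece]) -> String {
    let mut out = String::new();
    for piece in model {
        match *piece {
            Piece::Token(text) => out.push_str(text),
            Piece::Ws { newlines, spaces } => {
                out.push_str(&"\n".repeat(newlines));
                out.push_str(&" ".repeat(spaces));
            }
        }
    }
    out
}

/// Rendering option 2: a crude diff, listing (index, new piece) for every change.
pub fn diff(before: &[Piece], after: &[Piece]) -> Vec<(usize, Piece)> {
    let mut edits = Vec::new();
    for (i, (b, a)) in before.iter().zip(after).enumerate() {
        if b != a {
            edits.push((i, *a));
        }
    }
    edits
}

/// Hypothetical input, equivalent to the text "fn  main () {\nfoo ( ) ;\n  }".
pub fn messy_model() -> Vec<Piece> {
    let t = Piece::Token;
    let ws = |newlines, spaces| Piece::Ws { newlines, spaces };
    vec![
        t("fn"), ws(0, 2), t("main"), ws(0, 1), t("("), ws(0, 0), t(")"), ws(0, 1), t("{"),
        ws(1, 0), t("foo"), ws(0, 1), t("("), ws(0, 1), t(")"), ws(0, 1), t(";"),
        ws(1, 2), t("}"),
    ]
}
```

Running spacing_pass and then indent_pass over messy_model() and rendering the result gives "fn main() {\n    foo();\n}". Keeping the passes separate means each one only has to reason about one kind of whitespace.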
Impl Plan
The rough plan is:
- build the testing harness for this. Take a look at inline tests in nixpkgs-fmt, I find them very valuable
- build the formatting model
- build the spacing DSL, a pass that enforces spacing, and a couple of simple spacing rules
- build the indentation DSL, a pass that fixes indentation, and a couple of simple indentation rules
- add rules for the full Rust language
This should live in the ra_fmt crate and use the syntax tree from the ra_syntax crate.
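For the first step of the plan, a testing harness could start as a table of before/after cases. The sketch below is a guess at a useful shape, not the format nixpkgs-fmt uses: Case, failures and toy_format are hypothetical names, and toy_format is only a stand-in for the real formatter. Besides comparing output, it checks that already formatted code is left unchanged, which matches the goal that rustfmt-formatted code should not be touched.

```rust
/// One before/after test case.
pub struct Case {
    pub name: &'static str,
    pub input: &'static str,
    pub expected: &'static str,
}

/// Run every case through `fmt` and collect human-readable failure messages.
pub fn failures(fmt: impl Fn(&str) -> String, cases: &[Case]) -> Vec<String> {
    let mut failed = Vec::new();
    for case in cases {
        let actual = fmt(case.input);
        if actual != case.expected {
            failed.push(format!(
                "{}: expected {:?}, got {:?}",
                case.name, case.expected, actual
            ));
        }
        // Formatting already formatted code must be a no-op.
        let again = fmt(case.expected);
        if again != case.expected {
            failed.push(format!("{}: not idempotent, got {:?}", case.name, again));
        }
    }
    failed
}

/// Stand-in formatter for the sketch: collapses runs of spaces into one space.
pub fn toy_format(src: &str) -> String {
    let mut out = String::new();
    let mut prev_space = false;
    for c in src.chars() {
        if c == ' ' && prev_space {
            continue;
        }
        prev_space = c == ' ';
        out.push(c);
    }
    out
}
```

A test would then assert that failures(toy_format, &cases) is empty and print the messages otherwise.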