
Commit f8531ef

Update README for exact types (#7600)
Mention that ref.func and allocation instructions are always typed as exact internally. Clarify that binary writing will generalize types as necessary. Remove some other outdated information about the supported text format and strings as well.
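The commit message says that `ref.func` and allocation results are typed as exact inside the IR, and that the binary writer generalizes types to whatever the enabled features can express. The Python sketch below is a toy model of that rule, restricted to function references. `RefType`, `Features`, and `generalize_for_writing` are hypothetical names, not Binaryen's C++ API. The text notation is illustrative only. The GC-without-Custom-Descriptors branch (keep the typed non-nullable reference, drop exactness) is an assumption inferred from the README text, not taken from Binaryen's source.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RefType:
    """Toy model of a Wasm function reference type (hypothetical, not Binaryen's API)."""
    heap: str       # "func", or the name of a defined function type such as "$sig"
    nullable: bool
    exact: bool

    def text(self) -> str:
        # Illustrative text notation, e.g. "(ref null func)" or "(ref (exact $sig))".
        heap = f"(exact {self.heap})" if self.exact else self.heap
        null = "null " if self.nullable else ""
        return f"(ref {null}{heap})"


@dataclass(frozen=True)
class Features:
    """Hypothetical feature flags; Reference Types is assumed to always be enabled."""
    gc: bool = False
    custom_descriptors: bool = False


def generalize_for_writing(t: RefType, features: Features) -> RefType:
    """Weaken an IR function reference type until the enabled features can encode it."""
    if not features.gc:
        # Reference Types alone can only express plain, nullable funcref.
        return RefType("func", nullable=True, exact=False)
    if not features.custom_descriptors:
        # Assumption: GC keeps the typed, non-nullable reference but cannot express exactness.
        return replace(t, exact=False)
    return t


# IR type of `ref.func $f`, where $f is declared with function type $sig:
# always exact and non-nullable, whatever features are enabled.
ref_func = RefType("$sig", nullable=False, exact=True)

for features in (Features(), Features(gc=True), Features(gc=True, custom_descriptors=True)):
    print(features, "->", generalize_for_writing(ref_func, features).text())
```

The IR type never changes in this model. Only the written type is weakened: with Reference Types alone the sketch yields `(ref null func)`, with GC it yields `(ref $sig)`, and with both GC and Custom Descriptors the type stays `(ref (exact $sig))`.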
1 parent d175e45 commit f8531ef


1 file changed: +72 -62 lines changed


README.md

```diff
@@ -72,117 +72,127 @@ There are a few differences between Binaryen IR and the WebAssembly language:
 * Binaryen IR [is a tree][binaryen_ir], i.e., it has hierarchical structure,
 for convenience of optimization. This differs from the WebAssembly binary
 format which is a stack machine.
-* Consequently Binaryen's text format allows only s-expressions.
-WebAssembly's official text format is primarily a linear instruction list
-(with s-expression extensions). Binaryen can't read the linear style, but
-it can read a wasm text file if it contains only s-expressions.
+
 * Binaryen uses Stack IR to optimize "stacky" code (that can't be
 represented in structured form).
+
 * When stacky code must be represented in Binaryen IR, such as with
 multivalue instructions and blocks, it is represented with tuple types that
 do not exist in the WebAssembly language. In addition to multivalue
 instructions, locals and globals can also have tuple types in Binaryen IR
 but not in WebAssembly. Experiments show that better support for
 multivalue could enable useful but small code size savings of 1-3%, so it
 has not been worth changing the core IR structure to support it better.
+
 * Block input values (currently only supported in `catch` blocks in the
 exception handling feature) are represented as `pop` subexpressions.
+
 * Types and unreachable code
+
 * WebAssembly limits block/if/loop types to none and the concrete value types
 (i32, i64, f32, f64). Binaryen IR has an unreachable type, and it allows
 block/if/loop to take it, allowing [local transforms that don't need to
 know the global context][unreachable]. As a result, Binaryen's default
 text output is not necessarily valid wasm text. (To get valid wasm text,
 you can do `--generate-stack-ir --print-stack-ir`, which prints Stack IR,
 this is guaranteed to be valid for wasm parsers.)
-* Binaryen ignores unreachable code when reading WebAssembly binaries. That
-means that if you read a wasm file with unreachable code, that code will be
-discarded as if it were optimized out (often this is what you want anyhow,
-and optimized programs have no unreachable code anyway, but if you write an
-unoptimized file and then read it, it may look different). The reason for
-this behavior is that unreachable code in WebAssembly has corner cases that
-are tricky to handle in Binaryen IR (it can be very unstructured, and
-Binaryen IR is more structured than WebAssembly as noted earlier). Note
-that Binaryen does support unreachable code in .wat text files, since as we
-saw Binaryen only supports s-expressions there, which are structured.
+
 * Binaryen supports a `stringref` type. This is similar to the currently-
-frozen [stringref proposal], with the difference that the string type is a
+inactive [stringref proposal], with the difference that the string type is a
 subtype of `externref` rather than `anyref`. Doing so allows toolchains to
 emit code in a form that uses [js string builtins] which Binaryen can then
 "lift" into stringref in its internal IR, optimize (for example, a
 concatenation of "a" and "b" can be optimized at compile time to "ab"), and
 then "lower" that into js string builtins once more.
+
 * Blocks
-* Binaryen IR has only one node that contains a variable-length list of
-operands: the block. WebAssembly on the other hand allows lists in loops,
-if arms, and the top level of a function. Binaryen's IR has a single
-operand for all non-block nodes; this operand may of course be a block.
-The motivation for this property is that many passes need special code
-for iterating on lists, so having a single IR node with a list simplifies
-them.
-* As in wasm, blocks and loops may have names. Branch targets in the IR are
-resolved by name (as opposed to nesting depth). This has 2 consequences:
+
+* Binaryen IR has only one control flow structure that contains a
+variable-length list of children: the block. WebAssembly on the other hand
+allows all control flow structures, such as loops, if arms, and function
+bodies, to have multiple children. In Binaryen IR, these other control flow
+structures have a single child. This child may of course be a block. The
+motivation for this property is that many passes need special code for
+iterating on lists of instructions, so having a single IR node with a list
+simplifies them.
+
+* As in the Wasm text format, blocks and loops may have names. Branch targets
+in the IR are resolved by name (as opposed to nesting depth). This has 2
+consequences:
+
 * Blocks without names may not be branch targets.
+
 * Names are required to be unique. (Reading .wat files with duplicate names
 is supported; the names are modified when the IR is constructed).
-* As an optimization, a block that is the child of a loop (or if arm, or
-function toplevel) and which has no branches targeting it will not be
-emitted when generating wasm. Instead its list of operands will be directly
-used in the containing node. Such a block is sometimes called an "implicit
-block".
+
+* As an optimization, a block with no name, which can never be a branch
+target, will not be emitted when generating wasm. Instead its list of
+children will be directly used in the containing control flow structure.
+Such a block is sometimes called an "implicit block".
+
 * Reference Types
+
 * The wasm text and binary formats require that a function whose address is
 taken by `ref.func` must be either in the table, or declared via an
 `(elem declare func $..)`. Binaryen will emit that data when necessary, but
 it does not represent it in IR. That is, IR can be worked on without needing
 to think about declaring function references.
-* Binaryen IR allows non-nullable locals in the form that the wasm spec does,
-(which was historically nicknamed "1a"), in which a `local.get` must be
-structurally dominated by a `local.set` in order to validate (that ensures
-we do not read the default value of null). Despite being aligned with the
-wasm spec, there are some minor details that you may notice:
+
+* Binaryen IR allows non-nullable locals in the form that the Wasm spec does,
+in which a `local.get` must be structurally dominated by a `local.set` in
+order to validate (that ensures we do not read the default value of null).
+Despite being aligned with the Wasm spec, there are some minor details that
+you may notice:
+
 * A nameless `Block` in Binaryen IR does not interfere with validation.
 Nameless blocks are never emitted into the binary format (we just emit
-their contents), so we ignore them for purposes of non-nullable locals. As
-a result, if you read wasm text emitted by Binaryen then you may see what
-seems to be code that should not validate per the spec (and may not
-validate in wasm text parsers), but that difference will not exist in the
-binary format (binaries emitted by Binaryen will always work everywhere,
-aside for bugs of course).
+their contents), so we ignore them for purposes of validating
+non-nullable locals. As a result, if you read wasm text emitted by
+Binaryen then you may see what seems to be code that should not validate
+per the spec (and may not validate in Wasm text parsers), but that
+difference will not exist in the binary format (binaries emitted by
+Binaryen will always work everywhere, aside from bugs of course).
+
 * The Binaryen pass runner will automatically fix up validation after each
 pass (finding things that do not validate and fixing them up, usually by
 demoting a local to be nullable). As a result you do not need to worry
 much about this when writing Binaryen passes. For more details see the
 `requiresNonNullableLocalFixups()` hook in `pass.h` and the
 `LocalStructuralDominance` class.
+
 * Binaryen IR uses the most refined types possible for references,
 specifically:
-* The IR type of a `ref.func` is always a specific function type, and not
-plain `funcref`. It is also non-nullable.
+
+* The IR type of a `ref.func` is always an exact, non-nullable reference to
+a defined function type, and not plain `funcref`, even if no features
+beyond basic reference types are enabled.
+
+* The IR type of allocation instructions such as `struct.new` or
+`array.new` is always an exact reference, even if Custom Descriptors are
+not enabled.
+
 * Non-nullable types are also used for the type that `try_table` sends
 on branches (if we branch, a null is never sent), that is, it sends
 (ref exn) and not (ref null exn).
-In both cases if GC is not enabled then we emit the less-refined type in the
-binary. When reading a binary, the more refined types will be applied as we
-build the IR.
+
+* As a result, non-nullable and exact references are generally allowed in
+the IR even when GC or Custom Descriptors is not enabled. When reading a
+binary, the more refined types will be applied as we build the IR.
+
+In all cases the binary writer will generalize the type as necessary for
+the enabled feature set. For example, if only Reference Types is enabled,
+all function reference types will be emitted as `funcref`.
+
 * `br_if` output types are more refined in Binaryen IR: they have the type of
-the value, when a value flows in. In the wasm spec the type is that of the
-branch target, which may be less refined. Using the more refined type here
-ensures that we optimize in the best way possible, using all the type
-information, but it does mean that some roundtripping operations may look a
-little different. In particular, when we emit a `br_if` whose type is more
-refined in Binaryen IR then we emit a cast right after it, so that the
-output has the right type in the wasm spec. That may cause a few bytes of
-extra size in rare cases (we avoid this overhead in the common case where
-the `br_if` value is unused).
-* Strings
-* Binaryen allows string views (`stringview_wtf16` etc.) to be cast using
-`ref.cast`. This simplifies the IR, as it allows `ref.cast` to always be
-used in all places (and it is lowered to `ref.as_non_null` where possible
-in the optimizer). The stringref spec does not seem to allow this though,
-and to fix that the binary writer will replace `ref.cast` that casts a
-string view to a non-nullable type to `ref.as_non_null`. A `ref.cast` of a
-string view that is a no-op is skipped entirely.
+the sent value operand, when it exists. In the Wasm spec the type is that
+of the branch target, which may be less refined. Using the more refined
+type here ensures that we optimize in the best way possible, using all the
+type information, but it does mean that some roundtripping operations may
+look a little different. In particular, when we emit a `br_if` whose type
+is more refined in Binaryen IR, then we emit a cast right after it to
+recover the more refined type. That may cause a few bytes of extra size in
+rare cases (we avoid this overhead in the common case where the `br_if`
+value is unused).
 
 As a result, you might notice that round-trip conversions (wasm => Binaryen IR
 => wasm) change code a little in some corner cases.
```
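The updated README text in this diff says that a block with no name can never be a branch target, so it is not emitted; its children are written directly into the containing control flow structure (an "implicit block"). A toy Python model of that emission rule follows. It assumes a tree made only of blocks, loops, and leaf instructions. `Block`, `Loop`, and `emit` are hypothetical names that do not correspond to Binaryen's C++ classes or its binary writer, and the instruction strings are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Block:
    """The only node in this toy IR with a variable-length list of children."""
    name: Optional[str]
    children: List[Node]


@dataclass
class Loop:
    """Other control flow structures have exactly one child, often a Block."""
    name: Optional[str]
    body: Node


# A plain string stands in for any non-control-flow instruction, e.g. "call $f".
Node = Union[Block, Loop, str]


def emit(node: Node, out: Optional[List[str]] = None) -> List[str]:
    """Flatten the tree into a linear instruction list, eliding nameless blocks."""
    if out is None:
        out = []
    if isinstance(node, str):
        out.append(node)
    elif isinstance(node, Block):
        if node.name is None:
            # Implicit block: it can never be a branch target, so write only its children.
            for child in node.children:
                emit(child, out)
        else:
            out.append(f"block {node.name}")
            for child in node.children:
                emit(child, out)
            out.append("end")
    elif isinstance(node, Loop):
        out.append(f"loop {node.name}" if node.name else "loop")
        emit(node.body, out)
        out.append("end")
    return out


# A function body (itself a nameless block) holding a loop whose single child is
# a nameless block, followed by a named block.
body = Block(None, [
    Loop("$l", Block(None, ["call $f", "local.get $cond", "br_if $l"])),
    Block("$exit", ["call $g"]),
])
print("\n".join(emit(body)))
```

Only the named `$exit` block produces a `block`/`end` pair. The nameless function-body block and the nameless block under the loop leave no trace in the output, which is one reason a round trip through Binaryen IR can change how code looks.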
