@@ -72,117 +72,127 @@ There are a few differences between Binaryen IR and the WebAssembly language:
* Binaryen IR [is a tree][binaryen_ir], i.e., it has hierarchical structure,
for convenience of optimization. This differs from the WebAssembly binary
format which is a stack machine.
- * Consequently Binaryen's text format allows only s-expressions.
- WebAssembly's official text format is primarily a linear instruction list
- (with s-expression extensions). Binaryen can't read the linear style, but
- it can read a wasm text file if it contains only s-expressions.
+
* Binaryen uses Stack IR to optimize "stacky" code (that can't be
represented in structured form).
+
* When stacky code must be represented in Binaryen IR, such as with
multivalue instructions and blocks, it is represented with tuple types that
do not exist in the WebAssembly language. In addition to multivalue
instructions, locals and globals can also have tuple types in Binaryen IR
but not in WebAssembly. Experiments show that better support for
multivalue could enable useful but small code size savings of 1-3%, so it
has not been worth changing the core IR structure to support it better.
+
* Block input values (currently only supported in `catch` blocks in the
exception handling feature) are represented as `pop` subexpressions.
+
* Types and unreachable code
+
* WebAssembly limits block/if/loop types to none and the concrete value types
(i32, i64, f32, f64). Binaryen IR has an unreachable type, and it allows
block/if/loop to take it, allowing [local transforms that don't need to
know the global context][unreachable]. As a result, Binaryen's default
text output is not necessarily valid wasm text. (To get valid wasm text,
you can do `--generate-stack-ir --print-stack-ir`, which prints Stack IR,
this is guaranteed to be valid for wasm parsers.)
- * Binaryen ignores unreachable code when reading WebAssembly binaries. That
- means that if you read a wasm file with unreachable code, that code will be
- discarded as if it were optimized out (often this is what you want anyhow,
- and optimized programs have no unreachable code anyway, but if you write an
- unoptimized file and then read it, it may look different). The reason for
- this behavior is that unreachable code in WebAssembly has corner cases that
- are tricky to handle in Binaryen IR (it can be very unstructured, and
- Binaryen IR is more structured than WebAssembly as noted earlier). Note
- that Binaryen does support unreachable code in .wat text files, since as we
- saw Binaryen only supports s-expressions there, which are structured.
+
* Binaryen supports a `stringref` type. This is similar to the currently-
- frozen [stringref proposal], with the difference that the string type is a
+ inactive [stringref proposal], with the difference that the string type is a
subtype of `externref` rather than `anyref`. Doing so allows toolchains to
emit code in a form that uses [js string builtins] which Binaryen can then
"lift" into stringref in its internal IR, optimize (for example, a
concatenation of "a" and "b" can be optimized at compile time to "ab"), and
then "lower" that into js string builtins once more.
+
* Blocks
- * Binaryen IR has only one node that contains a variable-length list of
- operands: the block. WebAssembly on the other hand allows lists in loops,
- if arms, and the top level of a function. Binaryen's IR has a single
- operand for all non-block nodes; this operand may of course be a block.
- The motivation for this property is that many passes need special code
- for iterating on lists, so having a single IR node with a list simplifies
- them.
- * As in wasm, blocks and loops may have names. Branch targets in the IR are
- resolved by name (as opposed to nesting depth). This has 2 consequences:
+
+ * Binaryen IR has only one control flow structure that contains a
+ variable-length list of children: the block. WebAssembly on the other hand
+ allows all control flow structures, such as loops, if arms, and function
+ bodies, to have multiple children. In Binaryen IR, these other control flow
+ structures have a single child. This child may of course be a block. The
+ motivation for this property is that many passes need special code for
+ iterating on lists of instructions, so having a single IR node with a list
+ simplifies them.
+
+ * As in the Wasm text format, blocks and loops may have names. Branch targets
+ in the IR are resolved by name (as opposed to nesting depth). This has 2
+ consequences:
+
* Blocks without names may not be branch targets.
+
* Names are required to be unique. (Reading .wat files with duplicate names
is supported; the names are modified when the IR is constructed).
- * As an optimization, a block that is the child of a loop (or if arm, or
- function toplevel) and which has no branches targeting it will not be
- emitted when generating wasm. Instead its list of operands will be directly
- used in the containing node. Such a block is sometimes called an "implicit
- block".
+
+ * As an optimization, a block with no name, which can never be a branch
+ target, will not be emitted when generating wasm. Instead its list of
+ children will be directly used in the containing control flow structure.
+ Such a block is sometimes called an "implicit block".
+
* Reference Types
+
* The wasm text and binary formats require that a function whose address is
taken by `ref.func` must be either in the table, or declared via an
`(elem declare func $..)`. Binaryen will emit that data when necessary, but
it does not represent it in IR. That is, IR can be worked on without needing
to think about declaring function references.
- * Binaryen IR allows non-nullable locals in the form that the wasm spec does,
- (which was historically nicknamed "1a"), in which a `local.get` must be
- structurally dominated by a `local.set` in order to validate (that ensures
- we do not read the default value of null). Despite being aligned with the
- wasm spec, there are some minor details that you may notice:
+
+ * Binaryen IR allows non-nullable locals in the form that the Wasm spec does,
+ in which a `local.get` must be structurally dominated by a `local.set` in
+ order to validate (that ensures we do not read the default value of null).
+ Despite being aligned with the Wasm spec, there are some minor details that
+ you may notice:
+
* A nameless `Block` in Binaryen IR does not interfere with validation.
Nameless blocks are never emitted into the binary format (we just emit
- their contents), so we ignore them for purposes of non-nullable locals. As
- a result, if you read wasm text emitted by Binaryen then you may see what
- seems to be code that should not validate per the spec (and may not
- validate in wasm text parsers), but that difference will not exist in the
- binary format (binaries emitted by Binaryen will always work everywhere,
- aside for bugs of course).
+ their contents), so we ignore them for purposes of validating
+ non-nullable locals. As a result, if you read wasm text emitted by
+ Binaryen then you may see what seems to be code that should not validate
+ per the spec (and may not validate in Wasm text parsers), but that
+ difference will not exist in the binary format (binaries emitted by
+ Binaryen will always work everywhere, aside from bugs of course).
+
* The Binaryen pass runner will automatically fix up validation after each
pass (finding things that do not validate and fixing them up, usually by
demoting a local to be nullable). As a result you do not need to worry
much about this when writing Binaryen passes. For more details see the
`requiresNonNullableLocalFixups()` hook in `pass.h` and the
`LocalStructuralDominance` class.
+
* Binaryen IR uses the most refined types possible for references,
specifically:
- * The IR type of a `ref.func` is always a specific function type, and not
- plain `funcref`. It is also non-nullable.
+
+ * The IR type of a `ref.func` is always an exact, non-nullable reference to
+ a defined function type, and not plain `funcref`, even if no features
+ beyond basic reference types are enabled.
+
+ * The IR type of allocation instructions such as `struct.new` or
+ `array.new` is always an exact reference, even if Custom Descriptors are
+ not enabled.
+
* Non-nullable types are also used for the type that `try_table` sends
on branches (if we branch, a null is never sent), that is, it sends
(ref exn) and not (ref null exn).
- In both cases if GC is not enabled then we emit the less-refined type in the
- binary. When reading a binary, the more refined types will be applied as we
- build the IR.
+
+ * As a result, non-nullable and exact references are generally allowed in
+ the IR even when GC or Custom Descriptors is not enabled. When reading a
+ binary, the more refined types will be applied as we build the IR.
+
+ In all cases the binary writer will generalize the type as necessary for
+ the enabled feature set. For example, if only Reference Types is enabled,
+ all function reference types will be emitted as `funcref`.
+
* `br_if` output types are more refined in Binaryen IR: they have the type of
- the value, when a value flows in. In the wasm spec the type is that of the
- branch target, which may be less refined. Using the more refined type here
- ensures that we optimize in the best way possible, using all the type
- information, but it does mean that some roundtripping operations may look a
- little different. In particular, when we emit a `br_if` whose type is more
- refined in Binaryen IR then we emit a cast right after it, so that the
- output has the right type in the wasm spec. That may cause a few bytes of
- extra size in rare cases (we avoid this overhead in the common case where
- the `br_if` value is unused).
- * Strings
- * Binaryen allows string views (`stringview_wtf16` etc.) to be cast using
- `ref.cast`. This simplifies the IR, as it allows `ref.cast` to always be
- used in all places (and it is lowered to `ref.as_non_null` where possible
- in the optimizer). The stringref spec does not seem to allow this though,
- and to fix that the binary writer will replace `ref.cast` that casts a
- string view to a non-nullable type to `ref.as_non_null`. A `ref.cast` of a
- string view that is a no-op is skipped entirely.
+ the sent value operand, when it exists. In the Wasm spec the type is that
+ of the branch target, which may be less refined. Using the more refined
+ type here ensures that we optimize in the best way possible, using all the
+ type information, but it does mean that some roundtripping operations may
+ look a little different. In particular, when we emit a `br_if` whose type
+ is more refined in Binaryen IR, then we emit a cast right after it to
+ recover the more refined type. That may cause a few bytes of extra size in
+ rare cases (we avoid this overhead in the common case where the `br_if`
+ value is unused).

As a result, you might notice that round-trip conversions (wasm => Binaryen IR
=> wasm) change code a little in some corner cases.
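
The "implicit block" rule described in the diff above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Instr`, `Block`, and `Loop` classes and the `emit` function are hypothetical names invented for this example and are not part of Binaryen's C++ API, and instructions are kept as opaque strings. It shows a block being the only node with a list of children, a loop having a single child, and a nameless block being spliced into its parent when output is generated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Instr:
    """A leaf instruction kept as opaque text (hypothetical toy node)."""
    text: str


@dataclass
class Block:
    """The only toy node that holds a variable-length list of children."""
    children: List["Node"] = field(default_factory=list)
    name: Optional[str] = None  # None: the block can never be a branch target


@dataclass
class Loop:
    """A loop has exactly one child, which may itself be a block."""
    body: "Node"
    name: Optional[str] = None


Node = Union[Instr, Block, Loop]


def emit(node: Node, indent: int = 0) -> List[str]:
    """Return the text lines emitted for `node` in this toy model."""
    pad = "  " * indent
    if isinstance(node, Instr):
        return [pad + node.text]
    if isinstance(node, Block):
        if node.name is None:
            # Implicit block: emit no block at all and splice the children
            # directly into the containing structure at the same depth.
            lines: List[str] = []
            for child in node.children:
                lines.extend(emit(child, indent))
            return lines
        # A named block may be a branch target, so it must be kept.
        lines = [f"{pad}block ${node.name}"]
        for child in node.children:
            lines.extend(emit(child, indent + 1))
        lines.append(pad + "end")
        return lines
    if isinstance(node, Loop):
        header = "loop" if node.name is None else f"loop ${node.name}"
        return [pad + header, *emit(node.body, indent + 1), pad + "end"]
    raise TypeError(f"unknown node type: {type(node).__name__}")


# A loop whose single child is a nameless block holding two children.
example = Loop(
    name="top",
    body=Block(children=[
        Instr("call $step"),
        Block(name="exit", children=[
            Instr("local.get $done"),
            Instr("br_if $exit"),
            Instr("br $top"),
        ]),
    ]),
)

print("\n".join(emit(example)))
```

In this sketch the nameless block under the loop leaves no trace in the output, while the block named `exit` is kept because branches refer to it by name.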