Commit d9190b5

docs: update readme
Signed-off-by: Henry Gressmann <[email protected]>
1 parent 7d9c1d0 commit d9190b5

3 files changed

+103
-98
lines changed

BENCHMARKS.md

+21-14
@@ -1,9 +1,11 @@
11
# Benchmark results
22

3-
All benchmarks are run on a Ryzen 7 5800X, with 32GB of RAM, running Linux 6.6.
4-
WebAssembly files are optimized using [wasm-opt](https://github.com/WebAssembly/binaryen)
3+
All benchmarks are run on a Ryzen 7 5800X with 32GB of RAM, running Linux 6.6.
4+
WebAssembly files are optimized using [wasm-opt](https://github.com/WebAssembly/binaryen),
55
and the benchmark code is available in the `benches` folder.
66

7+
These benchmarks are preliminary. I will be adding more in the future that also look into memory usage and other metrics.
8+
79
## WebAssembly Settings
810

911
All WebAssembly files are compiled with the following settings:
@@ -15,45 +17,50 @@ All WebAssembly files are compiled with the following settings:
1517

1618
All runtimes are compiled with the following settings:
1719

18-
- `unsafe` features are enabled
20+
- `unsafe` features are enabled.
1921
- `opt-level` is set to 3, `lto` is set to `thin`, `codegen-units` is set to 1.
2022

23+
## Versions
24+
25+
- `tinywasm`: `0.4.0`
26+
- `wasmi`: `0.31.0`
27+
- `wasmer`: `4.2.0`
28+
2129
## Results
2230

2331
| Benchmark | Native | TinyWasm | Wasmi | Wasmer (Single Pass) |
2432
| ------------ | ------ | -------- | -------- | -------------------- |
25-
| `argon2id` | 0.52ms | 110.08ms | 44.408ms | 4.76ms |
2633
| `fib` | 6ns | 44.76µs | 48.96µs | 52µs |
2734
| `fib-rec` | 284ns | 25.565ms | 5.11ms | 0.50ms |
35+
| `argon2id` | 0.52ms | 110.08ms | 44.408ms | 4.76ms |
2836
| `selfhosted` | 45µs | 2.18ms | 4.25ms | 258.87ms |
2937

30-
### Argon2id
31-
32-
This benchmark runs the Argon2id hashing algorithm, with 2 iterations, 1KB of memory, and 1 parallel lane.
33-
I had to decrease the memory usage from the default to 1KB, because especially the interpreters were struggling to finish in a reasonable amount of time.
34-
This is something where `simd` instructions would be really useful, and it also highlights some of the issues with the current implementation of TinyWasm's Value Stack and Memory Instances.
35-
3638
### Fib
3739

3840
The first benchmark is a simple optimized Fibonacci function, which is a good way to show the overhead of calling functions and parsing the bytecode.
39-
TinyWasm is slightly faster then Wasmi here, but that's probably because of the overhead of parsing the bytecode as TinyWasm uses a custom bytecode to pre-process the WebAssembly bytecode.
41+
TinyWasm is slightly faster than Wasmi here, but that's probably because of the overhead of parsing the bytecode, as TinyWasm uses a custom bytecode to pre-process the WebAssembly bytecode.
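
For reference, a loop-based Fibonacci of the kind described here could look roughly like the sketch below. This is an assumption about the shape of the function, not the code from the `benches` folder; the name `fib_iter` and the integer types are hypothetical.

```rust
/// Iterative Fibonacci: a single loop with no recursion.
/// `fib_iter` is a hypothetical name used only for this sketch.
pub fn fib_iter(n: u32) -> u64 {
    // `a` holds F(i) and `b` holds F(i + 1) at the start of iteration i.
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        // wrapping_add avoids a panic on overflow for very large `n`.
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    a
}
```

Because the whole computation stays inside one function call, the loop body is tiny and the measured time is dominated by call setup and module loading rather than by the arithmetic.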
4042

4143
### Fib-Rec
4244

4345
This benchmark is a recursive Fibonacci function, which highlights some of the issues with the current implementation of TinyWasm's Call Stack.
4446
TinyWasm is a lot slower here, but that's because there's currently no way to reuse the same Call Frame for recursive calls, so a new Call Frame is allocated for every call. This is not a problem for most programs, and the upcoming `tail-call` proposal will make this a lot easier to implement.
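
To make the call-frame cost concrete, here is a minimal sketch of a naive recursive Fibonacci together with a helper that counts how many calls it makes. Both `fib_rec` and `call_count` are hypothetical names for illustration and are not taken from the benchmark code.

```rust
/// Naive recursive Fibonacci: every invocation with n >= 2 makes two more calls.
/// `fib_rec` is a hypothetical name used only for this sketch.
pub fn fib_rec(n: u32) -> u64 {
    if n < 2 {
        return n as u64;
    }
    fib_rec(n - 1) + fib_rec(n - 2)
}

/// Number of function invocations made by `fib_rec(n)`, including the first one.
/// In a runtime that allocates a frame per call, this is also the frame count.
pub fn call_count(n: u32) -> u64 {
    if n < 2 {
        return 1;
    }
    1 + call_count(n - 1) + call_count(n - 2)
}
```

The call count grows about as fast as the Fibonacci numbers themselves, so any fixed per-call cost such as allocating a new Call Frame is multiplied many times over.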
4547

48+
### Argon2id
49+
50+
This benchmark runs the Argon2id hashing algorithm, with 2 iterations, 1KB of memory, and 1 parallel lane.
51+
I had to decrease the memory usage from the default to 1KB, because especially the interpreters were struggling to finish in a reasonable amount of time.
52+
This is where `simd` instructions would be really useful, and it also highlights some of the issues with the current implementation of TinyWasm's Value Stack and Memory Instances.
53+
4654
### Selfhosted
4755

4856
This benchmark runs TinyWasm itself in the VM, and parses and executes the `print.wasm` example from the `examples` folder.
49-
This is a godd way to show some of TinyWasm's strengths - the code is pretty large at 702KB and Wasmer struggles massively with it, even with the Single Pass compiler. I think it's a decent real-world performance benchmark, but definitely favors TinyWasm a bit.
57+
This is a good way to show some of TinyWasm's strengths - the code is quite large at 702KB and Wasmer struggles massively with it, even with the Single Pass compiler. I think it's a decent real-world performance benchmark, but it definitely favors TinyWasm a bit.
5058

5159
Wasmer also offers a pre-parsed module format, so keep in mind that this number could be a bit lower if that was used (but probably still on the same order of magnitude). This number seems so high that I'm not sure if I'm doing something wrong, so I will be looking into this in the future.
5260

5361
### Conclusion
5462

55-
After profiling and fixing some low hanging fruits, I found the biggest bottleneck to be Vector operations, especially for the Value Stack, and having shared access to Memory Instances using RefCell. These are the two areas I will be focusing on improving in the future, trying out to use
56-
Arena Allocation and other data structures to improve performance. Still, I'm quite happy with the results, especially considering the use of standard Rust data structures. Additionally, typed FuncHandles have a significant overhead over the untyped ones, so I will be looking into improving that as well.
63+
After profiling and fixing some low-hanging fruit, I found the biggest bottleneck to be Vector operations, especially for the Value Stack, and having shared access to Memory Instances using RefCell. These are the two areas I will be focusing on improving in the future, trying out Arena Allocation and other data structures to improve performance. Additionally, typed FuncHandles have a significant overhead over the untyped ones, so I will be looking into improving that as well. Still, I'm quite happy with the results, especially considering the use of standard Rust data structures.
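
The two bottlenecks mentioned above can be pictured with a simplified sketch: a value stack backed by a plain `Vec`, and a linear memory shared through `Rc<RefCell<...>>`. The types `ValueStack` and `SharedMemory` below are hypothetical and much simpler than TinyWasm's actual internals; they only show where the per-operation costs come from.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical value stack backed by a standard `Vec`.
/// Every push may perform a capacity check and every pop a length check.
pub struct ValueStack {
    values: Vec<u64>,
}

impl ValueStack {
    pub fn new() -> Self {
        Self { values: Vec::new() }
    }

    pub fn push(&mut self, value: u64) {
        self.values.push(value);
    }

    pub fn pop(&mut self) -> Option<u64> {
        self.values.pop()
    }
}

/// Hypothetical linear memory shared between several owners.
/// `RefCell` moves the borrow check to runtime, so each access pays for it.
#[derive(Clone)]
pub struct SharedMemory(Rc<RefCell<Vec<u8>>>);

impl SharedMemory {
    pub fn new(size: usize) -> Self {
        Self(Rc::new(RefCell::new(vec![0; size])))
    }

    /// Stores one byte, returning `None` if `addr` is out of bounds.
    pub fn store_u8(&self, addr: usize, value: u8) -> Option<()> {
        let mut mem = self.0.borrow_mut(); // runtime borrow check happens here
        *mem.get_mut(addr)? = value;
        Some(())
    }

    /// Loads one byte, returning `None` if `addr` is out of bounds.
    pub fn load_u8(&self, addr: usize) -> Option<u8> {
        self.0.borrow().get(addr).copied()
    }
}
```

In this sketch, a clone of `SharedMemory` sees writes made through any other clone, which is the convenience that `RefCell` buys at the price of a borrow check on every load and store.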
5764

5865
# Running benchmarks
5966
