Implement `open`/`stat`/`mkdir`/`symlink` for standalone WASI mode #24246

jiixyj · 2025-05-02T11:34:56Z

Based on #18285, implement just enough syscalls to make the test_fcntl_open test from test_core work in standalone WASI mode.

Compared to #18285, I moved the preopen handling into its own paths.{h,c} component. The idea is that this component will also be responsible for cwd handling in a follow-up PR.

Some notes:

The preopen handling is done before main runs. Therefore, I removed the locking functions.
To enable the test_fcntl_open test I had to extend the @also_with_standalone_wasm decorator a bit. The test doesn't run on node and wasmer. Wasmer crashes when resolving symlinks. However, both wasmtime and toywasm successfully run the test.
Standalone WASM engines now get a temporary fs folder inside the out/test folder to put temporary files in. This folder is cleaned after each WASM engine is done. I couldn't make this work for wasmer, but for wasmtime and toywasm it works very nicely.

jiixyj · 2025-05-02T12:09:12Z

You can run the tests locally by adding some WASM engines to WASM_ENGINES in .emscripten and then:

./test/runner -v test_hello_argc_standalone
# or
./test/runner -v test_fcntl_open_standalone

jiixyj · 2025-05-02T12:20:45Z

Oh, and I have patched wasmer a bit, so behavior compared to a release version may differ:

diff --git a/lib/wasix/src/syscalls/wasix/path_open2.rs b/lib/wasix/src/syscalls/wasix/path_open2.rs
index 6305b8bf59..c2297e05dc 100644
--- a/lib/wasix/src/syscalls/wasix/path_open2.rs
+++ b/lib/wasix/src/syscalls/wasix/path_open2.rs
@@ -148,7 +148,7 @@ pub(crate) fn path_open_internal(
     //
     // Maximum rights: should be the working dir rights
     // Minimum rights: whatever rights are provided
-    let adjusted_rights = /*fs_rights_base &*/ working_dir_rights_inheriting;
+    let adjusted_rights = fs_rights_base & working_dir_rights_inheriting;
     let mut open_options = state.fs_new_open_options();
 
     let target_rights = match maybe_inode {
@@ -276,7 +276,10 @@ pub(crate) fn path_open_internal(
                 }
             }
             Kind::Dir { .. } => {
-                if fs_rights_base.contains(Rights::FD_WRITE) {
+                if fs_rights_base.contains(Rights::FD_WRITE)
+                    || o_flags.contains(Oflags::TRUNC)
+                    || o_flags.contains(Oflags::CREATE)
+                {
                     return Ok(Err(Errno::Isdir));
                 }
             }

jiixyj · 2025-05-02T15:20:19Z

The other.test_standalone_whole_archive test complains about missing _fd_prestat_dir_name and _fd_prestat_get symbols:

error: undefined symbol: fd_prestat_dir_name (referenced by root reference (e.g. compiled C/C++ code))
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`
warning: _fd_prestat_dir_name may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
error: undefined symbol: fd_prestat_get (referenced by root reference (e.g. compiled C/C++ code))
warning: _fd_prestat_get may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library

Those are used in the preopen functions that run before main. Does anyone know how to fix this?

sbc100 · 2025-05-02T17:34:37Z

fd_prestat_dir_name and fd_prestat_get are WASI syscalls which are implemented in libwasi.js.. seems strange that they would be reported as undefined symbols.

sbc100 · 2025-05-02T17:42:15Z

test/common.py

@@ -1562,6 +1584,9 @@ def run_js(self, filename, engine=None, args=None,
      else:
        self.fail('JS subprocess failed (%s): %s (expected=%s).  Output:\n%s' % (error.cmd, error.returncode, assert_returncode, ret))

+    if run_in_tmpdir:
+      force_delete_contents(self.in_dir('fs'))


In general, tests do not need to clean up after themselves like this (and indeed should not, since leaving the files in place helps with debugging).

That's a good point! While there already is some existing cleanup logic it only ran after each "full" test, not in between each WASM engine run when iterating over them. This made the test fail because each test expects a blank directory.

What do you think about a subdir per each engine? So out/temp/wasmer, out/temp/wasmtime, etc.? Then the existing cleanup code should work just as is.

> What do you think about a subdir per each engine? So out/temp/wasmer, out/temp/wasmtime, etc.? Then the existing cleanup code should work just as is.

I implemented this and it works very well!

sbc100 · 2025-05-02T17:45:12Z

test/common.py

@@ -1505,7 +1522,11 @@ def cleanup(line):
  def run_js(self, filename, engine=None, args=None,
             assert_returncode=0,
             interleaved_output=True,
-             input=None):
+             input=None,
+             run_in_tmpdir=False):


I wonder if we can avoid adding this extra logic to run_js, which is used in a lot of different places.

Could we instead just add cwd=None here and allow higher-level functions to pass the CWD?

Is there some fundamental reason why the tests need to run in a subdirectory?

> Could we instead just add cwd=None here and allow higher-level functions to pass the CWD?

Good idea! I'll try to implement this.

> Is there some fundamental reason why the tests need to run in a subdirectory?

The original reasons were to enable cleanup of this dir after each WASM engine and for the test not to interfere with the *.wasm/*.js/stdout/... files in out/test.

> Good idea! I'll try to implement this.

Done!
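As a rough illustration of the `cwd=None` idea discussed above, here is a minimal sketch of a runner that forwards an optional working directory to the child process. The names `run_engine` and `demo` are hypothetical and are not the real `run_js` from test/common.py; the only real API used is `subprocess.run` with its `cwd` argument.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_engine(cmd, cwd=None):
    """Run `cmd` and return its stdout; cwd=None keeps the caller's directory."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return result.stdout


def demo():
    # The child process simply reports its own working directory.
    child = [sys.executable, '-c', 'import os; print(os.getcwd())']
    with tempfile.TemporaryDirectory() as tmp:
        reported = run_engine(child, cwd=tmp).strip()
        # Resolve both sides because temp dirs can sit behind symlinks.
        return Path(reported).resolve() == Path(tmp).resolve()
```

With this shape, the low-level runner stays free of any test-specific policy, and the decision about which directory to use is left to the caller.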

sbc100 · 2025-05-02T17:46:08Z

test/common.py

@@ -981,14 +997,14 @@ def get_nodejs(self):
      return None
    return config.NODE_JS_TEST

-  def require_node(self):
+  def require_node(self, allow_wasm_engines=False):


It seems odd to mark a test as require_node but also allow wasm engines. Maybe we can just remove require_node from the tests in question instead?

Yeah, I wonder about this, too. The original reason I did it like this comes from this line in also_with_standalone_wasm:

emscripten/test/common.py

Line 639 in 1ae691e

nodejs = self.require_node()

Eventually this ends up in require_engine() where self.wasm_engines is set to an empty list. So what happened is that none of the WASM engines were actually used for any of the also_with_standalone_wasm tests!

Alternatively I could stash the self.wasm_engines in a local variable, then call self.require_node(), and then repopulate self.wasm_engines?

> Alternatively I could stash the self.wasm_engines in a local variable, then call self.require_node(), and then repopulate self.wasm_engines?

I've implemented this, avoiding this allow_wasm_engines parameter.
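The stash-and-restore approach can be sketched roughly as follows. `FakeTestCase` and `require_node_keep_engines` are hypothetical stand-ins written for this illustration; only the side effect (requiring node clears `self.wasm_engines`) is taken from the discussion above.

```python
class FakeTestCase:
    """Hypothetical stand-in for the test base class, not the real common.py."""

    def __init__(self, wasm_engines):
        self.wasm_engines = list(wasm_engines)

    def require_node(self):
        # Mirrors the side effect described in the thread: requiring node
        # ends up resetting the configured wasm engines to an empty list.
        self.wasm_engines = []
        return ['node']

    def require_node_keep_engines(self):
        saved = self.wasm_engines   # stash before the call clobbers the field
        nodejs = self.require_node()
        self.wasm_engines = saved   # repopulate afterwards
        return nodejs
```

This keeps the signature of `require_node` unchanged, at the cost of the caller having to know about the side effect.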

jiixyj · 2025-05-03T21:39:22Z

> fd_prestat_dir_name and fd_prestat_get are WASI syscalls which are implemented in libwasi.js.. seems strange that they would be reported as undefined symbols.

Ah, they were gated behind PURE_WASI! I could fix this by gating the new preopen functions in the .c code behind EMSCRIPTEN_PURE_WASI as well.

jiixyj · 2025-05-04T19:32:30Z

To fix some failing CI tests I had to do some additional things:

Updated wasmtime to v32.0.0
Added stubs for some WASI syscalls to libsigs.js/libwasi.js and to scripts under tools

I also added attribution to wasi-libc to LICENSE. Does that look OK to you?

The test-browser-chrome-2gb failed in the latest run, but I think this may be just a bit flaky?

sbc100

Thanks for working on this!

I like the direction of this PR. I've not yet had a chance to fully grok the code in paths.c/h, but the rest is looking good.

sbc100 · 2025-05-05T16:05:09Z

system/lib/standalone/paths.c

+#include <string.h>
+#include <sysexits.h>
+
+/// A name and file descriptor pair.


How much of the code in this file is actually derived from the wasi-libc code? If it's actually derived from that codebase we should probably say so in the license comment at the top.

Does wasi-libc use the triple-slash (///) for comments? If not I would probably stick to normal C comments.

Most of the code is derived from (a slightly older version of) preopens.c from wasi-libc: https://github.com/WebAssembly/wasi-libc/blob/main/libc-bottom-half/sources/preopens.c

In the meantime they added on-demand loading of the preopens (instead of doing it before main) and the ability to reset the preopens. Both of these add complexity though since you cannot assume any longer that the preopens are immutable. So I thought as a first step the "simple" version is better.

I added license headers to paths.{h,c} where I mention that this code is derived from wasi-libc.

I also converted to #pragma once and converted comments and brace style to Emscripten's. Looking at the wasi-libc code they weren't consistent about it anyways...

sbc100 · 2025-05-09T17:30:52Z

test/common.py

@@ -623,6 +623,9 @@ def metafunc(self, standalone):
      if not standalone:
        func(self)
      else:
+        nonlocal exclude_engines
+        if exclude_engines is None:
+          exclude_engines = []


I think you can just put these two lines at the top of the function (before the def decorated) and avoid the nonlocal thing.

You could also just write exclude_engines=[] in the signature and add # noqa to silence the warning.

done! exclude_engines isn't modified so doing it in the signature sounds good.
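For context on why linters warn about a `[]` default and why it is harmless here, a small sketch: a mutable default is created once and shared across calls, which only matters if the function mutates it. Both functions below are hypothetical examples, not code from test/common.py.

```python
def append_shared(item, bucket=[]):  # noqa
    # The default list is created once, so mutations leak across calls.
    bucket.append(item)
    return bucket


def is_excluded(name, exclude_engines=[]):  # noqa
    # Read-only use of the default: safe, because nothing is ever mutated.
    return any(pattern in name for pattern in exclude_engines)
```

Calling `append_shared` twice without a second argument returns the same growing list, whereas `is_excluded` behaves identically on every call.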

sbc100 · 2025-05-09T17:38:21Z

test/common.py

+          wasm_engines = []
+        else:
+          wasm_engines = [engine for engine in self.wasm_engines
+            if all(excluded not in os.path.basename(engine[0]) for excluded in exclude_engines)]


I find these doubly nested comprehensions quite hard to understand. Can we expand this into a loop or split it into two statements, perhaps?

How about:

wasm_engines = []
if not impure:
  for engine in self.wasm_engines:
    basename = os.path.basename(engine[0])
    if not any(pattern in basename for pattern in exclude_engines):
      wasm_engines.append(engine)

Agreed, this is much more readable!
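A self-contained version of this filtering step might look like the sketch below. The function name `filter_engines` and the example paths are hypothetical; the matching rule (substring match against the basename of `engine[0]`) follows the loop proposed above.

```python
import os


def filter_engines(engines, exclude_engines=()):
    """Keep engines whose executable basename matches none of the patterns."""
    kept = []
    for engine in engines:
        # Each engine is a command list; element 0 is the executable path.
        basename = os.path.basename(engine[0])
        if not any(pattern in basename for pattern in exclude_engines):
            kept.append(engine)
    return kept
```

Note that the patterns are plain substrings, so a pattern such as `wasm` would exclude `wasmtime`, `wasmer`, and `toywasm` alike.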

sbc100 · 2025-05-09T17:39:47Z

test/common.py

@@ -614,7 +614,7 @@ def can_do_standalone(self, impure=False):

 # Impure means a test that cannot run in a wasm VM yet, as it is not 100%
 # standalone. We can still run them with the JS code though.
-def also_with_standalone_wasm(impure=False):
+def also_with_standalone_wasm(impure=False, exclude_engines=None):


Can you add some info about exclude_engines to the above comment?

Something like: "exclude_engines" is a list of engine names on which the test cannot run. These are patterns that are matched against the basename of the engine executable.

sbc100 · 2025-05-09T17:40:07Z

test/common.py

+        else:
+          nodejs = self.require_node()
+          self.node_args += shared.node_bigint_flags(nodejs)
+        # `self.require_node()` clears `self.wasm_engines`, so set them here.


I don't love this, but we can refactor later I suppose.

sbc100 · 2025-05-09T17:43:49Z

test/common.py

+    for engine, subdir in engines:
+      if subdir:
+        subdir = self.in_dir(subdir)
+        ensure_dir(subdir)


So it looks like, for wasm engines only, you want to run the JS in a new directory with the name of the engine?

I guess that will work for some tests, but don't some tests need to be run alongside the files in the test directory?

Why is this needed? What happens if you don't do the chdir here?

When I don't run inside a clean directory, the test for the second wasm engine would fail when trying to create test-file here, as it already exists from the first wasm engine.

The current cleanup logic does not clean up the test's working directory after each engine, but only once in setUp for the whole thing.

> I guess that will work for some tests, but don't some tests need to be run alongside the files in the test directory?

Yep, I have to admit that I don't have a good overview of the tests. But I thought that since the wasm_engines didn't have access to the filesystem before this PR, I at least can't break anything with this approach. The downside is of course that adding @also_with_standalone_wasm to a test like this won't work...

I see.. so the problem only exists when running a single test with multiple engines at the same time?

This is not something that gets a lot of testing.. I for one basically never do this.

I kind of wonder if we should remove this feature? For now could we just run in the first wasm engine instead of in all of them?

Actually the EMTEST_USE_ALL_ENGINES setting is false, so unless you enable it we should only be running in a single engine. Perhaps we should fix that?

How about this solution:

Leave engines as a simple list.

Add a self.directory_per_engine test field which defaults to false

Set this field to true in the tests that need it.

Add this code here in the loop:

for engine, subdir in engines:
  cwd = None
  if self.directory_per_engine:
    # Some tests require a fresh directory for each engine, for example, because
    # they create files and are sensitive to existing files.
    dirname = os.path.basename(engine[0])
    ensure_dir(dirname)
    cwd = dirname

Good idea, I'll try to implement this.
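The per-engine directory proposal can be sketched in isolation like this. `plan_engine_dirs`, `demo`, and the `base_dir` parameter are hypothetical names introduced for the illustration, and `os.makedirs(..., exist_ok=True)` stands in for the test suite's `ensure_dir` helper.

```python
import os
import tempfile


def plan_engine_dirs(engines, base_dir, directory_per_engine=False):
    """Pair each engine with the cwd it should run in (None = inherit)."""
    plan = []
    for engine in engines:
        cwd = None
        if directory_per_engine:
            # Name the directory after the engine executable, e.g. 'wasmtime'.
            cwd = os.path.join(base_dir, os.path.basename(engine[0]))
            os.makedirs(cwd, exist_ok=True)
        plan.append((engine, cwd))
    return plan


def demo():
    engines = [['/usr/bin/wasmtime'], ['/usr/bin/toywasm']]
    with tempfile.TemporaryDirectory() as base:
        plan = plan_engine_dirs(engines, base, directory_per_engine=True)
        names = [os.path.basename(cwd) for _, cwd in plan]
        all_exist = all(os.path.isdir(cwd) for _, cwd in plan)
    return names, all_exist
```

Tests that do not opt in get `cwd=None` for every engine, so their behavior is unchanged.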

sbc100 · 2025-05-09T17:48:29Z

system/lib/standalone/standalone.c

+    } else {
+      return -EINVAL;
+    }
+  }


Is there some reason to add this extra scope here?

Yes, it is there to limit accmode's scope. I don't feel strongly about this, though.

sbc100 · 2025-05-09T17:52:26Z

system/lib/standalone/paths.c

+oserr:
+  _Exit(EX_OSERR);
+software:
+  _Exit(EX_SOFTWARE);


Why not just abort() above instead of this goto + exit stuff?

Good question -- this seems to be a wasi-libc thing. The code in __main_void (also derived from wasi-libc) uses EX_OSERR as well:

emscripten/system/lib/standalone/__main_void.c

Line 36 in f3d9eb4

__wasi_proc_exit(EX_OSERR);

sbc100 · 2025-05-09T19:06:34Z

This change looks like it's almost ready to land.

Before we do though I wanted to check what your motivation for trying to push this forward is? Is there some use case where you want to use emscripten to target non-web engines? Is there some reason why you need to use emscripten and not just wasi-sdk?

jiixyj · 2025-05-09T20:20:21Z

> Before we do though I wanted to check what your motivation for trying to push this forward is? Is there some use case where you want to use emscripten to target non-web engines? Is there some reason why you need to use emscripten and not just wasi-sdk?

For the fun of it I'm trying to port a hobby project of mine (CLI tool written in C++, using exceptions for error handling) to WASI. I actually tried wasi-sdk a few months ago but couldn't quite get exceptions to work. Then a few weeks ago I saw that Emscripten supported both a STANDALONE_WASM/PURE_WASI mode and even out of the box support for the new "exnref" exceptions (using -sWASM_LEGACY_EXCEPTIONS=0). So in the end it was faster for me to get to a working .wasm blob using Emscripten. But yeah, "on paper", wasi-sdk is more suitable for my use case...

sbc100 · 2025-05-12T19:22:40Z

system/lib/standalone/paths.c

+    const preopen* pre = &preopens[i];
+    assert(pre->prefix != NULL);
+    assert(pre->fd != (__wasi_fd_t)-1);
+#ifdef __wasm__


Why ifdef __wasm__ here? IIUC this code only compiles and runs on wasm.

sbc100 · 2025-05-12T19:23:58Z

system/lib/standalone/paths.c

+  preopen_capacity = new_capacity;
+  free(old_preopens);
+
+  assert_invariants();


Why not just preopens = realloc(preopen, new_capacity) here?

sbc100 · 2025-05-12T19:27:56Z

system/lib/standalone/paths.c

+      path++;
+    } else if (path[0] == '.' && path[1] == '/') {
+      path += 2;
+    } else if (path[0] == '.' && path[1] == 0) {


use '\0' here for the terminator?

This case looks like it's trying to match . on its own and then return the empty string.. is that right?
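To make the question concrete, here is a small Python model of the prefix stripping shown in the diff. Only the three conditions are visible in the snippet; the enclosing loop and the body of the last branch are assumptions made for this sketch, and the function name `strip_prefixes` is illustrative.

```python
def strip_prefixes(path: str) -> str:
    """Model of the C logic: drop leading '/', './', and a lone '.'."""
    while True:
        if path.startswith('/'):
            path = path[1:]      # corresponds to path++
        elif path.startswith('./'):
            path = path[2:]      # corresponds to path += 2
        elif path == '.':
            path = path[1:]      # a lone '.' becomes the empty string
        else:
            break
    return path
```

Under these assumptions, `.` on its own does indeed reduce to the empty string, while `..` and names such as `.hidden` are left untouched.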

sbc100 · 2025-05-12T19:30:17Z

system/lib/standalone/paths.c

+        if (!register_preopened_fd(fd, prefix)) {
+          goto software;
+        }
+        free(prefix);


Given that register_preopened_fd is going to store and use the prefix, why not pass ownership? i.e. don't call free() here or strdup in register_preopened_fd?

kripken

(otherwise no comments aside from @sbc100 's)

kripken · 2025-05-12T20:58:27Z

system/lib/standalone/paths.c

+ *
+ * The preopen code is based on wasi-libc's `preopens.c` which is licensed
+ * under a MIT style license. This license can also be found in the LICENSE
+ * file.


Please keep the copyright line from the original file, in addition to this text (which is good).

Implement open/stat/mkdir/symlink for standalone WASI mode

c606386

jiixyj mentioned this pull request May 2, 2025

Add CWD handling to standalone WASI mode, and some related syscalls #24247

Draft

remove unneeded list comprehension

46b7fa4

jiixyj mentioned this pull request May 2, 2025

Extend standalone support #18285

Open

jiixyj added 3 commits May 2, 2025 14:26

skip standalone WASM tests when no engines are configured

16b1925

update wasmtime version in CI

e12df0f

try to fix 'test_time' and 'test_console_out' tests

65bec88

sbc100 approved these changes May 2, 2025

jiixyj added 7 commits May 3, 2025 16:19

add stubbed out path_filestat_get to libwasi.js

ca3f795

do preopen handling only in PURE_WASI mode

52a4e03

add path_filestat_get__nothrow: true

53c309b

run standalone WASM tests in a subdirectory of 'out/test' for each en…

65e8c96

…gine

try to fix CI

5303f12

refactor to avoid 'allow_wasm_engines' parameter

95eab07

remove hack by adding a stubbed out path_symlink to libwasi.js

bd4a07c

jiixyj added 3 commits May 4, 2025 08:22

Merge remote-tracking branch 'origin/main' into fcntl-open-test

73e982e

add attribution to wasi-libc

36e66d8

should be wasm_engines instead of self.wasm_engines here

cc87f5d

sbc100 reviewed May 5, 2025

jiixyj added 3 commits May 9, 2025 17:37

add license headers and use pragma once for consistency

67905b1

Merge remote-tracking branch 'origin/main' into fcntl-open-test

88650d8

use double slash comments and braces for if statements consistently

f3a80bb

sbc100 reviewed May 9, 2025

wrap new libwasi.js stubs in ALLOW_UNIMPLEMENTED_SYSCALLS

534223f

sbc100 reviewed May 9, 2025

jiixyj added 3 commits May 9, 2025 20:44

default 'exclude_engines' in argument list since we don't mutate it

16d2ba7

simplify calculation of final 'wasm_engines'

eb35b6b

add comment explaining 'exclude_engines' parameter

1ec092a

sbc100 requested a review from kripken May 12, 2025 19:19

sbc100 reviewed May 12, 2025

kripken reviewed May 12, 2025
