Skip to content

Implement open/stat/mkdir/symlink for standalone WASI mode #24246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

jiixyj
Copy link

@jiixyj jiixyj commented May 2, 2025

Based on #18285, implement just enough syscalls to make the test_fcntl_open test from test_core work in standalone WASI mode.

Compared to #18285 I moved the preopen handling into their own paths.{h,c} component. The idea is that this component will also be responsible for cwd handling in a follow up PR.

Some notes:

  • The preopen handling is done before main runs. Therefore, I removed the locking functions.
  • To enable the test_fcntl_open test I had to extend the @also_with_standalone_wasm decorator a bit. The test doesn't run on node and wasmer. Wasmer crashes when resolving symlinks. However, both wasmtime and toywasm successfully run the test.
  • Standalone WASM engines now get a temporary fs folder inside the out/test folder to put temporary files in. This folder is cleaned after each WASM engine is done. I couldn't make this work for wasmer, but for wasmtime and toywasm it works very nicely.

@jiixyj
Copy link
Author

jiixyj commented May 2, 2025

You can run the tests locally by adding some WASM engines to WASM_ENGINES in .emscripten and then:

./test/runner -v test_hello_argc_standalone
# or
./test/runner -v test_fcntl_open_standalone

@jiixyj
Copy link
Author

jiixyj commented May 2, 2025

Oh, and I have patched wasmer a bit, so behavior compared to a release version may differ:

diff --git a/lib/wasix/src/syscalls/wasix/path_open2.rs b/lib/wasix/src/syscalls/wasix/path_open2.rs
index 6305b8bf59..c2297e05dc 100644
--- a/lib/wasix/src/syscalls/wasix/path_open2.rs
+++ b/lib/wasix/src/syscalls/wasix/path_open2.rs
@@ -148,7 +148,7 @@ pub(crate) fn path_open_internal(
     //
     // Maximum rights: should be the working dir rights
     // Minimum rights: whatever rights are provided
-    let adjusted_rights = /*fs_rights_base &*/ working_dir_rights_inheriting;
+    let adjusted_rights = fs_rights_base & working_dir_rights_inheriting;
     let mut open_options = state.fs_new_open_options();
 
     let target_rights = match maybe_inode {
@@ -276,7 +276,10 @@ pub(crate) fn path_open_internal(
                 }
             }
             Kind::Dir { .. } => {
-                if fs_rights_base.contains(Rights::FD_WRITE) {
+                if fs_rights_base.contains(Rights::FD_WRITE)
+                    || o_flags.contains(Oflags::TRUNC)
+                    || o_flags.contains(Oflags::CREATE)
+                {
                     return Ok(Err(Errno::Isdir));
                 }
             }

@jiixyj
Copy link
Author

jiixyj commented May 2, 2025

The other.test_standalone_whole_archive test complains about missing _fd_prestat_dir_name and _fd_prestat_get symbols:

error: undefined symbol: fd_prestat_dir_name (referenced by root reference (e.g. compiled C/C++ code))                                                                                          
warning: To disable errors for undefined symbols use `-sERROR_ON_UNDEFINED_SYMBOLS=0`                                                                                                           
warning: _fd_prestat_dir_name may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library
error: undefined symbol: fd_prestat_get (referenced by root reference (e.g. compiled C/C++ code))                                                                                               
warning: _fd_prestat_get may need to be added to EXPORTED_FUNCTIONS if it arrives from a system library

Those are used in the preopen functions that run before main. Does anyone know how to fix this?

@sbc100
Copy link
Collaborator

sbc100 commented May 2, 2025

fd_prestat_dir_name and fd_prestat_get are WASI syscalls which are implemented in libwasi.js.. seems strange that they would be reported as undefined symbols.

test/common.py Outdated
@@ -1562,6 +1584,9 @@ def run_js(self, filename, engine=None, args=None,
else:
self.fail('JS subprocess failed (%s): %s (expected=%s). Output:\n%s' % (error.cmd, error.returncode, assert_returncode, ret))

if run_in_tmpdir:
force_delete_contents(self.in_dir('fs'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general tests do not need to (indeed should not since helps with debugging) cleanup after themselves like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! While there already is some existing cleanup logic it only ran after each "full" test, not in between each WASM engine run when iterating over them. This made the test fail because each test expects a blank directory.

What do you think about a subdir per each engine? So out/temp/wasmer, out/temp/wasmtime, etc.? Then the existing cleanup code should work just as is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about a subdir per each engine? So out/temp/wasmer, out/temp/wasmtime, etc.? Then the existing cleanup code should work just as is.

I implemented this and it works very well!

test/common.py Outdated
@@ -1505,7 +1522,11 @@ def cleanup(line):
def run_js(self, filename, engine=None, args=None,
assert_returncode=0,
interleaved_output=True,
input=None):
input=None,
run_in_tmpdir=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can avoid adding this extra logic to run_js which is use in a lot of different places.

Could we instead allow just add cwd=None here and allow higher level functions to pass CWD?

Is there some fundamental reason why the tests need to run in a subdirectory?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead allow just add cwd=None here and allow higher level functions to pass CWD?

Good idea! I'll try to implement this.

Is there some fundamental reason why the tests need to run in a subdirectory?

The original reasons were to enable cleanup of this dir after each WASM engine and for the test not to interfere with the *.wasm/*.js/stdout/... files in out/test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I'll try to implement this.

Done!

test/common.py Outdated
@@ -981,14 +997,14 @@ def get_nodejs(self):
return None
return config.NODE_JS_TEST

def require_node(self):
def require_node(self, allow_wasm_engines=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to mark a test as require_node but also allow wasm engines. Maybe we can just remove require_node from the tests in question instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wonder about this, too. The original reason I did it like this comes from this line in also_with_standalone_wasm:

nodejs = self.require_node()

Eventually this ends up in require_engine() where self.wasm_engines is set to an empty list. So what happened is that none of the WASM engines were actually used for any of the also_with_standalone_wasm tests!

Alternatively I could stash the self.wasm_engines in a local variable, then call self.require_node(), and then repopulate self.wasm_engines?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively I could stash the self.wasm_engines in a local variable, then call self.require_node(), and then repopulate self.wasm_engines?

I've implemented this, avoiding this allow_wasm_engines parameter.

@jiixyj
Copy link
Author

jiixyj commented May 3, 2025

fd_prestat_dir_name and fd_prestat_get are WASI syscalls which are implemented in libwasi.js.. seems strange that they would be reported as undefined symbols.

Ah, they were gated behind PURE_WASI! I could fix this by gating the new preopen functions in the .c code behind EMSCRIPTEN_PURE_WASI as well.

@jiixyj
Copy link
Author

jiixyj commented May 4, 2025

To fix some failing CI tests I had to do some additional things:

  • Updated wasmtime to v32.0.0
  • Added stubs for some WASI syscalls to libsigs.js/libwasi.js and to scripts under tools

I also added attribution to wasi-libc to LICENSE. Does that look OK to you?

The test-browser-chrome-2gb failed in the latest run, but I think this may be just a bit flaky?

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

I like the direction of this PR. I've not yet had a chance for fully grok the code in paths.c/h but the rest is looking good

#include <string.h>
#include <sysexits.h>

/// A name and file descriptor pair.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of the code in this file is actually derived from the wasi-libc code? If its actually derived from that codebase we should probably say so in the licence comment at the top.

Does wasi-libc use the triple-slash (///) for comments? If not I would probably stick to normal C comments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the code is derived from (a slightly older version of) preopens.c from wasi-libc: https://github.com/WebAssembly/wasi-libc/blob/main/libc-bottom-half/sources/preopens.c

In the meantime they added on-demand loading of the preopens (instead of doing it before main) and the ability to reset the preopens. Both of these add complexity though since you cannot assume any longer that the preopens are immutable. So I thought as a first step the "simple" version is better.

I added license headers to paths.{h,c} where I mention that this code is derived from wasi-libc.

I also converted to #pragma once and converted comments and brace style to Emscripten's. Looking at the wasi-libc code they weren't consistent about it anyways...

test/common.py Outdated
@@ -623,6 +623,9 @@ def metafunc(self, standalone):
if not standalone:
func(self)
else:
nonlocal exclude_engines
if exclude_engines is None:
exclude_engines = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you just put these two lines at the top of the function (before the def decorated) and avoid the nonlocal thing.

You could also just write exclude_engines=[] in the signature and add # noqa to silence the warning.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! exclude_engines isn't modified so doing it in the signature sounds good.

test/common.py Outdated
wasm_engines = []
else:
wasm_engines = [engine for engine in self.wasm_engines
if all(excluded not in os.path.basename(engine[0]) for excluded in exclude_engines)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find these double nested comprehension quite hard to understand. Can we expand into a loop or split in to two statements perhaps?

How about:

wasm_engines = []
if not impure:
  for engine in self.wasm_engines:
     basename = os.path.basename(engine[0])
     if not any(pattern in basename for pattern in excluded_engines):
       wasm_engines.append(engine)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this is much more readable!

test/common.py Outdated
@@ -614,7 +614,7 @@ def can_do_standalone(self, impure=False):

# Impure means a test that cannot run in a wasm VM yet, as it is not 100%
# standalone. We can still run them with the JS code though.
def also_with_standalone_wasm(impure=False):
def also_with_standalone_wasm(impure=False, exclude_engines=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you convert add some info about exclude_engines to the above comment?

Something like: "exclude_engines" is a list of engine names on which the test cannot run. These are pattern that are matched against the basename of the engine executable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

else:
nodejs = self.require_node()
self.node_args += shared.node_bigint_flags(nodejs)
# `self.require_node()` clears `self.wasm_engines`, so set them here.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this, but we can refactor later I suppose.

for engine, subdir in engines:
if subdir:
subdir = self.in_dir(subdir)
ensure_dir(subdir)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks for wasm engines only your want to run the JS in a new directory with the name of the engine?

I guess that will work for some tests, but don't some tests need to be run in alongside the files in the test directory?

Why is this needed? What happens if you don't do the chdir here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I don't run inside a clean directory, the test for the second wasm engine would fail when trying to create test-file here, as it already exists from the first wasm engine.

The current cleanup logic does not clean up the tests working directory after each engine, but only once in setUp for the whole thing.

I guess that will work for some tests, but don't some tests need to be run in alongside the files in the test directory?

Yep, I have to admin that I don't have a good overview of the tests. But I thought that since the wasm_engines didn't have access to the filesystem before this PR, I at least can't break anything with this approach. The downside is of course that adding @also_with_standalone_wasm to a test like this won't work...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.. so the problem only exists when running a single test with mutliple engines at the same time?

This is not something that gets a lot testing.. I for one basically never do this.

I kind of wonder if we should remove this feature? For now could we just run in the first wasm engine instead of in all of them?

Actually the EMTEST_USE_ALL_ENGINES setting is false so enless you enable it we should only be running in a single engine. Perhaps we should fix that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this solution:

  • Leave engines as a simple list.
  • Add a self.directory_per_engine test field which defaults to false
  • Set this field to true in the tests that need it.
  • Add this code here in the loop:
for engine, subdir in engines:
  cwd = None
  if self.directory_per_engine:
    # Some tests require a fresh directory for each engine, for example, because
    # they create file and are sensitive to existing files.
    dirname = os.path.basename(engine[0])
    ensure_dir(dirname)
    cwd = dirname

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll try to implement this.

} else {
return -EINVAL;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason to add this extra scope here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is there to limit accmode's scope. I don't feel strongly about this, though.

oserr:
_Exit(EX_OSERR);
software:
_Exit(EX_SOFTWARE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just abort() above instead of these goto + exit stuff?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -- this seems to be a wasi-libc thing. The code in __main_void (also derived from wasi-libc) uses EX_OSERR as well:

__wasi_proc_exit(EX_OSERR);

@sbc100
Copy link
Collaborator

sbc100 commented May 9, 2025

This change looks like its almost ready to land.

Before we do though I wanted to check what your motivation for trying push this forward is? Is there some use case where you want to use emscripten to target non-web engines? Is there some reason why you need to use emscripten and not just wasi-sdk?

@jiixyj
Copy link
Author

jiixyj commented May 9, 2025

Before we do though I wanted to check what your motivation for trying push this forward is? Is there some use case where you want to use emscripten to target non-web engines? Is there some reason why you need to use emscripten and not just wasi-sdk?

For the fun of it I'm trying to port a hobby project of mine (CLI tool written in C++, using exceptions for error handling) to WASI. I actually tried wasi-sdk a few months ago but couldn't quite get exceptions to work. Then a few weeks ago I saw that Emscripten supported both a STANDALONE_WASM/PURE_WASI mode and even out of the box support for the new "exnref" exceptions (using -sWASM_LEGACY_EXCEPTIONS=0). So in the end it was faster for me to get to a working .wasm blob using Emscripten. But yeah, "on paper", wasi-sdk is more suitable for my use case...

@sbc100 sbc100 requested a review from kripken May 12, 2025 19:19
const preopen* pre = &preopens[i];
assert(pre->prefix != NULL);
assert(pre->fd != (__wasi_fd_t)-1);
#ifdef __wasm__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ifdef __wasm__ here? IIUC this code only compiles and run on wasm

preopen_capacity = new_capacity;
free(old_preopens);

assert_invariants();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just preopens = realloc(preopen, new_capacity) here?

path++;
} else if (path[0] == '.' && path[1] == '/') {
path += 2;
} else if (path[0] == '.' && path[1] == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use '\0' here for the terminator?

This case looks like its trying to match . on its own and then return the empty string.. is that right?

if (!register_preopened_fd(fd, prefix)) {
goto software;
}
free(prefix);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that register_preopened_fd is going to store and use the prefix, why not pass owership? i.e. don't call free() here or strdup in register_preopened_fd?

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(otherwise no comments aside from @sbc100 's)

*
* The preopen code is based on wasi-libc's `preopens.c` which is licensed
* under a MIT style license. This license can also be found in the LICENSE
* file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the copyright line from the original file, in addition to this text (which is good).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants