Extend standalone support #18285

Open
wants to merge 24 commits into
base: main
from

Conversation

@jerbob92 jerbob92 commented Dec 1, 2022

  • Implement _emscripten_throw_longjmp that aborts the program
  • Add nice messages to __cxa_throw and __cxa_allocate_exception
  • Implement getentropy using WASI
  • Implement WASI FS for syscalls lstat64, stat64, newfstatat, getdents64, fstat64 and openat

This is far from complete if you look at the WASI FS spec and the syscalls, but at least it gets basic operations like directory listing, file/directory opening and file/directory stat working. It was enough for me to get pdfium working in standalone mode.

I didn't really know what I should do with AT_FDCWD or what the runtime should do with it when passing AT_FDCWD. AT_FDCWD is also negative while I think fd in WASI is unsigned?

Most of the syscall code was taken from or based on wasi-libc.

@jerbob92 jerbob92 changed the title Implement _emscripten_throw_longjmp that aborts the program, add nice… Extend standalone support Dec 1, 2022
Collaborator

@sbc100 sbc100 left a comment

Thanks for working on this!

I left a few nits. Perhaps we can split this up.

@sbc100
Collaborator

sbc100 commented Dec 1, 2022

Can we add some new tests based on these?

@jerbob92
Author

jerbob92 commented Dec 5, 2022

Can we add some new tests based on these?

I have never developed in C before, so I don't really know where to start. Can you give me some hints on where you want these tests added and what they should test?

sbc100 added a commit that referenced this pull request Dec 5, 2022
sbc100 added a commit that referenced this pull request Dec 5, 2022
@@ -155,24 +462,32 @@ double emscripten_get_now(void) {
return (1000 * clock()) / (double)CLOCKS_PER_SEC;
}

__attribute__((__weak__))
void _emscripten_throw_longjmp() {
REPORT_UNSUPPORTED(do an invalid longjmp);
Collaborator

Just "call longjmp" would be more accurate, I think.

Author

I think I could only trigger this by doing an invalid jump, but I think you would know better.

Collaborator

No, all longjmps require calling out to JS, IIUC. WebAssembly, as it stands, has no support for longjmp, so we call out to the JS VM to throw for us.

Author

Thanks! Not that it matters but do you have any C to demonstrate this? Just curious.
In my tests I was only able to trigger this by doing an invalid jmp. I think the jumps were being handled by the invoke_* methods?

Collaborator

Any call to longjmp from C code will call _emscripten_throw_longjmp, IIUC.

So here I would just do REPORT_UNSUPPORTED(longjmp)
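
For reference, here is a minimal sketch of an ordinary, valid longjmp of the kind discussed in this thread. Going by the comments above, a call like this is expected to reach _emscripten_throw_longjmp when built with Emscripten's JS-based setjmp/longjmp support; that routing is taken from the thread and not verified here. The names run_demo, fail_with and jumped_with are hypothetical.

```c
#include <setjmp.h>

static jmp_buf env;
static volatile int jumped_with; /* value handed over by the jumping code */

/* Hypothetical helper: records a code, then unwinds to the setjmp point. */
static void fail_with(int code) {
  jumped_with = code;
  longjmp(env, 1); /* never returns to the caller */
}

/* Returns 0 when no jump happened, otherwise the code given to fail_with. */
int run_demo(int code) {
  if (setjmp(env) != 0) {
    return jumped_with; /* reached a second time, via longjmp */
  }
  if (code != 0) {
    fail_with(code);
  }
  return 0;
}
```

Nothing about this jump is invalid: the setjmp frame is still live when longjmp runs, which matches the claim that any longjmp, not only an invalid one, takes this path.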

sbc100 added a commit that referenced this pull request Dec 6, 2022
@jerbob92 jerbob92 requested a review from sbc100 December 11, 2022 12:11
* - https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_14
* - https://github.com/WebAssembly/wasi-libc/blob/wasi-sdk-16/libc-bottom-half/sources/preopens.c#L215
*/
#define __WASI_FD_ROOT 3
Collaborator

I believe the way it works is that there can be any number of pre-opens, starting at 3: https://github.com/WebAssembly/wasi-libc/blob/30094b6ed05f19cee102115215863d185f2db4f0/libc-bottom-half/sources/preopens.c#L212-L215. And none of these pre-opens is necessarily the root; they can be mounted in various places.

Also, can you move this macro (if you keep it) into standalone.c? api.h is copied from wasi-libc, and I think it's better to avoid having local modifications to it.
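
The "any number of pre-opens, starting at 3" point can be sketched as a scan that probes descriptors upward from 3 until the host reports that one is not a pre-open. This is only an illustration: prestat_probe_fn is a hypothetical stand-in for a host query such as WASI's fd_prestat_get, fake_probe is a fake host, and collect_preopens is not code from this PR or from wasi-libc.

```c
#include <stddef.h>

/* Hypothetical stand-in for a host query like WASI's fd_prestat_get:
 * returns 0 and sets *is_dir when fd is a pre-open, nonzero otherwise. */
typedef int (*prestat_probe_fn)(int fd, int *is_dir);

#define FIRST_PREOPEN_FD 3 /* 0, 1 and 2 are stdin, stdout and stderr */

/* Stores up to cap pre-opened directory fds in out and returns the count. */
size_t collect_preopens(prestat_probe_fn probe, int *out, size_t cap) {
  size_t n = 0;
  for (int fd = FIRST_PREOPEN_FD; n < cap; ++fd) {
    int is_dir = 0;
    if (probe(fd, &is_dir) != 0) {
      break; /* first failing descriptor ends the scan */
    }
    if (is_dir) {
      out[n++] = fd;
    }
  }
  return n;
}

/* Fake host for illustration only: fds 3 and 4 are pre-opened directories. */
int fake_probe(int fd, int *is_dir) {
  if (fd == 3 || fd == 4) {
    *is_dir = 1;
    return 0;
  }
  return 1;
}
```

With a scan like this there is no fixed "root" descriptor; a path lookup would instead match the path against whichever pre-open prefix fits.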

Collaborator

@sbc100 sbc100 left a comment

I think we really need some tests for this.

The most basic test for the standalone syscalls is in test_standalone_syscalls in test_other.py.

For other tests look for @also_with_standalone_wasm in test_core.py

@codefromthecrypt

on behalf of wasm people I know I would like to thank @jerbob92 and @sbc100 for progressing this, as it not only unlocks things like pdfium, but also generic wasm utilities that use emscripten (like wabt). Ex being able to run wat2wasm using any wasi runtime instead of whatever was released and packaged.

If any one not currently involved has skills to contribute, please do as this is very likely well into "hobby time" for @jerbob92 and it is at the core a pretty substantial infrastructure change and a lot of work to test it. other wasm people will thank you, but I'll thank in advance!

@sbc100
Collaborator

sbc100 commented Dec 14, 2022

on behalf of wasm people I know I would like to thank @jerbob92 and @sbc100 for progressing this, as it not only unlocks things like pdfium, but also generic wasm utilities that use emscripten (like wabt). Ex being able to run wat2wasm using any wasi runtime instead of whatever was released and packaged.

If you just want a WASI version of the wabt tools, I think the simplest path would be to use wasi-sdk to build it. Of course that doesn't mean we shouldn't land this change too.

@codefromthecrypt

@sbc100 ps your last comment didn't format right. might want to polish it

@codefromthecrypt

If you just want a WASI version of the wabt tools, I think the simplest path would be to use wasi-sdk to build it. Of course that doesn't mean we shouldn't land this change too.

good idea, on the wabt thing and indeed a separate topic if they are open to decoupling from emscripten.

@sbc100
Collaborator

sbc100 commented Dec 14, 2022

If you just want a WASI version of the wabt tools, I think the simplest path would be to use wasi-sdk to build it. Of course that doesn't mean we shouldn't land this change too.

good idea, on the wabt thing and indeed a separate topic if they are open to decoupling from emscripten.

IIUC wabt is not coupled to emscripten, we just happen to build it with emscripten for the web demo.

@sbc100
Collaborator

sbc100 commented Dec 14, 2022

(I guess maybe what you are saying is that it would be nice if the web version of wabt, which is run by emscripten, happened to also be WASI compliant.. in that case I agree that would be very cool).

@codefromthecrypt

yeah sorry it was that we were looking for a wabt compiled to wasm and found the one that was compiled with emscripten for the web. I raised this issue for out of browser wasi binary WebAssembly/wabt#2101

@jerbob92
Author

Thanks @codefromthecrypt! In hindsight, I don't think my C skills are good enough to get this PR into a mergeable state. That would require implementing AT_FDCWD, using the pre-opened directories instead of FD 3, and adding tests.

@codefromthecrypt

@jerbob92 no worries, I think you got things very far. I'll keep recruiting to whatever end on this.

@jerbob92 jerbob92 force-pushed the implement-more-syscalls-for-standalone branch from 55abfc0 to 9fbd75d Compare April 14, 2023 10:20
@jerbob92
Author

@sbc100 I have made some more progress on this PR now:

  • Implemented WASI pre-opens
  • Implement mkdirat

I do have some questions though:

  • How to get started on adding tests for this?
  • The standalone.c file is getting very big right now, how do we want to split this up? Perhaps something like standalone_wasi_preopens.c, standalone_wasi_fs.c, standalone_wasi_random.c to keep things clean?
  • Is there an easy way to check which syscalls still need to be implemented in standalone to make it fully WASI compliant?
  • Code is mostly copied from wasi-libc right now with some changes, is this something that we want?
  • wasi-libc has a whole system to maintain a CWD, is this also something that we want to implement?

@sbc100
Collaborator

sbc100 commented Apr 14, 2023

@sbc100 I have made some more progress on this PR now:

  • Implemented WASI pre-opens
  • Implement mkdirat

I do have some questions though:

  • How to get started on adding tests for this?
  • The standalone.c file is getting very big right now, how do we want to split this up? Perhaps something like standalone_wasi_preopens.c, standalone_wasi_fs.c, standalone_wasi_random.c to keep things clean?

I don't feel strongly about this. Perhaps use the same split that wasi-libc uses? Also I don't think we need standalone in each filename since they will all be in the standalone directory.

  • Is there an easy way to check which syscalls still need to be implemented in standalone to make it fully WASI compliant?

The idea is that we will be running the full wasi testsuite. I started adding some of it in #12704, and I have some more plans.

  • Code is mostly copied from wasi-libc right now with some changes, is this something that we want?

I don't think it's a problem, but please document the origin and try to document any changes from upstream.

  • wasi-libc has a whole system to maintain a CWD, is this also something that we want to implement?

Sure. We could even consider adding wasi-libc as a submodule and including certain files directly?

@jerbob92
Author

I don't feel strongly about this. Perhaps use the same split that wasi-libc uses? Also I don't think we need standalone in each filename since they will all be in the standalone directory.

Sounds good!

The idea is that we will be running the full wasi testsuite. I started adding some of it in #12704, and I have some more plans.

Nice, that will make it a lot easier!

I don't think it's a problem, but please document the origin and try to document any changes from upstream.
Sure. We could even consider adding wasi-libc as a submodule and including certain files directly?

If that's a possibility, that would be great. It would make things a lot easier, because right now I'm just cherry-picking wasi-libc code to get the syscalls/WASI calls working that I require. If we want to pass the whole wasi-testsuite, we will have to copy a lot from wasi-libc. The only thing I noticed while copying code is that their WASI function signatures are sometimes a bit different, which prevents us from directly using the C code (AFAIK), so we might want to look into making that possible. For example:
wasi-libc: __wasi_errno_t __wasi_path_filestat_get(__wasi_fd_t fd, __wasi_lookupflags_t flags, const char *path, __wasi_filestat_t *retptr0)
Emscripten: __wasi_errno_t __wasi_path_filestat_get(__wasi_fd_t fd, __wasi_lookupflags_t flags, const char *path, size_t path_len, __wasi_filestat_t *buf)

So Emscripten requires the path length and wasi-libc doesn't. But perhaps we can work around this by completely using wasi-libc for the wasi part.
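
One way to bridge the two signatures, sketched here under assumptions, is a thin adapter that takes a NUL-terminated path and computes the length itself. All demo_* names and types below are hypothetical stand-ins (the real declarations live in the WASI headers), and the explicit-length function is a stub that only echoes the length so the adapter can be checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the real WASI types. */
typedef uint16_t demo_errno_t;
typedef int32_t demo_fd_t;
typedef uint32_t demo_lookupflags_t;
typedef struct { uint64_t size; } demo_filestat_t;

/* Shape of the Emscripten-style declaration: explicit path length.
 * Stub body: echoes path_len into the result instead of calling a host. */
demo_errno_t demo_path_filestat_get_len(demo_fd_t fd, demo_lookupflags_t flags,
                                        const char *path, size_t path_len,
                                        demo_filestat_t *buf) {
  (void)fd;
  (void)flags;
  (void)path;
  buf->size = path_len;
  return 0;
}

/* Shape of the wasi-libc-style declaration: NUL-terminated path only.
 * The adapter derives the length and forwards the call. */
demo_errno_t demo_path_filestat_get(demo_fd_t fd, demo_lookupflags_t flags,
                                    const char *path,
                                    demo_filestat_t *retptr0) {
  return demo_path_filestat_get_len(fd, flags, path, strlen(path), retptr0);
}
```

Whether such an adapter belongs in Emscripten or is made unnecessary by using wasi-libc directly is the open question raised above.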

return 0;
}

int rmdirat(int dirfd, intptr_t path) {
Collaborator

Can you make this static and make it clear this is an internal helper?

@karelbilek

A stupid nit - if it was based on wasi libc, shouldn't there be copyright attribution somewhere? (Or is that not necessary?)

@sbc100
Collaborator

sbc100 commented Apr 25, 2023

A stupid nit - if it was based on wasi libc, shouldn't there be copyright attribution somewhere? (Or is that not necessary?)

Not stupid at all; we should certainly consider how to handle this. I'm not sure what the right answer is, or how we should deal with keeping the codebases in sync. One easy option would be to add wasi-libc as a submodule instead of duplicating the code here.

@agnivade

agnivade commented Dec 4, 2023

Sounds like your libc re-implementations would then be built on top of the syscalls layer which is then, by definition, not portable between OSes.. meaning you would need to re-implement for each OS. But I guess you decided to trade off those two forms of non-portableness?

Correct. Go has proper cross-compilation support and automatically takes care of this.

For example, if you are already implementing the emscripten ABI/API, why not just continue to do that..

Oh yeah we could. But from what I understand, the openat and other similar functions need to be exposed for us to re-implement it. @jerbob92 can probably provide more details as to why it's not possible to do this from Go. Currently, only these can be overridden:

  (import "wasi_snapshot_preview1" "clock_time_get" (func $__wasi_clock_time_get (type 31)))
  (import "wasi_snapshot_preview1" "proc_exit" (func $__wasi_proc_exit (type 3)))
  (import "wasi_snapshot_preview1" "fd_write" (func $__wasi_fd_write (type 10)))
  (import "wasi_snapshot_preview1" "fd_read" (func $__wasi_fd_read (type 10)))
  (import "wasi_snapshot_preview1" "fd_close" (func $__wasi_fd_close (type 0)))
  (import "wasi_snapshot_preview1" "fd_seek" (func $__wasi_fd_seek (type 66)))
  (import "wasi_snapshot_preview1" "environ_sizes_get" (func $__wasi_environ_sizes_get (type 1)))
  (import "wasi_snapshot_preview1" "environ_get" (func $__wasi_environ_get (type 1)))

@jerbob92
Author

jerbob92 commented Dec 4, 2023

IMHO it's way better to add support for WASI in Emscripten than to add Emscripten ABI/API support for every runtime out there.

@sbc100
Collaborator

sbc100 commented Dec 4, 2023

IMHO it's way better to add support for WASI in Emscripten than to add Emscripten ABI/API support for every runtime out there.

Perhaps, but I think that argument only really makes sense if the resulting binary is WASI-compatible. In this case it sounds like the resulting binary will contain a bunch of Emscripten-specific stuff anyway, so it won't be a WASI-compliant module, right? (Or am I misunderstanding?)

@jerbob92
Author

jerbob92 commented Dec 4, 2023

IMHO it's way better to add support for WASI in Emscripten than to add Emscripten ABI/API support for every runtime out there.

Perhaps, but I think that argument only really makes sense if the resulting binary is WASI-compatible. In this case it sounds like the resulting binary will contain a bunch of Emscripten-specific stuff anyway, so it won't be a WASI-compliant module, right? (Or am I misunderstanding?)

Correct, there are still a few things that are needed to run Emscripten binaries:

  1. Embind adds a lot of Emscripten imports, but you don't have to use Embind so I don't think that's a problem
  2. emscripten_notify_memory_growth
  3. _emscripten_throw_longjmp
  4. invoke_xxx generated methods for indirect function calls and catching errors

emscripten_notify_memory_growth could probably be removed in standalone mode (I don't know if it adds any value there; in Wazero it doesn't do anything). I suspect that 3 and 4 can be dropped once more features of WASI reach an implementation phase.

@kripken
Member

kripken commented Dec 2, 2024

@sbc100 What do you think about moving forward with this PR? It should at least help in some cases. E.g. one of the syscalls missing in #23008 is implemented here (and perhaps more, I'm not sure by the names).

@sbc100
Collaborator

sbc100 commented Dec 2, 2024

Sure, I'd be OK with moving forward with this. I think it still needs a bit of work though.

@turbolent

Would it be possible to create a TODO list for the work that still needs to be done to get this PR merged? I'd be happy to help, but I'm not quite sure where to start. Thanks!

@karelbilek

karelbilek commented Apr 23, 2025 via email

@kripken
Member

kripken commented Apr 24, 2025

Concretely I think what is needed here is

  • Add testing to cover the new functionality
  • Get CI fully passing

Anything else @sbc100 ?

case O_RDONLY:
case O_RDWR:
case O_WRONLY:
if ((flags & O_RDONLY) != 0) {

Thanks for your work on this so far!

Musl defines O_RDONLY as 0, so this line will not work I think. This will only work in implementations where the file access modes are single bits, one for each file access mode (like wasi-libc, where this is the case).

The robust, POSIX compliant way to deal with file access modes is to:

  1. first mask the file access modes part off with flags & O_ACCMODE
  2. then compare the resulting value directly to the five defined access modes (O_RDONLY, O_WRONLY, O_RDWR, O_SEARCH, O_EXEC)

With this, it should be possible to open a file as O_RDONLY.
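
A minimal sketch of the two steps above, using only the standard fcntl.h macros. The enum and the function name classify_access are hypothetical; O_SEARCH and O_EXEC are left to the default branch because not every libc defines them.

```c
#include <fcntl.h>

/* Hypothetical result type for illustration. */
enum access_kind {
  ACCESS_OTHER = 0,
  ACCESS_READ,
  ACCESS_WRITE,
  ACCESS_READ_WRITE
};

enum access_kind classify_access(int flags) {
  /* Step 1: mask off everything except the access-mode bits.
   * Step 2: compare the masked value for equality, not with a bit test. */
  switch (flags & O_ACCMODE) {
    case O_RDONLY:
      return ACCESS_READ;
    case O_WRONLY:
      return ACCESS_WRITE;
    case O_RDWR:
      return ACCESS_READ_WRITE;
    default:
      return ACCESS_OTHER; /* e.g. O_SEARCH or O_EXEC where defined */
  }
}
```

Because the comparison is by equality, this works whether or not O_RDONLY is 0, whereas a test such as (flags & O_RDONLY) != 0 can never be true when O_RDONLY is 0.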

Oh, and if you test with Wasmer: Right now, it doesn't support opening a file with O_RDONLY (wasmerio/wasmer#4892). This took me quite a while to figure out.

Author

This code comes from wasi-libc, which also uses musl, so then it would be broken there as well?

I don't know exactly how much they use from musl, but they defined the file access mode bits themselves at least:

https://github.com/WebAssembly/wasi-libc/blob/e9524a0980b9bb6bb92e87a41ed1055bdda5bb86/libc-bottom-half/headers/public/__header_fcntl.h#L18-L22

Author

Hmm, yeah, they defined it in both the top and bottom half (with different values). I think we will know soon enough whether this works once we add tests.

I'm mostly just copy pasting stuff from wasi-libc here, so I wouldn't know if it's correct, it works for me but I'm not sure if I hit this code path.


// If we can't find a preopen for it, fail as if we can't find the path.
if (dirfd == -1) {
errno = ENOENT;

I think this should be return -ENOENT, and similarly elsewhere. In wasi-libc, this code is used in functions that use the "return -1 + errno" error convention, while the __syscall_* functions use the "return negative error" convention of the Linux syscall API.
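
The difference between the two conventions can be sketched as follows. Both function names are hypothetical and neither is code from this PR; the first mimics a syscall-style function, the second a libc-style wrapper around it.

```c
#include <errno.h>

/* Syscall-style (hypothetical): returns 0 on success or a negative errno
 * value on failure, and leaves errno itself untouched. */
int demo_syscall_lookup(int dirfd) {
  if (dirfd == -1) {
    return -ENOENT; /* no matching pre-open: report "no such entry" */
  }
  return 0;
}

/* libc-style (hypothetical): returns -1 and sets errno on failure. */
int demo_libc_lookup(int dirfd) {
  int rc = demo_syscall_lookup(dirfd);
  if (rc < 0) {
    errno = -rc;
    return -1;
  }
  return rc;
}
```

Code copied from a libc-style function into a syscall-style one needs each errno assignment followed by return -1 turned into a single negative return, as the comment suggests.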

@jiixyj

jiixyj commented May 1, 2025

@jerbob92 : Are you or is anyone else working on the test integration at the moment? Based on this PR I managed to get the ./test/fcntl/test_fcntl_open.c test working and hooked up using the @also_with_standalone_wasm test decorator.

I used this test because it is pretty simple. It just needed open, stat, mkdir and symlink.

Should I open a new PR?

@jerbob92
Author

jerbob92 commented May 1, 2025

@jiixyj Not that I know of! New MR is fine, you can also update this one if you have access.

@jiixyj

jiixyj commented May 2, 2025

Based on code from this PR and wasi-libc, I've opened PRs #24246 and #24247 if anyone wants to have a look.

I have mostly concentrated on the filesystem stuff, so not all syscalls from this PR are included. But it shouldn't be too hard to add them in follow up PRs.

9 participants