[DRAFT] add wasm-bindgen support #23493

Draft
wants to merge 9 commits into
base: main
from

walkingeyerobot
Collaborator

This is an early draft PR, opened to gather feedback. There are also pending changes to wasm-bindgen.

How this works:

  1. Cargo builds Rust code targeting wasm32-unknown-emscripten into a .a file.
  2. Emscripten is invoked with any C++ sources and the just built Rust .a file.
  3. Emscripten builds C++ sources and then calls out to wasm-ld to link the C++ and Rust into a .wasm file.
  4. wasm-bindgen is run on that .wasm file, producing a new .wasm file, a library.js file, and a pre.js file.
  5. Emscripten constructs its own .js, integrating the wasm-bindgen .js files.
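The first two steps above are the only ones a user drives directly; the rest happen inside Emscripten. A minimal sketch of those two invocations follows, assuming the crate is built as a `staticlib`, Cargo's default `target` directory is used, and `-sWASM_BINDGEN` is the setting proposed in this PR. The function names and the example crate name are hypothetical.

```python
import os


def rust_archive_path(crate_name, profile='release'):
  # Assumption: a staticlib crate built for this target lands at
  # target/wasm32-unknown-emscripten/<profile>/lib<name>.a, with
  # hyphens in the crate name replaced by underscores.
  lib = 'lib' + crate_name.replace('-', '_') + '.a'
  return os.path.join('target', 'wasm32-unknown-emscripten', profile, lib)


def build_commands(crate_name, cxx_sources, output_js):
  # Step 1: Cargo compiles the Rust code into a static archive.
  cargo_cmd = ['cargo', 'build', '--release',
               '--target', 'wasm32-unknown-emscripten']
  # Step 2: Emscripten receives the C++ sources plus the Rust archive.
  # Steps 3-5 (wasm-ld, wasm-bindgen, JS generation) then run inside emcc.
  # -sWASM_BINDGEN is the setting proposed by this PR, not a released flag.
  emcc_cmd = ['emcc', *cxx_sources, rust_archive_path(crate_name),
              '-sWASM_BINDGEN', '-o', output_js]
  return [cargo_cmd, emcc_cmd]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order.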

You can see a demo more easily at https://github.com/walkingeyerobot/cxx-rust-demo. library_wbg.js and pre.js are approximately what will be produced by wasm-bindgen for consumption by Emscripten.

Some TODOs:

  1. Figure out how to pass the exported symbols from the Rust compiler to Emscripten. These symbols must be passed to wasm-ld so they are not removed from the final .wasm, but they may not be present after wasm-bindgen processes the .wasm. wasm-bindgen at compile time puts the information it needs to generate JS inside the .wasm file itself in the form of _describe functions. These functions are then removed after JS generation.
  2. Merge the .js files produced by wasm-bindgen. This shouldn't be that hard; I just haven't gotten around to it yet. This would simplify the code for both Emscripten and wasm-bindgen.
  3. Get wasm-bindgen tests to pass. Early efforts here have revealed some very odd compiler differences between -unknown and -emscripten that I'll have to fix.
  4. Have this work end-to-end via wasm-pack. I'll have a draft PR for this soon (tm).

I'm mostly looking for feedback on the first point about exported symbols and about the general addition of -sWASM_BINDGEN to Emscripten. Again, this is very early, but it's a pretty big feature, so I thought it best to start discussions now.

cc @daxpedda, who I've been working with on the wasm-bindgen side.

@kripken
Member

kripken commented Jan 24, 2025

> wasm-bindgen at compile time puts the information it needs to generate JS inside the .wasm file itself in the form of _describe functions.

Does rustc then read the wasm to find those function names, and pass those names to wasm-ld? (if not, how does it find those names?)

In general if we need to read metadata-type info from the wasm, then we have a minimal parser in tools/webassembly.py. If we need something more complex, a binaryen pass is an option.
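To make the "read metadata from the wasm" option concrete, here is a standalone sketch that lists the exported function names in a wasm binary by walking its sections. It is only an illustration of the idea and does not use or mirror the API of tools/webassembly.py; the function names are hypothetical.

```python
def read_uleb(data, pos):
  # Decode an unsigned LEB128 integer; return (value, next position).
  result = 0
  shift = 0
  while True:
    byte = data[pos]
    pos += 1
    result |= (byte & 0x7f) << shift
    if not byte & 0x80:
      return result, pos
    shift += 7


def exported_function_names(data):
  # Walk the top-level sections and collect function names from the
  # export section (section id 7). No validation is performed.
  if data[:4] != b'\0asm':
    raise ValueError('not a wasm binary')
  pos = 8  # skip the 4-byte magic and 4-byte version
  names = []
  while pos < len(data):
    section_id = data[pos]
    size, pos = read_uleb(data, pos + 1)
    end = pos + size
    if section_id == 7:
      count, pos = read_uleb(data, pos)
      for _ in range(count):
        name_len, pos = read_uleb(data, pos)
        name = data[pos:pos + name_len].decode('utf-8')
        pos += name_len
        kind = data[pos]  # 0 = function, 1 = table, 2 = memory, 3 = global
        _, pos = read_uleb(data, pos + 1)  # export index, unused here
        if kind == 0:
          names.append(name)
    pos = end
  return names
```

A caller could then filter the returned names for the descriptor functions. That only helps if those functions are still exported at the point the file is read.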

@walkingeyerobot
Collaborator Author

wasm-bindgen itself is two pieces: a library that lets you annotate your Rust code to mark things for export, and a tool that consumes a .wasm file and reads those annotations to produce a companion JS file. rustc knows about those function names because wasm-bindgen, as a library, provided the annotations. If rustc invokes the linker itself, it can pass that information along. However, because we also need to build C++, we're only using rustc to compile, not to drive the whole process, so we need it to output that information elsewhere.

One (very naive) possibility is to have rustc invoke a fake linker that just writes the -sEXPORTED_FUNCTIONS setting to a file for Emscripten to read later.
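A rough sketch of what such a fake linker could look like: it links nothing, scans the arguments it was given for an `EXPORTED_FUNCTIONS` setting, and writes the list to a file. The argument forms handled here (`-s EXPORTED_FUNCTIONS=...` and `-sEXPORTED_FUNCTIONS=...`, with either a JSON array or a comma-separated value) are assumptions about what rustc passes, and the function names and output format are hypothetical.

```python
import json


def extract_exported_functions(argv):
  # Return the EXPORTED_FUNCTIONS list found in the linker arguments,
  # or an empty list if the setting is not present.
  prefix = 'EXPORTED_FUNCTIONS='
  args = iter(argv)
  for arg in args:
    if arg == '-s':
      arg = next(args, '')  # setting is the following argument
    elif arg.startswith('-s'):
      arg = arg[2:]  # setting is attached to the flag
    else:
      continue
    if arg.startswith(prefix):
      value = arg[len(prefix):]
      try:
        return json.loads(value)
      except ValueError:
        # Assumption: fall back to a plain comma-separated list.
        return [name.strip() for name in value.split(',') if name.strip()]
  return []


def write_exports(argv, out_path):
  # Stand-in for the "link" step: record the symbols and do nothing else.
  names = extract_exported_functions(argv)
  with open(out_path, 'w') as f:
    json.dump(names, f)
  return names
```

As a script, this would call `write_exports(sys.argv[1:], some_path)`, and Emscripten would later read that file to learn which symbols to keep.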

tools/link.py Outdated
@@ -1932,6 +1933,11 @@ def phase_post_link(options, state, in_wasm, wasm_target, target, js_syms, base_

  settings.TARGET_JS_NAME = os.path.basename(state.js_target)

  if settings.WASM_BINDGEN:
    phase_wasm_bindgen(in_wasm)
    settings.PRE_JS_FILES += [os.path.abspath(get_emscripten_temp_dir() + '/wbg_out/wbg_pre.js')]
Collaborator

You can assume that get_emscripten_temp_dir() is absolute.

Also, we have an in_temp helper for generated temporary files. See its use elsewhere in this file.

What do you think about using bindgen instead of the (IMHO less obvious) wbg acronym?

Collaborator Author

Removed the unnecessary os.path.abspath calls, and I'm happy to use bindgen over wbg.

Collaborator Author

It looks like in_temp() doesn't handle a nested directory inside the temp directory.
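To illustrate the kind of limitation being described, the sketch below contrasts a helper that keeps only the final path component with one that preserves nested components. Both helpers are hypothetical and written for this example; they are not the actual in_temp implementation in tools/link.py, and the paths are made up.

```python
import os


def in_temp_flat(temp_dir, name):
  # Hypothetical helper: keeps only the last path component, so any
  # nested directory in `name` is dropped.
  return os.path.join(temp_dir, os.path.basename(name))


def in_temp_nested(temp_dir, *parts):
  # Hypothetical alternative: preserves nested components under temp_dir.
  return os.path.join(temp_dir, *parts)


flat = in_temp_flat('/tmp/emscripten_temp', 'bindgen_out/pre.js')
nested = in_temp_nested('/tmp/emscripten_temp', 'bindgen_out', 'pre.js')
```

With a flat helper, the output directory that wasm-bindgen writes into would have to be joined onto the temp path separately.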
