
Prepare for integration with String Builtins #95


Open
wants to merge 12 commits into
base: main
from


guybedford
Collaborator

@guybedford guybedford commented Feb 5, 2025

This prepares for the integration with string builtins, without explicitly rebasing to it yet.

We add a note to all spec steps that apply on the merge path:

  • Passing builtins: ['js-string'] and not providing a string import module for now
  • Using the find a builtin method from the string builtins proposal to exclude all (importName, moduleName) pairs that correspond to provided compile-time builtin implementations. This way, fallbacks can still apply within builtin modules to polyfills.
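As a rough illustration of the two steps above, the sketch below hand-assembles a tiny module that imports length from wasm:js-string and compiles it twice through the JS API, once plainly and once with builtins: ['js-string']. The helper names (name, section, importsOf) are made up for this example. Engines that do not implement string builtins are expected to ignore the option, in which case the import stays visible in both listings.

```javascript
// Hand-assemble a minimal module equivalent to:
//   (module (import "wasm:js-string" "length" (func (param externref) (result i32))))
const encoder = new TextEncoder();
// A wasm name is a length-prefixed UTF-8 string (one length byte is enough here).
const name = (s) => { const b = encoder.encode(s); return [b.length, ...b]; };
// A section is an id byte, a byte length, then the body (bodies here are < 128 bytes).
const section = (id, body) => [id, body.length, ...body];

const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  ...section(1, [1, 0x60, 1, 0x6f, 1, 0x7f]),      // type 0: (externref) -> i32
  ...section(2, [1, ...name('wasm:js-string'), ...name('length'), 0x00, 0x00]), // func import, type 0
]);

// List a compiled module's imports as "module/name" strings.
const importsOf = (mod) =>
  WebAssembly.Module.imports(mod).map((i) => `${i.module}/${i.name}`);

// Without the option, this is an ordinary import the embedder must satisfy.
const plain = importsOf(new WebAssembly.Module(bytes));

// With the option, a supporting engine binds the builtin at compile time and
// leaves it out of the import list; other engines ignore the option.
const withBuiltins = importsOf(
  new WebAssembly.Module(bytes, { builtins: ['js-string'] })
);

console.log(plain);        // lists the wasm:js-string import
console.log(withBuiltins); // empty on engines that implement string builtins
```

The ESM integration steps in this PR aim for the second behaviour by default, so the builtin never reaches the module system as a module request.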

@guybedford guybedford force-pushed the reserve-wasm-namespace branch from b1da11a to a49d71c Compare February 5, 2025 23:23
@guybedford
Collaborator Author

guybedford commented Feb 6, 2025

As written this throws a link error at compile time. We have a few options here:

  1. Link error at compile time
  2. Compile error at compile time
  3. Later link error at instantiation time
  4. Later compile error at instantiation time

Thoughts and feedback welcome on which of these might be most suitable. cc @eqrion

@guybedford
Collaborator Author

For now, I've changed the error to be a Compile error per (2) above.

@guybedford
Collaborator Author

There is actually a benefit to doing this error late in that source phase imports can still support polyfilling the wasm: namespace in custom instantiations while only instance-level ESM imports have to fail, so I think that argument tips the scales towards (3) here.

@eqrion
Contributor

eqrion commented Feb 6, 2025

There is actually a benefit to doing this error late in that source phase imports can still support polyfilling the wasm: namespace in custom instantiations while only instance-level ESM imports have to fail, so I think that argument tips the scales towards (3) here.

Double checking my understanding here.

So if you have foo.wasm with (import "wasm:js-string" ...).

On a browser that doesn't support js-string builtins.

  1. import source wasmModule from "foo.wasm" will give you a module which must be instantiated and the wasm:js-string imports must be provided by the user.
  2. import { .. } from "foo.wasm" will fail with a link error.

On a browser that does support js-string builtins.

  1. import source wasmModule from "foo.wasm" will give you a module which can be instantiated without providing the js-string imports because the host already provides them. The user cannot override the js-string imports because the host already provided them during compilation.
  2. import { .. } from "foo.wasm" will succeed and the js-string imports are provided by the host.

Is that accurate?
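The four cases above can be written down as a small decision function. This is only a model of the summary, not an API: esmOutcome, its parameters, and the returned field names are invented for illustration.

```javascript
// Model of the four cases in the summary above (hypothetical helper, not an API).
// phase: 'source'     for `import source wasmModule from "foo.wasm"`
//        'evaluation' for `import { .. } from "foo.wasm"`
// hostHasJsString: whether the host implements the js-string builtins.
function esmOutcome(phase, hostHasJsString) {
  if (phase === 'source') {
    // The source phase always yields a module; the only question is who
    // supplies the wasm:js-string imports at instantiation time.
    return {
      result: 'module',
      userMustProvideJsString: !hostHasJsString,
      hostProvidesJsString: hostHasJsString, // bound at compile time, not overridable
    };
  }
  // Instance-level ESM imports have no place for the user to pass imports,
  // so a missing builtin surfaces as a link error.
  return {
    result: hostHasJsString ? 'instance' : 'link error',
    hostProvidesJsString: hostHasJsString,
  };
}

console.log(esmOutcome('source', false));
console.log(esmOutcome('evaluation', false));
console.log(esmOutcome('source', true));
console.log(esmOutcome('evaluation', true));
```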

@guybedford
Collaborator Author

Yes, that's a correct summary, with the same rules applying to any future builtins.

@kmiller68

kmiller68 commented Feb 6, 2025

We might want to let them override it in the supporting js-string browser. That way if they have some bug in their polyfill they keep getting the polyfill. On the other hand they could still get that wrong as folks do in JS. Since wasm isn't a source language, I would expect folks are less likely to camp on names the committee would want though...

Ideally, there would be some magic key that sites have to have to get the built-in version and that magic key is only picked/published once a browser actually ships the feature. That way browsers can turn on the built-in and sites have to redeploy to actually start using it. I don't know how we would do that though... but maybe this comment will inspire someone smarter than me to figure it out!

@guybedford
Collaborator Author

Yes, we have the possibility to hard-code a value here under the wasm: namespace in future, but for now this is not included, and the Node.js implementation will just follow this PR exactly.

I'll also post up a new web platform test to ensure this behaviour is captured in the ESM Integration WPT.

Would be great to get a review / approval before I go ahead further though if someone can take a look.

@guybedford
Collaborator Author

I think if an override behaviour is desired, then it should be supported in the custom instantiation workflows by allowing importObj to override, instead of having string builtins always take precedence. But that would be a normative change in the string builtins proposal, made to avoid polyfill upgrade issues on the ESM integration path.

Another option might be a new compilation setting, allowInstantiationBuiltinOverrides: true, that the ESM integration could specifically enable.

And of course the import attributes option for setting compile options can also apply in due course. But the first two items above are probably the best bet. @eqrion do you think the concern of broken polyfill upgrade paths / permitting builtin virtualization is worth considering a feature like this for?
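To make the difference between the first two options concrete, here is a minimal model of how a single import could be bound under each policy. The function bindImport and its argument shape are hypothetical; only the setting name allowInstantiationBuiltinOverrides comes from the suggestion above, and nothing like it exists in the string builtins proposal today.

```javascript
// Hypothetical model: pick the binding for one import, given the host builtin
// (if any) and the value the user passed in importObj (if any).
function bindImport({ builtin, userValue, allowInstantiationBuiltinOverrides = false }) {
  if (builtin !== undefined && userValue !== undefined) {
    // Current string builtins behaviour: the builtin always wins.
    // With the proposed setting enabled, the user's value would win instead.
    return allowInstantiationBuiltinOverrides ? userValue : builtin;
  }
  if (builtin !== undefined) return builtin;
  if (userValue !== undefined) return userValue;
  // Neither side supplied the import.
  throw new WebAssembly.LinkError('import is not provided');
}

const hostLength = (s) => s.length;      // stands in for the engine builtin
const polyfillLength = (s) => [...s].length; // a polyfill with a deliberate quirk

console.log(bindImport({ builtin: hostLength, userValue: polyfillLength }) === hostLength);
console.log(
  bindImport({
    builtin: hostLength,
    userValue: polyfillLength,
    allowInstantiationBuiltinOverrides: true,
  }) === polyfillLength
);
```

Under the default policy a site silently moves from its polyfill to the engine builtin once the engine ships it, which is the upgrade risk raised earlier in the thread; the override setting would keep the polyfill in place.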

@annevk
Member

annevk commented Feb 10, 2025

So this results in wasm: becoming a URL scheme? It seems that needs a somewhat broader discussion with more people involved. But it only works within the context of WebAssembly? I'm not sure I fully understand.

@guybedford
Collaborator Author

guybedford commented Feb 10, 2025

@annevk today, string builtins are never exposed to the module system and behave as implicit bindings more analogous to JS globals (wasm:js-string is fully removed from the public imports of the Wasm module).

This PR explicitly adds a new feature to the ESM integration to make it stricter about handling other wasm: specifiers - to ban instantiation of unknown wasm: prefixed modules so that new builtins can be added in future. All other custom schemes remain supported though.

That is, import './x.wasm', where x.wasm contains an import of wasm:unknown-builtin, will throw once this PR lands, but does not throw without it.

Today anyone can write a JS file containing:

import mod from 'node:builtin'

And an import map containing:

<script type="importmap">
{
  "imports": {
    "node:builtin": "/implementation.js"
  }
}
</script>

And thereby polyfill, e.g., Node.js builtins.

The Wasm scheme as defined here is actually more strict than JS, and that makes sense because it is strictly not an import scheme but a lower level implicit compile time import binding scheme.

The discussions here have then been about polyfill paths for unknown builtins in the import source mod from './x.wasm' source phase case, where:

WebAssembly.instantiate(mod, { 'wasm:unknown-builtin': { ... } })

would be useful just like we can do for compileStreaming.
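A runnable sketch of that polyfill path, under some assumptions: the bytes below are hand-assembled to stand in for the module a source phase import would return, the answer import and the name/section helpers are hypothetical, and the synchronous WebAssembly.Instance constructor is used in place of WebAssembly.instantiate only to keep the example short.

```javascript
// Hand-assemble a module equivalent to:
//   (module
//     (import "wasm:unknown-builtin" "answer" (func (result i32)))
//     (export "answer" (func 0)))
const encoder = new TextEncoder();
const name = (s) => { const b = encoder.encode(s); return [b.length, ...b]; };
const section = (id, body) => [id, body.length, ...body];

const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  ...section(1, [1, 0x60, 0, 1, 0x7f]),            // type 0: () -> i32
  ...section(2, [1, ...name('wasm:unknown-builtin'), ...name('answer'), 0x00, 0x00]),
  ...section(7, [1, ...name('answer'), 0x00, 0x00]), // re-export function 0
]);

// Stand-in for: import source mod from './x.wasm'
const mod = new WebAssembly.Module(bytes);

// The user supplies the not-yet-standard builtin through the import object,
// exactly as for any other import.
const instance = new WebAssembly.Instance(mod, {
  'wasm:unknown-builtin': { answer: () => 42 },
});

console.log(instance.exports.answer()); // 42
```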

Does that help clarify your understanding of the situation?

@annevk
Member

annevk commented Feb 10, 2025

I think so, but this makes wasm: a web-observable URL scheme, whereas node: and whatever: are not. As such this does seem like a fairly big decision that probably needs wide buy-in.

@guybedford
Collaborator Author

@annevk I'm unclear on how this makes wasm: web-observable differently to any other namespacing. Can you clarify exactly what aspect of observability is the issue here? Is it that one could inspect the error that Wasm denies imports to the wasm: scheme?

If the error is the issue we could change this PR to no longer throw for the wasm: scheme while we gain wider support for this through some venue? Suggestions for what venue would be appreciated.

@guybedford
Collaborator Author

I've added a commit here that removes the wasm: scheme error entirely; I'll follow up with that change in a separate PR so it can gain wider support. @annevk please take a look and let me know if this resolves your concerns?

@eqrion @kmiller68 @lukewagner I would highly value an approval to be able to move this forward and get the Node.js implementation merged.

@sjrd

sjrd commented Feb 10, 2025

One nice property of the JS string builtins, as they are currently specced, is that they are polyfillable. Not only as a whole today, but also partially in the future. If a later proposal adds new functions in the existing string builtins module, it is possible to polyfill only the new functions when they are not implemented yet, while benefitting from the fast native implementation of the (older) existing builtins. Moreover, when an engine adds support for a new function, we automatically get the better implementation.

Is it possible to preserve these very nice properties in the ESM integration?

@lukewagner
Member

Despite the :, I don't think the wasm: prefix in wasm:js-string is a "scheme". In particular, the fetch algorithm shouldn't ever need to consider wasm: because wasm:js-string and other builtins are not being considered URLs; rather they are taking another branch in the ESM logic (like bareword names).

@guybedford
Collaborator Author

guybedford commented Feb 11, 2025

One way we can retain polyfillability while not making the wasm: scheme blanket observable is to explicitly deny only wasm:js-string as the builtin import name that is reserved by compile time linking.

This way, any future compile time imports remain implementable and thus polyfillable, in both the evaluation and source phases (via import maps or custom Wasm instantiation respectively), and we don't have any matching rule on imports for the wasm: prefix, instead just excluding the one known compile time import from the import list.

This makes the tradeoff that adding new builtins in future may be seen as a breaking change of sorts, but that might be possible if we explicitly ensure Wasm toolchains never use the wasm: scheme unless following the standard.

The opt-out clause here then might be to allow import attributes to customize the explicit compile-time imports instead of relying on the default: import source mod from './mod.wasm' with { builtins: 'js-string' } or otherwise.

For now I've updated to this approach, let's discuss the tradeoffs as part of the meeting agenda further, but we do need to come to a resolution on this soon for implementations to ship.

@annevk
Member

annevk commented Feb 11, 2025

@lukewagner wouldn't it be the case that if a wasm: URL scheme got standardized independent of this effort and supported by user agents, it would not work in module specifiers because of this special case?

@sjrd

sjrd commented Feb 11, 2025

One way we can retain polyfillability while not making the wasm: scheme blanket observable is to explicitly deny only wasm:js-string as the builtin import name that is reserved by compile time linking.

This way, any future compile time imports remain implementable and thus polyfillable, in both the evaluation and source phases (via import maps or custom Wasm instantiation respectively), and we don't have any matching rule on imports for the wasm: prefix, instead just excluding the one known compile time import from the import list.

IIUC, this allows new separate modules to be polyfilled (like a hypothetical wasm:js-numbers). But it does not allow polyfilling a new function within the js-string module (like a hypothetical indexOf for strings).

@eqrion
Contributor

eqrion commented Feb 11, 2025

@lukewagner wouldn't it be the case that if a wasm: URL scheme got standardized independent of this effort and supported by user agents, it would not work in module specifiers because of this special case?

I'm not familiar with all the subtleties here, so I may be missing something. Who would standardize a URL scheme for wasm:? The Wasm CG chose the wasm: naming scheme for where we want to put our JS builtins, and at least Firefox and Chrome implement this as part of the Wasm JS-API currently (through an opt-in flag when compiling a module). So I don't think Wasm CG would be likely to also independently standardize a 'wasm:' URL scheme.

@annevk
Member

annevk commented Feb 11, 2025

It was pointed out to me that I might have been misled by (import "wasm:js-string" ...) above and that this only intends to reserve the "wasm" module specifier. Not all specifiers starting with "wasm:". That still seems like something that needs some agreement between TC39, Wasm, and WHATWG, as it's essentially the built-in modules debate again, but it's at least not a new URL scheme.

@guybedford
Collaborator Author

@annevk as far as I'm interpreting the feedback, are you saying that the handling here, which explicitly excludes the known compile-time builtin string list, is preferable to an approach that filters on the wasm: prefix and treats each match as either supported at compile time or throwing? That is, is it preferred because it special-cases individual specifiers rather than defining something analogous to a scheme-like rule? It would help to know if this is definitely the take, so that we can clearly communicate where future consensus might be needed.

In both cases, though, these special import names are specific to Wasm and entirely internal to it: everything happens at compilation, such that they aren't exposed to the ECMAScript module system at all (WebAssembly.Module.imports(mod) and the module record [[ModuleRequests]] do not include the compile-time builtin).

@sjrd I think polyfilling individual functions is a question for the string builtins proposal itself, it would be great to see that discussion tracked as an issue in that repo if you are able to follow-up on this?

@sjrd

sjrd commented Feb 11, 2025

@sjrd I think polyfilling individual functions is a question for the string builtins proposal itself, it would be great to see that discussion tracked as an issue in that repo if you are able to follow-up on this?

This was already discussed, and is already explained in the explainer, starting here:
https://github.com/WebAssembly/js-string-builtins/blob/main/proposals/js-string-builtins/Overview.md#progressive-enhancement
It was also specifically reaffirmed in this issue, for example:
WebAssembly/js-string-builtins#40

@eqrion
Contributor

eqrion commented Feb 11, 2025

It was pointed out to me that I might have been misled by (import "wasm:js-string" ...) above and that this only intends to reserve the "wasm" module specifier. Not all specifiers starting with "wasm:". That still seems like something that needs some agreement between TC39, Wasm, and WHATWG, as it's essentially the built-in modules debate again, but it's at least not a new URL scheme.

@annevk

For what it's worth, a TAG review was done of the original js-string-builtins proposal [1]. It was closed though with a comment that the TAG lacked the right 'domain expertise' for this area. The original proposal also didn't integrate with ESM yet, so that part was lacking.

Wasm builtins have a smaller scope than the original built-in modules proposal (AFAICT). They're a space for operations on JS primitives or JS builtin objects that are around the size of a wasm instruction. It's not intended for general Web APIs or even most of the JS builtin APIs.

[1] w3ctag/design-reviews#940

@guybedford From reading the discussion here, I'm remembering some of the subtleties around polyfilling in the JS-API part of the proposal.

While it'd be nice to reserve the 'wasm:' prefix here for flexibility in future evolution, that seems like it would make polyfilling not work unless users were to ship different import names for different versions of browsers (which is probably a non-starter).

Reserving just the specifiers that a browser currently implements (like 'wasm:js-string') would prevent polyfilling new additions to an existing specifier (as @sjrd points out).

Both of those use cases work with the Wasm JS-API because (1) the user opts in to the exact set of builtin collections (e.g. js-string) they want, and anything not requested falls back to normal instantiation; and (2) if a user imports a function from a builtin collection that is not supported by the browser, we fall back to normal instantiation for that individual import.

The Wasm JS-API design does have the risk that when a wasm engine implements a new builtin, it could break a polyfill that has a bug/quirk that the user is relying on as their imports are overridden automatically by the engine (as @kmiller68 points out). I don't remember if this was discussed or if there is a good alternative to avoid this. For the JS-API this only happens in the case where an engine adds a new builtin to an existing collection, as that's the case where you automatically get the builtin to override anything you provided. Maybe that's acceptable?

I wonder if the following design for ESM integration would work:

  1. Add an import attribute for the wasm builtin collections to enable: either 'all', 'none', or a list of specific ones to enable. The default if not specified would be 'all' (as opposed to the JS-API which defaults to 'none').
  2. Follow the same behavior for unrecognized builtin imports as for the Wasm JS-API. If an import is to a builtin collection that is not implemented or not requested, then we fall back to how normal imports work. If an import is to a function in a builtin collection that is enabled, but the function is not implemented, then also fall back to normal import rules.
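A sketch of how those two rules might combine when classifying a single import. Everything here is a model: classifyImport, HOST_BUILTINS and the shape of the enabled argument are invented for illustration, and the function list is a deliberately incomplete stand-in for whatever a host actually implements.

```javascript
// Hypothetical table of what this host implements, keyed by collection name.
// A Map avoids accidental hits on Object.prototype keys.
const HOST_BUILTINS = new Map([
  ['js-string', new Set(['length', 'concat'])],
]);

// enabled mirrors the proposed import attribute: 'all' (the suggested ESM
// default), 'none', or an array of collection names.
function classifyImport(moduleName, importName, enabled = 'all') {
  if (!moduleName.startsWith('wasm:')) return 'normal';
  const collection = moduleName.slice('wasm:'.length);

  const requested =
    enabled === 'all' || (Array.isArray(enabled) && enabled.includes(collection));
  const implemented = HOST_BUILTINS.get(collection);

  // Collection not requested or not implemented: ordinary import rules apply.
  if (!requested || implemented === undefined) return 'normal';

  // Collection enabled, but this function may be newer than the host:
  // fall back for that individual import only.
  return implemented.has(importName) ? 'builtin' : 'normal';
}

console.log(classifyImport('wasm:js-string', 'length'));         // builtin
console.log(classifyImport('wasm:js-string', 'indexOf'));        // normal
console.log(classifyImport('wasm:js-numbers', 'add'));           // normal
console.log(classifyImport('wasm:js-string', 'length', 'none')); // normal
```

Anything classified as normal can still be satisfied by an import map or a custom instantiation, which is what keeps both whole-collection and per-function polyfills possible.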

This would give us a better polyfilling story than if we had reserved everything under wasm:, but will mean that the wasm CG will need to be more careful in selecting names of collections and functions. I think as long as toolchains follow a rule of not using anything in wasm: that is not standardized (or being standardized) we'll be okay. Wasm is also generally not human authored, and so if the Wasm CG needs to pick an odd name to avoid a collision that's not a big deal either.

e.g. for the issue in #95 around what name to use for the string constants feature, the CG will have multiple options for names to pick in the future even if the whole 'wasm:' prefix is not reserved. The important bit for string constants is just that we don't have a good name to use today because we have binary size concerns. So I'd prefer to just not pick anything right away.

@guybedford
Collaborator Author

  1. Add an import attribute for the wasm builtin collections to enable: either 'all', 'none', or a list of specific ones to enable. The default if not specified would be 'all' (as opposed to the JS-API which defaults to 'none').
  2. Follow the same behavior for unrecognized builtin imports as for the Wasm JS-API. If an import is to a builtin collection that is not implemented or not requested, then we fall back to how normal imports work. If an import is to a function in a builtin collection that is enabled, but the function is not implemented, then also fall back to normal import rules.

@eqrion these are good suggestions and I agree. For (1), I've created an issue to follow up on the attribute in #99, and I hope to have a PR up shortly.

For (2) I've followed your suggestion as described by changing the integration to explicitly use the find a builtin algorithm and noting this semantic only works when fully merged with the string builtins spec. Note, unless I'm misinterpreting the spec, I believe this also requires the upstream fix WebAssembly/js-string-builtins#46 to work as you describe.

Guy Bedford and others added 3 commits February 11, 2025 13:42
@zamfofex

@annevk Sorry to butt in a bit, but I want to clarify that when you see wasm:js-string, it refers to a module name (or “specifier”, formally). It’s similar to import {} from "wasm:js-string" in JavaScript code. It is not similar to import {"js-string" as string} from "wasm".

It is true that if some other entity decides to specify a wasm: URL scheme, it wouldn’t be accessible from Wasm (or at least wasm:js-string wouldn’t, even if it were given a different set of semantics under this new spec). Ideally, though, that just wouldn’t happen, or at least such a spec wouldn’t be targeting browsers specifically. (Especially since this has already been implemented and shipped, I doubt browsers would be willing to change their behavior.)

@annevk
Member

annevk commented Feb 26, 2025

Yes, that is the concern.


7 participants