Skip to content

Conversation

@lindell
Copy link

@lindell lindell commented Dec 1, 2025

Problem

Today, the emscripten_destroy_web_audio_node function only disconnects the AudioNode and removes it from the registry.

This does not guarantee that process() callback wont be called again, and therefore the registered callback in emscripten_create_wasm_audio_worklet_node might continue to be called.

If the userData pointer is used, it might be very hard to free that resource without risking use after free.

How this PR fixes this

When emscripten_destroy_web_audio_node is used, send a message to the Node, that will set a property to ensure that any access to WASM memory will no longer happen.

Since the existing function would not have a way of knowing when it is safe to free any resource connected to the userData pointer. Another function, emscripten_destroy_web_audio_node_async is added, that will get a callback when we can ensure that no more calls to the registered EmscriptenWorkletNodeProcessCallback will happen.

I'm unsure if the callback should happen on the main thread, or in the AudioContext.

Fixes: #25884

@sbc100 sbc100 requested a review from cwoffenden December 1, 2025 16:34
// Explicitly disconnect the node from Web Audio graph before letting it GC,
// to work around browser bugs such as https://webkit.org/b/222098#c23
EmAudio[objectHandle].port.postMessage({'stop': 1});
EmAudio[objectHandle].disconnect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not use a shared memory location here to guarantee that the callback will never fire again once this function returns?

Then we would not need emscripten_destroy_web_audio_node_async at all I think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that shared location will then be leaked memory? As it will never be able to tell when the last process callback will be (at least according to the WebAudio spec)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting yes. What about this sequence of operations:

  1. Main thread sets "shutdown" bit.
  2. Main thread blocks/spins until worklet's next "process" callback which consumes the "shutdown" bit and sets a JS flag preventing any future "process" callback. The worklet would then set the "shutdown_complete" bit unblocking the main thread.
  3. Main thread is now free to release all shared memory resources.

Assuming the audio worklet always make progress during the emscripten_destroy_web_audio_node I think it should be OK with block like this. WDYT @cwoffenden ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of blocking in the main thread should be fine, since we'd already be in the main thread, waiting on the AW, so at most it should be spinning for 3ms with the default quantum size of 128 samples (as this becomes adjustable this blocking time will increase).

It would need some testing to prove that spinning for 3ms doesn't cause Chrome to start delaying timeouts or frames, so I think in general the async approach is cleaner.

Copy link
Author

@lindell lindell Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would very much prefer if we could avoid the callback here.

Assuming process will always be called afterwards, that should work. But is that really guaranteed to happen? Should at least not be when using an Offline Audio context?

Assuming we can guarantee it.
Do we even need the spinlock? If we set the shutdown bit, we can already then guarantee that the process callback wont be called. So there is no need to wait at that point?
As soon as the shutdown bit is set, we should be free to free any resource that is connected to the Process callback?

Copy link
Author

@lindell lindell Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwoffenden https://ui.perfetto.dev is really useful when understanding how the code is running.

Here are some chrome traces of a (non emscripten) audio worklet:

Demo page here: https://lindell.me/audio-context-demos/noise-generator.html

Copy link
Author

@lindell lindell Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbc100 Your suggestion would work from a free perspective (being sure you can free when the shutdown_competed bit is set. I was referring to my modified suggestion which will not since the process callback might already be running.

While a predictable 3ms spin might be acceptable, we can see this might take hundreds of milliseconds.

Furthermore, if the Audio Context is suspended, the process callback may never fire, causing the main thread to spinlock indefinitely (deadlock), or if we have a max timeout, cause a use after free when it resumes. Suspended audio contexts are simply no longer requesting audio through the audio graph and thus not calling the process callback on worklet nodes. While MessagePorts are still handled:
https://lindell.me/audio-context-demos/suspended-messages.html

Similarly, with OfflineAudioContexts. Will only call process, when a chunk of audio is actively requested. While MessagePorts will still be processed: https://lindell.me/audio-context-demos/offline-messages.html

Because we cannot guarantee the process scheduler's behavior, we cannot safely block the main thread waiting for it.

Reusing the existing API function without any new functions is definitely to preferred if it can be done without other consequences, which I do unfortunately not think we can do. We do not know

  1. How long it might take for schedules (even if we could build the sync API on top of MessagePorts)
  2. How performance sensitive users are.
  3. How many Worklet nodes the users might need to destroy at once. Since WebAudio is designed to be used with a graph of nodes. There could definitely be scenarios with a lot of different worklet nodes.

If we implement this on top of the existing API, hoping the spinlocking will be fine, and where users can expect the userData will not be used after the destroy. And then some time later realises that this will be unacceptable for some new user. We will not be able to reverse it without breaking existing uses.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the Audio Context is suspended

A very valid point, suspended either explicitly or simply because the tab is backgrounded.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there more information needed here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lindell lindell changed the title [AUDIO WORKLET] Support proper shutdown if audio node [AUDIO WORKLET] Support proper shutdown of audio node Dec 3, 2025
@juj
Copy link
Collaborator

juj commented Dec 9, 2025

I posted a comment at WebAudio/web-audio-api#2658 (comment) to involve the spec folks.

This indeed looks like a very difficult problem.

We definitely do want a synchronous emscripten_destroy_web_audio_node(), which guarantees that it is safe and possible to free() any and all shared data control structures that might have been used between an Audio Worklet node and the main thread, synchronously after the emscripten_destroy_web_audio_node() call returns.

Otherwise would be madness.

The only way I can see to implement that mechanism today is to malloc() a short control block that contains a bool shouldShutDown; boolean, and have the Audio Worklet side poll for that boolean in the process() function. When shouldShutDown becomes true, then the Audio Worklet will free() the control block and return false;

This will unfortunately lead to a memory leak if the Web Audio graph never resumes, but that is not the end of the world. The way bigger problem here seems to be that this will unconditionally require involving the dynamic memory allocator into Audio Worklets (which the current design of Audio Worklets has not required so far)

@lindell
Copy link
Author

lindell commented Dec 9, 2025

@juj It does also not take into account that the process callback can currently run when the shouldShutDown is set. This would still allow for use after frees.

@juj
Copy link
Collaborator

juj commented Dec 9, 2025

Here's my attempt to fix. Iiuc this should give a synchronous guarantee that the audio worklet will no longer process after being shut down: main...juj:emscripten:fix_emscripten_destroy_web_audio_node

Did not give it a test yet though, wrote it blind. So at this point just to give an idea.

@lindell
Copy link
Author

lindell commented Dec 9, 2025

@juj Unfortunetely that implementation will not work if emscripten_destroy_web_audio_node is called while the the process callback is running and we free resources connected to in on the main thread right after emscripten_destroy_web_audio_node is called.

@juj
Copy link
Collaborator

juj commented Dec 9, 2025

Hmm yeah, I was missing a semaphore to prevent the main thread from proceeding while the callback is live. Added one now, I think that should help synchronize the main thread better.

@lindell
Copy link
Author

lindell commented Dec 10, 2025

@juj This approach looks good to me 👍 Since we will might spinlock in rare cases, but that will not be dependent on the scheduling, just that the worklet execution will complete. And it keeps the API sync.

🚀

@lindell
Copy link
Author

lindell commented Dec 11, 2025

@sbc100 @cwoffenden What do you think about @juj's approach? If you like it, I close this one and Juj can create another PR.

@cwoffenden
Copy link
Collaborator

cwoffenden commented Dec 11, 2025

What do you think about @juj's approach?

I think it's a good way of achieving this. I would wrap the atomics calls with Emscripten's existing code where possible, then they're function calls rather than adding more code (and that atomic exchange, at first glance, isn't quite right).

And to add: whilst I said I wouldn't personally use such a call, I do think it's needed, and I'm 50/50 on thinking that not shutting down is why the AW CI tests hang.

@juj
Copy link
Collaborator

juj commented Dec 11, 2025

There are some issues with my branch proposal that I don't like:

  • it adds unconditional dependency to malloc() from Audio Worklets. I took a lot of care with the original Audio Worklets implementation to ensure that it wouldn't fundamentally have ties to malloc/free, to provide memory management flexibility and small code size.
  • it adds code size that is not easy to recover. This is a textbook scenario of "only fixes an issue some people have, but at the cost of regressing code size for every project that utilizes Audio Worklets." Ideally, we'd codegen the fix in only if emscripten_destroy_web_audio_node() is ever called, but that is not easy to achieve in our current system. I am leaning towards adding a -sAUDIO_WORKLET_SUPPORT_DESTROY_NODE linker flag to opt out from this code (defaulting true of course)
  • current code creates a memory leak in scenarios where a node is not added to the graph at all (e.g. if one calls emscripten_create_wasm_audio_worklet_node() + emscripten_destroy_web_audio_node() without ever connecting the created node in the graph in between. Will need to fix that.
  • there was that use case somewhere that asked supporting Audio Worklets without SharedArrayBuffer - this code will veer farther from that being a possibility.

I would wrap the atomics calls with Emscripten's existing code where possible, then they're function calls rather than adding more code

There unfortunately does not exist JS code in Emscripten to do JS-side atomics calls - we only have wasm side atomics. (Calling from JS to wasm to do the atomics wouldn't be useful)

Dedicating JS functions for the atomics would have outlining benefit to avoid repeating the long string Atomics.exchange, although two occurrences of that function is barely a size win. I could try giving that a go though.

(and that atomic exchange, at first glance, isn't quite right).

Yeh, that needs to be tested in detail - didn't run the code yet.

@cwoffenden
Copy link
Collaborator

There unfortunately does not exist JS code in Emscripten to do JS-side atomics calls

True, I wrote them recently for testing pure JS Emscripten-like code, and I had to just remind myself:

https://github.com/cwoffenden/chrome-lock-bug/blob/main/lock-bug-js.html#L62

Dedicating JS functions for the atomics would have outlining benefit to avoid repeating

You might be able to use the JS lock functions directly.

@sbc100
Copy link
Collaborator

sbc100 commented Dec 11, 2025

  • it adds unconditional dependency to malloc() from Audio Worklets. I took a lot of care with the original Audio Worklets implementation to ensure that it wouldn't fundamentally have ties to malloc/free, to provide memory management flexibility and small code size.

Given that the control blocks being allocated are only 8 bytes, can we consider simply statically allocating a fixed amount of them? i.e. would it be reasonable to set and upper limit on the number of active audio worklets to say, 16, or 32? ~100 bytes of static BSS data is way cheaper than malloc/free.

@juj
Copy link
Collaborator

juj commented Dec 11, 2025

Given that the control blocks being allocated are only 8 bytes, can we consider simply statically allocating a fixed amount of them? i.e. would it be reasonable to set and upper limit on the number of active audio worklets to say, 16, or 32? ~100 bytes of static BSS data is way cheaper than malloc/free.

I considered that as well, though there will definitely be some use case that allocates Audio Worklet Nodes to implement some massive audio graph synthesis, that would then have an issue of not having the ability to allocate 50, 100, 1000 or whatever number of nodes they need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUDIO WORKLET] Support proper shutdown if audio node

4 participants