Skip to content

GC related safety and soundness issues #569

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
8 of 21 tasks
gmorenz opened this issue Mar 23, 2025 · 9 comments
Open
8 of 21 tasks

GC related safety and soundness issues #569

gmorenz opened this issue Mar 23, 2025 · 9 comments

Comments

@gmorenz
Copy link
Contributor

gmorenz commented Mar 23, 2025

While working on #520 other soundness and safety issues came up, this is something of an omnibus issue regarding them.

By "handle like types" I mean RootedGuard, RootedVec, MutableHandle, and JS::MutableHandle, as well as potentially RootedTraceableBox, and CustomAutoRooter.

Soundness - correct use of the public API results in undefined behavior

  • Traceable modifies values behind a &T reference (Traceable is unsound #560).
  • RootedGuard stores a &mut reference to data the GC looks at and modifies over GC pauses (RootedGuard<T> construction related unsoundness #564).
  • MutableHandle is constructed via a &mut reference, which will be invalidated by GC (Remove DerefMut impl for RootedGuard #572).
  • Handle stores an & reference to data (not wrapped in a Cell type) that the GC modifies over GC pauses.
  • The Deref implementations for handle like types allow for safe code to hold an & reference to data (not wrapped in a Cell type) that the GC will modify at over GC pauses. (Arguably a safety issue not a soundness issue, but I think we'll end up allowing borrowing over GC pauses).
    • Some of this data is modified via Traceable.
    • Some of this data is modified directly in Rooted via RootKind.
  • (TODO: Verify) ForOfIteratorGuard::new is suspicious, it likely has issues similar to those described in RootedGuard<T> construction related unsoundness #564.

Safety - it's possible to use this improperly in safe code

Assumptions in comments on public APIs

  • The comment added in Remove DerefMut impl for RootedGuard #572 on the as_ptr method assumes that the GC will only read - i.e. writes will occur through interior mutability. If that ends up not being the case, the comment should be updated.
@gmorenz
Copy link
Contributor Author

gmorenz commented Mar 23, 2025

Proposed fixes (some of this I'm posting for the first time, other of this I'm copying in from previous discussions):

GC using values while we have a &mut reference to them.

Apart from stopping holding &mut references ourselves, we want to stop the user from doing so. I.e. we need to remove DerefMut impls.

We can replace these with

  • Cell like get and set functions (since we're single threaded these are safe from GC pauses)
  • An as_mut method on MutableHandle that borrows JSContext (mutably if necessary) to prevent GC-invoking methods from being called for the lifetime of the returned reference. Which presumably requires Add JSContext wrapper for bindings with fewer raw pointers. #554 (also see discussion over here).
  • If necessary: A projection macro that lets us go from MutableHandle<Struct> to MutableHandle<Struct::Field> (similar to pin_project).

GC modifying values while we have a & reference to them.

There are two approaches here:

  • Wrap things the GC modifies with types that use interior mutability
  • Give the GC a mutable reference (or raw pointer with write access that won't invalidate anything)

I think we'll end up doing the former. Servo's extensions to tracing and the GC already uphold this, with the exception for the small pointer sized types we implement Traceable or RootKind on in this crate - and those types are effectively opaque (we don't dereference them). This makes this approach tractable.

The other approach has significant usability issues. We'd need to only hold Handle references over GC barriers, which would mean

  • We would need to be able to project Handles to structs subfields.
  • Without nightly features (arbitrary_self_types, dispatch_from_dyn, unsize) we would be unable implement methods/trait methods that take a Handle<T> as a self parameter, resulting in ergonomic issues.
  • Even with nightly features handling enums might be problematic.

JS::[Mutable]Handle

Try and stop anything from using these, and then remove the offending methods. Ideally the mozjs versions of these types would be the only ones used, but if not the JS versions can be treated like raw pointers.

@gmorenz
Copy link
Contributor Author

gmorenz commented Mar 26, 2025

I missed that how we're construction MutableHandle is invalid in the presence of GC (it's created through an intermediate &mut that GC will invalidate). Here's a reduction of the problem for understanding and miri. PR for that, as well as for removing some DerefMut impls, incoming.

@sagudev
Copy link
Member

sagudev commented Mar 26, 2025

NOTE-TO-SELF:
Another interesting thing is this:

mozjs/mozjs/src/gc/root.rs

Lines 194 to 196 in 3e8e9d3

pub fn handle(&self) -> Handle<T> {
unsafe { Handle::new(&*self.ptr) }
}

where we create Handle from MutableHandle. I am not 100% sure how implicit lifetimes works, but I would feel safer if we explicitly return Handle<'a, T> (is that enough or should we also include lifetime of &self?). I think this should do it pub fn <'a: 'b> handle(&'b self) -> Handle<'b T> {.

@gmorenz
Copy link
Contributor Author

gmorenz commented Mar 26, 2025

We should only need the lifetime of &self.

We want the lifetime of &self because we want to borrow the RootedGuard for the lifetime of the Handle. Otherwise you could call this, get a Handle for the duration of the underlying data, then call handle_mut, and get a mutable handle that is live at the same time. Which we are trying to avoid.

Meanwhile in &'a RootedGuard<'b, T> it's already the case that 'b lives at least as long as 'a, you can't have a reference to something that isn't alive.

I think the implicit lifetimes are correct here, I agree it would be nice if they were explicit.

@sagudev
Copy link
Member

sagudev commented Mar 27, 2025

Meanwhile in &'a RootedGuard<'b, T> it's already the case that 'b lives at least as long as 'a, you can't have a reference to something that isn't alive.

I think it would still be worth to be explicit about this in function bounds, so having both (inner lifetime and &self lifetime).

@gmorenz
Copy link
Contributor Author

gmorenz commented Mar 27, 2025

So in the context of the outstanding PRs, you prefer

impl<'a, T> HandleType<'a, T> {
    /// Safety: GC must not run during the lifetime of the returned reference.
    pub unsafe fn as_mut<'b>(&'b mut self) -> &'b mut T 
    where
        'a: 'b
    {
        &mut *(self.as_ptr())
    }
}

I guess I have a weak preference against this (it feels like unnecessary information, that someone might read as "this isn't normal" when it is normal), but not a very strong one. I'll update the PRs to do this.

@aapoalas
Copy link

Hello from out of nowhere! I thought I might have some potentially interesting suggestions / thoughts on GC safety and soundness, from the point of view of a person writing a JS engine in Rust which has... well, quite a different setup from SpiderMonkey, but still of course has GC and all the questions that come with it.

In my/our engine we don't have direct pointers to GC data, instead using handles, so a lot of the problems you deal with are in some ways automatically dealt with. But, because we use unrooted handles by default we actually get a lot of the problems right back. The way I've "resolved" this issue is that I pass around an exclusive, non-clonable but reborrowable GcScope ZST. Each handle binds itself to this GcScope through a shared reference, so they "observe" the GC. Any call to GC or a method that may trigger GC requires a mutable reference to the GcScope, which then invalidates all the handles that are "observing" the GC. There's quite a bit of manual pain with this which I won't go into here, but the main point is that with this it's possible to have the borrow checker nicely and cleanly verify GC safety.

From an outside perspective (and from my personal philosophy on GC) I would think that you could and would want to use the JSContext reference as your GcScope, so eg. getting a &mut to the T of a HandleType could be made safe with:

fn as_mut<'gc>(&mut self, _: &'gc JSContext) -> &'gc mut T {
  &mut *(self.as_ptr())
}

(some additional lifetime bounds for 'a and self may be necessary or at least beneficial).

Now assuming that &mut JsContext is always required for performing GC then this should be perfectly safe as the &mut T would be invalidated by a (potential) GC call. Or if JsContext is used mutably by too many other calls that have nothing to do with GC and shouldn't actually invalidated &mut T, then you might consider a reborrowable ZST token as a stand-in for your GC like I've done, though that's probably way too big of a change to really consider.

From a philosophical standpoint, this also seems to make sense (to me anyway): The JSContext is the way to access the GC heap; that a HandleType internally holds a direct pointer into the GC heap is sort of secondary. The handle is still just a way to find your referred-to data in the GC heap, and the only one with the permission to access the GC heap unfettered is the JSContext. (Then there's the question of "how long is this handle valid" which is different [but always less] than the JSContext lifetime; that's actually what we use our GcToken for but that's not really exactly relevant.)

My apologies if this is unnecessary chatter; I have somewhat of a vested interest in Servo's JS engine binding system and a highly vested interest in GC modeling in Rust, so wanted to chime in. If this is unwanted, please let me know or just hide the comment. If this is wanted/interesting/confusing, I'll be happy to exchange more messages here or elsewhere. Cheers!

@sagudev
Copy link
Member

sagudev commented Mar 30, 2025

Now assuming that &mut JsContext is always required for performing GC then this should be perfectly safe as the &mut T would be invalidated by a (potential) GC call. Or if JsContext is used mutably by too many other calls that have nothing to do with GC and shouldn't actually invalidated &mut T, then you might consider a reborrowable ZST token as a stand-in for your GC like I've done, though that's probably way too big of a change to really consider.

Something like this is considered, but this will require a lot of work, mainly passing JSContext everywhere in servo, see: #520 (comment). You might also be interested in https://github.com/asajeffrey/josephine, that has some interesting GC-rust ideas.

nit: JSContext lifetime should be named 'no_gc as it ensures that while this lifetime is alive, no GC will happen.

@gmorenz
Copy link
Contributor Author

gmorenz commented Mar 30, 2025

Hi @aapoalas - speaking for myself super happy to have you here! I've read about and watched your talk on how you're implementing GC in nova, and it seems quite clever. I think we're ok on this issue, but taking this as an opportunity to record some thoughts:

This API has been proposed, to my surprise I don't think we need it, at least for now. I managed to remove the (main) DerefMut impls that were the only way &mut pointers to GC-tracked data were acquired. The only two non-trivial1 uses of them were both wrong (holding GCed data across a GC pause). Once #554 is done (i.e. we even have a context type that tracks lifetimes) I think we should add it, because why not, but we can manage without it.


Some important context here - and I'm curious if anyone here will disagree with how I phrase this since this is the first time I'm writing it down - Servo's GC can be thought of as two nested GCs

  • Spidermonkey's GC: A relocating GC. Handles into this GC are called Value and *mut JSObject instead of Handle. We just treat these handles as black boxes we can call functions on, and that we need to trace correctly so that they stay updated.
  • Servo's GC: A non-relocating GC built on top of the spidermonkey's GC. This works by having objects that servo's GC is tracking having an opaque *mut JSObject that represents the object in spidermonkey's GC, and integrating with spidermonkey's tracing infrastructure.

As a result handles into servo's GC (what we actually call Handle in this crate) get to be pointers to non-relocating objects, not indices into an arena like I understand you're doing in Nova.

This has the interesting consequence, that servo frequently takes advantage of, that handles can be safely de-referenced into & references, even across GC pauses. Or at least they could be if the values that the GC modifies (the spidermonkey-handles, like *mut JSObject) were allowed to be modified while we hold a & reference containing them - i.e. if we used interior mutability. This is the major outstanding source of unsoundness in this issue: Both mozjs directly and servo create & references that live over GC pauses - and yet we haven't marked the things the GC modifies as having interior mutability.

I don't think we can (or should) eliminate the & references. We want to remain on stable (I assume), and on stable we can't implement methods on Handle<T> types, so it's an important and heavily used ergonomic win in servo that &T can live across GC pauses. Worse, on stable, we can't use Handle<T> as the receiver of trait-object methods, so traits like Callback would become next to impossible if we tried to ban holding &T over GC pauses (we could attempt to recreate trait-object machinery using macros ourselves if we really wanted to, but it would be painful). Which means I think we should go down the route of wrapping everything the spidermonkey-GC might modify with interior mutability.


Back to the proposed API, there is a related one that might help more. Servo makes use of interior mutability for things that it mutates (which explains why we don't need &mut references from handles very often). Currently it does so with RefCell, from which a mutable-borrow can be incorrectly held over a GC pause, leading to bugs like this one (explained and fixed here). We should be able to create our own Cell type that controls access via borrowing context, which eliminates these bugs, and eliminates the runtime overhead of tracking accesses. With an API like

pub struct JSCell<T> { inner: UnsafeCell<T> }
impl<T> JSCell<T> {
    fn new(val: T) -> Self;
    fn into_inner(self) -> T;
    fn as_mut(&mut self) -> &mut T;
    fn get(&self, &JSContext<'_>) -> T where T: Copy;
    fn set(&self, &mut JSContext<'_>, val: T);
    fn borrow_mut<'a>(&'a self, &'a mut JSContext<'_>) -> &'a mut T;
    fn borrow<'a>(&'a self, &'a JSContext<'_>) -> &'a T;
}

(Which also relies on #554 for a context type that tracks lifetimes properly)


I haven't thought about this deeply, or closely read the relevant code, but there might also be an opportunity for an fn as_mut like method that allows us to peak into the spidermonkey objects directly in cases where we currently copy data out of them (e.g. with record).

Footnotes

  1. Wasn't either

    1. *handle_mut = value, replaced with handle_mut.set(value) which prevents a concurrent GC pause - since we have a single GC per thread.
    2. handle_mut_to_option_t.take(), left alone by adding a .take function to HandleMut<Option<_>> - again safe since our single threaded nature means the GC isn't running concurrently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants