Contributor

@ARR4N ARR4N commented Oct 10, 2025

Explicitly freeing proposals that are propagated through libevm (i.e. geth) plumbing has proven difficult: when they are not committed, they are simply dropped for the GC to collect. Furthermore, every call to Proposal.Drop() (or Commit()) must strictly precede Database.Close() to avoid segfaults. This PR fixes both issues:

  1. All new Proposals have a GC finalizer attached, which calls Drop(). This is safe because it is a no-op if called twice or after a call to Commit().
  2. A sync.WaitGroup is added to the Database to track all outstanding proposals. Calls to Commit() / Drop() decrement the group counter (only once per Proposal).
  3. Database.Close() waits on the WaitGroup before freeing its own handle, avoiding segfaults.

Assuming that all calls to Database.Propose() and Proposal.Propose() occur before the call to Database.Close(), this satisfies sync.WaitGroup's documented requirement on the ordering of calls to Add() and Wait().

An integration test demonstrates blocking and eventual return of Database.Close(), specifically due to the unreachability of un-dropped, un-committed Proposals, resulting in their finalizers decrementing the WaitGroup.
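The mechanism described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `database`, `proposal`, `propose`, `drop` and `closeWithin` are hypothetical stand-ins, there is no FFI handle, and the polling loop with repeated `runtime.GC()` calls exists only to make the demo terminate promptly.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// database and proposal are hypothetical stand-ins for the real ffi types.
type database struct {
	outstanding sync.WaitGroup // counts proposals neither dropped nor committed
}

type proposal struct {
	db       *database
	released sync.Once // guarantees a single decrement of the WaitGroup
}

// propose registers a new proposal and attaches a finalizer that drops it
// once the garbage collector finds it unreachable.
func (db *database) propose() *proposal {
	db.outstanding.Add(1)
	p := &proposal{db: db}
	runtime.SetFinalizer(p, func(p *proposal) { p.drop() })
	return p
}

// drop is idempotent, so the finalizer may safely run after an explicit call.
func (p *proposal) drop() {
	p.released.Do(p.db.outstanding.Done)
}

// closeWithin reports whether every proposal was released within d.
func (db *database) closeWithin(d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		db.outstanding.Wait()
		close(done)
	}()
	deadline := time.After(d)
	for {
		runtime.GC() // nudge finalizers of unreachable proposals
		select {
		case <-done:
			return true
		case <-deadline:
			return false
		case <-time.After(10 * time.Millisecond):
		}
	}
}

func main() {
	db := &database{}
	db.propose()         // leaked: never dropped and immediately unreachable
	kept := db.propose() // reachable and not yet dropped

	fmt.Println(db.closeWithin(100 * time.Millisecond)) // false: kept is still outstanding
	kept.drop()
	kept.drop() // no-op
	// Returns true once the leaked proposal's finalizer has run; when that
	// happens is up to the garbage collector.
	fmt.Println(db.closeWithin(5 * time.Second))
}
```

The finalizer only ever runs for proposals that are unreachable, which is why a proposal that is still referenced but never dropped keeps the close blocked.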

@ARR4N ARR4N marked this pull request as ready for review October 10, 2025 12:24
@ARR4N ARR4N self-assigned this Oct 10, 2025
ffi/proposal.go Outdated
}

func (p *Proposal) afterDisowned() {
p.freeOnce.Do(func() {
Contributor

It's good that we've prevented some racy behavior here anyway

Contributor Author

What do you mean? Can you give examples? Technically this Once isn't necessary, but I put it in defensively in case the rest of the code is refactored and current invariants no longer hold.

Contributor

We do rely on the consumer not trying to Commit and Drop simultaneously, which seems reasonable. However, if they did, you would get UB. It's probably best to make UB completely impossible, even if the actions needed to trigger it are unreasonable.

Contributor Author

@ARR4N ARR4N Oct 10, 2025

They can definitely still race and both call the Rust code, but only under invalid usage as you say. This just guarantees that the WaitGroup never panics by going negative.

Contributor

@demosdemon demosdemon Oct 10, 2025

This won't prevent racy behavior. If we wanted to do that, the freeOnce call also needs to wrap the C.fwd_free_proposal and C.fwd_commit_proposal calls as there's nothing preventing a concurrent call to Drop()/Commit() while this is running.

Contributor

Ahh I see. Is that worth preventing (in a separate PR)?

Contributor Author

Putting them in the Once would be problematic as, if either returned an error, then they couldn't be called again.

There's no need to place it in a separate PR IMO as it's simply a mutex. It's also worth doing to allay @demosdemon's concerns here:

> That could also potentially allow the finalizer to run on the proposal concurrently while commit is running; but, I believe that won't actually happen in practice because of the Once.

It's true that it won't actually happen because the Proposal remains alive long enough, but an explicit lock is much easier to reason about than GC lifetime.
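As a rough sketch of the explicit-lock idea discussed in this thread, the handle can be guarded by a mutex so that concurrent Commit/Drop calls are serialised and the FFI call can still be retried after an error. Everything here is an assumption for illustration: `handle`, `proposal`, `disown`, `free`, `raceDrops`, `errDropped` and `errFFI` are hypothetical names, and the real `disownHandle` may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Hypothetical sentinel errors, for illustration only.
var (
	errDropped = errors.New("proposal already dropped")
	errFFI     = errors.New("hypothetical FFI failure")
)

// handle stands in for the Rust-owned proposal handle.
type handle struct{ frees int }

type proposal struct {
	mu sync.Mutex
	h  *handle // nil once ownership has been given up
}

// disown runs fn on the handle while holding the lock. The handle is cleared
// on success, and also on failure when evenOnErr is true.
func (p *proposal) disown(fn func(*handle) error, evenOnErr bool) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.h == nil {
		return errDropped
	}
	err := fn(p.h)
	if err == nil || evenOnErr {
		p.h = nil
	}
	return err
}

func free(h *handle) error { h.frees++; return nil }

// Drop is a no-op if the handle was already disowned.
func (p *proposal) Drop() error {
	if err := p.disown(free, false); err != nil && err != errDropped {
		return err
	}
	return nil
}

// raceDrops calls Drop from n goroutines and waits for all of them.
func raceDrops(p *proposal, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = p.Drop()
		}()
	}
	wg.Wait()
}

func main() {
	h := &handle{}
	p := &proposal{h: h}
	raceDrops(p, 64)
	fmt.Println(h.frees) // 1

	q := &proposal{h: &handle{}}
	fail := func(*handle) error { return errFFI }
	fmt.Println(q.disown(fail, false) == errFFI, q.h != nil) // true true: a retry is still possible
	fmt.Println(q.disown(fail, true) == errFFI, q.h == nil)  // true true: disowned despite the error
}
```

Unlike wrapping the FFI calls in a sync.Once, the mutex leaves the handle in place after a failed call, so the caller can try again.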

case <-done:
t.Errorf("%T.Close() returned with undropped %T", db, p0) //nolint:forbidigo // Use of require is impossible without a hack like require.False(true)
case <-time.After(300 * time.Millisecond):
// TODO(arr4n) use `synctest` package when at Go 1.25
Contributor

This is neat, I've never heard of this. This does seem to solve a pretty common pattern in testing.

Contributor Author

Yeah, I can't wait to start using it!

Member

In theory we could use it now if we add GOEXPERIMENT=synctest

Contributor Author

> In theory we could use it now if we add GOEXPERIMENT=synctest

Unfortunately it requires compiling Go itself with this, not just running the test.

ARR4N and others added 2 commits October 10, 2025 15:58

Contributor

@demosdemon demosdemon left a comment

Really like the use of WaitGroup. My main concern is the explicit GC call.

But, something else I noticed is that there's nothing clearing the finalizer that's been set. So, the finalizer will always call Drop even if Drop or Commit was called outside of the finalizer. Ideally we would clear the finalizer to prevent that from happening if there was an explicit call. That could also potentially allow the finalizer to run on the proposal concurrently while commit is running; but, I believe that won't actually happen in practice because of the Once.

ffi/firewood.go Outdated
return nil
}

runtime.GC()
Contributor

Not a fan of the GC call. I assume it's to try to eagerly run any outstanding finalizers. But, GC will also include everything else and may penalize us more than necessary.

Contributor Author

@ARR4N ARR4N Oct 13, 2025

> I assume it's to try to eagerly run any outstanding finalizers

Yup

> GC will also include everything else and may penalize us more than necessary

Good point. I've put it in a separate goroutine to avoid this, but I think it's still important to include, for the reason above.

ffi/firewood.go Outdated
}

runtime.GC()
db.proposals.Wait()
Member

Does this mean that a leaked proposal now becomes a hang at exit time?

If so, we should consider returning an error instead of hanging. A hang would be much harder to debug if it happens on a system we have no control over, whereas a log somewhere that says "hey there was a leaked proposal" would make debugging a lot easier.

One way to do this is via a timeout, maybe with some large amount of time (60 seconds).

Contributor Author

@ARR4N ARR4N Oct 13, 2025

The idiomatic approach is to accept a Context and then add an extra timeout. @alarso16 do we absolutely have to conform to the kvBackend interface (which precludes adding the Context)?

Contributor

No, this is consumed by triedb. This struct doesn't implement any particular interface required by libevm, but will be called on DBOverride.Close(). Accepting a context would be my first thought anywhere else, so maybe we just require the consumer to understand that the operation may hang by sending a context. libevm can create an ephemeral context I guess.

Contributor Author

Context added.
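For illustration, a context-aware close along the lines discussed in this thread might look roughly like the sketch below. It is not the merged implementation: `database`, `track`, `release`, `closeWithin` and `errOutstanding` are hypothetical, the error text only paraphrases the message in the diff, and the real Close() frees the FFI handle after the wait succeeds.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"runtime"
	"sync"
	"time"
)

// errOutstanding is a hypothetical sentinel; its wording is an assumption.
var errOutstanding = errors.New("at least one reachable proposal neither dropped nor committed")

// database is a stand-in that only tracks outstanding proposals.
type database struct {
	proposals sync.WaitGroup
}

func (db *database) track()   { db.proposals.Add(1) }
func (db *database) release() { db.proposals.Done() }

// Close waits for all proposals to be released, giving up when ctx is done.
func (db *database) Close(ctx context.Context) error {
	done := make(chan struct{})
	go func() {
		// Run the collection off the caller's goroutine so that finalizers of
		// unreachable proposals get a chance to decrement the WaitGroup.
		runtime.GC()
		db.proposals.Wait()
		close(done)
	}()
	select {
	case <-done:
		return nil // the underlying handle could now be freed safely
	case <-ctx.Done():
		return errOutstanding
	}
}

// closeWithin is a convenience wrapper that builds a timeout context.
func (db *database) closeWithin(d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return db.Close(ctx)
}

func main() {
	db := &database{}
	db.track() // one proposal outstanding

	fmt.Println(db.closeWithin(50 * time.Millisecond)) // prints the errOutstanding message
	db.release()
	fmt.Println(db.closeWithin(time.Second))
}
```

Returning an error on timeout turns a leaked proposal into a diagnosable failure rather than a silent hang, at the cost of leaving the waiting goroutine behind until the counter reaches zero.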

Contributor Author

ARR4N commented Oct 13, 2025

> Really like the use of WaitGroup. My main concern is the explicit GC call.

Addressed in your code-specific comment

> But, something else I noticed is that there's nothing clearing the finalizer that's been set. So, the finalizer will always call Drop even if Drop or Commit was called outside of the finalizer. Ideally we would clear the finalizer to prevent that from happening if there was an explicit call. That could also potentially allow the finalizer to run on the proposal concurrently while commit is running; but, I believe that won't actually happen in practice because of the Once.

A call to Drop after a call to either of the others is a no-op, and the two are now thread-safe with respect to each other although your final point is correct about it not actually happening. Further details here.

@ARR4N ARR4N requested review from alarso16 and demosdemon October 13, 2025 11:38
Contributor

@alarso16 alarso16 left a comment

I think one question we may want to answer prior to merging this is how this should affect the proposal API. Should Drop even be exposed to the user if we guarantee to free memory on GC? I think yes, it should still be available, but I wanted to check with others.


// disownHandle is the common path of [Proposal.Commit] and [Proposal.Drop], the
// `fn` argument defining the method-specific behaviour.
func (p *Proposal) disownHandle(fn func(*C.ProposalHandle) error, disownEvenOnErr bool) error {
Contributor

I like that the behavior is clarified for what happens to the lifetime in the error case

Contributor Author

Me too. It actually caught me off guard when I first did this refactoring.

func (p *Proposal) Drop() error {
if p.handle == nil {
return nil
if err := p.disownHandle(dropProposal, false); err != nil && err != errDroppedProposal {
Contributor

Is using errors.Is better practice? Or since we know that the err isn't wrapped, this check is easier?

Contributor Author

> Is using errors.Is better practice?

Nope, it's redundant here.

> Or since we know that the err isn't wrapped, this check is easier?

Exactly. It's useful when there's a desire to wrap an error, but in this case there isn't any.
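To make the trade-off concrete, here is a small sketch of how direct comparison and errors.Is behave with and without wrapping. The sentinel's message and the helper names `isSentinel`, `matchesSentinel` and `wrap` are assumptions for illustration; only the name `errDroppedProposal` appears in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// errDroppedProposal mirrors the sentinel named in the diff; its message is an
// assumption.
var errDroppedProposal = errors.New("proposal already dropped")

// isSentinel uses direct comparison, which only matches the unwrapped value.
func isSentinel(err error) bool { return err == errDroppedProposal }

// matchesSentinel uses errors.Is, which also looks through wrapping.
func matchesSentinel(err error) bool { return errors.Is(err, errDroppedProposal) }

// wrap adds context with %w, as a caller further up the stack might.
func wrap(err error) error { return fmt.Errorf("drop: %w", err) }

func main() {
	wrapped := wrap(errDroppedProposal)

	fmt.Println(isSentinel(errDroppedProposal)) // true
	fmt.Println(isSentinel(wrapped))            // false
	fmt.Println(matchesSentinel(wrapped))       // true
}
```

Direct comparison is sufficient as long as the error is never wrapped between where it is produced and where it is checked, which is the situation described in this thread.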

Contributor Author

ARR4N commented Oct 15, 2025

> Should Drop even be exposed to the user if we guarantee to free memory on GC? I think yes, it should still be available, but wanted to check with others

I think it absolutely should be, otherwise the only way to close the database is to guarantee that every proposal becomes unreachable.

return fmt.Errorf("at least one reachable %T neither dropped nor committed", &Proposal{})
}

if err := getErrorFromVoidResult(C.fwd_close_db(db.handle)); err != nil {
Contributor

So if the user makes a mistake and forgets to drop a proposal, the database won't close. I think this is the behavior we should enforce, but it does seem weird
