- 
                Notifications
    You must be signed in to change notification settings 
- Fork 24
          feat(ffi): Database.Close() guarantees proposals committed or freed
          #1349
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
        
          
                ffi/proposal.go
              
                Outdated
          
        
      | } | ||
|  | ||
| func (p *Proposal) afterDisowned() { | ||
| p.freeOnce.Do(func() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good that we've prevented some racy behavior here anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? Can you give examples? Technically this Once isn't necessary, but I put it in defensively in case the rest of the code is refactored and current invariants no longer hold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do rely that the consumer isn't trying to simultaneously Commit and Drop, which seems reasonable. However, in that case, you would get UB. It's probably best to make UB completely impossible, even if the actions to make it happen are unreasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can definitely still race and both call the Rust code, but only under invalid usage as you say. This just guarantees that the WaitGroup never panics by going negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't prevent racy behavior. If we wanted to do that, the freeOnce call also needs to wrap the C.fwd_free_proposal and C.fwd_commit_proposal calls as there's nothing preventing a concurrent call to Drop()/Commit() while this is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I see. Is that worth preventing (in a separate PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting them in the Once would be problematic as, if either returned an error, then they couldn't be called again.
There's no need to place it in a separate PR IMO as it's simply a mutex. It's also worth doing to allay @demosdemon's concerns here:
That could also potentially allow the finalizer to run on the proposal concurrently while commit is running; but, I believe that won't actually happen in practice because of the
Once.
It's true that it won't actually happen because the Proposal remains alive long enough, but an explicit lock is much easier to reason about than GC lifetime.
| case <-done: | ||
| t.Errorf("%T.Close() returned with undropped %T", db, p0) //nolint:forbidigo // Use of require is impossible without a hack like require.False(true) | ||
| case <-time.After(300 * time.Millisecond): | ||
| // TODO(arr4n) use `synctest` package when at Go 1.25 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat, I've never heard of this. This does seem to solve a pretty common pattern in testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can't wait to start using it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could use it now if we add GOEXPERIMENT=synctest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could use it now if we add
GOEXPERIMENT=synctest
Unfortunately it requires compiling Go itself with this, not just running the test.
Co-authored-by: Austin Larson <[email protected]> Signed-off-by: Arran Schlosberg <[email protected]>
        
          
                ffi/proposal.go
              
                Outdated
          
        
      | } | ||
|  | ||
| func (p *Proposal) afterDisowned() { | ||
| p.freeOnce.Do(func() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do rely that the consumer isn't trying to simultaneously Commit and Drop, which seems reasonable. However, in that case, you would get UB. It's probably best to make UB completely impossible, even if the actions to make it happen are unreasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like the use of WaitGroup. My main concern is the explicit GC call.
But, something else I noticed is that there's nothing clearing the finalizer that's been set. So, the finalizer will always call Drop even if Drop or Commit was called outside of the finalizer. Ideally we would clear the finalizer to prevent that from happening if there was an explicit call. That could also potentially allow the finalizer to run on the proposal concurrently while commit is running; but, I believe that won't actually happen in practice because of the Once.
        
          
                ffi/firewood.go
              
                Outdated
          
        
      | return nil | ||
| } | ||
|  | ||
| runtime.GC() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of the GC call. I assume it's to try to eagerly run any outstanding finalizers. But, GC will also include everything else and may penalize us more than necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it's to try to eagerly run any outstanding finalizers
Yup
GC will also include everything else and may penalize us more than necessary
Good point. I've put it in a separate go routine to avoid this, but I think it's important to still include due to the above.
        
          
                ffi/firewood.go
              
                Outdated
          
        
      | } | ||
|  | ||
| runtime.GC() | ||
| db.proposals.Wait() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that a leaked proposal now becomes a hang at exit time?
If so, we should consider returning an error instead of hanging. This would make it much harder to debug if it happens on a system we have no control over, but if there's a log that says "hey there was a leaked proposal" somewhere that would make debugging a lot easier.
One way to do this is via a timeout, maybe with some large amount of time (60 seconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idiomatic approach is to accept a Context and then add an extra timeout. @alarso16 do we absolutely have to conform to the kvBackend interface (which precludes adding the Context)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is consumed by triedb. This struct doesn't implement any particular interface required by libevm, but will be called on DBOverride.Close(). Accepting a context would be my first thought anywhere else, so maybe we just require the consumer to understand that the operation may hang by sending a context. libevm can create an ephemeral context I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context added.
| case <-done: | ||
| t.Errorf("%T.Close() returned with undropped %T", db, p0) //nolint:forbidigo // Use of require is impossible without a hack like require.False(true) | ||
| case <-time.After(300 * time.Millisecond): | ||
| // TODO(arr4n) use `synctest` package when at Go 1.25 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could use it now if we add GOEXPERIMENT=synctest
| 
 Addressed in your code-specific comment 
 A call to  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one question we may want to answer prior to merging this is how this should effect the proposal API. Should Drop even be exposed to the user if we guarantee to free memory on GC? I think yes, it should still be available, but wanted to check with others
|  | ||
| // disownHandle is the common path of [Proposal.Commit] and [Proposal.Drop], the | ||
| // `fn` argument defining the method-specific behaviour. | ||
| func (p *Proposal) disownHandle(fn func(*C.ProposalHandle) error, disownEvenOnErr bool) error { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that the behavior is clarified for what happens to the lifetime in the error case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too. It actually caught me off guard when I first did this refactoring.
| func (p *Proposal) Drop() error { | ||
| if p.handle == nil { | ||
| return nil | ||
| if err := p.disownHandle(dropProposal, false); err != nil && err != errDroppedProposal { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using errors.Is better practice? Or since we know that the err isn't wrapped, this check is easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using
errors.Isbetter practice?
Nope, it's redundant here.
Or since we know that the err isn't wrapped, this check is easier?
Exactly. It's useful when there's a desire to wrap an error, but in this case there isn't any.
| 
 I think it absolutely should be, otherwise the only way to close the database is to guarantee that every proposal becomes unreachable. | 
| return fmt.Errorf("at least one reachable %T neither dropped nor committed", &Proposal{}) | ||
| } | ||
|  | ||
| if err := getErrorFromVoidResult(C.fwd_close_db(db.handle)); err != nil { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the user makes a mistake and forgets to drop a proposal, the database won't close. I think this is the behavior we should enforce, but it does seem weird
Explicit freeing of proposals propagated through
libevm(i.e.geth) plumbing has proven difficult when not being committed as they are simply dropped for the GC to collect. Furthermore, strict ordering of calls toProposal.Drop()(orCommit()) beforeDatabase.Close()is required to avoid segfaults. This PR implements a fix for both issues:Proposals have a GC finalizer attached, which callsDrop(). This is safe because it is a no-op if called twice or after a call toCommit().Databasehas async.WaitGroupintroduced, which tracks all outstanding proposals. Calls toCommit()/Drop()decrement the group counter (only once perProposal).Database.Close()waits on theWaitGroupbefore freeing its own handle, avoiding segfaults.Assuming that all calls to
Database.Propose()andProposal.Propose()occur before the call toDatabase.Close()then this is a correct usage ofsync.WaitGroup's documented requirement for ordering of calls toAdd()andWait().An integration test demonstrates blocking and eventual return of
Database.Close(), specifically due to the unreachability of un-dropped, un-committedProposals, resulting in their finalizers decrementing theWaitGroup.