Skip to content
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 18 additions & 8 deletions ffi/firewood.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ import (
"errors"
"fmt"
"runtime"
"sync"
)

// These constants are used to identify errors returned by the Firewood Rust FFI.
Expand All @@ -49,7 +50,8 @@ type Database struct {
// handle is returned and accepted by cgo functions. It MUST be treated as
// an opaque value without special meaning.
// https://en.wikipedia.org/wiki/Blinkenlights
handle *C.DatabaseHandle
handle *C.DatabaseHandle
proposals sync.WaitGroup
}

// Config configures the opening of a [Database].
Expand Down Expand Up @@ -139,6 +141,9 @@ func (db *Database) Update(keys, vals [][]byte) ([]byte, error) {
return getHashKeyFromHashResult(C.fwd_batch(db.handle, kvp))
}

// Propose creates a new proposal with the given keys and values. The proposal
// is not committed until [Proposal.Commit] is called. See [Database.Close] re
// freeing proposals.
func (db *Database) Propose(keys, vals [][]byte) (*Proposal, error) {
if db.handle == nil {
return nil, errDBClosed
Expand All @@ -151,8 +156,7 @@ func (db *Database) Propose(keys, vals [][]byte) (*Proposal, error) {
if err != nil {
return nil, err
}

return getProposalFromProposalResult(C.fwd_propose_on_db(db.handle, kvp), db)
return getProposalFromProposalResult(C.fwd_propose_on_db(db.handle, kvp), &db.proposals)
}

// Get retrieves the value for the given key. It always returns a nil error.
Expand Down Expand Up @@ -233,17 +237,23 @@ func (db *Database) Revision(root []byte) (*Revision, error) {

// Close releases the memory associated with the Database.
//
// This is not safe to call while there are any outstanding Proposals. All proposals
// must be freed or committed before calling this.
// This blocks until all outstanding Proposals are either unreachable or one of
// [Proposal.Commit] or [Proposal.Drop] has been called on them. Unreachable
// proposals will be automatically dropped before Close returns, unless an
// alternate GC finalizer is set on them.
//
// This is safe to call if the pointer is nil, in which case it does nothing. The
// pointer will be set to nil after freeing to prevent double free. However, it is
// not safe to call this method concurrently from multiple goroutines.
// This is safe to call if the handle pointer is nil, in which case it does
// nothing. The pointer will be set to nil after freeing to prevent double free.
// However, it is not safe to call this method concurrently from multiple
// goroutines.
func (db *Database) Close() error {
if db.handle == nil {
return nil
}

runtime.GC()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of the GC call. I assume it's to try to eagerly run any outstanding finalizers. But, GC will also include everything else and may penalize us more than necessary.

Copy link
Contributor Author

@ARR4N ARR4N Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's to try to eagerly run any outstanding finalizers

Yup

GC will also include everything else and may penalize us more than necessary

Good point. I've put it in a separate go routine to avoid this, but I think it's important to still include due to the above.

db.proposals.Wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that a leaked proposal now becomes a hang at exit time?

If so, we should consider returning an error instead of hanging. This would make it much harder to debug if it happens on a system we have no control over, but if there's a log that says "hey there was a leaked proposal" somewhere that would make debugging a lot easier.

One way to do this is via a timeout, maybe with some large amount of time (60 seconds).

Copy link
Contributor Author

@ARR4N ARR4N Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idiomatic approach is to accept a Context and then add an extra timeout. @alarso16 do we absolutely have to conform to the kvBackend interface (which precludes adding the Context)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is consumed by triedb. This struct doesn't implement any particular interface required by libevm, but will be called on DBOverride.Close(). Accepting a context would be my first thought anywhere else, so maybe we just require the consumer to understand that the operation may hang by sending a context. libevm can create an ephemeral context I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context added.


if err := getErrorFromVoidResult(C.fwd_close_db(db.handle)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the user makes a mistake and forgets to drop a proposal, the database won't close. I think this is the behavior we should enforce, but it does seem weird

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the behavior we should enforce

Me too because it suggests a leak somewhere. Not necessarily a raw resource leak, but something smelly on the consuming side.

but it does seem weird

How so? I'm not disagreeing as much as wanting to understand more precisely in case there are alternative approaches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I make a mistake as a user, even if I should have dropped any outstanding proposals, should we really rely on the OS to close the file later? We know that we can handle it more gracefully

return fmt.Errorf("unexpected error when closing database: %w", err)
}
Expand Down
57 changes: 57 additions & 0 deletions ffi/firewood_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1050,3 +1050,60 @@ func TestGetFromRootParallel(t *testing.T) {
r.NoError(err, "Parallel operation failed")
}
}

func TestProposalHandlesFreed(t *testing.T) {
t.Parallel()

db, _, err := newDatabase(filepath.Join(t.TempDir(), "test_GC_drops_proposal.db"))
require.NoError(t, err)

// These MUST NOT be committed nor dropped as they demonstrate that the GC
// finalizer does it for us.
p0, err := db.Propose(kvForTest(1))
require.NoErrorf(t, err, "%T.Propose(...)", db)
p1, err := p0.Propose(kvForTest(1))
require.NoErrorf(t, err, "%T.Propose(...)", p0)

// Demonstrates that explicit [Proposal.Commit] and [Proposal.Drop] calls
// are sufficient to unblock [Database.Close].
var keep []*Proposal //nolint:prealloc
for name, free := range map[string](func(*Proposal) error){
"Commit": (*Proposal).Commit,
"Drop": (*Proposal).Drop,
} {
p, err := db.Propose(kvForTest(1))
require.NoErrorf(t, err, "%T.Propose(...)", db)
require.NoErrorf(t, free(p), "%T.%s()", p, name)
keep = append(keep, p)
}

done := make(chan struct{})
go func() {
require.NoErrorf(t, db.Close(), "%T.Close()", db)
close(done)
}()

select {
case <-done:
t.Errorf("%T.Close() returned with undropped %T", db, p0) //nolint:forbidigo // Use of require is impossible without a hack like require.False(true)
case <-time.After(300 * time.Millisecond):
// TODO(arr4n) use `synctest` package when at Go 1.25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, I've never heard of this. This does seem to solve a pretty common pattern in testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can't wait to start using it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could use it now if we add GOEXPERIMENT=synctest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could use it now if we add GOEXPERIMENT=synctest

Unfortunately it requires compiling Go itself with this, not just running the test.

}

runtime.KeepAlive(p0)
runtime.KeepAlive(p1)
p0 = nil
p1 = nil //nolint:ineffassign // Makes the value unreachable, allowing the finalizer to call Drop()

// In practice there's no need to call [runtime.GC] if [Database.Close] is
// called after all proposals are unreachable, as it does it itself.
runtime.GC()
// Note that [Database.Close] waits for outstanding proposals, so this would
// block permanently if the unreachability of `p0` and `p1` didn't result in
// their [Proposal.Drop] methods being called.
<-done

for _, p := range keep {
runtime.KeepAlive(p)
}
}
43 changes: 25 additions & 18 deletions ffi/proposal.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,17 +14,13 @@ import (
"errors"
"fmt"
"runtime"
"sync"
"unsafe"
)

var errDroppedProposal = errors.New("proposal already dropped")

type Proposal struct {
// The database this proposal is associated with. We hold onto this to ensure
// the database handle outlives the proposal handle, which is required for
// the proposal to be valid.
db *Database

// handle is an opaque pointer to the proposal within Firewood. It should be
// passed to the C FFI functions that operate on proposals
//
Expand All @@ -34,6 +30,12 @@ type Proposal struct {
// this handle, so it should not be used after those calls.
handle *C.ProposalHandle

// [Database.Close] blocks on this WaitGroup, which is incremented by
// [getProposalFromProposalResult], and decremented by either
// [Proposal.Commit] or [Proposal.Done].
openProposals *sync.WaitGroup
freeOnce sync.Once

// The proposal root hash.
root []byte
}
Expand All @@ -58,8 +60,8 @@ func (p *Proposal) Get(key []byte) ([]byte, error) {
return getValueFromValueResult(C.fwd_get_from_proposal(p.handle, newBorrowedBytes(key, &pinner)))
}

// Propose creates a new proposal with the given keys and values.
// The proposal is not committed until Commit is called.
// Propose is equivalent to [Database.Propose] except that the new proposal is
// based on `p`.
func (p *Proposal) Propose(keys, vals [][]byte) (*Proposal, error) {
if p.handle == nil {
return nil, errDroppedProposal
Expand All @@ -72,8 +74,7 @@ func (p *Proposal) Propose(keys, vals [][]byte) (*Proposal, error) {
if err != nil {
return nil, err
}

return getProposalFromProposalResult(C.fwd_propose_on_proposal(p.handle, kvp), p.db)
return getProposalFromProposalResult(C.fwd_propose_on_proposal(p.handle, kvp), p.openProposals)
}

// Commit commits the proposal and returns any errors.
Expand All @@ -86,8 +87,7 @@ func (p *Proposal) Commit() error {
}

_, err := getHashKeyFromHashResult(C.fwd_commit_proposal(p.handle))
p.handle = nil // we no longer own the proposal handle

p.afterDisowned()
return err
}

Expand All @@ -104,25 +104,32 @@ func (p *Proposal) Drop() error {
if err := getErrorFromVoidResult(C.fwd_free_proposal(p.handle)); err != nil {
return fmt.Errorf("%w: %w", errFreeingValue, err)
}

p.handle = nil // Prevent double free

p.afterDisowned()
return nil
}

func (p *Proposal) afterDisowned() {
p.freeOnce.Do(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good that we've prevented some racy behavior here anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Can you give examples? Technically this Once isn't necessary, but I put it in defensively in case the rest of the code is refactored and current invariants no longer hold.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do rely that the consumer isn't trying to simultaneously Commit and Drop, which seems reasonable. However, in that case, you would get UB. It's probably best to make UB completely impossible, even if the actions to make it happen are unreasonable

Copy link
Contributor Author

@ARR4N ARR4N Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can definitely still race and both call the Rust code, but only under invalid usage as you say. This just guarantees that the WaitGroup never panics by going negative.

Copy link
Contributor

@demosdemon demosdemon Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't prevent racy behavior. If we wanted to do that, the freeOnce call also needs to wrap the C.fwd_free_proposal and C.fwd_commit_proposal calls as there's nothing preventing a concurrent call to Drop()/Commit() while this is running.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I see. Is that worth preventing (in a separate PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting them in the Once would be problematic as, if either returned an error, then they couldn't be called again.

There's no need to place it in a separate PR IMO as it's simply a mutex. It's also worth doing to allay @demosdemon's concerns here:

That could also potentially allow the finalizer to run on the proposal concurrently while commit is running; but, I believe that won't actually happen in practice because of the Once.

It's true that it won't actually happen because the Proposal remains alive long enough, but an explicit lock is much easier to reason about than GC lifetime.

p.handle = nil
p.openProposals.Done()
})
}

// getProposalFromProposalResult converts a C.ProposalResult to a Proposal or error.
func getProposalFromProposalResult(result C.ProposalResult, db *Database) (*Proposal, error) {
func getProposalFromProposalResult(result C.ProposalResult, openProposals *sync.WaitGroup) (*Proposal, error) {
switch result.tag {
case C.ProposalResult_NullHandlePointer:
return nil, errDBClosed
case C.ProposalResult_Ok:
body := (*C.ProposalResult_Ok_Body)(unsafe.Pointer(&result.anon0))
hashKey := *(*[32]byte)(unsafe.Pointer(&body.root_hash._0))
proposal := &Proposal{
db: db,
handle: body.handle,
root: hashKey[:],
handle: body.handle,
root: hashKey[:],
openProposals: openProposals,
}
openProposals.Add(1)
runtime.SetFinalizer(proposal, (*Proposal).Drop)
return proposal, nil
case C.ProposalResult_Err:
err := newOwnedBytes(*(*C.OwnedBytes)(unsafe.Pointer(&result.anon0))).intoError()
Expand Down