Hi everyone,
I run a production hot wallet on top of lwk_wollet that has accumulated 30k+ transactions over its lifetime. As the update log and the in-memory caches grew, the process began to OOM-crash on boot: rehydration via Wollet::new allocates the full historical state (transactions, derivation maps, heights, timestamps) into resident memory before the wallet is usable, and on our hardware that final state crosses the available RAM budget.
We already maintain an external indexer that holds the canonical transaction history and can serve any subset of it on demand, so most of what Wollet rebuilds at boot is data we do not need to keep in process to operate. What we are missing from LWK is a way to construct a Wollet directly from a minimal, application-supplied state — bypassing the Update-replay path that forces full history into memory.
This issue describes that gap and proposes a narrow addition (Wollet::from_state) to close it, while being explicit about which existing LWK mechanisms are not sufficient and why.
Summary
lwk_wollet already exposes a public Persister trait (persister.rs:35) and a Wollet::new(network, Arc<dyn DynStore>, descriptor) constructor (wollet.rs:448) that accepts any custom backend, so applications can already plug in their own storage (database, key-value store, etc.). That part is solved.
The remaining gap is that rehydration always goes through linear replay of every persisted Update in Wollet::restore_updates (update.rs:378):
    pub(crate) fn restore_updates(&mut self) -> Result<(), Error> {
        for i in 0.. {
            if let Some(update) = self.updates_persister.get_update(i)? {
                self.apply_update_inner(update)?;
            } else {
                let mut next_update_index = self.updates_persister.next_update_index()?;
                *next_update_index = self.updates_persister.merge_updates(i)?;
                break;
            }
        }
        Ok(())
    }
I would like a constructor that takes already-materialized state and skips replay entirely, e.g. Wollet::from_state(network, descriptor, WolletFullState).
Motivation
In production we operate large, long-lived wallets whose persisted update log grows to hundreds of MB and whose fully-rehydrated Wollet reaches multi-GB resident memory. Boot time is dominated by restore_updates, and steady-state memory is dominated by the in-memory caches that replay populates. We have an external indexer that already holds the canonical transaction history and can produce a minimal materialized state on demand; we need a way to feed that state into Wollet without round-tripping through the Update log.
Why existing mechanisms do not solve this
LWK already has compaction-flavored machinery, and it is worth being explicit about why none of it closes this gap:
- only_tip coalescing (update.rs:145-159). When a new Update only moves the tip and the previous one did too, the persister overwrites the previous entry instead of appending. The Persister trait doc explicitly asks implementors to behave this way.
- Update::prune (update.rs:231) and the per-batch prune in update.rs:36. Strips transactions/scripts not relevant to the wallet from a single Update before persisting.
- merge_threshold (wollet.rs:113, implemented in update.rs:91). When more than threshold updates are persisted, they are folded into a single consolidated Update at index 0 via Update::merge (update.rs:314). With with_persisted_txs (wollet.rs:103) this is forced to Some(1), so at most one Update is ever on disk.
These three mechanisms address disk footprint and boot time. They do not address resident memory, which is the motivation here.
A consolidated snapshot-style update — even a hypothetical perfect one — produces the same in-memory Wollet state as replaying the original sequence. After construction, cache.all_txs, the script/derivation maps, the height map, and the unspent set hold exactly the same data regardless of whether they were built from 10,000 small Updates or one merged one. The cost is in the final state, not in the replay path. Replacing many small persisted entries with one large entry is a change of intermediate representation, not of steady-state memory.
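To make that argument concrete, here is a toy model of it. It is only a sketch: ToyUpdate, ToyCache, merge and replay are hypothetical stand-ins I made up for illustration, not LWK types or functions, and transactions are reduced to a txid and a byte size.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an `Update`: new transactions as txid -> size in bytes.
/// (Hypothetical type for illustration, not the LWK `Update`.)
#[derive(Clone, Default)]
pub struct ToyUpdate {
    pub txs: BTreeMap<u32, usize>,
}

/// Toy stand-in for the wallet cache that replay populates.
#[derive(Debug, Default, PartialEq)]
pub struct ToyCache {
    pub all_txs: BTreeMap<u32, usize>,
}

impl ToyCache {
    /// Apply one update by copying its transactions into the cache.
    pub fn apply(&mut self, update: &ToyUpdate) {
        for (txid, size) in &update.txs {
            self.all_txs.insert(*txid, *size);
        }
    }

    /// Bytes held by the cache once construction is finished.
    pub fn resident_bytes(&self) -> usize {
        self.all_txs.values().sum()
    }
}

/// Fold many updates into one consolidated update.
pub fn merge(updates: &[ToyUpdate]) -> ToyUpdate {
    let mut merged = ToyUpdate::default();
    for update in updates {
        merged.txs.extend(update.txs.iter().map(|(k, v)| (*k, *v)));
    }
    merged
}

/// Build a cache by applying every update in order.
pub fn replay(updates: &[ToyUpdate]) -> ToyCache {
    let mut cache = ToyCache::default();
    for update in updates {
        cache.apply(update);
    }
    cache
}
```

In this model, replaying 1,000 single-transaction updates and replaying the one update produced by merge give caches that compare equal, so resident_bytes is the same in both cases. Merging changes what is stored and how long replay takes, not what ends up in memory.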
Concretely, in our deployment:
| Mechanism | Disk | Boot time | Resident RAM |
| --- | --- | --- | --- |
| Status quo | ~800 MB | ~30 s | ~7 GB |
| Hypothetical aggressive merge | ~50 MB | ~5 s | ~7 GB |
| from_state with semantic prune | application-controlled | ~0 s | application-controlled |
What would actually reduce resident memory
Semantic pruning, not compaction. A from_state path lets the application supply only the data it needs to operate going forward — for example:
- Only transactions producing UTXOs that are still unspent.
- Only the derivation range from last_unused onward.
- Only the current tip, not historical tips/timestamps.
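A minimal sketch of what the first two pruning rules could look like on the application side, before the state is handed to from_state. Everything here is an assumption for illustration: the placeholder types (Txid, RawTx, Script, OutPoint) and both helper functions are hypothetical and are not part of LWK.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

// Placeholder types (assumptions for this sketch, not LWK or `elements` types).
pub type Txid = [u8; 32];
pub type RawTx = Vec<u8>;
pub type Script = Vec<u8>;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct OutPoint {
    pub txid: Txid,
    pub vout: u32,
}

/// Keep only the transactions that created at least one still-unspent output.
pub fn prune_txs_to_unspent(
    all_txs: HashMap<Txid, RawTx>,
    unspent: &HashSet<OutPoint>,
) -> HashMap<Txid, RawTx> {
    // Txids of every transaction that funds a current UTXO.
    let funding: HashSet<Txid> = unspent.iter().map(|op| op.txid).collect();
    all_txs
        .into_iter()
        .filter(|(txid, _)| funding.contains(txid))
        .collect()
}

/// Keep only derivation entries at index `last_unused` or later.
pub fn prune_derivations(
    mut scripts_by_index: BTreeMap<u32, Script>,
    last_unused: u32,
) -> BTreeMap<u32, Script> {
    // `split_off` returns the entries with keys >= `last_unused`.
    scripts_by_index.split_off(&last_unused)
}
```

With 30k+ historical transactions and a comparatively small UTXO set, the first function alone would drop most of the transaction map, which is where I expect most of the resident memory to go.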
Trade-offs the application opts into when using this path:
- wallet.transactions() no longer returns full history.
- Reorgs deeper than the snapshot horizon require a rescan.
- Audit/tax features built on top of the wallet must source history elsewhere.
These are acceptable for our use case (transaction history lives in an external indexer); they would not be acceptable as defaults. Hence the request is an additional constructor, not a behavior change to existing ones.
Proposed shape
impl Wollet {
/// Construct a `Wollet` from already-materialized state, skipping the
/// `Update`-replay path used by `Wollet::new`.
///
/// The caller is responsible for the consistency of `state`. Fields the
/// caller omits will be reflected as-is in the resulting wallet — e.g.
/// transactions not included will not appear in `wallet.transactions()`,
/// and reorgs deeper than the snapshot horizon will require a rescan.
pub fn from_state(
network: Network,
descriptor: WolletDescriptor,
state: WolletFullState,
) -> Result<Self, Error>;
}
Where WolletFullState is a public, serializable struct exposing the fields needed to populate Cache directly: tip, scripts/paths maps, heights, unspent set, last_unused counters, and whatever subset of transactions the caller chooses to include.
The existing private WolletConciseState (wollet.rs:261) is close to what is needed for the read/scan side, but a write-side equivalent would need the transaction set too. Happy to iterate on the exact shape.
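To give the discussion something concrete to react to, here is one possible shape for WolletFullState. This is a sketch under assumptions, not a proposal for the final API: the field names, the placeholder types (a real version would use the corresponding elements / LWK types) and the check_consistency helper are all hypothetical.

```rust
use std::collections::{HashMap, HashSet};

// Placeholder types so the sketch compiles on its own (assumptions, not LWK types).
pub type Txid = [u8; 32];
pub type Script = Vec<u8>;
pub type RawTx = Vec<u8>;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct OutPoint {
    pub txid: Txid,
    pub vout: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Chain {
    External,
    Internal,
}

/// Hypothetical shape of the proposed `WolletFullState`.
#[derive(Clone, Debug, Default)]
pub struct WolletFullState {
    /// Current tip only, no historical tips or timestamps.
    pub tip_height: u32,
    pub tip_hash: [u8; 32],
    /// Script -> (chain, derivation index).
    pub paths: HashMap<Script, (Chain, u32)>,
    /// Txid -> confirmation height, `None` while unconfirmed.
    pub heights: HashMap<Txid, Option<u32>>,
    /// Outputs that are still unspent.
    pub unspent: HashSet<OutPoint>,
    pub last_unused_external: u32,
    pub last_unused_internal: u32,
    /// Whatever subset of transactions the caller chooses to include.
    pub txs: HashMap<Txid, RawTx>,
}

impl WolletFullState {
    /// Example of a check `from_state` could run: every unspent outpoint must
    /// refer to an included transaction that also has a height entry.
    pub fn check_consistency(&self) -> Result<(), String> {
        for op in &self.unspent {
            if !self.txs.contains_key(&op.txid) {
                return Err(format!("{:?} refers to a transaction not in txs", op));
            }
            if !self.heights.contains_key(&op.txid) {
                return Err(format!("{:?} has no height entry", op));
            }
        }
        Ok(())
    }
}
```

Whether from_state should validate at all, or trust the caller entirely as the doc comment above suggests, is one of the points I would like feedback on.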
Scope and non-goals
- Pure addition; no change to existing constructors, persister semantics, or default behavior.
- Does not require touching merge_threshold, only_tip coalescing, or Update::prune.
- Not proposing to remove or weaken history retention by default; applications that need full history continue to use Wollet::new.
- Migration path for existing deployments is the application's responsibility: build a WolletFullState once from a full-replayed Wollet, persist it externally, then use from_state thereafter.
Alternatives considered
- Tuning merge_threshold: addressed above; reduces disk and boot time, not RAM.
- Custom Persister that fabricates a single synthetic Update: possible today, but Update is not a stable serialization target for materialized state, and the replay still goes through apply_update_inner, paying the same allocation cost. A direct Cache injection is cleaner and explicit about the contract.
- Lazy/streamed transaction loading inside Cache: larger refactor, changes semantics of existing accessors, and not needed if the application can supply a pruned state up front.
Happy to send a PR if the direction is acceptable.