@qnikst qnikst commented Sep 1, 2025

This proposal defines the Simple Canonical Ledger State (SCLS), a stable, versioned, and verifiable file format for representing the Cardano ledger state. It specifies a segmented binary container with deterministic CBOR encodings, per-chunk commitments, and a manifest that enables identical snapshots across implementations, supports external tools (e.g., Mithril), and future-proofs distribution and verification of state.


(rendered latest document)

@rphair rphair changed the title Proposal of canonical ledger state format CIP-???? | Canonical Ledger State Sep 1, 2025
Collaborator

@rphair left a comment

Tagging Triage for initial presentation at next CIP meeting: https://hackmd.io/@cip-editors/118

Note it's impossible this will be given a CIP number 015x as the directory is currently named (at this time the candidate list goes up to 0161), but you could leave the directory name the same since there wouldn't be any perceived naming conflicts.

@rphair added the labels Category: Ledger ("Proposals belonging to the 'Ledger' category.") and State: Triage ("Applied to new PR after editor cleanup on GitHub, pending CIP meeting introduction.") on Sep 1, 2025
@qnikst qnikst force-pushed the cip-canonical branch 2 times, most recently from 429a6dc to f0ca42d Compare September 1, 2025 19:42
This proposal defines the Simple Canonical Ledger State (SCLS),
a stable, versioned, and verifiable file format for representing
the Cardano ledger state. It specifies a segmented binary container
with deterministic CBOR encodings, per-chunk commitments, and
a manifest that enables identical snapshots across implementations,
supports external tools (e.g., Mithril), and future-proofs distribution
and verification of state.

> Co-Authored-By: Nicholas Clarke <[email protected]>
> Co-Authored-By: João Santos Reis <[email protected]>
Author

qnikst commented Sep 1, 2025

@rphair thanks for the fast feedback, comments, suggestions and explanation of the reason behind them.
I've updated the CIP to reflect all the comments and added some more headers where I expect linking is important!

Collaborator

rphair commented Sep 1, 2025

@qnikst thanks, but please avoid force-pushing again here. This is an extensive proposal and it would detract from the review process to have to keep re-reviewing the whole document each time we lose the change & suggestion history in this branch.

Not the least of the problems we'd keep having is that (as you can see at this time) all the review threads above remain unresolved. If you accept these changes on GitHub and pull the changes back to your local branch, the issues above will be marked resolved and then editors, co-authors, and Ledger reviewers won't have to go over them all again later.

I appreciate your first-time submission and look forward to an enthusiastic review of this one, but FYI we've already asked authors please not to force push here in CIP-0001: https://github.com/cardano-foundation/CIPs/blob/master/CIP-0001/README.md#1-early-stages

Collaborator

rphair commented Sep 2, 2025

p.s. @qnikst thanks for confirming resolution of the presentation issues above... @WhatisRT @lehins, now that the format is more canonicalised I think it could be ready for at least your initial review. I would personally be in favour of confirming this as a CIP candidate (i.e. assigning a number), but if you can post any initial reactions in the next 2½ hours we'll also go over these at the meeting.

Collaborator

@rphair left a comment

@qnikst the recent changes look great (especially the deeper level headers); currently I think the only presentation issues are that the Markdown footnotes are not doing what I think you intended them to do...

see, for correct usage: https://docs.github.com/en/enterprise-cloud@latest/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#footnotes

Collaborator

@rphair left a comment

@qnikst in the CIP meeting today we decided to leave this in Triage since other CIP editors had interested & critical reactions to it: but it came in shortly before the meeting so there hadn't been time to properly formulate reservations about it. This tag will keep this submission at the top of the editors' stack and help get those reservations documented ASAP so you can start working with them.

One bit that came from the meeting discussion is that CBOR is more confining than JSON would be: especially when canonical (cc @Crypto2099 to elaborate perhaps). With other languages having better support for JSON objects, perhaps this specification could be made in JSON, or with JSON in parallel with CBOR.

Author

qnikst commented Sep 2, 2025

@rphair thanks for updating the status!

I’ve added a CBOR vs JSON section under Implementation Alternatives, which explains the reasoning behind the current choice and the proposed direction.

We are not opposed to defining schemas using json-schema, especially given the ongoing work around CIP-139 where shared efforts could be valuable. Our current focus is on CBOR, but we are looking at schemas and plan to keep the data compatible. We believe that a json-schema specification could be derived from the CBOR definition at a later stage.

At this point, we do not see strong reasons to prioritize a standalone json-schema description, but we would very much welcome feedback if others believe there are benefits to doing so earlier.

### Implementation Plan

1. [ ] Prototype SCLS writer/reader.
1. [ ] Refine specification and finalise CDDL.
Collaborator

Suggested change
- 1. [ ] Refine specification and finalise CDDL.
+ 2. [ ] Refine specification and finalise CDDL.

fix list :)

Collaborator

@rphair Sep 16, 2025

This syntax is acceptable in Markdown: you can use any number you want and Markdown will normalise it, so using 1. for each ordered list item is a common Markdown convention. However, what we keep seeing on CIPs is that GitHub displays it as an UNordered list, which is what I currently see here and which is a "browser dependence" listed in GitHub errata.

TL;DR how it appears now on GitHub is the best we can hope for (an unordered list) and other Markdown renderers would likely display this as a properly ordered list.

Collaborator

@Crypto2099 left a comment

This otherwise looks good aside from a couple of strange grammar issues pointed out. Note there was an earlier mention about using JSON as a transport layer which I fully support for durable use cases such as this.

@rphair rphair changed the title CIP-???? | Canonical Ledger State CIP-0165? | Canonical Ledger State Sep 16, 2025
Collaborator

@rphair left a comment

Acknowledged as CIP candidate at the biweekly meeting today — though still awaiting review from particular teams & welcoming feedback from Haskell node Ledger reviewers (cc @lehins @WhatisRT) — mainly because this proposed standard format would be useful across the multi-node ecosystem.

@qnikst please rename the containing directory to CIP-0165 and update the readable proposal link in your top comment accordingly. 🎉

Tagging @jpraynaud who expressed interest in a cross-disciplinary review of this proposal & welcoming others from diverse backgrounds to review & tag others who may be interested.

Author

qnikst commented Sep 16, 2025

Thanks for the update, @rphair. I've renamed the directory and updated the comment.

Would welcome any feedback and review from the other teams.

@rphair added the label State: Confirmed ("Candidate with CIP number (new PR) or update under review.") and removed the label State: Triage ("Applied to new PR after editor cleanup on GitHub, pending CIP meeting introduction.") on Sep 17, 2025
Contributor

@lehins left a comment

Reading through the CIP leaves me with more questions than answers. In short, I cannot draw a parallel between what Ledger actually needs and the proposed format for the ledger state.

First of all, the Ledger State must represent all of the effects of all prior transactions on chain at any point in time. It must also contain the minimal possible data that is necessary for validation of any future block. When I read this CIP, it makes me believe that more use cases are being added to the ledger state that are irrelevant for the Ledger itself. Things like indices, tombstones, bloom filters or metadata are not something that ledger cares about, therefore I don't understand their relevance to the goal of this CIP. In other words, the goal should be how to store and read the data to and from a file, instead of creating a mini database that can be used to write queries on and figure out the history of the chain. This is a job that is left for indexers, not a Cardano node. Unless, of course, I misunderstood the goal.

Second of all, today's Ledger state was designed to follow the hierarchy of the Ledger rules. In other words, the structure of today's Ledger state is completely ignored and there is no mention of the impact it would have on Ledger. For example, there is some unavoidable duplication of data in the ledger state (e.g. delegation of a staking credential to a DRep), which needs to be accounted for when deserializing the data in order for that data to be shared and not be duplicated in memory. For sharing to work, data needs to be stored in a specific order. I would love to see a thorough description of how we would transition from the ledger state we have today to the format that is being suggested without impacting performance of the node and affecting the enormous complexity of all of the Ledger rules that have been developed since the beginning of the Shelley era 5 years ago.

So, my opinion on this CIP is that it defines the format very well, but it does not describe how it will work for an existing node that keeps Cardano running today. Since the word "Ledger" is in the title, and the majority of the features described in the CIP go against what ledger needs today, I cannot endorse this CIP until those points are clarified.

The only way I see the proposed format being viable today would be through some tool that is capable of converting the current representation of the ledger state to/from the suggested format, and that would hopefully not even run on the same machine as the block producing node. It's always possible that I just don't see very well. 🤓 If the goal is to create a format that can be converted to and from the actual format that the node can interpret, then I think this would work, but it must not be done by the block producing node, because that cannot be done efficiently enough to not become a problem, due to the size of the data. This sort of detail needs to be explicitly stated in this CIP.

One more important point I would like to bring up is relevant to this CIP and to how the canonical ledger state is hoped to be used by other node implementations. Sometimes we need to store extra data in the ledger state in order to later guarantee efficient implementation of a corresponding rule. However, this becomes an implementation detail that other nodes might solve differently (e.g. UTxOHD or how DRep delegations are cleaned up). But if we have a canonical ledger state, then we cannot accommodate every different approach that other nodes will decide to take. This makes me believe that in order to have a true canonical ledger state we would have to store an absolute minimum amount of data in the canonical ledger state, and only when it is being restored would we do some post-processing in order to bring it to the state that any particular ledger implementation expects.


**Structure:**

- `transaction:` `u64` — transaction number
Contributor

Transactions do not have numbers, nor are they stored in the ledger state

Contributor

I think we have an issue of overuse of the word 'transaction' here 🤣

We already stripped out all uses of the word 'block' for the same reason...

Author

Hm, the intent was to use logical time, so it should be a block number or slot number.
At this point I think that slot number may be better, but I'm not sure yet.

| pots | Accounting pots (reserves etc.) |
| spo | SPO state |
| drep | DRep state |
| gov | Governance action state |
Contributor

Governance state has proposals which are stored in a forest-like structure of trees. From what I understand, the CIP only mentions flat maps and array-like data structures being stored.

a high-level proposal.
```

**Purpose:** Delta records are used to build iterative updates, when base format is created and we want to store additional transactions in a fast way. Delta records are designed to be compatible with UTxO-HD, LSM-Tree or other storage types where it's possible to stream list of updates.
Contributor

Which parts of the ledger state will record deltas and how? What's their format?

Most importantly why do we need to record deltas, why not resolve those deltas and just store the final data?

Contributor

Most likely, just the UTxO.

The reason is that resolving the deltas could be expensive, and we wish to avoid expensive computation during the serialisation of a big structure. So if the node can just write a delta which can be resolved later, that's a lot less work on the critical path, where it could otherwise affect performance.
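
A minimal sketch of the trade-off described above, not the CIP's actual record layout: the node only appends a small delta, and folding the deltas into the base happens later, off the critical path. The in-memory model, the function names `append_delta` and `resolve`, and the use of `None` as a tombstone marker are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical in-memory model: keys and values are opaque bytes.
# A value of None marks a deletion (a tombstone).
Delta = List[Tuple[bytes, Optional[bytes]]]


def append_delta(log: List[Delta], changes: Delta) -> None:
    """Cheap step for the node: record the changes without touching the base."""
    log.append(list(changes))


def resolve(base: Dict[bytes, bytes], log: Iterable[Delta]) -> Dict[bytes, bytes]:
    """Expensive step, done off the critical path: fold the deltas into the base."""
    state = dict(base)
    for delta in log:
        for key, value in delta:
            if value is None:
                state.pop(key, None)  # tombstone removes the entry
            else:
                state[key] = value    # insert or overwrite
    return state


base = {b"utxo-1": b"10 ada", b"utxo-2": b"5 ada"}
log: List[Delta] = []
append_delta(log, [(b"utxo-1", None), (b"utxo-3", b"7 ada")])
append_delta(log, [(b"utxo-2", b"4 ada")])
resolved = resolve(base, log)
```

Here `resolve` could equally run on a different machine from the one that produced the log, which is the point being argued in this thread.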


hash32 = bytes .size 32

tx_out = [0, shelley_tx_out// 1, babbage_tx_out]
Contributor

It's extremely inefficient to store TxOuts in the ledger state in CBOR format.
Earlier this year we introduced a different format for it (which is also used by UTxOHD), that has cut snapshot creation time in half and alleviated the problem of occasional block production being missed because of GC kicking in at the wrong time and taking too long. See IntersectMBO/ouroboros-consensus#868

Contributor

Good to know. We had so far proposed CBOR since it makes it easier for everyone to work with, but maybe that's not the best solution here

SCLS addresses these problems by:

- specifying a canonical, language-agnostic container and encoding rules;
- enabling streaming builds and partial verification (per-namespace roots);
Contributor

What sort of verification do you have in mind? You mean data consistency rather than data validity, right?

Author

Yes, only consistency.
I've updated the wording to be more explicit

- specifying a canonical, language-agnostic container and encoding rules;
- enabling streaming builds and partial verification (per-namespace roots);
- being extensible (e.g., optional indexes/Bloom filters) without breaking compatibility;
- remaining compatible with UTxO-HD/LSM on-disk structures and incremental (delta) updates.
Contributor

I don't think this should be the goal, since that is an implementation detail, which others will likely implement differently and we might change ourselves

Contributor

Agree that specific UTxO-HD compatibility isn't what we want. However, given that the Haskell node will be the main node producing snapshots for a long while in the conceivable future, we should bear some consideration to how it intends to store things in case we can take advantage of that. For example, a structure that forced us to load everything first into main memory would seem to be a distinct failure to me!

Author

@qnikst Sep 18, 2025

The idea was to make the format friendly to solutions that may provide an append-only log of state changes. LSM and UTxO-HD provide that property.

I'm not opposed to removing this from the list of goals, as even without this property we can have an efficient solution. Though having this property is really nice when considering a solution.

UPD: I've checked the wording here. We are not stating it as a goal; this "feature" is listed as one of the means by which we achieve the goals. So at this point it remains correct even if we decide to drop delta records.

Collaborator

rphair commented Sep 18, 2025

@lehins #1083 (review):

> Things like indices, tombstones, bloom filters or metadata are not something that ledger cares about, therefore I don't understand their relevance to the goal of this CIP. In other words, the goal should be how to store and read the data to and from a file, instead of creating a mini database that can be used to write queries on and figure out the history of the chain. This is a job that is left for indexers, not a Cardano node.

Tagging the participants in the (informal Discord) query layer standard working group: for perhaps a cooperative review of this proposal, if not some cooperative development (or a better defined separation) with CIP-0139: @nazrhom @klntsky @ch1bo @agaffney @Crypto2099 @Ryun1

Contributor

nc6 commented Sep 18, 2025

(Replying out of order since I think the answers to some of the later paragraphs may clarify some of the earlier ones)

> Second of all, today's Ledger state was designed to follow the hierarchy of the Ledger rules. In other words, the structure of today's Ledger state is completely ignored and there is no mention of the impact it would have on Ledger. For example, there is some unavoidable duplication of data in the ledger state (e.g. delegation of a staking credential to a DRep), which needs to be accounted for when deserializing the data in order for that data to be shared and not be duplicated in memory. For sharing to work, data needs to be stored in a specific order. I would love to see a thorough description of how we would transition from the ledger state we have today to the format that is being suggested without impacting performance of the node and affecting the enormous complexity of all of the Ledger rules that have been developed since the beginning of the Shelley era 5 years ago.
>
> So, my opinion on this CIP is that it defines the format very well, but it does not describe how it will work for an existing node that keeps Cardano running today. Since the word "Ledger" is in the title, and the majority of the features described in the CIP go against what ledger needs today, I cannot endorse this CIP until those points are clarified.

There is no intention for the current ledger state format to be migrated to this format (of course, it's possible, but totally out of scope for this and not an intention). As you say, the ledger state in the current node has a number of intricacies which have developed over the years to solve specific concerns and to map to how the node works. Trying to match these in a format intended to be shared between nodes would be very difficult, and probably self-defeating. Not only that, but our assumption is that ledger will wish to keep updating its formats for efficiency, which would not be possible if it relied on a canonical state shared with other nodes.

> If the goal is to create a format that can be converted to and from the actual format that the node can interpret, then I think this would work, but it must not be done by the block producing node, because that cannot be done efficiently enough to not become a problem, due to the size of the data.

This is indeed the intention, sort of. I would say that the process can be collaborative - partially on the block producing node and partially not. The CIP has support for things like DELTA records to enable doing this in an efficient fashion. So rather than dumping the full ledger state each time (which we agree would be prohibitive), the format would support appending deltas to e.g. the previous epoch's snapshot. This should mesh well with the capabilities of UTxO-HD. The hash structure of the file (which isn't just a simple checksum) would allow snapshots with deltas to be compared to those without, and integrating the deltas could indeed be done by another node not connected to the block producer. Likewise, various other features that we include could also be built subsequently and/or by another node.

> First of all, the Ledger State must represent all of the effects of all prior transactions on chain at any point in time. It must also contain the minimal possible data that is necessary for validation of any future block. When I read this CIP, it makes me believe that more use cases are being added to the ledger state that are irrelevant for the Ledger itself. Things like indices, tombstones, bloom filters or metadata are not something that ledger cares about, therefore I don't understand their relevance to the goal of this CIP. In other words, the goal should be how to store and read the data to and from a file, instead of creating a mini database that can be used to write queries on and figure out the history of the chain. This is a job that is left for indexers, not a Cardano node. Unless, of course, I misunderstood the goal.

That's true! And I think the goal is maybe misunderstood - the goal is not to replace the current ledger state (which is already well customised to its specific role), but to have a format that can be used for a few things:

  1. To give Mithril something to sign which is guaranteed not to change (or to change in a versioned way)
  2. To support, especially, Mithril signing in a future multi-node world, whilst allowing the internal representations used by those different nodes to differ.
  3. To provide a platform for node-independent testing.
  4. To provide a more stable interface for some use cases for which the ledger state dump is used today. As we well know, this is marked as a "debug" feature but people do rely on it for tool building. Ideally they should not.

Most of the extra capabilities are indeed aligned more to the two latter use cases, as well as a bit to your following paragraph:

> One more important point I would like to bring up is relevant to this CIP and to how the canonical ledger state is hoped to be used by other node implementations. Sometimes we need to store extra data in the ledger state in order to later guarantee efficient implementation of a corresponding rule. However, this becomes an implementation detail that other nodes might solve differently (e.g. UTxOHD or how DRep delegations are cleaned up). But if we have a canonical ledger state, then we cannot accommodate every different approach that other nodes will decide to take. This makes me believe that in order to have a true canonical ledger state we would have to store an absolute minimum amount of data in the canonical ledger state, and only when it is being restored would we do some post-processing in order to bring it to the state that any particular ledger implementation expects.

This was the other motivating use case for allowing "additional" stuff to exist in the CLS. We envisioned that specific nodes might want to keep extra data around for efficiency. The way the CLS is structured would allow e.g. the Haskell node to produce a file (A, B) and a Rust node to produce a file (A, C), where 'A' is the minimal state and B, C are extras related to those specific nodes. Then Mithril signatures would confirm 'A' between both of them, while a Haskell node user could explicitly download the B parts as well for faster processing. One would obviously have to trust the node that the bits in B couldn't result in a meaningfully different state, but no more than they have to trust the node already.
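
The (A, B) versus (A, C) idea can be illustrated with per-namespace digests: two files agree on the shared part if the roots of their common namespaces match, whatever extras each node adds. This is only a sketch. The commitment below (SHA-256 over length-prefixed entries in key order) and the namespace names `haskell/extras` and `rust/extras` are assumptions for illustration, not the scheme or names defined by the CIP.

```python
import hashlib
from typing import Dict

Namespace = Dict[bytes, bytes]


def namespace_root(entries: Namespace) -> str:
    """Hypothetical commitment: SHA-256 over length-prefixed entries in key order."""
    h = hashlib.sha256()
    for key in sorted(entries):
        value = entries[key]
        h.update(len(key).to_bytes(4, "big") + key)
        h.update(len(value).to_bytes(4, "big") + value)
    return h.hexdigest()


def roots(snapshot: Dict[str, Namespace]) -> Dict[str, str]:
    """One root per namespace, so namespaces can be verified independently."""
    return {name: namespace_root(ns) for name, ns in snapshot.items()}


# 'A' is the minimal shared state; the extras differ per node.
minimal = {"utxo": {b"k1": b"v1", b"k2": b"v2"}}
haskell_file = {**minimal, "haskell/extras": {b"cache": b"B"}}
rust_file = {**minimal, "rust/extras": {b"index": b"C"}}

h_roots, r_roots = roots(haskell_file), roots(rust_file)
shared = h_roots.keys() & r_roots.keys()
agree = all(h_roots[name] == r_roots[name] for name in shared)
```

Under these assumptions a signer would only need to attest to the roots in `shared`, while the node-specific namespaces stay outside the signed set.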

Contributor

@lehins left a comment

> There is no intention for the current ledger state format to be migrated to this format (of course, it's possible, but totally out of scope for this and not an intention). As you say, the ledger state in the current node has a number of intricacies which have developed over the years to solve specific concerns and to map to how the node works. Trying to match these in a format intended to be shared between nodes would be very difficult, and probably self-defeating. Not only that, but our assumption is that ledger will wish to keep updating its formats for efficiency, which would not be possible if it relied on a canonical state shared with other nodes.

I was hoping you were gonna say that. I believe that it is important to put this in writing into the CIP itself.

> I would say that the process can be collaborative - partially on the block producing node and partially not. The CIP has support for things like DELTA records to enable doing this in an efficient fashion.

I am not sure this is a good idea, as this will result in redundant complexity. Maybe someone from consensus could chime in on this, because if they don't need this functionality then why bother making the format more complex. CC @jasagredo
The reason why I don't think it is a good idea is that, as you confirmed, producing this SCLS would be prohibitively expensive for the block producing node anyway, so there is no reason to complicate the format in order to accommodate a component that cannot use it. I would suggest implementing the simplest approach possible, where whoever is producing the SCLS resolves the deltas before storing the data on disk.

> The way the CLS is structured would allow e.g. the Haskell node to produce a file (A, B) and a Rust node to produce a file (A, C), where 'A' is the minimal state and B, C are extras related to those specific nodes.

I understand, but why do B and C need to be part of this format? They could be stored in a separate file that would be used just by those nodes in their own preferred format. What is the point of having C for the Haskell node or B for the Rust node if they are going to use their own format for storing the ledger state anyway? Moreover, any minimal representation of the ledger state has all of the information necessary to produce those extra bits B or C that are needed for efficiency. The only possible use case I can think of that could utilize that extra metadata is that some nodes might choose to store in their ledger state data from the chain that is not needed for block validation, e.g. AuxData or Anchors, for example if they want to support custom ledger state queries. But then what prevents some node from storing the whole chain data in the metadata part of its state? (FYI, I prefer to think of the first S in SCLS as Standard instead of Simple, since it does not look that simple to me 🙈)
Please take this comment with a grain of salt, since I am looking at it only from the ledger implementation perspective and I am ignorant of other use cases other people might have for this CIP.

In any case, with your comment @nc6 about not making the SCLS format a direct replacement for the current ledger state format, I feel much better about this CIP, and the rest of my comments are just suggestions, which you can freely ignore if you like.

My last question for you @nc6 is: which party will be responsible for creating the translation tool for this new format for the Haskell node? Has this been discussed with the current core team, or will it fall to Tweag to implement it? It would seem strange to me if this effort were only about designing the standard without having a concrete implementation that supports the Cardano node.

@qnikst Good work on the CIP! Glad to see you finally contributing to Cardano 😉

Author

qnikst commented Sep 18, 2025

Before I go through all the suggestions and apply them, let me explain our approach. We wanted to address the main scenarios that @nc6 mentioned, but since there are moving parts, we aimed for a format that does not force early hard decisions. Instead, we designed it to be extensible (so more scenarios can be supported without changing software) and resilient to changes coming from the ledger side (since we are not the ledger authors, there is always a risk of mistakes when formalising the common state).

That’s why we proposed a record-structured file. The idea is that if software does not support a record, it can safely ignore it and still use most of the features. This makes the format extendable.

To reduce risks, the concrete representation of entries in namespaces is defined in separate files. That way, we can update them based on feedback from the ledger authors, and introduce changes without modifying the core of the CIP.

In this layered structure, some records are required: header, chunk, and manifest. These are needed to keep, restore, and validate the ledger state. Other records are extensions that support more scenarios with minimal effort. They can be implemented in one file or split into separate files with the same format. If we do not want to make the format more complex at this stage, we can remove them from the current version of the CIP and introduce them later if we agree on adding more use-case scenarios.
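
To illustrate the "ignore what you do not support" property of a record-structured file, here is a minimal sketch of a reader that skips unknown record types. The framing (4-byte big-endian payload length, 1-byte type, then the payload) and the type codes are assumptions made for this example only; the real layout is whatever the CIP and its Kaitai Struct specification define.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple

# Hypothetical record type codes; the real values live in the CIP.
HEADER, CHUNK, MANIFEST = 0x00, 0x10, 0x01
KNOWN = {HEADER, CHUNK, MANIFEST}


def write_record(out: BinaryIO, rec_type: int, payload: bytes) -> None:
    # Assumed framing: 4-byte big-endian payload length, 1-byte type, payload.
    out.write(struct.pack(">IB", len(payload), rec_type))
    out.write(payload)


def read_known_records(src: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    while True:
        frame = src.read(5)
        if not frame:
            return  # clean end of file
        if len(frame) < 5:
            raise ValueError("truncated record header")
        length, rec_type = struct.unpack(">IB", frame)
        if rec_type not in KNOWN:
            src.seek(length, io.SEEK_CUR)  # skip a record we do not understand
            continue
        payload = src.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        yield rec_type, payload


buf = io.BytesIO()
write_record(buf, HEADER, b"v1")
write_record(buf, 0x7F, b"some future extension")  # unknown to this reader
write_record(buf, CHUNK, b"entries")
write_record(buf, MANIFEST, b"roots")
buf.seek(0)
seen: List[Tuple[int, bytes]] = list(read_known_records(buf))
```

Because the length comes before the payload, a reader never has to parse a record it does not recognise, which is what lets extension records be added or dropped without breaking older tools.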

On the implementation side, with this CIP Tweag has started work on a basic implementation. We plan to add commits integrating export into the Haskell node (at least for the UTxO namespace). Once this works, we can use it as a showcase for the approach. While we believe the proposed format is robust, we expect that some minor changes might be needed if integration shows weak points. After that, we can continue with the rest of the state and build tooling around it.

qnikst and others added 9 commits September 18, 2025 21:34
Co-authored-by: Alexey Kuleshevich <[email protected]>
Co-authored-by: Alexey Kuleshevich <[email protected]>
Add offset so it would be possible to reconstruct manifest by reading the file
from the end.

Add number of entries and chunks per-namespace

Add a comment about the spec
* Clarify tombstone and value deletion

During implementation it became obvious that it's easier to keep tombstones as
an explicit value in delta blocks. In this MR we remove the obsolete tombstone/v0
namespaces and update the spec

* Explicitly require keys for entries

* Add clarification about tombstone entries
* CIP-0165: specification of the binary file

Provide a binary file specification in Kaitai Struct format
as well as html description of the format and svg diagram


Co-authored-by: João Santos Reis <[email protected]>
@jpraynaud
Contributor

I just wanted to raise a concern about some of the requirements for the Mithril use case. These were part of the original motivation for introducing a Canonical Ledger Snapshot during the first Node Diversity Workshop, and I think it might be worth revisiting them to make sure we're still aligned with the original intent.

For a snapshot to be signed by Mithril, all nodes need to compute it at the same slot (ideally once per epoch, just before the 2k/f slot, which seems like a good timing). This part was left out of the current CIP compared to the original version Paul Clarke and I worked on, but I think it would still be helpful to include a section that explains the snapshot schedule and how the file names are structured for Mithril to recognize them once they are complete.
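
For a sense of scale, the 2k/f figure can be worked out numerically. The sketch below assumes the current mainnet parameters (k = 2160, f = 0.05, one-second slots) and assumes the slot is counted from the start of the epoch; neither assumption is stated in this thread, so treat the numbers as illustrative.

```python
SECURITY_PARAM_K = 2160      # assumed: mainnet security parameter k
ACTIVE_SLOT_COEFF_F = 0.05   # assumed: mainnet active slot coefficient f
SLOT_LENGTH_SECONDS = 1      # assumed: Shelley-era slot length


def slots(multiple_of_k: int) -> int:
    """Number of slots in (multiple_of_k * k / f), rounded to a whole slot."""
    return round(multiple_of_k * SECURITY_PARAM_K / ACTIVE_SLOT_COEFF_F)


snapshot_deadline_slot = slots(2)   # 2k/f, counted from the epoch start
epoch_length_slots = slots(10)      # 10k/f, the length of one epoch
deadline_hours = snapshot_deadline_slot * SLOT_LENGTH_SECONDS / 3600
```

With these values the deadline falls one day into a five-day epoch, which gives a rough idea of the window a snapshot schedule would have to fit.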

Author

qnikst commented Oct 13, 2025

Hey @jpraynaud,

The point about the synchronised snapshots makes sense, but we left it out mostly since this CIP focuses on defining the format (and, to a lesser extent, content). Also, there are various ways for the node/Mithril to arrange the taking of the coordinated snapshots, and we didn't want that to be defined here (consensus/ledger/mithril teams likely have a lot more skin in the game and insight there). But we could include a note that we do need to support the synchronised snapshot use-case.

> How the file names are structured for Mithril to recognize them once they are complete.

Yeah, this is something we discussed last week, with a plan to set up a call with the Mithril team this/next week to discuss it.

* Explicitly describe the length of the record payload

* Drop len_data field, we already know it because we can calculate it from the data structure size

* Introduce structure for the entries and fixed-size keys

This commit introduces fixed-size keys; the size of the key depends on the
namespace that is used.

This approach allows us not to waste time on encoding the size
for each key and allows for nice and fast keys in each namespace
used.

* remove tombstone namespace from the list of the namespaces

* Update CIP to explicitly tell how we deal with trees

* Update CIP-0165/README.md

Co-authored-by: João Santos Reis <[email protected]>

* Update CIP-0165/README.md

Co-authored-by: João Santos Reis <[email protected]>

* Fixes in spec

---------

Co-authored-by: João Santos Reis <[email protected]>
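
The fixed-size key change described in the commit messages above can be sketched as follows: because the key length is implied by the namespace, only the value needs a length prefix. The key sizes in `KEY_SIZE`, the 4-byte value length prefix, and the example key layout are hypothetical and chosen only to make the example run; the real sizes and entry structure are defined in the CIP.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical key sizes in bytes; the real table is defined by the CIP.
KEY_SIZE: Dict[str, int] = {"utxo": 34, "pots": 1}


def encode_entries(namespace: str, entries: List[Tuple[bytes, bytes]]) -> bytes:
    size = KEY_SIZE[namespace]
    out = bytearray()
    for key, value in entries:
        if len(key) != size:
            raise ValueError(f"{namespace} keys must be {size} bytes")
        out += key                                    # no length prefix for the key
        out += struct.pack(">I", len(value)) + value  # assumed 4-byte value length
    return bytes(out)


def decode_entries(namespace: str, data: bytes) -> List[Tuple[bytes, bytes]]:
    size = KEY_SIZE[namespace]
    entries, pos = [], 0
    while pos < len(data):
        key = data[pos:pos + size]
        (vlen,) = struct.unpack_from(">I", data, pos + size)
        start = pos + size + 4
        entries.append((key, data[start:start + vlen]))
        pos = start + vlen
    return entries


# Assumed example key: a 32-byte hash followed by a 2-byte index.
utxo_key = bytes(32) + (0).to_bytes(2, "big")
blob = encode_entries("utxo", [(utxo_key, b"output")])
decoded = decode_entries("utxo", blob)
```

A side effect of this design is that keys within a namespace can be compared or sorted as raw byte strings without decoding them first.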