diff --git a/CIP-0165/README.md b/CIP-0165/README.md
new file mode 100644
index 0000000000..1418e01200
--- /dev/null
+++ b/CIP-0165/README.md
@@ -0,0 +1,430 @@
---
CIP: 165
Title: Canonical Ledger State
Category: Ledger
Status: Proposed
Authors:
  - Nicholas Clarke
  - Aleksandr Vershilov
  - João Santos Reis
Implementors:
  - Nicholas Clarke
  - Aleksandr Vershilov
  - João Santos Reis
Discussions:
  - https://github.com/cardano-foundation/CIPs/pull/1083
Created: 2025-08-18
License: CC-BY-4.0
---

## Abstract

This proposal defines the Simple Canonical Ledger State (SCLS), a stable, versioned, and verifiable file format for representing the Cardano ledger state. It specifies a segmented binary container with deterministic CBOR encodings, per-chunk commitments, and a manifest that enables identical snapshots across implementations, supports external tools (e.g., Mithril), and future-proofs distribution and verification of state.

This CIP specifies a canonical interchange format for the ledger state. It does not define, prescribe, or constrain the internal storage or representation of the ledger state within any node implementation. Internal formats remain an implementation detail; the canonical format applies only to export, interchange, and verification of ledger state consistency.

## Motivation: why is this CIP necessary?

Ledger state serialisations are currently implementation details that may change over time. This makes them unsuitable as stable artifacts for distribution, signing, fast sync, or external tooling (e.g., db-sync, conformance testing, and Mithril checkpoints). Without a canonical format, two nodes at the same chain point can legitimately produce different byte streams for the same state, complicating verification and opening room for error in multi-implementation ecosystems.

SCLS addresses these problems by:

- specifying a canonical, language-agnostic container and encoding rules;
- enabling streaming builds and data consistency validation (per-namespace roots);
- being extensible (e.g., optional indexes/Bloom filters) without breaking compatibility;
- remaining compatible with UTxO-HD/LSM on-disk structures and incremental (delta) updates.

Versioning and upgrade complexity: the proposed format is designed to accommodate future protocol extensions and new eras without changing this CIP. The chosen approach lets an implementer build a client that cares only about particular parts of the state, while still being able to store, load, and verify those parts.

The concrete use-case scenarios for this CIP are:

- building a dump of the Cardano node ledger state in a canonical format, so that any two nodes generate the same file. This enables persistence, faster bootstrap, and verification.
- letting another node verify such a state against its own state and sign it. This would allow us to fully utilise Mithril, with each node signing the state independently.
- full conformance testing. Any implementation would be able to reuse the test suite of the Haskell node by importing the data, applying the test transactions, and exporting the data back.

## Specification

The Simple Canonical Ledger State (SCLS) is a segmented file format for Cardano ledger states, designed to support streaming and verifiability. Records are sequential, each tagged by type and independently verifiable by hash. Multi-byte values use network byte order (big-endian).

### File Structure

1. The file is a sequence of records `(S, D)*`, where `S` is a 32-bit size and `D` is the payload of a typed record.
1. Each payload begins with a one-byte type tag identifying the record type.

Unsupported record types are skipped; core data remains accessible.

### Record Types

| Code | Name     | Purpose                                                  |
| ---- | -------- | -------------------------------------------------------- |
| 0x00 | HDR      | File header: magic, version, network, namespaces         |
| 0x01 | MANIFEST | Global commitments, chunk table, summary                 |
| 0x10 | CHUNK    | Ordered entries with per-chunk footer + hash             |
| 0x11 | DELTA    | Incremental updates (overlay; last-writer-wins)          |
| 0x20 | BLOOM    | Per-chunk Bloom filter                                   |
| 0x21 | INDEX    | Optional key→offset or value-hash indexes                |
| 0x30 | DIR      | Directory footer with offsets to metadata/index regions  |
| 0x31 | META     | Opaque metadata entries (e.g., signatures, notes)        |

Proposed file layout:

```text
HDR,
(CHUNK[, BLOOM])*,
MANIFEST,
[INDEX]*,
[META]*,
[DIR],
[ (DELTA[, BLOOM])* , MANIFEST, [INDEX]*, [DIR] ]*
```

For the first implementation steps, a simpler structure is sufficient:

```text
HDR,
(CHUNK)*,
MANIFEST
```

All other record types enable additional features, such as delta states and data querying, and may be omitted if that functionality is not wanted.

The additional record types (all except `HDR`, `CHUNK`, and `MANIFEST`) can also be kept outside of the file and built iteratively, outside of the main dump process. This is especially useful for indices.

#### HDR Record

**Purpose:** identify the file, its version, and global properties.

**Structure:**

`REC_HDR` (once at start of file)

- `magic` : `b"SCLS\0"`
- `version` : `u16` (starting with `1`)
- `network_id` : `u8` (`0` — mainnet, `1` — testnet)
- `slot_no` : `u64` — slot number of the chain point the state corresponds to.

**Policy:**

- appears exactly once, at the start of the file;
- must be read and verified first;
- carries magic bytes for file recognition.

#### CHUNK Records

**Purpose:** group entries for streaming and integrity; maintain the global canonical order (see the Namespaces and Entries section for more details).

**Structure:**

- `chunk_seq` : `u64` — sequence number of the record
- `chunk_format` : `u8` — format of the chunk's entries (see the compression table below)
- `namespace` : `bstr` — namespace of the values stored in the CHUNK
- `entries` : `DataEntry` — list of length-prefixed data entries
- `footer` : `{entries_count: u64, chunk_hash: blake28}` — entry count and hash of the chunk's data, used to maintain file integrity.

A `DataEntry` is a blob of key-value data with the following structure:

- `size` : `u32` — size of the data
- `key` : fixed size — a fixed-size blob whose size depends on the namespace
- `value` : `bstr` — CBOR-encoded data entry

While the format requires each entry to have a key, hierarchical structures can still be supported: either by normalising the hierarchy and using a path or hash as the key, or by introducing an artificial key and keeping the entire hierarchy under a single key. The choice depends on the namespace. If parts of the tree can be expressed and updated independently, it is worth normalising the tree; if the data is kept and updated as a whole, a single artificial key can be used.
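To make the framing concrete, here is a minimal, non-normative Python sketch of a reader for the outer record stream. It assumes the `u32` big-endian size prefix counts only the payload that follows it (the type tag plus record body), matching the Kaitai definition attached to this CIP; unknown record types are simply skipped, as the file-structure rules above allow.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# Type tags from the Record Types table above.
KNOWN_TYPES = {0x00, 0x01, 0x10, 0x11, 0x20, 0x21, 0x30, 0x31}

def iter_records(f: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (type_tag, body) for every record in an SCLS stream.

    Assumes the u32 size prefix counts only the payload that follows
    it (type tag plus record body), as in format/format.ksy."""
    while True:
        prefix = f.read(4)
        if not prefix:
            return                                # clean end of stream
        if len(prefix) < 4:
            raise ValueError("truncated size prefix")
        (size,) = struct.unpack(">I", prefix)     # network byte order
        payload = f.read(size)
        if len(payload) < size:
            raise ValueError("truncated record payload")
        yield payload[0], payload[1:]

# Unknown record types are simply skipped:
# with open("state.scls", "rb") as f:
#     for tag, body in iter_records(f):
#         if tag not in KNOWN_TYPES:
#             continue  # skip records this reader does not understand
```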
**Policy:**

- chunk size ~8–16 MiB; footer required;
- data is stored in a deterministically defined global order: the lexicographic order of the keys;
- all keys in a record must be unique;
- all key-value pairs in a record must belong to the same namespace;
- readers should verify the footer before relying on the data;
- `chunk_hash = H(concat [ digest(e) | e in entries ])`;
- all keys in `CHUNK` `n` must be lexicographically lower than all keys in `CHUNK` `n+1`.

The format supports data compression. For future compatibility the compression scheme is recorded in the `chunk_format` field, with the following variants:

| Code | Name  | Description                                    |
| ---- | ----- | ---------------------------------------------- |
| 0x00 | RAW   | Raw CBOR entries                               |
| 0x01 | ZSTD  | All entries are compressed with seekable zstd  |
| 0x02 | ZSTDE | Each value is compressed independently         |

Hashes are calculated and verified over the uncompressed data.

#### MANIFEST Record

**Purpose:** index of the chunks and information for file integrity checks.

**Structure:**

- `total_entries`: `u64` — number of data entries in the file (integrity purposes only)
- `total_chunks`: `u64` — number of chunks in the file (integrity purposes only)
- `root_hash`: **Merkle root** of all entry digests in the chosen order (see Verification for details)
- `namespace_hashes`: CBOR table of Merkle roots, mapping each namespace name to its blake28 hash
- `prev_manifest_offset`: `u64` — offset of the previous manifest (used with delta files), zero if there is no previous manifest entry
- `summary`: `{ created_at, tool, comment? }`

**Policy:** used to verify all the chunks.

#### DELTA Record

```text
TODO: the exact contents of the DELTA record will be defined later; for now
this is a high-level proposal.
```

**Purpose:** delta records support incremental updates: once a base dump is created, additional transactions can be stored quickly. Delta records are designed to be compatible with UTxO-HD, LSM trees, and other storage types that can stream a list of updates.

Updating the file in place is unsafe, so a list of updates is stored instead.

All updates are written in the following way:

- to update a value, a new entry with the same key is stored;
- to remove a value, a special tombstone entry for the key is stored.

**Structure:**

- `slot_no:` `u64` — slot number where the changes were introduced
- `namespace:` `bstr` — namespace name
- `changes:` `CBOR` — array of entries, each either a tombstone entry or a value entry
- `footer:` `{entries_count, chunk_hash}`

**Policy:**

- chunk size ~8–16 MiB; footer required;
- readers should verify the hash before relying on the data;
- deleted entries are marked by a special tombstone entry;
- there must be at most one element for a given key within a delta record.

#### BLOOM Record

TODO: define details or move to future work. (We propose to fix the exact format and properties after the first milestone, once the basic data handling is implemented and tested; benchmarks can then inform the exact properties we want.)

**Purpose:** auxiliary information enabling fast key lookups and negative lookups.

**Structure:**

- `chunk_seq`: `u64` — sequence number of the record.
- `m`: `u32` — total number of bits in the Bloom filter's bitset (the length of the bit array).
- `k`: `u8` — number of independent hash functions used to map a key into bit positions in that array.
- `bitset`: `bytes[ceil(m/8)]` — the actual bitset.
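Although the exact hashing scheme is still to be defined, the intended use is the standard Bloom-filter membership test. The following non-normative Python sketch illustrates it; the salted-BLAKE2b derivation of the `k` bit positions and the bit order within a byte are assumptions for illustration only.

```python
import hashlib
from typing import Iterator

def bloom_positions(key: bytes, m: int, k: int) -> Iterator[int]:
    # The hash scheme for BLOOM records is still TODO in this CIP;
    # deriving k positions via salted BLAKE2b is purely illustrative.
    for i in range(k):
        h = hashlib.blake2b(key, digest_size=8, salt=i.to_bytes(16, "big"))
        yield int.from_bytes(h.digest(), "big") % m

def may_contain(bitset: bytes, m: int, k: int, key: bytes) -> bool:
    """Membership test against a BLOOM record's bitset.

    A False result is definitive (the key is not in the chunk); a True
    result may be a false positive and must be confirmed in the CHUNK."""
    return all(bitset[pos // 8] & (1 << (pos % 8))
               for pos in bloom_positions(key, m, k))
```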
#### INDEX Record

```text
TODO: define structure. (We propose to fix the exact format and properties
after the first milestone, once the basic data handling is implemented and
tested; benchmarks can then inform the exact properties we want.)
```

**Purpose:** allows fast search based on the values of the entries.

The general idea is that one may want to query the raw data using a common format like `json-path`, running against the CBOR values. In that case an index can be built while the file is being written; later queries can then use the index instead of a direct traversal.

**Policy:**

- Indices are completely optional and do not change the hash of the entries in the data.

#### Directory Record

TODO: define structure. (We propose to fix the exact format and properties after the first milestone, once the basic data handling is implemented and tested; benchmarks can then inform the exact properties we want.)

**Purpose:** if a file has index records, they are stored after the records with the actual data, and the directory record provides a fast way to find them. The directory record is intended to be the last record of the file and has a fixed-size footer.

**Structure:**

- `metadata_offset:` `u64` — offset of the previous metadata record, zero if there is no such record
- `index_offset:` `u64` — offset of the last index record, zero if there is no such record

#### META Record

**Purpose:** a record with extra metadata that can be used to store third-party data, such as Mithril signatures. This is an optional record that may be required in additional scenarios.

**Structure:**

- `entries: Entry[]` — list of metadata entries, stored in lexicographical order
- `footer: {entries_count: u64, entries_hash}`

Entry:

- `subject: URI` — subject, stored in `URI` format
- `value: cbor` — data stored by the metadata entry owner.

**Policy:**

- META records are completely optional and do not change the hash of the entries in the data.
- `entries_hash = H(concat (digest e for e in entries))`

### Namespaces and Entries

To type the stored values, and to allow storing and verifying only a partial state, a notion of namespaces is introduced. Each SCLS file may store values from one or more namespaces.

#### Supported Namespaces

Each logical table/type is a namespace identified by a canonical string (e.g., `"utxo"`, `"gov"`).

| Shortname  | Content                         |
| ---------- | ------------------------------- |
| utxo/v0    | UTxOs                           |
| stake/v0   | Stake delegation                |
| rewards/v0 | Reward accounts                 |
| params/v0  | Protocol parameters             |
| pots/v0    | Accounting pots (reserves etc.) |
| spo/v0     | SPO state                       |
| drep/v0    | DRep state                      |
| gov/v0     | Governance action state         |
| hdr/v0     | Header state (e.g. nonces)      |

New namespaces may and will be introduced in the future: new eras and features will bring new types of data to store. To declare what data is stored in an SCLS file, tools list the namespaces in the `HDR` record. The order of the namespaces does not affect the signatures or other integrity data.

For future compatibility we added a version tag to each namespace name, though this remains open for discussion.
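As an illustration of partial processing by namespace, the following non-normative Python sketch scans a stream of already-framed records and keeps only the `CHUNK` records of a single namespace. The field offsets follow the Kaitai definition in `format/format.ksy`; the record iterator is assumed to be something like the framing sketch shown earlier.

```python
from typing import Iterable, Iterator, Tuple

REC_CHUNK = 0x10

def chunk_namespace(body: bytes) -> str:
    """Namespace of a CHUNK record body (the bytes after the type tag).

    Assumed layout, per format/format.ksy:
    chunk_seq (u64) | chunk_format (u8) | len_ns (u32) | ns (UTF-8) | ...
    """
    len_ns = int.from_bytes(body[9:13], "big")
    return body[13:13 + len_ns].decode("utf-8")

def chunks_for(records: Iterable[Tuple[int, bytes]], ns: str) -> Iterator[bytes]:
    """Filter a stream of (type_tag, body) records down to the CHUNK
    records of one namespace, e.g. a client interested only in "utxo/v0"."""
    for tag, body in records:
        if tag == REC_CHUNK and chunk_namespace(body) == ns:
            yield body
```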
#### Entries

Data is stored as a list of entries, each consisting of:

- `size` : `u32` — length of the entry, stored in big endian;
- `key` : `bstr` — CBOR-encoded string key;
- `dom` : `bstr` — CBOR-encoded data (canonical form).

`size` supports fast-search scenarios: it makes it possible to skip values without interpreting them.

The exact definition of the domain data is left out of this CIP. We propose that the ledger team define a canonical representation for the types in each new era. The types must be encoded in canonical [CBOR format](https://datatracker.ietf.org/doc/html/rfc8949) with the restrictions from [deterministic CBOR](https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor). Values must not be derivable; that is, if some part of the state can be computed from another part, only the base part should be included in the state.

All concrete formats should be stored as an attachment to this CIP, in `namespaces/namespaces.cddl`. All changes should be introduced through the current CIP update process.

#### Canonicalization Rules

- CBOR maps must be deterministic, with sorted keys and no duplicates.
- Numbers use minimal encoding.
- Arrays follow a fixed order.

#### Verification

- Entry digest: `digest(e) = H(0x01 || ns_str || key || value)`.
- The manifest stores the overall root and per-namespace commitments.

The Merkle root is computed as the root of a Merkle tree over all live entry digests in canonical order; tombstones are excluded, and last-writer-wins applies for overlays.

In more detail: base chunks store all values canonically ordered by key. Once all values are in order, a full Merkle tree is built over them.

The rule of thumb is that when calculating a hash of the data, only the live (non-deleted) values are taken into account, in canonical order. For a single dump without delta records, this is exactly the order in which the values are stored. Once delta records appear, later records may contain keys smaller than those in the base, and some values may be deleted or updated. The writer therefore has to compute the live set of values, which can be done with a streaming multi-way merge (repeatedly taking the minimal key across multiple records). When a value exists in multiple records, the last-writer-wins rule applies. If there is a tombstone, the value is considered deleted and is not included in the live set.
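A non-normative Python sketch of this verification pipeline is given below: per-entry digests, the per-chunk `chunk_hash`, the streaming multi-way merge that computes the live set across a base dump and its deltas, and a per-namespace Merkle root. Two assumptions are flagged in the comments: `blake28` is taken to mean BLAKE2b with a 28-byte digest, and the Merkle pairing rules are illustrative only, since this CIP has not fixed them yet.

```python
import hashlib
import heapq
from typing import Iterator, List, Optional, Tuple

def h28(data: bytes) -> bytes:
    # Assumes "blake28" denotes BLAKE2b with a 28-byte digest.
    return hashlib.blake2b(data, digest_size=28).digest()

def entry_digest(ns: str, key: bytes, value: bytes) -> bytes:
    # digest(e) = H(0x01 || ns_str || key || value)
    return h28(b"\x01" + ns.encode("utf-8") + key + value)

def chunk_hash(digests: List[bytes]) -> bytes:
    # Per-chunk footer: chunk_hash = H(concat [ digest(e) | e in entries ])
    return h28(b"".join(digests))

# One overlay = the entries of one base CHUNK or DELTA record for a
# namespace, already sorted by key; value None encodes a tombstone.
Entry = Tuple[bytes, Optional[bytes]]

def live_set(overlays: List[List[Entry]]) -> Iterator[Entry]:
    """Streaming multi-way merge with last-writer-wins.

    `overlays` is ordered oldest-first (base chunks, then deltas in slot
    order). For a key present in several overlays the newest one wins;
    a tombstone removes the key from the live set."""
    # Tag entries with -overlay_index so that, for equal keys, the
    # newest version is emitted first by the merge.
    streams = [((key, -i, val) for key, val in ov)
               for i, ov in enumerate(overlays)]
    prev_key = None
    for key, _, val in heapq.merge(*streams):
        if key == prev_key:
            continue                  # superseded by a newer overlay
        prev_key = key
        if val is not None:           # tombstones are excluded
            yield key, val

def merkle_root(leaves: List[bytes]) -> bytes:
    # Binary Merkle tree over live entry digests in canonical order.
    # Pairing and odd-node promotion are illustrative only: the exact
    # tree construction is not yet fixed by this CIP.
    if not leaves:
        return h28(b"")
    level = leaves
    while len(level) > 1:
        nxt = [h28(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])     # promote the odd node
        level = nxt
    return level[0]

def namespace_root(ns: str, overlays: List[List[Entry]]) -> bytes:
    """Per-namespace commitment as stored in the MANIFEST."""
    return merkle_root([entry_digest(ns, k, v)
                        for k, v in live_set(overlays)])
```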
### Extensibility

- Unknown fields in `HDR` and unknown record types can be skipped by readers.
- This allows future extensions (e.g., index records, metadata) without breaking compatibility.

## Rationale: how does this CIP achieve its goals?

This CIP achieves its goals by:

1. Defining a canonical format that is deliberately simple, so that an implementation is easy to write in any language.
2. Making the format extensible and implementation-agnostic: versioning simplifies ledger evolution.
3. Removing ambiguity, allowing signed checkpoints, and improving auditability.

The specification defines a canonical encoding and ordering for the stored data, which allows reproducible and verifiable hashing. This supports the Mithril and fast-node-bootstrap use cases.

### Prior Work and Alternatives

#### Global alternatives

- **CIP PR #9 by Jean-Philippe Raynaud**:
a CIP that discusses ledger state and Mithril integration at length. Without going into much detail, it covers the immutable db and indices. The current CIP discusses adding indices as well; we believe we can combine the approaches of that [work](https://github.com/cardano-scaling/CIPs/pull/9) and related efforts with our own and use the best of both worlds.

- **CIP draft by Paul Clark**:
an early draft CIP for the canonical ledger state, targeted more at what is stored in the files. That proposal also uses deterministic CBOR (canonical CBOR in this CIP). It opens a discussion about rules for how and when nodes should create snapshots; this is deliberately not discussed in the current CIP, as we do not want to impose restrictions on nodes, and the format allows nodes to operate without any agreement on such rules. As a solution for extensibility and partiality, that CIP proposes one file per "namespace" (in the terminology of the current CIP); in our work we propose a single chunked file, which is friendlier for the producer. We are at least considering an option for extracting a multi-file version; see the discussion in the open questions.

- **Do Nothing**: rejected due to interoperability and Mithril requirements.

#### Implementation alternatives

##### Container format

Many common container formats were evaluated, but most do not meet all the required properties. The closest are CARv2 and git packfiles, but they are more complex and would require additional tooling and language support in all node implementations. For simplicity and ease of adoption, a straightforward binary format was chosen. It is still possible to express the current approach in either CARv2 or git packfiles, should that approach prove superior.

##### Data encoding

- **gRPC**: the current Haskell node uses a CBOR-based ecosystem, so nodes have to support CBOR anyway. In contrast to self-describing CBOR, gRPC requires a schema to read a document, which may be a problem for future compatibility.
- **Plain CBOR stream**: easier decoding, but it prevents skipping unwanted chunks, which is needed for filtering and for additional features such as querying the data in the file.

**JSON vs CBOR for Canonical Ledger State**

There are strong reasons to prefer CBOR over JSON for representing the canonical ledger state. The ledger state is large and contains binary data; a binary format is therefore much more compact and efficient than JSON.

While JSON libraries are widely available in nearly every language, JSON lacks a notion of canonical form. Two JSON serializations of the same object are not guaranteed to be byte-identical, so additional tooling and specification would be required to achieve determinism.

By contrast, CBOR has a defined deterministic encoding (see [RFC 8949](https://datatracker.ietf.org/doc/html/rfc8949#section-4.2) and [restrictions](https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor)), making it suitable for a canonical format. CBOR also has mature implementations across many programming languages (list [here](https://cbor.io/impls.html)).

Importantly, RFC 8949 also defines a mapping between CBOR and JSON.
This allows us to specify a JSON view of the format so that downstream applications can consume the data using standard JSON tooling, while the canonical form remains CBOR.

##### Multi-file or single file?

We chose a single file because it is friendlier to the producer: it makes it possible to ensure the required atomicity and durability properties and, together with the in-record footers, to validate that the data was actually written and is correct. In case of failure it is possible to pinpoint exactly where the failure happened.

However, we agree that for a consumer who wants only partial states, multiple files would be much simpler to use.

The proposed SCLS format does not preclude multiple files; on the contrary, for things like additional indices we suggest using additional files. This works because the records carry sequence numbers, so the full file and its ordering can be reconstructed. In this proposal we would like to set an additional constraint on the tooling that will come with the libraries: the tool should be able to generate a multi-file version on request and convert between the formats.

##### Should files be byte-identical?

The current approach does not provide byte-identical files; only the stored domain data and its hashes are canonical. This means that tools like Mithril will have to use additional tooling or recalculate hashes on their own. This is deliberate: it lets software add extra metadata entries, e.g. Mithril can add its own signatures to the file without violating the validation properties, and other implementations may add records they need in order to operate or bootstrap. It is true that other [approaches](https://hackmd.io/Q9eSEMYESICI9c4siTnEfw) solve this issue by creating multiple files, each of which is byte-identical.

We propose a few solutions:

1. Allow the tooling to export (or even generate) raw CBOR files, which have the required byte-identical property.
2. Set additional restrictions on the record policies: instead of variable-size records, require all records to contain an exact number of entries. This sacrifices some properties of the format, but files with the same metadata will be byte-identical. Alternatively, the metadata can be placed in a separate file, in which case everything becomes byte-identical.

## Open Questions

**What is the exact implementation for data compression, especially for indexing and search?**

There are many ways to implement indices, hash tables, or B-trees, each with potentially interesting properties. Which of them we want to support is an open question.

**Do we want the file to be optimized for querying with external tools? If so, how do we achieve that?**

We propose adding two additional record types:

- Bloom records — allow faster search of values by key, though they still require a file traversal;
- index records — allow fast search by key without a full file traversal.

Neither addition changes the structure of the file.

**Do we want to support entries without natural keys? If so, how can we do that?**

There are three options that we see:

- Make the key optional in CHUNK records. This supports such values but changes the spec, as many keyless values must then be allowed in a CHUNK record.
- Store the value in the key, with an empty value.
- Create a separate record type for values without keys.
## Path to Active

### Acceptance Criteria

- [ ] Expert review and consensus from the Ledger Committee, IOG, and Node teams.
- [ ] Reference implementation in the Cardano node, with a CLI tool for export/import.
- [ ] Verified test vectors showing identical output across implementations, including Mithril compatibility.
- [ ] Full documentation and CDDL schemas.

### Implementation Plan

1. [ ] Prototype SCLS writer/reader.
1. [ ] Refine specification and finalise CDDL.
1. [ ] Integrate into the Cardano node CLI.
1. [ ] Validate with Mithril.
1. [ ] Rollout and ecosystem tooling.

## References

1. [CARv2 format documentation](https://ipld.io/specs/transport/car/carv2/)
1. [Draft Canonical ledger state snapshot and immutable data formats CIP](https://github.com/cardano-scaling/CIPs/pull/9)
1. [Mithril](https://docs.cardano.org/developer-resources/scalability-solutions/mithril)
1. [Canonical ledger state CIP draft by Paul Clark](https://hackmd.io/Q9eSEMYESICI9c4siTnEfw)
1. [Deterministically Encoded CBOR in CBOR RFC](https://datatracker.ietf.org/doc/html/rfc8949#section-4.2)
1. [A Deterministic CBOR Application Profile](https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor)

## Copyright

This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).
diff --git a/CIP-0165/format/format.ksy b/CIP-0165/format/format.ksy
new file mode 100644
index 0000000000..e2c6ea438b
--- /dev/null
+++ b/CIP-0165/format/format.ksy
@@ -0,0 +1,216 @@
meta:
  id: scls_file
  title: Container for the cardano ledger state
  file-extension: scls
  ks-version: 0.9
  endian: be

doc: |
  A seekable, versioned container for CBOR payloads whose structure is defined
  by an external CDDL schema.

seq:
  - id: record
    type: scls_record
    doc: Typed records with data
    repeat: eos

types:
  scls_record:
    seq:
      - id: len_payload
        type: u4
        doc: Size of the record payload (type tag plus record body)
      - id: payload
        type: scls_record_data
        doc: payload of the record
        size: len_payload
  scls_record_data:
    seq:
      - id: record_type
        type: u1
        doc: Type of the record
      - id: record_data
        doc: Record payload
        # size: _parent.len_payload - 1
        size-eos: true
        type:
          switch-on: record_type
          cases:
            0x00: rec_header
            0x01: rec_manifest
            0x10: rec_chunk
  rec_header:
    doc: Header block
    seq:
      - id: magic
        contents: SCLS
        doc: Magic bytes "SCLS"
      - id: version
        type: u4
        doc: Version of the file format
      - id: network_id
        type: u1
        enum: network_id
        doc: Network identifier
      - id: slot_no
        type: u8
        doc: Slot number of the chain point
  rec_manifest:
    doc: Manifest — a trailer that describes the file contents
    seq:
      - id: total_entries
        type: u8
        doc: total number of entries in the file
      - id: total_chunks
        type: u8
        doc: total number of chunks in the file
      - id: summary
        type: summary
        doc: information about the file
      - id: namespace_info
        type: namespace_info
        repeat: until
        repeat-until: _.len_ns == 0
        doc: information about the namespaces
      - id: prev_manifest
        type: u8
        doc: absolute offset of the previous manifest, zero if there is none
      - id: root_hash
        type: digest
        doc: merkle tree root of the live entries
      - id: offset
        type: u4
        doc: relative offset to the beginning of the block
  rec_chunk:
    doc: Chunk — a block with data
    seq:
      - id: seqno
        type: u8
        doc: Sequential number of the chunk
      - id: format
        type: u1
        enum: chunk_format
      - id: len_ns
        type: u4
        doc: size of the namespace
      - id: ns
        type: str
        encoding: UTF-8
        doc: namespace name
        size: len_ns
      - id: data
        type: entries_block(len_key)
        size: len_data # substream; entries parse to EOS here
        doc: payload parsed as entries
      - id: entries_count
        type: u4
        doc: Number of entries in the chunk
      - id: digest
        type: digest
        doc: blake28 hash of the entries in the block
    instances:
      # size of record_data for this scls_record (total - record_type:u1)
      rec_payload_size:
        value: _parent._parent.len_payload - 1
      ns_size:
        value: 4 + len_ns
      len_data:
        value: rec_payload_size - (8 + 1 + ns_size + 4 + 28)
        doc: seqno=8, format=1, entries_count=4, digest=28.
      len_key:
        value: |
          (ns == "utxo") ? 32 :
          # TODO: uncomment when defined
          # (ns == "stake") ? 28 :
          # (ns == "pool") ? 28 :
          0
  entries_block:
    params:
      - id: len_key
        type: u2
    seq:
      - id: entries
        type: entry(len_key)
        repeat: eos
  entry:
    params:
      - id: len_key
        type: u2
    seq:
      - id: len_body
        type: u4
      - id: body
        type: entry_body(len_key)
        size: len_body
  entry_body:
    doc: Body of the entry, with a fixed-size key that depends on the namespace
    params:
      - id: len_key
        type: u2
    seq:
      - id: key
        doc: fixed-size key
        size: len_key
      - id: value
        doc: cbor-encoded entry
        size-eos: true
  summary:
    doc: Summary
    seq:
      - id: created_at
        doc: absolute timestamp of file generation, in ISO 8601 format
        type: tstr
      - id: tool_bytes
        doc: name of the tool that generated the file
        type: tstr
      - id: comment
        doc: optional comment
        type: tstr
  namespace_info:
    seq:
      - id: len_ns
        type: u4
      - id: ns_info
        type: ns_info
        if: len_ns != 0
  ns_info:
    seq:
      - id: entries_count
        type: u8
        doc: number of entries in the namespace
      - id: chunks_count
        type: u8
        doc: number of chunks in the namespace
      - id: namespaces_bytes
        type: str
        size: _parent.len_ns
        doc: namespace name bytes
        encoding: UTF-8
      - id: digest
        doc: merkle-tree hash of the live entries in the namespace
        type: digest
  tstr:
    seq:
      - id: len_data
        type: u4
        doc: size of the string
      - id: data
        type: str
        encoding: UTF-8
        doc: value of the string
        size: len_data
  digest:
    doc: Digest of the data
    seq:
      - id: data
        doc: blake28 hash of data
        size: 28
enums:
  network_id:
    0: mainnet
    1: testnet
  chunk_format:
    0: raw
    1: zstd
    2: zstde
diff --git a/CIP-0165/format/scls_file.html b/CIP-0165/format/scls_file.html
new file mode 100644
index 0000000000..d5024c92b5
--- /dev/null
+++ b/CIP-0165/format/scls_file.html
@@ -0,0 +1,436 @@
[Generated Kaitai Struct HTML documentation for format.ksy; markup omitted.]
diff --git a/CIP-0165/format/scls_file.svg b/CIP-0165/format/scls_file.svg
new file mode 100644
index 0000000000..54c37adfa5
--- /dev/null
+++ b/CIP-0165/format/scls_file.svg
@@ -0,0 +1,908 @@
[Generated Graphviz diagram of the SclsFile type hierarchy (Kaitai Struct visualisation); SVG markup omitted.]
diff --git a/CIP-0165/namespaces/README.md b/CIP-0165/namespaces/README.md
new file mode 100644
index 0000000000..139e7a336c
--- /dev/null
+++ b/CIP-0165/namespaces/README.md
@@ -0,0 +1,17 @@
# Namespaces

This is a directory of the supported namespaces.

Each namespace defines a non-intersecting slice of the data.

| Shortname   | Content                         | Key                         |
| ----------- | ------------------------------- | --------------------------- |
| utxo        | UTxOs                           | TxIn (transaction + offset) |
| stake       | Stake delegation                | TBD                         |
| rewards     | Reward accounts                 | TBD                         |
| params      | Protocol parameters             | TBD                         |
| pots        | Accounting pots (reserves etc.) | TBD                         |
| stake_pools | Stake Pools State               | TBD                         |
| drep        | DRep state                      | TBD                         |
| gov         | Governance action state         | 0                           |
| hdr         | Header state (e.g. nonces)      | TBD                         |
diff --git a/CIP-0165/namespaces/utxo.cddl b/CIP-0165/namespaces/utxo.cddl
new file mode 100644
index 0000000000..392e94adeb
--- /dev/null
+++ b/CIP-0165/namespaces/utxo.cddl
@@ -0,0 +1,118 @@
; This file was auto-generated from huddle. Please do not modify it directly!

; entry in scls file
generic_record = {key : a0, value : b0}

; entry in utxo namespace
record_entry = generic_record


tx_in = [hash32, uint .size 2]

hash32 = bytes .size 32

tx_out = [0, shelley_tx_out// 1, babbage_tx_out]

shelley_tx_out = [address, amount : value, ? datum_hash : hash32]

address = bytes

value = coin/ [coin, multiasset]

coin = uint

multiasset = {* policy_id => {+ asset_name => a0}}

policy_id = hash28

hash28 = bytes .size 28

asset_name = bytes .size (0 .. 32)

positive_coin = 1 .. 18446744073709551615

; NEW starting with babbage
;   datum_option
;   script_ref
babbage_tx_out = {0 : address, 1 : value, ? 2 : datum_option, ? 3 : script_ref}

datum_option = [0, hash32// 1, data]

data = #6.24(bytes .cbor plutus_data)

plutus_data =
  constr
  / {* plutus_data => plutus_data}
  / [* plutus_data]
  / big_int
  / bounded_bytes

constr =
  #6.121([* a0])
  / #6.122([* a0])
  / #6.123([* a0])
  / #6.124([* a0])
  / #6.125([* a0])
  / #6.126([* a0])
  / #6.127([* a0])
  / #6.102([uint, [* a0]])

big_int = int/ big_uint/ big_nint

big_uint = #6.2(bounded_bytes)

; The real bounded_bytes does not have this limit. It instead has
; a different limit which cannot be expressed in CDDL.
;
; The limit is as follows:
;  - bytes with a definite-length encoding are limited to size 0..64
;  - for bytes with an indefinite-length CBOR encoding, each chunk is
;    limited to size 0..64
;  ( reminder: in CBOR, the indefinite-length encoding of
;  bytestrings consists of a token #2.31 followed by a sequence
;  of definite-length encoded bytestrings and a stop code )
bounded_bytes = bytes .size (0 .. 64)

big_nint = #6.3(bounded_bytes)

script_ref = #6.24(bytes .cbor script)

script =
  [ 0, native_script
  // 1
  , bytes
  // 2
  , bytes
  // 3
  , bytes
  ]


native_script =
  [ script_pubkey
  // script_all
  // script_any
  // script_n_of_k
  // invalid_before
  // invalid_hereafter
  ]


script_pubkey = (0, hash28)

script_all = (1, [* native_script])

script_any = (2, [* native_script])

script_n_of_k = (3, n : int64, [* native_script])

int64 = -9223372036854775808 .. 9223372036854775807

invalid_before = (4, slot_no)

slot_no = uint .size 8

invalid_hereafter = (5, slot_no)
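To illustrate the deterministic-encoding requirement that applies to these payloads, below is a small, non-normative Python example encoding a `tx_in` key with the third-party `cbor2` library. Note that `cbor2`'s `canonical=True` mode implements the older RFC 7049 canonical map-key ordering, so a production encoder must still be checked against the RFC 8949 §4.2 deterministic rules required by this CIP; for this map-free example the two coincide.

```python
import cbor2  # third-party library; any RFC 8949-conformant encoder works

tx_hash = bytes.fromhex("aa" * 32)   # hash32 = bytes .size 32
tx_in = [tx_hash, 3]                 # tx_in = [hash32, uint .size 2]

encoded = cbor2.dumps(tx_in, canonical=True)

# 0x82: 2-element array; 0x58 0x20: 32-byte byte string; 0x03: uint 3
assert encoded == b"\x82\x58\x20" + tx_hash + b"\x03"
```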