BlobDictionary #732
Conversation
📝 Walkthrough

Summary by CodeRabbit

Adds a new BlobDictionary implementation for byte-keyed maps using 6-byte chunking and hybrid List/Map node storage, plus a helper `bytesAsU48`. Replaces internal HashDictionary usage with BlobDictionary in TruncatedHashDictionary, adds a new HashDictionary subclass of BlobDictionary and a deprecated StringHashDictionary alias, changes iterator return types from Generator to Iterator in ImmutableHashDictionary, adds unit tests for BlobDictionary, multiple benchmark scripts under benchmarks/, refines a generator type annotation in BytesBlob.chunks, and replaces a Node assert usage with a local deepEqual in a test runner.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
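The chunk-and-tag scheme the summary describes can be sketched roughly as follows. This is an illustrative standalone snippet, not the PR's code: `chunkKey` is an invented helper, and only the `value * 8 + length` tagging mirrors the reviewed `bytesAsU48`.

```typescript
// Sketch of the chunking idea: a byte key is split into 6-byte chunks, and
// each chunk is packed into a JS number. The `* 8 + len` length tag (seen in
// the reviewed bytesAsU48) distinguishes a short chunk from the same chunk
// padded with trailing zeros.
const CHUNK_SIZE = 6;

function chunkKey(key: Uint8Array): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let i = 0; i < key.length; i += CHUNK_SIZE) {
    chunks.push(key.subarray(i, i + CHUNK_SIZE)); // views, no copies
  }
  return chunks;
}

function bytesAsU48(bytes: Uint8Array): number {
  if (bytes.length > CHUNK_SIZE) throw new Error(`chunk too long: ${bytes.length}`);
  let value = 0;
  for (let i = 0; i < bytes.length; i++) {
    value = value * 256 + bytes[i];
  }
  // Length tag: 6 bytes fit in 48 bits, and 48 + 3 tag bits = 51 < 53,
  // so the result is still an exact double-precision integer.
  return value * 8 + bytes.length;
}
```

With this encoding, `[0x01]` and `[0x01, 0x00]` map to different numbers (9 vs 2050) even though their numeric values coincide, which is exactly the "short chunk vs padded with 0s" case the tests cover.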
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/bytes/bytes.ts (1)
166-170: Guard against `size <= 0` to prevent an infinite loop. If `size` is 0 or negative, `i += size` never progresses. Add a precondition check.

```diff
-  *chunks(size: number): Generator<BytesBlob, undefined, void> {
-    for (let i = 0; i < this.length; i += size) {
+  *chunks(size: number): Generator<BytesBlob, undefined, void> {
+    // Require a positive, non-zero chunk size
+    check`${Number.isInteger(size) && size > 0} BytesBlob.chunks(size) requires a positive integer size`;
+    for (let i = 0; i < this.length; i += size) {
       yield BytesBlob.blobFrom(this.raw.subarray(i, i + size));
     }
   }
```
🧹 Nitpick comments (15)
packages/core/trie/trie.test.ts (1)
356-357: Make leaves comparison order-insensitive (align with earlier pattern).
`leaves()` iteration order isn't guaranteed. Compare sorted keys instead to avoid flakiness. Based on learnings.

```diff
- assert.deepStrictEqual(Array.from(actual.nodes.leaves()), Array.from(trie.nodes.leaves()));
+ const actualLeavesSorted = Array.from(actual.nodes.leaves())
+   .map((x) => x.getKey().toString())
+   .sort();
+ const expectedLeavesSorted = Array.from(trie.nodes.leaves())
+   .map((x) => x.getKey().toString())
+   .sort();
+ assert.deepStrictEqual(actualLeavesSorted, expectedLeavesSorted);
```

benchmarks/bytes/bytes-to-number.ts (2)
48-49: Sanity-check both converters produce identical results before benchmarking. Avoid benchmarking apples vs oranges; assert equality once outside the measured hot path.
```diff
 const arr = generateArrayOfHashes(ARRAY_SIZE, HASH_SIZE);
+// Sanity-check: both methods must agree
+{
+  const ok = arr.every((x) => bytesAsU48WithBitOps(x.raw) === bytesAsU48WithoutBitOps(x.raw));
+  if (!ok) {
+    throw new Error("bytesAsU48WithBitOps and bytesAsU48WithoutBitOps disagree");
+  }
+}
```
33-36: Deterministic, full-range byte generation (optional). For reproducible runs and full byte range:
- Use a seeded PRNG (e.g., mulberry32).
- Use 256 to include 0xff.
```diff
-function generateHash<T extends number>(len: T): Bytes<T> {
+function mulberry32(seed: number) {
+  return function () {
+    let t = (seed += 0x6d2b79f5);
+    t = Math.imul(t ^ (t >>> 15), t | 1);
+    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+const rand = mulberry32(0xc0ffee);
+
+function generateHash<T extends number>(len: T): Bytes<T> {
   const result: number[] = [];
   for (let i = 0; i < len; i += 1) {
-    const val = Math.floor(Math.random() * 255);
+    const val = Math.floor(rand() * 256);
     result.push(val);
   }
   return Bytes.fromNumbers(result, len);
 }
```

benchmarks/collections/hash-dict-vs-blob-dict_set.ts (1)
39-47: Confirm benchmark semantics: cold inserts vs steady-state updates. The map is created once per test case and reused across iterations; subsequent runs measure overwrites, not pure inserts. If you want cold insert performance, construct the map inside the measured function or reset between cycles (if supported by the harness).
Option A (cold insert per invocation):
```diff
-      add(`BlobDictionary(${threshold})`, () => {
-        const map = BlobDictionary.new<OpaqueHash, number>(threshold);
-
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.set(LONG_COLLISION_KEYS[k], k);
-          }
-        };
-      }),
+      add(`BlobDictionary(${threshold})`, () => {
+        return () => {
+          const map = BlobDictionary.new<OpaqueHash, number>(threshold);
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(LONG_COLLISION_KEYS[k], k);
+          }
+        };
+      }),
```

If steady-state updates are intended, consider updating the benchmark name/description to reflect that.
Also applies to: 75-83
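The factory-vs-measured-function distinction discussed above can be demonstrated with a tiny, harness-independent sketch (plain `Map`, hypothetical names — not the project's benchmark API):

```typescript
// A benchmark "factory" runs once; the function it returns is measured many
// times. State captured in the factory leaks across measured runs.
function coldInsertFactory(): () => boolean {
  return () => {
    const map = new Map<string, number>(); // fresh map per measured run
    const wasCold = !map.has("key-0");
    for (let k = 0; k < 100; k += 1) map.set(`key-${k}`, k);
    return wasCold; // always true: every run does cold inserts
  };
}

function hotOverwriteFactory(): () => boolean {
  const map = new Map<string, number>(); // shared across measured runs
  return () => {
    const wasCold = !map.has("key-0"); // true only on the very first run
    for (let k = 0; k < 100; k += 1) map.set(`key-${k}`, k);
    return wasCold; // false from the second run on: we measure overwrites
  };
}
```

Calling the function returned by `hotOverwriteFactory` twice shows that every run after the first is an overwrite pass — the behavior the review flags.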
benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (2)
53-53: Nit: fix typo in suite title ("collition" → "collision").

```diff
-  `Comparing delete operation in two hash dicts using long collition keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
+  `Comparing delete operation in two hash dicts using long collision keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
```
94-96: Title mismatch: this suite benchmarks delete, not set.

```diff
-  `Comparing set operation in two hash dicts using hash keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
+  `Comparing delete operation in two hash dicts using hash keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
```

benchmarks/collections/hash-dict-vs-blob-dict_get.ts (2)
53-53: Nit: fix typo in suite title ("collition" → "collision").

```diff
-  `Comparing get operation in two hash dicts using long collition keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
+  `Comparing get operation in two hash dicts using long collision keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
```
94-96: Title mismatch: this suite benchmarks get, not set.

```diff
-  `Comparing set operation in two hash dicts using hash keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
+  `Comparing get operation in two hash dicts using hash keys and BlobDictionary(n: [${MIN_THRESHOLD}: ${MAX_THRESHOLD}))`,
```

packages/core/collections/hash-dictionary.ts (1)
109-121: Avoid duplication: export the threshold constant for reuse.
`BLOB_DICTIONARY_THRESHOLD` is also defined in `truncated-hash-dictionary.ts`. Prefer a single exported source to prevent drift.

```diff
-const BLOB_DICTIONARY_THRESHOLD = 5;
+export const BLOB_DICTIONARY_THRESHOLD = 5;
```

Then import and reuse it in `packages/core/collections/truncated-hash-dictionary.ts`. Based on learnings.

packages/core/collections/blob-dictionary.test.ts (2)
7-7: Fix test typos.

```diff
-const TRESHOLDS = [0, 5, 10];
+const THRESHOLDS = [0, 5, 10];
@@
-  it("should distuingigh shorted chunk and padded with 0s", () => {
+  it("should distinguish short chunk vs the same chunk padded with 0s", () => {
```

Also update `for (const threshold of TRESHOLDS)` usages accordingly.

Also applies to: 29-37
10-27: Add a max-value edge case for bytesAsU48. To guard against sign/bitwise pitfalls, add an all-0xff 6-byte case.

```diff
 describe("bytesAsU48", () => {
@@
   it("should convert empty bytes to 0", () => {
     const bytes = BytesBlob.parseBlob("0x").raw;
     const expectedResult = 0;
     const result = bytesAsU48(bytes);
     assert.strictEqual(result, expectedResult);
   });
+  it("should handle 0xffffffffffff (6 bytes all ones)", () => {
+    const bytes = BytesBlob.parseBlob("0xffffffffffff").raw;
+    const expectedResult = (2 ** 48 - 1) * 8 + 6;
+    const result = bytesAsU48(bytes);
+    assert.strictEqual(result, expectedResult);
+  });
```

This will surface any 32-bit bitwise overflow issues in the implementation. Based on learnings.
packages/core/collections/blob-dictionary.ts (4)
346-358: Make bytesAsU48 purely arithmetic; avoid 32-bit bitwise sign issues. Bitwise shifts coerce to signed 32-bit and can introduce negative intermediates. Pure arithmetic keeps it simple and still exact (max ≈ 2^51 < 2^53).

```diff
 export function bytesAsU48(bytes: Uint8Array): number {
   const len = bytes.length;
   check`${len <= CHUNK_SIZE} Length has to be <= ${CHUNK_SIZE}, got: ${len}`;
-  let value = bytes[3] | (bytes[2] << 8) | (bytes[1] << 16) | (bytes[0] << 24);
-
-  for (let i = 4; i < bytes.length; i++) {
-    value = value * 256 + bytes[i];
-  }
+  let value = 0;
+  for (let i = 0; i < len; i++) {
+    value = value * 256 + bytes[i];
+  }
   return value * 8 + len;
 }
```
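The sign issue can be shown with a small standalone demo (hypothetical helper names, not the library code): in JavaScript, `0xff << 24` is a negative signed 32-bit value, so the bitwise accumulation underflows on a high leading byte while pure arithmetic stays exact.

```typescript
// Mirrors the reviewed pattern: first 4 bytes packed via bitwise ops,
// which operate on signed 32-bit integers.
function u48Bitwise(bytes: Uint8Array): number {
  let value = bytes[3] | (bytes[2] << 8) | (bytes[1] << 16) | (bytes[0] << 24);
  for (let i = 4; i < bytes.length; i++) {
    value = value * 256 + bytes[i];
  }
  return value;
}

// Pure arithmetic: exact for values below 2 ** 53.
function u48Arithmetic(bytes: Uint8Array): number {
  let value = 0;
  for (let i = 0; i < bytes.length; i++) {
    value = value * 256 + bytes[i];
  }
  return value;
}

const allOnes = new Uint8Array(6).fill(0xff);
// Bitwise: 0xffffffff as signed 32-bit is -1, then -1 * 256 + 255 = -1 for
// each remaining byte, so the whole result collapses to -1.
// Arithmetic: correctly yields 2 ** 48 - 1.
```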
187-215: Early-return in get() for ListChildren to avoid extra iterations. When at a ListChildren node, the remainder subkey is definitive: return the found leaf, the node leaf if the remainder is empty, or undefined. This avoids repeatedly advancing the generator without changing node depth.

```diff
 get(key: K): V | undefined {
-  let node: MaybeNode<K, V> = this.root;
-  const pathChunksGenerator = key.chunks(CHUNK_SIZE);
-  let depth = 0;
-
-  while (node !== undefined) {
-    const maybePathChunk = pathChunksGenerator.next().value;
-
-    if (node.children instanceof ListChildren) {
-      const subkey = BytesBlob.blobFrom(key.raw.subarray(depth * CHUNK_SIZE));
-      const child = node.children.find(subkey);
-      if (child !== null) {
-        return child.value;
-      }
-    }
-
-    if (maybePathChunk === undefined) {
-      return node.getLeaf()?.value;
-    }
-
-    if (node.children instanceof MapChildren) {
-      const pathChunk: KeyChunk = asOpaqueType(maybePathChunk);
-      node = node.children.getChild(pathChunk);
-      depth += 1;
-    }
-  }
-
-  return undefined;
+  let node: MaybeNode<K, V> = this.root;
+  const pathChunks = key.chunks(CHUNK_SIZE);
+  let depth = 0;
+
+  for (;;) {
+    if (node === undefined) return undefined;
+
+    if (node.children instanceof ListChildren) {
+      const subkey = BytesBlob.blobFrom(key.raw.subarray(depth * CHUNK_SIZE));
+      const child = node.children.find(subkey);
+      if (child !== null) return child.value;
+      return subkey.length === 0 ? node.getLeaf()?.value : undefined;
+    }
+
+    const maybePathChunk = pathChunks.next().value;
+    if (maybePathChunk === undefined) {
+      return node.getLeaf()?.value;
+    }
+
+    const pathChunk: KeyChunk = asOpaqueType(maybePathChunk);
+    node = (node.children as MapChildren<K, V>).getChild(pathChunk);
+    depth += 1;
+  }
 }
```
119-129: Compute keyChunk only in the MapChildren branch to avoid unused work. keyChunk is unused when the current node has ListChildren and we return early; compute it lazily in the MapChildren branch.

```diff
-  const keyChunk: KeyChunk = asOpaqueType(maybeKeyChunk);
-
   if (node.children instanceof ListChildren) {
     const subkey = BytesBlob.blobFrom(key.raw.subarray(CHUNK_SIZE * depth));
     const leaf = value !== undefined ? node.children.insert(subkey, { key, value }) : node.children.remove(subkey);
@@
-  if (children instanceof MapChildren) {
-    const maybeNode = children.getChild(keyChunk);
+  if (children instanceof MapChildren) {
+    const keyChunk: KeyChunk = asOpaqueType(maybeKeyChunk);
+    const maybeNode = children.getChild(keyChunk);
```

Also applies to: 138-156
69-71: Default threshold: consider > 0 as per benchmark note. Benchmarks indicate threshold 0 underperforms; consider a safer default (e.g., 2-3) or validate input to discourage 0 in production paths.
Do we want to change the default or at least assert non‑negative and document recommended values?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- benchmarks/bytes/bytes-to-number.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_get.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_set.ts (1 hunks)
- bin/test-runner/w3f/preimages.ts (1 hunks)
- packages/core/bytes/bytes.ts (1 hunks)
- packages/core/collections/blob-dictionary.test.ts (1 hunks)
- packages/core/collections/blob-dictionary.ts (1 hunks)
- packages/core/collections/hash-dictionary.ts (2 hunks)
- packages/core/collections/index.ts (1 hunks)
- packages/core/collections/truncated-hash-dictionary.ts (3 hunks)
- packages/core/trie/trie.test.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
packages/core/**/*.ts
⚙️ CodeRabbit configuration file
packages/core/**/*.ts: Core packages must not import any JAM-related packages
(i.e. packages defined under `packages/jam/**`)
Files:
- packages/core/bytes/bytes.ts
- packages/core/collections/index.ts
- packages/core/collections/blob-dictionary.test.ts
- packages/core/trie/trie.test.ts
- packages/core/collections/blob-dictionary.ts
- packages/core/collections/truncated-hash-dictionary.ts
- packages/core/collections/hash-dictionary.ts
**/*.ts
⚙️ CodeRabbit configuration file
**/*.ts: rules from `./CODESTYLE.md` should be adhered to.
**/*.ts: Any function whose documentation mentions it must not be used in production code
can be safely used in `*.test.ts` files. Other usage should be carefully reviewed
and the comment must explain why it's safe to use there.
**/*.ts: `as` conversions must not be used. Suggest using `tryAs` conversion methods.
**/*.ts: Classes with a `static Codec` field must have a private constructor and a static `create` method.
**/*.ts: Casting a `bigint` (or `U64`) using `Number(x)` must have an explanation comment why
it is safe.
**/*.ts: When making changes to code with comments containing links (in classes, constants, methods, etc.)
to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.
Files:
- packages/core/bytes/bytes.ts
- packages/core/collections/index.ts
- bin/test-runner/w3f/preimages.ts
- packages/core/collections/blob-dictionary.test.ts
- benchmarks/collections/hash-dict-vs-blob-dict_set.ts
- benchmarks/bytes/bytes-to-number.ts
- packages/core/trie/trie.test.ts
- benchmarks/collections/hash-dict-vs-blob-dict_get.ts
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts
- packages/core/collections/blob-dictionary.ts
- packages/core/collections/truncated-hash-dictionary.ts
- packages/core/collections/hash-dictionary.ts
🧠 Learnings (1)
📚 Learning: 2025-06-10T12:11:27.374Z
Learnt from: tomusdrw
PR: FluffyLabs/typeberry#419
File: packages/core/trie/nodesDb.ts:26-33
Timestamp: 2025-06-10T12:11:27.374Z
Learning: In the trie implementation (`packages/core/trie/nodesDb.ts`), the `leaves()` method intentionally yields both `NodeType.Leaf` and `NodeType.EmbedLeaf` nodes by filtering out only `NodeType.Branch` nodes. Both leaf types are valid leaf nodes that should be included in leaf iteration.
Applied to files:
packages/core/trie/trie.test.ts
🧬 Code graph analysis (9)
bin/test-runner/w3f/preimages.ts (1)
packages/core/utils/test.ts (1)
deepEqual(43-195)
packages/core/collections/blob-dictionary.test.ts (2)
packages/core/bytes/bytes.ts (2)
BytesBlob(18-171)Bytes(176-242)packages/core/collections/blob-dictionary.ts (1)
bytesAsU48(346-358)
benchmarks/collections/hash-dict-vs-blob-dict_set.ts (4)
packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/core/collections/blob-dictionary.ts (1)
BlobDictionary(6-335)packages/core/hash/hash.ts (1)
OpaqueHash(18-18)packages/core/collections/hash-dictionary.ts (1)
StringHashDictionary(27-105)
benchmarks/bytes/bytes-to-number.ts (2)
packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/misc/benchmark/setup.ts (2)
configure(12-15)save(17-36)
benchmarks/collections/hash-dict-vs-blob-dict_get.ts (5)
packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/misc/benchmark/setup.ts (3)
add(1-1)configure(12-15)save(17-36)packages/core/collections/blob-dictionary.ts (1)
BlobDictionary(6-335)packages/core/hash/hash.ts (1)
OpaqueHash(18-18)packages/core/collections/hash-dictionary.ts (1)
StringHashDictionary(27-105)
benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (4)
packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/core/collections/blob-dictionary.ts (1)
BlobDictionary(6-335)packages/core/hash/hash.ts (1)
OpaqueHash(18-18)packages/core/collections/hash-dictionary.ts (1)
StringHashDictionary(27-105)
packages/core/collections/blob-dictionary.ts (3)
packages/core/bytes/bytes.ts (2)
BytesBlob(18-171)TEST_COMPARE_USING(19-21)packages/core/utils/opaque.ts (2)
asOpaqueType(37-39)Opaque(33-35)packages/core/utils/debug.ts (2)
assertNever(36-38)check(15-26)
packages/core/collections/truncated-hash-dictionary.ts (3)
packages/core/hash/hash.ts (6)
OpaqueHash(18-18)TruncatedHash(27-27)HASH_SIZE(10-10)HASH_SIZE(11-11)TRUNCATED_HASH_SIZE(14-14)TRUNCATED_HASH_SIZE(15-15)packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/core/collections/blob-dictionary.ts (3)
BlobDictionary(6-335)entries(295-299)TEST_COMPARE_USING(54-58)
packages/core/collections/hash-dictionary.ts (2)
packages/core/hash/hash.ts (1)
OpaqueHash(18-18)packages/core/collections/blob-dictionary.ts (1)
BlobDictionary(6-335)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: benchmarks (22.x)
- GitHub Check: state_transition (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: test (22.x)
🔇 Additional comments (3)
bin/test-runner/w3f/preimages.ts (1)
126-126: LGTM: Consistent use of internal deepEqual utility. The change from Node's `assert.deepEqual` to the internal `deepEqual` utility is appropriate and maintains consistency with the existing usage on line 122. This removes the external dependency and provides more comprehensive comparison capabilities for complex objects.

packages/core/bytes/bytes.ts (1)
67-71: Verify graypaper link points to current version. We touched this file; the coding guideline asks to ensure graypaper links are current when code changes are made. Please confirm the URL remains valid and version-correct.
As per coding guidelines
packages/core/collections/index.ts (1)
2-2: Public export looks good. Re-exporting `blob-dictionary` from the collections index is correct and consistent.
Actionable comments posted: 2
♻️ Duplicate comments (1)
benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (1)
15-21: Compile-time fix: use BytesBlob.blobFromString (and import it). Same issue as in set.ts; update import and call site.
Apply:
```diff
-import { Bytes } from "@typeberry/bytes";
+import { Bytes, BytesBlob } from "@typeberry/bytes";
@@
 function longCollisionKey(n: number) {
-  const key = Bytes.blobFromString(`${n}`);
+  const key = BytesBlob.blobFromString(`${n}`);
   const ret = Bytes.zero(HASH_SIZE);
   ret.raw.set(key.raw, 0);
   ret.raw.reverse();
   return ret;
 }
```

Scan for remaining usages:

```bash
#!/bin/bash
rg -nP --type=ts '\bBytes\.blobFromString\(' -C2
```
🧹 Nitpick comments (2)
benchmarks/collections/hash-dict-vs-blob-dict_set.ts (1)
39-47: Benchmark integrity: reset state each iteration to measure consistent inserts. Maps are created once in the factory; subsequent iterations measure overwrites into a hot map. Create the map inside the measured fn.
Apply:
```diff
       add(`BlobDictionary(${threshold})`, () => {
-        const map = BlobDictionary.new<OpaqueHash, number>(threshold);
-
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.set(LONG_COLLISION_KEYS[k], k);
-          }
-        };
+        return () => {
+          const map = BlobDictionary.new<OpaqueHash, number>(threshold);
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(LONG_COLLISION_KEYS[k], k);
+          }
+        };
       }),
@@
       add("StringHashDictionary", () => {
-        const map = StringHashDictionary.new<OpaqueHash, number>();
-
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.set(LONG_COLLISION_KEYS[k], k);
-          }
-        };
+        return () => {
+          const map = StringHashDictionary.new<OpaqueHash, number>();
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(LONG_COLLISION_KEYS[k], k);
+          }
+        };
       }),
@@
       add(`BlobDictionary(${threshold})`, () => {
-        const map = BlobDictionary.new<OpaqueHash, number>(threshold);
-
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.set(HASH_KEYS[k], k);
-          }
-        };
+        return () => {
+          const map = BlobDictionary.new<OpaqueHash, number>(threshold);
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(HASH_KEYS[k], k);
+          }
+        };
       }),
@@
       add("StringHashDictionary", () => {
-        const map = StringHashDictionary.new<OpaqueHash, number>();
-
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.set(HASH_KEYS[k], k);
-          }
-        };
+        return () => {
+          const map = StringHashDictionary.new<OpaqueHash, number>();
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(HASH_KEYS[k], k);
+          }
+        };
       }),
```

Confirm the benchmark harness calls the returned fn multiple times per test; if it supports per-iteration setup hooks, prefer those instead.
Also applies to: 54-61, 75-83, 90-98
benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (1)
39-49: Benchmark integrity: keep the dataset non-empty across iterations. Maps are prefilled once in the factory; after the first measured run, deletes empty the map and later iterations delete non-existent keys. Recreate/populate inside the measured fn.
Apply:
```diff
       add(`BlobDictionary(${threshold})`, () => {
-        const map = BlobDictionary.new<OpaqueHash, number>(threshold);
-        for (let k = 0; k < NO_OF_KEYS; k += 1) {
-          map.set(LONG_COLLISION_KEYS[k], k);
-        }
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.delete(LONG_COLLISION_KEYS[k]);
-          }
-        };
+        return () => {
+          const map = BlobDictionary.new<OpaqueHash, number>(threshold);
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(LONG_COLLISION_KEYS[k], k);
+          }
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.delete(LONG_COLLISION_KEYS[k]);
+          }
+        };
       }),
@@
       add("StringHashDictionary", () => {
-        const map = StringHashDictionary.new<OpaqueHash, number>();
-        for (let k = 0; k < NO_OF_KEYS; k += 1) {
-          map.set(LONG_COLLISION_KEYS[k], k);
-        }
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.delete(LONG_COLLISION_KEYS[k]);
-          }
-        };
+        return () => {
+          const map = StringHashDictionary.new<OpaqueHash, number>();
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(LONG_COLLISION_KEYS[k], k);
+          }
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.delete(LONG_COLLISION_KEYS[k]);
+          }
+        };
       }),
@@
       add(`BlobDictionary(${threshold})`, () => {
-        const map = BlobDictionary.new<OpaqueHash, number>(threshold);
-        for (let k = 0; k < NO_OF_KEYS; k += 1) {
-          map.set(HASH_KEYS[k], k);
-        }
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.delete(HASH_KEYS[k]);
-          }
-        };
+        return () => {
+          const map = BlobDictionary.new<OpaqueHash, number>(threshold);
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(HASH_KEYS[k], k);
+          }
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.delete(HASH_KEYS[k]);
+          }
+        };
       }),
@@
       add("StringHashDictionary", () => {
-        const map = StringHashDictionary.new<OpaqueHash, number>();
-        for (let k = 0; k < NO_OF_KEYS; k += 1) {
-          map.set(HASH_KEYS[k], k);
-        }
-        return () => {
-          for (let k = 0; k < NO_OF_KEYS; k += 1) {
-            map.delete(HASH_KEYS[k]);
-          }
-        };
+        return () => {
+          const map = StringHashDictionary.new<OpaqueHash, number>();
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.set(HASH_KEYS[k], k);
+          }
+          for (let k = 0; k < NO_OF_KEYS; k += 1) {
+            map.delete(HASH_KEYS[k]);
+          }
+        };
       }),
```

If the harness supports beforeEach/afterEach hooks, use them instead to avoid measuring setup costs while ensuring consistent starting state.
Also applies to: 55-65, 79-89, 96-106
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_get.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_set.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmarks/collections/hash-dict-vs-blob-dict_get.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.ts
⚙️ CodeRabbit configuration file
Files:
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts
- benchmarks/collections/hash-dict-vs-blob-dict_set.ts
🧬 Code graph analysis (2)
benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (5)
packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/misc/benchmark/setup.ts (2)
configure(12-15)save(17-36)packages/core/collections/blob-dictionary.ts (1)
BlobDictionary(6-335)packages/core/hash/hash.ts (1)
OpaqueHash(18-18)packages/core/collections/hash-dictionary.ts (1)
StringHashDictionary(27-105)
benchmarks/collections/hash-dict-vs-blob-dict_set.ts (5)
packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/misc/benchmark/setup.ts (3)
add(1-1)configure(12-15)save(17-36)packages/core/collections/blob-dictionary.ts (1)
BlobDictionary(6-335)packages/core/hash/hash.ts (1)
OpaqueHash(18-18)packages/core/collections/hash-dictionary.ts (1)
StringHashDictionary(27-105)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: benchmarks (22.x)
- GitHub Check: e2e (22.x)
- GitHub Check: run (22.x)
- GitHub Check: test (22.x)
Actionable comments posted: 1
♻️ Duplicate comments (1)
benchmarks/collections/hash-dict-vs-blob-dict_get.ts (1)
3-3: Critical: Import BytesBlob and use BytesBlob.blobFromString. This issue was already flagged in a previous review. Line 3 must import `BytesBlob` from `@typeberry/bytes`, and line 16 must use `BytesBlob.blobFromString` instead of `Bytes.blobFromString`.

Apply this diff:

```diff
-import { Bytes } from "@typeberry/bytes";
+import { Bytes, BytesBlob } from "@typeberry/bytes";

 function longCollisionKey(n: number) {
-  const key = Bytes.blobFromString(`${n}`);
+  const key = BytesBlob.blobFromString(`${n}`);
   const ret = Bytes.zero(HASH_SIZE);
   ret.raw.set(key.raw, 0);
   ret.raw.reverse();
   return ret;
 }
```

Also applies to: 16-16
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_get.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_set.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts
- benchmarks/collections/hash-dict-vs-blob-dict_set.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.ts
⚙️ CodeRabbit configuration file
Files:
benchmarks/collections/hash-dict-vs-blob-dict_get.ts
🧬 Code graph analysis (1)
benchmarks/collections/hash-dict-vs-blob-dict_get.ts (4)
packages/core/bytes/bytes.ts (1)
Bytes(176-242)packages/core/collections/blob-dictionary.ts (1)
BlobDictionary(6-335)packages/core/hash/hash.ts (1)
OpaqueHash(18-18)packages/core/collections/hash-dictionary.ts (1)
StringHashDictionary(27-105)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: benchmarks (22.x)
- GitHub Check: run (22.x)
@mateuszsikora can you address the two remaining comments?
done |
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/core/collections/truncated-hash-dictionary.ts (1)
11-15: Consider importing the threshold constant if it exists elsewhere. If `BLOB_DICTIONARY_THRESHOLD` is also defined in `hash-dictionary.ts`, importing it here would eliminate duplication and ensure both dictionaries use the same threshold.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts (1 hunks)
- benchmarks/collections/hash-dict-vs-blob-dict_get.ts (1 hunks)
- packages/core/collections/truncated-hash-dictionary.ts (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- benchmarks/collections/hash-dict-vs-blob-dict_delete.ts
- benchmarks/collections/hash-dict-vs-blob-dict_get.ts
🧰 Additional context used
📓 Path-based instructions (2)
packages/core/**/*.ts
⚙️ CodeRabbit configuration file
Files:
packages/core/collections/truncated-hash-dictionary.ts
**/*.ts
⚙️ CodeRabbit configuration file
Files:
packages/core/collections/truncated-hash-dictionary.ts
🧠 Learnings (2)
📓 Common learnings
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 351
File: packages/jam/state-merkleization/index.ts:19-37
Timestamp: 2025-04-24T19:48:31.051Z
Learning: In the typeberry codebase, `Bytes` inherits from `BytesBlob`, so a `Bytes` instance can be directly used where a `BytesBlob` is expected without needing conversion.
📚 Learning: 2025-06-10T12:20:17.513Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 419
File: packages/jam/state/state-inmemory.ts:141-149
Timestamp: 2025-06-10T12:20:17.513Z
Learning: In the `InMemoryService.copyFrom` function in `packages/jam/state/state-inmemory.ts`, duplicate checking for `(hash, length)` pairs in the lookup history is not necessary because the function operates under the assumption that the input `ServiceEntries` comes from an existing well-formed state, which already maintains the invariant of unique lookup history entries per hash and length combination.
Applied to files:
packages/core/collections/truncated-hash-dictionary.ts
🧬 Code graph analysis (1)
packages/core/collections/truncated-hash-dictionary.ts (3)
packages/core/hash/hash.ts (4)
- `OpaqueHash` (18-18)
- `TruncatedHash` (27-27)
- `TRUNCATED_HASH_SIZE` (14-14)
- `TRUNCATED_HASH_SIZE` (15-15)

packages/core/bytes/bytes.ts (1)
- `Bytes` (177-243)

packages/core/collections/blob-dictionary.ts (3)
- `BlobDictionary` (6-335)
- `entries` (295-299)
- `TEST_COMPARE_USING` (54-58)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: run (22.x)
- GitHub Check: state_transition (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: e2e (22.x)
- GitHub Check: test (22.x)
🔇 Additional comments (4)
packages/core/collections/truncated-hash-dictionary.ts (4)
6-9: Clean implementation of key truncation. The helper correctly ensures all keys are exactly `TRUNCATED_HASH_SIZE` bytes. Using `subarray` creates a view without copying data, and the approach is consistent with the past review suggestions.
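The helper under discussion might look roughly like this (a sketch, not the actual code: the project's `Bytes`/`TruncatedHash` wrappers are simplified to plain `Uint8Array`, and `TRUNCATED_HASH_SIZE` is assumed here to be 31):

```typescript
// Sketch of a truncation helper as described in the review comment.
// Assumption: TRUNCATED_HASH_SIZE is 31; the real code uses the
// project's Bytes/TruncatedHash wrapper types instead of Uint8Array.
const TRUNCATED_HASH_SIZE = 31;

function getTruncatedKey(key: Uint8Array): Uint8Array {
  // Already exactly the truncated size: return as-is, no allocation.
  if (key.length === TRUNCATED_HASH_SIZE) {
    return key;
  }
  // subarray returns a view over the same buffer without copying data.
  return key.subarray(0, TRUNCATED_HASH_SIZE);
}
```

Both a full 32-byte hash and an already-truncated key normalize to the same 31-byte view, so the dictionary only ever sees keys of one length.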
27-38: Excellent type consistency and implementation. The refactor correctly:

- Uses `TruncatedHash` as the internal dictionary key type
- Applies `getTruncatedKey` consistently in `fromEntries`
- Removes the unnecessary `HashWithZeroedBit` type alias
- Maintains the public API with `T extends OpaqueHash` for caller flexibility

This addresses all concerns from the previous review.
50-71: Consistent and correct method implementations. All dictionary operations (`get`, `has`, `set`, `delete`) follow the same clean pattern: truncate the key via `getTruncatedKey`, then delegate to the underlying `BlobDictionary`. This ensures type safety and eliminates the redundant conversions that were previously flagged.
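The truncate-then-delegate pattern described here can be sketched as follows (illustrative only: a plain `Map` keyed by hex strings stands in for `BlobDictionary`, and the size constant is an assumption):

```typescript
// Minimal sketch of the truncate-then-delegate pattern: every public
// operation first normalizes the key, then forwards to inner storage.
// A plain Map keyed by hex string stands in for BlobDictionary here.
const SIZE = 31; // assumed truncated-key length

const toHex = (b: Uint8Array): string =>
  Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");

class TruncatedDict<V> {
  private inner = new Map<string, V>();

  // Normalize every incoming key to exactly SIZE bytes before lookup.
  private key(k: Uint8Array): string {
    return toHex(k.length === SIZE ? k : k.subarray(0, SIZE));
  }

  get(k: Uint8Array): V | undefined { return this.inner.get(this.key(k)); }
  has(k: Uint8Array): boolean { return this.inner.has(this.key(k)); }
  set(k: Uint8Array, v: V): void { this.inner.set(this.key(k), v); }
  delete(k: Uint8Array): boolean { return this.inner.delete(this.key(k)); }
}
```

Because every method funnels through the same `key` helper, a full 32-byte hash and its 31-byte truncation address the same entry, which is exactly the invariant the review comment is praising.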
79-81: Perfect fix for the redundant conversion. The simplified `yield* this.dict.entries()` correctly removes the unnecessary byte conversion that was flagged in the previous review. Since the underlying dictionary already stores `TruncatedHash` keys, direct delegation is both correct and more efficient.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/core/collections/truncated-hash-dictionary.ts (1)

6-13: Good optimization for already-truncated keys. The early return at lines 8-10 correctly avoids allocating new `Bytes` objects when the key is already `TRUNCATED_HASH_SIZE`, addressing the performance concern from previous reviews. The logic is sound and the comment is clear. Consider adding an explicit return type annotation for clarity:

```diff
-function getTruncatedKey(key: OpaqueHash | TruncatedHash) {
+function getTruncatedKey(key: OpaqueHash | TruncatedHash): TruncatedHash {
   // Always return exactly TRUNCATED_HASH_SIZE bytes.
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `packages/core/collections/truncated-hash-dictionary.ts` (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
packages/core/**/*.ts
⚙️ CodeRabbit configuration file
`packages/core/**/*.ts`: Core packages must not import any JAM-related packages (i.e. packages defined under `packages/jam/**`)
Files:
packages/core/collections/truncated-hash-dictionary.ts
**/*.ts
⚙️ CodeRabbit configuration file
`**/*.ts`: rules from `./CODESTYLE.md` should be adhered to.
`**/*.ts`: Any function whose documentation mentions it must not be used in production code can safely be used in `*.test.ts` files. Other usage should be carefully reviewed and the comment must explain why it's safe to use there.
`**/*.ts`: `as` conversions must not be used. Suggest using `tryAs` conversion methods.
`**/*.ts`: Classes with a `static Codec` field must have a private constructor and a static `create` method.
`**/*.ts`: Casting a `bigint` (or `U64`) using `Number(x)` must have an explanation comment why it is safe.
`**/*.ts`: When making changes to code with comments containing links (in classes, constants, methods, etc.) to graypaper.fluffylabs.dev, ensure those links point to the current version for this update.
Files:
packages/core/collections/truncated-hash-dictionary.ts
🧠 Learnings (1)
📚 Learning: 2025-06-10T12:20:17.513Z
Learnt from: tomusdrw
Repo: FluffyLabs/typeberry PR: 419
File: packages/jam/state/state-inmemory.ts:141-149
Timestamp: 2025-06-10T12:20:17.513Z
Learning: In the `InMemoryService.copyFrom` function in `packages/jam/state/state-inmemory.ts`, duplicate checking for `(hash, length)` pairs in the lookup history is not necessary because the function operates under the assumption that the input `ServiceEntries` comes from an existing well-formed state, which already maintains the invariant of unique lookup history entries per hash and length combination.
Applied to files:
packages/core/collections/truncated-hash-dictionary.ts
🧬 Code graph analysis (1)
packages/core/collections/truncated-hash-dictionary.ts (3)
packages/core/hash/hash.ts (4)
- `OpaqueHash` (18-18)
- `TruncatedHash` (27-27)
- `TRUNCATED_HASH_SIZE` (14-14)
- `TRUNCATED_HASH_SIZE` (15-15)

packages/core/bytes/bytes.ts (1)
- `Bytes` (177-243)

packages/core/collections/blob-dictionary.ts (3)
- `BlobDictionary` (6-335)
- `entries` (295-299)
- `TEST_COMPARE_USING` (54-58)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: run (22.x)
- GitHub Check: test (22.x)
- GitHub Check: e2e (22.x)
🔇 Additional comments (1)
packages/core/collections/truncated-hash-dictionary.ts (1)
34-39: Clean refactoring to BlobDictionary backing store. The migration from `HashDictionary` to `BlobDictionary` is well-executed:

- `fromEntries` correctly maps entries through `getTruncatedKey` before delegating to `BlobDictionary.fromEntries`
- Constructor properly typed with `BlobDictionary<TruncatedHash, V>`
- All CRUD methods (`get`, `has`, `set`, `delete`) consistently use the `getTruncatedKey` helper
- `entries()` cleanly delegates with `yield*`
- Type safety preserved: public API accepts `T extends OpaqueHash` while internal storage uses `TruncatedHash`

This addresses previous concerns about unnecessary `asOpaque` conversions and maintains clear separation between the public API contract and internal implementation.

Also applies to: 42-42, 45-45, 54-75, 84-84
@fluffylabs-bot benchmark |
✅ Benchmark workflow triggered successfully! 🎉 🔗 Check the Actions tab for workflow progress.
Benchmarks summary: 83/83 OK ✅
Picofuzz Benchmark Results

- fallback
- safrole
- storage
- storage_light

🤖 Automated benchmark
What?
I added an implementation of `BlobDictionary` that replaces our `HashDictionary` and `TruncatedHashDictionary`.

Benchmarks

The `BlobDictionary` has a threshold parameter that defines how many items an array must contain before it is converted into a map. The assumption here is that maps are slower than arrays when the number of items is small.

However, benchmark results show that this parameter doesn't have a large impact. Even if we have `n = 1M` keys, after the first split (for hash-based keys), we end up with a map containing n keys, where all values are array nodes of length 1. In cases with long key collisions, this effect occurs deeper in the tree. Therefore, it is important that the threshold value remains greater than zero.
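To illustrate the idea, here is a heavily simplified sketch (not the actual implementation): the key is consumed in 6-byte chunks, each packed into a regular JS number (a `U48` fits losslessly below `Number.MAX_SAFE_INTEGER`), and every trie node starts as a small list that is converted into a `Map` once it outgrows the threshold. The `THRESHOLD` value and all names here are illustrative, and the sketch ignores length disambiguation for keys that differ only in trailing zero padding:

```typescript
// Pack 6 bytes into a plain number: 2^48 < 2^53, so this is lossless.
function bytesAsU48(bytes: Uint8Array, offset: number): number {
  let value = 0;
  for (let i = 0; i < 6; i += 1) {
    // Multiply instead of shifting: bitwise ops would truncate to 32 bits.
    value = value * 256 + (bytes[offset + i] ?? 0); // pad short tails with 0
  }
  return value;
}

const THRESHOLD = 8; // assumed list -> map conversion point

type Node<V> = {
  // Small nodes keep a plain list of children; past THRESHOLD they
  // switch to a Map (exactly one of `list`/`map` is non-null).
  list: [number, Node<V>][] | null;
  map: Map<number, Node<V>> | null;
  value?: V;
};

const newNode = <V>(): Node<V> => ({ list: [], map: null });

function child<V>(node: Node<V>, chunk: number): Node<V> {
  if (node.map !== null) {
    let c = node.map.get(chunk);
    if (c === undefined) { c = newNode(); node.map.set(chunk, c); }
    return c;
  }
  const found = node.list?.find(([k]) => k === chunk);
  if (found !== undefined) return found[1];
  const c = newNode<V>();
  node.list?.push([chunk, c]);
  if ((node.list?.length ?? 0) > THRESHOLD) {
    // The list has outgrown the threshold: convert it into a Map.
    node.map = new Map(node.list ?? []);
    node.list = null;
  }
  return c;
}

function set<V>(root: Node<V>, key: Uint8Array, value: V): void {
  let node = root;
  for (let off = 0; off < key.length; off += 6) {
    node = child(node, bytesAsU48(key, off));
  }
  node.value = value;
}

function get<V>(root: Node<V>, key: Uint8Array): V | undefined {
  let node: Node<V> | undefined = root;
  for (let off = 0; off < key.length && node !== undefined; off += 6) {
    const chunk = bytesAsU48(key, off);
    node = node.map?.get(chunk) ?? node.list?.find(([k]) => k === chunk)?.[1];
  }
  return node?.value;
}
```

With hash-like keys, the first 6-byte chunk is already almost unique, so after the first split most nodes hold a single child, which matches the observation above that the threshold value has little impact in practice.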
Next steps
Since benchmark results indicate that the performance of `BlobDictionary` is comparable to `StringHashDictionary`, I'd like to merge it in its current form.

In subsequent PRs, I plan to:

- `BytesBlob.isEqualTo`: introduce a `fastIsEqualTo` variant that operates on `U48` chunks instead of single bytes.

Benchmark results
set operation
long collision keys
hash keys
get operation
long collision keys
hash keys
delete operation
long collision keys
hash keys