Add hash operator class for cross-document equality #164
Conversation
Add 20 tests for the hash operator class covering hash joins, GROUP BY, DISTINCT, UNION, IN/NOT IN, STE vec unwrapping, NULL handling, cross-type equality, forced hash join strategy, and error paths for missing index terms and multi-element STE vectors.
Introduce eql_v2.hash_encrypted() which computes a 32-bit integer hash from HMAC-256 or Blake3 index terms, enabling PostgreSQL hash-based operations (hash joins, GROUP BY, DISTINCT, UNION) on encrypted columns. Register the hash operator class with the same-type equality operator so the planner can choose hash join strategy for encrypted = encrypted.
Remove the HASHES flag from cross-type (encrypted/jsonb) equality, inequality, and LIKE operators. Hash joins require independent hashing of each side, which is only possible for same-type operators registered in the hash operator class. Cross-type operators retain MERGES for merge join support.
Change the Supabase build exclusion pattern from `**/operator_class.sql` to `**/*operator_class.sql` so it matches both the btree operator_class.sql and the new hash_operator_class.sql. Include updated protect dependency files.
Replace raw ORE-only test data with create_encrypted_json() which includes hmac/blake3 index terms required for hash-based GROUP BY aggregation now that the hash operator class is registered.
Force-pushed from a676717 to 2042d4d.

@CodeRabbit full review

✅ Actions performed: full review triggered.
📝 Walkthrough
Adds hash support for encrypted values via the new function `eql_v2.hash_encrypted()` and a hash operator class for the encrypted type.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant PG as PostgreSQL
    participant OpClass as "eql_v2.encrypted_hash_operator_class"
    participant HashFn as "eql_v2.hash_encrypted"

    Client->>PG: Submit query using hash operation (HASH JOIN / GROUP BY / DISTINCT)
    PG->>OpClass: Resolve hash operator class for encrypted type
    PG->>HashFn: Call eql_v2.hash_encrypted(encrypted_val)
    HashFn->>HashFn: Convert value to STE Vec
    alt Blake3 index exists
        HashFn->>HashFn: Return hashtext(blake3_value)
    else HMAC-256 index exists
        HashFn->>HashFn: Return hashtext(hmac_256_value)
    else Neither exists
        HashFn->>HashFn: Raise exception
    end
    HashFn->>PG: Return integer hash
    PG->>PG: Use hash for join/grouping/distinct
    PG->>Client: Return results
```
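The fallback chain in the diagram can be modeled outside the database. Below is a minimal Rust sketch; the `fnv1a32` helper is a stand-in for PostgreSQL's `hashtext()`, and the term values are illustrative assumptions, not real index terms:

```rust
// Toy model of the fallback chain in eql_v2.hash_encrypted.
// fnv1a32 stands in for PostgreSQL's hashtext().
fn fnv1a32(s: &str) -> u32 {
    s.bytes()
        .fold(0x811c9dc5u32, |h, b| (h ^ b as u32).wrapping_mul(0x0100_0193))
}

// Pick the Blake3 term if present, else the HMAC-256 term, else error out,
// mirroring the alt/else branches in the diagram.
fn hash_encrypted(blake3: Option<&str>, hmac_256: Option<&str>) -> Result<i32, &'static str> {
    match blake3.or(hmac_256) {
        Some(term) => Ok(fnv1a32(term) as i32),
        None => Err("no blake3 or hmac_256 index term"),
    }
}

fn main() {
    assert!(hash_encrypted(Some("b3-term"), None).is_ok());
    assert!(hash_encrypted(None, Some("hm-term")).is_ok());
    assert!(hash_encrypted(None, None).is_err()); // error path: no hashable term
    println!("ok");
}
```

The error branch matters for the operator class: a value with neither index term cannot participate in hash-based plans at all.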
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed.
hash_encrypted previously preferred HMAC over Blake3, but compare() uses the first index term present in BOTH operands (ORE > HMAC > Blake3). When value A has hm+b3 and value B has only b3, compare uses Blake3 (equal), but hash_encrypted(A) used HMAC while hash_encrypted(B) used Blake3 — producing different hashes for equal values. Reversed priority to Blake3-first so any two values that compare equal will always hash identically. Updated tests to reflect new priority.
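The hash/compare mismatch described above can be illustrated with a small Rust model. The struct and term strings below are hypothetical stand-ins for real index terms; the point is only the selection order:

```rust
// Toy model of why hash and compare must agree on index-term priority.
struct Enc {
    hmac: Option<&'static str>,
    blake3: Option<&'static str>,
}

// Old selection order: HMAC first, then Blake3.
fn term_hmac_first(v: &Enc) -> &'static str {
    v.hmac.or(v.blake3).expect("no hashable index term")
}

// New selection order: Blake3 first, then HMAC.
fn term_blake3_first(v: &Enc) -> &'static str {
    v.blake3.or(v.hmac).expect("no hashable index term")
}

fn main() {
    let a = Enc { hmac: Some("hm-1"), blake3: Some("b3-1") }; // hm + b3
    let b = Enc { hmac: None, blake3: Some("b3-1") };         // b3 only
    // compare() matches a and b via Blake3, so their hash inputs must agree.
    assert_ne!(term_hmac_first(&a), term_hmac_first(&b));     // old: equal values hashed differently
    assert_eq!(term_blake3_first(&a), term_blake3_first(&b)); // new: consistent hashing
    println!("ok");
}
```

This is exactly the hash-join invariant PostgreSQL relies on: any two values the equality operator reports as equal must hash to the same bucket.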
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/sqlx/tests/hash_operator_tests.rs`:
- Around line 586-600: The SET LOCAL statements and the join query must run on
the same connection inside an explicit transaction: begin a transaction (e.g.,
let mut tx = pool.begin().await?), run sqlx::query("SET LOCAL enable_nestloop =
off").execute(&mut tx).await?, sqlx::query("SET LOCAL enable_mergejoin =
off").execute(&mut tx).await?, then run the join with
sqlx::query_scalar(...).fetch_one(&mut tx).await?. Finally commit the
transaction (tx.commit().await?). Keep the existing error context ("forced hash
join failed") on the fetch_one call.
🧹 Nitpick comments (1)
tests/sqlx/tests/hash_operator_tests.rs (1)
183-213: Avoid probabilistic hash inequality assertions.
`hash1 != hash3` is extremely likely but not guaranteed for 32-bit hashes. Consider removing the probabilistic assertion to prevent rare flakes. Proposed change:

```diff
-    // Different encrypted values may produce different hashes (very likely but not guaranteed)
-    let hash3: i32 = sqlx::query_scalar("SELECT eql_v2.hash_encrypted(create_encrypted_json(2))")
-        .fetch_one(&pool)
-        .await
-        .context("hash_encrypted call 3 failed")?;
-
-    // While hash collisions are theoretically possible, these test values should differ
-    assert_ne!(
-        hash1, hash3,
-        "Different encrypted values should (almost certainly) produce different hashes"
-    );
```
Add 3 end-to-end tests exercising real SQL query paths with mixed-index rows (hm+b3 vs b3-only for the same logical value):
- mixed_index_hash_join: forced hash join finds the match
- mixed_index_group_by_dedup: GROUP BY merges into one group
- mixed_index_union_dedup: UNION deduplicates to one row

These cover the execution paths that the P1 bug would break, beyond the direct function-level test in hash_function_uses_blake3_first.
- Wrap SET LOCAL + join query in explicit transactions in forced_hash_join_via_planner_hints and mixed_index_hash_join so planner hints are guaranteed to be active for the query.
- Remove the probabilistic hash inequality assertion in hash_function_directly that could flake on 32-bit hash collisions.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/sqlx/tests/hash_operator_tests.rs`:
- Around line 391-414: The test hmac_and_blake3_produce_different_hashes
currently uses assert_ne! on two i32 hashes returned by
eql_v2.hash_encrypted(create_encrypted_json(...)) which leaves a tiny flake risk
from 32‑bit collisions; fix by replacing the non-deterministic inequality check
with a deterministic check against fixed expected values (compute the expected
hash outputs for create_encrypted_json(1, 'hm') and create_encrypted_json(1,
'b3') once and use assert_eq! on the scalar results), or if a longer digest is
available, change the SQL to return the full digest/bytea and compare those
bytes instead; if neither is possible, add a clear comment in
hmac_and_blake3_produce_different_hashes acknowledging the tiny collision risk
and keep the assertion.
```rust
#[sqlx::test(fixtures(path = "../fixtures", scripts("encrypted_json")))]
async fn hmac_and_blake3_produce_different_hashes(pool: PgPool) -> Result<()> {
    // Test: HMAC and Blake3 code paths produce different hashes for same id
    // Catches regression where the wrong branch is taken

    let hash_hmac: i32 =
        sqlx::query_scalar("SELECT eql_v2.hash_encrypted(create_encrypted_json(1, 'hm'))")
            .fetch_one(&pool)
            .await
            .context("hash with hmac-only failed")?;

    let hash_b3: i32 =
        sqlx::query_scalar("SELECT eql_v2.hash_encrypted(create_encrypted_json(1, 'b3'))")
            .fetch_one(&pool)
            .await
            .context("hash with blake3-only failed")?;

    assert_ne!(
        hash_hmac, hash_b3,
        "HMAC and Blake3 should produce different hashes for same id"
    );

    Ok(())
}
```
Rare flake risk from 32‑bit hash collision.
assert_ne! on two 32‑bit hashes can collide (very low probability). If you want to eliminate even tiny flake risk, consider asserting against a fixed test vector or adding a note acknowledging the collision risk.
freshtonic left a comment:
I'm concerned about the 32 bit hash size due to weak collision resistance. ~80K items is all that is required for a 50% chance of a collision.
This might not be a problem if the operator class falls back to an equality check on the full index term when the smaller hash collides.
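The ~80K figure follows from the standard birthday-bound approximation. A quick sketch (the function and numbers below are illustrative, not part of the PR):

```rust
// Birthday-bound approximation: probability that at least one pair of n
// uniformly distributed b-bit hash values collides.
fn collision_probability(n: u64, bits: u32) -> f64 {
    let space = (2f64).powi(bits as i32);
    1.0 - (-(n as f64) * (n as f64 - 1.0) / (2.0 * space)).exp()
}

fn main() {
    // Roughly 77K uniform 32-bit hashes give a ~50% chance of a collision.
    println!("{:.2}", collision_probability(77_163, 32)); // ≈ 0.50
    println!("{:.2}", collision_probability(80_000, 32)); // ≈ 0.53
}
```

Note that collisions in a hash join bucket are normally resolved by re-checking the join qual (the equality operator) on the candidate pair, so a 32-bit hash affects performance rather than correctness as long as that recheck happens.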

Problem

PostgreSQL hash-based operations (hash joins, GROUP BY, DISTINCT, UNION) on eql_v2_encrypted columns were not supported. The planner could only use nested loop or merge join strategies for encrypted column equality, and the HASHES flag was incorrectly set on cross-type operators where independent hashing of each side is not possible.
Solution

- Introduce eql_v2.hash_encrypted(), which computes a 32-bit integer hash from HMAC-256 or Blake3 index terms, with HMAC priority
- Register a hash operator class for the encrypted type with the same-type equality operator
- Remove the HASHES flag from cross-type operators (<>, ~~, ~~*), since hash joins require independent hashing of each side
- Update the Supabase build exclusion pattern to also match hash_operator_class.sql