Skip to content

Add hash operator class for cross-document equality#164

Merged
tobyhede merged 8 commits intomainfrom
cross-document-eq
Feb 15, 2026
Merged

Add hash operator class for cross-document equality#164
tobyhede merged 8 commits intomainfrom
cross-document-eq

Conversation

@tobyhede
Copy link
Contributor

@tobyhede tobyhede commented Feb 12, 2026

Problem

PostgreSQL hash-based operations (hash joins, GROUP BY, DISTINCT,
UNION) on eql_v2_encrypted columns were not supported. The planner
could only use nested loop or merge join strategies for encrypted
column equality, and the HASHES flag was incorrectly set on
cross-type operators where independent hashing of each side is not
possible.

Solution

  • Add eql_v2.hash_encrypted() function that computes a 32-bit
    integer hash from HMAC-256 or Blake3 index terms, with HMAC priority
  • Register a PostgreSQL hash operator class for eql_v2_encrypted
    with the same-type equality operator
  • Remove HASHES from cross-type (encrypted/jsonb) and non-equality
    operators (<>, ~~, ~~*) since hash joins require independent hashing
    of each side
  • Update Supabase build exclusion glob to also exclude
    hash_operator_class.sql

Summary by CodeRabbit

  • New Features

    • Enables hash-based operations on encrypted columns (hash joins, GROUP BY, DISTINCT, UNION) via a new dedicated hash operator class.
  • Bug Fixes / Reliability

    • Hashing now prefers Blake3 then HMAC for consistent encrypted-value hashes; clear errors when required hash terms are missing.
    • Adjusted operator declarations to align with the new hash class.
  • Tests

    • Added extensive tests covering hash joins, grouping, DISTINCT, UNION, nulls, cross-type comparisons, and error paths.
  • Chores

    • Broadened build tooling discovery for operator-class SQL files.

Add 20 tests for the hash operator class covering hash joins, GROUP BY,
DISTINCT, UNION, IN/NOT IN, STE vec unwrapping, NULL handling, cross-type
equality, forced hash join strategy, and error paths for missing index
terms and multi-element STE vectors.
Introduce eql_v2.hash_encrypted() which computes a 32-bit integer hash
from HMAC-256 or Blake3 index terms, enabling PostgreSQL hash-based
operations (hash joins, GROUP BY, DISTINCT, UNION) on encrypted columns.

Register the hash operator class with the same-type equality operator
so the planner can choose hash join strategy for encrypted = encrypted.
…perators

Remove the HASHES flag from cross-type (encrypted/jsonb) equality,
inequality, and LIKE operators. Hash joins require independent hashing
of each side, which is only possible for same-type operators registered
in the hash operator class. Cross-type operators retain MERGES for
merge join support.
Change the Supabase build exclusion pattern from **/operator_class.sql
to **/*operator_class.sql so it matches both the btree operator_class.sql
and the new hash_operator_class.sql. Include updated protect dependency
files.
Replace raw ORE-only test data with create_encrypted_json() which
includes hmac/blake3 index terms required for hash-based GROUP BY
aggregation now that the hash operator class is registered.
@tobyhede
Copy link
Contributor Author

@CodeRabbit full review

@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

Walkthrough

Adds hash support for encrypted values: new function eql_v2.hash_encrypted, a hash operator family/class, removal of HASHES flags from several operators, a build script glob change, and extensive sqlx tests exercising hash joins and related behaviors.

Changes

Cohort / File(s) Summary
Hash Function Implementation
src/encrypted/hash.sql
Creates eql_v2.hash_encrypted(val eql_v2_encrypted) RETURNS integer that converts to STE Vec and returns Blake3 hashtext if present, else HMAC-256 hashtext, otherwise raises an exception.
Hash Operator Infrastructure
src/operators/hash_operator_class.sql
Adds eql_v2.encrypted_hash_operator_family (USING hash) and eql_v2.encrypted_hash_operator_class (DEFAULT FOR TYPE eql_v2_encrypted USING hash) with OPERATOR 1 = = and FUNCTION 1 = eql_v2.hash_encrypted(...).
Operator Flag Removals
src/operators/=.sql, src/operators/<>.sql, src/operators/~~.sql
Removes HASHES from multiple operator declarations (equality, inequality, pattern-match variants across eql_v2_encryptedeql_v2_encrypted, eql_v2_encryptedjsonb, and jsonbeql_v2_encrypted); retains MERGES and other clauses.
Build Script
tasks/build.sh
Changes Supabase exclusion pattern from **/operator_class.sql to **/*operator_class.sql, broadening matched filenames during dependency discovery.
Hash Operation Test Suite
tests/sqlx/tests/hash_operator_tests.rs
Adds extensive tests for hash joins, GROUP BY, DISTINCT, UNION, IN/NOT IN, self-joins, direct hash_encrypted calls, error cases when indexes absent, NULL and STE vector edge cases, and forced hash join execution.
Operator Class Test Update
tests/sqlx/tests/operator_class_tests.rs
Refactors group_by_encrypted_column test to use a sqlx fixture (../fixtures/encrypted_json) and create_encrypted_json inserts instead of manual ORE-encrypted copying.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant PG as PostgreSQL
    participant OpClass as "eql_v2.encrypted_hash_operator_class"
    participant HashFn as "eql_v2.hash_encrypted"

    Client->>PG: Submit query using hash operation (HASH JOIN / GROUP BY / DISTINCT)
    PG->>OpClass: Resolve hash operator class for encrypted type
    PG->>HashFn: Call eql_v2.hash_encrypted(encrypted_val)
    HashFn->>HashFn: Convert value to STE Vec
    alt Blake3 index exists
        HashFn->>HashFn: Return hashtext(blake3_value)
    else HMAC-256 index exists
        HashFn->>HashFn: Return hashtext(hmac_256_value)
    else Neither exists
        HashFn->>HashFn: Raise exception
    end
    HashFn->>PG: Return integer hash
    PG->>PG: Use hash for join/grouping/distinct
    PG->>Client: Return results
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibble code in moonlit rows,
Blake3 leads where starlight goes,
HMAC hums its steady tune,
Hash joins hop beneath the moon,
Encrypted fields now dance in rows.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding a hash operator class for encrypted columns to enable hash-based operations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cross-document-eq

Comment @coderabbitai help to get the list of available commands and usage tips.

@tobyhede tobyhede changed the title Cross document eq Add hash operator class for cross-document equality Feb 12, 2026
hash_encrypted previously preferred HMAC over Blake3, but compare() uses
the first index term present in BOTH operands (ORE > HMAC > Blake3).
When value A has hm+b3 and value B has only b3, compare uses Blake3
(equal), but hash_encrypted(A) used HMAC while hash_encrypted(B) used
Blake3 — producing different hashes for equal values.

Reversed priority to Blake3-first so any two values that compare equal
will always hash identically. Updated tests to reflect new priority.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/sqlx/tests/hash_operator_tests.rs`:
- Around line 586-600: The SET LOCAL statements and the join query must run on
the same connection inside an explicit transaction: begin a transaction (e.g.,
let mut tx = pool.begin().await?), run sqlx::query("SET LOCAL enable_nestloop =
off").execute(&mut tx).await?, sqlx::query("SET LOCAL enable_mergejoin =
off").execute(&mut tx).await?, then run the join with
sqlx::query_scalar(...).fetch_one(&mut tx).await?. Finally commit the
transaction (tx.commit().await?). Keep the existing error context ("forced hash
join failed") on the fetch_one call.
🧹 Nitpick comments (1)
tests/sqlx/tests/hash_operator_tests.rs (1)

183-213: Avoid probabilistic hash inequality assertions.

hash1 != hash3 is extremely likely but not guaranteed for 32‑bit hashes. Consider removing the probabilistic assertion to prevent rare flakes.

Proposed change
-    // Different encrypted values may produce different hashes (very likely but not guaranteed)
-    let hash3: i32 = sqlx::query_scalar("SELECT eql_v2.hash_encrypted(create_encrypted_json(2))")
-        .fetch_one(&pool)
-        .await
-        .context("hash_encrypted call 3 failed")?;
-
-    // While hash collisions are theoretically possible, these test values should differ
-    assert_ne!(
-        hash1, hash3,
-        "Different encrypted values should (almost certainly) produce different hashes"
-    );

Add 3 end-to-end tests exercising real SQL query paths with mixed-index
rows (hm+b3 vs b3-only for same logical value):

- mixed_index_hash_join: forced hash join finds match
- mixed_index_group_by_dedup: GROUP BY merges into one group
- mixed_index_union_dedup: UNION deduplicates to one row

These cover the execution paths that the P1 bug would break, beyond
the direct function-level test in hash_function_uses_blake3_first.
…ssertion

- Wrap SET LOCAL + join query in explicit transactions in
  forced_hash_join_via_planner_hints and mixed_index_hash_join so
  planner hints are guaranteed to be active for the query.
- Remove probabilistic hash inequality assertion in hash_function_directly
  that could flake on 32-bit hash collisions.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/sqlx/tests/hash_operator_tests.rs`:
- Around line 391-414: The test hmac_and_blake3_produce_different_hashes
currently uses assert_ne! on two i32 hashes returned by
eql_v2.hash_encrypted(create_encrypted_json(...)) which leaves a tiny flake risk
from 32‑bit collisions; fix by replacing the non-deterministic inequality check
with a deterministic check against fixed expected values (compute the expected
hash outputs for create_encrypted_json(1, 'hm') and create_encrypted_json(1,
'b3') once and use assert_eq! on the scalar results), or if a longer digest is
available, change the SQL to return the full digest/bytea and compare those
bytes instead; if neither is possible, add a clear comment in
hmac_and_blake3_produce_different_hashes acknowledging the tiny collision risk
and keep the assertion.

Comment on lines +391 to +414
#[sqlx::test(fixtures(path = "../fixtures", scripts("encrypted_json")))]
async fn hmac_and_blake3_produce_different_hashes(pool: PgPool) -> Result<()> {
// Test: HMAC and Blake3 code paths produce different hashes for same id
// Catches regression where the wrong branch is taken

let hash_hmac: i32 =
sqlx::query_scalar("SELECT eql_v2.hash_encrypted(create_encrypted_json(1, 'hm'))")
.fetch_one(&pool)
.await
.context("hash with hmac-only failed")?;

let hash_b3: i32 =
sqlx::query_scalar("SELECT eql_v2.hash_encrypted(create_encrypted_json(1, 'b3'))")
.fetch_one(&pool)
.await
.context("hash with blake3-only failed")?;

assert_ne!(
hash_hmac, hash_b3,
"HMAC and Blake3 should produce different hashes for same id"
);

Ok(())
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rare flake risk from 32‑bit hash collision.
assert_ne! on two 32‑bit hashes can collide (very low probability). If you want to eliminate even tiny flake risk, consider asserting against a fixed test vector or adding a note acknowledging the collision risk.

🤖 Prompt for AI Agents
In `@tests/sqlx/tests/hash_operator_tests.rs` around lines 391 - 414, The test
hmac_and_blake3_produce_different_hashes currently uses assert_ne! on two i32
hashes returned by eql_v2.hash_encrypted(create_encrypted_json(...)) which
leaves a tiny flake risk from 32‑bit collisions; fix by replacing the
non-deterministic inequality check with a deterministic check against fixed
expected values (compute the expected hash outputs for create_encrypted_json(1,
'hm') and create_encrypted_json(1, 'b3') once and use assert_eq! on the scalar
results), or if a longer digest is available, change the SQL to return the full
digest/bytea and compare those bytes instead; if neither is possible, add a
clear comment in hmac_and_blake3_produce_different_hashes acknowledging the tiny
collision risk and keep the assertion.

@freshtonic
Copy link
Contributor

I see that the underlying hash is only 32 bits. That's quite limiting for a hash collision perspective (see below).

Upon hash collision do we fall back to performing equality on the hash index term (rather than the derived 32 bit hash)?

image

@freshtonic freshtonic self-requested a review February 12, 2026 23:32
Copy link
Contributor

@freshtonic freshtonic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about the 32 bit hash size due to weak collision resistance. ~80K items is all that is required for a 50% chance of a collision.

This might not be a problem if the operator class falls back to an equality check on the full index term when the smaller hash collides.

Copy link
Contributor

@freshtonic freshtonic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

@tobyhede tobyhede merged commit 9b76ca7 into main Feb 15, 2026
5 checks passed
@tobyhede tobyhede deleted the cross-document-eq branch February 15, 2026 23:45
@freshtonic freshtonic self-requested a review February 15, 2026 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants