From baa6dedbaa182516359cdc44b95dcc66bb986802 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 5 Jul 2023 22:59:00 -0700 Subject: [PATCH 01/20] Diagram --- README.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 78 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 7afd80c..d76565b 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,78 @@ -# pomo-cryptable -A hierarchically encrypted tuple store +# Cryptable v0.1.0 + +## Editors + +- [Quinn Wilton], [Fission Codes] +- [Brooklyn Zelenka], [Fission Codes] + +## Authors + +- [Quinn Wilton], [Fission Codes] +- [Brooklyn Zelenka], [Fission Codes] + +# Language + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119]. + +# Abstract + +A hierarchically encrypted EVAC tuple store. + +# 1 Introduction + +WNFS + +# 2 Heirarchical Read Control + +``` mermaid +flowchart TD + classDef virtual stroke:#333,stroke-dasharray: 5 5; + + seed --> store:::virtual + + subgraph Virtual + store --> ent1:::virtual + ent1 -.-> ent2 + ent1 ----> attr1-1:::virtual + ent2 ----> attr2-1:::virtual + attr1-1 -.-> attr1-2:::virtual + end + + subgraph Keys ["Derived Keys"] + key1:::virtual + key2:::virtual + key3:::virtual + key4:::virtual + key5:::virtual + key6:::virtual + end + + attr1-1 --> key1{"🔑1.1.1"} --> val1 + attr1-2 --> key2{"🔑1.2.1"} --> val2 + key2 -.-> key3{"🔑1.2.2"} --> val3 + key3 -.-> key4{"🔑1.2.3"} --> val4 + + attr2-1 --> key5{"🔑2.1.1"} --> val5 + key5 -.-> key6{"🔑2.1.2"} --> val6 + + subgraph Concrete + val1("(ent1, attr1-1, val1, [])") + val2("(ent1, attr1-2, val2, [])") + val3("(ent1, attr1-2, val3, [])") + val4("(ent2, attr1-2, val4, [cidX, cidY])") + val5("(ent2, attr2-1, val5, [cidX])") + val6("(ent2, attr2-1, val6, [])") + end + + + classDef red fill:#fdc + classDef yellow fill:#fdfd96 + classDef green 
fill:lightgreen + class Virtual red + class Keys yellow + class Concrete green +``` + +# 3 Key Derivation + +A store is begun From 1ca1420e8811e205019110dd20dd1e632df8c475 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 5 Jul 2023 23:05:53 -0700 Subject: [PATCH 02/20] Fix diagram rendering --- README.md | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index d76565b..1097227 100644 --- a/README.md +++ b/README.md @@ -47,13 +47,13 @@ flowchart TD key6:::virtual end - attr1-1 --> key1{"🔑1.1.1"} --> val1 - attr1-2 --> key2{"🔑1.2.1"} --> val2 - key2 -.-> key3{"🔑1.2.2"} --> val3 - key3 -.-> key4{"🔑1.2.3"} --> val4 + attr1-1 --> key1{"🔑1.1.1"} --> val1 + attr1-2 --> key2{"🔑1.2.1"} --> val2 + key2 -.-> key3{"🔑1.2.2"} --> val3 + key3 -.-> key4{"🔑1.2.3"} --> val4 - attr2-1 --> key5{"🔑2.1.1"} --> val5 - key5 -.-> key6{"🔑2.1.2"} --> val6 + attr2-1 --> key5{"🔑2.1.1"} --> val5 + key5 -.-> key6{"🔑2.1.2"} --> val6 subgraph Concrete val1("(ent1, attr1-1, val1, [])") @@ -62,15 +62,6 @@ flowchart TD val4("(ent2, attr1-2, val4, [cidX, cidY])") val5("(ent2, attr2-1, val5, [cidX])") val6("(ent2, attr2-1, val6, [])") - end - - - classDef red fill:#fdc - classDef yellow fill:#fdfd96 - classDef green fill:lightgreen - class Virtual red - class Keys yellow - class Concrete green ``` # 3 Key Derivation From 6adae764a707c504576b192eed60ff2d12218fc6 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 5 Jul 2023 23:07:38 -0700 Subject: [PATCH 03/20] Add missig keyword --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 1097227..26f8ce3 100644 --- a/README.md +++ b/README.md @@ -62,6 +62,7 @@ flowchart TD val4("(ent2, attr1-2, val4, [cidX, cidY])") val5("(ent2, attr2-1, val5, [cidX])") val6("(ent2, attr2-1, val6, [])") + end ``` # 3 Key Derivation From 5dedbdc3a077fcaa0a5d2422cf513e707cb29e7e Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka 
Date: Wed, 5 Jul 2023 23:41:02 -0700 Subject: [PATCH 04/20] Notes --- README.md | 76 +++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 65 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 26f8ce3..f524f1e 100644 --- a/README.md +++ b/README.md @@ -22,7 +22,23 @@ A hierarchically encrypted EVAC tuple store. WNFS -# 2 Heirarchical Read Control +- Scope: read control only; writes are out of scope of THIS spec + +## 1.x + +- Manage as few keys as possible +- Flexible enough to store in multiple ways (i.e. the data layer below this) +- Sensible defaults: Simple rules _inside_ a store (though stores can be broken up arbitrarily to more custom control) + +TODO bring the data/object table from WNFS here + +## 1.x Lookup Performance + +- Raw decryption speed +- Seek on large stores +- Small network footprint whenever possible + +# 2 Heirarchical Encryption ``` mermaid flowchart TD @@ -32,10 +48,19 @@ flowchart TD subgraph Virtual store --> ent1:::virtual - ent1 -.-> ent2 - ent1 ----> attr1-1:::virtual - ent2 ----> attr2-1:::virtual + ent1 -..-> ent2 + ent1 ~~~ attr1-1:::virtual + ent1 ----> attr1-1 + ent2 --> attr2-1:::virtual + ent2 ~~~~ attr1-1 attr1-1 -.-> attr1-2:::virtual + + val1-1-1:::virtual + val1-2-1:::virtual + val1-2-2:::virtual + val1-2-3:::virtual + val2-1-1:::virtual + val2-1-2:::virtual end subgraph Keys ["Derived Keys"] @@ -47,13 +72,13 @@ flowchart TD key6:::virtual end - attr1-1 --> key1{"🔑1.1.1"} --> val1 - attr1-2 --> key2{"🔑1.2.1"} --> val2 - key2 -.-> key3{"🔑1.2.2"} --> val3 - key3 -.-> key4{"🔑1.2.3"} --> val4 + attr1-1 --> val1-1-1 --> key1{"🔑1.1.1"} --> val1 + attr1-2 -----> val1-2-1 --> key2{"🔑1.2.1"} --> val2 + val1-2-1 -.-> val1-2-2 --> key3{"🔑1.2.2"} --> val3 + val1-2-2 -.-> val1-2-3 --> key4{"🔑1.2.3"} --> val4 - attr2-1 --> key5{"🔑2.1.1"} --> val5 - key5 -.-> key6{"🔑2.1.2"} --> val6 + attr2-1 -------> val2-1-1 --> key5{"🔑2.1.1"} --> val5 + val2-1-1 -.-> val2-1-2 --> 
key6{"🔑2.1.2"} --> val6 subgraph Concrete val1("(ent1, attr1-1, val1, [])") @@ -67,4 +92,33 @@ flowchart TD # 3 Key Derivation -A store is begun +Skip ratchet, but different use from WNFS + +A store MUST be seeded with a random nonce of at least 128 bits. + +A cryptstore MAY have an unlimited number of levels, but at minimum it MUST contain the following levels: + +- StoreRoot +- Entity +- Attribute +- Value + +Being granted access to a + +MUST be equipped with a one-way merge function that takes two or more keys and deterministically derives a new value. Concatenating and hashing with SHA2-256 or BLAKE3 is RECOMMENDED. + +## 3.1 Vertical Derivation + +Derivation of + +## 3.2 Horizonal Derivation + +To lock any level to a single version, merge the + +# 4 Semantic Collison + +A field of a particular value MAY be assigned multiple times. + +# 5 Prior Art + +- Skip Ratchet & WNFS From 87c4a7bf8f02142067588d94f8e15eb337e03e79 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Thu, 6 Jul 2023 00:21:07 -0700 Subject: [PATCH 05/20] Some notes for tomrorow --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index f524f1e..9230a12 100644 --- a/README.md +++ b/README.md @@ -90,6 +90,9 @@ flowchart TD end ``` +- Justify the order +- explain why no cross linking shenannigans + # 3 Key Derivation Skip ratchet, but different use from WNFS From 68d99e405908b0211c3332ae60d826ae97c77824 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Mon, 10 Jul 2023 11:27:12 -0700 Subject: [PATCH 06/20] WIP --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9230a12..59264fb 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,7 @@ WNFS - Manage as few keys as possible - Flexible enough to store in multiple ways (i.e. 
the data layer below this) + - Sensible defaults: Simple rules _inside_ a store (though stores can be broken up arbitrarily to more custom control) TODO bring the data/object table from WNFS here From ef3b16406c090bfc3c0db569a139c9bc62f443f2 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Mon, 10 Jul 2023 22:51:25 -0700 Subject: [PATCH 07/20] Expand motivation section and start on explaini the heirarchy --- README.md | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 53 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 59264fb..cea3d71 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S # Abstract -A hierarchically encrypted EVAC tuple store. +A mechanism for hierarchically encrypting triple stores # 1 Introduction @@ -24,6 +24,50 @@ WNFS - Scope: read control only; writes are out of scope of THIS spec +## 1.1 Motivation + +Facts in a Datalog or RDF database + +While possible to design a system that can query by any field, all available top-down solutions trade off trust (hexastore, layered range trees) or performance (FHE, zkSNARKs). + +Cryptable assumes that data will most frequently be granted heirarchically. This is notably often not how data is _accessed_, but does make for a simple mental of what is being shared. It is assumed that in granting access to "everything" about an entity, that this implies all of its fields. For example, granting access to all records with a `name` attribute, or where the value field is set to `Nara` without access to the rest of the entity are less common. As such, favouring the common case for granting read access is reasonable. + +TODO Time: snapshot, ranges + +The design outlined in this specification MAY be extended by futher splitting data manually across multiple stores. + +Once data is retrieved and decrypted, it MAY be indexed locally (e.g. hexastore). 
+ +# 2 Query Dimensions + +One of largest challenges with encypting datalog facts is that the access patterns are not known in advance. While it's possible to structure EAV(C) fields as an orthogonal $n$-dimensional tensor, and query in any order, this has major drawbacks. Representing data in this way tends to rely on duplication, indexing, and/or cyclical cross linking. This is not feasible in a Byzantine threat model. + +To make the problem tractable, access patterns are broken into two very broad categories which often interact: tabular heirarchy + +### 1.2.1 Tabular Heirarchy + +For the pruposes of this design, we treat the quad store as a triple store $\langle e, a, \langle v, c \rangle \rangle$. + +An intuitive access control layout is following the order: + +``` mermaid +erDiagram + User ||--|{ Store: owns + Store ||--|{ Entity: contains + Entity ||--|{ Attribute: contains + Attribute ||--|{ Value: contains +``` + +### 1.2.2 Temporal Access + +On tables + +### 1.2.3 DAG History + +History in systems like PomoDB are represented as an acyclic hash graph. While there are several techniques (k-anonymity, OT) that make it possible to search directly on history, they typically to require local secondary indices (or FHE). As tabular data access control MAY + +Reading a CID in the `causedBy` field of a quad MUST NOT immedietly grant access to the entire transative history. While this is an important access pattern in many graph queries, it is not desired for tabular queries. Typically the shape of graph data is more important in queries with a small number of common entities and attributes. Granting access to the entire history of those paths is thus viable. + ## 1.x - Manage as few keys as possible @@ -126,3 +170,11 @@ A field of a particular value MAY be assigned multiple times. # 5 Prior Art - Skip Ratchet & WNFS + +# 6 FAQ + +## 6.1 What About Fully Homomorphic Encryption? + +## 6.2 Why Not k-Anonymity? 
+ +## 6.3 Why From fad02ad1d0b18a13bc67ed6020306fa7c6bb5cc1 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Mon, 10 Jul 2023 23:46:55 -0700 Subject: [PATCH 08/20] WIP Diagramming --- README.md | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 60 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index cea3d71..f8bb51f 100644 --- a/README.md +++ b/README.md @@ -58,10 +58,69 @@ erDiagram Attribute ||--|{ Value: contains ``` -### 1.2.2 Temporal Access +### 1.2.2 Stream Access On tables +Needs work: + +``` mermaid +flowchart + user --> s1 + + s1 --> e2 + + subgraph S1 + s1 + + subgraph E1 + s1 --> e1 + + subgraph A1-1 + e1 --> a1-1 + a1-1 --> a1-1s[...] + end + + subgraph A1-2 + direction LR + + a1-1 -.-> a1-2 + a1-2 + a1-2 --> v1-2-1 + + subgraph vals + %% direction LR + + + v1-2-1 + v1-2-2 + v1-2-3 + + v1-2-1 -.-> v1-2-2 -.-> v1-2-3 + end + end + end + + v1-2-1 -->|u * s1 * e1 * a1-2 * v1-2-1| factA + v1-2-2 -->|u * s1 * e1 * a1-2 * v1-2-2| factB + v1-2-3 -->|u * s1 * e1 * a1-2 * v1-2-3| factC + + subgraph facts + factA + factC + factB + end + + subgraph E2 + e2 --> e2s[...] + end + end + + subgraph S2 + s1 -.-> s2 --> s2s[...] + end +``` + ### 1.2.3 DAG History History in systems like PomoDB are represented as an acyclic hash graph. While there are several techniques (k-anonymity, OT) that make it possible to search directly on history, they typically to require local secondary indices (or FHE). 
As tabular data access control MAY From e7c9cc32bc823986b9f529781431d37dc0fa12f6 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 20:57:21 -0700 Subject: [PATCH 09/20] Work out how to do CID history scanning --- README.md | 52 ++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 42 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index f8bb51f..321df9e 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# Cryptable v0.1.0 +# CrypTable v0.1.0 ## Editors @@ -16,7 +16,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S # Abstract -A mechanism for hierarchically encrypting triple stores +A mechanism for hierarchically encrypting triple stores, securing data temporally, and random access to data by relation. # 1 Introduction @@ -26,9 +26,9 @@ WNFS ## 1.1 Motivation -Facts in a Datalog or RDF database +Facts in a tuplestore are consistently -While possible to design a system that can query by any field, all available top-down solutions trade off trust (hexastore, layered range trees) or performance (FHE, zkSNARKs). +While possible to design a system that can query by any field, all available top-down solutions trade off trust (hexastore, layered range trees) or performance (FHE, SNARK indices). Cryptable assumes that data will most frequently be granted heirarchically. This is notably often not how data is _accessed_, but does make for a simple mental of what is being shared. It is assumed that in granting access to "everything" about an entity, that this implies all of its fields. For example, granting access to all records with a `name` attribute, or where the value field is set to `Nara` without access to the rest of the entity are less common. As such, favouring the common case for granting read access is reasonable. @@ -40,7 +40,7 @@ Once data is retrieved and decrypted, it MAY be indexed locally (e.g. 
hexastore) # 2 Query Dimensions -One of largest challenges with encypting datalog facts is that the access patterns are not known in advance. While it's possible to structure EAV(C) fields as an orthogonal $n$-dimensional tensor, and query in any order, this has major drawbacks. Representing data in this way tends to rely on duplication, indexing, and/or cyclical cross linking. This is not feasible in a Byzantine threat model. +One of largest challenges with encypting datalog facts is that the access patterns are not known in advance. While it's possible to structure EAV(C) fields as an orthogonal $n$-dimensional tensor, and query in any order, this has major drawbacks in both the cleartext and ciphertext cases. Representing data in this way tends to rely on duplication, indexing, and/or cyclical cross linking. This is not feasible in a Byzantine threat model. To make the problem tractable, access patterns are broken into two very broad categories which often interact: tabular heirarchy @@ -48,16 +48,17 @@ To make the problem tractable, access patterns are broken into two very broad ca For the pruposes of this design, we treat the quad store as a triple store $\langle e, a, \langle v, c \rangle \rangle$. -An intuitive access control layout is following the order: +In reality, each of these relationships is completely orthogonal, but for the pruposes of access control, we simplify the base case to a linear relationship: ``` mermaid erDiagram - User ||--|{ Store: owns + Root ||--|{ Store: owns Store ||--|{ Entity: contains Entity ||--|{ Attribute: contains Attribute ||--|{ Value: contains ``` + ### 1.2.2 Stream Access On tables @@ -123,9 +124,20 @@ flowchart ### 1.2.3 DAG History -History in systems like PomoDB are represented as an acyclic hash graph. While there are several techniques (k-anonymity, OT) that make it possible to search directly on history, they typically to require local secondary indices (or FHE). 
As tabular data access control MAY +History in systems like PomoDB are represented as an acyclic hash graph. + +The most general solution requires local secondary indices (or FHE). Given that these do not match our performance or + +While there are several techniques (k-anonymity, OT) that make it possible to search directly on history, + + +Reading a CID in the `causedBy` field of a quad MUST NOT immedietly grant access to the entire transative history. Doing so would be potentially dangerous, especially if it crossed between stores. The semantics of the `causedBy` relation do not match that of access control. + +While this is an important access pattern in many graph queries, it is not desired for tabular queries. Typically the shape of graph data is more important in queries with a small number of common entities and attributes. Granting access to the entire history of those paths is thus viable. -Reading a CID in the `causedBy` field of a quad MUST NOT immedietly grant access to the entire transative history. While this is an important access pattern in many graph queries, it is not desired for tabular queries. Typically the shape of graph data is more important in queries with a small number of common entities and attributes. Granting access to the entire history of those paths is thus viable. +Instead, the CrypTable ony provides searchable encyprtion via an authentication tag. This is not an HMAC, since the goal is to avoid calculating the unique key for every fact. A cryptographically secure hash function MUST be used, and a nonce based on the [store ID] MUST be concatenated to the cleartext CID before hahsing. This tag MAY be places anywhere on the fact's envelope. When stored in an associative map using this tag as the entry label is RECOMMENDED. + +If space overhead is a concern, this tag MAY be further anonymized via truncation or XOR folding. Note that this does not increase the k-anonymity as the tag is already indistinguishable from any other tag. 
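The tag construction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the specified algorithm: SHA-256 stands in for the required "cryptographically secure hash function", the nonce is assumed to be fixed-length and placed before the CID (the text does not fix the order), and the names `derive_tag`, `truncate_tag`, and `xor_fold` are hypothetical.

```python
import hashlib


def derive_tag(scope_nonce: bytes, cid: bytes) -> bytes:
    """Hash a scope nonce concatenated with the cleartext CID.

    Assumptions (not from the spec): SHA-256 as the hash, and the
    fixed-length nonce placed before the CID.
    """
    return hashlib.sha256(scope_nonce + cid).digest()


def truncate_tag(tag: bytes, length: int) -> bytes:
    """Shorten a tag by keeping only its first `length` bytes."""
    return tag[:length]


def xor_fold(tag: bytes, length: int) -> bytes:
    """Shorten a tag by XORing its successive `length`-byte chunks together."""
    if length <= 0 or len(tag) % length != 0:
        raise ValueError("length must be positive and evenly divide the tag length")
    folded = bytearray(length)
    for offset in range(0, len(tag), length):
        for i in range(length):
            folded[i] ^= tag[offset + i]
    return bytes(folded)
```

A reader holding the nonce and a CID from a `causedBy` field could recompute the tag and use it as the lookup label in an associative map, without deriving the per-fact key.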
## 1.x @@ -137,7 +149,7 @@ Reading a CID in the `causedBy` field of a quad MUST NOT immedietly grant access TODO bring the data/object table from WNFS here ## 1.x Lookup Performance - + * [x] - Raw decryption speed - Seek on large stores - Small network footprint whenever possible @@ -236,4 +248,24 @@ A field of a particular value MAY be assigned multiple times. ## 6.2 Why Not k-Anonymity? + + +TODO: actually, heck why NOT k-anym for CID histories as tags or the names of files. + + + ## 6.3 Why + + + + + + +---------------------- + + + +NOTES + + +- Nested, but you can always contruct the pointer to any position in the EAV cube deterministically without doing all of teh intermediate lookups From 9f67e0c25b089488f231061e0c6aa57b50cb9da1 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:24:49 -0700 Subject: [PATCH 10/20] Continuing to flesh out --- README.md | 90 +++++++++++++++++++++++++++++++------------------------ 1 file changed, 51 insertions(+), 39 deletions(-) diff --git a/README.md b/README.md index 321df9e..b4398d5 100644 --- a/README.md +++ b/README.md @@ -38,12 +38,36 @@ The design outlined in this specification MAY be extended by futher splitting da Once data is retrieved and decrypted, it MAY be indexed locally (e.g. hexastore). -# 2 Query Dimensions +## 1.2 Goals + + +- Manage as few keys as possible +- Flexible enough to store in multiple ways (i.e. 
the data layer below this) + +- Sensible defaults: Simple rules _inside_ a store (though stores can be broken up arbitrarily to more custom control) + +TODO bring the data/object table from WNFS here + + * [x] +- Raw decryption speed +- Seek on large stores +- Small network footprint whenever possible + +# 2 Terminology + +| Term | Description | +|------------------|-------------------------------------------------------------| +| Encrypted Region | Nestable encrypted collecions: stores, entities, attributes | +| Accessible Scope | All of the facts that the viewer is capable of decrypting | + +# 3 Query Dimensions One of largest challenges with encypting datalog facts is that the access patterns are not known in advance. While it's possible to structure EAV(C) fields as an orthogonal $n$-dimensional tensor, and query in any order, this has major drawbacks in both the cleartext and ciphertext cases. Representing data in this way tends to rely on duplication, indexing, and/or cyclical cross linking. This is not feasible in a Byzantine threat model. To make the problem tractable, access patterns are broken into two very broad categories which often interact: tabular heirarchy +- Nested, but you can always contruct the pointer to any position in the EAV cube deterministically without doing all of teh intermediate lookups + ### 1.2.1 Tabular Heirarchy For the pruposes of this design, we treat the quad store as a triple store $\langle e, a, \langle v, c \rangle \rangle$. @@ -58,6 +82,7 @@ erDiagram Attribute ||--|{ Value: contains ``` +This grants the ability to discover and access new enrties in the heirarchy without having to perform a linear table scan. ### 1.2.2 Stream Access @@ -135,27 +160,30 @@ Reading a CID in the `causedBy` field of a quad MUST NOT immedietly grant access While this is an important access pattern in many graph queries, it is not desired for tabular queries. 
Typically the shape of graph data is more important in queries with a small number of common entities and attributes. Granting access to the entire history of those paths is thus viable. -Instead, the CrypTable ony provides searchable encyprtion via an authentication tag. This is not an HMAC, since the goal is to avoid calculating the unique key for every fact. A cryptographically secure hash function MUST be used, and a nonce based on the [store ID] MUST be concatenated to the cleartext CID before hahsing. This tag MAY be places anywhere on the fact's envelope. When stored in an associative map using this tag as the entry label is RECOMMENDED. - +Instead, the CrypTable ony provides searchable encyprtion via an authentication tag. This is not an HMAC, since the goal is to avoid calculating the unique key for every fact. A cryptographically secure hash function MUST be used, and a nonce based on the combination of the [scoped attribute's hash ID] MUST be concatenated to the cleartext CID before hahsing. This tag MAY be places anywhere on the fact's envelope. When stored in an associative map using this tag as the entry label is RECOMMENDED. + If space overhead is a concern, this tag MAY be further anonymized via truncation or XOR folding. Note that this does not increase the k-anonymity as the tag is already indistinguishable from any other tag. -## 1.x +This does leak a small amount of data: if a user does not have the decryption key for some fact, but has a reference to the CID in a fact that they do have access to a successor of (and thus they have the attrbte hash ID), they can discover that the fact's entity and attribute, but not it's value or causal values. -- Manage as few keys as possible -- Flexible enough to store in multiple ways (i.e. the data layer below this) +Coupled with the with the attribite tag derivation (e.g. 
the RSA accumluator in WNFS), scans across the store MAY be performed either strictly linearly with minimal jumps between encrypted regions. -- Sensible defaults: Simple rules _inside_ a store (though stores can be broken up arbitrarily to more custom control) +## Padding -TODO bring the data/object table from WNFS here - -## 1.x Lookup Performance - * [x] -- Raw decryption speed -- Seek on large stores -- Small network footprint whenever possible +Leaking data through length # 2 Heirarchical Encryption +The tabular heirarchy is as follows: + +1. Root +2. Stores +3. Entities +4. Attributes +5. Values + +Note that `causedBy` does not occur in this heirarchy. To have access to a point in history, the usre MUST have access to the relevant fact in the tabular heirarchy. + ``` mermaid flowchart TD classDef virtual stroke:#333,stroke-dasharray: 5 5; @@ -206,9 +234,6 @@ flowchart TD end ``` -- Justify the order -- explain why no cross linking shenannigans - # 3 Key Derivation Skip ratchet, but different use from WNFS @@ -242,30 +267,17 @@ A field of a particular value MAY be assigned multiple times. - Skip Ratchet & WNFS -# 6 FAQ - -## 6.1 What About Fully Homomorphic Encryption? - -## 6.2 Why Not k-Anonymity? - - +# 6 Acknowledgements -TODO: actually, heck why NOT k-anym for CID histories as tags or the names of files. 
+Thanks to [Philipp Krüger][matheus23] for his work on [WNFS] +Many thanks to [Steven Allen] for conversations about WNFS that applied to the + -## 6.3 Why +[HMAC Indexing]: https://soatok.blog/2023/03/01/database-cryptography-fur-the-rest-of-us/#hmac-indexing +[WNFS]: https://github.com/wnfs-wg/ +[matheus23]: https://github.com/matheus23 +[stebalien]: https://github.com/stebalien - - - - - ----------------------- - - - -NOTES - - -- Nested, but you can always contruct the pointer to any position in the EAV cube deterministically without doing all of teh intermediate lookups + From 034739925b64f9e225a6e2fc918c365c1b0c92fa Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:35:16 -0700 Subject: [PATCH 11/20] Ner reference --- README.md | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index b4398d5..c0b31e8 100644 --- a/README.md +++ b/README.md @@ -60,7 +60,9 @@ TODO bring the data/object table from WNFS here | Encrypted Region | Nestable encrypted collecions: stores, entities, attributes | | Accessible Scope | All of the facts that the viewer is capable of decrypting | -# 3 Query Dimensions +# 3 Encryption Geometry[^Unknowable Geometry] + +[^Unknowable Geometry]: https://www.youtube.com/watch?v=hEVeBDhWlRw One of largest challenges with encypting datalog facts is that the access patterns are not known in advance. While it's possible to structure EAV(C) fields as an orthogonal $n$-dimensional tensor, and query in any order, this has major drawbacks in both the cleartext and ciphertext cases. Representing data in this way tends to rely on duplication, indexing, and/or cyclical cross linking. This is not feasible in a Byzantine threat model. @@ -172,6 +174,10 @@ Coupled with the with the attribite tag derivation (e.g. the RSA accumluator in Leaking data through length +## Semantic Collison + +A field of a particular value MAY be assigned multiple times. 
+ # 2 Heirarchical Encryption The tabular heirarchy is as follows: @@ -225,12 +231,12 @@ flowchart TD val2-1-1 -.-> val2-1-2 --> key6{"🔑2.1.2"} --> val6 subgraph Concrete - val1("(ent1, attr1-1, val1, [])") - val2("(ent1, attr1-2, val2, [])") - val3("(ent1, attr1-2, val3, [])") - val4("(ent2, attr1-2, val4, [cidX, cidY])") - val5("(ent2, attr2-1, val5, [cidX])") - val6("(ent2, attr2-1, val6, [])") + val1("(ent1, attr1-1, val1-1-1, [])") --> tag1(tag: a1b) + val2("(ent1, attr1-2, val1-2-1, [])") --> tag2(tag: 2c3) + val3("(ent1, attr1-2, val1-2-2, [])") --> tag3(tag: d4e) + val4("(ent1, attr1-2, val1-2-3, [cidX, cidY])") --> tag4(tag: 5f6) + val5("(ent2, attr2-1, val2-1-2, [cidX])") --> tag5(tag: g7h) + val6("(ent2, attr2-1, val2-1-3, [])") --> tag6(tag: 8i9) end ``` @@ -259,9 +265,8 @@ Derivation of To lock any level to a single version, merge the -# 4 Semantic Collison +# 4 Tag Derivation -A field of a particular value MAY be assigned multiple times. # 5 Prior Art From 2af54a312639f64033cec42cec013a5f952c5243 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:36:26 -0700 Subject: [PATCH 12/20] How do GitHub Flavored Markdown? --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index c0b31e8..88a929b 100644 --- a/README.md +++ b/README.md @@ -60,7 +60,9 @@ TODO bring the data/object table from WNFS here | Encrypted Region | Nestable encrypted collecions: stores, entities, attributes | | Accessible Scope | All of the facts that the viewer is capable of decrypting | -# 3 Encryption Geometry[^Unknowable Geometry] +# 3 Encryption Geometry + +Footnote test[^Unknowable Geometry] [^Unknowable Geometry]: https://www.youtube.com/watch?v=hEVeBDhWlRw From e0f5c2ff41217c8bc0263e731b7335b57a911bcf Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:37:31 -0700 Subject: [PATCH 13/20] now? 
--- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 88a929b..000cfb0 100644 --- a/README.md +++ b/README.md @@ -62,9 +62,9 @@ TODO bring the data/object table from WNFS here # 3 Encryption Geometry -Footnote test[^Unknowable Geometry] +Footnote test[^UnknowableGeometry] -[^Unknowable Geometry]: https://www.youtube.com/watch?v=hEVeBDhWlRw +[^UnknowableGeometry]: https://www.youtube.com/watch?v=hEVeBDhWlRw One of largest challenges with encypting datalog facts is that the access patterns are not known in advance. While it's possible to structure EAV(C) fields as an orthogonal $n$-dimensional tensor, and query in any order, this has major drawbacks in both the cleartext and ciphertext cases. Representing data in this way tends to rely on duplication, indexing, and/or cyclical cross linking. This is not feasible in a Byzantine threat model. From af9f8d01211ada0aaf3032547af63ba943f62689 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:37:50 -0700 Subject: [PATCH 14/20] Annoyed --- README.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/README.md b/README.md index 000cfb0..489b9c7 100644 --- a/README.md +++ b/README.md @@ -60,9 +60,7 @@ TODO bring the data/object table from WNFS here | Encrypted Region | Nestable encrypted collecions: stores, entities, attributes | | Accessible Scope | All of the facts that the viewer is capable of decrypting | -# 3 Encryption Geometry - -Footnote test[^UnknowableGeometry] +# 3 Encryption Geometry[^UnknowableGeometry] [^UnknowableGeometry]: https://www.youtube.com/watch?v=hEVeBDhWlRw From d2ba8f9084cd3c27ec628dc9f663ed38420b6647 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:46:27 -0700 Subject: [PATCH 15/20] So many levels --- README.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 78 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 489b9c7..2ef47d1 
100644 --- a/README.md +++ b/README.md @@ -190,6 +190,76 @@ The tabular heirarchy is as follows: Note that `causedBy` does not occur in this heirarchy. To have access to a point in history, the usre MUST have access to the relevant fact in the tabular heirarchy. +``` mermaid +flowchart + subgraph root + subgraph store1 + direction LR + + subgraph entity1.1 + direction LR + + subgraph attribute1.1.1 + value1.1.1.1 + value1.1.1.2 + value1.1.1.3 + end + + subgraph attribute1.1.2 + value1.1.2.1 + value1.1.2.2 + end + end + + subgraph entity1.2 + direction LR + + subgraph attribute1.2.1 + value1.2.1.1 + value1.2.1.2 + value1.2.1.3 + end + + subgraph attribute1.2.2 + value1.2.2.1 + value1.2.2.2 + end + end + end + + subgraph store2 + direction LR + + subgraph entity2.1 + direction LR + + subgraph attribute2.1.1 + value2.1.1.1 + end + + subgraph attribute2.1.2 + value2.1.2.1 + value2.1.2.2 + end + end + end + end + + store1 -.-> store2 + entity1.1 -.->entity1.2 + attribute1.1.1 -.-> attribute1.1.2 + + value1.1.1.1 -.-> value1.1.1.2 -.-> value1.1.1.3 + value1.1.2.1 -.-> value1.1.2.2 + + attribute1.2.1 -.-> attribute1.2.2 + value1.2.1.1 -.-> value1.2.1.2 -.-> value1.2.1.3 + value1.2.2.1 -.-> value1.2.2.2 + + attribute2.1.1 -.-> attribute2.1.2 + value2.1.2.1 -.-> value2.1.2.2 +``` + ``` mermaid flowchart TD classDef virtual stroke:#333,stroke-dasharray: 5 5; @@ -268,11 +338,17 @@ To lock any level to a single version, merge the # 4 Tag Derivation -# 5 Prior Art +# 5 Key Rotation + +also Post COmpromise + + + +# 6 Prior Art - Skip Ratchet & WNFS -# 6 Acknowledgements +# 7 Acknowledgements Thanks to [Philipp Krüger][matheus23] for his work on [WNFS] From fa09d582774945d68c3dd2e5054a1ee0cb5d9b5d Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:48:12 -0700 Subject: [PATCH 16/20] Better layout --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 2ef47d1..4c77603 100644 --- a/README.md +++ 
b/README.md @@ -217,12 +217,12 @@ flowchart subgraph attribute1.2.1 value1.2.1.1 value1.2.1.2 - value1.2.1.3 end subgraph attribute1.2.2 value1.2.2.1 value1.2.2.2 + value1.2.2.3 end end end @@ -253,8 +253,8 @@ flowchart value1.1.2.1 -.-> value1.1.2.2 attribute1.2.1 -.-> attribute1.2.2 - value1.2.1.1 -.-> value1.2.1.2 -.-> value1.2.1.3 - value1.2.2.1 -.-> value1.2.2.2 + value1.2.1.1 -.-> value1.2.1.2 + value1.2.2.1 -.-> value1.2.2.2 -.-> value1.2.2.3 attribute2.1.1 -.-> attribute2.1.2 value2.1.2.1 -.-> value2.1.2.2 From ccd03c80925c6aba193ba3f3eb005c78ffe82872 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:49:02 -0700 Subject: [PATCH 17/20] move diagram --- README.md | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index 4c77603..c65d60a 100644 --- a/README.md +++ b/README.md @@ -260,6 +260,23 @@ flowchart value2.1.2.1 -.-> value2.1.2.2 ``` +# 3 Key Derivation + +Skip ratchet, but different use from WNFS + +A store MUST be seeded with a random nonce of at least 128 bits. + +A cryptstore MAY have an unlimited number of levels, but at minimum it MUST contain the following levels: + +- StoreRoot +- Entity +- Attribute +- Value + +Being granted access to a + +MUST be equipped with a one-way merge function that takes two or more keys and deterministically derives a new value. Concatenating and hashing with SHA2-256 or BLAKE3 is RECOMMENDED. + ``` mermaid flowchart TD classDef virtual stroke:#333,stroke-dasharray: 5 5; @@ -310,23 +327,6 @@ flowchart TD end ``` -# 3 Key Derivation - -Skip ratchet, but different use from WNFS - -A store MUST be seeded with a random nonce of at least 128 bits. 
- -A cryptstore MAY have an unlimited number of levels, but at minimum it MUST contain the following levels: - -- StoreRoot -- Entity -- Attribute -- Value - -Being granted access to a - -MUST be equipped with a one-way merge function that takes two or more keys and deterministically derives a new value. Concatenating and hashing with SHA2-256 or BLAKE3 is RECOMMENDED. - ## 3.1 Vertical Derivation Derivation of From 087eea805b1b855aa1d5cfb62e2c009c4c4027ef Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 21:57:47 -0700 Subject: [PATCH 18/20] Note to self --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index c65d60a..8c151c8 100644 --- a/README.md +++ b/README.md @@ -337,6 +337,8 @@ To lock any level to a single version, merge the # 4 Tag Derivation +AEAD + # 5 Key Rotation From d5b54271ef6df4ce27c52f817029ad69293d9dd7 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 12 Jul 2023 22:29:33 -0700 Subject: [PATCH 19/20] Okay just to fleshout now --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 8c151c8..64cfeeb 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ A mechanism for hierarchically encrypting triple stores, securing data temporall # 1 Introduction -WNFS +- Unlike WNFS's temporal cryptree, a cryptable has a fixed nesting depth - Scope: read control only; writes are out of scope of THIS spec @@ -30,7 +30,7 @@ Facts in a tuplestore are consistently While possible to design a system that can query by any field, all available top-down solutions trade off trust (hexastore, layered range trees) or performance (FHE, SNARK indices). -Cryptable assumes that data will most frequently be granted heirarchically. This is notably often not how data is _accessed_, but does make for a simple mental of what is being shared. It is assumed that in granting access to "everything" about an entity, that this implies all of its fields. 
For example, granting access to all records with a `name` attribute, or where the value field is set to `Nara` without access to the rest of the entity are less common. As such, favouring the common case for granting read access is reasonable. +Cryptable assumes that data will most frequently be granted heirarchically. This is notably often not how data is _accessed_, but does make for a simple mental of what is being shared. It is assumed that in granting access to "everything" about an entity, that this implies all of its fields. For example, granting access to all records with a `name` attribute, or where the value field is set to `Boris` without access to the rest of the entity are less common. As such, favouring the common case for granting read access is reasonable. TODO Time: snapshot, ranges From 822bbd6e31bc8226be8037f62a4f4c400ddc3433 Mon Sep 17 00:00:00 2001 From: Brooklyn Zelenka Date: Wed, 9 Aug 2023 23:17:18 -0700 Subject: [PATCH 20/20] Expand intro prose --- README.md | 48 ++++++++++++++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/README.md b/README.md index 64cfeeb..80e6819 100644 --- a/README.md +++ b/README.md @@ -16,42 +16,45 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S # Abstract -A mechanism for hierarchically encrypting triple stores, securing data temporally, and random access to data by relation. +CrypTable is a hierarchically encrypted triple store. It secures data by entity-attribute-value as well as temporally, and comes with efficient random access and diffing. # 1 Introduction -- Unlike WNFS's temporal cryptree, a cryptable has a fixed nesting depth +## 1.1 Motivation -- Scope: read control only; writes are out of scope of THIS spec +While possible to design a system that can query by any field, all available top-down solutions trade off trust (hexastore, layered range trees) or performance (FHE, SNARK indices). 
Working with an encrypted database comes with tradeoffs. The naive approach is to encrypt the entire database. While this may be sufficient for simple cases, it forces developers to use separate databases to implement access contol for different users. At the other extreme are [homomorphically encrypted databases], where arbitrary queries may be performed on a fully encryted databases with arbitrary controls, but at the sacrifice of speed and dynamism. -## 1.1 Motivation +In a [local-first] context, records need to be able to move freely between machines. As such, access control needs to travel with the data itself. This implies that the result of a remote query must include the access control in the resulting records. -Facts in a tuplestore are consistently +## 1.2 Design Goals -While possible to design a system that can query by any field, all available top-down solutions trade off trust (hexastore, layered range trees) or performance (FHE, SNARK indices). +- Users manually manage as few keys as possible +- Agnostic to persistence layer +- Use established cryptography +- Efficient seek in large stores +- Minimize network footprint +- Row-level access control +- Range-level access control -Cryptable assumes that data will most frequently be granted heirarchically. This is notably often not how data is _accessed_, but does make for a simple mental of what is being shared. It is assumed that in granting access to "everything" about an entity, that this implies all of its fields. For example, granting access to all records with a `name` attribute, or where the value field is set to `Boris` without access to the rest of the entity are less common. As such, favouring the common case for granting read access is reasonable. +## 1.3 Approach -TODO Time: snapshot, ranges +[Triple stores] represent all data ("facts") into entity-attribute-value (EAV) triples. Taken together, these triples may be treated as tables or graphs. 
This consistent structure is advantageous for granular access control. It is trivial to encrypt a single triple. Extending read-control scope by the entity-attribute-value heirarchy, access to groups of related triples may be granted together at once. -The design outlined in this specification MAY be extended by futher splitting data manually across multiple stores. +[Cryptree]s are a well known way of organizing heirarchical encryption. The [Webnative File System][WNFS] (WNFS) is a cryptree, extended with temporal access control. Once data is retrieved and decrypted, it MAY be indexed locally (e.g. hexastore). -## 1.2 Goals +Cryptable assumes that data will most frequently be granted heirarchically. This is notably often not how data is _accessed_, but does make for a simple mental of what is being shared. It is assumed that in granting access to "everything" about an entity, that this implies all of its fields. For example, granting access to all records with a `name` attribute, or where the value field is set to `Boris` without access to the rest of the entity are less common. As such, favouring the common case for granting read access is reasonable. + +CryptTable is a -- Manage as few keys as possible -- Flexible enough to store in multiple ways (i.e. the data layer below this) +- Unlike WNFS's temporal cryptree, a cryptable has a fixed nesting depth -- Sensible defaults: Simple rules _inside_ a store (though stores can be broken up arbitrarily to more custom control) +- Scope: read control only; writes are out of scope of THIS spec -TODO bring the data/object table from WNFS here - * [x] -- Raw decryption speed -- Seek on large stores -- Small network footprint whenever possible +The design outlined in this specification MAY be extended by futher splitting data manually across multiple stores. 
# 2 Terminology @@ -60,6 +63,15 @@ TODO bring the data/object table from WNFS here | Encrypted Region | Nestable encrypted collecions: stores, entities, attributes | | Accessible Scope | All of the facts that the viewer is capable of decrypting | +![](https://github.com/wnfs-wg/spec/raw/main/spec/diagrams/layer_dimensions.svg) + +| Visibility | Layer | Node | Link | +|------------|-------|-------------|-------------------------| +| Decrypted | File | WNFS File | File Path | +| Decrypted | Data | CBOR Object | `NameAccumulator` + Key | +| Encrypted | Data | IPLD Block | CID | + + # 3 Encryption Geometry[^UnknowableGeometry] [^UnknowableGeometry]: https://www.youtube.com/watch?v=hEVeBDhWlRw
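
Note on the key-derivation text moved in PATCH 17: it requires that a store be seeded with a random nonce of at least 128 bits, and it recommends a one-way merge function that concatenates two or more keys and hashes them with SHA2-256 or BLAKE3. What follows is a minimal Python sketch of those two rules, not the project's implementation. The names `new_store_seed` and `merge_keys` are hypothetical, SHA2-256 is chosen only because it ships with the Python standard library, and the sketch assumes fixed-length keys, since plain concatenation cannot tell `(b"ab", b"c")` from `(b"a", b"bc")`.

```python
import hashlib
import secrets

MIN_SEED_BYTES = 16  # 128 bits, the minimum seed size the README text requires


def new_store_seed(num_bytes: int = MIN_SEED_BYTES) -> bytes:
    """Return a fresh random seed for a store (hypothetical helper name)."""
    if num_bytes < MIN_SEED_BYTES:
        raise ValueError("a store seed must be at least 128 bits")
    return secrets.token_bytes(num_bytes)


def merge_keys(*keys: bytes) -> bytes:
    """One-way merge: concatenate the keys in order, then hash with SHA2-256.

    Assumes every key has a fixed, known length; variable-length keys would
    need length prefixes to keep the concatenation unambiguous.
    """
    if len(keys) < 2:
        raise ValueError("merge needs two or more keys")
    return hashlib.sha256(b"".join(keys)).digest()


if __name__ == "__main__":
    left = new_store_seed(32)
    right = new_store_seed(32)
    print(merge_keys(left, right).hex())
```

Because the merge is a hash over the concatenation, argument order matters, and the same inputs always produce the same 32-byte output. Someone holding only the output cannot feasibly recover the inputs, which is the one-way property the text asks for.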