This repository was archived by the owner on Jun 6, 2025. It is now read-only.

Commit b815a56

tb06904, cn337131, and wb36499 authored
Gh-543: Improve Cypher docs (#544)
* add new cypher pages and tidy existing gremlin ones
* update existing gremlin docs
* make defaults clearer
* Update docs/user-guide/apis/opencypher.md

Co-authored-by: cn337131 <[email protected]>

---------

Co-authored-by: cn337131 <[email protected]>
Co-authored-by: wb36499 <[email protected]>
1 parent 5fc5b01 commit b815a56

8 files changed

+246
-38
lines changed


docs/administration-guide/gaffer-deployment/gremlin.md

Lines changed: 30 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -35,7 +35,7 @@ outline how to obtain this reference using the REST API.
3535

3636
As mentioned previously the recommended way to use Gremlin queries is via the
3737
Websocket in the Gaffer REST API. To do this you will need to provide a config
38-
file that sets up the Gaffer Tinkerpop library (a.k.a 'GafferPop'). The file can
38+
file that sets up the Gaffer TinkerPop library (a.k.a 'GafferPop'). The file can
3939
either be added to `/gaffer/gafferpop.properties` in the container, or at a
4040
custom path by setting the `gaffer.gafferpop.properties` key in the
4141
`store.properties` file. This file can be blank but it is still recommended to
@@ -63,29 +63,41 @@ The `gafferpop.properties`, file is the configuration for GafferPop. If using
6363
the REST API there are no mandatory properties you need to set since you will already
6464
have configured the Graph in the existing `store.properties` file. However,
6565
adding some default values in for operation modifiers, such as a limit for
66-
`GetAllElement` operations, is good practice.
66+
`GetElements` and `GetAllElements` operations, is good practice.
6767

6868
```properties
69-
# Default operation config
70-
gaffer.elements.getalllimit=5000
69+
# Some default GafferPop configuration
70+
gaffer.elements.getlimit=20000
7171
gaffer.elements.hasstepfilterstage=PRE_AGGREGATION
7272
```
7373

7474
A full breakdown of the available properties is as follows:
7575

76-
!!! note
77-
Many of these are for standalone GafferPop Graphs so may be ignored if using
78-
the REST API.
76+
!!! tip
77+
Most of these can be overridden on a per-query basis; see the [Gremlin options page](../../reference/gremlin-guide/gaffer-options.md)
78+
for details.
7979

80-
| Property Key | Description | Used in REST API |
80+
| Property Key | Default | Description |
8181
| --- | --- | --- |
82-
| `gremlin.graph` | The Tinkerpop graph class we should use for construction. | No |
83-
| `gaffer.graphId` | The graph ID of the Tinkerpop graph. | No |
84-
| `gaffer.storeproperties` | The path to the store properties file. | No |
85-
| `gaffer.schemas` | The path to the directory containing the graph schema files. | No |
86-
| `gaffer.userId` | The default user ID for the Tinkerpop graph. | No (User is always set via the [`UserFactory`](../security/user-control.md).) |
87-
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed | No |
88-
| `gaffer.rest.timeout` | The timeout for gremlin queries submitted to the REST API in ms. Default is 2 mins if not specified. | Yes |
89-
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (this can be overridden per query see [here](../../user-guide/query/gremlin/custom-features.md)) | Yes |
90-
| `gaffer.elements.getalllimit` | The default limit for unseeded queries e.g. `g.V()`. | Yes |
91-
| `gaffer.elements.hasstepfilterstage` | The default stage to apply any `has()` steps e.g. `PRE_AGGREGATION` | Yes |
82+
| `gaffer.rest.timeout` | `120000` | The timeout for gremlin queries submitted to the REST API in ms. |
83+
| `gaffer.operation.options` | None | Default `Operation` options in the form `key:value` |
84+
| `gaffer.elements.getlimit` | `20000` | The default limit applied to get element operations called by TinkerPop e.g. `GetElements` or `GetAllElements`. |
85+
| `gaffer.elements.hasstepfilterstage` | `PRE_AGGREGATION` | The default stage to apply any `has()` steps e.g. `PRE_AGGREGATION` |
86+
| `gaffer.includeOrphanedVertices` | `false` | Whether orphaned vertices should be returned by default in a result. These are vertices on an edge that have no associated entity in the Gaffer graph. Queries will likely be slower if enabled. |
87+
88+
You can also create a standalone GafferPop Graph outside of the REST API. If
89+
doing so there are some additional properties available. These would usually be
90+
configured in the store properties or graph config.
91+
92+
!!! note
93+
It's recommended to use the REST API where possible, in which case these
94+
can be ignored.
95+
96+
| Property Key | Description |
97+
| --- | --- |
98+
| `gremlin.graph` | The TinkerPop graph class we should use for construction. |
99+
| `gaffer.graphId` | The graph ID of the TinkerPop graph. |
100+
| `gaffer.storeproperties` | The path to the store properties file. |
101+
| `gaffer.schemas` | The path to the directory containing the graph schema files. |
102+
| `gaffer.userId` | The default user ID for the TinkerPop graph (user is always set via the [`UserFactory`](../security/user-control.md) in the REST API.) |
103+
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed |
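
As a rough illustration of the REST API properties listed in this diff, the following Python sketch renders the text of a `gafferpop.properties` file. Only the property keys and default values come from the table above; the `render_properties` helper and the `GAFFERPOP_DEFAULTS` name are hypothetical and not part of Gaffer.

```python
# Property keys and defaults as documented in the table above.
# The helper below is a hypothetical convenience, not a Gaffer API.
GAFFERPOP_DEFAULTS = {
    "gaffer.rest.timeout": "120000",
    "gaffer.elements.getlimit": "20000",
    "gaffer.elements.hasstepfilterstage": "PRE_AGGREGATION",
    "gaffer.includeOrphanedVertices": "false",
}


def render_properties(overrides=None):
    """Return the text of a gafferpop.properties file, one key=value per line."""
    # Overrides replace the documented defaults key by key.
    merged = {**GAFFERPOP_DEFAULTS, **(overrides or {})}
    lines = ["# Some default GafferPop configuration"]
    lines += [f"{key}={value}" for key, value in merged.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: lower the element limit while keeping the other defaults.
    print(render_properties({"gaffer.elements.getlimit": "5000"}), end="")
```

The resulting text could be saved to `/gaffer/gafferpop.properties` in the container, or to a custom path referenced by the `gaffer.gafferpop.properties` key, as described earlier in the page.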

docs/reference/glossary.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,11 +13,14 @@ hide:
1313
| Node | A node is what Gaffer calls an entity |
1414
| Properties | A property is a key/value pair that stores data on both edges and entities |
1515
| Element | The word is used to describe edges or entities |
16-
| Stores | A Gaffer store represents the backing database responsbile for storing or facilitating access to a graph |
16+
| Stores | A Gaffer store represents the backing database responsible for storing or facilitating access to a graph |
1717
| Operations | An operation is an instruction / function that you send to the API to manipulate and query a graph |
18+
| View | Used in Gaffer like a filter; it lets you view the data differently in a query and is often used to filter the data you get back from a given operation |
1819
| Matched vertex | `matchedVertex` is a field added to Edges which are returned by Gaffer queries, stating whether your seeds matched the source or destination |
1920
| Python | A programming language that is used to build applications. Gaffer uses Python to interact with the API |
2021
| Java | An object-oriented programming language used to build software. Gaffer is primarily built in Java |
2122
| Database | A database is a collection of organised structured information or data typically stored in a computer system |
2223
| API | Application Programming Interface. An API is for one or more services / systems to communicate with each other |
2324
| JSON | JavaScript Object Notation is a text-based format for representing structured data based on JavaScript object syntax |
25+
| GafferPop | The library used to translate Gremlin queries to Gaffer operations using the TinkerPop framework |
26+
| Orphaned Vertices | Vertices on an edge without any associated entity in the Graph |

docs/reference/gremlin-guide/gaffer-options.md

Lines changed: 22 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -33,13 +33,14 @@ Note that any options should be passed as a list or dictionary.
3333
g.with_("operationOptions", {"gaffer.federatedstore.operation.graphIds": "graphA"}).V().to_list()
3434
```
3535

36-
## GetElements Limit
36+
## Get Elements Limit
3737

3838
Key `getElementsLimit`
3939

40-
Limits the amount of elements returned if performing a query which returns a large amount of elements e.g. a
41-
`GetAllElements` operation. This will override the default for the current
42-
query, see the [admin guide](../../administration-guide/gaffer-deployment/gremlin.md#configuring-the-gafferpop-library)
40+
Limits the number of elements that can be returned for each `GetElements` or
41+
`GetAllElements` query run by TinkerPop. This applies a Gaffer `Limit`
42+
operation in the translated operation chain. This will override the default for
43+
the current query, see the [admin guide](../../administration-guide/gaffer-deployment/gremlin.md#configuring-the-gafferpop-library)
4344
for more detail on setting up defaults.
4445

4546
!!! example
@@ -76,3 +77,20 @@ from translation.
7677
```groovy
7778
g.with("cypher", "MATCH (p:person) RETURN p").call().toList()
7879
```
80+
81+
## Include Orphaned Vertices
82+
83+
Key: `includeOrphanedVertices`
84+
85+
The option to set if orphaned vertices should be included in the result.
86+
Orphaned vertices are deemed as vertices on an edge that have no
87+
associated Gaffer entity with them. Enabling this will likely result in slower
88+
query performance as each vertex on an edge needs to be checked. The orphaned
89+
vertices returned will be in a special `id` group. This will override the default for
90+
the current query, see the [admin guide](../../administration-guide/gaffer-deployment/gremlin.md#configuring-the-gafferpop-library)
91+
for more detail on setting up defaults.
92+
93+
!!! example
94+
```groovy
95+
g.with("includeOrphanedVertices", "true").V().toList()
96+
```

docs/user-guide/apis/gremlin-api.md

Lines changed: 4 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -68,17 +68,16 @@ and get results back.
6868
### REST API Endpoints
6969

7070
The Gremlin endpoints provide a similar interface to running Gaffer Operations.
71-
They accept a plaintext Gremlin Groovy or OpenCypher query and will return
72-
the results in [GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
71+
They accept a plaintext Gremlin Groovy query and will return the results in
72+
[GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
7373
format.
7474

7575
The two endpoints are:
7676

7777
- `/rest/gremlin/execute` - Runs a Gremlin Groovy script and outputs the result
7878
as GraphSONv3 JSON.
79-
- `/rest/gremlin/cypher/execute` - Translates a Cypher query to Gremlin and
80-
executes it returning a GraphSONv3 JSON result. Note will always append a
81-
`.toList()` to the translation.
79+
- `/rest/gremlin/explain` - Runs a Gremlin Groovy script and returns an
80+
explanation of what Gaffer operations it ran.
8281

8382
A query can be submitted via the Swagger UI or simple POST request such as:
8483

@@ -100,7 +99,4 @@ gc = gaffer_connector.GafferConnector("http://localhost:8080/rest")
10099

101100
# Execute and return gremlin
102101
gremlin_result = gc.execute_gremlin("g.V('1').toList()")
103-
104-
# Execute and return cypher
105-
cypher_result = gc.execute_cypher("MATCH (n) WHERE ID(n) = '1' RETURN n")
106102
```

docs/user-guide/apis/opencypher.md

Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
# openCypher API
2+
3+
!!! warning
4+
The openCypher API is still experimental. It is provided by a
5+
translation layer to Gremlin from the [OpenCypher project](https://github.com/opencypher/cypher-for-gremlin).
6+
Due to this, the implementation may experience the same [limitations](../query/gremlin/gremlin-limits.md)
7+
as the Gremlin API. Its performance is unknown but likely slower than
8+
Gremlin or Standard Gaffer Operations.
9+
10+
## What is openCypher?
11+
12+
openCypher is an open source implementation of Cypher - the most widely
13+
adopted, fully-specified, and open query language for property graph databases.
14+
The original Cypher language was developed by Neo4j®.
15+
16+
Cypher is a declarative graph query language that allows for expressive and
17+
efficient querying and updating of the graph store. Cypher is a relatively
18+
simple but still very powerful language. Complicated database queries can easily
19+
be expressed through Cypher.
20+
21+
!!! tip
22+
Please see the [full reference guide](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf)
23+
from the openCypher organisation for more details.
24+
25+
## How to Query a Graph
26+
27+
There are two main methods of using openCypher in Gaffer: via Gremlin using a
28+
websocket, by wrapping the query in a Gremlin `with()` step, or by
29+
submitting queries via the REST endpoints like standard Gaffer Operations. More
30+
details on setting up the Gremlin side can be found on its [respective page](./gremlin-api.md).
31+
32+
!!! note
33+
Both methods require a running [Gaffer REST API](./rest-api.md) instance.
34+
35+
### Using the `with()` Step
36+
37+
The most full featured way to use openCypher is to simply add it into a Gremlin
38+
query. This is done using the options interface, known in Gremlin as a `with()`
39+
step. More information on how to run a Gaffer option in Gremlin is available in
40+
the [reference guide](../../reference/gremlin-guide/gaffer-options.md) but
41+
general usage is outlined below:
42+
43+
```groovy
44+
g.with("cypher", "MATCH (n) WHERE ID(n) = '1' RETURN n").call().toList()
45+
```
46+
47+
### REST API Endpoints
48+
49+
The endpoints provide a similar interface to running Gaffer Operations. They
50+
accept a plaintext openCypher query and will return the results in
51+
[GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
52+
format.
53+
54+
The two endpoints for openCypher are:
55+
56+
- `/rest/gremlin/cypher/execute` - Translates a Cypher query to Gremlin and
57+
executes it, returning a GraphSONv3 JSON result. Note that this will always append a
58+
`.toList()` to the translation.
59+
- `/rest/gremlin/cypher/explain` - Translates a Cypher query to Gremlin,
60+
executes it and returns an explanation of what Gremlin query and Gaffer
61+
operations it ran.
62+
63+
A query can be submitted via the Swagger UI or simple POST request such as:
64+
65+
```bash
66+
curl -X 'POST' \
67+
'http://localhost:8080/rest/gremlin/cypher/execute' \
68+
-H 'accept: application/x-ndjson' \
69+
-H 'Content-Type: text/plain' \
70+
-d 'MATCH (n:'\''something'\'') RETURN n'
71+
```
72+
73+
You can also utilise [Gafferpy](./python-api.md) to connect and run queries
74+
using the endpoints.
75+
76+
```python
77+
from gafferpy import gaffer_connector
78+
79+
gc = gaffer_connector.GafferConnector("http://localhost:8080/rest")
80+
81+
# Execute and return cypher
82+
cypher_result = gc.execute_cypher("MATCH (n) WHERE ID(n) = '1' RETURN n")
83+
```
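
For environments without gafferpy, the same POST request can be sketched with the Python standard library alone. This is a minimal illustration that mirrors the `curl` example in the page above: the endpoint path and headers come from that example, while the `build_cypher_request` helper and the `localhost` base URL are assumptions. It only constructs the request; sending it requires a running Gaffer REST API.

```python
from urllib.request import Request


def build_cypher_request(query, base_url="http://localhost:8080/rest"):
    """Build (but do not send) a POST request for the Cypher execute endpoint.

    The helper name and the default base URL are illustrative assumptions.
    """
    return Request(
        url=f"{base_url}/gremlin/cypher/execute",
        data=query.encode("utf-8"),  # plain-text openCypher query as the body
        headers={
            "accept": "application/x-ndjson",
            "Content-Type": "text/plain",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_cypher_request("MATCH (n) WHERE ID(n) = '1' RETURN n")
    print(request.get_method(), request.full_url)
    # To send it against a running instance:
    #   from urllib.request import urlopen
    #   with urlopen(request) as response:
    #       print(response.read().decode("utf-8"))
```

Keeping construction separate from sending makes it easy to inspect the URL, headers and body before pointing the request at a real deployment.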

docs/user-guide/query/gremlin/gremlin-limits.md

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,8 +6,10 @@ but some features may also be yet to be implemented.
66

77
Current TinkerPop features not present in the GafferPop implementation:
88

9-
- Unseeded queries run a `GetAllElements` with a configured limit applied,
10-
this limit can be configured per query or will default to 5000.
9+
- Each `GetElements` or `GetAllElements` query run by TinkerPop will have a
10+
Gaffer `Limit` operation also applied. This limit can be configured via the
11+
[GafferPop properties](../../../administration-guide/gaffer-deployment/gremlin.md)
12+
or per query, but will default to 20000 if not otherwise specified.
1113
- Gaffer graphs are readonly to Gremlin queries.
1214
- TinkerPop Graph Computer is not supported.
1315
- TinkerPop Transactions are not supported.
@@ -30,7 +32,7 @@ Current known limitations or bugs:
3032
may get results back when you realistically shouldn't.
3133
- Input seeds to Gaffer operations are deduplicated.
3234
Therefore, the results of a query against a GafferPop graph may be different than a standard Gremlin graph.
33-
For example, for the Tinkerpop Modern graph:
35+
For example, for the TinkerPop Modern graph:
3436
```text
3537
(Gremlin) g.V().out() = [v2, v3, v3, v3, v4, v5]
3638
(GafferPop) g.V().out() = [v2, v3, v4, v5]
Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
# openCypher in Gaffer
2+
3+
!!! warning
4+
The openCypher API is still experimental. It is provided by a
5+
translation layer to Gremlin from the [OpenCypher project](https://github.com/opencypher/cypher-for-gremlin).
6+
Due to this, the implementation may experience the same [limitations](../query/gremlin/gremlin-limits.md)
7+
as the Gremlin API. Its performance is unknown but likely slower than
8+
Gremlin or Standard Gaffer Operations.
9+
10+
## openCypher Querying
11+
12+
Generally the syntax and features of using openCypher in Gaffer are the same as
13+
using Cypher in other graph databases. Most of the features you will have used
14+
in standard Cypher should be available. The layer in Gaffer targets
15+
[openCypher v9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf)
16+
meaning any features outside of that version cannot be guaranteed.
17+
18+
Full guides on querying using Cypher are available elsewhere; however, a few useful
19+
queries to get you started are listed here. The exact Gremlin query each one
20+
translates to is also provided.
21+
22+
!!! example ""
23+
Seeded vertex query using string IDs.
24+
25+
=== "cypher"
26+
```cypher
27+
MATCH (n) WHERE ID(n) IN ['0', '1', '2', '3'] RETURN n
28+
```
29+
30+
=== "Gremlin"
31+
```groovy
32+
g.V().has('~id', within('0', '1', '2', '3')).project('n').by(__.valueMap().with('~tinkerpop.valueMap.tokens')).toList()
33+
```
34+
35+
!!! example ""
36+
Seeded edge query using string IDs.
37+
38+
=== "cypher"
39+
```cypher
40+
MATCH (s)-[r]->(d) WHERE ID(r) IN ['[0, 1]', '[2, 3]'] RETURN r
41+
```
42+
43+
=== "Gremlin"
44+
```groovy
45+
g.E().has('~id', within('[0, 1]', '[2, 3]')).project('r').by(__.project(' cypher.element', ' cypher.inv', ' cypher.outv').by(__.valueMap().with('~tinkerpop.valueMap.tokens')).by(__.inV().id()).by(__.outV().id())).toList()
46+
```
47+
48+
!!! example ""
49+
Filtering on group and properties.
50+
51+
=== "cypher"
52+
```cypher
53+
MATCH (n:person) WHERE n.age > toInteger(25) AND n.`full-name` CONTAINS 'John' RETURN n
54+
```
55+
56+
=== "Gremlin"
57+
```groovy
58+
g.V().as('n').hasLabel('person').has('full-name', containing('John')).where(__.constant(25d).map(cypherToInteger()).is(neq(' cypher.null')).as(' GENERATED1').select('n').values('age').where(gt(' GENERATED1'))).select('n').project('n').by(__.choose(neq(' cypher.null'), __.valueMap().with('~tinkerpop.valueMap.tokens'))).toList()
59+
```
60+
61+
!!! example ""
62+
Transform and project on properties.
63+
64+
=== "cypher"
65+
```cypher
66+
MATCH (n) RETURN (n.age * 1000), reverse(n.name)
67+
```
68+
69+
=== "Gremlin"
70+
```groovy
71+
g.V().as('n').project('(n.age * 1000)', 'reverse(n.name)').by(__.constant(1000).as('__GENERATED1').select('n').choose(neq(' cypher.null'), __.choose(__.values('age'), __.values('age'), __.constant(' cypher.null'))).choose(__.or(__.is(eq(' cypher.null')), __.select('__GENERATED1').is(eq(' cypher.null'))), __.constant(' cypher.null'), __.math('_ * __GENERATED1'))).by(__.choose(neq(' cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant(' cypher.null'))).map(cypherReverse())).toList()
72+
```
73+
74+
## Limitations and Considerations
75+
76+
There are a few limitations you need to be aware of using this API. Generally
77+
these stem from the translation layer but are also due to a fundamental
78+
difference in the way Gaffer and Cypher are intended to be used.
79+
80+
- If using the `with()` step, all numbers are longs by default, so you need to
81+
explicitly convert them to integers if required e.g. `toInteger(1)`. You can
82+
also convert them to floats with `toFloat(1.2)`.
83+
- How data is returned is different to normal Gremlin. It will be returned as
84+
key value maps where each `RETURN` in the cypher query is a key.
85+
- Currently the version of the openCypher translator is stuck at v1.0.0 due to
86+
the Gaffer Scala version. This means not all features of openCypher are
87+
available e.g. no `replace()` function. The [reference guide](../../reference/gremlin-guide/custom-functions.md)
88+
attempts to document all custom Cypher functions available.
89+
- Be mindful of how a query maps to Gremlin and Gaffer, as something like
90+
this: `MATCH (n) WHERE (n:person OR n:software)` will do a `GetAllElements`
91+
with nothing in the Gaffer View. In this case you should use an `OPTIONAL MATCH` instead.
92+
- Gaffer groups or properties containing a `-` must be wrapped in backticks.
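
The backtick rule in the last bullet can be sketched as a small Python helper for anyone building query strings programmatically. This is an illustrative sketch only: `quote_name` is a hypothetical function, not part of Gaffer or gafferpy, and it deliberately rejects names that already contain a backtick rather than attempting to escape them.

```python
import re

# Names made only of letters, digits and underscores (not starting with a
# digit) are treated as plain identifiers that need no quoting.
_PLAIN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_name(name):
    """Wrap a Gaffer group or property name in backticks when required."""
    if "`" in name:
        # Escaping embedded backticks is out of scope for this sketch.
        raise ValueError("names containing backticks are not handled here")
    return name if _PLAIN_NAME.match(name) else f"`{name}`"


if __name__ == "__main__":
    # Rebuild the filter from the 'Filtering on group and properties' example.
    query = (
        f"MATCH (n:{quote_name('person')}) "
        f"WHERE n.{quote_name('full-name')} CONTAINS 'John' RETURN n"
    )
    print(query)
```

Quoting only when needed keeps simple names such as `age` readable while still handling hyphenated names such as `full-name`.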

mkdocs.yml

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -86,23 +86,25 @@ nav:
8686
- 'What is Python?': 'user-guide/gaffer-basics/what-is-python.md'
8787
- 'What is Cardinality?': 'user-guide/gaffer-basics/what-is-cardinality.md'
8888
- 'What is Aggregation?': 'user-guide/gaffer-basics/what-is-aggregation.md'
89+
- 'Graph Schema': 'user-guide/schema.md'
8990
- Available APIs:
9091
- 'Spring REST': 'user-guide/apis/rest-api.md'
9192
- 'Python (gafferpy)': 'user-guide/apis/python-api.md'
9293
- 'Java': 'user-guide/apis/java-api.md'
9394
- 'Gremlin (GafferPop)': 'user-guide/apis/gremlin-api.md'
95+
- 'openCypher': 'user-guide/apis/opencypher.md'
9496
- Querying:
9597
- Gaffer Query Syntax:
9698
- 'Operations': 'user-guide/query/gaffer-syntax/operations.md'
9799
- 'Filtering Data': 'user-guide/query/gaffer-syntax/filtering.md'
98100
- 'FAQs': 'user-guide/query/gaffer-syntax/faqs.md'
99101
- Import/Export:
100102
- 'Using CSV Data': 'user-guide/query/gaffer-syntax/import-export/csv.md'
101-
- Apache Gremlin:
103+
- Gremlin:
102104
- 'Gremlin in Gaffer': 'user-guide/query/gremlin/gremlin.md'
103-
- 'GafferPop Features': 'user-guide/query/gremlin/custom-features.md'
104-
- 'GafferPop Limitations': 'user-guide/query/gremlin/gremlin-limits.md'
105-
- 'Graph Schemas': 'user-guide/schema.md'
105+
- 'Features': 'user-guide/query/gremlin/custom-features.md'
106+
- 'Limitations': 'user-guide/query/gremlin/gremlin-limits.md'
107+
- 'openCypher': 'user-guide/query/opencypher.md'
106108
- Developer Guide:
107109
- 'Introduction': 'development-guide/introduction.md'
108110
- 'Ways of Working': 'development-guide/ways-of-working.md'
