Commit e770a75

Authored by David Baker Effendi
Added traversal steps from flatgraph repo (#2)
* Added traversal steps from flatgraph repo
* Added more properties and discussed the `.start` step
* Added details about each step type
* Fixed `glimpse-of-a-simple-use-case` link
1 parent 5ae4bc4 commit e770a75

2 files changed

+181
-28
lines changed

content/_index.md

Lines changed: 0 additions & 19 deletions
@@ -99,31 +99,12 @@ We tried to streamline the dependency tree as much as possible: flatgraph-core m
9999
* generic graph traversals (a.k.a. steps)
100100
* based on the always double check the result type
101101
* copy from joern docs traversal-basics.md
102-
* l/toSet/toSeq
103-
* properties
104-
* property
105-
* out/in/label -> only for nodes
106-
* nodeCount
107-
* groupCount
108-
* size
109-
* collectAll / collect
110-
* cast[]
111-
* filter/filterNot
112-
* where/whereNot
113-
* copy some basics from joern
114-
* repeat and friends
115-
* path tracking
116-
* go through the remainder of the flatgraph api (incl all extension steps)
117-
* describe and verify what imports are required for those steps
118102
* describe the difference of steps between Iterator[X] and X
119-
* also describe .start step
120103
* algorithms: e.g. shortest path
121104
* import/export formats
122105
* logo:
123106
* generate one? similar to joern? asked fabs
124107
* replace / get rid of the relearn logo
125108
* go through TODOs in text above
126109
* bring online
127-
* host on github pages?
128-
* configure url in hugo.toml
129110
* link from flatgraph repo

content/traversals/_index.md

Lines changed: 181 additions & 9 deletions
@@ -3,23 +3,195 @@ title = "Traversal steps"
33
weight = 3
44
+++
55

6-
The most important basic traversal steps will be the ones generated for your domain as highlighted in [glimpse-of-a-simple-use-case](index.html#glimpse-of-a-simple-use-case).
6+
The most important basic traversal steps will be the ones generated for your domain as highlighted in [glimpse-of-a-simple-use-case](../_index.html#glimpse-of-a-simple-use-case).
77

88
In addition to the generated domain-specific steps based on your schema, there are some basic traversal steps that you can use generically across domains, e.g. to traverse from a node (or an `Iterator[Node]`) to its neighbors, look up its properties, etc.
99
There are also more advanced steps like `repeat` and advanced features like path tracking which will be described further below.
1010

1111
{{% notice tip %}}
12-
flatgraph traversals are based on Scala's Iterator, so you can also use all regular [collection methods](https://docs.scala-lang.org/scala3/book/collections-methods.html).
12+
flatgraph traversals are based on Scala's Iterator, so you can also use all regular [collection methods](https://docs.scala-lang.org/scala3/book/collections-methods.html). If you want to begin a traversal from a given node, the `.start` method will wrap that node in a traversal making the traversal steps available.
1313
{{% /notice %}}
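
As a rough illustration of what `.start` does conceptually, the sketch below lifts a single value into an `Iterator` using only the Scala standard library. The `Method` case class and the `startSketch` extension are hypothetical stand-ins introduced for this example; they are not flatgraph's actual implementation.

```scala
// Hypothetical stand-in for a generated node type; not part of flatgraph.
final case class Method(name: String, isExternal: Boolean)

// Assumption: conceptually, `.start` wraps one element in a one-element traversal.
extension [A](element: A)
  def startSketch: Iterator[A] = Iterator.single(element)

val mainMethod = Method("main", isExternal = false)

// Once wrapped, all Iterator-based steps become available on the single element.
val startedNames: List[String] = mainMethod.startSketch.map(_.name).toList
```

The definitions can be pasted into a Scala 3 REPL; `startedNames` then holds the name of the one wrapped element.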
1414

15+
## Step Types
16+
17+
Traversal steps can be divided into four types: Filter, Map, Side Effect, and Terminal.
18+
19+
### Filter Steps
20+
21+
_Filter Steps_ are atomic traversals that filter nodes according to given criteria. The most common filter step is aptly-named `filter`, which continues the traversal in the step it suffixes for all nodes which pass its criterion. Its criterion is represented by a lambda function which has access to the node of the previous step and returns a boolean. Continuing with the previous example, let us execute a query which returns all `METHOD` nodes of the Code Property Graph for [`X42`](https://github.com/ShiftLeftSecurity/x42.git), but only if their `IS_EXTERNAL` property is set to `false`:
22+
23+
```java
24+
joern> cpg.method.filter(_.isExternal == false).name.toList
25+
res11: List[String] = List("main")
26+
```
27+
28+
{{% notice tip %}}
29+
A note on Scala lambda functions:
30+
In the example above, we used the lambda function `_.isExternal == false` as the predicate for the filter.
31+
The `_` is simply syntactic sugar referring to the parameter of the function, so this could be rewritten
32+
as `method => method.isExternal == false`.
33+
{{% /notice %}}
34+
35+
Dissecting this query, we have `cpg` as the root object, a node-type step `method` which returns all nodes of type `METHOD`, a filter step `filter(_.isExternal == false)` which continues the traversal only for nodes which have their `IS_EXTERNAL` property set to `false` (with `_` referencing the individual nodes, and `isExternal` a property directive which accesses their `IS_EXTERNAL` property), followed by a property directive `name` which returns the values of the `NAME` property of the nodes that passed the _Filter Step_, and finally an _Execution Directive_ `toList` which executes the traversal and returns the results in a list.
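
The same filtering pattern can be tried without a Code Property Graph. The following is a minimal sketch on a plain Scala `Iterator`; the `Method` case class and its sample data are hypothetical stand-ins for the generated `METHOD` node type.

```scala
// Hypothetical stand-in for a METHOD node; not the generated domain type.
final case class Method(name: String, isExternal: Boolean)

// A `def` so that each call yields a fresh, unconsumed iterator.
def methods: Iterator[Method] = Iterator(
  Method("main", isExternal = false),
  Method("printf", isExternal = true),
  Method("exit", isExternal = true)
)

// The placeholder syntax and the explicit lambda are equivalent predicates.
val viaPlaceholder: List[String] =
  methods.filter(_.isExternal == false).map(_.name).toList
val viaNamedParam: List[String] =
  methods.filter(method => !method.isExternal).map(_.name).toList
```

Both values contain only the names of the non-external methods in the sample data.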
36+
37+
A shorter version of a query which returns the same results as the one above can be written using a _Property-Filter Step_. Property-filter steps are _Filter Steps_ which continue the traversal only for nodes which have a specific value in the property the _Property Filter Step_ refers to:
38+
39+
```java
40+
joern> cpg.method.isExternal(false).name.toList
41+
res11: List[String] = List("main")
42+
```
43+
44+
Dissecting the query again, `cpg` is the root object, `method` is a node-type step, `isExternal(false)` is a property-filter step that filters for nodes which have `false` as the value of their `IS_EXTERNAL` property, `name` is a property directive, and `toList` is the execution directive you are already familiar with.
45+
46+
{{% notice tip %}}
47+
Be careful not to mix up property directives with property-filter steps; they look very similar.
48+
Consider that:
49+
50+
a) `cpg.method.isExternal(true).name.toList` returns the names of all `METHOD` nodes which have the `IS_EXTERNAL` property set to `true` (in this case, 10 results)
51+
52+
b) `cpg.method.isExternal.toList` returns the value of the `IS_EXTERNAL` property for all `METHOD` nodes in the graph (12 results)
53+
54+
c) `cpg.method.isExternal.name.toList` is an invalid query which will not execute
55+
{{% /notice %}}
56+
57+
A final _Filter Step_ we will look at is named `where`. Unlike `filter`, it doesn't take a simple predicate `A => Boolean`, but instead takes a `Traversal[A] => Traversal[_]`, i.e. you supply a traversal which will be executed at the current position. The resulting traversal preserves an element only if the provided traversal has _at least one_ result for it. The previous query that used a _Property Filter Step_ can be rewritten using `where` like so:
58+
59+
```java
60+
joern> cpg.method.where(_.isExternal(false)).name.toList
61+
res24: List[String] = List("main")
62+
```
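
To make the "at least one result" semantics concrete, here is a minimal sketch of a `where`-like helper built on a plain Scala `Iterator`. The helper `whereSketch` and the `Method`/`Param` case classes are hypothetical names for this example and should not be read as flatgraph's own code.

```scala
// Hypothetical stand-ins for domain node types.
final case class Param(name: String)
final case class Method(name: String, params: List[Param])

// Assumption: an element is kept if the sub-traversal, started at that
// single element, produces at least one result.
def whereSketch[A](trav: Iterator[A])(sub: Iterator[A] => Iterator[?]): Iterator[A] =
  trav.filter(a => sub(Iterator.single(a)).hasNext)

def methods: Iterator[Method] = Iterator(
  Method("main", List(Param("argc"), Param("argv"))),
  Method("exit", Nil)
)

// Keep only methods that have at least one parameter named "argc".
val withArgc: List[String] =
  whereSketch(methods)(_.flatMap(_.params).filter(_.name == "argc")).map(_.name).toList
```

Note that the sub-traversal only decides whether an element survives; the surviving elements themselves are still `Method` values, not `Param` values.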
63+
64+
This may not seem particularly useful given this specific example, but keep it in the back of your head, because `where` is a handy tool to have in the toolbox. Next up, _Map Steps_.
65+
66+
### Map Steps
67+
68+
_Map Steps_ are traversals that map a set of nodes into a different form given a function. _Map Steps_ are a powerful mechanism when you need to transform results to fit your specifics. For example, say you'd like to return both the `IS_EXTERNAL` and the `NAME` properties of all `METHOD` nodes in `X42`'s Code Property Graph. You can achieve that with the following query:
69+
70+
```java
71+
joern> cpg.method.map(node => (node.isExternal, node.name)).toList
72+
res6: List[(Boolean, String)] = List(
73+
(false, "main"),
74+
(true, "fprintf"),
75+
(true, "exit"),
76+
(true, "<operator>.logicalAnd"),
77+
(true, "<operator>.equals"),
78+
(true, "<operator>.greaterThan"),
79+
(true, "strcmp"),
80+
(true, "<operator>.indirectIndexAccess"),
81+
(true, "printf")
82+
)
83+
```
84+
85+
Don't be intimidated by the syntax used in the `map` _Step_ above. If you examine `map(node => (node.isExternal, node.name))` for a bit, you might be able to infer that the first `node` simply defines the variable that represents the node coming from the step which precedes the `map` _Step_, that the ASCII arrow `=>` is just syntax that precedes the body of a lambda function, and that `(node.isExternal, node.name)` means that the return value of the lambda is a tuple which contains the values of the `isExternal` and `name` _Property Directives_ for each of the nodes matched in the previous step and passed into the lambda. In most cases in which you need `map`, you can simply follow the pattern above. But should you ever feel constrained by the common pattern shown, remember that the function for the `map` step is written in the Scala programming language, a fact which opens up a wide range of possibilities if you invest a little time learning the language.
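
The tuple-producing pattern can be reproduced on a plain Scala `Iterator`. This sketch uses a hypothetical `Method` case class with made-up sample data in place of real `METHOD` nodes.

```scala
// Hypothetical stand-in for a METHOD node.
final case class Method(name: String, isExternal: Boolean)

def methods: Iterator[Method] = Iterator(
  Method("main", isExternal = false),
  Method("printf", isExternal = true)
)

// The lambda body `(m.isExternal, m.name)` builds a tuple,
// so the element type of the result is (Boolean, String).
val pairs: List[(Boolean, String)] =
  methods.map(m => (m.isExternal, m.name)).toList

// Tuples can be destructured again in a later step via pattern matching.
val externalNames: List[String] = pairs.collect { case (true, name) => name }
```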
86+
87+
### Side Effect Steps
88+
89+
_Side Effect Steps_ are traversal steps that perform an action or modify the state of the traversal without altering the path of the traversal itself. They do not directly contribute to the results that are returned, but they might be used to store information, log data, or manipulate variables during traversal. These steps can be thought of as adding "side effects" to the traversal that can be useful for various purposes like counting, aggregating, or modifying data.
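
A minimal sketch of the idea, assuming a side-effect step simply runs a function for each element and passes the element through unchanged. The helper name `sideEffectSketch` is hypothetical; the example also shows that, on a lazy `Iterator`, the side effect only happens once the traversal is actually executed.

```scala
import scala.collection.mutable.ListBuffer

// Assumption: run `f` for every element, then emit the element unchanged.
def sideEffectSketch[A](trav: Iterator[A])(f: A => Unit): Iterator[A] =
  trav.map { a => f(a); a }

val seen = ListBuffer.empty[String]

// Building the traversal records nothing yet, because iterators are lazy.
val pending: Iterator[String] =
  sideEffectSketch(Iterator("main", "printf")) { name => seen += name; () }
val seenBefore: Int = seen.size

// Executing the traversal triggers the side effect for every element.
val results: List[String] = pending.toList
val seenAfter: Int = seen.size
```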
90+
91+
### Terminal Steps
92+
93+
_Terminal Steps_ are steps that end the traversal and return the final result. Once a terminal step is reached, the traversal is considered complete, and it provides the output in some form (e.g., a list, a set, or a single element). Unlike intermediate steps that continue building the traversal, terminal steps execute the traversal and stop further processing. After a terminal step, the traversal cannot be continued or extended; it’s finished.
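
Because traversals are based on Scala's `Iterator`, this behaviour can be observed with the standard library alone. The sketch below uses `toList` as the terminal step on a plain iterator of integers.

```scala
// An intermediate step: nothing is computed yet.
val traversal: Iterator[Int] = Iterator(1, 2, 3).map(_ * 2)

// `toList` acts as a terminal step: it runs the traversal and materialises the results.
val materialised: List[Int] = traversal.toList

// The underlying iterator is now exhausted, so the traversal cannot be continued.
val exhausted: Boolean = !traversal.hasNext
```

To run further steps over the same elements, continue from the materialised collection (e.g. `materialised.iterator`) or build a fresh traversal.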
94+
95+
## Traversal Steps
96+
97+
The steps described below are available when called on an `Iterator`. For these to be available, the following package must be imported: `import flatgraph.traversal.language.*`.
1598

1699
#### Basic steps
17100
Assuming you have an `Iterator[X]`, where `X` is typically a domain specific type, but could also be flatgraph's root type for nodes [`GNode`](https://github.com/joernio/flatgraph/blob/92f4cc4b84bf6b8315971128995a75872376dcff/core/src/main/java/flatgraph/GNode.java), here's a (non-exhaustive) list of basic traversal steps.
18101

19-
| Name | Default | Notes |
20-
|-----------------------|-------------------|-------------|
21-
| **page** | _&lt;empty&gt;_ | Mandatory reference to the page. |
22-
| **onempty** | `disable` | Defines what to do with the button if the content overlay is empty:<br><br>- `disable`: The button is displayed in disabled state.<br>- `hide`: The button is removed. |
23-
| **onwidths** | _&lt;varying&gt;_ | The action, that should be executed if the site is displayed in the given width:<br><br>- `show`: The button is displayed in its given area<br>- `hide`: The button is removed.<br>- `area-XXX`: The button is moved from its given area into the area `XXX`. |
24-
| **onwidthm** | _&lt;varying&gt;_ | See above. |
25-
| **onwidthl** | _&lt;varying&gt;_ | See above. |
102+
| Name | Type | Notes |
103+
| ----------------------- | ----------- | ----------------------------------------------------------------------------------------------------------- |
104+
| **and** | Filter | Only preserves elements for which _all of_ the given traversals have at least one result. |
105+
| **cast[B]** | Map | Casts all elements to given type `B`. |
106+
| **choose**              | Filter      | Implements conditional semantics: if, if/else, if/elseif, if/elseif/else, ...                               |
107+
| **coalesce** | Filter | Evaluates the provided traversals in order and returns the first traversal that emits at least one element. |
108+
| **collectAll[B]** | Filter | Collects all elements of the provided class `B` (beware of type-erasure). |
109+
| **dedup** | Filter | Deduplicate elements of this traversal - a.k.a. distinct, unique. |
110+
| **dedupBy** | Filter | Deduplicate elements of this traversal by a given function. |
111+
| **discardPathTracking** | Side Effect | Disables path tracking and discards any paths tracked so far.                                               |
112+
| **enablePathTracking** | Side Effect | Enable path tracking - prerequisite for path/simplePath steps. |
113+
| **filter** | Filter | Filters in everything that evaluates to _true_ by the given transformation function. |
114+
| **filterNot** | Filter | Filters in everything that evaluates to _false_ by the given transformation function. |
115+
| **groupCount** | Map | Group elements and count how often they appear. |
116+
| **groupCount[B]** | Map | Group elements by a given transformation function and count how often the results appear. |
117+
| **head** | Terminal | The first element of the traversal. |
118+
| **is** | Filter | Filters in everything that _is_ the given value. |
119+
| **or** | Filter | Only preserves elements for which _at least one of_ the given traversals has at least one result. |
120+
| **l/toSet/toSeq**       | Terminal    | Executes the traversal and returns the result as a list, set, or indexed sequence respectively.             |
121+
| **last** | Terminal | The last element of the traversal. |
122+
| **not**                 | Filter      | Only preserves elements if the provided traversal does _not_ have any results. Alias for `whereNot`.        |
123+
| **path** | Terminal | Retrieve entire paths that have been traversed thus far. |
124+
| **repeat** | Map | Repeat the given traversal. |
125+
| **sideEffect** | Side Effect | Perform side effect without changing the contents of the traversal. |
126+
| **simplePath** | Filter | Ensure the traversal does not include any paths that visit the same node more than once. |
127+
| **size** | Terminal | Total size of elements in the traversal. |
128+
| **sorted** | Map | Sort elements by their natural order. |
129+
| **sortBy** | Map | Sort elements by the value of the given transformation function. |
130+
| **union** | Filter | Union/sum/aggregate/join given traversals from the current point. |
131+
| **within** | Filter | Filters out all elements that are _not_ in the provided set. |
132+
| **without** | Filter | Filters out all elements that _are_ in the provided set. |
133+
| **where** | Filter | Only preserves elements if the provided traversal has at least one result. |
134+
| **whereNot** | Filter | Only preserves elements if the provided traversal does _not_ have any results. |
135+
136+
#### Node Steps
137+
138+
These steps are available when starting the traversal from an `Iterator` of nodes ([`GNode`](https://github.com/joernio/flatgraph/blob/92f4cc4b84bf6b8315971128995a75872376dcff/core/src/main/java/flatgraph/GNode.java)).
139+
140+
| Name | Type | Notes |
141+
| ----------------- | ---------- | ---------------------------------------------------------------------------------------- |
142+
| **both** | Map/Filter | Follow both in and out-neighbours for a given node. Can be restricted by edge type. |
143+
| **bothE** | Map/Filter | Follow both in and out-edges for a given node. Can be restricted by edge type. |
144+
| **hasLabel**      | Filter     | Filters in nodes that match the given labels. Alias for `label`.                         |
145+
| **id** | Map/Filter | Return a unique identifier(s) for the node(s) in the traversal. Can filter by given IDs. |
146+
| **in** | Map/Filter | In-neighbours for a given node. Can be restricted by edge type. |
147+
| **inE** | Map/Filter | In-edges for a given node. Can be restricted by edge type. |
148+
| **out** | Map/Filter | Out-neighbours for a given node. Can be restricted by edge type. |
149+
| **outE** | Map/Filter | Out-edges for a given node. Can be restricted by edge type. |
150+
| **property** | Map | Retrieve the value for a single property for the defined property name. |
151+
| **propertiesMap** | Map | Retrieves all entity properties as a map. |
152+
| **label** | Map/Filter | Node label. Can filter by given labels. |
153+
| **labelNot** | Filter | Inverse of `label`. |
154+
155+
#### Edge Steps
156+
157+
These steps are available when starting the traversal from an `Iterator` of edges ([`Edge`](https://github.com/joernio/flatgraph/blob/92f4cc4b84bf6b8315971128995a75872376dcff/core/src/main/scala/flatgraph/Edge.scala)).
158+
159+
| Name | Type | Notes |
160+
| ------- | ---- | ------------------------------------------------- |
161+
| **src** | Map | Traverse to the source node (out-going node). |
162+
| **dst** | Map | Traverse to the destination node (incoming node). |
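
The relationship between the node steps (`out`, `in`, `outE`, `inE`) and the edge steps (`src`, `dst`) can be sketched with a tiny hand-rolled graph model. The `Node` and `Edge` case classes and the helper functions below are hypothetical and far simpler than flatgraph's real `GNode` and `Edge` types.

```scala
// Hypothetical minimal graph model, for illustration only.
final case class Node(id: Long, label: String)
final case class Edge(src: Node, dst: Node, label: String)

val methodNode  = Node(1L, "METHOD")
val callNode    = Node(2L, "CALL")
val literalNode = Node(3L, "LITERAL")

val edges: List[Edge] = List(
  Edge(methodNode, callNode, "AST"),
  Edge(callNode, literalNode, "ARGUMENT")
)

// Sketch of `outE` followed by `dst`: out-edges of a node, then their destination nodes.
def outE(node: Node): Iterator[Edge] = edges.iterator.filter(_.src == node)
def out(node: Node): Iterator[Node] = outE(node).map(_.dst)

// Sketch of `inE` followed by `src`: in-edges of a node, then their source nodes.
def inE(node: Node): Iterator[Edge] = edges.iterator.filter(_.dst == node)
def in(node: Node): Iterator[Node] = inE(node).map(_.src)
```

In this model, `out(methodNode)` yields the call node and `in(literalNode)` yields the call node as well, since the call node sits between the other two.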
163+
164+
## Property Directives
165+
166+
The steps described below are available when called on the entity/object directly. These are available as methods or properties on the objects so no import is necessary.
167+
168+
#### Graph Steps
169+
170+
Steps available from an instance of [`Graph`](https://github.com/joernio/flatgraph/blob/92f4cc4b84bf6b8315971128995a75872376dcff/core/src/main/scala/flatgraph/Graph.scala).
171+
172+
| Name | Type | Notes |
173+
| ------------- | -------- | ---------------------------------------------------------------------------------- |
174+
| **allNodes** | Map | The nodes of the graph. |
175+
| **allEdges** | Map | The edges of the graph. |
176+
| **edgeCount** | Terminal | The total edges in the graph. Can be restricted by a given label. |
177+
| **nodes** | Filter | Create a traversal from the nodes of the graph that match the given IDs or labels. |
178+
| **nodeCount** | Terminal | Total nodes in the graph. Can be restricted by a given label. |
179+
180+
#### Edge Steps
181+
182+
Steps available from an instance of [`Edge`](https://github.com/joernio/flatgraph/blob/92f4cc4b84bf6b8315971128995a75872376dcff/core/src/main/scala/flatgraph/Edge.scala).
183+
184+
| Name | Type | Notes |
185+
| ---------------- | ---- | ---------------------------------------------- |
186+
| **label** | Map | The edge label. |
187+
| **propertyName** | Map | The property value of the edge, if one exists. |
188+
189+
#### Node Steps
190+
191+
Steps available from an instance of [`GNode`](https://github.com/joernio/flatgraph/blob/92f4cc4b84bf6b8315971128995a75872376dcff/core/src/main/java/flatgraph/GNode.java).
192+
193+
| Name | Type | Notes |
194+
| --------- | ---- | ---------------------------------------------------- |
195+
| **graph** | Map | The graph this node belongs to. |
196+
| **id** | Map | The node identifier. |
197+
| **label** | Map | The node label. |
