|
| 1 | +== Explore a POLE dataset |
| 2 | + |
| 3 | +A **P**ersons **O**bjects **L**ocations **E**vents (POLE) data model focuses on the relationships between people, objects, locations, and events, and is well suited to law enforcement and intelligence investigations. |
| 4 | + |
| 5 | +image::{img}/pole_model_visual.jpeg[] |
| 6 | + |
| 7 | +In this guide, you will learn: |
| 8 | + |
| 9 | +//* How to import and refactor a POLE dataset ******** Do we need to refactor? |
| 10 | +* How to query the graph and answer questions using Cypher |
| 11 | +* How to refactor your data |
| 12 | +* How to use the built-in Cypher function shortest path |
| 13 | +* How to use aggregation functions in Cypher |
| 14 | +
|
| 15 | +In the next section, you will import the POLE dataset into Neo4j and refactor some of its properties. |
| 16 | + |
| 17 | +== POLE dataset and model |
| 18 | + |
| 19 | +[role=NX_TAB_NAV,tab=import] |
| 20 | +pagelaunch::[] |
| 21 | + |
| 22 | +Use the button to import the data into Neo4j. |
| 23 | + |
| 24 | +button::Import POLE[role=NX_IMPORT_LOAD,endpoint=https://neo4j-graph-examples.github.io/pole/data/pole-data-importer.zip] |
| 25 | + |
| 26 | +Crime data for this demo was downloaded from public sources (http://data.gov.uk), and is freely provided for download with locations defined to the block or street level and crimes defined by month only (i.e. no day or timestamp). |
| 27 | +This public crime data does not include any sort of information about persons related to crimes, not even as anonymised tokens - it supplies only crime and location data, or in other words only the 'L' and 'E' for the POLE model. |
| 28 | +This demo uses street crime data for Greater Manchester, UK from August 2017. |
| 29 | + |
| 30 | +With the data imported, navigate to the `Query` tab to visualize a representation of the graph model by running the following query: |
| 31 | + |
| 32 | +[source,cypher] |
| 33 | +---- |
| 34 | +CALL db.schema.visualization() |
| 35 | +---- |
| 36 | + |
| 37 | +[NOTE] |
| 38 | +==== |
| 39 | +The arrow button icon:ArrowIcon[] copies the query to the clipboard. |
| 40 | +The play button icon:PlayIcon[] executes the query and returns the results. |
| 41 | +==== |
| 42 | + |
| 43 | +You can see that there are 11 different node labels and that these are connected to each other and themselves by various different relationship types. |
| 44 | + |
| 45 | +The `Person` node is especially interesting since it appears to have multiple relationships to itself. |
| 46 | +In the dataset, there are more than 300 different `Person` nodes that are related to _each other_ in different ways and not related to themselves. |
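| | + |
| | +To see how `Person` nodes relate to each other, a query along the following lines lists each relationship type between two `Person` nodes and how often it occurs. This is a minimal sketch that assumes only the `Person` label from the model above; it does not assume any particular relationship type names. |
| | + |
| | +.Relationship types between `Person` nodes |
| | +[source,cypher] |
| | +---- |
| | +// Match any directed relationship from one Person node to another |
| | +MATCH (:Person)-[r]->(:Person) |
| | +// Group by relationship type and count the occurrences |
| | +RETURN type(r) AS relationship_type, count(r) AS total |
| | +ORDER BY total DESC |
| | +---- |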
| 47 | + |
| 48 | +You will explore the data further in the next step. |
| 49 | + |
| 50 | +== Crimes committed |
| 51 | + |
| 52 | +Using the data model and Cypher, you can answer questions like: |
| 53 | + |
| 54 | +* What type of crimes were committed? |
| 55 | +* What is the most common crime? |
| 56 | +* What location has the highest crime rate? |
| 57 | + |
| 58 | +The following query looks at the nodes with the label `Crime` and uses the built-in aggregation function `count()` to count the number of crimes committed: |
| 59 | + |
| 60 | +.Number of crimes |
| 61 | +[source,cypher] |
| 62 | +---- |
| 63 | +MATCH (c:Crime) |
| 64 | +RETURN labels(c), count(c) AS total |
| 65 | +---- |
| 66 | + |
| 67 | +Not all crime is equal and some crimes are more serious than others. |
| 68 | +The following query lets you see the different types of crimes committed and the number of times they were committed by using the `count()` function and ordering the results in descending order: |
| 69 | + |
| 70 | +.Different types of crimes |
| 71 | +[source,cypher] |
| 72 | +---- |
| 73 | +MATCH (c:Crime) |
| 74 | +RETURN c.type AS crime_type, count(c) AS total |
| 75 | +ORDER BY count(c) DESC |
| 76 | +---- |
| 77 | + |
| 78 | +If you recall the graph model, a crime can involve a person, a vehicle or an object. |
| 79 | + |
| 80 | +The following query lets you see which crime(s) involved an object: |
| 81 | + |
| 82 | +.Crimes involving an object |
| 83 | +[source,cypher] |
| 84 | +---- |
| 85 | +MATCH (o:Object)-[:INVOLVED_IN]->(c:Crime) |
| 86 | +RETURN c.type AS crime_type, count(c) AS total |
| 87 | +ORDER BY count(c) DESC |
| 88 | +---- |
| 89 | + |
| 90 | +[NOTE] |
| 91 | +.Challenge |
| 92 | +==== |
| 93 | +Can you rewrite the query to show the crimes that involved a person? |
| 94 | +
|
| 95 | +[source,cypher] |
| 96 | +---- |
| 97 | +MATCH (o:Object)-[:INVOLVED_IN]->(c:Crime) |
| 98 | +RETURN c.type AS crime_type, count(c) AS total |
| 99 | +ORDER BY count(c) DESC |
| 100 | +---- |
| 101 | +
|
| 102 | +Hint: If you don't remember the data model, you can always run `CALL db.schema.visualization()` to see it again. |
| 103 | +==== |
| 104 | + |
| 105 | +[%collapsible] |
| 106 | +.Reveal the solution |
| 107 | +==== |
| 108 | +[source,cypher] |
| 109 | +---- |
| 110 | +MATCH (p:Person)-[:PARTY_TO]->(c:Crime) |
| 111 | +RETURN c.type AS crime_type, count(c) AS total |
| 112 | +ORDER BY count(c) DESC |
| 113 | +---- |
| 114 | +==== |
| 115 | + |
| 116 | +In the next section you will refactor properties and look at locations in the graph. |
| 117 | + |
| 118 | +== Locations |
| 119 | + |
| 120 | +The Point data type allows you to use location-based functions in Cypher. |
| 121 | +Data Importer doesn't natively support creating Point values. |
| 122 | +To work with locations in the POLE dataset, you need to create a property of type Point on the `Location` nodes. |
| 123 | +Currently, the `Location` nodes have `latitude` and `longitude` properties, which you can combine into a Point stored in a new `position` property. |
| 124 | + |
| 125 | +.Refactor `Location` nodes |
| 126 | +[source,cypher] |
| 127 | +---- |
| 128 | +MATCH (l:Location) |
| 129 | +SET l.position = point({latitude: l.latitude, longitude: l.longitude}) |
| 130 | +---- |
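| | + |
| | +To check that the refactoring worked, a quick query such as the following sketch can be run. It assumes the `SET` statement above has already been executed and returns a handful of locations together with their new `position` value. |
| | + |
| | +.Check the new `position` property |
| | +[source,cypher] |
| | +---- |
| | +// Return a few Location nodes that now carry a Point value |
| | +MATCH (l:Location) |
| | +WHERE l.position IS NOT NULL |
| | +RETURN l.address AS address, l.position AS position |
| | +LIMIT 5 |
| | +---- |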
| 131 | + |
| 132 | +Which locations have the highest crime rate? |
| 133 | +The dataset contains a lot of locations, so it is sensible to put a limit on the number of locations returned. |
| 134 | + |
| 135 | +.Locations with the highest crime rate |
| 136 | +[source,cypher] |
| 137 | +---- |
| 138 | +MATCH (l:Location)<-[:OCCURRED_AT]-(:Crime) |
| 139 | +RETURN l.address AS locale, l.postcode AS postcode, count(l) AS total |
| 140 | +ORDER BY count(l) DESC |
| 141 | +LIMIT 20 |
| 142 | +---- |
| 143 | + |
| 144 | +This query matches locations where crimes occurred, returns the `address` and `postcode` properties of the `Location` nodes, and counts the crimes that occurred at each address and postcode. |
| 145 | +The results are ordered by that count in descending order, and the `LIMIT` clause restricts the output to the 20 locations with the most crimes. |
| 146 | + |
| 147 | +If you turn the query around and look at the number of crimes committed in the vicinity of a particular address, you can use the Point data now available on the `Location` nodes. |
| 148 | + |
| 149 | +You can pick any address as your starting point, but for this query you will use an address that may sound familiar. |
| 150 | + |
| 151 | +.Crimes committed in the vicinity of Coronation Street |
| 152 | +[source,cypher] |
| 153 | +---- |
| 154 | +MATCH (l:Location {address: '1 Coronation Street'}) |
| 155 | +WITH point(l) AS corrie |
| 156 | +MATCH (x:Location)<-[:OCCURRED_AT]-(c:Crime) |
| 157 | +WITH x, c, point.distance(point(x), corrie) AS distance |
| 158 | +WHERE distance < 500 |
| 159 | +RETURN x.address AS address, count(c) AS crime_total, collect(distinct(c.type)) AS crime_type, distance |
| 160 | +ORDER BY distance |
| 161 | +LIMIT 10 |
| 162 | +---- |
| 163 | + |
| 164 | +This is a complex query that pipelines the results from one part of the query to the next. |
| 165 | +The first part of the query matches the `Location` node with the address `1 Coronation Street`. |
| 166 | +The `WITH` clause then takes the point of that location, assigns it to the variable `corrie`, and pipes `corrie` to the next part of the query. |
| 167 | +The second `MATCH` clause matches other locations (`x`) where crimes (`c`) were committed and then uses the spatial function `point.distance` to calculate the distance between each of those locations and `1 Coronation Street`. |
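| | + |
| | +As a standalone illustration of `point.distance`, the following sketch measures the distance between two points built from literal coordinates. The coordinates are illustrative values chosen for this example and are not taken from the dataset; for WGS-84 points such as these, the result is expressed in metres. |
| | + |
| | +.Distance between two points |
| | +[source,cypher] |
| | +---- |
| | +// Two hypothetical WGS-84 points (illustrative coordinates only) |
| | +WITH point({latitude: 53.4808, longitude: -2.2426}) AS a, |
| | +     point({latitude: 53.4830, longitude: -2.2446}) AS b |
| | +// Distance between the two points, in metres |
| | +RETURN point.distance(a, b) AS distance_in_metres |
| | +---- |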