
Commit 847839e

first half of guide
1 parent bf7c2e7 commit 847839e


3 files changed

+180
-0
lines changed


.vscode/settings.json

+1
@@ -0,0 +1 @@
{}
+167
@@ -0,0 +1,167 @@
== Explore a POLE dataset

A **P**ersons **O**bjects **L**ocations **E**vents (POLE) data model focuses on the relationships between people, objects, locations, and events. It is well suited to law enforcement and intelligence investigations.

image::{img}/pole_model_visual.jpeg[]

In this guide, you will learn:

//* How to import and refactor a POLE dataset ******** Do we need to refactor?
* How to query the graph and answer questions using Cypher
* How to refactor your data
* How to use the built-in Cypher function `shortestPath()`
* How to use aggregation functions in Cypher

In the next section, you will import the POLE dataset into Neo4j and explore its data model.

== POLE dataset and model

[role=NX_TAB_NAV,tab=import]
pagelaunch::[]

Use the button to import the data into Neo4j.

button::Import POLE[role=NX_IMPORT_LOAD,endpoint=https://neo4j-graph-examples.github.io/pole/data/pole-data-importer.zip]

Crime data for this demo was downloaded from public sources (http://data.gov.uk) and is freely available for download. Locations are defined to the block or street level, and crimes are dated by month only (no day or timestamp).
This public crime data does not include any information about the persons involved in crimes, not even as anonymised tokens. It supplies only crime and location data; in other words, only the 'L' and 'E' of the POLE model.
This demo uses street crime data for Greater Manchester, UK, from August 2017.

With the data imported, navigate to the `Query` tab and run the following query to visualize the graph model:

[source,cypher]
----
CALL db.schema.visualization()
----

[NOTE]
====
The arrow button icon:ArrowIcon[] copies the query to the clipboard.
The play button icon:PlayIcon[] executes the query and returns the results.
====

You can see that there are 11 different node labels, connected to each other and to themselves by various relationship types.

The `Person` node is especially interesting because, in the schema view, it appears to have multiple relationships to itself.
These loops do not mean that a person is related to themselves: the dataset contains more than 300 `Person` nodes that are related to _each other_ in different ways.
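The schema view summarizes relationships by the labels at each end, so any `Person`-to-`Person` relationship type is drawn as a loop. The following Python sketch illustrates that idea on a tiny, hypothetical edge list. The node IDs and the `KNOWS` and `FAMILY_REL` relationship types are illustrative assumptions, not values read from the dataset. Collapsing each relationship to a `(start label, type, end label)` triple produces `Person` loops even though no relationship starts and ends at the same node.

```python
# Hypothetical instance data: (start_id, start_label, rel_type, end_id, end_label).
edges = [
    ("p1", "Person", "KNOWS", "p2", "Person"),
    ("p2", "Person", "FAMILY_REL", "p3", "Person"),
    ("p1", "Person", "PARTY_TO", "c1", "Crime"),
]


def schema_triples(edges):
    """Collapse instance relationships to (start label, type, end label) triples."""
    return {(s_label, rel, e_label) for _, s_label, rel, _, e_label in edges}


def self_loops(edges):
    """Return relationships whose start node and end node are the same node."""
    return [e for e in edges if e[0] == e[3]]


triples = schema_triples(edges)
# Person -> Person triples are what a schema view draws as loops...
person_loops = {t for t in triples if t[0] == t[2] == "Person"}
print(sorted(person_loops))
# ...even though no instance relationship connects a node to itself.
print(self_loops(edges))
```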

You will explore the data further in the next step.

== Crimes committed

Using the data model and Cypher, you can answer questions like:

* What types of crime were committed?
* What is the most common crime?
* What location has the highest crime rate?

The following query matches the nodes with the label `Crime` and uses the built-in aggregation function `count()` to count the number of crimes committed:

.Number of crimes
[source,cypher]
----
MATCH (c:Crime)
RETURN labels(c), count(c) AS total
----

Not all crimes are equal; some are more serious than others.
The following query shows the different types of crime and how many times each was committed. It uses the `count()` function and orders the results in descending order:

.Different types of crimes
[source,cypher]
----
MATCH (c:Crime)
RETURN c.type AS crime_type, count(c) AS total
ORDER BY total DESC
----
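In Cypher, any non-aggregated expression in `RETURN` (here `c.type`) acts as an implicit grouping key, so `count(c)` is computed once per crime type. As a rough in-memory analogue, the Python sketch below groups a few made-up crime records by type and sorts the counts in descending order. The records are hypothetical and only illustrate the grouping.

```python
from collections import Counter

# Hypothetical Crime nodes, represented as property dictionaries.
crimes = [
    {"type": "Burglary"},
    {"type": "Vehicle crime"},
    {"type": "Burglary"},
    {"type": "Drugs"},
    {"type": "Burglary"},
    {"type": "Vehicle crime"},
]

# Group by crime type and count each group, like RETURN c.type, count(c).
totals = Counter(crime["type"] for crime in crimes)

# Sort by count, highest first, like ORDER BY total DESC.
rows = sorted(totals.items(), key=lambda row: row[1], reverse=True)
for crime_type, total in rows:
    print(crime_type, total)
```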

If you recall the graph model, a crime can involve a person, a vehicle, or an object.

The following query shows which types of crime involved an object:

.Crimes involving an object
[source,cypher]
----
MATCH (o:Object)-[:INVOLVED_IN]->(c:Crime)
RETURN c.type AS crime_type, count(c) AS total
ORDER BY total DESC
----
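The pattern `(o:Object)-[:INVOLVED_IN]->(c:Crime)` keeps only relationships of type `INVOLVED_IN` that point from an `Object` node to a `Crime` node. The Python sketch below mimics that filter on a small hypothetical graph. The node IDs, crime types, and the `Vehicle` edge are illustrative assumptions. Because `count(c)` counts matched rows, a crime that involves two objects is counted twice, and the sketch behaves the same way.

```python
from collections import Counter

# Hypothetical graph: node id -> (label, properties).
nodes = {
    "o1": ("Object", {}),
    "o2": ("Object", {}),
    "v1": ("Vehicle", {}),
    "c1": ("Crime", {"type": "Burglary"}),
    "c2": ("Crime", {"type": "Theft from the person"}),
}

# Directed relationships: (start id, type, end id).
rels = [
    ("o1", "INVOLVED_IN", "c1"),
    ("o2", "INVOLVED_IN", "c1"),  # a second object in the same crime
    ("v1", "INVOLVED_IN", "c2"),  # a vehicle, so the Object pattern skips it
]


def match(start_label, rel_type, end_label):
    """Yield (start, end) pairs for (:start_label)-[:rel_type]->(:end_label)."""
    for start, rtype, end in rels:
        if (rtype == rel_type
                and nodes[start][0] == start_label
                and nodes[end][0] == end_label):
            yield start, end


# One count per matched row, grouped by crime type.
totals = Counter(nodes[c][1]["type"] for _, c in match("Object", "INVOLVED_IN", "Crime"))
print(totals.most_common())
```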

[NOTE]
.Challenge
====
Can you rewrite the query to show the crimes that involved a person?

[source,cypher]
----
MATCH (o:Object)-[:INVOLVED_IN]->(c:Crime)
RETURN c.type AS crime_type, count(c) AS total
ORDER BY total DESC
----

Hint: If you don't remember the data model, you can always run `CALL db.schema.visualization()` to see it again.
====

[%collapsible]
.Reveal the solution
====
[source,cypher]
----
MATCH (p:Person)-[:PARTY_TO]->(c:Crime)
RETURN c.type AS crime_type, count(c) AS total
ORDER BY total DESC
----
====

In the next section, you will refactor properties and look at locations in the graph.

== Locations

The Point data type allows you to use location-based functions in Cypher.
Data Importer doesn't natively support creating Point values.
To work with locations in the POLE dataset, you therefore need to create a Point property on the `Location` nodes.
The `Location` nodes already have `latitude` and `longitude` properties, which you can combine into a new `position` property:

.Refactor `Location` nodes
[source,cypher]
----
MATCH (l:Location)
SET l.position = point({latitude: l.latitude, longitude: l.longitude})
----
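Conceptually, the refactor reads two existing properties on every `Location` node and writes a derived value back as a new property. The Python sketch below applies that transformation to hypothetical property dictionaries. It stores the pair as a plain `(latitude, longitude)` tuple rather than a real Neo4j Point. It also skips nodes that lack either coordinate; that is a choice made for this sketch, not a description of how Cypher's `point()` behaves.

```python
# Hypothetical Location nodes as property dictionaries (addresses are made up).
locations = [
    {"address": "A Street", "latitude": 53.48, "longitude": -2.24},
    {"address": "B Road", "latitude": 53.47},  # longitude missing
]


def refactor_position(nodes):
    """Add a derived 'position' property from latitude/longitude (sketch only)."""
    for props in nodes:
        lat, lon = props.get("latitude"), props.get("longitude")
        # Only derive a position when both coordinates are present.
        if lat is not None and lon is not None:
            props["position"] = (float(lat), float(lon))
    return nodes


refactor_position(locations)
print(locations[0]["position"])
```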

Which locations have the highest crime rate?
The dataset contains a lot of locations, so it is sensible to limit the number of locations returned.

.Locations with the highest crime rate
[source,cypher]
----
MATCH (l:Location)<-[:OCCURRED_AT]-(:Crime)
RETURN l.address AS locale, l.postcode AS postcode, count(l) AS total
ORDER BY total DESC
LIMIT 20
----

This query matches every `Location` node at which at least one crime occurred and returns the `address` and `postcode` properties of each location.
Because the `MATCH` produces one row per crime at a location, `count(l)` counts the crimes that occurred there.
`ORDER BY total DESC` sorts the locations from the most to the fewest crimes, and the `LIMIT` clause restricts the results to the first 20 rows.
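Because `ORDER BY` is applied before `LIMIT`, the query returns the 20 locations with the most crimes rather than 20 arbitrary ones. A minimal Python analogue, assuming hypothetical `(address, postcode)` rows with one row per crime, is `Counter.most_common(n)`. It counts, sorts in descending order, and truncates in one step. The addresses and postcodes below are made up for illustration.

```python
from collections import Counter

# Hypothetical rows produced by the MATCH: one (address, postcode) pair per crime.
rows = [
    ("Piccadilly", "M1 1AA"),
    ("Market Street", "M1 1AB"),
    ("Piccadilly", "M1 1AA"),
    ("Deansgate", "M3 4AA"),
    ("Piccadilly", "M1 1AA"),
    ("Market Street", "M1 1AB"),
]


def top_locations(rows, limit):
    """Count rows per location, sort by count descending, keep the first `limit`."""
    return Counter(rows).most_common(limit)


for (locale, postcode), total in top_locations(rows, 2):
    print(locale, postcode, total)
```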

If you turn the query around and look at the number of crimes committed in the vicinity of a particular address, you can use the new `position` property of the `Location` nodes.

You can pick any address as your starting point, but this query uses an address that may sound familiar.

.Crimes committed in the vicinity of Coronation Street
[source,cypher]
----
MATCH (l:Location {address: '1 Coronation Street'})
WITH l.position AS corrie
MATCH (x:Location)<-[:OCCURRED_AT]-(c:Crime)
WITH x, c, point.distance(x.position, corrie) AS distance
WHERE distance < 500
RETURN x.address AS address, count(c) AS crime_total, collect(DISTINCT c.type) AS crime_type, distance
ORDER BY distance
LIMIT 10
----

This is a complex query that pipes the results from one part of the query to the next.
The first `MATCH` clause finds the `Location` node with the address `1 Coronation Street`. The `WITH` clause then takes the `position` of that location, assigns it to the variable `corrie`, and passes `corrie` on to the next part of the query.
The second `MATCH` clause matches other locations (`x`) where crimes (`c`) were committed. It then uses the spatial function `point.distance` to calculate the distance between each of those locations and `1 Coronation Street`.
Because `position` holds geographic (WGS-84) points, this distance is in metres, so `WHERE distance < 500` keeps only crimes within 500 metres.
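To see the shape of this pipeline outside the database, the Python sketch below walks through the same steps:

1. Look up the anchor location.
2. Compute each crime location's distance from it.
3. Keep the locations closer than 500 metres.
4. Group by address and distance.
5. Sort by distance and limit the results.

It approximates `point.distance` with the haversine formula on a sphere of radius 6,371,008.8 m, which is an assumption of this sketch; Neo4j's exact geographic calculation may differ slightly. All addresses and coordinates are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; an assumption for this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical data: address -> (latitude, longitude), and (address, crime type) rows.
locations = {
    "1 Coronation Street": (53.4800, -2.2400),
    "Near Street": (53.4810, -2.2400),  # about 111 m north
    "Far Road": (53.5000, -2.2400),     # about 2.2 km north
}
crimes = [
    ("Near Street", "Burglary"),
    ("Near Street", "Drugs"),
    ("Far Road", "Burglary"),
    ("1 Coronation Street", "Burglary"),
]


def crimes_near(anchor, radius_m, limit):
    corrie = locations[anchor]                      # WITH l.position AS corrie
    grouped = defaultdict(lambda: {"total": 0, "types": set()})
    for address, crime_type in crimes:              # each crime and where it occurred
        distance = haversine_m(*locations[address], *corrie)
        if distance < radius_m:                     # WHERE distance < 500
            row = grouped[(address, distance)]      # group by address and distance
            row["total"] += 1
            row["types"].add(crime_type)
    rows = [(addr, r["total"], sorted(r["types"]), d) for (addr, d), r in grouped.items()]
    return sorted(rows, key=lambda r: r[3])[:limit]  # ORDER BY distance LIMIT n


for row in crimes_near("1 Coronation Street", 500, 10):
    print(row)
```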

server.py

+12
@@ -0,0 +1,12 @@
#!/usr/bin/env python3
from http.server import HTTPServer, SimpleHTTPRequestHandler, test
import sys

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header('Access-Control-Allow-Origin', '*')
        SimpleHTTPRequestHandler.end_headers(self)

if __name__ == '__main__':
    test(CORSRequestHandler, HTTPServer, port=int(sys.argv[1]) if len(sys.argv) > 1 else 8000)
