2-Working-With-Data/06-non-relational/README.md (+9 -9)
@@ -6,7 +6,7 @@ Data is not limited to relational databases. This lesson focuses on non-relation
Spreadsheets are a popular way to store and explore data because they require less work to set up and get started. In this lesson you'll learn the basic components of a spreadsheet, as well as formulas and functions. The examples are illustrated with Microsoft Excel, but most of the parts and topics have similar names and steps in other spreadsheet software.
A spreadsheet is a file and will be accessible in the file system of a computer, device, or cloud-based file system. The software itself may be browser-based or an application that must be installed on a computer or downloaded as an app. In Excel these files are also called **workbooks**, and this terminology will be used for the remainder of this lesson.
@@ -18,30 +18,30 @@ With these basic elements of an Excel workbook, we'll use and an example from [M
The spreadsheet file named "InventoryExample" is a formatted spreadsheet of items within an inventory that contains three worksheets, where the tabs are labeled "Inventory List", "Inventory Pick List" and "Bin Lookup". Row 4 of the Inventory List worksheet is the header row, which describes the value of each cell in its column.
There are instances where a cell depends on the values of other cells to generate its value. The Inventory List spreadsheet keeps track of the cost of every item in its inventory, but what if we need to know the value of everything in the inventory? [**Formulas**](https://support.microsoft.com/en-us/office/overview-of-formulas-34519a4e-1e8d-4f4b-84d4-d642c4f63263) perform actions on cell data and are used to calculate the value of the inventory in this example. This spreadsheet uses a formula in the Inventory Value column to calculate the value of each item by multiplying the quantity under the QTY header by the cost under the COST header. Double-clicking or highlighting a cell will show the formula. You'll notice that formulas start with an equals sign, followed by the calculation or operation.
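As a rough illustration of what such a per-row formula computes, the sketch below mirrors the QTY times COST multiplication in plain Python. The item names, quantities, and costs are made-up values for illustration and are not taken from the InventoryExample workbook.

```python
# Hypothetical inventory rows; the values are illustrative, not from the workbook.
inventory = [
    {"name": "Item 1", "qty": 25, "cost": 51.00},
    {"name": "Item 2", "qty": 132, "cost": 93.00},
    {"name": "Item 3", "qty": 151, "cost": 57.00},
]


def inventory_value(row):
    """Mirror a per-row formula that multiplies QTY by COST."""
    return row["qty"] * row["cost"]


# One computed value per row, like filling the formula down a column.
values = [inventory_value(row) for row in inventory]
print(values)
```

Like a formula cell, each entry in `values` depends only on other cells in the same row, so changing a quantity or cost and recomputing updates the result.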
We can use another formula to add all the values of Inventory Value together to get the total value. This could be calculated by adding each cell to generate the sum, but that can be a tedious task. Excel has [**functions**](https://support.microsoft.com/en-us/office/sum-function-043e1c7d-7726-4e80-8f32-07b23e057f89), or predefined formulas, that perform calculations on cell values. Functions require arguments, which are the values used to perform these calculations. When a function requires more than one argument, the arguments need to be listed in a particular order or the function may not calculate the correct value. This example uses the SUM function with the values under Inventory Value as its argument to generate the total listed in row 3, column B (also referred to as B3).
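The same two ideas, a predefined function applied to a whole range and the importance of argument order, can be sketched in Python. The column values here are hypothetical, and Python's built-in `sum` and `pow` stand in for spreadsheet functions; they are not Excel's implementation.

```python
# Hypothetical Inventory Value column; the numbers are illustrative only.
inventory_values = [1275.0, 12276.0, 8607.0]

# Adding each cell by hand works, but grows tedious as the column gets longer.
manual_total = inventory_values[0] + inventory_values[1] + inventory_values[2]

# A predefined function takes the whole range as its single argument.
function_total = sum(inventory_values)

# Argument order matters once a function takes more than one argument.
two_cubed = pow(2, 3)      # 2 raised to the power 3
three_squared = pow(3, 2)  # 3 raised to the power 2

print(manual_total, function_total, two_cubed, three_squared)
```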
## NoSQL
NoSQL is an umbrella term for the different ways to store non-relational data and can be interpreted as "non-SQL", "non-relational" or "not only SQL". These types of database systems can be grouped into four categories.
> Source from [Michał Białecki Blog](https://www.michalbialecki.com/2018/03/18/azure-cosmos-db-key-value-database-cloud/)
[Key-value](https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data#keyvalue-data-stores) databases pair each value with a unique key, which serves as the identifier for that value. These pairs are stored using a [hash table](https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/) with an appropriate hashing function.
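A minimal sketch of the key-value idea, assuming an in-memory store: Python's `dict` is itself backed by a hash table, so it can stand in for a tiny key-value database. The `put` and `get` helpers and all keys and values below are hypothetical.

```python
# A Python dict is backed by a hash table, so it can model a tiny key-value store.
# All keys and values below are hypothetical.
store = {}


def put(key, value):
    """Store a value under a unique key, replacing any previous value."""
    store[key] = value


def get(key, default=None):
    """Look up a value by its key; return default if the key is absent."""
    return store.get(key, default)


put("user:1001", {"name": "Ada", "city": "London"})
put("user:1002", {"name": "Grace", "city": "New York"})
put("user:1001", {"name": "Ada", "city": "Cambridge"})  # same key, so the value is replaced

print(get("user:1001"))
print(get("user:9999", default="not found"))
```

Because keys are unique, writing to an existing key replaces its value rather than adding a second entry.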
> Source from [Microsoft](https://docs.microsoft.com/en-us/azure/cosmos-db/graph/graph-introduction#graph-database-by-example)
[Graph](https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data#graph-data-stores) databases describe relationships in data and are represented as a collection of nodes and edges. A node represents an entity, something that exists in the real world such as a student or bank statement. Edges represent the relationship between two entities. Each node and edge has properties that provide additional information about it.
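The node and edge model can be sketched with plain Python structures. This is only an illustration of the data shape, not how any particular graph database stores data; the node ids, labels, and properties are hypothetical.

```python
# Nodes are entities; each node carries its own properties (all values hypothetical).
nodes = {
    "student:1": {"label": "student", "name": "Ana"},
    "student:2": {"label": "student", "name": "Ben"},
    "course:ds101": {"label": "course", "title": "Data Science 101"},
}

# Edges connect two nodes and can carry properties too (here, the enrollment year).
edges = [
    {"source": "student:1", "target": "course:ds101", "label": "enrolled_in", "year": 2021},
    {"source": "student:2", "target": "course:ds101", "label": "enrolled_in", "year": 2022},
]


def targets_of(node_id, edge_label):
    """Follow outgoing edges with the given label from one node."""
    return [e["target"] for e in edges
            if e["source"] == node_id and e["label"] == edge_label]


def sources_of(node_id, edge_label):
    """Follow incoming edges with the given label back to their source nodes."""
    return [e["source"] for e in edges
            if e["target"] == node_id and e["label"] == edge_label]


print(targets_of("student:1", "enrolled_in"))
print([nodes[n]["name"] for n in sources_of("course:ds101", "enrolled_in")])
```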
[Columnar](https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data#columnar-data-stores) data stores organize data into columns and rows like a relational data structure, but the columns are divided into groups called column families, where all the data in one column family is related and can be retrieved and changed as one unit.
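One way to picture column families is as nested dictionaries keyed by row, then by family, then by column. The sketch below assumes that simplified layout; the family names, row key, and values are hypothetical, and real columnar stores use their own storage formats.

```python
# Row key -> column family -> columns. All names and values are hypothetical.
table = {
    "customer:001": {
        "identity": {"first_name": "Mu", "last_name": "Bae"},
        "contact": {"phone": "555-0100", "email": "mu@example.com"},
    },
}


def read_family(row_key, family):
    """Retrieve every column in one family as a single unit."""
    return dict(table[row_key][family])


def write_family(row_key, family, columns):
    """Replace every column in one family as a single unit."""
    table.setdefault(row_key, {})[family] = dict(columns)


# Changing the contact family leaves the identity family untouched.
write_family("customer:001", "contact", {"phone": "555-0199", "email": "mu@example.com"})
print(read_family("customer:001", "contact"))
print(read_family("customer:001", "identity"))
```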
@@ -75,11 +75,11 @@ You can download and install the emulator [for Windows here](https://aka.ms/cosm
The Emulator launches a browser window, where the Explorer view allows you to explore documents.
If you're following along, click on "Start with Sample" to generate a sample database called SampleDB. If you expand SampleDB by clicking on the arrow, you'll find a container called `Persons`. A container holds a collection of items, which are the documents within the container. You can explore the four individual documents under `Items`.
#### Querying Document Data with the Cosmos DB Emulator
@@ -89,7 +89,7 @@ We can also query the sample data by clicking on the new SQL Query button (secon
`SELECT * FROM c where c.age < 40`
The query returns two documents. Notice that the age value for each document is less than 40.
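To make the filter concrete, here is a rough Python equivalent of the `c.age < 40` condition applied to a list of documents. The four documents below are hypothetical stand-ins, not the actual contents of the `Persons` container, and the helper name `query_age_below` is made up for this sketch.

```python
# Hypothetical documents standing in for the items of a container.
documents = [
    {"id": "1", "firstname": "Ana", "age": 25},
    {"id": "2", "firstname": "Ben", "age": 38},
    {"id": "3", "firstname": "Cai", "age": 47},
    {"id": "4", "firstname": "Dee", "age": 61},
]


def query_age_below(items, limit):
    """Keep only documents that have an age property smaller than limit."""
    return [doc for doc in items if "age" in doc and doc["age"] < limit]


matches = query_age_below(documents, 40)
print(matches)
```

Documents without an `age` property are skipped in this sketch rather than raising an error, since documents in the same container are not required to share a schema.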
| 1 | I | like | to | use | Python | and | Pandas | very | much |
We can also use Series as columns, and specify column names using a dictionary:
```python
df = pd.DataFrame({'A': a, 'B': b})
```
This will give us a table like this:

|   | A | B |
|---|---|---|
| 0 | 1 | I |
| 1 | 2 | like |
| 2 | 3 | to |
| 3 | 4 | use |
| 4 | 5 | Python |
| 5 | 6 | and |
| 6 | 7 | Pandas |
| 7 | 8 | very |
| 8 | 9 | much |

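For a self-contained version of the snippet above, the sketch below defines `a` and `b` explicitly. Their contents are inferred from the table shown here; the lesson defines them outside this excerpt, so treat these definitions as an assumption.

```python
import pandas as pd

# Assumed definitions, inferred from the table: a holds 1..9, b holds nine words.
a = pd.Series(range(1, 10))
b = pd.Series(["I", "like", "to", "use", "Python", "and", "Pandas", "very", "much"])

# Each dictionary key becomes a column name; each Series becomes that column.
df = pd.DataFrame({"A": a, "B": b})
print(df)
```

Both Series share the default integer index 0 through 8, so pandas aligns them row by row when building the DataFrame.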
## 🚀 Challenge
The first problem we will focus on is modelling the epidemic spread of COVID-19. To do that, we will use data on the number of infected individuals in different countries, provided by the [Center for Systems Science and Engineering](https://systems.jhu.edu/) (CSSE) at [Johns Hopkins University](https://jhu.edu/). The dataset is available in [this GitHub Repository](https://github.com/CSSEGISandData/COVID-19).

Since we want to demonstrate how to deal with data, we invite you to open [`notebook-covidspread.ipynb`](notebook-covidspread.ipynb) and read it from top to bottom. You can also execute cells, and do some challenges that we have set for you along the way.