Commit f557f45

Continue work on data processing with python
1 parent 14aa09d commit f557f45

7 files changed (+287 -47 lines)

2-Working-With-Data/07-python/README.md

+45 -2
@@ -43,7 +43,9 @@ import matplotlib.pyplot as plt
from scipy import ... # you need to specify exact sub-packages that you need
```

Pandas is centered around a few basic concepts.

### Series

**Series** is a sequence of values, similar to a list or numpy array. The main difference is that a series also has an **index**, and when we operate on series (e.g., add them), the index is taken into account. The index can be as simple as an integer row number (the index used by default when creating a series from a list or array), or it can have a complex structure, such as a date interval.
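
To make the role of the index concrete, here is a minimal sketch with two small hand-made series (the labels and values are invented for illustration). Addition pairs values by index label rather than by position, so the order in which the labels are listed does not matter:

```python
import pandas as pd

# Two small series (invented values) with the same labels in a different order
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([30, 10, 20], index=["c", "a", "b"])

# Label-based addition: a -> 1 + 10, b -> 2 + 20, c -> 3 + 30
total = s1 + s2
print(total)

# For contrast, adding the raw arrays pairs values by position instead
positional = s1.to_numpy() + s2.to_numpy()
print(positional)
```

The label-based result is `11, 22, 33`, while the positional sum is `31, 12, 23`, which is exactly the kind of silent mismatch the index protects against.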

@@ -71,11 +73,52 @@ total_items = items_sold.add(additional_items,fill_value=0)
total_items.plot()
```
![Time Series Plot](images/timeseries-2.png)

> **Note** that we are not using the simple syntax `items_sold+additional_items`. If we did, we would get a lot of `NaN` (*Not a Number*) values in the resulting series. This is because the `additional_items` series has no values for some of the index points, and adding `NaN` to anything results in `NaN`. Thus we need to specify the `fill_value` parameter during addition.
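
The effect of `fill_value` is easy to see on a tiny example. This is a sketch with made-up numbers: `sold` and `extra` are hypothetical stand-ins for `items_sold` and `additional_items`, and `extra` has values on only two of the four days:

```python
import pandas as pd

# Hypothetical stand-ins for items_sold / additional_items, with made-up numbers
days = pd.date_range("2020-01-01", periods=4, freq="D")
sold = pd.Series([10, 20, 30, 40], index=days)
extra = pd.Series([5, 7], index=days[[1, 3]])  # values only on the 2nd and 4th day

plain = sold + extra                    # days missing from `extra` become NaN
filled = sold.add(extra, fill_value=0)  # missing entries count as 0 before adding

print(plain)
print(filled)
```

With plain `+`, the first and third days come out as `NaN`; with `fill_value=0` they keep the original values `10` and `30`.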

With time series, we can also **resample** the series at different time intervals. For example, suppose we want to compute the mean sales volume for each month. We can use the following code:

```python
monthly = total_items.resample("1M").mean()
ax = monthly.plot(kind='bar')
```

![Monthly Time Series Averages](images/timeseries-3.png)
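
The following sketch runs the same idea on synthetic daily data, so the numbers can be checked by hand. It assumes made-up sales of 1 item per day in January 2020 and 3 per day in February 2020. It uses the `"MS"` (month start) alias, which labels each monthly bin with its first day; the `"1M"` alias used above labels bins by month end instead, and recent pandas versions spell month end as `"ME"`:

```python
import pandas as pd

# Sixty days of made-up sales: 1 item/day in January, 3 items/day in February 2020
days = pd.date_range("2020-01-01", "2020-02-29", freq="D")
sales = pd.Series([1] * 31 + [3] * 29, index=days)

# Group the daily values into calendar months and average each group
monthly = sales.resample("MS").mean()
print(monthly)

# Any aggregation works on the resampled groups, e.g. the monthly total
monthly_total = sales.resample("MS").sum()
print(monthly_total)
```

Swapping `.mean()` for `.sum()` changes the question from "typical daily volume in the month" to "total volume in the month".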

### DataFrame

A DataFrame is essentially a collection of series with the same index. We can combine several series together into a DataFrame:
```python
a = pd.Series(range(1,10))
b = pd.Series(["I","like","to","use","Python","and","Pandas","very","much"],index=range(0,9))
df = pd.DataFrame([a,b])
```

This will create a horizontal table, in which each series becomes a row:

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 1 | I | like | to | use | Python | and | Pandas | very | much |
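
As a quick check of the orientation, this sketch rebuilds the same two series (using the nine words shown in the table) and compares the shape of the list-built DataFrame with its transpose. `.T` swaps rows and columns, so each series ends up as a column; the names `rows` and `cols` are just illustrative:

```python
import pandas as pd

a = pd.Series(range(1, 10))
b = pd.Series(["I", "like", "to", "use", "Python", "and", "Pandas", "very", "much"])

rows = pd.DataFrame([a, b])  # each series becomes one row
cols = rows.T                # transpose: each series becomes one column

print(rows.shape)  # (2, 9)
print(cols.shape)  # (9, 2)
```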

We can also use series as columns and specify column names using a dictionary:

```python
df = pd.DataFrame({ 'A' : a, 'B' : b })
```

This will give us a table like this:

| | A | B |
|---|---|---|
| 0 | 1 | I |
| 1 | 2 | like |
| 2 | 3 | to |
| 3 | 4 | use |
| 4 | 5 | Python |
| 5 | 6 | and |
| 6 | 7 | Pandas |
| 7 | 8 | very |
| 8 | 9 | much |
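
Because the dictionary form lines up columns by index, it also accepts series that do not cover exactly the same labels. In this sketch the series `x` and `y` are invented for illustration; `y` has no value for label `2`, so pandas puts `NaN` in that cell, and selecting one column returns a Series that shares the DataFrame's index:

```python
import pandas as pd

# Invented example data: y is missing label 2
x = pd.Series([1, 2, 3], index=[0, 1, 2])
y = pd.Series(["one", "two"], index=[0, 1])

df = pd.DataFrame({"x": x, "y": y})  # rows are the union of both indexes
print(df)

col = df["x"]                        # a single column is a Series
print(type(col).__name__)            # Series
print(col.index.equals(df.index))    # True
```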

## 🚀 Challenge

The first problem we will focus on is modelling the epidemic spread of COVID-19. To do that, we will use the data on the number of infected individuals in different countries, provided by the [Center for Systems Science and Engineering](https://systems.jhu.edu/) (CSSE) at [Johns Hopkins University](https://jhu.edu/). The dataset is available in [this GitHub Repository](https://github.com/CSSEGISandData/COVID-19).

Since we want to demonstrate how to deal with data, we invite you to open [`notebook-covidspread.ipynb`](notebook-covidspread.ipynb) and read it from top to bottom. You can also execute cells and do some challenges that we have left for you along the way.

2-Working-With-Data/07-python/notebook.ipynb

+242 -45

2-Working-With-Data/07-python/solution/notebook.ipynb

Whitespace-only changes.

2-Working-With-Data/07-python/translations/README.es.md

Whitespace-only changes.
