Commit 1ae0cbc

Update pandas_basic.ipynb
1 parent 84e1ec9 commit 1ae0cbc

1 file changed

+82
-66
lines changed

Module 2 - Python for Data Analysis/03. Pandas for Beginners/pandas_basic.ipynb

Original file line number | Diff line number | Diff line change
@@ -4,68 +4,60 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# Putting Some Pandas In Your Python 🐼\n",
7+
"# **Putting Some Pandas In Your Python 🐼**\n",
88
"\n",
99
"<img style=\"float: right;\" width=\"400\" height=\"400\" src=\"image/00_pandas.jpg\">\n",
1010
"\n",
11-
"## Introduction to Pandas\n",
12-
"`pandas` is a Python package providing **fast, flexible, and expressive data structures** designed to make working with `relational or labeled data` both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.\n",
13-
"\n",
14-
"\n",
15-
"Reference: https://pandas.pydata.org/docs/getting_started/index.html\n",
16-
"\n",
17-
"**Question: What are the Data Structures in Pandas?** \n",
18-
"**Answer:** Series (similar to 1 dim numpy array) and DataFrame (similar to 2 dim numpy array)\n",
19-
"\n",
20-
"**Installation Command** \n",
21-
"<code>! pip install pandas</code>\n",
22-
"\n",
23-
"**Importing Pandas** \n",
24-
"<code>import pandas as pd</code>\n",
25-
"\n",
26-
"### What's covered in this notebook?\n",
27-
"1. Pandas Data Structure - Series (ndarray-like)\n",
11+
"### **What's covered in this notebook?**\n",
12+
"1. Introduction to Pandas\n",
13+
" - What is Pandas?\n",
14+
" - What kind of data does Pandas handle?\n",
15+
" - What are the Data Structures in Pandas?\n",
16+
" - How do I read and write tabular data?\n",
17+
" - Installation\n",
18+
" - Importing Pandas Module\n",
19+
"2. Series\n",
2820
"\t- Creating Series using Python list or dict\n",
2921
"\t- Creating Series from Numpy ndarray\n",
3022
"\t- Creating Series from scalar\n",
3123
"\t- Accessing Properties/Attributes and Methods of Series\n",
32-
"\t- Accessing data using Indexing and Slicing\n",
33-
"2. Pandas Data Structure - DataFrame\n",
24+
"\t- Accessing data using Indexing and Slicing (Read Operation)\n",
25+
"3. DataFrame\n",
3426
"\t- Creating DataFrame using Python dict, list or tuple\n",
3527
"\t- Creating DataFrame using Numpy Array\n",
3628
"\t- Accessing Attributes/Properties and Methods of DataFrame\n",
37-
"3. Working with Tabular Data\n",
29+
"4. Working with Tabular Data\n",
3830
"\t- Dataframe to .csv & .xlsx\n",
3931
"\t- Reading .xlsx File\n",
4032
"\t- Reading .csv File - Iris Dataset\n",
41-
"4. Non-Visual Data Analysis using Pandas (Statistical Analysis)\n",
33+
"5. Non-Visual Data Analysis using Pandas (Statistical Analysis)\n",
4234
"\t- sum()\n",
4335
"\t- min() and max()\n",
4436
"\t- mean(), median(), var() and std()\n",
4537
"\t- describe() to summarize the data\n",
4638
"\t- corr(), skew() and kurt()\n",
4739
"\t- count(), unique() and value_counts() for categorical column\n",
4840
"\t- DataFrame.agg()\n",
49-
"5. Accessing Data in a DataFrame using Indexing and Slicing in Pandas DataFrame\n",
41+
"6. Accessing Data in a DataFrame using Indexing and Slicing in Pandas DataFrame\n",
5042
"\t- Reading .csv File - Weather Dataset\n",
5143
"\t- Filtering Single Column vs Multiple Columns from a `DataFrame`\n",
5244
"\t- Filtering Rows from a `DataFrame`\n",
5345
"\t- Filtering specific rows and columns from a `DataFrame`\n",
5446
"\t- .loc[] vs .iloc[]\n",
55-
"6. Renaming Columns, Modifying DataTypes, Creating New Columns and Deleting Columns in Pandas DataFrame\n",
47+
"7. Renaming Columns, Modifying DataTypes, Creating New Columns and Deleting Columns in Pandas DataFrame\n",
5648
"\t- Reading .csv File - Retail Store Sales Data\n",
5749
"\t- Renaming Columns\n",
5850
" - Modifying Columns DataTypes\n",
5951
"\t- Creating a Derived Column\n",
6052
"\t- Creating columns using apply() function\n",
6153
" - Deleting column(s) in DataFrame\n",
62-
"7. Adding/Inserting Row(s)\n",
54+
"8. Adding/Inserting Row(s)\n",
6355
"\t- Reading .xlsx File - Weather Data\n",
6456
"\t- Insert Row(s) using pandas.concat()\n",
6557
"\t- Inserting a Row using List - .loc[] and .iloc[]\n",
6658
"\t- Inserting a Row at a Specific Index of a DataFrame\n",
6759
"\t- Saving DataFrame to .xlsx\n",
68-
"8. Handling TimeSeries Data\n",
60+
"9. Handling TimeSeries Data\n",
6961
"\t- Reading .csv File - Online Store Sales Data\n",
7062
"\t- pd.to_datetime()\n",
7163
"\t- Working with DateTime in Pandas\n",
@@ -75,14 +67,41 @@
7567
"\t- Creating a Column containing Delivery Time in Number of Days\n",
7668
"\t- Improve Performance by Setting Date Column as the Index\n",
7769
"\t- Sorting Data Based on Index vs Values and Resetting Index\n",
78-
"9. Summary"
70+
"10. Summary"
7971
]
8072
},
8173
{
8274
"cell_type": "markdown",
8375
"metadata": {},
8476
"source": [
85-
"## Getting Started"
77+
"## **Introduction to Pandas**\n",
78+
"\n",
79+
"### **Question: What is Pandas?** \n",
80+
"**Answer:** `pandas` is a Python package providing **fast, flexible, and expressive data structures** designed to make working with `relational or labeled data` both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.\n",
81+
"\n",
82+
"### **Question: What kind of data does Pandas handle?** \n",
83+
"**Answer:** When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean, and process your data.\n",
84+
"\n",
85+
"**Reference:** https://pandas.pydata.org/docs/getting_started/index.html\n",
86+
"\n",
87+
"### **Question: What are the Data Structures in Pandas?** \n",
88+
"**Answer:** Pandas provides two types of classes for handling data:\n",
89+
"1. **Series:** a one-dimensional labeled array holding data of any type such as integers, strings, Python objects etc.\n",
90+
"2. **DataFrame:** a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.\n",
91+
"\n",
92+
"\n",
93+
"### **Question: How do I read and write tabular data?** \n",
94+
"**Answer:** pandas supports integration with many file formats and data sources out of the box (csv, excel, sql, json, parquet,…). Importing data from each of these data sources is provided by functions with the prefix `read_*`. Similarly, the `to_*` methods are used to store data."
95+
]
96+
},
97+
{
98+
"cell_type": "markdown",
99+
"metadata": {},
100+
"source": [
101+
"### **Installation**\n",
102+
"\n",
103+
"**Installation Command** \n",
104+
"<code>! pip install pandas</code>\n"
86105
]
87106
},
88107
{
@@ -110,7 +129,10 @@
110129
"cell_type": "markdown",
111130
"metadata": {},
112131
"source": [
113-
"## Import Pandas Module"
132+
"### **Importing Pandas Module**\n",
133+
"\n",
134+
"**Importing Pandas** \n",
135+
"<code>import pandas as pd</code>"
114136
]
115137
},
116138
{
@@ -119,33 +141,32 @@
119141
"metadata": {},
120142
"outputs": [],
121143
"source": [
122-
"import pandas as pd\n",
123-
"import numpy as np"
144+
"import pandas as pd"
124145
]
125146
},
126147
{
127148
"cell_type": "markdown",
128149
"metadata": {},
129150
"source": [
130-
"## Pandas Data Structure - Series (ndarray-like)\n",
151+
"## **Series**\n",
131152
"`Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the **index**. \n",
132153
"\n",
133154
"The basic method to create a `Series` is to call: \n",
134155
"<code>s = pd.Series(data, index=index)</code> \n",
135156
"\n",
136-
"**Important Note:** Series data structures are `value-mutable` (the values they contain can be altered) but `not size-mutable`. \n",
137-
"\n",
138157
"Here, data can be many different things:\n",
139-
"> a Python list or dict \n",
140-
"> an ndarray \n",
141-
"> a scalar value (like 5)"
158+
"1. a Python list or dict \n",
159+
"2. an ndarray \n",
160+
"3. a scalar value (like 5)\n",
161+
"\n",
162+
"**Important Note:** Series data structures are `value-mutable` (the values they contain can be altered) but `not size-mutable`. \n"
142163
]
143164
},
144165
{
145166
"cell_type": "markdown",
146167
"metadata": {},
147168
"source": [
148-
"### Creating Series using Python list or dict"
169+
"### **Creating Series using Python list or dict**"
149170
]
150171
},
151172
{
@@ -253,7 +274,7 @@
253274
"cell_type": "markdown",
254275
"metadata": {},
255276
"source": [
256-
"### Creating Series from Numpy ndarray"
277+
"### **Creating Series from Numpy ndarray**"
257278
]
258279
},
259280
{
@@ -301,7 +322,7 @@
301322
"cell_type": "markdown",
302323
"metadata": {},
303324
"source": [
304-
"### Creating Series from scalar"
325+
"### **Creating Series from scalar**"
305326
]
306327
},
307328
{
@@ -333,7 +354,7 @@
333354
"cell_type": "markdown",
334355
"metadata": {},
335356
"source": [
336-
"### Accessing Properties/Attributes and Methods of Series"
357+
"### **Accessing Properties/Attributes and Methods of Series**"
337358
]
338359
},
339360
{
@@ -470,7 +491,7 @@
470491
"cell_type": "markdown",
471492
"metadata": {},
472493
"source": [
473-
"### Accessing data using Indexing and Slicing"
494+
"### **Accessing data using Indexing and Slicing**"
474495
]
475496
},
476497
{
@@ -714,7 +735,7 @@
714735
"cell_type": "markdown",
715736
"metadata": {},
716737
"source": [
717-
"## Pandas Data Structure - DataFrame\n",
738+
"## **Pandas Data Structure - DataFrame**\n",
718739
"\n",
719740
"A `DataFrame` is a general 2D labeled, **value and size-mutable** tabular structure with potentially heterogeneously-typed columns.\n",
720741
"\n",
@@ -732,20 +753,20 @@
732753
"> Each column in a DataFrame is a `Series` \n",
733754
"> You can do things by `applying a method` to a DataFrame or Series \n",
734755
"\n",
735-
"### Creating a Pandas DataFrame\n",
756+
"### **Creating a Pandas DataFrame**\n",
736757
"**Syntax** \n",
737758
"<code>df = pd.DataFrame(data, index=idxs, columns=cols)</code> \n",
738759
"\n",
739760
"Here data can be many different things:\n",
740-
"> Python Dict, List or Tuple \n",
741-
"> Numpy array"
761+
"1. Python Dict, List or Tuple \n",
762+
"2. Numpy array"
742763
]
743764
},
744765
{
745766
"cell_type": "markdown",
746767
"metadata": {},
747768
"source": [
748-
"### Creating DataFrame using Python dict, list or tuple"
769+
"### **Creating DataFrame using Python dict, list or tuple**"
749770
]
750771
},
751772
{
@@ -1157,7 +1178,7 @@
11571178
"cell_type": "markdown",
11581179
"metadata": {},
11591180
"source": [
1160-
"### Creating DataFrame using Numpy Array"
1181+
"### **Creating DataFrame using Numpy Array**"
11611182
]
11621183
},
11631184
{
@@ -1925,7 +1946,7 @@
19251946
"cell_type": "markdown",
19261947
"metadata": {},
19271948
"source": [
1928-
"### Accessing Attributes/Properties and Methods of DataFrame"
1949+
"### **Accessing Attributes/Properties and Methods of DataFrame**"
19291950
]
19301951
},
19311952
{
@@ -2442,7 +2463,7 @@
24422463
"cell_type": "markdown",
24432464
"metadata": {},
24442465
"source": [
2445-
"## Working with Tabular Data\n",
2466+
"## **Working with Tabular Data**\n",
24462467
"\n",
24472468
"**Question: How do I read and write tabular data?** \n",
24482469
"**Answer:** pandas supports integration with many file formats and data sources out of the box (csv, excel, sql, json, parquet,…). Importing data from each of these data sources is provided by functions with the prefix `read_*`. Similarly, the `to_*` methods are used to store data.\n",
@@ -2459,7 +2480,7 @@
24592480
"cell_type": "markdown",
24602481
"metadata": {},
24612482
"source": [
2462-
"### Dataframe to .csv & .xlsx"
2483+
"### **Dataframe to .csv & .xlsx**"
24632484
]
24642485
},
24652486
{
@@ -2723,7 +2744,7 @@
27232744
"cell_type": "markdown",
27242745
"metadata": {},
27252746
"source": [
2726-
"### Reading .xlsx File"
2747+
"### **Reading .xlsx File**"
27272748
]
27282749
},
27292750
{
@@ -2871,7 +2892,7 @@
28712892
"cell_type": "markdown",
28722893
"metadata": {},
28732894
"source": [
2874-
"### Reading .csv File - Iris Dataset"
2895+
"### **Reading .csv File - Iris Dataset**"
28752896
]
28762897
},
28772898
{
@@ -3077,7 +3098,7 @@
30773098
"cell_type": "markdown",
30783099
"metadata": {},
30793100
"source": [
3080-
"## Non-Visual Data Analysis using Pandas (Statistical Analysis)\n",
3101+
"## **Non-Visual Data Analysis using Pandas (Statistical Analysis)**\n",
30813102
"\n",
30823103
"<img style=\"float: right;\" width=\"300\" height=\"300\" src=\"image/03_reduction.PNG\">\n",
30833104
"\n",
@@ -3208,8 +3229,7 @@
32083229
"ExecuteTime": {
32093230
"end_time": "2018-06-07T06:06:14.307489Z",
32103231
"start_time": "2018-06-07T06:06:14.293530Z"
3211-
},
3212-
"scrolled": false
3232+
}
32133233
},
32143234
"outputs": [
32153235
{
@@ -3323,8 +3343,7 @@
33233343
"ExecuteTime": {
33243344
"end_time": "2018-06-07T06:09:00.507842Z",
33253345
"start_time": "2018-06-07T06:09:00.502868Z"
3326-
},
3327-
"scrolled": false
3346+
}
33283347
},
33293348
"outputs": [
33303349
{
@@ -3349,8 +3368,7 @@
33493368
"ExecuteTime": {
33503369
"end_time": "2018-06-07T06:09:22.661615Z",
33513370
"start_time": "2018-06-07T06:09:22.656629Z"
3352-
},
3353-
"scrolled": false
3371+
}
33543372
},
33553373
"outputs": [
33563374
{
@@ -3884,8 +3902,7 @@
38843902
"ExecuteTime": {
38853903
"end_time": "2018-06-06T16:21:03.420633Z",
38863904
"start_time": "2018-06-06T16:21:03.389280Z"
3887-
},
3888-
"scrolled": false
3905+
}
38893906
},
38903907
"outputs": [
38913908
{
@@ -4219,8 +4236,7 @@
42194236
"ExecuteTime": {
42204237
"end_time": "2018-06-06T16:24:23.622839Z",
42214238
"start_time": "2018-06-06T16:24:23.597741Z"
4222-
},
4223-
"scrolled": false
4239+
}
42244240
},
42254241
"outputs": [
42264242
{
@@ -19211,7 +19227,7 @@
1921119227
"name": "python",
1921219228
"nbconvert_exporter": "python",
1921319229
"pygments_lexer": "ipython3",
19214-
"version": "3.9.13"
19230+
"version": "3.9.6"
1921519231
},
1921619232
"toc": {
1921719233
"nav_menu": {},
@@ -19256,5 +19272,5 @@
1925619272
}
1925719273
},
1925819274
"nbformat": 4,
19259-
"nbformat_minor": 2
19275+
"nbformat_minor": 4
1926019276
}
