rsokl · vspenubarthi · Jul 31, 2019 · Jul 27, 2020 · Jul 27, 2020 · Jul 27, 2020
diff --git a/Python/Module5_OddsAndEnds/WorkingWithFiles.md b/Python/Module5_OddsAndEnds/WorkingWithFiles.md
@@ -249,6 +249,119 @@ with open("a_poem.txt", mode="r") as my_open_file:
 ```
 <!-- #endregion -->
 
+<!-- #region -->
+## Working with Comma Separated Value Files
+
+Comma Separated Value (CSV) files are commonly used to store data that you might typically find in a table. 
+These files can be formatted in many ways, but the typical format separates the column values within a row by commas and places each row on its own line. 
+Suppose we have the following table of test scores:
+
+|         | Exam 1 (%)           | Exam 2 (%) |
+| ------------- |:-------------:| -----:|
+| Ashley     | $93$ | $95$ |
+| Brad     | $84$      |   $100$ |
+| Cassie | $99$      |    $87$ |
+
+This table depicts the test scores of three students across two exams. 
+Here is what the corresponding CSV file might look like:
+
+```text
+name,exam one score,exam two score
+Ashley,93,95
+Brad,84,100
+Cassie,99,87
+```
+Note that the first line typically contains a header that names each column, and that the values within a column are allowed to contain spaces.
+
+<div class="alert alert-warning">
+
+**Note**: 
+
+It is not guaranteed that all CSV files are actually comma separated. 
+Non-standard CSV files will typically come with instructions on how the data is organized. 
+In general, it is a good practice to open up the CSV file and look at the first few lines to get a sense of how it is organized (unless the file is too large).
+</div>
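+
+A quick way to follow this advice without loading a large file is to read only its first few lines. The following is a minimal sketch: the helper `peek` and the file name `peek_example.csv` are hypothetical names introduced for illustration, and the sample file is deliberately separated by semicolons rather than commas.
+
+```python
+from itertools import islice
+from pathlib import Path
+
+
+def peek(path, num_lines=5):
+    """Return the first `num_lines` lines of a text file, without trailing newlines."""
+    with open(path, mode="r") as f:
+        # islice stops after `num_lines` lines, so the rest of the file is never read
+        return [line.rstrip("\n") for line in islice(f, num_lines)]
+
+
+# Write a tiny semicolon-separated file so that this example is self-contained
+sample_path = Path("peek_example.csv")  # hypothetical file name
+sample_path.write_text(
+    "name;exam one score;exam two score\nAshley;93;95\nBrad;84;100\nCassie;99;87\n"
+)
+
+first_lines = peek(sample_path, num_lines=2)
+print(first_lines)
+```
+
+Seeing `;` between the values in the first lines tells us which delimiter to pass to a parser later on.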
+
+### How to parse CSVs with NumPy
+
+We will first look into parsing and storing CSV data using our favorite package: `numpy`!
+
+To demonstrate how importing a CSV works, we will try to import [a coastal waves dataset](https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba/data) from Kaggle. 
+After you extract the *.csv* from the *.zip*, rename it to *costal_dataset.csv*.
+```python
+from numpy import genfromtxt # genfromtxt() allows for easy parsing of CSVs
+my_data = genfromtxt(r"./Downloads/costal_dataset.csv", delimiter=',') 
+```
+`genfromtxt()` takes the path to the CSV file and a delimiter (the character that separates the values, typically a comma for CSV files).
+Let's check out some properties of the CSV:
+
+```python
+>>> type(my_data)
+numpy.ndarray
+
+>>> my_data.shape
+(43729, 7)
+
+# Let's look at the actual data
+>>> my_data
+array([[    nan,     nan,     nan, ...,     nan,     nan,     nan],
+       [    nan, -99.9  , -99.9  , ..., -99.9  , -99.9  , -99.9  ],
+       [    nan,   0.875,   1.39 , ...,   4.506, -99.9  , -99.9  ],
+       ...,
+       [    nan,   2.157,   3.43 , ...,  12.89 ,  97.   ,  21.95 ],
+       [    nan,   2.087,   2.84 , ...,  10.963,  92.   ,  21.95 ],
+       [    nan,   1.926,   2.98 , ...,  12.228,  84.   ,  21.95 ]])
+```
+You may notice that there are some `nan` values present when we look at this particular set of data. 
+By default, `genfromtxt()` tries to read every entry as a floating-point number, so any non-numerical values in the file, such as headers and dates, become `nan` in the resulting NumPy array.
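+
+If only the numerical columns are of interest, `genfromtxt()` can be told to skip the header line and to read selected columns. Here is a minimal sketch that uses the small test-score table from earlier, held in an in-memory string, rather than the downloaded dataset; the variable names are chosen only for illustration.
+
+```python
+from io import StringIO
+
+import numpy as np
+
+csv_text = """name,exam one score,exam two score
+Ashley,93,95
+Brad,84,100
+Cassie,99,87
+"""
+
+# With the default settings, the header row and the name column become nan
+raw = np.genfromtxt(StringIO(csv_text), delimiter=",")
+
+# Skip the header line and read only the two numerical columns
+scores = np.genfromtxt(
+    StringIO(csv_text), delimiter=",", skip_header=1, usecols=(1, 2)
+)
+
+print(raw.shape)
+print(scores.shape)
+```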
+
+### How to parse CSVs with Pandas
+
+A really popular library for parsing CSVs is the [Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html "Pandas Documentation") library. Here is a quick way to parse a CSV using Pandas:
+```python
+import pandas as pd
+my_data = pd.read_csv(r"./Downloads/costal_dataset.csv", sep=',', header=None)
+```
+That's it! 
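+The function `read_csv()` parses the CSV and returns its contents, which we store in the variable `my_data`.
+This function has input parameters similar to those of `genfromtxt()`, along with many extra optional parameters. 
+Here, `header=None` tells Pandas that the file has no header row, so the first line is kept as ordinary data. 
+Look at the docstring for more information.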
+
+Let's inspect the same [ocean waves CSV](https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba/data) from before, now parsed with Pandas instead of NumPy:
+
+```python
+>>> type(my_data)
+pandas.core.frame.DataFrame  # Notice that this is a custom type
+
+>>> my_data.shape
+(43729, 7)
+
+>>> my_data.values  # This is how we access the values as an array
+array([['Date/Time', 'Hs', 'Hmax', ..., 'Tp', 'Peak Direction', 'SST'],
+       ['01/01/2017 00:00', '-99.9', '-99.9', ..., '-99.9', '-99.9',
+        '-99.9'],
+       ['01/01/2017 00:30', '0.875', '1.39', ..., '4.506', '-99.9',
+        '-99.9'],
+       ...,
+       ['30/06/2019 22:30', '2.157', '3.43', ..., '12.89', '97', '21.95'],
+       ['30/06/2019 23:00', '2.087', '2.84', ..., '10.963', '92',
+        '21.95'],
+       ['30/06/2019 23:30', '1.926', '2.98', ..., '12.228', '84',
+        '21.95']], dtype=object)
+```
+One of the coolest features of Pandas is how it nicely organizes the parsed CSV data for visualization. 
+Here is how `my_data` is displayed in a Jupyter Notebook:
+
+```python
+my_data[0:21]  # Displays the first 21 rows (the header row plus 20 rows of data) in a nice format
+```
+![Pandas Parsed Figure](pics/Pandas_CSV.jpg)
+
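+One of the main advantages of Pandas is that a `DataFrame` can **store non-numerical data**, such as headers and dates, alongside numbers. 
+In the example above every value was read as a string only because we passed `header=None`, which made Pandas treat the header row as data; by default, `read_csv()` uses the first row as the column names and infers a type for each column. 
+`genfromtxt()`, by contrast, converts every entry to a floating-point number unless told otherwise, which is why the headers and dates were lost as `nan`. 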
+Read the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/index.html "Documentation Link") for more information.
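+
+To see the effect of the header setting, here is a minimal sketch that reads the small test-score table from earlier with the default options of `read_csv()`. The table is held in an in-memory string, and the variable names are chosen only for illustration.
+
+```python
+from io import StringIO
+
+import pandas as pd
+
+csv_text = """name,exam one score,exam two score
+Ashley,93,95
+Brad,84,100
+Cassie,99,87
+"""
+
+# By default, the first row is used for the column names
+df = pd.read_csv(StringIO(csv_text))
+
+# Each column gets its own inferred type: the score columns are integers
+print(df.dtypes)
+
+# Numerical columns can be used in computations right away
+mean_exam_one = df["exam one score"].mean()
+print(mean_exam_one)
+```
+
+Because the header is no longer mixed in with the data, the two score columns are parsed as numbers instead of strings.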
+<!-- #endregion -->
+
 <!-- #region -->
 ## Globbing for Files
 

diff --git a/Python/Module5_OddsAndEnds/pics/Pandas_CSV.jpg b/Python/Module5_OddsAndEnds/pics/Pandas_CSV.jpg