[IntroRemoteDS-pre] Accessing Data in L3_DataPreparation Notebook is different from Course Script #381

Open
leriomaggio opened this issue Nov 17, 2021 · 0 comments
Labels
Type: Improvement 📈 Performance improvement not introducing a new feature or requiring a major refactor


Description

In the current version of the notebook, the data is loaded directly from GitHub:

# Load data
import pandas as pd

raw_data = pd.read_csv("https://raw.githubusercontent.com/OpenMined/PySyft/dev/notebooks/course3/dataset/L3_data.csv")

whereas the snippet shown in the course script (C3L3C2 - Data Acquisition!) is:

# Load data
import pandas as pd
raw_data = pd.read_csv("dataset/L3_raw_data.csv")
raw_data.head()

I do appreciate loading the data from GitHub (especially when running the notebook in Colab), but it is a bit pointless when running the code in a local Jupyter instance.

Therefore, I resorted to changing my own notebook, replacing that line with the following:

# Load data: use the local dataset folder if present, otherwise fall back to GitHub

import pandas as pd
from pathlib import Path

BASE_FOLDER = Path.cwd()  # current working directory, as an absolute path
DATA_FOLDER = BASE_FOLDER / "dataset"
if DATA_FOLDER.exists():
    datafile_ref = DATA_FOLDER / "L3_raw_data.csv"
else:
    datafile_ref = "https://raw.githubusercontent.com/OpenMined/PySyft/dev/notebooks/course3/dataset/L3_data.csv"

raw_data = pd.read_csv(datafile_ref)
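
One gap in the check above is that it tests for the folder, not for the file itself, so an existing but empty "dataset" folder would still lead to a failed read. A possible variant, sketched here under the assumption that the local file and the remote file hold the same data, checks the file directly. The helper name resolve_data_source is hypothetical and not part of the notebook or of PySyft.

```python
from pathlib import Path
from typing import Union

# URL taken from the notebook; the helper below is a hypothetical illustration.
REMOTE_URL = "https://raw.githubusercontent.com/OpenMined/PySyft/dev/notebooks/course3/dataset/L3_data.csv"


def resolve_data_source(local_file: Union[str, Path], remote_url: str) -> str:
    """Return the local path if the file exists, otherwise the remote URL."""
    local_file = Path(local_file)
    if local_file.is_file():  # True only for an existing regular file
        return str(local_file)
    return remote_url


# Resolve relative to the current working directory, as in the snippet above.
datafile_ref = resolve_data_source(Path.cwd() / "dataset" / "L3_raw_data.csv", REMOTE_URL)
```

The resulting datafile_ref can be passed to pd.read_csv, which accepts both local paths and URLs.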

Are you interested in working on this improvement yourself?

  • Yes, I am.

That's just a suggestion.
Feel free to close and reject the issue if you'd prefer to keep it as it is :)

@leriomaggio leriomaggio added the Type: Improvement 📈 Performance improvement not introducing a new feature or requiring a major refactor label Nov 17, 2021
leriomaggio added a commit to leriomaggio/courses that referenced this issue Nov 17, 2021
This commit fixes a few minor issues in the L3_DataPreparation notebook
after coding through the first block of Lecture 3 in Introduction
to Remote Data Science pre-release.

This commit addresses Issues OpenMined#380 OpenMined#381 OpenMined#382
Moreover:
- dropping duplicates in raw_data should be applied in place
- plt.show() is called whenever a plot is generated, to avoid the repr of the Axes object
- the plot_extrapolated_country function had a missing import of numpy as np
- x = list(range(53)) has been updated as per the in-place drop
- iloc is used whenever accessing an entry in the DataFrame (without iloc, the code doesn't work)
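
The commit message does not say why plain indexing fails, but one plausible reading is that dropping duplicates in place leaves gaps in the index labels, so label-based access to a removed row no longer works while positional access via iloc does. A minimal sketch of that behaviour, using hypothetical toy data rather than the course dataset:

```python
import pandas as pd

# Hypothetical toy frame standing in for raw_data; row 1 duplicates row 0.
df = pd.DataFrame({"country": ["IT", "IT", "FR", "DE"], "cases": [1, 1, 2, 3]})

# inplace=True modifies df itself (the call returns None).
df.drop_duplicates(inplace=True)

# The surviving rows keep their original labels, so label 1 is gone.
remaining_labels = list(df.index)
label_one_present = 1 in df.index

# iloc is positional: position 1 is now the "FR" row.
second_by_position = df["cases"].iloc[1]
```

Here df["cases"][1] would raise a KeyError because label 1 was dropped, whereas df["cases"].iloc[1] returns the second remaining row.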
@leriomaggio leriomaggio changed the title Accessing Data in L3_DataPreparation Notebook is different from Course Script [IntroRemoteDS-pre] Accessing Data in L3_DataPreparation Notebook is different from Course Script Nov 17, 2021