This is the repository that contains all the material/code required to get started with the mentorship programme. A few points of administration:
- The mentorship runs for around 5 weeks.
- We assume you have some prior knowledge of programming.
- For any help with the course, you can contact your mentor. A better option is to open an issue on this repository, so that others can see your question and the mentors don't have to answer the same thing twice.
- All your code will be pushed to GitHub, so if you haven't already, create a GitHub account. Fork and clone this repository and create your own folder (refer to the sample folder with my name).
- Create a README.md in your folder where you can keep track of your progress over the next month. The mentors will be using the README.md as a progress tracker. (Refer to the sample README.md given.)
Don't be afraid to ask questions, however irrelevant you think they may be. The mentors are here to help you every step of the way.
- Language: We'll be using Python 3 throughout this course, so familiarise yourself with the language. Also learn to install packages using pip.
- Libraries (Installation):
  a. NumPy: Used for matrix computations.
  b. Pandas: Used for data analysis.
  c. Matplotlib: Used for data visualization.
- Tools:
  b. git: You'll be using GitHub for all your code and assignment submissions, so learn the basics of git: pull, push, add, commit.
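To give a rough feel for the first two libraries listed above, here is a minimal sketch. The arrays and the small score table are made up for illustration and are not part of the course material.

```python
import numpy as np
import pandas as pd

# NumPy: matrix computations on a small 2x2 example.
a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])
product = a @ b  # matrix product; multiplying by b swaps the columns of a
print(product)

# Pandas: data analysis on a made-up table of scores.
scores = pd.DataFrame({"name": ["asha", "ben", "chen"], "score": [82, 91, 77]})
mean_score = scores["score"].mean()
print(mean_score)
```

If both imports succeed, the packages were installed correctly with pip.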
Since everyone prefers a different approach to learning, we'll try our best to accommodate each style. Every topic has multiple levels of resources:
- Articles/Blogs: These give you a detailed explanation of each topic along with the relevant mathematics.
- Code: If you prefer to learn by looking at a codebase, we'll link practical implementations of the topic (wherever appropriate).
- Lectures: We'll link free online YouTube lectures (wherever appropriate).
The recommendation would be to either use Lectures or Articles to get a solid grasp of the conceptual details, and to use the Code as a reference during the assignment. Please note that we don't tolerate any plagiarism.
At the end of each week you will be given a set of tasks to complete. These could be either a report or a coding assignment. All submissions will happen via GitHub.
Basics of Python:
a. Python Fundamentals
b. Variables
c. Data Types
d. Operators
e. Conditions and Loops
f. Python Functions
g. Python Data Structures
Python for Machine Learning
Submit your Python code in the Task 1 folder.
1.1 Python If-Else
1.2 Word Order
1.3 Time Delta
1.4 Matrix Script (Optional)
NumPy
Pandas
Matplotlib
Submit your code in the Task 2 folder.
2.1 Download this dataset and perform the following tasks:
- Load the data (both train and test)
- Print the shape and display the data (using .head())
- Check if there are missing values in the data and replace them with "NaN"
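The steps above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the two in-memory CSV strings stand in for the real train and test files, and the `?` placeholder for missing values is hypothetical, so check how the actual dataset marks missing entries.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the train and test CSV files.
TRAIN_CSV = io.StringIO("id,age,city\n1,34,Pune\n2,,Delhi\n3,29,?\n")
TEST_CSV = io.StringIO("id,age,city\n4,41,Mumbai\n5,?,Chennai\n")

def load_data(source):
    """Read a CSV; empty cells and the assumed '?' placeholder both become NaN."""
    return pd.read_csv(source, na_values=["?"])

train = load_data(TRAIN_CSV)
test = load_data(TEST_CSV)

print(train.shape, test.shape)  # (rows, columns) of each table
print(train.head())             # first rows of the training data
print(train.isna().sum())       # number of missing values per column
```

With real files, pass the file path to `pd.read_csv` instead of a `StringIO` object.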
2.2 Use this dataset and draw a line plot similar to the one shown here
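A minimal Matplotlib line plot might look like the sketch below. The years and values are made up, and the axis labels and output file name are placeholders; replace them with the columns of the linked dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration only.
years = [2015, 2016, 2017, 2018, 2019]
values = [3.1, 3.8, 4.4, 4.0, 5.2]

fig, ax = plt.subplots()
(line,) = ax.plot(years, values, marker="o", label="example series")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.set_title("Example line plot")
ax.legend()
fig.savefig("line_plot.png")  # writes the figure to the current folder
```

In a notebook you can drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving the figure.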
- Lecture (Recommended)
Submit your code in the Task 3 folder.
Submit your code in the Task 4 folder.
4.1 Using Logistic Regression
4.2 Using KNN
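As a reminder of the idea behind KNN, here is a from-scratch sketch in NumPy. It is not the expected solution for the task: the two toy clusters, the `knn_predict` helper, and the choice of `k=3` are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = np.linalg.norm(train_x - query, axis=1)  # Euclidean distance to every point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Two made-up clusters: label 0 near (1, 1) and label 1 near (5, 5).
train_x = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_x, train_y, np.array([1.1, 1.0])))
print(knn_predict(train_x, train_y, np.array([5.1, 5.0])))
```

A query point takes the label of the cluster it sits in, because all three of its nearest neighbours come from that cluster.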
Introduction to K-Means Clustering
- Article (A bit long but quite useful)
Support Vector Machine
Submit your code in the Task 5 folder.
Goal: Predict individuals' income using data collected from the 1994 U.S. Census. Build a model that accurately predicts whether an individual makes more than $50,000.
Dataset: UCI Machine Learning Repository
Dataset Description: This dataset consists of approximately 32,000 data points, each with 13 features. It is a modified version of the dataset published in the paper “Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid”, by Ron Kohavi.
Features
- age: Age
- workclass: Working Class (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked)
- education_level: Level of Education (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool)
- education-num: Number of educational years completed
- marital-status: Marital status (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse)
- occupation: Work Occupation (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces)
- relationship: Relationship Status (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried)
- race: Race (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black)
- sex: Sex (Female, Male)
- capital-gain: Monetary Capital Gains
- capital-loss: Monetary Capital Losses
- hours-per-week: Average Hours Per Week Worked
- native-country: Native Country (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands)
Target Variable
- income: Income Class (<=50K, >50K)
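Most of the features above are categorical, so they need to be turned into numbers before a model can use them. The sketch below shows one common way to do that with pandas. The three rows are made up, only a few of the listed columns are included, and one-hot encoding is an assumption here rather than a required approach.

```python
import pandas as pd

# Three made-up rows using a few of the column names listed above.
data = pd.DataFrame({
    "age": [39, 50, 28],
    "workclass": ["State-gov", "Private", "Private"],
    "sex": ["Male", "Male", "Female"],
    "hours-per-week": [40, 13, 40],
    "income": ["<=50K", ">50K", "<=50K"],
})

# Binary target: 1 if the person earns more than 50K, else 0.
target = (data["income"] == ">50K").astype(int)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
features = pd.get_dummies(data.drop(columns="income"), columns=["workclass", "sex"])

print(target.tolist())
print(sorted(features.columns))
```

Each category becomes its own indicator column (for example `workclass_Private`), which is the form most scikit-learn style classifiers expect.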
Machine Learning is a vast field, and since the mentorship programme was limited to 4 to 5 weeks, we covered only the basic algorithms. Some other important algorithms are listed below, each with a basic introduction or implementation blog. You can go ahead and explore these topics further.
Decision Trees and Random Forest
Perceptrons and Neural Network
Artificial Neural Network
Convolutional Neural Network
Transfer Learning
Recurrent Neural Network
Generative Adversarial Networks
Deep Convolutional GANs (DCGANs)