Commit 4950c73

Added iogalle
1 parent 5e27212

5 files changed (+1836, -0 lines changed)

iogalle/README.md

WAGE PREDICTOR
==============
VIDEO LINK: https://photos.app.goo.gl/VJ6oWVugFbgz1XhAA

--------
Overview
--------
My project fits a linear model to an online dataset containing information about
people's wages, height, sex, race, education level, and age. The predict_wages
module processes the data and fits the model, and the wages_console script lets
users interact with the model by entering their own information to get a wage
prediction. The project also calculates how a person's predicted wage would change
if their sex or race were different.
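
Because the model is linear, changing only the sex or race of an otherwise identical person shifts the prediction by the difference between two weights, whatever the person's height, education, or age. The wages_console code is not part of this listing, so the following is only a hypothetical sketch (the helpers set_group and counterfactual_difference are illustrative names, not project functions), assuming the nine-column feature order produced by predict_wages.clean_data:

```python
import numpy as np

# Feature order produced by predict_wages.clean_data (assumed here).
FEATURES = ["height", "male", "female", "white", "black",
            "hispanic", "other", "ed", "age"]
SEX = ["male", "female"]
RACE = ["white", "black", "hispanic", "other"]


def set_group(x, group, value):
    """Return a copy of x with the one-hot block `group` switched to `value`."""
    x = np.array(x, dtype=float)          # np.array copies, so the input is untouched
    for name in group:
        x[FEATURES.index(name)] = 1.0 if name == value else 0.0
    return x


def counterfactual_difference(weights, x, group, value):
    """Prediction with the group set to `value`, minus the prediction for x as given."""
    w = np.ravel(weights)                 # accept the (9, 1) shape fit_model returns
    original = np.asarray(x, dtype=float) @ w
    changed = set_group(x, group, value) @ w
    return float(changed - original)


# Made-up weights and a female, Hispanic feature vector, for illustration only.
w = [0, 10, 4, 7, 1, 2, 3, 0, 0]
x = [61, 0, 1, 0, 0, 1, 0, 14, 20]
print(counterfactual_difference(w, x, SEX, "male"))    # equals w_male - w_female
print(counterfactual_difference(w, x, RACE, "white"))  # equals w_white - w_hispanic
```

With real weights from create_model(), the same call gives the "If you were male/white" amount for any user vector, and the result depends only on the two weights involved.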

-----------------
Technical Details
-----------------
The predict_wages module reads the data from a CSV file (wages.csv) and then cleans
it: it separates the wage column from the feature columns and replaces the categorical
sex and race columns with one-hot (0/1 indicator) columns. Finally, it fits a linear
model to the data with least squares (numpy.linalg.lstsq). The wages_console script
prompts the user for input and uses it to build a feature vector for the new user.
Applying the model weights computed by the predict_wages module to this vector gives
the wage prediction for the new user.
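
The encode-then-fit steps can be sketched on a reduced, made-up example: three features (height, sex, education) instead of nine, and invented numbers rather than wages.csv. The one-hot columns are built with np.where, as in clean_data, and the weights come from np.linalg.lstsq; the function names one_hot and fit_and_predict are hypothetical:

```python
import numpy as np


def one_hot(column, categories):
    """One 0/1 indicator column per category, built with np.where like clean_data."""
    column = np.asarray(column)
    return np.column_stack([np.where(column == c, 1.0, 0.0) for c in categories])


def fit_and_predict(height, sex, ed, wage, new_row):
    """Fit wage ~ [height, male, female, ed] by least squares, then predict new_row."""
    X = np.column_stack([
        np.asarray(height, dtype=float),
        one_hot(sex, ["male", "female"]),
        np.asarray(ed, dtype=float),
    ])
    y = np.asarray(wage, dtype=float).reshape(-1, 1)   # column vector, as in clean_data
    weights = np.linalg.lstsq(X, y, rcond=None)[0]     # shape (4, 1)
    prediction = float(np.asarray(new_row, dtype=float) @ weights.ravel())
    return weights.ravel(), prediction


# Made-up, noise-free data: wage = 10*height + 500*male + 200*female + 100*ed
height = [60, 65, 70, 72, 68]
sex = ["female", "female", "male", "male", "female"]
ed = [12, 16, 12, 14, 18]
wage = [2000, 2450, 2400, 2620, 2680]

weights, prediction = fit_and_predict(height, sex, ed, wage, [61, 0, 1, 14])
print(weights, prediction)
```

On this noise-free toy data the fitted weights recover the values used to generate the wages (10, 500, 200, 100), and the prediction for the new row is their dot product with [61, 0, 1, 14].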

-------------
Prerequisites
-------------
numpy, pandas, random (random is part of the Python standard library, so only
numpy and pandas need to be installed)

----------------
Running the Code
----------------
Run the wages_console.py script with "python3 wages_console.py". The script fits
the model to the data, prompts the user for input, and then reports its predictions.
The data file wages.csv must be in the directory you run the script from, because
predict_wages opens it by a relative path.

------------
Example Test
------------
$ python3 wages_console.py

Welcome to the Wage Predictor!
Enter some information about yourself, and we'll predict your wage.
Then see how your wage would change if your gender or race were different.


Creating model...
Model fitted.

----------------------------------
Please enter your height in inches. 61
Please enter your sex. female
Please enter your race. Hispanic
Please enter the number of years you attended school. 14
Please enter your age in years. 20

We predict that your wage will be $13789.06.

If you were male, your predicted wage would be $17552.54 higher.
If you were white, your predicted wage would be $22145.33 higher.

Again? (Y/N) N
Goodbye!
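
The session above shows the console accepting "Hispanic" with a capital letter, while clean_data matches lowercase labels such as "hispanic". A hypothetical helper, make_feature_vector, sketches how typed answers could be turned into a feature vector in the same nine-column order, normalizing case and rejecting unknown categories. It is an illustration under those assumptions, not the code in wages_console.py:

```python
import numpy as np

# Category order assumed to match the one-hot columns built by clean_data.
SEXES = ["male", "female"]
RACES = ["white", "black", "hispanic", "other"]


def make_feature_vector(height, sex, race, ed, age):
    """Return [height, male, female, white, black, hispanic, other, ed, age]."""
    sex = sex.strip().lower()
    race = race.strip().lower()
    if sex not in SEXES:
        raise ValueError(f"sex must be one of {SEXES}, got {sex!r}")
    if race not in RACES:
        raise ValueError(f"race must be one of {RACES}, got {race!r}")
    sex_bits = [1.0 if s == sex else 0.0 for s in SEXES]
    race_bits = [1.0 if r == race else 0.0 for r in RACES]
    return np.array([float(height), *sex_bits, *race_bits, float(ed), float(age)])


# The answers from the example session (typed input arrives as strings).
x = make_feature_vector("61", "female", "Hispanic", "14", "20")
print(x)
```

Given weights from create_model(), float(x @ weights.ravel()) would then be the predicted wage for this input.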

----------
Known Bugs
----------
I had trouble finding datasets that fit the needs of this project, and this is the
only dataset containing wage, race, and gender information that I could find. The
dataset has little documentation, so I don't know exactly what the wage values
represent (annual income, perhaps). Also, a linear model may not be the best fit:
it sometimes predicts large negative wages, which make no sense, and nothing in a
linear model keeps its predictions positive.
Further testing required!
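
One way to start that testing is to measure how often the fitted model predicts a negative wage. The helper below is a hypothetical sketch (the name negative_prediction_report is illustrative); it takes a feature matrix and weights in the shapes produced by clean_data and fit_model:

```python
import numpy as np


def negative_prediction_report(X, weights):
    """Return (fraction of rows with a negative predicted wage, smallest prediction)."""
    predictions = np.asarray(X, dtype=float) @ np.ravel(weights)
    return float(np.mean(predictions < 0)), float(predictions.min())


# Tiny made-up example: the predictions are [2, -3, -1].
X = [[1, 0], [0, 1], [1, 1]]
weights = [[2], [-3]]   # column vector, like fit_model's output
share, lowest = negative_prediction_report(X, weights)
print(share, lowest)
```

On the real data this would be called as negative_prediction_report(X, fit_model(X, y)) after X, y = clean_data(load_data('wages.csv')).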

-------
Authors
-------
I completed this assignment independently, but I used some of the code from
Assignments 1 and 2 as a template.

iogalle/predict_wages.py

```python
#!/usr/bin/env python3
"""
Name: Isabel Gallegos
Fits a linear model that relates height, sex, race, education level, and age
to expected wage.
"""
import numpy as np
import pandas

INPUT_FILENAME = 'wages.csv'


def load_data(filename):
    """
    Loads the wage data from a csv file.

    Arguments:
    filename -- the file that contains the wage data

    Returns:
    wage_data -- numpy array whose columns are, in order:
        "earn"
        "height"
        "sex"
        "race"
        "ed"
        "age"
    """
    wage_data = pandas.read_csv(filename, delimiter=',').to_numpy()
    return wage_data


def clean_data(wage_data):
    """
    Splits the wage data into a feature matrix and a wage vector.

    Arguments:
    wage_data -- a numpy array of wages and features (height, sex, race, ed, age),
                 as returned by load_data

    Returns:
    X -- a 1379 x 9 numpy array where each row has the form
         [height, male, female, white, black, hispanic, other, ed, age];
         sex and race are one-hot encoded as 0/1 columns
    y -- a 1379 x 1 numpy array, where y[i] is the wage associated with X[i]
    """
    # get wages (first column) as a column vector
    y = wage_data[:, 0]
    y = y.reshape(y.size, 1)
    y = y.astype(float)

    # get features, convert strings to 0/1 values
    height = wage_data[:, 1]
    height = height.reshape(height.size, 1)

    sex = wage_data[:, 2]
    male = np.where(sex == "male", 1, 0)
    male = male.reshape(male.size, 1)
    female = np.where(sex == "female", 1, 0)
    female = female.reshape(female.size, 1)

    race = wage_data[:, 3]
    white = np.where(race == "white", 1, 0)
    white = white.reshape(white.size, 1)
    black = np.where(race == "black", 1, 0)
    black = black.reshape(black.size, 1)
    hispanic = np.where(race == "hispanic", 1, 0)
    hispanic = hispanic.reshape(hispanic.size, 1)
    other = np.where(race == "other", 1, 0)
    other = other.reshape(other.size, 1)

    ed = wage_data[:, 4]
    ed = ed.reshape(ed.size, 1)
    age = wage_data[:, 5]
    age = age.reshape(age.size, 1)

    X = np.concatenate((height, male, female, white, black, hispanic, other, ed, age), axis=1)
    X = X.astype(float)

    return X, y


def fit_model(X, y):
    """
    Fits a linear model to the wage data with least squares.

    Arguments:
    X -- an n x 9 numpy array, where each row is of the form [height, male,
         female, white, black, hispanic, other, ed, age] representing a single data point
    y -- an n x 1 numpy array, where y[i] is the wage for the individual
         associated with X[i]

    Returns:
    weights -- a 9 x 1 numpy array of model weights that minimizes ||X @ weights - y||
    """
    return np.linalg.lstsq(X, y, rcond=None)[0]


def create_model():
    """
    Loads and cleans the data, then uses least-squares linear regression to model
    the relationship between the features [height, male, female, white, black,
    hispanic, other, ed, age] and wage.

    Returns:
    weights -- weights of the model, or None if any step fails
    """
    print("\nCreating model...")
    # Load data
    wage_data = load_data(INPUT_FILENAME)

    # Inform user about data
    if wage_data is None:
        print("Warning: no data received.")
        return None

    # Clean data
    try:
        X, y = clean_data(wage_data)
    except TypeError:
        X, y = None, None

    # Inform user about data
    if X is None and y is None:
        print("Warning: no data cleaned.")
        return None

    # Process data
    try:
        weights = fit_model(X, y)
    except TypeError:
        weights = None

    if weights is None:
        print("Warning: no model fitted.")
        return None

    print("Model fitted.\n")
    return weights
```
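
One property of the design matrix that clean_data builds is worth checking. Assuming every row's sex is "male" or "female" and its race is one of the four listed labels, the two one-hot blocks each sum to a column of ones, so X has an exact linear dependency and is rank-deficient. np.linalg.lstsq still returns a least-squares solution (the minimum-norm one), but individual weights such as w_male are not uniquely determined, while predictions and within-group differences such as w_male - w_female are. The sketch below uses made-up toy data (not wages.csv) laid out like clean_data's output; indicators is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy data laid out like clean_data's output (50 rows, 9 columns).
sex = rng.permutation(np.repeat(["male", "female"], [26, 24]))
race = rng.permutation(np.repeat(["white", "black", "hispanic", "other"], [20, 12, 10, 8]))


def indicators(values, categories):
    """One 0/1 column per category, mirroring the np.where calls in clean_data."""
    return np.column_stack([np.where(values == c, 1.0, 0.0) for c in categories])


X = np.column_stack([
    rng.normal(66.0, 4.0, sex.size),                 # height
    indicators(sex, ["male", "female"]),
    indicators(race, ["white", "black", "hispanic", "other"]),
    rng.integers(8, 20, sex.size).astype(float),     # ed
    rng.integers(18, 65, sex.size).astype(float),    # age
])
true_weights = np.array([50, 900, 400, 700, 100, 300, 200, 350, 25.0])  # made up
y = X @ true_weights + rng.normal(0, 100, sex.size)

weights, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# A direction the data cannot see: +1 on both sex columns, -1 on all race columns.
null_direction = np.array([0, 1, 1, -1, -1, -1, -1, 0, 0], dtype=float)
shifted = weights + 5000.0 * null_direction

print(rank)                                   # numerical rank of the 9-column X
print(np.allclose(X @ weights, X @ shifted))  # do the predictions change?
print(np.isclose(weights[1] - weights[2], shifted[1] - shifted[2]))  # male-female gap
```

Because predictions do not depend on which of these equivalent solutions lstsq picks, comparisons built from differences of predictions, like the console's "if you were male/white" amounts, are well defined even though the raw sex and race weights are not.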
