Data Visualization

import pandas as pd
import numpy as np
from plotly import express as px
from matplotlib import pyplot as plt
import missingno as msn
import seaborn as sns
%matplotlib inline

Reading File

df = pd.read_csv("fake_job_postings.csv")
df.head(2)


	job_id	title	location	department	salary_range	company_profile	description	requirements	benefits	telecommuting	has_company_logo	has_questions	employment_type	required_experience	required_education	industry	function	fraudulent
0	1	Marketing Intern	US, NY, New York	Marketing	NaN	We're Food52, and we've created a groundbreaki...	Food52, a fast-growing, James Beard Award-winn...	Experience with content management systems a m...	NaN	0	1	0	Other	Internship	NaN	NaN	Marketing	0
1	2	Customer Service - Cloud Video Production	NZ, , Auckland	Success	NaN	90 Seconds, the worlds Cloud Video Production ...	Organised - Focused - Vibrant - Awesome!Do you...	What we expect from you:Your key responsibilit...	What you will get from usThrough being part of...	0	1	0	Full-time	Not Applicable	NaN	Marketing and Advertising	Customer Service	0

print(df.columns)

Index(['job_id', 'title', 'location', 'department', 'salary_range',
       'company_profile', 'description', 'requirements', 'benefits',
       'telecommuting', 'has_company_logo', 'has_questions', 'employment_type',
       'required_experience', 'required_education', 'industry', 'function',
       'fraudulent'],
      dtype='object')

Question 1 :: How many data points are present in the data?

Question 2 :: How many features are present in the data?

print("sol1:- Total Number Of DataPoints:- {}.".format(df.shape[0]))
print("sol2:- Total Number of features:- {}.".format(df.shape[1]))

sol1:- Total Number Of DataPoints:- 17880.
sol2:- Total Number of features:- 18.

Question 3 :: Which features contain null values?

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17880 entries, 0 to 17879
Data columns (total 18 columns):
job_id                 17880 non-null int64
title                  17880 non-null object
location               17534 non-null object
department             6333 non-null object
salary_range           2868 non-null object
company_profile        14572 non-null object
description            17879 non-null object
requirements           15185 non-null object
benefits               10670 non-null object
telecommuting          17880 non-null int64
has_company_logo       17880 non-null int64
has_questions          17880 non-null int64
employment_type        14409 non-null object
required_experience    10830 non-null object
required_education     9775 non-null object
industry               12977 non-null object
function               11425 non-null object
fraudulent             17880 non-null int64
dtypes: int64(5), object(13)
memory usage: 2.5+ MB
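The non-null counts reported by `df.info()` can also be turned into a per-column table of missing values. The following is a minimal sketch, not part of the original notebook: `missing_summary` is a hypothetical helper, and the small `toy` frame only stands in for `df` so the example runs on its own.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return the missing-value count and percentage per column (hypothetical helper)."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Columns with the most missing values come first.
    return summary.sort_values("missing", ascending=False)


# Toy frame for illustration only; in the notebook, call missing_summary(df).
toy = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "salary_range": [np.nan, np.nan, "10-20", np.nan],
    "department": ["x", np.nan, "y", "z"],
})
toy_summary = missing_summary(toy)
print(toy_summary)
```

Applied to the real data, this would make the gaps in the `df.info()` output easier to compare: for example, `salary_range` has 2868 non-null values out of 17880, so roughly 84% of its entries are missing.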

msn.matrix(df)

<matplotlib.axes._subplots.AxesSubplot at 0x7f78397e4208>

msn.heatmap(df)

<matplotlib.axes._subplots.AxesSubplot at 0x7f7833f10780>
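The heatmap above shows how strongly the missingness of one column is correlated with the missingness of another. A rough pandas-only sketch of that idea follows; it is an approximation under stated assumptions, not missingno's actual implementation. `nullity_correlation` is a hypothetical helper and `toy` is an illustrative frame, not the dataset.

```python
import numpy as np
import pandas as pd


def nullity_correlation(frame):
    """Correlate the missing/present pattern of each column (hypothetical helper)."""
    nulls = frame.isnull()
    # Columns that are always present or always missing have no variance,
    # so their correlation is undefined; keep only columns that vary.
    varying = nulls.loc[:, nulls.nunique() > 1]
    return varying.astype(int).corr()


# Toy frame for illustration only; in the notebook, call nullity_correlation(df).
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, np.nan, 4.0],  # missing exactly where "a" is present
    "c": [1.0, np.nan, 3.0, np.nan],  # missing exactly where "a" is missing
    "d": [1.0, 2.0, 3.0, 4.0],        # never missing, so it is dropped
})
toy_corr = nullity_correlation(toy)
print(toy_corr)
```

A value near 1 means two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.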

msn.bar(df)

<matplotlib.axes._subplots.AxesSubplot at 0x7f783a141940>

for item in df.columns:
    print("{} uniques: {}".format(item,df[item].unique().size))

job_id uniques: 17880
title uniques: 11231
location uniques: 3106
department uniques: 1338
salary_range uniques: 875
company_profile uniques: 1710
description uniques: 14802
requirements uniques: 11969
benefits uniques: 6206
telecommuting uniques: 2
has_company_logo uniques: 2
has_questions uniques: 2
employment_type uniques: 6
required_experience uniques: 8
required_education uniques: 14
industry uniques: 132
function uniques: 38
fraudulent uniques: 2
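Note that `unique().size` counts `NaN` as one distinct value, so the counts above include the missing marker for every column that has nulls. A minimal sketch of the difference, using a toy column that is not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Toy column for illustration only.
col = pd.Series(["Full-time", "Part-time", np.nan, "Full-time", np.nan])

with_nan = col.unique().size               # NaN is counted as one distinct value
without_nan = col.nunique()                # NaN is excluded by default
also_with_nan = col.nunique(dropna=False)  # NaN is counted again

print(with_nan, without_nan, also_with_nan)  # 3 2 3
```

In the notebook, `df[item].nunique()` would therefore report one less than the figure above for each column that contains missing values.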

Features:-

job_id:- Unique id of each job posting
title:- Title of the job
location:- Location of the job
department:- Department of the job (e.g. Marketing)
salary_range:- Salary range for the job
company_profile:- What the company does (e.g. whether it is a food company or a tech company)
description:- Full description of the job
requirements:- Requirements for the job
benefits:- Additional benefits offered
telecommuting:- Binary variable (0 or 1)
has_company_logo:- Binary variable (0 or 1)
has_questions:- Binary variable (0 or 1)
employment_type:- Type of employment (e.g. Full-time or Part-time)
required_experience:- Level of experience required (e.g. Internship)
required_education:- Minimum qualification required
industry:- Industry of the company (e.g. Marketing and Advertising)
function:- Function of the job (e.g. Customer Service)
fraudulent:- Whether the posting is fraudulent (1) or not (0)

Question 4 :: How many data points are fraudulent in the given data?

# fraudulent is a 0/1 column, so its sum is the number of fraudulent postings
print("sol :: Number of fraudulent jobs :: {}".format(df['fraudulent'].sum()))

sol :: Number of fraudulent jobs :: 866
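The raw count is easier to interpret next to the share of each class. The following is a minimal sketch, assuming a hypothetical helper `class_balance` and toy labels that stand in for `df['fraudulent']`:

```python
import pandas as pd


def class_balance(labels):
    """Return the count and share of each label value (hypothetical helper)."""
    counts = labels.value_counts()
    share = labels.value_counts(normalize=True)
    return pd.DataFrame({"count": counts, "share": share})


# Toy labels for illustration only; in the notebook, call class_balance(df['fraudulent']).
toy_labels = pd.Series([0, 0, 0, 1, 0, 0, 0, 0, 1, 0], name="fraudulent")
balance = class_balance(toy_labels)
print(balance)
```

With 866 fraudulent postings out of 17880, the fraudulent class makes up about 4.8% of the data, so the dataset is heavily imbalanced.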

msn.dendrogram(df)

<matplotlib.axes._subplots.AxesSubplot at 0x7f783a2177b8>
